These workarounds describe how to use:
a. application-level partitioning
b. table partitioning for on-disk tables that contain cold data, in combination with memory-optimized tables for hot data.
Both of these workarounds maintain separate tables with identical schemas, which for existing apps would require code changes so the app knows which table to insert, update, or delete rows in. These workarounds might be acceptable for new apps, but technologists are not crazy about changing existing applications.
Even if we accept that these are viable solutions for existing applications, there are other potential problems with using either of these approaches.
An OLTP database schema is usually highly normalized, with lots of parent/child relationships, and those relationships are usually enforced with PRIMARY KEY and FOREIGN KEY constraints. SQL 2016 allows us to implement PK/FK constraints for memory-optimized tables, but only if all participating tables are memory-optimized.
That leads us to an interesting problem:
How can we enforce PK and FK relationships if a database contains both disk-based and memory-optimized tables, when each table requires the same validation?
In a simplified scenario, let’s say we have the following tables:
Parent table: memory-optimized, States_InMem
Child table 1: memory-optimized, contains hot data, Addresses_InMem
Child table 2: disk-based, contains cold data, Addresses_OnDisk
We must satisfy at least three conditions:
a. Condition 1: an insert/update on the memory-optimized child table must validate StateID
b. Condition 2: an insert/update on the disk-based child table must validate StateID
c. Condition 3: deleting a row from the parent table must not create orphaned child records
Assume Addresses_InMem has a column named StateID that references States_InMem.StateID.
If we create the States_InMem table as memory-optimized, the Addresses_InMem table can define a FOREIGN KEY that references it. Condition 1 is satisfied.
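As a sketch of what that could look like (columns beyond StateID, such as StateName and StreetAddress, are assumptions for illustration):

```sql
-- Assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TABLE dbo.States_InMem
(
    StateID   INT          NOT NULL PRIMARY KEY NONCLUSTERED,
    StateName NVARCHAR(50) NOT NULL  -- hypothetical column for illustration
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

CREATE TABLE dbo.Addresses_InMem
(
    AddressID     INT           IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
    StreetAddress NVARCHAR(100) NOT NULL,  -- hypothetical column
    StateID       INT           NOT NULL,
    -- SQL 2016 allows this FK because both tables are memory-optimized
    CONSTRAINT FK_AddressesInMem_StatesInMem
        FOREIGN KEY (StateID) REFERENCES dbo.States_InMem (StateID)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```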
The disk-based Addresses_OnDisk table can use a trigger to validate the StateID for inserts or updates. Condition 2 is satisfied.
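A sketch of such a trigger (names are illustrative). This works because triggers on disk-based tables run in interop mode, and interop T-SQL can reference memory-optimized tables:

```sql
CREATE TRIGGER dbo.tr_AddressesOnDisk_ValidateState
ON dbo.Addresses_OnDisk
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the statement if any new/changed row has a StateID
    -- with no matching row in the memory-optimized parent table.
    IF EXISTS
    (
        SELECT 1
        FROM inserted AS i
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM dbo.States_InMem AS s
            WHERE s.StateID = i.StateID
        )
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50001, 'Invalid StateID: no matching row in States_InMem.', 1;
    END;
END;
```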
If we want to delete a record from the memory-optimized Parent table (States_InMem), the FK from memory-optimized Addresses_InMem will prevent the delete if child records exist (assuming we don’t cascade).
Triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables. Therefore, when you want to delete a record from the memory-optimized parent table, triggers cannot be used to enforce referential integrity to the disk-based child table.
Without a trigger or a parent/child relationship enforced at the database level, it will be possible to delete a record from States_InMem that references Addresses_OnDisk, thereby creating an orphaned child record. Condition 3 is NOT satisfied.
This “memory-optimized triggers cannot reference disk-based tables” issue also prevents the parent table from being disk-based (described next).
Parent table: disk-based, States_OnDisk
Child table 1: Hot data in memory-optimized table, Addresses_InMem
Child table 2: Cold data in disk-based table, Addresses_OnDisk
We can only define PK/FK constraints between memory-optimized tables, so that won’t work for validating Addresses_InMem.StateID.
As just described, we cannot use triggers on Addresses_InMem to enforce referential integrity, because triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables (States_OnDisk).
One solution might be to have all DML for this type of lookup table occur through interop stored procedures. But this has some drawbacks:
1. If a stored procedure must access both disk-based and memory-optimized tables, it cannot be natively compiled.
2. Without PRIMARY and FOREIGN KEY rules enforced at the database engine level, invalid data can still be introduced.
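For example, a delete against the parent table could be funneled through an interop procedure that checks both child tables first. This is a simplified sketch (procedure and table names are illustrative); under concurrent activity you would also need appropriate isolation/locking to make the check-then-delete safe:

```sql
CREATE PROCEDURE dbo.States_Delete
    @StateID INT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;

    -- Manually enforce what an FK would: no orphaned child rows.
    IF EXISTS (SELECT 1 FROM dbo.Addresses_InMem  WHERE StateID = @StateID)
       OR EXISTS (SELECT 1 FROM dbo.Addresses_OnDisk WHERE StateID = @StateID)
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50002, 'Delete blocked: child rows reference this StateID.', 1;
    END;

    DELETE FROM dbo.States_InMem WHERE StateID = @StateID;

    COMMIT TRANSACTION;
END;
```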
Ideally we would like to have only a single copy of the parent table that can be referenced from either disk-based or memory-optimized child tables.
Separate “lookup” database
You might think that you can simply put reference tables in a separate database, but this approach won’t work, because memory-optimized tables don’t support cross-database queries. Also, the example of the States lookup table is overly simplified – it’s a single table that is a parent to child tables, but itself has no parent.
What if the tables were not Addresses and States, but instead Orders and OrderDetails? Orders might have a parent record, which can also have a parent record, and so on. Even if it were possible to place referenced tables in a separate database, this complexity would likely prevent you from doing so.
For small lookup tables with no “parent”, one potential solution would be to store the reference data twice (on disk and in memory). In this scenario you would modify only the disk-based table, and use triggers on the disk-based table to keep the memory-optimized lookup table in sync.
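A sketch of that approach, assuming a disk-based dbo.States table is the single point of modification and dbo.States_InMem is the memory-optimized copy (a delete-then-reinsert pattern keeps the trigger simple, at the cost of rewriting updated rows):

```sql
CREATE TRIGGER dbo.tr_States_SyncToInMem
ON dbo.States
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove rows that were deleted or are about to be replaced (updates).
    DELETE s
    FROM dbo.States_InMem AS s
    WHERE EXISTS (SELECT 1 FROM deleted AS d WHERE d.StateID = s.StateID);

    -- Re-insert the current version of inserted/updated rows.
    INSERT INTO dbo.States_InMem (StateID, StateName)
    SELECT i.StateID, i.StateName
    FROM inserted AS i;
END;
```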
Entire table in memory
Of course if you put entire tables in memory (a single table that holds both hot and cold data), all of these problems go away. Depending on the complexity of the data model, this solution might work. However, placing both hot and cold data in memory will affect recovery time, and therefore RTO (see my other blog post on recovery for databases with memory-optimized data here).
All data in memory
You could also put your entire database in memory, but In-Memory OLTP isn’t designed for this. Its purpose is to place the tables with the highest activity in memory (or a subset of data for those hot tables). Putting your entire database in memory has an even greater impact on RTO than placing hot/cold data for a few tables in memory.
Also, cold data won’t benefit from most of what In-Memory OLTP has to offer, as by definition cold data rarely changes. However, there will likely be some benefit from querying data that resides solely in memory-optimized tables (no latching/locking).
If your data is temporal in nature, it’s possible to use the new Temporal table feature of SQL 2016 to solve some of the issues discussed. It would work only for memory-optimized tables that are reference tables, like the States table.
You could define both the memory-optimized reference table and your memory-optimized referencing tables to be temporal, and that way the history of both over time is captured. At a given point in time, an Addresses record referenced a specific version of the States record (this will also work for disk-based tables, but the subject of this blog post is how In-Memory OLTP can be used to handle hot/cold data).
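A sketch of the reference table defined as a system-versioned temporal table (for memory-optimized tables, the history table itself is disk-based; the period column and history table names are assumptions):

```sql
CREATE TABLE dbo.States_InMem
(
    StateID      INT          NOT NULL PRIMARY KEY NONCLUSTERED,
    StateName    NVARCHAR(50) NOT NULL,  -- hypothetical column
    SysStartTime DATETIME2    GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   DATETIME2    GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH
(
    MEMORY_OPTIMIZED = ON,
    DURABILITY = SCHEMA_AND_DATA,
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.States_History)
);
```

Point-in-time queries can then use the FOR SYSTEM_TIME clause, e.g. `SELECT * FROM dbo.States_InMem FOR SYSTEM_TIME AS OF '2016-06-01'`.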
It’s recommended to use a clustered columnstore index on the history table to minimize the storage footprint and maximize query performance. Partitioning of the history table is also supported.
If multiple years of data must be retained due to regulatory requirements, you could create a view that encompasses both the archival data and the hot data in memory-optimized temporal tables. Removing large amounts of data from the archival tables can easily be done with partitioning, but adding large amounts of data to the archival tables cannot be done seamlessly, because, as mentioned earlier, partitioning is not supported for memory-optimized tables.
Down the road
With the current limitations on triggers, foreign keys, and partitioning for memory-optimized tables, enforcing referential integrity with a mix of hot and cold schemas/tables remains a challenge.