Category Archives: In-Memory OLTP

Clustered columnstore: on-disk vs. in-mem

This post will highlight the fairly wide gap in functionality between clustered columnstore indexes for on-disk tables compared to memory-optimized tables, for SQL 2016.

The first thing to note – which is not specific to columnstore for either type of table – is that Microsoft chose a really poor name for this type of index.

Why do I say that?

For the following reasons:

1. Traditional/regular clustered indexes physically sort data, but data within a columnstore index is unordered (that’s true for clustered and nonclustered columnstore indexes).

2. For a traditional/regular clustered index, the key columns are added to all nonclustered indexes. Columnstore indexes don’t have a “key” so nothing about them gets copied to any nonclustered indexes.

Now that we’ve established the issues with the name – and the potential confusion as a result – let’s get to the differences between clustered columnstore indexes for on-disk tables compared to memory-optimized tables.

The only type of columnstore index you are allowed to create on a memory-optimized table is a clustered columnstore. But as you’ll find out in a moment, that naming convention has its own set of issues when discussing memory-optimized tables…

1. Primary data source

On-disk: When you create a clustered columnstore index on a disk-based table, it actually replaces the rowstore, and becomes the primary data source. Columnstore compression typically reduces the storage footprint greatly (often by ~90% or so), depending on the specific data types represented in the table.

Memory-optimized: When you create a clustered columnstore index on a memory-optimized table, the rows in memory remain the primary data source. Creating a “clustered columnstore” index on a memory-optimized table actually increases the storage footprint (the same is true for nonclustered HASH and RANGE indexes) by about 10%. That’s why I say that all indexes on memory-optimized tables are nonclustered (because they are). Keep in mind that “storage footprint” in this case means “additional memory”. Despite what you may have read or heard elsewhere, clustered columnstore indexes on memory-optimized tables are persisted to disk, so they also take up space on your persistent storage. The reason this is done is to reduce the impact on RTO (see my blog post here for more info).

2. Schema changes

On-disk: After adding a clustered columnstore index to an on-disk table, you can use ALTER TABLE to modify the schema, adding or dropping both columns and nonclustered indexes.

Memory-optimized: After adding a clustered columnstore index to a memory-optimized table, you cannot use ALTER TABLE to modify the schema. The schema becomes read-only, but of course you can still perform DML against the table.

3. LOB data

On-disk: You can create a clustered columnstore index on an on-disk table that has LOB data in vNext (not SQL 2014 or 2016).

Memory-optimized: As of vNext RC1, LOBs are not supported for memory-optimized tables, although this might change by the time we get to RTM.

(Update: 27 Aug 2017 – SQL 2017 does not support LOB columns for CCI on memory-optimized tables)

4. NC indexes

On-disk: After you create a clustered columnstore index on an on-disk table, you can use ALTER TABLE to create nonclustered indexes.

Memory-optimized: After you create a clustered columnstore index on a memory-optimized table, you cannot use ALTER TABLE to create nonclustered indexes, but they can be created inline as part of the CREATE TABLE statement.
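For example, a table like the following hypothetical one has to declare everything up front – the nonclustered indexes and the clustered columnstore index (a sketch; the names and columns are made up):

CREATE TABLE dbo.SalesMemOpt
(
    SalesID    INT IDENTITY(1, 1) NOT NULL,
    CustomerID INT       NOT NULL,
    SaleDate   DATETIME2 NOT NULL,
    Amount     MONEY     NOT NULL,
    CONSTRAINT PK_SalesMemOpt PRIMARY KEY NONCLUSTERED (SalesID),
    INDEX ix_SalesMemOpt_CustomerID NONCLUSTERED (CustomerID),
    -- once this columnstore index exists, no further ALTER TABLE is possible
    INDEX cci_SalesMemOpt CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);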

5. Parallelism

On-disk: When you query an on-disk table that has a columnstore index, the database engine can use parallelism to process the results more quickly.

Memory-optimized: When you query a memory-optimized table that has a columnstore index, the database engine can use parallelism to process the results more quickly, BUT, that statement is only true if you use interop. Natively compiled modules are always executed serially.

6. Archival compression

On-disk: There are two compression options for on-disk tables: columnstore and columnstore archival.

Memory-optimized: There is only one compression option for memory-optimized tables: columnstore.

7. Index maintenance

On-disk: You can use REBUILD or REORGANIZE for clustered columnstore indexes for on-disk tables.

Memory-optimized: REORGANIZE is not supported for memory-optimized tables, but in order to migrate rows from the virtual delta rowgroup into compressed rowgroups, you can use sys.sp_memory_optimized_cs_migration. REBUILD is supported, but only for changing the bucket count for HASH indexes.
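For example, changing the bucket count of a HASH index looks like this (the table and index names are hypothetical):

-- the only REBUILD option supported for memory-optimized tables
ALTER TABLE dbo.OrderMemOpt
    ALTER INDEX ix_OrderMemOpt_Hash
    REBUILD WITH (BUCKET_COUNT = 2097152);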

8. Filtered indexes

On-disk: For on-disk tables, it’s possible to create a nonclustered columnstore index with a filter.

Memory-optimized: For memory-optimized tables, it’s not possible to create a clustered columnstore index with a filter (and you cannot create a nonclustered columnstore index).

9. Temporary structures

On-disk: I’m not sure why you’d want to do it, but it’s possible to create a clustered columnstore index on a #temp table.

Memory-optimized: The equivalent of #temp tables would be memory-optimized table variables. It’s not possible to create clustered columnstore indexes on them, and as we discussed earlier in this post, natively compiled modules cannot make use of columnstore indexes anyway.

10. Other items to note

There are a few more wrinkles in the memory-optimized universe for columnstore indexes…..

Durability: The memory-optimized table must be durable (created with DURABILITY = SCHEMA_AND_DATA)

Off-row: You cannot create a clustered columnstore index on a memory-optimized table that has off-row columns

11. Vertipaq optimisations (special thanks to Niko Neugebauer)

On-disk: Columnstore contains Vertipaq optimisations (this changes for CCI when we build secondary nonclustered rowstore indexes).

Memory-optimized: Columnstore does not contain Vertipaq optimisations.

12. Computed columns (special thanks to Niko Neugebauer)

On-disk: CCI supports computed columns

Memory-optimized: CCI does not support computed columns (but memory-optimized tables support computed columns and indexes on computed columns)

How NOT to benchmark In-Memory OLTP

In forums over the last few months, I’ve seen a number of posts like the following:

  • “I tested native compilation, and it’s not much faster than using interpreted TSQL”
  • “I’m seeing performance issues with memory-optimized tables”

Tools and latency

Sometimes the bottleneck is the tool that’s used for testing. One person was using Ostress.exe and logging output to a file with the -o parameter. This caused the benchmark they ran for memory-optimized tables to actually perform worse than for disk-based tables! The overhead of logging Ostress output to disk created a high degree of latency, but once they removed the -o parameter, In-Memory OLTP performed super-fast for their workload.

Across the wire

Client/server messaging has overhead, and this cannot be improved by using In-Memory OLTP. Whether you’re returning one million rows from a disk-based table or a memory-optimized table, you’re still sending one million rows across the wire, which is not a valid test of In-Memory OLTP performance.

Core count

When you do a proof of concept, you should keep in mind that In-Memory OLTP is designed to work with many cores, and many concurrent processes. If you do your POC on a laptop with a single-threaded workload, In-Memory OLTP is not likely to deliver orders-of-magnitude performance benefits.

Simple queries used for testing Native Compilation

If you test with a query like:

SELECT col1
FROM table1

 

then native compilation will probably not be much faster than disk-based tables. Native compilation will show the greatest benefit when encapsulating complex business logic.
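For comparison, the general shape of a natively compiled procedure that encapsulates business logic looks something like this (a sketch only; the procedure, table, and column names are made up):

CREATE PROCEDURE dbo.usp_ProcessCustomerOrders
    @CustomerID INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'us_english'
)
    -- the more logic that runs here (loops, variable assignments, multiple
    -- DML statements), the bigger the payoff from native compilation
    DECLARE @Total MONEY = 0;

    SELECT @Total = ISNULL(SUM(Amount), 0)
    FROM dbo.OrderMemOpt
    WHERE CustomerID = @CustomerID;

    UPDATE dbo.CustomerTotalMemOpt
    SET RunningTotal = @Total
    WHERE CustomerID = @CustomerID;
END;
GO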

“Test” workloads

Doing a proof of concept with a contrived workload will not accurately determine if your real workload would benefit from migrating some or all data to In-Memory OLTP. The best way to do a proof of concept would be to use a copy of your production database with a realistic workload. You could run against disk-based tables first, and after migrating data to In-Memory, you could re-run and compare the results.

Deploying In-Memory OLTP can increase workload performance in several ways:

  • latch/lock-free architecture
  • reduced/enhanced logging – modifications to indexes are not logged, and the entire logging process has been redesigned for memory-optimized tables
  • elimination of interpreted TSQL overhead (via natively compiled modules)
  • elimination of temp table/tempdb overhead
  • reduction of excessive computation (again, thanks to native compilation)

Obviously you’d need to have a reasonable amount of concurrent activity in order to determine if In-Memory OLTP would achieve performance gains for your workload.

Monitoring In-Memory OLTP Resource Pools

For databases that contain memory-optimized data, it’s considered a best practice to create a separate resource pool that limits memory consumption for memory-optimized objects. If you don’t use Resource Governor to constrain the amount of memory allocated to memory-optimized objects, it’s possible that the buffer pool will respond to memory pressure, shrink, and performance for traditional/on-disk tables will suffer (using Resource Governor also allows you to track memory consumption specific to that database). We must carefully monitor the amount of memory used by the resource pool, and respond when it goes beyond a certain threshold. If we don’t, we can reach an out-of-memory (OOM) condition.

This blog post details the various approaches I considered.

Option 1

SQL Agent Performance Condition Alerts can respond to changes on the “Resource Pool Stats” object. This gets us pretty close to the stated goal, but unfortunately you have to hard code a threshold value for “Used memory (KB)”. If you needed to deploy this type of monitoring on many servers, and over time some of them upgrade physical memory, you have to remember to go back and adjust the hard-coded value for the new threshold. I was hoping to be able to evaluate this as a percentage of “Target Memory (KB)”, and then alert above x%, but that doesn’t seem to be possible with Performance Condition Alerts.


Option 2

SQL Server Event Alerts can take action when a specific error number occurs. I checked sys.sysmessages for references to the word ‘pool’, but as far as I can tell, none of them allow you to proactively respond to memory conditions before you run out of memory. We have errors like:

  • Disallowing page allocations for database ‘%.*ls’ due to insufficient memory in the resource pool ‘%ls’.
  • There is insufficient memory in the resource pool ‘%ls’ to run this operation on memory-optimized tables.
  • There is insufficient system memory in resource pool ‘%ls’ to run this query.


Option 3

Another possibility would be to have a TSQL Agent job that constantly polls sys.dm_os_performance_counters:
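Something along these lines would work (a minimal sketch; the pool name and alert threshold are placeholders):

DECLARE @PoolName SYSNAME       = N'InMemPool';  -- hypothetical pool name
DECLARE @AlertPct DECIMAL(5, 2) = 80.0;          -- alert threshold (percent)
DECLARE @UsedPct  DECIMAL(5, 2);

SELECT @UsedPct =
    CAST(100.0 * MAX(CASE WHEN pc.counter_name = N'Used memory (KB)'   THEN pc.cntr_value END)
               / NULLIF(MAX(CASE WHEN pc.counter_name = N'Target memory (KB)' THEN pc.cntr_value END), 0)
         AS DECIMAL(5, 2))
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE '%Resource Pool Stats%'
      AND pc.instance_name = @PoolName
      AND pc.counter_name IN (N'Used memory (KB)', N'Target memory (KB)');

IF @UsedPct >= @AlertPct
    RAISERROR(N'In-Memory OLTP resource pool memory use has exceeded the threshold.', 16, 1) WITH LOG;
    -- or send an email with sp_send_dbmail, etc.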

This would allow you to calculate the ratio you want to alert on, and then send an email. I like this approach for its simplicity, but you’d have to compare the value of cntr_value for two different rows to calculate the percentage.

Option 4

I looked into WMI, and the documentation is pretty vague about exactly what’s supported for SQL Server. I was surprised to find that the class I needed was part of root\cimv2, not in the SQL Server branch. Note that there can be security issues with accessing WMI, so that might be a concern for your environment (WMI is pretty old, so it’s not recommended to keep using it – instead, we should use CIM, the Common Information Model).

This does exactly what we need:

No matter how much memory is assigned to the pool, it calculates the percentage that’s used. If the value of Used memory exceeds a pre-defined percentage of Target memory, you might send an email, write to the SQL Errorlog (and then notify based on that error number), or use other options for logging and notification.

Transactional Replication meets In-Memory OLTP

Transactional replication hasn’t changed much since it was re-written for SQL 2005. However, with the release of SQL 2014, there is at least one new possibility: memory-optimized tables at the subscriber.

With the release of SQL 2016, some of the restrictions for memory-optimized subscriber tables have been lifted:

  • snapshot schema files that create the memory-optimized tables no longer have to be manually modified (but see “Gotcha #6, silent schema killer” below)
  • tables replicated to memory-optimized tables on a subscriber are no longer limited to the SQL 2014 max row length of 8060 bytes for memory-optimized tables. This seems sort of moot, because published tables cannot themselves be memory-optimized, and are therefore still restricted to 8060 bytes. However, if for some reason you needed to add a lot of columns to the subscriber table that cause it to be greater than 8060 bytes, you can do it. Note that there is no limit on how large a row can be for memory-optimized tables in SQL 2016. The following statement is perfectly valid:
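For example, a hypothetical memory-optimized table that is well over 8060 bytes per row creates without complaint on a SQL 2016 subscriber:

CREATE TABLE dbo.WideSubscriberTable
(
    WideID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
    Col1   VARCHAR(8000) NOT NULL,
    Col2   VARCHAR(8000) NOT NULL,
    -- 24,000+ bytes per row; columns that don't fit in-row are stored off-row
    Col3   VARCHAR(8000) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);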

 

Why would you want to use memory-optimized tables in a subscriber database? There can only be one answer: speed.

Subscriber latency due to data volume could be a result of the following issues, in combination or individually (each bullet also notes how memory-optimized tables address it):

  • excessive logging – changes to indexes are not logged for memory-optimized tables, and in general logging is much more efficient than for traditional/on-disk tables
  • locking – no locks are taken for DML statements that touch memory-optimized tables
  • blocking – blocking as a result of a transaction making changes to rows is not possible for memory-optimized tables
  • latching – no latches are taken on memory-optimized tables

The design of the In-Memory OLTP engine can alleviate latency due to these issues – BUT – before you start jumping for joy, you’ll need to be aware of the impact of deploying In-Memory OLTP in general.

DBAs love to tune things (indexes, queries, etc.), and subscriber tables are no exception. Until SQL 2014, when memory-optimized subscriber tables were introduced, some of the things that DBAs tuned on the subscriber included:

  • compression settings
  • different ways that the data in subscriber tables can be reinitialized, i.e. TRUNCATE TABLE, DELETE, DROP/CREATE table, or do nothing (these choices appear in the ‘Action if name is in use’ section of the ‘Destination Object’ in the Article Properties dialog)
  • custom indexes
  • snapshot isolation

For reinitializing, being able to use TRUNCATE TABLE is a great benefit, because all custom indexes and compression settings are retained for the destination table. If you choose drop/create, all compression settings and custom indexing must be reapplied to the subscriber table upon (re)initialization.

Deployment considerations

Article properties

On the dialog for Article Properties, you’ll need to make sure that both “Enable Memory Optimization” and “Convert clustered index to nonclustered index for memory optimized article” are set to “True”. Despite what you might have read, there is no concept of a “clustered” index for a memory-optimized table. If you have a clustered index on the published table, the DDL will fail when applied on the subscriber unless you set this option.


Subscription Properties

The Subscription Properties can be configured when initially creating the subscription, or from the Subscription Properties dialog if the subscription already exists.

Gotcha #1, DML support

Reinitialization is likely to happen at some point in the future, and so we’ll need to make the correct choice for “Action if name is in use”, on the same Article Properties dialog.

TRUNCATE TABLE is not supported for memory-optimized tables. If the table must be dropped, you’ll have to reapply scripts to handle any subscriber-level customization.


Gotcha #2, compression

On-disk tables are stored in pages. Memory-optimized tables are not stored in pages, and therefore don’t support any form of compression (columnstore indexes on memory-optimized tables create a separate compressed copy of the rows, but the primary data source remains the rows in memory).

Gotcha #3, potential WRITELOG bottleneck

All DML operations on durable memory-optimized tables are fully logged, regardless of database-level recovery settings (for more details, see my post on “Optimizing Data Load” here). If deploying In-Memory OLTP solves the latency issues your app was experiencing, WRITELOG is likely to become one of the top waits. This prevents realizing the full potential of deploying In-Memory OLTP, but fear not – as of SQL 2016/SP1, NVDIMM is supported for the transaction log, reducing/eliminating the log as a performance bottleneck. See the link here for more detail.

Gotcha #4, impact on RTO

If by chance you must restore a subscriber database that contains a lot of durable memory-optimized data (I realize that “a lot” is subjective), RTO will be affected. That’s because the number and placement of containers has a significant effect on the amount of time required to recover a database that contains durable memory-optimized data. See my post “In-Memory OLTP: The moving target that is RTO” here for more details. You might also be interested in “Backup and Recovery for SQL Server databases that contain durable memory-optimized data” here.

Gotcha #5, resource consumption

Updates on memory-optimized tables are performed as DELETE + INSERT. INSERTs create row versions, and the newly inserted row becomes the current version. Older versions consume additional memory, and must be retained as long as any processes that reference them are still executing (like queries running on the subscriber). It’s possible to have long chains of versioned rows, which means your environment might require additional memory. For a detailed explanation of row versioning, including the Garbage Collection process, see my post on “Row version lifecycle for In-Memory OLTP” here. There are additional considerations if your workload uses memory-optimized table variables (also detailed in that post).

Gotcha #6, silent schema killer

Let’s say you’ve done your homework, and your configuration for memory-optimized subscriber tables is perfect. There is additional database configuration that must be done to support memory-optimized tables, and without it, your subscriber tables will not be initialized/reinitialized as memory-optimized (they’ll still be created on the subscriber, but as traditional/on-disk tables). In the stored procedure that executes on the subscriber, there is validation to determine if there is a memory-optimized filegroup for the subscriber database (there are other conditions, but this is the one we’re interested in).

If you look up the definition of sys.filegroups, it relates to sys.data_spaces, and there we see a column named “type” that can have the following values:

FG = Filegroup
FD = FILESTREAM data filegroup
FX = Memory-optimized tables filegroup
PS = Partition scheme

If the query finds a filegroup of type “FX”, the table is created as memory-optimized, and if not (along with some other conditions), it’s created as a traditional/on-disk table.

While it seems obvious that you should have already configured your database to have a memory-optimized filegroup, if you skipped that step, there is no warning, error, or other type of message received, stating that the subscriber database is not memory-optimized. Of course, simply having a memory-optimized filegroup is not enough to create memory-optimized tables, because you must also have containers that belong to that memory-optimized filegroup. The “memory-optimized filegroup exists” validation will pass, but the (re)initialization will fail because no containers exist, and you’ll receive an error about that.
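A quick pre-flight check on the subscriber might look something like this sketch:

-- does a memory-optimized filegroup exist, and does it have containers?
SELECT fg.name          AS filegroup_name,
       fg.type          AS filegroup_type,    -- 'FX' = memory-optimized data filegroup
       df.name          AS container_name,
       df.physical_name AS container_path
FROM sys.filegroups AS fg
LEFT JOIN sys.database_files AS df
       ON df.data_space_id = fg.data_space_id
WHERE fg.type = 'FX';
-- no rows returned: there is no memory-optimized filegroup
-- rows with NULL container columns: the filegroup exists, but has no containers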

Index limitations

As of this writing (SQL 2016, SP1), a memory-optimized table can have a maximum of 9 indexes (if one of them is a columnstore index). That may or may not be an issue for your environment, but it’s a much lower number than traditional/on-disk tables.

Stored procedure execution

A quick review of Interop vs. Native Compilation:

  • Interop – interpreted TSQL as existed prior to SQL 2014. The full TSQL surface area is available with interop mode, and you can access both on-disk and memory-optimized tables.
  • Native Compilation – for maximum speed, you’ll want to use natively compiled stored procedures. There are restrictions for natively compiled modules, the most significant being that they can only reference memory-optimized tables, and the full TSQL surface area is not available. As of SQL 2016/SP1, natively compiled modules don’t support CASE statements or views, and there are many other restrictions. For more details, check “Transact-SQL Constructs Not Supported by In-Memory OLTP” here.

If you execute an UPDATE or DELETE that affects a large number of rows, then that statement is turned into individual UPDATE or DELETE statements that are sent to the distributor, and finally to the subscriber(s). To avoid the overhead of sending all those changes, it’s possible to publish the “execution” of a stored procedure. The documentation says: “..only the procedure execution is replicated, bypassing the need to replicate the individual changes for each row..” Please refer to the document about replicating stored procedure execution here.
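Publishing the execution is configured when the article is added; a sketch (run at the publisher, in the publication database; the publication and procedure names are hypothetical):

EXEC sp_addarticle
    @publication   = N'MyPublication',
    @article       = N'usp_PurgeOldOrders',
    @source_owner  = N'dbo',
    @source_object = N'usp_PurgeOldOrders',
    @type          = N'serializable proc exec';   -- or N'proc exec'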

The documentation also states that you can customize the stored procedure on the subscriber. Although the documentation doesn’t mention it, the stored procedure can be natively compiled, which should greatly increase performance on the subscriber for transactions that affect a large number of rows. Keep in mind that any changes made to the procedure at the publisher are sent to the subscriber. If this isn’t the behavior you want, disable the propagation of schema changes before executing ALTER PROCEDURE.

IDENTITY crisis

You’ll likely be disappointed with native compilation if you’re trying to INSERT many rows at the subscriber, and the destination table includes an IDENTITY column. That’s because it’s not possible to insert a row that has an IDENTITY column in a natively compiled stored procedure. Even if you SET IDENTITY_INSERT on before calling the procedure, the insert still fails with: “The function ‘setidentity’ is not supported with natively compiled modules.”

Custom stored procedures

There is a difference between “replicating stored procedure execution”, and using “custom stored procedures”.  Microsoft does not support anything you might create as a “custom stored procedure”, whether or not it’s natively compiled.

Please check the documentation here.

Wrapping up

In-Memory OLTP is steadily making its way into the full feature set offered by SQL Server. If you’re running SQL 2016 SP1, In-Memory OLTP is now included with all editions of SQL 2016, except LocalDB.

Troubleshooting Natively Compiled Stored Procedures, Part 1

A subset of the tools available for troubleshooting interpreted stored procedures are available for troubleshooting natively compiled procedures.

The following highlights the differences:

Recompile specific statements
Interpreted: Supported. Natively compiled: Not supported – but theoretically not required, due to the impossibility of parameter sniffing.

Execute procedure with RECOMPILE
Interpreted: Supported. Natively compiled: Not supported – but theoretically not required, due to the impossibility of parameter sniffing.

Estimated/Actual plan
Interpreted: Supported. Natively compiled: “Estimated Plan” makes no sense in the context of natively compiled stored procedures. The plan that will be executed is available from SHOWPLAN_XML, or by clicking “Estimated Plan” in SSMS (but it’s not “estimated”).

Remove plan from plan cache
Interpreted: Supported. Natively compiled: Not supported – plans for natively compiled stored procedures are not stored in the plan cache.

DBCC FREEPROCCACHE
Interpreted: Supported. Natively compiled: No effect, because plans for natively compiled stored procedures are not stored in the plan cache.

SET STATISTICS IO ON
Interpreted: Supported. Natively compiled: Not supported/required, because there is no such thing as IO for memory-optimized tables.

SET STATISTICS TIME ON
Interpreted: Supported. Natively compiled: Supported, but might not be 100% accurate, because execution times of less than 1 millisecond are reported as 0. total_worker_time may not be accurate if many executions take less than 1 millisecond.

SET FMTONLY
Interpreted: Supported. Natively compiled: Not supported, but you can use sp_describe_first_result_set instead.

SHOWPLAN_XML
Interpreted: Supported. Natively compiled: Supported.

SHOWPLAN_ALL
Interpreted: Supported. Natively compiled: Not supported.

SHOWPLAN_TEXT
Interpreted: Supported. Natively compiled: Not supported.

Mismatched datatypes
Natively compiled: detected with the xEvent hekaton_slow_parameter_passing, with reason = parameter_conversion.

Named parameters, i.e. EXEC dbo.Proc @Param1 = @Param1
Interpreted: Supported. Natively compiled: Supported, but not recommended due to the performance impact. You can track this type of execution with the xEvent hekaton_slow_parameter_passing, with reason = named_parameters.

If any SET options are in effect, statistics are gathered at the procedure level and not at the statement level.

Note 1: Statement-level execution statistics can be gathered with xEvents by capturing the sp_statement_completed event. They can also be seen using Query Store (detailed in a future post).

Note 2: Due to the nature of working with memory-optimized tables in general, it’s likely that you will have to implement retry logic. Because of this, and also because of feature limitations within the natively compiled space, Microsoft suggests using an interpreted TSQL wrapper when calling natively compiled stored procedures.
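A minimal sketch of such a wrapper (the procedure names and the list of retry-able errors are illustrative):

CREATE PROCEDURE dbo.usp_NativeProcWrapper
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Retries INT = 5;

    WHILE @Retries > 0
    BEGIN
        BEGIN TRY
            -- positional parameter passing avoids the named-parameter
            -- overhead mentioned in the list above
            EXEC dbo.usp_NativeProc @CustomerID;
            RETURN;
        END TRY
        BEGIN CATCH
            -- 41301, 41302, 41305, 41325: conflict/validation failures
            -- that are candidates for a retry
            IF ERROR_NUMBER() IN (41301, 41302, 41305, 41325)
                SET @Retries -= 1;
            ELSE
                THROW;
        END CATCH;
    END;

    THROW 50001, 'Retry limit exceeded calling natively compiled procedure.', 1;
END;
GO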

The following query references sys.dm_exec_query_stats to get statistics for natively compiled procedures:
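A sketch of that kind of query (it assumes the procedures live in the current database, and note that statistics collection for natively compiled modules is off by default):

-- enable collection first (instance-wide, and it adds overhead):
-- EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 1;

SELECT OBJECT_NAME(st.objectid, st.dbid) AS procedure_name,
       qs.execution_count,
       qs.total_worker_time,
       qs.total_elapsed_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
INNER JOIN sys.sql_modules AS m
        ON m.object_id = st.objectid
WHERE m.uses_native_compilation = 1;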

Note 3: The documentation for sys.dm_exec_query_stats states that the total_rows, min_rows, max_rows, and last_rows columns cannot be NULL, but NULL is still returned. A Connect item has been filed to have those columns return 0.

Parallelism

Parallelism is supported for memory-optimized tables for all index types. While that statement is true when using interpreted stored procedures that reference memory-optimized tables, unfortunately it’s not true when using natively compiled stored procedures.

Actual vs. Estimated

These terms have confused generations of SQL Server technologists.

For natively compiled procedures, enabling “Actual Plan” in SSMS does not return any plan information, but still executes the procedure. Enabling “Estimated Plan” in SSMS for natively compiled procedures is the same as setting SHOWPLAN_XML ON, but does not actually execute the stored procedure. The plan that will be executed is displayed.

Removing plans from the cache

For interpreted stored procedures, DBAs have the ability to remove an offending plan from the plan cache. This is not possible with natively compiled stored procedures, because the plan is not stored in the plan cache.

DBCC FREEPROCCACHE

If you execute DBCC FREEPROCCACHE and expect your natively compiled plans to magically disappear, you will no doubt be disappointed. That’s because DBCC FREEPROCCACHE has no effect on compiled modules, as they are not stored in the plan cache that’s used for interpreted TSQL. But executing DBCC FREEPROCCACHE will of course remove all existing plans for interpreted TSQL from the plan cache (so don’t do that…unless you’re really, really sure you want to recompile all of your interpreted procs).

Parameter sniffing

With interpreted stored procedures, parameter sniffing can severely impact performance. Parameter sniffing is not possible for natively compiled procedures, because they are compiled with the equivalent of OPTIMIZE FOR UNKNOWN.

Statistics

SQL 2016 has the ability to automatically update statistics on memory-optimized tables if your database has a compatibility level of at least 130. If you don’t want to depend on SQL Server to determine when stats should be updated, you can update statistics manually (and we no longer have to use FULLSCAN, as was the case in SQL 2014). Statistics for index key columns are created when an index is created.

Database upgrades and statistics

As mentioned earlier, if your database was set to compatibility level 120 (SQL 2014), and you want to take advantage of auto-update statistics, you must change the compatibility level to 130. But statistics still won’t be auto-updated unless you manually update them one last time.

Recompilation

When you create a natively compiled stored procedure, it gets compiled, and execution plans for the queries contained within the procedure are created. As the data changes, those execution plans will be based on older statistics, and might not perform at the highest possible level. Many people think that if you update statistics, natively compiled stored procedures will magically recompile. Unfortunately, this is not correct – natively compiled stored procedures are only recompiled under the following circumstances:

  • When you execute sp_recompile (this should be done after statistics are updated)
  • Database restart
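For the first bullet, the sequence is simply (object names are hypothetical):

UPDATE STATISTICS dbo.OrderMemOpt;   -- FULLSCAN no longer required in SQL 2016
EXEC sys.sp_recompile N'dbo.usp_ProcessCustomerOrders';   -- flag the native module for recompilation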

Database restart includes at least the following events:

  • Database RESTORE
  • OFFLINE/ONLINE of database
  • Failover (FCI or Availability Group)
  • SQL Server service restart
  • Server boot

Unlike memory-optimized tables – which are all created, compiled, and placed into memory upon database restart – natively compiled stored procedures are recompiled when first executed. This reduces the amount of time required for database recovery, but affects the first-time execution of the procedure.

Plan operators

For traditional tables (harddrive-based), the number of pages expected to be returned by an operator has a significant impact on the cost, and therefore affects the plan. Since memory-optimized tables are not stored in pages, this type of calculation is irrelevant.

For memory-optimized tables, the engine keeps track of how many rows are in each table. This means that estimates for full table scans and index scans are always accurate (because they are always known). For memory-optimized tables, the most important factor for costing is the number of rows that will be processed by a single operator. Older statistics might reference row counts that are no longer valid, and this can affect plan quality.

Nested execution

Prior to SQL 2014, it was not possible for one natively compiled stored procedure to call another natively compiled stored procedure. This restriction has been lifted in SQL 2016.

We will continue troubleshooting natively compiled stored procedures in a future post.

In-Memory OLTP: How fast can you write?

Typical bottlenecks for write-intensive workloads include:

  • the SQL Server engine (latching/locking)
  • continually (re)interpreted TSQL
  • write-ahead-logging (WAL)
  • logging of modifications to indexes
  • pessimistic concurrency

The design of In-Memory OLTP can address all of the items in the list above, but doing so can reveal new bottlenecks.

If you have a write-intensive In-Memory OLTP workload, and your data must be durable, you will likely find that WRITELOG is one of your top waits. That’s because regardless of database recovery setting, all DML changes to durable memory-optimized tables are fully logged.

A new type of storage (explained a bit later) allows for faster writes to the transaction log, but in order to grasp how it results in increased IO, we must fully understand the process of how changes are made to the transaction log.

Log Buffers

A database must have at least one data file and one transaction log file, but there is another critical component of transaction logging, known as the log buffer. This is an area of memory where log records are first written, before being flushed from memory to the actual log file.

When a change occurs to a table (on-disk table or durable memory-optimized table), log records are created in the log buffer. The contents of the log buffer are flushed to the log file when any of the following occurs:

  1. A transaction commits and that transaction has an active log record in the log buffer
  2. The 60K size limit for the log buffer is reached
  3. Data pages must be written to disk and there are log records in the log buffer related to those pages. Log records must be written to the log file before writing the data pages to the data files (this is known as Write Ahead Logging, and applies to on-disk tables, but not to memory-optimized tables).

(See “Observing SQL Server Transaction Log Flush Sizes using Extended Events and Process Monitor” at this link for more details)

In-Memory OLTP is capable of incredible performance, but it can be constrained by the requirement to navigate the IO stack for log writes. This means that you are not able to fully use all of the power in the cores you purchased, as proven in this Microsoft Ignite session (starting at 23:06).

The issues of core underutilization and the ability to improve writes to the transaction log can be addressed by a new type of storage known as NVDIMM-N, which is supported in Windows Server 2016. An NVDIMM-N is “persistent memory”, or “storage class memory” (SCM), and has the following characteristics:

  1. One side is DDR4 DRAM, and the other side is Flash storage
  2. Sits on the memory bus
  3. In the event of power failure, the contents of DRAM gets written to the Flash storage, via super cap, battery, or other means
  4. Extraordinary write speeds (see below)
  5. Fairly expensive
  6. Limited in capacity
  7. Are presented to the operating system as drives
  8. Formatted in two possible modes, explained below
  9. No need to cache data
  10. Allows full CPU usage

SQL Server can benefit from using persistent memory, because once the write to the log buffer occurs, data is guaranteed to persist – there is no need to flush the log buffer to disk. As a result, log writes are performed at the speed of memory, not the speed of IO, which would require a context switch and going through the kernel mode IO stack.

Block vs. DAX

As mentioned above, there are two different modes that can be used when formatting SCM storage:

  1. Block – sectors are written, no app changes required, but has overhead to convert to byte-addressable ranges, circumvents IO stack
  2. DAX – byte addressable, app changes required, circumvents IO stack, uses memory-mapped files

DAX is considerably faster than Block, because it allows your transaction log (or more accurately the log buffers associated with it) to be placed on byte-addressable storage.

How do we use SCM?

SCM uses DRAM slots, which has the effect of limiting max physical memory on the server. As mentioned before, it’s used for log buffers only, so only about 20MB per database is required (although DRAM size is specified in gigabytes). You can use a single NVDIMM-N to hold log buffers for multiple databases.
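Configuration is minimal. As I understand it, under SQL 2016/SP1 you add a small additional log file on the DAX-formatted volume, and SQL Server detects the DAX volume and uses that file as the persistent log buffer (a sketch; the database name and path are hypothetical):

ALTER DATABASE [InMemDB]
ADD LOG FILE
(
    NAME     = N'InMemDB_PersistentLogBuffer',
    FILENAME = N'D:\DAX\InMemDB_PersistentLogBuffer.pmm',   -- D: formatted as a DAX volume
    SIZE     = 20MB
);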

How fast is SCM?

Please see the Tobias Klima videos listed at the bottom of this post for benchmarking results.

Limitations

For those who trespass in the virtual world, Hyper-V cannot directly use persistent memory yet.

Server crash

If your server crashes, theoretically you can remove the persistent memory and place it in another server, as it’s just an NTFS-formatted drive. The other server must also be running Windows Server 2016. S2D supports NVDIMMs as described here, provides high availability for SCM, and can also aggregate multiple NVDIMM-Ns for performance or resilience, depending on configuration.

Potential issues

Filter drivers might not work properly:

  • Caching
  • Encryption
  • Antivirus
  • Replication

These types of drivers can have issues when using SCM, because they work based on IOs, but there are no IOs when using SCM.

Benefits for In-Memory OLTP workloads

  • Eliminating WRITELOG waits
  • Allowing full usage of the expensive cores you purchased

More info

Tobias Klima (PM for MS Storage Group) has created videos about using SCM as Block storage here and byte-addressable storage here.

Glenn Berry wrote an excellent post on persistent memory here, where he describes using it as an alternative to deploying In-Memory OLTP. Hopefully this post has proven to you that deploying persistent memory has tangible benefits for In-Memory OLTP workloads as well.

In-Memory OLTP: The moving target that is RTO

Anyone who has worked with or studied In-Memory OLTP should know that upon database restart (which occurs for a variety of reasons), data for durable memory-optimized tables must be streamed from checkpoint file pairs (CFPs) that reside on disk into memory.

During recovery, data files are read in parallel, using as many threads as there are logical CPUs. If all CFPs reside on a single volume, the speed of recovery is limited by how much IO is available from just that single volume. That’s why RTO is affected by not only how many containers exist for a database, but how many volumes those containers are spread across.

That means that other than keeping the amount of data stored in durable memory-optimized tables to a minimum, the only option you have for reducing the amount of time it takes to stream data into memory, is to spread your containers across multiple volumes.

Let’s say you’ve got 5 containers, and your RTO is 15 minutes. You’ve tested recovery with your current data set, and the actual time required to restart your database is 5 minutes. Everything is ok – you’ve got minutes to spare.

Over time, the amount of memory consumed by your durable memory-optimized tables – and the required storage footprint for CFPs – continue to grow. But you’re ok with all of that, because hey – memory is cheap, and so is storage, so you’ll just make sure that you have enough of both to continue doing business.

Now fast forward to some point in the future, and the size of your durable data set has quadrupled. You can’t get application owners to agree to migrate data from memory-optimized tables to disk, because of who-knows-what, or perhaps it’s because of some of the reasons mentioned here.

And now, due to the continued growth of your durable memory-optimized data, you can’t make your RTO. How will you solve this problem?

If you only create additional containers, the In-Memory OLTP engine will be able to use them for writing new data. But that won’t get you where you want to be, because the existing CFPs don’t get redistributed among all existing containers – the new containers get their own CFPs.

The moment you add the new containers, 100% of your existing data still resides in the 5 original containers. Even though you theoretically have enough containers to make your RTO (the original 5 containers plus 5 new ones), it’s still blown, because the pre-existing CFPs have not been redistributed across all containers (10 containers total).

The solution is to:

  • Create new containers on the pre-existing volumes
  • Create new containers on the new volumes
  • Drop the old containers

Dropping the old containers will cause the existing CFPs to be redistributed among all new containers (10).

Now we have a way to redistribute pre-existing CFPs across new containers/volumes so that you can make your RTO. And in case you were wondering, Microsoft says that redistribution of CFPs can be performed while your database is online (obviously you should test this entire process in a non-production environment to see if it works as promised).

How much durable data can you store in a SQL Server database? For SQL 2016, you are limited only by what the operating system can handle. With Windows Server 2012R2, the maximum memory is 4TB, but Windows Server 2016 supports 24TB of memory. That could mean you’ve got a whole lotta CFP redistribution goin’ on…

UPDATE [2017-05-03]

The Microsoft documentation seems to directly contradict what I’ve written in this blog post, but I can assure you that what I wrote is correct.

Here’s a script to prove it:

USE [master]
GO
 
DROP DATABASE IF EXISTS [InMemContainerTest]
 
CREATE DATABASE [InMemContainerTest] ON PRIMARY 
(
    NAME = N'InMemContainerTest'
   ,FILENAME = N'S:\SQLDATA\InMemContainerTest.mdf' 
   ,SIZE = 100MB 
   ,MAXSIZE = UNLIMITED
   ,FILEGROWTH = 100MB 
)
LOG ON 
(
    NAME = N'InMemContainerTest_log'
   ,FILENAME = N'S:\SQLDATA\InMemContainerTest.ldf' 
   ,SIZE = 100MB
   ,MAXSIZE = 2048GB
   ,FILEGROWTH = 100MB
)
GO
 
ALTER DATABASE [InMemContainerTest]
ADD FILEGROUP InMemContainerTestFG
CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerTestFile'
   ,FILENAME = 'S:\SQLDATA\InMemContainerTestFile'
)
TO FILEGROUP InMemContainerTestFG;
GO
ALTER DATABASE [InMemContainerTest] SET COMPATIBILITY_LEVEL = 130;
GO
 
USE [InMemContainerTest]
GO
DROP TABLE IF EXISTS [dbo].[InMemT1]
GO
CREATE TABLE [dbo].[InMemT1]
(
    [OrderId]      [INT]           IDENTITY(1, 1) NOT NULL
   ,[StoreID]      [INT]           NOT NULL
   ,[CustomerID]   [INT]           NOT NULL
   ,[OrderDate]    [DATETIME]      NOT NULL
   ,[DeliveryDate] [DATETIME]      NULL
   ,[Amount]       [FLOAT]         NULL
   ,[Discount]     [FLOAT]         NULL
   ,[DiscountCode] [VARCHAR](25)   NULL
   ,INDEX [IndOrders_OrderID] NONCLUSTERED ([OrderId] ASC)
   ,INDEX [IndOrders_StoreID] NONCLUSTERED ([StoreID] ASC)
   ,INDEX [IndOrders_CustomerID] NONCLUSTERED ([CustomerID] ASC)
   ,CONSTRAINT [PK_InMemT1_OrderID] PRIMARY KEY NONCLUSTERED ([OrderId] ASC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
 
/*
	add data
*/
SET NOCOUNT ON
GO
INSERT dbo.InMemT1
SELECT 1 AS [StoreID]     
      ,2 AS [CustomerID]  
      ,GETDATE() AS [OrderDate]   
      ,GETDATE() AS [DeliveryDate]
      ,1.11 AS [Amount]      
      ,0.10 AS [Discount]    
      ,'Holiday1' AS [DiscountCode]
GO 200000
 
 
/*
	verify CFP folder
	dir S:\SQLData\InMemContainerTestFile\$HKv2
*/
 
 
/*
	add two new containers
*/
 
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerNew1'
   ,FILENAME = 'S:\SQLDATA\InMemContainerNew1'
)
TO FILEGROUP InMemContainerTestFG;
GO
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerNew2'
   ,FILENAME = 'S:\SQLDATA\InMemContainerNew2'
)
TO FILEGROUP InMemContainerTestFG;
GO
 
/*
	verify contents of NEW CFP folders
	dir S:\SQLData\InMemContainerNew1\$HKv2
	dir S:\SQLData\InMemContainerNew2\$HKv2
 
	on my server, these folders are empty
*/
 
ALTER DATABASE [InMemContainerTest] REMOVE FILE InMemContainerTestFile;
 
/*
	verify contents of NEW CFP folders
	dir S:\SQLData\InMemContainerNew1\$HKv2
	dir S:\SQLData\InMemContainerNew2\$HKv2
 
	InMemContainerTestFile has been deleted, and the CFPs have been 
	redistributed to InMemContainerNew1 and InMemContainerNew2
*/

In-Memory OLTP query plans: The Unexpected

Spills to TempDB

While testing the new features of In-Memory OLTP in SQL 2016, I discovered something unexpected: query plans for interop can spill to TempDB.

It was explained to me that the accessing of memory-optimized tables is done at the lowest level, and that if sort or join operators are not allocated enough memory, they’ll spill. I contacted Microsoft and expressed my surprise, and their reply was: “If the plan spills with disk-based, it spills with memory-optimized”.

This is not possible with natively compiled stored procedures. For sorts or joins, you simply have to have enough memory, and unfortunately the xtp DMVs don’t reveal the level of granularity required to track how much memory is being used. All joins are nested loop, so the memory requirements are typically modest.

So we need to remember that for interop procedures that access memory-optimized tables, spills to TempDB can still occur.


Memory-optimized table variable gotcha

In-Memory OLTP can increase performance for a variety of workloads. For example, if your workload creates many #TempTables, ##TempTables, or @TableVariables, they all have to be allocated in TempDB, and it’s possible that TempDB itself is a bottleneck. Some DBAs/Developers mistakenly believe that @TableVariables reside only in memory, which is not true, and has been proven many times in blog posts like this and this, by Wayne Sheffield and Gail Shaw respectively.

Microsoft has described the ways in which temp tables and table variables can be replaced by using memory-optimized objects here. It’s true that we can now have truly memory-resident temporary objects, and that if your workload was bottlenecked due to TempDB IO or allocation issues (GAM/SGAM/PFS), using memory-optimized table variables can increase workload throughput. However, what’s not mentioned in that article is the impact of choosing different types of indexes for the table variable, which can have the effect of using 2x memory for the table variable. For large numbers of rows this can even result in an out-of-memory condition. This would be particularly relevant if you are migrating a large number of rows from harddrive-based tables to memory-optimized tables, and the source and destination databases are different.

Creating a memory-optimized table variable is a two-step process:

1. create a table type

2. create a variable of that type

Example (note that the PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH):
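A minimal stand-in, with made-up type and column names (the bucket count matches the one discussed below):

CREATE TYPE dbo.OrderStagingType AS TABLE
(
    OrderID   INT      NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000),
    OrderDate DATETIME NOT NULL,
    Amount    MONEY    NULL
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @OrderStaging dbo.OrderStagingType;
INSERT @OrderStaging (OrderID, OrderDate, Amount)
VALUES (1, GETDATE(), 9.99);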

In the following script, ’64K page pool’ indicates the amount of memory allocated to memory-optimized table variables
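A rough equivalent of that check, assuming the allocation shows up in sys.dm_db_xtp_memory_consumers with a description containing ‘64K page pool’:

SELECT memory_consumer_desc,
       allocated_bytes / 1048576.0 AS allocated_mb,
       used_bytes      / 1048576.0 AS used_mb
FROM sys.dm_db_xtp_memory_consumers
WHERE memory_consumer_desc LIKE '%64K page pool%';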

The PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000)

If you instead define the PK column to use a RANGE index (non-HASH index), my testing has shown that memory allocation for the variable is almost exactly double that of using the HASH index.

HASH index, ‘64K page pool’: (screenshot)

RANGE index, ‘64K page pool’: (screenshot)

Not related to index choice – but still significant – is that the memory allocated to memory-optimized table variables (and their row versions, if any) is not released until the variable goes out of scope. Garbage collection for row versions ignores memory-optimized table variables.

Updating all rows in the variable will create row versions, and at least in this case, the row versions did not consume a lot of additional memory. I blogged about row versions here.

If you think Microsoft should fix this issue with RANGE indexes on memory-optimized table variables, please upvote this connect item.

In-Memory OLTP data/delta file corruption: “Now trappable”

4 April 2017

(This post used to be called In-Memory OLTP data/delta file corruption: “The Untrappable”, but as you’ll see in a moment, things have changed)

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

You can run CHECKDB on a database that contains memory-optimized tables, but all memory-optimized tables will be ignored. And running CHECKTABLE against memory-optimized tables will fail with an “unsupported” error. As a result, some shops might not consider deploying this feature. While it’s not possible to recover from data/delta file corruption, you can still detect corruption. As I blogged a while ago in this post, a checksum is calculated for every block written to data/delta files, and those checksums are recalculated any time the block is read. That occurs during restore, backup, and any other operation that reads data/delta files. As Brent Ozar blogged in this post, you can execute a copy-only backup of the memory-optimized filegroup to DISK = ‘nul’, to force recalculation of all checksums, which will in turn compare them to the values stored with the blocks. If there are no mismatches between the newly calculated and stored checksum values, your durable memory-optimized data is corruption free.
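The backup itself is along these lines (the database and filegroup names are placeholders; depending on recovery model and filegroup layout, a filegroup backup may have additional requirements):

BACKUP DATABASE [InMemDB]
    FILEGROUP = N'InMemDB_MemOptFG'
TO DISK = N'nul'
WITH COPY_ONLY;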

Let’s say you execute that copy-only backup for your memory-optimized filegroup each night. If there is corruption, what conditions can trigger alerts, so that you can be notified?

As of SQL 2016/SP1, Microsoft has fixed the various code paths that access data/delta files, so that they produce trappable errors upon checksum failure:

  • Regular processing, i.e. file merge: Severity 21, Error 41355
  • Backup/Restore: Severity 16, Error 41316
  • Database recovery: Severity 21, Error 3414
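That means standard SQL Agent alerts can now do the job, for example (a sketch – operator and notification setup not shown, and note that an alert only fires for errors written to the Windows application log):

EXEC msdb.dbo.sp_add_alert
    @name       = N'In-Memory OLTP checksum failure (41355)',
    @message_id = 41355,
    @severity   = 0,     -- must be 0 when @message_id is specified
    @enabled    = 1;

EXEC msdb.dbo.sp_add_alert
    @name       = N'In-Memory OLTP backup/restore checksum failure (41316)',
    @message_id = 41316,
    @severity   = 0,
    @enabled    = 1;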

(The sordid details of my attempts to reproduce data/delta file corruption before SQL 2016/SP1 are listed below).

17 August 2016

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

If there is corruption, what mechanism can you use to be alerted?

I had been in touch with Microsoft about this type of corruption, and they stated that it would be logged in the SQL errorlog as Severity 21. Of course you can create an alert on Severity 21 errors, but I wanted to find a way to determine that it’s specifically related to data/delta file corruption.

How would you go about reproducing this type of corruption for durable memory-optimized tables?

About a year ago I sent an email to corruption guru Paul Randal, asking if he had experimented with this type of corruption for durable memory-optimized data, and at least at that point he had not. So I set out to roll my own corruption repro, and so far the results are not what I expected.

I created a single durable memory-optimized table, and added one row. Then I ran CHECKPOINT to close the data file, and used a hex editor, attempting to open each of the data files. If I tried to open one of the files that had been written to, I received a “file in use” error, so I set the database OFFLINE, and overwrote some of the data in the formerly “in use” file with 00.

Next, I attempted to ONLINE the database, and received the following error:

Msg 41316, Level 16, State 0, Line 51
Restore operation failed for database ‘HKCorruption’ with internal error code ‘0x8800000e’.
Msg 5181, Level 16, State 5, Line 52
Could not restart database “HKCorruption”. Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 52
ALTER DATABASE statement failed.

I checked the SQL errorlog, and there was no Severity 21 listed. That’s where it stands for now – unfortunately I’ve not been able to reproduce pseudo storage corruption that affects data/delta files.

I’ve contacted Microsoft, informing them of the details of my testing, and I’ll update this post if/when I hear back from them.

Update 23 August 2016

Today I heard back from Microsoft. Turns out I had actually been able to reproduce corruption in the data/delta files. Look carefully at the errors from the SQL errorlog that I posted above. See that ‘0x8800000e’ ? It’s the only indication that there was a checksum failure. To be clear, this is what does and does not happen when there is a checksum failure found in data/delta files:

1. a value of 0x8800000e is written to the SQL errorlog
2. no severity is written to the SQL errorlog
3. no standardized error ID is written to the SQL errorlog
4. no text indicating corruption is written to the SQL errorlog

There are many problems with this situation, the first one being that there is no way to trap the corruption error with an alert. If there was a Severity associated with the error, we could create an alert, and receive some type of notification when the corruption occurs.

It’s bad enough that CHECKDB/CHECKTABLE ignores memory-optimized tables. If we force checksums to be calculated by backing up the memory-optimized filegroup to DISK = ‘nul’ in order to determine that there are no checksum errors, you will have to scan the SQL errorlog for ‘0x8800000e’ after every memory-optimized filegroup backup.

This would seem to be a somewhat radical departure from standard ways to be informed of corruption (and other SQL Server errors in general).

Who could blame potential adopters of In-Memory OLTP for thinking that it’s not ready for prime time (and in this regard it’s definitely not). What could be more important than knowing your data is corruption free, and being alerted immediately if corruption occurs?

The present state of corruption detection and notification will do little to change the minds of those hesitant to adopt In-Memory OLTP. If Microsoft wants people to consider using In-Memory OLTP, they need to fix this issue immediately.

I have created this connect item about the issues described in this post (thanks for upvoting!)

Update 24 August 2016

Microsoft followed up with me again today, and said the following:

  • If the checkpointing process detects a checksum failure during regular processing, for example during a file merge, a sev21 error 41355 is written to the SQL error log
  • If there is a checksum failure during backup or restore, a sev16 error is written to the SQL error log, which is the same as what SQL Server does for checksum failures in mdf/ndf or ldf files
  • The team is looking at the DB startup code path to raise a sev21 error

That’s much better news than I expected, based on what was originally explained to me.

Hopefully Microsoft will be able to fix the DB startup code path issue as part of a CU or SP1 (which in recent history would take about a year from the RTM release date).