Typical bottlenecks for write-intensive workloads include:
- the SQL Server engine (latching/locking)
- continually (re)interpreted TSQL
- write-ahead logging (WAL)
- logging of modifications to indexes
- pessimistic concurrency
The design of In-Memory OLTP can address all of the items in the list above, but doing so can reveal new bottlenecks.
If you have a write-intensive In-Memory OLTP workload, and your data must be durable, you will likely find that WRITELOG is one of your top waits. That’s because regardless of the database recovery model, all DML changes to durable memory-optimized tables are fully logged.
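To see whether log writes are in fact a top wait on your instance, you can check the cumulative wait statistics. Here’s a minimal sketch using the sys.dm_os_wait_stats DMV (note that the numbers are server-wide and cumulative since the last restart or manual clear):

```sql
-- Check whether WRITELOG is a significant wait on this instance
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       CAST(wait_time_ms AS decimal(18, 2))
           / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';
```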
A new type of storage (explained a bit later) allows for faster writes to the transaction log, but in order to grasp how it improves log-write performance, we must first understand how changes are written to the transaction log.
Log Buffers
A database must have at least one data file and one transaction log file, but there is another critical component of transaction logging, known as the log buffer. This is an area of memory where log records are first written, before being flushed from memory to the actual log file.
When a change occurs to a table (on-disk table or durable memory-optimized table), log records are created in the log buffer. The contents of the log buffer are flushed to the log file when any of the following occurs:
- A transaction commits and that transaction has an active log record in the log buffer
- The 60K size limit for the log buffer is reached
- Data pages must be written to disk, and there are log records in the log buffer related to those pages. Log records must be written to the log file before the associated data pages are written to the data files (this is known as Write-Ahead Logging; it applies to on-disk tables, but not to memory-optimized tables)
(See “Observing SQL Server Transaction Log Flush Sizes using Extended Events and Process Monitor” at this link for more details)
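If you’d like to observe these flushes on your own system, a minimal Extended Events sketch along the following lines can capture them (the session name and file target path are illustrative, and the log_flush_start event may be listed under the Debug channel depending on your build):

```sql
-- Illustrative session: capture individual log flushes for later analysis
CREATE EVENT SESSION [LogFlushes] ON SERVER
ADD EVENT sqlserver.log_flush_start
ADD TARGET package0.event_file (SET filename = N'C:\temp\LogFlushes.xel');
GO
ALTER EVENT SESSION [LogFlushes] ON SERVER STATE = START;
```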
In-Memory OLTP is capable of incredible performance, but it can be constrained by the requirement to navigate the IO stack for log writes. This means you cannot fully utilize the power of the cores you purchased, as demonstrated in this Microsoft Ignite session (starting at 23:06).
Both core underutilization and the speed of transaction log writes can be addressed by a new type of storage known as NVDIMM-N, which is supported in Windows Server 2016. An NVDIMM-N is “persistent memory”, also known as “storage class memory” (SCM), and has the following characteristics:
- One side is DDR4 DRAM, and the other side is Flash storage
- Sits on the memory bus
- In the event of power failure, the contents of DRAM get written to the Flash storage, via a supercapacitor, battery, or other means
- Extraordinary write speeds (see below)
- Fairly expensive
- Limited in capacity
- Presented to the operating system as drives
- Formatted in two possible modes, explained below
- No need to cache data
- Allows full CPU usage
SQL Server can benefit from using persistent memory because once the write to the log buffer occurs, the data is guaranteed to persist – there is no need to flush the log buffer to disk. As a result, log writes are performed at the speed of memory, not the speed of IO, which would require a context switch and a trip through the kernel-mode IO stack.
Block vs. DAX
As mentioned above, there are two different modes that can be used when formatting SCM storage:
- Block – sector-addressable; no app changes are required, but there is overhead in converting sector IO to byte-addressable ranges; bypasses much of the traditional IO stack
- DAX – byte-addressable; requires app changes and uses memory-mapped files, but bypasses the IO stack entirely
DAX is considerably faster than Block because it allows your transaction log (or more accurately, the log buffers associated with it) to be placed on byte-addressable storage.
How do we use SCM?
SCM occupies DRAM slots, which has the effect of limiting the maximum physical memory of the server. As mentioned before, it’s used here for log buffers only, so only about 20MB per database is required (even though NVDIMM-N capacity is specified in gigabytes). A single NVDIMM-N can hold log buffers for multiple databases.
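As a sketch of what this looks like in practice – assuming SQL Server 2016 SP1 or later, and an NVDIMM-N volume formatted with DAX and mounted as D: (the database name, file name, and path below are illustrative) – you add a small second log file on the DAX volume, and SQL Server uses it as a persistent log buffer:

```sql
-- D:\ is assumed to be a DAX-formatted NVDIMM-N volume.
-- SQL Server detects the DAX volume and uses this 20MB file as a
-- persistent log buffer ("tail of the log"), not as a regular log file.
ALTER DATABASE MyOltpDb
ADD LOG FILE
(
    NAME = MyOltpDb_PersistentLogBuffer,
    FILENAME = 'D:\MyOltpDb_PLB.pldf',
    SIZE = 20MB
);
```

Once the file is in place, transactions can commit as soon as their log records land in the persistent log buffer, without waiting for a flush through the IO stack.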
How fast is SCM?
Please see the Tobias Klima videos listed at the bottom of this post for benchmarking results.
Limitations
For those who dwell in the virtual world: Hyper-V cannot yet make direct use of persistent memory.
Server crash
If your server crashes, you can theoretically remove the persistent memory and place it in another server, as it’s just an NTFS-formatted drive; the other server must also be running Windows Server 2016. Storage Spaces Direct (S2D) supports NVDIMMs as described here – it can provide high availability for SCM, and can also aggregate multiple NVDIMM-Ns for performance or resilience, depending on the configuration.
Potential issues
Filter drivers might not work properly:
- Caching
- Encryption
- Antivirus
- Replication
These types of drivers can have issues with SCM because they operate on IOs – and when SCM is accessed in DAX mode, there are no IOs for them to intercept.
Benefits for In-Memory OLTP workloads
- Eliminating WRITELOG waits
- Allowing full usage of the expensive cores you purchased
More info
Tobias Klima (PM for the MS Storage Group) has created videos about using SCM as Block storage here and as byte-addressable storage here.
Glenn Berry wrote an excellent post on persistent memory here, where he describes using it as an alternative to deploying In-Memory OLTP. Hopefully this post has proven to you that deploying persistent memory has tangible benefits for In-Memory OLTP workloads as well.