Category Archives: In-Memory OLTP

Hekatonized Tempdb

At PASS Summit 2018, I attended a session led by Pam Lahoud of the SQL Tiger Team, entitled “TempDB: The Good, The Bad, and The Ugly”. If you have access to the PASS recordings from 2018, I highly recommend watching this session.

It was a really fantastic presentation, detailing the full history of how the SQL Server engineering team has attempted to optimize TempDB in various ways. The two problems that busy servers can have with regard to TempDB are allocation page contention and metadata contention, and the engineering team should be applauded for its clever approaches to solving these types of contention over the years. To be clear, all of the optimizations were related to temp table usage in stored procedures, not scripts.

However, none of those solutions for contention scaled – some only relocated the issue. As part of Pam’s presentation, she did a demo with a single TempDB metadata table that was “Hekatonized”  – actually using the In-Memory OLTP engine – and the difference in throughput was significant. She said that Microsoft intends to convert the remaining system tables in TempDB to be memory-optimized (you’ll need SQL 2019 CTP 3.0 or later to test).

So once you’ve got it installed, or have started a container running it, how do you automagically convert TempDB system tables to be memory-optimized? With TSQL, of course:

ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

Like other changes to TempDB, the new memory-optimization setting requires a restart of the SQL Server service to take effect. Once the service is restarted, system tables in TempDB are memory-optimized (that should be true of all of them in RTM, but in CTP 3.0 not all system tables may have been converted to Hekaton yet). You can reverse this setting with the following command, again followed by a restart of the SQL Server service:

ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = OFF;
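After the restart, you can confirm which way the setting landed. A minimal check, assuming SQL Server 2019, where SERVERPROPERTY exposes this setting:

```sql
-- Returns 1 when TempDB system tables are memory-optimized, 0 when they are not.
-- On versions earlier than SQL Server 2019 the property is unknown and NULL is returned.
SELECT SERVERPROPERTY('IsTempDbMetadataMemoryOptimized') AS IsTempDbMetadataMemoryOptimized;
```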

Unless your workload was truly hammering TempDB, you probably won’t see much difference in TempDB performance.

We need to be careful with this new In-Memory power, because depending on workload characteristics, we might need a whole lot more memory just to handle what’s going on in TempDB. Also, if you have scripts and/or monitoring that interrogate system tables in TempDB, you might be affected by some of the restrictions if TempDB system tables are memory-optimized. As the CTP release notes state:

“A single transaction may not access memory-optimized tables in more than one database. This means that any transactions that involve a memory-optimized table in a user database will not be able to access TempDB system views in the same transaction.”
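A hypothetical sketch of what that restriction looks like in practice. It assumes SQL 2019 with the setting turned ON, and a memory-optimized table named dbo.inmem_table (a placeholder) in the current user database; per the release notes, the second query is expected to fail:

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Access a memory-optimized table in the user database
    -- (inside an explicit transaction this needs a hint such as SNAPSHOT).
    SELECT COUNT(*) AS user_rows
    FROM dbo.inmem_table WITH (SNAPSHOT);

    -- Touching TempDB system views in the same transaction is expected to fail.
    SELECT COUNT(*) AS tempdb_objects
    FROM tempdb.sys.objects;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Clean up the open transaction and show what went wrong.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    SELECT ERROR_NUMBER() AS err_number, ERROR_MESSAGE() AS err_message;
END CATCH;
```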

Another thing I want to make clear is that this new TempDB optimization only affects system tables in TempDB – not the tables you create; #table and ##table do not become memory-optimized as a result of this new feature.

After all, its name is MEMORY_OPTIMIZED_TEMPDB_METADATA.

SQL 2019 In-Memory hotness

SQL 2019 is on track to become one of the most awesome releases – the product touches so many realms of the data platform that it’s truly mind-boggling.

Since I have such a keen interest in Hekaton/In-Memory OLTP, when the CTPs are released for a new version of SQL Server, I look forward to any potential announcements about that feature.

So far, there’s been only one publicly announced enhancement for In-Memory OLTP in SQL 2019: system tables in TempDB will be “Hekatonized”. This will forever solve the issue of system table contention in TempDB, which is a fantastic use of Hekaton. I’m told it will be “opt in”, so you can use this enhancement if you want to, but you can also back out of it, which would require a restart of the SQL Server service.

But there’s at least one other enhancement that’s not been announced, although the details of its implementation are not yet known.

When people start to research the Hekaton feature, most are shocked to learn that CHECKDB does not verify anything about durable In-Memory tables: it silently ignores them.

That appears to have changed in SQL 2019, although either the informational message about what it does is misleading, or behind the scenes it does something different.

This is the output for DBCC CHECKDB of a memory-optimized database in SQL 2017:

Object ID 949578421 (object ‘inmem_table’): The operation is not supported with memory optimized tables. This object has been skipped and will not be processed.

(the emphasis was added by me)

This is the output for DBCC CHECKDB of a memory-optimized database in SQL 2019:

DBCC results for ‘inmem_table’.
There are 101 rows in 3 pages for object “inmem_table”.

Why do I say the message is misleading?

Because durable data for memory-optimized tables is not stored in pages; instead, it is written in a streaming fashion to files known as checkpoint file pairs (data and delta files). Also, while it’s true that there are 101 rows in this table, the engine pre-creates a number of data and delta files, and DBAs would sleep a lot better at night if all of those files were verified as being free of corruption.
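To see how many checkpoint files actually exist for a database, and therefore how much a full verification would have to cover, a query along these lines against sys.dm_db_xtp_checkpoint_files should do it. This sketch assumes SQL 2016 or later; SQL 2014 exposes a somewhat different set of columns and file states:

```sql
-- Run in the memory-optimized database.
-- Counts checkpoint files by type (DATA/DELTA) and state, with their size on disk,
-- independent of how many rows the tables actually hold.
SELECT
    file_type_desc,
    state_desc,
    COUNT(*)                            AS file_count,
    SUM(file_size_in_bytes) / 1048576.0 AS size_mb
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY file_type_desc, state_desc
ORDER BY file_type_desc, state_desc;
```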

We’ll just have to stay tuned to the future CTPs and RTM of SQL 2019 to see how all of this shakes out.

Dangerous moves: Setting max size for In-Memory OLTP containers

I recently saw a thread on Twitter where the OP talked about setting the max size for an In-Memory OLTP container. I responded as I always do: it’s not possible to set a limit on anything having to do with storage for In-Memory OLTP.

Unfortunately, that’s not correct: through SSMS or TSQL, you can in fact set a max size for a container.

But you should never, ever do that.

Why?

Because if you do, and your checkpoint files exceed the max size of the container, your database can go into the In Recovery, Suspect, or OFFLINE state. The following code reproduces this issue:

Note that I’ve not yet found a way around this. The OP from that thread on Twitter said he had to actually restart the SQL Server service to resolve the issue with that database, but I don’t see why that would make any difference (when I tried it, the database attempted recovery, but eventually went offline).

Setting a max size for the container is a really, really, really bad idea, because it guarantees that the database will have some form of outage when you hit the threshold. The bottom line is that containers must be free to grow, period. That’s part of the capacity planning good DBAs will do before deploying the In-Memory OLTP feature.
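Since the only safe setting is no limit at all, it may be worth checking whether any container already has one. A sketch, assuming that an uncapped container reports max_size = -1 in sys.database_files, as other files with unlimited growth do:

```sql
-- Run in each memory-optimized database.
-- Lists containers in the memory-optimized filegroup that have a max size set.
SELECT
    f.name          AS container_name,
    f.physical_name AS container_path,
    f.max_size
FROM sys.database_files AS f
JOIN sys.filegroups AS fg
    ON f.data_space_id = fg.data_space_id
WHERE fg.type = 'FX'        -- FX = MEMORY_OPTIMIZED_DATA filegroup
  AND f.max_size <> -1;     -- assumption: -1 means "grow until the disk is full"
```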

New kid on the block: sp_BlitzInMemoryOLTP

In-Memory OLTP has been included in the last three releases of SQL Server (2014 through 2017), and now runs on Linux, Windows, Azure SQL Database, and Azure Managed Instances. Additionally, since SQL 2016/SP1, the In-Memory OLTP feature has been available in non-enterprise editions.

What does this all mean?

It most likely means that it’s only a matter of time before a memory-optimized database lands on your doorstep, and you’ll probably have no idea how or why it’s different.

For a while now, I’ve been working on a script to evaluate a SQL Server environment for anything related to In-Memory OLTP, and I had help with testing, general suggestions, and final touches from Konstantin Taranov and Aleksey Nagorskiy; their assistance was invaluable. Konstantin suggested to Erik Darling and Brent Ozar that my script be included as part of their great Blitz series, and the result is…..sp_BlitzInMemoryOLTP.

It is now part of the awesomeness known as the First Responder Kit, and the direct link to the script can be found here.

sp_BlitzInMemoryOLTP reports on two categories: instance level and database level.

First let’s discuss which parameters sp_BlitzInMemoryOLTP accepts, and then we’ll break out the results, section by section.

@instanceLevelOnly BIT

This flag determines whether or not to report only on the server-level environment (if applicable; there is no server-level environment for Azure SQL Database). With this parameter, memory-optimized databases are ignored. If you specify @instanceLevelOnly = 1 along with a database name, the database name is ignored.
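A sketch that limits the output to the instance level, assuming the procedure was installed in the dbo schema of the database you’re connected to:

```sql
-- Instance-level checks only; memory-optimized databases are skipped.
EXEC dbo.sp_BlitzInMemoryOLTP @instanceLevelOnly = 1;
```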

@dbName NVARCHAR(4000) = N'ALL'

If you don’t specify a database name, then sp_BlitzInMemoryOLTP reports on all memory-optimized databases within the instance that it executes in, or in the case of Azure SQL Database, the database that you provisioned. This is because the default for the @dbName parameter is N'ALL'.

Example:
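A minimal sketch of the default call, again assuming a dbo-schema install:

```sql
-- No parameters: @dbName defaults to N'ALL', so every memory-optimized database
-- on the instance (or the provisioned Azure SQL Database) is reported on.
EXEC dbo.sp_BlitzInMemoryOLTP;
```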

It’s also possible to report on a specific database name.

Example:
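A sketch for a single database; OOM-DB is the database whose output appears later in this post, so substitute your own name:

```sql
-- Report on one memory-optimized database only.
EXEC dbo.sp_BlitzInMemoryOLTP @dbName = N'OOM-DB';
```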

The results of calling sp_BlitzInMemoryOLTP this way are explained later in this post.

@tableName NVARCHAR(4000) = NULL

If you only want to report on a specific memory-optimized table, you would supply a value for the @tableName parameter, and sp_BlitzInMemoryOLTP will search through all memory-optimized databases, looking for memory-optimized user tables that match. There is currently no wildcard matching for the @tableName parameter.

Example:
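A sketch for a single table; inmem_table is a placeholder, and since wildcards aren’t supported, the value has to match the table name exactly:

```sql
-- Search all memory-optimized databases for user tables named inmem_table.
EXEC dbo.sp_BlitzInMemoryOLTP @tableName = N'inmem_table';
```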

@debug BIT

Setting @debug = 1 tells sp_BlitzInMemoryOLTP to only print the TSQL statements that would have been executed. This allows you (or more likely, me) to resolve problems like missing quotes, or other potential issues that can occur when using dynamic SQL.

Example:
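A sketch combining @debug with a database name (OOM-DB is a placeholder here):

```sql
-- Print the generated dynamic SQL instead of executing it.
EXEC dbo.sp_BlitzInMemoryOLTP @dbName = N'OOM-DB', @debug = 1;
```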

Supported platforms

This script has been tested on SQL 2014, SQL 2016, SQL 2017, and Azure SQL Database. It has not been tested against Azure Managed Instances.

In the comments, please let me know other things about memory-optimized environments and/or databases you’d like to see included in the script.

How to interpret the results for sp_BlitzInMemoryOLTP

When you execute sp_BlitzInMemoryOLTP, it runs several queries that pertain to the In-Memory OLTP environment. It should be noted that if there are no results for a given query (e.g., there are no temporal memory-optimized tables), sp_BlitzInMemoryOLTP does not return an empty result set (this keeps the clutter to a minimum).

For example, it could be that a memory-optimized filegroup has been added to a database, but no memory-optimized objects have been created. Depending on the version of SQL Server, there might not be details about the containers or files within them, so sp_BlitzInMemoryOLTP won’t return information on that.

Instance level

Instance level evaluates the following:

  • the version/edition of SQL Server
  • SQL Server ‘max memory’ setting
  • memory clerks
  • XTP memory consumers, aggregated
  • XTP memory consumers, detailed
  • the value of the committed_target_kb column from sys.dm_os_sys_info
  • whether or not instance-level collection of execution statistics has been enabled for all natively compiled stored procedures (because this can kill their performance….)
  • when running Enterprise, whether any resource pools are defined, and which memory-optimized databases are bound to them
  • XTP and buffer pool memory allocations, because In-Memory OLTP can affect on-disk workloads
  • summary of memory used by XTP

Section 1: version/edition of SQL Server

Documentation here.

Section 2: SQL Server ‘max memory’ setting

Documentation here.

Section 3: memory clerks

Documentation here.

Section 4: XTP memory consumers, aggregated

Documentation here.

Section 5: XTP memory consumers, detailed

Section 6: the value of the committed_target_kb column from sys.dm_os_sys_info. The amount of memory that SQL Server can use for the In-Memory OLTP feature is a percentage of the committed_target_kb value. But be forewarned, this value is not static. Details in my post here.

Section 7: whether or not instance-level collection of execution statistics has been enabled for all natively compiled stored procedures. Enabling this on a production server could be considered drastic. More details can be found in my post here.
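To check the instance-level setting without changing it, you can read the output parameter of sys.sp_xtp_control_proc_exec_stats. A sketch; the commented-out line shows how collection would be switched off:

```sql
-- Read the current instance-level collection setting for natively compiled procedures.
DECLARE @current_value BIT;

EXEC sys.sp_xtp_control_proc_exec_stats
    @old_collection_value = @current_value OUTPUT;

SELECT @current_value AS proc_exec_stats_enabled;  -- 1 = collection is on

-- To turn collection off again:
-- EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 0;
```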

Section 8: if running Enterprise, whether any resource pools are defined, and which memory-optimized databases are bound to them. Binding a memory-optimized database to a Resource Pool (using Resource Governor) is considered a best practice, but unfortunately this capability is still Enterprise only. If you’re on that edition, you should also be monitoring how close you’re getting to the out-of-memory threshold, and fire an alert when required. More details in my post here.
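A sketch of that kind of monitoring, assuming Enterprise Edition with databases already bound to pools. The percentage column is illustrative; it is not part of the sp_BlitzInMemoryOLTP output:

```sql
-- Databases bound to a Resource Governor pool, with current pool memory usage.
SELECT
    d.name                AS database_name,
    p.name                AS pool_name,
    p.max_memory_percent,
    rp.used_memory_kb,
    rp.max_memory_kb,
    CAST(100.0 * rp.used_memory_kb / NULLIF(rp.max_memory_kb, 0)
         AS DECIMAL(5, 2)) AS pct_of_pool_max   -- candidate value to alert on
FROM sys.databases AS d
JOIN sys.resource_governor_resource_pools AS p
    ON d.resource_pool_id = p.pool_id           -- NULL (unbound) databases drop out
JOIN sys.dm_resource_governor_resource_pools AS rp
    ON rp.pool_id = p.pool_id;
```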

Section 9: XTP and buffer pool memory allocations, because In-Memory OLTP can affect on-disk workloads

Database level

For a given memory-optimized database (or all memory-optimized databases), database level evaluates the following:

  • all memory-optimized tables
  • all indexes on all memory-optimized tables
  • the average chain length for HASH indexes (and informs you if the bucket count is too low)
  • the number of indexes per memory-optimized table
  • all natively compiled stored procedures
  • which native modules are loaded (stored procedures only, and this is not relevant for Azure SQL Database)
  • the number of natively compiled procedures
  • whether or not the collection of execution statistics is enabled for any natively compiled procedures
  • if using the temporal feature for memory-optimized tables, the amount of memory consumed by hidden temporal internal tables (which are memory-optimized)
  • memory structures for LOB columns (off-row)
  • all memory-optimized table types
  • database layout, which includes mdf, ldf, ndf, and containers, and the size in various formats (KB/MB/GB). The totalSizeMB column is the total for the entire database (uses a Window Function).

Three separate result sets that describe containers:

  • Container details by container name
  • Container details by fileType and fileState
  • Container file details by container_id, fileType and fileState

For Azure SQL Database, sp_BlitzInMemoryOLTP:

  • verifies if you are running on the Premium tier (that’s the only tier that supports In-Memory OLTP)
  • displays all records for xtp_storage_percent, in descending order (more info here)
  • displays the status of XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS (more info here)

The output in the photos that follow was returned from executing sp_BlitzInMemoryOLTP, for a database named OOM-DB. You can get information on all memory-optimized databases if you don’t supply a database name when calling sp_BlitzInMemoryOLTP.

Section 1: Listing of memory-optimized databases on this instance of SQL Server

Section 2: memory-optimized tables, including row counts

Section 3: indexes on memory-optimized tables. It’s helpful to know how many, and what type of indexes there are.

Section 4: average chain length for HASH indexes (if any). When a HASH index is created for a memory-optimized table, a value must be supplied for what’s known as the “bucket count”. But it doesn’t get adjusted automatically, and as a result, it can cause performance problems. More details here.
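A sketch of the kind of check behind this section, run in the memory-optimized database. Be aware that sys.dm_db_xtp_hash_index_stats scans the tables involved, so it can take a while on large tables:

```sql
-- Long average chains combined with few empty buckets suggest the bucket count is too low.
SELECT
    OBJECT_NAME(hs.object_id) AS table_name,
    i.name                    AS index_name,
    hs.total_bucket_count,
    hs.empty_bucket_count,
    hs.avg_chain_length,
    hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON  i.object_id = hs.object_id
    AND i.index_id  = hs.index_id
ORDER BY hs.avg_chain_length DESC;
```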

Section 5: Number of indexes per memory-optimized table. SQL 2014 and SQL 2016 have a limit of 8 indexes per memory-optimized table. That ceiling was lifted in SQL 2017, and I’ve tested creating several hundred indexes on a single memory-optimized table (but please don’t do that in production!).


Sections 6 through 8:

  • natively compiled stored procedures
  • which natively compiled stored procedures are currently loaded
  • how many natively compiled stored procedures there are

Section 9: if using the temporal feature for memory-optimized tables, the amount of memory consumed by hidden temporal internal tables (which are memory-optimized). For temporal tables, there’s a difference between how things are handled if the temporal table is memory-optimized. I’ve written about that in this post.

Section 10: memory structures for LOB columns (off-row). For memory-optimized tables, LOB columns are actually stored as separate tables, and this can lead to performance problems. MCM Dmitri Korotkevitch has a great post on it here.

Section 11: memory-optimized table types. Yes, tables and table types can be memory-optimized, and you’ll want to be aware of the potential gotchas with those memory-optimized types, as detailed in my post.

Section 12: all database files, including the name, size, and location for each container.

Sections 13 through 15 pertain to the amount of storage consumed by durable memory-optimized tables. The files that persist durable data to storage go through several state changes over time. As a result, the storage footprint for memory-optimized databases that contain durable data can be surprisingly large, relative to the amount of data that’s stored in memory (Microsoft suggests 4x memory-optimized data size as a starting point). So it’s a good idea to keep an eye on the storage footprint.

Section 13: Container details by container name

One row per container, listing the aggregated size of all files within that container, as well as how many files per container

Section 14: Container details by fileType and fileState

Here, the breakdown is a bit different, taking into account the type of file.

For each type of file (e.g., DATA or DELTA), this result aggregates the storage consumed and the number of files per fileState, across ALL containers for this database. For example, there are a total of 11 files of fileType DATA with a fileState of ACTIVE, across all containers for this memory-optimized database. (Note that SQL 2014 has file types that don’t exist in later versions of SQL Server.)

Section 15: Container file details by container_id, fileType and fileState

For each type of file (e.g., DATA or DELTA), this result aggregates the storage consumed and the number of files per fileState, PER CONTAINER.

In the prior example, we saw that there were a total of 11 files of fileType DATA with a fileState of ACTIVE, across all containers for this memory-optimized database.

This result shows the breakdown of each fileType and fileState PER CONTAINER. The container named InMemDB_inmem1 has 3 files that have a fileType of DATA and a fileState of ACTIVE. So we expect to see 8 more files with this type and state, in the remaining containers. Sure enough, we see that the container named InMemDB_inmem2 has an additional 8 files with a fileType of DATA and a fileState of ACTIVE.

Understanding how In-Memory OLTP works (with all of its various gotchas) can only be addressed by putting in the required time. If you read the documentation, and then study the real-world deployment concepts detailed in my extensive blog post series on In-Memory OLTP, you’ll be on the right path. Once you begin to wrap your brain around In-Memory OLTP, you’ll need some help evaluating memory-optimized environments and/or databases, and that’s where sp_BlitzInMemoryOLTP can help.

In-Memory OLTP Resources, Part 4: OOM, the most feared acronym in all of In-Memory OLTP

Earlier parts of this series can be found here:

Part 1: The Foundation

Part 2: Checkpoint File Pairs

Part 3: OOS (Out of Storage)

This post will cover memory requirements and usage, and what happens if you actually reach OOM, also known as “Out Of Memory”, a condition that strikes fear in the hearts of DBAs supporting memory-optimized databases. We’ll also cover CPU-bound conditions.

How memory is allocated to the In-Memory OLTP engine

At a high level, the memory that’s allocated to the In-Memory OLTP engine comes from the SQL Server ‘max memory’ setting, as does everything else within SQL Server. But beneath that level, we need to be aware of memory pools.


The pool that can be used for allocating memory to the In-Memory OLTP engine depends on which edition you are running:

  1. if you are running Enterprise Edition, you can use Resource Governor to configure a Resource Pool. Memory-optimized databases can be bound to separate pools, or multiple databases can be bound to a single pool. If you don’t bind a memory-optimized database to a pool created with Resource Governor, then all memory allocations for In-Memory OLTP for that database come from the Default pool.
  2. if you are NOT running Enterprise Edition, all memory for In-Memory OLTP is allocated from the Default pool.

If using the Default pool, then as a result of deploying the In-Memory OLTP feature, there can be performance issues with on-disk workloads.

The following image shows that as we add rows to memory-optimized tables – and put pressure on the buffer pool – the buffer pool responds by shrinking, and that can affect disk-based workloads. If we then delete rows from memory-optimized tables, the buffer pool can expand. But what if we don’t delete rows from memory-optimized tables? Then the buffer pool will stay in its reduced state (or shrink even more), and that can cause problems due to buffer churn (continually having to do physical I/Os to retrieve pages from storage, for disk-based workloads).


Astute readers will consider using Buffer Pool Extensions (BPE), which is also available in Standard Edition. Yes, you could do that, but BPE retrieves a single 8K page at a time, and can actually make performance worse. And in case you’re wondering, no, it’s not possible to compress memory-optimized data that’s stored in memory. Think Windows will actually page out any of the memory allocated to In-Memory OLTP? That’s simply not possible.

Resource Governor

If you are running Enterprise Edition, then this problem gets solved by creating a resource pool. Now, to be clear, that doesn’t mean you can’t run out of memory for memory-optimized objects. It only means that your In-Memory workload can’t affect the on-disk workload, unless of course you configure the resource pool incorrectly. I’ve got a blog post on how to monitor resource pools here.

Let’s create a resource pool, with an artificially low upper bound, and insert rows until we hit the limit.
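The original repro script isn’t shown here, but a sketch of the same idea could look like the following. The pool name, the database name InMemDB, and the table dbo.OOM_Test are all hypothetical, and the database is assumed to already have a MEMORY_OPTIMIZED_DATA filegroup:

```sql
USE master;
-- A deliberately tiny pool (Enterprise Edition only).
CREATE RESOURCE POOL Pool_IMOLTP_Small
    WITH (MIN_MEMORY_PERCENT = 1, MAX_MEMORY_PERCENT = 1);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Bind the database; the binding takes effect only after the database is
-- cycled offline/online (which also restreams memory-optimized data).
EXEC sys.sp_xtp_bind_db_resource_pool
    @database_name = N'InMemDB',
    @pool_name     = N'Pool_IMOLTP_Small';
ALTER DATABASE InMemDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE InMemDB SET ONLINE;
GO

USE InMemDB;
GO
-- Wide rows so the pool fills quickly.
CREATE TABLE dbo.OOM_Test
(
    id      INT IDENTITY(1, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
    payload CHAR(8000) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Insert until the pool runs out of memory (error 41805 is expected).
BEGIN TRY
    WHILE 1 = 1
        INSERT dbo.OOM_Test (payload) VALUES (REPLICATE('x', 8000));
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS err_number, ERROR_MESSAGE() AS err_message;
    SELECT COUNT(*) AS rows_inserted FROM dbo.OOM_Test;
END CATCH;

-- To undo: EXEC sys.sp_xtp_unbind_db_resource_pool @database_name = N'InMemDB';
-- then cycle the database offline/online again.
```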

On my server, I was able to INSERT 305 rows before the pool ran out of memory and I received error 41805.


Causes of OOM

What can cause a memory-optimized database to run out of memory? It could be that resource consumption (memory) exceeded:

  • the relevant percentage of committed_target_kb from the sys.dm_os_sys_info DMV (explained in a moment)
  • MAX_MEMORY_PERCENT value of a Resource Pool that the database is bound to (if running Enterprise Edition and using Resource Governor)

or:

  • garbage collection is not operational (the purpose of GC is to reclaim memory consumed by stale row versions)
  • updates to memory-optimized table variables caused row versions to be created, and because GC does not operate on table variables, you ran out of memory (for table variables that have a very large number of rows)

The only thing that can prevent GC from working is a long-running transaction.
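A sketch for finding the oldest open transactions on the instance, which are the first candidates to look at when GC appears to be stalled:

```sql
-- Ten oldest open transactions, with their age and owning session (if any).
SELECT TOP (10)
    at.transaction_id,
    at.name,
    at.transaction_begin_time,
    DATEDIFF(SECOND, at.transaction_begin_time, SYSDATETIME()) AS age_seconds,
    st.session_id
FROM sys.dm_tran_active_transactions AS at
LEFT JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = at.transaction_id
ORDER BY at.transaction_begin_time ASC;
```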

committed_target_kb

We are supposed to base our estimate of how much memory is available for memory-optimized databases on committed_target_kb from the sys.dm_os_sys_info DMV. Memory available for In-Memory OLTP is expressed as a percentage of committed_target_kb, based on total system memory, which is detailed here. Prior to SQL 2016/SP1, the In-Memory OLTP feature was only supported on Enterprise Edition, and the amount of memory allocated to SQL Server was limited to what the operating system supported.

But in a post-SQL 2016/SP1 world, things are different, because the In-Memory OLTP feature is now supported on non-enterprise editions. This means that people will start deploying In-Memory OLTP on servers with a lot less memory than is possible with Enterprise, and therein lies a potential issue.

The problem is that committed_target_kb is a moving target. 

From the documentation:

Applies to: SQL Server 2012 through SQL Server 2017.
Represents the amount of memory, in kilobytes (KB), that can be consumed by SQL Server memory manager. The target amount is calculated using a variety of inputs like:
– the current state of the system including its load
– the memory requested by current processes
– the amount of memory installed on the computer
– configuration parameters
If committed_target_kb is larger than committed_kb, the memory manager will try to obtain additional memory. If committed_target_kb is smaller than committed_kb, the memory manager will try to shrink the amount of memory committed. The committed_target_kb always includes stolen and reserved memory.

Those parts about “the current state of the system including its load” and “the memory requested by current processes” concern me. If there is x amount of memory available on a server, and you check the value of committed_target_kb when the server is “at rest”, then under load there might in fact be much less memory available. I believe this is one of the main causes of OOM for memory-optimized workloads, especially when people do a POC on under-provisioned machines (like laptops).
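One way to see how much committed_target_kb moves is to sample it at rest and again under load, then compare. A sketch; the six samples taken ten seconds apart are arbitrary choices:

```sql
DECLARE @i INT = 0;
DECLARE @samples TABLE
(
    sample_time         DATETIME2(0),
    committed_kb        BIGINT,
    committed_target_kb BIGINT
);

-- Take a sample every 10 seconds, six times.
WHILE @i < 6
BEGIN
    INSERT @samples (sample_time, committed_kb, committed_target_kb)
    SELECT SYSDATETIME(), committed_kb, committed_target_kb
    FROM sys.dm_os_sys_info;

    SET @i += 1;
    WAITFOR DELAY '00:00:10';
END;

SELECT sample_time,
       committed_kb,
       committed_target_kb,
       committed_target_kb / 1024 AS committed_target_mb
FROM @samples
ORDER BY sample_time;
```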

Database restore and recovery

The process of recovering a database is different for databases with durable memory-optimized data.

Step 1: the backup file is read, and the various types of files are created. For example, all MDF/NDF/LDF and data and delta files are created.

Step 2: data is copied from the backup into the files created in Step 1. If you restore a database WITH NORECOVERY, you have completed both Step 1 and Step 2.

Step 3: For databases with durable memory-optimized data, there is one additional step, and that’s to stream data from the Checkpoint File Pairs (data/delta files) back into memory.

It should be noted that if the backup contains both on-disk and memory-optimized tables, none of the on-disk data is available until all of the memory-optimized data has finished streaming. When restoring a backup – whether the database has memory-optimized data or not – the process short-circuits if there isn’t enough free space to create the files in Step 1. Unfortunately, no such validation of available memory is done for Step 3. That means you can spend a long time creating files on disk, then spend an additional lengthy amount of time streaming data to memory, only to find that you don’t have enough memory. If you think Microsoft should change this, please upvote my Connect item.

When data is streamed into memory, the wait type will be WAIT_XTP_RECOVERY.
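To see whether restreaming is happening right now, and how much wait time it has accounted for since the last restart, a sketch:

```sql
-- Requests currently waiting on restreaming of memory-optimized data.
SELECT r.session_id,
       DB_NAME(r.database_id) AS database_name,
       r.command,
       r.wait_type,
       r.wait_time            AS wait_time_ms
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = N'WAIT_XTP_RECOVERY';

-- Cumulative totals since the last restart (or since wait stats were cleared).
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WAIT_XTP_RECOVERY';
```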

The unwary DBA would logically think that the only time you can see WAIT_XTP_RECOVERY is when actually restoring a database with memory-optimized data, but unfortunately that’s not correct. The Microsoft documentation doesn’t list all of the possible “recovery events” that can cause restreaming, but through my own testing, I’ve come up with the following list:

setting a database:

  • OFFLINE
  • READ_ONLY when it was READ_WRITE
  • READ_WRITE when it was READ_ONLY

Also, setting Read Committed Snapshot Isolation ON or OFF will cause restreaming.

Additionally, the speed of restreaming is directly influenced by the number of volumes that you have created containers on, and the IOPS available from those volumes.

Potential solutions to OOM

  1. Open a DAC (Dedicated Admin Connection). Then delete rows, and/or move data from memory to disk.
  2. Increase system memory
  3. If Garbage Collection for row versions is not operational (due to long running transactions), clear up those long-running transactions so that GC can proceed

If you attempt to move data from memory-optimized tables to disk-based tables (e.g., using SELECT INTO), please note that it’s possible to create schema for memory-optimized tables that you can’t simply migrate to disk.

For example, the following CREATE TABLE is perfectly legal for memory-optimized tables, but will fail for disk-based tables (and also fails if using SELECT * INTO on-disktable FROM in-memtable):

The ability to create tables like this is detailed at this link, with the relevant section being:

“…you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition.”

What happens if you hit OOM

So how does hitting OOM affect workloads for memory-optimized databases?

SELECT still works, and also DELETE and DROP, but of course INSERT and UPDATE will fail.

CPU bound

Last but not least, I wanted to touch on potential CPU issues for memory-optimized databases. Database recovery can be CPU bound under the following circumstances:

  • many indexes on large memory-optimized tables (2014, 2016)
  • too many LOB columns (2016+)
  • incorrect bucket count set for HASH indexes (2014, 2016, 2017)

The first item in this list, “many indexes on large memory-optimized tables (2014, 2016)” has supposedly been addressed in SQL 2017.

LOB columns are actually stored as separate memory-optimized tables, and as noted by Dmitri Korotkevitch (blog) in this post, can impact performance.

The “incorrect bucket count for HASH indexes” issue persists to this day. If the bucket count is too low, many sets of key values will hash to the same bucket, increasing the chain length, which has a terrible effect not only on performance in general, but on database recovery in particular.
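When the bucket count turns out to be too low, SQL 2016 and later allow it to be changed in place (in SQL 2014 the table has to be recreated). A sketch with hypothetical table and index names; the commonly cited starting point is one to two times the number of distinct key values, and the rebuild is an offline operation:

```sql
-- Hypothetical names; the engine rounds BUCKET_COUNT up to the next power of two.
ALTER TABLE dbo.inmem_table
    ALTER INDEX ix_inmem_table_hash
    REBUILD WITH (BUCKET_COUNT = 2097152);
```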

Wrapping up

Hopefully this mini-series about resource consumption for memory-optimized workloads has given you a clear understanding of why Microsoft recommends the following:

  • 2x data set in memory for starting memory allocation (only for In-Memory, does not include memory for on-disk workload)
  • 3x workload IOPS from disks where containers are stored (handles operational workload plus read/write File Merge workload)
  • 4x durable memory-optimized data size for initial storage footprint

These are rough guides, but should be observed at first, and then tuned as required.
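To apply the memory and storage multipliers to an existing database, a sketch that reads current usage from sys.dm_db_xtp_table_memory_stats. It counts index memory toward the 2x memory figure but only durable table data toward the 4x storage figure, on the basis that indexes on memory-optimized tables are rebuilt during recovery rather than persisted:

```sql
-- Run in the memory-optimized database.
WITH xtp_usage AS
(
    SELECT
        -- table rows plus indexes: everything that has to fit in memory
        SUM(ms.memory_used_by_table_kb + ms.memory_used_by_indexes_kb) AS in_memory_kb,
        -- only SCHEMA_AND_DATA (durability = 0) table data reaches checkpoint files
        SUM(CASE WHEN t.durability = 0 THEN ms.memory_used_by_table_kb ELSE 0 END) AS durable_kb
    FROM sys.dm_db_xtp_table_memory_stats AS ms
    JOIN sys.tables AS t
        ON t.object_id = ms.object_id
    WHERE t.is_memory_optimized = 1
)
SELECT
    in_memory_kb / 1024.0     AS in_memory_mb,
    2 * in_memory_kb / 1024.0 AS starting_memory_mb,   -- 2x guideline
    durable_kb / 1024.0       AS durable_data_mb,
    4 * durable_kb / 1024.0   AS starting_storage_mb   -- 4x guideline
FROM xtp_usage;
```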

This concludes the series on resource issues for In-Memory OLTP.

In-Memory OLTP Resources, Part 3: OOS (Out of Storage)

Zero free space

This is a continuation of Part 1 and Part 2 of this blog post series, related to resource issues/requirements for memory-optimized databases.

In this post, we’ll continue with simulating what happens to a memory-optimized database when all volumes run out of free space.

In my lab, I’m running Windows Server 2012. Let’s use PowerShell to install the File System Resource Manager, which will allow us to create a quota for the relevant folder:

Add-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools

After installing the Windows feature we can set the quota for the folder, but we shouldn’t enable it just yet, because first we have to verify the current size of the folder.
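From inside SQL Server, an approximation of the folder size is the total size of the checkpoint files in each container. A sketch, assuming SQL 2016 or later; it won’t account for anything else Windows keeps in the folder:

```sql
-- Run in the memory-optimized database: checkpoint file footprint per container.
SELECT
    f.name                                 AS container_name,
    f.physical_name                        AS container_path,
    COUNT(*)                               AS file_count,
    SUM(cf.file_size_in_bytes) / 1048576.0 AS size_mb
FROM sys.dm_db_xtp_checkpoint_files AS cf
JOIN sys.database_files AS f
    ON f.file_id = cf.container_id   -- containers appear as files in sys.database_files
GROUP BY f.name, f.physical_name
ORDER BY f.name;
```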

On my server, I created a quota of 1.5GB, and then enabled it.

Now let’s INSERT rows into the table, in batches of 1000, until we reach the limit (the INSERT script is listed in Part 2, I’m trying to keep this post from getting too long).

Once the quota has been reached, we receive the dreaded 41822 error – this is what you’ll see when all of the volumes where your containers reside run out of free space (if even one of the volumes has free space, your workload can still execute).


Just out of curiosity, we’ll verify how many rows actually got inserted. On my server, I’ve got 4,639 rows in that table, and the folder consumes 1.44GB. So theoretically, there was enough space on the drive to create more checkpoint files, but it seems as though the engine won’t just create what it can to fit in the available space. It’s more likely that the engine attempts to precreate a set of files, and it either succeeds or fails all at once, but I’ve not confirmed that.

I disabled the quota, executed a manual CHECKPOINT, and ran the diagnostic queries again:


File Merge

Data files persist rows that reside in durable memory-optimized tables, and delta files store references to logically deleted rows. As more and more rows become logically deleted across different sets of CFPs, two things happen:

  1. the storage footprint increases (imagine that all data files have 50% of their rows logically deleted)
  2. query performance gets worse, because result sets must be filtered by entries in the delta files, which are increasing in size
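Both effects can be seen in a toy model. The following Python sketch is purely conceptual (the class and method names are hypothetical, and this is not how SQL Server physically lays out CFPs): a delete only adds a reference to the delta file, so the data file keeps its size, and every read has to be filtered against the delta entries.

```python
# Conceptual model only: NOT how SQL Server physically stores CFPs.
from dataclasses import dataclass, field


@dataclass
class CheckpointFilePair:
    # Data file: every row ever written to this CFP (row_id -> payload).
    data_rows: dict[int, str] = field(default_factory=dict)
    # Delta file: only references (row IDs) to logically deleted rows.
    deleted_ids: set[int] = field(default_factory=set)

    def insert(self, row_id: int, payload: str) -> None:
        self.data_rows[row_id] = payload

    def delete(self, row_id: int) -> None:
        # The row stays in the data file; the delta file records a reference.
        if row_id in self.data_rows:
            self.deleted_ids.add(row_id)

    def visible_rows(self) -> list[str]:
        # Every read must be filtered against the delta file.
        return [p for rid, p in sorted(self.data_rows.items())
                if rid not in self.deleted_ids]

    def deleted_fraction(self) -> float:
        # Share of the data file occupied by logically deleted rows.
        if not self.data_rows:
            return 0.0
        return len(self.deleted_ids) / len(self.data_rows)


cfp = CheckpointFilePair()
for i in range(10):
    cfp.insert(i, f"row {i}")
for i in range(0, 10, 2):  # logically delete every other row
    cfp.delete(i)

print(cfp.visible_rows())      # ['row 1', 'row 3', 'row 5', 'row 7', 'row 9']
print(cfp.deleted_fraction())  # 0.5
```

Half of the data file is now dead weight, yet it still occupies storage and still has to be scanned past on every read.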

Microsoft killed both of these birds with one stone: File Merge (aka Garbage Collection for data/delta files).

In the background – while your workload is running – the File Merge process attempts to combine adjacent sets of CFPs, and this is where we get to one of the file states that we didn’t cover in Part 1: MERGE TARGET.

A CFP in the MERGE TARGET state is the new, combined set of data/delta files produced by the File Merge process. Once the merge has completed, the MERGE TARGET transitions to ACTIVE, and as we stated earlier in this series, ACTIVE files can no longer be populated.

But what about the source files that the MERGE TARGET is derived from? After a CHECKPOINT, these files transition to WAITING FOR LOG TRUNCATION, and they can be removed once the log has been truncated. It should be noted that it can take several checkpoints and transaction log backups for CFPs to transition to a state where they can actually be removed. That’s why Microsoft recommends 4x durable memory-optimized data size for the initial storage footprint.

In the images that follow, we can see that the formerly distinct transaction ranges of 101 to 200, and 201 to 300, have been combined into a single CFP, which has the range of 101 to 300.

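[Images: checkpoint file pairs before and after File Merge, with transaction ranges 101 to 200 and 201 to 300 combined into a single CFP covering 101 to 300]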
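Here is a conceptual Python sketch of that merge, assuming a simplified CFP that holds a transaction range, its data-file rows, and the row IDs recorded in its delta file. All names are hypothetical and this is not engine code; the point is that only live rows are copied into the MERGE TARGET, while the source CFPs are left behind, waiting for log truncation.

```python
# Conceptual model of File Merge; not engine code. States follow the post.
from dataclasses import dataclass, field


@dataclass
class CFP:
    lower_tsn: int          # first transaction covered by this CFP
    upper_tsn: int          # last transaction covered by this CFP
    rows: dict[int, str]    # data file contents (row_id -> payload)
    deleted: set[int] = field(default_factory=set)  # delta file references
    state: str = "ACTIVE"


def merge(a: CFP, b: CFP) -> CFP:
    """Combine two adjacent CFPs into a new MERGE TARGET holding only live rows."""
    if a.upper_tsn + 1 != b.lower_tsn:
        raise ValueError("CFPs must cover adjacent transaction ranges")
    live = {rid: p for rid, p in a.rows.items() if rid not in a.deleted}
    live.update({rid: p for rid, p in b.rows.items() if rid not in b.deleted})
    return CFP(a.lower_tsn, b.upper_tsn, live, state="MERGE TARGET")


def finish_merge(target: CFP, sources: list[CFP]) -> None:
    # Simplification: the post says sources move to WAITING FOR LOG TRUNCATION
    # after a CHECKPOINT; here that step is folded into one call.
    target.state = "ACTIVE"
    for src in sources:
        src.state = "WAITING FOR LOG TRUNCATION"


old = CFP(101, 200, {1: "a", 2: "b"}, deleted={2})
new = CFP(201, 300, {3: "c"})
target = merge(old, new)
finish_merge(target, [old, new])

print(target.lower_tsn, target.upper_tsn, target.rows)  # 101 300 {1: 'a', 3: 'c'}
print(old.state)  # WAITING FOR LOG TRUNCATION
```

Until the sources are actually removed, both the old and the new CFPs occupy disk space, which is the storage (and backup) overhead discussed next.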

Effect on backup size

File Merge – and the requisite file state changes that CFPs must go through – explain why backups for memory-optimized databases can be considerably larger than the amount of data stored in memory. Until CFPs go through the required state changes, they must be included in backups.

IOPS

The File Merge process requires both storage and IOPS, as it reads from both sets of CFPs, and writes to a new set. Let’s say your workload requires 500 IOPS to perform well. We’ve just added another 1,000 IOPS as a requirement for your workload to maintain the same level of performance: 500 IOPS each for the read and write components of File Merge. That’s why Microsoft recommends 3x workload IOPS for your memory-optimized storage.
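The arithmetic behind the 3x recommendation, as a quick Python sketch. It assumes, as the example above does, that File Merge’s reads and writes each cost roughly as much I/O as the workload itself; actual merge I/O will vary with data volume and delete activity.

```python
def merge_iops_budget(workload_iops: int) -> dict[str, int]:
    # Assumption from the example above: File Merge reads and writes each
    # need roughly as many IOPS as the workload itself.
    merge_reads = workload_iops
    merge_writes = workload_iops
    return {
        "workload": workload_iops,
        "merge_reads": merge_reads,
        "merge_writes": merge_writes,
        "total": workload_iops + merge_reads + merge_writes,
    }


budget = merge_iops_budget(500)
print(budget["total"])                        # 1500
print(budget["total"] // budget["workload"])  # 3
```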

Potential remedies, real and imagined

What happens to your memory-optimized database when all volumes run out of free space?

In my testing of inserts that breached the quota for the folder, I saw no effect on database status. However, if I created the database, set the quota to a much lower value, and then created a memory-optimized table, the database status became SUSPECT. In a real-world situation, with hundreds of gigabytes or more of memory-optimized data, the last thing you want to do is a database restore in order to return your database to a usable state.

I was able to set the database OFFLINE, and then ONLINE, and that cleared the SUSPECT status. But keep in mind that taking the database OFFLINE and back ONLINE restreams all of your durable memory-optimized data into memory, so database recovery will be delayed accordingly.

What can you do if your volumes run out of free space?

Well, in SQL 2014, your database went into “SUSPENDED” mode (not suspect), and it was offline until perhaps you added more space and restarted the database (I’m not sure, as I didn’t test that). In SQL 2016+, the database goes into what’s known as “delete-only mode”, where you can still SELECT data, but modifying data is limited to deleting rows and/or dropping indexes/tables. Of course, SELECT, DELETE, and DROP do nothing to solve your problem: you need more free space.

When a database transitions to delete-only mode, that fact is written to the SQL errorlog:

[WARNING] Database ID: [9]. Checkpoint hit an error code 0x8300000a. Database is now in DeleteOnlyMode

You might think that you can issue CHECKPOINT manually, and do transaction log backups, hoping that File Merge will kick in. Or you could manually execute File Merge, with this uber-long thing:

EXEC sys.sp_xtp_checkpoint_force_garbage_collection N'<dbname>';

But keep in mind that if there was no additional free space on the volumes to precreate CFPs, then it’s not likely that there will be enough free space to write a new set of CFPs for DBA-initiated File Merge.

The only thing you can do to remedy this situation is to either free up some space on the existing volumes, or create a new container on a new volume that has free space.

In Part 4, we’ll discuss memory in the same ways we’ve discussed storage – how it’s allocated, and what happens to your memory-optimized workload when you run out of it.

Entire database in memory: Fact or fiction?

HP Servers and Persistent Memory

Advances in hardware and software have converged to allow storing your entire database in memory (depending on how large it is), even if you don’t use Microsoft’s In-Memory OLTP feature.

HP Gen9 servers support NVDIMM-N (known as Persistent Memory), which at that time had a maximum size of 8GB, and with 16 slots, offered a total server capacity of 128GB. Hardly large enough to run today’s mega-sized databases, and also there was no way to actually store your database there. So the use case for SQL Server 2016 was to store log blocks for transaction logs there. This could be beneficial in general, but particularly when using durable memory-optimized tables. That’s because WRITELOG waits for the transaction log could be a scalability bottleneck, which reduced the benefit of migrating to In-Memory OLTP.

There were other potential issues when using Persistent Memory, detailed in this blog post. But what’s not covered in that post is the fact that deploying NVDIMM-N reduced memory speed and/or capacity, because NVDIMM-N is not compatible with LRDIMM. That forces you to use RDIMM, which reduces capacity, and because NVDIMM-N operates at a slower speed than RDIMM, it also reduces overall memory speed.

HP has since released Gen10 servers, and they have changed the landscape for those seeking reduced latency by storing larger data sets in memory. For one thing, they raise the bar for what’s now referred to as Scalable Persistent Memory, with a total server capacity of 1TB. To be clear, NVDIMM-N is not used in this configuration. Instead, regular DIMMs are used, and they are persisted to flash via a power source (this was also the case for NVDIMM-N, but there the flash, DIMM, and power source were all located on the NVDIMM-N module itself).

In this video, Bob Ward demonstrates a ~5x performance increase for the industry’s first “diskless” database, using an HPE Gen10 server, SUSE Linux, Scalable Persistent Memory, and columnstore (presumably on a “traditional/formerly on-disk table”, not a memory-optimized table, although that’s not specifically detailed in the video).

Brett Gibbs, Persistent Memory Category Manager for HP servers, states in this video that even databases that use In-Memory OLTP can benefit from Scalable Persistent Memory, because the time required to restart the database can be significantly reduced. He stated that a 200GB memory-optimized database that took 20 minutes to restart on SAS drives took 45 seconds using Scalable Persistent Memory. However, no details are provided about the circumstances under which those results were obtained.

We are left to guess about the number of containers used, and the IOPS available from storage. It may be that in both cases they tested using a single container, which would be a worst practice. If that’s correct, all you would have needed to do to reduce database restart time was spread the containers across more volumes, to “parallelize” the streaming from storage to memory.

I’m assuming that the 45 seconds specified represents the amount of time required to get durable memory-optimized data from flash storage back into memory. If that’s correct, then the reduction of time required to restart the database has nothing to do with the Scalable Persistent Memory (other than memory speed), and everything to do with how fast flash storage can read data.
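A quick sanity check on those numbers, in Python. It only computes the sustained read throughput each timing implies (decimal units), and then, under the purely hypothetical assumption that restreaming scales linearly with the number of containers on separate volumes, how many SAS-backed containers would be needed to match the 45-second result.

```python
import math


def implied_throughput_mb_s(data_gb: float, seconds: float) -> float:
    # Sustained read rate needed to restream data_gb in the given time
    # (decimal units: 1 GB = 1,000 MB).
    return data_gb * 1000 / seconds


sas = implied_throughput_mb_s(200, 20 * 60)  # 20 minutes on SAS drives
pmem = implied_throughput_mb_s(200, 45)      # 45 seconds on the Gen10 server

print(round(sas))   # 167
print(round(pmem))  # 4444

# Hypothetical: if restreaming scaled linearly with the number of containers
# (each on its own SAS-backed volume), how many would match 45 seconds?
print(math.ceil(pmem / sas))  # 27
```

Real restreaming rarely scales perfectly linearly, so treat the last figure as an upper-level illustration of why container layout matters, not as a sizing rule.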

Licensing

The HP video also details how there might be a licensing benefit. It’s stated that if your workload requires 32 cores to perform well, and you reduce latency through the use of Scalable Persistent Memory, then you might be able to handle the same workload with fewer cores. I’d love to see independent test results of this.

In-Memory OLTP

If you are considering placing a database entirely in memory, and don’t want to be tied to a specific hardware vendor’s solution, In-Memory OLTP might be an option to consider.

This is an extremely vast topic that I’ve been interested in for quite a while, and I’ll summarize some of the potential benefits:

  • Maintaining referential integrity – Microsoft recommends keeping cold data in on-disk tables, and hot data in memory-optimized tables. But there’s just one problem with that: FOREIGN KEY constraints are not supported between on-disk and memory-optimized tables. Migrating all data to memory-optimized tables solves this specific issue.
  • Native compilation – if you want to use native compilation, it can only be used against memory-optimized tables. If you can deal with the potential TSQL surface area restrictions, migrating all data to memory-optimized tables might allow greater use of native compilation.
  • Single table structure – if you were to keep cold data on disk, and hot data in-memory, you would need to use two different table names, and perhaps reference them through a view. Migrating all data to memory-optimized tables solves this problem.
  • Unsupported isolation levels for cross-container transactions – it’s possible to reference both on-disk and memory-optimized tables in a single query, but memory-optimized tables only support a subset of the isolation levels that are available for on-disk tables, and some combinations are not supported (SNAPSHOT, for example).
  • Near-zero index maintenance – other than potentially reconfiguring the bucket count for HASH indexes, HASH and RANGE indexes don’t require any type of index maintenance. FILLFACTOR and fragmentation don’t exist for any of the indexes that are supported for memory-optimized tables.
  • Very large memory-optimized database size – Windows Server 2016 supports 24TB of memory, and most of that could be assigned to In-Memory OLTP, if you are using Enterprise Edition. This is way beyond the capacity supported by the current line of HP servers using Scalable Persistent Memory.

One extremely crucial point to make is that if you decide to migrate an entire database to In-Memory OLTP, then database recovery time must be rigorously tested. You will need to have enough containers spread across enough volumes to meet your RTO SLA.


In-Memory OLTP Resources, Part 1: The Foundation

This multi-part blog post will cover various resource conditions that can affect memory-optimized workloads. We’ll first lay the foundation for what types of resources are required for In-Memory OLTP, and why.

The following topics will be covered:

  • causes of OOM (Out of Memory)
  • how files that persist durable memory-optimized data affect backup size
  • how memory is allocated, including resource pools, if running Enterprise Edition
  • potential effect on disk-based workloads (buffer pool pressure)
  • what happens when volumes that store durable memory-optimized data run out of free space
  • what you can and cannot do when a memory-optimized database runs out of resources
  • database restore/recovery
  • garbage collection (GC) for row versions and files (file merge)
  • BPE (buffer pool extension)

Like most everything in the database world, In-Memory OLTP requires the following resources:

  • storage
  • IOPS
  • memory
  • CPU

Let’s take storage first – why would a memory-optimized database require storage, what is it used for, and how much storage is required?

Why and What?

You’ll need more storage than you might expect, to hold the files that persist your durable memory-optimized data, and backups. 

How much storage? 

No one can exactly answer that question, as we’ll explain over the next few blog posts. However, Microsoft’s recommendation is that you have 4x durable memory-optimized data size as a starting point for storage capacity planning.
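As a starting point only, the 4x guidance can be turned into a simple capacity estimate. This Python sketch assumes CFPs are spread roughly evenly across containers (one container per volume) and leaves backup storage out entirely; the function name and parameters are hypothetical.

```python
def storage_plan(durable_data_gb: float, containers: int,
                 multiplier: float = 4.0) -> dict[str, float]:
    # Starting point only: 4x durable memory-optimized data size, per the post.
    # Assumption: CFPs spread roughly evenly across containers; backups excluded.
    if containers < 1:
        raise ValueError("a memory-optimized filegroup needs at least one container")
    total = durable_data_gb * multiplier
    return {"total_gb": total, "per_container_gb": total / containers}


plan = storage_plan(durable_data_gb=100, containers=4)
print(plan)  # {'total_gb': 400.0, 'per_container_gb': 100.0}
```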

Architecture

A memory-optimized database must have a special filegroup designated for memory-optimized data, known as a memory-optimized filegroup. This special filegroup is logically associated with one or more “containers”. What the heck is a “container”? Well, it’s just a fancy word for “folder”, nothing more, nothing less. But what is actually stored in those fancy folders?

Containers hold files known as “checkpoint file pairs”, which are also known as “data and delta files”, and these files persist durable memory-optimized data (in this blog post series, I’ll use the terms CFP and data/delta files interchangeably). You’ll note on the following image that it clearly states in bold red letters, “NO MAXSIZE” and “STREAMING”. “NO MAXSIZE” means that you can’t specify how large these files will grow, nor can you specify how large the container that houses them can grow (unless you set a quota, but you should NOT do that). And there’s also no way at the database level to control the size of anything having to do with In-Memory OLTP storage – you simply must have enough available free space for the data and delta files to grow.

This is the first potential resource issue for In-Memory OLTP: certain types of data modifications are no longer allowed if the volume your container resides upon runs out of free space. I’ll cover workload recovery from resource depletion in a future blog post.

“STREAMING” means that the data stored within these files is different from what’s stored in MDF/LDF/NDF files. Data files for disk-based tables store data rows on 8K pages, and a group of eight contiguous pages is known as an extent. Data for durable memory-optimized tables is not stored on pages or extents. Instead, memory-optimized data is written in a sequential, streaming fashion, like the FILESTREAM feature (it should be noted that you do not have to enable the FILESTREAM feature in order to use In-Memory OLTP, and that statement has been true since In-Memory OLTP was first released in SQL 2014).

[Image: memory-optimized filegroup and its containers, annotated “NO MAXSIZE” and “STREAMING”]

How do these data/delta files get populated? All that is durable in SQL Server is written to the transaction log, and memory-optimized tables are no exception. After first being written to the transaction log, a process known as “offline checkpoint” harvests changes related to memory-optimized tables, and persists those changes in the data/delta files. In SQL 2014, there was a single offline checkpoint thread, but as of SQL 2016, there are multiple offline checkpoint threads. 
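The flow described above can be sketched conceptually in Python. The record layout and function are hypothetical simplifications, not engine internals: only records for memory-optimized tables are harvested, inserts land in a data file, deletes land in a delta file (an UPDATE would show up as a delete plus an insert), and a single data file can hold rows from several tables.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    lsn: int
    table: str
    op: str                 # "insert" or "delete"
    row_id: int
    memory_optimized: bool


def offline_checkpoint(log: list[LogRecord], from_lsn: int):
    """Harvest memory-optimized changes from the log into data/delta files."""
    data_file: list[tuple[str, int]] = []
    delta_file: list[tuple[str, int]] = []
    for rec in log:
        if rec.lsn < from_lsn or not rec.memory_optimized:
            continue  # already harvested, or belongs to a disk-based table
        if rec.op == "insert":
            data_file.append((rec.table, rec.row_id))
        elif rec.op == "delete":
            delta_file.append((rec.table, rec.row_id))
    # Where the next harvesting pass should resume.
    next_lsn = max((r.lsn for r in log), default=from_lsn - 1) + 1
    return data_file, delta_file, next_lsn


log = [
    LogRecord(1, "TableA", "insert", 10, True),
    LogRecord(2, "DiskTable", "insert", 7, False),
    LogRecord(3, "TableB", "insert", 20, True),
    LogRecord(4, "TableA", "delete", 10, True),
]
data, delta, nxt = offline_checkpoint(log, from_lsn=1)
print(data)   # [('TableA', 10), ('TableB', 20)]
print(delta)  # [('TableA', 10)]
print(nxt)    # 5
```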


Let’s create a sample database:

After creating the database, the InMemOOMTest folder looks like this:

[Image: contents of the InMemOOMTest folder, showing the OOM_DB_inmem1 and OOM_DB_inmem2 containers]

OOM_DB_inmem1 and OOM_DB_inmem2 are containers (folders), and they’ll be used to hold checkpoint file pairs. You’ll note in the DDL listed above that under the memory-optimized filegroup, each container has both a name and filename entry. The name is the logical name of the container, while the filename is actually the container name, which represents the folder that gets created on disk. Initially there are no CFPs in the containers, but as soon as you create your first memory-optimized table, CFPs get created in both containers.

If we have a look in one of the containers, we can see files that have GUIDs as names, and are created with different sizes.

[Image: data/delta files in a container, named with GUIDs and created with different sizes]

This is definitely not human-readable, but luckily, Microsoft has created a DMV, sys.dm_db_xtp_checkpoint_files, that allows us to figure out what these files represent.

Below we can clearly see that there are different types of files, and that files can have different “states”, which is central to the discussion of the storage footprint for memory-optimized databases, and backups of those databases. There are different values for container_id – remember we said that a memory-optimized database can have one or more containers. Next, we should pay attention to the fact that all entries for the “relative_file_path” column begin with “$HKv2\”. This means that in each container, we have a folder with the name “$HKv2”, and all data/delta files for that container are located there.

[Image: DMV output showing file types, file states, container_id, and relative_file_path]

At this point, it’s time for a discussion of the various file states. I’ll stick to SQL 2016+ (because SQL 2014 had more file states).

The possible file states are:

  • PRECREATED
  • UNDER CONSTRUCTION
  • ACTIVE
  • MERGE TARGET
  • WAITING FOR LOG TRUNCATION

We’ll discuss the first three now, and save MERGE TARGET and WAITING FOR LOG TRUNCATION for later.

PRECREATED: as a performance optimization technique, the In-Memory engine will “precreate” files. These precreated files have nothing in them – they are completely empty, from a durable data perspective. A file in this state cannot yet be populated.

UNDER CONSTRUCTION: when the engine starts adding data to a file, the state of the file changes from PRECREATED to UNDER CONSTRUCTION. Data and delta files are shared by all durable memory-optimized tables, so it’s entirely possible that the first entry is for TableA, the next entry for TableB, and so on. “UNDER CONSTRUCTION” could be interpreted as “able to be populated”.

ACTIVE: When a file that was previously UNDER CONSTRUCTION gets closed, the state transitions to ACTIVE. That means it has entries in it, but can no longer be populated. What causes a file to be closed? The CHECKPOINT process closes the current checkpoint, changing all UNDER CONSTRUCTION files to ACTIVE.
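To make the transitions concrete, here is a small Python state model of the three states covered so far. It is a conceptual sketch based on the descriptions above, not engine code; the class and method names are hypothetical, and MERGE TARGET and WAITING FOR LOG TRUNCATION are deliberately left out.

```python
from enum import Enum


class FileState(Enum):
    PRECREATED = "PRECREATED"
    UNDER_CONSTRUCTION = "UNDER CONSTRUCTION"
    ACTIVE = "ACTIVE"


class CheckpointFile:
    """Conceptual model of a data/delta file; not engine code."""

    def __init__(self) -> None:
        self.state = FileState.PRECREATED  # empty, created ahead of time
        self.entries: list[str] = []

    def append(self, entry: str) -> None:
        if self.state is FileState.ACTIVE:
            raise RuntimeError("ACTIVE files can no longer be populated")
        # The engine promotes a PRECREATED file to UNDER CONSTRUCTION
        # when it starts adding data to it.
        self.state = FileState.UNDER_CONSTRUCTION
        self.entries.append(entry)

    def close_on_checkpoint(self) -> None:
        # CHECKPOINT closes the file: UNDER CONSTRUCTION -> ACTIVE.
        if self.state is FileState.UNDER_CONSTRUCTION:
            self.state = FileState.ACTIVE


f = CheckpointFile()
f.append("TableA row")
f.append("TableB row")  # files are shared by all durable tables
f.close_on_checkpoint()
print(f.state.value)  # ACTIVE

try:
    f.append("TableA row 2")
except RuntimeError as exc:
    print(exc)  # ACTIVE files can no longer be populated
```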

That’s the basic rundown of the file states we need to know about at this point.

In Part 2, we’ll dive deeper into the impact of data/delta file states and the storage footprint for memory-optimized databases.

The subtleties of In-Memory OLTP Indexing

For this post, I wanted to cover some of the indexing subtleties for memory-optimized tables, with an emphasis on columnstore indexes.

Let’s create a memory-optimized table:

Now, let’s attempt to create a NONCLUSTERED COLUMNSTORE INDEX:

Msg 10794, Level 16, State 76, Line 76
The feature ‘NONCLUSTERED COLUMNSTORE’ is not supported with memory optimized tables.

It fails because we can only create a CLUSTERED columnstore index (CCI). For 25 years, Microsoft SQL Server differentiated between indexes that physically ordered data on storage (CLUSTERED) and those that did not (NONCLUSTERED). Unfortunately, they chose to ignore that pattern when creating the syntax for memory-optimized tables; using the word CLUSTERED is required when creating a columnstore index on memory-optimized tables.

Can we create a clustered columnstore index on a memory-optimized table that is defined as SCHEMA_ONLY?

Only one way to find out:

Msg 35320, Level 16, State 1, Line 39
Column store indexes are not allowed on tables for which the durability option SCHEMA_ONLY is specified.

That won’t work, so let’s create our table with SCHEMA_AND_DATA:

Now, let’s create a clustered columnstore index:

Success! Let’s attempt to create a NONCLUSTERED index….

Msg 10794, Level 16, State 15, Line 117
The operation ‘ALTER TABLE’ is not supported with memory optimized tables that have a column store index.

Ooops – no can do. Once you add a clustered columnstore index to a memory-optimized table, the schema is totally locked down.

What about if we create the CCI and nonclustered index inline?

Awesome! We’ve proven that we can create both clustered columnstore and nonclustered indexes, but we must create them inline.

Now that we’ve got our indexes created, let’s try to add a column:

Msg 12349, Level 16, State 1, Line 68
Operation not supported for memory optimized tables having columnstore index.

Hey, when I said that the schema is locked down once you add a clustered columnstore index, I mean it!

What type of index maintenance is possible for indexes on memory-optimized tables?

For HASH indexes there is only one possible type of index maintenance, and that’s to modify/adjust the bucket count. There is zero index maintenance for RANGE/NONCLUSTERED indexes.
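Choosing the bucket count is simple arithmetic. SQL Server rounds the requested BUCKET_COUNT up to the next power of two, and Microsoft’s general guidance is to size it at one to two times the number of distinct index key values. The Python helpers below (hypothetical names) sketch that calculation.

```python
def effective_bucket_count(requested: int) -> int:
    # SQL Server rounds BUCKET_COUNT up to the next power of two.
    if requested < 1:
        raise ValueError("BUCKET_COUNT must be positive")
    return 1 << (requested - 1).bit_length()


def suggested_bucket_count(distinct_keys: int, factor: float = 2.0) -> int:
    # Guidance: 1x to 2x the number of distinct key values (2x by default here).
    return effective_bucket_count(max(1, int(distinct_keys * factor)))


print(effective_bucket_count(1_000_000))  # 1048576
print(suggested_bucket_count(300_000))    # 1048576
```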

Let’s create a memory-optimized table with a HASH index, and verify the syntax for rebuilding the bucket count.

Here’s the syntax for rebuilding the bucket count for a HASH INDEX:

We can add a column, as long as we don’t have a CCI in place:

How about trying to rebuild the bucket count if we created the memory-optimized table with inline CCI and HASH indexes?

Msg 10794, Level 16, State 13, Line 136
The operation ‘ALTER TABLE’ is not supported with memory optimized tables that have a column store index.

You can’t rebuild that index if you also have a columnstore index on the table. We would have to drop the columnstore index, reconfigure the bucket count for the HASH index, and then recreate the columnstore index. Both the drop and the create of the columnstore index will be fully logged, and executed serially. Not a huge problem if the amount of data is not too large, but it’s a potentially much larger problem if you’ve got a lot of data.

We can create a clustered columnstore index on a #temp table (on-disk):

We can create multiple indexes with a single command:

Can we create a columnstore index on a memory-optimized table variable?

Create a table that includes a LOB column with a MAX datatype, then add a clustered columnstore index:

Msg 35343, Level 16, State 1, Line 22
The statement failed. Column ‘Notes’ has a data type that cannot participate in a columnstore index. Omit column ‘Notes’.

Msg 1750, Level 16, State 1, Line 22
Could not create constraint or index. See previous errors.

For memory-optimized tables, LOB columns prevent creation of a clustered columnstore index.

Now let’s try creating a table using CHAR(8000). Astute readers will notice that the following table would create rows that are 32,060 bytes wide – this would fail for on-disk tables, but is perfectly valid for memory-optimized tables:

Msg 41833, Level 16, State 1, Line 29
Columnstore index ‘CCI_InMemLOB’ cannot be created, because table ‘InMemLOB’ has columns stored off-row. Columnstore indexes can only be created on memory-optimized table if the columns fit within the 8060 byte limit for in-row data. Reduce the size of the columns to fit within 8060 bytes.
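A rough pre-check for this rule can be written in a few lines of Python. It is a simplification under stated assumptions: it sums the declared maximum width of each column and compares the total with the 8,060-byte in-row limit, ignoring row-header overhead. The column names and sizes below are hypothetical, not the table from this post.

```python
IN_ROW_LIMIT = 8060  # bytes, from the error message above


def declared_width(columns: dict[str, int]) -> int:
    # Simplification: sum of declared max widths; ignores row-header overhead.
    return sum(columns.values())


def fits_in_row(columns: dict[str, int]) -> bool:
    return declared_width(columns) <= IN_ROW_LIMIT


# Hypothetical tables (not the ones from the post).
wide = {"ID": 4, "Col1": 8000, "Col2": 8000, "Col3": 8000, "Col4": 8000}
narrow = {"ID": 4, "Notes": 4000, "Code": 100}

print(declared_width(wide), fits_in_row(wide))      # 32004 False
print(declared_width(narrow), fits_in_row(narrow))  # 4104 True
```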

Now create a table whose non-MAX columns are all stored in-row, then add a clustered columnstore index:

Let’s create a natively compiled module that selects from this table:

Enable “Actual Plan” and run the SELECT: which index is used?

[Image: actual execution plan for the SELECT]

Now highlight the EXEC statement, and click “Estimated Plan” – which index is used?

[Image: estimated execution plan for the EXEC]

The SELECT statement uses the columnstore index, but the natively compiled procedure does not (that’s because natively compiled procedures ignore columnstore indexes).

Summing up

In this post, we’ve covered some of the finer points of indexing memory-optimized tables. You never know when they might come in handy…