Dangerous moves: Setting max size for In-Memory OLTP containers

I recently saw a thread on Twitter where the OP talked about setting the max size for an In-Memory OLTP container. I responded as I always do: it’s not possible to set a limit on anything having to do with storage for In-Memory OLTP.

Unfortunately, that’s not correct: through SSMS or TSQL, you can in fact set a max size for a container.

But you should never do that.

Why?

Because if you do, and your checkpoint files exceed the max size of the container, your database can go into the In Recovery, Suspect, or OFFLINE state. The following code reproduces this issue:

Note that I’ve not yet found a way around this. The OP from that thread on Twitter said he had to restart the SQL Server service to resolve the issue with that database, but I don’t see why that would make any difference (when I tried it, the database attempted recovery, but eventually went offline).

Setting a max size for the container is a really, really, really bad idea, because it guarantees that the database will have some form of outage when you hit the threshold. The bottom line is that containers must be free to grow, period. That’s part of the capacity planning that good DBAs do before deploying the In-Memory OLTP feature.

Trials and tribulations of learning Linux

Decades ago, before Microsoft SQL Server existed, I spent $500 (quite a hefty sum in those days) attempting to learn C programming and Unix. It was the best $500 I ever spent, because it taught me that my brain simply does not work well with that technology (or at least, it didn’t back then). Fast forward to 2017, and voilà: SQL Server runs on Linux. But this time, there are some big differences. For one thing, PowerShell can ease the burden of learning *nix commands. Also, you can install a desktop environment on Linux.

And so I’ve begun my deep dive into various aspects of running SQL Server on Linux, with Ubuntu as my distribution of choice.

Windows man

This life-long Windows SQL Server DBA depends on the ease with which one can copy and paste in either direction between a guest VM and the host, using RDP – it’s a huge time saver. Folks in the Linux world love to type stuff, and that’s ok with me, because I started in technology in the days before Windows existed, so I’m a seasoned MS-DOS/command line guy.

While researching various aspects of what’s possible on Linux, I read a lot of blog posts, and some of them had long lists of commands. While I could have collected those commands into a file on my Windows host and copied that file to the Linux guest, I simply wanted to copy and paste to and from my Ubuntu VM running on Hyper-V.

Alas, that was not to be.

If you search the web for “copy paste Ubuntu Hyper-V”, you’ll find loads of answers in forums, dispensing all kinds of advice that might have been good at the time. But now it’s 2018, and I came across this blog post from Craig Wilhite @ Microsoft:

Sneak Peek: Taking a Spin with Enhanced Linux VMs

That post details how to set up Enhanced Linux VMs, so I downloaded Ubuntu Server 18.04 and got to work, following the post to the letter.

Denied

I spent the better part of a week, after hours, trying to get this to work, plugging the error messages into search engines to see what came back.

After entering credentials into xrdp, I received the message “Video remoting was disconnected”, and searching for that led me to this thread on GitHub, which is related to Craig Wilhite’s post.

So clearly, others had experienced this issue, but there didn’t seem to be any resolution. I posted a message, asking for what next steps I might take, and followed recommendations, but nothing panned out. Finally, Craig suggested that perhaps the difference was due to the fact that I was using Ubuntu server, and he had verified the steps using Ubuntu desktop. I just finished testing with Ubuntu desktop, and hallelujah, Enhanced Session Linux VMs work with Ubuntu desktop.

But the entire reason I wanted to experiment with the server version was to investigate Kubernetes, and I wanted to use Ubuntu server for that.

As luck would have it, the next day I attended a webinar given by Argenis Fernandez on using SQL Server in containers, and during the presentation, Argenis mentioned MobaXterm, which allows copy/paste and has a free version. So I reinstalled Ubuntu Server, installed MobaXterm, and lo and behold, I now have bidirectional copy/paste between host and guest.

That’s how it is when you learn any new, unfamiliar technology – you spin your wheels, make mistakes, fail, and if you push through and leave your mind open, you can be rewarded with expertise.

New kid on the block: sp_BlitzInMemoryOLTP

In-Memory OLTP has been included in the last three releases of SQL Server, from 2014 through 2017, and now runs on Linux, Windows, Azure SQL Database, and Azure Managed Instances. Additionally, since SQL 2016/SP1, the In-Memory OLTP feature has been available in non-Enterprise editions.

What does this all mean?

It most likely means that it’s only a matter of time before a memory-optimized database lands on your doorstep, and you’ll probably have no idea how or why it’s different.

For a while now, I’ve been working on a script to evaluate a SQL Server environment for anything related to In-Memory OLTP, and I had help with testing, general suggestions, and final touches from Konstantin Taranov and Aleksey Nagorskiy; their assistance was invaluable. Konstantin suggested to Erik Darling and Brent Ozar that my script be included as part of their great Blitz series, and the result is... sp_BlitzInMemoryOLTP.

It is now part of the awesomeness known as the First Responder Kit, and the direct link to the script can be found here.

sp_BlitzInMemoryOLTP reports on two categories: instance level and database level.

First, let’s discuss which parameters sp_BlitzInMemoryOLTP accepts, and then we’ll break out the results, section by section.

@instanceLevelOnly BIT

This flag determines whether to report only on the server-level environment (where applicable; there is no server-level environment for Azure SQL Database). When this flag is set, memory-optimized databases are ignored. If you specify @instanceLevelOnly along with a database name, the database name is ignored.

@dbName NVARCHAR(4000) = N'ALL'

If you don’t specify a database name, sp_BlitzInMemoryOLTP reports on all memory-optimized databases within the instance it executes in, or, in the case of Azure SQL Database, on the database that you provisioned. This is because the default for the @dbName parameter is N'ALL'.

Example:

It’s also possible to report on a specific database name.

Example:

The results of calling sp_BlitzInMemoryOLTP this way are explained later in this post.

@tableName NVARCHAR(4000) = NULL

Example:

If you only want to report on a specific memory-optimized table, you would supply a value for the @tableName parameter, and sp_BlitzInMemoryOLTP will search through all memory-optimized databases, looking for memory-optimized user tables that match. There is currently no wildcard matching for the @tableName parameter.

@debug BIT

Setting @debug = 1 tells sp_BlitzInMemoryOLTP to only print the TSQL statements that would have been executed. This allows you (or more likely, me) to resolve problems like missing quotes, or other issues that can occur when using dynamic SQL.

Example:

Supported platforms

This script has been tested on SQL 2014, SQL 2016, SQL 2017, and Azure SQL Database. It has not been tested against Azure Managed Instances.

In the comments, please let me know other things about memory-optimized environments and/or databases you’d like to see included in the script.

How to interpret the results for sp_BlitzInMemoryOLTP

When you execute sp_BlitzInMemoryOLTP, it runs several queries that pertain to the In-Memory OLTP environment. If a given query has no results (for example, there are no temporal memory-optimized tables), sp_BlitzInMemoryOLTP does not return an empty result set; this keeps the clutter to a minimum.

For example, it could be that a memory-optimized filegroup has been added to a database, but no memory-optimized objects have been created. Depending on the version of SQL Server, there might not be details about the containers or files within them, so sp_BlitzInMemoryOLTP won’t return information on that.

Instance level

Instance level evaluates the following:

  • the version/edition of SQL server
  • SQL Server ‘max memory’ setting
  • memory clerks
  • XTP memory consumers, aggregated
  • XTP memory consumers, detailed
  • the value of the committed_target_kb column from sys.dm_os_sys_info
  • whether or not instance-level collection of execution statistics has been enabled for all natively compiled stored procedures (because this can kill their performance)
  • when running Enterprise, whether any resource pools are defined, and which memory-optimized databases are bound to them
  • XTP and buffer pool memory allocations, because In-Memory OLTP can affect on-disk workloads
  • summary of memory used by XTP

Section 1: version/edition of SQL server

Documentation here.

Section 2: SQL Server ‘max memory’ setting

Documentation here.

Section 3: memory clerks

Documentation here.

Section 4: XTP memory consumers, aggregated

Documentation here.

Section 5: XTP memory consumers, detailed

Section 6: the value of the committed_target_kb column from sys.dm_os_sys_info. The amount of memory that SQL Server can use for the In-Memory OLTP feature is a percentage of the committed_target_kb value. But be forewarned, this value is not static. Details in my post here.
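Here’s a minimal Python sketch that puts numbers on that percentage. It assumes the tiers listed in Microsoft’s documentation on binding a database with memory-optimized tables to a resource pool: 70% of committed target memory up to 8 GB, stepping up to 90% above 96 GB. The function names are hypothetical. Because committed_target_kb is not static, any result is only a point-in-time estimate.

```python
import math

# (upper bound of committed target memory in GB, fraction usable by In-Memory OLTP)
# Assumption: tiers as listed in Microsoft's docs on binding a database with
# memory-optimized tables to a resource pool; verify them for your version.
XTP_TIERS = [(8, 0.70), (16, 0.75), (32, 0.80), (96, 0.85)]
XTP_TOP_FRACTION = 0.90  # committed target memory above 96 GB


def xtp_fraction(committed_target_gb: float) -> float:
    """Fraction of committed target memory available to memory-optimized data."""
    for upper_gb, fraction in XTP_TIERS:
        if committed_target_gb <= upper_gb:
            return fraction
    return XTP_TOP_FRACTION


def xtp_available_gb(committed_target_kb: int) -> float:
    """Turn committed_target_kb (sys.dm_os_sys_info) into GB usable by In-Memory OLTP."""
    committed_gb = committed_target_kb / (1024 * 1024)
    return committed_gb * xtp_fraction(committed_gb)


def pool_max_memory_percent(committed_target_gb: float, xtp_data_gb: float) -> int:
    """Smallest MAX_MEMORY_PERCENT for a resource pool that must hold xtp_data_gb."""
    pool_gb = xtp_data_gb / xtp_fraction(committed_target_gb)
    return math.ceil(100 * pool_gb / committed_target_gb)


print(xtp_available_gb(100 * 1024 * 1024))  # about 90 GB of 100 GB
print(pool_max_memory_percent(100, 60))     # 67
```

The last function follows the sizing arithmetic in that same documentation. Suppose a server has 100 GB of committed target memory and 60 GB of memory-optimized data. The pool then needs about 60 / 0.90 = 66.7 GB, which rounds up to MAX_MEMORY_PERCENT = 67.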

Section 7: whether or not instance-level collection of execution statistics has been enabled for all natively compiled stored procedures. Enabling this on a production server could be considered drastic. More details can be found in my post here.

Section 8: when running Enterprise, whether any resource pools are defined, and which memory-optimized databases are bound to them. Binding memory-optimized databases to a resource pool (using Resource Governor) is considered a best practice, but unfortunately this capability is still Enterprise only. If you’re on that edition, you should also monitor how close you’re getting to the out-of-memory threshold, and fire an alert when required. More details in my post here.

Section 9: XTP and buffer pool memory allocations, because In-Memory OLTP can affect on-disk workloads

Database level

For a given memory-optimized database (or all memory-optimized databases), database level evaluates the following:

  • all memory-optimized tables
  • all indexes on all memory-optimized tables
  • the average chain length for HASH indexes (and informs you if the bucket count is too low)
  • the number of indexes per memory-optimized table
  • all natively compiled stored procedures
  • which native modules are loaded (stored procedures only, and this is not relevant for Azure SQL Database)
  • the number of natively compiled procedures
  • whether or not the collection of execution statistics is enabled for any natively compiled procedures
  • if using the temporal feature for memory-optimized tables, the amount of memory consumed by hidden temporal internal tables (which are memory-optimized)
  • memory structures for LOB columns (off-row)
  • all memory-optimized table types
  • database layout, which includes mdf, ldf, ndf, and containers, and the size in various formats (KB/MB/GB). The totalSizeMB column is the total for the entire database (uses a Window Function).

Three separate result sets that describe containers:

  • Container details by container name
  • Container details by fileType and fileState
  • Container file details by container_id, fileType and fileState

For Azure SQL Database, sp_BlitzInMemoryOLTP:

  • verifies if you are running on the Premium tier (that’s the only tier that supports In-Memory OLTP)
  • displays all records for xtp_storage_percent, in descending order (more info here)
  • displays the status of XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS (more info here)

The output in the photos that follow was returned from executing sp_BlitzInMemoryOLTP, for a database named OOM-DB. You can get information on all memory-optimized databases if you don’t supply a database name when calling sp_BlitzInMemoryOLTP.

Section 1: Listing of memory-optimized databases on this instance of SQL Server

Section 2: memory-optimized tables, including row counts

Section 3: indexes on memory-optimized tables. It’s helpful to know how many, and what type of indexes there are.

Section 4: average chain length for HASH indexes (if any). When a HASH index is created on a memory-optimized table, you must supply a value for what’s known as the “bucket count”. That value is not adjusted automatically as the data changes, so a bucket count that is too low for the number of distinct key values leads to long row chains and can cause performance problems. More details here.
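The following minimal Python simulation shows why a low bucket count hurts. It assumes keys hash uniformly, and a seeded random generator stands in for the engine’s hash function, which is also an assumption. It relies on two documented behaviors: BUCKET_COUNT is rounded up to the next power of two, and avg_chain_length in sys.dm_db_xtp_hash_index_stats is averaged over non-empty buckets. The function names are hypothetical.

```python
import random
from collections import Counter


def round_up_pow2(n: int) -> int:
    """BUCKET_COUNT is rounded up to the next power of two."""
    power = 1
    while power < n:
        power *= 2
    return power


def simulate_hash_index(distinct_keys: int, bucket_count: int, seed: int = 42):
    """Hash distinct keys into buckets; return (buckets, empty_pct, avg_chain_length).

    A seeded PRNG stands in for the engine's hash function (an assumption),
    and the average chain length is computed over non-empty buckets only.
    """
    buckets = round_up_pow2(bucket_count)
    rng = random.Random(seed)
    chains = Counter(rng.randrange(buckets) for _ in range(distinct_keys))
    non_empty = len(chains)
    empty_pct = 100.0 * (buckets - non_empty) / buckets
    avg_chain = distinct_keys / non_empty if non_empty else 0.0
    return buckets, empty_pct, avg_chain


rows = 100_000
for requested in (1_000, 200_000):
    buckets, empty_pct, avg_chain = simulate_hash_index(rows, requested)
    print(f"BUCKET_COUNT={requested}: {buckets} buckets, "
          f"{empty_pct:.1f}% empty, avg chain {avg_chain:.1f}")
```

With 100,000 distinct keys, the undersized index ends up with chains of roughly a hundred rows per bucket. The larger one has mostly single-row chains and a lot of empty buckets. That trade-off is why the usual guidance is to size BUCKET_COUNT at one to two times the number of distinct key values.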

Section 5: Number of indexes per memory-optimized table. SQL 2014 and SQL 2016 have a limit of 8 nonclustered (RANGE) indexes per memory-optimized table. That ceiling was lifted in SQL 2017, and I’ve tested creating several hundred indexes on a single memory-optimized table (but please don’t do that in production!).

Sections 6 through 8:

  • natively compiled stored procedures
  • which natively compiled stored procedures are currently loaded
  • how many natively compiled stored procedures there are

Section 9: if using the temporal feature for memory-optimized tables, the amount of memory consumed by hidden temporal internal tables (which are memory-optimized). Temporal tables are handled differently when the temporal table is memory-optimized; I’ve written about that in this post.

Section 10: memory structures for LOB columns (off-row). For memory-optimized tables, LOB columns are actually stored as separate tables, and this can lead to performance problems. MCM Dmitri Korotkevitch has a great post on it here.

Section 11: memory-optimized table types. Yes, tables and table types can be memory-optimized, and you’ll want to be aware of the potential gotchas with those memory-optimized types, as detailed in my post.

Section 12: all database files, including the name, size, and location for each container.

Sections 13 through 15 pertain to the amount of storage consumed by durable memory-optimized tables. The files that persist durable data to storage go through several state changes over time. As a result, the storage footprint for memory-optimized databases that contain durable data can be surprisingly large relative to the amount of data that’s stored in memory (Microsoft suggests 4x the memory-optimized data size as a starting point). So it’s a good idea to keep an eye on the storage footprint.
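The 4x figure is only a starting point, so it’s worth comparing the estimate against what the containers actually hold. Here’s a minimal Python sketch with hypothetical sizes. In practice, the on-disk total would come from the container result sets in Sections 13 through 15. The helper names are made up for illustration.

```python
GiB = 1024 ** 3


def starting_storage_estimate(durable_data_bytes: int, multiplier: float = 4.0) -> float:
    """Starting-point storage for checkpoint files (Microsoft's 4x suggestion)."""
    return durable_data_bytes * multiplier


def footprint_ratio(container_bytes: int, durable_data_bytes: int) -> float:
    """How many times larger the on-disk footprint is than the durable data."""
    return container_bytes / durable_data_bytes


durable = 40 * GiB   # hypothetical size of durable memory-optimized data
on_disk = 190 * GiB  # hypothetical total across all containers

print(starting_storage_estimate(durable) / GiB)  # 160.0
ratio = footprint_ratio(on_disk, durable)
if ratio > 4.0:
    print(f"footprint is {ratio:.2f}x the data; revisit capacity planning")
```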

Section 13: Container details by container name

One row per container, listing the aggregated size of all files within that container, as well as the number of files in that container.

Section 14: Container details by fileType and fileState

Here, the breakdown is a bit different, taking into account the type of file.

For each file type, such as DATA or DELTA, this section aggregates the storage consumed and the number of files across ALL containers for this database. For example, there are a total of 11 files of fileType DATA with a fileState of ACTIVE, across all containers for this memory-optimized database. (Note that SQL 2014 has file types that don’t exist in later versions of SQL Server.)

Section 15: Container file details by container_id, fileType and fileState

For each file type, such as DATA or DELTA, this section aggregates the storage consumed and the number of files PER CONTAINER.

In the prior example, we saw that there were a total of 11 files of fileType DATA with a fileState of ACTIVE, across all containers for this memory-optimized database.

This result shows the breakdown of each fileType and fileState PER CONTAINER. The container named InMemDB_inmem1 has 3 files that have a fileType of DATA and a fileState of ACTIVE. So we expect to see 8 more files with this type and state, in the remaining containers. Sure enough, we see that the container named InMemDB_inmem2 has an additional 8 files with a fileType of DATA and a fileState of ACTIVE.
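Sections 14 and 15 are two levels of grouping over the same set of checkpoint files. Here’s a minimal Python sketch using hypothetical rows. Only the container names and the 3 + 8 split of ACTIVE DATA files come from the example above. The sketch also checks that the per-container counts add back up to the database-wide totals.

```python
from collections import Counter

# Hypothetical checkpoint files as (container_name, file_type, file_state).
# Only the container names and the 3 + 8 split of ACTIVE DATA files come
# from the example above; the DELTA rows are made up for illustration.
files = (
    [("InMemDB_inmem1", "DATA", "ACTIVE")] * 3
    + [("InMemDB_inmem2", "DATA", "ACTIVE")] * 8
    + [("InMemDB_inmem1", "DELTA", "ACTIVE")] * 3
    + [("InMemDB_inmem2", "DELTA", "ACTIVE")] * 8
)

# Section 14 style: group by (file type, file state) across ALL containers.
by_type_state = Counter((ftype, state) for _, ftype, state in files)

# Section 15 style: the same grouping, but PER CONTAINER.
by_container = Counter(files)

# The per-container counts should always add back up to the totals.
for (ftype, state), total in by_type_state.items():
    per_container = sum(
        n for (_, t, s), n in by_container.items() if (t, s) == (ftype, state)
    )
    assert per_container == total

print(by_type_state[("DATA", "ACTIVE")])                   # 11
print(by_container[("InMemDB_inmem1", "DATA", "ACTIVE")])  # 3
print(by_container[("InMemDB_inmem2", "DATA", "ACTIVE")])  # 8
```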

Understanding how In-Memory OLTP works (with all of its various gotchas) can only be addressed by putting in the required time. If you read the documentation, and then study the real-world deployment concepts detailed in my extensive blog post series on In-Memory OLTP, you’ll be on the right path. Once you begin to wrap your brain around In-Memory OLTP, you’ll need some help evaluating memory-optimized environments and/or databases, and that’s where sp_BlitzInMemoryOLTP can help.

TDE and backup compression

In my recent post about “Options for smaller backups”, I intentionally omitted backup compression, which I’ll cover in this post. We’ll drill down a bit into the specifics of using TDE and backup compression together.

The history of TDE and backup compression is that until SQL 2016, they were great features that didn’t play well together – if TDE was in play, backup compression didn’t work well, or at all.

However, with the release of SQL 2016, Microsoft aimed to have these two awesome features get along better (the blog post announcing this feature interoperability is here). Then there was this “you need to patch” post, due to edge cases that could leave a backup unrestorable. So if you haven’t patched in a while, now would be a good time to do so, because Microsoft says those issues have been resolved (although that seems to be disputed here).

That “you need to patch” blog post was recently updated to make it (hopefully) crystal clear under which conditions database backups use a MAXTRANSFERSIZE value other than the default, thereby optimizing the backup process. To be clear, the following conditions are specific to backups of databases that do not use TDE. Without TDE, the engine will internally change the default MAXTRANSFERSIZE if:

  • your database (not your backup) has >1 file
  • you are backing up to URL

BUT – if TDE is enabled for the database you’re backing up – and you don’t supply a value for MAXTRANSFERSIZE, the engine uses a MAXTRANSFERSIZE of 65536 (64K), and the new algorithm for getting good compression with TDE will not be used.

You must supply a value for MAXTRANSFERSIZE greater than 65536 (64K) to enable the new compression algorithm when using TDE. Because MAXTRANSFERSIZE must be a multiple of 64K, in practice that means 131072 (128K) or higher.

Yeah, it’s sort of hackish, and Microsoft is aware of that, but that’s the way it is for now.
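Here’s a minimal Python model of the rule as described above. It assumes SQL 2016 or later with the fixes applied and a compressed backup of a TDE-enabled database. The function name is hypothetical. The model also enforces the BACKUP documentation’s constraint that MAXTRANSFERSIZE be a multiple of 65536 bytes, no larger than 4194304 (4 MB).

```python
from typing import Optional

DEFAULT_TDE_MAXTRANSFERSIZE = 65_536  # 64 KB, used when no value is supplied
MAXTRANSFERSIZE_STEP = 65_536         # values must be multiples of 64 KB
MAXTRANSFERSIZE_LIMIT = 4_194_304     # 4 MB upper limit


def tde_compression_uses_new_algorithm(maxtransfersize: Optional[int] = None) -> bool:
    """Compressed backup of a TDE-enabled database (SQL 2016+, patched)."""
    if maxtransfersize is None:
        maxtransfersize = DEFAULT_TDE_MAXTRANSFERSIZE
    if (maxtransfersize % MAXTRANSFERSIZE_STEP != 0
            or not MAXTRANSFERSIZE_STEP <= maxtransfersize <= MAXTRANSFERSIZE_LIMIT):
        raise ValueError(f"invalid MAXTRANSFERSIZE: {maxtransfersize}")
    # The new algorithm is used only above the 64 KB default.
    return maxtransfersize > DEFAULT_TDE_MAXTRANSFERSIZE


print(tde_compression_uses_new_algorithm())         # False
print(tde_compression_uses_new_algorithm(131_072))  # True
```

In T-SQL terms, the second case corresponds to adding MAXTRANSFERSIZE = 131072 alongside COMPRESSION in the WITH clause of BACKUP DATABASE.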

I’ll update this post if/when more information becomes available about the co-existence of these features.

Options for smaller backups

I have a client that is running SQL 2016 Enterprise, and wants to get a full backup offsite every day. They’ve been doing it for over 5 years, and are now seeing scalability issues. 

In researching this blog post, I found a lot of useful information written by Dmitri Korotkevitch, who blogged about “Size does matter: 10 ways to reduce the database size and improve performance in SQL Server”. There is some overlap between his post and mine, but those who are interested in this topic will probably want to read both.

SPARSE columns

If a column contains mostly NULLs, then depending on the data type, you can achieve space savings by using the SPARSE property (documentation here). SPARSE columns can be used with filtered indexes to theoretically reduce storage space and increase query performance. But there are a boatload of gotchas, such as issues with query plan caching (filtered indexes), and the fact that if you use SPARSE columns, neither the table nor its indexes can use any form of compression (the documentation is clear about not supporting table compression, but does not mention that index compression is also an issue; it is).

As the documentation clearly states, when converting a column from non-sparse to sparse, the following steps are taken:

  1. Adds a new column to the table in the new storage size and format
  2. For each row in the table, updates and copies the value stored in the old column to the new column
  3. Removes the old column from the table schema
  4. Rebuilds the table (if there is no clustered index) or rebuilds the clustered index to reclaim space used by the old column

For large tables, converting even a few columns to SPARSE can take a very long time, because this entire process runs separately for each column you convert.

In 2016+, if the conditions are right, we can get minimal logging plus parallelism for INSERT statements (see this CAT team blog post for more information). You might do something like:

  1. create a new table, adding SPARSE to the relevant columns
  2. use INSERT <newtable> WITH (TABLOCK)/SELECT FROM <originaltable>
  3. recreate indexes
  4. drop original table

In my case, I decided to not use SPARSE columns, because of the restrictions related to using other forms of compression on tables/indexes.

Data and/or index Compression

Compressing rowstore data and/or indexes used to be an Enterprise-only feature, but that’s changed since SQL 2016/SP1. However, to get any real benefit from doing this (especially for an OLTP system), you need to use some form of partitioning (see below), which can be a monumental task. Some have stated that when attempting to use compression on very wide tables (500+ columns), compression can fail, and in that case, SPARSE columns are your only option, assuming you can’t use other features described in this post.

COMPRESS() and DECOMPRESS()

ROW and PAGE compression only work with in-row data. However, SQL 2016 introduced the ability to compress off-row data with the COMPRESS() function. Depending on how much off-row data your databases contain, you might get storage savings when using this, although it will require some form of application change to decompress the relevant column(s) when required.
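COMPRESS() uses the GZIP algorithm and DECOMPRESS() returns VARBINARY(MAX). Whatever reads the column therefore has to decompress it and cast it back to the original type. This minimal Python sketch approximates that round trip for an NVARCHAR value, assuming the value is stored as UTF-16LE bytes. The function names are hypothetical, and the exact compressed bytes may differ from what SQL Server produces.

```python
import gzip


def compress_like_sql(value: str) -> bytes:
    """Approximates COMPRESS() on an NVARCHAR value: gzip over its UTF-16LE bytes."""
    return gzip.compress(value.encode("utf-16-le"))


def decompress_like_sql(blob: bytes) -> str:
    """Approximates CAST(DECOMPRESS(blob) AS NVARCHAR(MAX))."""
    return gzip.decompress(blob).decode("utf-16-le")


doc = "<order><item>widget</item><qty>1</qty></order>" * 200  # repetitive off-row text
raw = doc.encode("utf-16-le")
blob = compress_like_sql(doc)

print(len(raw), len(blob))  # compressed size is a small fraction of the raw size
print(decompress_like_sql(blob) == doc)  # True: the round trip restores the text
```

Because the format is GZIP, the decompression step could live in the database (casting the DECOMPRESS() result) or in the application itself.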

CLUSTERED COLUMNSTORE

Another formerly Enterprise-only feature, again included in other editions since SQL 2016/SP1. For the right type of workload, i.e. not too write-intensive, you might consider replacing a rowstore with a clustered columnstore. I want to be clear that when I write about clustered columnstore indexes replacing a rowstore, I’m referring to on-disk tables only. There’s a lot of confusion about this because memory-optimized tables also support clustered columnstore, but in that case, the columnstore does not replace the rowstore (please refer to my blog post on the differences between columnstore for on-disk vs. in-mem here).

When using partitioning with data compression, you can decide which partitions are compressed, if any, and what form of compression to deploy; the supported options are PAGE, ROW, and NONE. Columnstore is “all or nothing at all”, even when used with table partitioning. You can choose between ARCHIVAL and non-archival columnstore compression, but there is no way to designate specific partitions as uncompressed, as is the case with data compression. The deltastore (where inserts initially land) is an uncompressed rowstore.

One potential problem when using clustered columnstore is that you can’t deploy it on a table that has triggers. Also, LOB types (NVARCHAR(MAX)) are not supported for clustered columnstore indexes until SQL 2017.

Separating clustered and PRIMARY KEY

If you have an existing clustered rowstore that’s defined as a CONSTRAINT (for example with CREATE/ALTER TABLE), and you want to replace it with a clustered columnstore, then you’ll have to drop the constraint before creating the columnstore. That’s because the DROP_EXISTING = ON syntax is not supported for ALTER TABLE.

And because the key columns of a clustered index are also stored in every nonclustered index, it might be faster to drop nonclustered indexes before dropping a constraint that’s also the clustering key.

Keep in mind that even though a clustered columnstore contains the word “clustered” – which in the rowstore world means that it’s physically ordered – clustered columnstore indexes have no order. To achieve the best rowgroup elimination, you would first have to physically order your data using a regular clustered index, and then create the clustered columnstore with DROP_EXISTING = ON.
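Rowgroup elimination works from the minimum and maximum values stored for each segment. Ordering the data first is what keeps those ranges narrow. Here’s a minimal Python simulation, with rowgroups scaled down to 1,000 rows (real rowgroups hold up to 1,048,576 rows) and hypothetical helper names.

```python
import random

ROWGROUP_ROWS = 1_000  # real rowgroups hold up to 1,048,576 rows; scaled down here


def build_rowgroups(values, size=ROWGROUP_ROWS):
    """Chunk values into rowgroups, keeping only (min, max) like segment metadata."""
    groups = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        groups.append((min(chunk), max(chunk)))
    return groups


def rowgroups_scanned(rowgroups, lo, hi):
    """Rowgroups whose [min, max] overlaps the predicate range can't be eliminated."""
    return sum(1 for gmin, gmax in rowgroups if gmax >= lo and gmin <= hi)


keys = list(range(100_000))
random.Random(7).shuffle(keys)  # arbitrary load order

unordered = build_rowgroups(keys)
ordered = build_rowgroups(sorted(keys))  # data ordered first, e.g. by a clustered index

# Predicate equivalent to: WHERE key BETWEEN 5000 AND 5999
print(rowgroups_scanned(unordered, 5_000, 5_999))  # nearly all 100 rowgroups
print(rowgroups_scanned(ordered, 5_000, 5_999))    # 1
```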

Data types

Violating the fundamentals of database design can have far reaching effects, long after the original designers have moved on. Common mistakes are using MAX for VARCHAR/NVARCHAR columns that don’t need it, like FirstName/LastName/Address, etc., and using DATETIME when you don’t need the time tick values, like for a check date. You’re not likely to see the negative effects of this for a long time, but those who come after you will be left with headaches that are difficult to fix. Let’s say that you had a CheckDate column on a table with billions of rows, and the CheckDate column was part of the clustering key. All nonclustered indexes store the clustering key internally, so instead of storing 3 bytes for a CheckDate column based upon the DATE datatype, each nonclustered index will store an extra 5 bytes (total of 8 bytes) for the DATETIME datatype.
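Here’s a rough Python sketch that puts numbers on the CheckDate example. The row count and number of nonclustered indexes are hypothetical. It ignores compression, row overhead, and page fill, so it only approximates the difference for an uncompressed rowstore.

```python
DATETIME_BYTES = 8
DATE_BYTES = 3


def extra_key_bytes(rows: int, nonclustered_indexes: int) -> int:
    """Extra bytes from a DATETIME clustering key column instead of DATE.

    Counts one copy of the key in the clustered index plus one copy in each
    nonclustered index. Ignores compression, row overhead, and page fill.
    """
    per_copy = DATETIME_BYTES - DATE_BYTES  # 5 bytes
    return rows * per_copy * (1 + nonclustered_indexes)


extra = extra_key_bytes(2_000_000_000, 10)  # hypothetical table
print(extra)                        # 110000000000
print(round(extra / 1024 ** 3, 1))  # 102.4 (GiB)
```

Every extra copy of the key multiplies the waste, so data type choices matter more on clustering keys than on ordinary columns.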

Other solutions

If you want to optimize the size of your backups, what’s been discussed so far can help. But eventually, you’ll probably hit some type of time and/or size constraint when doing backups, even if using compression. One solution to this issue is to use some form of partitioning, be it partitioned tables and/or partitioned views.

With partitioned tables, you can mark filegroups as read-only, back them up once, and from that point on do only full and differential filegroup backups. Even consistency checks can be run for specific filegroups (with DBCC CHECKFILEGROUP). But be forewarned: table partitioning was introduced in SQL 2005, and there hasn’t been a lot of investment in this feature in recent years. Partitioned views solve a lot of the problems that exist with partitioned tables, but they have their own gotchas, such as not being able to insert through a partitioned view if any of the base tables have columns that use the IDENTITY property.

As is often the case, choosing the best solution includes balancing requirements with feature limitations.

Farewell, Robert

Everything about using the past tense when referring to Robert L. Davis aka @SQLSoldier feels wrong, but it’s true – Robert left this world a few days ago. Many years ago, I read his awesome book on database mirroring, and for a long time, that was my only connection to him.

Then in early 2013, I went on SQL Cruise, and part of the follow up was to start blogging. The topic of my first post was how I got into SQL Server, and when someone tweeted about it, Robert responded:

That was the essence of Robert’s spirit: always encouraging others.

Later that year, I met Robert at the PASS Summit, and discovered that we both had an over-the-moon affection for dogs. The conversation was short; I mostly listened to him and Argenis Fernandez trade war stories about when they worked together, but was thrilled to have finally met both of them.

In the ensuing years, the interaction between Robert and me was probably typical of his interactions with others – I followed him on Twitter, and would ask questions on #sqlhelp that he responded to – our connection was “virtual”. I saw him speak at PASS at least once, and watched videos of him presenting on a variety of SQL topics. His experience was vast, and he had an unending thirst for knowledge.

Then one day, he followed me back, and I was sort of “walking on clouds”, as they say.

Fast forward a bit, and I received a message from Robert that he’d be moving to my home town (NYC), and he was asking about the SQL community here.

He explained to me that he intended to buy a house, and knowing that he used a cane, I asked him why he wouldn’t want to live in an apartment.

And so I began to realize that Robert was a very private person, and although he obviously had a burning desire to help people, something about him was not crazy about “the public”.

We had some more back and forth about his potential move, some of it related to mass transit:

And then one day, I was somewhat stunned to receive this note from Robert:

I began the process of interviewing for a position on Robert’s team, and had a lot of interaction with him along the way. When it came time for the face-to-face, I will admit that I was somewhat terrified at the thought of receiving a technical interview from Robert. At the interview, he and his manager came up almost completely empty-handed in the question department! Maybe they had a big lunch, maybe the stars had aligned, I don’t know, but I was struck by how uncomplicated it was. I wrote him afterwards, and apologized for missing a bunch of stuff.

I wish I had been able to get to know Robert really well, but alas, that was not to be. He and I shared the “really want to help others” philosophy of life, and when I saw him at the interview, I could see he was struggling with his personal health. I came really close to saying something. I thought to myself, here’s a person who can help anyone else, but has difficulty helping himself. I considered contacting others who knew Robert better than I did, to try and talk to him. I’m not sure if the outcome would have been any different, but there’s a part of me that deeply regrets not trying.

But there’s risk in doing that, and I suppose I valued my connection with Robert more than taking a chance that I’d offend him, and have it affect our relationship.

Here’s to Robert L. Davis – our @SQLSoldier – a person who truly defined the gold standard of what it means to be a community contributor. He loved dogs without abandon, and received a lot by giving of himself. Not a bad life, when you look at it that way.

I am extremely grateful that our paths crossed.

PASS Summit 2017

What a difference a tweet makes

I’ve been involved with SQL Server longer than most folks, but was always a sort of lone wolf. In the early days, there was no community to speak of, and even when the community started to gain momentum, I shied away from it.

That changed after the 2015 Summit, when I made a conscious decision to become involved in the SQL Server community. I spoke at my first user group, began applying to SQL Saturdays, and last year, I submitted abstracts to speak at the 2017 Summit. I was not at all surprised that my abstracts were not selected, but after reviewing the schedule, I was a bit disappointed to see my favorite feature (In-Memory OLTP) somewhat under-represented.

So I took to Twitter, and wrote this:

And so it began, my roundabout path to presenting at PASS Summit 2017.

Niko Neugebauer – who I had been in touch with for a while, but had never actually met in person – replied to my tweet, and a dialog ensued.

After a bit of back and forth with folks at PASS, a panel was formed to specifically discuss In-Memory OLTP, and I would be a member of that panel. Included was of course Niko himself, who was a fantastic MC, as well as Bob Ward, Kevin Farlee, Tejas Shah, and Jos de Bruijn. Sunil Agarwal was also supposed to join us, but due to scheduling issues, couldn’t make it. Many of the Microsoft panel members were responsible for actually delivering the In-Memory OLTP feature, and to say that I was honored to be on a panel with them would be a great understatement. It was really thrilling, and definitely the highlight of my presenting life!

Presenting takes you deep

Think you know a topic well? Presenting will prove that you don’t! The simple act of organizing your thoughts, such that they can be imparted to others, forces you to drill down into a topic in a way that you would never otherwise get to. All of the facts about SQL Server are written in the documentation, but delivering those facts to a room full of people requires a variety of skills: creating slides, demos, scripts, and anticipating questions.

It’s “Live”

My life as a jazz musician before I got into SQL Server included a lot of public performance, so I was pretty comfortable being out in front of a room full of people. There’s some common ground between jazz and presenting – they are both “live”, and anything can happen at any time. Projectors fail, other issues arise, and you have to find a way forward no matter what.

You

I encourage all readers of this post to submit abstracts for the 2018 PASS Summit – my experience is proof that you never know what will happen!

In-Memory OLTP Resources, Part 4: OOM, the most feared acronym in all of In-Memory OLTP

Earlier parts of this series can be found here:

Part 1: The Foundation

Part 2: Checkpoint File Pairs

Part 3: OOS (Out of Storage)

This post will cover memory requirements and usage, and what happens if you actually reach OOM, also known as “Out Of Memory”, a condition that strikes fear in the hearts of DBAs supporting memory-optimized databases. We’ll also cover CPU-bound conditions.

How memory is allocated to the In-Memory OLTP engine

At a high level, the memory that’s allocated to the In-Memory OLTP engine comes from the SQL Server ‘max memory’ setting, as does everything else within SQL Server. But beneath that level, we need to be aware of memory pools.

image

The pool that can be used for allocating memory to the In-Memory OLTP engine depends on which edition you are running:

  1. If you are running Enterprise Edition, you can use Resource Governor to configure a Resource Pool. Memory-optimized databases can be bound to separate pools, or multiple databases can be bound to a single pool. If you don’t bind a memory-optimized database to a pool created with Resource Governor, then all memory allocations for In-Memory OLTP for that database come from the Default pool.
  2. If you are NOT running Enterprise Edition, all memory for In-Memory OLTP is allocated from the Default pool.

If memory for In-Memory OLTP comes from the Default pool, deploying the feature can cause performance issues for on-disk workloads, because memory-optimized data and the buffer pool compete for the same memory.

The following image shows that as we add rows to memory-optimized tables – and put pressure on the buffer pool – the buffer pool responds by shrinking, and that can affect disk-based workloads. If we then delete rows from memory-optimized tables, the buffer pool can expand. But what if we don’t delete rows from memory-optimized tables? Then the buffer pool will stay in its reduced state (or shrink even more), and that can cause problems due to buffer churn (continually having to do physical I/Os to retrieve pages from storage, for disk-based workloads).

image

Astute readers will consider using Buffer Pool Extensions (BPE), which is available in Standard Edition. Yes, you could do that, but BPE retrieves a single 8K page at a time, and can actually make performance worse. And in case you’re wondering, no, it’s not possible to compress memory-optimized data that’s stored in memory. Think Windows will actually page out any of the memory allocated to In-Memory OLTP? That’s simply not possible.

Resource Governor

If you are running Enterprise Edition, then this problem gets solved by creating a resource pool. Now, to be clear, that doesn’t mean you can’t run out of memory for memory-optimized objects. It only means that your In-Memory workload can’t affect the on-disk workload, unless of course you configure the resource pool incorrectly. I’ve got a blog post on how to monitor resource pools here.

Let’s create a resource pool, with an artificially low upper bound, and insert rows until we hit the limit.
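The original script isn’t reproduced here, so what follows is only a minimal sketch of that setup. It assumes a database named OOM_DB, a hypothetical pool named Pool_IMOLTP, and a hypothetical memory-optimized table dbo.InMemTable with a CHAR(8000) column named Filler. Binding through sys.sp_xtp_bind_db_resource_pool only takes effect after the database is cycled offline and back online.

```sql
USE master;
GO
-- Hypothetical pool with an artificially low ceiling (Resource Governor requires Enterprise Edition).
CREATE RESOURCE POOL Pool_IMOLTP WITH (MAX_MEMORY_PERCENT = 1);
ALTER RESOURCE GOVERNOR RECONFIGURE;   -- also enables Resource Governor if it was disabled
GO

-- Bind the (assumed) memory-optimized database OOM_DB to the pool.
EXEC sys.sp_xtp_bind_db_resource_pool
    @database_name = N'OOM_DB',
    @pool_name     = N'Pool_IMOLTP';
GO

-- The binding takes effect only after the database is cycled offline/online.
ALTER DATABASE OOM_DB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE OOM_DB SET ONLINE;
GO

USE OOM_DB;
GO
-- Insert one wide row at a time until the pool runs out of memory (error 41805 expected).
SET NOCOUNT ON;
DECLARE @rows INT = 0;
BEGIN TRY
    WHILE 1 = 1
    BEGIN
        INSERT dbo.InMemTable (Filler) VALUES (REPLICATE('A', 8000));
        SET @rows += 1;
    END;
END TRY
BEGIN CATCH
    SELECT @rows          AS rows_inserted,
           ERROR_NUMBER()  AS error_number,
           ERROR_MESSAGE() AS error_message;
END CATCH;
```

The number of rows you get before the error will depend on the server’s memory and the row width, so don’t expect to match the figure below exactly.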

On my server, I was able to INSERT 305 rows before the pool ran out of memory and I received error 41805:

image

Causes of OOM

What can cause a memory-optimized database to run out of memory? It could be that resource consumption (memory) exceeded:

  • the relevant percentage of committed_target_kb from the sys.dm_os_sys_info DMV (explained in a moment)
  • MAX_MEMORY_PERCENT value of a Resource Pool that the database is bound to (if running Enterprise Edition and using Resource Governor)

or:

  • garbage collection is not operational (the purpose of GC is to reclaim memory consumed by stale row versions)
  • updates to memory-optimized table variables caused row versions to be created, and because GC does not operate on table variables, you ran out of memory (for table variables that have a very large number of rows)

The only thing that can prevent GC from working is a long-running transaction.
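To look for the kind of long-running transaction that can stall GC, one option is to list the oldest open transactions on the instance. This sketch uses the standard sys.dm_tran_active_transactions and sys.dm_tran_session_transactions DMVs. The choice of TOP (10) is arbitrary, and sys.dm_db_xtp_transactions can narrow the view to In-Memory OLTP transactions if needed.

```sql
-- Oldest open user transactions on the instance; a long-running one can keep GC
-- from reclaiming stale row versions.
SELECT TOP (10)
    st.session_id,
    at.transaction_id,
    at.name                                                    AS transaction_name,
    at.transaction_begin_time,
    DATEDIFF(SECOND, at.transaction_begin_time, SYSDATETIME()) AS open_seconds
FROM sys.dm_tran_active_transactions  AS at
JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = at.transaction_id
ORDER BY at.transaction_begin_time ASC;
```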

committed_target_kb

We are supposed to base our estimate of how much memory is available for our memory-optimized databases on committed_target_kb from the sys.dm_os_sys_info DMV. Memory available for In-Memory OLTP is expressed as a percentage of committed_target_kb, based on total system memory, which is detailed here. Prior to SQL 2016/SP1, the In-Memory OLTP feature was only supported on Enterprise Edition, and the amount of memory allocated to SQL Server was limited to what the operating system supported.
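Here is a sketch of turning committed_target_kb into a rough estimate. The percentage tiers in the CASE expression (70% up to 8 GB, 75% up to 16 GB, 80% up to 32 GB, 85% up to 96 GB, 90% above that) are my understanding of Microsoft’s published guidance for memory available to memory-optimized tables and indexes. Treat them as an assumption and verify them against the current documentation before relying on them.

```sql
-- Rough estimate of memory available to memory-optimized objects.
-- The percentage tiers are assumed from Microsoft's guidance; verify before use.
SELECT si.committed_target_kb,
       si.committed_kb,
       p.pct_for_xtp,
       si.committed_target_kb * p.pct_for_xtp / 100 AS est_xtp_kb
FROM sys.dm_os_sys_info AS si
CROSS APPLY
(
    SELECT CASE
               WHEN si.committed_target_kb <= 8  * 1024 * 1024 THEN 70   -- <= 8 GB
               WHEN si.committed_target_kb <= 16 * 1024 * 1024 THEN 75   -- <= 16 GB
               WHEN si.committed_target_kb <= 32 * 1024 * 1024 THEN 80   -- <= 32 GB
               WHEN si.committed_target_kb <= 96 * 1024 * 1024 THEN 85   -- <= 96 GB
               ELSE 90
           END AS pct_for_xtp
) AS p;
```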

But in a post-SQL 2016/SP1 world, things are different, because the In-Memory OLTP feature is now supported on non-enterprise editions. This means that people will start deploying In-Memory OLTP on servers with a lot less memory than is possible with Enterprise, and therein lies a potential issue.

The problem is that committed_target_kb is a moving target. 

From the documentation:

Applies to: SQL Server 2012 through SQL Server 2017.
Represents the amount of memory, in kilobytes (KB), that can be consumed by SQL Server memory manager. The target amount is calculated using a variety of inputs like:
– the current state of the system including its load
– the memory requested by current processes
– the amount of memory installed on the computer
– configuration parameters
If committed_target_kb is larger than committed_kb, the memory manager will try to obtain additional memory. If committed_target_kb is smaller than committed_kb, the memory manager will try to shrink the amount of memory committed. The committed_target_kb always includes stolen and reserved memory.

Those parts about “the current state of the system including its load” and “the memory requested by current processes” concern me. If there is x amount of memory available on a server, and you check the value of committed_target_kb when the server is “at rest”, then under load there might in fact be much less memory available. I believe this is one of the main causes of OOM for memory-optimized workloads, especially when people do a POC on under-provisioned machines (like laptops).
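Because committed_target_kb moves, a single reading taken “at rest” can be misleading. One simple approach is to sample it while a representative workload runs and compare the extremes. The one-minute window and five-second interval here are arbitrary choices.

```sql
-- Sample committed_target_kb and committed_kb over time, so values at rest and
-- under load can be compared.
CREATE TABLE #mem_samples
(
    sample_time         DATETIME2(0) NOT NULL,
    committed_target_kb BIGINT       NOT NULL,
    committed_kb        BIGINT       NOT NULL
);

DECLARE @i INT = 0;
WHILE @i < 12                       -- 12 samples x 5 seconds = about 1 minute
BEGIN
    INSERT #mem_samples (sample_time, committed_target_kb, committed_kb)
    SELECT SYSDATETIME(), committed_target_kb, committed_kb
    FROM sys.dm_os_sys_info;

    SET @i += 1;
    WAITFOR DELAY '00:00:05';
END;

SELECT MIN(committed_target_kb) AS min_target_kb,
       MAX(committed_target_kb) AS max_target_kb,
       MAX(committed_kb)        AS max_committed_kb
FROM #mem_samples;

DROP TABLE #mem_samples;
```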

Database restore and recovery

The process of recovering a database is different for databases with durable memory-optimized data.

Step 1: the backup file is read, and the various types of files are created. For example, all MDF/NDF/LDF files and data and delta files are created.

Step 2: data is copied from the backup into the files created in Step 1. If you restore a database WITH NORECOVERY, you have completed both Step 1 and Step 2.

Step 3: for databases with durable memory-optimized data, there is one additional step, and that’s to stream data from the Checkpoint File Pairs (data/delta files) back into memory.

It should be noted that if the backup contains both on-disk and memory-optimized tables, none of the on-disk data is available until all of the memory-optimized data has finished streaming. When restoring a backup – whether the database has memory-optimized data or not – the process short-circuits if there isn’t enough free space to create the files in Step 1. Unfortunately, no such validation of available memory is done for Step 3. That means you can spend a long time creating files on disk, then spend an additional lengthy amount of time streaming data to memory, only to find that you don’t have enough memory. If you think Microsoft should change this, please upvote my Connect item.

When data is streamed into memory, the wait type will be WAIT_XTP_RECOVERY.
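To check whether restreaming is in progress, you can filter for that wait type. This sketch uses sys.dm_exec_requests, where recovery work may appear under a system session, and the cumulative totals in sys.dm_os_wait_stats.

```sql
-- Requests currently waiting on In-Memory OLTP recovery (data streaming into memory).
SELECT r.session_id,
       DB_NAME(r.database_id) AS database_name,
       r.command,
       r.wait_type,
       r.wait_time            AS wait_time_ms,
       r.percent_complete
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = N'WAIT_XTP_RECOVERY';

-- Cumulative totals since the last restart (or since wait stats were cleared).
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WAIT_XTP_RECOVERY';
```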

The unwary DBA would logically think that the only time you can see WAIT_XTP_RECOVERY is when actually restoring a database with memory-optimized data, but unfortunately that’s not correct. The Microsoft documentation doesn’t list all of the possible “recovery events” that can cause restreaming, but through my own testing, I’ve come up with the following list:

setting a database:

  • OFFLINE
  • READ_ONLY when it was READ_WRITE
  • READ_WRITE when it was READ_ONLY

Setting Read Committed Snapshot Isolation ON or OFF also causes restreaming.
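For reference, these are ordinary ALTER DATABASE statements that, according to the testing described above, trigger restreaming. The sketch assumes a test database named OOM_DB. Each statement forces the memory-optimized data to be streamed back into memory, so don’t run them casually against production.

```sql
USE master;
GO
-- Each statement below, per the testing above, triggers restreaming (test database only).
ALTER DATABASE OOM_DB SET READ_ONLY  WITH ROLLBACK IMMEDIATE;   -- READ_WRITE -> READ_ONLY
ALTER DATABASE OOM_DB SET READ_WRITE WITH ROLLBACK IMMEDIATE;   -- READ_ONLY  -> READ_WRITE
ALTER DATABASE OOM_DB SET READ_COMMITTED_SNAPSHOT ON  WITH ROLLBACK IMMEDIATE;
ALTER DATABASE OOM_DB SET READ_COMMITTED_SNAPSHOT OFF WITH ROLLBACK IMMEDIATE;
GO
```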

Additionally, the speed of restreaming is directly influenced by the number of volumes that you have created containers on, and the IOPS available from those volumes.
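To see how containers are spread across volumes, you can list the files in the memory-optimized filegroup (sys.filegroups type 'FX'). In this sketch, LEFT(physical_name, 3) is only a rough proxy for the volume. It will not distinguish mount points.

```sql
-- Memory-optimized containers in the current database, with a rough drive indicator.
SELECT fg.name                   AS filegroup_name,
       df.name                   AS container_name,
       df.physical_name,
       LEFT(df.physical_name, 3) AS drive          -- ignores mount points
FROM sys.database_files AS df
JOIN sys.filegroups     AS fg
    ON fg.data_space_id = df.data_space_id
WHERE fg.type = 'FX'                                -- MEMORY_OPTIMIZED_DATA_FILEGROUP
ORDER BY drive, container_name;
```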

Potential solutions to OOM

  1. Open a DAC (Dedicated Admin Connection). Then delete rows, and/or move data from memory to disk.
  2. Increase system memory
  3. If Garbage Collection for row versions is not operational (due to long running transactions), clear up those long-running transactions so that GC can proceed

If you attempt to move data from memory-optimized tables to disk-based tables, e.g. using SELECT INTO, please note that it’s possible to create a schema for memory-optimized tables that you can’t simply migrate to disk.

For example, the following CREATE TABLE is perfectly legal for memory-optimized tables, but will fail for disk-based tables (and also fails if using SELECT * INTO on-disktable FROM in-memtable):

The ability to create tables like this is detailed at this link, with the relevant section being:

“…you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition.”
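Before you need to move data in a hurry, it may help to know which memory-optimized tables have declared column widths that add up to more than 8060 bytes. This is a rough check under simple assumptions. It sums sys.columns.max_length, ignores per-row overhead, and counts MAX-typed (LOB) columns (max_length = -1) separately instead of adding them to the total.

```sql
-- Memory-optimized tables whose declared (non-LOB) column widths exceed 8060 bytes.
SELECT t.name                                                        AS table_name,
       SUM(CASE WHEN c.max_length > 0 THEN c.max_length ELSE 0 END) AS declared_bytes,
       SUM(CASE WHEN c.max_length = -1 THEN 1 ELSE 0 END)           AS lob_columns
FROM sys.tables  AS t
JOIN sys.columns AS c
    ON c.object_id = t.object_id
WHERE t.is_memory_optimized = 1
GROUP BY t.name
HAVING SUM(CASE WHEN c.max_length > 0 THEN c.max_length ELSE 0 END) > 8060
ORDER BY declared_bytes DESC;
```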

What happens if you hit OOM

So how does hitting OOM affect workloads for memory-optimized databases?

SELECT still works, and also DELETE and DROP, but of course INSERT and UPDATE will fail.

CPU bound

Last but not least, I wanted to touch on potential CPU issues for memory-optimized databases. Database recovery can be CPU bound under the following circumstances:

  • many indexes on large memory-optimized tables (2014, 2016)
  • too many LOB columns (2016+)
  • incorrect bucket count set for HASH indexes (2014, 2016, 2017)

The first item in this list, “many indexes on large memory-optimized tables (2014, 2016)” has supposedly been addressed in SQL 2017.

LOB columns are actually stored as separate memory-optimized tables, and as noted by Dmitri Korotkevitch (blog) in this post, can impact performance.

The “incorrect bucket count for HASH indexes” issue persists to this day. If the bucket count is too low, many distinct key values hash to the same bucket, which increases chain length and has a terrible effect not only on performance in general, but on database recovery in particular.
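One way to spot an undersized bucket count is sys.dm_db_xtp_hash_index_stats. A low empty_bucket_count together with a high avg_chain_length suggests that too many keys share buckets. This DMV scans the index, so it can take a while on large tables.

```sql
-- Bucket usage and chain lengths for HASH indexes in the current database.
SELECT OBJECT_NAME(hs.object_id) AS table_name,
       i.name                    AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON  i.object_id = hs.object_id
    AND i.index_id  = hs.index_id
ORDER BY hs.avg_chain_length DESC;
```

On SQL 2016 and later, a bucket count can be changed with ALTER TABLE … ALTER INDEX … REBUILD WITH (BUCKET_COUNT = …). Choose the new value from your expected number of distinct keys rather than from this output alone.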

Wrapping up

Hopefully this mini-series about resource consumption for memory-optimized workloads has given you a clear understanding of why Microsoft recommends the following:

  • 2x data set in memory for starting memory allocation (only for In-Memory, does not include memory for on-disk workload)
  • 3x workload IOPS from disks where containers are stored (handles operational workload plus read/write File Merge workload)
  • 4x durable memory-optimized data size for initial storage footprint
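Applied to an existing database, the 2x and 4x guidelines can be roughed out from sys.dm_db_xtp_table_memory_stats, as in the sketch below. It uses memory_used_by_table_kb as the “data set” size and treats durability = 0 (SCHEMA_AND_DATA) as durable. It excludes index memory, which is not persisted to checkpoint files. The 3x IOPS guideline can’t be derived from a DMV and has to come from measuring your workload.

```sql
-- Rough application of the 2x memory / 4x storage guidance to current usage.
WITH xtp_usage AS
(
    SELECT
        data_kb    = SUM(ms.memory_used_by_table_kb),
        durable_kb = SUM(CASE WHEN t.durability = 0          -- SCHEMA_AND_DATA
                              THEN ms.memory_used_by_table_kb
                              ELSE 0 END)
    FROM sys.dm_db_xtp_table_memory_stats AS ms
    JOIN sys.tables AS t
        ON t.object_id = ms.object_id
    WHERE t.is_memory_optimized = 1
)
SELECT data_kb,
       durable_kb,
       2 * data_kb    AS starting_memory_kb,   -- 2x data set in memory
       4 * durable_kb AS initial_storage_kb    -- 4x durable data for storage footprint
FROM xtp_usage;
```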

These are rough guides, but should be observed at first, and then tuned as required.

This concludes the series on resource issues for In-Memory OLTP.

In-Memory OLTP Resources, Part 3: OOS (Out of Storage)

Zero free space

This is a continuation of Part 1 and Part 2 of this blog post series, related to resource issues/requirements for memory-optimized databases.

In this post, we’ll simulate what happens to a memory-optimized database when all volumes run out of free space.

In my lab, I’m running Windows Server 2012. Let’s use PowerShell to install the File Server Resource Manager, which will allow us to create a quota for the relevant folder:

Add-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools

After installing the Windows feature we can set the quota for the folder, but we shouldn’t enable it just yet, because first we have to verify the current size of the folder.

On my server, I created a quota of 1.5GB, and then enabled it.

Now let’s INSERT rows into the table, in batches of 1000, until we reach the limit (the INSERT script is listed in Part 2; I’m trying to keep this post from getting too long).
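Since the Part 2 script isn’t repeated here, this is only a minimal sketch of a batched loop. It assumes a hypothetical memory-optimized table dbo.InMemTable with a CHAR(8000) column named Filler, and it uses sys.all_columns simply as a convenient source of at least 1000 rows. Each INSERT runs as its own autocommit statement, which avoids the restrictions on READ COMMITTED in explicit transactions that touch memory-optimized tables.

```sql
-- Insert in batches of 1000 until the container volumes fill up (error 41822 expected).
SET NOCOUNT ON;
DECLARE @batches INT = 0;

BEGIN TRY
    WHILE 1 = 1
    BEGIN
        INSERT dbo.InMemTable (Filler)
        SELECT TOP (1000) REPLICATE('A', 8000)
        FROM sys.all_columns;             -- any row source with >= 1000 rows
        SET @batches += 1;
    END;
END TRY
BEGIN CATCH
    SELECT @batches        AS batches_completed,
           ERROR_NUMBER()  AS error_number,
           ERROR_MESSAGE() AS error_message;
END CATCH;
```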

Once the quota has been reached, we receive the dreaded 41822 error – this is what you’ll see when all of the volumes where your containers reside run out of free space (if even one of the volumes has free space, your workload can still execute).

image

Just out of curiosity, we’ll verify how many rows actually got inserted. On my server, I’ve got 4,639 rows in that table, and the folder consumes 1.44GB. So theoretically, there was enough space on the drive to create more checkpoint files, but it seems as though the engine won’t just create what it can to fit in the available space. It’s more likely that the engine attempts to precreate a set of files, and it either succeeds or fails all at once, but I’ve not confirmed that.

I disabled the quota, executed a manual CHECKPOINT, and ran the diagnostic queries again:

image

File Merge

Data files persist rows that reside in durable memory-optimized tables, and delta files store references to logically deleted rows. As more and more rows become logically deleted across different sets of CFPs, two things happen:

  1. the storage footprint increases (imagine that all data files have 50% of their rows logically deleted)
  2. query performance gets worse, because result sets must be filtered by entries in the delta files, which are increasing in size

Microsoft killed both of these birds with one stone: File Merge (aka Garbage Collection for data/delta files).

In the background, while your workload is running, the File Merge process attempts to combine adjacent sets of CFPs, and this is where we get to one of the file states that we didn’t cover in Part 1: MERGE TARGET.

A file that has the fileState of MERGE TARGET belongs to the new set of combined data/delta files produced by the File Merge process. Once the merge has completed, the MERGE TARGET transitions to ACTIVE, and as we stated earlier in this series, ACTIVE files can no longer be populated.

But what about the source files that the MERGE TARGET is derived from? After a CHECKPOINT, these files transition to WAITING FOR LOG TRUNCATION, and can be removed once the log has been truncated. It should be noted that it can take several checkpoints and transaction log backups for CFPs to transition to a state where they can actually be removed. That’s why Microsoft recommends 4x durable memory-optimized data size for the initial storage footprint.

In the images that follow, we can see that the formerly distinct transaction ranges of 101 to 200, and 201 to 300, have been combined into a single CFP, which has the range of 101 to 300.
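To see the same merge from T-SQL, one option is to list the transaction ranges covered by each data file. This sketch assumes the lower_bound_tsn and upper_bound_tsn columns of sys.dm_db_xtp_checkpoint_files and must be run in the memory-optimized database. After a merge, two adjacent ranges should be replaced by a single file whose range spans both.

```sql
-- Transaction ranges covered by DATA files, in order; merged ranges show up as
-- one file spanning what used to be two adjacent ranges.
SELECT file_type_desc,
       state_desc,
       lower_bound_tsn,
       upper_bound_tsn,
       file_size_in_bytes,
       file_size_used_in_bytes
FROM sys.dm_db_xtp_checkpoint_files
WHERE file_type_desc = N'DATA'
ORDER BY lower_bound_tsn;
```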

image

image

image

Effect on backup size

File Merge – and the requisite file state changes that CFPs must go through – explain why backups for memory-optimized databases can be considerably larger than the amount of data stored in memory. Until CFPs go through the required state changes, they must be included in backups.

IOPS

The File Merge process requires both storage and IOPS, as it reads from both sets of CFPs, and writes to a new set. Let’s say your workload requires 500 IOPS to perform well. We’ve just added another 1,000 IOPS as a requirement for your workload to maintain the same level of performance: 500 IOPS each for the read and write components of File Merge. That’s why Microsoft recommends 3x workload IOPS for your memory-optimized storage.

Potential remedies, real and imagined

What happens to your memory-optimized database when all volumes run out of free space?

In my testing of inserts that breached the quota for the folder, I saw no effect on database status. However, if I created the database, set the quota to a much lower value, and then created a memory-optimized table, the database status became SUSPECT. In a real-world situation, with hundreds of gigabytes or more of memory-optimized data, the last thing you want to do is a database restore in order to return your database to a usable state.

I was able to set the database OFFLINE, and then ONLINE, and that cleared the SUSPECT status. But keep in mind that setting the database OFFLINE/ONLINE will restream all of your data, so database recovery will be delayed accordingly.
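The OFFLINE/ONLINE cycle is plain ALTER DATABASE, followed by a check of the resulting state. This sketch assumes the database is named OOM_DB. Expect WAIT_XTP_RECOVERY while the memory-optimized data streams back in.

```sql
USE master;
GO
-- Cycle the database; this restreams all durable memory-optimized data.
ALTER DATABASE OOM_DB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE OOM_DB SET ONLINE;
GO

-- Confirm the state afterwards (ONLINE, RECOVERING, SUSPECT, ...).
SELECT name, state_desc
FROM sys.databases
WHERE name = N'OOM_DB';
```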

What can you do if your volumes run out of free space?

Well, in SQL 2014, your database went into “SUSPENDED” mode (not suspect) and was offline, until perhaps you added more space and restarted the database (not sure, I didn’t test that). In SQL 2016+, the database goes into what’s known as “delete-only mode”, where you can still SELECT data, but modifying data is limited to deleting rows and/or dropping indexes/tables. Of course, SELECT, DELETE, and DROP do nothing to solve your problem: you need more free space.

When a database transitions to delete-only mode, that fact is written to the SQL errorlog:

[WARNING] Database ID: [9]. Checkpoint hit an error code 0x8300000a. Database is now in DeleteOnlyMode
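To find that message without scrolling through the errorlog, one common approach is xp_readerrorlog. Be aware that this extended procedure is undocumented. The argument meanings used here (log number 0 = current, log type 1 = SQL Server, then a search string) reflect its widely used behavior and not an official contract.

```sql
-- Search the current SQL Server errorlog for the delete-only-mode message.
-- xp_readerrorlog is undocumented: arguments are log number (0 = current),
-- log type (1 = SQL Server, 2 = Agent), and a search string.
EXEC sys.xp_readerrorlog 0, 1, N'DeleteOnlyMode';
```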

You might think that you can issue CHECKPOINT manually, and do transaction log backups, hoping that File Merge will kick in. Or you could manually execute File Merge, with this uber-long thing:

EXEC sys.sp_xtp_checkpoint_force_garbage_collection <dbname>

But keep in mind that if there was no additional free space on the volumes to precreate CFPs, then it’s not likely that there will be enough free space to write a new set of CFPs for DBA-initiated File Merge.

The only thing you can do to remedy this situation is to either free up some space on the existing volumes, or create a new container on a new volume that has free space.
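Adding a container uses ALTER DATABASE … ADD FILE against the memory-optimized filegroup. In this sketch, the database name OOM_DB, the filegroup name OOM_DB_mod, and the path F:\IMOLTP\OOM_DB_container3 are all hypothetical. Look up your real filegroup name in sys.filegroups (type 'FX'). The container path is a folder that SQL Server creates, so its parent folder must already exist.

```sql
USE master;
GO
-- Add a container on a volume that still has free space (names and path are hypothetical).
ALTER DATABASE OOM_DB
ADD FILE
(
    NAME     = N'OOM_DB_container3',
    FILENAME = N'F:\IMOLTP\OOM_DB_container3'
)
TO FILEGROUP OOM_DB_mod;
GO
```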

In Part 4, we’ll discuss memory in the same ways we’ve discussed storage – how it’s allocated, and what happens to your memory-optimized workload when you run out of it.

In-Memory OLTP Resources, Part 2: Checkpoint File Pairs

In Part 1, we created a memory-optimized database, and explained the different states that CFPs can have.

In this post, we’ll take note of the changes to free space on the volumes that host our containers, before/after creating a memory-optimized table.

To begin with, we can see that the folder currently consumes 100MB of storage, because that’s how much we allocated for the MDF file when we created our database.

image

image

I’ve written a script that, among other things, displays summary and detail information about memory-optimized databases, which can be found here. After changing the script to only give us details about the OOM_DB database, the relevant sections are listed below. In the somewhat voluminous output from the script, you can scroll down to the section entitled “Database layout”.

We should note the following:

  1. First, the general database layout is reported, which includes all mdf and ldf files, as well as listing all containers. If you are not familiar with what a container is, please go back to Part 1.
  2. The second section displays details about each container, i.e. how large it is, and how many files reside there.
  3. Next we drill further down, and summarize at the fileType and fileState level, regardless of which container the files belong to.
  4. And finally, for each container, we detail the number of files, and aggregate the amount of storage consumed, per fileType and fileState.

This information is extremely valuable when assessing the storage state of a memory-optimized database.
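The full script isn’t reproduced here, but the core of the fileType/fileState summary can be sketched with a single query against sys.dm_db_xtp_checkpoint_files, run in the memory-optimized database. Adding container_id to the GROUP BY gives the per-container breakdown.

```sql
-- Checkpoint files summarized by type and state for the current database.
SELECT file_type_desc,
       state_desc,
       COUNT(*)                                             AS file_count,
       SUM(file_size_in_bytes) / 1048576.0                  AS size_mb,
       SUM(ISNULL(file_size_used_in_bytes, 0)) / 1048576.0  AS used_mb
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY file_type_desc, state_desc
ORDER BY file_type_desc, state_desc;
```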

image

The containers consume 584MB and 568MB respectively (but after running these tests several times, it seems that the numbers fluctuate slightly), and all of the files in each container are “PRECREATED”. As we mentioned in Part 1, as a performance optimization, the In-Memory engine precreates files, and this has some interesting implications, which we’ll see later on.

The image above is what you’ll see when you have created a memory-optimized database, and created at least one memory-optimized table. But as I said earlier, if you’ve only created a memory-optimized database, the containers will be empty.

Let’s create our table (which is specifically designed to consume a lot of memory for each row):
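The original table definition isn’t shown, so the following is a hypothetical stand-in with the same intent: each row carries a fixed-width CHAR(8000) column, which keeps the row under the 8060-byte in-row limit but makes every row expensive. The names dbo.InMemTable and Filler are illustrative only.

```sql
USE OOM_DB;
GO
-- Hypothetical wide-row, durable memory-optimized table (about 8 KB per row).
CREATE TABLE dbo.InMemTable
(
    RowID  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
    Filler CHAR(8000)         NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
```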

I ran the diagnostic script again, and now you can see that all of the data and delta files are PRECREATED, because none of them have been populated yet.

image

Let’s INSERT 10 rows, and run the diagnostic script again.

GO 10
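The INSERT that the GO 10 line above repeats isn’t shown in the post. Here is a minimal sketch, assuming a hypothetical table dbo.InMemTable with a CHAR(8000) column named Filler. Note that GO with a count is an SSMS/sqlcmd batch-repeat feature, not a T-SQL statement.

```sql
-- One-row insert; GO 10 makes SSMS/sqlcmd run this batch ten times.
INSERT dbo.InMemTable (Filler)
VALUES (REPLICATE('A', 8000));
GO 10
```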

After adding 10 rows, we have:

image

It’s clear that before we inserted any data, we had 20 files that were in the PRECREATED state. After inserting 10 rows, we now have 18 PRECREATED files, and 2 UNDER CONSTRUCTION files, which means the In-Memory engine is populating these files; they are “open” in terms of their CHECKPOINT status. If you don’t understand what these terms mean, please read Part 1.

But there’s one thing that doesn’t look right here: we’ve inserted data into the table, but sizeBytesUsed is still zero for the UNDER CONSTRUCTION files. Why is that?

The Microsoft documentation explains it:

“file_size_used_in_bytes: for checkpoint file pairs that are still being populated, this column will be updated after the next checkpoint.”

After executing a manual CHECKPOINT, the following images show the before/after state of our memory-optimized database. We can see the difference: we now have values in the sizeBytesUsed column for the UNDER CONSTRUCTION rows.
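To reproduce that step without the full diagnostic script, you can issue the CHECKPOINT and then look only at the files that were being populated. This is a minimal sketch, run in the memory-optimized database.

```sql
-- Close the current checkpoint, then inspect file sizes for open/closed files.
CHECKPOINT;
GO
SELECT container_id,
       file_type_desc,
       state_desc,
       file_size_in_bytes,
       file_size_used_in_bytes
FROM sys.dm_db_xtp_checkpoint_files
WHERE state_desc IN (N'UNDER CONSTRUCTION', N'ACTIVE')
ORDER BY state_desc, file_type_desc;
```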

Please note:

  • The ‘Container details by fileType and fileState’ had only PRECREATED and UNDER CONSTRUCTION data and delta files
  • All files that were UNDER CONSTRUCTION before executing the manual CHECKPOINT are now ACTIVE. This was discussed in Part 1 – when a CHECKPOINT occurs, the files that were being populated are now closed to further entries. This happens automatically, but sometimes you need to do it manually (more on that in a future post).
  • In the ‘Before’ image, data and delta files have two states, ACTIVE and PRECREATED.
  • In the ‘After’ image, data and delta files have three states: ACTIVE, PRECREATED, and UNDER CONSTRUCTION.
  • For the first time, we’re seeing the fileState of ‘WAITING FOR LOG TRUNCATION’ (which we’ll explain in Part 3)

Before

image

After

image

In Part 3, we’ll dive deeper into IOPS and free space requirements, and how to reset the database status when all volumes run out of free space.