Category Archives: DBA

Troubleshooting Natively Compiled Stored Procedures, Part 1

A subset of the tools available for troubleshooting interpreted stored procedures are available for troubleshooting natively compiled procedures.

The following table highlights the differences

Method Interpreted Natively compiled
Recompile specific statements Supported Not supported – but theoretically not required, due to the impossibility of parameter sniffing
Execute procedure with RECOMPILE Supported Not supported – but theoretically not required, due to the impossibility of parameter sniffing
Estimated/Actual plan Supported “Estimated Plan” makes no sense in the context of natively compiled stored procedures. The plan that will be executed is available from SHOWPLAN_XML or by clicking (“Estimated Plan” in SSMS, but it’s not “estimated”)
Remove plan from plan cache Supported Not supported – plans for natively compiled stored procedures are not stored in the plan cache.
DBCC FREEPROCCACHE Supported No effect, because plans for natively compiled stored procedures are not stored in the plan cache.
SET STATISTICS IO ON Supported Not supported/required, because there is no such thing as IO for memory-optimized tables.
SET STATISTICS TIME ON Supported Supported, but might not be 100% accurate, because execution times less than 1 millisecond are reported as 0 seconds. Total_worker_time may not be accurate if many executions take less than 1 millisecond.
SET FMTONLY Supported Not supported, but you can use sp_describe_first_result_set.
SHOWPLAN_XML Supported Supported
SHOWPLAN_ALL Supported Not supported
SHOWPLAN_TEXT Supported Not supported
Mismatched datatypes xEvents hekaton_slow_parameter_passing, with reason = parameter_conversion.
Named parameters, i.e.

EXEC dbo.Proc @Param1 = @Param1

Supported Supported, but not recommended, due to performance impact. You can track this type of execution with xEvents hekaton_slow_parameter_passing, with reason = named_parameters.

If any SET options are in effect, statistics are gathered at the procedure level and not at the statement level.

Note 1: Statement-level execution statistics can be gathered with xEvents by capturing the sp_statement_completed event. They can also be seen using Query Store (detailed in a future post).

Note 2: Due to the nature of working with memory-optimized tables in general, it’s likely that you will have to implement retry logic. Because of this, and also because of feature limitations within the natively compiled space, Microsoft suggest using an interpreted TSQL wrapper when calling natively compiled stored procedures.

The following query references sys.dm_exec_query_stats to get statistics for natively compiled procedures:

Note 3: The documentation for sys.dm_exec_query_stats states that the total_rows, min_rows, max_rows, and last_rows columns cannot be NULL, but NULL is still returned. A Connect item has been filed to have those columns return 0.

Parallelism

Parallelism is supported for memory-optimized tables for all index types. While that statement is true when using interpreted stored procedures that reference memory-optimized tables, unfortunately it’s not true when using natively compiled stored procedures.

Actual vs. Estimated

These terms have confused generations of SQL Server technologists.

For natively compiled procedures, enabling “Actual Plan” in SSMS does not return any plan information, but still executes the procedure. Enabling “Estimated Plan” in SSMS for natively compiled procedures is the same as setting SHOWPLAN_XML ON, but does not actually execute the stored procedure. The plan that will be executed is displayed.

Removing plans from the cache

For interpreted stored procedures, DBAs have the ability to remove an offending plan from the plan cache. This is not possible with natively compiled stored procedures, because the plan is not stored in the plan cache.

DBCC FREEPROCCACHE

If you execute DBCC FREEPROCCACHE and expect your natively compiled plans to magically disappear, you will no doubt be disappointed. That’s because DBCC FREEPROCCACHE has no effect on compiled modules, as they are not stored in the plan cache that’s used for interpreted TSQL. But executing DBCC FREEPROCCACHE will of course remove all existing plans for interpreted TSQL from the plan cache (so don’t do that…unless you’re really, really sure you want to recompile all of your interpreted procs).

Parameter sniffing

With interpreted stored procedures, parameter sniffing can severely impact performance. Parameter sniffing is not possible for natively compiled procedures, because all natively compiled procedures are executed with OPTIMIZE FOR UNKNOWN.

Statistics

SQL 2016 has the ability to automatically update statistics on memory-optimized tables if your database is has a compatibility level of at least 130. If you don’t want to depend on SQL Server to determine when stats should be updated, you can update statistics manually (and we no longer have to use FULLSCAN, as was the case in SQL 2014). Statistics for index key columns are created when an index is created.

Database upgrades and statistics

As mentioned earlier, if your database was set to compatibility level 120 (SQL 2014), and you want to take advantage of auto-update statistics, you must change the compatibility level to 130. But statistics still won’t be auto-updated unless you manually update them one last time.

Recompilation

When you create a natively compiled stored procedure, it gets compiled, and execution plans for the queries contained within the procedure are created. As the data changes, those execution plans will be based on older statistics, and might not perform at the highest possible level. Many people think that if you update statistics, natively compiled stored procedures will magically recompile. Unfortunately, this is not correct – natively compiled stored procedures are only recompiled under the following circumstances:

  • When you execute sp_recompile (this should be done after statistics are updated)
  • Database restart

Database restart includes at least the following events:

  • Database RESTORE
  • OFFLINE/ONLINE of database
  • Failover (FCI or Availability Group)
  • SQL Server service restart
  • Server boot

Unlike memory-optimized tables – which are all created, compiled, and placed into memory upon database restart – natively compiled stored procedures are recompiled when first executed. This reduces the amount of time required for database recovery, but affects the first-time execution of the procedure.

Plan operators

For traditional tables (harddrive-based), the number of pages expected to be returned by an operator has a significant impact on the cost, and therefore affects the plan. Since memory-optimized tables are not stored in pages, this type of calculation is irrelevant.

For memory-optimized tables, the engine keeps track of how many rows are in each table. This means that estimates for full table scans and index scans are always accurate (because they are always known). For memory-optimized tables, the most important factor for costing  is the number of rows that will be processed by a single operator. Older statistics might reference row counts that are no longer valid, and this can affect plan quality.

Nested execution

Prior to SQL 2014, it was not possible for one natively compiled stored procedure to call another natively compiled stored procedure. This restriction has been lifted in SQL 2016.

We will continue troubleshooting natively compiled stored procedures in a future post.

In-Memory OLTP: How fast can you write?

Typical bottlenecks for write-intensive workloads include:

  • the SQL Server engine (latching/locking)
  • continually (re)interpreted TSQL
  • write-ahead-logging (WAL)
  • logging of modifications to indexes
  • pessimistic concurrency

The design of In-Memory OLTP can address all of the items in the list above, but doing so can reveal new bottlenecks.

If you have a write-intensive In-Memory OLTP workload, and your data must be durable, you will likely find that WRITELOG is one of your top waits. That’s because regardless of database recovery setting, all DML changes to durable memory-optimized tables are fully logged.

A new type of storage (explained a bit later) allows for faster writes to the transaction log, but in order to grasp how it results in increased IO, we must fully understand the process of how changes are made to the transaction log.

Log Buffers

A database must have at least one data file and one transaction log file, but there is another critical component of transaction logging, known as the log buffer. This is an area of memory where log records are first written, before being flushed from memory to the actual log file.

When a change occurs to a table (on-disk table or durable memory-optimized table), log records are created in the log buffer. The contents of the log buffer are flushed to the log file when any of the following occurs:

  1. A transaction commits and that transaction has an active log record in the log buffer
  2. The 60K size limit for the log buffer is reached
  3. Data pages must be written to disk and there are log records in the log buffer related to those pages. Log records must be written to the log file before writing the data pages to the data files (this is known as Write Ahead Logging, and applies to on-disk tables, but not to memory-optimized tables).

(See “Observing SQL Server Transaction Log Flush Sizes using Extended Events and Process Monitor” at this link for more details)

In-Memory OLTP is capable of incredible performance, but it can be constrained by the requirement to navigate the IO stack for log writes. This means that you are not able to fully use all of the power in the cores you purchased, as proven in this Microsoft Ignite session (starting at 23:06).

The issues of core underutilization and the ability to improve writes to the transaction log can be addressed by a new type of storage known as NVDIMM-N, which is supported in Windows Server 2016. An NVDIMM-N is “persistent memory”, or “server class memory” (SCM), and has the following characteristics;

  1. One side is DD4 DRAM, and the other side is Flash storage
  2. Sits on the memory bus
  3. In the event of power failure, the contents of DRAM gets written to the Flash storage, via super cap, battery, or other means
  4. Extraordinary write speeds (see below)
  5. Fairly expensive
  6. Limited in capacity
  7. Are presented to the operating system as drives
  8. Formatted in two possible modes, explained below
  9. No need to cache data
  10. Allows full CPU usage

SQL Server can benefit from using persistent memory, because once the write to the log buffer occurs, data is guaranteed to persist – there is no need to flush the log buffer to disk. As a result, log writes are performed at the speed of memory, not the speed of IO, which would require a context switch and going through the kernel mode IO stack.

Block vs. DAX

As mentioned above, there are two different modes that can be used when formatting SCM storage:

  1. Block – sectors are written, no app changes required, but has overhead to convert to byte-addressable ranges, circumvents IO stack
  2. DAX – byte addressable, app changes required, circumvents IO stack, uses memory-mapped files

DAX is considerably faster than Block, because it allows your transaction log (or more accurately the log buffers associated with it) to be placed on byte-addressable storage.

How do we use SCM?

SCM uses DRAM slots, which has the effect of limiting max physical memory on the server. As mentioned before, it’s used for log buffers only, so only about 20MB per database is required (although DRAM size is specified in gigabytes). You can use a single NVDIMM-N to hold log buffers for multiple databases.

How fast is SCM?

Please see the Tobias Klima videos listed at the bottom of this post for benchmarking results.

Limitations

For those who trespass in the virtual world, Hyper-V cannot directly use persistent memory yet.

Server crash

If your server crashes, theoretically you can remove the persistent memory and place it in another server, as it’s just an NTFS formatted drive. The other server must also be running Windows Server 2016. S2D supports NVDIMMs as described here, and provide high availability for SCM, and are also to aggregate multiple NVDIMM-N for performance or resilience, depending on configuration.

Potential issues

Filter drivers might not work properly:

  • Caching
  • Encryption
  • Antivirus
  • Replication

These types of drivers can have issues when using SCM, because they work based on IOs, but there are no IOs when using SCM.

Benefits for In-Memory OLTP workloads

  • Eliminating WRITELOG waits
  • Allowing full usage of the expensive cores you purchased

More info

Tobias Klima (PM for MS Storage Group) has created videos about using SCM as Block storage here and byte-addressable storage here.

Glenn Berry wrote an excellent post on persistent memory here, where he describes using it as an alternative to deploying In-Memory OLTP. Hopefully this post has proven to you that deploying persistent memory has tangible benefits for In-Memory OLTP workloads as well.

In-Memory OLTP: The moving target that is RTO

Anyone who has worked with or studied In-Memory OLTP should know that upon database restart (which occurs for a variety of reasons), data for durable memory-optimized databases must be streamed from checkpoint file pairs that reside on disk (CFPs), into memory.

During recovery, data files are read in parallel, using as many threads as there are logical CPUs. If all CFPs reside on a single volume, the speed of recovery is limited by how much IO is available from just that single volume. That’s why RTO is affected by not only how many containers exist for a database, but how many volumes those containers are spread across.

That means that other than keeping the amount of data stored in durable memory-optimized tables to a minimum, the only option you have for reducing the amount of time it takes to stream data into memory, is to spread your containers across multiple volumes.

Let’s say you’ve got 5 containers, and your RTO is 15 minutes. You’ve tested recovery with your current data set, and the actual time required to restart your database is 5 minutes. Everything is ok, you’ve got RTO minutes to spare.

Over time, the amount of memory consumed by your durable memory-optimized tables – and the required storage footprint for CFPs – continue to grow. But you’re ok with all of that, because hey – memory is cheap, and so is storage, so you’ll just make sure that you have enough of both to continue doing business.

Now fast forward to some point in the future, and the size of your durable data set has quadrupled. You can’t get application owners to agree to migrate data from memory-optimized tables to disk, because of who-knows-what, or perhaps it’s because of some of the reasons mentioned here.

And now, due to the continued growth of your durable memory-optimized data, you can’t make your RTO. How will you solve this problem?

If you only create additional containers, the In-Memory OLTP engine will be able to use them for writing new data. But that won’t get you where you want to be, because the existing CFPs don’t get redistributed among all existing containers – the new containers get their own CFPs.

The moment you add the new containers, 100% of your existing data still resides in the 5 original containers. Even though you theoretically have enough containers to make your RTO (the original 5 containers plus 5 new ones), it’s still blown, because the pre-existing CFPs have not been redistributed across all containers (10 containers total).

The solution is to:

  • Create new containers on the pre-existing volumes
  • Create new containers on the new volumes
  • Drop the old containers

Dropping the old containers will cause the existing CFPs to be redistributed among all new containers (10).

Now we have a way to redistribute pre-existing CFPs across new containers/volumes so that you can make your RTO. And in case you were wondering, Microsoft says that redistribution of CFPs can be performed while your database is online (obviously you should test this entire process in a non-production environment to see if it works as promised).

How much durable data can you store in a SQL Server database? For SQL 2016, you are limited only by what the operating system can handle. With Windows Server 2012R2, the maximum memory is 4TB, but Windows Server 2016 supports 24TB of memory. That could mean you got a whole lotta CPF redistribution goin’ on…..

UPDATE [2017-05-03]

The Microsoft documentation seems to directly contradict what I’ve written in this blog post, but I can assure you that what I wrote is correct.

Here’s a script to prove it:

USE [master]
GO
 
DROP DATABASE IF EXISTS [InMemContainerTest]
 
CREATE DATABASE [InMemContainerTest] ON PRIMARY 
(
    NAME = N'InMemContainerTest'
   ,FILENAME = N'S:\SQLDATA\InMemContainerTest.mdf' 
   ,SIZE = 100MB 
   ,MAXSIZE = UNLIMITED
   ,FILEGROWTH =100MB 
)
LOG ON 
(
    NAME = N'InMemContainerTest_log'
   ,FILENAME = N'S:\SQLDATA\InMemContainerTest.ldf' 
   ,SIZE = 100MB
   ,MAXSIZE = 2048GB
   ,FILEGROWTH = 100MB
)
GO
 
ALTER DATABASE [InMemContainerTest]
ADD FILEGROUP InMemContainerTestFG
CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerTestFile'
   ,FILENAME = 'S:\SQLDATA\InMemContainerTestFile'
)
TO FILEGROUP InMemContainerTestFG;
GO
ALTER DATABASE [InMemContainerTest] SET COMPATIBILITY_LEVEL = 130;
GO
 
USE [InMemContainerTest]
GO
DROP TABLE IF EXISTS [dbo].[InMemT1]
GO
CREATE TABLE [dbo].[InMemT1]
(
    [OrderId]      [INT]           IDENTITY(1, 1) NOT NULL
   ,[StoreID]      [INT]           NOT NULL
   ,[CustomerID]   [INT]           NOT NULL
   ,[OrderDate]    [DATETIME]      NOT NULL
   ,[DeliveryDate] [DATETIME]      NULL
   ,[Amount]       [FLOAT]         NULL
   ,[Discount]     [FLOAT]         NULL
   ,[DiscountCode] [VARCHAR](25)   NULL
   ,INDEX [IndOrders_OrderID] NONCLUSTERED ([OrderId] ASC)
   ,INDEX [IndOrders_StoreID] NONCLUSTERED ([StoreID] ASC)
   ,INDEX [IndOrders_CustomerID] NONCLUSTERED ([CustomerID] ASC)
   ,CONSTRAINT [PK_InMemT1_OrderID] PRIMARY KEY NONCLUSTERED ([OrderId] ASC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
 
/*
	add data
*/
SET NOCOUNT ON
GO
INSERT dbo.InMemT1
SELECT 1 AS [StoreID]     
      ,2 AS [CustomerID]  
      ,GETDATE() AS [OrderDate]   
      ,GETDATE()AS [DeliveryDate]
      ,1.11 AS [Amount]      
      ,0.10 AS [Discount]    
      ,'Holiday1' AS [DiscountCode]
GO 200000
 
 
/*
	verify CFP folder
	dir S:\SQLData\InMemContainerTestFile\$HKv2
*/
 
 
/*
	add two new containers
*/
 
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerNew1'
   ,FILENAME = 'S:\SQLDATA\InMemContainerNew1'
)
TO FILEGROUP InMemContainerTestFG;
GO
ALTER DATABASE [InMemContainerTest]
ADD FILE
(
    NAME = 'InMemContainerNew2'
   ,FILENAME = 'S:\SQLDATA\InMemContainerNew2'
)
TO FILEGROUP InMemContainerTestFG;
GO
 
/*
	verify contents of NEW CFP folders
	dir S:\SQLData\InMemContainerNew1\$HKv2
	dir S:\SQLData\InMemContainerNew2\$HKv2
 
	on my server, these folders are empty
*/
 
ALTER DATABASE [InMemContainerTest] REMOVE FILE InMemContainerTestFile;
 
/*
	verify contents of NEW CFP folders
	dir S:\SQLData\InMemContainerNew1\$HKv2
	dir S:\SQLData\InMemContainerNew2\$HKv2
 
	InMemContainerTestFile has been deleted, and the CFPs have been 
	redistributed to InMemContainerNew1 and InMemContainerNew2
*/

In-Memory OLTP query plans: The Unexpected

Spills to TempDB

While testing the new features of In-Memory OLTP in SQL 2016, I discovered something unexpected: query plans for interop can spill to TempDB.

It was explained to me that the accessing of memory-optimized tables is done at the lowest level, and that if sort or join operators are not allocated enough memory, they’ll spill. I contacted Microsoft and expressed my surprise, and their reply was: “If the plan spills with disk-based, it spills with memory-optimized”.

This is not possible with natively compiled stored procedures. For sorts or joins, you simply have to have enough memory, and unfortunately the xtp DMVs don’t reveal the level of granularity required to track how much memory is being used. All joins are nested loop, so the memory requirements are typically modest.

So we need to remember that for interop procedure that access memory-optimized tables, spills to TempDB can still occur.

image

image

Memory-optimized table variable gotcha

In-Memory OLTP can increase performance for a variety of workloads. For example, if your workload creates many #TempTables, ##TempTables, or @TableVariables, they all have to be allocated in TempDB, and it’s possible that TempDB itself is a bottleneck. Some DBAs/Developers mistakenly believe that @TableVariables reside only in memory, which is not true, and has been proven many times in blog posts like this and this, by Wayne Sheffield and Gail Shaw respectively.

Microsoft has described the ways in which temp tables and table variables can be replaced by using memory-optimized objects here. It’s true that we can now have truly memory-resident temporary objects, and that if your workload was bottlenecked due to TempDB io or allocation issues (GAM/SGAM/PFS), using memory-optimized tables variables can increase workload throughput. However, what’s not mentioned in that article is the impact of choosing different types of indexes for the table variable, which has the effect of using 2x memory for the table variable. For large numbers of rows this can even result in an out-of-memory condition. This would be particularly relevant if you are migrating a large number of rows from harddrive-based tables to memory-optimized tables, and the source and destination databases are different.

Creating a memory-optimized table variable is a two step process:

1. create a table type

2. create a variable of that type

Example (note that the PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH):

In the following script, ’64K page pool’ indicates the amount of memory allocated to memory-optimized table variables

The PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000)

If you instead define the PK column to use a RANGE index (non-HASH index), my testing has shown that memory allocation for the variable is almost exactly double that of using the HASH index.

HASH index, 64 page pool:

image

RANGE index, 64 page pool:

image

Not related to index choice – but still significant – is that the memory allocated to memory-optimized table variables (and their row versions, if any) is not released until the variable goes out of scope. Garbage collection for row versions ignores memory-optimized table variables.

Updating all rows in the variable will create row versions, and at least in this case, the row versions did not consume a lot of additional memory. I blogged about row versions in here.

If you think Microsoft should fix this issue bug with RANGE indexes on memory-optimized table variables, please upvote this connect item.

In-Memory OLTP data/delta file corruption: “Now trappable”

4 April 2017

(This post used to be called In-Memory OLTP data/delta file corruption: “The Untrappable”, but as you’ll see in a moment, things have changed)

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

You can run CHECKDB on a database that contains memory-optimized tables, but all memory-optimized tables will be ignored. And running CHECKTABLE against memory-optimized tables will fail with an “unsupported” error. As a result, some shops might not consider deploying this feature. While it’s not possible to recover from data/delta file corruption, you can still detect corruption. As I blogged a while ago in this post, a checksum is calculated for every block written to data/delta files, and those checksums are recalculated any time the block is read. That occurs during restore, backup, and any other operation that reads data/delta files. As Brent Ozar blogged in this post, you can execute a copy-only backup of the memory-optimized filegroup to DISK = ‘nul’, to force recalculation of all checksums, which will in turn compare them to the values stored with the blocks. If there are no mismatches between the newly calculated and stored checksum values, your durable memory-optimized data is corruption free.

Let’s say you execute that copy-only backup for your memory-optimized filegroup each night – if there is corruption, which conditions can trigger alerts, so that you can be notified?

As of SQL 2016/SP1, Microsoft has fixed the various code paths that access data/delta files, so that they produce trappable errors upon checksum failure:

  • Regular processing, i.e. file merge: Severity 21, Error 41355
  • Backup/Restore: Severity 16, Error 41316
  • Database recovery: Severity 21, Error 3414

(The sordid details of my attempts to reproduce data/delta file corruption before SQL 2016/SP1 are listed below).

17 August 2016

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

If there is corruption, what mechanism can you use to be alerted?

I had been in touch with Microsoft about this type of corruption, and they stated that it would be logged in the SQL errorlog as Severity 21. Of course you can create an alert on Severity 21 errors, but I wanted to find a way to determine that it’s specifically related to data/delta file corruption.

How would you go about reproducing this type of corruption for durable memory-optimized tables?

About a year ago I sent an email to corruption guru Paul Randal, asking if he had experimented with this type of corruption for durable memory-optimized data, and at least at that point he had not. So I set out to roll my own corruption repro, and so far the results are not what I expected.

I created a single durable memory-optimized table, and added one row. Then I ran CHECKPOINT to close the data file, and used a hex editor, attempting to open each of the data files. If I tried to open one of the files that had been written to, I received a “file in use” error, so I set the database OFFLINE, and overwrote some of the data in the formerly “in use” file with 00.

Next, I attempted to ONLINE the database, and received the following error:

Msg 41316, Level 16, State 0, Line 51
Restore operation failed for database ‘HKCorruption’ with internal error code ‘0x8800000e’.
Msg 5181, Level 16, State 5, Line 52
Could not restart database “HKCorruption”. Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 52
ALTER DATABASE statement failed.

I checked the SQL errorlog, and there was no Severity 21 listed. That’s where it stands for now – unfortunately I’ve not been able to reproduce pseudo storage corruption that affects data/delta files.

I’ve contacted Microsoft, informing them of the details of my testing, and I’ll update this post if/when I hear back from them.

Update 23 August 2016

Today I heard back from Microsoft. Turns out I had actually been able to reproduce corruption in the data/delta files. Look carefully at the errors from the SQL errorlog that I posted above. See that ‘0x8800000e’ ? It’s the only indication that there was a checksum failure. To be clear, this is what does and does not happen when there is a checksum failure found in data/delta files:

1. a value of 0x8800000e is written to the SQL errorlog
2. no severity is written to the SQL errorlog
3. no standardized error ID is written to the SQL errorlog
4. no text indicating corruption is written to the SQL errorlog

There are many problems with this situation, the first one being that there is no way to trap the corruption error with an alert. If there was a Severity associated with the error, we could create an alert, and receive some type of notification when the corruption occurs.

It’s bad enough that CHECKDB/CHECKTABLE ignores memory-optimized tables. If we force checksums to be calculated by backing up the memory-optimized filegroup to disk = ‘nul’, in order to determine that there are no checksum errors, you will have to scan the SQL errlog for ‘0x8800000e‘ after every memory-optimized filegroup backup.

This would seem to be a somewhat radical departure from standard ways to be informed of corruption (and other SQL Server errors in general).

Who could blame potential adopters of In-Memory OLTP for thinking that it’s not ready for prime time (and in this regard it’s definitely not). What could be more important than knowing your data is corruption free, and being alerted immediately if corruption occurs?

The present state of corruption detection and notification will do little to change the minds of those hesitant to adopt In-Memory OLTP. If Microsoft wants people to consider using In-Memory OLTP, they need to fix this issue immediately.

I have created this connect item about the issues described in this post (thanks for upvoting!)

Update 24 August 2016

Microsoft followed up with me again today, and said the following:

  • If the checkpointing process detects a checksum failure during regular processing, for example during a file merge, a sev21, error 41355 is written to SQL the error log
  • If there is a checksum failure during backup or restore, a sev16 error is written to the SQL error log, which is the same as what SQL Server does for checksum failures in mdf/ndf or ldf files
  • The team is looking at the DB startup code path to raise a sev21 error

That’s much better news than what I thought (what was originally explained to me).

Hopefully Microsoft will be able to fix the DB startup code path issue as part of a CU or SP1 (which in recent history would take about a year from the RTM release date).

Catch a Cluster by its Tail

I’ve been fascinated with SQL Server clustering for at least 15 years. It has matured considerably since the “Wolfpack” days back in 2000, when I sat next to the resident clustering guru at the contracting client I had at that time. He explained the basics to me, and I’m sure I had that “deer in the headlights” look. As a DBA, I had absolutely no interest in storage, networking, DNS, or Active Directory. I simply wanted to expand my SQL DBA skills in a vacuum. Besides, the initial MS implementation of clustering was not at all robust.

But as the years passed, I could see that world of clustering/high availability was catching on, so I decided to learn more about it, and I let go of my irrational lack of desire to learn things not directly connected to SQL Server. I set them up in my lab multiple dozens of times, and came to see clusters as a sort of gigantic puzzle, one that had many inputs and variables, and could be difficult to troubleshoot. Eventually Microsoft released SQL 2012 which included Availability Groups, whose foundation is Windows Server Failover Clustering. I knew my understanding of clustering needed improvement, and so I signed up for an in-person class.  There were only five other students in the class, and so we each received a lot of attention from the instructor, who was definitely super-knowledgeable. In some ways, there is nothing like being in the same room with a technologist who has that type of experience, and the ability to ask questions and also hear the questions that others ask is invaluable.

However, the costs for this class were not insignificant. The course fee was $2,395, hotel was $840, and I missed 4 days of work, for which I was not paid (I’m a contractor/consultant). I considered it an investment in my career, and didn’t give it a second thought. After the training, and following up with the materials that were given in class, my understanding and skills were improved. But four days wasn’t enough for me, and so I began to seek another way of taking my clustering skills to the next level, desiring to have a much deeper understanding of both Windows Server Failover Clustering (WSFC) and SQL Failover Cluster Instances (FCI).

“Timing is everything”, as they say, and I was thrilled to discover that SQL Server MCM and Data Platform MVP Edwin Sarmiento (b | t) had just completed the Herculean effort of creating an online course of study entitled  “Windows Server Failover Clustering for the Smart SQL Server DBA”. I reviewed the course outline, and saw many things that I already knew well, but also many that I needed to improve in my skill set. I liked that you could purchase only the modules that you needed.

Here’s the course outline:

  • Introduction to Microsoft® High Availability Technologies
  • Windows Server Failover Clustering (WSFC) Fundamentals
  • Planning and Installing a Windows Server Failover Cluster (WSFC)
  • Deep Dive on Windows Server Failover Cluster Quorum
  • Windows Server Failover Cluster (WSFC) Configuration
  • Planning and Installing SQL Server Failover Clustered Instance
  • Configuring SQL Server Failover Clustered Instances
  • Managing SQL Server Failover Clustered Instances

The course is described as “advanced” and “deep-dive”, and that’s definitely true, but it starts at the very beginning, and makes no assumptions about the skill level of the viewer with regard to WSFC or FCIs.

When it comes to learning, it’s often said that “repetition is good”. That’s one of the benefits that online training has versus in-person training – you can review it over and over again, and really let it sink in.

You can purchase individual modules or the entire course, and the pricing is extremely reasonable. The course can be viewed at a time and place of your choosing, and you can view modules an unlimited number of times. 

“Windows Server Failover Clustering for the Smart SQL Server DBAtruly expanded my mind about Windows Failover Clustering and FCIs, and Edwin always responded to the dozens of questions I had. His course is a fantastic resource, and I highly recommend it to anyone seeking to up their game in the vast and complex world of clustering.

The course is located here: https://learnsqlserverhadr.com

In-Memory OLTP: Optimizing data load

In-Memory OLTP: Optimizing data load

Inserting large sets of data to memory-optimized tables might be required when initially migrating data from:

harddrive-based or memory-optimized tables in

  • the same database
  • a separate database (not directly supported)

Some of the ways to load data into memory-optimized tables are:

  • SSIS
  • BULK INSERT
  • bcp
  • INSERT/SELECT

SELECT INTO is not supported for memory-optimized tables.

Harddrive-based tables

Let’s review the basic requirements to optimally load data to harddrive-based tables.

PowerPoint Presentation

Recovery model: Most if not all OLTP databases run with the recovery model set to FULL. DBAs are taught from birth that when loading data, the recovery model should be set to BULK_LOGGED so that the transaction log doesn’t explode when you load data. The next transaction log backup will still include all the data that was loaded, but if you set the recovery model to BULK_LOGGED, you won’t require the extra storage to accommodate transaction log growth.

Itzik Ben-Gan wrote an excellent article on minimal logging here. It covers Trace Flag 610 and many other aspects of loading data into harddrive-based tables.

Indexes: For harddrive-based tables, we should have the minimum amount of indexes in place or enabled, because all index modifications are fully logged, which slows down the data load (TF 610 changes this behavior). You’ll still have to rebuild/create those indexes, and that will be logged, but it’s often faster to do that than load data with indexes in place, if for some reason TF 610 can’t be used.

Clustered indexes: For harddrive-based tables, we want to load the data sorted by the clustering key, so that we can eliminate any sorting.

Memory-optimized tables

Basic requirements to optimally load data to memory-optimized tables:

PowerPoint Presentation

Most DBAs are surprised to learn that DML changes to memory-optimized tables are always fully logged, regardless of the database recovery model. For INSERT/UPDATE/DELETE on memory-optimized tables, there is no such thing as “minimally logged”.

In SQL Server 2016 we finally have the ability to use the ALTER TABLE command to change memory-optimized tables. Most ALTER TABLE operations are executed in parallel and have the benefit of being minimally logged.

I did the following to verify that index creation is indeed minimally logged (based on SQL 2016 RC3**):

  • Create a memory-optimized table and load 15 million rows
  • Execute BACKUP LOG and CHECKPOINT (a few times)
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 30 rows
  • ALTER TABLE/ADD NOT NULL column: 7 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 308 rows
  • Execute BACKUP LOG and CHECKPOINT (a few times)
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 35 rows
  • ALTER TABLE ADD INDEX: 13 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 118 rows

**If an index column is currently off-row, creating an index that references this column causes the column to be moved in-row. If the index is dropped, the column is again moved off-row. In both of these scenarios, ALTER TABLE is fully logged and single-threaded.

Then I executed a command that is definitely not minimally logged:

  • ALTER TABLE/ADD NOT NULL nvarchar(max) column: 6 minutes, 52 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 210,280 rows

So from a logging perspective, it probably doesn’t make a lot of difference if non-clustered indexes are in place when data is loaded to memory-optimized tables. But concurrency will definitely suffer when creating indexes with ALTER TABLE/ADD INDEX, as the table is offline for the entire duration of any ALTER commands. That might be somewhat mitigated by the fact that you can now create multiple indexes, constraints, etc, with a single ALTER TABLE statement:

ALTER TABLE dbo. MyInMemTable ADD INDEX IX_Column1(Column1) , INDEX IX_Column2 (Column2)

“Clustered” indexes

Sadly, using the label “clustered” to describe any index on memory-optimized tables will confuse many people. For harddrive-based tables, a clustered index determines the physical order of data pages on disk, and clustered indexes for harddrive-based tables are the primary source of data – they are in fact the actual data for the table.

With regard to how data for memory-optimized tables is stored in memory, it’s not possible to have any form of ordering. Yes, you can create a “clustered” index on a memory-optimized table, but it is not the primary source of data for that table. The primary source of data is still the memory-optimized table in memory.

Loading

You should determine a way to break up the data loading process so that multiple clients can be executed in parallel. By client I mean SSMS, Powershell, SQLCMD, etc. This is no different than the approach you would take for loading data to harddrive-based tables.

When reviewing the following chart, remember that natively compiled stored procedures won’t work for any scenario that includes both harddrive-based and memory-optimized tables.

Source

Method

Notes

harddrive-based, same db

INSERT/SELECT

Supported, but excruciatingly painful with large data sets (single INSERT/SELECT statement), even if using a HASH index with bucket count properly configured. I succeeded in locking up my server several times with this approach.

harddrive-based, different db

INSERT/SELECT

Not supported.

You can use tempdb to stage the data, i.e. SELECT INTO ##temptable. Then process data with multiple clients.

harddrive-based, files

bcp out/ bcp in

Supported

harddrive-based, different db

indexed memory-optimized table variable

Supported, but not “transactional”.

Modifications to rows in a memory-optimized table variable creates row versions (see note below).

BULK INSERT is also supported, with the same restrictions as INSERT/SELECT (can’t go cross-database).

Different Source and Destination databases

a. If you are copying data between databases, i.e. Database A is the source for harddrive-based data you want to migrate, and Database B is the destination for memory-optimized data, you can’t use INSERT/SELECT. That’s because if there is a memory-optimized table as the source or destination of the INSERT/SELECT, you’ll be going “cross-database”, and that’s not allowed. You’ll either need to copy harddrive-based data to a global table (##) in TempDB, to an external file and then use BCP, or to a memory-optimized table variable (further explanation below).

b. Next, you’ll have to get the data into the memory-optimized tables. If using a ##TempTable, you can use stored procedures to process distinct key value ranges, allowing the procedures to be executed in parallel. For performance reasons, before calling these stored procedures, you’ll need to create an index on the primary key of the ##TempTable. If using stored procedures, you should determine the optimal batch size for your server/storage (see chart at the end of this post for my results using this method).

c. Natively compiled stored procedures won’t work in this scenario, because you can’t reference disk-based tables or TempDB from natively compiled stored procedures.

d. Instead of using a ##TempTable, it’s possible to insert data into an indexed memory-optimized table variable from the source database, and then use INSERT/SELECT from that variable into the destination database. That would solve the issue of making a second copy on disk, but be careful if you need to transform the data in the memory-optimized table variables, because updating data in memory-optimized table variables creates row versions, which will consume memory. That’s in addition to the memory required for the memory-optimized table variable itself.

e. Garbage collection is a process that frees memory consumed by row versions, which were created as a result of changes to data in memory-optimized tables. Unfortunately, the garbage collection process does not free up memory consumed by memory-optimized table variables – those row versions will consume additional memory (until the memory-optimized table variable goes out of scope).

In order to use a natively compiled stored procedure for copying data from one table to another, the source and destination tables must both be memory-optimized, and both must reside in the same database.

Hardware/software used for testing

Software

  • Windows Server 2012 Datacenter
  • SQL 2016 RC3
  • sp_configure max memory: 51200 MB
  • Resource pool of 70%

Hardware

  • Make/model: custom built
  • Physical memory: 64GB
  • Memory stick: Samsung M386A4G40DM0 32GB x 2
  • Dual Intel Xeon E5-2630 v3 CPU
  • Transaction log on Intel 750 PCIe SSD
  • Checkpoint File Pairs on OWC Mercury Accelsior PCIe SSD

Testing details:

  • SELECT INTO ##TempTable was used to make the source data visible from within the memory-optimized database.
  • An index was created on the primary key for ##TempTable (INT IDENTITY). The “table on SSD” in the chart below was stored on the Intel 750 PCIe SSD
  • All inserts were done by calling an interpreted TSQL stored procedure which processed rows in batches, using “PrimaryKey BETWEEN val1 and val2”. No key generation was involved, because in the procedure, SET IDENTITY_INSERT was ON.
  • There was a single HASH index on the memory-optimized table, with BUCKET_COUNT set to 10 million, in order to handle the initial data set of 5 million rows. Increasing the BUCKET_COUNT TO 30 million did not make any appreciable difference in the final test (with three sessions loading 5 million rows each).

PowerPoint Presentation

In-Memory OLTP relationship status: “it’s complicated”

Because partitioning is not supported for memory-optimized tables, Microsoft has posted workarounds here and here.

These workarounds describe how to use:

a. application-level partitioning

b. table partitioning for on-disk tables that contain cold data, in combination with memory-optimized tables for hot data.

Both of these workarounds maintain separate tables with identical schema. The first workaround would not require app changes, but the second workaround would require changes in order to know which table to insert/update/delete rows in. Technologists are not crazy about changing existing applications.

Even if we accept that these are viable solutions for existing applications, there are other potential problems with using either of these approaches.

Parent/Child issues

An OLTP database schema is usually highly normalized, with lots of parent/child relationships, and those relationships are usually enforced with PRIMARY KEY and FOREIGN KEY constraints. SQL 2016 allows us to implement PK/FK constraints for memory-optimized tables, but only if all participating tables are memory-optimized.

That leads us to an interesting problem:

How can we enforce PK and FK relationships if a database contains both disk-based and memory-optimized tables, when each table requires the same validation?

Sample scenario

In a simplified scenario, let’s say we have the following tables:

Parent table: memory-optimized, States_InMem

Child table 1: memory-optimized, contains hot data, Addresses_InMem

Child table 2: disk-based, contains cold data, Addresses_OnDisk

We must satisfy at least three conditions:

a. Condition 1: an insert/update on the memory-optimized child table must validate StateID

b. Condition 2: an insert/update on the disk-based child table must validate StateID

c. Condition 3: deleting a row from the parent table must not create orphaned child records

Example 1:

Condition 1

Assume Addresses_InMem has a column named StateID that references States_InMem.StateID.

If we create the States_InMem table as memory- optimized, the Addresses_InMem table can define a FOREIGN KEY that references it. Condition 1 is satisfied.

Condition 2

The disk-based Addresses_Disk table can use a trigger to validate the StateID for inserts or updates. Condition 2 is satisfied.

Condition 3

If we want to delete a record from the memory-optimized Parent table (States_InMem), the FK from memory-optimized Addresses_InMem will prevent the delete if child records exist (assuming we don’t cascade).

Triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables. Therefore, when you want to delete a record from the memory-optimized parent table, triggers cannot be used to enforce referential integrity to the disk-based child table.

Without a trigger or a parent/child relationship enforced at the database level, it will be possible to delete a record from States_InMem that references Addresses_OnDisk, thereby creating an orphaned child record. Condition 3 is NOT satisfied.

This “memory-optimized triggers cannot reference disk-based tables” issue also prevents the parent table from being disk-based (described next).

Example 2:

Parent table: disk-based, States_OnDisk

Child table 1: Hot data in memory-optimized table, Addresses_InMem

Child table 2: Cold data in disk-based table, Addresses_Disk

We can only define PK/FK between memory-optimized tables, so that won’t work for validating Addresses_InMem.StateID

As just described, we cannot use triggers on Addresses_InMem to enforce referential integrity, because triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables (States_OnDisk).

One solution might be to have all DML for this type of lookup table occur through interop stored procedures. But this has some drawbacks:

1. if a stored procedure must access both disk-based and memory-optimized tables, it cannot be natively compiled

2. Without PRIMARY and FOREIGN KEY rules enforced at the database engine level, invalid data can be introduced

Ideally we would like to have only a single copy of the parent table that can be referenced from either disk-based or memory-optimized child tables.

Separate “lookup” database

You might think that you can simply put reference tables in a separate database, but this approach won’t work, because memory-optimized tables don’t support cross-database queries. Also, the example of the States lookup table is overly simplified – it’s a single table that is a parent to child tables, but itself has no parent.

What if the tables were not Addresses and States, but instead Orders and OrderDetails? Orders might have a parent record, which can also have a parent record, and so on. Even if it was possible to place referenced tables in a separate database, this complexity will likely prevent you from doing so.

Double entry

For small lookup tables with no “parent”, one potential solution would be to store the reference data twice (on disk and in-memory). In this scenario you would modify only the disk-based table, and use triggers on the disk-based table to keep the memory-optimized lookup table in synch.

Entire table in memory

Of course if you put entire tables in memory (a single table that holds both hot and cold data), all of these problems go away. Depending on the complexity of the data model, this solution might work. However, placing both hot and cold data in memory will affect recovery time, and therefore RTO (see my other blog post on recovery for databases with memory-optimized data here).

All data in memory

You could also put your entire database in memory, but In-Memory OLTP isn’t designed for this. Its purpose is to locate tables with the highest activity to memory (or a subset of data for those hot tables). Putting your entire database in memory has even more impact on RTO than placing hot/cold data for a few tables in memory.

Also, cold data won’t benefit from most of what In-Memory OLTP has to offer, as by definition cold data rarely changes. However, there will likely be some benefit from querying data that resides solely in memory-optimized tables (no latching/locking).

Temporal

If your data is temporal in nature, it’s possible to use the new Temporal table feature of SQL 2016 to solve part of the issues discussed. It would work only for memory-optimized tables that are reference tables, like the States table.

You could define both the memory-optimized reference table and your memory-optimized referencing tables to be temporal, and that way the history of both over time is captured. At a given point in time, an Addresses record referenced a specific version of the States record (this will also work for disk-based tables, but the subject of this blog post is how In-Memory OLTP can be used to handle hot/cold data).

It’s recommended to use a clustered columnstore index on the history table to minimize the storage footprint and maximize query performance. Partitioning of the history table is also supported.

Archival data

If due to regulatory requirements multiple years of data must be retained, then you could create a view that encompassed both archival and hot data in memory-optimized temporal tables. And removing large amounts of data from the archival tables can easily be done with partitioning. But adding large amounts of data to the archival tables cannot be done seamlessly, because as mentioned earlier, partitioning is not supported for memory-optimized tables.

Down the road

With the current limitations on triggers, foreign keys, and partitioning for memory-optimized tables, enforcing referential integrity with a mix of hot and cold schemas/tables remains a challenge.

Backup and Recovery for SQL Server databases that contain durable memory-optimized data

With regard to backup and recovery, databases that contain durable memory-optimized tables are treated differently than backups that contain only disk-based tables. DBAs must be aware of the differences so that they don’t mistakenly affect production environments and impact SLAs.

The following image describes files/filegroups for databases that contain durable memory-optimized data:

clip_image002

Data/delta files are required so that memory-optimized tables can be durable, and they reside in Containers, which is a special type of folder. Containers can reside on different drives (more about why you’d want to do that in a bit).

Database recovery occurs due to the following events:

  • Database RESTORE
  • Database OFFLINE/ONLINE
  • Restart of SQL Server service
  • Server boot
  • Failover, including
      • FCI
    • Availability Groups*
    • Log Shipping
    • Database mirroring

The first thing to be aware of is that having durable memory-optimized data in a database can affect your Recovery Time Objective (RTO).

Why?

Because for each of the recovery events listed above, SQL Server must stream data from the data/delta files into memory as part of recovery.

There’s no getting around the fact that if you have lots of durable memory-optimized data, even if you have multiple containers on different volumes, recovery can take a while. That’s especially true in SQL 2016 because Microsoft has raised the limit on the amount of memory-optimized data per database from 256GB to multiple TB (yes, terabytes, limited only by the OS). Imagine waiting for your multi-terabytes of data to stream into memory, and how that will impact your SLAs (when SQL Server streams data to memory, you’ll see a wait type of WAIT_XTP_RECOVERY).

*One exception to the impact that failover can have is when you use Availability Groups with a Secondary replica. In that specific scenario, the REDO process keeps memory-optimized tables up to date in memory on the Secondary, which greatly reduces failover time.

Indexes for memory-optimized tables have no physical representation on disk. That means they must be created as part of database recovery, further extending the recovery timeline.

CPU bound recovery

The recovery process for memory-optimized data uses one thread per logical CPU, and each thread handles a set of data/delta files. That means that simply restoring a database can cause the server to be CPU bound, potentially affecting other databases on the server.

During recovery, SQL Server workloads can be affected by increased CPU utilization due to:

  • low bucket count for hash indexes – this can lead to excessive collisions, causing inserts to be slower
  • nonclustered indexes – unlike static HASH indexes, the size of nonclustered indexes will grow as the data grows. This could be an issue when SQL Server must create those indexes upon recovery.
  • LOB columns – new in SQL 2016, SQL Server maintains a separate internal table for each LOB column. LOB usage is exposed through the sys.memory_optimized_tables_internal_attributes and sys.dm_db_xtp_memory_consumers views. LOB-related documentation for these views has not yet been released.

You can see from the following output that SQL 2016 does indeed create a separate internal table per LOB column. The Items_nvarchar table has a single NVARCHAR(MAX) column. It will take additional time during the recovery phase to recreate these internal per-column tables.

image

Corruption

Because they don’t have any physical representation on disk (except for durability, if you so choose), memory-optimized tables are completely ignored by both CHECKDB and CHECKTABLE. There is no allocation verification, or any of the myriad other benefits that come from running CHECKDB/CHECKTABLE on disk-based tables. So what is done to verify that everything is ok with your memory-optimized data?

CHECKSUM of data/delta files

When a write occurs to a file, a CHECKSUM for the block is calculated and stored with the block. During database backup, the CHECKSUM is calculated again and compared to the CHECKSUM value stored with the block. If the comparison fails, the backup fails (no backup file gets created).

Restore/Recovery

If a backup file contains durable memory-optimized data, there is currently no way to interrogate that backup file to determine how much memory is required to successfully restore.

I did the following to test backup/recovery for a database that contained durable memory-optimized data:

  • Created a database with only one durable memory-optimized table
  • Generated an INSERT only workload (no merging of delta/delta files)
  • INSERTed rows until the size of the table in memory was 20GB
  • Created a full database backup
  • Executed RESTORE FILELISTONLY for that backup file

The following are the relevant columns from the FILELISTONLY output. Note the last row, the one that references the memory-optimized filegroup:

image

There are several things to be aware of here:

  • The size of the memory-optimized data in the backup is 10GB larger than memory allocated for the table (the combined size of the data/delta files is 30GB, hence the extra 10GB)
  • The Type for the memory-optimized filegroup is ‘S’. Within backup files, Filestream, FileTable and In-Memory OLTP all have the same value for Type, which means that database backups that contain two or more types of streaming data don’t have a way to differentiate resource requirements for restoring. A reasonable naming convention should help with that.
  • It is not possible to determine how much memory is required to restore this database. Usually the amount of memory is about the same size as the data/delta storage footprint, but in this case the storage footprint was overestimated by 50%, perhaps due to file pre-creation. There should be a fix in SQL 2016 RC0 to reduce the size of pre-created data/delta files for initial data load. However, this does not help with determining memory requirements for a successful restore.

Now let’s have a look at a slightly different scenario — imagine that you have a 1TB backup file, and that you are tasked with restoring it to a development server. The backup file is comprised of the following:

  • 900GB disk-based data
  • 100GB memory-optimized data

The restore process will create all of the files that must reside on disk, including files for disk-based data (mdf/ndf/ldf) and files for durable memory-optimized data (data/delta files). The general steps that the restore process performs are:

  • Create files to hold disk-based data (size = 900GB, so this can take quite a while)
  • Create files for durable memory-optimized data (size = 100GB)
  • After all files are created, 100GB of durable memory-optimized data must be streamed from the data files into memory

But what if the server you are restoring to only has 64GB of memory for the entire SQL Server instance? In that case, the process of streaming data to memory will fail when there is no more memory available to stream data. Wouldn’t it have been great to know that before you wasted precious time creating 1TB worth of files on disk?

When you ask SQL Server to restore a database, it determines if there is enough free space to create the required files from the backup, and if there isn’t enough free space, the restore fails immediately. If you think that Microsoft should treat databases containing memory-optimized data the same way (fail immediately if there is not enough memory to restore), please vote for this Connect item.