Corruption | Ned Otter Blog

SQL 2019 is on track to become one of the most awesome releases – the product touches so many realms of the data platform, it’s truly mind boggling.

Since I have such a keen interest in Hekaton/In-Memory OLTP, when the CTPs are released for a new version of SQL Server, I look forward to any potential announcements about that feature.

So far, there’s been only one publicly announced enhancement for In-Memory OLTP in SQL 2019: system tables in TempDB will be “Hekatonized”. This will forever solve the issue of system table contention in TempDB, which is a fantastic use of Hekaton. I’m told it will be “opt in”, so you can use this enhancement if you want to, but you can also back out of it, which would require a restart of the SQL Server service.

But there’s at least one other enhancement that’s not been announced, although the details of its implementation are not yet known.

When you start to research the Hekaton feature, most are shocked to learn that CHECKDB does not verify anything about durable In-Memory tables: it silently ignores them.

That appears to have changed in SQL 2019, although either the informational message about what it does is misleading, or behind the scenes it does something different.

This is the output for DBCC CHECKDB of a memory-optimized database in SQL 2017:

Object ID 949578421 (object ‘inmem_table’): The operation is not
supported with memory optimized tables. This object has been skipped and will not be processed.

(the emphasis was added by me)

This is the output for DBCC CHECKDB of a memory-optimized database in SQL 2019:

DBCC results for ‘inmem_table’.
There are 101 rows in 3 pages for object “inmem_table”.

Why do I say the message is misleading?

Because durable data for memory-optimized tables is not stored in pages, but instead in a streaming fashion in files known as checkpoint file pairs (or data and delta files). Also, while it’s true that there are 101 rows in this table, the engine pre-creates a number of data and delta files, and it would make DBAs sleep a lot better at night, if all of those files were verified as being corruption free.

We’ll just have to stay tuned to the future CTPs and RTM of SQL 2019 to see how all of this shakes out.

4 April 2017

(This post used to be called In-Memory OLTP data/delta file corruption: “The Untrappable”, but as you’ll see in a moment, things have changed)

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

You can run CHECKDB on a database that contains memory-optimized tables, but all memory-optimized tables will be ignored. And running CHECKTABLE against memory-optimized tables will fail with an “unsupported” error. As a result, some shops might not consider deploying this feature. While it’s not possible to recover from data/delta file corruption, you can still detect corruption. As I blogged a while ago in this post, a checksum is calculated for every block written to data/delta files, and those checksums are recalculated any time the block is read. That occurs during restore, backup, and any other operation that reads data/delta files. As Brent Ozar blogged in this post, you can execute a copy-only backup of the memory-optimized filegroup to DISK = ‘nul’, to force recalculation of all checksums, which will in turn compare them to the values stored with the blocks. If there are no mismatches between the newly calculated and stored checksum values, your durable memory-optimized data is corruption free.

Let’s say you execute that copy-only backup for your memory-optimized filegroup each night – if there is corruption, which conditions can trigger alerts, so that you can be notified?

As of SQL 2016/SP1, Microsoft has fixed the various code paths that access data/delta files, so that they produce trappable errors upon checksum failure:

Regular processing, i.e. file merge: Severity 21, Error 41355
Backup/Restore: Severity 16, Error 41316
Database recovery: Severity 21, Error 3414

(The sordid details of my attempts to reproduce data/delta file corruption before SQL 2016/SP1 are listed below).

~~17 August 2016~~

~~There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.~~

~~If there is corruption, what mechanism can you use to be alerted?~~

I had been in touch with Microsoft about this type of corruption, and they stated that it would be logged in the SQL errorlog as Severity 21. Of course you can create an alert on Severity 21 errors, but I wanted to find a way to determine that it’s specifically related to data/delta file corruption.

~~How would you go about reproducing this type of corruption for durable memory-optimized tables?~~

About a year ago I sent an email to corruption guru Paul Randal, asking if he had experimented with this type of corruption for durable memory-optimized data, and at least at that point he had not. So I set out to roll my own corruption repro, and so far the results are not what I expected.

I created a single durable memory-optimized table, and added one row. Then I ran CHECKPOINT to close the data file, and used a hex editor, attempting to open each of the data files. If I tried to open one of the files that had been written to, I received a “file in use” error, so I set the database OFFLINE, and overwrote some of the data in the formerly “in use” file with 00.

~~Next, I attempted to ONLINE the database, and received the following error:~~

Msg 41316, Level 16, State 0, Line 51
Restore operation failed for database ‘HKCorruption’ with internal error code ‘0x8800000e’.
Msg 5181, Level 16, State 5, Line 52
Could not restart database “HKCorruption”. Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 52
ALTER DATABASE statement failed.

I checked the SQL errorlog, and there was no Severity 21 listed. That’s where it stands for now – unfortunately I’ve not been able to reproduce pseudo storage corruption that affects data/delta files.

~~I’ve contacted Microsoft, informing them of the details of my testing, and I’ll update this post if/when I hear back from them.~~

~~Update 23 August 2016~~

Today I heard back from Microsoft. Turns out I had actually been able to reproduce corruption in the data/delta files. Look carefully at the errors from the SQL errorlog that I posted above. See that ‘0x8800000e’ ? It’s the only indication that there was a checksum failure. To be clear, this is what does and does not happen when there is a checksum failure found in data/delta files:

1. a value of 0x8800000e is written to the SQL errorlog
2. no severity is written to the SQL errorlog
3. no standardized error ID is written to the SQL errorlog
4. no text indicating corruption is written to the SQL errorlog

There are many problems with this situation, the first one being that there is no way to trap the corruption error with an alert. If there was a Severity associated with the error, we could create an alert, and receive some type of notification when the corruption occurs.

It’s bad enough that CHECKDB/CHECKTABLE ignores memory-optimized tables. If we force checksums to be calculated by backing up the memory-optimized filegroup to disk = ‘nul’, in order to determine that there are no checksum errors, you will have to scan the SQL errlog for ‘0x8800000e‘ after every memory-optimized filegroup backup.

~~This would seem to be a somewhat radical departure from standard ways to be informed of corruption (and other SQL Server errors in general).~~

Who could blame potential adopters of In-Memory OLTP for thinking that it’s not ready for prime time (and in this regard it’s definitely not). What could be more important than knowing your data is corruption free, and being alerted immediately if corruption occurs?

The present state of corruption detection and notification will do little to change the minds of those hesitant to adopt In-Memory OLTP. If Microsoft wants people to consider using In-Memory OLTP, they need to fix this issue immediately.

~~I have created~~ ~~this connect item~~ ~~about the issues described in this post (thanks for upvoting!)~~

~~Update 24 August 2016~~

~~Microsoft followed up with me again today, and said the following:~~

~~If the checkpointing process detects a checksum failure during regular processing, for example during a file merge, a sev21, error 41355 is written to SQL the error log~~
~~If there is a checksum failure during backup or restore, a sev16 error is written to the SQL error log, which is the same as what SQL Server does for checksum failures in mdf/ndf or ldf files~~
~~The team is looking at the DB startup code path to raise a sev21 error~~

~~That’s much better news than what I thought (what was originally explained to me).~~

~~Hopefully Microsoft will be able to fix the DB startup code path issue as part of a CU or SP1 (which in recent history would take about a year from the RTM release date).~~

Ned Otter Blog

SQL Server DBA and Musician

Category Archives: Corruption

SQL 2019 In-Memory hotness

In-Memory OLTP data/delta file corruption: “Now trappable”