Category Archives: Native compilation

Availability Groups and Native Compilation

For disk-based tables, query plans for interpreted/traditional stored procedures will be recompiled when statistics have changed. That’s because when you update statistics, cached query plans for interpreted stored procedures are invalidated, and will automatically recompile the next time they’re executed. That’s true for an interpreted stored procedure that references disk-based tables, and/or memory-optimized tables.

As of SQL 2016, the database engine automatically updates statistics for memory-optimized tables (documentation here), but recompilation of native modules must still be performed manually. But hey, that’s way better than SQL 2014, when you couldn’t recompile at all; you had to drop/recreate the native module. And natively compiled stored procedures don’t reside in the plan cache, because they are executed directly by the database engine.

This post attempts to determine if the requirement to manually recompile native modules is any different for AG secondary replicas.

Stats on the primary

Statistics that are updated on the primary replica will eventually make their way to all secondary replicas. This blog post by Sunil Agarwal details what happens on the secondary replica if the statistics are stale (relative to any temporary statistics that were created on the secondary).

How do we…?

The first question we must answer is: how do you determine when the last time a natively compiled stored procedure was compiled?

We can do that by checking the value of the cached_time column from the following query:

The query is simple, but you won’t get any results unless you enable the collection of stored procedure execution statistics for natively compiled procedures. Execution statistics can be collected at the object level or instance level.

NOTE: Enabling the collection of stored procedure statistics for natively compiled procedures can crush your server, potentially resulting in disastrous performance impact. You must be extremely careful with this method of troubleshooting.

Once you’ve enabled stats collection for native procedures, you should get results from the query above.

How I tested

Here are the steps I executed, after creating an AG that used synchronous mode (containing a single database with a memory-optimized filegroup):

  1. Create a sample table
  2. Insert some rows
  3. Create a natively compiled procedure that selects from the sample table
  4. Execute the native procedure on the primary and secondary (it must be executed at least once in order to have usage stats collected)
  5. Enable collection of stored procedure execution statistics on the primary and secondary replicas
  6. Again execute the native procedure on the primary and secondary
  7. Note the value of sys.dm_exec_procedure_stats.cached_time on the primary and secondary
  8. Recompile the native procedure on the primary
  9. Execute the native procedure on the primary and secondary
  10. Again note the value of sys.dm_exec_procedure_stats.cached_time on the primary and secondary

Results

The cached_time value on the secondary did not get updated when the native module was recompiled on the primary.

What does this mean for DBAs that are responsible for maintaining AGs that use native compilation?  It means that when you recompile native modules on the primary replica (which you would always do after updating statistics on the primary), those modules must be recompiled on all secondary replicas. The recompilation on the secondary can be performed manually or perhaps through some automated mechanism. For example, if you have a SQL Agent job on the primary replica to update statistics, one of the job steps might be for marking all natively compiled stored procedures on the secondary for recompilation, using sp_recompile.

How would that job step handle the recompile for all secondary replicas?

Perhaps after defining linked servers, you could do something like:

EXEC SecondaryServer1.msdb.dbo.sp_start_job @job_name = N’Recompile native procs’;

EXEC SecondaryServer2.msdb.dbo.sp_start_job @job_name = N’Recompile native procs’;

But it might be involved to define this for all secondary replicas – it sounds like a topic for another post…..

Happy recompiling –

How NOT to benchmark In-Memory OLTP

In forums over the last few months, I’ve seen a number of posts like the following:

  • “I tested native compilation, and it’s not much faster than using interpreted TSQL”
  • “I’m seeing performance issues with memory-optimized tables”

Tools and latency

Sometimes the bottleneck is the tool that’s used for testing. One person was using Ostress.exe and logging output to a file, using the –o parameter. This caused the benchmark they ran for memory-optimized tables to actually perform worse than disk-based tables! The overhead of logging Ostress output to disk created a high degree of latency, but once they removed the –o parameter, In-Memory OLTP performed super-fast for their workload.

Across the wire

Client/server messaging has overhead, and this cannot be improved by using In-Memory OLTP. Whether you’re returning one million rows from a disk-based table or a memory-optimized table, you’re still sending one million rows across the wire, which is not a valid test of In-Memory OLTP performance.

Core count

When you do a proof of concept, you should keep in mind that In-Memory OLTP is designed to work with many cores, and many concurrent processes. If you do your POC on a laptop with a single-threaded workload, In-Memory OLTP is not likely to deliver orders-of-magnitude performance benefits.

Simple queries used for testing Native Compilation

If you test with a query like:

SELECT col1
FROM table1

then native compilation will probably not be much faster than disk-based tables. Native compilation will show the greatest benefit when encapsulating complex business logic.

“Test” workloads

Doing a proof of concept with a contrived workload will not accurately determine if your real workload would benefit from migrating some or all data to In-Memory OLTP. The best way to do a proof of concept would be to use a copy of your production database with a realistic workload. You could run against disk-based tables first, and after migrating data to In-Memory, you could re-run and compare the results.

Deploying In-Memory OLTP can increase workload performance in several ways:

  • latch/lock free architecture
  • reduced/enhanced logging – modifications to indexes are not logged, and also the entire logging process has been redesigned for memory-optimized tables
  • interpreted TSQL overhead
  • temp table/tempdb overhead
  • excessive computation

Obviously you’d need to have a reasonable amount of concurrent activity in order to determine if In-Memory OLTP would achieve performance gains for your workload.

Transactional Replication meets In-Memory OLTP

Transactional replication hasn’t changed much since it was re-written for SQL 2005. However, with the release of SQL 2014 , there is at least one new possibility: memory-optimized tables at the subscriber.

With the release of SQL 2016, some of the restrictions for memory-optimized subscriber tables have been lifted:

  • snapshot schema files that create the memory-optimized tables no longer have to be manually modified (but see “Gotcha #6, silent schema killer” below)
  • tables replicated to memory-optimized tables on a subscriber are no longer limited to the SQL 2014 max row length of 8060 bytes for memory-optimized tables. This seems sort of moot, because published tables cannot themselves be memory-optimized, and are therefore still restricted to 8060 bytes. However, if for some reason you needed to add a lot of columns to the subscriber table that cause it to be greater than 8060 bytes, you can do it. Note that there is no limit on how large a row can be for memory-optimized tables in SQL 2016. The following statement is perfectly valid:

 

Why would you want to use memory-optimized tables in a subscriber database? There can only be one answer: speed.

Subscriber latency due to data volume could be a result of the following, in combination with each other, or individually:

  • excessive logging – changes to indexes are not logged for memory-optimized tables, and in general logging is much more efficient than for traditional/on-disk tables
  • locking – no locks are taken for DML statements that touch memory-optimized tables
  • blocking – blocking as a result of a transaction making changes to rows is not possible for memory-optimized tables
  • latching – no latches are taken on memory-optimized tables

The design of the In-Memory OLTP engine can alleviate latency due to these issues – BUT – before you start jumping for joy, you’ll need to be aware of the impact of deploying In-Memory OLTP in general.

DBAs love to tune things (indexes, queries, etc.), and subscriber tables are no exception. Until SQL 2014, when memory-optimized subscriber tables were introduced, some of the things that DBAs tuned on the subscriber included:

  • compression settings
  • different ways that the data in subscriber tables can be reinitialized, i.e. TRUNCATE TABLE, DELETE, DROP/CREATE table, or do nothing (these choices are for the ‘Action if name is in use’ section of the ‘Destination Object’, see the next screen shot).
  • custom indexes
  • snapshot isolation

For reinitializing, being able to use TRUNCATE TABLE is a great benefit, because all custom indexes and compression settings are retained for the destination table. If you choose drop/create, all compression settings and custom indexing must be reapplied to the subscriber table upon (re)initialization.

Deployment considerations

Article properties

On the dialog for Article Properties, you’ll need to make sure that both “Enable Memory Optimization” and “Convert clustered index to nonclustered index for memory optimized article” are set to “True”. Despite what you might have read, there is no concept of a “clustered” index for a memory-optimized table. If you have a clustered index on the published table, the DDL will fail when applied on the subscriber unless you set this option.

ReplicationArticleMemOptHighlighted

Subscription Properties

The Subscription Properties can be configured when initially creating the subscription:

ReplicationSubscriptionMemOptHighlight

or from the Subscription Properties dialog, if the subscription already exists:

ReplicationSubscriptionHighlilghted

Gotcha #1, DML support

Reinitialization is likely to happen at some point in the future, and so we’ll need to make the correct choice for “Action if name is in use”, on the same Article Properties dialog.

TRUNCATE TABLE is not supported for memory-optimized tables. If the table must be dropped, you’ll have to reapply scripts to handle any subscriber-level customization.

ReplicationArticleNameInUse_dropdown

Gotcha #2, compression

On-disk tables are stored in pages. Memory-optimized tables are not stored in pages, and therefore don’t support any form of compression (columnstore indexes on memory-optimized tables create a separate compressed copy of the rows, but the primary data source remains the rows in memory).

Gotcha #3, potential WRITELOG bottleneck

All DML operations on durable memory-optimized tables are fully logged, regardless of database-level recovery settings (for more details, see my post on “Optimizing Data Load” here). If deploying In-Memory OLTP solves the latency issues your app was experiencing, WRITELOG is likely to become one of the top waits. This prevents realizing the full potential of deploying In-Memory OLTP, but fear not – as of SQL 2016/SP1, NVDIMM is supported for the transaction log, reducing/eliminating the log as a performance bottleneck. See the link here for more detail.

Gotcha #4, impact on RTO

If by chance you must restore a subscriber database that contains a lot of durable memory-optimized data (I realize that “a lot” is subjective), RTO will be affected. That’s because the number and placement of containers has a significant effect on the amount of time required to recover a database that contains durable memory-optimized data. See my post “In-Memory OLTP: The moving target that is RTO” here for more details. You might also be interested in “Backup and Recovery for SQL Server databases that contain durable memory-optimized data” here.

Gotcha #5, resource consumption

Updates on memory-optimized tables are performed as DELETE + INSERT, and INSERTs create row versions, and the newly inserted row becomes the current version. Older versions consume additional memory, and must be retained as long as any processes that reference them are still executing (like queries running on the subscriber). It’s possible to have long chains of versioned rows, and that means your environment might require additional memory. For a detailed explanation of row versioning, including the Garbage Collection process, see my post on “Row version lifecycle for In-Memory OLTP” here. There are additional considerations if your workload uses memory-optimized table variables (also detailed in that post).

Gotcha #6, silent schema killer

Let’s say you’ve done you’re homework, and that your configuration for memory-optimized subscriber tables is perfect. There is additional database configuration that must be done to support memory-optimized tables, and without that, your subscriber tables will not be initialized/reinitialized as memory-optimized (they’ll still be created on the subscriber, but will be traditional/on-disk tables). In the stored procedure that executes on the subscriber, there is validation to determine if there is a memory-optimized filegroup for the subscriber database (there are other conditions, but this is the one we’re interested in).

If you lookup the definition of sys.filegroups, it relates to sys.data_spaces, and there we see a column named “type” that can have the following values:

FG = Filegroup
FD = FILESTREAM data filegroup
FX = Memory-optimized tables filegroup
PS = Partition scheme

If the query finds a filegroup of type “FX”, the table is created as memory-optimized, and if not (along with some other conditions), it’s created as a traditional/on-disk table.

While it seems obvious that you should have already configured your database to have a memory-optimized filegroup, if you skipped that step, there is no warning, error, or other type of message received, stating that the subscriber database is not memory-optimized. Of course, simply having a memory-optimized filegroup is not enough to create memory-optimized tables, because you must also have containers that belong to that memory-optimized filegroup. The “memory-optimized filegroup exists” validation will pass, but the (re)initialization will fail because no containers exist, and you’ll receive an error about that.

Index limitations

As of this writing (SQL 2016, SP1), a memory-optimized table can have a maximum of 9 indexes (if one of them is a columnstore index). That may or may not be an issue for your environment, but it’s a much lower number than traditional/on-disk tables.

Stored procedure execution

A quick review of Interop vs. Native Compilation:

  • Interop – interpreted TSQL as existed prior to SQL 2014. The full TSQL surface area is available with interop mode, and you can access both on-disk and memory-optimized tables.
  • Native Compilation – for maximum speed, you’ll want to use natively compiled stored procedures. There are restrictions for natively compiled modules, the most significant being that they can only reference memory-optimized tables, and the full TSQL surface area is not available. As of SQL 2016/SP1, natively compiled modules don’t support CASE statements, views, and there many other restrictions. For more details, check “Transact-SQL Constructs Not Supported by In-Memory OLTP” here.

If you execute an UPDATE or DELETE that affects a large number of rows, then that statement is turned into individual UPDATE or DELETE statements that are sent to the distributor, and finally to the subscriber(s). To avoid the overhead of sending all those changes, it’s possible to publish the “execution” of a stored procedure. The documentation says: “..only the procedure execution is replicated, bypassing the need to replicate the individual changes for each row..” Please refer to the document about replicating stored procedure execution here.

The documentation also states that you can customize the stored procedure on the subscriber. Although the documentation doesn’t mention it, the stored procedure can be natively compiled, which should greatly increase performance on the subscriber for transactions that affect a large number of rows. Keep in mind that any changes made to the procedure at the publisher are sent to the subscriber. If this isn’t the behavior you want, disable the propagation of schema changes before executing ALTER PROCEDURE.

IDENTITY crisis

You’ll likely be disappointed with native compilation if you’re trying to INSERT many rows at the subscriber, and the destination table includes an IDENTITY column. That’s because it’s not possible to insert a row that has an IDENTITY column in a natively compiled stored procedure. Even if you SET IDENTITY_INSERT on before calling the procedure, the insert still fails with: “The function ‘setidentity’ is not supported with natively compiled modules.”

Custom stored procedures

There is a difference between “replicating stored procedure execution”, and using “custom stored procedures”.  Microsoft does not support anything you might create as a “custom stored procedure”, whether or not it’s natively compiled.

Please check the documentation here.

Wrapping up

In-Memory OLTP is steadily making its way into the full feature set offered by SQL Server. If you’re running SQL 2016 SP1, In-Memory OLTP is now included with all editions of SQL 2016, except LocalDB.

Troubleshooting Natively Compiled Stored Procedures, Part 1

A subset of the tools available for troubleshooting interpreted stored procedures are available for troubleshooting natively compiled procedures.

The following table highlights the differences

Method Interpreted Natively compiled
Recompile specific statements Supported Not supported – but theoretically not required, due to the impossibility of parameter sniffing
Execute procedure with RECOMPILE Supported Not supported – but theoretically not required, due to the impossibility of parameter sniffing
Estimated/Actual plan Supported “Estimated Plan” makes no sense in the context of natively compiled stored procedures. The plan that will be executed is available from SHOWPLAN_XML or by clicking (“Estimated Plan” in SSMS, but it’s not “estimated”)
Remove plan from plan cache Supported Not supported – plans for natively compiled stored procedures are not stored in the plan cache.
DBCC FREEPROCCACHE Supported No effect, because plans for natively compiled stored procedures are not stored in the plan cache.
SET STATISTICS IO ON Supported Not supported/required, because there is no such thing as IO for memory-optimized tables.
SET STATISTICS TIME ON Supported Supported, but might not be 100% accurate, because execution times less than 1 millisecond are reported as 0 seconds. Total_worker_time may not be accurate if many executions take less than 1 millisecond.
SET FMTONLY Supported Not supported, but you can use sp_describe_first_result_set.
SHOWPLAN_XML Supported Supported
SHOWPLAN_ALL Supported Not supported
SHOWPLAN_TEXT Supported Not supported
Mismatched datatypes xEvents hekaton_slow_parameter_passing, with reason = parameter_conversion.
Named parameters, i.e.

EXEC dbo.Proc @Param1 = @Param1

Supported Supported, but not recommended, due to performance impact. You can track this type of execution with xEvents hekaton_slow_parameter_passing, with reason = named_parameters.

If any SET options are in effect, statistics are gathered at the procedure level and not at the statement level.

Note 1: Statement-level execution statistics can be gathered with xEvents by capturing the sp_statement_completed event. They can also be seen using Query Store (detailed in a future post).

Note 2: Due to the nature of working with memory-optimized tables in general, it’s likely that you will have to implement retry logic. Because of this, and also because of feature limitations within the natively compiled space, Microsoft suggest using an interpreted TSQL wrapper when calling natively compiled stored procedures.

The following query references sys.dm_exec_query_stats to get statistics for natively compiled procedures:

Note 3: The documentation for sys.dm_exec_query_stats states that the total_rows, min_rows, max_rows, and last_rows columns cannot be NULL, but NULL is still returned. A Connect item has been filed to have those columns return 0.

Parallelism

Parallelism is supported for memory-optimized tables for all index types. While that statement is true when using interpreted stored procedures that reference memory-optimized tables, unfortunately it’s not true when using natively compiled stored procedures.

Actual vs. Estimated

These terms have confused generations of SQL Server technologists.

For natively compiled procedures, enabling “Actual Plan” in SSMS does not return any plan information, but still executes the procedure. Enabling “Estimated Plan” in SSMS for natively compiled procedures is the same as setting SHOWPLAN_XML ON, but does not actually execute the stored procedure. The plan that will be executed is displayed.

Removing plans from the cache

For interpreted stored procedures, DBAs have the ability to remove an offending plan from the plan cache. This is not possible with natively compiled stored procedures, because the plan is not stored in the plan cache.

DBCC FREEPROCCACHE

If you execute DBCC FREEPROCCACHE and expect your natively compiled plans to magically disappear, you will no doubt be disappointed. That’s because DBCC FREEPROCCACHE has no effect on compiled modules, as they are not stored in the plan cache that’s used for interpreted TSQL. But executing DBCC FREEPROCCACHE will of course remove all existing plans for interpreted TSQL from the plan cache (so don’t do that…unless you’re really, really sure you want to recompile all of your interpreted procs).

Parameter sniffing

With interpreted stored procedures, parameter sniffing can severely impact performance. Parameter sniffing is not possible for natively compiled procedures, because all natively compiled procedures are executed with OPTIMIZE FOR UNKNOWN.

Statistics

SQL 2016 has the ability to automatically update statistics on memory-optimized tables if your database is has a compatibility level of at least 130. If you don’t want to depend on SQL Server to determine when stats should be updated, you can update statistics manually (and we no longer have to use FULLSCAN, as was the case in SQL 2014). Statistics for index key columns are created when an index is created.

Database upgrades and statistics

As mentioned earlier, if your database was set to compatibility level 120 (SQL 2014), and you want to take advantage of auto-update statistics, you must change the compatibility level to 130. But statistics still won’t be auto-updated unless you manually update them one last time.

Recompilation

When you create a natively compiled stored procedure, it gets compiled, and execution plans for the queries contained within the procedure are created. As the data changes, those execution plans will be based on older statistics, and might not perform at the highest possible level. Many people think that if you update statistics, natively compiled stored procedures will magically recompile. Unfortunately, this is not correct – natively compiled stored procedures are only recompiled under the following circumstances:

  • When you execute sp_recompile (this should be done after statistics are updated)
  • Database restart

Database restart includes at least the following events:

  • Database RESTORE
  • OFFLINE/ONLINE of database
  • Failover (FCI or Availability Group)
  • SQL Server service restart
  • Server boot

Unlike memory-optimized tables – which are all created, compiled, and placed into memory upon database restart – natively compiled stored procedures are recompiled when first executed. This reduces the amount of time required for database recovery, but affects the first-time execution of the procedure.

Plan operators

For traditional tables (harddrive-based), the number of pages expected to be returned by an operator has a significant impact on the cost, and therefore affects the plan. Since memory-optimized tables are not stored in pages, this type of calculation is irrelevant.

For memory-optimized tables, the engine keeps track of how many rows are in each table. This means that estimates for full table scans and index scans are always accurate (because they are always known). For memory-optimized tables, the most important factor for costing  is the number of rows that will be processed by a single operator. Older statistics might reference row counts that are no longer valid, and this can affect plan quality.

Nested execution

Prior to SQL 2014, it was not possible for one natively compiled stored procedure to call another natively compiled stored procedure. This restriction has been lifted in SQL 2016.

We will continue troubleshooting natively compiled stored procedures in a future post.