Row version lifecycle for In-Memory OLTP

    In this post we’re going to talk about a crucial element of the In-Memory database engine: the row version life cycle.

    We’ll cover:

    1. why row versions are part of the In-Memory engine
    2. which types of memory-optimized objects create row versions
    3. potential impact on production workloads of using row versioning
    4. and finally, we’ll talk about what happens to row versions after they’re no longer needed

    In a world without row versions – as was the case until SQL 2005 – due to the pessimistic nature of the SQL engine, readers and writers that tried to access the same row at the same time would block each other. This affected the scalability of workloads that had a large number of concurrent users, and/or with data that changed often.

    Creating row versions switches the concurrency model from pessimistic to optimistic, which resolves contention issues for readers and writers. This is achieved by using a process called Multi-Version-Concurrency-Control, which allows queries to see data as of a specific point in time – the view of the data is consistent, and this level of consistency is achieved by creating and referencing row versions.

    Harddrive-based tables only have row versions created when specific database options are set, and row versions are always stored in TempDB. However, for memory-optimized tables, rows versions are stored in memory, and created based on the following conditions, and are not related database settings:

    DML memory consumption:

    1. INSERT: a row version is created and consumes memory

    2. UPDATE: a row version is created, and consumes memory (logically a DELETE followed by an INSERT)

    3. DELETE: a row version is NOT created, and therefore no additional memory is consumed (the row is only logically deleted in the Delta file)

    Why must we be aware of row versions for memory-optimized tables? Because row versions affect the total amount of memory that’s used by the In-Memory engine, and so you need to allow for that as part of capacity planning.

    Let’s have a quick look at how row versioning works. On the following slide you can see that there are two processes that reference the same row – the row that has the pk value of 1.

    Before any data is changed, the value of col is 99.

    PowerPoint Presentation

    A new row version is created each time a row is modified, but queries issued before the modification commits see a version of the row as it existed before the modification.

    Process 1 updates the value of col to 100, and row version A is created. Because this version is a copy of the row as it existed before the update, row version A has a col value of 99.

    Then Process 2 issues a SELECT. It can only see committed data, and since Process 1 has not yet committed, Process 2 sees row version A, which has a col value of 99, not the value of 100 from the UPDATE.

    Next, Process 1 commits. At this point, the value of co1 in the database is 100, but it’s important to remember that row version A is still in use by the SELECT from Process 2, and that means that row version A cannot be discarded. Imagine this happening on a much larger scale, and think about the amount of memory all those row versions will consume. At the extreme end of this scenario, the In-Memory engine can actually run out of memory, and SQL Server itself can become unstable.

    Things to note:

  • Memory allocated to the In-Memory engine can never be paged out under any circumstance
  • Memory-optimized tables don’t support compression

    That’s why there must be a separate process to reclaim memory used by row versions after they’re no longer needed. A background process called Garbage Collection takes care of this, and it’s designed to allow the memory consumed by row versions to be deallocated, and therefore re-used.

    Garbage Collection is designed to be:

  • Non-blocking
  • Responsive
  • Cooperative
  • Scalable

The following slide shows various stages of memory allocation for an instance of SQL Server, and assumes that both disk-based and memory-optimized tables exist in the database. To avoid the performance penalty of doing physical IOs, data for harddrive-based tables should be cached in the buffer pool. But an ever-increasing footprint for the In-Memory engine puts pressure on the buffer pool, causing it to shrink. As a result, performance for harddrive-based tables can suffer from the ever-growing footprint of the In-Memory engine. In fact, the entire SQL Server instance can be impacted. 

PowerPoint Presentation

    We need to understand how Garbage Collection works, so that we can determine what might cause it to fail – or perform below expected levels.

    There are two types of objects that can hold rows in memory:

  • Memory-optimized tables
  • Memory-optimized table variables

Modifications to data in both types of objects will create row versions, and those row versions will of course consume memory. Unfortunately, row versions for memory-optimized table variables are not handled by the Garbage Collection process – the memory consumed by them is only released when the variable goes out of scope. If changes are made to memory-optimized table variables that affect many rows – especially if the table variable has a NONCLUSTERED index – a large amount of memory can be consumed by row versions (see Connect item here).

The Garbage Collection process

    By default, the main garbage collection thread wakes up once every minute, but this frequency changes with the number of completed transactions.

    Garbage Collection occurs in two phases:

  • Unlinking rows from all relevant indexes
  • Deallocating rows from memory

1. Unlinking rows from all relevant indexes

Before: Index references stale row versions

PowerPoint Presentation

After: Index no longer references stale row versions. As part of user activity, indexes are scanned for rows that qualify for garbage collection. So stale row versions are easily identified if they reside in an active index range. But if an index range has low activity, a separate process is required to identity stale row versions. That process is called a “dusty corner” sweep – and it has to do much more work than the user activity processes to identify stale rows. This can affect the performance of Garbage Collection, and allow the footprint for the In-Memory engine to grow.

PowerPoint Presentation

2. Deallocating rows from memory

Each CPU scheduler has a garbage collection queue, and the main garbage collection thread places items on those queues. There is one scheduler for each queue, and after a user transaction commits, it selects all queued items on the scheduler it ran on, and deallocates memory for those items. If there are no items in the queue on its scheduler, the user transaction will search on any queue in the current NUMA node that’s not empty.

PowerPoint Presentation

If transaction activity is low and there’s memory pressure, the main garbage-collection thread can deallocate rows from any queue.

    So the two triggers for Garbage Collection are memory pressure and/or transactional activity. Conversely, that means if there’s no memory pressure – or transactional activity is low – it’s perfectly reasonable to have row versions that aren’t garbage collected. There’s also no way to force garbage collection to occur.

    Monitoring memory usage per table

    We can use the sys.dm_db_xtp_table_memory_stats DMV to see how much memory is in use by a memory-optimized table.  Row versions exist as rows in the table, which is why when we SELECT from the sys.dm_db_xtp_table_memory_stats  DMV, the memory_used_by_table_kb column represents the total amount of memory in use by the table, which includes the amount consumed by row versions. There’s no way to see the amount of memory consumed by row versions at the table or database level.

    SELECT CONVERT(CHAR(20), OBJECT_NAME(object_id)) 
          ,* 
    FROM sys.dm_db_xtp_table_memory_stats 

    tablememoryallocation

    Monitoring the Garbage Collection process

    To verify the current state of garbage collection, we can look at the output from the sys.dm_xtp_gc_queue_stats DMV. The output contains one row for each logical CPU on the server.

    SELECT * 
    FROM sys.dm_xtp_gc_queue_stats
    
    

        GCstatus

        If Garbage Collection is operational, we’ll see that there are non-zero values in the current_queue_depth column, and those values change every time we select from the queue stats DMV. If entries in the current_queue_depth column are not being processed or if no new items are being added to current_queue_depth for some of the queues, it means that garbage collection is not actively reclaiming memory, and as stated before, that might be ok, depending on memory pressure and/or transactional activity.

        Also remember that if we were modifying rows in a memory-optimized table variable, Garbage Collection could not have cleaned up any row versions.

        Blocking Garbage Collection

        The only thing that can prevent Garbage Collection from being operational is a long running transaction. That’s because long running transactions can create long chains of row versions, and they can’t be cleaned up until all of the queries that reference them have completed – Garbage Collection will simply have to wait.

        So – if you expect Garbage Collection to be active, and it’s not, the first thing you should check is if there are any long running transactions.

        Summing up

        Now you know about how the Garbage Collection process works for row versions, which types of memory-optimized objects you expect it to work with, and how to determine if it’s operational. There’s also a completely separate Garbage Collection process for handling data/delta files, and I’ll cover that in a separate post.

         

      7 thoughts on “Row version lifecycle for In-Memory OLTP

      1. Yuri

        Hi Ned, thank you for the series

        But “The only thing that can prevent Garbage Collection from being operational is a long running transaction” mean that 1 stuck transaction will prevent GC work on records touched by this transaction or like in transaction log – all transactions started after beginning of the oldest transaction cannot be cleaned?

        Reply
        1. Yuri

          Got the answer in MS white papare:
          To determine which rows can be safely deleted, the system keeps track of the timestamp of the oldest active transaction running in the system

          Reply
      2. SQLDBA

        Thanks for sharing very in depth workings on MemOptimized Tables and GC workings.
        My Scenario: Heavy OLTP production db (SQL2019) on AG cluster. Total InMem table size at start is ~350GB. However, when heavy traffic times, InMem grows linearly upto ~800GB pushing pressure up on disk based table sizes and total SQL memory.

        Help Needed : What could possibly be filling up InMem footprint linearly and prevent GC from running? One thing I can think of is because its part of AG cluster, our transaction log backups are only running every 4-hours meaning transaction logs wont shrink until 4-hours. Can this impact GC running frequency by slowing it down until 4-hours causing this?

        Appreciate your work.

        Reply
        1. Ned Otter Post author

          What version and edition of SQL are you running?

          Are you using memory optimized tables or table variables?

          Any long-running transactions?

          Running AGs and/or infrequent log backups has no effect on GC.

          Reply

      Leave a Reply

      Your email address will not be published. Required fields are marked *