Memory-optimized table variable gotcha

In-Memory OLTP can increase performance for a variety of workloads. For example, if your workload creates many #TempTables, ##TempTables, or @TableVariables, they all have to be allocated in TempDB, and it’s possible that TempDB itself is a bottleneck. Some DBAs/Developers mistakenly believe that @TableVariables reside only in memory, which is not true, and has been proven many times in blog posts like this and this, by Wayne Sheffield and Gail Shaw respectively.

Microsoft has described the ways in which temp tables and table variables can be replaced by using memory-optimized objects here. It’s true that we can now have truly memory-resident temporary objects, and that if your workload was bottlenecked due to TempDB I/O or allocation issues (GAM/SGAM/PFS), using memory-optimized table variables can increase workload throughput. However, what’s not mentioned in that article is the impact of the index type you choose for the table variable: one choice can use 2x the memory of the other. For large numbers of rows this can even result in an out-of-memory condition. This is particularly relevant if you are migrating a large number of rows from harddrive-based tables to memory-optimized tables, and the source and destination databases are different.

Creating a memory-optimized table variable is a two-step process:

1. create a table type

2. create a variable of that type

Example (note that the PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH):

CREATE TYPE dbo.InMemType AS TABLE  
    (  
         [PK] [INT] NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000) 
        ,[Col1] [INT] NOT NULL
        ,[Col2] [NVARCHAR](255) NOT NULL
        ,[Col3] [UNIQUEIDENTIFIER] NULL
        ,[Col4] [INT] NULL
        ,[Col5] [INT] NULL
        ,[Col6] [VARCHAR](6) NULL
        ,[Col7] [NVARCHAR](255) NOT NULL
        ,[Col8] [NVARCHAR](255) NOT NULL
        ,[Col9] [NVARCHAR](255) NOT NULL
        ,[Col10] [DATETIME] NOT NULL
        ,[Col11] [NVARCHAR](1640) NULL
        ,[Col12] [NVARCHAR](1640) NULL 
    );  
go  
      
SET NOCOUNT ON;  
DECLARE @InMemVariable dbo.InMemType

In the following script, ‘64K page pool’ indicates the amount of memory allocated to memory-optimized table variables:

SET NOCOUNT ON;  
DECLARE @InMemVariable dbo.InMemType;
DECLARE @MaxValue INT = 1000000
DECLARE @Increment INT = 1
DECLARE @Counter INT = 1

SELECT  memory_consumer_id
       ,memory_consumer_type_desc
       ,memory_consumer_desc
       ,object_id
       ,OBJECT_SCHEMA_NAME(object_id) + '.' + OBJECT_NAME(object_id) [Table_Name]
       ,index_id
       ,CAST(allocated_bytes / 1024. AS NUMERIC(15, 2)) [allocated_kb]
       ,CAST(used_bytes / 1024. AS NUMERIC(15, 2)) [used_kb]
FROM sys.dm_db_xtp_memory_consumers
WHERE memory_consumer_desc = '64K page pool'

WHILE (@Counter <= @MaxValue)
BEGIN

    -- [PK] is NOT NULL and has no default, so it must be supplied explicitly
    INSERT @InMemVariable
    (
         [PK]
        ,[Col1]
        ,[Col2]
        ,[Col3]
        ,[Col4]
        ,[Col5]
        ,[Col6]
        ,[Col7]
        ,[Col8]
        ,[Col9]
        ,[Col10]
        ,[Col11]
        ,[Col12] 
    )
    SELECT [PK]
          ,[Col1]
          ,[Col2]
          ,[Col3]
          ,[Col4]
          ,[Col5]
          ,[Col6]
          ,[Col7]
          ,[Col8]
          ,[Col9]
          ,[Col10]
          ,[Col11]
          ,[Col12] 
    FROM <harddrive-based table>
    WHERE PK = @Counter

    SET @Counter = @Counter + 1

END

SELECT COUNT(*)
FROM @InMemVariable

SELECT  memory_consumer_id
       ,memory_consumer_type_desc
       ,memory_consumer_desc
       ,object_id
       ,OBJECT_SCHEMA_NAME(object_id) + '.' + OBJECT_NAME(object_id) [Table_Name]
       ,index_id
       ,CAST(allocated_bytes / 1024. AS NUMERIC(15, 2)) [allocated_kb]
       ,CAST(used_bytes / 1024. AS NUMERIC(15, 2)) [used_kb]
FROM sys.dm_db_xtp_memory_consumers
WHERE memory_consumer_desc = '64K page pool'

The PK column of the table type is defined as PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000)

If you instead define the PK column to use a RANGE index (non-HASH index), my testing has shown that memory allocation for the variable is almost exactly double that of using the HASH index.

CREATE TYPE dbo.InMemType AS TABLE  
    (  
         [PK] [INT] NOT NULL PRIMARY KEY NONCLUSTERED
        ,[Col1] [INT] NOT NULL
        ,[Col2] [NVARCHAR](255) NOT NULL
        ,[Col3] [UNIQUEIDENTIFIER] NULL
        ,[Col4] [INT] NULL
        ,[Col5] [INT] NULL
        ,[Col6] [VARCHAR](6) NULL
        ,[Col7] [NVARCHAR](255) NOT NULL
        ,[Col8] [NVARCHAR](255) NOT NULL
        ,[Col9] [NVARCHAR](255) NOT NULL
        ,[Col10] [DATETIME] NOT NULL
        ,[Col11] [NVARCHAR](1640) NULL
        ,[Col12] [NVARCHAR](1640) NULL 
    );  
go

HASH index, 64K page pool: [image]

RANGE index, 64K page pool: [image]

Not related to index choice – but still significant – is that the memory allocated to memory-optimized table variables (and their row versions, if any) is not released until the variable goes out of scope. Garbage collection for row versions ignores memory-optimized table variables.

Updating all rows in the variable will create row versions, and at least in this case, the row versions did not consume a lot of additional memory. I blogged about row versions here.

If you think Microsoft should fix this bug with RANGE indexes on memory-optimized table variables, please upvote this connect item.

In-Memory OLTP data/delta file corruption: “The Untrappable”

17 August 2016

There’s a lot of confusion out there about SQL Server’s In-Memory OLTP feature.

Because CHECKDB and CHECKTABLE ignore memory-optimized tables, some might not consider deploying this feature. While it’s not possible to recover from data/delta file corruption, you can still detect corruption. As I blogged a while ago in this post, a checksum is calculated for every block written to data/delta files. Those checksums are recalculated any time the block is read. That occurs during restore, backup, and any other operation that reads the data/delta files. As Brent Ozar blogged in this post, you can back up the memory-optimized filegroup to DISK = ‘nul’ to force recalculation of all checksums, which are then compared to the values stored with the blocks. If there are no mismatches between the newly calculated and stored checksum values, your memory-optimized data is corruption free.
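A minimal sketch of that “fake backup”, assuming the HKCorruption database used later in this post runs under the FULL recovery model, and that its memory-optimized filegroup is named HKCorruption_mod (a hypothetical name). COPY_ONLY is my own addition, intended to keep this throwaway backup out of the regular backup sequence:

```sql
-- Read every block in the data/delta files so that the checksums are
-- recalculated and compared; the backup output itself is discarded.
-- HKCorruption_mod is a hypothetical filegroup name.
BACKUP DATABASE [HKCorruption]
    FILEGROUP = N'HKCorruption_mod'
    TO DISK = N'nul'
    WITH COPY_ONLY;
```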

Let’s say you run the fake backup for your memory-optimized filegroup each night – if there is corruption, what mechanism can you use to be alerted?

I had been in touch with Microsoft about this type of corruption, and they stated that it would be logged in the SQL errorlog as Severity 21. Of course you can create an alert on Severity 21 errors, but I wanted to find a way to determine that it’s specifically related to data/delta file corruption.
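For reference, a generic Severity 21 alert might be set up along these lines. This is only a sketch: the alert name is arbitrary, and ‘DBA Team’ is a hypothetical SQL Agent operator that is assumed to exist already.

```sql
USE msdb;
GO
-- Fire an alert whenever any Severity 21 error is raised.
EXEC dbo.sp_add_alert
     @name = N'Severity 21 errors'
    ,@severity = 21
    ,@delay_between_responses = 60        -- seconds between repeat responses
    ,@include_event_description_in = 1;   -- include the error text in e-mail
GO
-- Notify a (hypothetical) operator by e-mail; 1 = e-mail.
EXEC dbo.sp_add_notification
     @alert_name = N'Severity 21 errors'
    ,@operator_name = N'DBA Team'
    ,@notification_method = 1;
GO
```

An alert like this fires for every Severity 21 error, which is exactly why it can’t tell you that the cause was data/delta file corruption.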

How would you go about reproducing this type of corruption for durable memory-optimized tables?

About a year ago I sent an email to corruption guru Paul Randal, asking if he had experimented with this type of corruption for durable memory-optimized data, and at least at that point he had not. So I set out to roll my own corruption repro, and so far the results are not what I expected.

I created a single durable memory-optimized table, and added one row. Then I ran CHECKPOINT to close the data file, and used a hex editor, attempting to open each of the data files. If I tried to open one of the files that had been written to, I received a “file in use” error, so I set the database OFFLINE, and overwrote some of the data in the formerly “in use” file with 00.
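The T-SQL side of that setup can be sketched roughly as follows. The table and column names are hypothetical, and WITH ROLLBACK IMMEDIATE is my assumption about how to get the database offline quickly; the hex-editor step obviously happens outside of SQL Server.

```sql
USE HKCorruption;
GO
-- Hypothetical durable memory-optimized table holding a single row.
CREATE TABLE dbo.CorruptionTest
(
     Id      INT           NOT NULL PRIMARY KEY NONCLUSTERED
    ,Payload NVARCHAR(100) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
INSERT dbo.CorruptionTest (Id, Payload)
VALUES (1, N'some data');
GO
CHECKPOINT;   -- close the current data file
GO
USE master;
GO
-- Release the file handles so that a hex editor can open the data files.
ALTER DATABASE HKCorruption SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
```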

Next, I attempted to ONLINE the database, and received the following errors:

Msg 41316, Level 16, State 0, Line 51
Restore operation failed for database ‘HKCorruption’ with internal error code ‘0x8800000e’.
Msg 5181, Level 16, State 5, Line 52
Could not restart database “HKCorruption”. Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 52
ALTER DATABASE statement failed.

I checked the SQL errorlog, and there was no Severity 21 listed. That’s where it stands for now – unfortunately I’ve not been able to reproduce pseudo storage corruption that affects data/delta files.

I’ve contacted Microsoft, informing them of the details of my testing, and I’ll update this post if/when I hear back from them.

Update 23 August 2016

Today I heard back from Microsoft. Turns out I had actually been able to reproduce corruption in the data/delta files. Look carefully at the errors from the SQL errorlog that I posted above. See that ‘0x8800000e’ ? It’s the only indication that there was a checksum failure. To be clear, this is what does and does not happen when there is a checksum failure found in data/delta files:

1. a value of 0x8800000e is written to the SQL errorlog
2. no severity is written to the SQL errorlog
3. no standardized error ID is written to the SQL errorlog
4. no text indicating corruption is written to the SQL errorlog

There are many problems with this situation, the first one being that there is no way to trap the corruption error with an alert. If there was a Severity associated with the error, we could create an alert, and receive some type of notification when the corruption occurs.

It’s bad enough that CHECKDB/CHECKTABLE ignores memory-optimized tables. If you force checksums to be calculated by backing up the memory-optimized filegroup to DISK = ‘nul’, then in order to determine that there are no checksum errors, you will have to scan the SQL errorlog for ‘0x8800000e’ after every memory-optimized filegroup backup.
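One way such a scan might be automated is sketched below. It relies on sp_readerrorlog, which is undocumented, so treat the parameter meanings and result columns as assumptions to verify on your build. The temp table name, the one-hour window, and the message text are all hypothetical.

```sql
-- Collect entries from the current SQL errorlog that contain the internal
-- error code. Assumed arguments: 0 = current log, 1 = SQL Server errorlog,
-- then the search string.
CREATE TABLE #ErrorLogHits
(
     LogDate     DATETIME
    ,ProcessInfo NVARCHAR(100)
    ,LogText     NVARCHAR(MAX)
);

INSERT #ErrorLogHits (LogDate, ProcessInfo, LogText)
EXEC sp_readerrorlog 0, 1, N'0x8800000e';

-- Only consider recent entries, so that an old hit is not reported forever.
IF EXISTS (SELECT 1
           FROM #ErrorLogHits
           WHERE LogDate >= DATEADD(HOUR, -1, GETDATE()))
BEGIN
    -- WITH LOG writes the message to the errorlog, so an alert can be
    -- defined on it. The text deliberately omits the error code, so that
    -- this message does not match the next scan.
    RAISERROR (N'Possible checksum failure in In-Memory OLTP data/delta files; review the SQL errorlog.', 16, 1) WITH LOG;
END

DROP TABLE #ErrorLogHits;
```

Scheduled as a job step right after the filegroup backup, something like this would turn the otherwise untrappable code into an error that a normal alert can pick up.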

This would seem to be a somewhat radical departure from standard ways to be informed of corruption (and other SQL Server errors in general).

Who could blame potential adopters of In-Memory OLTP for thinking that it’s not ready for prime time? In this regard it’s definitely not. What could be more important than knowing your data is corruption free, and being alerted immediately if corruption occurs?

The present state of corruption detection and notification will do little to change the minds of those hesitant to adopt In-Memory OLTP. If Microsoft wants people to consider using In-Memory OLTP, they need to fix this issue immediately.

I have created this connect item about the issues described in this post (thanks for upvoting!)

Update 24 August 2016

Microsoft followed up with me again today, and said the following:

  • If the checkpointing process detects a checksum failure during regular processing, for example during a file merge, a sev21 error, 41355, is written to the SQL error log
  • If there is a checksum failure during backup or restore, a sev16 error is written to the SQL error log, which is the same as what SQL Server does for checksum failures in mdf/ndf or ldf files
  • The team is looking at the DB startup code path to raise a sev21 error

That’s much better news than what I thought (what was originally explained to me).

Hopefully Microsoft will be able to fix the DB startup code path issue as part of a CU or SP1 (which in recent history would take about a year from the RTM release date).

Catch a Cluster by its Tail

I’ve been fascinated with SQL Server clustering for at least 15 years. It has matured considerably since the “Wolfpack” days back in 2000, when I sat next to the resident clustering guru at the contracting client I had at that time. He explained the basics to me, and I’m sure I had that “deer in the headlights” look. As a DBA, I had absolutely no interest in storage, networking, DNS, or Active Directory. I simply wanted to expand my SQL DBA skills in a vacuum. Besides, the initial MS implementation of clustering was not at all robust.

But as the years passed, I could see that the world of clustering/high availability was catching on, so I decided to learn more about it, and I let go of my irrational lack of desire to learn things not directly connected to SQL Server. I set up clusters in my lab dozens of times, and came to see them as a sort of gigantic puzzle, one that had many inputs and variables, and could be difficult to troubleshoot. Eventually Microsoft released SQL 2012, which included Availability Groups, whose foundation is Windows Server Failover Clustering. I knew my understanding of clustering needed improvement, and so I signed up for an in-person class. There were only five other students in the class, and so we each received a lot of attention from the instructor, who was definitely super-knowledgeable. In some ways, there is nothing like being in the same room with a technologist who has that type of experience, and the ability to ask questions and also hear the questions that others ask is invaluable.

However, the costs for this class were not insignificant. The course fee was $2,395, hotel was $840, and I missed 4 days of work, for which I was not paid (I’m a contractor/consultant). I considered it an investment in my career, and didn’t give it a second thought. After the training, and following up with the materials that were given in class, my understanding and skills were improved. But four days wasn’t enough for me, and so I began to seek another way of taking my clustering skills to the next level, desiring to have a much deeper understanding of both Windows Server Failover Clustering (WSFC) and SQL Failover Cluster Instances (FCI).

“Timing is everything”, as they say, and I was thrilled to discover that SQL Server MCM and Data Platform MVP Edwin Sarmiento (b | t) had just completed the Herculean effort of creating an online course of study entitled “Windows Server Failover Clustering for the Smart SQL Server DBA”. I reviewed the course outline, and saw many things that I already knew well, but also many that I needed to improve in my skill set. I liked that you could purchase only the modules that you needed.

Here’s the course outline:

  • Introduction to Microsoft® High Availability Technologies
  • Windows Server Failover Clustering (WSFC) Fundamentals
  • Planning and Installing a Windows Server Failover Cluster (WSFC)
  • Deep Dive on Windows Server Failover Cluster Quorum
  • Windows Server Failover Cluster (WSFC) Configuration
  • Planning and Installing SQL Server Failover Clustered Instance
  • Configuring SQL Server Failover Clustered Instances
  • Managing SQL Server Failover Clustered Instances

The course is described as “advanced” and “deep-dive”, and that’s definitely true, but it starts at the very beginning, and makes no assumptions about the skill level of the viewer with regard to WSFC or FCIs.

When it comes to learning, it’s often said that “repetition is good”. That’s one of the benefits that online training has versus in-person training – you can review it over and over again, and really let it sink in.

You can purchase individual modules or the entire course, and the pricing is extremely reasonable. The course can be viewed at a time and place of your choosing, and you can view modules an unlimited number of times. 

“Windows Server Failover Clustering for the Smart SQL Server DBA” truly expanded my mind about Windows Failover Clustering and FCIs, and Edwin always responded to the dozens of questions I had. His course is a fantastic resource, and I highly recommend it to anyone seeking to up their game in the vast and complex world of clustering.

The course is located here: https://learnsqlserverhadr.com

In-Memory OLTP: Optimizing data load

Inserting large sets of data into memory-optimized tables might be required when initially migrating data from harddrive-based or memory-optimized tables in:

  • the same database
  • a separate database (not directly supported)

Some of the ways to load data into memory-optimized tables are:

  • SSIS
  • BULK INSERT
  • bcp
  • INSERT/SELECT

SELECT INTO is not supported for memory-optimized tables.

Harddrive-based tables

Let’s review the basic requirements to optimally load data to harddrive-based tables.

Recovery model: Most if not all OLTP databases run with the recovery model set to FULL. DBAs are taught from birth that when loading data, the recovery model should be set to BULK_LOGGED so that the transaction log doesn’t explode when you load data. The next transaction log backup will still include all the data that was loaded, but if you set the recovery model to BULK_LOGGED, you won’t require the extra storage to accommodate transaction log growth.

Itzik Ben-Gan wrote an excellent article on minimal logging here. It covers Trace Flag 610 and many other aspects of loading data into harddrive-based tables.

Indexes: For harddrive-based tables, we should have the minimum number of indexes in place or enabled, because all index modifications are fully logged, which slows down the data load (TF 610 changes this behavior). You’ll still have to rebuild/create those indexes, and that will be logged, but it’s often faster to do that than load data with indexes in place, if for some reason TF 610 can’t be used.

Clustered indexes: For harddrive-based tables, we want to load the data sorted by the clustering key, so that we can eliminate any sorting.
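Put together, a harddrive-based load that follows these guidelines might look like the sketch below. All names are hypothetical: a database SalesDb, a target table dbo.Orders_OnDisk clustered on OrderID, and a data file that is already sorted by OrderID.

```sql
-- Switch to BULK_LOGGED for the duration of the load.
ALTER DATABASE SalesDb SET RECOVERY BULK_LOGGED;
GO
BULK INSERT dbo.Orders_OnDisk
FROM 'C:\Load\orders.dat'
WITH
(
     TABLOCK                -- table lock, one of the prerequisites for minimal logging
    ,ORDER (OrderID ASC)    -- declares that the file is sorted by the clustering key
);
GO
-- Switch back, then take a log backup (it will still contain the loaded data).
ALTER DATABASE SalesDb SET RECOVERY FULL;
GO
BACKUP LOG SalesDb TO DISK = N'C:\Backup\SalesDb_log.trn';
GO
```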

Memory-optimized tables

Basic requirements to optimally load data to memory-optimized tables:

Most DBAs are surprised to learn that DML changes to memory-optimized tables are always fully logged, regardless of the database recovery model. For INSERT/UPDATE/DELETE on memory-optimized tables, there is no such thing as “minimally logged”.

In SQL Server 2016 we finally have the ability to use the ALTER TABLE command to change memory-optimized tables. Most ALTER TABLE operations are executed in parallel and have the benefit of being minimally logged.

I did the following to verify that index creation is indeed minimally logged (based on SQL 2016 RC3**):

  • Create a memory-optimized table and load 15 million rows
  • Execute BACKUP LOG and CHECKPOINT (a few times)
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 30 rows
  • ALTER TABLE/ADD NOT NULL column: 7 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 308 rows
  • Execute BACKUP LOG and CHECKPOINT (a few times)
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 35 rows
  • ALTER TABLE ADD INDEX: 13 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 118 rows

**If an index column is currently off-row, creating an index that references this column causes the column to be moved in-row. If the index is dropped, the column is again moved off-row. In both of these scenarios, ALTER TABLE is fully logged and single-threaded.
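The measurement steps above can be approximated with a script like the following. It is a sketch for test systems only: backing up the log to ‘nul’ throws the log backup away. MyDb and Column3 are hypothetical names, and fn_dblog is undocumented.

```sql
-- Clear out as much of the active log as possible before measuring.
BACKUP LOG MyDb TO DISK = N'nul';
CHECKPOINT;
GO
SELECT COUNT(*) AS log_records_before
FROM fn_dblog(NULL, NULL);
GO
-- The operation being measured.
ALTER TABLE dbo.MyInMemTable ADD INDEX IX_Column3 (Column3);
GO
SELECT COUNT(*) AS log_records_after
FROM fn_dblog(NULL, NULL);
GO
```

A small difference between the two counts suggests the operation was minimally logged; a difference in the hundreds of thousands, as with the nvarchar(max) column below, suggests it was not.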

Then I executed a command that is definitely not minimally logged:

  • ALTER TABLE/ADD NOT NULL nvarchar(max) column: 6 minutes, 52 seconds
  • Execute SELECT COUNT(*) FROM fn_dblog(NULL, NULL), result is 210,280 rows

So from a logging perspective, it probably doesn’t make a lot of difference if non-clustered indexes are in place when data is loaded to memory-optimized tables. But concurrency will definitely suffer when creating indexes with ALTER TABLE/ADD INDEX, as the table is offline for the entire duration of any ALTER commands. That might be somewhat mitigated by the fact that you can now create multiple indexes, constraints, etc, with a single ALTER TABLE statement:

ALTER TABLE dbo.MyInMemTable ADD INDEX IX_Column1 (Column1), INDEX IX_Column2 (Column2);

“Clustered” indexes

Sadly, using the label “clustered” to describe any index on memory-optimized tables will confuse many people. For harddrive-based tables, a clustered index determines the physical order of data pages on disk, and clustered indexes for harddrive-based tables are the primary source of data – they are in fact the actual data for the table.

With regard to how data for memory-optimized tables is stored in memory, it’s not possible to have any form of ordering. Yes, you can create a “clustered” index on a memory-optimized table, but it is not the primary source of data for that table. The primary source of data is still the memory-optimized table in memory.

Loading

You should determine a way to break up the data loading process so that multiple clients can be executed in parallel. By client I mean SSMS, Powershell, SQLCMD, etc. This is no different than the approach you would take for loading data to harddrive-based tables.

When reviewing the following chart, remember that natively compiled stored procedures won’t work for any scenario that includes both harddrive-based and memory-optimized tables.

Source | Method | Notes
harddrive-based, same db | INSERT/SELECT | Supported, but excruciatingly painful with large data sets (single INSERT/SELECT statement), even if using a HASH index with bucket count properly configured. I succeeded in locking up my server several times with this approach.
harddrive-based, different db | INSERT/SELECT | Not supported. You can use tempdb to stage the data, i.e. SELECT INTO ##temptable. Then process data with multiple clients.
harddrive-based, files | bcp out / bcp in | Supported
harddrive-based, different db | indexed memory-optimized table variable | Supported, but not “transactional”. Modifications to rows in a memory-optimized table variable create row versions (see note below).

BULK INSERT is also supported, with the same restrictions as INSERT/SELECT (can’t go cross-database).

Different Source and Destination databases

a. If you are copying data between databases, i.e. Database A is the source for harddrive-based data you want to migrate, and Database B is the destination for memory-optimized data, you can’t use INSERT/SELECT. That’s because if there is a memory-optimized table as the source or destination of the INSERT/SELECT, you’ll be going “cross-database”, and that’s not allowed. You’ll need to copy the harddrive-based data either to a global temp table (##) in TempDB, to an external file and then use BCP, or to a memory-optimized table variable (further explanation below).

b. Next, you’ll have to get the data into the memory-optimized tables. If using a ##TempTable, you can use stored procedures to process distinct key value ranges, allowing the procedures to be executed in parallel. For performance reasons, before calling these stored procedures, you’ll need to create an index on the primary key of the ##TempTable. If using stored procedures, you should determine the optimal batch size for your server/storage (see chart at the end of this post for my results using this method).

c. Natively compiled stored procedures won’t work in this scenario, because you can’t reference disk-based tables or TempDB from natively compiled stored procedures.

d. Instead of using a ##TempTable, it’s possible to insert data into an indexed memory-optimized table variable from the source database, and then use INSERT/SELECT from that variable into the destination database. That would solve the issue of making a second copy on disk, but be careful if you need to transform the data in the memory-optimized table variables, because unlike inserts, updating data in memory-optimized table variables creates row versions, which will consume memory. That’s in addition to the memory required for the memory-optimized table variable itself.

e. Garbage collection is a process that frees memory consumed by row versions, which were created as a result of changes to data in memory-optimized tables. Unfortunately, the garbage collection process does not free up memory consumed by memory-optimized table variables. Those row versions will consume memory until the memory-optimized table variable goes out of scope.

In order to use a natively compiled stored procedure for copying data from one table to another, the source and destination tables must both be memory-optimized, and both must reside in the same database.
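The batched, key-range approach from item (b) might be sketched as an interpreted (interop) procedure like this one. Everything here is hypothetical: the procedure name, the ##StagingTable global temp table with an indexed PrimaryKey column, the dbo.MyInMemTable column list, and the default batch size.

```sql
CREATE PROCEDURE dbo.LoadInMemRange
     @StartKey  INT
    ,@EndKey    INT
    ,@BatchSize INT = 10000   -- tune for your server/storage
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @From INT = @StartKey;
    DECLARE @To   INT;

    WHILE (@From <= @EndKey)
    BEGIN
        -- Upper bound of this batch, capped at the end of the assigned range
        SET @To = CASE WHEN @From + @BatchSize - 1 > @EndKey
                       THEN @EndKey
                       ELSE @From + @BatchSize - 1
                  END;

        -- Each INSERT is its own autocommit transaction
        INSERT dbo.MyInMemTable (PrimaryKey, Column1, Column2)
        SELECT PrimaryKey, Column1, Column2
        FROM ##StagingTable
        WHERE PrimaryKey BETWEEN @From AND @To;

        SET @From = @To + 1;
    END
END
```

Each client session would then call the procedure with its own non-overlapping range, for example EXEC dbo.LoadInMemRange 1, 5000000 in one session and EXEC dbo.LoadInMemRange 5000001, 10000000 in another.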

Hardware/software used for testing

Software

  • Windows Server 2012 Datacenter
  • SQL 2016 RC3
  • sp_configure max memory: 51200 MB
  • Resource pool of 70%

Hardware

  • Make/model: custom built
  • Physical memory: 64GB
  • Memory stick: Samsung M386A4G40DM0 32GB x 2
  • Dual Intel Xeon E5-2630 v3 CPU
  • Transaction log on Intel 750 PCIe SSD
  • Checkpoint File Pairs on OWC Mercury Accelsior PCIe SSD

Testing details:

  • SELECT INTO ##TempTable was used to prepare the source table.
  • The source table had an index on an IDENTITY column which was the primary key. The “table on SSD” in the chart below was stored on the Intel 750 PCIe SSD.
  • All inserts were done by calling an interpreted TSQL stored procedure which processed rows in batches, using “PrimaryKey BETWEEN val1 and val2”. No key generation was involved, because in the procedure, SET IDENTITY_INSERT was ON.
  • There was a single HASH index on the memory-optimized table, with BUCKET_COUNT set to 10 million, in order to handle the initial data set of 5 million rows. Increasing the BUCKET_COUNT to 30 million did not make any appreciable difference in the final test (with three sessions loading 5 million rows each).

[Chart: data load test results (image not available)]

In-Memory OLTP relationship status: “it’s complicated”

Because partitioning is not supported for memory-optimized tables, Microsoft has posted workarounds here and here.

These workarounds describe how to use:

a. application-level partitioning

b. table partitioning for on-disk tables that contain cold data, in combination with memory-optimized tables for hot data.

Both of these workarounds maintain separate tables with identical schema. The first workaround would not require app changes, but the second workaround would require changes in order to know which table to insert/update/delete rows in. Technologists are not crazy about changing existing applications.

Even if we accept that these are viable solutions for existing applications, there are other potential problems with using either of these approaches.

Parent/Child issues

An OLTP database schema is usually highly normalized, with lots of parent/child relationships, and those relationships are usually enforced with PRIMARY KEY and FOREIGN KEY constraints. SQL 2016 allows us to implement PK/FK constraints for memory-optimized tables, but only if all participating tables are memory-optimized.

That leads us to an interesting problem:

How can we enforce PK and FK relationships if a database contains both disk-based and memory-optimized tables, when each table requires the same validation?

Sample scenario

In a simplified scenario, let’s say we have the following tables:

Parent table: memory-optimized, States_InMem

Child table 1: memory-optimized, contains hot data, Addresses_InMem

Child table 2: disk-based, contains cold data, Addresses_OnDisk

We must satisfy at least three conditions:

a. Condition 1: an insert/update on the memory-optimized child table must validate StateID

b. Condition 2: an insert/update on the disk-based child table must validate StateID

c. Condition 3: deleting a row from the parent table must not create orphaned child records

Example 1:

Condition 1

Assume Addresses_InMem has a column named StateID that references States_InMem.StateID.

If we create the States_InMem table as memory-optimized, the Addresses_InMem table can define a FOREIGN KEY that references it. Condition 1 is satisfied.
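As a sketch, the two memory-optimized tables might be declared like this. Only the table names and StateID come from the scenario above; the remaining columns, the constraint name, and the index name are hypothetical.

```sql
CREATE TABLE dbo.States_InMem
(
     StateID   INT          NOT NULL PRIMARY KEY NONCLUSTERED
    ,StateName NVARCHAR(50) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

CREATE TABLE dbo.Addresses_InMem
(
     AddressID INT           NOT NULL PRIMARY KEY NONCLUSTERED
    ,Street    NVARCHAR(100) NOT NULL
    ,StateID   INT           NOT NULL
        -- Allowed because the referenced table is also memory-optimized
        CONSTRAINT FK_Addresses_InMem_States
        FOREIGN KEY REFERENCES dbo.States_InMem (StateID)
    ,INDEX IX_Addresses_InMem_StateID NONCLUSTERED (StateID)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
```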

Condition 2

The disk-based Addresses_OnDisk table can use a trigger to validate the StateID for inserts or updates. Condition 2 is satisfied.
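A trigger of that kind could look roughly like the following sketch. It assumes Addresses_OnDisk has a NOT NULL StateID column; the trigger name, error number, and message are hypothetical. The WITH (SNAPSHOT) hint is my assumption about how to read the memory-optimized table from inside the trigger’s transaction when the session runs at READ COMMITTED.

```sql
CREATE TRIGGER dbo.trg_Addresses_OnDisk_ValidateState
ON dbo.Addresses_OnDisk
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the statement if any new or changed row points at a StateID
    -- that does not exist in the memory-optimized parent table.
    IF EXISTS (SELECT 1
               FROM inserted AS i
               WHERE NOT EXISTS (SELECT 1
                                 FROM dbo.States_InMem AS s WITH (SNAPSHOT)
                                 WHERE s.StateID = i.StateID))
    BEGIN
        -- An error raised inside a trigger rolls back the whole statement
        THROW 50001, N'Invalid StateID: no matching row in States_InMem.', 1;
    END
END
```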

Condition 3

If we want to delete a record from the memory-optimized Parent table (States_InMem), the FK from memory-optimized Addresses_InMem will prevent the delete if child records exist (assuming we don’t cascade).

Triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables. Therefore, when you want to delete a record from the memory-optimized parent table, triggers cannot be used to enforce referential integrity to the disk-based child table.

Without a trigger or a parent/child relationship enforced at the database level, it will be possible to delete a record from States_InMem that is still referenced by rows in Addresses_OnDisk, thereby creating orphaned child records. Condition 3 is NOT satisfied.

This “memory-optimized triggers cannot reference disk-based tables” issue also prevents the parent table from being disk-based (described next).

Example 2:

Parent table: disk-based, States_OnDisk

Child table 1: Hot data in memory-optimized table, Addresses_InMem

Child table 2: Cold data in disk-based table, Addresses_OnDisk

We can only define PK/FK between memory-optimized tables, so that won’t work for validating Addresses_InMem.StateID.

As just described, we cannot use triggers on Addresses_InMem to enforce referential integrity, because triggers on memory-optimized tables must be natively compiled, and that means they cannot reference disk-based tables (States_OnDisk).

One solution might be to have all DML for this type of lookup table occur through interop stored procedures. But this has some drawbacks:

1. if a stored procedure must access both disk-based and memory-optimized tables, it cannot be natively compiled

2. Without PRIMARY and FOREIGN KEY rules enforced at the database engine level, invalid data can be introduced

Ideally we would like to have only a single copy of the parent table that can be referenced from either disk-based or memory-optimized child tables.

Separate “lookup” database

You might think that you can simply put reference tables in a separate database, but this approach won’t work, because memory-optimized tables don’t support cross-database queries. Also, the example of the States lookup table is overly simplified – it’s a single table that is a parent to child tables, but itself has no parent.

What if the tables were not Addresses and States, but instead Orders and OrderDetails? Orders might have a parent record, which can also have a parent record, and so on. Even if it was possible to place referenced tables in a separate database, this complexity will likely prevent you from doing so.

Double entry

For small lookup tables with no “parent”, one potential solution would be to store the reference data twice (on disk and in-memory). In this scenario you would modify only the disk-based table, and use triggers on the disk-based table to keep the memory-optimized lookup table in sync.

Entire table in memory

Of course if you put entire tables in memory (a single table that holds both hot and cold data), all of these problems go away. Depending on the complexity of the data model, this solution might work. However, placing both hot and cold data in memory will affect recovery time, and therefore RTO (see my other blog post on recovery for databases with memory-optimized data here).

All data in memory

You could also put your entire database in memory, but In-Memory OLTP isn’t designed for this. Its purpose is to move the tables with the highest activity into memory (or a subset of data for those hot tables). Putting your entire database in memory has even more impact on RTO than placing hot/cold data for a few tables in memory.

Also, cold data won’t benefit from most of what In-Memory OLTP has to offer, as by definition cold data rarely changes. However, there will likely be some benefit from querying data that resides solely in memory-optimized tables (no latching/locking).

Temporal

If your data is temporal in nature, it’s possible to use the new Temporal table feature of SQL 2016 to solve some of the issues discussed. It would work only for memory-optimized tables that are reference tables, like the States table.

You could define both the memory-optimized reference table and your memory-optimized referencing tables to be temporal, and that way the history of both over time is captured. At a given point in time, an Addresses record referenced a specific version of the States record (this will also work for disk-based tables, but the subject of this blog post is how In-Memory OLTP can be used to handle hot/cold data).

It’s recommended to use a clustered columnstore index on the history table to minimize the storage footprint and maximize query performance. Partitioning of the history table is also supported.

Archival data

If regulatory requirements dictate that multiple years of data must be retained, you could create a view that encompasses both archival data and the hot data in memory-optimized temporal tables. Removing large amounts of data from the archival tables can easily be done with partitioning. However, adding large amounts of data to the archival tables cannot be done seamlessly, because as mentioned earlier, partitioning is not supported for memory-optimized tables.

Down the road

With the current limitations on triggers, foreign keys, and partitioning for memory-optimized tables, enforcing referential integrity with a mix of hot and cold schemas/tables remains a challenge.

Row version lifecycle for In-Memory OLTP

    In this post we’re going to talk about a crucial element of the In-Memory database engine: the row version life cycle.

    We’ll cover:

    1. why row versions are part of the In-Memory engine
    2. which types of memory-optimized objects create row versions
    3. potential impact on production workloads of using row versioning
    4. and finally, we’ll talk about what happens to row versions after they’re no longer needed

    In a world without row versions – as was the case until SQL 2005 – due to the pessimistic nature of the SQL engine, readers and writers that tried to access the same row at the same time would block each other. This affected the scalability of workloads that had a large number of concurrent users, and/or with data that changed often.

    Creating row versions switches the concurrency model from pessimistic to optimistic, which resolves contention issues for readers and writers. This is achieved by using a process called Multi-Version-Concurrency-Control, which allows queries to see data as of a specific point in time – the view of the data is consistent, and this level of consistency is achieved by creating and referencing row versions.

    Harddrive-based tables only have row versions created when specific database options are set, and those row versions are always stored in TempDB. For memory-optimized tables, however, row versions are stored in memory, and they are created based on the following conditions, regardless of database settings:

    DML memory consumption:

    1. INSERT: a row version is created and consumes memory

    2. UPDATE: a row version is created, and consumes memory (logically a DELETE followed by an INSERT)

    3. DELETE: a row version is NOT created, and therefore no additional memory is consumed (the row is only logically deleted in the Delta file)

    Why must we be aware of row versions for memory-optimized tables? Because row versions affect the total amount of memory that’s used by the In-Memory engine, and so you need to allow for that as part of capacity planning.

    Let’s have a quick look at how row versioning works. On the following slide you can see that there are two processes that reference the same row – the row that has the pk value of 1.

    Before any data is changed, the value of col is 99.

    PowerPoint Presentation

    A new row version is created each time a row is modified, but queries issued before the modification commits see a version of the row as it existed before the modification.

    Process 1 updates the value of col to 100, and row version A is created. Because this version is a copy of the row as it existed before the update, row version A has a col value of 99.

    Then Process 2 issues a SELECT. It can only see committed data, and since Process 1 has not yet committed, Process 2 sees row version A, which has a col value of 99, not the value of 100 from the UPDATE.

    Next, Process 1 commits. At this point, the value of col in the database is 100, but it’s important to remember that row version A is still in use by the SELECT from Process 2, and that means that row version A cannot be discarded. Imagine this happening on a much larger scale, and think about the amount of memory all those row versions will consume. At the extreme end of this scenario, the In-Memory engine can actually run out of memory, and SQL Server itself can become unstable.

    Things to note:

  • Memory allocated to the In-Memory engine can never be paged out under any circumstance
  • Memory-optimized tables don’t support compression

    That’s why there must be a separate process to reclaim memory used by row versions after they’re no longer needed. A background process called Garbage Collection takes care of this, and it’s designed to allow the memory consumed by row versions to be deallocated, and therefore re-used.

    Garbage Collection is designed to be:

  • Non-blocking
  • Responsive
  • Cooperative
  • Scalable

The following slide shows various stages of memory allocation for an instance of SQL Server, and assumes that both disk-based and memory-optimized tables exist in the database. To avoid the performance penalty of doing physical IOs, data for harddrive-based tables should be cached in the buffer pool. But an ever-increasing footprint for the In-Memory engine puts pressure on the buffer pool, causing it to shrink. As a result, performance for harddrive-based tables can suffer from the ever-growing footprint of the In-Memory engine. In fact, the entire SQL Server instance can be impacted. 

PowerPoint Presentation
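One way to watch this pressure is to compare the memory clerk for the In-Memory engine with the one for the buffer pool. The following is a sketch rather than a complete monitoring solution; run it periodically and compare the results over time:

```sql
-- Memory held by the In-Memory OLTP engine (MEMORYCLERK_XTP) versus
-- the buffer pool (MEMORYCLERK_SQLBUFFERPOOL), per memory node
SELECT type
      ,name
      ,memory_node_id
      ,pages_kb / 1024 AS pages_mb
FROM sys.dm_os_memory_clerks
WHERE type IN (N'MEMORYCLERK_XTP', N'MEMORYCLERK_SQLBUFFERPOOL')
ORDER BY pages_kb DESC;
```

If the XTP clerk keeps growing while the buffer pool clerk shrinks, harddrive-based workloads on the same instance are likely to start paying for it in physical IO.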

    We need to understand how Garbage Collection works, so that we can determine what might cause it to fail – or perform below expected levels.

    There are two types of objects that can hold rows in memory:

  • Memory-optimized tables
  • Memory-optimized table variables

Modifications to data in both types of objects will create row versions, and those row versions will of course consume memory. Unfortunately, row versions for memory-optimized table variables are not handled by the Garbage Collection process – the memory consumed by them is only released when the variable goes out of scope. If changes are made to memory-optimized table variables that affect many rows – especially if the table variable has a NONCLUSTERED index – a large amount of memory can be consumed by row versions (see Connect item here).

The Garbage Collection process

    By default, the main garbage collection thread wakes up once every minute, but this frequency changes with the number of completed transactions.

    Garbage Collection occurs in two phases:

  • Unlinking rows from all relevant indexes
  • Deallocating rows from memory

1. Unlinking rows from all relevant indexes

Before: Index references stale row versions

PowerPoint Presentation

After: Index no longer references stale row versions. As part of user activity, indexes are scanned for rows that qualify for garbage collection. So stale row versions are easily identified if they reside in an active index range. But if an index range has low activity, a separate process is required to identify stale row versions. That process is called a “dusty corner” sweep – and it has to do much more work than the user activity processes to identify stale rows. This can affect the performance of Garbage Collection, and allow the footprint for the In-Memory engine to grow.

PowerPoint Presentation

2. Deallocating rows from memory

Each CPU scheduler has a garbage collection queue, and the main garbage collection thread places items on those queues. There is one scheduler for each queue, and after a user transaction commits, it selects all queued items on the scheduler it ran on, and deallocates memory for those items. If there are no items in the queue on its scheduler, the user transaction will search on any queue in the current NUMA node that’s not empty.

PowerPoint Presentation

If transaction activity is low and there’s memory pressure, the main garbage-collection thread can deallocate rows from any queue.

    So the two triggers for Garbage Collection are memory pressure and/or transactional activity. Conversely, that means if there’s no memory pressure – or transactional activity is low – it’s perfectly reasonable to have row versions that aren’t garbage collected. There’s also no way to force garbage collection to occur.

    Monitoring memory usage per table

    We can use the sys.dm_db_xtp_table_memory_stats DMV to see how much memory is in use by a memory-optimized table. Row versions exist as rows in the table, which is why when we SELECT from the sys.dm_db_xtp_table_memory_stats DMV, the memory_used_by_table_kb column represents the total amount of memory in use by the table, which includes the amount consumed by row versions. There’s no way to see the amount of memory consumed by row versions at the table or database level.

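    -- Memory allocated to and used by each memory-optimized table (includes row versions)
    SELECT CONVERT(CHAR(20), OBJECT_NAME(object_id)) AS table_name
          ,* 
    FROM sys.dm_db_xtp_table_memory_stats;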

    [Image: table memory allocation output]

    Monitoring the Garbage Collection process

    To verify the current state of garbage collection, we can look at the output from the sys.dm_xtp_gc_queue_stats DMV. The output contains one row for each logical CPU on the server.

    -- One row per garbage collection queue (one queue per logical CPU)
    SELECT * 
    FROM sys.dm_xtp_gc_queue_stats;

        [Image: garbage collection queue stats output]

        If Garbage Collection is operational, we’ll see that there are non-zero values in the current_queue_depth column, and those values change every time we select from the queue stats DMV. If entries in the current_queue_depth column are not being processed or if no new items are being added to current_queue_depth for some of the queues, it means that garbage collection is not actively reclaiming memory, and as stated before, that might be ok, depending on memory pressure and/or transactional activity.
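Rather than re-running the query by hand, the two samples can be compared in a single batch. This is a sketch; the 10 second interval and the temp table name are arbitrary choices:

```sql
-- First sample of the garbage collection queues
SELECT queue_id, total_enqueues, total_dequeues, current_queue_depth
INTO #gc_before
FROM sys.dm_xtp_gc_queue_stats;

-- Wait, then compare against a second sample
WAITFOR DELAY '00:00:10';

SELECT a.queue_id
      ,a.total_enqueues - b.total_enqueues AS enqueued_during_interval
      ,a.total_dequeues - b.total_dequeues AS dequeued_during_interval
      ,a.current_queue_depth
FROM sys.dm_xtp_gc_queue_stats AS a
INNER JOIN #gc_before AS b
        ON b.queue_id = a.queue_id
ORDER BY a.queue_id;

DROP TABLE #gc_before;
```

Non-zero values in the two "during_interval" columns indicate that items are being queued and processed; all zeros means nothing happened on that queue during the interval.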

        Also remember that if we were modifying rows in a memory-optimized table variable, Garbage Collection could not have cleaned up any row versions.

        Blocking Garbage Collection

        The only thing that can prevent Garbage Collection from being operational is a long-running transaction. That’s because long-running transactions can create long chains of row versions, and they can’t be cleaned up until all of the queries that reference them have completed – Garbage Collection will simply have to wait.

        So – if you expect Garbage Collection to be active, and it’s not, the first thing you should check is whether there are any long-running transactions.
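One possible way to look for them is to list the active In-Memory OLTP transactions in the current database, oldest first. This is a sketch; the begin time is only available for transactions that also appear in sys.dm_tran_active_transactions, which is why the join is an outer join:

```sql
-- Active transactions that touch memory-optimized tables in the current database
SELECT xt.xtp_transaction_id
      ,xt.session_id
      ,xt.state_desc
      ,xt.begin_tsn
      ,tat.transaction_begin_time
FROM sys.dm_db_xtp_transactions AS xt
LEFT JOIN sys.dm_tran_active_transactions AS tat
       ON tat.transaction_id = xt.transaction_id
WHERE xt.state_desc = N'ACTIVE'
ORDER BY xt.begin_tsn;
```

The rows with the lowest begin_tsn values are the oldest active transactions, and therefore the most likely to be holding up cleanup.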

        Summing up

        Now you know about how the Garbage Collection process works for row versions, which types of memory-optimized objects you expect it to work with, and how to determine if it’s operational. There’s also a completely separate Garbage Collection process for handling data/delta files, and I’ll cover that in a separate post.

         

      Backup and Recovery for SQL Server databases that contain durable memory-optimized data

      With regard to backup and recovery, databases that contain durable memory-optimized tables are treated differently than databases that contain only disk-based tables. DBAs must be aware of the differences so that they don’t mistakenly affect production environments and impact SLAs.

      The following image describes files/filegroups for databases that contain durable memory-optimized data:

      [Image: files and filegroups for a database that contains durable memory-optimized data]

      Data/delta files are required so that memory-optimized tables can be durable, and they reside in Containers, which is a special type of folder. Containers can reside on different drives (more about why you’d want to do that in a bit).

      Database recovery occurs due to the following events:

      • Database RESTORE
      • Database OFFLINE/ONLINE
      • Restart of SQL Server service
      • Server boot
      • Failover, including
          • FCI
          • Availability Groups*
          • Log Shipping
          • Database mirroring

      The first thing to be aware of is that having durable memory-optimized data in a database can affect your Recovery Time Objective (RTO).

      Why?

      Because for each of the recovery events listed above, SQL Server must stream data from the data/delta files into memory as part of recovery.

      There’s no getting around the fact that if you have lots of durable memory-optimized data, even if you have multiple containers on different volumes, recovery can take a while. That’s especially true in SQL 2016 because Microsoft has raised the limit on the amount of memory-optimized data per database from 256GB to multiple TB (yes, terabytes, limited only by the OS). Imagine waiting for multiple terabytes of data to stream into memory, and how that will impact your SLAs. While SQL Server streams data to memory, you’ll see a wait type of WAIT_XTP_RECOVERY.
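A quick way to see how much time the instance has accumulated on this wait type is to query the wait statistics. This is a sketch; the counters are cumulative since the last restart of SQL Server or the last time the wait statistics were cleared:

```sql
-- Cumulative time spent waiting for memory-optimized data to be recovered
SELECT wait_type
      ,waiting_tasks_count
      ,wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WAIT_XTP_RECOVERY';
```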

      *One exception to the impact that failover can have is when you use Availability Groups with a Secondary replica. In that specific scenario, the REDO process keeps memory-optimized tables up to date in memory on the Secondary, which greatly reduces failover time.

      Indexes for memory-optimized tables have no physical representation on disk. That means they must be created as part of database recovery, further extending the recovery timeline.

      CPU bound recovery

      The recovery process for memory-optimized data uses one thread per logical CPU, and each thread handles a set of data/delta files. That means that simply restoring a database can cause the server to be CPU bound, potentially affecting other databases on the server.

      During recovery, SQL Server workloads can be affected by increased CPU utilization due to:

      • low bucket count for hash indexes – this can lead to excessive collisions, causing inserts to be slower
      • nonclustered indexes – unlike static HASH indexes, the size of nonclustered indexes will grow as the data grows. This could be an issue when SQL Server must create those indexes upon recovery.
      • LOB columns – new in SQL 2016, SQL Server maintains a separate internal table for each LOB column. LOB usage is exposed through the sys.memory_optimized_tables_internal_attributes and sys.dm_db_xtp_memory_consumers views. LOB-related documentation for these views has not yet been released.

      You can see from the following output that SQL 2016 does indeed create a separate internal table per LOB column. The Items_nvarchar table has a single NVARCHAR(MAX) column. It will take additional time during the recovery phase to recreate these internal per-column tables.

      image
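A query along the following lines can be used to list the internal tables and the memory they consume. This is a sketch that joins the two views mentioned above, not necessarily the query that produced the output shown:

```sql
-- Internal tables behind each memory-optimized table, with memory consumption.
-- A non-zero minor_id is the column_id of a column that is stored off-row.
SELECT OBJECT_NAME(a.object_id) AS table_name
      ,a.type_desc
      ,a.minor_id
      ,c.memory_consumer_desc
      ,c.allocated_bytes
      ,c.used_bytes
FROM sys.memory_optimized_tables_internal_attributes AS a
INNER JOIN sys.dm_db_xtp_memory_consumers AS c
        ON c.object_id = a.object_id
       AND c.xtp_object_id = a.xtp_object_id
ORDER BY table_name, a.minor_id;
```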

      Corruption

      Because they don’t have any physical representation on disk (except for durability, if you so choose), memory-optimized tables are completely ignored by both CHECKDB and CHECKTABLE. There is no allocation verification, or any of the myriad other benefits that come from running CHECKDB/CHECKTABLE on disk-based tables. So what is done to verify that everything is ok with your memory-optimized data?

      CHECKSUM of data/delta files

      When a write occurs to a file, a CHECKSUM for the block is calculated and stored with the block. During database backup, the CHECKSUM is calculated again and compared to the CHECKSUM value stored with the block. If the comparison fails, the backup fails (no backup file gets created).

      Restore/Recovery

      If a backup file contains durable memory-optimized data, there is currently no way to interrogate that backup file to determine how much memory is required to successfully restore.

      I did the following to test backup/recovery for a database that contained durable memory-optimized data:

      • Created a database with only one durable memory-optimized table
      • Generated an INSERT only workload (no merging of data/delta files)
      • INSERTed rows until the size of the table in memory was 20GB
      • Created a full database backup
      • Executed RESTORE FILELISTONLY for that backup file

      The following are the relevant columns from the FILELISTONLY output. Note the last row, the one that references the memory-optimized filegroup:

      image

      There are several things to be aware of here:

      • The size of the memory-optimized data in the backup is 10GB larger than memory allocated for the table (the combined size of the data/delta files is 30GB, hence the extra 10GB)
      • The Type for the memory-optimized filegroup is ‘S’. Within backup files, Filestream, FileTable and In-Memory OLTP all have the same value for Type, which means that database backups that contain two or more types of streaming data don’t have a way to differentiate resource requirements for restoring. A reasonable naming convention should help with that.
      • It is not possible to determine how much memory is required to restore this database. Usually the amount of memory is about the same size as the data/delta storage footprint, but in this case the storage footprint was overestimated by 50%, perhaps due to file pre-creation. There should be a fix in SQL 2016 RC0 to reduce the size of pre-created data/delta files for initial data load. However, this does not help with determining memory requirements for a successful restore.

      Now let’s have a look at a slightly different scenario — imagine that you have a 1TB backup file, and that you are tasked with restoring it to a development server. The backup file is comprised of the following:

      • 900GB disk-based data
      • 100GB memory-optimized data

      The restore process will create all of the files that must reside on disk, including files for disk-based data (mdf/ndf/ldf) and files for durable memory-optimized data (data/delta files). The general steps that the restore process performs are:

      • Create files to hold disk-based data (size = 900GB, so this can take quite a while)
      • Create files for durable memory-optimized data (size = 100GB)
      • After all files are created, 100GB of durable memory-optimized data must be streamed from the data files into memory

      But what if the server you are restoring to only has 64GB of memory for the entire SQL Server instance? In that case, the process of streaming data to memory will fail when there is no more memory available to stream data. Wouldn’t it have been great to know that before you wasted precious time creating 1TB worth of files on disk?
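Until SQL Server performs that check itself, a manual pre-check is about the best we can do. The following sketch uses a hypothetical backup path. As noted earlier, the size reported for the memory-optimized filegroup is only a rough guide to the memory that will be required:

```sql
-- 1. Inspect the backup; note the size of the row with Type = 'S'
RESTORE FILELISTONLY
FROM DISK = N'C:\Backups\MyDatabase.bak';   -- hypothetical path

-- 2. How much memory is the instance allowed to use?
SELECT CAST(value_in_use AS INT) AS max_server_memory_mb
FROM sys.configurations
WHERE name = N'max server memory (MB)';

-- 3. How much physical memory does the server have, and how much is free?
SELECT total_physical_memory_kb / 1024     AS total_physical_memory_mb
      ,available_physical_memory_kb / 1024 AS available_physical_memory_mb
FROM sys.dm_os_sys_memory;
```

If the size from step 1 approaches or exceeds the numbers from steps 2 and 3, the restore is likely to fail at the streaming stage, after all of the files have been created.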

      When you ask SQL Server to restore a database, it determines if there is enough free space to create the required files from the backup, and if there isn’t enough free space, the restore fails immediately. If you think that Microsoft should treat databases containing memory-optimized data the same way (fail immediately if there is not enough memory to restore), please vote for this Connect item.

      SQL Server log shipping within the AWS Cloud

      Much of what you see in the blogosphere pertaining to log shipping and AWS references an on-premises server as part of the topology. I searched far and wide for any information about how to set up log shipping between AWS VMs, but found very little. However, I have a client that does business solely within AWS, and needed a solution for HA/DR that did not include on-premises servers.

      Due to network latency issues and disaster recovery requirements (the log shipping secondary server must reside in a separate AWS region), it was decided to have the Primary server push transaction logs to S3, and the Secondary server pull from S3. On the Primary, log shipping would occur as usual, backing up to a local share, with a separate SQL Agent job responsible for copying the transaction log backups to S3. Amazon has created a set of Powershell functionality embodied in AWS Tools for Windows Powershell, which can be downloaded here. One could argue that Amazon RDS might solve some of the HA/DR issues that this client faced, but it was deemed too restrictive.

      image_thumb12

      S3 quirks

      When files are written to S3, the date and time of when the file was last modified is not retained. That means when the Secondary server polls S3 for files to copy, it cannot rely on the date/time from S3. Also, it is not possible to set the LastModified value on S3 files. Instead, a list of S3 file names must be generated, and compared to the files that reside on the Secondary. If an S3 file does not reside locally, it must be copied.

      Credentials – AWS Authentication

      AWS supports different methods of authentication:

      1. IAM roles (details here)
      2. profiles (details here)

      From an administrative perspective, I don’t have and don’t want access to the client’s AWS administrative console. Additionally, I needed a solution that I could easily test and modify without involving the client. For this reason, I chose an authentication solution based on AWS profiles that are stored within the Windows environment, for a specific Windows account (in case you’re wondering, the profiles are encrypted).

      Windows setup

      • create a Windows user named SQLAgentCmdProxy
      • create a password for the SQLAgentCmdProxy account (you will need this later)

      The SQLAgentCmdProxy Windows account will be used as a proxy for SQL Agent job steps, which will execute Powershell scripts. (NOTE: if you change the drive letters and/or folder names, you will need to update the scripts in this post)

      from a cmd prompt, execute the following:

      Powershell setup

      (The scripts in this blog post should be run on the Secondary log shipping server, but with very little effort, they can be modified to run on the Primary and push transaction log backups to S3.)

      The following scripts assume you already have an S3 bucket that contains one or more transaction log files that you want to copy to the Secondary server (they must have the extension “trn”, otherwise you will need to change -Match “trn” in the script below). Change the bucket name to match your bucket, and if required, also change the name of the region. Depending on the security configuration for your server, you may also need to execute “Set-ExecutionPolicy RemoteSigned” in a Powershell prompt as a Windows Administrator, prior to executing any Powershell scripts.

      After installing AWS Tools for Windows Powershell, create a new Powershell script with the following commands

      Be sure to fill in your AccessKey and SecretKey values in the script above, then save the script as C:\Powershell\Setup.ps1. When this script is executed, it will establish an AWS environment based on the proxy for the SQL Agent job step.

      The next step is to create a new Powershell script with the following commands:

      Again you should substitute your bucket and region names in the script above. Note that after the files are copied to the Secondary, the LastModifiedTime is updated based on the file name (log shipping uses the UTC format when naming transaction log backups). Save the Powershell script as C:\powershell\CopyS3TRNToLocal.ps1

      SQL Server setup

      • create a login for the SQLAgentCmdProxy Windows account (for our purposes, we will make this account a member of the sysadmin role, but you should not do that in your production environment)
      • create a credential named TlogCopyFromS3Credential, mapped to SQLAgentCmdProxy (you will need the password for SQLAgentCmdProxy in order to accomplish this)
      • create a SQL Agent job
      • create a job step, Type: Operating System (CmdExec), Runas: TlogCopyFromS3Credential

      Script for the above steps
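The following is a minimal sketch of what such a script could look like, not necessarily the original one. The job name, step name, and password placeholder are hypothetical, and it assumes the SQL Agent proxy is given the same name as the credential, so that it shows up under that name in the Run as list:

```sql
USE master;
GO
-- Login for the Windows proxy account (sysadmin for demo purposes only)
CREATE LOGIN [<DomainName>\SQLAgentCmdProxy] FROM WINDOWS;
ALTER SERVER ROLE sysadmin ADD MEMBER [<DomainName>\SQLAgentCmdProxy];

-- Credential mapped to the Windows account
CREATE CREDENTIAL TlogCopyFromS3Credential
WITH IDENTITY = N'<DomainName>\SQLAgentCmdProxy'
    ,SECRET   = N'<password>';   -- placeholder: the Windows account's password
GO

USE msdb;
GO
-- Proxy based on the credential, allowed to run CmdExec job steps
EXEC dbo.sp_add_proxy
     @proxy_name      = N'TlogCopyFromS3Credential'
    ,@credential_name = N'TlogCopyFromS3Credential'
    ,@enabled         = 1;

EXEC dbo.sp_grant_proxy_to_subsystem
     @proxy_name     = N'TlogCopyFromS3Credential'
    ,@subsystem_name = N'CmdExec';

-- Job with a single CmdExec step that runs the Powershell script
EXEC dbo.sp_add_job
     @job_name = N'CopyTlogsFromS3';   -- hypothetical job name

EXEC dbo.sp_add_jobstep
     @job_name   = N'CopyTlogsFromS3'
    ,@step_name  = N'Run Powershell script'
    ,@subsystem  = N'CmdExec'
    ,@command    = N'powershell.exe -File "C:\Powershell\Setup.ps1"'
    ,@proxy_name = N'TlogCopyFromS3Credential';

EXEC dbo.sp_add_jobserver
     @job_name    = N'CopyTlogsFromS3'
    ,@server_name = N'(local)';
GO
```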

      • Change references to <DomainName> to be your domain or local server name, and save the script
      • Execute the job
      • Open the job and navigate to the job step. In the Command window, change the name of the Powershell script from Setup.ps1 to CopyS3TRNToLocal.ps1
      • Execute the job
      • Verify the contents of the C:\Backups\logs folder – you should now see the file(s) from your S3 bucket

      Troubleshooting credentials

      If you see errors for the job that resemble this:

      InitializeDefaultsCmdletGet-S3Object : No credentials specified or obtained from persisted/shell defaults.

      then recheck the AccessKey and SecretKey values that you ran in the Setup.ps1 script. If you find errors in either of those keys, you’ll need to rerun the Setup.ps1 file (change the name of the file to be executed in the SQL Agent job, and re-run the job). If you don’t find any errors in the AccessKey or SecretKey values, you might have luck with creating the AWS profile for the proxy account manually (my results with this approach have been mixed). Since profiles are specific to a Windows user, we can use runas /user:SQLAgentCmdProxy powershell_ise.exe to launch the Powershell ISE, and then execute the code from Setup.ps1.

      You can verify that the Powershell environment uses the SQL proxy account by temporarily adding $env:USERNAME to the script.

      S3 Maintenance

      When you set up log shipping on the Primary or Secondary, you can specify the retention period, but S3 file maintenance needs to be a bit more hands-on. The following script handles purging local and S3 files with the extension “trn” that are more than 30 days old, based on the UTC file name.

      Save the script, and create a SQL Agent job to execute it. You’ll also have to reference the proxy account as in the prior SQL Agent job.

      Don’t forget

      If you use log shipping between AWS VMs as outlined in this post, you will need to disable/delete the SQL Agent copy jobs on the Primary and Secondary servers.

      Disaster Recovery

      All log shipping described here occurs within the AWS cloud. An alternative would be to ship transaction logs to a separate storage service (that does not use S3), or a completely separate cloud. At the time of this writing, this blog post by David Bermingham clearly describes many of the issues and resources associated with HA/DR in AWS.

      “Hope is not a strategy”

      HA/DR strategies require careful planning and thorough testing. In order to save money, some AWS users may be tempted to create a Secondary instance with small memory and CPU requirements, hoping to be able to resize the Secondary when failover is required. For patching, the “resize it when we need it” approach might work, but for Disaster Recovery it can be fatal. Be forewarned that Amazon does not guarantee the ability to start an instance of a specific size, in a specific availability zone/region, unless the instance is reserved. If the us-east region has just gone down, everyone with Disaster Recovery instances in other AWS regions will attempt to launch them. As a result, it is likely that some of those who are desperately trying to resize and then launch their unreserved Disaster Recovery instances in the new region will receive the dreaded “InsufficientInstanceCapacity” error message from AWS. Even in my limited testing for this blog post, I encountered this error after resizing a t1.micro instance to r2.xlarge, and attempting to start the instance (this error persisted for at least 30 minutes, but the web is full of stories of people waiting multiple hours). You could try to launch a different size EC2 instance, but there is no guarantee you will have success (more details on InstanceCapacity can be found here).

      The bottom line is that if you run a DR instance that is not reserved, at the precise moment you require more capacity it may be unavailable. That’s not the type of hassle you want when you’re in the middle of recovering from a disaster.

      I am indebted to Mike Fal (b) for reviewing this post.

      A life in love with music, Part II

      Part I of this post can be found here.

      More good luck

      In the late 1970s, a lady who lived in my apartment building told me that her boyfriend was also a jazz musician and would soon be moving into the building. I was stunned when she mentioned the name Red Rodney (his bio can be seen here). Red was a trumpet player from Philly, who had played and recorded with the all-time genius of modern music, Charlie Parker. Red had quite a colorful history and many stories abound.

      He was kind enough to let me sit in with him on several occasions.

      Clark Terry

      In late 1980 I heard from some other musicians that the great trumpeter Clark Terry (of Count Basie and Duke Ellington fame) was putting together a big band to go on the road. I obtained contact information for Clark’s manager who was handling the tour, and much to my surprise, the requirements were not purely musical. In addition to a recent recording, you had to submit a photograph.

      Hmmm.

      What’s that you say?? Hardly a “double-blind” audition? Something’s not quite right with that. You didn’t have to be Albert Einstein to figure out what was going on here. Clark Terry is African-American, and he wants to make sure that he has African-Americans in his band.

      Perhaps the photographic requirement was related to the fact that despite being a creation of African-American culture, by 1980 Jazz had largely been abandoned by young African-American listeners and players.

      I asked a friend to shoot a Polaroid (seriously dating myself, I know….) so that I could include it with the recording I planned to submit. But – before he snapped the photo, I reached back in time for the hair style I had in the mid-1970s – a mammoth, black-hole-like AFRO. We dimmed the lights, and he clicked the shutter.

      The photo and recording were sent to Clark’s manager, and I was quite surprised to receive a call to join the band. It was a nine-week tour starting in February of 1981: three weeks in Europe, six weeks in the USA.

      When we got to Europe, Clark went to lunch with some of the other band members, and told them: “I could have sworn that mofo Ned Otter was black, I picked him myself!”

      That tour included Branford Marsalis on alto saxophone (he didn’t even own a tenor saxophone yet). To say that Branford and I were outspoken in our disapproval of Clark’s not-so-unique-to-jazz version of creative financial accounting would be an understatement.

      Road Warrior

      Chris Woods was a friend of Clark’s and a great alto saxophonist, but had the unfortunate task of being the road manager for us wild young folk.

      After complaining about something, we would get the party line from Chris. One day, he ended his remarks with “And that’s all you need to know.”

      That phrase would reverberate around the bus for the next nine weeks.

      Branford and I would often recreate the events of the day, with extra helpings of outrageous mockery. It went something like this:

      Me: “Hey Branford!”

      Branford: “Yeah, man, what’s up?”

      Me: “Look man, I’ve got a gig for you –”

      Branford: “That’s great, man. Details please….”

      Me: “Well look, it’s like this – first, we parachute into Zimbabwe…”

      Branford: “Ok!”

      Me: “We drive for 10 hours, do a sound check, then we do the gig –”

      Branford: “Beautiful!”

      Me: “Then, after the gig, we drive another 10 hours (no dinner), and uh…oh yeah, I almost forgot….that’s right…we have an unscheduled TV show….I don’t know how that slipped into the schedule…fancy that! But in exchange for the unscheduled TV show (which by the way you’re not getting paid for), we’ll be covering your hotel co-pay for tomorrow night”.

      Branford: “Fantastic, man, I’m just happy to have a gig! Can’t wait!”

      Me: “And that’s all you need to know–”

      And it went downhill from there.

      Loyalty

      On the bus, there was always a clear delineation of loyalty. The booty-kissers were all up front with Clark. The in-betweens were in-between. And the trouble makers were in the back with Branford and myself. After our daily mock-a-thon, you could actually see the steam start to rise up out of Clark’s ears.

      I figured that if I was going to be exploited, there was no reason I had to be quiet about it.

      One night, Branford – who at least back then was a devious sort of fellow – switched the valves on Clark’s trumpet in between sets. But Clark was such a great trumpet player, he somehow managed to keep playing (I’m sure it required an effort worthy of Hercules).

      Another time, Branford and I conspired to play a trick on the vocalist in the band. She was featured on “A Tisket, A Tasket”, made famous by Ella Fitzgerald, and after she sang the opening melody, it was Branford’s turn to solo. But we decided to change things up a bit. Branford stood up to play, and sort of mimed as if he was playing a solo, but I had the microphone passed down my way. The sounds that emanated from my horn would have made Albert Ayler sound like Jelly Roll Morton. The vocalist looked back in horror as Branford tried to keep from falling over with laughter.

      Ahh..the Baptism of the Road.

      Dizzy Gillespie

      In 1988, I got word that Dizzy Gillespie was organizing a big band tour. One of the saxophonists, who had done the same tour in 1987 and was slated to do it again in 1988, had an opportunity to join a different ensemble that would give him more solo space. This created an opening in Dizzy’s band, and I sent a package to the musical director.

      That guy threw my package into the large and ever-growing pile of packages that he had already received, never opening it. Rather than listening to them, he simply called George Coleman for a recommendation. George mentioned my name, and the guy said, “Yeah, I’ve got a package here from Ned Otter”. George suggested that he listen to what I’d sent, and if he liked what he heard, to give me a call.

      I was very fortunate to be able to play with Dizzy Gillespie on that tour in 1988. We played Carnegie Hall in New York, Albert Hall in London, massive amphitheatres all throughout Europe, and even went to Istanbul. I am greatly indebted to George for referring me.

      in Europe with Dizzy Gillespie, July 1988

      (somber faces due to rain delay….)

      Further studies

      George Coleman mostly performed with a quartet/quintet, but in the early 1970s started an octet. He wrote a lot of the arrangements for this ensemble, but there were contributions by other great musicians as well. In 1996 I produced a recording of George’s octet, and got bitten by the arranging bug myself. My first effort was an arrangement of “Tenderly” – it took six weeks, day and night trying to get it together.

      I pored over the existing arrangements in George’s octet book. Among others, there were contributions by Harold Vick, Frank Foster, Frank Strozier, George Coleman, Harold Mabern and Bill Lee (father of renowned filmmaker Spike Lee).

      Bill Lee’s offerings were unique – they had a quality that was different than any of the others. I had met Bill years earlier at his home in Brooklyn, when George and I passed through one time.

      And so I thought – why not contact Bill Lee for some lessons on arranging and composition? Beginning in 2000 I studied with Bill as often as possible for about a year, and it revolutionized my approach to music. I can say without reservation that Bill Lee is one of the greatest musicians that I have been fortunate enough to be around.

      Coda

      More than 50 years have passed since there has been a period of great jazz innovation.

      Some say that it’s due to the (dis)-integration of the African-American community, that the melting pot of an essentially closed community gives birth to these types of culturally significant seismic shifts.

      I’m not sure what’s at the root of it, but I can’t help but think that the chart on this page has a lot to do with it:

      The USA is at the top of the television viewing hierarchy, weighing in at a whopping 293 minutes per person on a daily basis. That’s almost five hours daily, or roughly thirty-four hours weekly.

      The jazz clubs that were prevalent in all urban environments have all but disappeared. Even New York City – the supposed Mecca of Jazz – has but a handful of clubs left, and a significant portion of those are tourist traps.

      Jazz and classical music suffer from the same type of issue: lack of exposure for new audiences. To paraphrase the great pianist Barry Harris: “people don’t like jazz but never really heard it…” I think exposure to jazz would have a positive effect on a significant percentage of young people.

      Thanks for reading –

      Ned Otter

      New York City, 2014

      A life in love with music, Part I

      Music is my first love – it is an unparalleled force of auditory seduction. Without equal in its ability to bridge cultural, political and geographic divides, it is a union of the unspoken, lyrical and rhythmic aspects of sonic vibration.

      No disrespect intended to any lyricists, but lyrics by their nature are limited to what humans can express in words. Perhaps it’s because I am an instrumentalist, I relate directly to the most powerful musical force: the melody.

      Many have transcribed and studied the musical notes that great jazz musicians play. However, throughout those classic performances, many other facets of music wash over the listener, which in totality have a powerful effect on the perceived musical experience. A majority of those facets cannot be embodied by any known means of symbolization.

      I am so grateful that I have alternate means of economic survival outside of music (see my technology post here). This has allowed me to be true to my music, and avoid the “music as a job” approach that a lot of musicians have to deal with in order to simply survive.

      Musical ancestry

      While attending school during his early years, my father had been a member of the “color guard”. He was one of the few who was chosen to carry a flag during special activities, and the role was coveted. But there was a problem. Someone – they weren’t sure who – was throwing the entire vocal ensemble off key. They tracked it down to my father, and told him that if he wanted to continue carrying the flag, he would have to stop singing. He had to mouth the words to the national anthem from that day forward.

      Bob Otter was absolutely, unequivocally, one hundred percent tone-deaf (luckily he pursued the visual realm, documented here).

      Next generation

      My brother Sam was the first saxophonist I ever saw perform.

      In 1970, he played the Paul Desmond classic “Take Five” with the concert band at I.S. 70 (Intermediate School, for those of you not familiar with public school abbreviations).

      It was as if I was struck by lightning – I decided at that moment to become a professional musician, despite the fact that (other than kid stuff on a piano) I had never touched a musical instrument. I had no idea if I had any aptitude for things musical.

      First mentor

      I followed in my brother’s footsteps and attended I.S. 70. It was a fairly new school, and had a large group of young and dedicated teachers. One of them was a tireless motivator, a ceaseless source of musical inspiration: band director Jerry Sheik taught generations of us young folks how to play and appreciate music.

      Sheik was not your typical middle school band director. He was a professional musician – a drummer – and had personal relationships with many great musicians of the day, among them Tito Puente. Sheik was my first musical mentor, and as such, he occupies a special place in my musical lineage.

      In my last year at I.S. 70, Sheik selected a few of us to be members of his “sign-out” crew. Young students who could not afford to purchase musical instruments could take instruments home from Sheik’s band room, and it was our job to keep track of it all. As members of Sheik’s sign-out crew, we had more access to him than the other kids. After regular school hours, he sometimes played records in the band room, introducing us to new music. One day he was spinning a record that included Cannonball Adderley’s performance of “The Song Is You”, and I asked him who the record belonged to. Seeing how captivated I was by the music, Sheik replied: “You!”, and insisted that I keep it (it is a part of my record collection to this day).

      Five days a week, three years running, Sheik was part of my world. While in his care, I developed a deep and infinite love of all things musical, particularly jazz. As Sheik’s musical palette was extremely diverse, we played all types of music. Perhaps because it was also the height of the disco era, I developed a permanent dislike of music of a flippant and/or purely commercial nature.

      Jerry Sheik, Musical Mentor Extraordinaire, 1974

      photo by Robert Otter

      Me – One Funky White Boy, circa 1973

      My brother Sam Otter died of shock after taking this photo

      Not a natural

      I would imagine that my entry into instrumental music was not dissimilar to others. Learning to read and notate music was fun and exciting, but learning to play a musical instrument was slow and frustrating. My ears were way ahead of what I could execute, and at times, my parents would beg me to stop practicing. The endless repetition, the going-nowhere-no-matter-how-many-times-you-tried-to-push-forward – it all added up to a glacially slow and often tortuous path towards improvement (I will confess that a few years later, still struggling to develop technique on the alto saxophone – many, many times did I have the window to my 6th floor apartment open, seriously contemplating whether or not to throw my horn out like a Frisbee…).

      Between the ages of eleven and thirteen, the main focus of my life was to advance my musical skills to a point where I might gain entry to “Sheik’s Freaks”, as the I.S. 70 Stage Band was known.

      Sheik had a friend named Jay Dryer who coached me for my audition to the High School of Performing Arts (known by those who attended simply as PA). I was accepted to PA, which was the school that the movie “Fame” was based on (and no, we did not dance on the cars at lunch time….). Attending PA exposed me to a higher level of musicality than I had been accustomed to. We had sight-singing for an entire year, and that class radically altered the way I heard, recognized and identified different notes.

      Students came from all over NYC to attend PA, some from as far away as Staten Island. At the end of my junior year, my friends started talking about a young saxophonist they knew who would be arriving at PA the following year. I got so sick of hearing about this guy, I couldn’t stand it anymore. He was only fourteen years old – how great could he be?

      I literally could not believe my ears when I heard the object of their praise – a brilliant young musician named Drew Francis. He had it all – perseverance coupled with improvisation, composition and arranging skills (and just to add insult to injury – he also had perfect pitch). In addition to saxophone (soprano, alto and tenor), he was an excellent flute and clarinet player. At just fourteen years of age, Drew Francis was light years ahead of anything I could possibly wrap my brain around at that time.

      Drew used to make recordings in the basement of his Staten Island house, and one of them included a mutual friend named Dan Weiss, who kindly supplied me with this recording of Drew. It was made when Drew was still a teenager, probably about seventeen.

      Sadly, the brilliant light of Drew Francis did not shine long. He passed away at just 39 years old, never having realized a fraction of his great potential.

      Drew Francis (left), Randy Andos (right)

      (photograph by David Rothschild, used by permission)

      Second mentor

      One crucial development that arose out of attending PA was meeting tenor saxophonist Jeff Gordon. Jeff had a younger brother who attended PA, and he urged me to study with Jeff. By this time I could read and notate music well, understood the basics of harmony, and was a good instrumentalist, given the relatively few years I had been playing the alto saxophone. However, I knew nothing of improvisation. I would go to jam sessions and play transcriptions of other musicians’ solos. After my performance of the transcribed solo ended, I was not able to contribute anything of my own.

      With regard to studying with Jeff Gordon, I wanted to “try before I buy”, and so in 1976 I attended a concert where Jeff played as part of a larger ensemble. “Blown away” would be an apt description of my reaction. At twenty-two years of age, Jeff was a young lion, bursting with musical feeling. He had everything I coveted.

      While continuing to play alto at PA, at seventeen I acquired a tenor, and began my studies with Jeff. After about a year, Jeff informed me that my studies with him were complete. He said that I needed to seek out musicians who could take me to the next level, and the two names he mentioned were Frank Foster and George Coleman. I had heard a little bit of George Coleman on Miles Davis’ classic “Four and More” album, and Jeff had also played for me George’s great solo on “Have You Met Miss Jones” from a Chet Baker album.

      Sad as I was to move away from the musical sphere of Jeff Gordon, I picked up the phone and called Frank Foster to see if he would take me on. Frank said he was too busy, and was not accepting students at that time.

      Right place, right time

      Right around this time, I received a phone call from an old friend that I had known at I.S. 70, Josiah Weiner, whose father had a truly unique talent – he could fix any type of art work. We’re talking about art objects that reside in museums and personal collections. As such, the elder Mr. Weiner knew many, many people in the art world, and one of them was Merton Simpson. Mert was one of the foremost dealers of primitive art in the world, and also a tenor saxophonist and jazz fan.

      Josiah was calling to tell me that Mert was throwing a party at his gallery at 80th and Madison, and that there would be jazz musicians performing. Knowing of my interest in Jazz, Josiah asked Mert if I could come up and play. Mert agreed.

      Mert Simpson

      photographer unknown

      I arrived at the gallery and listened for the first set. Then the musicians asked me to come up and sit in. There was a saxophonist, trumpeter, and a rhythm section. Muscles bursting everywhere, the saxophonist looked like a football player – think “Iron Man”. I remember that I only knew two of the songs they played: “How High The Moon” and “Body and Soul”.

      After the set, I walked up to the saxophonist and thanked him for letting me play. I asked him his name, and he replied:

      “George – George Coleman…” (this was about two weeks after Jeff Gordon suggested that I study with him).

      I picked my jaw up off the floor, ran over to Josiah, and called him every kind of curse word that I could think of for not telling me that I was going to be sitting in with the great George Coleman. Josiah’s response: “Who is George Coleman????”

      And so began the longest and most profound musical and personal relationship of my life. I studied with George monthly for about five years. He performed quite often at that time, and I recorded his live performances (still have my cassettes!). At my lessons we would play the recordings back, and I’d ask him about specific things that I didn’t understand.

      Many, many pearls of wisdom were imparted at these lessons. But the most precious gift I received was that he taught me how to teach myself. Without fail, all of the musicians that I have known from his generation would never tell you how to play. They might show you an example of one way to do something, and then ask you to continue it.

      George Coleman, NYC c. 1980

      photographer unknown

      University of the Streets

      A critical part of my musical education was the time spent around George Coleman and his brilliant band members in between sets, in the back rooms of jazz clubs in New York City. This accumulated hang time, combined with the band stand time that he generously granted me, made all the difference in the world in my musical development. I sat in often, and therefore got to play with and know many of the great musicians that George was associated with. A short list would have to include:

      Jamil Nasser, Harold Mabern, Hilton Ruiz, Mario Rivera, Ray Drummond, Billy Higgins, Danny Moore, Al Foster, Walter Bolden, Ahmad Jamal, Junior Cook, Frank Strozier, Harold Vick, Philly Joe Jones, Billy Hart, and many, many others.

      I was trespassing in rarefied air and I knew it.

      An unexpected phone call from an old friend had positioned me in the exact time and place to encounter one of the all-time great saxophone stylists. George Coleman became my most influential musical mentor, and foster-father. His effect on my musical development is undeniable.

      There will be another section to this post.

      Thanks for reading –

      Ned Otter

      New York City, 2014