Category Archives: DBA

Backup and Recovery for SQL Server databases that contain durable memory-optimized data

With regard to backup and recovery, databases that contain durable memory-optimized tables are treated differently than databases that contain only disk-based tables. DBAs must be aware of the differences so that they don't mistakenly affect production environments and impact SLAs.

The following image describes files/filegroups for databases that contain durable memory-optimized data:

[Image: files and filegroups for a database that contains durable memory-optimized data]

Data/delta files are required so that memory-optimized tables can be durable, and they reside in containers, which are a special type of folder. Containers can reside on different drives (more about why you'd want to do that in a bit).

Database recovery occurs due to the following events:

  • Database RESTORE
  • Database OFFLINE/ONLINE
  • Restart of SQL Server service
  • Server boot
  • Failover, including
      • FCI
      • Availability Groups*
      • Log Shipping
      • Database mirroring

The first thing to be aware of is that having durable memory-optimized data in a database can affect your Recovery Time Objective (RTO).

Why?

Because for each of the recovery events listed above, SQL Server must stream data from the data/delta files into memory as part of recovery.

There's no getting around the fact that if you have lots of durable memory-optimized data, even if you have multiple containers on different volumes, recovery can take a while. That's especially true in SQL 2016, because Microsoft has raised the limit on the amount of memory-optimized data per database from 256GB to multiple TB (yes, terabytes, limited only by the OS). Imagine waiting for multiple terabytes of data to stream into memory, and how that will impact your SLAs (while SQL Server streams data into memory, you'll see a wait type of WAIT_XTP_RECOVERY).
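If you want to confirm that a recovery is spending its time in this streaming phase, a quick check along these lines should do it (a sketch: the wait shows up cumulatively in sys.dm_os_wait_stats, and per session in sys.dm_exec_requests while a RESTORE is running):

  -- cumulative streaming-related waits since the last restart of the instance
  SELECT wait_type, waiting_tasks_count, wait_time_ms
  FROM sys.dm_os_wait_stats
  WHERE wait_type = N'WAIT_XTP_RECOVERY';

  -- sessions currently waiting on the streaming phase (run while a RESTORE is in progress)
  SELECT session_id, command, wait_type, wait_time
  FROM sys.dm_exec_requests
  WHERE wait_type = N'WAIT_XTP_RECOVERY';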

*One exception to the impact that failover can have is when you use Availability Groups with a Secondary replica. In that specific scenario, the REDO process keeps memory-optimized tables up to date in memory on the Secondary, which greatly reduces failover time.

Indexes for memory-optimized tables have no physical representation on disk. That means they must be created as part of database recovery, further extending the recovery timeline.

CPU bound recovery

The recovery process for memory-optimized data uses one thread per logical CPU, and each thread handles a set of data/delta files. That means that simply restoring a database can cause the server to be CPU bound, potentially affecting other databases on the server.

During recovery, SQL Server workloads can be affected by increased CPU utilization due to:

  • low bucket count for hash indexes – this can lead to excessive collisions, causing inserts to be slower (see the query after this list)
  • nonclustered indexes – unlike static HASH indexes, the size of nonclustered indexes will grow as the data grows. This could be an issue when SQL Server must create those indexes upon recovery.
  • LOB columns – new in SQL 2016, SQL Server maintains a separate internal table for each LOB column. LOB usage is exposed through the sys.memory_optimized_tables_internal_attributes and sys.dm_db_xtp_memory_consumers views. LOB-related documentation for these views has not yet been released.
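Regarding the first bullet above: to spot hash indexes whose bucket counts are too low before they slow down a recovery, a query along these lines should help (a sketch; a low number of empty buckets combined with long average chain lengths is the warning sign):

  SELECT OBJECT_NAME(hs.object_id) AS table_name,
         i.name                    AS index_name,
         hs.total_bucket_count,
         hs.empty_bucket_count,
         hs.avg_chain_length,
         hs.max_chain_length
  FROM sys.dm_db_xtp_hash_index_stats AS hs
  INNER JOIN sys.indexes AS i
          ON i.object_id = hs.object_id
         AND i.index_id  = hs.index_id;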

You can see from the following output that SQL 2016 does indeed create a separate internal table per LOB column. The Items_nvarchar table has a single NVARCHAR(MAX) column. It will take additional time during the recovery phase to recreate these internal per-column tables.

[Image: query output showing a separate internal table for the NVARCHAR(MAX) column of the Items_nvarchar table]
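A query along these lines should reproduce that output (a sketch; the join columns follow the SQL 2016 DMVs mentioned above, and Items_nvarchar is the test table described in the previous paragraph):

  SELECT OBJECT_NAME(a.object_id) AS table_name,
         a.type_desc,
         a.minor_id,              -- column id of the LOB column for internal tables
         c.allocated_bytes,
         c.used_bytes
  FROM sys.memory_optimized_tables_internal_attributes AS a
  INNER JOIN sys.dm_db_xtp_memory_consumers AS c
          ON c.object_id     = a.object_id
         AND c.xtp_object_id = a.xtp_object_id
  WHERE a.object_id = OBJECT_ID(N'dbo.Items_nvarchar');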

Corruption

Because they have no physical representation on disk (other than the data/delta files used for durability, if you so choose), memory-optimized tables are completely ignored by both CHECKDB and CHECKTABLE. There is no allocation verification, or any of the myriad other benefits that come from running CHECKDB/CHECKTABLE on disk-based tables. So what is done to verify that everything is ok with your memory-optimized data?

CHECKSUM of data/delta files

When a block is written to a data/delta file, a CHECKSUM for the block is calculated and stored with the block. During database backup, the CHECKSUM is recalculated and compared to the value stored with the block. If the comparison fails, the backup fails (no backup file gets created).

Restore/Recovery

If a backup file contains durable memory-optimized data, there is currently no way to interrogate that backup file to determine how much memory is required to successfully restore.

I did the following to test backup/recovery for a database that contained durable memory-optimized data:

  • Created a database with only one durable memory-optimized table
  • Generated an INSERT-only workload (no merging of data/delta files)
  • INSERTed rows until the size of the table in memory was 20GB
  • Created a full database backup
  • Executed RESTORE FILELISTONLY for that backup file (see the sketch after this list)
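For reference, the last two steps looked roughly like this (a sketch; the database name and backup path are illustrative):

  BACKUP DATABASE InMemDB
  TO DISK = N'C:\Backups\InMemDB_Full.bak'
  WITH INIT;

  RESTORE FILELISTONLY
  FROM DISK = N'C:\Backups\InMemDB_Full.bak';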

The following are the relevant columns from the FILELISTONLY output. Note the last row, the one that references the memory-optimized filegroup:

[Image: relevant columns of the RESTORE FILELISTONLY output; the last row shows the memory-optimized filegroup]

There are several things to be aware of here:

  • The size of the memory-optimized data in the backup is 10GB larger than memory allocated for the table (the combined size of the data/delta files is 30GB, hence the extra 10GB)
  • The Type for the memory-optimized filegroup is ‘S’. Within backup files, Filestream, FileTable and In-Memory OLTP all have the same value for Type, which means that database backups that contain two or more types of streaming data don’t have a way to differentiate resource requirements for restoring. A reasonable naming convention should help with that.
  • It is not possible to determine how much memory is required to restore this database. Usually the amount of memory required is about the same as the size of the data/delta storage footprint, but in this case the storage footprint was 50% larger than the in-memory size of the table, perhaps due to file pre-creation. There should be a fix in SQL 2016 RC0 to reduce the size of pre-created data/delta files for initial data load. However, this does not help with determining memory requirements for a successful restore.

Now let’s have a look at a slightly different scenario — imagine that you have a 1TB backup file, and that you are tasked with restoring it to a development server. The backup file is comprised of the following:

  • 900GB disk-based data
  • 100GB memory-optimized data

The restore process will create all of the files that must reside on disk, including files for disk-based data (mdf/ndf/ldf) and files for durable memory-optimized data (data/delta files). The general steps that the restore process performs are:

  • Create files to hold disk-based data (size = 900GB, so this can take quite a while)
  • Create files for durable memory-optimized data (size = 100GB)
  • After all files are created, 100GB of durable memory-optimized data must be streamed from the data files into memory

But what if the server you are restoring to only has 64GB of memory for the entire SQL Server instance? In that case, the process of streaming data to memory will fail when there is no more memory available to stream data. Wouldn’t it have been great to know that before you wasted precious time creating 1TB worth of files on disk?

When you ask SQL Server to restore a database, it determines if there is enough free space to create the required files from the backup, and if there isn’t enough free space, the restore fails immediately. If you think that Microsoft should treat databases containing memory-optimized data the same way (fail immediately if there is not enough memory to restore), please vote for this Azure UserVoice item.
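Until such a check exists, a manual sanity check before starting a large restore might look something like the following (a sketch; the backup path is illustrative, and as shown earlier, the size reported for the memory-optimized filegroup can overstate the memory actually required):

  -- size of each file in the backup; Type = 'S' includes the memory-optimized filegroup
  RESTORE FILELISTONLY
  FROM DISK = N'C:\Backups\BigDatabase_Full.bak';

  -- physical memory on the target server
  SELECT total_physical_memory_kb / 1048576 AS total_physical_memory_gb,
         available_physical_memory_kb / 1048576 AS available_physical_memory_gb
  FROM sys.dm_os_sys_memory;

  -- memory ceiling for this instance
  SELECT name, value_in_use
  FROM sys.configurations
  WHERE name = N'max server memory (MB)';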

SQL Server log shipping within the AWS Cloud

Much of what you see in the blogosphere pertaining to log shipping and AWS references an on-premises server as part of the topology. I searched far and wide for any information about how to set up log shipping between AWS VMs, but found very little. However, I have a client that does business solely within AWS and needed a solution for HA/DR that did not include on-premises servers.

Due to network latency issues and disaster recovery requirements (the log shipping Secondary server must reside in a separate AWS region), it was decided to have the Primary server push transaction log backups to S3, and the Secondary server pull them from S3. On the Primary, log shipping occurs as usual, backing up to a local share, with a separate SQL Agent job responsible for copying the transaction log backups to S3. Amazon provides the required Powershell functionality in AWS Tools for Windows Powershell, which can be downloaded here. One could argue that Amazon RDS might solve some of the HA/DR issues that this client faced, but it was deemed too restrictive.

[Image: log shipping topology; the Primary backs up to a local share and pushes transaction log backups to S3, and the Secondary in a separate region pulls them from S3]

S3 quirks

When files are written to S3, the date and time that the file was last modified is not retained. That means that when the Secondary server polls S3 for files to copy, it cannot rely on the date/time from S3. It is also not possible to set the LastModified value on S3 files. Instead, a list of S3 file names must be generated and compared to the files that reside on the Secondary. If an S3 file does not reside locally, it must be copied.

Credentials – AWS Authentication

AWS supports different methods of authentication:

  1. IAM roles (details here)
  2. profiles (details here)

From an administrative perspective, I don't have and don't want access to the client's AWS administrative console. Additionally, I needed a solution that I could easily test and modify without involving the client. For this reason, I chose an authentication solution based on AWS profiles that are stored within the Windows environment, for a specific Windows account (in case you're wondering, the profiles are encrypted).

Windows setup

  • create a Windows user named SQLAgentCmdProxy
  • create a password for the SQLAgentCmdProxy account (you will need this later)

The SQLAgentCmdProxy Windows account will be used as a proxy for SQL Agent job steps, which will execute Powershell scripts. (NOTE: if you change the drive letters and/or folder names, you will need to update the scripts in this post)

From a cmd prompt, execute the following:
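A minimal sketch of that command, assuming a local (non-domain) account; the password shown is only a placeholder, so substitute the one you created above:

  net user SQLAgentCmdProxy S0meStr0ngP@ssw0rd /add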

Powershell setup

(The scripts in this blog post should be run on the Secondary log shipping server, but with very little effort, they can be modified to run on the Primary and push transaction log backups to S3.)

The following scripts assume you already have an S3 bucket that contains one or more transaction log files that you want to copy to the Secondary server (they must have the extension “trn”, otherwise you will need to change -Match “trn” in the script below). Change the bucket name to match your bucket, and if required, also change the name of the region. Depending on the security configuration for your server, you may also need to execute “Set-ExecutionPolicy RemoteSigned” in a Powershell prompt as a Windows Administrator, prior to executing any Powershell scripts.

After installing AWS Tools for Windows Powershell, create a new Powershell script with the following commands:
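A minimal sketch of what Setup.ps1 might contain, assuming the profile-based approach described above; the profile name and region are illustrative, and the AccessKey/SecretKey values are placeholders:

  # load the AWS cmdlets
  Import-Module AWSPowerShell

  # store an encrypted credential profile for the Windows account this script runs under
  Set-AWSCredentials -AccessKey "YourAccessKey" -SecretKey "YourSecretKey" -StoreAs "LogShipping"

  # make that profile and a region the default for subsequent AWS cmdlets
  Initialize-AWSDefaults -ProfileName "LogShipping" -Region "us-west-2"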

Be sure to fill in your AccessKey and SecretKey values in the script above, then save the script as C:\Powershell\Setup.ps1. When this script is executed by the SQL Agent job step, it will establish the AWS environment for the proxy account that the step runs under.

The next step is to create a new Powershell script with the following commands:
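A sketch of what CopyS3TRNToLocal.ps1 might look like; the bucket name, region, and local folder are illustrative, and it assumes the backups sit at the root of the bucket and follow the standard log shipping naming convention (DatabaseName_yyyymmddhhmmss.trn, UTC):

  Import-Module AWSPowerShell

  $bucket    = "my-logshipping-bucket"   # illustrative bucket name
  $region    = "us-west-2"               # illustrative region
  $localPath = "C:\Backups\logs"

  # transaction log backups currently in the bucket
  $s3Files = Get-S3Object -BucketName $bucket -Region $region | Where-Object { $_.Key -Match "trn" }

  # names of the files that already exist locally
  $localFiles = Get-ChildItem -Path $localPath -Filter *.trn | Select-Object -ExpandProperty Name

  foreach ($s3File in $s3Files)
  {
      if ($localFiles -notcontains $s3File.Key)
      {
          $target = Join-Path $localPath $s3File.Key

          # copy the file from S3 to the Secondary
          Read-S3Object -BucketName $bucket -Region $region -Key $s3File.Key -File $target | Out-Null

          # S3 does not preserve LastModified, so rebuild it from the UTC timestamp in the file name
          if ($s3File.Key -match "_(\d{14})\.trn$")
          {
              $utc = [datetime]::ParseExact($matches[1], "yyyyMMddHHmmss", $null)
              (Get-Item $target).LastWriteTimeUtc = $utc
          }
      }
  }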

Again, you should substitute your own bucket and region names in the script above. Note that after the files are copied to the Secondary, the last modified time is updated based on the file name (log shipping uses the UTC format when naming transaction log backups). Save the Powershell script as C:\Powershell\CopyS3TRNToLocal.ps1.

SQL Server setup

  • create a login for the SQLAgentCmdProxy Windows account (for our purposes, we will make this account a member of the sysadmin role, but you should not do that in your production environment)
  • create a credential named TlogCopyFromS3Credential, mapped to SQLAgentCmdProxy (you will need the password for SQLAgentCmdProxy in order to accomplish this)
  • create a SQL Agent job
  • create a job step, Type: Operating System (CmdExec), Runas: TlogCopyFromS3Credential

Script for the above steps:
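A sketch of that script, assuming a local Windows account and the names used above; replace <DomainName> with your domain or local server name, supply the password you created for SQLAgentCmdProxy, and note that the job and step names are illustrative (and that sysadmin membership is for this demo only):

  USE [master];
  GO

  -- login for the Windows proxy account
  CREATE LOGIN [<DomainName>\SQLAgentCmdProxy] FROM WINDOWS;

  -- demo only: do not grant sysadmin in production
  ALTER SERVER ROLE [sysadmin] ADD MEMBER [<DomainName>\SQLAgentCmdProxy];

  -- credential mapped to the Windows account (requires the account's password)
  CREATE CREDENTIAL TlogCopyFromS3Credential
  WITH IDENTITY = N'<DomainName>\SQLAgentCmdProxy',
       SECRET = N'<password for SQLAgentCmdProxy>';
  GO

  USE [msdb];
  GO

  -- proxy that CmdExec job steps can run as
  EXEC dbo.sp_add_proxy
       @proxy_name = N'TlogCopyFromS3Credential',
       @credential_name = N'TlogCopyFromS3Credential',
       @enabled = 1;

  -- subsystem 3 = Operating System (CmdExec)
  EXEC dbo.sp_grant_proxy_to_subsystem
       @proxy_name = N'TlogCopyFromS3Credential',
       @subsystem_id = 3;

  -- job with a single CmdExec step that runs the Powershell script
  EXEC dbo.sp_add_job @job_name = N'CopyS3TRNToLocal';

  EXEC dbo.sp_add_jobstep
       @job_name = N'CopyS3TRNToLocal',
       @step_name = N'Run Powershell script',
       @subsystem = N'CmdExec',
       @command = N'powershell.exe -ExecutionPolicy RemoteSigned -File C:\Powershell\Setup.ps1',
       @proxy_name = N'TlogCopyFromS3Credential';

  EXEC dbo.sp_add_jobserver @job_name = N'CopyS3TRNToLocal';
  GO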

  • Change references to <DomainName> to be your domain or local server name, and save the script
  • Execute the job
  • Open the job and navigate to the job step. In the Command window, change the name of the Powershell script from Setup.ps1 to CopyS3TRNToLocal.ps1
  • Execute the job
  • Verify the contents of the C:\Backups\logs folder – you should now see the file(s) from your S3 bucket

Troubleshooting credentials

If you see errors for the job that resemble this:

InitializeDefaultsCmdletGet-S3Object : No credentials specified or obtained from persisted/shell defaults.

then recheck the AccessKey and SecretKey values that you used in the Setup.ps1 script. If you find errors in either of those keys, you'll need to rerun Setup.ps1 (change the name of the file to be executed in the SQL Agent job, and re-run the job). If you don't find any errors in the AccessKey or SecretKey values, you might have luck creating the AWS profile for the proxy account manually (my results with this approach have been mixed). Since profiles are specific to a Windows user, we can use runas /user:SQLAgentCmdProxy powershell_ise.exe to launch the Powershell ISE, and then execute the code from Setup.ps1.

You can verify that the Powershell environment uses the SQL proxy account by temporarily adding $env:USERNAME to the script.

S3 Maintenance

When you set up log shipping on the Primary or Secondary, you can specify the retention period, but S3 file maintenance needs to be a bit more hands-on. The following script handles purging local and S3 files with the extension "trn" that are more than 30 days old, based on the UTC timestamp in the file name.
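A sketch of such a maintenance script, using the same illustrative bucket, region, and local folder as before, with a 30-day retention window; the Get-TrnTimestamp helper is just an illustration of parsing the UTC timestamp out of the file name:

  Import-Module AWSPowerShell

  $bucket    = "my-logshipping-bucket"   # illustrative bucket name
  $region    = "us-west-2"               # illustrative region
  $localPath = "C:\Backups\logs"
  $cutoffUtc = (Get-Date).ToUniversalTime().AddDays(-30)

  # extract the UTC timestamp embedded in a log shipping backup name (DatabaseName_yyyymmddhhmmss.trn)
  function Get-TrnTimestamp([string] $name)
  {
      if ($name -match "_(\d{14})\.trn$")
      {
          return [datetime]::ParseExact($matches[1], "yyyyMMddHHmmss", $null)
      }
      return $null
  }

  # purge old local transaction log backups
  Get-ChildItem -Path $localPath -Filter *.trn |
      Where-Object { $ts = Get-TrnTimestamp $_.Name; $ts -ne $null -and $ts -lt $cutoffUtc } |
      Remove-Item

  # purge old transaction log backups from S3
  Get-S3Object -BucketName $bucket -Region $region |
      Where-Object { $_.Key -Match "trn" } |
      Where-Object { $ts = Get-TrnTimestamp $_.Key; $ts -ne $null -and $ts -lt $cutoffUtc } |
      ForEach-Object { Remove-S3Object -BucketName $bucket -Region $region -Key $_.Key -Force }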

Save the script, and create a SQL Agent job to execute it. You’ll also have to reference the proxy account as in the prior SQL Agent job.

Don’t forget

If you use log shipping between AWS VMs as outlined in this post, you will need to disable/delete the SQL Agent copy jobs on the Primary and Secondary servers.

Disaster Recovery

All log shipping described here occurs within the AWS cloud. An alternative would be to ship transaction logs to a separate storage service (that does not use S3), or a completely separate cloud. At the time of this writing, this blog post by David Bermingham clearly describes many of the issues and resources associated with HA/DR in AWS.

“Hope is not a strategy”

HA/DR strategies require careful planning and thorough testing. In order to save money, some AWS users may be tempted to create a Secondary instance with small memory and CPU requirements, hoping to be able to resize the Secondary when failover is required. For patching, the "resize it when we need it" approach might work, but for Disaster Recovery it can be fatal. Be forewarned that Amazon does not guarantee the ability to start an instance of a specific size, in a specific availability zone/region, unless the instance is reserved. If the us-east region has just gone down, everyone with Disaster Recovery instances in other AWS regions will attempt to launch them. As a result, it is likely that some of those who are desperately trying to resize and then launch their unreserved Disaster Recovery instances in the new region will receive the dreaded "InsufficientInstanceCapacity" error message from AWS. Even in my limited testing for this blog post, I encountered this error after resizing a t1.micro instance to r2.xlarge and attempting to start the instance (the error persisted for at least 30 minutes, but the web is full of stories of people waiting multiple hours). You could try to launch a different size EC2 instance, but there is no guarantee you will have success (more details on InstanceCapacity can be found here).

The bottom line is that if you run a DR instance that is not reserved, at the precise moment you require more capacity it may be unavailable. That’s not the type of hassle you want when you’re in the middle of recovering from a disaster.

I am indebted to Mike Fal (b) for reviewing this post.

The Road to Technology

A Tale of Perseverance

Initial resistance

During the mid-1980s, as personal computer technology started to gain acceptance in the work place, I was steadfastly against learning anything about it. I had various types of jobs, including croupier, piano tuner, trash man in my apartment building and foot messenger.

By 1988, however, I had somewhat relented. Based on my newly discovered interest in genealogy, my birthday present that year was a DOS software package called “Roots III” that arrived on 5-1/4 inch floppy disks (seriously dating myself, I know). As I struggled to learn the difference between a path and a folder, technology began to win me over. Computers were awesomely cool, and my inner-gadget-guy came alive.

In April of 1988 the phone rang (yes, they used to have bells and literally “ring” when you received a call) with an offer to go on the road with Dizzy Gillespie. Despite my mother being very, very ill at the time, I agreed to hit the road for a tour of the USA and Europe, for a total of three weeks. We played Carnegie Hall, which was a real thrill, and all the major jazz festivals of Europe. Dizzy was about 72 at the time, and other than in 1987, had not worked with a full big band in many, many years.

[Photo: in Europe with Dizzy Gillespie, July 1988]

I stayed in Europe after the tour with Diz ended, and returned to NYC in late October of 1988. Having always been too stubborn to play any music I didn't feel passionate about, I considered learning word processing to fill in the gaps. I had a friend at the time who did this type of work, and who agreed to let me spend time on his IBM "clone".

In order to get a temp job doing word processing, you had to type at least 50 words a minute, with very few mistakes. I already owned an electric typewriter, and so I bought a typing practice book. After a while my typing improved to the point where I thought I was ready to look for work.

Without fail, each and every temp agency that I applied to had a typing test, and I flunked them all. But then I found one agency which had only a computer test for Word Perfect (does anyone even use that any more?). The guy at the front desk asked me if I had ever been there before, and I replied no. I took the test, and just missed a passing grade. So I went back home, researched the parts of the test that I thought I had difficulty with, and I returned to the same agency a week later. When the guy asked me if I had ever been there before, I said no. The test was exactly the same, and you will not be shocked to learn that I passed.

I was assigned to the Asia Bureau of the United Nations Development Program, a few blocks north of the famous Secretariat building, and my rate was $14.50 per hour. While there I met James Oliver, a desktop database contractor (dBase, FoxPro) who was making the staggering sum of $45 per hour. We became friends, and I started to become more curious about what James did. I began to wonder if I could ever wrap my brain around the type of work that he was involved in.

In late 1989 I came into enough money to take an extended break from the work world, and concentrate full-time on becoming a computer programmer. I left the UNDP job, and purchased a 286 Toshiba laptop for the whopping sum of $2,500.

My goal was to become a desktop database programmer, and I blocked out 18 months to get it done. There was just one small problem:

I had no idea how to go about doing it.

The internet did not yet exist for public consumption, and there was only a single book on the specific technology that I was interested in. But there were dial-up services like Compuserve, which had many bulletin boards with specific topics. One was about FoxPro, a desktop database (pre-Microsoft purchase). It was a fantastic alternative to dBase, which was owned by then software giant Ashton-Tate.

Long is the road, and hard is the way

I am truthful when I say that I spent so many hours per day programming in FoxPro that towards the end of each day, I could no longer sit down. I took a stack of LPs (vinyl records for you young folks) from my shelves, set my laptop on the stack, and continued to program into the wee hours of the morning while standing up. Every day. Every night. Every month. I wrote programs for my sister's real estate office, my dentist, non-profits, for anyone that would let me, and I didn't get paid a cent (except by the dentist). I locked myself in my 400 square foot apartment in Greenwich Village, and vowed not to emerge until I was a good programmer. During the approximately two years I studied, I would guesstimate that I put in 10 to 15 hours per day, and got about 5 years' worth of experience.

[Image: one of my 5-1/4 inch FoxPro floppy disks]

I had started to look for programming work a little on the late side, and by the end of 1991 my money ran out. I was four months behind on my rent, and had received shut-off notices for both my electrical and telephone service. My credit card debt exceeded $18,000 (and those were 1990s dollars).

Light at the end of the tunnel

In February of 1992 I had an interview at Chemical Bank (later devoured by Chase), and the interview went well. I worked there for a year as a FoxPro programmer.

While at Chemical I got word that FoxPro programmers were in high demand at a high-profile Wall Street bank. I interviewed there and was accepted. But after a year of working without a break at Chemical, I wanted two weeks off before starting on Wall Street, and I also had to give two weeks' notice at Chemical. The rep at the agency that I was working through thought that a month was too long to wait before starting, but I insisted. On my last work day at Chemical Bank I received a phone call from the agency. They had heard from the new bank that they "no longer required the services of Ned Otter."

But I was done with Chemical, and moved on to freelance work.

A short while later, I had another interview at the same high-profile Wall Street bank, but in a different department. While in the building, I ran into the manager that interviewed me for the first position (she wished they had hired me). She asked what I was doing there, and I told her that I had another interview. She looked me dead in the eye and said: “After what they did to you, I would never set foot in this building again”. But I was determined to gain entry to the forbidden inner sanctum of Wall Street banking.

A lot of the early desktop database systems that were implemented at this bank were actually coded by traders, not programmers. They had deep analytical knowledge of their business, but their code was unreadable, uncommented, and unmaintainable. I knew I was in trouble when just such a person ushered me into a room where I took a written test, and I would not be considered for a position unless I passed this phase. This quagmire of formulae and symbols was somehow expected to be interpreted by those with perhaps vast programming experience, but zero business knowledge.

Needless to say, the entire experience was a disaster. Afterwards, they told the agency to send their best candidate. The agency said that they had already shot down their best candidate (me).

A few weeks later, I was told that another department in the same bank needed someone with my qualifications (they knew of the prior debacles, but agreed to have me interview). The staff that interviewed me weren’t immediately convinced to hire me (the agency rep half-jokingly offered them a set of free steak knives if they would give me a chance). We all finally came to an agreement that I would work there for one week, and if they didn’t like me, they didn’t have to compensate me (an outrageous proposition, but I was sure they would keep me if I could just get my foot in the door). Things went well the first week, and they decided to keep me on. The manager later asked me why I kept coming back for interviews. I told him: “Because you mofos kept saying no”.

I ain’t no accidental DBA

FoxPro was a derivative of dBase, both products using non-standard ways to access, retrieve and manipulate data. FoxPro had started to incorporate enough of the standard query language for me to consider making a shift to corporate database platforms that were based on SQL (Structured Query Language). One of those database platforms was Unmentionable-DB, and at the Wall Street bank, there were many Database Administrators (DBAs) of Unmentionable-DB on staff. I asked one of them what it was like to be a DBA.

“If hours and hours of sheer boredom, followed by moments of absolute terror sounds good to you, you’ll enjoy being a DBA.”

That intrigued me, but there were two other motivational factors:

1. I could see the end of desktop databases on the horizon

2. While faxing a time sheet to my agency, by chance I saw an incoming fax from Unmentionable-DB to the bank. It was an invoice for one of their consultants who was on site at the bank, and my jaw dropped when I saw that the daily rate was in excess of $1,200. I was stunned. That was four times what James Oliver was making at UNDP.

So I set my sights on becoming a DBA of the Unmentionable-DB platform.

In 1994 I took the Unmentionable-DB certification exams and passed, never having actually touched the product (hence the universal distrust of most certifications). I wanted to get my hands on the software to get some real-world experience, and was overjoyed to find out that Unmentionable-DB had a developer version of their database that was priced (outrageously) at $1,000. There was only one catch:

You had to pay an additional $4,000 for an annual support contract.

That’s a lot of money today, in 2013; it was a small fortune in 1994. I argued with them that I wouldn’t need support, as they had just certified me on their database platform. But they would not yield. I paid the outrageous sum and got my hands on it, but the entire episode put such a bad taste in my mouth, that I vowed to not use or touch Unmentionable-DB ever again (and have maintained that vow to this day).

Then a new player in the database market made its mark.

While at Chemical Bank in 1992, one of the guys I worked with got a hold of a new database platform called Microsoft SQL Server, and it ran on the IBM OS/2 operating system. This was a time when all software was delivered on 5-1/4 inch floppy disks, or 3-1/2 inch not-so-floppy disks (OS/2 had to be installed from approximately 20 not-so-floppies). It took forever to install, and then on the last disk, it failed. Ultimately I got it loaded, but on my puny 286 computer it ran so slowly, I lost interest completely.

Fast forward to 1995 – Microsoft had introduced its own operating system, Windows NT. I committed to learning their database platform, and have never looked back.

Ned Otter

New York City, 2013