A Problem with Storage Spaces, Failover Clustering, and Always On Availability Groups in Azure

This is quite possibly my longest blog post title, ever, however it is pretty important for anyone who is building SQL Server configurations using Azure virtual machines. If you aren’t familiar with Azure storage, the max size of an individual disk is 4 TB (and more importantly 7500 IOPs–most workloads will be better served by the 1 TB/5000 IOPs P30 disks), and to get a volume larger than the size of a single disk (and with more IOPs), the admin will use Storage Spaces to create a RAID 0 group (because Azure provides RAID in the infrastructure) and you get the sum of your storage and IOPs. This process is well documented and I’ve done it a number of times or customer workloads. I was building a new availability group for a customer last week, and the process was a little bit different, and failed on me.

Windows Clustering is Aggressive with Your Storage

Note: As I was writing this post, about four different parts of the VM creation process changed in the portal. Welcome to cloud computing boys and girls.

So in this demo–I created a VM and added two disks. I’ll also join the VM to our Active Directory domain. I haven’t added failover clustering yet. From server manager, I can see my primordial storage pool, with the two disks I’ve added.


So, next I’ll add the Failover Clustering feature and build a one node cluster. For this demo, you only need one node–the behavior is same in a single or multi-node configuration. I’ve built a cluster and you will note that there are no disks present in the cluster.


However, if I go back to storage spaces, I still see my primordial pool, but no physical disks, and now I have an error about “Incomplete Communication with Cluster”


At this point, I am unable to configure a storage pool for my SQL Server data.p5

I’ve had inconsistent results here–but in no situation have I been able to create a storage pool.


This really sucks, but the workaround is the evict the node from the cluster, and then create your storage pool as documented above. That’s kind of gross, so a better workaround is to configure your storage spaces pool, before you add your node to your Windows cluster. I did test the process of adding a new disk to an existing pool, after the server has been clustered and that process works as expected.

Root Cause

My thought (and I haven’t tried to debug this, but I have communicated with Microsoft Windows and SQL teams about it) is that clustering is being too aggressive with the available storage. I tried running some PowerShell to prevent clustering from taking the disks, but I still had the same result. I’ll update this post if I hear anything further from Microsoft.


Monitoring Availability Groups—New Tools from Solarwinds

As I mentioned in my post a couple of weeks ago, monitoring the plan cache on a readable secondary replica can be a challenge. My customer was seeing dramatically different performance, depending on whether a node was primary or secondary. As amazing as the Query Store in SQL Server 2016 is, it does not allow you to view statistics from the readable secondary. So that leaves you writing xQuery to mine the plan cache DMVs for the query information you are trying to identify.

My friends at Solarwinds (Lawyers: see disclaimer at bottom of post) introduced version 11.0 of Database Performance Analyzer (DPA, a product you may remember as Ignite) which has full support for Availability Group monitoring. As you can see in the screenshot below, DPA gives a nice overview of the status of your AG, and also lets you dig into the performance on each node.


There are a host of other features in their new releases, which you can check out some of their new hybrid features in their flagship product Orion. Amongst these features, a couple jumped out at me—there is now support for Amazon RDS and Azure SQL Database in DPA, and there is some really cool correlation data that will let your compare performance across your infrastructure. So, when you the DBA is arguing with the SAN, network, and VM teams about where the root cause of the performance problem, this tool can quickly isolate the root cause of the issue. With less fighting. These are great products, give them a look.

Disclaimer: I was not paid for this post, but I do paid work for SolarWinds on a regular basis.

Query Store and Availability Groups—Force Plan on Secondary Replicas

I’m still fighting with some challenges about inconsistent performance between a primary and secondary replica, so I’ve been waste deep in undocumented system views looking at temporary statistics. One of the things I thought about doing was talking advantage of the Force Plan option in the Query Store in SQL Server 2016.  If you are not familiar with this feature, it allows you to force a “preferred” execution plan. In this scenario, our query was running in about 20-30 seconds on the primary, and 20-30 minutes on the secondary. The plans were reasonably close, but I wanted to see what would happen if I forced a plan on the primary.

Primer about the Query Store and Availability Groups

Since readable secondary replicas are read-only, the query store on those secondary replicas are also read-only. This means runtime statistics for queries executed on those replicas are not recorded into the query store. All the stats there are from the primary replica. However, I wasn’t sure what would happen if I forced a plan on the primary—would the secondary replica honor that plan?

Let’s Find Out

The first thing I did was to query the query store catalog views to verify that the plan was forced.


I have to copies of the forced plan. If I run an estimated query plan on the primary, I see that plan is forced. You can see this by looked for UsePlan in the XML of the plan.


I did the same thing on the secondary (in the case of the secondary, we are looking at the actual plan, but it doesn’t matter).


You will note that there is no UsePlan. There are extended events and a catalog view that reflect plan forcing failure (Grant Fritchey wrote about this behavior here), While, I wouldn’t expect the catalog view to get updated, I was hoping that the Extended Event might fire. It did not.


The query store, as awesome as it is, doesn’t really do much for you on readable secondary replica. It does not force plans, nor does it record any of your data.

Thanks to Grant Fritchey and Erin Stellato for helping with this post!

SQL Server on Linux Clustering—A Few Other Notes

So I was chatting with fellow MVP Allan Hirt (b|t) about the cluster build that I wrote about yesterday, and I had a few more realizations about the Linux HA process as it stands right now. I haven’t talked to the the Linux product team at Microsoft about this, but I hope to in the near future to get a better idea of where things are headed. So these are my notes as of now, strictly relating to failover cluster instances (FCI), AlwaysOn Availablity Groups are coming, but are not in the latest CTPs of SQL Server on Linux.

It was faster than building a Windows cluster

It took me a while, I laughed, I cried, I cursed a lot, but if I look at the time it took for me to actually build the cluster and install SQL Server, it was a much faster process. Much of this comes down to the efficiency of the SQL Server installation process on Linux, which is as simple as running yum install mssql-server (mostly). Which leads me to my next point..

Installation options would be nice

The cluster building process is a little kludgy. Basically, you install two standalone instances of SQL Server, and then remove the data files from one them, and copy them into your NFS share. Having the option to do the equivalent of an “Add Node” install, would mean you wouldn’t need to worry about cleaning up your second node.

There’s no cluster validation, explicitly

This is a bit scarier, or easier depending on your view point. There are tests at various parts of the process to make sure things are working. For example, the first step of building your Linux cluster is to authorize the nodes to take part in the cluster, which validates certain security and network settings. However, the storage validation consists of starting and stopping SQL Server on each node to make sure it can talk to the storage and startup. Given that Microsoft doesn’t own the clusterware for this solution, I’m not sure how much they can enhance that, or if they will. This is a good open question.

There’s no dns

(Happy Late Birthday Kris!) One interesting thing I realized after talking to Allan was that I did all of my networking setup through the /etc/hosts file on each individual node. I remember doing this for RAC, and I think it may be a requirement of Pacemaker, but you will still want to make a DNS entry for your cluster identifier. When you do this on Windows, if you are using Active Directory for DNS, the installation does this for you. Not in Linux, you will need to do this yourself.

Screen Shot 2017-01-04 at 11.40.26 AM

Get comfortable with command line and scripting

There’s no cluster wizard to get you through the process. I think this isn’t a huge deal—Denny and I were talking yesterday about how relatively easy it would be to script the whole process in bash (I’m holding off until I find out if Microsoft is doing this), and most Linux sysadmins are really comfortable with writing bash scripts. But if you aren’t comfortable with Linux and the command line, now is the time to brush up, before things go prod.


We are in the very early days of this process, there is much that will likely change. From a functional and conceptual perspective, this is very similar to the way a SQL Server Failover Cluster works in Windows, but the implementation is quite different. I’d like to see things resemble Windows a bit more, at least from a SQL Server perspective, but we’ll see where the product heads.

SQL Server on Linux–Clustering

First of word of warning on this post—if you are reading it and it isn’t January of 2017, I suspect things may have changed significantly in the months going forward.

Screen Shot 2017-01-03 at 3.47.08 PM

So I did It, I built a SQL cluster on Linux. The process is documented here on BOL, I’m not going to walk you through it, I’ll probably do that in a later post, I just wanted to mention some things I ran into during this build process. First, I did this using VMWare Fusion on my Mac, but I think any virtualization platform that allows virtual networks should work. Secondly, even though BOL says you need Red Hat Enterprise Linux (and you do if you are doing this in prod and require support), I was able to do all of this on CentOS, which is the free as in beer version of RHEL.

In my scenario, I built 3 VMs, one to serve as an NFS server, the other two to be my SQL Servers. Currently, there is no cluster version of the install, it’s the standard installation for standalone SQL on Linux, you then point SQL Server at the NFS mount you created which serves as your shared storage. I had an initial permissions problem on writing my data files there—I did a bad thing on the NFS server and opened up the directory to the world (777), and was then able to copy files there. I’ll follow up on that.

One other thing that wasn’t in BOL, that I had to troubleshoot my way through is that just like a cluster on Windows, you have a cluster identifier and floating IP address. I had to add that to /etc/hosts on each of my nodes to get it to resolve. The article mentions turning off fencing for non-prod environments—I had to do that in order to get failover working correctly in my environment.

Finally, failover was a bit wonky at first, and I had to spend too much time troubleshooting an odd problem. I wrote a connect item for it., but select @@servername and select name from sys.servers returns the name of the host, and not the cluster name. I’m sure the team will fix this in the near future.

What Happens to Pages in the Buffer Pool when your Availablity Group Fails Over?

Recently at SQL Saturday Philadelphia, we started discussing failover  as it relates to mirroring and Always On Availability Groups. Specifically, we were wondering what would happen if you had a relatively busy readable secondary replica (which would have a lot of pages in the buffer pool on the secondary instance) and if those pages would be flushed from cache or anything like that. So I reached out to the product group and Kevin Farlee from Microsoft was extremely helpful:

Pages in the buffer pool are still valid, as they are updated by the redo thread.  By the time the secondary has transitioned to primary, all pending updates will have been applied, so pages in the bufferpool will all have correct contents. Note that the set of pages in bufferpool may be different between primary and secondary so you may have some cache warm up to do.”

So in a nutshell, if you have a busy readable secondary your cache won’t take much time to warmup, since the hot pages are already there.

Getting Started with Azure Hybrid Models—Point to Site Networking

I’ve done several presentations on hybrid disaster recovery (DR) using SQL Server and Windows Azure, and configuring the network has been a headache every time. Some SQL DR methods do not require a direct network connection (log shipping, mirroring, backups) and mat be secured using certificates for encyption, more advanced configurations require a VPN connection. Also, if domain services are required (either to connect back to an on-premises Windows Domain Controller (DC) (good) or to replicate your on-premises Domain Controller to a ReadOnly DC (best)) you will need a VPN connection.

If you would like a little background on networking in Azure, I highly recommend Mark Russinovich’s (b|t) session from TechEd last year on Windows Azure internals.

There are three types of direct connections to Windows Azure:

  1. Site to Site Network—an approved VPN device in your DMZ connects via VPN (encrypted, but not dedicated) to a network gateway Windows Azure. Best scenario for low volume, but regularly used scenarios
  2. Point to Site Network—the focus of this post, a certificate based VPN allowing individual machines to connect directly to the network gateway in Windows Azure, using a certificate and a VPN client. Each machine must have the certificate. This is designed for demos, and admin users who may need to connect directly to Azure virtual networks
  3. ExpressRoute—this is the true Enterprise class solution for connecting to Windows Azure. You can connect either at your ISP (participating ISPS on this list) or via a dedicated MPLS VPN from your WAN to Windows Azure. This guarantees a fixed amount of bandwidth and performance.

Figure 1 ExpressRoute Networking — http://www.windowsazure.com/en-us/services/expressroute/

Configuring Point to Site Networking

So if you are reading this, I am going to assume you have a Windows Azure account—if you don’t, get one—there’s lots of interesting things to play with. Connect to the Azure management portal and scroll down to networks.

Figure 2 Add new network in Windows Azure

From that menu select Virtual Network > Custom Create. You will be prompted to give your network a name and select an existing or create a new affinity group. I’m not going to write about affinity groups today, but they are an important part of Windows Azure—you can read more about them here.

Optionally, you can add a DNS server name and address. If you do not, Windows Azure will provide DNS services for you (in your VMs.) In my experience with this, if you want your VMs to connect to the internet, do not setup a DNS server (this mainly applies for point to site demo work—if you are doing this in prod, setup a DNS server.) There is an also an option to configure point to site networking—check this box and if you have a real VPN, you may also check site-to-site.

Figure 4 Configuring Point to Site VPN in Network

The next screen will just cover your IP address space, for limited demos you can just accept the default and click next. Finally click the “Add Gateway Subnet” option.

Figure 5 Add Gateway Subnet

You now have a new network, but with no VPN gateway to connect to. From the network dashboard click the Create Gateway button on the bottom of the screen. This will take about 20-25 minutes—don’t worry, there is other stuff to do.

Figure 6 Network without gateway

You have two other tasks—create an Azure VM, that is on this network (See below screenshot—you can choose network on the third screen of the VM creation process)

Figure 7 Creating a new Azure VM

While your VM is being created, you will need to create your certificates, to both upload to your Azure gateway and to install on the client machine. In my opinion, this isn’t clear in the Microsoft documentation, so I’ll draw a picture.

So you will create the root certificate on the main machine you want to connect to the VPN (in my case it is my on-premises Domain Controller) and then upload it to your Azure network.

Figure 8 Cert uploaded to network

You will also create a client certificate using this root certificate. You will install the client certificate on any machine you want connect to your VPN—do not recreate and upload a new root certificate for each machine that you want to connect. In order to create the certificates, you will need makecert.exe—you can get it as part of the Windows SDK. For the detailed instructions on creating the certificates, go to this MSDN page and find “Create a self-signed root certificate.” Execute those instructions all the way through “Install the client certificate.”

Now, you are ready to connect—on your gateway page (it’s probably finished creating by now) on the right side of the screen you will see “Download the 64-bit (You aren’t really using a 32 bit OS, are you?) Client VPN Package” click that link. It’s an exe that is not signed by Microsoft, so Windows will yell at you for installing it. That will install quickly—if you expand the network pane in Windows, you will see a new connection.

Figure 9 Connect to Azure VPN

Click connect, and then you will see…

Figure 10 Azure VPN Connection

If you run an IPConfig—you will see the connection to Azure.

Figure 11 IP Record from Azure VPN

One warning here—I’ve had this happen with a client, and it happened to me in the office. When I connect the VPN, I can’t get to the internet on any of my connections. Oddly, this has only happened on corporate networks, and hasn’t happened when I’ve been on the same connection through a virtual machine on my laptop. More to come later.

Update: This behavior is tied to Cisco firewall equipment. I will address in a follow up post.

Now (if you’ve turned Windows Firewall off) you can ping your VMs, and get started using SQL Server HA and DR functions with them.

Into the Blue—Building DR in Windows Azure

Tomorrow (at 1300 EDT/1800 GMT), I’ll be presenting to the SQL Pass High Availability and Disaster Recovery VC on one my favorite topics of late—building disaster recovery from your data center into Windows Azure Infrastructure as a Service. You can register for the session here. While I’m not screaming “cloud, cloud, cloud” from the rooftops, I do feel like DR is a really great point of entry to the cloud for many smaller organizations, who may lack the resource to have a second data center, or perhaps only need their applications to be highly available during certain parts of year. Additionally, the ability to backup SQL Server databases directly to Azure Blob Storage also can meet your offsite backup requirements with very little headaches.

The cloud isn’t all unicorns and rainbows however—there are some definite challenges in getting these solutions to work properly. I’ll discuss that and the following in this session:

  • Choosing the right DR solution
  • Networking and the cloud
  • Economics of cloud DR
  • Implementation concerns for AlwaysOn Availability Groups

I hope you can join me on Tuesday.

AlwaysOn Availability Groups and Automation

Last week I encountered a unique application requirement—we have a database environment configured with AlwaysOn Availability Groups for high availability (HA) and disaster recovery (DR), and our application was going to be creating and dropping databases from the server on a regular basis. So, I had to develop some code to handle this process automatically. I had a couple of things going in my favor on this—it was our custom developed application that was doing this, so I knew the procedures I was writing would be called as part of the process, and additionally, our Availability Group was only two nodes at the moment, so my version 1 code could be relatively simplistic and work. Aaron Bertrand (b|t), posted on this stack exchange thread, his code is a good start. I’m not going to put all of my code in this post—it’s ready for our prime time, but I have a few more fixed I’d like to make before releasing the code into the wild.

Dropping a Database

First of all, I need to say it’s very important to secure these procedures—they can do bad things to your environment if run out of context—particularly this one. I denied execute on everyone except the user who would be calling the procedure. I didn’t want any accidents happening. Dropping a database from an availability group is slight different then doing it from a standalone server. The process is as follows:

  1. Remove the database from the availability group (on the primary)
  2. Drop the database from the primary
  3. Drop the database from the secondary node(s)

Since, we have to initiate this from the primary instance we need to find out two pieces of data—1) what availability group are we removing the database from and 2) is the instance we are on the primary instance. In my code—I didn’t actually have to drop the database from the primary server. That piece was being called from another proc. So, I just had to remove from the availability group, and remove on the secondary. There are a number of ways to connect the secondary database—this needs to happen in SQLCMD mode, which isn’t possible in a stored procedure. We could use a linked server, or we could enable xp_cmdshell and run a SQLCMD script, and then disable xp_cmdshell. This isn’t my favorite technique from a security perspective, but I was under a time crunch, and the procedure is locked down, but in the future I will probably rebuild this with linked servers (created within the procedure)



ALTER PROCEDURE [dbo].[RemoveDBfromAG] @dbname VARCHAR(50)



EXEC sp_configure 'show advanced options'




EXEC sp_configure 'xp_cmdshell'





DECLARE @groupid VARCHAR(50)

DECLARE @groupname VARCHAR(50)


DECLARE @primary VARCHAR(50)

DECLARE @secondary VARCHAR(50)


DECLARE @dropsecondary VARCHAR(500)

SELECT @dbid = database_id

FROM sys.databases

WHERE NAME = @dbname

SELECT @groupid = group_id

FROM sys.availability_databases_cluster

WHERE database_name = @dbname

SELECT @groupname = NAME

FROM sys.availability_groups

WHERE group_id = @groupid


SELECT @primary = primary_replica

FROM sys.dm_hadr_availability_group_states

WHERE group_id = @groupid

SELECT @secondary = node_name

FROM sys.dm_hadr_availability_replica_cluster_nodes

WHERE node_name != @primary

SELECT @sql = 'alter availability group ' + @groupname + ' remove database [' + @dbname + ']'

SELECT @dropsecondary = 'sqlcmd -S "' + @secondary + '" -E -Q "exec ReturnsTestInstanceDropSecondaryDB [' + @dbname + ']"'


SELECT primary_replica

FROM sys.dm_hadr_availability_group_states

WHERE primary_replica = @primary







EXECUTE sp_executesql @sql

WAITFOR DELAY '00:00:25'

EXEC xp_cmdshell @dropsecondary

EXEC sp_configure 'xp_cmdshell'







The one particularly unique thing you will notice in the code—is a “WAITFORDELAY”—what I observed is that after the secondary database is removed from the availability group, it goes into recovering for about 10 seconds—and we are unable to drop a database while it’s in recovery. By implementing that wait (the T-SQL equivalent of a sleep command) the database was able to be dropped.

Adding a New Database

Adding a new database has similar requirements—we have a couple of additional factors though. We have to verify that the instance we are on, is the primary for the availability group we are adding the database into. This is where I really need to fix my code—it assumes that there are only two nodes in our availability group cluster. I need to refactor the code to potentially loop through the other four secondaries (or 7 if we are talking about SQL Server 2014). Also, I’m using a linked server connection—this also assumes that the drive letter and path to the data file directory on the secondary are also the same. To summarize, the process is as follows:

  1. Accept the availability group and database names as input parameters
  2. Verify that the node you are working on is the primary for the availability group
  3. Backup the database and transaction log of the database to file share you’d like to add to the availability group
  4. Add the database to the availability group on the primary
  5. Restore the database and transaction log to the secondary with norecovery
  6. Alter the database to set the availability group

My code for this isn’t totally flushed out—it works in my environment, but I don’t think it’s ready for sharing. I’ll share later, I promise.

Pitfalls to Avoid

This isn’t that different than just building out an availability group, but many of the same caveats apply, you need to ensure agent jobs and logins affiliated with a database are on all nodes in the availability groups. Additionally, the procedures to add and remove databases from your availability group, need to run on all of the nodes. Also, if you are doing this programmatically, the connection should use the listener, so that your application is always connecting to the primary instance.

Disaster Recovery Starts with Backup and Recovery

I present and blog a lot about high availability and disaster recovery solutions, in doing so I get to talk to a lot of folks about different strategies. Depending on your business needs and regulatory requirements these can vary greatly in costs and complexity. However, no matter your DR solution, it is imperative that you have a sound backup strategy and that you test those backups on a regular basis.

I recently took part in a architectural review of several important applications. The reason for the review is that customer teams were asking for a real-time DR solution for systems that were producing multiple terabytes of transaction volume daily. This is possible, but only at great costs, so in order to craft a better solution we started asking for details around current strategy, and to get a better understanding of the business requirements around their data. When doing so, it came out, that multiple application teams had never tested a restore of their backups. Excuse my fonts here…

If you aren’t regularly testing your restores, save the drive space and don’t

back anything up

Ok, rant over—sort of. At a conference I had the pleasure of attending this spring, an Infrastructure Executive from Goldman Sachs was presenting about how they had zero downtime during Hurricane Sandy. Granted it is Goldman Sachs, and their IT budget is some-huge-number billion dollars, but several things she said really stand out. In addition to much risk analysis work that GS had done, they regularly tested out failovers and restores, in all systems (computer, power, cooling and generation). That’s by far the most important thing you can do from a DR perspective. Even, if it is not physically doing a test (you really should), but getting all of the teams (Database, Sys Admin, Network, Application) into one room, and working out all of the moving parts, and what needs to happen in the event of a disaster.

Lastly, I know it’s hard to get resources to do test restores, enterprise storage isn’t cheap. However, there are many options you can leverage if you don’t have space in your primary location for testing:

  • Amazon Web Services—Take advantage of the cloud, DR in my opinion is one of the best use cases for the cloud. You can fire up a server, use it for testing, and then blow it away. You have to get your data there and that can be painful for large data sets, but it’s not a bad solution.
  • Real Hardware—I know enterprise storage is pricey, but a lab server with some relative slow, but dense storage isn’t. You can build a really effective restore testing environment for less than $10,000, and well under it, if you are willing to buy older unsupported HW on eBay.
  • Offsite Recovery Services—There are a number of firms who provide offsite recovery services (for example Sungard). However this option tends to be extremely expensive, as they guarantee hardware for you in the event of a disaster, and as part of that guarantee you are granted testing time.

I can’t express my feelings on this strongly enough—particularly if you are working with large data sets, with lots of opportunities for failure, it is absolutely critical to test those systems. Otherwise, don’t plan on being able to restore when you need.

%d bloggers like this: