A Problem with Storage Spaces, Failover Clustering, and Always On Availability Groups in Azure


This is quite possibly my longest blog post title ever, but it is pretty important for anyone building SQL Server configurations on Azure virtual machines. If you aren’t familiar with Azure storage, the maximum size of an individual disk is 4 TB (and, more importantly, 7500 IOPS; most workloads will be better served by the 1 TB/5000 IOPS P30 disks). To get a volume that is larger than a single disk (and has more IOPS), the admin uses Storage Spaces to create a RAID 0 group (because Azure provides RAID in the infrastructure), and you get the sum of the disks’ storage and IOPS. This process is well documented, and I’ve done it a number of times for customer workloads. I was building a new availability group for a customer last week, and this time the process behaved a little differently and failed on me.
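For readers who have not done the striping step before, here is a rough PowerShell sketch of it. This is not the exact script I run for customers. The pool name `SQLPool`, the virtual disk name `SQLData`, and the 64 KB interleave and allocation unit sizes are assumptions for illustration, and the subsystem name pattern assumes a standalone VM on a recent Windows Server release.

```powershell
# Sketch only: pool every poolable data disk and stripe across all of them.
# 'SQLPool' and 'SQLData' are hypothetical names; adjust for your environment.
$disks = Get-PhysicalDisk -CanPool $true

# Assumption: on a standalone VM the subsystem is named "Windows Storage on <host>".
$subsystem = Get-StorageSubSystem -FriendlyName 'Windows Storage*'

New-StoragePool -FriendlyName 'SQLPool' `
    -StorageSubSystemFriendlyName $subsystem.FriendlyName `
    -PhysicalDisks $disks

# "Simple" resiliency is plain striping (RAID 0); one column per disk
# spreads I/O across every disk in the pool.
New-VirtualDisk -StoragePoolFriendlyName 'SQLPool' -FriendlyName 'SQLData' `
    -ResiliencySettingName Simple -NumberOfColumns $disks.Count `
    -Interleave 65536 -UseMaximumSize

# Initialize, partition, and format the new virtual disk for SQL Server data.
Get-VirtualDisk -FriendlyName 'SQLData' | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 `
        -NewFileSystemLabel 'SQLData' -Confirm:$false
```

With two P30 disks in the pool, the resulting volume should present roughly 2 TB and up to the combined IOPS of both disks, subject to the VM size’s own limits.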

Windows Clustering is Aggressive with Your Storage

Note: As I was writing this post, about four different parts of the VM creation process changed in the portal. Welcome to cloud computing, boys and girls.

For this demo, I created a VM and added two disks. I also joined the VM to our Active Directory domain. I haven’t added failover clustering yet. From Server Manager, I can see my primordial storage pool with the two disks I’ve added.


Next, I’ll add the Failover Clustering feature and build a one-node cluster. For this demo you only need one node; the behavior is the same in a single-node or multi-node configuration. I’ve built the cluster, and you will note that there are no disks present in it.


However, if I go back to Storage Spaces, I still see my primordial pool but no physical disks, and now I have an error about “Incomplete Communication with Cluster.”


At this point, I am unable to configure a storage pool for my SQL Server data.

I’ve had inconsistent results here–but in no situation have I been able to create a storage pool.

Workaround

This really sucks, but the workaround is to evict the node from the cluster and then create your storage pool as described above. That’s kind of gross, so a better workaround is to configure your Storage Spaces pool before you add the node to your Windows cluster. I did test adding a new disk to an existing pool after the server had been clustered, and that process works as expected.
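The ordering in the better workaround can be sketched in PowerShell roughly as follows. This is an illustration, not a tested runbook. The cluster name `sqlclu01` is hypothetical, and in Azure you will typically also need to give `New-Cluster` a static IP address, which is omitted here.

```powershell
# Step 1 (before clustering): confirm the pool and virtual disk already exist
# and are healthy on this node.
Get-StoragePool -IsPrimordial $false |
    Select-Object FriendlyName, HealthStatus, Size
Get-VirtualDisk |
    Select-Object FriendlyName, ResiliencySettingName, HealthStatus

# Step 2: only now add the feature and build the cluster.
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# 'sqlclu01' is a hypothetical cluster name. -NoStorage asks the cluster
# not to add any eligible disks as clustered storage during creation.
New-Cluster -Name 'sqlclu01' -Node $env:COMPUTERNAME -NoStorage
```

Repeat step 1 on every node that will join the cluster, since each availability group replica keeps its own local pool.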

Root Cause

My thought (and I haven’t tried to debug this, but I have communicated with Microsoft Windows and SQL teams about it) is that clustering is being too aggressive with the available storage. I tried running some PowerShell to prevent clustering from taking the disks, but I still had the same result. I’ll update this post if I hear anything further from Microsoft.

 

6 thoughts on “A Problem with Storage Spaces, Failover Clustering, and Always On Availability Groups in Azure”

  1. Pingback: “Incomplete Communication with Cluster” with local Storage Space for SQL Server cluster – Clustering For Mere Mortals

  2. daveberm

    Any update Joey?

    I recently had a customer share this PowerShell with me that he thinks fixes the issue. I haven’t had a chance to try it yet.

    Get-StorageSubSystem | Where AutomaticClusteringEnabled -eq $true | Set-StorageSubSystem -AutomaticClusteringEnabled $false

  3. Kumar

    I see you noted “I did test the process of adding a new disk to an existing pool, after the server has been clustered and that process works as expected.” I need to increase the disk space by adding a new disk and expanding the disk on each node of the SQL AG. However, that process is not working for me. What are my options?

  4. Bryan Reisner

    I was able to extend an existing storage space without issue. I was also able to add a local storage pool post clustering without evicting a node following this link:

    https://docs.microsoft.com/en-us/archive/blogs/keithmayer/extending-sql-server-2014-alwayson-resource-groups-with-storage-spaces-on-microsoft-azure

    The main thing is to run the command below first, as it prevents the cluster from taking the resulting storage pool and virtual disk. Additionally, the 3-disk minimum is removed.

    Get-StorageSubSystem -FriendlyName '*Clustered*' | Set-StorageSubSystem -AutomaticClusteringEnabled $false

