I wrote a couple of weeks ago, about what not to do with backups in Azure. Because I’ve seen a few improperly configured VMs lately, I wanted to talk about the way the storage works in Azure, and the way we traditionally did things on-premises.
If you still buy your storage from a three letter company, and your sales rep drives an expensive German car, and has better taste in shoes than Imelda Marcos, you might still configure your storage this way. You might create a separate disk volume for TempDB, transaction log files, and data files. Ideally, you are backing up to a separate storage appliance, and not to the same storage array where your data files live.
This architecture design dates back to when a storage LUN was literally a built of a few disks, and we wanted to ensure that there were enough I/O operations per second to service the needs of the SQL Server, because we only had the available IO of a few disks.
As virtualization became popular storage architectures changes and the a SAN lun was carved out into many small extents (typically 512k-1MB depending on vendor) across the entire array. What this meant was that with modern storage there was no need to separate logs and data files, however some DBAs did, however in an on-premises world there was no penalty for this.
Note: There is a scenario where you would want multiple disk devices in Windows. Under very high I/O workloads, IOs can queue at the Windows disk device level. This is an uncommon performance scenario in my experience
Enter the Cloud
Instead of physical disks, in the cloud your “disk devices” are virtual hard drive files, which are stored across 3 different physical disks on the infrastructure. All storage performance is controlled by quality of service settings on the Azure infrastructure. Each disk you add increases both your IOPs and storage capacity. Also, each virtual machine has a fixed limit on the number of IOPs available to it (while this is very possible on-premises, it’s far less common).
We then translate this to the operating system level, and in this specific case, Windows Server. In order to get maximal volume and performance out of our disks, we use Storage Spaces in Windows to create pools of storage. The exciting part here is that you get to use RAID 0, since Azure’s (or Amazon’s) infrastructure is providing your RAID. This means if we have 20 1 TB disks, with 5000 IOPs each, we can have a 20 TB pool, that theoretically supports 100k IOPs. (Most VMs in Azure don’t support that level of IO performance, but a couple do).
It’s also important to know that you need to specify the number of columns parameter when building your storage spaces pools in Windows. If you have more than four disks your need to use PowerShell for that–I’ll write more about that next week. But here’s some info from the product teams.
This post has good info on columns, but it’s from 2014 and the rest of the storage information is very dated (premium storage didn’t exist). I’m only including because it’s the best explanations of columns that I’ve seen.
What this means, is that in order to maximize your database server’s IO performance, you should create one large pool, with all the disks. Throw your system DBs and your data and log files all on that volume. And please don’t write your backups to that disk. (BACKUP TO URL was invented for this purpose).
You can also throw TempDB on the local D: drive, which is ephemeral (it goes away when your machine reboots, but so does TempDB), and can over slightly lower latency.
Note, if you’re reading this and you are using Ultra Disk, I haven’t tested any of this stuff with Ultra Disk because I haven’t been able to test it. I think you may not need to stripe disks to achieve good performance.
Pingback: VM Storage Performance in the Cloud – Curated SQL
But that three letter storage company and their SAN technology does provide hundreds of thousands of IOPS at sub-ms performance for those workloads that simply won’t fit in the cloud or have not been designed for cloud. Maybe it’s time for an upgrade instead of using your father’s SAN :-). It doesn’t seem that “old school” when you’re assembling your own storage pools in the cloud :-).
One of the issues we are facing with striped disks (storage pools as well) in AWS is latency when volume is failing over on AWS side. This usually cause 30-60 seconds delay in reads/writes for striped disk and quite disturbing for the database.
We found that it does make sense to separate log files onto single not-striped disk with one volume behind it as it helps to reduce frequency of the issue – it has less impact when secondary sync replica having issues with striped data disk.