This headline may seem a little aggressive. You may think, Joey, I’ve heard you and other MVPs say that backups are the most important thing ever and I’ll get fired if I don’t have backups. That is very accurate, but when I talk about backups, I am talking about backing up your databases. If you are running an Azure VM, or have a good connection to Azure, Microsoft offers BACKUP to URL functionality that lets you easily isolate your database backups from the VM itself.
If you are backing up your operating system and system state on your cloud provider, you are wasting storage space, money, and CPU cycles. You may even be negatively impacting the I/O performance of your VMs. We had a customer who was doing a very traditional backup model–they were backing up their databases to the local file system, and then backing up the SQL Server VM using the Azure Backup service. They swore up and down that backups were impacting their database performance, and I didn’t believe them until they told me how they were doing backups.
Cloud Storage Architecture
Unlike in your on-premises environment, where you might have up to a 32 Gbps fibre channel connection to your storage array and then a separate 10 Gbps connection to the file share where you write your SQL Server backups, in the cloud you have a single connection to both storage and the rest of the network. That single connection is metered and correlates to the size (and $$$) of your VM. So bandwidth is somewhat sacred, since backups and normal storage traffic go over the same limited tunnel. This doesn’t mean you can’t have good storage performance, it just means you have to think about things. In the case of the customer I mentioned, by writing backups to the local file system and then having the Azure Backup service back up the VM, they were saturating their network pipe and making regular SQL Server I/Os wait.
Why Backing Up Your VMs is Suboptimal
The Azure Backup service is not natively database aware. There is a related service, https://docs.microsoft.com/en-us/azure/backup/backup-azure-sql-database, which manages native SQL Server backups for you. While this feature is well engineered, I’m not a huge fan. The reason is the estimated time to create a new VM in Azure is about 5-10 minutes. If you are using backup to URL, you can have a new SQL Server VM in 10 minutes. You can immediately join the machine to the domain (or better yet, do it in an automated fashion), and then begin restoring your databases. Restoring from the backup service is really slow, whereas you will see decent restore speeds with RESTORE FROM URL.
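As a quick illustration, here is a minimal backup-to-URL sketch using a Shared Access Signature credential. The storage account, container, and database names are placeholders–substitute your own:

```sql
-- Create a credential from a Shared Access Signature (SAS) token.
-- The credential name must match the container URL.
CREATE CREDENTIAL [https://mystorageacct.blob.core.windows.net/sqlbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading ?>';

-- Back up directly to blob storage, skipping the local file system entirely.
BACKUP DATABASE [MyDatabase]
TO URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/MyDatabase_Full.bak'
WITH COMPRESSION, CHECKSUM, FORMAT;

-- Restoring onto a freshly built VM is just as direct:
RESTORE DATABASE [MyDatabase]
FROM URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/MyDatabase_Full.bak'
WITH RECOVERY;
```

Because the backup file lives in the storage account rather than on the VM, it survives the loss of the VM and is immediately reachable from its replacement.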
The one bit of complexity in this process is for high availability solutions like Always On Availability Groups, which have somewhat involved configurations. I’m going to say two dirty words here: PowerShell and Source Control. Yes, you should have your configurations scripted and in source control. You’d murder your developers if their web servers required manual configuration for new deployments, so you should hold your database configurations to the same standard.
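A scripted configuration can be as simple as the T-SQL below, checked into source control alongside your deployment scripts. This is a sketch only–the replica names, endpoints, AG name, and database name are all placeholders:

```sql
-- Keeping this in source control means a replacement replica can be
-- rebuilt the same way, every time, instead of by hand in a crisis.
CREATE AVAILABILITY GROUP [AG1]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
FOR DATABASE [MyDatabase]
REPLICA ON
    N'ServerA' WITH (
        ENDPOINT_URL = N'TCP://ServerA.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'ServerB' WITH (
        ENDPOINT_URL = N'TCP://ServerB.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);
```

Wrap that in PowerShell (or your deployment tool of choice) and rebuilding a failed replica becomes a routine operation rather than an archaeology project.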
If you have third-party executables installed on your SQL Server for application software, then, well, you’re doing everything wrong.
How I Do Backups in Azure:
- Use Ola Hallengren’s scripts. They support BACKUP to URL.
- Add this agent job step from Pieter Van Hove to clean up your backups
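Ola’s DatabaseBackup procedure takes the container URL directly. A minimal call might look like the following–the container URL is a placeholder, and this assumes a SAS credential matching that URL already exists on the instance:

```sql
-- Full backups of all user databases, straight to blob storage.
EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups',
    @BackupType = 'FULL',
    @Compress = 'Y',
    @CheckSum = 'Y';
```

Schedule that as an agent job, add the cleanup step, and you are done.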
Sleep well. Eventually, there will be full support for automated storage tiering in Azure (it still has some limitations that preclude its use for SQL Server backups), so for right now you need the additional manual step.
While I wrote this post around Azure, I think the same logic applies to your on-premises VMs and even physical machines. You should be able to get Windows and SQL Server installed within 20-30 minutes with a very manual process, and with an automated process you should be able to have a machine in less than 10. When I worked at the large cable company, we didn’t back up any of our SQL Servers, just the databases.
We have attempted this on our servers, but the issue we encounter is that unless we remove the backup verification, which is not a desirable option, backing up directly to blob storage is extremely slow. Instead we back up locally and copy that up to blob storage in a separate step. Also, the Azure Storage Account’s Lifecycle Management option has been a great tool for moving the archival backups to cold storage and then purging them after the desired timeframe.
Doing RESTORE VERIFYONLY is of limited benefit. While it does verify the validity of the backup file, SQL Server can still back up corrupted data, so the only way to truly validate the backup is to restore it. tl;dr: VERIFYONLY gives a little comfort, but it doesn’t really prove the backup is good.
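To make the distinction concrete, here is a sketch of the two levels of checking. Names and file paths are placeholders; VERIFYONLY inspects the backup media, while a real restore plus DBCC CHECKDB inspects the data itself:

```sql
-- Level 1: checks that the backup file is readable and complete.
-- Does NOT detect pages that were already corrupt when backed up.
RESTORE VERIFYONLY
FROM URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/MyDatabase_Full.bak';

-- Level 2: the only real test--restore to a scratch name and check it.
RESTORE DATABASE [MyDatabase_Verify]
FROM URL = 'https://mystorageacct.blob.core.windows.net/sqlbackups/MyDatabase_Full.bak'
WITH MOVE 'MyDatabase'     TO 'F:\Data\MyDatabase_Verify.mdf',
     MOVE 'MyDatabase_log' TO 'F:\Log\MyDatabase_Verify.ldf',
     RECOVERY;

DBCC CHECKDB (N'MyDatabase_Verify') WITH NO_INFOMSGS;
```

Note that backing up WITH CHECKSUM helps here too: it validates page checksums as the backup is written, catching some corruption earlier.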
I agree with your take. In addition to the points you make, here are a few more considerations.
First, testing a VM backup is significantly more complicated than testing native SQL backups. I know there are commercial tools out there that will create an isolated network environment in which you can test the restore, but those come with downsides too.
Second, when you must actually restore the VM, you might have to deal with side effects, such as an expired machine account in Active Directory. While this is rather easy to deal with, it adds friction to the restore process that shouldn’t be there.
The last point is that in many environments, the RPO is such that you can’t back up the VM frequently enough to meet it. So you must use traditional SQL Server backups (full, differential, and/or transaction log) anyway. Backing up the VM itself provides little additional benefit for a lot of additional work. The marginal cost exceeds the marginal benefit, as economists might put it.
A similar question and I’m looking for a source. You mention a single storage connection. We’re in the process of setting up Azure and we’re using the Azure VM SQL backup mechanism.
The other day we had half the DBs flip to the secondary and I suspect it was contention on the network.
In the Azure environment, is all bandwidth–internet, storage, internal networking–just a single shared bandwidth pool without any governors?
If you have a reference I’d be grateful. My experience so far has me believing this is the case, but others are hoping that this is all “magic” behind-the-scenes connectivity.
It’s complicated. Storage has dedicated bandwidth, but there is also throttling as you approach the edge of that capacity. Most of my customers that have AGs are on larger VMs with pretty good bandwidth, so this isn’t something I’ve encountered in Azure, but I have seen it happen in on-premises AGs with poor networking. I’m assuming you saw a cluster lease timeout? Sorry for the delay on this response–I didn’t get the notification of this comment, and just happened to come back to grab a link to this post.