Let’s Talk about Backups, and How to Make Them Easier

Recently I’ve run into a couple of situations where customers lost key business data for a variety of reasons. Whether it is ransomware, a virus, or just hardware failure, it doesn’t really matter how you lose your data; it just matters that your data is lost and your business is now in a really bad spot. When I first thought about writing this post this morning, I was going to tell you how important it is to back up your databases, and how the cloud is a great disaster recovery solution for those backups. The problem is, if you are reading this blog, you likely at least know that you should have backups. You probably even know how to optimize them to make them run faster, and you test your restores. You do test your restores, right?

Photo by Anthony on Pexels.com

Then I thought a little harder, and I was reminded of a tweet that my good friend Vicky Harp (the Principal Program Manager of the SQL Server tools team at Microsoft) wrote a couple of years ago:

Backups are DBA 101, but most of the organizations having these types of issues don’t have a DBA. They might not even have a dedicated IT person, or if they do, it’s someone who comes by once a week to make sure the printer and wifi still work and takes care of company laptops. The current situation is that we hope they go to a SQL Saturday or a user group, learn about backups, and start taking them. So I thought: what could be done to make that easier and faster? The database engine already has the technology to take backups automatically (maintenance plans) and even move them to a secondary or tertiary location (backup to URL).
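To make it concrete, here is roughly what that existing plumbing looks like today with the dbatools module. This is a minimal sketch, not a full solution: the instance name and storage account are placeholders, and it assumes a SAS-based credential matching the container URL has already been created on the instance.

# Minimal sketch: full backups of every database on the instance straight to
# Azure blob storage (backup to URL). The instance and storage account names
# are placeholders; a SAS credential for the container must already exist.
Import-Module dbatools
Backup-DbaDatabase -SqlInstance 'localhost' `
    -AzureBaseUrl 'https://mystorageaccount.blob.core.windows.net/backups' `
    -Type Full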

What I’m asking for (besides asking you to upvote that User Voice item) is for Microsoft to add two new screens to SQL Server setup. The first would be called “Backup”, and it would offer a dire warning to the effect of:

In order to protect the data in your databases, Microsoft strongly encourages you to take backups of your data. In the event of hardware failure, data corruption, or malicious software, Microsoft support will be unable to help you recover your data, and you will incur data loss. This box is checked by default to enable automatic daily backups of all of your databases.

The next screen would offer options on where to store your backups and how long to retain them. It would offer the option to store the backups locally or on a network share, and give you the ability to encrypt your databases. It would also allow you to back up your encryption key (and strongly encourage you to do so).

The next part of that screen is where I think this could become attractive to Microsoft: it would give you the option to back up your databases to a URL in Azure, and if you didn’t have an Azure account, it would let you create one. Frankly, for most organizations who would be using this as their backup solution, Azure is the best option.

Arguments Against This Feature

The arguments I can see against this feature request are minimal. One could argue that you would rather use Ola’s scripts or dbatools, or change the striping or the buffer count for your backups. If you are making any of those arguments, this feature isn’t for you. If you’ve ever installed SQL Server with a configuration file, this feature isn’t for you. The only valid argument I can see against doing this is that one could potentially fill up a file system with backup files. Maintenance plans do have the ability to prune old backups, so I would include that in my deployment. I might also build some alerting and warnings into SQL Agent to notify someone by default.
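If you are landing backups on a local disk or a network share in the meantime, the pruning piece is only a few lines of PowerShell. This is just a sketch under assumptions: the share path and the 30-day retention window are placeholders.

# Minimal sketch: delete .bak files older than 30 days from a backup share.
# The path and the retention window are placeholder assumptions.
$retentionDays = 30
Get-ChildItem -Path '\\backupserver\sqlbackups' -Filter '*.bak' -Recurse |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-$retentionDays) } |
    Remove-Item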

The other argument I see is that Microsoft offers a similar product with Azure Backup for SQL Server VMs, and this would cannibalize that feature. It very well might, but that product is limited to Azure, and we are aiming for the greater good here: helping more people protect their data is good for Microsoft, good for SQL Server, and good for the world.

Summary

If you are reading this, go upvote my User Voice request here. This feature isn’t about you; it’s about all the orgs whose IT decisions have left them at the point of data loss when they were really none the wiser. Let’s make life easier for folks.

The Ransomware Breach You’re Going to Have

I don’t typically blog about network engineering here. However, in the last few weeks I’ve seen several major companies get hit with ransomware attacks. While this isn’t an uncommon thing in 2019, it is uncommon that their entire environments were taken offline because of it. So with that, let’s talk about how these attacks can do this. Since the best way to deal with any sort of attack is to work from an “assume breach” model, let’s talk about the best way to defend against these attacks: physical network security.

The Attack Vector: Dumb Humans

The easiest way to attack any company is via human targets. Whether it’s bribing a sysadmin to get credentials, a standard phishing attack, or any sort of other malware, the best way to pwn a company is by getting an unknowing (or knowing) human to do most of the work for you. There are ways to stop these sorts of things from getting in the front door–the first would be to use an email service like Office 365 or Gmail, which have built-in phishing protections and use machine learning, built on their massive volume of exposure to attacks, to protect you. You should also educate your users to avoid these scams–there is good training available for this; I’ve taken it for a few clients.

But the real answer is to adopt an assume-breach methodology. I’m currently working on a financial system for a Fortune 100 company. In order to reach the servers, I have to use a special locked-down laptop, have two key cards, and go through two jump hosts to get there. Even if that laptop were to get hacked (and it wouldn’t, because you can’t install software on it), an attacker couldn’t do anything without my key cards and PINs.

Physical and Virtual Network Security

Can you connect to your production database servers over any port from your desktop? Or to your domain controllers? If you can, you have a problem. Once someone’s desktop gets pwned, the malicious software that gets installed when the CEO tries to open a PDF claiming to be the new Taylor Swift album can run anywhere on your network. This is bad.

Photo by Steve Johnson on Pexels.com

The networking gospel according to IBM.

The IBM white paper linked above is the gold standard of how to build and segment your network. In a common example there are a few zones:

Black Zone: No outbound external traffic, inbound restricted to whitelisted IPs and ports from within black and green zones. This is where your domain controllers and database servers with any sort of sensitive data should live.

Green Zone: Limited external traffic (think Windows Update, Power BI gateway, Linux package repos), and can communicate with end-user networks over controlled ports. This is where most of your application servers and some management services should live.

Blue Zone: Management zone–this is where you should have your jump boxes so that you don’t have to log directly onto production boxes. This can have limited external traffic, and should be able to talk to the servers in the black zones, but only over ports that you have specified.

Yellow Zone: This is typically where your DMZ will exist. That means you are allowing inbound traffic from the internet. This is obviously a big attack vector, which means it should live on an isolated segment of your network, and the traffic to and from this zone should be locked down to the specific IPs and ports that need connectivity.

Red Zone: This is where the users live. There’s internet access, but communications from this network should nearly always stay within this network. You will have teams that want to deploy production workloads in this network. Don’t let them.

But That Sounds Hard

Good security is always hard. See my server management example above. In that scenario, when the CEO gets pwned you might have to deal with a bunch of ransomwared laptops, but since your servers and domain controllers aren’t easily reachable from the desktop network, your company would keep moving, and you could simply re-image the pwned machines.

This is Trivial to Implement with Cloud Networks

In order to do this on-premises, you may have to buy a lot more networking gear than you already have, or at least restructure a whole lot of virtual LANs (VLANs). However, in a cloud scenario, or even in some virtual infrastructures, this kind of model is trivial to implement. Just look up network security zones in Azure (and you never have to run any cable in the cloud).
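As a rough illustration, here is what a “black zone” rule set might look like as an Azure network security group, using the same AzureRM cmdlets I use later in this post. This is a sketch under assumptions: the names, address ranges, and port are placeholders, and the point is simply that only the management subnet can reach SQL Server while every other inbound connection is denied.

# Minimal sketch: an NSG for a "black zone" subnet that only allows SQL Server
# traffic (TCP 1433) from the management ("blue zone") subnet and denies all
# other inbound traffic. Names and CIDR ranges are placeholder assumptions.
$allowSqlFromMgmt = New-AzureRmNetworkSecurityRuleConfig -Name 'Allow-SQL-From-Mgmt' `
    -Priority 100 -Direction Inbound -Access Allow -Protocol Tcp `
    -SourceAddressPrefix '10.0.10.0/24' -SourcePortRange '*' `
    -DestinationAddressPrefix '10.0.20.0/24' -DestinationPortRange '1433'

$denyAllInbound = New-AzureRmNetworkSecurityRuleConfig -Name 'Deny-All-Inbound' `
    -Priority 4000 -Direction Inbound -Access Deny -Protocol '*' `
    -SourceAddressPrefix '*' -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange '*'

New-AzureRmNetworkSecurityGroup -Name 'nsg-black-zone' -ResourceGroupName 'rg-network' `
    -Location 'eastus' -SecurityRules $allowSqlFromMgmt, $denyAllInbound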

Technology, and especially enterprise technology, isn’t easy, but it’s more important than ever to use good security methods across your environment.

Azure Hybrid Automation–Performing Tasks in a VM

Recently, I learned something new about Azure Automation–that you can execute tasks inside of a VM. I haven’t dealt with this situation before–typically, any tasks I want to execute inside of a VM, I execute using a scheduler like SQL Agent or the Windows scheduler. However, in this situation we were trying to reduce costs, and this VM is just used for ETL processing (the database is in Azure SQL DB), but we still needed to take backups of the Master Data Services database.

My first thought was to have a SQL Server Agent job that either executed on startup or once a day; however, this was messy and could potentially leave several sets of unnecessary backups a day. I knew I could create an Azure Automation job that would check the status of the VM and start it if it was stopped. What I needed to figure out from there was:

  1. How to get a command to execute inside of the VM
  2. How to shut down the VM if it had not been running previously

Enter Azure Hybrid Runbooks

Hybrid runbook workers are usually pictured in an on-premises > Azure scenario, but in my case I was simply using one to connect to a VM within Azure. You will have to run some PowerShell to enable this on your machine, but after that step is completed, you can simply call Azure Automation cmdlets with the -RunOn option, which specifies the name of the hybrid worker group you created.

The other trick to this was calling a runbook from a runbook, which wasn’t well documented. The outer runbook below checks the VM’s power state, starts the VM if it isn’t running, runs the backup runbook on the hybrid worker group, and then shuts the VM back down if it had to start it.

Param(
    [string]$rgName,
    [string]$vmName
)

# Ensures you do not inherit an AzureRmContext in your runbook
Disable-AzureRmContextAutosave -Scope Process

# Standard Run As account boilerplate (an assumption here; the original context
# setup was not shown) to authenticate and capture a context for the child runbook
$conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
Connect-AzureRmAccount -ServicePrincipal -TenantId $conn.TenantId -ApplicationId $conn.ApplicationId -CertificateThumbprint $conn.CertificateThumbprint
$AzureContext = Select-AzureRmSubscription -SubscriptionId $conn.SubscriptionId

# Get the VM's current power state (e.g. 'PowerState/running' or 'PowerState/deallocated')
$vm = ((Get-AzureRmVM -ResourceGroupName $rgName -Name $vmName -Status).Statuses |
    Where-Object { $_.Code -like 'PowerState/*' }).Code

if ($vm -ne 'PowerState/running')
{
    # Start the VM, wait for it to come up, run the backup runbook on the
    # hybrid worker group, then shut the VM back down
    Start-AzureRmVM -ResourceGroupName $rgName -Name $vmName
    Start-Sleep -Seconds 35
    Start-AzureRmAutomationRunbook -AutomationAccountName 'Automation' -Name 'BackupDB' -ResourceGroupName $rgName -AzureRmContext $AzureContext -RunOn 'Backups' -Wait
    Stop-AzureRmVM -Name $vmName -ResourceGroupName $rgName -Force
}
else
{
    # The VM was already running, so just run the backup and leave it up
    Start-AzureRmAutomationRunbook -AutomationAccountName 'Automation' -Name 'BackupDB' -ResourceGroupName $rgName -AzureRmContext $AzureContext -RunOn 'Backups' -Wait
}

In order to execute the inner runbook, you simply use the Start-AzureRmAutomationRunbook cmdlet, and since it’s hybrid, we use the aforementioned -RunOn option. My BackupDB runbook simply uses the dbatools Backup-DbaDatabase cmdlet to perform a full backup.
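The inner runbook itself is tiny. Here is a minimal sketch of what mine amounts to, with the instance, database, and backup path swapped for placeholders, and assuming dbatools is installed on the hybrid worker:

# Minimal sketch of the BackupDB runbook: a full backup via dbatools, running
# on the hybrid worker inside the VM. Instance, database, and path are
# placeholder assumptions.
Import-Module dbatools
Backup-DbaDatabase -SqlInstance 'localhost' -Database 'MDS' -Path 'D:\Backups' -Type Full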

It’s quick and easy to get this up and running–the hardest part is getting all of the modules you need into your Automation account.

Getting Started with the Cloud with No Budget and an Unsupportive Employer

This thread on Twitter last night really piqued my interest:

 

It really made me think of a conversation I had with a colleague at my last “regular” job. I’m not counting my time at Comcast, because we were effectively a technology firm; I mean a normal, regular company whose core business does not relate to computers or software. My colleague Scott had just attended TechEd 2011, or maybe 2012–the years run together at this point. His comment was, “with everything going to the cloud, it seems like all the jobs will be with Microsoft, or helping other customers implement cloud.” In 2011-12, the cloud was still really awful (remember the original SQL Azure? I do, and it was bad), but it was clear what the future would be.

The Future is Here–What Do We Do Now?

So if you are working in a “traditional” firm, and you feel as though your skills are slipping away as the rest of the technology world moves forward, what should you do? The first thing I’m going to say isn’t an option for everyone, because of individual challenges and personal situations, but given the current state of the economy and IT employment, I think it needs to be said. If you are in a job where you are only supporting legacy tech (by which I don’t mean on-premises firms; some of the most cutting-edge SQL Server orgs in the world are 100% on-premises), and you are regularly supporting software whose version conforms to the regular expression ^200\d$, my best bit of advice to you would be to start the process of finding another job.

I know changing firms isn’t for everyone, and if you want to become a cloud engineer, you need to build your skills in that space. The crux of the Twitter thread is: how do you learn these things when you are in an organization that thinks cloud computing has something to do with rain? The first thing I would recommend, if you are willing to spend a little money, is to use skillmeup.com (note: both DCAC and my company have business relationships with Opsgility, the parent company). I have taught classes using their labs–you get a real Azure subscription with a production scenario, and you also get online training associated with the lab.

Other resources like Pluralsight or LinkedIn Learning (note: DCAC has a business relationship with LinkedIn Learning) offer online training; however, I really feel like getting hands-on with tech is the best way to learn tech.

My Budget Isn’t Even That High

Both Amazon and Microsoft offer free trials–I know Azure a lot better, so I’m going to focus on that scenario. (BTW, this ties to another bit of advice I have: learn one cloud first. The concepts between them are pretty similar, and if you learn one cloud really well, transitioning to the other one will be much easier than trying to consume all of it at once.) The Microsoft offer gives you $200 to use for 30 days; also, if you have an MSDN subscription, you get somewhere between $50 and $150 a month to use.

While those numbers are small, especially when talking about services, they can still easily get you started with the basics of the cloud. Virtual machines (which also cost a lot) are, for all intents and purposes, very similar to your virtual machines on-prem. But if you want to learn how to extend an on-premises Active Directory to the cloud, you can do that by building a Windows Server VM on your laptop and then connecting it to Azure Active Directory. That has minimal cost (AAD is a free service). Learning things like networking and storage also has minimal cost.

One of the most important cloud skills you can have, automation, just involves using PowerShell (or the CLI, depending on what you like). If you haven’t learned a scripting language, you should invest more time into that. You can do this on any trial account, and with minimal cost, especially when you learn to clean up the resources you deployed as soon as your deployment script is done with them.
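A good first exercise is as simple as creating a resource group, experimenting inside it, and tearing the whole thing down when you’re done so nothing keeps billing. This sketch uses the AzureRM cmdlets; the resource group name and location are placeholder assumptions.

# Minimal sketch: create a resource group to experiment in, then remove it
# (and everything in it) when you're finished. Name and location are
# placeholder assumptions.
New-AzureRmResourceGroup -Name 'learn-rg' -Location 'eastus'

# ...deploy and play with resources here...

Remove-AzureRmResourceGroup -Name 'learn-rg' -Force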

As a SQL Server pro, if you want to start learning Azure SQL*, you should get started with Azure SQL Database. It’s cheap, and you can do everything you can do in the $15,000/month database with the $5/month database.
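To give a sense of how low the entry point is, standing up a bottom-tier Basic database takes only a few lines of PowerShell. This is a sketch with placeholder names; the server name has to be globally unique, and you supply your own admin credential.

# Minimal sketch: a logical SQL server plus a Basic tier database for learning.
# Resource group, server and database names, and location are placeholder
# assumptions; Get-Credential prompts for the server admin login.
$adminCred = Get-Credential
New-AzureRmSqlServer -ResourceGroupName 'learn-rg' -ServerName 'learn-sql-srv' `
    -Location 'eastus' -SqlAdministratorCredentials $adminCred
New-AzureRmSqlDatabase -ResourceGroupName 'learn-rg' -ServerName 'learn-sql-srv' `
    -DatabaseName 'learndb' -RequestedServiceObjectiveName 'Basic'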

tl;dr

This was a long post. Here’s what you should start learning for cloud:

  • scripting (PowerShell, CLI, or REST–doesn’t matter, learn one of them)
  • networking
  • storage
  • security
  • platform as a service offerings in your field and how they work with networking, storage and security

You can do all of these things with a minimal financial investment, or perhaps even for free.

Summary

You are in charge of your career, not your current employer. If you want to advance your skills you are going to have to work for it, and maybe spend some money, but it will definitely take a big time investment. Also, consider going to some training–I just did a precon at SQL Saturday Chicago, and while the attendees aren’t going to be cloud experts after a day, they have a great basis on which to move forward. Books and reading are challenging in a cloud world–it moves quickly and changes fast.

If Your Hardware and OS are Older than Your Interns, Fix It

Yesterday, I wrote my monthly column for Redmond Magazine about dealing with situations where your management doesn’t want to invest in the resources to provide reliability for your hardware and software environment. Whether it’s redundant hardware or even offsite backups, many IT organizations fail at these basic tasks, because the business side of the organization fails to see the benefit, or more likely doesn’t perceive the total risk of a major outage.

Is this your hardware?

As I mentioned in the column, AMC Theatres had a system meltdown the other day during the early sale of tickets for the premiere of the new Avengers movie. The next day, Arizona Iced Tea (which I didn’t realize was still in business) got ransomwared.

 

While I agree with Andy about testing your restores, I wanted to address a couple of other things. If you are running an old OS like Windows 2003, your business is at risk. If for some reason you absolutely have to run a 16-year-old operating system in your environment, you should ensure that it is isolated enough on your network that its exposure is limited.

Additionally, as an IT organization, it’s your job to be yelling up and down at software vendors who won’t support modern versions of infrastructure components like operating systems and database engines. And yes, while I’m a consultant now, I’ve had many real jobs, and I understand that the business chooses the software packages it wants to run. I also understand that when the org gets ransomwared because “SomeShittyApp” needed to run on an unpatched XP box with an SMB-1 share open to the internet, IT are going to be the folks whose jobs are on the line.

One of the other things I brought up in my column is how to handle the PR aspects of a system outage. Let’s face it: if your site and services are down, you need to be able to explain to your customers why, and what your timeline for repair is. When you are investing in your systems and doing all of the right things, it is very easy to be transparent, and “we had redundancy in place, but the failure happened in such a way that we incurred an outage” sounds a lot better than “yeah, we had to run an OS that’s older than our interns because we didn’t have the budget for an upgrade.”

Finally, if you are on really old hardware (hint: if your servers were originally beige and are now a much yellower shade of beige, you’re on really old hardware), it’s probably cheaper and more efficient to do a cloud migration. You can move to Azure IaaS (or AWS), or if you’re a VMware shop, their cloud option on AWS offers a very simple migration path, especially if your cloud experience is limited. Just get off that really old hardware and software and onto something that gets patched regularly.
