Building Storage for SQL Server (and other database) Virtual Machines in the Cloud

I wrote a couple of weeks ago about what not to do with backups in Azure. Because I’ve seen a few improperly configured VMs lately, I wanted to talk about the way storage works in Azure, compared to the way we traditionally did things on-premises.

Old School

If you still buy your storage from a three letter company, and your sales rep drives an expensive German car and has better taste in shoes than Imelda Marcos, you might still configure your storage this way: a separate disk volume each for TempDB, transaction log files, and data files. Ideally, you are backing up to a separate storage appliance, and not to the same storage array where your data files live.

This architecture design dates back to when a storage LUN was literally built from a few physical disks, and we wanted to ensure there were enough I/O operations per second to service the needs of the SQL Server, because we only had the available I/O of those few disks.

As virtualization became popular, storage architectures changed, and a SAN LUN was carved into many small extents (typically 512 KB-1 MB, depending on the vendor) spread across the entire array. With modern storage there was no need to separate log and data files. Some DBAs still did, but in an on-premises world there was no penalty for it.

Note: There is a scenario where you would want multiple disk devices in Windows. Under very high I/O workloads, I/Os can queue at the Windows disk device level, but in my experience this is an uncommon performance scenario.


Enter the Cloud

Instead of physical disks, in the cloud your “disk devices” are virtual hard drive files, which are stored across three different physical disks on the Azure infrastructure. All storage performance is controlled by quality-of-service settings on that infrastructure, and each disk you add increases both your IOPS and your storage capacity. Also, each virtual machine has a fixed limit on the number of IOPS available to it (while this is possible on-premises, it’s far less common).

We then translate this to the operating system level, and in this specific case, Windows Server. In order to get maximum volume size and performance out of our disks, we use Storage Spaces in Windows to create pools of storage. The exciting part here is that you get to use RAID 0, since Azure’s (or Amazon’s) infrastructure is providing your redundancy. This means if we have twenty 1 TB disks with 5,000 IOPS each, we can have a 20 TB pool that theoretically supports 100,000 IOPS. (Most VMs in Azure don’t support that level of I/O performance, but a couple do.)

It’s also important to know that you need to specify the number-of-columns parameter when building your Storage Spaces pools in Windows. If you have more than four disks, you need to use PowerShell for that–I’ll write more about that next week. But here’s some info from the product teams.

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

This post has good info on columns, but it’s from 2014 and the rest of the storage information is very dated (premium storage didn’t exist yet). I’m only including it because it’s the best explanation of columns that I’ve seen.

Best Practices & Disaster Recovery for Storage Spaces and Pools in Azure

What this means is that in order to maximize your database server’s I/O performance, you should create one large pool with all of the disks. Throw your system databases and your data and log files all on that volume. And please don’t write your backups to that disk (BACKUP TO URL was invented for this purpose).
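Here’s a minimal sketch of what that looks like in PowerShell–the pool and volume names, interleave, and allocation unit size are illustrative, and it assumes all of the VM’s data disks are unallocated and poolable:

# Sketch: one Simple (RAID 0) Storage Spaces pool across every poolable data disk,
# with one column per disk so I/O is striped across all of them.
$disks = Get-PhysicalDisk -CanPool $true

New-StoragePool -FriendlyName 'SQLPool' `
    -StorageSubSystemFriendlyName (Get-StorageSubSystem -FriendlyName 'Windows Storage*').FriendlyName `
    -PhysicalDisks $disks

New-VirtualDisk -StoragePoolFriendlyName 'SQLPool' -FriendlyName 'SQLData' `
    -ResiliencySettingName Simple -NumberOfColumns $disks.Count `
    -Interleave 65536 -UseMaximumSize -ProvisioningType Fixed

Get-VirtualDisk -FriendlyName 'SQLData' | Get-Disk | Initialize-Disk -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel 'SQLData'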

You can also throw TempDB on the local D: drive, which is ephemeral (it goes away when your machine reboots, but so does TempDB) and can offer slightly lower latency.

Note: if you’re reading this and you are using Ultra Disk, I haven’t been able to test any of this with it yet, but I think you may not need to stripe disks to achieve good performance.

 

Don’t Back Up Your SQL Server VMs in Azure

This headline may seem a little aggressive. You may think, “Joey, I’ve heard you and other MVPs say that backups are the most important thing ever, and I’ll get fired if I don’t have backups.” That is very accurate, but when I talk about backups, I am talking about backing up your databases. If you are running an Azure VM, or have a good connection to Azure, Microsoft offers BACKUP TO URL functionality that lets you easily isolate your database backups from the VM.
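To give you an idea of what that looks like in practice, here’s a minimal sketch using the dbatools Backup-DbaDatabase cmdlet (Ola’s scripts, which I mention below, work just as well). The instance name and container URL are placeholders, and it assumes you’ve already created a SQL Server credential for that storage container:

# Sketch: full backups of the user databases straight to blob storage.
# Assumes a SQL Server credential matching the container URL already exists.
Backup-DbaDatabase -SqlInstance 'sqlvm01' `
    -ExcludeDatabase 'master', 'model', 'msdb', 'tempdb' `
    -AzureBaseUrl 'https://mystorageaccount.blob.core.windows.net/sqlbackups' `
    -Type Full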

If you are backing up your operating system and system state on your cloud provider, you are wasting storage space, money, and CPU cycles. You may even be negatively impacting the I/O performance of your VMs. We had a customer who was doing a very traditional backup model–they were backing up their databases to the local file system, and then backing up the SQL Server VM using the Azure Backup service. They swore up and down that backups were impacting their database performance, and I didn’t believe them until they told me how they were doing backups.

Cloud Storage Architecture

Unlike your on-premises environment, where you might have up to a 32 Gbps Fibre Channel connection to your storage array and a separate 10 Gbps connection to the file share where you write your SQL Server backups, in the cloud you have a single connection to both storage and the rest of the network. That single connection is metered and correlates to the size (and $$$) of your VM, so bandwidth is somewhat sacred: backups and normal storage traffic share the same limited pipe. This doesn’t mean you can’t have good storage performance, it just means you have to think about things. The customer I mentioned was writing backups to the file system and then having the Azure Backup service back up the VM, which saturated their pipe and made regular SQL Server I/Os wait.

Why Backing Up Your VMs is Suboptimal

The Azure Backup service is not natively database aware. There is a service (https://docs.microsoft.com/en-us/azure/backup/backup-azure-sql-database) that performs native SQL Server backups for you. While this feature is well engineered, I’m not a huge fan. The reason is that the estimated time to create a new VM in Azure is about 5-10 minutes, so if you are using backup to URL, you can have a new SQL Server VM in 10 minutes. You can immediately join the machine to the domain (or better yet, do it in an automated fashion), and then begin restoring your databases. Restoring from the backup service is really slow, whereas you will see decent restore speeds with RESTORE FROM URL.

The one bit of complexity in this process is high availability solutions like Always On Availability Groups, which have somewhat complex configurations. I’m going to say two dirty words here: PowerShell and source control. Yes, you should have your configurations scripted and in source control. You’d murder your developers if their web servers required manual configuration for new deployments, so hold your database configurations to the same standard.

If you have third-party executables installed on your SQL Server for application software, then, well, you’re doing everything wrong.

How I Do Backups in Azure:

  1. Use Ola Hallengren’s scripts. They support BACKUP to URL.
  2. Add this agent job step from Pieter Van Hove to clean up your backups

Sleep well. Eventually, there will be full support for automated storage tiering in Azure (it still has some limitations that preclude its use for SQL Server backups), but for right now you need that additional cleanup step.
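If you’re curious what that cleanup amounts to, here’s a rough sketch (not Pieter’s actual job step; the storage account, key variable, and container name are placeholders) that prunes backup blobs older than 30 days:

# Sketch: remove backup blobs older than 30 days from the backup container.
$ctx = New-AzStorageContext -StorageAccountName 'mystorageaccount' -StorageAccountKey $storageKey
Get-AzStorageBlob -Container 'sqlbackups' -Context $ctx |
    Where-Object { $_.LastModified -lt (Get-Date).AddDays(-30) } |
    Remove-AzStorageBlob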

While I wrote this post around Azure, I think the same logic applies to your on-premises VMs and even physical machines. You should be able to get Windows and SQL Server installed within 20-30 minutes with a very manual process, and with an automated process you should be able to have a machine in less than 10. When I worked at the large cable company, we didn’t back up any of our SQL Servers, just the databases.


Run a PowerShell Script Against all of Your Azure SQL Databases

I started working on this bit of code a few months ago, and it’s served me really well. Just about every command you run against a SQL Database requires you to supply the server name and the resource group name as parameters, and in order to get the list of server names, you have to query each resource group.


This code is pretty simple: it looks for an Azure SQL server in each resource group, and then looks for the databases that aren’t master on each server. In this example I’m setting the storage account for Azure Threat Detection, but you could do anything you wanted in that last loop.

$rgs = (Get-AzResourceGroup).ResourceGroupName

foreach ($rg in $rgs)
{
    $servers = Get-AzSqlServer -ResourceGroupName $rg

    foreach ($server in $servers)
    {
        # Pick the threat detection storage account based on the server's region
        # (match these strings to the Location values your servers actually return)
        if ($server.Location -eq 'West US')   { $stg = 'storage2' }
        if ($server.Location -eq 'West US 2') { $stg = 'storage1' }

        $dbs = Get-AzSqlDatabase -ResourceGroupName $rg -ServerName $server.ServerName |
               Where-Object { $_.DatabaseName -ne 'master' }

        foreach ($db in $dbs)
        {
            Set-AzSqlDatabaseThreatDetectionPolicy -ResourceGroupName $rg `
                -ServerName $server.ServerName -DatabaseName $db.DatabaseName `
                -NotificationRecipientsEmails 'bob@contoso.com' `
                -EmailAdmins $true -StorageAccountName $stg
        }
    }
}
The last bit of complication in this code is specifying the storage account based on the location of the Azure SQL server, which is a property of the server object.

The Challenge of Migrating to Azure SQL Database Managed Instance

When Azure SQL Database Managed Instance was introduced to the public at //build a couple of years ago, it was billed as a solution to ease the migration from either on-premises or infrastructure-as-a-service VMs. You would get all of the benefits of a managed service, like built-in high availability, patching, and automated backups, and you could do all of the things you couldn’t do in Azure SQL Database, like run CLR, use cross-database queries, and have SQL Agent jobs, without having to learn Azure Automation and PowerShell. The final big bonus was that you could restore your backups from on-premises into the managed instance environment. No more dealing with DACPACs and crying, and drinking, and crying, and drinking, and crying.

 


I had very early access to managed instance servers, and it seemed obvious to me that an easy migration approach would be to use log shipping. You could write your backups from your source server to URL, restore them with NORECOVERY to your managed instance, repeat the process with log backups, and voila you were in a managed instance. Quick and easy, and more importantly, if you were a DBA, nearly the exact same process you would have executed in your on-premises environment (except with backups to blob storage).

There was a long period of time when we Data Platform MVPs were unable to deploy managed instances into our Microsoft subscriptions. Which is fine–when capacity is short, it should go to paying customers, not us idiots. However, this meant I was away from the product for a while. During this time Microsoft introduced the Data Migration Service, a comprehensive set of tools to move your data to and from a variety of platforms in an online or offline manner.

While DMS is pretty interesting tooling, I had mostly ignored it until recently. Functionally, the tool works pretty well. The problem is it requires a lot of privileges–you have to have someone who can create a service principal and you need to have the following ports open between your source machine and your managed instance:

  • 443
  • 53
  • 9354
  • 445
  • 12000

While the scope of those firewall rules is limited, in a larger enterprise, explaining why you need port 445 open to anything is going to be challenging. So in addition to an AAD admin, the DBA is going to need a network admin to enable this. The service principal you created is also going to need the Contributor role on the entire subscription. Yes, that means it can create anything in the entire subscription. This is probably my biggest complaint. Microsoft does acknowledge this in the docs, and says they are working to reduce the permissions that are required.

I’m currently engaged in a VM to Managed Instance migration, and when the client’s DBA was complaining about the complexity of the DMS, I suggested we just use log shipping like I had done when I first played with the Managed Instance service. I was trying to figure out how to automate the process, but then I figured I should just verify I could do a restore with NORECOVERY.

Msg 3032, Level 16, State 2, Line 11
One or more of the options (stats, norecovery, stats=) are not supported for this statement. Review the documentation for supported options.

Sad trombone. That means the only way to migrate a database in near real time is to use the DMS, and it’s going to take half of your IT staff to do it. In order to reduce the friction of migrations I’ve yelled at a couple of PMs about this, but I thought I would also create a User Voice item.

https://feedback.azure.com/forums/908035-sql-server/suggestions/38267374-add-restore-with-norecovery-back-to-managed-instan

Please vote for it, if you are interested.

 

Three Azure Features You Should Really Be Using

There was a thread on one of the Microsoft MVP distribution lists the other week, about recovering from a deleted resource that reminded me of a post I had been meaning to write. In many organizations, the public cloud is the wild west of the IT organization. In the worst cases, this means organization admins are using their gmail accounts to access the subscriptions, but even in well run organizations, the ease of deploying cloud resources leads to the dreaded server sprawl. In this post, I’m going to talk about three features of the Azure Resource Manager architecture that you should be using to better manage your subscription: tags, policies, and locks.

Tags


When I worked in corporate IT, there was no discussion I hated more than the dreaded “server naming convention” discussion. It would typically be held in a room filled with middle managers (who would nearly always be men) who felt the need to exercise their dominance by defining at least two characters of the up to 15 we were allowed by NetBIOS. This also led to what I called metadata packing–we would end up with server names like SWCSAPSQL01P, which would indicate the company, the data center location, the application, the function, an integer, and the environment. Plus, server names like that roll right off the tongue. In reality, this is a terrible way to define metadata about server resources, and in a world where we are using things like containers, which are disposable, this paradigm does not work. Fortunately, Azure (and AWS and Kubernetes) allows for tagging of resources. Tags are simply key-value pairs that describe our objects. For example, if I had a VM running SQL Server for SAP, I might have the following tags:

Environment:Production

Application:SAP

Function:SQL

Cost Center:Operations

Tags are free form, and you have up to 15 of them per resource, so you can describe things very well. Tags also roll up on your Azure bill–hence my use of the cost center tag in my example. You can also use PowerShell and the Azure CLI to filter operations by tags, so they are essential to filtering your management and maintenance tasks.
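Here’s a quick sketch of tagging a VM and then filtering by tag with the Az PowerShell module–the resource name and tag values are just placeholders:

# Sketch: stamp a VM with descriptive tags, then find resources by tag later.
$tags = @{ Environment = 'Production'; Application = 'SAP'; Function = 'SQL'; 'Cost Center' = 'Operations' }
$vm = Get-AzResource -Name 'SAPSQL01' -ResourceType 'Microsoft.Compute/virtualMachines'
Set-AzResource -ResourceId $vm.ResourceId -Tag $tags -Force

# Drive maintenance from the tag instead of the server name
Get-AzResource -Tag @{ Environment = 'Production' } | Select-Object Name, ResourceGroupName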

Policy


If you are thinking “tags are a really good idea, but the other people on my team are lazy and will never remember to use them,” do I have the solution for you. Similar to Windows Server Active Directory, Azure allows you to implement policies to manage your subscription. Before we delve too deep into Azure Policy specifics, let’s talk about the hierarchy of resources within your Azure subscription (for the purposes of this post, I’m talking about a single subscription).


At the highest level we have the subscription, which is made up of one or more resource groups, which themselves are composed of zero or more resources. This hierarchy is important to understand for many reasons, not the least of which is that it is how role-based access control (RBAC) works in Azure. Security (and policy) is scoped at a level and then inherited downward. If you have a role granted at the level of the subscription, you are going to have access to all of the resource groups and resources in that subscription (unless someone has issued an explicit deny, but that’s a different post).

Policy works the same way–a policy is scoped to either a specific resource group or the subscription. You can define a policy to do any number of things in Azure. If you define a policy to enforce tagging and scope it at the subscription level, no one will be able to deploy a resource without the tags you have specified in your policy. You can also specify the types, sizes, and regions where your users can deploy resources. Once a policy is in place, users will receive an error if they try to deploy resources that do not meet the definition. Because of this, you should also socialize your policies with anyone who will be deploying resources into your environment, so that they know the rules and don’t come to your desk with a bat.
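As a sketch of what that looks like with Az PowerShell (the built-in policy display name and tag name are assumptions–check Get-AzPolicyDefinition for the exact definitions in your tenant):

# Sketch: assign a built-in "require a tag" policy at the subscription scope.
$sub = (Get-AzContext).Subscription.Id
$definition = Get-AzPolicyDefinition -Builtin |
    Where-Object { $_.Properties.DisplayName -eq 'Require a tag on resources' }

New-AzPolicyAssignment -Name 'require-environment-tag' `
    -Scope "/subscriptions/$sub" `
    -PolicyDefinition $definition `
    -PolicyParameterObject @{ tagName = 'Environment' }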

Locks


The final feature you should be using is locks. Locks are just what they sound like, and they perform one of two functions: preventing any changes to the resource, resource group, or subscription (read-only locks), or preventing any resource at the scope of the lock from being deleted (delete locks). I don’t really like using read-only locks, as they prevent changes like adding disk space or changing the performance level of an Azure SQL Database. However, I think delete locks should go on every production resource in your subscription. Locks can be removed if you are the owner of a resource or resource group, but if you attempt to delete a resource with a lock in place, Azure will throw an error message indicating that the lock is there.
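Putting a delete lock on a resource group is a one-liner–here’s a sketch with placeholder names:

# Sketch: a CanNotDelete lock on a production resource group.
New-AzResourceLock -LockName 'no-delete-prod-sql' -LockLevel CanNotDelete `
    -ResourceGroupName 'prod-sql-rg' -Force `
    -LockNotes 'Production SQL resources; remove this lock before an intentional teardown.'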

The cloud is big and complex, and it’s easy to let resources grow out of control, which can lead to configuration drift, security problems, and most importantly excessive spend. These are just a few of the many built-in tools you can use to make cloud management easier.

SQL Server 2019 is Now Available on Windows Containers—Why You’re Doing It Wrong

I try to avoid writing blog posts that I like to call “hot takes”—quick crappy opinions on the news of the day, but this is a topic I feel particularly strongly about. I’m not sure how many of you were using Microsoft’s Azure Hadoop offering HDInsight, when it debuted in the 2012-13 timeframe, but it had a unique characteristic. Unlike virtually every other Hadoop offering at the time (and this was a hot era for Hadoop) HDInsight ran on Windows Server. That meant all the assorted utilities in and around Hadoop were always trailing what was current on Linux. It also meant that when you had issues, and you searched for help in various online forums, you were always challenged because you got weird error messages, and your cluster used oddball file pathing because Windows. Since 2013, Microsoft has gotten a new CEO, the stock price has shot way up, and the company has embraced open source software. SQL Server is on Linux now, which leads me to my next points.


Ever since SQL Server Helsinki debuted in Docker, I’ve seen the benefit of using containers with databases. When SQL Server started supporting Kubernetes, I really saw the benefit and quickly embraced this, by writing and presenting on the topic, and evangelizing all the benefits of the Kubernetes platform (of which there are many). Before I start my rant, remember in a container platform, there is only one copy of the base operating system on a given host. Your container contains the libraries and binaries it needs but shares a kernel with the host operating system. This means the base operating system of the host must be the same as that of the container.

Much like Hadoop, Kubernetes was built from the ground up as a Linux-based platform. When Google built the Borg cluster management system that eventually became Kubernetes, it was built on Google’s own customized Linux. This means there are a lot of assumptions about the way things work in Linux baked into Kubernetes. While I know I’ve heard a reasonable amount of community demand for Windows containers (clearly enough that Microsoft has made an effort to build support into both Windows Server and Azure Kubernetes Service), I can’t help but feel this is not a good long-term plan.

When dealing with open source software, it’s good to be on the platform that is most widely utilized–that’s where you’ll find answers on forums and get the latest patches first. Another example I like to use for this is Oracle on Windows, which I supported in a past life. Since Oracle was most commonly run on Linux/UNIX, patches for Windows were always days and weeks behind. While I appreciate the effort of the Windows team to build container and Kubernetes support into the platform, Microsoft is going to be the only support and patching path for Kubernetes on Windows, which hampers one of the key benefits of OSS: rapid fixes.

There’s another elephant in the room–in this scenario, Linux is free as in beer, and you will have to license (pay for) your Windows nodes. If you are running a supported distribution like Red Hat (now owned by IBM), you will pay about the same as Windows Server licensing, but in most cases organizations running Kubernetes are doing it on a free version of the operating system.

I don’t mean to slight anything Microsoft is doing (note: I’m a shareholder and currently a contractor at MS), but I feel as though if you are implementing Kubernetes on Windows, you are likely doing containers wrong. With .NET Core and SQL Server being available on Linux, there are few reasons to tie your development to the Windows platform. System administration reasons like domain authentication and group policy support make some sense to me, but I can’t help but think this feels like Hadoop on Windows. Also, the lack of community support can’t be overemphasized–it’s a big deal, especially on a not fully mature platform like Kubernetes. By the way, Microsoft stopped offering HDInsight on Windows sometime in 2014-15. Just saying.

 

 

Adventures in Awful Application Design–Amtrak

I was going to New York last weekend from my home of Philadelphia. We were running late for the train, and for the first time ever, I had booked an award ticket on Amtrak. For reasons unbeknownst to me, you cannot make changes to an award ticket in their app (I didn’t try the website). Additionally, when you call the standard Amtrak line, the customer service reps can’t change an award ticket unless you have defined a PIN. This PIN is defined by telling an awards customer service rep what you want your PIN to be. (Because that’s really secure.) All of this is god-awful business process exacerbated by crappy IT, but it really comes down to bad business processes.


The bad application design came into play when the awards rep tried to change my ticket and asked, “Do you have it open in our app? I can’t make changes to it if you have it open.” My jaw kind of dropped when this happened, but I went ahead and closed the app. Then the representative was able to make the change. We had to repeat the process when the representative had booked us into the wrong class of service. (The rep was excellent and even called me back directly.)

But let’s talk about the way most mobile apps work. There are a series of front-end screens that are local to your device, and most of the interaction is handled through a series of REST API calls. The data should be cached locally on your device after the API call, so it can still be read in the event your device is offline. If you are a database professional, you are used to the concept of ACID (Atomicity, Consistency, Isolation, Durability), which is how we ensure things like bank balances in our databases can remain trusted. In general, a tenet of most database applications is that readers should never block writers–if a process needs to write to a record, someone reading that record should not affect the operation. There are exceptions to this rule, but these rules are generally enforced by the RDBMS in conjunction with the isolation level of the database.

Another tenet of good database development is that you should do what you are doing and get the hell out of the database. Whether you are reading or writing, you should get the data you need and then return either the data or the status of the transaction to your app. Otherwise, you are going to keep your transaction open and impact other transactions, with a generally unpredictable set of timings. That’s the whole point of using the aforementioned REST API calls–they do a thing, return your data (or the fact that you updated some data), and then get the hell out.

What exactly is Amtrak’s app doing? Obviously I don’t have any backend knowledge, but based on that comment from the CS rep, opening your reservation in the mobile app opens a database transaction that doesn’t close. I can’t fathom why anyone would ever design an app this way, and I can’t think of a tool that would easily let you design an app like this. At some point, someone made a really bad design choice.

Let’s Talk about Backups, and How to Make Them Easier

Recently I’ve run into a couple of situations where customers had lost key business data due to several factors. Whether it is ransomware, a virus, or just hardware failure, it doesn’t really matter how you lose your data, it just matters that your data is lost, and your business is now in a really bad spot. When I was first thinking about writing this post this morning, I was going to tell you how important it was to backup your databases, and how the cloud is a great disaster recovery solution for those backups. The problem is, if you are reading this blog, you likely at least know that you should have backups. You probably even know how to optimize them to make them run faster, and you test your restores. You do test your restores, right?


Then I thought a little harder, and I was reminded of a tweet that my good friend Vicky Harp (the Principal Program Manager of the SQL Server tools team at Microsoft) wrote a couple of years ago:

Backups are DBA 101, but most of the organizations having these types of issues don’t have a DBA. They might not even have a dedicated IT person, or if they do, it’s someone who comes by once a week to make sure the printer and wifi still work and takes care of company laptops. The current situation is that we hope they go to a SQL Saturday or user group, learn about backups, and start taking them. So I thought: what could be done to make that easier and faster? The database engine already has the technology to take backups automatically (maintenance plans) and even move them to a secondary or tertiary location (backup to URL).

What I’m asking for (besides you to upvote that User Voice item) is for Microsoft, as part of SQL Server setup, to add two new screens. The first would be called “Backup”—it would offer a dire warning to the effect of:

In order to protect the data in your databases, Microsoft strongly encourages you to take backups of your data. In the event of hardware failure, data corruption, or malicious software, Microsoft support will be unable to help you recover your data, and you will incur data loss. This box is checked by default to enable automatic daily backups of all of your databases.

The next screen would offer you options on where to store your backups and how long to retain them. It would offer the option to store the backups locally or on a network share, and give you the ability to encrypt your backups. It would also allow you to back up your encryption key (and strongly encourage you to do so).

The next part of that screen is where I think this could become attractive to Microsoft. It would give you the option to back up your databases to URL in Azure, and if you didn’t have an Azure account, it would allow you to create one. Frankly, for most organizations who would be using this as their backup solution, Azure is the best option.

Arguments Against This Feature

The arguments I can see against this feature request are minimal. One could argue that you would rather use Ola’s scripts, or dbatools, or change the striping or buffer count for your backups. If you are making any of those arguments, this feature isn’t for you. If you’ve ever installed SQL Server with a configuration file, this feature isn’t for you. The only valid argument I can see against doing this is that one could potentially fill up a file system with backup files. Maintenance plans do have the ability to prune old backups, so I would include that in the deployment. I might also build some alerting and warnings into the SQL Agent to notify someone by default.

The other argument I see is that Microsoft offers a similar product with Azure Backup for SQL Server VMs, and this would cannibalize that feature. It very well might, but that product is limited to Azure, and we are aiming for the greater good here: helping more people protect their data is good for Microsoft, good for SQL Server, and good for the world.

Summary

If you are reading this, go upvote my User Voice request here. This feature isn’t about you; it’s about all the orgs whose IT decisions have left them at the point of data loss when they were really none the wiser. Let’s make life easier for folks.

The Ransomware Breach You’re Going to Have

I don’t typically blog about network engineering here. However, in the last few weeks I’ve seen several major companies get hit with ransomware attacks. While this isn’t an uncommon thing in 2019, it is uncommon that their entire environments were taken offline because of it. So with that, let’s talk about how these attacks can do this. Since the best way to deal with any sort of attack is to work from an “assume breach” model, let’s talk about the best way to defend against these attacks: physical network security.

The Attack Vector: Dumb Humans

The easiest way to attack any company is via human targets. Whether it’s bribing a sysadmin to get credentials, a standard phishing attack, or any other sort of malware, the best way to pwn a company is by getting an unknowing (or knowing) human to do most of the work for you. There are ways to stop these sorts of things from getting in the front door–the first is to use an email service like Office 365 or Gmail, which have built-in phishing protections and use machine learning, based on their massive volume of exposure to attacks, to protect you. You should also educate your users to avoid these scams–there is good training out there for this, which I’ve taken for a few clients.

But the real answer is to adopt an assume-breach methodology. I’m currently working on a financial system for a Fortune 100 company. In order to reach the servers, I have to use a special locked-down laptop, have two key cards, and go through two jump hosts. Even if that laptop were to get hacked (and it wouldn’t, because you can’t install software on it), you couldn’t do anything without my key cards and PINs.

Physical and Virtual Network Security

Can you connect to your production database servers over any port from your desktop? Or to your domain controllers? If you can do that, you have a problem. Because once someone’s desktop gets pwned, the malicious software that gets installed when the CEO tries to open a PDF with the new Taylor Swift album on it can run anywhere on your network. This is bad.


The networking gospel according to IBM.

The IBM white paper linked above is the gold standard of how to build and segment your network. In a common example there are a few zones:

Black Zone: No outbound external traffic, inbound restricted to whitelisted IPs and ports from within black and green zones. This is where your domain controllers and database servers with any sort of sensitive data should live.

Green Zone: Limited external traffic (think Windows Update, the Power BI gateway, Linux package repos); can communicate with end-user networks over controlled ports. This is where most of your application servers and some management services should live.

Blue Zone: Management zone–this is where you should have your jump boxes so that you don’t have to log directly onto production boxes. This can have limited external traffic, and should be able to talk to the servers in the black zones, but only over ports that you have specified.

Yellow Zone: This is typically where your DMZ will exist, meaning you are allowing inbound traffic from the internet. This is obviously a big attack vector, which means it should live on an isolated segment of your network, and the traffic to and from this zone should be locked down to the specific IPs and ports that need connectivity.

Red Zone: This is where the users live. There’s internet access, but communications from this network should nearly always stay within this network. You will have teams that want to deploy production workloads in this network. Don’t let them.

But That Sounds Hard

Good security is always hard–see my server management example above. In my scenario, when the CEO got pwned you might have to deal with a bunch of ransomware-infected laptops, but since your servers and domain controllers weren’t easily reachable from the desktop network, your company would keep moving, and you could simply re-image the pwned machines.

This is Trivial to Implement with Cloud Networks

In order to do this on-premises, you may have to buy a lot more networking gear than you already have, or at least restructure a whole lot of virtual LANs (VLANs). However, in a cloud scenario, or even in some virtual infrastructures, this kind of model is trivial to implement. Just look up network security groups in Azure (and you never have to run any cable in the cloud).
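As a sketch of what one zone boundary looks like in Azure (subnet ranges, names, and region are placeholders), a network security group on the black zone subnet might allow SQL Server traffic only from the management subnet and deny everything else inbound:

# Sketch: NSG that only lets the management (blue) subnet reach SQL Server on 1433.
$allowSql = New-AzNetworkSecurityRuleConfig -Name 'allow-sql-from-mgmt' `
    -Access Allow -Direction Inbound -Priority 100 -Protocol Tcp `
    -SourceAddressPrefix '10.10.20.0/24' -SourcePortRange '*' `
    -DestinationAddressPrefix '10.10.10.0/24' -DestinationPortRange 1433

$denyAll = New-AzNetworkSecurityRuleConfig -Name 'deny-all-inbound' `
    -Access Deny -Direction Inbound -Priority 4000 -Protocol '*' `
    -SourceAddressPrefix '*' -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange '*'

New-AzNetworkSecurityGroup -Name 'black-zone-nsg' -ResourceGroupName 'prod-network-rg' `
    -Location 'eastus' -SecurityRules $allowSql, $denyAll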

Technology, and especially enterprise technology, isn’t easy, but it’s more important than ever to use good security methods across your environment.

Azure Hybrid Automation–Performing Tasks in a VM

Recently, I learned something new about Azure Automation–you can execute tasks inside of a VM. I haven’t dealt with this situation before; typically, any tasks I want to execute inside of a VM, I execute using a scheduler like SQL Agent or the Windows Task Scheduler. However, in this situation we were trying to reduce costs, and this VM is used only for ETL processing (the databases are in Azure SQL DB), but we still needed to take backups of the Master Data Services database.

My first thought was to have a SQL Server Agent job that executed either on startup or once a day; however, this was messy and could potentially lead to several sets of unnecessary backups a day. I knew I could create an Azure Automation job that would check the status of the VM and start it if it was stopped. What I needed to figure out from there was:

  1. How to get a command to execute inside of the VM
  2. To shutdown the VM if it had not been running previously

Enter Azure Hybrid Runbooks

While hybrid runbook workers are usually pitched for an on-premises-to-Azure use case, in my case I was simply using one to connect to a VM within Azure. You will have to run some PowerShell to enable this on your machine, but after that step is completed, you can simply call Azure Automation cmdlets with the “-RunOn” option, which specifies the name of the hybrid worker group you created.

The other trick to this was calling a runbook from a runbook, which wasn’t well documented.

# Ensures you do not inherit an AzureRMContext in your runbook
Disable-AzureRmContextAutosave -Scope Process

# $RGName and $vmName are runbook parameters; $AzureContext comes from the
# Run As account login earlier in the runbook. Statuses[1] holds the power state.
$vm = (Get-AzureRmVM -ResourceGroupName $RGName -Name $vmName -Status).Statuses[1].Code

if ($vm -ne 'PowerState/running')
{
    Start-AzureRmVM -ResourceGroupName $RGName -Name $vmName
    Start-Sleep -Seconds 35
    Start-AzureRmAutomationRunbook -AutomationAccountName 'Automation' -Name 'BackupDB' `
        -ResourceGroupName $RGName -AzureRmContext $AzureContext -RunOn 'Backups' -Wait
    Stop-AzureRmVM -Name $vmName -ResourceGroupName $RGName -Force
}
else
{
    Start-AzureRmAutomationRunbook -AutomationAccountName 'Automation' -Name 'BackupDB' `
        -ResourceGroupName $RGName -AzureRmContext $AzureContext -RunOn 'Backups' -Wait
}

In order to execute this inner runbook, you simply use Start-AzureRMAutomationRunbook, and since it’s hybrid, we use the aforementioned -RunOn option. My BackupDB runbook simply uses the dbatools Backup-DbaDatabase cmdlet to perform a full backup.
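For completeness, here’s a rough idea of what that inner BackupDB runbook could look like on the hybrid worker–a sketch, not my production runbook, with the instance name, database name, and backup path as placeholders:

# Sketch of the inner runbook: a dbatools full backup of the MDS database.
Import-Module dbatools
Backup-DbaDatabase -SqlInstance 'localhost' -Database 'MDS' -Type Full -Path 'E:\Backups'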

It’s quick and easy to get this up and running–the hardest part is getting all of the modules you need into your Automation account.
