How the Cloud Democratizes Solutions

The grey hairs I see every now and again remind me of how long I’ve been working in IT. I’m now in my 22nd year of having a job (2 of those years as an intern, but I did have the ‘sys’ password), and I’ve seen a lot of things come and go. When I started working, servers cost at least tens of thousands of dollars, were the size of refrigerators, and had less processing power than the MacBook Pro I’m typing this post on. More importantly, they had service contracts that cost a rather large amount of cash per year. What did that buy you? I’m not entirely sure, but your hardware reps surely took you out for at least a steak dinner or two. Eventually we moved to commodity hardware (those boxes from HP and Dell that cost a tenth of what you paid 20 years ago), and service contracts dropped to a few hundred dollars per server per year. (And then those sales reps started selling expensive storage.)

Most of my career has been spent working in large Fortune 500 enterprises, so I think about technologies that at the time were only available to organizations of that size and budget, and that are now available for a few clicks and a few dollars per hour. I’m going to focus on three specific technologies that I think are cool and interesting, though as I’m writing this I’ve already thought of three or four more.


Massively Parallel Processing

MPP is a data warehouse design pattern that allows for massive scale-out solutions that can quickly process very large amounts of data. In the past it required buying an expensive appliance from Teradata, Oracle, Netezza, or Microsoft. I’m going to focus on Microsoft here, but there are several other cloud options for this model, like Amazon Redshift and Snowflake, amongst others. To get your foot in the door with one of these solutions on-premises, you were looking at at least $250,000 (USD), and that is a fairly conservative estimate. In a cloud world? Azure SQL Data Warehouse can cost as little as $6/hour, and while that can add up to a tidy sum over the course of a month, you can pause the service when you aren’t using it and pay only for the storage. This lets you do quick proof-of-concept work and, more importantly, compare solutions to see which one best meets your needs. It also allows a smaller organization to get into the MPP game without a major investment.
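If you want to script the pause and resume rather than clicking through the portal, here is a minimal sketch using the AzureRM PowerShell module (the resource group, server, and warehouse names below are placeholders):

```powershell
# Sketch only: pausing Azure SQL Data Warehouse stops the compute billing,
# so you pay for storage alone until you resume it. Names below are made up.
Login-AzureRmAccount

# Pause the warehouse once the nightly loads and reports are finished
Suspend-AzureRmSqlDatabase -ResourceGroupName 'dw-rg' `
    -ServerName 'mydwserver' -DatabaseName 'mydatawarehouse'

# Resume it right before the next load window
Resume-AzureRmSqlDatabase -ResourceGroupName 'dw-rg' `
    -ServerName 'mydwserver' -DatabaseName 'mydatawarehouse'
```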

Secondary and Tertiary Data Centers

Many organizations have two data centers. I’ve only worked for one that had a third. You may ask why this is important. A common question I get when teaching Always On Availability Groups is: if we are split across multiple sites, where do we put the quorum file share? The correct answer is that it should be in a third, independent data center (which virtually no organizations have). However, Windows Server 2016 offers a great solution, for mere pennies a month—a cloud witness, a “disk” stored in Azure Blob Storage. If you aren’t on Windows Server 2016, it may be possible to implement a similar design using Azure File Storage, but it is not natively supported. Additionally, cloud computing greatly simplifies the process of having multiple data centers. There are no worries about having staff in two locales, or getting network connectivity between the two sites; that’s all handled by your cloud vendor. Just build stuff, and make it reliable. And freaking test your DR. (That doesn’t change in the cloud.)
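Going back to the cloud witness for a moment: configuring it is nearly a one-liner once you have a general-purpose Azure storage account. Here is a rough sketch, run from one of the cluster nodes (the storage account name and key are placeholders):

```powershell
# Sketch: point the cluster quorum at an Azure storage account as a cloud witness
# (Windows Server 2016 or later; requires the FailoverClusters module on the node).
Import-Module FailoverClusters

Set-ClusterQuorum -CloudWitness `
    -AccountName 'mywitnessstorage' `
    -AccessKey   '<storage-account-access-key>'
```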

Multi-Factor Authentication

If you aren’t using multi-factor authentication for just about everything, you are doing it wrong. (This was a tell that Joey wrote this post and not the Russian mafia.) Anyway, security is more important than ever (hackers use the cloud too), and multi-factor authentication offers additional levels of security that go far beyond passwords. MFA is not a new thing, and I have a collection of dead RSA SecurID tokens dating back to the 90s that tell me that. However, implementing MFA used to require you to buy an appliance (later it was some software), possibly stand up ADFS, and do a lot of complicated configuration work. Now? It’s simply a setting in the Office 365 portal (if you are using AAD authentication; if you are an MS shop, learn AAD, it’s a good thing). While I complain about the portal, and how buried this setting is, it’s still far easier and cheaper (a few dollars a month) than buying stuff and configuring ADFS.

These are but a few examples of how cloud computing makes things that used to be available only to the largest enterprises available to organizations of any size. Cloud is cool and makes things better.

Have You Patched For Spectre/Meltdown Yet? (And more on patches)

It’s security week here at DCAC (you can join us on Friday, January 19th, 2018 at 2 PM in a webcast to talk more about security), and I wanted to focus on patches. I wrote a couple of weeks ago about the impact of Spectre and Meltdown on SQL Server (and just about every other thing that runs on silicon chips). Well, in the interim, Microsoft has patched all currently supported versions of SQL Server—the patches can be hard to find but are all summarized in this KB article. I can’t emphasize enough the need to patch all of your infrastructure for this—the vulnerabilities are big and they are really bad. While you may have physically isolated servers (though these are a rarity in modern IT), an attacker may have gained access to your network via other credentials that were taken from an unpatched server.

So to summarize, you need to patch the following:

  • System BIOS
  • Hypervisor
  • Guest Operating System
  • Browser
  • Your Mouse (probably)

That’s a lot of patching. And a lot of downtime, and testing. It sucks, and yeah, it’s probably also going to impact server performance. You still need to do it—unless you want to be the next guy blamed by the CEO of Equifax.
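One way to keep track of where a given server stands is the SpeculationControl module that Microsoft published to the PowerShell Gallery. A quick sketch:

```powershell
# Sketch: report whether the OS patches and CPU microcode mitigations for
# Spectre (CVE-2017-5715) and Meltdown (CVE-2017-5754) are present and enabled.
Install-Module -Name SpeculationControl -Scope CurrentUser
Import-Module SpeculationControl

Get-SpeculationControlSettings
```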

Which brings me to my next topic.


What is your patching strategy?

In my career I found enterprise IT to be stodgy and not always open to new ideas. We were also generally slow to move, and operated a couple of years and versions behind modern IT. However, all of the large enterprises where I worked (five different Fortune 100s) were really good at centralized management of systems, which made things like patching much easier. At the telecom company where I worked, I remember having to patch all Windows Servers to fix a remote desktop vulnerability—it was one of my first tasks there. We had System Center Configuration Manager to patch (and inventory the patch status of) all of those servers. We had a defined maintenance window, and good executive support to say we were going to apply system updates and that customer-facing applications should be built to be fault tolerant.

Smaller organizations have challenges with patching—Cabletown had a team of two people whose job was to manage SCCM. Many smaller orgs are lucky if they have a sysadmin and Windows Server Update Services. So how do you manage updates in a small org? My first recommendation would be to get WSUS—we have it in our organization, and we’re tiny. However, you still need to manage rebooting boxes, and applying SQL Server CUs (and testing, maybe). So what can you do?

  • Use the cloud for testing patches
  • Get a regular patching window
  • Use WSUS to check the status of updates (a quick spot-check sketch follows this list)
  • When in doubt, apply the patch. I’d rather have to restore a system than be on the news
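To supplement WSUS reporting, a quick-and-dirty sketch like the one below can spot-check whether a specific update made it onto a handful of servers (the server names and KB number are placeholders):

```powershell
# Sketch: check a short list of servers for a specific hotfix. Not a WSUS
# replacement, just a sanity check before you close out a patching window.
$servers = 'SQL01', 'SQL02', 'WEB01'
$kb      = 'KB4056898'   # placeholder KB number

foreach ($server in $servers) {
    $patch = Get-HotFix -Id $kb -ComputerName $server -ErrorAction SilentlyContinue
    if ($patch) { "$server : $kb installed on $($patch.InstalledOn)" }
    else        { "$server : $kb MISSING" }
}
```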

I mentioned the cloud above—one thing you may want to consider for customer-facing applications is platform as a service offerings like Amazon RDS, Microsoft Azure SQL Database, and Azure Web Apps. These services are managed for you, and have been architected to minimize downtime for updates. For example, if you are using Azure SQL Database, when you woke up to the Meltdown/Spectre news, your databases were already protected, without significant downtime.

Spectre and Meltdown–What does this mean for your SQL Servers?

By now, you have probably heard about a major bug in Intel (and ARM, and maybe AMD) processors. Since this vulnerability affects all processor types, you will probably need to update your phone, your tablet, your PC, and all of your servers in the coming weeks. Why and how does it affect everything?

What is this Bug About?

Modern (since the early 2000s) processors use what’s known as “out-of-order” execution. This is somewhat similar to the way SQL Server does read-ahead: in order to improve performance, the CPU will execute a series of instructions before the first one is necessarily complete. The way this bug works is that an attacker can pass code that will fail, but using some trickery and the magic of caches, build and retrieve (at up to 500 KB/s) all of the kernel memory. This means things like passwords, which are normally secured in kernel memory, can be stolen very easily by an attacker.

Should I Patch?


This is a deadly serious bug that is easy to exploit, with a ton of attack vectors. You should go through your normal test cycle with patch validation, but applying these patches should be a key priority in the coming weeks.

For a listing of patches, Allan Hirt (b|t) of SQLHA has put together the most comprehensive reference I’ve seen.

The biggest takeaway I’ve gleaned from reading all of this is that if you are running VMware ESXi 5.5, the patches VMware has released for that version are not comprehensive, so you will likely need to upgrade to ESXi 6 or higher.

Will This Impact My Performance?

Probably—especially if you are running on virtual hardware. For workloads on bare metal, the security risk is much lower, so Microsoft is offering a registry option to not include the microcode fixes. Longer term, especially if you are audited or allow application code to run on your database servers, you will need to enable the microcode options.
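For reference, the switch is a pair of registry values described in Microsoft’s Windows Server guidance. Here is a hedged sketch of enabling the mitigations; setting FeatureSettingsOverride to 3 instead of 0 disables them on isolated bare metal, and a reboot is required either way:

```powershell
# Sketch: registry values that control the Spectre/Meltdown mitigations on
# Windows Server. 0/3 enables them; 3/3 disables them. A reboot is required.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management'

New-ItemProperty -Path $key -Name 'FeatureSettingsOverride' `
    -Value 0 -PropertyType DWord -Force
New-ItemProperty -Path $key -Name 'FeatureSettingsOverrideMask' `
    -Value 3 -PropertyType DWord -Force
```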

This will likely get better over time as software patches that are better optimized to make fewer calls are released. Ultimately, this will need to be fixed on the hardware side, and we will need a new generation of hardware to completely solve the security issue with minimal impact.

What Do I Need to Patch?

Just about everything. If you are running in a platform as a service offering (Azure SQL DW/DB) you are lucky—it’s pretty much done for you. If you are running in a cloud provider, the hypervisor will be patched for you (it already has been if you’re on Azure—surprise, hope you had availability sets configured); however, you will still need to patch your guest VMs. You will also need to patch SQL Server—see this document from Microsoft. (Also, note that a whole bunch of new patches were released today.)

What Other Things Do I Need to Think About?

This is just from a SQL Server perspective, but there are a few things that are a really big deal in terms of security until you have everything patched:

  • Linked Servers–If you have linked servers, you should assume an attacker can read the memory of a linked server by running code from the original box
  • VMs–until everything is patched, an attacker can do all sorts of bad things from the hypervisor level
  • CLR/Python/R–By using external code from SQL Server an attacker can potentially attack system memory.

Don’t Let Your Infrastructure Team Design Your Data Protection Strategy

In the last two days, I’ve been part of two discussions, one about the need to run CHECKDB on modern storage (yes, the answer is always yes, and twice on Sundays), and another about problems with third-party backup utilities. Both of these discussions were born out of (at the end of the day) infrastructure teams wanting to treat database servers like web, application, and file servers, and to have one tool to manage them all. Sadly, the world isn’t that simple, and database servers need different handling (I’m writing this generically, because I’ve seen this issue crop up with both Oracle and SQL Server). Here’s how it happens: invariably your infrastructure team has moved to a new storage product, or bought an add-on for a virtualization platform, that will meet all of their needs. In one place, with one tool.


So what’s the problem with this? As a DBA you lose visibility into the solution. Your backups are no longer .bak and .trn files; instead, they are off in some mystical repository. You have to learn a new tool to do the most critical part of your job (recovering data), and maybe you don’t have the control over your backups that you might otherwise have. Want to stripe backups across multiple files for speed? Can’t do that. Want to do more granular recovery? There’s no option in the tool for that. Or my favorite one—want to do page-level recovery? Good luck getting that from an infrastructure backup tool. I’m not saying all 3rd party backup tools are bad—if you buy one from a database-specific vendor like RedGate, Idera, or Quest, you can get value-added features in conjunction with your native ones.

BTW, just an FYI: if you are using a 3rd party backup tool and something goes terribly sideways, don’t bother calling Microsoft CSS, as they will direct you to the vendor of that software, since they don’t have the wherewithal to support solutions they didn’t write.

Most of the bad tools I’m referring to operate at the storage layer by taking VSS snapshots of the database after quickly freezing the I/O. In the best cases, this is inconsequential; Microsoft lets you do it in Azure (in fact, it’s a way to get instant file initialization on a transaction log file, which I’ll write about next week). However, in some cases these tools can have faults, or take too long to complete a snapshot, and that can do things like cause an availability group to fail over, or in the worst case corrupt the underlying database while taking a backup, which is pretty ironic.

While snapshot backups can be a good thing for very large databases, in most cases with a good I/O subsystem and some backup tuning (using multiple files, increasing the transfer size) you can back up very large databases in a normal window. I manage a system that backs up 20 TB every day with Ola Hallengren’s scripts, and no storage magic. Not all of these storage-based solutions are bad, but as you move to larger vendors who are further and further removed from what SQL Server or Oracle actually are, you will likely run into problems. So ask a lot of questions, and ask for plenty of testing time.
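To give a feel for what backup tuning looks like in practice, here is a hedged sketch of a striped native backup with a larger transfer size (the database name, file paths, and values are placeholders; test what your I/O subsystem can actually sustain):

```powershell
# Sketch: stripe a native backup across four files and raise the transfer size
# and buffer count. Run through Invoke-Sqlcmd from the SqlServer module.
$backupSql = @"
BACKUP DATABASE [BigDatabase]
TO  DISK = N'G:\Backup\BigDatabase_1.bak',
    DISK = N'H:\Backup\BigDatabase_2.bak',
    DISK = N'I:\Backup\BigDatabase_3.bak',
    DISK = N'J:\Backup\BigDatabase_4.bak'
WITH COMPRESSION, CHECKSUM,
     MAXTRANSFERSIZE = 4194304,  -- 4 MB per transfer
     BUFFERCOUNT = 64,
     STATS = 5;
"@

Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'master' `
    -Query $backupSql -QueryTimeout 65535
```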

So if you don’t like the answers you get, or the results of your testing, what do you do? The place to make the argument is to the business team for the applications you support. Don’t do this without merit to your argument—you don’t want to unnecessarily burn a bridge with the infrastructure folks, but at the end of the day your backups, and more importantly your recovery, ARE YOUR JOB AS A DBA, and you need a way to get the best outcome for your business. So make the case to your business unit that “insert 3rd party snapshot magic here” isn’t a good data protection solution, and have them raise it with infrastructure management.

Azure VMs with Reduced Core Counts

Something that has come from Microsoft in the last couple of months is the ability to provision an Azure Virtual Machine (VM) with fewer CPUs than the machine size normally has allocated to it. For example, by default a GS5 VM has 32 CPUs and 448 GB of memory. Let’s say you want to migrate your on-premises SQL Server that has 16 cores and 400 GB of RAM. If you wanted to use a normal GS5, you would have to license an additional 16 cores of SQL Server (since you have all that RAM, I’m assuming you are using Enterprise Edition). Now, with this option, you can get a GS5 with only 16 (or 8) CPUs.

I can hear open source database professionals laughing at the idea of turning down CPUs, but this is a reality in many Oracle and SQL Server organizations, so Microsoft is doing us a favor. This doesn’t apply to all VM classes, and currently the pricing calculator does not show these options, but they are in the Azure portal when you select a new VM for creation. The costs of these VMs are the same as the fully allocated ones, though if you are renting your SQL Server licenses through Azure, the licensing costs are lower.

This option is available on the D, E, G, and M series VMs, and mainly on the larger sizes in those series. If you would like to see cores reduced on other VM sizes, send feedback to Microsoft.



Resources from Live 360 Postcon

I did a postcon at Live 360 in Orlando last week, and I promised that I would share the resources I talked about during the event. I’d like to thank everyone who attended; it was the end of a long conference, and the audience was attentive and had lots of great questions. So anyway, here are the resources we talked about during the session:

1) Automated build scripts for installing SQL Server.

2)  SentryOne Plan Explorer

3) Glenn Berry’s Diagnostic Scripts (b|t). These were the queries I was using to look at things like page life expectancy, disk latency, and the plan cache. Glenn’s scripts are fantastic, and he does an excellent job of keeping them up to date.

4) sp_WhoIsActive. This script is from Adam Machanic (b|t) and will show you what’s going on in a server at any point in time. You have the option of logging the output to a table.

Finally, my slides are here. Thanks everyone for attending and thanks 1105 Media for a great conference.

Speaker Idol 2017—A Judge’s Tale

Not to sound too much like Juan Antonio Samaranch, but 2017 was truly the best Speaker Idol I’ve ever had the pleasure of judging (and I’ve been judgy at all of them). I would foremost like to thank Tom LaRock (b|t) for stepping in as emcee while Denny Cherry was ailing. Tom did an excellent job of keeping the contest flowing, and his stage presence and sense of humor kept our audience (and judges) entertained. I am also taking the opportunity of this post to document two new rules that we are introducing to Speaker Idol for 2018. (I think the official rules may be in a OneNote somewhere, or it’s just tribal knowledge between Denny and me.)

  1. Contestants, or any representatives of contestants, may have no written or internet communications with judges related in any way to the judging of the contest during the period of the competition (which is defined as the moment the first Speaker Idol contestant speaks until the final decision of the winner is announced). The penalty is disqualification and removal from the final.

This means that if you won, you aren’t allowed to ask (or have anyone else ask) the judges what you did right or wrong in your talk. I would extend this rule only to round winners, but since runners-up have the ability to wildcard into the finals, and could be automatically promoted if a round winner were disqualified, the rule applies to everyone. You are free to ask judges for their feedback after the competition, but for the most part, what we say to you on stage is our feedback. If it was really bad, we might be a little nice, but you likely know it was really bad. You can still say hi to a judge at the conference, but don’t ask them how you could improve.

  2. No gifts of any value may be offered to judges within a 90-day period before or after the competition.

I chose 90 days somewhat arbitrarily for this, because I don’t think we name contestants 90 days before Summit. And if you care enough to buy judges 30 year old scotch 3 months after the competition, more power to you. If you want to give a judge a sticker, or a business card, outside the competition room, that is acceptable. Nothing more. Sticker or business card. No free training. Or logins to your site. 

That’s enough about rules. Let’s talk about the competition.

The Level of Quality was High

As judges, we’ve never actually had to calculate scores before. For this year’s final round, we flipped over the sign in front of the room and objectively scored our top two on the following:

  • Slides
  • Delivery
  • Content

Everyone who made it to the final was good. Really good. Each of them would be a fine speaker at Summit. So what were the differences? When competitions are that close, scoring comes down to very minor factors like body movement on stage, tics in delivery, and making the most of your time. Another factor is taking feedback from the earlier rounds and incorporating it into your presentation. Almost all of our contestants improved from their preliminary round—if you made it to the final, congratulations, you did an excellent job.

Why the Winner Won

There’s a saying I’ve heard in sports, particularly amongst hitters in baseball and quarterbacks in football (and I can tell you it also holds true in bike racing), that as you become more experienced, everything around you seems to slow down, letting you observe more of what’s going on in the moment than someone less experienced can. The same thing applies to public speaking—when you first do it, you feel nervous and rushed, and you don’t feel like you can just relax and be yourself. The biggest difference between our winner and our second-place competitor was that Jeremy was relaxed, delivered his content slowly so that it could be easily consumed, and conveyed a complex technical concept in a manner that was easily understandable. Both presentations were excellent; Jeremy’s simply rose above.

SQL Server 2017 Temporal Enhancements

One of the most popular features in my talks about SQL Server 2016 has been the temporal tables feature. If you aren’t familiar with it, you can read more about it in Books Online here. In a nutshell, you get a second table that tracks the lineage of your data. This is fantastic for all sorts of scenarios, including auditing, data recovery, fraud detection, and even slowly changing dimensions.


This is implemented in SQL Server via a history table—a second copy of your data that maintains timestamps showing when each row was valid. As you can imagine, this table can grow quite large, so Microsoft does us a couple of favors: the history table is page compressed by default (you can use columnstore instead), and you can put the history table on a different filegroup. The only major issue was that to truncate or delete data from the history table for pruning purposes, you had to turn off system versioning, the glue that makes this feature work.

Starting with SQL Server 2017 (and Azure SQL Database), you can define a retention period and have SQL Server prune history records for you. This is awesome and easy—see how to implement it here.
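As a minimal sketch (the table, database, and instance names are made up), adding a retention window to an existing system-versioned table looks roughly like this:

```powershell
# Sketch: ask SQL Server 2017 / Azure SQL Database to prune history rows older
# than six months for an existing system-versioned (temporal) table.
$retentionSql = @"
ALTER TABLE dbo.Orders
SET (SYSTEM_VERSIONING = ON (HISTORY_RETENTION_PERIOD = 6 MONTHS));
"@

Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'Sales' -Query $retentionSql
```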

Always On Availability Groups transport has detected a missing log block…

If you are running SQL Server 2016 (especially before CU3), you have probably received this error:

DATE/TIME:    8/21/2017 11:24:53 AM

DESCRIPTION:    Always On Availability Groups transport has detected a missing log block for availability database "Database". LSN of last applied log block is (99939:95847:0). Log scan will be restarted to fix the issue. This is an informational message only. No user action is required.

COMMENT:    (None)

JOB RUN:    (None)


I’ve talked with the product team about it, and it’s just something that happens; it’s more of an informational message. Based on some discussions I’ve had on #sqlhelp on Twitter, it may be related to insufficient network bandwidth between nodes. But if you see these sporadically, relax—it’s nothing major.

Managed Instances versus Azure SQL Database—What’s the Right Solution for You?

Last week at Microsoft Ignite, Microsoft introduced the public preview of Managed Instances for Azure SQL Database. This is a new product that is a hybrid between running fully platform as a service (PaaS, in this case Azure SQL Database) and infrastructure as a service (IaaS, in this case SQL Server running on a VM). It has built-in support for cross-database queries and basically looks and feels just like your on-premises SQL Server. Probably my favorite part is the ability to migrate a backup directly into the service from Azure Blob storage. There is also the benefit of being able to bring your own license to reduce some of the costs. Additionally, you can have much larger databases, up to 35 TB.
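The backup migration piece is essentially a credential plus RESTORE FROM URL. Here is a rough sketch (the storage account, container, SAS token, instance host, and database name are all placeholders):

```powershell
# Sketch: restore a native backup from Azure Blob storage into a Managed Instance.
# The credential name must match the container URL; the SAS token has no leading '?'.
$restoreSql = @"
CREATE CREDENTIAL [https://mystorageacct.blob.core.windows.net/backups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<sas-token-without-leading-question-mark>';

RESTORE DATABASE [MyDatabase]
FROM URL = 'https://mystorageacct.blob.core.windows.net/backups/MyDatabase.bak';
"@

Invoke-Sqlcmd -ServerInstance '<managed-instance-hostname>' `
    -Username 'sqladmin' -Password '<password>' `
    -Query $restoreSql -QueryTimeout 65535
```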


Managed instances will be a good fit for you if:

  • You don’t own your code and need it to work with SQL Server
  • Your application makes a lot of cross database calls
  • You need to be able to migrate with near zero downtime
  • You have large databases that are not a good fit for the Azure SQL Database model

I think Azure SQL DB and Managed Instances will co-exist for the foreseeable future as each platform has a good use case.
