A Weird Networking Problem with Azure SQL Database (and Fios)

Note: This is my first blog post in a while. You too may have been stressed by recent events in the world. I’m also working on a large content project that I hope you’ll get to enjoy in the near future. This post actually ties into working from home.



So anyway, one of our customers recently spun up a new Azure SQL Database in the Azure South Africa North region, and two of their team members couldn’t connect to it from their homes. The common element–both of them used Verizon Fios as their ISP.

It’s important to note that connectivity for Azure SQL Database is to a public IP address. Each region has a public IP address for its gateway; a lookup takes place, you connect through that gateway, and eventually you reach your database. The IP address is just a public endpoint. If I run a trace route to a database running in the Azure US East region I see:

Josephs-MacBook-Pro-3:Dropbox joey$ traceroute dcac-demo.database.windows.net
traceroute to cr4.eastus2-a.control.database.windows.net (52.167.104.0), 64 hops max, 52 byte packets
1 192.168.115.1 (192.168.115.1) 2.224 ms 2.849 ms 1.953 ms
2 96.120.76.77 (96.120.76.77) 12.961 ms 12.978 ms 14.438 ms
3 68.86.213.153 (68.86.213.153) 10.461 ms 9.803 ms 8.983 ms
4 68.86.211.170 (68.86.211.170) 16.069 ms 6.967 ms 12.133 ms
5 68.85.158.49 (68.85.158.49) 11.460 ms 15.657 ms 11.314 ms
6 be-201-ar03.ivyland.pa.panjde.comcast.net (96.108.181.21) 14.815 ms 13.470 ms 13.775 ms
7 be-33287-cr01.newark.nj.ibone.comcast.net (68.86.93.173) 18.519 ms 17.500 ms 19.012 ms
8 be-1301-cs03.newark.nj.ibone.comcast.net (96.110.36.89) 16.457 ms 23.452 ms 20.628 ms
9 be-2303-pe03.newark.nj.ibone.comcast.net (96.110.37.74) 15.248 ms 21.317 ms 22.250 ms
10 50.248.118.226 (50.248.118.226) 16.467 ms 18.986 ms 23.555 ms
11 ae23-0.ear01.ewr30.ntwk.msn.net (104.44.239.159) 24.966 ms 22.497 ms
ae24-0.ear01.nyc30.ntwk.msn.net (104.44.239.155) 26.774 ms
12 be-21-0.ibr02.ewr30.ntwk.msn.net (104.44.20.122) 32.792 ms
be-20-0.ibr01.nyc30.ntwk.msn.net (104.44.20.124) 27.437 ms
be-20-0.ibr01.ewr30.ntwk.msn.net (104.44.20.120) 34.036 ms

You can see that at hops 7-9 my connection jumps onto the Comcast backbone (ibone.comcast.net) network, and at hop 11 it jumps onto the Microsoft Azure network (msn.net). I truncated the output of the trace route there.

Monica Rathbun (b|t), who works with me at DCAC, also has Fios and was able to help us troubleshoot this. When Monica ran a trace route to the public IP address of the Azure SQL Database in South Africa North, it looked like this:

C:\Users\monica>Tracert egpaftz.database.windows.net

Tracing route to cr1.southafricanorth1-a.control.database.windows.net [102.133.152.0]

over a maximum of 30 hops:

  1    <1 ms     1 ms    <1 ms  Fios_Quantum_Gateway.fios-router.home [192.168.1.1]

  2     3 ms     1 ms     2 ms  102.133.152.0

That is not how the internet is supposed to work. We tried to figure out how to get in touch with someone at Verizon to fix this, which through normal consumer channels is not fun. So while we waited for that to get fixed, we had another option to fix this.

Introducing Private Link

Azure Private Link recently became generally available for Azure SQL Database and allows you to have a truly private connection to your database. For several years now you have been able to use virtual network service endpoints to allow a specific VM or App Service to connect to Azure SQL, however that design had some limitations. The first was that it still routed the connection to the database over the public IP address. The second was that service endpoints did not support multi-region scenarios. With Private Link your database has its own private IP address on the virtual network where it is deployed.

In our client’s case, this workaround involved connecting to an Azure VPN in US East, which would then be connected to South Africa North. One interesting thing about Private Link is that it doesn’t support Azure Virtual Network peering, so you will need to create a gateway if you want your traffic to traverse virtual networks. Additionally, we were working across two different subscriptions, which meant we had to use the PowerShell from here. Private Link is easy to configure from your Azure SQL logical server, but be sure you have the right permissions in Azure RBAC–I needed to be granted the Network Contributor role in order to get the GUI to work.
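
If you end up scripting it instead, the basic shape looks like this. This is a minimal, same-subscription sketch using the Az PowerShell modules (the cross-subscription case we hit adds an approval step), and every resource name in it is a placeholder rather than anything from our client’s environment:

# Minimal sketch: create a private endpoint for an existing Azure SQL logical server
# All of the names below are hypothetical placeholders
$server = Get-AzSqlServer -ResourceGroupName "my-rg" -ServerName "my-sql-logical-server"
$vnet   = Get-AzVirtualNetwork -ResourceGroupName "my-rg" -Name "my-vnet"

# The target subnet needs private endpoint network policies disabled
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "private-endpoints" -VirtualNetwork $vnet

# "sqlServer" is the group ID (sub-resource) for Azure SQL logical servers
$plsConnection = New-AzPrivateLinkServiceConnection -Name "sql-pl-connection" `
    -PrivateLinkServiceId $server.ResourceId -GroupId "sqlServer"

New-AzPrivateEndpoint -ResourceGroupName "my-rg" -Name "sql-private-endpoint" `
    -Location "southafricanorth" -Subnet $subnet -PrivateLinkServiceConnection $plsConnection

You’ll also want a private DNS zone (privatelink.database.windows.net) linked to the virtual network so the server name resolves to the private IP–the portal experience can handle that part for you.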

The Conclusion

I had some issues that were permissions related which delayed our implementation of Private Link, and I was fortunate enough to have some members of the Azure Networking product group helping me out. Since Microsoft and Verizon are technically partners (Verizon is an ExpressRoute partner), they were able to get in touch with some folks at Verizon, who were able to resolve the problem. Sort of–the trace routes still look weird, but everyone can connect. Verizon reports that the trace routes look normal from what they can see, which tells me they are doing something really weird on the router itself with how Azure traffic is routed.

Photo by Pixabay on Pexels.com


Storage Field Day 19–MinIO #SFD19

In January I had the chance to attend Storage Field Day 19 in Santa Clara, where we got to meet with a wide variety of startups and large enterprise storage companies. One of the more interesting companies we met with was MinIO, which has a really compelling object-based storage product.

Some of MinIO’s Customers and Use Cases

I’ve talked about object storage here before, but it’s a very different paradigm from the traditional block-based storage you may currently be using. With block storage, files are split into evenly sized blocks of data (typically somewhere between 64 KB and 1 MB, depending on your vendor). Data protection is provided by traditional RAID options.

Object storage, on the other hand, doesn’t split files into blocks. Files are stored as objects, which contain the file data, metadata, and a unique identifier. There is no limit on the size or amount of metadata associated with the file. If you have ever created a managed disk in Azure, taken a backup to URL, or used an Azure SQL Database, you’ve used object-based storage. In object-based storage, redundancy is generally provided by maintaining three copies of the object (a write isn’t considered complete until it has been written to all three copies).

Object storage is designed to solve problems of scale. One of the things I learned at Comcast was that the cost of SAN storage didn’t scale to some of the massive petabyte-scale data problems we had. The management overhead, the cost, and sometimes even the storage itself don’t scale. This is largely a problem for companies like Microsoft, Amazon, Google, and Facebook, who have massive amounts of data to store. But as data volumes grow, there are lots of other firms with very large volumes that they need to manage.

MinIO is a firm that offers such a solution. MinIO offers open source storage management software that delivers extremely fast performance (183 GB/s reads and 171 GB/s writes). It is fully compatible with Amazon’s S3 API, which has become the de facto standard for object storage. They were working on Azure Blob Storage support when we visited.
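
Because the API is S3-compatible, existing S3 tooling should just work if you point it at a MinIO endpoint. As a rough illustration (the endpoint URL, bucket, and credentials below are all made up), the standard AWS CLI can be aimed at a MinIO server from PowerShell like this:

# Point the standard AWS CLI at a hypothetical MinIO server instead of Amazon S3
# The endpoint URL, bucket name, and credentials are placeholders
$env:AWS_ACCESS_KEY_ID     = "minio-access-key"
$env:AWS_SECRET_ACCESS_KEY = "minio-secret-key"

aws --endpoint-url http://minio.local:9000 s3 ls
aws --endpoint-url http://minio.local:9000 s3 cp .\backup.bak s3://sql-backups/backup.bak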

One of the ways MinIO is able to get such good performance out of pretty standard hardware is by taking advantage of SIMD processor instructions, which allow more text and number crunching to be performed per CPU instruction, dramatically increasing performance. SQL Server uses the same idea through the query processor’s batch mode.

MinIO’s storage can also be used as a persistent store for Kubernetes (drink), for systems like Spark and TensorFlow, or as a replacement for Hadoop HDFS. Where you would probably use this in your environment is to replace your file servers, as a target for container storage, or maybe even as an analytic store. Or you want to become a cloud storage provider and need to host 50 PB of data in your data center.

Storage Field Day 19–Western Digital #SFD19

As I mentioned in my post last week, I recently had the opportunity to attend Storage Field Day 19, where I got to meet with a wide variety of storage software and hardware companies in Silicon Valley. One of the more interesting companies we met with was a longtime player in storage—Western Digital. (Disclosure—I own shares of Western Digital and was gifted an SSD after the event.) One of the overwhelming themes of the week was the vast amount of data that we are generating, much of which is coming from video and IoT device telemetry. Western Digital estimates that 103 zettabytes (that’s 103 million petabytes, or 103 billion terabytes) of IoT data alone will be created by 2023.


We were able to hear from a wide array of executives at Western Digital representing various parts of their business. There are a few market forces driving the direction of the company. The first area is gaming—building internal NVMe drives with up to 2 terabytes of capacity and bandwidth up to 3,480 MB/second. Performance is one aspect of gaming systems, but design aesthetic and cooling are also very important. PC gaming is a $37.5 billion market, so Western Digital sees this as a major market for them.

While the gaming part of the presentation focused on bleeding-edge performance, the rest of the afternoon looked at increasing storage densities. While it went unsaid, I feel like much of the development in the hardware business is increasingly focused on public cloud providers like Microsoft and Amazon, as well as large-scale data companies like Facebook and Twitter. Western Digital is at the forefront of this through the development of zoned storage. One of the goals of this extension to the NVMe standard is to allow ultra-fast SSDs to be zoned similar to the way hard drives can be now. This is not technology that you will be implementing in your data center anytime soon, however it will likely be coming to a cloud provider in the near future.

The other aspect of storage futures is increased density. While many analysts have prematurely speculated about the death of the spinning hard drive (in favor of lighter, faster, cooler solid-state drives), the density offered by traditional hard drives is unmatched. Western Digital showcased drives up to 20 TB, as well as multi-actuator drives, which can increase the performance of a spinning disk by an order of magnitude. These drives consume more power than a traditional drive, but less than two traditional drives. The data on the platters is striped in a RAID-0 fashion on the drive itself.

The world is heavily dependent on reliable, fast storage for all of the data systems modern life demands. As one of the leading builders of storage media, Western Digital is well positioned to support both end users and hyperscale cloud providers now, and in the future.

Storage Field Day 19–Current State of the Storage Industry #SFD19

Last week I got to spend some time meeting with numerous storage companies in Silicon Valley. I, along with another dozen or so delegates, met with companies large and small, including Western Digital, Dell EMC, NetApp, and startups like MinIO. I’ll be writing posts in the coming weeks to talk about some of the interesting technology we learned about.

Photo by Anthony on Pexels.com

In this post I wanted to focus on some interesting scenarios. It’s something I first noticed when we were at Western Digital, but it came up again with the startups we met with. I had this thought, and then on Sunday Argenis Fernandez (b|t), who recently returned to Pure Storage, weighed in on the same theme with this tweet.

Argenis was complaining about file systems because, when you have very fast storage (think NVMe or faster) or storage-class memory, the overhead of all the things the file system does becomes a significant portion of the time it takes to complete an I/O operation. This isn’t significant when your I/Os take 4-5 milliseconds to complete, but when they are completing in 50 microseconds, you notice the time it takes for the filesystem to timestamp a file.

This leads me to the point I wanted to make in this post. Storage technology futures are very much bifurcated (that’s a fancy word for going in two directions)–on one end there is ultra-high performance NVMe storage for workloads like gaming and ultra-high throughput trading systems. On the other end there is a lot of development around ultra-high density storage for hyperscale providers (that’s basically the public clouds and Facebook).

Did you know that there are hard drives with multiple actuators (think the needle on a record player, for those of you who are old enough to remember those)?

Multi Actuator Technology: A New Performance Breakthrough

The reason this is happening is that spinning hard drives are here to stay, for density reasons (you may have heard that the world is going to have eleventy billion zettabytes by 2022 and most of it will be in the cloud, or something to that effect), and SSDs still lack the density required to, say, be a cloud provider or host most of the world’s photos.

The cloud providers are also part of the high-speed storage game–mostly to be able to do things like NVMe over Fabrics, which will allow ultra-fast disk to be virtualized and shared.

What does this mean for you as a data professional and consumer of storage? It means things probably aren’t going to change that much for you. If you are working with an all-flash vendor for performance storage, you’ll see the gains as NVMe rolls in, but a lot of the ultra-high speed storage will be limited by the rest of the stack (OS and RDBMS). If you are in the public cloud, I think you will see storage get gradually faster and lower latency over the next 18 months, and you will see densities increase in hard drives. Your SAN admin will also get some better tools that I’m going to talk about in some coming posts.


My SQL Saturday Chicago Precon: Leveling Up Your Azure Skills

I’m doing another precon at SQL Saturday Chicago on March 20th, 2020. The event is at Benedictine University in Lisle, IL. This time we’re going to dive a little bit deeper into Azure. While last year’s precon focused on basic infrastructure skills in Azure, this time we’re going to focus more deeply on the specifics of the data platform in Azure. I did a variant of this topic in India last August, but I’ve made a few changes based on a number of conversations I had with customers at Microsoft Ignite last year.

Photo by Nate on Pexels.com

In this full day session, here’s what we’re going to cover:

  • VMs and Storage (like it or not this is still 80% of the cloud spend, and it’s how people migrate to the cloud)
  • Understanding the Azure services you need to manage your environment (this is going to focus heavily on Azure Automation, Azure Key Vaults, and Logic Apps)
  • Azure Data Platform–This is a Big Topic, but here’s how it breaks down
    • Understand the differences between
      • SQL Server in a VM
      • Azure SQL Database
      • Azure Managed Instance
      • Azure Synapse (nee Azure SQL Data Warehouse)
      • Cosmos DB
    • We’ll talk about the costs/benefits/performance of each of these options, and how they are incorporated into your broader architecture
    • Don’t worry, I’m a DBA, we’ll cover HA/DR here
  • We’ll also talk about a bit about cloud focused application architecture
    • Why Redis in front of your database is a good idea
    • Taking advantage of dynamically scaling resources

That’s a lot of topics–it’s a full day, and the deep dive part will be on the data platform, but any and all questions are welcome. There are still a few discount tickets left–you can sign up here.


Let’s Talk About Ransomware (Again)

I’ve posted about ransomware a few times before. At DCAC, we’ve worked with a few customers whom we’ve fortunately been able to help recover from their attacks. But ransomware attacks are trending upward, as seen below–there’s a ransomware attacker that has pulled in $10 million this year (this year being so far in 2020, 7 days in). There have been a number of announcements about ransomware attacks in recent days–a small company that went bankrupt, and the currency exchange Travelex, which was recently attacked.

https://twitter.com/GossiTheDog/status/1214316619863908352?s=20

https://twitter.com/GossiTheDog/status/1214240881139691520?s=20

https://twitter.com/GossiTheDog/status/1213973368338354176?s=20

https://twitter.com/GossiTheDog/status/1214269524549259264?s=20

https://twitter.com/selenalarson/status/1214630282063745025

Ransomware attacks are quite different from traditional hacking. Traditionally, hackers are in pursuit of high value data, whether it be from a government agency, a large enterprise business, or someone with a vast array of customer data like Equifax. Ransomware goes after easy targets that the attackers think, or at least hope, will pay. What that means is that if you work for a smaller firm, you are far more likely to be attacked with ransomware than you would be in a traditional hacking attack. Theoretically, large enterprises have more network segmentation and better security controls that make them less vulnerable to these attacks. As the last year has shown, this is definitely not always the case.

It’s All About the Network

The way ransomware typically works is to find a vulnerability, usually via user accounts, in order to run exploit software on your network. If the only thing that can be reached on your network is user PCs, that’s the only data that’s going to be encrypted. Where organizations get exploited is that their user network is either directly connected to their business network, or there is just one flat network structure. This means that if a user is attacked (typical attack vectors are emailed infected Office documents or PDFs, but others include more advanced vectors like the Pulse VPN vulnerability mentioned in the tweets above), the ransomware can go after file servers and domain controllers and start encrypting all of your files.

Assume Breach

One of the most important approaches here is that, in order to build your network structures appropriately, you need to assume that your user network is going to be breached. It’s what Microsoft does with Azure, and it’s what you should do on your network as well. This means you need to do a few things:

  • Segment your network
  • Separate accounts, and maybe even a separate domain, for all of your servers
  • Disconnected backups–you can’t encrypt a safe full of tapes
  • Frequent restore testing
  • Basically, if you can connect to a file or database server from your corporate laptop without multi-factor auth, changing credentials, and/or changing networks, you’re at the mercy of your dumbest user not launching an infected Word file

All of these controls make our jobs harder, and it sucks, but it’s what we have to do. The one benefit of modern technology is that the cloud makes all of these things approachable for even the smallest of businesses. I can build a segmented network in about 20 minutes in Azure, whereas it would take me a few hours just to find the right equipment at the right price point from a networking vendor if I were trying to do it on-premises.
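
To put that 20 minutes in perspective, here’s a minimal Az PowerShell sketch of a segmented virtual network–a user subnet and a server subnet, with an NSG on the server subnet that only allows SQL traffic in from the users. The names and address ranges are made-up examples, not a recommendation for your environment:

# Hypothetical example: user and server subnets, with only SQL allowed from users to servers
$allowSql = New-AzNetworkSecurityRuleConfig -Name "allow-sql-from-users" -Priority 100 `
    -Direction Inbound -Access Allow -Protocol Tcp `
    -SourceAddressPrefix "10.0.1.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "10.0.2.0/24" -DestinationPortRange 1433

$denyRest = New-AzNetworkSecurityRuleConfig -Name "deny-everything-else-from-users" -Priority 200 `
    -Direction Inbound -Access Deny -Protocol "*" `
    -SourceAddressPrefix "10.0.1.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "10.0.2.0/24" -DestinationPortRange "*"

$nsg = New-AzNetworkSecurityGroup -ResourceGroupName "my-rg" -Location "eastus" `
    -Name "server-subnet-nsg" -SecurityRules $allowSql, $denyRest

$userSubnet   = New-AzVirtualNetworkSubnetConfig -Name "users"   -AddressPrefix "10.0.1.0/24"
$serverSubnet = New-AzVirtualNetworkSubnetConfig -Name "servers" -AddressPrefix "10.0.2.0/24" -NetworkSecurityGroup $nsg

New-AzVirtualNetwork -Name "corp-vnet" -ResourceGroupName "my-rg" -Location "eastus" `
    -AddressPrefix "10.0.0.0/16" -Subnet $userSubnet, $serverSubnet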

I’m Just the DBA, What Can I Do?

I get it, and I’ve been there–you’re the DBA at a company where IT isn’t exactly the priority. There are a few approaches you can take–the first (and the most self-serving for me) is to engage a consulting firm like DCAC. While you know what you are talking about, your management sees you as a worker bee, and may not listen to your complaints about having an open WiFi network that your regulatory agency can log onto and see the public file servers (yes, this actually happened to me). However, when they are paying a consultant $ALot/hr they tend to be more receptive to taking advice, even if it’s the same thing the DBA has been saying for years.

Another approach is to hire a penetration testing firm–many large organizations are required to do this by regulation, and they do a good job of identifying vulnerabilities in a firm.

Finally, the easiest thing to execute is to PATCH YOUR SHIT. This is where you can assist the most–you can patch Windows and SQL Server (or Linux if that’s your bag), and you can work with the sysadmin teams in your organization to ensure that all OS, application, and mouse software is patched regularly. It’s not perfect, and it doesn’t fix all of your problems, but it’s a start. Also, test your god damned restores; it’s not enough to have backups if you don’t know that they work.
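
On the restore testing front, if you use the community dbatools PowerShell module, a single command will restore the most recent backups to a test instance, run DBCC CHECKDB against them, and drop the copies when it’s done. A minimal sketch (the instance names here are placeholders):

# Hypothetical example using the community dbatools module (Install-Module dbatools)
# Restores the latest backups from prod-sql-01 onto test-sql-01, runs CHECKDB, then cleans up
Test-DbaLastBackup -SqlInstance "prod-sql-01" -Destination "test-sql-01" | Format-Table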


PowerShell for Parsing DiskSpd Results to CSV

We have a customer who is moving platforms, and as part of this, I’ve been tasked with testing a lot of different storage configurations. This means lots of utilization of DiskSpd, which is a disk performance benchmarking tool from Microsoft. We could argue about the value of synthetic disk benchmarks, but they are good for testing a variety of disk configurations with a standardized tool for comparing results. DiskSpd also has the benefit of writing the runtime configuration into the results file, so as long as you have your results file, you know what parameters you ran the test with. (You do have to document your disk configuration–we are using the name of our output file for this.)
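
For context, a DiskSpd run looks something like this. The parameters here (a 50 GB test file, a 5 minute run, 30% writes, 64 KB random IO, 8 threads with 8 outstanding IOs each, caching disabled, latency stats captured) are just an illustration, not the exact workload we ran for the client:

# Example DiskSpd run; the file paths and parameters are illustrative only
diskspd.exe -c50G -d300 -r -w30 -b64K -t8 -o8 -Sh -L E:\diskspd\test.dat > E:\diskspd\results\E-drive_64K_random_30write.txt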

Photo by Elle Hughes on Pexels.com

Anyway, I have a bunch of these files, and I needed to get the data into Excel. Since I was too lazy to figure out how to parse a text file in C#, my first thought was to use some combination of sed, awk, and grep in a bash shell. I reached out to my friend Anthony Nocentino (b|t) for his thoughts on the best way to do this, and he immediately said PowerShell.

When I asked how to do the things I wanted to do with specific bash commands, he pointed out that I could use bash commands that support standard input and output from within PowerShell. The linked blog shows how to do this on Windows; however, I wrote all of this code in PowerShell natively on my MacBook Pro.

$path = '$InsertYourPathHere'

# Grab the DiskSpd results files to parse (assumes the folder contains only DiskSpd output files)
$files = Get-ChildItem -Path $path

foreach ($file in $files)
{
    $content = Get-Content $file

    # Grab the DiskSpd command line used for the run (it's stored near the top of the results file)
    $command = ($content) | Select-Object -First 1 -Skip 1

    # Find the first "total:" line, then use sed to turn the pipe-delimited totals into a comma-delimited string
    $results = ($content) | grep total -m1 | sed 's/"|"/","/g' | sed 's/"total:"//g'
    $results = $results.Split(",")

    # Hashtable properties don't keep their order, so Select-Object fixes the column order for the CSV
    $Output = New-Object -TypeName PSObject -Property @{
        FileName   = $file.Name
        Command    = $command
        TotalBytes = $results[0].Trim()
        TotalIOs   = $results[1].Trim()
        MiBperSec  = $results[2].Trim()
        IOPs       = $results[3].Trim()
        AvgLatency = $results[4].Trim()
        LatStdDev  = $results[5].Trim()
    } | Select-Object FileName, Command, TotalBytes, TotalIOs, MiBperSec, IOPs, AvgLatency, LatStdDev

    $Output | Export-Csv results.csv -Append
}

As you can see, I’m passing the output in my $results variable through grep to give me the first match of the word “total”, and then using sed to do a couple of find-and-replace operations to make parsing the line a little bit easier. After I’ve done all that, I split the string into a comma-delimited array of results and output it to a CSV file. This allows you to grab the results, with headers, and open them in your favorite spreadsheet software. For posterity, the code is available in our GitHub repo here.
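
If you just want to eyeball the numbers without opening a spreadsheet, you can also read the CSV straight back into PowerShell, for example:

# Quick look at the parsed results, sorted by IOPs, without leaving the shell
Import-Csv ./results.csv |
    Sort-Object { [double]$_.IOPs } -Descending |
    Format-Table FileName, IOPs, MiBperSec, AvgLatency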

Understanding Your Options for Running SQL Server on Azure

I was at Microsoft Ignite last week, and I spent the better part of three days working in the “SQL Server” booth. I put SQL Server in quotes because in 2019 that term is stretched. Our booth covered SQL Server 2019, Azure Arc (which you should read about because it’s really cool, but that’s the subject of another post), SQL on Edge, SQL Server Big Data Clusters, Azure SQL Database, Azure SQL Database Managed Instance, and a few other things that I’m too tired to type. And that doesn’t even include Azure Synapse (which is the artist formerly known as Azure SQL Data Warehouse).


My big takeaway from working the booth was how many of the customers I spoke to didn’t understand the differences between the various platform options. In fact, I think I took the following question over 100 times: “We’re currently on-premises, but moving to the cloud, can you explain the difference between Azure SQL Database, Managed Instance, and SQL Server in a VM?” The fact that we had to answer that so much isn’t a failing on the part of the customers so much as it is Microsoft and us MVPs not communicating the message effectively. So, in order to do my part, I’m going to write a post (I also wrote a Redmond column last month, but this post will go into more detail). I’m not going to talk about pricing in detail in this post, as it’s effectively the same across the two PaaS services.

First Things First—Do you Know What PaaS Means?

Azure SQL Database and Managed Instance are both known as Platform as a Service (PaaS) offerings. This means you, the DBA, will manage a lot less than you typically would with either an on-premises SQL Server or a VM running SQL Server in Azure under the Infrastructure as a Service (IaaS) model. If you run one of the PaaS offerings, Microsoft is responsible for the following tasks:

  • Backup of your database and transaction logs
  • High Availability within your Azure region, with a financial guarantee (the number of 9s depends on config, but it’s pretty solid)
  • Installation, configuration, and patching of SQL Server software
  • Installation, configuration, and patching of Windows Server
  • Consistency checks of your database
  • Performance monitoring of your database (This is configured for you, you still have to react to the numbers)

What tasks does that leave you to do?

  • Configuring disaster recovery across Azure regions—while this is just a few clicks, you get to convince your boss to pay for it, and it’s not turned on by default
  • Performance tuning—getting utilization down by query optimization can save your company money. It’s a quantifiable part of your job now
  • Network architecture—you need to be able to make sure your virtual networks in Azure can all talk to your database services, and in a secure fashion
  • User Management and Security (though there are some security benefits in PaaS)
  • Helping your business derive value from the data you are storing in the database

Ok, Tell Me About Cloud Offerings

I’m going to start out by talking about running SQL Server in an Azure VM. Why would you choose to do this, when the aforementioned PaaS services do all of this magic for you? The main reason I can think of is vendor support. Your business runs some high-quality ISV software that requires you to remain on SQL Server 2012 CU1 (just kidding, you know it’s actually SQL 2005 RTM, and Windows 2003), and you can only run this mission-critical conference room projection software on that version of SQL Server.

The other reasons you might choose to run SQL Server in a VM besides vendor support (which is common) are that the size of your database doesn’t work well with the current limits in Azure PaaS (8 TB in Managed Instance, and effectively unlimited in Azure SQL Database Hyperscale), or that you are willing to trade off some availability for the reduced cost of running in a standalone VM.

What do I mean by costs? While PaaS offerings include high availability, that’s something Microsoft has to include in your pricing (otherwise their shareholders, including me, would murder them), so compared to a standalone VM, the PaaS offering will commonly be more expensive. To get a fair comparison against the PaaS offerings’ built-in HA, you should really multiply your IaaS costs by three. (Also, VMs have pretty good availability—it can be enough, especially for departmental apps.)

Azure SQL Database/Azure SQL Database Hyperscale

This is truly one of the OG Azure offerings, dating back to 2009/10. It is really designed to be a standalone database offering. Think of an app that needs a single database and wants minimal effort around spinning that up, and you have the right use case for Azure SQL Database. The cost for getting started is also really cheap, at $5/month for a Basic DB (it can be way, way more than that–the P15 database is $15k USD/month, and there’s a new M-series offering that will be even more). Effectively, an Azure SQL Database is a single user database running on a SQL Server instance somewhere in the cloud. There’s a feature called elastic pools that confuses this a little bit.
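
Before getting to elastic pools, to give you a sense of how little friction there is in spinning one of these up, here’s a minimal Az PowerShell sketch that creates a Basic tier database on an existing logical server. Every name in it is a placeholder:

# Assumes the Az.Sql module is installed and you've already run Connect-AzAccount
# The resource group, server, and database names are hypothetical placeholders
New-AzSqlDatabase -ResourceGroupName "my-rg" `
    -ServerName "my-sql-logical-server" `
    -DatabaseName "AppDb" `
    -Edition "Basic"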

Elastic pools are a group of Azure SQL Databases that share the same resources. This is effectively a SQL Server instance somewhere in the cloud that hosts multiple user databases. The elastic pool feature is mostly designed for applications where each customer gets a copy of the same database and the workload is mixed at different times of the day, so resources are utilized most effectively.
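
Continuing the sketch above (and again, all of the names and DTU numbers are placeholders rather than recommendations), creating a pool and moving a database into it looks roughly like this:

# Create a Standard tier pool and move the hypothetical AppDb database into it
New-AzSqlElasticPool -ResourceGroupName "my-rg" -ServerName "my-sql-logical-server" `
    -ElasticPoolName "customer-pool" -Edition "Standard" -Dtu 100 -DatabaseDtuMin 0 -DatabaseDtuMax 50

Set-AzSqlDatabase -ResourceGroupName "my-rg" -ServerName "my-sql-logical-server" `
    -DatabaseName "AppDb" -ElasticPoolName "customer-pool"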

Azure SQL Database has some tools called elastic query, elastic transactions, and elastic jobs that allow you to perform actions across multiple databases (elastic is a bit of a misnomer here–there’s no direct connection to elastic pools), but in general, cross-database queries using classic three-part naming (database.schema.object) do not work in Azure SQL Database. So if you need joins or complex queries across databases, this probably isn’t your cloud product.

Way back when, Azure SQL Database was size-limited to 5 GB, then ultimately 4 TB, until the Hyperscale architecture was developed. Hyperscale is a completely new model for physical database infrastructure design (read more about it here, it’s really cool) and should be able to scale to near-petabyte scale. If you’ve got more than that much data in a single database, we really need to talk. The one challenge with SQL DB and database size is that you can’t restore a native backup to it; you have to do a database export and load the data using a BACPAC, which can be very slow for very large databases. My guidance here is that if you have a very large on-premises database that you’d like to move, Managed Instance may be a better choice, as the migration path is easier. If you are building a new application in the cloud, Azure SQL Database is frequently the best choice.
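
For completeness, the BACPAC path looks roughly like this with the Az module. This sketch assumes you’ve already exported a .bacpac to a storage account, and every name, key, and URI in it is a placeholder:

# Import a BACPAC from blob storage into a new Azure SQL Database (hypothetical names throughout)
$adminPassword = Read-Host -AsSecureString -Prompt "SQL admin password"

New-AzSqlDatabaseImport -ResourceGroupName "my-rg" `
    -ServerName "my-sql-logical-server" `
    -DatabaseName "MigratedDb" `
    -StorageKeyType "StorageAccessKey" `
    -StorageKey "<storage account key>" `
    -StorageUri "https://mystorageacct.blob.core.windows.net/bacpacs/MigratedDb.bacpac" `
    -AdministratorLogin "sqladmin" `
    -AdministratorLoginPassword $adminPassword `
    -Edition "Standard" `
    -ServiceObjectiveName "S2" `
    -DatabaseMaxSizeBytes 268435456000

# The import runs asynchronously; you can poll it with Get-AzSqlDatabaseImportExportStatus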

There are a few other limitations of on-premises features that Azure SQL Database doesn’t support. Those are as follows:

  • Common Language Runtime (CLR)
  • Change Data Capture
  • Cross-Database Transactions
  • Database Mail
  • Filestream
  • Linked Servers* (sort of—you can connect to external data sources)
  • Replication Source (you can publish to, but not from an Azure SQL DB)
  • Resource Governor
  • Service Broker
  • SQL Server Agent

Of those features, the biggest one is SQL Server Agent. There are a number of different ways to schedule jobs to run in Azure, the most common being Azure Automation, which allows for PowerShell or Python. You can also have an Agent job on a SQL VM running full-blown SQL Server, though that seems less practical.
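
For a sense of what that looks like, an Azure Automation runbook that stands in for a simple Agent job is just PowerShell. Here’s a minimal sketch that assumes a credential asset I’ve made up called SqlDbCredential, the SqlServer module imported into the Automation account, and a hypothetical maintenance procedure:

# Minimal Azure Automation runbook sketch: run a stored procedure on a schedule
# "SqlDbCredential" is a hypothetical Automation credential asset you would create yourself
$cred = Get-AutomationPSCredential -Name "SqlDbCredential"

Invoke-Sqlcmd -ServerInstance "my-sql-logical-server.database.windows.net" `
    -Database "AppDb" `
    -Username $cred.UserName `
    -Password $cred.GetNetworkCredential().Password `
    -Query "EXEC dbo.NightlyMaintenance;"   # hypothetical procedure name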

I mentioned new application development as a target for Azure SQL Database. Since SQL DB allows your database to be a replication target, it can also be a good use case for reporting, particularly for field users who are far from the home office. You can leverage either geo-replication or multiple replication targets, and allow field users to report against them.

From a network perspective, your SQL Database is on the internet, with a firewall in front of it (at either the server or the database level). You can also elect to only allow traffic from a virtual network within Azure; however, your database is still on shared networking infrastructure (albeit secured).

Azure SQL Database Managed Instance

This service came much later, being announced at Build in 2017 and going GA last year. Managed Instance was specifically designed to make it easier to move from on-premises SQL Server to a PaaS service. One of the initial differences was that Managed Instance uses a virtual core-based pricing model (Azure SQL Database still offers a legacy DTU-based model), so you pay a price per core, depending on your tier.

The two tiers of Managed Instance (and vCore-based Azure SQL Database) are general purpose and business critical. The two key differences between these tiers are the performance and high availability models—general purpose (GP) uses networked SSDs and will generally have IO latency in the 5-10 ms range. If you are using GP, you need to read this blog post on how to optimally configure your storage, as otherwise it can be really painful. GP also uses the underlying storage model for its high availability. Business critical (BC) uses the local SSD in each server host for storage, which is lower latency than a network call, and effectively uses an Always On availability group with four replicas for its availability model. You can also query the secondary replica on BC by using the ApplicationIntent=ReadOnly connection syntax in the SQL client.
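
To show what that read-only routing looks like from the client side, here’s a minimal PowerShell/.NET sketch. The server name, database, and credentials are all placeholders:

# Connect to the readable secondary on a Business Critical instance (hypothetical names)
$connectionString = "Server=tcp:my-managed-instance.abc123placeholder.database.windows.net;" +
    "Database=AppDb;User ID=sqladmin;Password=<password>;" +
    "ApplicationIntent=ReadOnly;Encrypt=True;"

$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.Open()
# ...run your read-only reporting queries here...
$connection.Close()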

In terms of migration, you can restore a backup into Managed Instance (from Azure Blob Storage, using the RESTORE ... FROM URL syntax). For a more real-time migration strategy, you can use the Azure Database Migration Service (DMS), which allows for an almost zero-downtime migration. There’s been talk of allowing log shipping, or at least “log shipping” (via an API), but last I checked (and I’ll check again) you had to use DMS.

Features in MI are much closer to on-premises SQL Server. The big gap closers between MI and SQL DB are that SQL Agent is there, CDC is there, and cross-database queries can be performed. The following list shows features that are included in Managed Instance but not in SQL DB:

  • Copy-only backup
  • CDC
  • CLR
  • Cross-Database Queries/Transactions
  • DBMail
  • Linked Servers (Only to MI and other SQL Servers)
  • Resource Governor
  • Service Broker (within instance)

The gaps to on-premises are as follows:

  • R Services
  • SQL Server Analysis Services/Integration Services* (there’s a way to run SSIS via Azure Data Factory, but it costs money)/Reporting Services
  • Filestream

There are a few other differences, but I went with the ones that I see most commonly affecting customers. One thing to note for both MI and Azure SQL DB is that they rely on Azure Active Directory and not on-premises Active Directory to provide authentication. I hadn’t seen this as a big gap with my customers; however, it came up a few times at the booth last week, and Microsoft is researching options in this space.

The other limitation Managed Instance currently faces is a maximum total database size of 8 TB. The Managed Instance team is in the process of implementing Hyperscale, and this limit will go away when that happens, which will likely be sometime next year.

So, I’ve almost written 2,000 words on this, which means it needs a tl;dr.

Azure SQL Database—Use It If

  • You are building a new cloud based application
  • You want to build a reporting option for field users
  • You are building a customer facing data platform and you want simplified networking and auth

Azure SQL Database Managed Instance—Use it if

  • You are migrating 3rd party ISV apps to a PaaS model
  • You want a PaaS model, but don’t want to change any of your code
  • Your app depends on SQL Agent, or cross-database queries

SQL Server in a VM—Use It If

  • You are stuck on an old version of SQL Server
  • You have to use on-prem Active Directory for authentication
  • 99.9% availability is good enough for your application
  • You are one of those people who run SSAS/SSIS/SSRS/R Services/RDBMS/DQS/MDS all on the same server to maximize every penny of your licensing
  • You are the person who uses Filestream


Colorado DOT Post Mortem–Don’t Do Stupid #$%^

A good policy in your technology career, or life in general, is, to quote a former president, “don’t do stupid shit.” If we look historically at major IT outages or data breaches, there are always some universal tenets–some piece of critical infrastructure was left completely unsecured, or someone put the private keys on GitHub. In fact, just last week I wrote my column for Redmond Mag on some common patterns, mostly specific to SQL Server, but applicable to many other types of applications. I wanted to call it Data Breach Bingo, but I didn’t have enough space in my column for 25 vulnerabilities.


While we’re here, it’s important that when thinking about security, you think about the basics first. The GRU operatives working for the Russian government likely aren’t targeting your data–it’s some bot network looking for a blank SA password on a SQL Server that’s got port 1433 open to the internet. If you are on Twitter, I highly recommend following Swift on Security, a rather serious parody account (of Taylor Swift) that focuses on good security practices. That account also maintains an excellent website called DecentSecurity.com that covers the basics of how you should be securing your personal and business environments. While you might not be protected from the GRU, you can avoid most of the other “hackxors” in the world that are really just dumb bots.

The reason I wrote this post is that my colleague Meagan Longoria (b|t) shared with me a post-mortem from a ransomware attack on the Department of Transportation in the state of Colorado. I would really like to applaud the state for putting out this document, even though it doesn’t paint the organization in the best light. I’m going to use this space to talk about some of the things highlighted in the report.

“A virtual server was created on February 18, 2018. The virtual server was directly connected into the Colorado Department of Transportation (CDOT) network, as if it was a local on premise system. The virtual server instance also had an internet address and did not have OIT’s standardized security controls in place.”

Ok, this sounds pretty much like they deployed a VM to either Azure or AWS, and they had a network connection back into their on-premises network. This in and of itself is not problematic, but not applying security policy in the provider is a problem. Both Azure and Amazon have standard security templates and policies that can be configured and applied at deployment.

“The account utilized to establish the connection into the CDOT network was a domain administrator account – this is the highest level privileged account, and means that 1) the account cannot be disabled for too many failed login accounts, and 2) it provides the highest level of access to the agency domain controllers (gatekeepers for all access to everything in the department).”

There’s a lot going on here. I read this as: they joined the server to the Active Directory domain and used a domain admin account to perform the join. Neither of those is inherently problematic. The next sentence is wrong–domain admin accounts can absolutely be disabled; however, this indicates to me that the department had the “never disable this login” checkbox checked. Bad move. Also, in an ideal world, you would never have a domain admin log into member servers, so its password is never sitting in memory on those servers, but that’s not how this (or many) hacks work.

“Later, OIT was informed by the vendor that when an external IP address is requested, the vendor automatically opens the Remote Desktop protocol to the internet. The Remote Desktop protocol is how this attack was initiated.”

This is my biggest complaint about both Amazon and Microsoft Azure, especially in the timeframe of this attack: both services defaulted to giving VMs a public IP address and didn’t actively discourage public ports from being opened. While it’s a totally bad practice that should be blocked by policy, up until a few months ago on the Azure side it was very easy to do.

“An attacker discovered this system available on the internet, broke into the Administrator account using approximately 40,000 password guesses until the account was compromised. From there, the attacker was able to access CDOT’s environment as the domain administrator, installing and activating the ransomware attack.”

This is a statement that will sound somewhat impressive to lay people. 40,000 password guesses sounds like a whole lot; however, for an automated script that would take seconds. This means they either had roughly 3-character passwords (I’m guessing they didn’t), or the domain administrator was using a common dictionary-based password. You can use add-ons to Active Directory to limit the types of passwords users can choose, and while this may be painful for normal users, it is absolutely critical for high-privilege users like DBAs or system admins. They should also have separate user accounts for logging in as domain admin, and multi-factor authentication configured.

This is also the part where I would usually say you should have network segmentation configured to block the malware from spreading, but if someone pwns your domain controllers, all the segmentation in the world won’t help that much. An attacker could even go as far as to install their malware via group policy. Good luck cleaning that one up.

The state of Colorado had a really robust backup solution and some good network practices (smart firewalls) that were able to limit the spread of the malware, so they were able to recover from this attack relatively quickly. It’s important to worry about the big things, but doing all of the little things right, and most importantly not doing stupid things, will go a long way toward securing your environment.


Azure SQL Database Price and Performance vs Amazon RDS

Yesterday, GigaOm published a benchmark of Azure SQL Database compared to Amazon’s RDS service. It’s an interesting test case that tries to compare the performance of these platform-as-a-service database offerings. One of the many challenges of this kind of study is that the product offerings are not exactly analogous. However, GigaOm was able to build a test case using similarly sized offerings, as shown in the image below.

The similarly sized Azure SQL Database and AWS RDS offerings used in the GigaOm benchmark

GigaOm derived their test from the TPC-E workload, a blended OLTP workload that involves both read-only and update transactions. In the raw performance numbers, SQL DB was a little bit better in terms of overall throughput, but where Azure really shines in comparison to AWS is pricing.

In terms of raw pricing, there is a significant difference between Azure SQL DB and AWS pricing for these two instances. Azure is $25,000/month less; however, it’s not quite an apples-to-apples comparison, as Azure SQL DB is a single-database service, whereas RDS acts more like Managed Instance or on-premises SQL Server. If you use the elastic pool functionality in Azure SQL DB, which costs the same as single-database SQL DB, you get a better comparison.

Pricing comparison, including Azure SQL DB elastic pools, from the GigaOm report

Where the real benefits come in are that Microsoft allows you to bring on-premises licenses to Azure SQL Database, and that Microsoft offers a much lower cost for three-year reservations (which you can now pay for on a month-to-month basis). The big advantage Microsoft has over Amazon is that they control the licensing for SQL Server. You can argue that this isn’t fair, but it’s Microsoft’s ball game, so they get to make the rules.

There will always be an inherent feature advantage to being on Azure SQL DB over AWS RDS, as Microsoft owns the code and can roll it out faster. While AWS is a big customer, and will be quick to roll out new updates and versions, they are still subject to Microsoft’s external release cycle whereas Azure is not. In my opinion, if you are building a new PaaS solution based on SQL Server, your best approach is to run on Azure SQL DB.