Getting Started with Log Replay Service for Azure SQL Managed Instance

Recently, I’ve started on a project where we are migrating a customer to Azure SQL Managed Instance, which now supports a few different migration paths. You can simply backup and restore from a database backup, but you can’t apply a log or differential backup to that database. You can also use the Database Migration Service, but that requires a lot of infrastructure and Azure configuration. The log replay service, or LRS, is the functional equivalent of log shipping to your new managed instance database. While log shipping is a well known methodology for both database migrations or disaster recovery. However, the implementation is a little different–let’s talk about how it works.

flock of birds flying
Photo by Shakhawat Shaon on

First, You Must backup to URL

In order to use the log replay service, you need to take a backup of your database to Azure Blob Storage. Alternatively, you can push on-premises backups using AzCopy or uploading files into the portal or Storage Explorer. But let’s focus on backup to URL. The first thing you need to do is create a credential in SQL Server. While there are other methods of creating a credential, I have had absolutely the best luck, using this PowerShell code which produces a CREATE CREDENTIAL statement to execute on your SQL instance.

CREATE CREDENTIAL [] WITH IDENTITY='Shared Access Signature', SECRET='sv=2020-08-04&si=sql1&sr=c&sig=Dw0uE13l2347%2FCNJgQbm5sHhM9ZtDts9F8SMi5Re1ASUw%3D'

The SAS token you create is created on the container level–you will also need to create a policy (that code from MS will create everything for you including the storage account). Next execute the backup.

EXECUTE dbo.DatabaseBackup@Databases = 'DB1',@URL = '',@BackupType = 'LOG',@Compress = 'Y',@Checksum = 'Y',@Verify = 'Y',@DirectoryStructure = '{DatabaseName}',@NumberOfFiles = 8,@MinBackupSizeForMultipleFiles = 10240,@logtotable=y

I’m using Ola Hallengren’s code to do my backups, but you can also do this manually.


I highly recommend using Ola’s code to perform these backups and all of your backups. A couple of things you should note–you have to use the checksum option, and with Ola’s backups I’m using the Directory Structure of {DatabaseName}, The reason for this is that LRS does not support nested directories, like Ola’s code would default to, nor does it support back ups stored in the root of the container. (You should note the backup example on the LRS doc page does just this).

Docs also say you need a full, differential, and log backup to do a complete migration, however in my testing I’ve been successful with just a log backup in addition to a full. To complete your restore process, you will need to have the name of your last log backup. You can use this code to generate the powershell code to initiate the migration process.


DECLARE @RGName VARCHAR(256) = 'ResourceGroupName'

DECLARE @MIName VARCHAR(256) = 'ManagedInstanceName'
DECLARE database_cursor CURSOR FOR

SELECT name FROM MASTER.sys.sysdatabases where name not in ('master','msdb','model','tempdb')

create table #migration (command nvarchar(max))

OPEN database_cursor FETCH NEXT FROM database_cursor INTO @DBName

SELECT @lastlogbackup = REVERSE(SUBSTRING(REVERSE(physical_device_name ),1,CHARINDEX('/',REVERSE(physical_device_name )) - 1))FROM msdb.dbo.backupset bs INNER JOIN msdb.dbo.backupmediafamily bmf ON [bs].[media_set_id] = [bmf].[media_set_id]WHERE (bs.database_name = @DBName OR @DBName IS NULL) AND bs.type = 'L'ORDER BY bs.backup_start_date DESC;

insert into #migration select ‘Start-AzSqlInstanceDatabaseLogReplay -ResourceGroupName “'+@RgName+'" -InstanceName "'+@MiName'" ` -Name "'+@dbName+'" `-LastBackupName "'+@lastlogbackup+'" -Collation "SQL_Latin1_General_CP1_CI_AS" -StorageContainerUri "'+@dbname+'/" -AutoCompleteRestore -StorageContainerSasToken "sp=rl&st=2021-12-14T21:25:07Z&se=2021-12-25T05:25:07Z&spr=https&sv=2020-08-04&sr=c&sig=h8%2FimcTh%2BnlCQ3WodgTShWERT7yA38HfBQFOLwvEoiqw%3D"'
FETCH NEXT FROM database_cursor INTO @DBName


CLOSE database_cursor DEALLOCATE database_cursor

select * from #migration

For a small database, this only takes a few minutes. If your PowerShell execution goes out to lunch, and your database is stuck in restoring, there is probably something wrong with the way your formatted your backup string. This also manifests itself in the form of http 204 error in the log of your managed instance.


Sending the notification action: NotifyRestoreProgress

Date                       12/9/2021 8:09:42 AM

Log                         SQL Server (Current – 12/9/2021 4:27:00 AM)

Source                  spid101


Http code after sending the notification for action NotifyRestoreProgress: 204

What’s annoying is that the PowerShell doesn’t attempt to do any validation of requirements. Hoepfully, this will change in the future because this was a challenging problem to troubleshoot.


A Post-mortem for PASS, and What That NDA* Issue Was

Editor’s Note: I wrote this last December, and went back and forth on publishing. As I see some of the poor leadership patterns from PASS, creeping into new offshoots of the SQL Community, I decided to publish it today.

I am not writing this to dance on the grave of PASS or throw pitchforks at anyone. Just like the rest of the community, I am disappointed at the end of PASS, as the community worked really hard to make the organization great. However, a lot of mistakes were made over the course of many years, and I think it’s important to talk about those mistakes, and how we as a community can avoid them in future endeavours. The biggest failure was not protecting community assets like SQL Saturday and the Virtual Chapters that were built by the sweat of the community. But first I’d like to talk about the “NDA” violation the came up while I was running for the board.

The reason why I’m including this story into my post-mortem is that the board NDA was a major organizational problem and was abused by senior leadership at C&C, in order to in my opinion hide a lot of bad practices that took place within the organization. Nearly every interaction with PASS required an NDA—for a community organization this notion is ridiculous. There are small set of activities that actually require an NDA—typically harassment cases and personnel matters like salaries. Other than that, all activities and budgets should be fully transparent. One of the biggest issues that happened in PASS’s failure was that members at large on the board of directors did not have a clear picture of PASS’s finances until it was much too late. This stonewalling was not for a lack of effort—even to members of the board PASS did not operate transparently. This behavior continued until the very end of the organization.

About that NDA Violation

I’d also like the clarify the difference between whistleblowing activity and NDA violations. Whistleblowers report things when they see their leadership acting with maleficence, and whistleblower activity nearly always violates NDAs. Whistleblowers have exposed government, and corporate negligence and saved lives and costs.

Many people reached out to me after the posts I wrote last year. There was rightful and genuine concern for PASS—despite frustrations there was a lot of emotion for PASS, and more important our community has been largely structured around a centralized organization. As people reached to me, I learned fairly early about the plan to ask Microsoft for a very large bailout (on the order of millions of dollars), in an effort to save PASS. Part of this plan involved selling Microsoft a large number of PASS Pro memberships.

Given that PASS had no plan for a revenue stream in 2021, this funding project seemed absurd, and at PASS’ current burn rate they would require another bail out next year. I had numerous sources for this information, and while I won’t reveal the sources, they included community members and C&C employees. I did share this information with selected individuals at Microsoft as well, but I never revealed it publicly. I also shared information in confidence to a few selected individuals One of those people decided that was a violation and reported it to PASS exec. I have a couple of things to say about this—first of which is that in the application that prospective board members sign, there is no reference to an NDA, and I was not under contract with PASS. I’m not a lawyer, but I took a few semesters of business law, and if there is not an NDA in-place, there is no expectation of privacy. In the end, all of that didn’t matter, as PASS is now dead.

A Board Operating in the Dark

In my discussions with former board members, one topic that constantly came up was the complete lack of transparency C&C showed, particularly when budgets and spending were discussed. Multiple Board and even exec members reported to me that they gave explicit directives to C&C only to have those directives be ignored.

Much of the blame for the end of PASS will be laid onto the Covid-19 pandemic taking away the primary revenue source, in-person Summit. While this is the direct cause for PASS’ failure, the truth is that PASS has been on a downward slide for several years. In preparing to write this post, I spoke to board members across several generations of the board and deeply involved community members to gain insights into what happened. There are a number of things that have always concerned me about PASS—the lack of transparency the organization has always, always been the core problem.

C&C’s Conflicts of Interest

Why do I think there was such a culture of secrecy in the organization? I think it stems from the management organization, and its leadership. C&C always acted in the best interests of C&C, and not the best interests of PASS or the Microsoft Data community down to the final days of the organization. C&C was paid by PASS as a percentage of PASS’ spending—while this is somewhat common in event management circles, in organizational management circles it can lead to perverse outcomes, and likely what was ultimately responsible for PASS’ bankruptcy.

C&C had so many conflicts of interest I don’t even know where to begin talking about them. The CEO’s sister was responsible for the finances of both PASS and C&C, and C&C held a non-voting seat on the executive board. While many members of the board would say this was helpful as PASS and C&C operated in partnership, it is probably the most explicit and visible conflict of interest. The fact that the organization providing all of your services knows what your bank balance and income are problematic at best and can lead to poor outcomes. It also means your partner has no real interest in saving the organization money. C&C had no other clients other than PASS, which is likely a violation of IRS and Revenue Canada employee/contractor regulations.

I’ll share some specific examples, but what this practice led to was a singular focus on PASS Summit, because that’s where the spending (and revenue) is for the organization, and therefore C&C. This came at the expense of community events like user groups and SQL Saturday events. This also led to C&C influencing heavily decisions like spending a ton of money on a virtual Summit that was unlikely to be successful, or spending a lot of money on PASS Pro, which effectively had no hope of ever being profitable. While PASS’ noble mission was “Connect, Share, and Learn” in practice it became “whatever it takes to drive C&C’s revenue growth”.

Lost Opportunities

One of the challenges PASS faced well before Covid was lack of growth (and even a downturn) in the number of people attending PASS Summit. Throughout my time with PASS the board acknowledged this single source of revenue as a risk and made a few attempts at growing the community or diversifying revenue. One of these was the Business Analytics conference, which was initially driven by Microsoft, but could have evolved into something much bigger. As opposed to a technology only conference, business analytics can be marketed to wide swath of data professionals and not just technologists. The marketing for this conference was handled poorly, as C&C tried to market just as though as it was PASS Summit.

This led to PASS turning down the opportunity to run Power BI user groups, which in turn led to the successful Microsoft Business Analytics (MBAS) conference, which could have both increased revenues, and more importantly grown our data community.

Another story shared with me, was that Microsoft wanted to make PASS Summit a free event, in order to dramatically grow the size of the conference. While this is a bit of one-upmanship between software vendors as to who has the largest conference, this had to potential to exponentially grow the PASS community. Microsoft was willing to fund the revenue difference, but this was shot down by C&C, presumably because the initiative would have cost them money. This goes against PASS’s stated mission of Connect, Share, Learn, and is just one of myriad of bad acts perpetrated by C&C through the years that ultimately doomed PASS.

What Happened in 2020

There are countless examples of C&C’s actions that were detrimental to the broader community to try and drive Summit attendance and revenue, but I’d really like to focus on the actions that took place this year, which lead to the death of PASS. When the pandemic hit in February and March, it became very obvious to everyone that having large conferences was an unlikely proposition until a vaccine was in place, and even then, probably a year after that.

Assuming people are mostly vaccinated by the middle of 2021, that means the best hope for an organization who’s revenue was dependent on conference revenue would have been to move to an austerity budget which limited services to completely essential services, until a time, probably in 2022 when a large scale conference was a good bet. This was bad news for PASS, because its major revenue source was compromised until at least 2022. (Editor’s note: I wrote this last year, and it has mostly held true)

Even in lieu of an austerity budget, what PASS could have done in April 2020, was to move to protect the assets that the community built like SQL Saturday and the Virtual Chapters, and now were at risk due to multiple organizational failures. The executive board and C&C executed none of these options and kept the board members at large as well as the general membership in the dark as to how perilous the financial situation was.

There were additional poor decisions made in 2020. The Exec and C&C decided to go along with PASS Pro, a paid membership offering, that was built on a learning management system. While a noble concept, it was competing against mature services like Pluralsight and LinkedIn Learning. There was some Microsoft funding to this project (my understanding is around $300k USD), and several community members were paid (around 3-5K USD) to record videos for the service. While a lot of money was spent to build and launch the service, there was never hope of it being a significant revenue source for PASS.

Despite what C&C has always said about PASS having 300,000 members, based on better estimates, PASS likely had < 5000 “active” members and a maximum of 20-30,000 “passive” members who unwittingly signed up by attending a SQL Saturday or PASS Summit. As of November, PASS Pro had only sold 300 memberships (note some of these may have been corporate memberships which included up to six subscriptions). Even if we assume all 300 of those were corporate memberships (which I know not to be the case), that provides PASS $180,000 of revenue. While that is not nothing, for an organization with PASS’s expenses, it is a completely insignificant amount of revenue, and since PASS/C&C had no real plans for a content pipeline, it was unlikely to grow over time, instead being more likely to fade away so like many other PASS initiatives.

Which brings us to Virtual Summit. I don’t have a final accounting to know what this event cost, but after PASS’s bankruptcy we know it was responsible for the death of the organization. PASS was very slow to cancel in-person Summit, with the virtual event was not announced until June. My understanding is that part of this was because it was a challenge to get out of the contract with the Houston Convention Center, however it put PASS in place where lots of other (sponsor funded) conferences had gone virtual and more importantly to this discussion, FREE.

I’m of the opinion that C&C leadership took this opportunity to apply the death blow to PASS. Instead of moving into an austerity budget, and taking assistance for revenue loss from the Canadian government, PASS decided to go “all in” on the virtual conference using a very expensive virtual conference platform (that was ultimately derided by both speakers and attendees), that promised a “unique, interactive” conference, that ended up just being a really fancy wrapper for a Zoom meeting. This was defended in the December board meeting minutes, that in order to protect PASS’s cashflow, executing a virtual conference was a necessity.  

The pricing of the virtual conference was lower than in-person Summit, though my understanding was that C&C was pushing for significantly more expensive pricing than the final number. C&C sold the board on the notion that since people didn’t have to travel to an in-person event, they would have a potentially larger audience than at an in-person event. That assumption was flawed from the start, as it didn’t account for the pandemic recession, virtual meeting fatigue, and the fact that most of the other conferences PASS usually competes against, were FREE.

PASS weakly attempted to highlight how the conference was going to be different and more interactive than other conferences, but they never had a free trial event to build hype, and C&C was ineffective at marketing the virtual conference. During the entire summer and into early fall there were few emails or tweets about the conference, other than short video describing the platform.

I lay the responsibility for death of PASS on C&C largely, but secondarily on PASS’ executive leadership. In good times, some of the decisions made vis a vis C&C were poor, but they didn’t matter given the revenue streams. A microcosm of this is that PASS sent speaker shirts to 2020 Virtual Summit speakers. Anyone who’s ever run an event knows that the first two things you cut when your budget is not looking good, are speaker shirts and speaker’s dinner.

The decisions made by PASS in 2020, from launching PASS Pro, to betting the organization on a virtual conference, and finally the decision to ask Microsoft for a major bailout, with no significant plan forward, show a complete lack of leadership from anyone on exec, instead deferring all decisions to the deeply conflicted C&C organization, which had shown repeatedly they only cared about their own interests and not that of the broader data platform community. If you are waiting on a refund from PASS Summit, you will have to stand in line behind C&C, because they are getting paid first. Like always, Judy got her money before the community did. And that just fucking sucks.

Like much of the community, I am lamenting the loss of PASS, but I feel the to call out executive leadership, not just for their leadership failings (which are many), but for their passive aggressiveness and general rudeness to community members who cared about the organization. These actions not only took place against community members, the abusive behavior extended to other members at large of the board of directors.  The exec did everything they could to keep the rest of the board in the dark on the financial situation all the way to the bitter end of the organization. Board members were chastised for asking hard questions and kept in the dark before votes. There were never open and honest communications between the board members at large and the exec and C&C. While in normal times this meant C&C got resources that should have gone to the community, in this pandemic year, it meant the complete death of the organization without protecting any of its assets. This is not a recipe for a strong organization. That secretive behavior did nothing to help, and actively alienated many active and senior members of the community, who otherwise could have been strong allies to the PASS cause.

I am thankful for the existence of PASS, and the community won’t go away. I don’t want to necessarily focus on what’s next—yet. I think the best thing community members can do is keep running their user groups, and hopefully in the second half of 2021, we can start having data events on Saturday, whatever they are called. I think the best approach to a larger organization is to have a federation of larger events, so they can coordinate on things like scheduling, and have a unified code of conduct and diversity and inclusion policy. Having a centralized organization comes with a lot of expense, so starting small and building is the right approach. Andy Mallon has a good vision for this in his post here.

Passing the AZ-500 Exam–How I Prepared

A few months ago, I was asked to do a webinar on security for Microsoft Teams. In preparing for that webcast, I noticed a number of security features that required either Azure Active Directory P2, or Office 365 M5. In order to get either of those two features for free, DCAC needed to gain the silver security competency from Microsoft. In order to do this one of us, had to pass wither MS (the Office/Microsoft 365 version) or the AZ-500 security exam. Since, I was the one who wanted the features, I volunteered to take the exam.

The first time I took the exam, I didn’t do much preparation. I was pretty busy, and I recently taught some Azure Architecture training and I was hoping that would be enough to get me through the exam. That almost worked–I came pretty close to passing, and likely didn’t pass because I was frustrated with some questions and kind of blew through them. Pro-tip: don’t schedule an exam for Monday at 4 PM.

The second time I took the exam, I prepared a little more. Here were the key topics I focused on:

  • Azure Active Directory
  • Azure Sentinel
  • Azure Role Based Access Control (RBAC)
  • Azure Networking
  • Azure Key Vault

You can see the full exam outline here. I spend a lot of time working on most of these technologies in my daily job, so this was more refreshing my memory on the specifics of how each of them work. I also paid for the practice exam, which was helpful but likely not necessary, but it did get me more into the exam mindset.

The second time I took the exam, I didn’t find the questions as friendly to my knowledge, and generally thought the exam was harder. However, this time I passed the exam, fairly easily. Yay, a week later we had P2 AAD so I can now do cooler demos. My general recommendation to folks who want to take cert exams, is to just dive in and take them as it’s really the best way to study, and know the areas where you are weak.

A Lesson in DR, Azure Site Recovery, and Troubleshooting

I need to blog more. Stupid being busy. Anyway, last week, we were doing a small scale test for a customer, and it didn’t work the way we were expecting, and for one of the dumbest reasons I’ve ever seen. If you aren’t familiar with Azure Site Recovery it provides disk level replication for VMs, and allows you to bring on-premises VMs online in Azure, or in another Azure region, if you VMs are in Azure already. It’s not an ideal solution for busy SQL Server VMs with extremely low recovery point objectives, however, if you need a simple DR solution for a group of VMs, and can sustain around 30 minutes of data loss, it is cheap and easy. The other benefit that ASR provides, similar to VMware’s Site Recovery Manager, is the ability to do a test recovery in a bubble environment.

Our test environment was as shown in the image below:

In our case, due to configuration difference between subscriptions, we had to have our test domain controller in a different virtual network, and peered to our network. The DC was also in a different Azure region, which wasn’t a big deal because you can still peer across regions. I have the additional box around the test environment because it is not connected to the rest of the network.

You should note that for a real disaster recovery scenario, you are likely better off having a live domain controller in each region where you want to operate. However, when you are testing you cannot use a live domain controller, as objects and passwords could get out of sync. For this test we added a DC to our ASR config as noted.

The other caveat to doing testing is that you need a way to login to your test environment. Because your test network(s) are not connected to your other networks, unless you create a new VPN gateway, you likely have no connection to the test bubble. I don’t recommend creating a new VPN, which leaves you a couple of options:

  1. Create a jump host with a public IP address and create an allow list for specific IPs to connect to it
  2. Use Azure Bastion which provides an endpoint with allows you to RDP/SSH in a browser into your VMs in a secure fashion.

We decided to go with option 2–which led to weird ramifications.

If you are testing in this scenario you want to bring your domain controller online before failing over your SQL Server and app server(s), It’s not completely necessary, but it will remove some headaches and reboots down the road. After you failover you domain control, you need seize the FSMO roles, and make the domain controller a global catalog, if it is not. However, you need to login with domain credentials to do all of that first.

There is no guidance about how to login using Bastion in docs. However, after a bunch of stupid troubleshooting yesterday. I discovered that if you attempt to login using the standard domain credentials (e.g. contoso\joey), you will see the following error message.

Instead you have to login with the user principal name (UPN), or in my case We troubleshot this in a test environment yesterday and were confused when DNS worked and it seemed like most other AD functionality was in-place. After a couple of more attempts at failover (one of the benefits of of ASR is that it’s really easy to test failover multiple times).

Sadly, this behavior wasn’t documented anywhere that I could find, except in a forum post (which was on, so I guess that counts), but it would be really nice if the portal just showed the correct format for how to login with a domain account. Beyond that, ASR is pretty cool, and allows for easy DR testing.

Senior SQL Server DBA Job Interview Questions–2021

In my role as a consultant, it’s rare that I go on an interview anymore, though I occasionally get interviewed by a client, or interview potential DBAs for clients as part of my role. There are a number of these lists on the internet, but many of them are old and focus on trivia questions (there won’t be any questions on what’s clustered versus nonclustered index, but if you are interviewing for a senior role, you should know that anyway. I also like to focus on open ended questions, to gauge the depth of knowledge of the person I’m interviewing.

man and woman near table
Photo by fauxels on

Here are some sample questions, and explanations:

What does TempDB do in SQL Server? This is probably my all time favorite SQL Server question, because almost anyone can answer it correctly, but at the same time I can gauge someone’s knowledge of SQL Server from a 100 level to “I worked on the product team and wrote part of the tempdb code” depending on the answer. The basic right answer is “stores temp tables” and the senior dba right answer involves spills, isolation levels, and version stores.

What would you change on a default (e.g. next, next, next, finish) installation of SQL Server? The answer to this question has changed in recent years as the product team has implemented most of the best practices around TempDB, MaxDOP, and Max Memory into setup (bonus points if they know which version those things were implemented in (TempDB (2016), MaxDOP (2019), and Max Memory (2019)). Other correct answers include enabling IFI, optimize for ad hoc, changing file locations, changing default growth rates. (The really good senior dba answer, is that I used DBA tools or a T-SQL script to automate my setup based on the machine size).

I didn’t want to make this a full list of questions and answers, but I will include a number of questions. You can go find the answers, or ask for help in the comments.

“Microsoft has a number of Platform as a Service database options in Azure. Can you tell me what those options are and a good example of a use case for each of those options?”

“Tell me about the various options for encrypting data in SQL Server, and how would you use each of them?”

“What are some system waits you have observed, and how what tuning steps did you take based on those wait statistics?”

“Tell me some ways you might improve backup and restore performance?”

“What types of workloads could benefit from a columnstore index and what are the tradeoffs?”

“How would you configure storage for a SQL Server VM running in Azure”

“Can you walk though the high availability and disaster recovery options, and the benefits and tradeoffs of each of those?”

“What is your methodology for troubleshooting a query that suddenly decreased in performance?”

How would you diagnose a storage performance problem versus a poorly performing query?”

What are the possible causes of database corruption?”

You need to deploy a change to 3000 SQL Servers, tell me about your methodology?”

How do you manage performance metadata across a large number of servers to provide baseline performance information?”

This is just a start–feel free to add questions in the comments or ask for help.

Validating IP Addresses in PowerShell

I had a client ask me to write a PowerShell script (for an Azure Automation runbook) to automate the creation of firewall rules for an Azure SQL Database. As part of this process, I was planning on having to validate the IP addresses (a valid IP address is of the format x.x.x.x where x is an integer between 0 and 255) by having to write some code to split the IP address string into individual octets and figuring out if each value was a valid number. However, in doing research (searching Stack Overflow), I discovered PowerShell has a built-in IP address data type, which made my life way easier.

Since PowerShell has IPAddress as a data type, that means we can attempt to cast a string value as a data type.

If I type an invalid IP address I get the following result:

You’ll note I don’t get an error, I simply don’t get a result. For me to do something useful with this I do have to write a little more code to handle this IP address in my script.

try {

$s=$StartIP -as [ipaddress] -as [Bool] $e=$EndIP -as [ipaddress] -as [Bool]
if ($s -eq $false -or -$e -eq $false){write-error "Invalid IP Address"}

After casting my parameters as an IPAddress, I can then cast them as a boolean. A boolean value is a yes/no value that in PowerShell will return a $True/$False value. If the IP addresss submitted is a valid one, the result will be true. In my IF loop, I’m saying if either my start or end IP addresses are invalid to write an error stating such.

How Do You Shrink Your Azure SQL Database?

In the early days of Azure SQL Database (ne SQL Azure) we had size limits that were minuscule–I vaguely remember 5 and 10 GB, but it could have been smaller. Back then space was at a premium. In 2021, we have hyperscale that allows us to have an Azure SQL Database of up to 100 TB (or more) in size. However data size in the cloud means costs, so space management can matter, right?

gray and black galaxy wallpaper
Photo by Pixabay on

Let’s Talk About Your Log File

While many DBAs will never ever want to shrink a data file, sometimes your transaction log file can grow for external reasons. Typically these reasons are as follows:

  • Your log backups failed for a while and the log file grew
  • You couldn’t write to an availability group secondary for a while and your log file grew
  • You couldn’t write to a replication target for a while and your log file grew

You will note that I didn’t mention that “your log file grew because of a large index rebuild”. That’s because that is probably roughly (this is a really rough rule of thumb) how big your transaction log needs to be. But, anyway, we’re talking about Azure SQL Database, so you don’t need to worry about your transaction log file. Microsoft takes care of that for you: ‘Unlike data files, Azure SQL Database automatically shrinks log files since that operation does not impact database performance.’

What About My Data Files?

In an on-premises world, shrinking data files should come down to two scenarios:

  • You were creating a database file and you accidentally added an extra zero to the size field.
  • You weren’t using compression, and you compressed all of your larger tables and indexes and got a bunch of space back

The first one is easy–since SQL Server never wrote data to those pages, SHRINKDB will quickly reallocate them. The second one is much harder and depends on your data types and how busy your server is. In that case SHRINK may never complete–especially if you used LOB data types. Besides that SHRINK will fragment your data massively. So you should basically never run it.

Disk space is mostly cheap, so unless you are extremely storage constrained you should just live with empty space. In Azure SQL this is even more complex, because the ghost cleanup process, which cleans up unused dirty pages can take a long time to run. So it can be days or weeks before you can reclaim your space, if ever.

So Really, How Do I Shrink My Azure SQL Database?

You don’t. Space, even in the cloud is really cheap. If you really want to make the database smaller, you can extract the data into a new database, and then drop the original database. Congratulations, you just took downtime and took several hours of your life to save $30/month. That doc from Microsoft recommends (with a warning) enabling autoshrink on your databases. Don’t do that. Ever. It’s really, really bad for your performance. Relax and enjoy the fact that your don’t have to bug the SAN admin for space in the cloud.

Do You Even Tag, Bro?

Sorry for the clickbaity title. One of the things I hated most about working in corporate IT, was the persistent, never ending discussions around server naming standards. I once flew to Switzerland to spend 16 hours in a room full of dudes (it’s always a room full of dudes) discussing each of the 16 characters of out server naming standard should be. If I was billing for that kind of thing now, I’d charge $3000/hr (as in I would only do this for $3000/hr), as a server naming standard is probably one of the least important things an IT organization can spend its time on and there are much better solutions in modern infrastructure. If you aren’t familiar with tagging or labeling, you should be. Modern infrastructure, whether it be on-premises virtual machine hypervisor, public cloud platforms, or a container management system like Kubernetes, all support tagging of resources.

Tags by themselves aren’t particularly useful–they are simple key value pairs (e.g. Environment:Production, ServerType:SQL) other than they allow you to easily identify information about resources. Where they become really powerful is their integration with the control plane, also known as the API layer that powers modern software defined infrastructure. This means you can operate on resources based on their tag values. The code shown in the picture below illustrates this:

In this code sample, we are looking for all resource groups matching a tag with the name “Use” and the value of “Demo” and then within those resource groups, stops any running VMs. This is a really simple example, that I use in my Azure subscription. With this you can start to see the power of tagging. You can also, on some platforms (Azure, for sure, I didn’t check others) use policy to enforce the use of tags. For example, at DCAC, I implemented a policy that all new Azure resource groups are required to be tagged with client name, and date created, so we could more easily know if we could delete resources.

In addition to allowing for code-driven automation, when you are in a cloud world, tags make their way down to your actual bill, letting you understand which departments or applications are costing you the most money. This allows to take a multi-dimensional approach to breaking down your cloud billing–in a large organization without a good tagging strategy, diseccting your bill is going to be very challenging.

Tags are incredibly useful in identifying your infrastructure, and you can add a lot of them–in Azure you can have up to 15 tags. This gives you a rich collection of information about your resources that you can access programmatically. Tag early, tag often.

Dear Microsoft—Help Us Help You

One of the things that is so “fun” about working with Microsoft Azure, is the near constant rate of change in the platform. I did put fun in quotation marks, but I really mean it—if I had to work on the same things every day my job would become quite boring. Part of this constant rate of change, means there are always new products and features in the pipeline. While there are some architectural standards around availability and design, each Azure service has some its own unique nuances.

As MVPs and Microsoft Partners, the way we get new features and services to our customers typically goes down the following path:

  • A product group announces plans for a new feature or service, and we get excited about it
  • We might play around with the service in a test environment so we can confirm it does the thing we expected it to do
  • We then find or encounter a customer scenario which could benefit from new feature or service
  • We implement the service at a customer

Of course, given that we are talking about cloud computing, most services have direct costs that are associated with them. We expect cloud to cost money, but Microsoft is quite generous with cloud credits to MVPs and Partners to allow us to test and do proof of concepts.  However, because of cost models, some services are just untenable for MVPs and customers because of high cost of entry.

Photo by Alexander Mils on

StretchDB—A Case Study

When SQL Server 2016 was in preview, Microsoft introduced a feature called StretchDB. This feature allowed you to extend a table or tables to an Azure SQL Database. While the feature had limited functionality, conceptually it was cool. If you had cold data that wasn’t actively queried, and needed storage space, you could archive that cold data into the cloud for a relatively low cost, as initially you could choose whichever tier of Azure SQL DB you wanted—I frequently demo’ed this feature with a Basic SQL Database that cost $5/month. Sometime before SQL Server 2016 went generally available (GA), the decision was made at Microsoft that StretchDB was going to be platformed on a StretchDB SKU of Azure SQL Data Warehouse with a minimum cost of thousands of dollars of month (it used a premium SKU of SQL DW). This effectively killed the feature to the MVP community and frankly most customers—I haven’t seen any adoption of this feature, which also means there probably won’t be investments from Microsoft into the feature gap.

StretchDB isn’t the only service that I’ve encountered with this problem. Azure services that either have extremely high fixed costs or have highly unpredictable variable costs are really unattractive to customers.

What Should Microsoft Do?

Some services are inherently expensive—if you think about Azure SQL Database in the Business Critical tier, you have the functional equivalent of a three node availability group, with three copies of your data on local flash storage. That’s a solid amount of compute resources, and I can understand why it cost what it does. However, SQL Database also provides the Basic service tier at $5/month, which is nearly the functional equivalent of a $40,000/month 80 virtual core business critical database. (There are some minor functional differences and the obvious capacity differences) This allows developers and partners to get comfortable with the service at a low price point and grow their adoption over time.

I have a few basic architectural tenants that a lot of Azure services do, that I think should be across the board for all service offerings:

  • Ideally, have a low-cost SKU, that has enough compute power for demos and proof of concepts
  • If your service must have a very high fixed cost, ensure that the service has a pause functionality, so that the compute costs do not have to run 24/7
  • If your service has a high degree of variable costs, share internal (Microsoft use) customer scenarios and billing, so that customers can get a decent idea of what their costs are going to look like

Making cloud services cheaper to get started with will help overall adoption. When a new service is launched and I can’t tell a customer what their bill is going to be, or that the costs are going to be thousands of dollars a month just to try them out, adoption is going to be limited. I think adopting these tenants across Azure services would help adoption.

Redgate Bought PASS’s Assets—What Does This Mean?

Yesterday, you probably received an email from PASS and/or Redgate stating that PASS’s assets had been acquired from Redgate. For the health of the Microsoft data and analytics community this may seem like a win, but in reality, it creates many more questions. The first of which, is how was the auction held, and to whom did the proceeds go to? Did Summit ticket holders have their money refunded? Did PASS Pro members get refunds? The fact that a member of the PASS Exec board works for Redgate raises further questions about the fairness of the process.

Having a for-profit vendor having ownership of community assets, just continues the same problem the community had with C&C. In fact, it creates more conflicts—are other vendors going to want to sponsor “Redgate SQL Saturday Events”? Or “Redgate PASS Summit”? The complete lack of transparency around the auction process aside, having one vendor have ownership of the community assets is problematic. I would be more ok, if it was Microsoft, but would still prefer the community own its assets so that we are not at the mercy of a company. This isn’t a problem I have with Redgate—I’d have the same problem with any vendor.

What Should Redgate Do?

Here’s what I think should happen:

  • Redgate should help form a not-for-profit organization with an independent board and governance structure to hold and manage community assets
  • They can have a permanent seat on the board of this foundation, provided its charter allows other vendors on the board
  • The organization should freely license assets like SQL Saturday and training content
  • If someone wants to run Summit, that endeavour should remain separate from the community organization, and run either out of Redgate or a separate organization.

This is the only path forward for a successful community organization. There are a lot of models in technology communites for community/corporate partnerships like the Linux Foundation and the Cloud Native Computing Foundation. Otherwise, the organization is not a community, it’s just a bunch of marketing assets for a corporation.