Building a Data as a Service Platform on Azure SQL Database

One of the benefits of cloud computing is flexibility and scale—you don’t need to procure hardware or licenses as you bring on new customers. That flexibility, combined with platform as a service offerings like Azure SQL Database, gives independent software vendors and companies selling data access a lot of options in what they can provide to their customers. However, there is a lot of work and thought that goes into it. We have had success building out these solutions with customers at DCAC, so in this post, I’ll cover at a high level some of the architectural tenets we have implemented.

Authentication and Costing

The cloud has the benefit of providing detailed billing information, so you know exactly what everything costs. The downside is that the data provided is very granular and can be challenging to break down. There are a couple of options here—you can create a new subscription for each of your customers, which means you will have a single bill per customer, or you can place each of your customers into their own resource group and use tags to identify which customer is associated with that resource group. The tags appear in your Azure bill, which allows you to break the bill down by customer. While the subscription model is cleaner in terms of billing, it adds additional complexity to the deployment model and ultimately doesn’t scale.
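
As a rough illustration of the tag approach, here is a minimal PowerShell sketch that stamps a customer tag on a resource group so the charges can be filtered in the bill. The resource group name, tag names, and values are hypothetical—use whatever taxonomy fits your own billing process.

# Tag an existing resource group with the customer it belongs to (names are examples)
$rgName = 'rg-customer-fabrikam'
$tags = @{ Customer = 'Fabrikam'; Environment = 'Production' }

# Set-AzResourceGroup replaces the tag set on the resource group with the one supplied
Set-AzResourceGroup -Name $rgName -Tag $tags

# Tags roll up into cost reporting, and you can also query resources by tag
Get-AzResource -TagName 'Customer' -TagValue 'Fabrikam' | Select-Object Name, ResourceType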

The other thing you need to think about is authenticating users and security. Fortunately, Microsoft has built a solution for this with Azure Active Directory; however, you still need to put some thought into the design. Let’s assume your company is called Contoso, and your AAD domain is contoso.com. Assuming you are using AAD for your own business’s users, you don’t want to include your customers in that same AAD tenant. The best approach is to create a new Azure Active Directory tenant for your customer-facing resources—in this case called cust.contoso.com. You would then add the required accounts from contoso.com to cust.contoso.com in order to manage the customer tenant. You may also need to create a few accounts natively in the customer tenant, as there are a couple of Azure operations that require an admin from the home tenant.


Deployment of Resources

One of the things you need to think about is what happens when you onboard a new customer. This can mean creating a new resource group, a logical SQL Server, and a database. In our case, it also means enabling a firewall rule, enabling performance data collection for the database, and a number of other configuration items. There are a few ways to do this—you can use an Azure Resource Manager (ARM) template, which contains all of your resource definitions, and that is the approach I would typically recommend. In my case, there were some things I couldn’t do in the ARM template, so I resorted to using PowerShell and Azure Automation to perform deployments. Currently our deployment is semi-manual—someone enters the parameters into the Azure Automation runbook by hand—but it could easily be converted to be driven by an Azure Logic App or a function.
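
To make the shape of that runbook concrete, here is a stripped-down sketch of the onboarding steps in PowerShell. The resource names, location, service objective, and firewall range are all placeholders, and a real runbook would take these as parameters and pull the admin credential from Key Vault or an Automation credential asset.

# Hypothetical customer onboarding sketch—values would come from the runbook parameters
$customer   = 'Fabrikam'
$location   = 'eastus2'
$rgName     = "rg-cust-$($customer.ToLower())"
$serverName = "sql-cust-$($customer.ToLower())"
$dbName     = "db-$($customer.ToLower())"
$adminCred  = Get-AutomationPSCredential -Name 'SqlAdminCredential'   # Automation credential asset

# Resource group, tagged per customer so the bill can be broken down later
New-AzResourceGroup -Name $rgName -Location $location -Tag @{ Customer = $customer }

# Logical SQL Server and database
New-AzSqlServer -ResourceGroupName $rgName -ServerName $serverName -Location $location -SqlAdministratorCredentials $adminCred
New-AzSqlDatabase -ResourceGroupName $rgName -ServerName $serverName -DatabaseName $dbName -RequestedServiceObjectiveName 'S2'

# Firewall rule for the application tier (range is an example)
New-AzSqlServerFirewallRule -ResourceGroupName $rgName -ServerName $serverName -FirewallRuleName 'AppTier' -StartIpAddress '203.0.113.10' -EndIpAddress '203.0.113.20'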

Deployment of Data and Data Structures

When you are dealing with multiple databases across many customers, you desperately want to avoid the schema drift that can happen. This means having a single database project for all of your databases. If you have to add a one-off table for a customer, you should still include it in all of your databases. If you are pushing data into your tables (as opposed to the data being entered by the application or users), you should drive that process from a central table (more to come about this later).

Where this gets dicey is with indexes, as you may have some indexes that are needed for specific customer queries. In general, I say the overhead on write performance of having some additional indexes is worth the potential benefit on reads. How you manage this is going to depend on the number of customer databases you are managing—if you have ten databases, you might be able to manage each database’s indexes individually. However, as you scale to a larger number of databases, you aren’t going to be able to manage this by hand. Azure SQL can add and drop indexes as it sees fit, which can help with this, but isn’t a complete solution.
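
If you do want to lean on that automatic index management, it is enabled per database; here is a minimal sketch, assuming you are comfortable with Azure creating and dropping indexes on its own (the server, database, and credential values are placeholders).

# Enable automatic index management on one database (repeat for each customer database)
$query = "ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON, DROP_INDEX = ON);"
Invoke-Sqlcmd -ServerInstance 'sql-cust-fabrikam.database.windows.net' -Database 'db-fabrikam' -Username $adminlogin -Password $pwd -Query $query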

Hub Database and Performance Data Warehouse

Even if you aren’t using a hub and spoke model for deploying your data, having a centralized repository for metadata about your client databases is extremely valuable. One common task is collecting performance data across your entire environment. While you can use Azure SQL Diagnostics to capture a whole lot of performance information, with one of our clients we’ve taken a more comprehensive approach, combining the performance data from Log Analytics, audit data that also goes to Log Analytics, and the Query Store data from each database. While Log Analytics contains data from the Query Store, there was some additional metadata that we wanted to capture that we could only get from the Query Store directly. We use Azure Data Factory pipelines (which were built by my co-worker Meagan Longoria (b|t)) to load that data into a SQL Database that serves as a data warehouse. I’ve even built some XQuery to parse execution plans and identify which tables are most frequently queried. You may not need this level of performance granularity, but it is a conversation you should have very early in your design phase. You can also use a third-party vendor tool for this—but the costs may not scale if your environment grows to be very large. I’m going to do a webinar on this in a month or so—I still need to work out the details, but stay tuned.
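
To give a flavor of the Query Store piece, here is a hedged sketch of the per-database extract. The query is a simplified example (your column list and aggregation will differ), the server and database names are placeholders, and in our actual solution Azure Data Factory handles moving the results into the warehouse.

# Simplified Query Store extract, run against each customer database
$query = @"
SELECT TOP (50)
       qt.query_sql_text,
       q.query_id,
       SUM(rs.count_executions) AS total_executions,
       AVG(rs.avg_duration)     AS avg_duration_us
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id
GROUP BY qt.query_sql_text, q.query_id
ORDER BY total_executions DESC;
"@

# The results could then be landed in the hub database by ADF, dbatools, or a bulk insert
$results = Invoke-Sqlcmd -ServerInstance 'sql-cust-fabrikam.database.windows.net' -Database 'db-fabrikam' -Username $adminlogin -Password $pwd -Query $query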

Final Things

You want the ability to quickly do something across your environment, so having some PowerShell that can loop through all of your databases is really powerful. That lets you make configuration changes across your environment, or, using dbatools or Invoke-Sqlcmd, run a query everywhere. You also need to get pretty comfortable with Azure PowerShell, as you don’t want to have to change something in the Azure portal across 30+ databases.
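
Here is a minimal sketch of that loop, assuming a single subscription and SQL authentication; the credential variables and the query itself are placeholders.

# Run a query against every Azure SQL Database in the subscription (skipping master)
$query = 'SELECT DB_NAME() AS database_name, @@VERSION AS version;'

foreach ($server in Get-AzSqlServer)
{
    $fqdn = "$($server.ServerName).database.windows.net"
    $databases = Get-AzSqlDatabase -ResourceGroupName $server.ResourceGroupName -ServerName $server.ServerName |
        Where-Object { $_.DatabaseName -ne 'master' }

    foreach ($db in $databases)
    {
        Invoke-Sqlcmd -ServerInstance $fqdn -Database $db.DatabaseName -Username $adminlogin -Password $pwd -Query $query
    }
}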

A Viable Future for PASS

I have received feedback that some folks think I just want to burn PASS down, or that I don’t want a for-profit company involved with a community organization. Neither of those things is remotely what I’m thinking—I’ve only been loud and writing about it here because I want PASS to survive, which is going to be near impossible with the loss of its main revenue source (the in-person PASS Summit) and expenses (C&C) that haven’t dropped nearly enough in the face of that revenue loss. What do I see as a future for PASS?

Summit

Virtual Summit is going to happen in 2020, and it’s probably going to lose money. It’s effectively a sunk cost at this point, so I’m not going to waste any time talking about it. In 2021, PASS has a tough decision to make—large international conferences are unlikely to be a thing until 2022, when the COVID vaccine has been broadly distributed. Planning a virtual conference in 2021 is risky as well, given that most of the competition is free. I think doing a low-cost (and lower-overhead), smaller-scale event using a much cheaper platform like Microsoft Teams or even GoToWebinar would be a good small bet, without much risk. I also think a small conference the size of the old SQL Rally (a few hundred people, run in a hotel rather than a conference center) could be viable for Q4 of 2021.

The reason for doing this would be to try to keep the fundamental networking aspect of PASS going while reducing financial risk. The original SQL Rally was a community-organized event—by keeping it small, you not only reduce costs, you also reduce the planning lead time, which allows you to make a better assessment of the pandemic situation. PASS could also think about leveraging larger SQL Saturdays like Atlanta and Dallas, among others, as candidates for a Rally, as these events have community organizers who are very experienced at running larger-scale events.

The Managing Organization

I’ve said what I’m going to say about C&C, but it’s very clear that PASS as an organization is untenable with its current cash flow situation. This means costs need to be drastically cut wherever possible. PASS won’t be in the business of planning large in-person conferences until 2022, and therefore doesn’t require a large event management firm dedicated to its management. I would recommend hiring a full-time executive director (yes, I know I said we need to reduce spending) to manage the organization and vendor relationships. C&C currently has a seat on the PASS exec board with the title of Executive Director, which is a conflict of interest, and I would propose ending that immediately. The Executive Director needs to be someone who understands both data and analytics and building communities. Finding this person will be a challenge, but I believe they are out there. I would also move away from the custom-developed platforms PASS is using and toward Software as a Service platforms where possible. Sessionize is probably the most obvious solution here, but there are others.

The Role of the Board

As I was reading the by-laws and guidance for the PASS BoD, I came across this paragraph.

Role of PASS Board Members

PASS Board members are the fiduciaries who steer the organization towards a sustainable future by adopting sound, ethical, and legal governance and financial management policies, as well as by making sure the organization has adequate resources to advance its mission. They are responsible for the short- and long-term strategic direction of the PASS organization.

Beyond this, what PASS needs from the Board is perspective and expertise. Perspective and expertise as a data professional, PASS community member, from their region, or from drawing on other relevant experience (i.e. speaker, business leader, other Board experience).

What PASS does not need from the Board is tactical execution or day to day management of organizational activities.

The challenge is to do this while also providing focus for the organization. Focus to drive the evolution of the organization and to prevent distraction from impeding momentum.

“What PASS does not need from the Board is tactical execution or day to day management of organizational activities”—I can’t imagine running a SQL Saturday and completely outsourcing everything to a third party, and I feel the same way about our community organization, especially in this time of crisis. I think this stance is completely wrong, and it is the main reason why PASS is in the situation it is in right now. The Board of Directors needs to take an active role in managing the organization, period. We, as a community organization, are in a situation where the organization might go bankrupt and die, and while this is largely due to a force majeure (the pandemic), it is also due to decisions made in the interest of the managing firm and not the community. When I was heavily involved in running my PASS chapter, I had a board member whose portfolio was chapters, who took an active interest in the chapters and their needs and worked his tail off to make things better. Unfortunately, he was not re-elected, and things never got any better from there.

The board needs to take an active role—while the day-to-day operations of the org would be managed by the executive director, and eventually some administrative staff, in a time when the organization needs to be austere with its spending, being on the board should require you to get your hands dirty. I would also try to involve the community—there are lots of projects that, over time, could have been open sourced, but there has always been pushback from the board. Given the success of community-managed projects like dbatools, I see no reason not to engage volunteers who are willing to help, especially on community-facing projects.

I’ve been involved in PASS for nearly 15 years now—I want it to survive, because having a centralized community organization is a good thing and makes the community stronger. The central organization also provides governance and helps with sponsors. PASS cannot survive financially in its current state, and we as a community must band together to help it survive and foster the changes to make it a sustainable organization. While we are doing that, we can make it a better community org.

How to Save Money with Your Azure Virtual Machine Demos

Sorry for the spammy SEO title—we’ve got to pay the bills. Sometimes it’s fun to just write some code to solve problems and not think about the world’s larger problems for a few hours. Last week, I learned something new from a client—you can change managed disks in Azure from Premium Storage to Standard Storage if the VM connected to those disks is powered off. This is a cost savings of nearly $100 per month per disk (assuming 1 TB disks), and since the SQL Server image in the marketplace uses two 1 TB disks, this can save you a good amount of money on your Azure spend.


This code will loop through each resource group in your subscription and look for resource groups with the tag “Use:Demo”. If you aren’t familiar with tags in Azure (or AWS), they are a metadata layer that allows you to more easily identify and filter resources. The most common use case is to make your Azure bill easier to navigate. However, you can also incorporate tagging into your management operations, as you see in this example.

After it identifies each resource group with that tag, it will look for VMs in those resource groups, power them down if they are running, and then migrate each premium disk on the VM to Standard. I have similar code on GitHub to do the opposite; however, I haven’t glammed it up to support the tagging functionality yet.
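
The full script is linked below; as a rough sketch of the core logic (the tag name, tag value, and target SKU here are assumptions—adjust them to your own tagging scheme), it looks something like this:

# Find resource groups tagged Use = Demo, stop their VMs, and convert premium disks to Standard
foreach ($rg in Get-AzResourceGroup -Tag @{ Use = 'Demo' })
{
    foreach ($vm in Get-AzVM -ResourceGroupName $rg.ResourceGroupName)
    {
        # Deallocate the VM—disks can only change SKU while the VM is powered off
        Stop-AzVM -ResourceGroupName $rg.ResourceGroupName -Name $vm.Name -Force

        # Convert every managed disk attached to this VM to Standard storage
        Get-AzDisk -ResourceGroupName $rg.ResourceGroupName | Where-Object { $_.ManagedBy -eq $vm.Id } | ForEach-Object {
            $diskConfig = New-AzDiskUpdateConfig -SkuName 'Standard_LRS'
            Update-AzDisk -ResourceGroupName $rg.ResourceGroupName -DiskName $_.Name -DiskUpdate $diskConfig
        }
    }
}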

This code is available at DCAC’s GitHub here. To take this a step further you could create an Azure Automation runbook to deploy this code. In order to do that you would need to import the modules Az.Resources and Az.Compute into your automation account.  

Why I’m Not Speaking at PASS Summit and You Shouldn’t Either

If you saw any of my angry tweets last night, it’s not just because the Saints weren’t good. I’ve been writing a lot about PASS and C&C, the for-profit event management firm that runs virtually all of PASS’ operations. I personally think C&C imposes a financial burden on the Microsoft Data Platform community that will ultimately kill PASS. I want to run for the board of directors (once you agree to run for the board, you have to agree not to speak or write poorly of PASS, but it doesn’t say anything about C&C) to try to return PASS to being a community-oriented organization. PASS has been a great organization, and the connections I have made there have been a great foundation for the career success that I and many others have achieved. The reason I agreed to speak at PASS Summit this year was to help enable the organization’s survival, despite my lasting frustrations with C&C.

PASS had a couple of options for doing PASS Summit virtually, and they’ve failed at every turn. The best option would have been to do a super low-cost virtual Summit, using Microsoft Teams, and try to keep the pricing at a level the average DBA could pay out of pocket. That big reduction in revenue is bad for C&C’s business, but frankly, given that there likely won’t be a big conference until 2022, C&C should be operating on an austerity budget, since PASS’ main income source has been severely constrained.

The Burden on Speakers

I’ve lost count of how many webinars I’ve done this year—it’s been a lot. 98% have been live—in some cases with some really dicey demos, like I did at EightKB. Doing a webinar or a user group meeting is a decent amount of effort, but no more than an in-person session. However, PASS Summit has asked speakers to record their sessions—recording a session takes me, at a minimum, 2-3x as long as simply delivering it live. Setting up cameras and lighting and doing small amounts of editing all add up to considerable amounts of time. Additionally, you have to render the video and then upload it to the site. I say this with experience, because I just recorded three sessions for SQLBits.

You might ask why I was willing to record sessions for Bits, but not PASS Summit. That’s a good question—SQLBits is truly a community-run event, for the community, by the community. Sure, it can be rough around the edges, but it’s a great event, and in general the conference is great to work with. Additionally, SQLBits always pays for speakers’ hotel rooms; it’s nominal in the cost of an international trip, but it’s something that makes you feel wanted as a speaker, and I remember it. PASS Summit, unless you have a preconference session (precon), doesn’t offer any remuneration at all to speakers, nor has it ever. All that being said, after recording my Bits sessions, I said, “I’m never doing that for free again.” And on top of the work of recording your session, you still have to show up and do live Q&A.

Why You Shouldn’t Speak at PASS Summit (and Time Zones Are Hard)

PASS has asked speakers to record their sessions just six weeks before the conference. These recordings will only ever be seen by paid attendees of the conference, and possibly PASS Pro members. Speakers received a highly confusing email informing them of this late last night, which included the time and date of their sessions. It wasn’t clear whether “live sessions” still needed to be recorded—which is even more confusing. Speakers weren’t consulted about the need to record their sessions when the revised speaker agreement went out; this burden has been imposed at the last minute. In fact, I haven’t received any official communication from PASS about Summit since July, when I got my speaker code. It’s not fair to impose this on speakers this late in the process, especially when you aren’t compensating them for their time. Also, this is insignificant, but we were supposed to get the slide template in July, and it’s still not in my inbox.

Precons are all starting in the speaker’s native time zone, which will limit the audience for many precon speakers—some European speakers are starting as early as 3 AM EST, which means basically no one in North America (PASS’ main market) will attend. Most regular conference sessions run 8 AM to 5 PM EST—which is probably a decent compromise, but it still greatly limits the West Coast in the morning and other regions of the world like Asia. There are some evening and overnight sessions, but those are extremely limited compared to EST business-hour sessions. Any schedule for a worldwide event is going to be a compromise, but I feel like some creativity could have been used to better support a virtual audience. For example, Ignite has replays of all its sessions available for broader time zone coverage. As far as I know, no speakers were consulted during the making of this schedule.

Doesn’t This Hurt the Community?

A successful PASS Summit is a good thing for the community. However, under C&C’s management, the marketing for the event has been weak, and while most other events are moving to free or freemium models, PASS continues to charge a premium for the event. The platform that PASS is using hasn’t been demoed to speakers or attendees to show how it would have value over a free conference like EightKB or Ignite.

I’m not going to speak at PASS Summit. I’m going to record my session and put it on YouTube, so everyone can watch it. And I’ll do a live Q&A to talk about it—it’s a really cool session about a project I’ve worked on to aggregate Query Store data across multiple databases. I challenge other speakers to follow me—the conference is this bad and this expensive because C&C is trying to prop itself up on the back of the community. C&C needs to go away before we can move forward. I was frustrated before, but this Summit fiasco has really pushed me over the edge.

PASS is Untenable in its Current State

I’ve written a couple of recent posts that were extremely critical of PASS and, more so, of C&C, the company that manages PASS. Someone who read my last post pointed out that I probably didn’t emphasize the budget numbers enough. So let’s talk about that. I grabbed the most recent PASS financial data, which was published in March 2020.

Income: $7,538,163
Expenses (total): $6,589,810
Expenses (non-Summit): $3,275,430
Cash on hand (effective): $2,477,769

Summit Revenue Projections

This is a challenge, because obviously I don’t have the actual per-attendee cost PASS is paying for virtual Summit. Many years ago, my rough understanding was that Summit cost about $400 per attendee. So, for the purposes of my math, I’m going to estimate that virtual Summit will cost $100 per attendee (I suspect the actual cost is closer to $250, given that is what chapter leaders are being charged). Per the June meeting minutes, C&C has agreed to reduce its expenses by $500,000. It’s not clear where those cuts come from, but let’s just say that drops non-Summit expenses to $2.7MM.

If we have 2,000 attendees at virtual PASS Summit in 2020 (which I think may be a generous estimate), all paying for the whole conference:

Income: $1,998,000
Expenses (Summit): $200,000
Expenses (non-Summit): $2,700,000
Summit subtotal: ($902,000)
Cash on hand (effective): $1,575,769

If we have 1,000 attendees buying the All in One Bundle and 500 attendees doing the three-day conference:

Income: $1,298,500
Expenses (Summit): $150,000
Expenses (non-Summit): $2,700,000
Summit subtotal: ($1,551,500)
Cash on hand (effective): $926,269

Given my experience and the current economy, I think the above projections are fairly optimistic. Let’s say my cost projection of $100 per person is too low, and the cost is actually $250 per person. Also, let’s say only 500 people sign up for the full conference and 500 register for the three-day conference:

Income: $799,000
Expenses (Summit): $250,000
Expenses (non-Summit): $2,700,000
Summit subtotal: ($2,151,000)
Cash on hand (effective): $326,769
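
If you want to check or tweak the math yourself, the scenarios above fall out of a couple of implied assumptions—roughly $999 for the full registration and $599 for the three-day pass (those prices are my inference from the line items, not official numbers). A quick PowerShell sketch:

# Rough model of the Summit scenarios above; prices and per-attendee costs are assumptions
$startingCash      = 2477769
$nonSummitExpenses = 2700000

function Get-SummitScenario ($fullSeats, $threeDaySeats, $costPerAttendee) {
    $income   = ($fullSeats * 999) + ($threeDaySeats * 599)
    $expenses = ($fullSeats + $threeDaySeats) * $costPerAttendee
    $subtotal = $income - $expenses - $nonSummitExpenses
    [pscustomobject]@{ Income = $income; Subtotal = $subtotal; CashOnHand = $startingCash + $subtotal }
}

Get-SummitScenario -fullSeats 2000 -threeDaySeats 0   -costPerAttendee 100   # roughly (902,000) net
Get-SummitScenario -fullSeats 1000 -threeDaySeats 500 -costPerAttendee 100   # roughly (1,551,500) net
Get-SummitScenario -fullSeats 500  -threeDaySeats 500 -costPerAttendee 250   # roughly (2,151,000) net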

PASS doesn’t officially release attendance numbers, but they say that 4,000 people attended PASS Summit last year, which sounds really great. However, conference math is a factor here—many conferences count precon attendance separately from the conference itself. If you attended two precons and the conference, you would count as three attendees. Even in a best-case scenario where you really had 4,000 attendees, top-line revenue would still drop by $3.5 million (or 54%), while fixed operating expenses are only down $500K (or 16%). That, as they say in business school, is an untenable situation.

This is just focusing on the short term—2021 will face similar challenges. It is very possible that by November 2021, in-person conferences will be back (this assumes a vaccine is in place by then—Goldman Sachs assumes one will be, and I trust them when it comes to money). However, I don’t see attendance quickly returning to pre-pandemic levels until 2022 or 2023, which means PASS will likely continue dipping into its cash on hand until it reaches bankruptcy.

Sure, PASS Pro is a second potential revenue source, but it faces many challenges in getting off the ground and adding enough revenue to have any substantial impact. On top of that, many community speakers feel alienated by the conversion of their Summit sessions and networking events into paid, for-profit content.

One final note: in FY2020 PASS spent approximately five percent ($385K) of its revenue on community activities. That number was substantially beefed up by a Microsoft SQL Server 2019 upgrade effort, and the total community spend has been dropping over time. For a point of reference, C&C charged PASS $525K for IT services in 2019. It’s important to remember that PASS exists to serve the broader SQL community, not a for-profit firm.

Distributed Availability Groups–The Good, the Bad, and the Ugly

I’m writing this post because I’ve been mired in configuring a bunch of distributed availability groups for a client, and while the feature is technically solid, the lack of tooling can make it a challenge to implement. Specifically, I’m implementing these distributed AGs (please don’t use the term DAG—you’ll piss off Allan Hirt, but more importantly it’s already used for Microsoft Exchange high availability, so it’s taken) in Azure, which adds a couple of additional challenges because of the need for load balancers. You should note this feature is Enterprise Edition only and is only available starting with SQL Server 2016.

First off, why would you implement a distributed availability group? The main reason is to add a disaster recovery (DR) strategy on top of the high availability strategy you already have with your AG. There’s limited benefit to this architecture if you don’t have at least four nodes in your design. Consider the following design:

In this scenario, there are two data centers with a total of four nodes. All of the servers are in a single Windows Server Failover Cluster. There are three streams from the transaction log on the primary, which is called SQL1. This means we are consuming double the network bandwidth to send data to our secondary site in New York. With a distributed availability group, each location gets its own Windows cluster and availability group, and we only send one transaction log stream across the WAN.

This benefits a few scenarios—the most obvious being that it’s a really easy way to do a SQL Server upgrade or migration. While Windows clustering now supports rolling OS upgrades, it’s much easier to do a distributed AG, because the clusters are independent of each other and have no impact on each other. The second is that it’s very easy to fail back and forth between these distributed availability groups. You have also halved the amount of WAN bandwidth you need for your configuration, which can represent a major cost savings in a cloud world or even on-premises.

If you think this is cool, you’re in good company—this is the technology Microsoft has implemented for geo-replication in Azure SQL Database. The architecture is really robust, and if you think about the tens of thousands of databases in Azure, you can imagine all of the bandwidth saved.

That’s Cool, How Do I Start?

I really should have put this tl;dr at the start of this post. You’ll need this page at docs.microsoft.com. There’s no GUI, which kind of sucks, because you can make typos in your T-SQL and the commands can still potentially validate and give you unhelpful error messages (ask me how I know). But in short, here is what you do (a hedged sketch of the T-SQL for the last two steps follows the list):

  1. Create your first WSFC on your first two nodes
  2. Create an Availability Group on your first WSFC, and create a listener. Add your database(s) to this AG
  3. If you are in Azure, ensure your ILB has port 5022 (or whatever port you use for your AG endpoint) open
  4. Create your second WSFC on the remaining two nodes
  5. Create the second AG and listener, without a database. If you really want to use the AG wizard, add a database to your AG and then remove it (or quit being lazy and use T-SQL to create your AG)
  6. Create the distributed AG on the first AG/WSFC
  7. Add the second AG to your distributed Availability Group
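
As a sketch of steps 6 and 7, the T-SQL looks roughly like the following. The AG names, listener URLs, server names, and port are placeholders—substitute your own—and note the ALTER runs on the primary of the second AG (the global forwarder).

# Step 6: run on the primary replica of the first AG (names and URLs are examples)
$createDistributedAg = @"
CREATE AVAILABILITY GROUP [distributedag]
   WITH (DISTRIBUTED)
   AVAILABILITY GROUP ON
      'ag1' WITH (LISTENER_URL = 'tcp://ag1-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC),
      'ag2' WITH (LISTENER_URL = 'tcp://ag2-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC);
"@
Invoke-Sqlcmd -ServerInstance 'sql1' -Database 'master' -Query $createDistributedAg

# Step 7: run on the primary replica of the second AG to join it to the distributed AG
$joinDistributedAg = @"
ALTER AVAILABILITY GROUP [distributedag]
   JOIN
   AVAILABILITY GROUP ON
      'ag1' WITH (LISTENER_URL = 'tcp://ag1-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC),
      'ag2' WITH (LISTENER_URL = 'tcp://ag2-listener.contoso.com:5022',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC);
"@
Invoke-Sqlcmd -ServerInstance 'sql3' -Database 'master' -Query $joinDistributedAg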

This all seems pretty trivial, and it is when all of your network connections work (you need to be able to hit ports 1433 and 5022 on the listeners’ IP addresses across both clusters). However, SQL Server has extremely limited documentation and management around this feature. The one troubleshooting hint I will provide is to always check the error log of the primary node of the second AG (this is known as the global forwarder), which is where you will see any errors. The most common error I’ve seen is:

A connection timeout has occurred while attempting to establish a connection to availability replica ‘dist_ag_00’ with id [508AF404-ED2F-0A82-1B8A-EA23BA0EA27B]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance

Sadly, that error is a bit of a catch-all. In doing this work, I had a typo in my listener name on the secondary and SQL Server still processed the command (so there’s no validation that everything can connect when you create the distributed AG). I’m sure in Azure this is all done via API calls, which means humans aren’t involved, but since there is no real GUI support for distributed AGs, you have to type code. So type carefully.

Overall, I think distributed availability groups are a nice solution for highly available database servers, but without more tooling there won’t be broader adoption, and in turn, there won’t be more investment from Microsoft in tooling. So it’s a bit of a catch-22. Hopefully this post helps you understand this feature, where it might be used, and how to troubleshoot it.

How Do I Make MySQL Highly Available with Open Source Tools?

This is a clickbait post title—sorry, you’re here now. The correct answer is that you should purchase MySQL as a database service from your favorite cloud provider (Google, Amazon, and Azure all offer prebuilt database as a service offerings), because they have gone through the trouble of making their solutions highly available. I’ll speak to Azure, because I’m familiar with the platform—Microsoft doesn’t employ MySQL high availability per se; however, both the storage and the VM are highly available. If there is a disk failure, the service is not impacted, and if there is a failure in the compute tier, a new VM is provisioned.

My second recommendation, if you really, really want to build your own thing, is to build a Windows Server Failover Cluster and use shared storage. Make the MySQL service a clustered resource, and assign a floating IP to the service that will fail over with it. (Yes, I know you have to pay M$ for the Windows Server licenses.)

Why shouldn’t you use an open source solution to make your MySQL database highly available? First let’s look at a picture of a common MySQL high availability architecture:

If we think about what we need a clustering solution to provide it comes down to a few things:

  • Providing a floating IP address to allow connection to the primary
  • Checking the health of the database services and initiating a failover in the event one of them isn’t healthy
  • Executing a clean database failover and providing the ability to easily fail back
  • Ensuring the stability of the overall cluster, maintaining quorum, and avoiding split brain scenarios
  • If you are using a shared storage scenario, the clustering solution needs to manage the shared storage to coordinate failover with services.

If you are using SQL Server with Windows Server Failover Clustering, the cluster service takes care of all of the above, and more. When you look to do this on Linux for MySQL, you’ll find there are about 10 different sets of components you can use to make the service highly available. At the base of all of these solutions is MySQL replication, which is fairly straightforward transactional replication. MySQL’s replication service is fairly robust, and the GTID implementation is pretty solid.

The problem is that the rest of the components are all mix and match. You could use HAProxy to float the IP address, but there’s no way to do a smart health check of the database—it simply does a port connection test. Which means, if your primary goes away and then comes back, without some advanced configuration your floating IP is going to fail back to the original primary whether or not it’s actually the primary in your replication pair. This is but one example—you are going to end up with three or four different components to execute each of these functions, and congratulations, you are now in charge of a complex distributed system that you are responsible for administering for the rest of your life.

But Joey, Facebook/Google/pick your other favorite online megacorp runs MySQL and supports it with five 9s. Okay, sure, I don’t disagree with this—and as databases, MySQL and PostgreSQL are generally fine. But look around at your engineering staff—wait, do you have engineering staff? If you don’t have a few people who have both really good Linux SA skills and DBA skills, you are going to find yourself pretty quickly in a situation where support is a challenge.

Finally, consider whether you even need an HA solution. Are you running on a virtual machine? As long as your infrastructure is solid, that probably gets you to about 99.5% availability on a bad week. What you absolutely want to avoid is the Windows 2000 paradigm, where your high availability solution incurs more downtime than a standalone system would.

Adding an Azure Active Directory User to Azure SQL Database with Automation

My teammate Meagan (b|t) messaged me in Teams yesterday afternoon to say, “Joey, the client created a new database using your automated process, and my ETL user (which is an AAD user) didn’t get created—can you fix it?” Well, after a quick perusal of emails, I remembered that I had asked the client to add the create-user step to their initial population process, which hadn’t occurred yet. The reason I did this was that creating an Azure Active Directory user in an Azure SQL Database from Azure Automation was painful and maybe not even possible, so I pinged Rob Sewell (b|t) about the best way to do it. It sounded not that bad to do, but I managed to hit land mines around every corner.

The First Problem

Azure Automation is mostly PowerShell only—there is a Python option, but I’ve never used it, and I’m not going to start now. The trick with PowerShell is that it’s great for things you have to do to Azure resources, but far less good for things you have to do inside of databases (think creating a user). I typically use the Invoke-Sqlcmd cmdlet, however we have a chicken-and-egg problem—I can’t create an AAD user from a SQL connection (a connection made using a SQL login), and Invoke-Sqlcmd doesn’t support authenticating with AAD. The Azure Automation service allows you to import third-party modules from the PowerShell Gallery, so you can use dbatools, which I did here. Rob has an excellent blog post here that describes this process.

# Pull the service principal's app id, secret, and tenant id from Key Vault
$appid = (Get-AzKeyVaultSecret -VaultName "beard-key-vault" -Name "service-principal-guid").SecretValueText
$Clientsecret = (Get-AzKeyVaultSecret -VaultName "beard-key-vault" -Name "service-principal-secret").SecretValue
$credential = New-Object System.Management.Automation.PSCredential ($appid, $Clientsecret)
$tenantid = (Get-AzKeyVaultSecret -VaultName "beard-key-vault" -Name "Sewells-Tenant-Id").SecretValueText

# Connect to Azure SQL Database as the service principal using dbatools, then run the query
$AzureSQL = Connect-DbaInstance -SqlInstance $SqlInstance -Database $databasename -SqlCredential $credential -Tenant $tenantid -TrustServerCertificate
Invoke-DbaQuery -SqlInstance $AzureSQL -Database $DatabaseName -SqlCredential $credential -Query $query

The code, which I happily stole from Rob’s blog, allows me to connect as a service principal. To facilitate this, I made my Automation account part of my DBA group (the Azure AD admin group for the Azure SQL server), which you can assign without this ridiculous process. I threatened to add Meagan’s ETL user to that group, but she was going to out me on Twitter.
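
For context, the query I was trying to run was just a standard create-user statement for an Azure AD principal—something like the following (the user name and role here are hypothetical examples):

# Hypothetical example of the query being passed to Invoke-DbaQuery above
$query = @"
CREATE USER [etl-user@cust.contoso.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [etl-user@cust.contoso.com];
"@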

After running that code I could connect from my Automation Run As account to my Azure SQL DB, but my query was failing with the following error:

I’m logged in as a service principal there—hence the weird GUID. You can see that I have basically every privilege in SQL Server, but I can’t create a user from an external provider—PowerShell (and Automation) say that the user could not be resolved.

The Next Land Mine

So I DMed Rob and asked him, WTF? It turns out that for this to work, you need to create a service principal for your Azure SQL Database’s logical server. If you aren’t familiar with service principals, they are analogous to service accounts in an on-premises world. Doing this was the easiest step in the process—I have a PoSH script that hits every server in my subscription, and it was trivial to add the service principal there as well as in my database deployment runbook. However, that was just the first part.
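
If you want to do the same thing, the server-level identity can be assigned with a one-liner per server. This is a sketch, assuming the Az.Sql module is loaded and you want to hit every server in the subscription:

# Assign a system-assigned identity (service principal) to every logical SQL server in the subscription
foreach ($server in Get-AzSqlServer)
{
    Set-AzSqlServer -ResourceGroupName $server.ResourceGroupName -ServerName $server.ServerName -AssignIdentity
}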

You have to give that service principal the “Directory Readers” permission in Azure AD, and the effective way to do this with Automation is to assign that privilege to a group. Well, it turns out adding Azure AD roles to a group is a relatively new feature (it’s in preview) and, more importantly, requires Azure Active Directory P1 or P2, which has a per-user cost. Which meant I needed to get approval. After much chatter on a DCAC Teams channel, I discovered that since this feature is not user-assigned (i.e., it’s enabled for the entire AAD tenant once it’s turned on), I only had to have one AAD P1 license in the tenant (I assigned it to Meagan). Once that was in place, I could grant the directory permission to the SQL Server Service Principals group.

Are We Done Yet?

I should have noticed that the documentation provided by the SQL team for assigning groups with PowerShell referenced the preview Azure AD PowerShell module (I did, but I didn’t think it mattered, because I was just assigning a user to a group). So I thought I had everything wired up when I started getting the following error:

Add-AzureADGroupMember -ObjectId $g.objectid -RefObjectId $miIdentity.ObjectId

Add-AzureADGroupMember: Error occurred while executing AddGroupMember
Code: Authorization_RequestDenied
Message: Insufficient privileges to complete the operation.
RequestId: 62301512-a42f-4d00-a798-4e1bd4062df8
DateTimeStamp: Tue, 25 Aug 2020 13:14:08 GMT
HttpStatusCode: Forbidden
HttpStatusDescription: Forbidden
HttpResponseStatus: Completed

I have Global Admin and subscription Owner in the two environments I was testing in, so clearly this wasn’t a permissions issue. To further prove that point, I was able to add the service accounts I had created to the group through the Azure portal. So after writing something like three emails to the Azure MVP distribution list with my further discoveries (I could add the service principal to a regular group, just not one with a role assigned to it), I went back and decided to play with that preview module.

Everything up to this point is me being an idiot, but I’m going to yell at Microsoft for a second. I couldn’t install the AzureADPreview module on my Mac because it’s dependent on WinForms—I thought Az modules were all supposed to be built on .NET Core. I also couldn’t get it to run in Cloud Shell, which may or may not be related to the WinForms thing.

I do have a Windows VM, so I installed the module there, and it successfully worked on the DCAC tenant.  I went to Azure Automation to install the module. If you’ve never imported a module into Azure Automation, you should know that the portal notification about a module import being complete is meaningless, because Azure Automation is a lying liar who lies.

Look on the modules page and hit refresh a lot. It usually takes 1-2 minutes for a module to import. I messaged Kerry in Teams.

And what do you know? It worked. I was concerned and about ready to murder someone, but it worked. Rob’s code is really helpful, and he covers Key Vault in his post. I did have some exposed GUIDs in some of my code pictures—it’s cool, those aren’t sensitive. However, you should store all your secrets in Key Vault, as it’s fast and easy.

The other thing I learned in this process is that you can now make a guest user your Azure Active Directory admin (this means I could make joey@dcac.com or jdanton1@yahoo.com an admin in the joeydantoni.com tenant), which you weren’t able to do before. Previously you had to use a group and add the guest user to that group, as I mentioned above. (Note: you should still use a group and not a single user, as that’s the best practice.)

What Do You Want Your Professional Association to Be?

I wanted to write a post about the things I think PASS does well that are foundational to the future, with or without the PASS organization. Before I address that, I want to speak to an ethical matter related to the new PASS Pro service offering. Every year for PASS Summit, speakers sign an agreement (which, if you sign, print it out—they don’t email you a copy, nor can you access prior years’ agreements in the speaker portal) that includes giving PASS the right to the recording of your session. This has always been there, so that PASS could sell online or USB access to session recordings. A few years ago, PASS started selling this access to people who didn’t attend PASS Summit, which I was fine with—the context of that is still the conference.

With the PASS Pro offering, PASS took all of the sessions that community and Microsoft speakers delivered at Summit 2019 and bundled them as part of a paywalled offering. Speakers were not asked for their permission to reuse this content as part of a for-profit subscription model, nor were they compensated. Which in both cases is probably legal? However, I strongly question the ethics of it—when a community event morphs into a paid service, most speakers (myself included) want a cut of the take, which is only fair, as it’s a lot of work to speak at a conference. PASS will say that “all of this money is going back to the community,” which is frankly bullshit. First, the platform they are hosting PASS Pro on is not free, and then there is whatever C&C is charging PASS for this project, which is likely significant. As I’ve mentioned before, the community typically sees about 5% of PASS’ revenue, and the revenue numbers for PASS Pro make that an absolute pittance, while potentially alienating community members. This makes me think hard about speaking at future PASS events.

Stop Bitching, What Did You Really Want to Write About?

The SQL Server community has been part of my life for the last 12-plus years—it’s why I have my current job, most of my professional network, and the opportunity to speak all over the world. PASS’ motto is “Connect, Share, Learn,” and I think it is a good one that should be the goal of any community organization. Let’s talk about the things that make up that community:

  • Conferences
  • SQL Saturday Events
  • Virtual Chapters
  • User Groups
(Photo: PASS Summit 2011, First Timers Event)

Conferences

Having a centralized community organization like PASS has some benefits. The biggest is the ability to have PASS Summit, a mostly independent, community-organized conference that allows us to take a deep dive into the Microsoft Data Platform (sorry to use a marketing phrase, but it’s a good description of the topics covered) over a week’s time. If you haven’t attended a vendor conference like Microsoft Ignite, it’s a very different experience compared to PASS Summit. The sessions are more marketing-driven, and while you have excellent access to Microsoft engineering staff, you aren’t going to get precons that go deep into topics like high availability and query optimization, and you won’t see nearly as many community speakers offering real-world perspective.

Having a big conference is a good thing, and it’s something PASS has done fairly well; it would be a major loss if PASS were to fail. Running a big conference is expensive and hard, and a replacement would likely only happen with vendor support, or over several years of growth from a smaller conference into a big one. This is a US-centric view, as SQLBits and Data Platform Summit in India have been running pretty large-scale conferences for several years.

SQL Saturday Events

SQL Saturdays are awesome—they provide a great starting point for a new speaker. There’s even less pressure than at a user group, because your attendees are typically divided between a few tracks. I also have a fondness in my heart for them, as they are where I grew my career and gained organizational skills by running a successful event for several years. However, they don’t need PASS to be a success. PASS in recent years has deemphasized SQL Saturday because of a flawed notion that the events were cannibalizing Summit attendance (this may be true on the margins, to which I would say: make Summit better). While having centralized infrastructure for these events is nice, the website is an absolute train wreck and should probably be refactored. Numerous SQL Saturdays in Europe have become independent events without issue—sites like Sessionize make it really easy to run an event. I foresee a bit of a lull, but these events can run well without a centralized org—just look at Code Camp.

User Groups

Even more so than SQL Saturdays, I do not see the loss of a centralized org having any impact on user group meetings. In recent years, the only services PASS has offered user groups are web hosting and some funding tied to Summit discount codes (a program which has gotten more limited over time). User groups, by their nature, are fairly independent entities. I look forward to speaking at any UG post-COVID—having free pizza and meeting a bunch of data pros is always a good time.

Virtual Chapters

As you may have noted in reading my blog, I tend to be cynical about PASS and C&C. However, in researching this post, I noted that (as of Monday) PASS Virtual Chapters have provided 173 hours of free training in 2020. Gathering the data is a bit arduous, so I didn’t capture previous years’ numbers, but I was really impressed at the amount of training in one place. There are other virtual events (especially this year), but having a central place and organization is a real benefit to the SQL community.

What Does All of This Mean?

This means PASS’ core competencies are running Summit and the Virtual Chapters (wow, I feel like that was the most MBA thing I’ve ever written). Any organization going forward needs to have those structures in place. Summit provides a revenue source that allows everything else to proceed. It also means that trying to provide a paid online training and certification service lies outside of those competencies and shouldn’t continue.

However, the challenge PASS faces (and that is ultimately tied to its survival) is that Summit is not going to have the same level of revenue for at least the next two years, and expenses haven’t dropped significantly. In recent years I’ve heard a common refrain from board members—PASS Summit was “too big for a small event management company, and not big enough for a large event company.” Since PASS Summit is going to be smaller, at least in the medium term, perhaps now is the time for a change to save the organization.

I’d welcome comments on anything I missed or your thoughts on what your most meaningful community experiences have been.

Managing Statistics in Azure SQL Database Serverless

One of the only things platform as a service databases like Azure SQL Database do not do for you is actively manage column and index statistics. While backups, patches, and even integrity checks are built into the platform services, managing your metadata is not. Since Azure SQL Database lacks a SQL Server Agent for scheduling, you have to use an alternative for job scheduling. One of the more common approaches is to use Azure Automation runbooks, which are PowerShell (or Python) jobs that run from within Azure—I like to call Automation “cron for Azure.” My good friend Erin Stellato (b|t) just recorded a YouTube video that walks through the process of setting up an Azure Automation runbook for stats maintenance. There are a lot of steps to get your runbook up and running, and if I had a very small environment, I might just recommend using an Azure Logic App with a schedule trigger—for a handful of databases, you could be up and running in a few minutes.


However, in my case I’m working on a customer subscription, and I need my solution to automatically deal with all of the databases in their subscription. My customer has a data as a service model and has been using the “serverless” tier of Azure SQL Database in order to reduce their compute costs. The serverless tier is effectively auto-close (and auto-scale) for Azure SQL Database. The gateway is what you actually connect to—yourdatabase.database.windows.net is a public IP that in turn connects to your actual database(s)—and when you connect to that gateway for a paused database, Azure begins the process of resuming it. This can take up to 45 seconds, which means the first connection will most likely fail.

I was running into failures, so I looked at my code, and made some minor modifications.

# Loop through every resource group, server, and database in the subscription
$query = "SELECT @@SERVERNAME;"   # throwaway query used only to wake up a paused serverless database

foreach ($rgs in (Get-AzResourceGroup).ResourceGroupName)
{
    $svr = (Get-AzSqlServer -ResourceGroupName $rgs).ServerName
    #write-host 'rg:'$rgs
    foreach ($svrs in $svr)
    {
        $sql = $svrs + '.database.windows.net'
        write-host $sql
        $d = Get-AzSqlDatabase -ResourceGroupName $rgs -ServerName $svrs | Where-Object { $_.DatabaseName -ne 'master' }

        foreach ($database in $d)
        {
            $db = $database.DatabaseName
            $servicetier = $database.RequestedServiceObjectiveName
            $size = $database.MaxSizeBytes

            if ([string]::IsNullOrEmpty($db) -eq $false)
            {
                # First call wakes the database up--it will most likely fail, and that's fine
                $Params = @{
                    'ServerInstance' = $sql
                    'Database'       = $db
                    'Username'       = $adminlogin   # admin login and password come from Automation variables/credentials
                    'Password'       = $pwd
                    'Query'          = $query
                }
                Invoke-Sqlcmd @Params -ErrorAction SilentlyContinue

                # Give the serverless database time to resume before doing real work
                Start-Sleep -Seconds 45

                # Now run the actual maintenance (Pedro Lopes' AdaptiveIndexDefrag procedure)
                $Params.Query = "EXEC dbo.usp_AdaptiveIndexDefrag;"
                Invoke-Sqlcmd @Params
            }
        }
    }
}

In this case, I’ve made the first query SELECT @@SERVERNAME, but you can issue any query—it’s going to fail anyway. I then add a sleep command in PowerShell; I’m using 45 seconds, but you could probably drop that to 30 seconds if you have a tight time window. Finally, I run the AdaptiveIndexDefrag script that Pedro Lopes (t) has written.