I had a client ask me to write a PowerShell script (for an Azure Automation runbook) to automate the creation of firewall rules for an Azure SQL Database. As part of this process, I was planning to validate the IP addresses (a valid IP address is of the format x.x.x.x, where x is an integer between 0 and 255) by writing code to split the IP address string into individual octets and figuring out whether each value was a valid number. However, in doing research (searching Stack Overflow), I discovered PowerShell has a built-in IP address data type, which made my life way easier.
Since PowerShell has IPAddress as a built-in data type, we can simply attempt to cast a string value to that type.
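A minimal sketch of that cast, using the -as operator (which attempts the conversion without throwing an error when it fails):

'10.20.30.40' -as [IPAddress]   # returns a System.Net.IPAddress object for a valid address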
If I type an invalid IP address I get the following result:
You’ll note I don’t get an error, I simply don’t get a result. To do something useful with this, I do have to write a little more code to handle the IP address in my script.
After casting my parameters as an IPAddress, I can then cast them as a boolean. A boolean is a yes/no value that PowerShell represents as $True/$False. If the IP address submitted is a valid one, the result will be true. In my IF statement, I’m saying that if either my start or end IP address is invalid, write an error stating as much.
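A hedged sketch of that validation, with illustrative parameter names:

param (
    [string]$StartIpAddress,
    [string]$EndIpAddress
)

# An invalid address yields $null from the cast, and $null casts to $false
$startValid = [bool]($StartIpAddress -as [IPAddress])
$endValid   = [bool]($EndIpAddress -as [IPAddress])

if (-not $startValid -or -not $endValid) {
    Write-Error "Start IP ($StartIpAddress) or end IP ($EndIpAddress) is not a valid IP address."
    return
}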
In the early days of Azure SQL Database (née SQL Azure) we had size limits that were minuscule–I vaguely remember 5 and 10 GB, but it could have been smaller. Back then space was at a premium. In 2021, we have Hyperscale, which allows us to have an Azure SQL Database of up to 100 TB (or more) in size. However, data size in the cloud means costs, so space management can still matter, right?
While many DBAs will never ever want to shrink a data file, sometimes your transaction log file can grow for external reasons. Typically these reasons are as follows:
Your log backups failed for a while and the log file grew
You couldn’t write to an availability group secondary for a while and your log file grew
You couldn’t write to a replication target for a while and your log file grew
In an on-premises world, shrinking data files should come down to two scenarios:
You were creating a database file and you accidentally added an extra zero to the size field.
You weren’t using compression, and you compressed all of your larger tables and indexes and got a bunch of space back
The first one is easy–since SQL Server never wrote data to those pages, DBCC SHRINKDATABASE will quickly release them. The second one is much harder and depends on your data types and how busy your server is. In that case the shrink may never complete–especially if you used LOB data types. Besides that, shrinking will fragment your data massively. So you should basically never run it.
Disk space is mostly cheap, so unless you are extremely storage constrained you should just live with the empty space. In Azure SQL this is even more complex, because the ghost cleanup process, which removes records that have been logically deleted but not yet physically removed, can take a long time to run. So it can be days or weeks before you can reclaim your space, if ever.
So Really, How Do I Shrink My Azure SQL Database?
You don’t. Space, even in the cloud, is really cheap. If you really want to make the database smaller, you can extract the data into a new database and then drop the original database. Congratulations, you just took downtime and spent several hours of your life to save $30/month. That doc from Microsoft recommends (with a warning) enabling autoshrink on your databases. Don’t do that. Ever. It’s really, really bad for your performance. Relax and enjoy the fact that you don’t have to bug the SAN admin for space in the cloud.
Sorry for the clickbaity title. One of the things I hated most about working in corporate IT was the persistent, never-ending discussion around server naming standards. I once flew to Switzerland to spend 16 hours in a room full of dudes (it’s always a room full of dudes) discussing what each of the 16 characters of our server naming standard should be. If I were billing for that kind of thing now, I’d charge $3000/hr (as in, I would only do this for $3000/hr), because a server naming standard is probably one of the least important things an IT organization can spend its time on, and there are much better solutions in modern infrastructure. If you aren’t familiar with tagging or labeling, you should be. Modern infrastructure, whether it be an on-premises virtual machine hypervisor, a public cloud platform, or a container management system like Kubernetes, supports tagging of resources.
Tags by themselves aren’t particularly useful–they are simple key-value pairs (e.g. Environment:Production, ServerType:SQL) that let you easily identify information about your resources. Where they become really powerful is their integration with the control plane, also known as the API layer that powers modern software-defined infrastructure. This means you can operate on resources based on their tag values. The code below illustrates this:
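A rough sketch of that code, assuming the Az PowerShell module and an existing Connect-AzAccount session:

# Find every resource group tagged Use=Demo, then stop any VMs in them that are running
$resourceGroups = Get-AzResourceGroup -Tag @{ Use = 'Demo' }

foreach ($rg in $resourceGroups) {
    # -Status adds the PowerState property, so we only stop VMs that are actually running
    $runningVms = Get-AzVM -ResourceGroupName $rg.ResourceGroupName -Status |
        Where-Object { $_.PowerState -eq 'VM running' }

    foreach ($vm in $runningVms) {
        # -Force skips the confirmation prompt, which matters for unattended automation
        Stop-AzVM -ResourceGroupName $rg.ResourceGroupName -Name $vm.Name -Force
    }
}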
In this code sample, we are looking for all resource groups matching a tag with the name “Use” and the value “Demo”, and then, within those resource groups, stopping any running VMs. This is a really simple example that I use in my own Azure subscription, but with it you can start to see the power of tagging. You can also, on some platforms (Azure, for sure; I didn’t check others), use policy to enforce the use of tags. For example, at DCAC, I implemented a policy requiring that all new Azure resource groups be tagged with the client name and date created, so we could more easily know when we could delete resources.
In addition to allowing for code-driven automation, when you are in a cloud world, tags make their way down to your actual bill, letting you understand which departments or applications are costing you the most money. This allows you to take a multi-dimensional approach to breaking down your cloud billing–in a large organization without a good tagging strategy, dissecting your bill is going to be very challenging.
Tags are incredibly useful in identifying your infrastructure, and you can add a lot of them–in Azure you can have up to 15 tags. This gives you a rich collection of information about your resources that you can access programmatically. Tag early, tag often.
One of the things that is so “fun” about working with Microsoft Azure is the near-constant rate of change in the platform. I did put fun in quotation marks, but I really mean it—if I had to work on the same things every day my job would become quite boring. Part of this constant rate of change means there are always new products and features in the pipeline. While there are some architectural standards around availability and design, each Azure service has its own unique nuances.
As MVPs and Microsoft Partners, the way we get new features and services to our customers typically goes down the following path:
A product group announces plans for a new feature or service, and we get excited about it
We might play around with the service in a test environment so we can confirm it does the thing we expected it to do
We then find or encounter a customer scenario which could benefit from the new feature or service
We implement the service at a customer
Of course, given that we are talking about cloud computing, most services have direct costs associated with them. We expect the cloud to cost money, but Microsoft is quite generous with cloud credits to MVPs and Partners to allow us to test and do proofs of concept. However, because of their cost models, some services are just untenable for MVPs and customers, due to the high cost of entry.
When SQL Server 2016 was in preview, Microsoft introduced a feature called StretchDB. This feature allowed you to extend a table or tables to an Azure SQL Database. While the feature had limited functionality, conceptually it was cool. If you had cold data that wasn’t actively queried and needed to reclaim storage space, you could archive that cold data into the cloud for a relatively low cost, as initially you could choose whichever tier of Azure SQL DB you wanted—I frequently demoed this feature with a Basic SQL Database that cost $5/month. Sometime before SQL Server 2016 went generally available (GA), the decision was made at Microsoft that StretchDB would be platformed on a StretchDB SKU of Azure SQL Data Warehouse, with a minimum cost of thousands of dollars a month (it used a premium SKU of SQL DW). This effectively killed the feature for the MVP community and frankly most customers—I haven’t seen any adoption of this feature, which also means there probably won’t be investment from Microsoft to close its feature gaps.
StretchDB isn’t the only service that I’ve encountered with this problem. Azure services that either have extremely high fixed costs or have highly unpredictable variable costs are really unattractive to customers.
What Should Microsoft Do?
Some services are inherently expensive—if you think about Azure SQL Database in the Business Critical tier, you have the functional equivalent of a three-node availability group, with three copies of your data on local flash storage. That’s a solid amount of compute resources, and I can understand why it costs what it does. However, SQL Database also provides the Basic service tier at $5/month, which is nearly the functional equivalent of a $40,000/month 80 virtual core Business Critical database (there are some minor functional differences and the obvious capacity differences). This allows developers and partners to get comfortable with the service at a low price point and grow their adoption over time.
I have a few basic architectural tenets that a lot of Azure services already follow, and that I think should apply across the board for all service offerings:
Ideally, have a low-cost SKU that has enough compute power for demos and proofs of concept
If your service must have a very high fixed cost, ensure that the service has a pause functionality, so that the compute costs do not have to run 24/7
If your service has a high degree of variable costs, share internal (Microsoft use) customer scenarios and billing, so that customers can get a decent idea of what their costs are going to look like
Making cloud services cheaper to get started with will help overall adoption. When a new service is launched and I can’t tell a customer what their bill is going to be, or when the cost is going to be thousands of dollars a month just to try it out, adoption is going to be limited. I think adopting these tenets across all Azure services would improve adoption.
Yesterday, you probably received an email from PASS and/or Redgate stating that PASS’s assets had been acquired by Redgate. For the health of the Microsoft data and analytics community this may seem like a win, but in reality it creates many more questions. The first of which is: how was the auction held, and to whom did the proceeds go? Did Summit ticket holders have their money refunded? Did PASS Pro members get refunds? The fact that a member of the PASS Exec board works for Redgate raises further questions about the fairness of the process.
Having a for-profit vendor own community assets just continues the same problem the community had with C&C. In fact, it creates more conflicts—are other vendors going to want to sponsor “Redgate SQL Saturday” events? Or “Redgate PASS Summit”? The complete lack of transparency around the auction process aside, having one vendor own the community’s assets is problematic. I would be more OK with it if it were Microsoft, but I would still prefer the community own its assets so that we are not at the mercy of a company. This isn’t a problem I have with Redgate specifically—I’d have the same problem with any vendor.
What Should Redgate Do?
Here’s what I think should happen:
Redgate should help form a not-for-profit organization with an independent board and governance structure to hold and manage community assets
They can have a permanent seat on the board of this foundation, provided its charter allows other vendors on the board
The organization should freely license assets like SQL Saturday and training content
If someone wants to run Summit, that endeavour should remain separate from the community organization, and run either out of Redgate or a separate organization.
This is the only path forward for a successful community organization. There are a lot of models in technology communities for community/corporate partnerships, like the Linux Foundation and the Cloud Native Computing Foundation. Otherwise, the organization is not a community, it’s just a bunch of marketing assets for a corporation.
One of the challenges of being a consultant is having to work with a number of clients, each with different login credentials and accounts. In the early days of Azure this was exceptionally painful, but over time the experience of using the portal with multiple identities and connecting to multiple Azure tenants has gotten much easier. However, when writing PowerShell or Azure CLI code, switching accounts and contexts is slightly more painful. Also, when you are doing automation, you may be touching a lot of resources at one time, so you want to be extra careful that you are in the right subscription and tenant.
Enter cloud shell.
If you click on the highlighted icon in the Azure Portal, you will launch cloud shell. This requires you to have an Azure Storage account, which will consume a small amount of resources (€$£)–don’t sweat this–it’s literally going to cost pennies per month, unless you decide to upload terabytes of images to your cloud shell (don’t do this). The storage is there so you can maintain a history of commands and even store script files there.
With cloud shell you are automatically logged into the tenant associated with your login–you will still need to select the subscription. As shown below, you can see the subscriptions available to your login.
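For example, a quick sketch of switching subscriptions inside cloud shell (the subscription name is just a placeholder):

# List every subscription visible to the signed-in account, then point the session at one
Get-AzSubscription
Set-AzContext -Subscription 'Contoso - Production'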
The other cool thing about cloud shell is that you also have built-in text editors, including vim and code. This means you can paste code into a text editor and save it in your shell. Since you have a storage account, that data is persisted, so you can keep a bunch of scripts saved in your cloud shell. This is great for developing for Azure Automation, or just running some ad-hoc scripts.
You can also go full screen with code–as shown above. While all of the examples I’ve shown have been PowerShell, you can also launch a bash shell running the Azure CLI.
I’ve written a lot about my thoughts on PASS this year. While I understand some of my posts could have been considered inflammatory, I wrote them from a deep position of love for the SQL Server and broader Microsoft Data Platform Community, and I decided to run for the Board of Directors because I wanted to ensure that the opportunities provided by PASS continued for others.
Before I tell you why you should vote for me for the PASS Board of Directors, I wanted to talk a little bit about my history with PASS. I don’t remember exactly when I got started with PASS (I likely signed up for a virtual chapter early on), but when I moved to Philadelphia and was in a role that required me to get more in-depth with SQL Server (I used to be an Oracle DBA), I got involved with the Philadelphia SQL Server User’s Group. Shortly thereafter, I gave a talk or two, and then joined the board of the organization. I was able to attend my first PASS Summit in 2011, and I ran SQL Saturday Philadelphia for 5 or so years. I was also a regional mentor for the Mid-Atlantic region in the US for several years.
I bring up this point to highlight one of the things that I think is so important about PASS and is often missed—local chapters and regional events. While PASS Summit is an international event that represents a “family reunion” for #sqlfamily, these reunions happen every weekend all over the world at SQL Saturday events, and monthly at local user group meetings. In order to grow our membership and commit to our mission of Connect, Share, and Learn, those local events need to be a priority. I would like to work towards having a speaker database, even if it is something rudimentary, where user groups could seek out a speaker to present virtually or in person. This is a good example of something that could be a community project—it is something the community could build in a hack-a-thon after gathering some requirements.
One of the challenges PASS has consistently faced over time is problems with its technology systems, which are mostly developed in-house, making them both expensive and time-consuming to update and upgrade. I would work towards moving to the Software as a Service products that many conferences and user groups are already using. This has the potential both to save costs and to enhance the productivity of everyone involved.
PASS faces many challenges in the coming years and will have to adapt to stay alive and relevant. The best way to do that is to stick to the core mission of Connect, Share, and Learn, and to remember that everything starts locally, whether it be a user group or a SQL Saturday. We also need to ensure that our sponsors are happy and receiving value from the organization. If you vote for me, I will do my best to make those things happen.
I would also like to recommend you vote for my colleagues Steph Locke and Matt Gordon. We have been doing a lot of thinking about how a community organization should work and would provide the right leadership for PASS.
Creating an Extended Events session (as well as viewing events) in Azure SQL Database is slightly different than in a typical SQL Server. Since you don’t have access to the file system of the server where your database lives, you need to configure a storage account target for persistence of your Extended Events sessions. You can write them to the ring buffer, but since you do not have the ability to “view live events” in SQL Server Management Studio, this is of limited benefit. You can read about what you need to do in the docs here, but in a nutshell it’s: create a storage account (or use an existing one), create a database scoped credential so you can use the storage account, and then create the Extended Events session.
The reason I’m writing this post is that there is a bit of a bug here that’s not fully documented. Many of us (especially those of us who are consultants) work across Azure Active Directory tenants. What that means is joeyd@dcac.com might manage a database in the contoso.com Azure AD tenant while still being logged in with the joeyd@dcac.com identity. Normally this isn’t an issue, but there are a couple of places where some odd things happen with cross-tenancy. When you try to create a credential in your database, you will receive the following error, even if you are the database owner.
Msg 2760, Level 16, State 1, Line 1 The specified schema name "joeyd@dcac.com" either does not exist or you do not have permission to use it.
Total execution time: 00:00:00.195
You should note the rapid execution time of that error–this isn’t failing when going out to a storage account to validate the credential; the code is failing in the database. I posted something about this to the Microsoft MVP DL, and the ever-brilliant Simon Sabin emailed me and suggested that I try creating a schema called joeyd@dcac.com and then creating the credential. Sure enough–that worked fine and I could proceed. In the customer system where this happened, we were fortunate enough to have global admin rights in AAD, so we just created a new user in their tenant and used that instead.
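For reference, a hedged sketch of that workaround using Invoke-Sqlcmd from the SqlServer module (the server and database names are placeholders; the schema name has to match the Azure AD identity from the error message):

$createSchema = @"
-- Create a schema whose name matches the cross-tenant Azure AD identity from the error;
-- once it exists, CREATE DATABASE SCOPED CREDENTIAL works as expected
CREATE SCHEMA [joeyd@dcac.com];
"@

Invoke-Sqlcmd -ServerInstance 'contoso-sql.database.windows.net' `
    -Database 'ContosoDB' `
    -AccessToken (Get-AzAccessToken -ResourceUrl 'https://database.windows.net/').Token `
    -Query $createSchema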
Note: I’m running for the PASS board of directors. Candidates are not allowed to disparage PASS, and I’m of the opinion that I’m not disparaging PASS in this post, but if anyone thinks I am, please let me know in the comments.
One topic I’m always yelling about on Twitter and to any community speakers I chat with is to never do anything for a for-profit company without compensation. Compensation can take the form of having your travel expenses paid for a conference, or simply getting paid for your work product. No matter how small the effort, it’s a lot of work to write a column or give a presentation, and you are a subject matter expert whether you realize it or not. Community events are a different story—I’ve spoken at user groups, virtual chapters, many SQL Saturdays, and even some community conferences, like SQLBits (in person, SQLBits does provide hotel rooms for speakers) and EightKB, without any compensation.
I’ve written about the PASS Pro subscription offering that PASS and C&C launched to build a secondary revenue stream beyond PASS Summit. When it was launched, I said that conceptually it was a good idea, but I didn’t have faith in C&C to execute it well. I also thought PASS would struggle to fill a content pipeline without a large capital infusion to pay speakers to build content, the way Pluralsight and LinkedIn Learning do.
It turns out my assumption was correct, as this morning I received an email (forwarded to me) asking PASS speakers to teach Microsoft Learn modules to PASS Pro members. The work consists of doing a webinar and leading a Q&A session for an hour, and the compensation is “This is a volunteer position”. Yes, that’s it—PASS expects folks to do this work for free.
My (and DCAC’s) typical compensation for something like this would be $1000 at a minimum. In addition to the active training time, we are typically paid for the time required to prepare to give the training class. Prep time for Azure training is hard—the platform changes all of the time, which means you frequently need to update demos, change screenshots in slides, and possibly even refactor entire sections of training. Asking the community to provide free training for a service that is an attempt to prop up a for-profit event management company is just unconscionable to me.
Because PASS Pro is a paywalled service, there are a limited number of people who can attend these events. This means the training is not upholding the PASS mission of Connect, Share, and Learn for the community at large. More importantly, PASS is asking experts to provide their expertise and skill for free. Just like artists shouldn’t work ‘for exposure’, you shouldn’t either. If someone is asking you to build content for them, you have a valuable level of expertise that that company needs.
Photo by Dids on Pexels.com — You could die from exposure.
I don’t mean to sound pretentious, but I’ve worked long and hard to have a very solid understanding of the Azure ecosystem, and I consider myself knowledgeable. My knowledge is valuable—it took me a long time to acquire that knowledge as well as the ability to teach others that knowledge in a manner that they can understand. That knowledge and skill is valuable, and asking professionals to work for free completely undermines the foundation of having a Professional Association.
I’ll take this a step further and say that anyone who does this for free is actively harming the rest of the community. As I mentioned before, we at DCAC are paid for things like this, along with many other members of the data community and the Microsoft Certified Trainer community. When you do free training for a paid service, you are undercutting all of those people. You may ask what I think about people doing things on YouTube or other free services—those videos are available for the whole world to see, and in many cases are loss leaders to try to get viewers to subscribe to paid services.
I’m of the opinion that PASS Pro being a paid service had the potential to divide the community. However, when speakers are paid, that’s another opportunity for speakers to make money, and is how nearly all legitimate online training services work. If the success of your business model depends on free labor for your paid service, that’s just theft of labor, and you don’t have a feasible business model.
One of the benefits of cloud computing is flexibility and scale—you don’t need to procure hardware or licenses as you get new customers. This flexibility, along with platform as a service offerings like Azure SQL Database, gives independent software vendors and companies selling access to their services a lot of latitude in what they can provide to their customers. However, there is a lot of work and thought that goes into it. We have had success building out these solutions with customers at DCAC, so in this post I’ll cover at a high level some of the architectural tenets we have implemented.
Authentication and Costing
The cloud has the benefit of providing detailed billing information, so you know exactly what everything costs. The downside is that the billing data is very granular and detailed, and can be challenging to break down. There are a couple of options here—you can create a new subscription for each of your customers, which means you will have a single bill per customer, or you can place each of your customers into their own resource group and use tags to identify which customer is associated with that resource group. The tags appear in your Azure bill, which allows you to break down your bill by customer. While the subscription model is cleaner in terms of billing, it adds additional complexity to the deployment model and ultimately doesn’t scale.
The other thing you need to think about is authenticating users and security. Fortunately, Microsoft has built a solution for this with Azure Active Directory; however, you still need to think through the design. Let’s assume your company is called Contoso, and your AAD domain is contoso.com. Assuming you are using AAD for your own business’s users, you don’t want to include your customers in that same AAD tenant. The best approach is to create a new Azure Active Directory tenant for your customer-facing resources—in this case called cust.contoso.com. You would then add all of the required accounts from contoso.com to cust.contoso.com in order to manage the customer tenant. You may also need to create a few accounts in the target tenant, as there are a couple of Azure operations that require an admin from the home tenant.
One of the things you need to think about is what happens when you onboard a new customer. This can mean creating a new resource group, a logical SQL server, and a database. In our case, it also means enabling a firewall rule, enabling performance data collection for the database, and a number of other configuration items. There are a few ways you can do this—you can use an Azure Resource Manager (ARM) template, which contains all of your resource information; that is a good approach and one I would typically recommend. In my case, there were some things I couldn’t do in the ARM template, so I resorted to using PowerShell and Azure Automation to perform deployments. Currently our deployment is semi-manual, as someone manually enters the parameters into the Azure Automation runbook, but it could easily be converted to be driven by an Azure Logic App or an Azure Function.
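As a rough illustration, a stripped-down runbook for those onboarding steps might look like the sketch below. Every name, the credential asset, and the service objective are placeholders, not our actual deployment code:

param (
    [string]$CustomerName,
    [string]$StartIpAddress,
    [string]$EndIpAddress
)

$location = 'eastus2'
$rgName   = "rg-$($CustomerName.ToLower())"
$sqlName  = "sql-$($CustomerName.ToLower())"

# Resource group per customer, tagged so the customer shows up in the Azure bill
New-AzResourceGroup -Name $rgName -Location $location -Tag @{
    Customer    = $CustomerName
    DateCreated = (Get-Date -Format 'yyyy-MM-dd')
}

# In Azure Automation the SQL admin credential would come from a credential asset or Key Vault
$sqlAdmin = Get-AutomationPSCredential -Name 'SqlAdminCredential'

New-AzSqlServer -ResourceGroupName $rgName -ServerName $sqlName -Location $location `
    -SqlAdministratorCredentials $sqlAdmin

New-AzSqlServerFirewallRule -ResourceGroupName $rgName -ServerName $sqlName `
    -FirewallRuleName "$CustomerName-app" `
    -StartIpAddress $StartIpAddress -EndIpAddress $EndIpAddress

New-AzSqlDatabase -ResourceGroupName $rgName -ServerName $sqlName `
    -DatabaseName "$($CustomerName)db" -Edition 'GeneralPurpose' `
    -ComputeGeneration 'Gen5' -VCore 2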
Deployment of Data and Data Structures
When you are dealing with multiple databases across many customers, you desperately want to avoid the schema drift that can happen. This means having a single database project for all of your databases. If you have to add a one-off table for a customer, you should still include it in all of your databases. If you are pushing data into your tables (as opposed to the data being entered by the application or users), you should drive that process from a central table (more on this later).
Where this gets dicey is with indexes, as you may have some indexes that are needed for specific customer queries. In general, I’d say the overhead on write performance of having some additional indexes is worth the potential benefit on reads. How you manage this is going to depend on the number of customer databases you are managing—if you have ten databases, you might be able to manage each database’s indexes individually. However, as you scale to a larger number of databases, you aren’t going to be able to manage this by hand. Azure SQL can add and drop indexes as it sees fit, which can help with this, but isn’t a complete solution.
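One hedged option, sketched below, is to lean on Azure SQL's automatic tuning advisors and turn on automatic index creation and removal for every database on a server. The resource group and server names are placeholders, and you should verify the cmdlets against the current Az.Sql module:

$rgName     = 'rg-contoso-prod'
$serverName = 'sql-contoso'

$databases = Get-AzSqlDatabase -ResourceGroupName $rgName -ServerName $serverName |
    Where-Object { $_.DatabaseName -ne 'master' }

foreach ($db in $databases) {
    foreach ($advisor in 'CreateIndex', 'DropIndex') {
        # Let the platform apply (and automatically revert) index recommendations
        Set-AzSqlDatabaseAdvisorAutoExecuteStatus -ResourceGroupName $rgName `
            -ServerName $serverName -DatabaseName $db.DatabaseName `
            -AdvisorName $advisor -AutoExecuteStatus 'Enabled'
    }
}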
Hub Database and Performance Data Warehouse
Even if you aren’t using a hub and spoke model for deploying your data, having a centralized repository for metadata about your client databases is extremely useful. One common task is collecting performance data across your entire environment. While you can use Azure SQL Diagnostics to capture a whole lot of performance information in your environment, with one of our clients we’ve taken a more comprehensive approach, combining the performance data from Log Analytics, audit data that also goes to Log Analytics, and the Query Store data from each database. While Log Analytics contains data from the Query Store, there was some additional metadata that we wanted to capture that we could only get from the Query Store directly. We use Azure Data Factory pipelines (which were built by my co-worker Meagan Longoria (b|t)) to load that data into a SQL database that serves as a data warehouse. I’ve even built some XQuery to do some parsing of execution plans, to identify which tables are most frequently queried. You may not need this level of performance granularity, but it is a conversation you should have very early in your design phase. You can also use a 3rd party vendor tool for this—but the costs may not scale if your environment grows to be very large. I’m going to do a webinar on this in a month or so–I still need to work out the details, but stay tuned.
Final Things
You want to have the ability to quickly do something across your entire environment, so having some PowerShell that can loop through all of your databases is really powerful. This lets you make configuration changes across your environment, or use dbatools or Invoke-Sqlcmd to run a query everywhere. You also probably need to get pretty comfortable with Azure PowerShell, as you don’t want to have to change something in the Azure Portal across 30+ databases.
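A minimal sketch of that kind of loop, assuming the Az and SqlServer modules and Azure AD token authentication (the query is just an example):

# Grab a token once, then run the same query against every database on every logical server
$token = (Get-AzAccessToken -ResourceUrl 'https://database.windows.net/').Token

foreach ($server in Get-AzSqlServer) {
    $databases = Get-AzSqlDatabase -ResourceGroupName $server.ResourceGroupName `
        -ServerName $server.ServerName |
        Where-Object { $_.DatabaseName -ne 'master' }

    foreach ($db in $databases) {
        Invoke-Sqlcmd -ServerInstance $server.FullyQualifiedDomainName `
            -Database $db.DatabaseName `
            -AccessToken $token `
            -Query 'SELECT DB_NAME() AS database_name, COUNT(*) AS session_count FROM sys.dm_exec_sessions;'
    }
}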