One of the promises of Azure SQL Data Warehouse is petabyte scale—the ability to quickly scale data, and to have that data scale independently of compute resources. So when one of my clients emailed me yesterday with this screenshot, needless to say I was concerned.
As you can see, the properties screen shows a max size of 5 GB. So I sent a frantic email to one of my colleagues on the SQL Data Warehouse team (thanks, JRJ). It turns out that, due to the distributed architecture of SQL DW, the max database size refers to the size of the control node, not the underlying data nodes that service your queries. So feel free to ignore this.
Security of your databases has become the most important part of your job as a DBA. No one wants to be the next company to leak a list of customers, leak photos of children and their parents using your company’s product, or have a key marketing database released. Security is also a very complicated, multi-layered thing—there’s no single switch you can flip to suddenly have good security. Additionally, you can be doing all the right things, and a few bad lines of application code can subvert all of the measures you have taken to protect your data.
With this in mind, Microsoft has made some big changes to CLR in SQL Server 2017. SQL CLR has always been an interesting area of the engine—it allows the use of .NET code in stored procedures and user-defined types. For certain tasks, it’s an extremely powerful tool—things like RegEx and geo functions can be much faster in native CLR than the equivalent operation in T-SQL. It’s also always been a bit of a security risk, since under certain configurations CLR had access to resources outside the context of the database engine. This was protected by boundaries defined in the CLR host policy: we had the SAFE, EXTERNAL_ACCESS, and UNSAFE levels. SAFE simply limits the assembly to internal computation and local data access. For the purposes of this post, we will skip UNSAFE and EXTERNAL_ACCESS—suffice it to say, those levels allow much deeper access to the rest of the server.
Code Access Security in .NET (which is used to manage these levels) has been marked obsolete. What does this mean? The boundaries that are marked SAFE may no longer be guaranteed to provide security. So “SAFE” CLR may be able to access external resources, call unmanaged code, and acquire sysadmin privileges. This is really bad.
So What Now?
Microsoft recommends enabling “CLR Strict Security” in SQL Server 2017, which means users cannot create any CLR assemblies unless they have elevated permissions. This could be a breaking change anywhere you use dynamic CLR. The user needs the CREATE ASSEMBLY permission, and one of the following has to be true:
Database has the TRUSTWORTHY property on AND the database owner (DBO) has the UNSAFE ASSEMBLY permission on the server
Assembly is signed with a certificate or asymmetric key that has a corresponding login with the UNSAFE ASSEMBLY permission on the server
In short, the engine treats all CLR assemblies as unsafe, even if they are created with SAFE or EXTERNAL_ACCESS permission sets.
You will need to sign your assemblies. Also, if you try to restore a database with CLR assemblies into SQL Server 2017, you may run into issues if your assemblies are unsigned.
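For those setting this up, a rough sketch of the certificate-based option above might look like the following. The certificate file path, login name, and assembly name are all hypothetical placeholders—the key point is that the login created from the signing certificate holds UNSAFE ASSEMBLY at the server level:

```sql
USE master;
GO
-- Import the certificate that was used to sign the assembly
-- (path and names below are placeholders)
CREATE CERTIFICATE CLRSigningCert
    FROM FILE = 'C:\Keys\CLRSigningCert.cer';
GO
-- Create a login from the certificate; no one logs in as this,
-- it exists only to carry the permission
CREATE LOGIN CLRSigningLogin FROM CERTIFICATE CLRSigningCert;
GO
-- Grant the permission the engine now requires for all assemblies
GRANT UNSAFE ASSEMBLY TO CLRSigningLogin;
GO
-- CREATE ASSEMBLY should now succeed for assemblies signed with this certificate
```

The assembly itself must be signed with the same certificate or key before it is created in the database; the mechanics of signing the DLL happen outside the engine.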
But Now I Have to Do Work?
What does this mean for you, the SQL DBA or developer? This is a breaking change (and breaking changes are a BIG DEAL at Microsoft—they don’t like to break things unless there are really good reasons). If you are using CLR and you want to move to SQL Server 2017 (and really, let’s face it, these security risks exist in lower versions of the code too), you need to work out a signing system for your CLR assemblies, and possibly some key management infrastructure. You also need to evaluate your older CLR code to ensure that none of it is running under UNSAFE or EXTERNAL_ACCESS (unless you want to turn off “CLR Strict Security”, which you really don’t want to do). And if you want to run SQL Server 2017 on Linux, you will be limited to the SAFE permission set.
We Only Have Third Party Apps…
Good luck with that. This is a big change, and we all know lots of smaller independent software vendors (ISVs) have only just gotten around to supporting SQL Server 2014. In most of my work with ISVs I haven’t seen a ton of CLR in vendor apps; that being said, I’m sure it exists in many of them. The angle I would take as a DBA is to use the “audit hammer” with the vendor. If you are not familiar with this approach to managing your vendors: you mention that your internal audit process is now failing unsigned assemblies that can’t run under CLR Strict Security. This is probably accurate anyway—CLR has been an audit flag for years. I would also recommend a comprehensive testing process to ensure your CLR functions as you expect.
You know what people say about what happens when you assume something, right? Don’t be an ass and make assumptions. One trend I’ve seen in recent years is that conferences tend to prefer super advanced, internals, deep-dive sessions. These sessions are very cool, and we all like to totally geek out about internals, the inner workings of the query optimizer, and secret trace flags that make SQL Server return all query results as emojis. (Dear Conor, can you work on that one?)
Anyway, while many of us in the SQL Server community have massive depth of knowledge about the product, and computer systems in general, it is important to realize that not everyone attending SQL events is an MCM, MVP, or MCSE. It’s great if you want to do a session on backup and restore, or on how to install SQL Server (make sure to explain why you are doing each step). Topics like Intro to T-SQL and Writing Your First Stored Procedure can be fantastic.
I’m not saying the whole event needs to have 100 level sessions, but making an effort to reach out to people who are just getting started could go a long way to broadening the reach of a professional association.
On Monday, PASS announced its changes to the speaker selection process; you can read the details here. This is a big change—there will be preselected speakers drawn from a pool of folks who have a long track record of successfully speaking at PASS Summit. Is this a good thing or a bad thing? I think it is a good thing, so long as it is implemented carefully. I don’t want to see new speakers get locked out of speaking at Summit, but I also want to see good sessions from good speakers.
This change will allow PASS to better structure the conference. A good example of this is Nic Cain’s (b|t) idea for curated tracks. In order to have a “new DBA” or “new BI analyst” track, you need to have speakers collaborating with each other to build the complementary track.
Another consideration is NDA content—a good example of this was last year. I would have loved to do a talk about SQL Server on Linux, which I knew would be un-NDAed by Summit; however, since submissions are public, I couldn’t submit an abstract on it.
My advice for new speakers who want to speak at Summit? It’s the same as it’s always been—blog (note—the blogging part of this is important!) and speak about whatever interests you. Of course, you need to be strategic about submitting to a large conference like Summit, but to a local SQL Saturday, or a user group? Those organizers are begging for speakers. Additionally, consider speaking at PASS Virtual Chapters—they are many, and they meet every month, and there is no cost (other than your time) involved with speaking there.
As you develop as a speaker and writer, you’ll become better known and develop your own niche. You will also get rejected. Getting rejected sucks—trust me, I submitted to three Summits before I was chosen (I was also nominated for MVP like 10x before getting it, but I digress). When you get rejected, look at your abstracts and try to understand how you can make them better. Have a friend or colleague review them. This is an ongoing process.
I don’t think most speakers will notice a big difference with this new process. The speakers who are preselected were likely going to get selected anyway; the big difference is that they will have chosen their topic, versus being subject to the whim of the program committee. If you’re a new speaker, speak as much as you can. VCs are free, and your local user group needs speakers. If you live in the middle of nowhere, a lot of user groups will welcome remote presentations. Hone your skills. Write some blog posts (you may have noticed this is my third blogging mention; fire up your keyboard). There’s a new version of SQL Server this year. Get inspired!
One of the great advantages of cloud computing is the ability to power off resources that are not in use to save some money. Sure, your production database servers should be running 24×7, but that VM or SQL Data Warehouse you are developing against during the week? You can shut it down at 7 PM (1900 for the Europeans reading this) and not start it up again until morning. Azure even recently introduced an auto-shutdown feature for VMs.
Unfortunately, there is no auto-startup feature, but that is easy enough to code using an Azure automation job.
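As a sketch, an Automation runbook that starts a dev VM on a schedule might look something like this, using the AzureRM module and Run As connection pattern current at the time of writing. The resource group and VM names are hypothetical placeholders:

```powershell
# Hypothetical Azure Automation runbook: start a dev VM each weekday morning.
# Authenticate using the Automation account's Run As connection.
$conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
Add-AzureRmAccount -ServicePrincipal `
    -TenantId $conn.TenantId `
    -ApplicationId $conn.ApplicationId `
    -CertificateThumbprint $conn.CertificateThumbprint

# Start the VM (names are placeholders)
Start-AzureRmVM -ResourceGroupName 'DevResourceGroup' -Name 'DevSqlVm'
```

Attach the runbook to an Automation schedule and you have the missing auto-startup half of the picture.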
This sounds great, can it walk my dog, too?
Unfortunately, there’s one problem with our awesome budget-saving proposal: sometimes developers have jobs that run beyond the time they leave the office. For example, last night at one of my clients, a developer had an SSIS package running after he left, and it got killed when the SSIS machine auto-shut down at 7. That isn’t good.
The solution for this is Azure resource locks—you can put a lock on any resource in Azure. A lock can do one of two things. First, there are delete locks, which simply keep a resource from being deleted; it is not a bad idea to put a delete lock on all of your production resources to prevent any accidental deletion. The second type of lock is a read-only lock, and these are a little more aggressive: you can’t do anything to a resource with a read-only lock—you can’t add a drive to a VM, you can’t resize it, and most importantly, you can’t shut it down.
You can use the portal, PowerShell, or CLI to create a lock. It’s a fairly simple construct that can be extremely beneficial. You can get current details for lock creation from the Azure Documentation.
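For example, with the AzureRM PowerShell module of that era, placing and removing a read-only lock on a VM might look like this (resource and lock names are placeholders):

```powershell
# Place a read-only lock on a hypothetical dev VM so it can't be
# shut down while a long-running job is still going
New-AzureRmResourceLock -LockLevel ReadOnly `
    -LockName 'SSISRunning' `
    -ResourceGroupName 'DevResourceGroup' `
    -ResourceName 'SSISDevVm' `
    -ResourceType 'Microsoft.Compute/virtualMachines'

# Remove the lock once the job has finished
Remove-AzureRmResourceLock -LockName 'SSISRunning' `
    -ResourceGroupName 'DevResourceGroup' `
    -ResourceName 'SSISDevVm' `
    -ResourceType 'Microsoft.Compute/virtualMachines' -Force
```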
My developers have access to the portal (thanks to role based access control and resource groups), so I’ve instructed them on how to place locks on resources, and how to remove them. As an administrator, you probably want to monitor for locks, to ensure that they aren’t left in place after they are needed.
I had this dream the other week. I was in the big room at PASS Summit, sitting in the audience. I was relaxed, as I thought I was presenting later in the day, when I quickly realized, due to the lack of a speaker on the stage, that I was the next speaker—and the room was full. I was playing with my laptop, and I didn’t have a slide deck. In my dream, this talk was a 300-level session on troubleshooting SQL Server, something I feel like I could do pretty easily, you know, with slides. Or a whiteboard.
I woke up before I started speaking, so I’m not sure how I would have handled it—interpretive dance? I’m a pretty bad dancer. One thing I will mention, and I saw my friend Allan Hirt (b|t) have to do this last month in Boston: really good (and really well-rehearsed) speakers can do a very good talk without their slides. Slides can be a crutch—one of the common refrains in Speaker Idol judging is don’t read your slides. It is bad form. Do I sometimes read my slides? Yeah, everyone does occasionally. But when you want to deliver a solid technical message, the best way to do that is by telling stories.
I’m doing a talk next month in Belgium (April 10, in Gent), right before SQL Bits. It’s going to be about what not to do in DR. My slide deck is mostly going to be pictures, and I’m going to tell stories—stories from throughout my career, and some stories from friends. It’s going to be fun; names will be changed to protect the guilty.
So my question and guidance for you, dear readers, is to think about what you would do if the projector failed and you did not have a whiteboard. I can think of a number of talks I could do without a whiteboard—in India last year, another instructor and I demonstrated Azure networking by using our bodies as props. What would you do in this situation?
As I mentioned in my post a couple of weeks ago, monitoring the plan cache on a readable secondary replica can be a challenge. My customer was seeing dramatically different performance depending on whether a node was primary or secondary. As amazing as the Query Store in SQL Server 2016 is, it does not allow you to view statistics from the readable secondary. So that leaves you writing XQuery to mine the plan cache DMVs for the query information you are trying to identify.
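As a sketch, mining the plan cache on the secondary for expensive statements whose plans touch a particular table might look like this (the table name in the XQuery predicate is a placeholder):

```sql
-- Search the plan cache for statements whose cached plan references
-- a given table; run this on the secondary replica.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//Object[@Table="[MyBigTable]"]') = 1
ORDER BY avg_elapsed_time DESC;
```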
My friends at Solarwinds (Lawyers: see disclaimer at bottom of post) introduced version 11.0 of Database Performance Analyzer (DPA, a product you may remember as Ignite) which has full support for Availability Group monitoring. As you can see in the screenshot below, DPA gives a nice overview of the status of your AG, and also lets you dig into the performance on each node.
There are a host of other features in their new releases, including some new hybrid features in their flagship product, Orion. A couple of these jumped out at me—there is now support for Amazon RDS and Azure SQL Database in DPA, and there is some really cool correlation data that will let you compare performance across your infrastructure. So when you, the DBA, are arguing with the SAN, network, and VM teams about the root cause of a performance problem, this tool can quickly isolate it. With less fighting. These are great products—give them a look.
Disclaimer: I was not paid for this post, but I do paid work for SolarWinds on a regular basis.
Want a chance to win a really cool prize (5 Amazon Dots, and a copy of SQL Clone), while learning about a cool product from Redgate? My friends at Redgate are in the process of releasing a really awesome tool called SQL Clone. So what is SQL Clone?
One of the challenges I’ve faced through my career as a DBA is the ability to have a reliable copy of a production database to test against. Sure, when I worked at Comcast we had dev, qa, test, and prod environments, but many of the smaller organizations I worked for and consult with, don’t have that luxury. Even when I was at Comcast, the overwhelming cost of storage really limited our ability to maintain consistent development platforms.
So what is the solution? SQL Clone offers virtual copies of your database, in seconds, whilst only taking up around 40 MB of disk space. How does it do it? Redgate stores pointers to your database in the copy. This means you can have one copy of your database and multiple clones, so that different parts of your dev team can work together in harmony. Each clone will grow a bit as its developer works with the data, but if you’re dealing with very large (> 5 TB) or even just pretty big databases, this can greatly reduce your storage costs and increase developer productivity.
To learn more about Red Gate SQL Clone, check out this video:
I’m still fighting some challenges with inconsistent performance between a primary and secondary replica, so I’ve been waist-deep in undocumented system views looking at temporary statistics. One of the things I thought about doing was taking advantage of the Force Plan option in the Query Store in SQL Server 2016. If you are not familiar with this feature, it allows you to force a “preferred” execution plan. In this scenario, our query was running in about 20-30 seconds on the primary, and 20-30 minutes on the secondary. The plans were reasonably close, but I wanted to see what would happen if I forced a plan on the primary.
Primer about the Query Store and Availability Groups
Since readable secondary replicas are read-only, the query store on those secondary replicas is also read-only. This means runtime statistics for queries executed on those replicas are not recorded into the query store; all the stats there are from the primary replica. However, I wasn’t sure what would happen if I forced a plan on the primary—would the secondary replica honor that plan?
Let’s Find Out
The first thing I did was to query the query store catalog views to verify that the plan was forced.
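A query along these lines shows which plans are forced (run it on the primary, since that is where the query store is writable):

```sql
-- List forced plans in the Query Store, along with the most recent
-- forcing failure reason, if any
SELECT q.query_id,
       p.plan_id,
       p.is_forced_plan,
       p.last_force_failure_reason_desc
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
WHERE p.is_forced_plan = 1;
```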
I have two copies of the forced plan. If I generate an estimated query plan on the primary, I see that the plan is forced. You can see this by looking for UsePlan in the XML of the plan.
I did the same thing on the secondary (in the case of the secondary, we are looking at the actual plan, but it doesn’t matter).
You will note that there is no UsePlan. There are extended events and a catalog view that reflect plan forcing failure (Grant Fritchey wrote about this behavior here). While I wouldn’t expect the catalog view to get updated, I was hoping the Extended Event might fire. It did not.
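If you want to watch for plan forcing failures yourself, a minimal Extended Events session might look like the following sketch (session and file names are placeholders):

```sql
-- Capture plan forcing failures to an event file
CREATE EVENT SESSION [QSForcingFailure] ON SERVER
ADD EVENT qds.query_store_plan_forcing_failed
ADD TARGET package0.event_file (SET filename = N'QSForcingFailure');
GO
ALTER EVENT SESSION [QSForcingFailure] ON SERVER STATE = START;
```

In my testing, this event never fired on the secondary, which is consistent with the query store simply not operating there.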
The query store, as awesome as it is, doesn’t really do much for you on a readable secondary replica. It does not force plans, nor does it record any of your data.
Thanks to Grant Fritchey and Erin Stellato for helping with this post!
I was working with a client this week, and we encountered very long HTDELETE waits on “insert as select” queries that were part of their data delivery process. This wait type isn’t documented very well; SQLskills has it documented here and mentions this:
“Typically these waits occur when queries involve columnstore indexes, but they can also occur without columnstore indexes being involved if a hash operator runs in batch mode.”
Nacho from Microsoft also has a blog post on it here. My theory was that a bad hash join was taking place and causing the wait.
Isolating the Plan
The thing that was very curious about the situation is that the waits were only occurring on the readable secondary replica. At first, I tried to examine the query store to understand whether there were multiple execution plans for a given query. The one problem with that is that the readable secondary copy of the data is read-only, which means on the secondary replica you only see the query store data from the primary replica. If there was a plan specific to the secondary, I’d have to gather it from the plan cache on the secondary (thanks to Erin Stellato (b|t) for this idea). There was one other problem—the code in question was executing as dynamic SQL from a stored procedure, which meant it was always getting a new execution plan.
Ultimately, I think any thought of the readable secondary having a vastly different plan was a red herring. Statistics are going to be the same on both instances, and if there were a missing statistic on the secondary, SQL Server would create it in TempDB. Besides, columnstore indexes don’t use statistics in the traditional sense.
Fortunately I was able to catch a query in the process of waiting on HTDELETE, so I no longer had to look for the needle in the haystack, and I could get to tuning the plans. I was able to grab the SELECT part of the query and generate an estimated plan on both the primary and secondary nodes. The plans were virtually the same on both nodes, with just a minor difference in memory grant between them.
The (anonymized) query is here:

SELECT Object4.Column1
FROM Database1.Schema1.Object5 Object4
INNER JOIN Database2.Schema1.Object6(?, ?) Object7
    ON Object4.Column2 = Object7.Column3
INNER JOIN Database2.Schema1.Object8(?) Object9
    ON Object4.Column4 = Object9.Column4
INNER JOIN Database1.Schema1.Object10 Object11
    ON Object4.Column5 = Object11.Column6
INNER JOIN Database2.Schema1.Object12(?) Object13
    ON Object11.Column7 = Object13.Column7
WHERE 1 = 1
    AND Object4.Column8 >= '01-Jan-2017'
The pattern here was that we were taking all of the rows of an ID field in a columnstore index with about 350MM rows and joining them to a function that returns about 3,500 rows. My gut instinct was that this was a bad match for batch mode hashing. Additionally, SQL Server was recommending I create a b-tree index on the large columnstore table. There was a key lookup in the plan that I wanted to eliminate, but my hunch was that the join was causing the waits.
Before I created the index, the query was taking at least 2-4 minutes when it wasn’t getting hung on the HTDELETE wait. After I created the first index, we got down to about 15 seconds. SQL Server then recommended that I create another index on one of the join tables, which brought my query time down to sub-second. The plan looked a lot more traditional, with lots of my favorite operator: the index seek.
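The shape of the fix was an ordinary b-tree index layered on top of the columnstore table to support the join and eliminate the key lookup. With hypothetical table and column names, it looked something like:

```sql
-- Nonclustered b-tree index on a table that also has a clustered
-- columnstore index; names are placeholders for the anonymized query
CREATE NONCLUSTERED INDEX IX_FactTable_JoinKey
    ON dbo.FactTable (JoinKeyId)
    INCLUDE (EventDate);
```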
The Moral of the Story
Sometimes you need nonclustered indexes on top of columnstore indexes. It stinks, because they do add space, but it’s hard to argue with a performance gain like this. I need to email some friends on the product team to ask, but I’m theorizing that the hash join was super expensive and causing the query to hang. Anyway, the real answer is to never stop tuning and to trust your instincts.
Thanks to Sentry One for making Plan Explorer Free. I used it for the screen shots and anonymization in this post.