DBCC Clonedatabase and Very Large Databases

One of the recent feature introductions to SQL Server is dbcc clonedatabase, a feature that lets you create a “data-less” clone of your database. All of the statistics and objects come into your cloned database; however, none of the data does. This is perfect for development or performance tuning exercises, where you want all the metadata but do not want the security risk of dealing with production data.

Recently I had the opportunity to use clonedatabase on a very large database. I was concerned about the size of the data files and how this would impact space on my volumes. Books Online is fairly clear, but I wanted to see for myself.

Note: All files in the target database will inherit the size and growth settings from the model database. File name convention: the file names for the destination database will follow the source_file_name_underscore_random number convention. If the generated file name already exists in the destination folder, DBCC CLONEDATABASE will fail.

So my thought on reading that is that the same number of data files will be created in the clone, just with the size and growth settings from model. Let’s test that out.

The first thing I did was create a new database, and then add a few data files to it. I made them 20 MB, which is a different size than model—just for testing purposes.
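The T-SQL looked something like this (the database name, file names, and paths here are illustrative, not the exact ones from my test):

CREATE DATABASE CloneDemo;
GO

-- add a couple of extra data files at 20 MB, which is not what model uses
ALTER DATABASE CloneDemo ADD FILE (NAME = 'CloneDemo_Data2', FILENAME = 'C:\Data\CloneDemo_Data2.ndf', SIZE = 20MB);
ALTER DATABASE CloneDemo ADD FILE (NAME = 'CloneDemo_Data3', FILENAME = 'C:\Data\CloneDemo_Data3.ndf', SIZE = 20MB);
GO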


Next, I ran the clone database command.
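The command itself is a one-liner (same illustrative names as above):

DBCC CLONEDATABASE (CloneDemo, CloneDemo_Clone);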


Then I connected to the clone and looked at the data files.
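A query along these lines shows the file layout in the clone (a sketch, not the exact query from my screenshot):

USE CloneDemo_Clone;
GO
SELECT name, physical_name, size * 8 / 1024 AS size_mb, growth
FROM sys.database_files;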


I can see that all of the files were created, in the same location as the files on the source database, but with the size settings of model. While this shouldn’t be a big deal for most, if you do as I recommend and make model a reasonable size for your environment, and you happen to be tight on drive space, you could fill up a volume. So just be aware when using clonedatabase, particularly with databases that have a lot of data files in them.
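Before you run this against a database with dozens of data files, it’s worth a quick look at what model is set to; something like this does the trick:

SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CONCAT(growth, '%')
            ELSE CONCAT(growth * 8 / 1024, ' MB')
       END AS growth_setting
FROM model.sys.database_files;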

An “Ask” for Microsoft—A Global Price List

And yes, I just used ask as a noun (I feel dirty); I wouldn’t do that in any other context but this one. In reviewing my end-of-year blog metrics, my number one post from last year was a post that listed the list price of SQL Server. I wrote this post because a) I wanted clicks and b) I knew what a pain it was to find the pricing in Microsoft documents. However, the bigger issue is that to really figure out what SQL Server costs, you need to go to another site to get Windows pricing, and probably another site to find out what adding System Center to your server might cost.

This post came up because Denny and I were talking the other night, as someone had posted to the Data Platform MVP list asking how much the standalone R Server product cost. We found a table on some Microsoft site:

[screenshot: pricing table from a Microsoft site listing R Server as “Commercial Software”]

I’m not sure what math is required to translate “Commercial Software” into a numeric value, but it is definitely a type conversion and those perform terribly. Eventually I found this on an Azure page:

This image is charged exactly like SQL Server 2016 Enterprise image, but it contains no Database elements and has the core ScaleR and DeployR functionality optimized for Windows environments. For production workloads we recommend that you use a virtual machine size of DS4 or higher.

This leads me to believe that R Server has the same pricing as SQL Server, but with the documents I have I am not certain of that fact.

What Do I Want?

What I want is pricing.microsoft.com, a one-stop shop where I can find pricing for all things Microsoft, whether they be Azure, on-premises, or Software as a Service. At worst it should be one click from the product name to its pricing page. Ideally, I’d like it all in a single table, but let’s face it, software pricing can be complex and each product probably needs its own page with pricing details.

The other thing that would be really cool, and this is more of an Azure thing, is to have pricing data built into the API for deploying solutions. That way I can build pricing-based intelligence into my automation code, to roll out cost-optimized solutions for Azure.

Anyone else have feature suggestions?

Updated: Jason Hall has a great comment below that I totally forgot about. Oracle has a very good price list (it definitely wins the number-of-commas award) that is very easy to access. So, dear readers in Redmond: Oracle does it, you should too!

Updated: There is some of this available in Azure. It’s not perfect though. https://msdn.microsoft.com/en-us/library/azure/mt219004?f=255&MSPPError=-2147217396. Amazon just announced enhancements to their version of this service. https://awsinsider.net/articles/2017/01/09/pricing-notifications.aspx

SQL Server on Linux Clustering—A Few Other Notes

So I was chatting with fellow MVP Allan Hirt (b|t) about the cluster build that I wrote about yesterday, and I had a few more realizations about the Linux HA process as it stands right now. I haven’t talked to the Linux product team at Microsoft about this, but I hope to in the near future to get a better idea of where things are headed. So these are my notes as of now, strictly relating to failover cluster instances (FCIs); Always On Availability Groups are coming, but are not in the latest CTPs of SQL Server on Linux.

It was faster than building a Windows cluster

It took me a while, I laughed, I cried, I cursed a lot, but if I look at the time it took for me to actually build the cluster and install SQL Server, it was a much faster process. Much of this comes down to the efficiency of the SQL Server installation process on Linux, which is as simple as running yum install mssql-server (mostly). Which leads me to my next point…

Installation options would be nice

The cluster building process is a little kludgy. Basically, you install two standalone instances of SQL Server, then remove the data files from one of them and copy them into your NFS share. Having the option to do the equivalent of an “Add Node” install would mean you wouldn’t need to worry about cleaning up your second node.

There’s no cluster validation, explicitly

This is a bit scarier, or easier, depending on your viewpoint. There are tests at various parts of the process to make sure things are working. For example, the first step of building your Linux cluster is to authorize the nodes to take part in the cluster, which validates certain security and network settings. However, the storage validation consists of starting and stopping SQL Server on each node to make sure it can talk to the storage and start up. Given that Microsoft doesn’t own the clusterware for this solution, I’m not sure how much they can enhance that, or if they will. This is a good open question.

There’s no DNS

(Happy late birthday, Kris!) One interesting thing I realized after talking to Allan was that I did all of my networking setup through the /etc/hosts file on each individual node. I remember doing this for RAC, and I think it may be a requirement of Pacemaker, but you will still want to make a DNS entry for your cluster identifier. When you do this on Windows, if you are using Active Directory for DNS, the installation does this for you. Not so on Linux; you will need to do this yourself.


Get comfortable with command line and scripting

There’s no cluster wizard to get you through the process. I don’t think this is a huge deal—Denny and I were talking yesterday about how relatively easy it would be to script the whole process in bash (I’m holding off until I find out if Microsoft is doing this), and most Linux sysadmins are really comfortable with writing bash scripts. But if you aren’t comfortable with Linux and the command line, now is the time to brush up, before things go to prod.

Summary

We are in the very early days of this process, and there is much that will likely change. From a functional and conceptual perspective, this is very similar to the way a SQL Server failover cluster works on Windows, but the implementation is quite different. I’d like to see things resemble Windows a bit more, at least from a SQL Server perspective, but we’ll see where the product heads.

SQL Server on Linux–Clustering

First, a word of warning on this post—if you are reading it and it isn’t January of 2017, I suspect things may have changed significantly in the months since.


So I did it: I built a SQL cluster on Linux. The process is documented here on BOL. I’m not going to walk you through it (I’ll probably do that in a later post); I just wanted to mention some things I ran into during the build process. First, I did this using VMware Fusion on my Mac, but I think any virtualization platform that allows virtual networks should work. Secondly, even though BOL says you need Red Hat Enterprise Linux (and you do if you are doing this in prod and require support), I was able to do all of this on CentOS, which is the free-as-in-beer version of RHEL.

In my scenario, I built three VMs: one to serve as an NFS server, and two to be my SQL Servers. Currently, there is no cluster version of the install; it’s the standard installation for standalone SQL Server on Linux, and you then point SQL Server at the NFS mount you created, which serves as your shared storage. I had an initial permissions problem writing my data files there—I did a bad thing on the NFS server and opened up the directory to the world (777), and was then able to copy files there. I’ll follow up on that.

One other thing that wasn’t in BOL, and that I had to troubleshoot my way through, is that just like a cluster on Windows, you have a cluster identifier and a floating IP address. I had to add that to /etc/hosts on each of my nodes to get it to resolve. The article mentions turning off fencing for non-prod environments—I had to do that in order to get failover working correctly in my environment.

Finally, failover was a bit wonky at first, and I had to spend too much time troubleshooting an odd problem. I wrote a Connect item for it: select @@servername and select name from sys.servers return the name of the host, and not the cluster name. I’m sure the team will fix this in the near future.
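If you want to see the behavior for yourself, connect to the cluster IP and run something like this (as of this CTP, both return the host name rather than the cluster identifier):

SELECT @@SERVERNAME AS server_name;
SELECT name FROM sys.servers WHERE server_id = 0; -- server_id 0 is the local instance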

Why Are You Still Running Your Own Email Server?

One of the things I tell customers when doing any sort of architectural consulting is to identify their most important business systems. Invariably, something that gets left off of that list is email. Your email is your most critical system: ERP may run your profit centers, but email keeps everything moving.

With that in mind, and given all the security risks that exist in the world (see: the Russian hacking scandal and the other email leaks of the week), it doesn’t make a lot of sense for most organizations to run their own Exchange environments when Microsoft is really good at it.

I had a discussion with an attorney at a company in a heavily regulated industry recently. The attorney mentioned that after investigating, she determined that the company didn’t have journaling turned on for their Exchange servers. (For you DBAs, journaling is effectively full recovery mode for Exchange—it’s more complicated than that, but it’s a nice analogy.) Given that we are Office 365 customers, I wanted to check how hard it would be to enable this in our environment. It turns out that full e-discovery capabilities that integrate with e-discovery systems are as easy as one click of a mouse (and a credit card, to make sure you are on the right service level).

Another great security feature, one that used to be really painful to integrate with email logins, is multi-factor authentication. Once again, this requires a mouse click or two, and your credit card. You can even quickly do things like whitelisting your office’s IP address so that your users don’t have to use MFA when in the office.

These features are great, and that doesn’t even cover all the threat protection that Microsoft has built into Office 365 and Azure. You can read about that here, but Microsoft can even protect you from threats like spearphishing. (Hi Vlad!) Just like with encryption: don’t be a news story—just be secure.

Dear Colos: Up Your Game: aka How the #$%^ do you not have fibre in stock?

Many companies use co-located data centers to store their hardware. In some cases (like the colo we at DCAC use), you pay for power, cooling, and a connection to the internet; there is no expectation of added services other than those three things. In other cases, companies like Rackspace or Level 3 act as what are known as managed service providers (note: I’m not talking about Rackspace or Level 3 in this post; I’m not going to name the guilty party, but if you want to know, you can reach me privately). Managed service providers offer solutions like shared storage, network management, and other value-added services beyond just a space for your servers.

Enter the Cloud

So Microsoft and Amazon are effectively playing in this space with their IaaS offerings. There’s a big difference, however, as the cloud providers have invested a great deal of money in automation. The same customer I’m talking about in today’s post has some Azure VMs that we are deploying. I built a VM and allocated 3 TB of SSD storage in about 10 minutes this morning. Pretty slick operation—I’m fairly certain that when I ran the PowerShell to deploy the VM, no person got up to do anything. When I added the storage, I’m pretty sure no SAN zoning took place, and if it did, it was a few lines of code. We had previously stayed away from Azure because it’s not the most cost-effective solution for very large workloads (colos tend to have slightly better pricing on big boxes, but you get nickel-and-dimed on other things).

When Your Colo Sucks

So I have two different work streams going with the colo right now. One of them is to configure a site-to-site VPN to Azure. This should be a simple operation; however, it took over a week to get in place, and only after I sent the colo the Cisco instructions on how to configure the VPN were they able to tell me that the Cisco device they had didn’t support the latest route-based VPN in Azure. So we finally get up and running, and then we discover that we can’t reach the Azure VMs from certain on-prem subnets. We ask them to make a change to add those subnets and they completely break our connection. Awesome.

The other work stream is a cluster upgrade. I wanted a new cluster node and storage, so we didn’t have to do an in-place upgrade. We started this process about three weeks ago, hoping to do the migration on Black Friday. We had a call today to review the configuration. It turns out they had nothing in place, and they aren’t even sure they can get a server deployed by NEXT FRIDAY (YES—10 days to deploy a server; your job is deploying servers). I heard lots of excuses: we aren’t working Thursday/Friday, we have to connect to two different SANs, we might not have that fibre in stock. It wasn’t my place to yell WHAT THE EVERLOVING $%^^ on the call, so I started live tweeting. Because that’s ridiculous. Managing and deploying infrastructure was what I did for a living, and I wouldn’t have had a job if it took 10 days to deploy a server—and that wasn’t my only job. That really is the colo’s only job. How the #$%^ do you not have fibre in stock? Seriously? My lab at Comcast had all the fibre I could possibly need.

Edited to add this:

This is after last month when they confused SAN snapshots with SAN clones (when it takes 4 hours to recover from a “snapshot” it’s a clone) and presented production cluster storage (that was in use) to a new node. Awesome!!!

Why the Cloud will Ultimately Win

Basically, when it comes to repetitive tasks like deploying OSes and setting up storage, software is way better than humans. Yes, you need smart engineers and good design, but Azure and AWS are already 90% of the way there. Also, their service levels and response times are much better, because everything is standardized, which makes troubleshooting and automation much easier.

SQL Server v.Next—Linux Preview and Ola Hallengren’s Jobs

If you watched Scott Guthrie’s keynote at Microsoft Connect() this morning, your mouth is probably still on the floor. There was a lot of big news:

  • Nearly all Enterprise Edition features are in Standard Edition as of SQL Server 2016 SP1
  • There is going to be a v.Next of SQL Server and you can play with CTP1 of it today
  • SQL Server on Linux CTP1 can now be downloaded and installed

The biggest news of the day is the standard edition news—this is going to be huge for independent software vendors who build their applications on top of SQL Server. While this is amazing news, I wanted to talk about something a little more near and dear to my heart—SQL Server on Linux. You can learn how to install SQL on Linux here.

So SQL on Linux

I’ve had the good fortune of being involved in the private preview for a good while now. Here’s the requisite screenshot of @@version you’ve seen in so many demos.

[screenshot: SELECT @@VERSION output showing SQL Server v.Next on Linux]

In this case I’m running a VM on my Mac running CentOS. I also have a VM running Ubuntu and SQL Server running in a Docker container. If you want to go full native, you can use Visual Studio Code and run without Windows at all. Aaron Bertrand has a great post on how to make this work.

Management of SQL Server on Linux

Aside from a couple of DMVs that show you Linux-specific performance information, everything in SQL Server on Linux is the same. Some of the HA and DR functionality is not complete, and the SQL Agent is not done; however, you can use cron (and if you’re not already familiar with cron, you should learn about it—I’ll have another post on that next week).

Ola Hallengren’s Jobs on Linux

Many DBAs use Ola Hallengren’s jobs to manage backups and maintenance on their servers. The first thing you’ll want to do is download Ola’s scripts to your machine. You can do this using the curl command in Linux. In this scenario I’m redirecting the output of the curl command to a file called ola.sql.

curl https://ola.hallengren.com/scripts/MaintenanceSolution.sql > ola.sql

Because of the (current) behavior of the SQL Agent, you will need to set the @CreateJobs parameter in Ola’s scripts from Y to N. I used VI to do this—you can read a primer on VI here.

After that—you’ll want to install Ola’s scripts on your SQL Server instance. You have sqlcmd on your Linux install, and here you will use the input file flag (-i).

sqlcmd -S . -Usa -Pp@ssw0rd! -iola.sql

You can do this from Management Studio on your Windows machine, or you can just do this from the command line—the nice part about the command line is that automation should be easy and straightforward.

The next thing we want to do is put a backup command in a shell script. In this case I’m just going to grab the backup example from Ola’s site. Use your favorite text editor (mine is VI because I hate myself) and create a file called userbackup.sh.

sqlcmd -S localhost -U SA -P p@ssw0rd!! -d master -Q "EXECUTE [dbo].[DatabaseBackup] @Databases='USER_DATABASES', @BackupType='FULL', @Verify='Y', @CleanupTime=48, @CheckSum='Y', @LogToTable='Y'" -b

After you save this file, you’ll want to make it executable. I’m going to use the chmod command to do that.

chmod 770 userbackup.sh

Now this file can be executed. You can do this using the ./ syntax. The output will be returned to the screen; if you are automating the process, you can redirect the output to a file, which you can check for errors.

Date and time: 2016-11-16 10:22:47

Command: BACKUP DATABASE [TestingQuerystore] TO DISK = N'C:\Data\helsinki\TestingQuerystore\FULL\helsinki_TestingQuerystore_FULL_20161116_102247.bak' WITH CHECKSUM, NO_COMPRESSION

Processed 2032 pages for database 'TestingQuerystore', file 'TestingQuerystore' on file 1.

Processed 2 pages for database 'TestingQuerystore', file 'TestingQuerystore_log' on file 1.

BACKUP DATABASE successfully processed 2034 pages in 0.090 seconds (176.562 MB/sec).

Outcome: Succeeded

Duration: 00:00:00

Date and time: 2016-11-16 10:22:47

This is a quick primer on getting started with Linux—in the coming months, you’ll learn more about being a DBA on Linux.

Exporting Masked Data with Dynamic Data Masking


Dynamic Data Masking is a presentation-layer feature that was added to Azure SQL DB and SQL Server 2016. In a nutshell, it prevents end users from seeing sensitive data, and lets administrators show some data (e.g., the last four digits of a Social Security number) for verification purposes. I’m not going to focus too much on the specifics of data masking in this post—that’s a different topic. This is about how, once you have a masking strategy, you can protect your sensitive data going to other environments.

Well, at PASS Summit, both in our booth and during my presentation on security in Azure SQL DB, another idea came up—exporting data from production to development, while not releasing any sensitive data. This is a very common scenario—many DBAs have to export data from prod to dev, and frequently it is done in an insecure fashion.

Doing this requires a little bit of trickery, as dynamic data masking does not work for administrative users. So you will need a second user.

First step—let’s create a database and a masked table.

CREATE DATABASE DDM_Demo
GO

USE DDM_Demo
GO
CREATE TABLE Membership
(MemberID int IDENTITY PRIMARY KEY,
FirstName varchar(100) MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)') NULL,
LastName varchar(100) NOT NULL,
Phone# varchar(12) MASKED WITH (FUNCTION = 'default()') NULL,
Email varchar(100) MASKED WITH (FUNCTION = 'email()') NULL);

INSERT Membership (FirstName, LastName, Phone#, Email) VALUES
('Roberto', 'Tamburello', '555.123.4567', 'RTamburello@contoso.com'),
('Janice', 'Galvin', '555.123.4568', 'JGalvin@contoso.com.co'),
('Zheng', 'Mu', '555.123.4569', 'ZMu@contoso.net');

-- Create the non-admin user that will see the masked data and run the export
CREATE LOGIN TestUser WITH PASSWORD = 'P@ssw0rd!';
CREATE USER TestUser FROM LOGIN TestUser;

-- The exporting user needs to read the data and the metadata in this database
GRANT VIEW DEFINITION TO TestUser;
GO
ALTER ROLE db_datareader ADD MEMBER TestUser;
GO
ALTER ROLE db_datawriter ADD MEMBER TestUser;
GO

Next I’ll log in as this user and select from the Membership table.

[screenshot: selecting from Membership as TestUser, with the FirstName, Phone#, and Email columns masked]
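If you’d rather not open a second connection, impersonating the test user shows the same masked results (a quick sketch):

USE DDM_Demo;
GO
EXECUTE AS USER = 'TestUser';
SELECT MemberID, FirstName, LastName, Phone#, Email FROM Membership;
REVERT;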

From here, I’m going to (as the TestUser) take an export of the database. You can do this by selecting the Export Data-tier Application option from the Tasks menu in Management Studio.


I won’t bore you with clicking through the process, but this will give you an export of your database, with the data masked. Your next step is to import the .bacpac file you created. In this case I’m going to the same instance, so I changed the database name.

Right click on “Databases” in SSMS and select “Import Data-tier application”. Import the file you created in the previous step.

Now try selecting as your admin user.

[screenshot: selecting from the imported database as the admin user, with the data still masked]

Boom, you’ve exported and imported masked data in your lower environments.

Circles and Squares–What Do They Mean in the Query Store?

Denny Cherry and I are on a tuning mission this week, and fortunately the customer has SQL Server 2016, so we don’t have to waste a lot of time finding the problematic queries in the system. However, we were a little bit confused when we saw a graphic in the Query Store report today.

[screenshot: Query Store plan summary showing plan 44 with two duration points, one a circle and one a square]

You will note that plan 44 has two durations, and even though it has a single plan id, the table to the right of the plan summary window indicates two plans. You will also note that one of the plans is represented by a circle, the other by a square. So what does this mean?

If the icon is a circle, that means the query completed. When the icon is a square, that represents a query that was cancelled, but had a plan generated for it. In this case I cancelled the query in SSMS.

We can confirm by looking in the Query Store catalog views.

[screenshot: results from the Query Store catalog views showing the aborted execution]
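The check itself is just a join between a couple of the Query Store views; something along these lines (plan 44 being the plan in question here):

SELECT p.plan_id,
       rs.execution_type,
       rs.execution_type_desc,
       rs.count_executions,
       rs.avg_duration
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
WHERE p.plan_id = 44;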

The second run of the query shows as aborted, and is represented as such in the Query Store graphic.

Does In-Memory OLTP Work with Always Encrypted? (AKA That time Bob Ward asked us a question)

Last week was the PASS Summit, the biggest confab of SQL Server professionals on the planet (and educational as ever). Denny Cherry (b|t) and I ran into Bob Ward (b|t) of Microsoft, he of the 500-level internals presentations. And for the first time ever, Bob asked us a question about SQL Server—of course we didn’t know the answer off the top of our heads, but we felt obligated to research it like we’ve made Bob do so many times. Anyway, the question came up at Bob’s internals session on Hekaton (In-Memory OLTP): does it support the new Always Encrypted feature in SQL Server 2016? I checked Books Online but could not find a clear answer, so I fired up SSMS and set up a quick demo.

Set up Hekaton

Let’s get started with setting up Hekaton.

CREATE DATABASE imoltp
GO

--------------------------------------
-- create database with a memory-optimized filegroup and a container.
ALTER DATABASE imoltp ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA
ALTER DATABASE imoltp ADD FILE (name='imoltp_mod1', filename='c:\temp\imoltp_mod1') TO FILEGROUP imoltp_mod
ALTER DATABASE imoltp SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON
GO

In that code, I’m just doing the setup for In-Memory OLTP—you need to add a filegroup, as well as enable the MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT setting.

Always Encrypted Setup

Next, we needed to set up Always Encrypted.

[screenshot: Always Encrypted key setup]
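I did the key setup through the SSMS dialog, but if you’d rather script it, the column master key definition looks roughly like this (the certificate thumbprint is illustrative; the column encryption key, ColumnKey1 in the table below, is easiest to generate through the wizard, since its encrypted value has to be produced by a client driver):

CREATE COLUMN MASTER KEY CMK1
WITH (
    KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
    KEY_PATH = N'CurrentUser/My/DE3A770F25EBD6071305B77FB198D1AE434E6014' -- illustrative certificate thumbprint
);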

In order to configure this, you’ll first need to set up a Column Master Key, and then a Column Encryption Key. I took a sample table from Books Online and attempted to encrypt it.

CREATE TABLE [Employee_Encrypted](
    [BusinessEntityID] [int] NOT NULL IDENTITY(1,1) ,
    [NationalIDNumber] [char](11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (ENCRYPTION_TYPE = DETERMINISTIC,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = ColumnKey1)
        NOT NULL,
    [LoginID] [nvarchar](256) NULL,
    [BirthDate] [date]
        ENCRYPTED WITH (ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = ColumnKey1)
        NOT NULL,
         CONSTRAINT [PK_Employee_Encrypted_BusinessEntityID] PRIMARY KEY NONCLUSTERED
(
    [BusinessEntityID] ASC
)
)
WITH  (MEMORY_OPTIMIZED=ON)

With that T-SQL (and a couple of attempts at getting my parentheses in the right place), it just ran. Nothing major to do—just create the table. I confirmed this behavior with Jos de Bruijn, who is the program manager for Hekaton.
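If you want to double-check that both features really are in play on the table, the catalog views make that easy (a quick sketch):

SELECT name, is_memory_optimized
FROM sys.tables
WHERE name = 'Employee_Encrypted';

SELECT c.name AS column_name, c.encryption_type_desc, cek.name AS cek_name
FROM sys.columns AS c
JOIN sys.column_encryption_keys AS cek
    ON c.column_encryption_key_id = cek.column_encryption_key_id
WHERE c.object_id = OBJECT_ID('Employee_Encrypted');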
