Group Managed Service Accounts

One of the really enjoyable things about my current job is being able to work on some of the latest technology as it comes out. My team owns the infrastructure build processes, so I get to do fun things like building proof-of-concept environments, SAN testing, and working with new versions of things. We’ve been spending a lot of time working with some of the new features of Windows Server 2012, in an effort to get it ready for production rollout. We’re ready to go, but we are still waiting on our antivirus team.

For those of you who work in organizations with any sort of regulatory concern (which is pretty much any public corporation), you know that managing the accounts that own SQL Server services can be a bit of a nightmare. You have to maintain passwords, and if a password changes at any point in time, it’s a very manual process to change it on each of your servers. It’s a security best practice to change these passwords on a regular basis, particularly if a member of the DBA team leaves. However, changing this password requires touching every server in some fashion—you can automate the process, but it’s still kind of painful. And if that account ever gets locked out, it’s even worse.

Starting with Windows Server 2008 R2, the concept of Managed Service Accounts came into play. These were accounts with passwords internal to Active Directory—set to change on a regular basis, but controlled by the system. The best part was that AD did the password change and propagated it out to the server. The problem for large environments was the 1:1 ratio of accounts to servers—which meant we could never use one on a SQL cluster, and it wasn’t necessarily practical (though a security best practice) to have a domain account for each SQL Server in your environment.

With Windows Server 2012, this concept has been expanded to Group Managed Service Accounts (gMSAs)—a single account that can be used by multiple servers. This makes it suitable for clusters* (the * is for AlwaysOn Availability Groups—more on that later), which is very important in my organization, as we are heavily clustered. Note—this process includes a number of steps that will require your Active Directory team, and the initial steps actually have to take place on a domain controller. I highly recommend showing this post to your Windows admin—it was an excellent resource.

The prerequisites for this are as follows:

  • The Active Directory schema in the gMSA domain’s forest has to be updated to Windows Server 2012 in order to create a gMSA (note this can be done on a 2008/2008 R2 domain controller). Send your admin here.
  • The Key Distribution Services (KDS) root key needs to be created. A domain admin needs to do this on a domain controller (see the sketch after this list). See here.
  • You have to wait 10 hours after creating this key—really, you do. You can advance the time in your domain for testing, but don’t do that.
  • The target servers for the gMSA need to be running Windows Server 2012.
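Here’s a rough sketch of the KDS root key step for your domain admin. Add-KdsRootKey is the cmdlet; the commented-out line is the lab-only trick for skipping the wait (again, don’t do that in production):

# Run on a domain controller, as a domain admin
Add-KdsRootKey -EffectiveImmediately

# Lab-only shortcut that backdates the key to skip the 10 hour wait
# Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))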

So after your domain admin has created the key, it’s time to create the gMSA. Run the following PowerShell command as admin (it may need to be run by your domain admin):

New-ADServiceAccount gmsa1 -DNSHostName membersrv1.domain.tld -PrincipalsAllowedToRetrieveManagedPassword ShortGroupName

Now you have your Group Managed Service Account:


Here’s the AD Information on it:
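If a screenshot isn’t handy, you can pull the same information with the AD PowerShell module. A quick sketch, using the gmsa1 name from above:

# Inspect the new account and who may retrieve its password
Get-ADServiceAccount gmsa1 -Properties * | Select-Object Name, DNSHostName, PrincipalsAllowedToRetrieveManagedPassword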

One thing I didn’t mention before is setting up groups. This is going to depend on the size of your environment and your management philosophy. In our case, I’m thinking about creating three groups—production, non-production, and mission critical. Some shops may create one for each cluster or server; it really depends on your needs.

Also, by default, a gMSA will change its password every 30 days—this is modifiable, but since it’s a system-managed password, I generally don’t want to mess with it. The beautiful thing about this is that when AD changes the password, you don’t have to touch your SQL Servers at all. It just works.
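If you do decide to change it, my understanding is that the interval can only be set when the account is created. Something like this, where gmsa2 and the 60-day value are purely for illustration:

# Hypothetical account with a 60 day password change interval
New-ADServiceAccount gmsa2 -DNSHostName membersrv1.domain.tld -ManagedPasswordIntervalInDays 60 -PrincipalsAllowedToRetrieveManagedPassword ShortGroupName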

One interesting note: you will need to add the Active Directory Domain Services role to your SQL Servers. This enables the PowerShell cmdlet you need to install the service account on each of your servers. You can do this by issuing the following PowerShell command:

Install-WindowsFeature -Name AD-Domain-Services -ComputerName $servername -IncludeManagementTools
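For what it’s worth, I believe the piece of that role you actually need is the Active Directory PowerShell module, which RSAT can give you on its own. That’s an assumption worth verifying in your environment:

# Assumption: the AD PowerShell module alone enables Install-ADServiceAccount
Install-WindowsFeature -Name RSAT-AD-PowerShell -ComputerName $servername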

After this, you need to install the account on each server, once again using PowerShell:

Install-ADServiceAccount gMSA1
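It’s worth confirming that the server can actually retrieve the managed password before handing the account to SQL Server:

# Returns True if this computer can use the gMSA
Test-ADServiceAccount gMSA1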

After that, when installing SQL Server, just search for the account in Active Directory. You need to make sure the “Service Accounts” object type is checked in the search dialog.

I’ve selected the Service Account—note that the password box is greyed out. You don’t enter the password (since only AD knows what it is).

I know a password shows up in Configuration Manager—don’t mess with it—it’s managed exclusively by Active Directory.

I have tested this configuration on standalone instances and SQL Server Failover Cluster Instances (only in SQL 2012, however), and it works fine. AlwaysOn Availability Groups, however, act a little weird. I will follow up with another blog post on this; initially everything configured fine, but when I tried to fail over, I got two really ugly SSPI errors, and then it just worked. I haven’t seen a statement from Microsoft on support for this feature, so stay tuned.

Cluster Aware Updating Part II—SQL Server Failover Cluster Instances

I recently wrote here about my experiences testing out the new Windows Server 2012 feature, Cluster Aware Updating, with SQL Server AlwaysOn Availability Groups. As you can see in that post, it didn’t go so well. I didn’t have the opportunity then to test the more traditional SQL Server Failover Cluster Instance.

Well, after the Thanksgiving holiday, I was able to get my infrastructure up and running and build a Failover Cluster Instance. Note—I was using SQL 2012 SP1 and Windows 2012, and used StarWind SAN software for the shared storage requirement. I am very happy to report that the Cluster Aware Updating process simply worked, and failed over the instance correctly. So in order to configure this, you simply need to configure Cluster Aware Updating and not do anything else to SQL Server. Details on Cluster Aware Updating can be found here.

I suspect the issue with AlwaysOn Availability Groups relates to the way they interact with the cluster service. In our traditional model of failover clusters, we do all of our instance control through Failover Cluster Manager; with AlwaysOn AGs, however, we are specifically instructed not to:

 

                                    -source http://msdn.microsoft.com/en-us/library/ff878487.aspx

So I think that Cluster Aware Updating attempting to manipulate the cluster service is what causes the errors. Just my thoughts—please share yours!

SQL Saturday #173—Washington DC

My Slides for the presentation are here.

Replication Subscribers and AlwaysOn–from Books Online.

Link to Starwind Documentation

Cluster Aware Updating and AlwaysOn Availability Groups

One of the features I was most looking forward to in Windows Server 2012 was Cluster Aware Updating. My company has a lot of Windows servers, and therefore a lot of clusters. When a big vulnerability happens and servers all need to be rebooted, we use System Center Configuration Manager to handle the reboots automatically. Unfortunately, clusters must maintain quorum to stay running, so rebooting them has generally been a manual process.

However, with Windows 2012 we have a new feature called Cluster Aware Updating that is smart enough to handle this for us. It allows us to define a cluster for patching, so we can tell our automated tools to update and reboot the cluster, or we can even just update and reboot manually. This seems like a big win—it was hard to test in earlier releases of Windows 2012, as updates weren’t available. So my question was how it would work with SQL Server. My first test (I’ll follow up with testing a SQL Server Failover Cluster Instance) was with my demo AlwaysOn Availability Groups environment.

The environment was as follows:

  • One Domain Controller (controlling the updates as well)
  • Two SQL Server 2012 SP1 nodes
  • No Shared Storage
  • File Share and Node Majority Quorum Model (File Share was on DC)
  • Updates downloaded from Windows Update Internet service

I ran into some early issues when I ran out of C: drive space on one of my SQL VMs; it was less than intuitive that the lack of storage was the issue, but I was able to figure it out and work through it. So I moved on to attempt #2. The process for how Cluster Aware Updating works is as follows (a sketch of kicking off a run by hand follows the list):

  • Scans both nodes looking for required updates
  • Chooses node to begin updates on (in my case it was the node that wasn’t the primary for my AG—not sure if that’s intentional)
  • Puts node into maintenance mode, pausing the node in the cluster
  • Applies Updates
  • Reboots
  • Verifies that no additional updates are required
  • Takes node out of maintenance mode.
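For reference, here is a minimal sketch of starting a run by hand from a management machine. The ClusterAwareUpdating module ships with the feature, and the cluster name here is hypothetical:

Import-Module ClusterAwareUpdating
# Preview which updates each node needs (read-only)
Invoke-CauScan -ClusterName SQLCLUSTER | Format-Table
# Kick off an updating run; -Force skips the confirmation prompt
Invoke-CauRun -ClusterName SQLCLUSTER -Force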

All was well when my node SQLCluster2 went through this process. When SQLCluster1 went into maintenance mode this happened:

When I logged into SQL Server on SQLCluster2 to check the Availability Group dashboard, I found this.

The Availability Group was in Resolving status. Mind you, the cluster still had quorum and was running. I couldn’t connect to the databases that are members of the AG; I could connect to the listener, but again the databases were inaccessible. The only option to bring these DBs online during the outage would have been to perform a manual forced failover to the other node, which may involve data loss. After the updating completed, the services did resolve themselves.

I was hoping Cluster Aware Updating would work a little more seamlessly than that. As far as I can tell, to avoid an outage I will need either manual intervention or some intelligent scripting to fail my AGs over ahead of time. Hopefully this will get resolved in forthcoming SPs and/or CUs.
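For the intelligent scripting option, a minimal sketch might look like the following. Run it against the replica you want to become primary (synchronous-commit only); the server and AG names are hypothetical:

Import-Module SQLPS -DisableNameChecking
# Fail the AG over to this replica before patching the current primary
Invoke-Sqlcmd -ServerInstance 'SQLCLUSTER2' -Query 'ALTER AVAILABILITY GROUP [AG1] FAILOVER;'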

**Update**–Kendal Van Dyke (b|t) messaged me and proposed that changing the failover and failback settings for the cluster (the number of failures that are allowed in a given time period) could resolve the issue. Unfortunately, I saw the same behavior described above.

Starting a SQL Server Clustered Instance in Single User Mode

One of my DBAs came to me the other day with an issue—he was trying to start an instance in single-user mode in order to do an emergency repair on a database, but he kept getting blocked when trying to use the Dedicated Administrator Connection (DAC). After a bit of research (to be honest, as much as I’ve worked with clusters, I’ve never had to do this on one), it turns out that the cluster resource uses the one connection available to SQL Server in single-user mode, much like the SQL Server Agent can.

So what do we do to get around this issue? Follow these steps.

Remote Desktop into the node of your cluster that currently owns the SQL Server service. You can do this either by connecting to the Cluster Network Name, or by checking who the Owner Node is in Failover Cluster Manager.

If you had a -m in your startup parameters, remove it now.

 

Take the cluster’s network name resource offline—note this is not the whole clustered service or role, just the resource. You will still need the disks and IP address that the cluster needs to run. Note—you also want to make sure the SQL Server service and SQL Agent service are stopped; if your dependencies are correct this should happen anyway, but confirm it.

Launch a command prompt, and issue the following command: net start MSSQLServer /m
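One caveat: if you’re running a named instance, the service name differs. For a hypothetical instance named INST1 it would be:

net start MSSQL$INST1 /m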

Launch your DAC connection and do whatever work you need to do.

When you are done, stop the SQL Server service from your command window.

Go back to Failover Cluster Manager and bring your cluster resource back online. Then restart SQL Server and SQL Agent. Green is good—if you have any services that don’t restart, check the cluster and application logs in Event Viewer.

Hopefully, you don’t have to use the DAC that often—it’s usually a pretty bad situation when you do, but hopefully this post will save you some hassles when trying to do it.

Recovering From a Deleted Cluster Resource

I recently had the joy of hearing about a security issue on one of our SQL Server Failover Cluster Instances. In an attempt to resolve the problem, someone inadvertently deleted the clustered SQL service from Failover Cluster Manager. Why? I have no idea—but I set about trying to figure out how to recover from it. Note—all of these screenshots are from Windows Server 2012 and SQL Server 2012, but the same technique applies in 2008 R2. I don’t have a 2003 cluster to test on, and I think the procedure may be a little more manual there (you may have to make a registry entry).

So how do we delete the service? Easy—right-click on it and select Remove.

Windows will try to warn us, but for the purposes of science we will continue:

After we click Yes, the cluster deletes the service, and SQL Server goes into sad panda mode. This happens because SQL Server was dependent on our virtual server name, which disappeared when we deleted the service. I’m not sure why the Agent looks like it’s still running—it too is down.

A quick look over at our domain controller shows that our cluster object (SQL2012CL1) is still in AD, so no recovery is needed there. However, our DNS entry is gone.

So where do we start the repair process? Select Roles in Failover Cluster Manager, and then select Configure Role

Select the role “Other Server”

Set your Client Access Point—use the same name and IP address as your previous service.

Select your applicable disks—this is my demo instance, so I only have one cluster LUN.

Here’s where it starts to get good. We select the SQL Server and SQL Server Agent resource types.

It looks like we may be done—we aren’t.

This is where we need to set our dependencies and properties. First, right-click on “New SQL Server” and select Properties.

SQL is going to be dependent on the network name, the IP address, and the Disk that our cluster uses all being online.

So we add those—next, go to the Properties tab in that same window. We need to set VirtualServerName and InstanceName to their correct values.

Lastly, I set the service name to the right value, but I don’t think it matters.
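If you’d rather script the property step than click through the GUI, the FailoverClusters module can do it. A sketch using my demo names (adjust to yours):

Import-Module FailoverClusters
# Set the instance properties on the recreated SQL Server resource
Get-ClusterResource 'New SQL Server' | Set-ClusterParameter -Multiple @{ VirtualServerName = 'SQL2012CL1'; InstanceName = 'MSSQLSERVER' }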

We need to do a similar process for the SQL Agent. It is only dependent on the SQL Server service.

We also need to define the parameters for it.

Then we can try to start everything up and yes, it’s all green!!

Hopefully this never happens to you, and it shouldn’t, but in the event that it does, you now know what to do.

Windows Server 2012 Impressions, Part 2

I’ve been doing some testing with Windows 8, and after fighting with VMware for a while yesterday, I was able to get a SQL 2012 Failover Cluster Instance up and running on two nodes. All in all, the process wasn’t that different from building a cluster on Windows 2008 R2, but there are a couple of things to note.

One of them is that the new Windows file system, ReFS, which Denny Cherry blogged about here, is not supported for SQL cluster disks—I had my data disk formatted with ReFS, and the SQL installer showed the dreaded red X for shared disks. When I reformatted the same disk with NTFS, everything worked fine. I’m not sure what will happen if you try a standalone SQL install to an ReFS disk; I’ll try that later.

The major issue I ran into was getting .NET 3.5 installed so the SQL installation could complete successfully. This has been well documented by Aaron Bertrand (b|t) here and by Allan Hirt (b|t) here.

When trying to add the “.NET Framework 3.5” feature from Server Manager, the installation fails with the error message “The source files could not be downloaded.”

The workaround for this is to run the following command (assuming your Windows 8 media is mounted under D:\):

dism.exe /online /enable-feature /all /featurename:NetFX3 /Source:d:\sources\sxs /LimitAccess
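If you’d rather stay in PowerShell, I believe the equivalent feature install is the following (same media assumption):

# Pointing at the media's side-by-side store, as with the dism command
Install-WindowsFeature NET-Framework-Core -Source d:\sources\sxs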

Aside from those two minor issues, the install went pretty well, and everything worked as expected.

In my earlier post “Adding a New Disk to a SQL Cluster Instance“, I needed to clarify something based on a comment. For any drive that gets added to your cluster that is not a mount point—as in, it occupies a physical drive letter—the SQL Server service needs to be made dependent on it. I didn’t think to mention that, because I recommended using mount points.

Thanks to reader Shaun Archer in the West Indies for pointing that out in the comments. I’d offer to send him some rum, but I believe he has a better supply chain than I do.

SQL Server Cluster SAN Migration–Part I The Easy Way

Recently, we got a new SAN (yeah!!!). Unfortunately, our SAN vendor’s migration utilities are priced at such a level that there was no way we were going to use them, so we had to resort to manually moving the disks. Fortunately, in this instance all of the directory names and data files were the same, so there was no need to do anything internal to SQL Server. What happens when you have to make those changes will be Part II of this post.

I would feel remiss not to include a statement about discussing your storage layout and performance needs with your Storage Administrator, but that’s outside the scope of this particular blog post.

Of course, with any activity like this, the #1 rule is TAKE A BACKUP BEFORE DOING ANY OF THIS! This is a non-destructive process, but better safe than sorry.

If you don’t need to change any directory names, the process is as simple as this:

  1. Mount the drives in your cluster (see my post here on how to do that)
  2. Take your SQL Server service offline—I used Failover Cluster Manager to do this
  3. Copy the directories from your source drive to your target drive
  4. Remove the older drives from your cluster service and take them offline (this isn’t a necessity, but I do it to prevent any possible conflict). You can bring these drives back if needed—you are not destroying any data
  5. Select the drive you had chosen as the root drive (S:\) and change the drive letter to match what your SQL Server instance is expecting. The mount points underneath it should be maintained
  6. Make the SQL Server service dependent on the root disk. To do this, right-click on the SQL Server service name > Dependencies tab > Add. See the screenshot below (a PowerShell equivalent is sketched after this list)
  7. Restart your SQL Server service—everything should be ready to go
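Here is the PowerShell equivalent of step 6 above. The resource and disk names are from my demo, so check yours with Get-ClusterResource first:

Import-Module FailoverClusters
# Make the SQL Server resource depend on the new root disk
Add-ClusterResourceDependency -Resource 'SQL Server (MSSQLSERVER)' -Provider 'Cluster Disk 1'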

 

If anything fails, investigate the event logs for Windows and the error log for SQL Server, and resolve any issues. A common problem would be a missing dependency or a directory not being copied (generally SQL binary files).

Of course, this is assuming you do not have to change directory names or move any databases. More on that next week.
