Windows MPIO Information: Encrypted, Really?

We are in the process of trying to get all of our Windows 2008 and higher servers off EMC PowerPath and onto the native Windows MPIO driver. While PowerPath has some benefits, it's also very expensive, and we've run into some versioning issues that have caused a couple of outages. Since Microsoft started including a driver (that generally works really well) in Windows 2008, we are using that as our standard.

There were some questions from one of my engineers about the paths looking different between PowerPath and the native driver, so I went first to the MPIO configuration tool (Control Panel > Admin Tools > MPIO). It doesn't provide a great deal of detail, but there is the option to capture a snapshot of the current MPIO configuration. Well, I tried to run this and got "System Error 5 has occurred. Access is Denied."

Given that I was a local admin on the server, I was really curious as to why this was failing. So I went to the command line, where I could use the MPCLAIM tool. The -v flag gathers the config information and exports it to a file. Same error: Access is Denied, Error 5. So I went to the googles, and I found this from a Microsoft employee in a forum post.
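
For reference, the command-line version of what I was trying to run looks like this (the output path is just an example; use whatever location you like):

mpclaim -v C:\temp\mpio_config.txt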

Microsoft, for whatever reason, decided to encrypt the temp file that is generated. It turned out that in my case we had an issue with an expired Data Recovery Agent certificate, which was preventing the file from being encrypted. If you are trying to gather MPIO config information, you need to have the ability to encrypt. So if you run into this, you probably need to contact your friendly domain admin.

Does anyone have any ideas on why Microsoft would want to encrypt this file? The only logical thing I could conclude is that it could contain configuration information, but that info tends to be accessible through other methods (HBA clients, WMI, etc.).


Availability Group DMVs, Part 1 of Many: sys.availability_databases_cluster

After some recent discussions with colleagues Tom LaRock (b|t) and Kendal Van Dyke (b|t) about how the information around AlwaysOn Availability Groups is lacking in the GUI, I thought I would do some community service and start documenting the DMVs a little further than Books Online does. One thing you will notice in this series is that the naming is not always consistent; some DMVs still reflect the original HADRON name that Microsoft was using for Availability Groups.

So let's get into the DMVs, starting with sys.availability_databases_cluster:

This DMV contains one record for each Availability Group and database hosted on a given Windows cluster, so it should look the same on each replica in your availability group. The columns are the availability group's unique ID (group_id), the unique ID of the database within the group (group_database_id), and the name of the database (database_name). Here's a look at a SharePoint AG.
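
If you want to pull the same thing yourself, a quick query against the DMV does it. Here it is run through PowerShell's Invoke-Sqlcmd (from the SQLPS module); the instance name is just a placeholder:

# One row per database, per availability group, defined on this cluster
Invoke-Sqlcmd -ServerInstance "SQLCLUSTER1" -Query "SELECT group_id, group_database_id, database_name FROM sys.availability_databases_cluster;"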

This data isn’t too interesting—as we get further into this blog series, I’ll show how we can tie these DMVs together to get relevant data.

Appliances—Unitaskers or Problem Solvers?

What is a Unitasker?

One of my favorite hobbies is cooking; I've blogged before about how project planning isn't that different from planning a dinner party. I am pretty into the technical side of cooking, and one of the principles I try to follow in equipping my kitchen is to avoid the dreaded "unitasker" syndrome. For a tool to make its way into my kitchen, it needs to either be versatile or do what it does incredibly well. A good example of a bad unitasker is the margarita maker: I already have two other tools (a cocktail shaker and a blender) that can accomplish the task of making a margarita, so why do I need a specialized appliance that doesn't really do anything else? My knives and my Vita-Prep do a lot of things really well, not just one thing. So why am I writing about this on my database blog? We have the same phenomenon going on in IT right now.

The Bad Old Days

Back in the bad old days, servers were really expensive and came with even more expensive support contracts. This was especially true if you were on the dark side and managed UNIX servers, which required RISC-based hardware. In their time these servers brought really outstanding performance, but they also came with hefty vendor lock-in: you paid for your O/S support from the hardware vendor, and you had to buy any additional spare parts through the vendor. In 2009, which really isn't that long ago, my former company had to pay $36,000 to put 16 GB of memory into a mid-range UNIX server. We could have purchased the same exact memory for the same vendor's x64 server for about $2,000. Support on those four servers was also north of $200k/yr. By now most enterprises have seen the light and are using commodity hardware, running some combination of Linux and Windows as their application stack demands. The amount of hardware that can be purchased for $10,000 now completely floors me (I spec'ed some servers in this price range with 8 really fast cores and 256 GB of RAM the other week). So now that hardware is cheaper (and service on those commodity servers is really minimal, less than $1,000/yr in most cases), the margins of the hardware manufacturers are way down. So where do they make up that margin? Margarita machines, er, hardware appliances.

Computing Appliances

The first exposure I had to dedicated computing platforms was when I worked in pharmaceutical manufacturing; the appliances in question were physically hardened and protected against extreme conditions. They were also easy to deal with from a validation perspective. So those devices made sense, and since they were more along the lines of workstations, they didn't really affect our IT buying. The first big, splashy product launch of an appliance I came across was Exadata, the HP Oracle Database Machine (this was prior to Oracle's acquisition of Sun). The original specs for this machine really targeted large data warehouse environments (it's still pretty good at that), but then the sales and marketing folks came in, Oracle bought Sun, and now we have ads on the front of the Wall Street Journal proclaiming "the World's Fastest Database Machine". It's really not; see my friend Kevin Closson's blog if you are really interested.

Me too!!

Microsoft introduced a similar product with Parallel Data Warehouse in 2010. The Microsoft product is slightly different, as the hardware is purchased through a hardware vendor rather than the database vendor. SAP has its own offering with the BW Accelerator, designed to speed up in-memory processing of large data warehouse queries. One difference between these two products and Oracle's is the option to choose among hardware vendors. There are numerous other appliances from numerous vendors, and in the big data space, many vendors have solutions for Hadoop.

What Do Appliances Have in Common?

Some details differ, but most of the appliances I've seen have the following in common:

  • Purchased as a whole unit, typically in half rack or whole rack configuration
  • A large upfront cost, either directly to the vendor, or to a third party consulting partner, to “configure” the appliance for your environment
  • Ongoing support costs that are much higher than on commodity hardware
  • An expansion plan that consists of buying another really expensive appliance, with all of the above costs
  • Sales reps from major vendors seem heavily incentivized to sell them, even where they don’t fit—so be wary

Benefits?

I've been trashing these appliances throughout this post, but they do have some benefits. If you have the right type of workload (from a database perspective this tends to be a large data warehouse) and, more importantly, a singular workload (not mixed OLTP/DW), these machines can really offer some great performance gains, at the cost of flexibility and lots of money.

Things to Think About

I work for a large enterprise; we buy a whole lot of servers (which means our server costs are really low), and we have a couple of appliances. The appliances don't fit well into our monitoring and standards, and when workloads get onto them they tend to get stuck there. To me, appliances represent the ultimate case of vendor lock-in.

While there can be performance gains, the main reason I see for these appliances is for hardware vendors to recoup some of the lost profit margin I mentioned above. The best example I've seen is with Hadoop, which was basically designed around cheap commodity hardware. Every major HW vendor now has some sort of appliance for Hadoop, which costs a ton more than building your own, with no real functionality improvements.

So think long and hard when your friendly sales rep or VP/CIO tries to convince you that an appliance is the solution that will resolve all of your data problems.


Group Managed Service Accounts

One of the really enjoyable things about my current job is being able to work on some of the latest technology as it comes out. My team owns the infrastructure build processes, so I get to do fun things like building proof-of-concept environments, SAN testing, and working with new versions of things. We've been spending a lot of time working with some of the new features of Windows Server 2012, in an effort to get it ready for production rollout. We're ready to go, but we are still waiting on our antivirus team.

For those of you who work in organizations with any sort of regulatory concern (which is pretty much any public corporation), you know that managing the accounts that own SQL Server services can be a bit of a nightmare. You have to maintain passwords, and if a password changes at any point in time, it's a very manual process to change it on each of your servers. It's best security practice to change these passwords on a regular basis, particularly when a member of the DBA team leaves. However, changing a password requires touching every server in some fashion; you can automate the process, but it's still kind of painful. Also, if that account ever gets locked out, it can be really painful.

Starting with Windows Server 2008 R2, the concept of Managed Service Accounts came into play. These were accounts whose passwords lived inside Active Directory, set to change on a regular basis but controlled by the system. The best part was that AD did the password change and propagated it out to the server. The problem with this for large environments was the 1:1 ratio of accounts to servers, which meant we could never use it on a SQL cluster, and it wasn't necessarily practical (though a best security practice) to have a domain account for each SQL Server in your environment.

With Windows Server 2012, this concept has been expanded to Group Managed Service Accounts (gMSAs), where one account can be used by multiple servers. This makes it suitable for clusters* (the * is for AlwaysOn Availability Groups; more on that later), which is very important in my organization, as we are heavily clustered. Note that this process includes a number of steps which will require your Active Directory team; the initial steps actually have to take place on a domain controller. I highly recommend showing this post to your Windows admin; it was an excellent resource.

The prerequisites for this are as follows:

  • The Active Directory schema in the gMSA domain’s forest has to be updated to Windows Server 2012 in order to create a gMSA (note this can be done on a 2008/2008R2 Domain Controller). Send your Admin here.
  • The Key Distribution Services (KDS) root key needs to be created. A domain admin needs to do this on a domain controller. See here. (There's a quick PowerShell sketch of this step after the list.)
  • You have to wait 10 hours after creating this key—really you do. You can advance the time in your domain for testing, but don’t do that.
  • The target servers for the gMSA need to be running Windows Server 2012.
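
As a sketch, the KDS root key step looks like this in PowerShell (run by a domain admin on a domain controller). The backdated EffectiveTime variant is a lab-only trick for skipping the 10-hour wait; don't use it in a real domain:

# Production: create the key, then wait the ~10 hours for it to be usable everywhere
Add-KdsRootKey -EffectiveImmediately

# Isolated test lab only: backdate the key so it is usable immediately
# Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))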

So after your domain admin has created the key, it's time to create the gMSA. Run the following PowerShell command as admin (it may need to be run by your domain admin):

New-ADServiceAccount gmsa1 -DNSHostName membersrv1.domain.tld -PrincipalsAllowedToRetrieveManagedPassword ShortGroupName

Now you have your Group Managed Service Account:


Here’s the AD Information on it:
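
You can also pull the same details straight from PowerShell with Get-ADServiceAccount (using the example account name from above):

# Show all AD properties for the gMSA, including who can retrieve its password
Get-ADServiceAccount gmsa1 -Properties *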

One thing I didn't mention before is setting up groups. This is going to depend on the size of your environment and your management philosophy. In our case, I'm thinking about creating three groups: production, non-production, and mission critical. Some shops may create one for each cluster or server; it really depends on your needs.
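
Purely as an illustration, creating one of those groups and dropping a SQL Server's computer account into it looks something like this (every name here is made up):

# Create a security group whose members are allowed to use the gMSA
New-ADGroup -Name "SQL-Prod-gMSA-Servers" -GroupScope Global -GroupCategory Security

# Add a SQL Server's computer account (note the trailing $) to the group
Add-ADGroupMember -Identity "SQL-Prod-gMSA-Servers" -Members "SQLPROD1$"

# The server has to pick up the new group membership; a reboot is the simple way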

Also, by default, a gMSA will change its password every 30 days. This is modifiable, but since it's a system-managed password, I generally don't want to mess with it. The beautiful thing is that when AD changes the password, you don't have to touch your SQL Servers at all. It just works.

One interesting note: you will need to add the Active Directory Domain Services role to your SQL Servers. This brings along the Active Directory PowerShell module, which has the cmdlet you need to install the service account on each of your servers. You can do this by issuing the following PowerShell command:

Install-WindowsFeature -Name AD-Domain-Services -ComputerName $servername -IncludeManagementTools

After this, you need to install the account on each server, once again using PowerShell:

Install-ADServiceAccount gMSA1
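
If you want to verify that the server can actually retrieve the gMSA's password before going any further, there's a quick check (it should return True):

Test-ADServiceAccount gmsa1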

After that, when installing SQL Server, just search for the account in Active Directory. You need to make sure the "Service Accounts" object type is checked in the object picker.

I’ve selected the Service Account—note that the password box is greyed out. You don’t enter the password (since only AD knows what it is).

I know a password shows up in Configuration Manager; don't mess with this, as it's managed exclusively by Active Directory.

I have tested this configuration with standalone instances and SQL Server Failover Cluster Instances (only on SQL 2012, however), and it works fine. AlwaysOn Availability Groups, however, act a little weird. I will follow up with another blog post on this, but initially everything configured fine; then, when I tried to fail over, I got two really ugly SSPI errors, but after that it just worked. I haven't seen a statement from Microsoft on support for this feature, so stay tuned.

Vendors, Again—8 Things To Do When Delivering a Technical Sales Presentation

In the last two days, I've sat through some of the most horrific sales presentations I've ever seen; this was worse than the timeshare in Florida. If you happen to be a vendor and are reading this (especially if you are a database vendor; don't worry, it wasn't you), I hope this helps you craft better sales messages. In one of these presentations, the vendor had a really compelling product that I still have interest in, but I was really put off by bad sales form.

I'll be honest, I've never been in sales. I've thought about it a couple of times, and I would still consider it if the right opportunity came along, but I do present, a lot. Most of these things apply to technical presentations as well as sales presentations. So here goes.

The top 8 things to do when delivering a sales presentation:

  1. Arrive Early—ask the meeting host to book your room a half hour early and let you in. This way you can get your connectivity going and everything started before the meeting actually starts, rather than wasting the attendees' valuable time and, more importantly, cutting into your time to deliver your sales message. Also, starting on time allows you to respect your attendees' schedules on the back end of the presentation.
  2. Bring Your Own Connectivity—if you need to connect to the internet (and if you have remote attendees, you do), bring your own connectivity. Mobile hotspots are widely available, and if you are in sales you are out of the office most of the time anyway, so consider it a good investment.
  3. Understand Your Presentation Technology—please understand how to start a WebEx and share your presentation. If you have a Mac, bring any adapters you need to connect to video. If you want to use PowerPoint presenter mode (great feature, by the way), make sure the audience sees your slides and not the presenter view. Not being able to do this is completely inexcusable.
  4. Understand Who Your Audience Is—if you are presenting to very senior infrastructure architects at a large firm, you probably don't need to explain why solid state drives are faster than spinning disks. Craft your message to your intended audience, especially if it has the potential to be a big account. Also, if you know you are going to have remote attendees, don't plan on whiteboarding anything unless you have access to some electronic means to do so; otherwise you are alienating half of your audience.
  5. Don’t Tell Me Who Your Customers Are—I really don’t care that 10 Wall St banks use your software/hardware/widget. I think vendors all get that same slide from somewhere. Here’s a dirty little secret—large companies have so many divisions/partners/filing cabinets that we probably do own 90% of all available software products. It could be in one branch office that some manager paid for, but yeah technically we own it.
  6. I Don’t Care Who You Worked For—While I know it may have been a big decision to leave MegaCoolTechCorp for SmallCrappyStorageVendor, Inc., I don’t really care that you worked for MegaCoolTechCorp. If you mention it once, I can deal with it, but if you keep dropping the name it starts to get annoying and distracting.
  7. Get on Message Quickly—don't waste a bunch of time telling me about marketing, especially when you go back to point #4, knowing your audience. If you are presenting to a bunch of engineers, they want to know about the guts of your product, not what your company's earnings were. Like I mentioned above, one of the vendors I've seen recently has a really cool product, which I'm still interested in, but they didn't start telling me about the product differentiation until 48 minutes into a 60-minute presentation.
  8. Complex Technical Concepts Need Pictures—this is a big thing with me. I do a lot of high availability and disaster recovery presentations—I take real pride in crafting nice PowerPoint graphics that take a complex concept like clustering and simplify it so I can show how it works to anyone. Today’s vendor was explaining their technology, and I was pretty familiar with the technology stack, yet I got really lost because there were no diagrams to follow. Good pictures make complex technical concepts easy to understand.

I hope some vendors read this and learn something. A lot of vendors have pretty compelling products but fail to deliver the sales message, and that is costing them money. I don't mind listening to a sales presentation, even from a vendor I may not buy something from, but I do really hate sitting through a lousy presentation that distracts me from the product.

SQL Saturday #200 – Philadelphia 2013

It is with great pride that I announce the 200th SQL Saturday today. It's with even more exuberance that I announce that it will be my user group's event in Philadelphia (well, Malvern, to be specific), on June 1st, 2013, at Microsoft, where we had the event last year.

The webpage is here.

I had a great time running last year’s event, and was really happy with how it went, and I look forward to putting on another great event. We will do something special, since this is #200 and we are in the bicentennial city.

Cluster Aware Updating Part II—SQL Server Failover Cluster Instances

I recently wrote here about my experiences testing out the new Windows Server 2012 feature, Cluster Aware Updating, with SQL Server AlwaysOn Availability Groups. As you can see in that post, it didn't go so well. I didn't have the opportunity then to test the more traditional SQL Server Failover Cluster Instance.

Well, after the Thanksgiving holiday, I was able to get my infrastructure up and running and build a Failover Cluster Instance. Note: I was using SQL 2012 SP1 and Windows 2012, and used StarWind SAN software for the shared storage requirement. I am very happy to report that the Cluster Aware Updating process simply worked and failed over the instance correctly. So in order to configure this, you simply need to configure Cluster Aware Updating and not do anything else to SQL Server. Details on Cluster Aware Updating can be found here.
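
For reference, configuring it from PowerShell is straightforward with the ClusterAwareUpdating module; the cluster name below is just a placeholder:

# Kick off an on-demand updating run against the whole cluster
Invoke-CauRun -ClusterName "SQLFCICLUSTER" -MaxFailedNodes 1 -RequireAllNodesOnline -Force

# Or enable the self-updating role so the cluster patches itself on a schedule
# Add-CauClusterRole -ClusterName "SQLFCICLUSTER" -DaysOfWeek Sunday -WeeksOfMonth 2 -Force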

I suspect the issue with AlwaysOn Availability Groups relates to the way they interact with the cluster service. In our traditional failover cluster model, we do all of our instance control through Failover Cluster Manager; however, with AlwaysOn AGs we are specifically instructed not to:

 

                                    -source http://msdn.microsoft.com/en-us/library/ff878487.aspx

So I think that, since Cluster Aware Updating is manipulating things through the cluster service, that is what is causing the errors. Just my thoughts; please share yours!

The Best Consulting Gig Ever

Before I knew anything about anything, I signed up for a site called Elance. It's a reverse-auction site for contract/consulting-type work. From seeing the prices on the site, I can't imagine the quality of service is very high, and it probably ends up with questions getting asked like "Do you have any MSSQL instances running on Linux?" (oops, that happened at my real job today). Anyway, I hadn't paid any mind to Elance in years, but I got an email this morning saying that someone had invited me to submit a proposal for a "DBA Performance Architect" role. Since the title looked interesting, I read the email.

DBA Performance Architect

Fixed price: Less than $500  |  Database Development  |  Posted: Apr 19, 2012
  some guy,    United Arab Emirates
XYZ LLC requires all candidates to complete a 1 week test assignment as part of our interview process. You will be paid for the test assignment ONLY if we decide to hire you based on the test assignment results. If we decide not to hire you, there will be no payment made. 

The XYZ LLC DBA Performance Architect role pays $30/hr ($1200 fixed price payment if your trial performance leads to a successful hire). Please note that we require signup or an existing affiliation with an approved staffing and payment platform that charges a percentage fee on your gross earnings that will impact your net payment.

The XYZ LLC DBA Performance Architect position is based around improving application performance by improving the performance of the database. Job includes increasing database performance through, database deployment and configuration, optimizing SQL and stored procedures, indexing and schema based optimization, and database tools optimization.

This role requires experience with Oracle, Postgres, Microsoft SQL Server, SQL, stored procedures, mysql.

In addition, this role has the following non-technical requirements

English – all candidates must be able to speak and write capably in English. English need not be the native first language, but it should be sufficient to enable technical discussion.
Video – candidates must have the computer hardware and networking bandwidth to conduct a seamless video skype conversation for team communication. Ongoing use of webcam for billing and skype video are required.
Full Time – This job is only offered for a Full Time basis (40 hrs/wk).

Qualification for this role begins with an unpaid testing phase where candidates are required to provide verifiable identification, complete online skills testing, and attend a brief Skype video interview. The testing phase lasts for no more than a week, and highly motivated candidates can complete all requirements of the testing phase within a few hours.

Candidates who pass our testing phase will be offered a one week fixed price trial assignment. Candidates who successfully complete our trial assignment within the given time frame will be offered the full time position at 40 hrs/wk at the jobs hourly rate. Those who show progress and dedication during the trial phase but fail to complete the assignment successfully may be offered an alternate position within XYZ LLC.

If interested, simply click “Apply for this Position” below.

http://tbe.taleo.net/NA5/ats/careers/requisition.jsp?org=DEVFACTORY&cws=1&rid=104
——————————————
Added 18 MAY 2012, 14:55 PM EDT
The pay rate for United States citizens and permanent residents is $24/hr to cover W-2 processing.

There are just so many things wrong with this that I don't know where to start. A one-week free trial phase to make sure I know what I'm doing, and this only for a one-week fixed-price assignment (at less than $500). Wow, and then they're only paying $30/hr; oh wait, you're in the US, so make that $24/hr. I can barely hire someone who can turn on a computer for that in the US, much less someone who knows Oracle, Microsoft SQL Server, and Postgres. Also, I have the distinct feeling that if you were to resolve their performance issues in the first unpaid testing week, then surprise, you don't pass the test assignment, and you've fixed their problems for free.

I’ve seen some really crappy job postings in the past, but this one takes the cake. Good luck hiring for this XYZ.

Cluster Aware Updating and AlwaysOn Availability Groups

One of the features I was most looking forward to in Windows Server 2012 was Cluster Aware Updating. My company has a lot of Windows servers, and therefore a lot of clusters. So when a big vulnerability happens and they all need to be rebooted, we use System Center Configuration Manager to handle the reboots automatically. Unfortunately, clusters must maintain quorum to stay running, so rebooting them has generally been a manual process.

However, with Windows 2012 we have a new feature called Cluster Aware Updating that is smart enough to handle this for us. It allows us to define a cluster for patching, so we can tell our automated tools to update and reboot the cluster, or we can even just update and reboot manually. This seems like a big win; it was hard to test in the earlier pre-release builds of Windows 2012, as updates weren't available. So my question was how it would work with SQL Server. My first test (I'll follow up with testing a SQL Server Failover Cluster Instance) was with my demo AlwaysOn Availability Groups environment.

The environment was as follows:

  • One Domain Controller (controlling the updates as well)
  • Two SQL Server 2012 SP1 nodes
  • No Shared Storage
  • File Share and Node Majority Quorum Model (File Share was on DC)
  • Updates downloaded from Windows Update Internet service

I ran into some early issues when I ran out of C: drive space on one of my SQL VMs; it was less than intuitive that the lack of storage was the issue, but I was able to figure it out and work through it. So I started attempt #2. The process for how Cluster Aware Updating works is as follows (there's a quick scan example after the list):

  • Scans both nodes looking for required updates
  • Chooses node to begin updates on (in my case it was the node that wasn’t the primary for my AG—not sure if that’s intentional)
  • Puts node into maintenance mode, pausing the node in the cluster
  • Applies Updates
  • Reboots
  • Verifies that no additional updates are required
  • Takes node out of maintenance mode.
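
By the way, the scan step can be run on its own to preview what CAU would install on each node, which is handy when you're testing (the cluster name is a placeholder):

# Preview the updates CAU would apply to each node, without installing anything
Invoke-CauScan -ClusterName "SQLAGCLUSTER" -CauPluginName Microsoft.WindowsUpdatePlugin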

All was well when my node SQLCluster2 went through this process. Then, when SQLCluster1 went into maintenance mode, this happened:

When I logged into SQL Server on SQLCluster2 to check the Availability Group dashboard, I found this.

The Availability Group was in Resolving status. Mind you, the cluster still had quorum and was running. I couldn't connect to the databases that are members of the AG, and while I could connect to the listener, the databases were again inaccessible. The only option to bring these DBs online during the outage is to perform a manual forced failover to the other node, which may involve data loss. After the updating is completed, the services do resolve themselves.

I was hoping Cluster Aware Updating would work a little more seamlessly than that. As far as I can tell, to avoid an outage I will need either manual intervention or some intelligent scripting to fail my AGs over ahead of time (a rough sketch of that idea is below). Hopefully this will get resolved in forthcoming SPs and/or CUs.
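
Purely as a sketch of the idea (not something I've put into production), the pre-patching step could use the SQLPS provider to move the AG to the replica that isn't about to be patched; every name here is a placeholder:

Import-Module SQLPS -DisableNameChecking

# Planned (no data loss) failover: run against the secondary that should become primary
Switch-SqlAvailabilityGroup -Path "SQLSERVER:\Sql\SQLCLUSTER2\DEFAULT\AvailabilityGroups\DemoAG"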

**Update: Kendal Van Dyke (b|t) messaged me and proposed that changing the failover and failback settings for the cluster (the number of failures that are allowed in a given time period) could resolve the issue. Unfortunately, I saw the same behavior that I saw above.