Last week I encountered a unique application requirement—we have a database environment configured with AlwaysOn Availability Groups for high availability (HA) and disaster recovery (DR), and our application was going to be creating and dropping databases from the server on a regular basis. So, I had to develop some code to handle this process automatically. I had a couple of things going in my favor on this—it was our custom developed application that was doing this, so I knew the procedures I was writing would be called as part of the process, and additionally, our Availability Group was only two nodes at the moment, so my version 1 code could be relatively simplistic and work. Aaron Bertrand (b|t), posted on this stack exchange thread, his code is a good start. I’m not going to put all of my code in this post—it’s ready for our prime time, but I have a few more fixed I’d like to make before releasing the code into the wild.
Dropping a Database
First of all, I need to say it’s very important to secure these procedures—they can do bad things to your environment if run out of context—particularly this one. I denied execute on everyone except the user who would be calling the procedure. I didn’t want any accidents happening. Dropping a database from an availability group is slight different then doing it from a standalone server. The process is as follows:
- Remove the database from the availability group (on the primary)
- Drop the database from the primary
- Drop the database from the secondary node(s)
Since, we have to initiate this from the primary instance we need to find out two pieces of data—1) what availability group are we removing the database from and 2) is the instance we are on the primary instance. In my code—I didn’t actually have to drop the database from the primary server. That piece was being called from another proc. So, I just had to remove from the availability group, and remove on the secondary. There are a number of ways to connect the secondary database—this needs to happen in SQLCMD mode, which isn’t possible in a stored procedure. We could use a linked server, or we could enable xp_cmdshell and run a SQLCMD script, and then disable xp_cmdshell. This isn’t my favorite technique from a security perspective, but I was under a time crunch, and the procedure is locked down, but in the future I will probably rebuild this with linked servers (created within the procedure)
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[RemoveDBfromAG] @dbname VARCHAR(50)
AS
BEGIN
EXEC sp_configure 'show advanced options'
,1
RECONFIGURE
WITH OVERRIDE
EXEC sp_configure 'xp_cmdshell'
,1
RECONFIGURE
WITH OVERRIDE
DECLARE @dbid INT
DECLARE @groupid VARCHAR(50)
DECLARE @groupname VARCHAR(50)
DECLARE @server VARCHAR(50)
DECLARE @primary VARCHAR(50)
DECLARE @secondary VARCHAR(50)
DECLARE @sql NVARCHAR(max)
DECLARE @dropsecondary VARCHAR(500)
SELECT @dbid = database_id
FROM sys.databases
WHERE NAME = @dbname
SELECT @groupid = group_id
FROM sys.availability_databases_cluster
WHERE database_name = @dbname
SELECT @groupname = NAME
FROM sys.availability_groups
WHERE group_id = @groupid
SELECT @server = @@SERVERNAME
SELECT @primary = primary_replica
FROM sys.dm_hadr_availability_group_states
WHERE group_id = @groupid
SELECT @secondary = node_name
FROM sys.dm_hadr_availability_replica_cluster_nodes
WHERE node_name != @primary
SELECT @sql = 'alter availability group ' + @groupname + ' remove database [' + @dbname + ']'
SELECT @dropsecondary = 'sqlcmd -S "' + @secondary + '" -E -Q "exec ReturnsTestInstanceDropSecondaryDB [' + @dbname + ']"'
IF NOT EXISTS (
SELECT primary_replica
FROM sys.dm_hadr_availability_group_states
WHERE primary_replica = @primary
)
BEGIN
RETURN
END
ELSE
BEGIN
EXECUTE sp_executesql @sql
WAITFOR DELAY '00:00:25'
EXEC xp_cmdshell @dropsecondary
EXEC sp_configure 'xp_cmdshell'
,0
RECONFIGURE
WITH OVERRIDE
END
END
GO
The one particularly unique thing you will notice in the code—is a “WAITFORDELAY”—what I observed is that after the secondary database is removed from the availability group, it goes into recovering for about 10 seconds—and we are unable to drop a database while it’s in recovery. By implementing that wait (the T-SQL equivalent of a sleep command) the database was able to be dropped.
Adding a New Database
Adding a new database has similar requirements—we have a couple of additional factors though. We have to verify that the instance we are on, is the primary for the availability group we are adding the database into. This is where I really need to fix my code—it assumes that there are only two nodes in our availability group cluster. I need to refactor the code to potentially loop through the other four secondaries (or 7 if we are talking about SQL Server 2014). Also, I’m using a linked server connection—this also assumes that the drive letter and path to the data file directory on the secondary are also the same. To summarize, the process is as follows:
- Accept the availability group and database names as input parameters
- Verify that the node you are working on is the primary for the availability group
- Backup the database and transaction log of the database to file share you’d like to add to the availability group
- Add the database to the availability group on the primary
- Restore the database and transaction log to the secondary with norecovery
- Alter the database to set the availability group
My code for this isn’t totally flushed out—it works in my environment, but I don’t think it’s ready for sharing. I’ll share later, I promise.
Pitfalls to Avoid
This isn’t that different than just building out an availability group, but many of the same caveats apply, you need to ensure agent jobs and logins affiliated with a database are on all nodes in the availability groups. Additionally, the procedures to add and remove databases from your availability group, need to run on all of the nodes. Also, if you are doing this programmatically, the connection should use the listener, so that your application is always connecting to the primary instance.
Funny you posted about this, we are scoping out something similar at my organization at the moment. We have a huge multi terabyte db that is spit up in to multiple quarterly databases (2011Q1,2011Q2,…etc). All of the dbs except the active quarterly db are read only and only the active quarterly db needs to be restored in a DR situation ASAP to bring the system online. The read only dbs can be restored from tape as time permits in a real DR. We were thinking of putting together an AG group that only mirrors the active quarterly DB to another replica that is hooked up to replicated storage. Obviously automation would need to be written to rotate in a new quarterly db, rotate the old out. I’m looking forward to your follow up post.
Thank you for this.
I’ll share later, I promise. 😉
Where is the procedure “ReturnsTestInstanceDropSecondaryDB” script?