I’ve recently been helping a client set up and configure SQL Server Always On Availability Groups. In addition to implementing your SQL Server Agent jobs on all the secondaries, you also have to implement a mechanism to make jobs aware of which node they are running on (Primary or secondary replicas). You would think this would be available out of the box, right? I guess they are so many permutations (especially when you have multiple availability groups on the same server), that it was left to end users to implement.
When SQL Server completes a primary <=> secondary replica role change it fires alert 1480, and you can configure this alert to notify you of the role change and act upon it.
You have 2 choices:
- Respond to the role changeover alert (1480) and run a task to disable the jobs related to the databases in the availability group that has failed over.
- Add a new first job step to every SQL Agent job which checks if it is running on the Primary Replica. If it is running on the Primary then move to the next job step and continue executing the job, otherwise if it is running on a Secondary Replica then stop the job.
To check if the Primary Replica of a given AG database is currently hosted on a particular server or not, you can either check based on the name of the database or you can check based on the name of the Availability Group the database is in (a database can only be in one availability group).
My scenario had a single availability group with not all databases on the server in the AG (a common scenario). After a couple of unsatisfactory attempts at (a), I decided to opt for (b).
Not wanting to do this change manually for every agent job, I’ve created a stored procedure that adds a new first job step to any agent job, to check if the job is running on the primary. If not, it silently quits with success. There’s also a short snippet of TSQL to run the proc for every SQL Agent job.
use master go -- Adds a first step to specified job, which checks whether running on Primary replica create procedure AddAGPrimaryCheckStepToAgentJob @jobname nvarchar(128) as set nocount on; -- Do nothing if No AG groups defined IF SERVERPROPERTY ('IsHadrEnabled') = 1 begin declare @jobid uniqueidentifier = (select sj.job_id from msdb.dbo.sysjobs sj where sj.name = @jobname) if not exists(select * from msdb.dbo.sysjobsteps where job_id = @jobid and step_name = 'Check If AG Primary' ) begin -- Add new first step: on success go to next step, on failure quit reporting success exec msdb.dbo.sp_add_jobstep @job_id = @jobid , @step_id = 1 , @cmdexec_success_code = 0 , @step_name = 'Check If AG Primary' , @on_success_action = 3 -- On success, go to Next Step , @on_success_step_id = 2 , @on_fail_action = 1 -- On failure, Quit with Success , @on_fail_step_id = 0 , @retry_attempts = 0 , @retry_interval = 0 , @os_run_priority = 0 , @subsystem = N'TSQL' , @command=N'IF (SELECT ars.role_desc FROM sys.dm_hadr_availability_replica_states ars JOIN sys.availability_groups ag ON ars.group_id = ag.group_id AND ars.is_local = 1) <> ''Primary'' BEGIN -- Secondary node, throw an error raiserror (''Not the AG primary'', 2, 1) END' , @database_name=N'master' , @flags=0 end end GO ------------------- -- Run AddAGPrimaryCheckStepToAgentJob for each agent job DECLARE @jobName NVARCHAR(128) DECLARE jobCursor CURSOR LOCAL FAST_FORWARD FOR SELECT j.name FROM msdb.dbo.sysjobs j --WHERE ??? -- filter out any jobs here OPEN jobCursor FETCH NEXT FROM jobCursor INTO @jobName WHILE @@FETCH_STATUS = 0 BEGIN exec AddAGPrimaryCheckStepToAgentJob @jobName FETCH NEXT FROM jobCursor INTO @jobName END CLOSE jobCursor DEALLOCATE jobCursor GO ---------------------- -- Remove the first job step ''Check If AG Primary'' added in previous snippet -- Just here should you want to remove the step added in snippet above. DECLARE @jobName NVARCHAR(128) DECLARE jobCursor CURSOR LOCAL FAST_FORWARD FOR SELECT j.name FROM msdb.dbo.sysjobs j join msdb.dbo.sysjobsteps js on js.job_id = j.job_id where js.step_name = 'Check If AG Primary' and js.step_id = 1 OPEN jobCursor FETCH NEXT FROM jobCursor INTO @jobName WHILE @@FETCH_STATUS = 0 BEGIN EXEC msdb.dbo.sp_delete_jobstep @job_name = @jobName, @step_id = 1 ; FETCH NEXT FROM jobCursor INTO @jobName END CLOSE jobCursor DEALLOCATE jobCursor GO
Note: For SQL Server 2014 onwards you can use the builtin function sys.fn_hadr_is_primary_replica('dbname')
:
If sys.fn_hadr_is_primary_replica (@dbname) <> 1 BEGIN -- This is not the primary replica, exit without error. END -- This is the primary replica, continue to run the job...