SQL Server Availability Groups: Add ‘Check if Primary’ Step to Existing Agent Jobs

I’ve recently been helping a client set up and configure SQL Server Always On Availability Groups. In addition to implementing your SQL Server Agent jobs on all the secondaries, you also have to implement a mechanism to make jobs aware of which node they are running on (Primary or secondary replicas). You would think this would be available out of the box, right? I guess they are so many permutations (especially when you have multiple availability groups on the same server), that it was left to end users to implement.

When SQL Server completes a primary <=> secondary replica role change it fires alert 1480, and you can configure this alert to notify you of the role change and act upon it.

You have 2 choices:

  1. Respond to the role changeover alert (1480) and run a task to disable the jobs related to the databases in the availability group that has failed over.
  2. Add a new first job step to every SQL Agent job which checks if it is running on the Primary Replica. If it is running on the Primary then move to the next job step and continue executing the job, otherwise if it is running on a Secondary Replica then stop the job.

To check if the Primary Replica of a given AG database is currently hosted on a particular server or not, you can either check based on the name of the database or you can check based on the name of the Availability Group the database is in (a database can only be in one availability group).

My scenario had a single availability group with not all databases on the server in the AG (a common scenario). After a couple of unsatisfactory attempts at (a), I decided to opt for (b).

Not wanting to do this change manually for every agent job, I’ve created a stored procedure that adds a new first job step to any agent job, to check if the job is running on the primary. If not, it silently quits with success. There’s also a short snippet of TSQL to run the proc for every SQL Agent job.

use master
go

-- Adds a first step to specified job, which checks whether running on Primary replica

create procedure AddAGPrimaryCheckStepToAgentJob
    @jobname nvarchar(128)
as

set nocount on;

-- Do nothing if No AG groups defined
IF SERVERPROPERTY ('IsHadrEnabled') = 1
begin
    declare @jobid uniqueidentifier = (select sj.job_id from msdb.dbo.sysjobs sj where sj.name = @jobname)

    if not exists(select * from msdb.dbo.sysjobsteps where job_id = @jobid and step_name = 'Check If AG Primary' )
    begin
        -- Add new first step: on success go to next step, on failure quit reporting success
        exec msdb.dbo.sp_add_jobstep 
          @job_id = @jobid
        , @step_id = 1
        , @cmdexec_success_code = 0
        , @step_name = 'Check If AG Primary' 
        , @on_success_action = 3  -- On success, go to Next Step
        , @on_success_step_id = 2
        , @on_fail_action = 1     -- On failure, Quit with Success  
        , @on_fail_step_id = 0
        , @retry_attempts = 0
        , @retry_interval = 0
        , @os_run_priority = 0
        , @subsystem = N'TSQL' 
        , @command=N'IF (SELECT ars.role_desc
        FROM sys.dm_hadr_availability_replica_states ars
        JOIN sys.availability_groups ag ON ars.group_id = ag.group_id AND ars.is_local = 1) <> ''Primary''
    BEGIN
       -- Secondary node, throw an error
       raiserror (''Not the AG primary'', 2, 1)
    END'
        , @database_name=N'master'
        , @flags=0
    end
end
GO

-------------------

-- Run AddAGPrimaryCheckStepToAgentJob for each agent job

DECLARE @jobName NVARCHAR(128)

DECLARE jobCursor CURSOR LOCAL FAST_FORWARD
FOR
    SELECT j.name FROM msdb.dbo.sysjobs j
    --WHERE ??? -- filter out any jobs here

OPEN jobCursor 
FETCH NEXT FROM jobCursor INTO @jobName

WHILE @@FETCH_STATUS = 0
BEGIN
    exec AddAGPrimaryCheckStepToAgentJob @jobName

    FETCH NEXT FROM jobCursor INTO @jobName
END

CLOSE jobCursor
DEALLOCATE jobCursor
GO

----------------------

-- Remove the first job step ''Check If AG Primary'' added in previous snippet
-- Just here should you want to remove the step added in snippet above.

DECLARE @jobName NVARCHAR(128)

DECLARE jobCursor CURSOR LOCAL FAST_FORWARD
FOR
    SELECT j.name FROM msdb.dbo.sysjobs j
    join msdb.dbo.sysjobsteps js on js.job_id = j.job_id
    where js.step_name = 'Check If AG Primary' and js.step_id = 1

OPEN jobCursor 
FETCH NEXT FROM jobCursor INTO @jobName

WHILE @@FETCH_STATUS = 0
BEGIN
    
    EXEC msdb.dbo.sp_delete_jobstep  
        @job_name = @jobName,  
        @step_id = 1 ;  
    
    FETCH NEXT FROM jobCursor INTO @jobName
END

CLOSE jobCursor
DEALLOCATE jobCursor
GO

Note: For SQL Server 2014 onwards you can use the builtin function sys.fn_hadr_is_primary_replica('dbname'):

If sys.fn_hadr_is_primary_replica (@dbname) <> 1   
BEGIN  
    -- This is not the primary replica, exit without error. 
END  
-- This is the primary replica, continue to run the job...