SQL Server – Page 4 – Mitch Wheat

Faster I/O Subsystem Causing More WRITELOG Waits?

March 13, 2019 by admin

Paul Randal explains how WRITELOG waits can increase with a faster I/O subsystem:

I was contacted last week by someone who was asking how are faster disks causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a Raid-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to Raid-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.
They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain…

Azure SQL DB Storage Bottleneck

March 5, 2019March 5, 2019 by admin

If you are using Azure SQL Databases, you should definitely read this post by Brent Ozar: There’s a bottleneck in Azure SQL DB storage throughput. The bottom line:
the transaction log throughput currently appears to bottleneck at 16 cores!

The bit where he compares AWS costs/relative performance is also an eye opener:

8 cores, $1,991 per month: 64 minutes</li><li><strong>16 cores,$ 3,555 per month: 32 minutes (and interestingly, it’s the same speed with zone redundancy enabled)
80 cores, $18,299 per month: 32 minutes</strong></li><li>Just for reference: 8-core AWS EC2 i3.2xl VM,$ 1,424 per month with SQL Server Standard Edition licensing: 2 minutes (and I don’t put that in to tout AWS, I just happen to have most of my lab VMs there, so it was a quick comparison)

Categories Azure, SQL Azure, SQL Server

TransactionScope Default Isolation Level

March 5, 2019January 7, 2019 by admin

When you create a new TransactionScope, you might be surprised to find that the default isolation level is Serializable.

TransactionScope’s default constructor defaults the isolation level to Serializable and the timeout to 1 minute. In SQL Server, SERIALIZABLE transactions are rarely useful and prone to deadlocks.

If you are using a TransactionScope and getting deadlocks in your application (and hopefully you have a retry mechanism for SQLException 1205), it might be due to the Serializable isolation level.

Instead of using the default constructor:

using (var scope = new TransactionScope()) { // .... Do Stuff .... scope.Complete(); }

Use this:

using (var scope = TransactionUtils.CreateTransactionScope()) { // .... Do Stuff .... scope.Complete(); } ..... public class TransactionUtils { public static TransactionScope CreateTransactionScope() { var transactionOptions = new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted, Timeout = TransactionManager.MaximumTimeout }; return new TransactionScope(TransactionScopeOption.Required, transactionOptions); } }

[The other gotcha relates to SQL Server 2014 and below: TransactionScope and Connection Pooling]

Categories .NET, C#, SQL Server, TransactionScope

sp_whoisactive 11.32

July 26, 2019October 29, 2018 by admin

Back in Dec 2017, Adam Machanic released a version of sp_whoisactive that is fully compatible with SQL Azure. He’s recently released an update, version 11.32

Categories Azure, Performance, SQL, SQL Azure, SQL Server

SQL Server: Complex Queries and Plan Timeout

October 23, 2018 by admin

I was speaking to a client recently about how very complex queries might not always be getting the most efficient query plan due to compilation timeouts. Joseph Pilov has written a good description here: Understanding Optimizer Timeout and how Complex queries can be Affected in SQL Server

Categories Query Plan, SQL Server

SQL Agent Jobs Starting at the Same Time

October 22, 2018October 21, 2018 by admin

The following query will list SQL Agent jobs that are due to start execution at the same time:

;with cteNextRunDatetime as ( SELECT job_id, next_scheduled_run_date = max(next_scheduled_run_date) FROM msdb.dbo.sysjobactivity WHERE next_scheduled_run_date IS NOT NULL GROUP BY job_id ) , cteSimultaneousJobsDates as ( SELECT next_scheduled_run_date FROM cteNextRunDatetime GROUP BY next_scheduled_run_date HAVING COUNT(*) > 1 ) SELECT sj.name, sj.description, c.next_scheduled_run_date FROM msdb.dbo.sysjobs sj JOIN cteNextRunDatetime sja ON sj.job_id = sja.job_id JOIN cteSimultaneousJobsDates c on c.next_scheduled_run_date = sja.next_scheduled_run_date WHERE enabled = 1 ORDER BY sja.next_scheduled_run_date

Categories SQL Server, SQL Server Agent

When Did My Azure SQL Database Server Restart?

October 27, 2018October 12, 2018 by admin

Getting the server restart time for an on-premise SQL Server is simple, and in fact there are several ways using sys.dm_os_sys_info, sys.dm_exec_sessions, sys.traces, or sys.databases

In an Azure SQL Database, you don’t get access to those system objects.

Brent Ozar posted a way to get the approximate Azure SQL Database restart date/time but I found that some of the wait types can produce large outliers and skew the result:

Instead, I’ve modified to use a standard statistics technique to reject outlier values that are outside 1.5 times the interquartile range:

;with cte as ( SELECT wait_time_ms FROM sys.dm_os_wait_stats w with(nolock) WHERE wait_type IN ( 'BROKER_TASK_STOP', 'DIRTY_PAGE_POLL', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION', 'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_DISPATCHER_WAIT', 'XE_TIMER_EVENT' ) ) select approx_ms_since_restart = AVG(wait_time_ms), approximate_restart_date = DATEADD(s, AVG(-wait_time_ms)/1000, GETDATE()) from cte cross join (select q1 = min(wait_time_ms), q3 = max(wait_time_ms), iqr = max(wait_time_ms) - min(wait_time_ms) from (select wait_time_ms, row_number() over (order by wait_time_ms) as seqnum, count(*) over (partition by null) as total from cte ) t where seqnum = cast(total * 0.25 as int) or seqnum = cast(total * 0.75 as int) ) qts where (wait_time_ms >= q1 - 1.5 * iqr) AND (wait_time_ms <= q3 + 1.5 * iqr)

This tells me my Azure server restarted on the 21st Sept 2018 around 23:46.

It’s not as complicated as it first looks:

The first part is obviously the same as Brent Ozar’s query, it gets a list of waits and their respective cumulative wait times. We generate a row number ordered by wait time and the total number of rows, and then pick 2 values using the row number at positions a quarter and three quarters along that list (the first and third quartile values). We then reject any value that is smaller than the 1st quartile value minus 1.5 times the interquartile range and any value that is larger than the 3rd quartile value plus 1.5 times the interquartile range.

Note: DATEADD() only accepts an integer for its second parameter, and using milliseconds(ms) overflows for even short periods of server up-time. In fact, since it’s approximate it might be better to use:

approximate_restart_date = DATEADD(minute, AVG(-wait_time_ms)/60000, GETDATE())

Categories SQL Azure, SQL Server, TSQL 1 Comment

SQL Server 2019 …

September 29, 2018 by admin

If anyone needed a single concrete reason to upgrade to SQL Server 2019 (when its released), ignoring all the new features, here it is:

2628 – String or binary data would be truncated in table ‘%.ls’, column ‘%.ls’. Truncated value: ‘%.*ls’.

That’s right! After something like 25+ years, Microsoft are finally fixing the cursed “BINARY OR STRING DATA WOULD BE TRUNCATED”!

Seen via Brent Ozar’s post: What’s New in SQL Server 2019’s sys.messages: More Unannounced Features

Categories Error, SQL Server

Set Up SQL Server Agent Alerts

September 20, 2018 by admin

SQL Server alerts are a free and easy way to get notified of corruption, agent job failures, or major failures before you get angry phone calls from users.

You should have already set up Database Mail and an alert operator to receive notifications. TIP: Always use an email distribution list for your operator notification email. If you use a person’s email, they might change jobs, or be on holiday…

The script below sets up the standard SQL Server Agent alerts for severity 17 through 25 as well as specific alerts for 823, 824 and 825 errors, and a bunch of others:

USE [msdb]; GO SET NOCOUNT ON; BEGIN TRANSACTION -- Change these as required... DECLARE @Normal_Delay_Between_Responses int = 900 -- in seconds DECLARE @Medium_Delay_Between_Responses int = 3600 DECLARE @Long_Delay_Between_Responses int = 14400 DECLARE @OperatorName sysname = N'SQLDBAGroup' DECLARE @CategoryName sysname = N'SQL Server Agent Alerts'; DECLARE @DelayBetweenResponses int DECLARE @TotalAlerts int DECLARE @Row int = 1 DECLARE @AlertName sysname DECLARE @SeverityNo int DECLARE @ErrorNo int DECLARE @Alerts TABLE ( Id int IDENTITY(1,1) NOT NULL ,SeverityNo int NULL ,ErrorNo int NULL ,AlertName sysname NOT NULL ,DelayBetweenResponses int NOT NULL ); INSERT @Alerts (SeverityNo, AlertName, DelayBetweenResponses) VALUES (17, 'Alert - Sev 17 - Insufficient Resources', @Normal_DELAY_BETWEEN_RESPONSES) ,(18, 'Alert - Sev 18 - Nonfatal Internal Error', @Normal_DELAY_BETWEEN_RESPONSES) ,(19, 'Alert - Sev 19 - Fatal Error in Resource', @Normal_Delay_Between_Responses) ,(20, 'Alert - Sev 20 - Fatal Error in Current Process', @Normal_Delay_Between_Responses) ,(21, 'Alert - Sev 21 - Fatal Error in Database Process', @Normal_Delay_Between_Responses) ,(22, 'Alert - Sev 22 - Fatal Error: Table Integrity Suspect', @Normal_Delay_Between_Responses) ,(23, 'Alert - Sev 23 - Fatal Error: Database Integrity Suspect', @Normal_Delay_Between_Responses) ,(24, 'Alert - Sev 24 - Fatal Error: Hardware Error', @Normal_Delay_Between_Responses) ,(25, 'Alert - Sev 25 - Fatal Error', @Normal_Delay_Between_Responses); INSERT @Alerts (ErrorNo, AlertName, DelayBetweenResponses) VALUES (601, 'Alert - Error 601 - NOLOCK scan aborted due to data movement', @Long_Delay_Between_Responses) ,(674, 'Alert - Error 674 - Exception occurred in destructor', @Long_Delay_Between_Responses) ,(708, 'Alert - Error 708 - Low virtual address space or low virtual memory', @Medium_Delay_Between_Responses) ,(806, 'Alert - Error 806 - Audit failure: page read from disk failed basic integrity checks', @Medium_Delay_Between_Responses) ,(823, 'Alert - Error 823 - I/O Error: http://support.microsoft.com/kb/2015755', @Normal_Delay_Between_Responses) ,(824, 'Alert - Error 824 - Consistency-based I/O error: http://support.microsoft.com/kb/2015756', @Normal_Delay_Between_Responses) ,(825, 'Alert - Error 825 - File Read Retry: http://support.microsoft.com/kb/2015757', @Normal_Delay_Between_Responses) ,(832, 'Alert - Error 832 - Constant page has changed: http://support.microsoft.com/kb/2015759', @Normal_Delay_Between_Responses) ,(833, 'Alert - Error 833 - Long I/O request', @Long_Delay_Between_Responses) ,(855, 'Alert - Error 855 - Uncorrectable hardware memory corruption detected', @Normal_Delay_Between_Responses) ,(856, 'Alert - Error 856 - SQL Server has detected hardware memory corruption, but has recovered the page', @Normal_Delay_Between_Responses) ,(1205, 'Alert - Error 1205 - Transaction Deadlock arbitrated', @Long_Delay_Between_Responses) ,(3401, 'Alert - Error 3401 - Errors occurred during recovery while rolling back a transaction', @Normal_Delay_Between_Responses) ,(3410, 'Alert - Error 3410 - Data in filegroup is offline', @Medium_Delay_Between_Responses) ,(3414, 'Alert - Error 3414 - Recovery Error', @Long_Delay_Between_Responses) ,(3422, 'Alert - Error 3422 - Database was shutdown', @Long_Delay_Between_Responses) ,(3452, 'Alert - Error 3452 - Recovery inconsistency', @Long_Delay_Between_Responses) ,(3619, 'Alert - Error 3619 - Could not write a checkpoint because the log is out of space', @Medium_Delay_Between_Responses) ,(3620, 'Alert - Error 3620 - Automatic checkpointing is disabled because the log is out of space', @Medium_Delay_Between_Responses) ,(3959, 'Alert - Error 3959 - Version store is full', @Normal_Delay_Between_Responses) ,(5029, 'Alert - Error 5029 - Warning: Log has been rebuilt', @Long_Delay_Between_Responses) ,(5144, 'Alert - Error 5144 - Autogrow of file was cancelled by user or timed out', @Long_Delay_Between_Responses) ,(5145, 'Alert - Error 5145 - Long Autogrow of file', @Medium_Delay_Between_Responses) ,(5182, 'Alert - Error 5182 - New log file created', @Long_Delay_Between_Responses) ,(9001, 'Alert - Error 9001 - Transaction log not available', @Normal_Delay_Between_Responses) ,(9002, 'Alert - Error 9002 - Transaction log full', @Normal_Delay_Between_Responses) ,(17173, 'Alert - Error 17173 - Ignored trace flag', @Long_Delay_Between_Responses) ,(17883, 'Alert - Error 17883 - Non-yielding Worker on Scheduler', @Medium_Delay_Between_Responses) ,(17884, 'Alert - Error 17884 - New queries assigned to process have not been picked up by a worker thread', @Medium_Delay_Between_Responses) ,(17887, 'Alert - Error 17887 - Worker appears to be non-yielding on Node', @Medium_Delay_Between_Responses) ,(17888, 'Alert - Error 17888 - All schedulers on Node appear deadlocked', @Medium_Delay_Between_Responses) ,(17890, 'Alert - Error 17890 - A significant part of sql server process memory has been paged out', @Long_Delay_Between_Responses) ,(17891, 'Alert - Error 17891 - Worker appears to be non-yielding on Node', @Medium_Delay_Between_Responses) SELECT @TotalAlerts = COUNT(*) FROM @Alerts IF NOT EXISTS(SELECT * FROM msdb.dbo.sysoperators WHERE name = @OperatorName) BEGIN RAISERROR ('SQL Operator %s does not exist', 18, 16, @OperatorName); RETURN; END IF NOT EXISTS (SELECT * FROM msdb.dbo.syscategories WHERE category_class = 2 -- ALERT AND category_type = 3 AND name = @CategoryName) BEGIN EXEC dbo.sp_add_category @class = N'ALERT', @type = N'NONE', @name = @CategoryName; END BEGIN TRY WHILE @Row <= @TotalAlerts BEGIN SELECT @AlertName = @@SERVERNAME + ' - ' + AlertName ,@SeverityNo = SeverityNo ,@ErrorNo = ErrorNo ,@DelayBetweenResponses = DelayBetweenResponses FROM @Alerts WHERE Id = @Row IF EXISTS (SELECT * FROM msdb.dbo.sysalerts WHERE [name] = @AlertName) BEGIN EXEC msdb.dbo.sp_delete_alert @name = @AlertName END IF @SeverityNo IS NOT NULL BEGIN EXEC msdb.dbo.sp_add_alert @name = @AlertName, @message_id = 0, @severity = @SeverityNo, @enabled = 1, @Delay_Between_Responses = @DelayBetweenResponses, @include_event_description_in = 1, @category_name = @CategoryName, @job_id = N'00000000-0000-0000-0000-000000000000' END IF @ErrorNo IS NOT NULL BEGIN -- Errors 855 and 856 require SQL Server 2012+ and Enterprise Edition -- [Also need Windows Server 2012+, and hardware that supports memory error correction] IF @ErrorNo NOT IN (855, 856) OR (LEFT(CONVERT(CHAR(2),SERVERPROPERTY('ProductVersion')), 2) >= '11' AND SERVERPROPERTY('EngineEdition') = 3) BEGIN EXEC msdb.dbo.sp_add_alert @name = @AlertName, @message_id = @ErrorNo, @severity = 0, @enabled = 1, @Delay_Between_Responses = @DelayBetweenResponses, @include_event_description_in = 1, @category_name = @CategoryName, @job_id = N'00000000-0000-0000-0000-000000000000' END END EXEC msdb.dbo.sp_add_notification @Alert_Name = @AlertName, @Operator_Name = @OperatorName, @notification_method = 1 SELECT @Row = @Row + 1 END END TRY BEGIN CATCH PRINT ERROR_MESSAGE() IF @@TRANCOUNT > 0 BEGIN ROLLBACK END END CATCH IF @@TRANCOUNT > 0 BEGIN COMMIT END

Categories Alerts, DBATools, Reliability, SQL Server, SQL Server Agent, TSQL

SQL Server Availability Groups: Add ‘Check if Primary’ Step to Existing Agent Jobs

May 29, 2018 by admin

I’ve recently been helping a client set up and configure SQL Server Always On Availability Groups. In addition to implementing your SQL Server Agent jobs on all the secondaries, you also have to implement a mechanism to make jobs aware of which node they are running on (Primary or secondary replicas). You would think this would be available out of the box, right? I guess they are so many permutations (especially when you have multiple availability groups on the same server), that it was left to end users to implement.

When SQL Server completes a primary <=> secondary replica role change it fires alert 1480, and you can configure this alert to notify you of the role change and act upon it.

You have 2 choices:

Respond to the role changeover alert (1480) and run a task to disable the jobs related to the databases in the availability group that has failed over.

Add a new first job step to every SQL Agent job which checks if it is running on the Primary Replica. If it is running on the Primary then move to the next job step and continue executing the job, otherwise if it is running on a Secondary Replica then stop the job.

To check if the Primary Replica of a given AG database is currently hosted on a particular server or not, you can either check based on the name of the database or you can check based on the name of the Availability Group the database is in (a database can only be in one availability group).

My scenario had a single availability group with not all databases on the server in the AG (a common scenario). After a couple of unsatisfactory attempts at (a), I decided to opt for (b).

Not wanting to do this change manually for every agent job, I’ve created a stored procedure that adds a new first job step to any agent job, to check if the job is running on the primary. If not, it silently quits with success. There’s also a short snippet of TSQL to run the proc for every SQL Agent job.

use master go -- Adds a first step to specified job, which checks whether running on Primary replica create procedure AddAGPrimaryCheckStepToAgentJob @jobname nvarchar(128) as set nocount on; -- Do nothing if No AG groups defined IF SERVERPROPERTY ('IsHadrEnabled') = 1 begin declare @jobid uniqueidentifier = (select sj.job_id from msdb.dbo.sysjobs sj where sj.name = @jobname) if not exists(select * from msdb.dbo.sysjobsteps where job_id = @jobid and step_name = 'Check If AG Primary' ) begin -- Add new first step: on success go to next step, on failure quit reporting success exec msdb.dbo.sp_add_jobstep @job_id = @jobid , @step_id = 1 , @cmdexec_success_code = 0 , @step_name = 'Check If AG Primary' , @on_success_action = 3 -- On success, go to Next Step , @on_success_step_id = 2 , @on_fail_action = 1 -- On failure, Quit with Success , @on_fail_step_id = 0 , @retry_attempts = 0 , @retry_interval = 0 , @os_run_priority = 0 , @subsystem = N'TSQL' , @command=N'IF (SELECT ars.role_desc FROM sys.dm_hadr_availability_replica_states ars JOIN sys.availability_groups ag ON ars.group_id = ag.group_id AND ars.is_local = 1) <> ''Primary'' BEGIN -- Secondary node, throw an error raiserror (''Not the AG primary'', 2, 1) END' , @database_name=N'master' , @flags=0 end end GO ------------------- -- Run AddAGPrimaryCheckStepToAgentJob for each agent job DECLARE @jobName NVARCHAR(128) DECLARE jobCursor CURSOR LOCAL FAST_FORWARD FOR SELECT j.name FROM msdb.dbo.sysjobs j --WHERE ??? -- filter out any jobs here OPEN jobCursor FETCH NEXT FROM jobCursor INTO @jobName WHILE @@FETCH_STATUS = 0 BEGIN exec AddAGPrimaryCheckStepToAgentJob @jobName FETCH NEXT FROM jobCursor INTO @jobName END CLOSE jobCursor DEALLOCATE jobCursor GO ---------------------- -- Remove the first job step ''Check If AG Primary'' added in previous snippet -- Just here should you want to remove the step added in snippet above. DECLARE @jobName NVARCHAR(128) DECLARE jobCursor CURSOR LOCAL FAST_FORWARD FOR SELECT j.name FROM msdb.dbo.sysjobs j join msdb.dbo.sysjobsteps js on js.job_id = j.job_id where js.step_name = 'Check If AG Primary' and js.step_id = 1 OPEN jobCursor FETCH NEXT FROM jobCursor INTO @jobName WHILE @@FETCH_STATUS = 0 BEGIN EXEC msdb.dbo.sp_delete_jobstep @job_name = @jobName, @step_id = 1 ; FETCH NEXT FROM jobCursor INTO @jobName END CLOSE jobCursor DEALLOCATE jobCursor GO

Note: For SQL Server 2014 onwards you can use the builtin function sys.fn_hadr_is_primary_replica('dbname'):

If sys.fn_hadr_is_primary_replica (@dbname) <> 1 BEGIN -- This is not the primary replica, exit without error. END -- This is the primary replica, continue to run the job...

Categories Always On Availability Group, SQL Server

Older posts

Newer posts

Previous Page1 … Page3 Page4 Page5 Next