SQL Server: Poison Waits

SQL Server performance tuning often starts by examining your top wait statistics. There are certain wait types where even a small number of occurrences can indicate performance problems. These are termed Poison Waits.

RESOURCE_SEMAPHORE_QUERY_COMPILE
A query was sent to SQL Server, and there wasn’t an execution plan for it in the query plan cache. In order to create an execution plan, SQL Server requests a small amount of memory, but due to memory pressure the requested memory wasn’t available. So SQL Server had to wait for memory to become available before it could even build an execution plan, let alone execute the query. In this situation, cached query plans and small un-cached plans may be able to run depending on how much pressure the server is under, but complex queries will experience memory request waits and feel sluggish.

RESOURCE_SEMAPHORE
SQL Server compiled an execution plan (or retrieved the query plan from cache), but now it needs memory to actually execute the query (a memory grant request). If other queries are already using a lot of memory, then our query won’t be able to start executing because there is insufficient memory available. Similar to the RESOURCE_SEMAPHORE_QUERY_COMPILE wait, smaller queries may be able to execute, but complex ones will be blocked from executing and wait for memory to become available.

THREADPOOL
SQL Server caps the number of worker threads it will create (max worker threads), and the default cap is calculated from how many logical processors the server has. On 64-bit systems each worker thread reserves roughly 2MB of memory for its stack. As queries arrive, they get assigned to worker threads. If enough queries queue up, such as when queries get blocked, you can run out of available worker threads (worker thread starvation). You might be tempted to increase max worker threads (and Microsoft support sometimes makes this suggestion), but then you might simply escalate the problem to a RESOURCE_SEMAPHORE or RESOURCE_SEMAPHORE_QUERY_COMPILE issue. Blocking is the most common cause of THREADPOOL waits, but they can also result from a large number of connections trying to run queries at the same time. If you are unable to connect to SQL Server to troubleshoot because of worker thread starvation, try connecting using the Dedicated Administrator Connection (DAC).
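
To get a feel for how the worker thread cap scales with processor count, here is a minimal sketch of the default max worker threads formula as Microsoft documents it for 64-bit SQL Server (2016 SP2 / 2017 and later). The function names are made up for this example, and the memory figure simply multiplies the thread count by the 2MB per thread mentioned above, so treat it as a rough upper bound rather than a measurement.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Default cap for 64-bit SQL Server, per the documented formula.

    Assumption: SQL Server 2016 SP2 / 2017 or later, where servers with
    more than 64 logical CPUs get 32 extra threads per CPU instead of 16.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus <= 4:
        return 512
    per_cpu = 16 if logical_cpus <= 64 else 32
    return 512 + (logical_cpus - 4) * per_cpu


def stack_memory_mb(worker_threads: int, stack_mb: int = 2) -> int:
    """Rough stack memory if every worker thread existed at once."""
    return worker_threads * stack_mb


if __name__ == "__main__":
    for cpus in (4, 8, 16, 64, 128):
        threads = default_max_worker_threads(cpus)
        print(f"{cpus:>3} logical CPUs: {threads} threads, "
              f"up to ~{stack_memory_mb(threads)} MB of thread stacks")
```

Raising the cap by hand raises the potential stack memory along with it, which is one way a THREADPOOL problem can turn into a memory problem.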

Whenever any of these poison waits occur, you have to get to the root cause of the problem. For a list and explanation of the various wait types, see: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql
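
Because poison waits matter even in small numbers, they are easy to miss in a report sorted by total wait time. The sketch below shows one way to flag them, assuming the wait statistics have already been fetched into dictionaries keyed by the real sys.dm_os_wait_stats column names (wait_type, waiting_tasks_count, wait_time_ms). The function name and the sample rows are invented for illustration.

```python
# The three wait types this article calls poison waits.
POISON_WAITS = {
    "RESOURCE_SEMAPHORE_QUERY_COMPILE",
    "RESOURCE_SEMAPHORE",
    "THREADPOOL",
}


def find_poison_waits(rows):
    """Return poison-wait rows with at least one occurrence, worst first."""
    hits = [
        row for row in rows
        if row["wait_type"] in POISON_WAITS and row["waiting_tasks_count"] > 0
    ]
    return sorted(hits, key=lambda row: row["wait_time_ms"], reverse=True)


# Hypothetical sample data, shaped like rows from sys.dm_os_wait_stats.
sample_rows = [
    {"wait_type": "CXPACKET", "waiting_tasks_count": 120000, "wait_time_ms": 900000},
    {"wait_type": "THREADPOOL", "waiting_tasks_count": 12, "wait_time_ms": 4800},
    {"wait_type": "RESOURCE_SEMAPHORE", "waiting_tasks_count": 3, "wait_time_ms": 15000},
    {"wait_type": "RESOURCE_SEMAPHORE_QUERY_COMPILE", "waiting_tasks_count": 0, "wait_time_ms": 0},
    {"wait_type": "WRITELOG", "waiting_tasks_count": 50000, "wait_time_ms": 40000},
]

if __name__ == "__main__":
    for row in find_poison_waits(sample_rows):
        print(row["wait_type"], row["waiting_tasks_count"], row["wait_time_ms"])
```

In the sample, the two poison waits account for a tiny share of total wait time and would sit near the bottom of a top-waits list, yet both are reported here.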

Faster I/O Subsystem Causing More WRITELOG Waits?

Paul Randal explains how WRITELOG waits can increase with a faster I/O subsystem:

I was contacted last week by someone who was asking how faster disks could be causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a RAID-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to a RAID-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.

They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain…
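
The excerpt stops before the explanation, so the following is only a back-of-the-envelope illustration of how a higher wait count and a lower average wait can coexist. It is not a reproduction of Paul Randal's answer. It assumes a single session that commits serially and waits once on the log per commit, and it uses 1ms as a round stand-in for "less than 1ms". The function name is hypothetical.

```python
def max_waits_per_second(avg_wait_ms: float) -> float:
    """Upper bound on waits per second for one session that commits
    serially and spends avg_wait_ms waiting on each commit.

    Assumption: all other work is ignored, so this is a ceiling only.
    """
    if avg_wait_ms <= 0:
        raise ValueError("avg_wait_ms must be positive")
    return 1000.0 / avg_wait_ms


if __name__ == "__main__":
    for label, avg_ms in (("spinning disks", 18.0), ("SSDs", 1.0)):
        ceiling = max_waits_per_second(avg_ms)
        print(f"{label}: {avg_ms} ms average wait, "
              f"at most {ceiling:.1f} waits per second per session")
```

Under these assumptions, a session stuck behind 18ms waits cannot register more than about 55 of them per second, while the same session at 1ms can register up to 1000. A wait type can therefore become the most frequent one precisely because each wait got shorter and the workload got more done.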