admin

SQL Server: Poison Waits

December 29, 2019August 18, 2019 by admin

SQL Server performance tuning often starts by examining your top wait statistics. There are certain wait types where even a small number of occurrences can indicate performance problems. These are termed Poison Waits.

RESOURCE_SEMAPHORE_QUERY_COMPILE
A query was sent to SQL Server, and there wasn’t an execution plan for it in the query plan cache. In order to create an execution plan, SQL Server requests a small amount of memory, but due to memory pressure the requested memory wasn’t available. So SQL Server had to wait for memory to become available before it could even build an execution plan, let alone execute the query. In this situation, cached query plans and small un-cached plans may be able to run depending on how much pressure the server is under, but complex queries will experience memory request waits and feel sluggish.

RESOURCE_SEMAPHORE
SQL Server compiled an execution plan (or retrieved the query plan from cache), but now it needs memory to actually execute the query (a memory grant request). If other queries are already using a lot of memory, then our query won’t be able to start executing because there is insufficient memory available. Similar to the RESOURCE_SEMAPHORE_QUERY_COMPILE wait, smaller queries may be able to execute, but complex ones will be blocked from executing and wait for memory to become available.

THREADPOOL
At startup, SQL Server creates a predefined number of worker threads based on how many logical processors the server has (each worker thread uses 2MB of memory). As queries arrive, they get assigned to worker threads. If enough queries queue up, such as when queries get blocked, you can run out of available worker threads (worker thread starvation). You might be tempted to increase max worker threads (and Microsoft support sometimes makes this suggestion), but then you might simply escalate the problem to a RESOURCE_SEMAPHORE or RESOURCE_SEMAPHORE_QUERY_COMPILE issue. Blocking is the most common culprit of THREADPOOL waits, but it can also be due to a large amount of connections trying to run queries at the same time. If you are unable to connect to SQL Server to troubleshoot because of worker thread starvation, try connecting using the Dedicated Admin Connection.

Whenever any of these poison waits occur, you have to get to the root cause of the problem. For a list and explanation of the various waits: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql

Amazon RDS SQL Server: Get Instance Size Using TSQL

August 12, 2019July 27, 2019 by admin

You can obviously retrieve an Amazon RDS SQL Server’s instance type (size) from the AWS portal, but I wanted to get it using TSQL:

IF OBJECT_ID('tempdb..#AmazonErrorLog') IS NOT NULL
    DROP TABLE #AmazonErrorLog;
 
CREATE TABLE #AmazonErrorLog
(
    LogDate DATETIME,
    ProcessInfo NVARCHAR(20),
    [Text] NVARCHAR(1000)
);
 
DECLARE @pattern nvarchar(30) = N'System Model:';
 
INSERT INTO #AmazonErrorLog
EXEC rdsadmin.dbo.rds_read_error_log;
 
IF @@ROWCOUNT > 0
BEGIN
    SELECT InstanceSize = CAST(REPLACE(SUBSTRING(Text, LEN(@pattern) + 1 + PATINDEX (N'%' + @pattern + N'%', Text), 100), '''', '') AS varchar(100))
    FROM #AmazonErrorLog
    WHERE PATINDEX (N'%' + @pattern + N'%', Text) > 0 
END
 
DROP TABLE #AmazonErrorLog;

Automatic Index Tuning in Azure SQL Databases

July 26, 2019 by admin

Microsoft whitepaper on tuning Azure SQL databases at scale: https://www.microsoft.com/en-us/research/uploads/prod/2019/02/autoindexing_azuredb.pdf

SQL Server: Plan Cache and Adhoc Workloads

July 25, 2019 by admin

Applications generating many dynamic queries (such as ORM frameworks) can lead to a query plan cache bloated by single use plans. Caching something that is only used once is obviously a waste of memory that could otherwise be used to store data pages.

If you have a predominately adhoc workload, turning on ‘optimize for adhoc workloads’ can help reduce the memory footprint of single use plans (it won’t solve the problem entirely though). A system I’ve recently worked on was able to regain 9GB of memory for data pages by turning this setting on.

sp_configure 'show advanced options', 1
GO
reconfigure
GO

sp_configure 'optimize for ad hoc workloads', 1
GO
reconfigure
GO

I’ve recently been using a slightly modified version of Kimberly Tripp’s query from her post, Plan cache and optimizing for adhoc workloads to categorise a workload:

SELECT 
	CacheType = objtype,
	TotalPlans = COUNT_BIG(*),
	TotalMBs = CAST(SUM(CAST(size_in_bytes AS DECIMAL(18, 2))) / 1024 / 1024 AS decimal(9,2)),
	AverageUseCount = AVG(usecounts),
	TotalMBs1USE = CAST(SUM(CAST((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) AS DECIMAL(18, 2))) / 1024 / 1024 AS decimal(9,2)),
	TotalPlans1USE = SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END),
	[%TotalPlans] = CAST(100. * SUM(1) / (SELECT COUNT_BIG(*) FROM sys.dm_exec_cached_plans) AS decimal(9,2)),
	[%TotalMB] = CAST(100. * SUM(CAST(size_in_bytes AS DECIMAL(18,2))) / (SELECT SUM(CAST(size_in_bytes AS DECIMAL(18,2))) FROM sys.dm_exec_cached_plans) AS decimal(9,2))
FROM 
	sys.dm_exec_cached_plans
GROUP BY 
	objtype

SQL Server Security: Find Users with Weak Passwords

March 14, 2020July 21, 2019 by admin

Data Breaches are common, and their cause is often as simple as the use of weak passwords.

SQL Server has an internal system function, PWDCOMPARE(), that can be used to find SQL logins with a weak password. A list of very common weak passwords can be found here as well as many other places.

IF OBJECT_ID('tempdb..#CommonPasswords') IS NOT NULL
    DROP TABLE #CommonPasswords;

CREATE TABLE #CommonPasswords(Password varchar(30) COLLATE Latin1_General_CS_AS not null primary key);

INSERT INTO #CommonPasswords(Password) VALUES
(''),
('123'),
('1234'),
('12345'),
('123456'),
('1234567'),
('12345678'),
('123456789'),
('1234567890'),
('987654321'),
('123qwe'),
('mynoob'),
('18atcskd2w'),
('55555'),
('555555'),
('3rjs1la7qe'),
('google'),
('zxcvbnm'),
('000000'),
('1q2w3e'),
('1q2w3e4r5t'),
('1q2w3e4r'),
('qwerty'),
('qwerty123'),
('password'),
('p@ssword'),
('p@ssw0rd'),
('password1'),
('p@ssword1'),
('password123'),
('passw0rd'),
('111111'),
('1111111'),
('abc123'),
('666666'),
('7777777'),
('654321'),
('123123'),
('123321'),
('iloveyou'),
('admin'),
('nimda'),
('welcome'),
('welcome!'),
('!@#$%^&amp;*'),
('aa123456'),
('lovely'),
('sunshine'),
('shadow'),
('princess'	),
('solo'),
('football'),
('monkey'),
('Monkey'),
('charlie'),
('donald'),
('Donald'),
('dragon'),
('Dragon'),
('trustno1'),
('letmein'),
('whatever'),
('hello'),
('freedom'),
('master'),
('starwars'),
('qwertyuiop'),
('Qwertyuiop'),
('qazwsx'),
('corona'),
('woke'),
('batman'),
('superman'),
('login');

SELECT 
    name,
    create_date,
    is_disabled
FROM 
    sys.sql_logins sl (nolock)
    cross apply #CommonPasswords cp
WHERE 
    PWDCOMPARE(cp.Password, sl.password_hash) = 1

UNION ALL

SELECT 
    name,
    create_date,
    is_disabled
FROM 
    sys.sql_logins sl (nolock)
WHERE 
    PWDCOMPARE(sl.name, sl.password_hash) = 1; -- password same as username

Troy Hunt has collected the passwords from several major data breaches, and he has made the passwords searchable.

SQL Server: Diagnosing Intermittent Connectivity Issues

April 12, 2019 by admin

Common causes of intermittent SQL Server connectivity issues:

Queries/transactions holding long running exclusive locks on tables. Queries waiting on these locks may be timing out.
A faulty switch, network card, cable or other network infrastructure.
Another device on the network with the same IP address.
Incorrect connection protocol being used.
Virus scanners and incorrect exclusion lists.
Network/Port security scanners.
VM snapshot backups.
Under-powered SQL Server or TLog backups over an under-powered network connection or large TLog file growth settings.
SQL Server under extreme memory pressure.
An application doing slow row-by-row processing while SQL Server is sending the data fast; this often shows up as a high value for ASYNC_NETWORK_IO waits.

The obvious question is “What changed?”!

When different parts of the system are managed by different people, the culprit can be a seemingly innocuous change that is not well documented or communicated to the various stakeholders.

Is a server alias being used?

Have TCP/IP dynamic ports been enabled for the TCP/IP protocol?

See How to Configure a SQL Server Alias for a Named Instance on a Development Machine

Is the SQL Server a named instance?

If you don’t specify the protocol in the connection string, then it will use the protocol specified for the database driver. The protocol order, by default, is Shared Memory, TCP/IP and then Named Pipes.

To list connections and the protocol they are using:

SELECT * FROM sys.dm_exec_connections

–WHERE session_id = @@SPID;

To specify which protocol to use in a connection string, add tcp: before the server name:

var connectionString = “Data Source=tcp:Server\Instance; …”;

Is IPv6 being used?

Was the server upgraded?

If so, there might be queries that are timing out due to optimiser/cardinality estimator changes.

Are all applications experiencing intermittent connections?

If you have a monitoring tool connected to the SQL Server, is it able to connect even when the problem is occurring?

Are there any errors in the SQL Server or application server logs?

If all application servers lose network connectivity regularly at the same time, it might be a hardware issue such as a faulty switch or network card.

Does it happen to all application servers?

If you have several application or web servers, are they all affected?

[As an aside, if you’ve only got one, you might want to consider setting up another for load balancing and troubleshooting.]

Are all queries in the application affected, or just some queries?

Some queries/transactions may be holding long running exclusive locks on tables. Queries waiting on these locks might be timing out.

Do timeouts occur at regular days/times?

Could it be regularly scheduled port security scans, or VM backups, or large transaction log file growths? Use a fixed transaction log file growth setting (128MB is commonly used) rather than a percentage.

During the timeouts, is the application server able to ping the SQL Server?

Set up a script to ping the SQL Server every 10-15 seconds. If you are not able to ping the SQL Server next time the application has query timeouts, then it is not a SQL Server problem.

SQLBits 2019 Sessions Online

March 20, 2019 by admin

The SQLBits 2019 conference sessions are online (and free). They vary in quality but are mostly good.

Faster I/O Subsystem Causing More WRITELOG Waits?

March 13, 2019 by admin

Paul Randal explains how WRITELOG waits can increase with a faster I/O subsystem:

I was contacted last week by someone who was asking how are faster disks causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a Raid-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to Raid-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.
They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain…

Azure SQL DB Storage Bottleneck

March 5, 2019March 5, 2019 by admin

If you are using Azure SQL Databases, you should definitely read this post by Brent Ozar: There’s a bottleneck in Azure SQL DB storage throughput. The bottom line:
the transaction log throughput currently appears to bottleneck at 16 cores!

The bit where he compares AWS costs/relative performance is also an eye opener:

8 cores, $1,991 per month: 64 minutes</li><li><strong>16 cores,$ 3,555 per month: 32 minutes (and interestingly, it’s the same speed with zone redundancy enabled)
80 cores, $18,299 per month: 32 minutes</strong></li><li>Just for reference: 8-core AWS EC2 i3.2xl VM,$ 1,424 per month with SQL Server Standard Edition licensing: 2 minutes (and I don’t put that in to tout AWS, I just happen to have most of my lab VMs there, so it was a quick comparison)

Categories Azure, SQL Azure, SQL Server

TransactionScope Default Isolation Level

March 5, 2019January 7, 2019 by admin

When you create a new TransactionScope, you might be surprised to find that the default isolation level is Serializable.

TransactionScope’s default constructor defaults the isolation level to Serializable and the timeout to 1 minute. In SQL Server, SERIALIZABLE transactions are rarely useful and prone to deadlocks.

If you are using a TransactionScope and getting deadlocks in your application (and hopefully you have a retry mechanism for SQLException 1205), it might be due to the Serializable isolation level.

Instead of using the default constructor:

using (var scope = new TransactionScope()) { // .... Do Stuff .... scope.Complete(); }

Use this:

using (var scope = TransactionUtils.CreateTransactionScope()) { // .... Do Stuff .... scope.Complete(); } ..... public class TransactionUtils { public static TransactionScope CreateTransactionScope() { var transactionOptions = new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted, Timeout = TransactionManager.MaximumTimeout }; return new TransactionScope(TransactionScopeOption.Required, transactionOptions); } }

[The other gotcha relates to SQL Server 2014 and below: TransactionScope and Connection Pooling]

Categories .NET, C#, SQL Server, TransactionScope

Older posts

Newer posts

Previous Page1 … Page4 Page5 Page6 … Page87 Next