Tuning SQL Server Queries 101

First, get an actual query execution plan. Look for warnings in the query plan.

Basic

  • First look for large scans and lookups: these can often be resolved by creating a new index or extending an existing one with additional included columns. Seeks are usually preferable to scans.
  • Then look for significant variance between Actual and Estimated row counts: you can give the optimiser more accurate information by updating statistics, creating new statistics objects, adding statistics on computed columns, or by breaking the query up into simpler parts. The variance might also be caused by ‘parameter sniffing’.
  • Then look for expensive operators in the query plan, especially those that consume memory such as sorts and hashes. Sorting can sometimes be avoided by altering/adding indexes.
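
As a rough sketch of the fixes listed above, the statements below show one covering index, a statistics refresh, a new statistics object, and a per-execution recompile as one possible way to sidestep parameter sniffing. The table dbo.Orders, its columns, and the index and statistics names are all hypothetical; adapt them to your own schema before drawing any conclusions.

```sql
-- Hypothetical table, used only so the statements below have something to act on.
CREATE TABLE dbo.Orders
(
    OrderId     int           NOT NULL CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerId  int           NOT NULL,
    OrderStatus tinyint       NOT NULL,
    OrderDate   date          NOT NULL,
    TotalDue    decimal(12,2) NOT NULL
);

-- 1. Cover a query that filters on CustomerId and returns OrderDate and TotalDue,
--    so the plan can seek on the index without a key lookup.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);

-- 2. Refresh statistics when estimated and actual row counts diverge.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- 3. Create a statistics object on a filtered column that has no index.
CREATE STATISTICS ST_Orders_OrderStatus ON dbo.Orders (OrderStatus);

-- 4. Compile a fresh plan for the current value on every execution
--    (trades extra compile time for a plan that fits the value).
DECLARE @CustomerId int = 42;

SELECT OrderId, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);
```

Compare the actual plan before and after each change, one change at a time, so you can tell which one made the difference.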

More Advanced

Joe Sack has an excellent walkthrough here: The Case of the Cardinality Estimate Red Herring

Date and Time Dimension

Almost every fact table in a data warehouse uses a date (or calendar) dimension, because most measurements are defined at specific points in time. A flexible calendar date dimension is at the heart of most data warehouse systems; it provides easy navigation of a fact table through user familiar dates, such as weeks, months, fiscal periods and special days (today, weekends, holidays etc.).

I’ve created a date dimension generator here at GitHub

It targets SQL Server, but should be easy to convert to other RDBMS.

It features:

  • User defined start and end dates
  • Computed Easter dates (for years 1901 to 2099)
  • Computed Chinese New Year dates (for years 1971 to 2099)
  • Computed public holidays for US, UK, Canada, Ireland, Malta, Philippines, Australia (with state specific for WA, NSW, QLD, SA, VIC).
  • Date labels in US, UK and ISO formats.

Things to Note:

  1. The [TodayFlag] needs to be updated once per day by a scheduled task (timezone dependent: might need a flag for each timezone).

  2. If you use an unusual Fiscal year (say 5-4-4), it will need to be loaded from an external source (such as an Excel/Google spreadsheet).

  3. The precise start date of the month of Ramadan is by proclamation, so these need to be added, year by year. It is possible to calculate but can be a day out, and can vary by region.

    https://travel.stackexchange.com/questions/46148/how-to-calculate-when-ramadan-finishes

    https://en.wikipedia.org/wiki/Ramadan_%28calendar_month%29
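
For note 1, a minimal sketch of the daily [TodayFlag] refresh might look like the following. The table name dbo.DimDate and the column name [Date] are assumptions (check the generated table for the real names), and the time zone name is only an example. It assumes the table already exists and is meant to be run from a scheduled job.

```sql
-- Work out "today" in the reporting time zone rather than the server's.
-- 'GMT Standard Time' is an example Windows time zone name; AT TIME ZONE needs SQL Server 2016 or later.
DECLARE @today date =
    CAST(SYSDATETIMEOFFSET() AT TIME ZONE 'GMT Standard Time' AS date);

-- dbo.DimDate and [Date] are hypothetical names.
-- Only touch yesterday's flagged row and today's row, not the whole table.
UPDATE dbo.DimDate
SET    TodayFlag = CASE WHEN [Date] = @today THEN 1 ELSE 0 END
WHERE  TodayFlag = 1
   OR  [Date] = @today;
```

If you need a flag per time zone, the same statement could be repeated with a different zone name and flag column for each.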

Do You Name All Your SQL Server Database Constraints?

If you define a constraint without explicitly giving it a name, SQL Server will generate one for you.
You know the ones: they look something like PK__MY_TABLE__3213E83FA7739BB4.

Why might that be a bad thing? It makes writing deployment scripts harder because you won’t know up front the names of constraints you might want to refer to.
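
By way of illustration, here is a small sketch of naming every constraint explicitly at creation time. The tables, columns, and the PK_/UQ_/DF_/CK_/FK_ prefixes are hypothetical conventions, not a standard.

```sql
-- Every constraint gets an explicit, predictable name.
CREATE TABLE dbo.Customer
(
    CustomerId  int           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    Email       nvarchar(256) NOT NULL CONSTRAINT UQ_Customer_Email UNIQUE,
    CreatedAt   datetime2(0)  NOT NULL CONSTRAINT DF_Customer_CreatedAt DEFAULT (SYSUTCDATETIME()),
    CreditLimit decimal(10,2) NOT NULL CONSTRAINT CK_Customer_CreditLimit CHECK (CreditLimit >= 0)
);

CREATE TABLE dbo.CustomerOrder
(
    OrderId    int NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerId int NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer REFERENCES dbo.Customer (CustomerId)
);

-- A later deployment script can refer to the constraint by name,
-- and the same script works in every environment.
ALTER TABLE dbo.Customer DROP CONSTRAINT DF_Customer_CreatedAt;
```

With system-generated names, that final statement would need a lookup against the catalog views first, because the suffix differs from one database to the next.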

Michael J Swart describes a query to discover the system generated names in your databases (with a small modification):

SELECT 
    [Schema] = SCHEMA_NAME(o.schema_id),
    [System Generated Name] = OBJECT_NAME(o.object_id),
    [Parent Name] = OBJECT_NAME(o.parent_object_id),
    [Object Type] = o.type_desc
FROM 
    sys.objects o
    JOIN sys.sysconstraints c ON o.object_id = c.constid
WHERE 
    (status & 0x20000) > 0
    and o.is_ms_shipped = 0

According to the sys.sysconstraints documentation page:

This SQL Server 2000 system table is included as a view for backward compatibility. We recommend that you use the current SQL Server system views instead. To find the equivalent system view or views, see Mapping System Tables to System Views (Transact-SQL). This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

You can query the same information by using the individual views unioned together:


SELECT 
    [Schema] = SCHEMA_NAME(schema_id),
    [System Generated Name] = OBJECT_NAME(object_id),
    [Parent Name] = OBJECT_NAME(parent_object_id),
    [Object Type] = type_desc
FROM sys.check_constraints 
WHERE is_system_named = 1

UNION ALL

SELECT 
    [Schema] = SCHEMA_NAME(schema_id),
    [System Generated Name] = OBJECT_NAME(object_id),
    [Parent Name] = OBJECT_NAME(parent_object_id),
    [Object Type] = type_desc
FROM sys.default_constraints 
WHERE is_system_named = 1

UNION ALL

SELECT 
    [Schema] = SCHEMA_NAME(schema_id),
    [System Generated Name] = OBJECT_NAME(object_id),
    [Parent Name] = OBJECT_NAME(parent_object_id),
    [Object Type] = type_desc
FROM sys.key_constraints 
WHERE is_system_named = 1

UNION ALL

SELECT 
    [Schema] = SCHEMA_NAME(schema_id),
    [System Generated Name] = OBJECT_NAME(object_id),
    [Parent Name] = OBJECT_NAME(parent_object_id),
    [Object Type] = type_desc
FROM sys.foreign_keys  
WHERE is_system_named = 1

SQL Server Error Code 4815 Bulk Insert into Azure SQL Database

If you receive error code 4815 while doing a Bulk Insert into an Azure SQL Database (including SqlBulkCopy()), it’s likely you are trying to insert a string that is too long into a (n)varchar(x) column.

The unhelpful error message does not contain any mention of overflow, or the column name! Posting in the hope it will save someone some time.
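
One way to narrow down the offending column is to list the declared lengths of the target table's string columns and compare them with the longest values in the source data. This is only a sketch: dbo.MyTargetTable is a placeholder name.

```sql
-- List the declared character lengths of the string columns in the target table.
-- dbo.MyTargetTable is a hypothetical name; substitute your own table.
SELECT
    column_name = c.name,
    type_name   = TYPE_NAME(c.system_type_id),
    max_chars   = CASE
                      WHEN c.max_length = -1 THEN NULL                  -- (n)varchar(max): no practical limit
                      WHEN TYPE_NAME(c.system_type_id) IN ('nchar', 'nvarchar')
                          THEN c.max_length / 2                         -- max_length is in bytes; 2 bytes per character
                      ELSE c.max_length
                  END
FROM sys.columns c
WHERE c.object_id = OBJECT_ID(N'dbo.MyTargetTable')
  AND TYPE_NAME(c.system_type_id) IN ('char', 'varchar', 'nchar', 'nvarchar')
ORDER BY c.column_id;
```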

Azure SQL DB Storage Bottleneck

If you are using Azure SQL Databases, you should definitely read this post by Brent Ozar: There’s a bottleneck in Azure SQL DB storage throughput. The bottom line:
the transaction log throughput currently appears to bottleneck at 16 cores!

The bit where he compares AWS costs/relative performance is also an eye opener:

  • 8 cores, $1,991 per month: 64 minutes
  • 16 cores, $3,555 per month: 32 minutes (and interestingly, it’s the same speed with zone redundancy enabled)
  • 80 cores, $18,299 per month: 32 minutes
  • Just for reference: 8-core AWS EC2 i3.2xl VM, $1,424 per month with SQL Server Standard Edition licensing: 2 minutes (and I don’t put that in to tout AWS, I just happen to have most of my lab VMs there, so it was a quick comparison)

When Did My Azure SQL Database Server Restart?

Getting the server restart time for an on-premises SQL Server is simple, and in fact there are several ways, using sys.dm_os_sys_info, sys.dm_exec_sessions, sys.traces, or sys.databases.
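
For example, two of those approaches on an on-premises instance can be sketched as:

```sql
-- Instance start time as recorded by the engine.
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;

-- tempdb is recreated every time the instance starts,
-- so its creation date is effectively the restart time.
SELECT create_date AS tempdb_created
FROM sys.databases
WHERE name = N'tempdb';
```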

In an Azure SQL Database, you don’t get access to those system objects.

Brent Ozar posted a way to get the approximate Azure SQL Database restart date/time, but I found that some of the wait types can produce large outliers and skew the result.

Instead, I’ve modified it to use a standard statistical technique that rejects outlier values lying more than 1.5 times the interquartile range beyond the quartiles:

;with cte as
(
    SELECT wait_time_ms 
    FROM sys.dm_os_wait_stats w with(nolock)
    WHERE wait_type IN 
    (
        'BROKER_TASK_STOP',
        'DIRTY_PAGE_POLL',
        'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        'LAZYWRITER_SLEEP',
        'LOGMGR_QUEUE',
        'REQUEST_FOR_DEADLOCK_SEARCH',
        'XE_DISPATCHER_WAIT',
        'XE_TIMER_EVENT'
    )
)
select 
    approx_ms_since_restart = AVG(wait_time_ms), 
    approximate_restart_date = DATEADD(s, AVG(-wait_time_ms)/1000, GETDATE())
from 
cte
cross join
     (select 
         q1 = min(wait_time_ms), 
         q3 = max(wait_time_ms), 
         iqr = max(wait_time_ms) - min(wait_time_ms)
      from (select 
               wait_time_ms,
               row_number() over (order by wait_time_ms) as seqnum,
               count(*) over (partition by null) as total
            from cte
           ) t
      where seqnum = cast(total * 0.25 as int) or seqnum = cast(total * 0.75 as int)
     ) qts
 where (wait_time_ms >= q1 - 1.5 * iqr) AND (wait_time_ms <= q3 + 1.5 * iqr)

This tells me my Azure server restarted on the 21st Sept 2018 around 23:46.

It’s not as complicated as it first looks:

The first part is the same as Brent Ozar’s query: it gets a list of waits and their cumulative wait times. We generate a row number ordered by wait time, along with the total number of rows, and then pick the 2 values whose row numbers sit a quarter and three quarters of the way along that list (the first and third quartile values). We then reject any value that is smaller than the 1st quartile value minus 1.5 times the interquartile range, and any value that is larger than the 3rd quartile value plus 1.5 times the interquartile range.

Note: DATEADD() only accepts an int for its second parameter, so working in milliseconds (ms) overflows once the server has been up for more than about 24.8 days (2,147,483,647 ms). In fact, since the result is approximate anyway, it might be better to use:

approximate_restart_date = DATEADD(minute, AVG(-wait_time_ms)/60000, GETDATE())

SQL Server Unindexed Foreign Keys

I saw this, DMV To List Foreign Keys With No Index, via Brent Ozar’s weekly links email.

Unindexed foreign key columns might not be captured by the sys.dm_db_missing_index_details DMV because of their relatively small size. A lack of indexes on foreign keys might have only a small performance impact during reads, but it can lead to lock escalations during heavy write loads, causing excessive blocking and possibly deadlocks.

I’ve updated the originally posted query to generate T-SQL to create the missing indexes (which you should compare to the existing index landscape to see if any indexes can be consolidated before running them).

[Note: if you are unfortunate enough to have spaces in your table/column names, then you’ll need to replace them with an underscore ‘_’  (or other character) in the index name.]

;with cte_fk as 
( 
    select   
        fk_table_schema = OBJECT_SCHEMA_NAME(fk.parent_object_id),
        fk_table = OBJECT_NAME(fk.parent_object_id),
        fk_column = c.name,
        fk_name   = fk.name,
        fk_has_index = CASE WHEN i.object_id IS NOT NULL THEN 1 ELSE 0 END,
        is_fk_a_pk_also = i.is_primary_key,
        is_index_on_fk_unique = i.is_unique,
        index_def = 'create index NC_' + OBJECT_NAME(fk.parent_object_id) + '_' + c.name + 
           ' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id)) + '(' + QUOTENAME(c.name) + ')',
        pk_table_schema = OBJECT_SCHEMA_NAME(fk.referenced_object_id),
        pk_table = OBJECT_NAME(fk.referenced_object_id),
        pk_column = c2.name,
        pk_index_name = kc.name,
        fk.*
    from     
        sys.foreign_keys fk
        join sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id
        join sys.columns c ON c.object_id = fk.parent_object_id AND c.column_id = fkc.parent_column_id
        left join sys.columns c2 ON c2.object_id = fk.referenced_object_id AND c2.column_id = fkc.referenced_column_id
        left join sys.key_constraints kc ON kc.parent_object_id = fk.referenced_object_id AND kc.type = 'PK'
        left join sys.index_columns ic ON ic.object_id = c.object_id AND ic.column_id = c.column_id
        left join sys.indexes i ON i.object_id = ic.object_id AND i.index_id = ic.index_id
)
select  
    * 
from    
    cte_fk c
    left join sys.dm_db_partition_stats ps on ps.object_id = c.parent_object_id and ps.index_id <= 1
where   
    fk_has_index = 0 
    -- and fk_table = 'mytablename'
order by 
    used_page_count desc
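
If your table or column names do contain spaces, one possible way to build a safe index name is sketched below. It is a standalone illustration with hypothetical names, not a change to the query above; the same REPLACE() and QUOTENAME() calls could be wrapped around the name parts in the index_def expression.

```sql
-- Hypothetical names containing spaces.
DECLARE @table  sysname = N'Order Details',
        @column sysname = N'Product ID';

-- Replace spaces with underscores, then bracket the result
-- so that any other unusual characters are still handled.
SELECT index_name =
    QUOTENAME('NC_' + REPLACE(@table, ' ', '_') + '_' + REPLACE(@column, ' ', '_'));
-- Returns: [NC_Order_Details_Product_ID]
```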

Do you Encrypt your Remote Connections to SQL Azure Databases?

If you’re not encrypting connections to SQL Azure (or any remote SQL Server instance), then you probably should.

Encrypted connections to SQL Server use SSL, and that is about as secure as you can get (currently).

[Remember: SSL protects only the connection, i.e. the data as it is transmitted ‘on the wire’ between the client and SQL Server. It says nothing about how the data is actually stored on the server].

Update: Don’t forget to also set TrustServerCertificate=false, so that the client validates the server’s certificate instead of trusting it blindly.
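
Whichever client you use, you can check from inside the session whether the connection really is encrypted. A minimal sketch, assuming your login has permission to read this DMV:

```sql
-- encrypt_option is 'TRUE' when the current connection is encrypted.
SELECT session_id, encrypt_option, net_transport, protocol_type
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```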

SSMS

When you open SSMS’s ‘Connect to Server’ dialog, click the bottom right ‘Options’ button, and make sure you tick the checkbox ‘Encrypt Connection’.

SQLCMD

Ensure you add the -N command line option. The -N switch is used by the client to request an encrypted connection. This option is equivalent to the ADO.net option ENCRYPT = true.

e.g.

sqlcmd -N -U username -P password -S servername -d databasename -Q "SELECT * FROM myTable"

Linked Servers

When creating a linked server to SQL Azure, the @provstr parameter must be set to ‘Encrypt=yes;’:

-- Create the linked server:
EXEC sp_addlinkedserver
@server     = 'LocalLinkedServername',
@srvproduct = N'Any',
@provider   = 'SQLNCLI',
@datasrc    = '???.database.windows.net', -- Azure server name
@location   = '', 
@provstr    = N'Encrypt=yes;',       -- <<--  Important!
@catalog    = 'RemoteDatabaseName';  -- remote(Azure) database name
go

 

ADO.NET Connection strings

Add “Encrypt=true” to your connection string, or set the SqlConnectionStringBuilder.Encrypt property to true.

[Remember: don’t distribute passwords by sending as plaintext over the Internet, i.e. don’t email passwords! ]