.NET: Parse Structured Text Files

I wasn’t aware of the handy TextFieldParser class, which parses a text file in a reader-oriented way:

The TextFieldParser object provides methods and properties for parsing structured text files. Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.

Maybe that’s because it lives in the Microsoft.VisualBasic.FileIO namespace… See: How to: Read From Comma-Delimited Text Files in Visual Basic

Sure, you can write one yourself, but it’s handy for quick spikes:

using (var reader = new TextFieldParser(@"c:\logs\bigfile"))
{
    reader.TextFieldType = FieldType.Delimited;
    reader.Delimiters = new[] { Constants.vbTab };

    while (!reader.EndOfData)
    {
        try
        {
            // If ReadFields encounters blank lines, they are skipped and the next non-blank line is returned.
            string[] currentRow = reader.ReadFields();

            // Process row....
            foreach (string currentField in currentRow)
            {
            }
        }
        catch (MalformedLineException ex)
        {
            // Handle malformed line....
        }
    }
}

SQLDiagCmd: a Standalone Runner for Glenn Berry’s SQL Server Diagnostic Scripts

If I said that I use Glenn Berry’s SQL Server diagnostic queries a fair bit, it would be an understatement. Glenn regularly releases new versions of these very useful T-SQL scripts (comprising 50-plus queries), along with a blank Excel template to save the results in.

Rather than running each query individually, highlighting the results in SSMS, and then copying and pasting into Excel, I decided to write a standalone, no-install-required runner that would do the donkey work for me:

  • It should run from the command line (or through a UI).
  • It should be standalone, require no installation, and not require Excel or Office to be installed.
  • It should work with new (and old) versions of his scripts, and not require updating when the scripts are updated.
  • It should not require a blank Excel template file.
  • It should identify and run server- and database-specific queries automatically.
  • It should be able to target multiple databases at once.

The only (optional) alteration to Glenn’s scripts is a small addition to the comment line that describes each query, which is used as the worksheet name. (If this is missing, the runner uses the text at the start of the comment, but this won’t be unique, thanks to Excel’s ridiculous 31-character maximum length for worksheet names!)

So, for each comment preceding a query, instead of:

-- SQL Server Services information (SQL Server 2008 R2 SP1 or greater)  (Query 5)

I’ve altered it to explicitly specify the text to use as the worksheet name (the final parenthesised part):

-- SQL Server Services information (SQL Server 2008 R2 SP1 or greater) (Query 5)(SQL Server Services Info)

I’ve emailed Glenn to see if I can get this addition in future scripts.

You can download version 1.0 of the zipped executable from here: SQLDiagCmd.zip (targets .NET Framework 4.0)

OK, how do I run it?

Download Glenn’s diagnostic script that targets your version of SQL Server and download SQLDiagCmd.exe.

Assuming they are in the same directory:

SQLDiagCmd.exe -E -S MUTLEY -i ".\SQL Server 2008 Diagnostic Information Queries (January 2013).sql" 
 -o c:\temp -d "AdventureWorks2008R2;AnotherUserDatabase" -A

Typing SQLDiagCmd.exe --help displays usage and parameters:

Usage: 

SQLDiagCmd -E -S MUTLEY -i queries.sql -o C:\outputfolder -d databaselist -A
SQLDiagCmd -S MUTLEY -U username -P password -i queries.sql -o C:\outputfolder -d databaselist -A

 

-S, --servername    Required. Server name or instance to run queries against.
-E                  Use a trusted connection (Windows Authentication).
-U, --username      Username for SQL login.
-P, --password      Password for SQL login.
-i, --inputfile     Required. Diagnostic query file to run.
-o, --outputfolder  Required. Folder in which to write the results file.
-A, --autofit       Auto-fit Excel columns.
-t, --timeout       Query timeout in seconds. Defaults to 360 seconds (for longer-running queries).
-d, --databases     Semicolon-separated list of databases to examine (no spaces around the semicolons).
--help              Display this help screen.

 
The SQLDiagCmd git repository is available on GitHub: SQLDiagCmd

This project uses EPPlus and the Command Line Parser Library.

Several Reasons Why Coding is Hard!

I was looking up something Unicode related and came across Jon Skeet’s talk on why coding is hard: why do fundamental data types such as floating point, strings and DateTime cause so many problems? The slide deck and transcript are here.

The only thing I’d add is that arithmetic might be easier if we had 12 digits rather than 10, rather than the 8 mentioned, but that’s another story…

SQL Server: Can a stored procedure cause a memory leak?

Yes. A memory leak is possible if you forget to call sp_xml_removedocument for each matching sp_xml_preparedocument:

A parsed document is stored in the internal cache of SQL Server. The MSXML parser (Msxmlsql.dll) uses one-eighth the total memory available for SQL Server. To avoid running out of memory, run sp_xml_removedocument to free up the memory.

Example usage:

DECLARE @xml_text VARCHAR(4000), @i INT
SELECT @xml_text = '... some valid xml ...'

-- Parse the document; @i receives the internal document handle
EXEC sp_xml_preparedocument @i OUTPUT, @xml_text

....

-- Free the memory held by the parsed document
EXEC sp_xml_removedocument @i
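
Since any query between the prepare and remove calls can fail, it’s worth wrapping the work in TRY/CATCH so the document handle is always released. A minimal sketch (the XML content and the OPENXML query are purely illustrative):

DECLARE @xml_text VARCHAR(4000), @i INT
SELECT @xml_text = '<orders><order id="1"/></orders>'

EXEC sp_xml_preparedocument @i OUTPUT, @xml_text

BEGIN TRY
    -- Work with the parsed document, e.g. via OPENXML (flags = 1: attribute-centric mapping)
    SELECT * FROM OPENXML(@i, '/orders/order', 1) WITH (id INT)

    EXEC sp_xml_removedocument @i
END TRY
BEGIN CATCH
    -- Release the handle even if the query above fails
    EXEC sp_xml_removedocument @i
END CATCH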

Another form of memory leak is forgetting to both close and deallocate a cursor:

DECLARE c CURSOR   
  LOCAL STATIC FORWARD_ONLY READ_ONLY   
  FOR SELECT ...

....
CLOSE c; 
DEALLOCATE c;

Just for the record, even though I always like to see an explicit CLOSE and DEALLOCATE for cursors:

LOCAL cursors are implicitly de-allocated when the stored procedure, trigger, or batch in which they were created terminates, unless the cursor has been passed back as a parameter. The LOCAL cursor will then be implicitly de-allocated when the parameter or variable referencing the cursor in the code that called the procedure goes out of scope.

Ref.

[Note: I rarely use cursors. Wherever possible and appropriate, I always try to do it the set-based way]

SQL Server: A Gotcha with Disabling/Enabling Foreign Keys and Check Constraints

When loading a large dataset, you can significantly reduce the load time by disabling foreign keys and check constraints during the load process (and ETLs often perform their own data integrity checks anyway). This is commonly done with large data warehouses where entire dimensions are reloaded every day. When the load is complete, the keys and constraints are re-enabled, and SQL Server will ensure that they are honoured. That is, as long as you re-enable them correctly: due to the syntax, it is possible to re-enable them without forcing a re-validation of the data, in which case they are given the status of ‘not trusted’.

Not only does this mean that referential integrity is not being enforced (which is a bad thing!), but it can also have a negative performance impact on queries, because SQL Server won’t use untrusted constraints when building execution plans.
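
For reference, the disable half of that load pattern looks like this (the table name is illustrative). NOCHECK disables the constraints; re-enabling with re-validation uses the doubled-CHECK syntax covered below:

-- Before the load: disable all foreign key and check constraints on the table
ALTER TABLE MyTable NOCHECK CONSTRAINT ALL;

-- ... perform the bulk load ...

-- After the load: re-enable and re-validate all constraints on the table
ALTER TABLE MyTable WITH CHECK CHECK CONSTRAINT ALL;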

Run this TSQL to check if you have any foreign keys or check constraints that are not trusted:

-- Must be run in the database being examined:
USE MyDB
-- Non-trusted foreign keys
SELECT '[' + s.name + '].[' + o.name + '].[' + fk.name + ']' AS keyname
FROM sys.foreign_keys fk
INNER JOIN sys.objects o ON fk.parent_object_id = o.object_id
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE fk.is_not_trusted = 1 AND fk.is_not_for_replication = 0 AND fk.is_disabled = 0

-- Non-trusted check constraints
SELECT '[' + s.name + '].[' + o.name + '].[' + c.name + ']' AS constraintname
FROM sys.check_constraints c
INNER JOIN sys.objects o ON c.parent_object_id = o.object_id
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE c.is_not_trusted = 1 AND c.is_not_for_replication = 0 AND c.is_disabled = 0
GO

To ensure that SQL Server not only re-enables a constraint but also re-checks all the existing data, you have to specify the CHECK keyword twice:

ALTER TABLE MyTable 
    WITH CHECK CHECK CONSTRAINT MyConstraint
         ^^    ^^

Note: if the table is very large, you might want to perform this during a scheduled maintenance window.

 

YOW! Night Perth – Julian Boot – Thurs, Oct 18: 6pm

Join us for YOW! Night to hear “Design Eye for a Dev Guy” by Julian Boot as he describes what developers need to do to contribute to the design of a good UI: a crash course on visual design for developers. Learn how to ensure you have an Agile team that supports and enables good UI design. Julian is well known in Australia as a leader in the software community and international Agile Development speaker. Please register here: http://www.eventbrite.com/event/4499698720

TOPIC: “Design Eye for a Dev Guy” with Julian Boot
DATE: Thursday, 18th October, 6pm – 7:30pm
VENUE: Enex 100 Seminar Room, Level 3, 100 St Georges Terrace, Perth
COST: Free. All welcome.

Seats are limited. Please note the 6pm start time rather than the usual 5:30pm start (due to room availability).

Find Max Value of Multiple Columns

Nothing new, but a nice trick for finding the maximum value across a number of columns within a row, due to Louis Davidson (via Simon Sabin). It uses the fact that a subquery can reference columns from the outer rowset.

select
    WorkOrderId,
    (select max(DateValue)
     from (select StartDate as DateValue
           union all
           select EndDate
           union all
           select DueDate
           union all
           select ModifiedDate) as DateCols) as MaxDate,
    StartDate,
    EndDate,
    DueDate,
    ModifiedDate
from Production.WorkOrder


SQL Server Agent Job History Statistics

It’s sometimes useful to know how long, on average, each SQL Server Agent job takes, and how much its running time varies. You might want to increase how much Agent job history SQL Server retains (the defaults are quite low).

-- Agent job history over all runs retained in history
declare @tmp_sp_help_jobhistory table
(
instance_id int null,
job_id uniqueidentifier null,
job_name sysname null,
step_id int null,
step_name sysname null,
sql_message_id int null,
sql_severity int null,
message nvarchar(4000) null,
run_status int null,
run_date int null,
run_time int null,
run_duration int null,
operator_emailed sysname null,
operator_netsent sysname null,
operator_paged sysname null,
retries_attempted int null,
server sysname null
)

insert into @tmp_sp_help_jobhistory
exec msdb.dbo.sp_help_jobhistory @mode='FULL'

;with cteRuns as
(
SELECT
tshj.job_name AS [JobName],
convert(datetime,
stuff(stuff(cast(tshj.run_date as nchar(8)), 7, 0, '-'), 5, 0, '-') + N' ' +
stuff(stuff(substring(cast(1000000 + tshj.run_time as nchar(7)), 2, 6), 5, 0, ':'), 3, 0, ':'),
120) AS [RunDate],
-- run_duration is encoded as an HHMMSS integer (e.g. 13542 = 1h 35m 42s)
( run_duration % 100 + ((run_duration / 100) % 100) * 60 + ((run_duration / 10000) % 100) * 3600 +
((run_duration / 1000000) % 100) * 86400 )
AS [RunDurationSeconds]
FROM @tmp_sp_help_jobhistory as tshj
where step_ID = 0 and run_status = 1
)
, cteAggRuns as
(
SELECT
JobName,
MAX(RunDate) AS LastRunDate,
AVG(RunDurationSeconds) AS [AverageRunDurationSeconds],
Min(RunDurationSeconds) AS [MinRunDurationSeconds],
Max(RunDurationSeconds) AS [MaxRunDurationSeconds],
COUNT(jobname) as NumRunsInHistory,
sqrt(var(RunDurationSeconds)) AS [stdRunDurationSeconds]
FROM cteRuns
group by JobName

)
select
JobName,
LastRunDate,
AverageRunDurationSeconds,
MinRunDurationSeconds,
MaxRunDurationSeconds,
NumRunsInHistory,
stdRunDurationSeconds
from cteAggRuns
ORDER BY JobName


Similarly, you might want to get this information for just the last 5 runs, say (for instance, after you’ve altered a job):

declare @tmp_sp_help_jobhistory table
(
instance_id int null,
job_id uniqueidentifier null,
job_name sysname null,
step_id int null,
step_name sysname null,
sql_message_id int null,
sql_severity int null,
message nvarchar(4000) null,
run_status int null,
run_date int null,
run_time int null,
run_duration int null,
operator_emailed sysname null,
operator_netsent sysname null,
operator_paged sysname null,
retries_attempted int null,
server sysname null
)

insert into @tmp_sp_help_jobhistory
exec msdb.dbo.sp_help_jobhistory @mode='FULL'

;with cteRuns as
(
SELECT
job_name AS [JobName],
( run_duration % 100 + ((run_duration / 100) % 100) * 60 + ((run_duration / 10000) % 100) * 3600 +
((run_duration / 1000000) % 100) * 86400 )
AS [RunDurationSeconds],
convert(datetime,
stuff(stuff(cast(run_date as nchar(8)), 7, 0, '-'), 5, 0, '-') + N' ' +
stuff(stuff(substring(cast(1000000 + run_time as nchar(7)), 2, 6), 5, 0, ':'), 3, 0, ':'),
120) AS [RunDate]
FROM @tmp_sp_help_jobhistory
where step_ID = 0 and run_status = 1
)
, cteLast5Runs as
(
SELECT
ROW_NUMBER() over (partition by JobName order by RunDate DESC) as rownum,
JobName,
RunDurationSeconds,
RunDate
FROM cteRuns
)
SELECT
JobName,
AVG(RunDurationSeconds) AS [AverageRunDurationSeconds],
Min(RunDurationSeconds) AS [MinRunDurationSeconds],
Max(RunDurationSeconds) AS [MaxRunDurationSeconds],
COUNT(jobname) as NumRunsInHistory,
sqrt(var(RunDurationSeconds)) AS [stdRunDurationSeconds]
FROM cteLast5Runs
where rownum between 1 AND 5
group by JobName
ORDER BY JobName
