.NET Regex: Character Class Subtraction

You’ve probably heard of positive character groups ([ ]) and negative character groups ([^ ]). But did you know there is also character class subtraction? I didn’t. It’s supported in .NET but not in most regex flavours.

A character class subtraction expression has the following form:

          [base_group-[excluded_group]]

The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive or negative character group, such as [a-z] or [^aeiou]. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).

For example, suppose you have a base group that consists of the character range from “a” through “z”. To define the set of characters that consists of the base group except for the character “m”, use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters “d”, “j”, and “p”, use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from “m” through “p”, use [a-z-[m-p]].
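To see exactly which characters each of those classes leaves behind, here’s a minimal C# sketch that tests every lowercase ASCII letter against a class. SubtractionDemo and MatchingLetters are hypothetical names of my own, and the last class (a through z, minus d through w, minus the vowels) is my own example of nesting rather than one from the docs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Returns, in order, every lowercase ASCII letter matched by the given character class
    public static string MatchingLetters(string characterClass)
    {
        var regex = new Regex("^" + characterClass + "$");
        var letters = Enumerable.Range('a', 26).Select(i => (char)i);
        return new string(letters.Where(c => regex.IsMatch(c.ToString())).ToArray());
    }

    public static void Demo()
    {
        // The three examples above, plus a nested subtraction
        var classes = new[] { "[a-z-[m]]", "[a-z-[djp]]", "[a-z-[m-p]]", "[a-z-[d-w-[aeiou]]]" };
        foreach (var cls in classes)
        {
            Console.WriteLine($"{cls,-22} {MatchingLetters(cls)}");
        }
    }
}
```

For the nested class, the consonants d through w are removed but e, i, o and u survive, because they were subtracted back out of the excluded group.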

Using this format, the pattern

^[\w-[v123]]$

can be used in .NET to match any single word character (\w, which covers letters, digits and the underscore, among others) except the letter v and the digits 1, 2 and 3.
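A quick way to check that behaviour is a sketch like the one below (WordCharDemo is a hypothetical name). The pattern as written matches exactly one character; the second regex, which adds a + quantifier inside the anchors, is my own extension for validating whole strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Exactly one word character that is not 'v', '1', '2' or '3'
    private static readonly Regex Single = new Regex(@"^[\w-[v123]]$");

    // Assumed extension: one or more such characters
    private static readonly Regex OneOrMore = new Regex(@"^[\w-[v123]]+$");

    public static bool IsAllowedChar(string input) => Single.IsMatch(input);

    public static bool IsAllowedString(string input) => OneOrMore.IsMatch(input);

    public static void Demo()
    {
        foreach (var s in new[] { "a", "v", "V", "_", "4", "1", "ab4", "abv" })
        {
            Console.WriteLine($"{s,-4} single: {IsAllowedChar(s),-5} string: {IsAllowedString(s)}");
        }
    }
}
```

Note that "V" is allowed because only the lowercase v is excluded, and "_" is allowed because \w includes the underscore.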

The MSDN page for the .NET Regex definitions doesn’t seem to rank highly in search results, so I’m bookmarking it here for future reference: Character Classes in Regular Expressions

This is useful for comparing Regex capabilities in different languages: regex flavor comparison chart

SQL Diagnostic Runner Updated

Thanks to Todd, who reported a bug when connecting with a username and password (I messed up the connection string).

I’ve uploaded an updated version (v1.0.4.13057) which you can download from the links below (or from any of the previous posts):

SQLDiagCmd.zip

SQLDiagUI.zip

[The Servername option now takes a semicolon-separated list of servers to run against, with the limitation that the same credentials and diagnostic script are used for every server.]

Parsing Command Line Arguments

If you want a full-blown command line parser, there are several good options available:

[I used the Command Line Parser Library recently in the SQL Diagnostic Runner I wrote.]

If you just want a very basic parser that supports simple options in the format /argname:argvalue, you could use this:

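using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

/// <summary>
/// Very basic command line args extractor.
/// Parses command line args in the following format:
/// /argname:argvalue /argname:argvalue ...
/// </summary>
public class CommandLineArgs
{
    // Named groups capture the arg name and everything after the first colon as its value
    private const string Pattern = @"^/(?<argname>\w+):(?<argvalue>.+)$";
    private readonly Regex _regex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    private readonly Dictionary<string, string> _args = new Dictionary<string, string>();

    public CommandLineArgs()
    {
        BuildArgDictionary();
    }

    // Returns the value for the given arg name, or null if it wasn't supplied
    public string this[string key]
    {
        get { return _args.ContainsKey(key) ? _args[key] : null; }
    }

    public bool ContainsKey(string key)
    {
        return _args.ContainsKey(key);
    }

    private void BuildArgDictionary()
    {
        // Element 0 is the program name, which normally won't match the pattern
        var args = Environment.GetCommandLineArgs();
        foreach (var match in args.Select(arg => _regex.Match(arg)).Where(m => m.Success))
        {
            var name = match.Groups["argname"].Value;

            // Ignore any duplicate args: the first occurrence wins
            if (!_args.ContainsKey(name))
            {
                _args.Add(name, match.Groups["argvalue"].Value);
            }
        }
    }
}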
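Because that class reads Environment.GetCommandLineArgs() directly, it’s awkward to try out in isolation. Here’s a minimal sketch of the same parsing logic pulled into a hypothetical static helper, ArgParser.Parse, that takes the argument array as a parameter (the names are mine, for illustration only):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArgParser
{
    // Same /argname:argvalue format; the value may itself contain colons
    private static readonly Regex ArgRegex = new Regex(@"^/(?<argname>\w+):(?<argvalue>.+)$");

    // Hypothetical helper: parses an explicit array rather than Environment.GetCommandLineArgs()
    public static Dictionary<string, string> Parse(IEnumerable<string> args)
    {
        var result = new Dictionary<string, string>();
        foreach (var arg in args)
        {
            var match = ArgRegex.Match(arg);
            if (!match.Success)
                continue; // not in /name:value form, so skip it

            var name = match.Groups["argname"].Value;
            if (!result.ContainsKey(name))
                result.Add(name, match.Groups["argvalue"].Value); // first occurrence wins
        }
        return result;
    }

    public static void Demo()
    {
        var parsed = Parse(new[] { "app.exe", @"/server:MYSERVER\SQL2012", @"/out:C:\temp\results.xlsx", "/server:ignored" });
        foreach (var pair in parsed)
            Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

In Demo, the parsed dictionary ends up with two entries: server (MYSERVER\SQL2012, since the second /server is ignored) and out (C:\temp\results.xlsx, colons and all). Taking the array as a parameter is purely so the logic can be tried without launching a process.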

Largest .NET Object….

In .NET Framework versions prior to 4.5, the largest allocation for any single object is 2 GB.

On 64-bit platforms, in .NET Framework 4.5 and later, it is possible to enable the allocation of arrays that are larger than 2 GB in total size (but NOTE this does not change other limits on object size or array size):

  • The maximum number of elements in an array is UInt32.MaxValue.

  • The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0x7FEFFFFF) for other types.

  • The maximum size for strings and other non-array objects is unchanged.

This setting is disabled by default.

You can enable this feature by using the gcAllowVeryLargeObjects element in your application configuration file:

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

Before enabling this feature, ensure that your application does not include unsafe code that assumes that all arrays are smaller than 2 GB in size. For example, unsafe code that uses arrays as buffers might be susceptible to buffer overruns if it is written on the assumption that arrays will not exceed 2 GB.

Ref.

SQL Diagnostic Runner Updated

David Vogelaar (and others) kindly reported a bug: I wasn’t replacing invalid filename characters (such as the backslash in a named instance like SERVER\INSTANCE) when using an SQL Server instance name for the auto-generated results filename. This has been fixed. You can download version 1.0.2 from the previous download links or the ones below.

SQLDiagCmd.zip

SQLDiagUI.zip
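As for the filename fix, a minimal sketch of that kind of sanitising (not necessarily the code SQLDiagCmd actually uses) might look like this; ResultsFileName and FromInstanceName are hypothetical names. It merges Path.GetInvalidFileNameChars() with the characters Windows reserves, because on non-Windows platforms that method returns a much shorter list.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ResultsFileName
{
    // Characters the current platform rejects, plus those Windows reserves, so the
    // name is also safe if the results file is later copied to a Windows machine
    private static readonly char[] InvalidChars =
        Path.GetInvalidFileNameChars().Union("\\/:*?\"<>|").ToArray();

    // Hypothetical helper: replaces each invalid character with '_'
    // e.g. a named instance such as MYSERVER\SQL2012 becomes MYSERVER_SQL2012
    public static string FromInstanceName(string instanceName)
    {
        var safe = instanceName.Select(c => InvalidChars.Contains(c) ? '_' : c).ToArray();
        return new string(safe);
    }
}
```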

There is a known issue:   

The results file is generated OK but sometimes when you open it in Excel a seemingly ‘nasty’ message is shown:

"Excel found unreadable content in ‘???.xlsx’. Do you want to recover the contents of this workbook?"

The file will open OK if you choose to recover: simply accept the prompts and save over the original. I will fix this as soon as I can.

SQL Server: Differences between Temp Tables and Table Variables

I had been thinking of collating the differences between temp tables and table variables and posting it, but Martin Smith has already written a great summary over at DBA StackExchange:

What’s the difference between a temp table and table variable in SQL Server?

It’s broken up into the following categories:

  • Storage Location
  • Logical Location
  • Visibility to different scopes
  • Lifetime
  • Transactions
  • Logging
  • Object Metadata
  • Cardinality
  • Column statistics
  • Recompiles
  • Locking
  • Indexes
  • Parallelism
  • Other Functional Differences
  • Memory Only?

List of Freely Available Programming Books

One of the things I think StackOverflow has got wrong is hiding closed but highly useful questions that are deemed in some way not to ‘fit’ the site’s philosophy (whatever that might be). If your rep is higher than 10K, you can view these hidden closed questions. The site has bigger problems, such as the increasing number of very, very poor quality questions that amount to nothing more than “I can’t be bothered doing/looking up X. Please do X for me”.

Here’s an example: List of freely available programming-books

Can’t see it? I’d obviously prefer to link to the entire question and answers, but assuming you can’t see it, here’s an excerpt from the answer started by George Stocker (who, ironically, is one of the people who closed it) and then contributed to by many people as a community wiki:

Meta-Lists

Language Agnostic

.NET (C# / VB / Visual Studio)

SQL (implementation agnostic)

31 Characters Should be Enough for Anyone, Right?

I’ve always had a good laugh at Oracle for having a 30-character limit on table/column/index names (and probably other objects I don’t know about).

Mentioned here on StackOverflow:

“Not just millions of lines of DBA written code, but plenty of oracle internal code no doubt too. This topic came up in a session with Steven Feuerstein and he said he didn’t think they would ever change it.”

“They couldn’t exactly trumpet it as a new feature, either… they’d spend a lot of time extending the limit, and then announce “you can now use names longer than 30 characters!”. They’d be the laughing stock”

While writing SQLDiagCmd (a runner for Glenn Berry’s SQL Server diagnostic scripts), I rediscovered that Excel 2010 still has a limit of 31 characters for worksheet names (and several weird bits of behaviour relating to that limit). Really?!? Why would anyone want more than 31 characters for a worksheet name? It is 2013, right, not 1970?

Add that to the fact that worksheet names have to be unique (I understand the need for that), and voilà! Unnecessary code to guarantee uniqueness within 31 characters! Someone please tell me there’s a way to override this ludicrous limit…
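For what it’s worth, the uniqueness-within-31-characters dance can be kept fairly small. Here’s a sketch, assuming a hypothetical WorksheetNamer class that truncates names to 31 characters and appends " (2)", " (3)" and so on to duplicates. It also assumes Excel compares sheet names case-insensitively, and it does not strip the characters Excel rejects in sheet names (such as : \ / ? * [ ]).

```csharp
using System;
using System.Collections.Generic;

public class WorksheetNamer
{
    private const int MaxLength = 31; // Excel's worksheet name limit

    // Assumption: Excel treats "Sheet" and "SHEET" as the same name
    private readonly HashSet<string> _used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns a name of at most 31 characters that this instance hasn't handed out yet
    public string GetUniqueName(string desired)
    {
        var name = Truncate(desired, MaxLength);

        // HashSet.Add returns false for a duplicate, so keep trying suffixed names
        for (var n = 2; !_used.Add(name); n++)
        {
            var suffix = $" ({n})";
            name = Truncate(desired, MaxLength - suffix.Length) + suffix;
        }
        return name;
    }

    private static string Truncate(string s, int length) =>
        s.Length <= length ? s : s.Substring(0, length);
}
```

The suffix is accounted for before truncating, so a long duplicate name loses a few more characters from its end rather than blowing through the limit.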