Algorithms

A programmer’s knowledge of algorithms can be (very) roughly divided into 3 categories:

  1. Essential: Basic knowledge of the algorithms you use in your day-to-day work (hopefully along with their ‘Big O’ complexity, although it is surprising how often this is lacking…)
  2. Desirable: Knowledge of a wide range of algorithms, including sorting, searching, graph algorithms, etc.
  3. Rarely required in a business setting: The ability to analyse an algorithm’s complexity
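To see why even basic ‘Big O’ knowledge pays off day to day, here is a small Python sketch (an illustrative example, not from any of the books below) contrasting linear search with binary search on the same data:

```python
from bisect import bisect_left

def linear_search(items, target):
    """O(n): scan every element until we find the target."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): repeatedly halve the search space (requires sorted input)."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = list(range(0, 1_000_000, 2))   # half a million even numbers, already sorted
print(linear_search(data, 999_998))   # ~500,000 comparisons to find the last element
print(binary_search(data, 999_998))   # ~20 comparisons for the same answer
```

On a sorted list of half a million items, that is the difference between hundreds of thousands of comparisons and about twenty.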

Now, I’m obviously not advocating that all programmers should be algorithm gurus, but if you want to broaden your knowledge, here are a few places to start:

One of the classic introductory algorithms books is Sedgewick’s Algorithms. This used to be a single book (the format in which I read it), but is now split into two volumes. It comes in versions for various programming languages, including C++: Fundamentals (Parts 1–4) and Graphs (Part 5). It is accessible, and covers most of the common algorithms you are likely to encounter (or need).

Another classic introductory book on the subject is Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (sometimes referred to as CLRS).

For an excellent free resource, check out the Introduction to Algorithms course on MIT OpenCourseWare (which uses CLRS as the course text; one of the authors, Prof. Leiserson, taught the course at MIT): MIT 6.046J / 18.410J Introduction to Algorithms

The course materials also include video lectures. Here’s an example of why having at least some broader knowledge of algorithms and their applications is useful: the skip list is a little-known data structure (possibly because it is a relatively recent invention), and yet it is extremely useful and much easier to implement from scratch than many of the other balanced data structures. It is described in lecture 12.
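To give a flavour of how simple a skip list really is, here is a minimal Python sketch (insert and search only; a production version would also handle deletion and tune the level cap). Nodes are randomly promoted to higher levels with probability 1/2, which gives expected O(log n) operations with no rebalancing logic at all:

```python
import random

class SkipNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * (level + 1)   # one forward pointer per level

class SkipList:
    """Simplified skip list: a sorted linked list with express lanes."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        # Coin-flip promotion: each level is half as populated as the one below.
        level = 0
        while random.random() < 0.5 and level < self.MAX_LEVEL:
            level += 1
        return level

    def search(self, value):
        node = self.head
        for i in range(self.level, -1, -1):       # start at the top lane
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]            # run right, then drop down
        node = node.forward[0]
        return node is not None and node.value == value

    def insert(self, value):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node                      # last node before the insert point
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        new_node = SkipNode(value, new_level)
        for i in range(new_level + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node
```

Compare that with the case analysis needed for a red-black tree insert and you can see the appeal.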

Two books that are lighter and less formal are Algorithms in a Nutshell and The Algorithm Design Manual (Second Edition). Instead of formal mathematical proofs, these books take a more practical approach, with real-world problems and their solutions. They also show you how to estimate and measure the complexity of a solution. Both are good, practical reference guides for programmers (I’m just about to add The Algorithm Design Manual to the Perth .NET User Group library…). Highly recommended.
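Measuring complexity empirically is easy to try yourself. A hedged sketch of the ‘doubling’ technique in Python (my own example, not taken from either book): time a function at increasing input sizes, and the ratio between successive timings hints at the growth rate (roughly 2x per doubling suggests linear, roughly 4x suggests quadratic):

```python
import timeit

def measure_growth(fn, sizes, repeats=3):
    """Time fn over increasing input sizes, keeping the best of several runs."""
    timings = []
    for n in sizes:
        t = min(timeit.repeat(lambda: fn(n), number=1, repeat=repeats))
        timings.append((n, t))
    return timings

def quadratic_pairs(n):
    # O(n^2): compares every element against every other element.
    data = range(n)
    return sum(1 for a in data for b in data if a < b)

for n, t in measure_growth(quadratic_pairs, [500, 1000, 2000]):
    print(f"n={n:>5}  {t:.4f}s")
```

Each doubling of n should roughly quadruple the timing, confirming the O(n²) behaviour without any formal analysis.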

If you want something more advanced, then try these MIT OCW courses: 6.854J / 18.415J Advanced Algorithms, and the more mathematically advanced: 18.409 Topics in Theoretical Computer Science: An Algorithmist’s Toolkit.

For learning how to analyse algorithms (and let’s be honest, it will be a rare event that you actually have to!), another of Sedgewick’s books, An Introduction to the Analysis of Algorithms, is a good place to start. A more advanced text is Concrete Mathematics: A Foundation for Computer Science, and of course Knuth’s Volumes 1–3 (and the remaining volumes, which Knuth is releasing as ‘fascicles’…), which are not for the faint-hearted and require considerable mathematical knowledge as a prerequisite.

Parsing Log Files

If you need to parse log files, before you decide to write your own parser, try using the free Log Parser from Microsoft:

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.
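To appreciate why a query-based tool beats rolling your own, compare a typical Log Parser query (something like SELECT status, COUNT(*) FROM ex*.log GROUP BY status) with even a minimal hand-rolled equivalent. Here is a hedged Python sketch for a simple CSV-style log with a hypothetical ‘status’ column; a real parser would also need to handle malformed lines, multiple files, other formats, and so on:

```python
import csv
from collections import Counter
from io import StringIO

def count_status_codes(log_text):
    """Hand-rolled equivalent of SELECT status, COUNT(*) ... GROUP BY status."""
    reader = csv.DictReader(StringIO(log_text))
    return Counter(row["status"] for row in reader)

sample_log = """\
date,client,status
2009-08-10,10.0.0.1,200
2009-08-10,10.0.0.2,404
2009-08-10,10.0.0.1,200
"""
print(count_status_codes(sample_log))  # Counter({'200': 2, '404': 1})
```

And that is the trivial case; Log Parser gives you the same grouping, plus joins, charts, and a dozen input formats, for the cost of writing the query.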

This post shows a nice example of SMTP Log Parsing, with the process automated via PowerShell.

The TechNet Script Center contains several examples of using Log Parser.

Books, Books, Booko!

If, like me, you buy a lot of technical books each year, the opportunity to save a few dollars here and there really adds up. In the past, I used Amazon (US) pretty much exclusively, as it was always possible to save considerably on most (if not all) books in an order compared to the local retailers in Perth. And as a bonus, Amazon usually delivered them faster (from the US)!

Recently, a colleague let me know about Booko:

Booko is a site with a very simple goal – to find the cheapest place to buy books & DVDs

I’ve purchased 5 books in the last 4 weeks via Booko from 4 different sellers in 3 different countries. It’s a great way to save money on books. It is now my first stop, whereas before, I simply used Amazon.

Next time you need to buy a technical book, give Booko a go.

PDB Files: What Every Developer Must Know

I’m a long-time fan of John Robbins’ work; over the years I’ve followed his blog and MSDN column, and purchased several of his excellent debugging books.

He posted a must-read blog entry that I meant to mention a while back: PDB Files: What Every Developer Must Know.

Here’s a snippet:

The most important thing all developers need to know: PDB files are as important as source code!

At a minimum, every development shop must set up a Symbol Server … Briefly, a Symbol Server stores the PDBs and binaries for all your public builds. That way no matter what build someone reports a crash or problem, you have the exact matching PDB file for that public build the debugger can access. Both Visual Studio and WinDBG know how to access Symbol Servers and if the binary is from a public build, the debugger will get the matching PDB file automatically.
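Setting this up is largely a matter of configuration. As a rough sketch (the local cache path and the internal share name here are hypothetical examples; the Microsoft public symbol server URL is the well-known one), pointing a debugger at your symbols can be as simple as an environment variable:

```
set _NT_SYMBOL_PATH=srv*C:\Symbols*\\buildserver\symbols;srv*C:\Symbols*https://msdl.microsoft.com/download/symbols
```

Both Visual Studio and WinDBG honour _NT_SYMBOL_PATH, so a single setting covers both debuggers.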

Windows 7 Just Keeps Getting Better

As most people are aware, Windows 7 has gone RTM and is now available to TechNet subscribers (and to many of those who took part in the Beta program). First impressions are that it feels even faster than the RC release! It installed flawlessly. Easy as.

This is simply the best OS Microsoft have ever released.

I re-ran the Windows Experience Index:

[Screenshot: Windows Experience Index scores]

Interestingly, my disk score has fallen 0.1 points, which is a shame as it’s the slowest component. Rather surprising, given that it’s an SSD (and a reasonably high-performance one). I feel a bit of tweaking coming on…

So never mind your poser laptop with the nerdy cover light! Can it compete with a real PC? 😉

SQL Server 2008: ETL Data Load: 1 TB in 30 Minutes!

I came across this excellent article on MSDN: We Loaded 1TB in 30 Minutes with SSIS, and So Can You, detailing the design and implementation of a large data load using SSIS. The work was actually done back in February 2008.

Summary: In February 2008, Microsoft announced a record-breaking data load using Microsoft® SQL Server® Integration Services (SSIS): 1 TB of data in less than 30 minutes. That data load, using SQL Server Integration Services, was 30% faster than the previous best time using a commercial ETL tool. This paper outlines what it took: the software, hardware, and configuration used. We will describe what we did to achieve that result, and offer suggestions for how to relate these techniques to typical scenarios. Even for customers who don’t have needs quite like this benchmark, such efforts can teach a lot about getting optimal performance.

SQL Server 2008: Script Data as Inserts

I expect many people know this already, but just in case you don’t: in addition to scripting your database schema as TSQL, you can also generate data insert scripts directly from SQL Server 2008 Management Studio. Right-click your database in SSMS, select Tasks –> Generate Scripts, ensure your database is highlighted, and click Next. Scroll down the options list to the “Table/View Options” section, and change “Script Data” to True.

[Screenshot: Generate Scripts options with “Script Data” set to True]

(I’m not sure if this was also present in SQL Server 2005, as I don’t have an instance to hand).