Patterns and Practices: Guidance Explorer

The Patterns and Practices Team continue to have a major impact on software development both inside and outside Microsoft. Their latest offering is Guidance Explorer.

Guidance Explorer allows you to create and distribute a set of standard performance and security best-practices that your team can adhere to.

From J.D. Meier’s blog: “Guidance Explorer is a new, experimental tool from the patterns & practices team that radically changes the way you consume guidance as well as the way we create it. If you’ve felt overwhelmed looking across multiple sources for good security or performance guidance then Guidance Explorer is the tool for you.”

It’s currently aimed at ASP.NET, but Windows guidelines are apparently in the pipeline. I’ve just downloaded it, and I might blog about my experiences later…

Visual Studio 2005 Icon Library

Did you know that Visual Studio 2005 ships with a library of standard Windows bitmaps, cursors, icons and metafiles which can be freely used in your Windows and web applications? It contains Windows, Office, and Visual Studio icons that are licensed for reuse.

You can find it here: C:\Program Files\Microsoft Visual Studio 8\Common7\VS2005ImageLibrary\VS2005ImageLibrary.zip

In addition, the .ico files are in multi-icon format: each file contains the 16×16, 32×32 and 48×48 images (at color depths of 256 colors, 16bpp and 24bpp).
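To see which sizes and color depths a given .ico file actually contains, you can read its directory header. The following is a minimal sketch based on the standard ICO layout (a 6-byte header followed by one 16-byte entry per image); the function name list_ico_images is my own, not part of any library.

```python
import struct


def list_ico_images(data: bytes):
    """Return (width, height, bits_per_pixel) for each image in an .ico file."""
    # ICONDIR header, little-endian: reserved (0), type (1 = icon), image count.
    reserved, image_type, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or image_type != 1:
        raise ValueError("not an .ico file")

    images = []
    for index in range(count):
        # Each ICONDIRENTRY is 16 bytes and follows the 6-byte header.
        (width, height, _colors, _reserved, _planes, bit_count,
         _size, _offset) = struct.unpack_from("<BBBBHHII", data, 6 + 16 * index)
        # A stored width or height of 0 means 256 pixels.
        images.append((width or 256, height or 256, bit_count))
    return images
```

To use it, read an icon extracted from the zip in binary mode, for example open(path, "rb").read(), and pass the bytes in. Some icon writers leave the bit-count field as 0, so treat the reported depth as a hint.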

Skyscrapr

If you’re an architect or an aspiring architect, check out Skyscrapr. The site was launched by Microsoft in May 2006 and plans to cover all aspects of architecture.

Introduction to Test-Driven Development

This is old news, but worth mentioning if you haven’t already seen it: the Introduction to Test-Driven Development webcast by Peter Provost, Scott Densmore, Brad Wilson, Brian Button and Ron Jacobs. If you would like to know more about Test-Driven Development (or even if you are a sceptic!), download and watch it. Not only is it a gentle introduction to Test-Driven Development, it’s also quite funny!

Ron Jacobs also hosts ARCast which has some excellent content. Ron is “…Someone who understands what you are thinking… someone who can tell a good joke.” He also seems to have an infectious sense of humor!

Simian: A tool for Detecting Similar Code

Simian is a code similarity analyser that can be used to identify duplication in “…any human readable files…”. Simian runs natively in any .NET 1.1 or higher supported environment and on any Java 1.4 or higher virtual machine.

Howard van Rooijen shows how to integrate Simian into Visual Studio in Detecting duplicate code with Simian, and how to make it more usable in MonkeyWrangler – Making Simian more usable in Visual Studio.

To incorporate it into your NAnt automated build scripts, create a simian target:

<property name="Exec.Simian" value="C:\BuildTools\simian-2.2.8\bin\simian-2.2.8.exe"/>

<target name="runSimian" description="Runs Simian to find duplicate code">
    <exec program="${Exec.Simian}">
        <arg value="-recurse=${project.root}\*.cs"/>
        <arg value="-formatter=xml:${build.outputfolder}\simian.xml"/>
    </exec>
</target>

The latest version of CruiseControl.NET already contains the necessary .XSL formatter to display the results in the CC.NET dashboard; just point it at the simian.xml output file.

Software Development Must Haves

If you are starting a career in software development, the choice you make for your first job is extremely important. It can make the difference between an average career and one that stands out from the crowd. When you go for an interview, remember that it is a two-way process: you need to interview them as well. Finding an environment that will nurture your skills and direct your development is often more important than simply finding the company that will pay you the most money. The Guerrilla Guide to Interviewing by Joel Spolsky is well worth reading.

Does your company have or do the following?

  • Source code version control
  • Issue tracking system
  • Automated nightly build process (possibly with continuous integration)
  • Unit Tests and an automated unit testing process
  • Integration testing
  • Coding standards and design guidelines
  • Ability to build your entire product in a single step
  • A mentoring program for junior programmers
  • Developers always write code with the consumer in mind

The last point requires some explanation: when you are designing code and deciding ‘what the code should look like’ there is no better way than writing down how you envisage consumers (whoever they are) calling your methods. If you put yourself in the place of the consumer of your methods, you will invariably find the best way to phrase the interface of those methods. This is an important design principle when creating software frameworks.
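As a small illustration of writing the consumer’s code first, here is a sketch in Python. The Basket class and its methods are hypothetical names invented for this example; the point is only the order of work: the call site is written first, and the implementation is then shaped to fit it.

```python
# Step 1: write the code you would like consumers to be able to write,
# before any implementation exists.
def consumer_example():
    basket = Basket()
    basket.add("widget", unit_price=2.50, quantity=4)
    basket.add("gadget", unit_price=10.00)  # quantity defaults to 1
    return basket.total()


# Step 2: write the simplest implementation that makes that call site work.
class Basket:
    def __init__(self):
        self._lines = []  # list of (name, unit_price, quantity) tuples

    def add(self, name, unit_price, quantity=1):
        self._lines.append((name, unit_price, quantity))

    def total(self):
        return sum(price * quantity for _name, price, quantity in self._lines)
```

Writing the call site first is what suggested the keyword arguments and the default quantity here; starting from the internal data structure might well have produced a clumsier interface.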

This list is my shortened version of Joel Spolsky’s The Joel Test: 12 Steps to Better Code. If you’re in the job market, ask potential employers whether they have all of these. Look for warning signs like “we were going to implement ‘xyz’, and we know it’s a good idea/best practice BUT we don’t have time…”; these are the development environments to avoid!

Long and Short Variable Naming

Darren Neimke has been talking about variable naming and how long variable names should be: Debunking popular myths. I agree that long variable naming can be, and has been, abused, but would also like to throw in the following points (this is an edited version of my comments):

I have seen the situation many times where a programmer constructs a poor abbreviation just because a rigid coding standard insists that variable names be at most N characters, and the fuller, more descriptive name would have gone over by a few characters. So you end up with a 10 character cryptic (or ambiguous) name as opposed to an 18 character descriptive name. I’d definitely prefer to see and read the latter.

In my view, an even bigger giveaway that a region of code warrants closer inspection is a mixture of very terse and very verbose variable naming, either because it’s the work of more than one programmer or of just one who was unsure of what they were doing.

I agree that really long names are bad for the reasons Darren mentioned, but also for the reason that they make code harder to read, and therefore slower to understand, and therefore harder to maintain.

I guess in the end it’s about common sense; I obviously try to keep variables as short as possible whilst maximising their meaning. My 32-character maximum length rule of thumb is slightly longer than Darren’s, although in practice it would be extremely rare that I would ever name anything that long.

The Pitfalls of Bubble Sort

Approximately 15 years ago, a few months after joining a new company, I was approached by a programmer who had a problem. He knew that I had some experience in algorithm design and implementation. He told me that an application that had been working fine in testing was now running so poorly in production that it had practically come to a standstill. Although I had not seen the source code, I hazarded an educated guess as to the cause of the problem. I came right out and said “You’re using Bubble Sort aren’t you?” He looked at me a little perplexed, and said “…er Yes. But how did you know! It was working fine during testing”.

The problem only showed up in production because they were using a few hundred items in testing, but production had tens of thousands of items. This comparison table shows the time taken to solve some problem of size N using various algorithms of differing complexity. The actual times are not as important as the way in which the time increases:

Problem Size   N³                   N²           N log N      N
100            3.5 secs             0.19 secs    0.05 secs    0.003 secs
1000           1 hour               10 secs      0.46 secs    0.033 secs
10000          38 days              25 minutes   6 secs       0.33 secs
100000         100 years            1.5 days     1 minute     3.3 secs
1000000        100 million years!   5 months     13 minutes   33 secs

(Ignoring constants of proportionality, which in some cases can cause higher order complexity algorithms to perform better than lower complexity ones when N is small.)
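To get a feel for how quickly the work grows, here is a rough sketch that computes how many times more work a problem of size n needs than one of size 100, for a few common complexity classes. It ignores constants of proportionality, just as the note above does, and the function name relative_cost is my own.

```python
import math


def relative_cost(n, base_n=100):
    """How many times more work size n needs than size base_n, per complexity class."""
    growth = {
        "N": lambda x: x,
        "N log N": lambda x: x * math.log2(x),
        "N^2": lambda x: x ** 2,
        "N^3": lambda x: x ** 3,
    }
    return {name: f(n) / f(base_n) for name, f in growth.items()}
```

Going from a few hundred test items to tens of thousands in production multiplies the work by a factor of about 100 for a linear algorithm, but by a factor of about 10,000 for a quadratic one.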

In its basic form, Bubble Sort is an O(N²) algorithm (best and worst cases). So why does anyone continue to teach the use of Bubble Sort in colleges and universities? For just a slight increase in complexity, you can implement Shell sort (named after its creator Donald Shell), which in practice outperforms Bubble Sort on all but the tiniest inputs and, with a suitable gap sequence, has a worst-case performance of O(N^1.5) compared with Bubble Sort’s O(N²) behaviour. Shell sort is very fast for small data sets (fewer than 1000 items).
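To show how little extra code is involved, here is a sketch of both algorithms side by side. The Shell sort uses Knuth’s increments (1, 4, 13, 40, …), which is my choice for the example; other gap sequences give different worst-case bounds.

```python
def bubble_sort(items):
    """Basic Bubble Sort: always performs O(N^2) comparisons."""
    a = list(items)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def shell_sort(items):
    """Shell sort: insertion sort over progressively smaller gaps."""
    a = list(items)
    gap = 1
    while gap < len(a) // 3:
        gap = 3 * gap + 1  # Knuth's increments: 1, 4, 13, 40, ...
    while gap >= 1:
        # Gapped insertion sort: afterwards every gap-th element is in order.
        for i in range(gap, len(a)):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]
                j -= gap
            a[j] = value
        gap //= 3
    return a
```

Both functions return a new sorted list and leave the input untouched. The final pass with a gap of 1 is a plain insertion sort, but by then the data is nearly in order, so it does very little work.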

If you want the fastest possible general purpose sorting algorithm then implement Sedgewick’s median-of-three Quicksort, with insertion sorting of small subsets (this implementation avoids vanilla Quicksort’s pathological O(N²) behaviour in the presence of almost sorted data).
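The general shape of that approach can be sketched as follows. This is my own simplified variant, not Sedgewick’s code: the cutoff of 10 is an assumed value, and small subarrays are insertion sorted as they are encountered rather than in one final pass.

```python
CUTOFF = 10  # assumed threshold: subarrays this small go to insertion sort


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        value = a[i]
        j = i
        while j > lo and a[j - 1] > value:
            a[j] = a[j - 1]
            j -= 1
        a[j] = value


def median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi], then park the median at hi - 1 as the pivot."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    return a[hi - 1]


def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    pivot = median_of_three(a, lo, hi)
    # a[lo] <= pivot and a[hi - 1] == pivot act as sentinels for the scans.
    i, j = lo, hi - 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi - 1] = a[hi - 1], a[i]  # move the pivot into its final place
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

Choosing the median of the first, middle and last elements means that already sorted or reverse sorted input splits roughly in half at each step, instead of degenerating as it does when the first element is always used as the pivot.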

Perhaps this is a candidate for one of those ‘negative’ interview questions: can you write down the Bubble Sort algorithm in code? This is a bit like asking a candidate if they can write down the code to describe the use of cursors in T-SQL. In my view, it is definitely a plus for those who can’t and prefer to rely upon (wherever possible) set-based constructs instead.

Detecting and Removing Malware

I updated my virus scanner recently and it occurred to me that I haven’t heard anything in the news about a new virus for ages. Have they gone out of fashion or are new ones simply variants of old ones? Or is Microsoft’s security initiative having an effect?

So I had a trawl and came across a webcast by Mark Russinovich on detecting and removing malware using three of the many Sysinternals tools: SigCheck, AutoRuns and Process Explorer. These are great tools and are free (as are all of the Sysinternals offerings, such as FileMon and RegMon), and knowing how to use them is a valuable addition to any programmer’s toolkit.

You can find the webcast here: Understanding and Fighting Malware: Viruses, Spyware and Rootkits.