Software Development Must Haves

If you are starting a career in software development, the choice you make for your first job is extremely important. It can make the difference between an average career and one that stands out from the crowd. When you go for an interview, remember that the interview is a two-way process: you need to interview them as well. Finding an environment that will nurture your skills and direct your development is often more important than simply finding the company that will pay you the most money. The Guerrilla Guide to Interviewing by Joel Spolsky is well worth reading.

Does your company have, or do, the following?

  • Source code version control
  • Issue tracking system
  • Automated nightly build process (possibly with continuous integration)
  • Unit Tests and an automated unit testing process
  • Integration testing
  • Coding standards and design guidelines
  • Ability to build your entire product in a single step
  • A mentoring program for junior programmers
  • Developers always write code with the consumer in mind

The last point requires some explanation. When you are designing code and deciding ‘what the code should look like’, there is no better approach than writing down how you envisage consumers (whoever they are) calling your methods. If you put yourself in the place of the consumer of your methods, you will invariably find the best way to phrase their interface. This is an important design principle when creating software frameworks.
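
To illustrate the consumer-first approach, here is a minimal Python sketch. The `Invoice` class and its methods are hypothetical, invented only for this example. The calling code is written first, and the class is then shaped so that this calling code reads naturally:

```python
from decimal import Decimal


# Step 1 (written first): how a consumer would like to call the API.
# The hypothetical Invoice class does not exist yet when this is written;
# the calling code decides the shape of its interface.
def consumer_example() -> Decimal:
    invoice = Invoice(customer="ACME Ltd")
    invoice.add_line("Widget", quantity=3, unit_price=Decimal("2.50"))
    invoice.add_line("Gadget", quantity=1, unit_price=Decimal("10.00"))
    return invoice.total()


# Step 2: implement just enough to make the calling code above work.
class Invoice:
    def __init__(self, customer: str) -> None:
        self.customer = customer
        self._lines: list[tuple[str, int, Decimal]] = []

    def add_line(self, description: str, quantity: int, unit_price: Decimal) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._lines.append((description, quantity, unit_price))

    def total(self) -> Decimal:
        # Sum quantity * unit price over every line, starting from zero.
        return sum((qty * price for _, qty, price in self._lines), Decimal("0"))
```

Keyword arguments such as `quantity=` and `unit_price=` appeared in the consumer code first. That is the kind of interface decision this approach tends to surface.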

This list is my shortened version of Joel Spolsky’s The Joel Test: 12 Steps to Better Code. If you’re in the job market, ask potential employers whether they have all of these. Look for warning signs like “we were going to implement ‘xyz’, and we know it’s a good idea/best practice BUT we don’t have time…”; these are the development environments to avoid!

Long and Short Variable Naming

Darren Neimke has been talking about variable naming and how long variable names should be: Debunking popular myths. I agree that long variable names can be, and have been, abused, but I would also like to add the following points (this is an edited version of my comments):

I have often seen a programmer construct a poor abbreviation just because a rigid coding standard limited variable names to at most N characters, and the fuller, more descriptive name would have gone over by a few characters (say, five too many). So you end up with a 10-character cryptic (or ambiguous) name instead of an 18-character descriptive one. I’d definitely prefer to see and read the latter.

In my view, an even bigger giveaway of regions of code that warrant closer inspection is a mixture of very terse and very verbose variable names, either because the code is the work of more than one programmer or because its author was unsure of what they were doing.

I agree that really long names are bad for the reasons Darren mentioned, but also because they make code harder to read, and therefore slower to understand and harder to maintain.

I guess in the end it’s about common sense: I try to keep variable names as short as possible whilst maximising their meaning. My rule-of-thumb maximum of 32 characters is slightly longer than Darren’s, although in practice I would very rarely name anything that long.
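
A rule of thumb like this is easy to check mechanically. As a rough sketch, assuming Python source and using the standard `ast` module, the hypothetical `find_long_names` helper below reports assigned variables, function and class names, and parameters that are longer than the 32-character limit:

```python
import ast

MAX_NAME_LENGTH = 32  # the rule-of-thumb upper limit discussed above


def find_long_names(source: str, max_len: int = MAX_NAME_LENGTH) -> list[str]:
    """Hypothetical helper: return names defined in `source` longer than max_len."""
    long_names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id  # a variable being assigned
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name  # a function or class definition
        elif isinstance(node, ast.arg):
            name = node.arg  # a function parameter
        else:
            continue
        if len(name) > max_len:
            long_names.add(name)
    return sorted(long_names)
```

The check only flags names for review. It cannot tell a cryptic abbreviation from a good short name, and that judgment still needs a human.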

The Pitfalls of Bubble Sort

Approximately 15 years ago, a few months after joining a new company, I was approached by a programmer who had a problem. He knew that I had some experience in algorithm design and implementation. He told me that an application that had been working fine in testing was now running so poorly in production that it had practically come to a standstill. Although I had not seen the source code, I hazarded an educated guess as to the cause of the problem. I came right out and said “You’re using Bubble Sort, aren’t you?” He looked at me a little perplexed, and said “…er, yes. But how did you know? It was working fine during testing.”

The problem only showed up in production because they were using a few hundred items in testing, but production had tens of thousands of items. That is roughly a hundredfold increase in N, which for an O(N²) algorithm means roughly ten thousand times as much work. The comparison table below shows the time taken to solve a problem of size N using algorithms of differing complexity. The actual times are not as important as the way in which the time increases with N:

Problem size N   N³                   N²           N log N      N
100              3.5 secs             0.19 secs    0.05 secs    0.003 secs
1000             1 hour               10 secs      0.46 secs    0.033 secs
10000            38 days              25 minutes   6 secs       0.33 secs
100000           100 years            1.5 days     1 minute     3.3 secs
1000000          100 million years!   5 months     13 minutes   33 secs

(These figures ignore constants of proportionality. In some cases those constants can make an algorithm of higher-order complexity faster than one of lower-order complexity when N is small.)
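
The way the times grow can be turned into a back-of-the-envelope estimate. The sketch below scales a single measured timing to a larger N. The `extrapolate` helper and its complexity labels are my own illustration, and like the table it ignores constants of proportionality (as well as effects such as caching and paging):

```python
import math

# Growth functions for common complexity classes (constants ignored).
GROWTH = {
    "N^3": lambda n: n ** 3,
    "N^2": lambda n: n ** 2,
    "N log N": lambda n: n * math.log2(n),
    "N": lambda n: n,
}


def extrapolate(seconds: float, n_measured: int, n_target: int, complexity: str) -> float:
    """Estimate the run time at n_target from one timing taken at n_measured."""
    f = GROWTH[complexity]
    return seconds * f(n_target) / f(n_measured)
```

For example, extrapolating 3.5 seconds at N = 100 with N³ gives 3,500 seconds at N = 1000, which is consistent with the table’s “1 hour”. In the same way, a hypothetical O(N²) sort that takes 0.01 seconds on 300 test items would take about 100 seconds on 30,000 production items.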

Bubble Sort is an O(N²) algorithm in the average and worst cases. It is O(N²) in the best case too, unless the implementation stops early after a pass that makes no swaps. So why does anyone continue to teach the use of Bubble Sort in colleges and universities? For only slightly more complexity, you can implement Shell sort (named after its creator, Donald Shell). In practice, Shell sort comfortably outperforms Bubble Sort. With a suitable gap sequence, such as 1, 4, 13, 40, …, it has a worst-case performance of O(N^1.5), compared with Bubble Sort’s O(N²). Shell sort is very fast for small data sets (fewer than 1000 items).
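
For comparison, here are minimal Python sketches of both algorithms. Each one counts comparisons so that the difference can be measured. The Bubble Sort includes the early-exit check. The Shell sort uses Knuth’s 3h + 1 gap sequence, which is an assumption on my part: Shell’s original sequence simply halved the gap each pass and has a worse worst case.

```python
def bubble_sort(items: list) -> int:
    """Sort items in place with Bubble Sort; return the number of comparisons."""
    comparisons = 0
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # early exit: a pass with no swaps means the list is sorted
            break
    return comparisons


def shell_sort(items: list) -> int:
    """Sort items in place with Shell sort (gaps 1, 4, 13, 40, ...); return comparisons."""
    comparisons = 0
    n = len(items)
    gap = 1
    while gap < n // 3:  # find the largest gap in the 3h + 1 sequence to start with
        gap = 3 * gap + 1
    while gap >= 1:
        # Gapped insertion sort: afterwards, every slice items[k::gap] is sorted.
        for i in range(gap, n):
            value = items[i]
            j = i
            while j >= gap:
                comparisons += 1
                if items[j - gap] <= value:
                    break
                items[j] = items[j - gap]
                j -= gap
            items[j] = value
        gap //= 3  # 40 -> 13 -> 4 -> 1 -> 0 (stop)
    return comparisons
```

On a reverse-ordered list of 1000 items, `bubble_sort` makes 1000 × 999 / 2 = 499,500 comparisons, and `shell_sort` makes far fewer.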

If you want one of the fastest general-purpose sorting algorithms, implement Sedgewick’s median-of-three Quicksort with insertion sorting of small subsets. The median-of-three pivot choice removes vanilla Quicksort’s pathological O(N²) behaviour on sorted or almost-sorted data.
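
A sketch of that approach in Python follows. It is written in the spirit of Sedgewick’s design rather than copied from his code. The cutoff of 10 elements is an assumed tuning value, the partition is a Hoare-style scan, and the recursion always descends into the smaller partition to keep the stack depth logarithmic. Median-of-three pivoting copes well with sorted and reverse-sorted input, but deliberately constructed inputs can still push it towards O(N²).

```python
from typing import Optional

CUTOFF = 10  # assumed tuning value: smaller subarrays are finished by insertion sort


def _insertion_sort(a: list, lo: int, hi: int) -> None:
    """Sort a[lo..hi] (inclusive) by straight insertion."""
    for i in range(lo + 1, hi + 1):
        value = a[i]
        j = i - 1
        while j >= lo and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value


def _median_of_three(a: list, lo: int, hi: int) -> int:
    """Order a[lo], a[mid], a[hi] so a[mid] holds their median; return mid."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return mid


def quicksort(a: list, lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a[lo..hi] in place: median-of-three Quicksort plus insertion sort."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        pivot = a[_median_of_three(a, lo, hi)]
        # Hoare-style partition: afterwards a[lo..j] <= pivot and a[i..hi] >= pivot.
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one.
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j
    _insertion_sort(a, lo, hi)
```

In real Python code, the built-in `sorted()` is the better choice. The sketch is only meant to show the technique.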

Perhaps this is a candidate for one of those ‘negative’ interview questions: can you write the Bubble Sort algorithm in code? It is a bit like asking a candidate whether they can write T-SQL code that uses cursors. In my view, it is definitely a plus for candidates who can’t, and who prefer to rely (wherever possible) on set-based constructs instead.

Detecting and Removing Malware

I updated my virus scanner recently and it occurred to me that I haven’t heard anything in the news about a new virus for ages. Have they gone out of fashion or are new ones simply variants of old ones? Or is Microsoft’s security initiative having an effect?

So I had a trawl, and came across a webcast by Mark Russinovich on detecting and removing malware using three of the many Sysinternals tools: Sigcheck, Autoruns and Process Explorer. These are great tools, and they are free (as are all of the Sysinternals offerings, such as FileMon and RegMon). Knowing how to use them is a valuable addition to any programmer’s toolkit.

You can find the webcast here: Understanding and Fighting Malware: Viruses, Spyware and Rootkits.

Recommended Computing Books

I was just about to order Jeffrey Richter’s book “CLR via C#” to supplement my copy of his previous book “Applied .Net Framework”, when I saw the announcement about the new version of the .Net framework, .Net 3.0. At this rate of change, buying platform specific books is becoming less and less appealing and relevant.

I can’t recall who said it but “you can avoid technical obsolescence by choosing timeless books” is great advice. Here’s a list of recommended reading for all software developers:

Code Complete, Second Edition: Steve McConnell. If you’re in the software industry and you only ever read one book, then this is the book you should read. Every developer, regardless of language, platform or domain, should have read this book at least once. There is no single work that contains so much of relevance to developers. At the last count, I’m on my fifth re-read, cover to cover.

Rapid Development: Steve McConnell. If you only ever read two books on software development, make this the second! Keep this on your desk at all times. Buy two copies; one for work and one for home. It will pay for itself many, many times over. If you are beginning a career in software development, this book could short-circuit 5 years of lessons learned on the job.

The Pragmatic Programmer: Andrew Hunt and Dave Thomas. If you are only going to read one book and you want something a little shorter than either Code Complete or Rapid Development, then this is the one. If you loan it to another developer, do not expect to see it again! The first line of the book states “This book will help you become a better programmer”. It will.

Don’t Make Me Think: A Common Sense Approach to Web Usability. Steve Krug. Great for the web, and equally applicable to Windows applications. Short, easy read, but valuable. A little gem of a book. If you design web sites, this is required reading.

The Inmates are Running the Asylum: Alan Cooper. Discusses real world examples of usability, and is a highly enjoyable read. You probably won’t agree with everything (I didn’t), but it certainly gets you thinking.

The Medical Detectives: Berton Roueche. Not a computing book, but a great book on the approach to debugging. A good read to boot, although the prose can be a little laboured at times.

Refactoring: Martin Fowler. A great book that takes the reader on a journey through the process of refactoring actual code.

Head First Design Patterns: Elizabeth Freeman and Eric Freeman. This is a truly amazing book. If you want to learn about design patterns and more importantly how to apply the underlying OO design concepts, this is the best book available on the subject. I recently recommended this to several people.

Patterns of Enterprise Application Architecture: Martin Fowler. Coupled with “Head First Design Patterns”, this is a superb reference to have to hand.

UML Distilled: Martin Fowler. If you seriously want to learn UML (and do it quickly without struggling) then this is the book to read.

Behind Closed Doors, Secrets of Great Management: Rothman and Derby. Practical advice on managing a software team. Excellent.

Test-Driven Development: Kent Beck. A slim, very readable, hands-on book that introduces and builds upon the concepts of the ‘write tests first’ development approach. Some would say that this is a natural evolution in the way that software should be created.

SQL Tuning: Dan Tow. A new approach to platform independent tuning of SQL queries. Took a while to get into, but well worth the effort.

The Mythical Man-Month: Fred Brooks. Perhaps the classic work on managing software development projects. “How does a project get to be a year late? … One day at a time.”

Programming Pearls: Jon Bentley. An oldie, but a goldie! Insights into how algorithms are conceived and implemented. Introduces the concept of ‘back-of-the-envelope’ calculations. Very useful.

Writing Solid Code: Steve Maguire. Aimed at C programmers but full of insights equally applicable to other languages. This book had a profound effect on the way I write code and the approach I take.

The Psychology of Computer Programming (Silver Anniversary Edition): Gerald Weinberg. An insight into the mind of the programmer, also described as “computer programming as a human activity”.

Design Patterns: Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. This classic work is currently being updated.

The Guru’s Guide to Transact-SQL: Ken Henderson. If you write T-SQL as part of your day-to-day job, then this should be the first of several Ken Henderson books you read.

Programming Windows Security: Keith Brown. Everything you wanted to know about Windows security but were afraid to ask.

The last two are platform specific, but excellent nonetheless.

I started this list of books some time ago, but was prompted to finish and post it by a colleague whose son is studying computer science, and was concerned about what books he should read.

Agile Pioneer?

Was John Gall the pioneer of agile development? His little-known book Systemantics (published in 1977 and currently out of print) has been influential in shaping the views of several prominent practitioners of software development:

“A complex system that works is invariably found to have evolved from a simple system that worked…A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
— Systemantics: How Systems Really Work and How They Fail. John Gall

Gall’s Law has strong affinities to the practice of agile software development, where under-specification rather than over-specification is the key to success.