Pro-science: Programming tips

In my daily life, I work as an IT-consultant, mostly on a time/material basis (i.e. I bill per the hour). Given the fact that using consultants is somewhat more expensive than using your own work force, I only get hired in three types of situations:

1) The project needs some resources that cannot be found in-house.
2) The project needs some expertise that cannot be found in-house.
3) The project either has gone wrong, or is on the path in that direction, and there is a need for an outside view on things.

Obviously, this is not an either-or scenario, where I'm only hired for one of the reasons. And I've found that while reason 1 or 2 might be the reason I'm hired, reason 3 is the reason why this occurred.

Anyway, whatever the reason for me getting hired, it can generally be said that at the time I get on a project, it is usually in some kind of problems. This means that I tend to spend a large portion of time, looking at the existing code base, and try to improve that. Given this, I thought it might be worthwhile writing a list of what I see as good coding practices.

There is nothing groundbreaking in this list - most of you probably do this all the time. Still, when the project is going downhill, it's some times good to get reminded that cutting a corner now, might cause big problems later.

I should probably explain that my point of view is that of a person who does a lot of debugging. So, my main goal is ease of debugging. I also like performance, but if I have to choose between those two things, I'll focus on ease of debugging.

I take it as a given that you use some kind of source control. If you don't, your problem is much more fundamental than anything this list addresses, and you should take a long, hard look at your practices.

Fail Fast

This is an approach championed by people like Michael Nygard (author of Release It!), who rightfully point out that users are inpatient. It’s better to fail as early as possible. This means that you should ensure that you have all the data, resources etc. you need as early as possible (but not any earlier). E.g. if I want to update the information on a customer, for which I require that I have a customer number, any functions that calls the update customer method should ensure that they have a customer number, and otherwise fail.

Validate input

While this might seem redundant when using the fail fast strategy, the simple fact is that people make mistakes (we all do), or it might not be clear to someone else what’s required for the method to work.

So, ensure that any input that is required is actually given, and make sure that e.g. the string which should contain a number actually contains a number.

Fail explicitly, not implicitly

Often I run across methods that will results in exceptions when certain conditions are not met, without there being any explicit exceptions thrown. While this might seem acceptable, since those conditions should never be met, it’s better to throw an explicit exception. This shows to later reviewers that someone actually thought this through, and makes it possible to enrich the exception with more telling error-messages.

DRY

DRY stands for Don’t Repeat Yourself, and the principle is explained in the excellent book The Pragmatic Programmer (if you haven’t read it, I suggest reading it). Basically, the idea is to not repeat yourself in any way while developing. In this context, it means that if you find yourself writing a lot of very similar code, you should try to see if you can generalize the code into one or more methods that can be called.
This makes it easier to read the code, ensures that errors should only be fixed one place, and it makes it a lot easier to unit-test it.

Don’t copy and paste - generalize

Well, pretty much the same as above, but really: if you find yourself copying and pasting a lot of code, you’re pretty sure to be doing it wrong.

Split up your methods

Long methods is the bane of debugging and should be avoided as much as possible. If possible, try to split your method up in smaller methods, each with their own responsibility.
Generally speaking, any methods that take care of several responsibilities should be split up. It makes re-use and generalizing easier as well

Minimize your number of method calls

Often people tend to dislike local variables (Martin Fowler explicitly aimed at getting rid of them in his book Refactoring). That’s wrong, from both a debugging and a performance issue.

When you need to call the same method several times in a function, try to see if it’s possible to store the result in a local variable the first time you call it, and then use the local variable the rest of the time.

When you call the method, the object will be created anyway, so you won’t really save any memory, and by doing it as I suggest, any hidden (or later introduced) costs in the method will not come back and haunt you.

Don’t have more than one method call per line

This is a huge pet peeve of mine, and solely related to debugging.
A lot of people like to minimize methods by reducing the number of lines of code. One way they do this, is by making more than one method call on each line if possible.

E.g. a call to a method that creates an object which is needed as an input parameter to a different method, is often called something like thus:

Bool result = updateCustomer(GetCurrentCustomerFromCache());

While this line of code is technically fine, it’s very hard to debug (and adding more parameters generated the same way will only make it worse). Instead, split it up in two lines

Customer customer = GetCurrentCustomerFromCache();
Bool result = updateCustomer(customer);

Now, if it fails, it’s easy to see which method caused the failure.

Don’t just document code, document assumptions as well

Good code pretty much documents itself. I.e. we can read the code, and figure out what happens. Unfortunately it doesn’t tell us why it happens, so if any of the code is based on assumptions of any kind, make sure to include it in the code documentation.

Give your variable meaningful names

Yes, we have all given our input parameters names like 'x', and expected people (including ourselves at a later stage) to understand it when they saw it. Unfortunately, people don't understand parameter names like that, nor will the understand variables or methods with that sort of names. This means that they will have to read through the code, to see what 'x' really is.

If the parameter was named something like 'customerNo', then it's much easier to understand what it should contain.

Watch out for those loops

Loops of all kinds (for-loop, while-loop, foreach-loops etc.), are among the biggest causes of performance issues inside the code. Make sure that you place method calls outside the loops if at all possible.

NB: Remember that the for-loop declaration is part of the loop, so a method call as the stop variable will be executed each time the loops run.

So, the following for-loop is inefficient:

for(int i=0; i < GetArray().Length; i++)

Rather it should be written thus:

int arrayLength = GetArray().Length
for(int i=0; i < arrayLength; i++)

Use build-in methods rather than develop your own

We have probably all at one stage or another made our own version of some standard method because we thought we could do it smarter. My suggestion is that you don’t, unless there are some real needs you need to fulfill.
The build in methods are integrated tightly with the framework, and is usually much better implemented than we can hope to match – and if it isn’t right now, it will be so at a later stage (these methods do get updated).

Yes, I know there is a real challenge in building an XML parser, but please don’t. Use the build-in version instead.

Convert strings to enums, not visa versa

It appears that there often is a need to compare the value of an enum with the value inside the string. There are two ways of doing this, one is to convert the string into an enum and compare the two enums; the other is to convert the enum into a string, and compare the two strings.

Choose the first approach.

Enum comparisons are integer comparisons, which are much lighter than string comparing.
Also, in .NET, the ToString method on the enum object type is currently not very efficiently, and there is an amazing overhead in it, so there is no efficiency lost in converting the string to the enum, and not the other way.

Nullable types are there to be used

The default value of many data types (e.g. int) are impossible to distinguish from values that has been set. You can’t tell if the integer you’re working with has just been initialized, or if someone actually set it to 0, so if that makes a difference, make sure that you use nullable types if your languages supports them.

If your language don't support nullable types, make sure to make datatypes that contain your value, and can be null. E.g. an amount class that only contains an amount property.

Yes, there is an overhead, but in some types of systems it really makes a difference if the value has been set or not.

Doubles are imprecise

Doubles, and other floating point value types, are inherently imprecise, so use the decimal data type instead, if your languages supports this.

When parsing numbers and dates, include your culture

You can’t be sure what setup the server the program ends up running on has, so make sure to tell the methods what culture you’re using.

In .NET there is also a culture relevant property in rounding. In Europe, midway rounding is away from zero, while in the US is to-even. So, 2.5 will be rounded to 3 in Europe and 2 in the US. In .NET the default rounding method is the US way, so if you need to round, make sure to include the MidpointRounding mode.
I suspect that other languages have similar issues.

Make sure that unit tests also tests the failures

Here I assume that you actually make unit tests. If you don't, start doing it.

If you expect the method to throw an exception under certain circumstances, make sure to make a test for that also. This serves two purposes
1) It ensures you have implemented it correctly.
2) It ensures that any changes to the method will not cause this effect to go away.

Remember that lists and arrays can be null

Before checking the count or length of the list or array, better make sure that there actually is a list or array. The length of a null is a null-pointer exception.

Remember that lists and arrays can contain null objects

There are no problems in filling an array with nulls. So don’t assume that just because the list or array contains some values, that those values are actually initialized.

Enum values also have a default value

When you’re working with an enum, remember that it’s initialized to the first value in it. At least, that's how it is in .NET. Other languages might behave differently, but there will be a default value somehow.

Don't out-comment code, delete it

When there are errors in code, it's some times easier to re-write the code than try to fix the existing code. While doing so, people tend to out-commented the existing code, so they can roll-back if the new code doesn't work. That's fine, while working on a local copy of the code, but once you feel it's ready to commit to the code base, make sure to remove that out-commented code.

Code that's out-commented, is dead code, and is just confusing later reviewers/debuggers, who get worried if the code should actually have been used somehow. So, don't leave it there - remove it.

If it turns out later that we really needed to roll-back the code to the earlier version, then that's why our source-control is there. The code doesn't disappear, just because you delete it. We can always go back to the version of the code that included it.

When changing code, plan how you'll test it

We all know the situation. There is some code that works as described in the specs, but unfortunately the specs are out of date. This means we have to change the code. No problem, that happens all the time. Unfortunately, what also happens all the time, is that either the code stop working entirely, the code works the wrong way, or that there is some kind of side effect cause by the change.

So, when changing code, make sure that you have a test strategy for ensuring that the changed code works as expected, and that the change doesn't cause side effects.

Unittests and automatic tests are a bare minimum, but preferably, get the customers or testers to make a test scenario for the new functionality, before starting implementing it. This way, you are also more likely to understand the new requirements correctly - or at least realize that you don't understand what they want.

Finally, I'll say, that people shouldn't be afraid of refactoring if they see something that is not right, or is written in such a way that it's hard to understand the logic. Refactoring is not a goal in itself, but it can help the development process immensely, and it'll certainly make the code easier to maintain at a later stage.

Labels: programming, systems development

Pro-science

Saturday, October 25, 2008

Programming tips

0 Comments:

About Me

Previous Posts