Pro-science

Monday, April 06, 2009

Debugging friendly code

When you regularly take over other people’s code, you start developing a pet issue of your own. Mine is debugging, or rather ease of debugging. I don’t want to have to know and understand the whole business domain, program or component when I want to fix an error (bug). Most of the time, it should be possible to find and fix errors by stepping through the code in a debugger.

For this to be possible, however, the code has to be debugging friendly. By this I mean that each class, method and even line of code should have well-defined responsibilities, leaving no doubt about where and how the error occurred.
This sounds all well and fine at the abstract level, but how does it relate to real code? That’s of course a little harder to say, but I can give some general guidelines which should be followed to achieve this.

Methods should be limited in scope. E.g. if you need to implement a method which fetches something from the database, the method should not also be responsible for putting the data in the cache or converting the data into a different data type. If those things are necessary, a separate method should be made for each of those tasks.

Methods should be generalized as much as possible. Instead of copying and pasting methods and then modifying them to suit your needs, see if it isn’t possible to generalize the functionality in some way or other, so just one method is responsible for it.

Methods, parameters, classes, variables etc. should have telling names. Don’t pass x, y, z along as parameters in a method call. On the other hand, don’t make long names explaining the exact circumstances when it’s obvious from the scope what it is. The id of a customer object doesn’t need to be called customerId.

Use local variables! Martin Fowler might disagree, but he probably doesn’t have to debug other people’s code very often. When you call methodX(methodY(Z)) it’s not possible to easily see whether it’s methodX or methodY which causes the null pointer exception.

Make unit tests for, at least, the critical methods.

Comment the code. Don’t explain the obvious, but rather focus on explaining the assumptions behind what you’re doing.

Check parameters for illegal values. If your parameter should never be null, check for that, and if it is null, throw an exception explaining the problem. This shows other people (or a later you) that someone thought of the possibility, and didn’t just forget to handle null values as input parameters.

Ensure that your method behaves in a uniform way, i.e. given the same parameters the method should always behave the same way (barring other dependencies). I once came across a ToString() method in a class which always appended something to the end of the string, causing the behavior to differ depending on whether the method had been called before or not.
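To make that last point concrete, here is a minimal sketch of the problem. The class names LeakyLabel and StableLabel are made up for illustration and are not taken from the code I actually ran into.

```csharp
using System.Text;

// Hypothetical class showing the problem: ToString() changes the object.
public class LeakyLabel
{
    private readonly StringBuilder text = new StringBuilder("order");

    // Side effect: every call appends to shared state, so the second
    // call returns something different from the first.
    public override string ToString()
    {
        text.Append("!");
        return text.ToString();
    }
}

// Hypothetical fix: ToString() only reads state.
public class StableLabel
{
    private readonly string text = "order";

    // Nothing is modified, so repeated calls give the same result.
    public override string ToString()
    {
        return text + "!";
    }
}
```

This matters even more while debugging, since debuggers typically call ToString() to display an object, which means that merely inspecting a LeakyLabel changes it.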

All of these things might seem simple, but when the project is three months overdue and your customer or project leader (or both!) is breathing down your neck, it’s easy to cut corners. You might also know the system and/or domain very well, which allows you to make assumptions that are not obvious to others. Not only does this make it harder for others to debug, it might also cause people to use the code in the wrong way.

So, what happens if you come across code, or inherit code, which doesn’t conform to my guidelines? Well, be bold and refactor. Do it one step at a time: if method calls are used as parameters, introduce local variables. If methods are responsible for several things, split them into several methods with distinct responsibilities, and so on. And of course, make sure that there are unit tests.

Labels: debugging, IT consulting, programming

Saturday, January 17, 2009

Error-driven software development

When developing software systems, there are a number of systems development types out there, e.g. test-driven development (focuses on making tests before implementing), and what might be called requirements-driven development (focuses on finding all the requirements before implementing). Unfortunately, there is a type of development that I all too frequently come across, which I've come to call error-driven development.

Error-driven development is systems development, where everything is done in reaction to errors. In other words, the development is reactive, rather than proactive, and everybody is working hard, just to keep the project afloat, without any real progress being made.

I should probably clarify that I am not speaking about the bug-fixing phases which occur in every project, but rather the cases where the project seems to be nothing but bug-fixing (or change requests, which to my eyes are a different sort of bug report), without any real progress being made.

Unsurprisingly, this is not very satisfactory for any of the people involved. What's more, it's often caused by deep, underlying problems, where the errors are just symptoms. Until these underlying problems are found, the project will never get on the right track, and will end up becoming a death march.

The type of underlying problems, which can cause error-driven development, could be things like:

Different understanding of the requirements for the software among the people involved. Sometimes the people who make the requirements have an entirely different understanding of what the end system should be like than the end users.

Internal politics. Some departments or employees might have different agendas, which might lead to less than optimal working conditions.

Lack of domain knowledge among the people involved. If you're building e.g. a financial system, it helps if at least some of the people involved in the development have a basic idea of the domain you're working within.

Bad design. Sometimes early design decisions will haunt you for the rest of the project.

Unrealistic time constraints. If people don't have time to finish their things properly, they will need to spend more time on error fixing later.

There are of course many other candidates, and several of them can be in play at the same time, causing problems.

No matter what the underlying problems are, the fact is, that just focusing on fixing bugs and implementing change requests, won't help. Instead it's important to take a long hard look at the project, and see if the underlying problems can be found and addressed.

This seems trivial, but when you're in the middle of an error-driven development project, it's hard to step out and take an objective look at it. What's more, you might not be able to look objectively at the process. Often, it requires someone who hasn't been involved from the start, to come and look at things with fresh eyes.

As a consultant who often works on a time-material basis, I often get hired to work on error-driven development projects. The reason for this is simple: often it appears to the people involved that the project just needs a few more resources, so they can get over the hurdle of errors, and then it will be on the right track. When hired for such projects, I always try to see if there are some underlying problems which need to be addressed, instead of just going ahead and fixing errors/implementing changes. Unsurprisingly there often are such problems.

Frequently these problems can be fixed fairly simply (reversing some old design decisions, expanding people's domain knowledge, getting people to communicate better, implementing a test strategy, using agile methods etc.), while at other times, they can't be fixed, only taken into consideration, allowing you to avoid the worst pitfalls.

So, my suggestion is, if you find yourself in a project which over time has turned into an error-driven development type project, try to take a long hard look at what has caused this, instead of just going ahead and trying to fix all the errors/implement the changes. Error reports and change requests are just noisy symptoms in most cases, and will continue to appear as long as the real problems aren't addressed in one way or another.

Labels: IT consulting, programming, systems development

Saturday, November 22, 2008

Book Review: Release It!

Release It! - Design and Deploy Production Ready Software by Michael T. Nygard

If you are in the business of making software systems, odds are that you might have heard about Nygard's book. People have raved about it since it was published in 2007.

That being the case, it had been on my to-read list for a while, but without any urgency. Then I went to the JAOO conference last month, and heard two sessions with Michael Nygard presenting his ideas. After that, I knew I had to get hold of the book straight away.

Release It! is something as rare as a book which is groundbreaking while stating the obvious.

First of all, Nygard makes the simple point that we (meaning the people in the business) are all too focused on making our systems ready to pass QA's tests and not on making them ready to go into production. This is hardly news, but it's the dirty little secret of the business. It's not something you're supposed to say out loud. Yet Nygard does that. And not only that, he dares to demand that we do better.

Having committed this heresy, he goes on to explain how we can go about doing that.

He does that in two ways. First he presents us with the anti-patterns which will stop us from having a running system in production, and then he presents us with the patterns which will make it possible to avoid them. Or, if it's not possible to avoid them, to minimize the damage caused by them.

That's another theme of Nygard's book. The insistence that the system will break, and the focus on implementing ways to do damage control and recovery.

The book is not only aimed at programmers, though they should certainly read it, it's also aimed at anyone else involved in the development, testing, configuration and deployment of the system at a technical level, including people involved in the planning of those tasks.

As people might have figured by now, I think the hype around the book has been highly warranted, and I think that any person involved in the field would do well to read the book.

Labels: book review, Michael Nygard, programming

Saturday, October 25, 2008

Programming tips

In my daily life, I work as an IT-consultant, mostly on a time/material basis (i.e. I bill by the hour). Given the fact that using consultants is somewhat more expensive than using your own work force, I only get hired in three types of situations:

1) The project needs some resources that cannot be found in-house.
2) The project needs some expertise that cannot be found in-house.
3) The project either has gone wrong, or is on the path in that direction, and there is a need for an outside view on things.

Obviously, this is not an either-or scenario, where I'm only hired for one of the reasons. And I've found that while reason 1 or 2 might be the reason I'm hired, reason 3 is the reason why this occurred.

Anyway, whatever the reason for me getting hired, it can generally be said that at the time I get on a project, it is usually in some kind of trouble. This means that I tend to spend a large portion of my time looking at the existing code base and trying to improve it. Given this, I thought it might be worthwhile writing a list of what I see as good coding practices.

There is nothing groundbreaking in this list - most of you probably do this all the time. Still, when the project is going downhill, it's sometimes good to be reminded that cutting a corner now might cause big problems later.

I should probably explain that my point of view is that of a person who does a lot of debugging. So, my main goal is ease of debugging. I also like performance, but if I have to choose between those two things, I'll focus on ease of debugging.

I take it as a given that you use some kind of source control. If you don't, your problem is much more fundamental than anything this list addresses, and you should take a long, hard look at your practices.

Fail Fast

This is an approach championed by people like Michael Nygard (author of Release It!), who rightfully point out that users are impatient. It’s better to fail as early as possible. This means that you should ensure that you have all the data, resources etc. you need as early as possible (but not any earlier). E.g. if updating the information on a customer requires a customer number, any function that calls the update customer method should ensure that it has a customer number, and otherwise fail.

Validate input

While this might seem redundant when using the fail fast strategy, the simple fact is that people make mistakes (we all do), or it might not be clear to someone else what’s required for the method to work.

So, ensure that any input that is required is actually given, and make sure that e.g. the string which should contain a number actually contains a number.
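As a rough sketch of what such a check could look like in C#: the helper name ParseCustomerNumber and the error messages are my own inventions for the example, not part of any framework.

```csharp
using System;
using System.Globalization;

public static class InputValidation
{
    // Hypothetical helper: returns the parsed customer number, or throws
    // with a message that says what was wrong with the input.
    public static int ParseCustomerNumber(string input)
    {
        // First check that the required input was given at all.
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException("Customer number is required.", nameof(input));

        // Then check that the string really contains a whole number.
        int customerNumber;
        if (!int.TryParse(input, NumberStyles.Integer, CultureInfo.InvariantCulture, out customerNumber))
            throw new FormatException("Customer number must be a whole number, but was '" + input + "'.");

        return customerNumber;
    }
}
```

Using int.TryParse instead of int.Parse lets the method decide for itself which exception and message the caller gets.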

Fail explicitly, not implicitly

Often I run across methods that will result in exceptions when certain conditions are not met, without any explicit exceptions being thrown. While this might seem acceptable, since those conditions should never occur, it’s better to throw an explicit exception. This shows later reviewers that someone actually thought this through, and makes it possible to enrich the exception with more telling error messages.
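A small sketch of the difference, assuming a made-up CustomerRecord class and formatter (neither comes from a real project):

```csharp
using System;

// Hypothetical data class for the example.
public class CustomerRecord
{
    public string Name { get; set; }
}

public static class CustomerFormatter
{
    // Implicit failure: a null customer (or name) surfaces as a
    // NullReferenceException that says nothing about the assumption.
    public static string DescribeImplicit(CustomerRecord customer)
    {
        return customer.Name.ToUpperInvariant();
    }

    // Explicit failure: the same conditions are checked up front and
    // reported with a message that explains what was expected.
    public static string DescribeExplicit(CustomerRecord customer)
    {
        if (customer == null)
            throw new ArgumentNullException(nameof(customer), "A customer is required to build a description.");
        if (customer.Name == null)
            throw new InvalidOperationException("The customer has no name; it must be set before the customer is described.");

        return customer.Name.ToUpperInvariant();
    }
}
```

Both versions fail on bad input, but only the second one tells the person debugging which assumption was broken.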

DRY

DRY stands for Don’t Repeat Yourself, and the principle is explained in the excellent book The Pragmatic Programmer (if you haven’t read it, I suggest reading it). Basically, the idea is to not repeat yourself in any way while developing. In this context, it means that if you find yourself writing a lot of very similar code, you should try to see if you can generalize the code into one or more methods that can be called.
This makes the code easier to read, ensures that errors only have to be fixed in one place, and makes the code a lot easier to unit test.

Don’t copy and paste - generalize

Well, pretty much the same as above, but really: if you find yourself copying and pasting a lot of code, you’re pretty sure to be doing it wrong.

Split up your methods

Long methods are the bane of debugging and should be avoided as much as possible. If possible, try to split your method up into smaller methods, each with its own responsibility.
Generally speaking, any method that takes care of several responsibilities should be split up. It makes re-use and generalizing easier as well.

Minimize your number of method calls

Often people tend to dislike local variables (Martin Fowler explicitly aimed at getting rid of them in his book Refactoring). That’s wrong, from both a debugging and a performance perspective.

When you need to call the same method several times in a function, try to see if it’s possible to store the result in a local variable the first time you call it, and then use the local variable the rest of the time.

When you call the method, the object will be created anyway, so you won’t really save any memory, and by doing it as I suggest, any hidden (or later introduced) costs in the method will not come back and haunt you.
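The following sketch shows the idea. PriceService and its counter are invented for the example; the counter simply stands in for whatever hidden cost (a database call, a web service call) the method might have.

```csharp
// Hypothetical service used to illustrate the point.
public class PriceService
{
    // Counts lookups so the cost of repeated calls becomes visible.
    public int LookupCount { get; private set; }

    public decimal GetUnitPrice(string productCode)
    {
        LookupCount++;
        return 12.50m; // fixed value, just for the example
    }

    // Calls GetUnitPrice twice for the same product.
    public decimal TotalRepeatedCalls(string productCode, int quantity)
    {
        if (GetUnitPrice(productCode) <= 0m) return 0m;
        return GetUnitPrice(productCode) * quantity;
    }

    // Calls GetUnitPrice once and reuses the local variable, which can
    // also be inspected directly in the debugger.
    public decimal TotalWithLocal(string productCode, int quantity)
    {
        decimal unitPrice = GetUnitPrice(productCode);
        if (unitPrice <= 0m) return 0m;
        return unitPrice * quantity;
    }
}
```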

Don’t have more than one method call per line

This is a huge pet peeve of mine, and solely related to debugging.
A lot of people like to minimize methods by reducing the number of lines of code. One way they do this, is by making more than one method call on each line if possible.

E.g. a call to a method that creates an object which is needed as an input parameter to a different method is often written like this:

bool result = updateCustomer(GetCurrentCustomerFromCache());

While this line of code is technically fine, it’s very hard to debug (and adding more parameters generated the same way will only make it worse). Instead, split it up into two lines:

Customer customer = GetCurrentCustomerFromCache();
bool result = updateCustomer(customer);

Now, if it fails, it’s easy to see which method caused the failure.

Don’t just document code, document assumptions as well

Good code pretty much documents itself. I.e. we can read the code, and figure out what happens. Unfortunately it doesn’t tell us why it happens, so if any of the code is based on assumptions of any kind, make sure to include it in the code documentation.

Give your variables meaningful names

Yes, we have all given our input parameters names like 'x', and expected people (including ourselves at a later stage) to understand it when they saw it. Unfortunately, people don't understand parameter names like that, nor will they understand variables or methods with that sort of name. This means that they will have to read through the code to see what 'x' really is.

If the parameter was named something like 'customerNo', then it's much easier to understand what it should contain.

Watch out for those loops

Loops of all kinds (for-loops, while-loops, foreach-loops etc.) are among the biggest causes of performance issues inside the code. Make sure that you place method calls outside the loops if at all possible.

NB: Remember that the for-loop declaration is part of the loop, so a method call in the stop condition will be executed on every iteration.

So, the following for-loop is inefficient:

for(int i=0; i < GetArray().Length; i++)

Rather it should be written thus:

int arrayLength = GetArray().Length;
for(int i=0; i < arrayLength; i++)

Use built-in methods rather than develop your own

We have probably all at one stage or another made our own version of some standard method because we thought we could do it smarter. My suggestion is that you don’t, unless there is a real need that the standard method doesn't fulfill.
The built-in methods are integrated tightly with the framework, and are usually much better implemented than we can hope to match. If they aren’t right now, they will be at a later stage (these methods do get updated).

Yes, I know there is a real challenge in building an XML parser, but please don’t. Use the built-in version instead.

Convert strings to enums, not vice versa

It appears that there often is a need to compare the value of an enum with the value of a string. There are two ways of doing this: one is to convert the string into an enum and compare the two enums; the other is to convert the enum into a string and compare the two strings.

Choose the first approach.

Enum comparisons are integer comparisons, which are much lighter than string comparisons.
Also, in .NET, the ToString method on the enum type is currently not very efficient and carries a surprising amount of overhead, so nothing is lost by converting the string to the enum rather than the other way around.
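A minimal sketch of the first approach, using Enum.TryParse from current versions of .NET. The OrderStatus enum and the HasStatus helper are made up for the example.

```csharp
using System;

// Hypothetical enum for the example.
public enum OrderStatus { Unknown, Open, Shipped }

public static class OrderStatusComparer
{
    // Converts the string to the enum once, then compares enum values.
    public static bool HasStatus(string text, OrderStatus expected)
    {
        OrderStatus parsed;

        // The second argument makes the parse case-insensitive.
        // TryParse also accepts numeric strings such as "7", so
        // IsDefined is used to reject values that are not in the enum.
        if (!Enum.TryParse(text, true, out parsed) || !Enum.IsDefined(typeof(OrderStatus), parsed))
            return false;

        return parsed == expected;
    }
}
```

Whether an unknown string should give false, as here, or an exception depends on how strict the surrounding code needs to be.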

Nullable types are there to be used

The default values of many data types (e.g. int) are impossible to distinguish from values that have been set. You can’t tell if the integer you’re working with has just been initialized, or if someone actually set it to 0, so if that makes a difference, make sure that you use nullable types if your language supports them.

If your language doesn't support nullable types, make your own data types that contain the value and can be null, e.g. an amount class that only contains an amount property.

Yes, there is an overhead, but in some types of systems it really makes a difference if the value has been set or not.
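In C# this could look something like the following sketch, where OrderLine is a hypothetical class made up for the example:

```csharp
// Hypothetical class for the example.
public class OrderLine
{
    // null means "never set"; 0 means "explicitly set to zero".
    public int? Quantity { get; set; }

    public string DescribeQuantity()
    {
        return Quantity.HasValue
            ? "Quantity set to " + Quantity.Value
            : "Quantity not set";
    }
}
```

With a plain int property, both cases would have reported a quantity of 0.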

Doubles are imprecise

Doubles, and other binary floating point value types, are inherently imprecise, so use the decimal data type instead, if your language supports this.
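The classic illustration is adding 0.1 and 0.2. The sketch below uses a made-up PrecisionDemo class. Note that decimal is only exact for values that can be written as decimal fractions; something like one third is still rounded.

```csharp
public static class PrecisionDemo
{
    // With double, 0.1 and 0.2 cannot be represented exactly in binary,
    // so the sum is not exactly equal to 0.3.
    public static bool DoubleSumIsExact()
    {
        double first = 0.1;
        double second = 0.2;
        return first + second == 0.3;
    }

    // With decimal, the same values are stored as decimal fractions,
    // so the sum is exactly 0.3.
    public static bool DecimalSumIsExact()
    {
        decimal first = 0.1m;
        decimal second = 0.2m;
        return first + second == 0.3m;
    }
}
```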

When parsing numbers and dates, include your culture

You can’t be sure what setup the server the program ends up running on has, so make sure to tell the methods what culture you’re using.

In .NET there is also a culture-related issue in rounding. In Europe, midpoint rounding is away from zero, while in the US it is to-even. So, 2.5 will be rounded to 3 in Europe and 2 in the US. In .NET the default rounding method is the US way, so if you need to round, make sure to include the MidpointRounding mode.
I suspect that other languages have similar issues.
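As a sketch of both points in C#: the class and method names are made up, and the invariant culture is only used here as an example of stating the culture explicitly. In a real system you would pass whichever culture your data is actually written in.

```csharp
using System;
using System.Globalization;

public static class CultureAwareNumbers
{
    // The culture is stated explicitly, so the result does not depend
    // on the regional settings of the machine running the code.
    public static decimal ParseAmount(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    // Default Math.Round: midpoints are rounded to the nearest even number.
    public static decimal RoundDefault(decimal value)
    {
        return Math.Round(value);
    }

    // Explicit mode: midpoints are rounded away from zero.
    public static decimal RoundAwayFromZero(decimal value)
    {
        return Math.Round(value, MidpointRounding.AwayFromZero);
    }
}
```

With these, RoundDefault(2.5m) gives 2 while RoundAwayFromZero(2.5m) gives 3.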

Make sure that unit tests also test the failures

Here I assume that you actually make unit tests. If you don't, start doing it.

If you expect the method to throw an exception under certain circumstances, make sure to make a test for that also. This serves two purposes:
1) It ensures you have implemented it correctly.
2) It ensures that any changes to the method will not cause this effect to go away.
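Most unit test frameworks have a built-in way of asserting that an exception is thrown. To keep the sketch below free of any particular framework, it uses a small hand-written Throws helper instead; CustomerUpdater and the test class are hypothetical names made up for the example.

```csharp
using System;

// Hypothetical method under test.
public static class CustomerUpdater
{
    public static void Update(string customerNumber)
    {
        if (string.IsNullOrEmpty(customerNumber))
            throw new ArgumentException("Customer number is required.", nameof(customerNumber));
        // The actual update would go here.
    }
}

public static class CustomerUpdaterTests
{
    // Framework-free stand-in for an "assert throws" helper: returns true
    // only if the action throws the expected exception type.
    public static bool Throws<TException>(Action action) where TException : Exception
    {
        try
        {
            action();
        }
        catch (TException)
        {
            return true;
        }
        return false;
    }

    // The test for the failure case: an empty customer number must be rejected.
    public static void Update_WithoutCustomerNumber_Throws()
    {
        if (!Throws<ArgumentException>(() => CustomerUpdater.Update("")))
            throw new Exception("Expected ArgumentException for an empty customer number.");
    }
}
```

If someone later removes the check in Update, this test starts failing, which is exactly the second purpose mentioned above.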

Remember that lists and arrays can be null

Before checking the count or length of the list or array, better make sure that there actually is a list or array. The length of a null is a null-pointer exception.

Remember that lists and arrays can contain null objects

There are no problems in filling an array with nulls. So don’t assume that just because the list or array contains some values, that those values are actually initialized.
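Both this point and the previous one can be handled with a couple of small helpers. The following is only a sketch, and the names CollectionChecks, IsNullOrEmpty and WithoutNulls are my own.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CollectionChecks
{
    // Checks for null before touching Count, so a missing list
    // does not cause a null reference exception.
    public static bool IsNullOrEmpty<T>(ICollection<T> items)
    {
        return items == null || items.Count == 0;
    }

    // Returns only the elements that are actually set.
    // A null input is treated as an empty list.
    public static List<T> WithoutNulls<T>(IEnumerable<T> items) where T : class
    {
        if (items == null)
            return new List<T>();

        return items.Where(item => item != null).ToList();
    }
}
```

Arrays work with both helpers as well, since a one-dimensional array implements ICollection<T>.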

Enum values also have a default value

When you’re working with an enum, remember that it’s initialized to a default value. In .NET that is the value 0, which is the first value in the enum unless you have assigned the values explicitly. Other languages might behave differently, but there will be a default value somehow.
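A short sketch of what this means in C#, with enums and a class made up for the example:

```csharp
// Pending gets the value 0, so it is the default.
public enum PaymentState { Pending, Paid, Refunded }

// No member has the value 0, so the default is not a named value at all.
public enum Urgency { Low = 1, High = 2 }

// Hypothetical class whose fields are never assigned.
public class Invoice
{
    public PaymentState State;
    public Urgency Urgency;
}
```

A new Invoice will report its State as Pending even though nobody set it, and its Urgency will be 0, which matches neither Low nor High. A common way around the first problem is to reserve the value 0 for an explicit "not set" member.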

Don't comment out code, delete it

When there are errors in code, it's sometimes easier to re-write the code than to try to fix the existing code. While doing so, people tend to comment out the existing code, so they can roll back if the new code doesn't work. That's fine while working on a local copy of the code, but once you feel it's ready to commit to the code base, make sure to remove the commented-out code.

Code that's commented out is dead code, and just confuses later reviewers/debuggers, who start wondering whether the code should actually have been used somehow. So, don't leave it there - remove it.

If it turns out later that we really needed to roll-back the code to the earlier version, then that's why our source-control is there. The code doesn't disappear, just because you delete it. We can always go back to the version of the code that included it.

When changing code, plan how you'll test it

We all know the situation. There is some code that works as described in the specs, but unfortunately the specs are out of date. This means we have to change the code. No problem, that happens all the time. Unfortunately, what also happens all the time is that either the code stops working entirely, the code works the wrong way, or there is some kind of side effect caused by the change.

So, when changing code, make sure that you have a test strategy for ensuring that the changed code works as expected, and that the change doesn't cause side effects.

Unit tests and automated tests are a bare minimum, but preferably, get the customers or testers to make a test scenario for the new functionality before you start implementing it. This way, you are also more likely to understand the new requirements correctly - or at least realize that you don't understand what they want.

Finally, I'll say, that people shouldn't be afraid of refactoring if they see something that is not right, or is written in such a way that it's hard to understand the logic. Refactoring is not a goal in itself, but it can help the development process immensely, and it'll certainly make the code easier to maintain at a later stage.

Labels: programming, systems development

Sunday, October 05, 2008

The roots of C#

The Australian version of Computerworld is doing a series on programming languages, and a couple of days ago, they published an interview with Anders Hejlsberg about C#.

The A-Z of programming languages: C#

I guess you have to be interested in programming and programming languages to find the article interesting, but if you are interested in those subjects, I think you should check out the article.

Note that at the top of the first page, there are some links to earlier articles in the series (which includes C++, JavaScript, Haskell, and Python). I haven't read these yet myself, but I expect them to also be interesting.

Labels: .NET, Anders Hejlsberg, C#, programming, programming languages

Thoughts from JAOO

JAOO conference
Originally uploaded by Kristjan Wager

At the start of this week, I spent three days at the JAOO conference in Århus.

JAOO is without a doubt the biggest developer conference in Denmark, and while its roots are in the Java community in Denmark, it has grown to become a cross-technology conference, where there is focus on both individual technologies and on trends.

I went to the conference with some of my co-workers (including Frank Vilhelmsen) and with the head of Neo4j. Neo4j is something as interesting as a graph database (think math). While it doesn't support .NET yet, I find the product very interesting, and would suggest that anyone who codes in Java check it out.

Well, back to the conference.

Given the fact that I go to Microsoft seminars regularly, and that my role in projects is often not just pure development, but also involves stuff like architecture, code review and similar tasks, I decided to skip the technology specific talks, and focus on those with a broader scope.

I won't go into all the talks I listened to, but I got a lot out of this approach (more than those of my colleagues who only went for the technology specific talks), and I will certainly return to JAOO in the future.

The best talks I heard were two talks given by Michael Nygard, the author of Release It! - a book that I have heard nothing but good stuff about, and which I certainly will get and read.

Labels: JAOO, Michael Nygard, neo4j, programming, programming languages, systems development

Sunday, September 28, 2008

The future of programming languages

I've just returned from a Microsoft Tech-talk event with Anders Hejlsberg and Steve Ballmer.

As people might realize, it's hardly every day that we get to see such people in Denmark, and even though it was held on a Sunday I decided to go.

To my disappointment, Steve Ballmer's participation was limited to basically 15 minutes of pep-talk. Fortunately, Hejlsberg's participation was pretty interesting, which made up for that - or rather, it would, if I didn't have the suspicion that I'm going to hear the exact same presentation from him tomorrow at JAOO.

The topic of Hejlsberg's presentation was "the future of programming languages", and while he deliberately kept away from making too far-fetched predictions, he still made a lot of good points.

I'll try to summarize them for you, though they will be paraphrased, as I was too busy paying attention to what he said to take good notes.

First of all, Hejlsberg argued the following:
1) Programming languages evolve slowly.
2) Multi-language platforms are important.

This might sound like he is trying to push .NET, but as he explained, both the Java and .NET platforms are used to support multiple languages, and that's good, exactly because languages evolve slowly. Too much of the development time spent on making new languages goes into tools and frameworks, which could be reused from existing languages.

After having said that, Hejlsberg argued that he saw three trends now, and in the future.
1) Declarative
2) Dynamic
3) Concurrent

What he meant by declarative, is that we see, and will continue to see, more and more domain specific programming languages, and we'll see more of the functional programming languages.

Of course, he also said that the classic taxonomies are breaking down, and we see a trend towards multi-paradigm languages, so today's object-oriented languages might mutate into also supporting functional programming, or into containing domain specific languages. In fact, he argued that things like LINQ and Ruby on Rails are exactly such things.

Regarding functional languages, he mentioned that F#, which Microsoft plans on integrating in the next version of Visual Studio, is the first functional programming language with a full-blown development environment behind it, allowing it to break out of the traditional academic setting of functional programming languages. Of course, he also explained that F# is really a multi-paradigm language, and thus is a symptom of the taxonomy breakdown he spoke about.

Looking at the dynamic trend, he explained how the .NET framework is being extended to include dynamic programming languages, such as IronPython and IronRuby, and even extend existing languages like C# and VB.NET to allow dynamic programming. This is done by adding the possibility of dynamic execution on top of the existing framework, which he believes is the way to go, since it allows the framework to use the advantages of non-dynamic programming where possible.

Finally, the trend towards concurrency comes from the fact that we've reached the current maximum for how much performance we can press out of a single processor, so we are seeing multi-processor computers in larger numbers, and thus need to ensure that our code runs efficiently on them. To be able to do that, we need to expand the frameworks to support transparent concurrency, where the code is executed in such a way that the computer maximizes the concurrency, without the compiler, or even worse, the developer, having to know how many processors there are.

As Hejlsberg made clear, he is far from the first person to mention these trends, but he is in a unique position to actually implement them, so it was quite interesting to hear his take on them.

If I'm right, and I am going to hear the same speech tomorrow, I'm interested in seeing if the audience will react differently. Today was almost entirely .NET people, while tomorrow is going to be a mixed crowd.

Labels: .NET, Anders Hejlsberg, programming, programming languages

Monday, May 26, 2008

Book Review: The Pragmatic Programmer

The Pragmatic Programmer - from journeyman to master by Andrew Hunt and David Thomas (Addison-Wesley, 2000)

After having this book recommended several times, I got my work to buy it for the office. And I'm quite happy that I did that.

The goal of this book is to give programmers (or rather systems developers) a set of tips on how to become better, by becoming more pragmatic. In this, the book is quite successful.

When you've worked in the IT field for some years, as I have, you'll probably have heard most, or all, of the ideas before. Indeed, many of them are industry standards by now (e.g. using source control). Even so, it's good to have them all explained in one place, and it might remind people to actually do things the right way, instead of cutting corners, which will come back and haunt the project later.

If you're new to the field, I think this book is a must-read, especially if you're going to work in project-oriented environments (e.g. as a consultant). I'm certainly going to recommend that we get inexperienced new employees to read this book when they start.

Now, to the actual content of the book. It covers a lot of ground, not in depth, but well enough to give people a feel for the subject. The first two chapters ("A Pragmatic Philosophy" and "A Pragmatic Approach") explain the ideas and reasons behind being pragmatic, and how it applies to systems development. The next chapter ("The Basic Tools") tells what tools are available and should be used. This is probably the most dated chapter, especially when it comes to the examples, but it's still possible to get the general idea.

Chapter 4 ("Pragmatic Paranoia") and 5 ("Bend, Or Break") deal with two areas where many people are too relaxed in my opinion: testing and coding defensively (ensuring valid input data etc.). I cannot recommend these two chapters too highly.

"While You Are Coding" explains how to code better, and (more importantly in my opinion) when and how to refactor. The last two chapters ("Before the Project" and "Pragmatic Projects") give tips on how to set up and run projects in a pragmatic way.

There are of course tips that I disagree with, or which I would have put less emphasis on, and the book is obviously written before agile methods, like scrum, became widespread (though eXtreme Programming is mentioned). Still, even so, I can really recommend the book to everyone, novices and experienced developers alike.

Labels: book review, computers, programming, systems development