Saturday, June 13, 2009

Avoiding error-driven development, part 1 - testing

Some time ago, I wrote about something I call “error-driven development”, which is a type of software development I come across all too often. You can find the original post here.

I’ve found out that many software developers and consultants can relate to the post, and I’ve discussed with several of them what can be done about error-driven development (EDD).

Well, there is no perfect answer to this question, since the root cause of EDD is different in every EDD project. I have, however, been on a number of EDD projects through the years, so I have some suggestions for general measures one can take to either turn EDD into something else, or to limit the damage.

I’ll try to go through some of them from time to time. In this post I’ll focus on testing.

Testing

I will make the claim that testing is one of the most underrated activities in software development projects, and this has to change in order to avoid EDD. What’s more, testing is also a widely misunderstood concept. Testing is a much bigger activity than most people believe, and covers more aspects than generally thought.

Testing should of course ensure that the system works as intended, but it should also ensure that the system doesn’t work when it’s not supposed to, and that the system can handle unexpected events in a meaningful way.

In his book Release It!, Michael Nygard makes a very good point: systems are built to pass acceptance tests, not to run in the real world. This is one of the things that leads to EDD projects, where the developers end up working on a later version of a system that is already in production.

Testing should allow for the particularities of the real world, and not only for the test environments (see Release It! for some very good examples of the differences, and some good ways of making up for them).

There are several types of testing, some of which I will cover here, and in my experience, focusing on just one of them will lead to problems in the long run.

Unit testing

With the spread of concepts like test-driven development, unit tests are very much in vogue. Unfortunately, books on TDD and its ilk generally don’t explain how unit tests should be written – just that they are important, and should be written before the code.

Writing unit tests that ensure the code works as expected is of course very important, but if that’s all the unit tests do, it’s not enough. Unit tests should also ensure that the code fails when it’s supposed to fail – e.g. if a method gets an invalid parameter, you expect it to fail in some way or another. Test for this – don’t just assume it’s the case because the code works with correct input parameters. Besides ensuring that the code behaves as it should, even when that means throwing an exception, such tests also make it easier for others to see what behavior is expected of the code.
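As a minimal sketch of what such a test can look like (in C# with NUnit, since that’s what my current project uses) – the NumberParser class and its Parse method are hypothetical stand-ins for whatever code is actually under test:

    using System;
    using NUnit.Framework;

    [TestFixture]
    public class NumberParserTests
    {
        [Test]
        public void Parse_NullInput_ThrowsArgumentNullException()
        {
            // Hypothetical class under test; Parse is assumed to reject null input.
            var parser = new NumberParser();

            // Document the expected failure behavior instead of just assuming it.
            Assert.Throws<ArgumentNullException>(() => parser.Parse(null));
        }
    }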

There is, unfortunately, a tendency to focus on the code coverage of unit tests, where code coverage is taken to mean the percentage of code lines executed during the tests. This is the wrong coverage measure. Instead, one should focus on covering all the breaking and non-breaking states that the code can be in.

E.g. if you have some code which receives a text string containing a number and converts it to a number, make sure to test the following:
a) A string containing a positive integer.
b) A string containing a positive floating point number using the normal separator (in the US an example could be “10.10”).
c) A string containing a positive floating point number using a separator from a different culture (e.g. the Danish “10,10”).
d) The same as b) and c), just with thousand separators (“1,000.00” and “1.000,00” respectively).
e) The same as a) through d), but with negative numbers instead.
f) A string containing a number too large to be handled by the data type it’s going to be converted to.
g) A string containing a negative number too large to be handled by the data type it’s going to be converted to.
h) A string containing letters.
i) A string with leading zeros in front of the number.

I could continue, but you get the point. As you can see, that’s a large number of tests for a fairly simple piece of functionality, which is often implemented using built-in conversion methods. Even so, it’s worth spending the time on these tests, as this is the sort of thing which can cause real problems in production. A sketch of what some of these tests might look like is shown below.
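To make the list concrete, here is a sketch of a few of the tests in C# with NUnit, exercising the built-in .NET conversions directly (in a real project they would of course target whatever parsing code the system actually uses):

    using System;
    using System.Globalization;
    using NUnit.Framework;

    [TestFixture]
    public class NumberParsingTests
    {
        private static readonly CultureInfo Us = new CultureInfo("en-US");
        private static readonly CultureInfo Danish = new CultureInfo("da-DK");

        [Test]
        public void Parses_FloatingPoint_InBothCultures()
        {
            // Cases b) and c): the same value with different decimal separators.
            Assert.AreEqual(10.10m, decimal.Parse("10.10", Us));
            Assert.AreEqual(10.10m, decimal.Parse("10,10", Danish));
        }

        [Test]
        public void Parses_ThousandSeparators()
        {
            // Case d): thousand separators also differ between the two cultures.
            Assert.AreEqual(1000.00m, decimal.Parse("1,000.00", NumberStyles.Number, Us));
            Assert.AreEqual(1000.00m, decimal.Parse("1.000,00", NumberStyles.Number, Danish));
        }

        [Test]
        public void Rejects_NumberTooLargeForTheDataType()
        {
            // Case f): a value beyond what the target type can hold should fail loudly.
            Assert.Throws<OverflowException>(() => int.Parse("9999999999", Us));
        }

        [Test]
        public void Rejects_Letters()
        {
            // Case h): non-numeric input should fail, not silently become zero.
            Assert.Throws<FormatException>(() => decimal.Parse("abc", Us));
        }

        [Test]
        public void Accepts_LeadingZeros()
        {
            // Case i): leading zeros should not change the value.
            Assert.AreEqual(7, int.Parse("007", Us));
        }
    }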

Smoke testing

Unit tests are of course not the only sort of testing; there are others which are just as important. Smoke tests are automated tests which can be run to exercise different flows through the system. E.g. in an internet portal, the smoke test might log in and navigate to a specific page, entering data in the intermediate pages along the way.

These tests generally require some kind of tool to create; depending on your development framework and the nature of the system, you need to find one that suits you. In portal projects I’ve seen pretty good results with smoke tests written in Ruby, but in my current project we are using Art of Test’s WebAii, where the tests are written in C# or VB.NET (but can test web GUIs written in other languages).
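To give a rough idea of what such a test looks like, here is a sketch using Selenium WebDriver rather than WebAii (whose API I won’t reproduce here); the URL and element ids are hypothetical placeholders for whatever the portal actually exposes:

    using NUnit.Framework;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Firefox;

    [TestFixture]
    public class PortalSmokeTests
    {
        [Test]
        public void CanLogInAndReachAccountOverview()
        {
            using (IWebDriver browser = new FirefoxDriver())
            {
                // Hypothetical portal URL and element ids.
                browser.Navigate().GoToUrl("https://portal.example.com/login");
                browser.FindElement(By.Id("username")).SendKeys("smoketest-user");
                browser.FindElement(By.Id("password")).SendKeys("smoketest-password");
                browser.FindElement(By.Id("loginButton")).Click();

                // Walk one flow through the system and check that the target page rendered.
                browser.FindElement(By.LinkText("Account overview")).Click();
                StringAssert.Contains("Account overview", browser.Title);
            }
        }
    }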

Smoke tests require a lot of time to create and maintain, especially in a system where the user interface changes often. In such cases, it might make sense to have people dedicated to running and maintaining the smoke tests. They shouldn’t focus exclusively on this, but they should have the responsibility for ensuring that all smoke tests can run at all times.

Even if there are people responsible for maintaining the smoke tests, it should be the responsibility of the developers to run the relevant ones before checking in any changes to the user interface, and if a test fails, to correct the tests or the code as needed.

Smoke tests help ensure that changes in one part of the user interface don’t have a negative impact on the functionality of another part, which happens all too often.

Integration testing

In these days of SOA, ROA and what have you, it’s very rare that a system stands alone. Rather, systems tend to work together with other systems through integration points. Even if the system doesn’t work with other systems over the network, it will generally use a database management system, such as DB2, Oracle, or MS SQL, run on an operating system (*NIX, Windows etc.), or interact with other systems in some other way. All of this should be tested.

If possible, integration testing should be automated, but even if that’s not practical for some reason or other, manual integration testing should be done.

As with smoke testing, there are a number of tools which allow you to create the tests. The selection of tools again depends on the system and the development framework.

Integration testing can be very difficult, as it depends on external systems, some of which might not have been built yet. In such cases, remember that it’s not the other systems the test should exercise, but rather the integration points with them. So there is no real need for a fully functional system at the other end; it’s sufficient to have a mock system which sends data as it could appear from the external system. This can be done with tools like soapUI, which can both send data to the web services your system exposes, and serve as a receiver for your web service requests. Of course, this isn’t always enough, and I have experienced a project where the behavior of the developed system was so dependent on the retrieved data that it was necessary to build a simulator, simulating all the back-end systems.
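As a very small sketch of that idea, assuming the external system is reached over plain HTTP – the prefix and the canned payload are hypothetical, and the point is only that the test exercises our side of the integration point, not the real external system:

    using System.Net;
    using System.Text;

    // A throwaway stand-in for an external system: it answers a single HTTP
    // request with a canned response, which is all a test of our own
    // integration point needs.
    public static class FakeBackEnd
    {
        public static HttpListener Start(string prefix, string cannedResponse)
        {
            var listener = new HttpListener();
            listener.Prefixes.Add(prefix);   // e.g. "http://localhost:8099/policies/"
            listener.Start();

            listener.BeginGetContext(asyncResult =>
            {
                HttpListenerContext context = listener.EndGetContext(asyncResult);
                byte[] body = Encoding.UTF8.GetBytes(cannedResponse);
                context.Response.ContentType = "text/xml";
                context.Response.OutputStream.Write(body, 0, body.Length);
                context.Response.Close();
            }, null);

            return listener;   // the test calls Stop() on it when it is done
        }
    }

A test can then point the system under test at the local address and verify how it handles whatever canned data the stand-in returns, including malformed or unexpected responses.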

Remember to test for differences in culture between systems. Can your system cope if the date format or the numbers it receives conform to a different cultural standard than yours? This is something that’s easily overlooked, but which can have a great impact – either by crashing the system, or by the system misunderstanding the values. It makes a great difference whether the date “1/5/2009” is January 5th or May 1st.

Even less ambiguous formats might cause problems, and they can be even harder to figure out. E.g. the date format “dd-MMM-yyyy” would be fine for the first four months of the year when exchanging data between a Danish and a US system, but on May 1st the date would be “01-May-2009” in the English-speaking world and “01-Maj-2009” in the Danish-speaking world. This could mean that the system suddenly, and unexpectedly, stops working as expected, even though everything has been running just fine until then (this is not a made-up example – I once started a new job on May 1st, where my first accomplishment was to figure out this exact problem).
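To illustrate, here is a small sketch in C#: the same date rendered with “dd-MMM-yyyy” comes out differently depending on the culture, so data exchanged between systems should pin the culture explicitly (or, better, use an unambiguous format such as ISO 8601) instead of relying on whatever the current thread happens to use:

    using System;
    using System.Globalization;

    class DateFormatPitfall
    {
        static void Main()
        {
            var firstOfMay = new DateTime(2009, 5, 1);

            // The same pattern, two different results once May arrives.
            Console.WriteLine(firstOfMay.ToString("dd-MMM-yyyy", new CultureInfo("en-US"))); // 01-May-2009
            Console.WriteLine(firstOfMay.ToString("dd-MMM-yyyy", new CultureInfo("da-DK"))); // 01-maj-2009

            // Pinning the format and culture keeps both ends of the integration in agreement.
            Console.WriteLine(firstOfMay.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2009-05-01
        }
    }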

The more integration tests you make during development, the fewer fixes need to be done when the system is in production (I refer to Michael Nygard's Release It! for good advice on making testing environments for integration testing).

Manual testing

There is unfortunately a tendency for developers to believe that as long as you have enough automated tests, there is no need for manual testing. This is of course nonsense.
No matter how many automated tests you have, and how sophisticated the tools used to create them, there is no substitute for human eyes on the system.

Manual tests can be divided into two groups: systematic testing (based on test cases) and monkey testing.

Systematic testing, normally done based on test cases, tests the functionality of the system, ensuring that it works as specified, including the implicit specifications. The testers should have enough understanding of the business to test the system meaningfully, not just follow the test script step by step.

With regards to the test cases, my general suggestion is that they are not written by technical people, but rather by people with an understanding of the business domain. Optimally they should be written at the same time as the requirements, or at least before the coding really starts, and only in general terms. Before the developer starts developing, he or she should read the relevant test cases, making sure that he or she understands the requirements as stated in general terms. If there are business concepts that appear unclear, the developer can acquire the necessary domain knowledge before starting on the development. When the system is developed, the test cases can be made specific to the system (I recommend keeping the unspecific test cases in reserve though, as the system can change a lot over time, and it’s good to have some general test cases to refer back to).

As with all the earlier tests, there should also be testing of wrong usage of the system, ensuring that this wrong usage will result in neither major problems nor a wrong result.

Note that while test cases, as I describe them, and use cases might sound similar at first, that’s not really the case. Use cases describe things on an abstract level, while test cases are more specific. In an insurance system, a use case would describe how the user creates an insurance policy. A test case would not only describe that the user creates an insurance policy, but also what sort of insurance to choose, what values should be used, and what extras should be selected.

Monkey testing is unsophisticated testing of the system, where the tester tries to do whatever suits him or her in order to provoke a failure in the system. It might be entering a wrong value in a field, clicking on a button several times in a row, or doing something else the developers didn’t expect. The purpose of this testing is to emulate the sort of things which might happen in the real world, outside the safe testing zone.

While monkey testing, it’s very important to document the exact steps which result in the error. Sometimes the symptom of the error (the system failing) occurs a rather long time after the action which caused it.

In conclusion

There are of course many other sorts of testing (performance testing for one), but I feel that by doing the sort of testing I mention, one can do a lot to prevent a project turning into an EDD project.

The reason good testing can help avoid EDD is simple. A lot of the time, EDD projects only address the symptoms, fixing bugs as they are reported, but they don’t address the fundamental problems, so these fixes are only temporary at best, and in general introduce other errors, which are only discovered at a later stage.

Testing will ensure that the system being developed is stable, or at least that the non-working functionality is discovered at an earlier stage. Testing also ensures that changes can be introduced more easily, as side effects show up straight away.

Of course, introducing testing into an EDD project is not easy. The project will be running behind schedule, and people will be overburdened with work, so adding new tasks will seem impossible. This doesn’t mean that testing shouldn’t be done, though, just that it should be introduced in steps rather than all at once. Find the core functionality, or alternatively the most problematic code, and introduce testing there – unit testing should come first, but don’t forget the other types of testing.

I know this is easier said than done, but I’ve been in projects solidly in the EDD category which we managed to turn around, in part because of testing. In one project, we made the case for unit tests by writing 10 unit tests of basic functionality in the system and showing that eight of them failed. This resulted in resources being allocated to me, just to ensure proper unit testing of all basic functionality (we later expanded to other functionality and introduced other types of testing).

If such a drastic demonstration isn’t possible, start by writing unit tests whenever you change some code – this will ensure that the code works properly after you’ve changed it. Sadly, code in EDD projects is often not in a state where unit tests can easily be introduced. This is why they should be introduced when the code is changed anyway, since that gives an opportunity to refactor the code at hand to allow unit tests.
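As a sketch of what that refactoring often amounts to – the PolicyService, the repository interface and the premium rule are hypothetical examples, but the pattern (pull the external dependency out behind an interface so the logic can be unit tested with a stub) is a common one:

    // Before the refactoring, the calculation read straight from the database,
    // so it could only be tested against a live database. After extracting the
    // dependency, a unit test can hand in a stub instead.
    public interface IPolicyRepository
    {
        decimal GetBasePremium(int policyId);
    }

    public class PolicyService
    {
        private readonly IPolicyRepository repository;

        public PolicyService(IPolicyRepository repository)
        {
            this.repository = repository;
        }

        public decimal CalculatePremium(int policyId, bool hasExtraCoverage)
        {
            decimal basePremium = repository.GetBasePremium(policyId);
            return hasExtraCoverage ? basePremium * 1.25m : basePremium;
        }
    }

    // In a unit test, a trivial stub replaces the database.
    public class StubPolicyRepository : IPolicyRepository
    {
        public decimal GetBasePremium(int policyId) { return 1000m; }
    }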

I hope this rather long post made sense to people. The concepts I’m trying to introduce aren’t revolutionary, and for many people, the things I mention are blatantly obvious. Even so, there are many people, and organizations, out there for whom testing doesn’t come naturally. These people, and organizations, need to be reminded every so often that there is a very good reason why we do these things.

Testing can’t stand alone of course; many other measures are needed to keep a project from turning into EDD, or to turn a project away from being an EDD project. Still, tests are fundamental to healthy development, so leaving them out will more or less guarantee that the project turns into EDD.


3 Comments:

Blogger Christian said...

Nice post with lots of good info.
You give a good example of the many facets one test might have. To me that's also one of the important things for a programmer to know in order to make decent tests (besides time / understanding management).
Having a project with a unit test that just calls the one publicly available method that gets called every time the project loads just doesn't qualify as a thorough test.
The example of showing how 8 out of 10 tests failed was inspiring. I'm curious though, as to what weighs the most for the boss: continuing with the "EDD" and making money on selling upgrades, or making a rock-solid product that costs a lot more in development/testing time with less or no upgrades? (let's imagine that the software does not deal with human lives)

June 13, 2009 8:50 PM  
Blogger Kristjan Wager said...

Good question Christian.

I am idealistic enough to believe that very few people like to make crappy work, so I would think that most people are willing to try to move away from EDD.

Convincing them that writing tests is a way forward can be hard though - time spent on making tests is considered unproductive by many managers, so it's a hard sell.

One way of doing it is to emphasize the cost of not making tests.

The total cost of a project is:
cost of development + cost of testing + cost of maintenance

However, there is not a 1-to-1 exchange between these costs. If you spend less on testing, it will cost more in both development and maintenance. This cost will be radically higher than the savings on tests.

June 13, 2009 9:17 PM  
Anonymous Randall Stross said...

I've been meaning to read Nygard's Release it for a while now. Thanks for prompting me to do so now.

June 14, 2009 3:39 AM  
