Saturday, November 29, 2008

Working Effectively with Legacy Code

Working Effectively with Legacy Code (Robert C. Martin Series)
For the past few years I have been wrestling with legacy code, on and off. Having dealt with it for so long, I know how painful it is to work with. But sometimes you just can't ignore it; it lies in your path, and you have to refactor it into something that can be read, maintained and extended.

There is a great book on how to work with legacy code, and doubtless other people have written on this topic too, so some of the material here will overlap. But here I just want to write down what actually works for me in dealing with legacy code.

First, a definition of legacy code. I define legacy code as any code that doesn't have unit tests. Of course, not all code without test coverage is hard to work with. But code without accompanying tests is usually very brittle, because you don't know what you break or when you break it. And because there are no tests to constrain the design, the code is usually not well structured and exhibits high coupling and low cohesion. There is no clear separation of concerns either. All this makes legacy code hard to read, let alone modify.

So, how do you tackle legacy code? The first thing to do is to look for common routines and eliminate duplication. You may not understand the code (that's not your fault; legacy code is usually so badly written that it is completely unreadable), but that's not important. The important thing is to refactor the code so that common logic lives in a single place. Reduce the amount of code you have to read and you get a clearer picture of what the original coder intended.
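
To make this concrete, here is a minimal sketch in Python (any language would do). Everything in it is made up for illustration: the function names, the discount rule and the threshold are assumptions, not taken from any real code base. It shows the same subtotal-and-discount logic pasted into two routines, and then the same two routines after the shared part has been pulled into one place.

```python
# Hypothetical business rule, invented for this example (amounts are in cents).
DISCOUNT_THRESHOLD_CENTS = 10_000


# Before: the same subtotal-and-discount logic is pasted into two places.
def invoice_total_before(items):
    total = 0
    for price_cents, qty in items:
        total += price_cents * qty
    if total > DISCOUNT_THRESHOLD_CENTS:
        total = total * 90 // 100  # 10% off large orders
    return total


def quote_total_before(items, shipping_cents):
    total = 0
    for price_cents, qty in items:
        total += price_cents * qty
    if total > DISCOUNT_THRESHOLD_CENTS:
        total = total * 90 // 100  # same rule, copied
    return total + shipping_cents


# After: the shared rule lives in exactly one function.
def discounted_subtotal(items):
    total = sum(price_cents * qty for price_cents, qty in items)
    if total > DISCOUNT_THRESHOLD_CENTS:
        total = total * 90 // 100
    return total


def invoice_total(items):
    return discounted_subtotal(items)


def quote_total(items, shipping_cents):
    return discounted_subtotal(items) + shipping_cents
```

Note that you don't need to know why the discount exists to do this. You only need to notice that the two blocks are identical, and afterwards there is half as much code to read.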

Second, analyze public variable usage. See how the variables are used and try to establish a pattern. Some tools, such as ReSharper, can help here. One thing I have found working with legacy code is that public variables are often misused, especially when the code involves event handling. A lot of public variables are declared because state needs to be shared across different events. In such cases, try to cut down on public variables by computing things on the fly instead of storing information in shared state. The fewer the public variables, the clearer the code.
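
Here is a small Python sketch of what I mean by doing things on the fly. The classes and handler names (OrderFormBefore, OrderForm, on_selection_changed, on_save_clicked) are hypothetical and stand in for whatever your UI framework calls them. In the first version a public field carries a value from one event to the next; in the second, the handler works out what it needs at the moment it runs.

```python
class OrderFormBefore:
    """Legacy style: a public field carries state from one event to another."""

    def __init__(self, prices):
        self.prices = prices
        self.selected_total = 0  # public; written by one handler, read by another

    def on_selection_changed(self, selected_indexes):
        self.selected_total = sum(self.prices[i] for i in selected_indexes)

    def on_save_clicked(self):
        # Only correct if on_selection_changed already ran for the current selection.
        return f"Saved order worth {self.selected_total}"


class OrderForm:
    """Refactored: the handler derives the total when it is needed."""

    def __init__(self, prices):
        self._prices = prices

    def on_save_clicked(self, selected_indexes):
        total = sum(self._prices[i] for i in selected_indexes)
        return f"Saved order worth {total}"
```

In the first version the result of saving depends on which events happened to fire earlier, which is exactly the kind of hidden coupling that makes legacy code hard to follow. The second version has nothing to get out of sync.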

Third, follow the MVP or MVC pattern! If you are doing Windows Forms or ASP.NET programming, or working with any other event-driven framework, you may find switching to MVP or MVC a bit hard. I don't know of any well-defined algorithm for converting code to the MVP pattern, but it is always worthwhile to ask yourself how you would cope if the view, the database or the model changed. When you are doing the conversion, imagine that the view you have is going to be replaced one day, or that you may need to support your application on the web, mobile phones or other platforms. In that case, how do you ensure that common logic is shared and code duplication is kept to a minimum? Similarly, how do you design your code so that if the model changes, you only have to rewrite the model layer and not the business layer?
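
As a rough illustration, here is one way the MVP split could look, sketched in Python rather than in a real UI framework. All the names (LoginView, UserModel, LoginPresenter, InMemoryUsers, FakeView) are invented for this example. The point is only that the presenter talks to the view and the model through small interfaces, so either one can be swapped without touching the logic in between.

```python
from typing import Protocol


class LoginView(Protocol):
    """What the presenter needs from any view (desktop, web, mobile...)."""

    def get_username(self) -> str: ...
    def show_message(self, text: str) -> None: ...


class UserModel(Protocol):
    """What the presenter needs from any data source."""

    def exists(self, username: str) -> bool: ...


class LoginPresenter:
    """Holds the logic; knows nothing about widgets or databases."""

    def __init__(self, view: LoginView, model: UserModel):
        self._view = view
        self._model = model

    def on_login_clicked(self) -> None:
        name = self._view.get_username().strip()
        if not name:
            self._view.show_message("Please enter a user name.")
        elif self._model.exists(name):
            self._view.show_message(f"Welcome, {name}!")
        else:
            self._view.show_message("Unknown user.")


class InMemoryUsers:
    """Stand-in model; a database-backed class could replace it."""

    def __init__(self, names):
        self._names = set(names)

    def exists(self, username: str) -> bool:
        return username in self._names


class FakeView:
    """Stand-in view that records messages instead of drawing them."""

    def __init__(self, username: str):
        self.username = username
        self.messages = []

    def get_username(self) -> str:
        return self.username

    def show_message(self, text: str) -> None:
        self.messages.append(text)
```

If the view is replaced one day, only a new class with those two view methods has to be written. The same goes for the model: swapping InMemoryUsers for something that queries a database leaves the presenter untouched.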

Fourth, unit tests! We usually talk about Test Driven Development, where the developer writes the tests before the actual code. But when you are working with legacy code you are working backwards, from the code to the tests. Refactoring comes first, tests second. You may wonder why you should bother with tests when your code is already working. But the tests are what allow you to make changes safely in the future. Not only that, tests help you see how well designed your code is. Tests also help you cover edge cases you might otherwise miss, because you are examining a smaller piece of code in isolation.
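
Working backwards from code to tests could look something like the sketch below, written with Python's standard unittest module. The function legacy_format_name is a hypothetical routine I made up to play the part of existing code. The tests simply record what it does today, quirks included, so that a later change that alters the behaviour gets noticed.

```python
import unittest


def legacy_format_name(first, last):
    # Hypothetical legacy routine; its quirks are kept on purpose.
    if not last:
        return first.upper()
    return last.upper() + ", " + first[:1] + "."


class LegacyFormatNameTests(unittest.TestCase):
    """Tests written after the fact: they pin down the current behaviour."""

    def test_full_name(self):
        self.assertEqual(legacy_format_name("Ada", "Lovelace"), "LOVELACE, A.")

    def test_missing_last_name_uppercases_first_name(self):
        self.assertEqual(legacy_format_name("Ada", ""), "ADA")

    def test_empty_first_name_still_gets_a_dot(self):
        # An edge case that is easy to miss until the piece is tested in isolation.
        self.assertEqual(legacy_format_name("", "Lovelace"), "LOVELACE, .")
```

Saved in a file, these can be run with `python -m unittest`. The last test is the sort of edge case I mean: whether that lone dot is a bug or not, at least now you know it is there.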

It can be hard to test legacy code because of the huge amount of setup code needed, so make sure you have a few mocking tools at hand, such as Typemock.
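
To show how a mocking tool cuts down the setup, here is a sketch using Python's built-in unittest.mock. This is a different tool from Typemock, used here only because it ships with Python. The routine send_overdue_reminders and its two collaborators (a repository with find_overdue and a mailer with send) are hypothetical.

```python
from unittest.mock import Mock


def send_overdue_reminders(repository, mailer):
    """Hypothetical routine pulled out of legacy code; collaborators are passed in."""
    sent = 0
    for customer in repository.find_overdue():
        mailer.send(customer["email"], "Your invoice is overdue")
        sent += 1
    return sent


# No database and no mail server: both collaborators are replaced by mocks.
repository = Mock()
repository.find_overdue.return_value = [
    {"email": "a@example.com"},
    {"email": "b@example.com"},
]
mailer = Mock()

count = send_overdue_reminders(repository, mailer)

# Check both the return value and the interaction with the mailer.
assert count == 2
assert mailer.send.call_count == 2
mailer.send.assert_any_call("a@example.com", "Your invoice is overdue")
```

Without the mocks, this test would need a populated database and a way to intercept outgoing mail before it could check a single line of logic.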

Since you are refactoring before you have tests, you have to do it carefully, manually verifying that the code still works as before after each change. Don't refactor a large chunk of code and only then press F5 to see the output. If anything went wrong, you might have a hard time debugging it later. In this war against legacy code, victories only come incrementally. Have you extracted common logic from two places into a single method? Good, verify that the system still behaves as before! Constructed a common interface that binds a few similar classes together? Then check that the program still runs as before!

Working with legacy code is a painful process (and an unavoidable one, because rewriting from scratch is not an option), so you should be extra careful. But when you see that you have transformed a horrendous Orc into an Elf, you will have a deep sense of satisfaction. You can now sleep soundly because you have decreased the software entropy and put in place a mechanism that can keep it down in the future.

You are no longer a mere coder, but a programmer.

2 comments:

Anonymous said...

one of the sure signs of a programmer...not a mention of costs and time to completion

when you hire a painter to redo the kids bedrooms, you don't expect them to "refactor" a hole in the back wall "to make it easier to repaint the rooms when they are teenagers"

you're familiar with "don't reinvent the wheel" but how about "don't redo tests"???

let's say you're working on an interactive computer video training course. There are 28 different modules and it takes over 100 hours to go thru the "easy path" to test them. If they break, your company is out over 1 million dollars in penalties (since the customer is spending 10 million to fly all the servicemen in for the training)

would you:
1) keep any changes localized to the specific path you are fixing?
or...
2) refactor the code, generate the thousands of test cases (oh wait, no automated test cases for interactive video code!) and risk breaking something else?

Soon Hui said...

Anonymous, maybe you haven't encountered any legacy code in your life.

The problem with legacy code is that it is so bad that even making a single change will take a lot of effort and can often introduce subtle bugs that you won't discover until a few months down the road.

It is often not possible to make a localized change because the effects of the change are not well localized.

You have to refactor the code just to keep your sanity.