Practical refactoring using unit tests
Having just read the first four chapters of Refactoring: Improving the Design of Existing Code, I embarked on my first refactoring and almost immediately came to a roadblock. It stems from the requirement that before you begin refactoring, you should put unit tests around the legacy code. That allows you to be sure your refactoring didn't change what the original code did (only how it did it).
So my first question is this: how do I unit-test a method in legacy code? How can I put a unit test around a 500 line (if I'm lucky) method that doesn't do just one task? It seems to me that I would have to refactor my legacy code just to make it unit-testable.
Does anyone have any experience refactoring using unit tests? And, if so, do you have any practical examples you can share with me?
My second question is somewhat hard to explain. Here's an example: I want to refactor a legacy method that populates an object from a database record. Wouldn't I have to write a unit test that compares an object retrieved using the old method, with an object retrieved using my refactored method? Otherwise, how would I know that my refactored method produces the same results as the old method? If that is true, then how long do I leave the old deprecated method in the source code? Do I just whack it after I test a few different records? Or, do I need to keep it around for a while in case I encounter a bug in my refactored code?
Lastly, since a couple people have asked...the legacy code was originally written in VB6 and then ported to VB.NET with minimal architecture changes.
Good example of theory meeting reality. Unit tests are meant to test a single operation and many pattern purists insist on Single Responsibilty, so we have lovely clean code and tests to go with it. However, in the real (messy) world, code (especially legacy code) does lots of things and has no tests. What this needs is dose of refactoring to clean the mess.
My approach is to build tests, using the Unit Test tools, that test lots of things in a single test. In one test, I may be checking the DB connection is open, changing lots of data, and doing a before/after check on the DB. I inevitably find myself writing helper classes to do the checking, and more often than not those helpers can then be added into the code base, as they have encapsulated emergent behaviour/logic/requirements. I don't mean I have a single huge test, what I do mean is mnay tests are doing work which a purist would call an integration test - does such a thing still exist? Also I've found it useful to create a test template and then create many tests from that, to check boundary conditions, complex processing etc.
BTW which language environment are we talking about? Some languages lend themselves to refactoring better than others.
From my experience, I'd write tests not for particular methods in the legacy code, but for the overall functionality it provides. These might or might not map closely to existing methods.
Write tests at what ever level of the system you can (if you can), if that means running a database etc then so be it. You will need to write a lot more code to assert what the code is currently doing as a 500 line+ method is going to possibly have a lot of behaviour wrapped up in it. As for comparing the old versus the new, if you write the tests against the old code, they pass and they cover everything it does then when you run them against the new code you are effectively checking the old against the new. I did this to test a complex sql trigger I wanted to refactor, it was a pain and took time but a month later when we found another issue in that area it was worth having the tests there to rely on.
In my experience this is the reality when working on Legacy code. Book (Working with Legacy..) mentioned by Esko is an excellent work which describes various approaches which can take you there.
I have seen similar issues with out unit-test itself which has grown to become system/functional test. Most important thing to develop tests for Legacy or existing code is to define the term "unit". It can be even functional unit like "reading from database" etc. Identify key functional units and maintain tests which adds value.
As an aside, there was recent talk between Joel S. and Martin F. on TDD/unit-tests. My take is that it is important to define unit and keep focus on it! URLS: Open Letter, Joel's transcript and podcast
That really is one of the key problems of trying to refit legacy code. Are you able to break the problem domain down to something more granular? Does that 500+ line method make anything other than system calls to JDK/Win32/.NET Framework JARs/DLLs/assemblies? I.e. Are there more granular function calls within that 500+ line behemoth that you could unit test?
The following book: The Art of Unit Testing contains a couple of chapters with some interesting ideas on how to deal with legacy code in terms of developing Unit Tests.
I found it quite helpful.