TDD - How It Can Be Done Right
TDD has gone wrong. It was well intentioned and is a great technique, but many developers who are trying it are not experiencing the promised benefits. How come?
This question was asked and answered by Ian Cooper in a DevTernity talk in 2017. Watch the video here, if you like. It's worth pretty much all of its 63 minutes.
I came across it through a tweet by Robert C. Martin.
And a couple of minutes into that talk I even followed his advice and took notes.
It's these notes and some thoughts I'd like to share with you in what follows. I liked the talk very much, and I agree with a lot of what Ian Cooper said. However, a couple of things were missing or not stated as explicitly as I would have liked.
I'll spare you a paraphrase of why and where TDD went wrong according to Ian Cooper. Watch the first 20 minutes of his video instead, if you'd like to know. Rather, let me cut to the chase by telling you what he thinks can be done about it:
- Avoid testing implementation details, test behaviours.
- Test only the public API.
- Run tests in isolation.
- Get red tests to green as fast as possible.
- Think about design and clean code only during the refactoring step of TDD.
That's it. Five easy steps to TDD bliss. No more red tests after refactoring. No more mock chaos. No more doubled or tripled coding time just because of tests.
Or maybe it's not that easy? Because if it were, and if it had all been in the book already (Kent Beck, Test-Driven Development: By Example), then why could TDD possibly have gone wrong?
So here's my 2c on why it's been so difficult to follow these steps.
1. What's this thing to test: behaviour?
The goal of requirements analysis is to define the desired behaviour of software. How you do it, whether you follow a formal process or "just talk to the user", is of less concern, I think. Of utmost importance, however, is how the results are documented. How can desired behaviour be specified unambiguously?
To answer that question you have to have an idea of what behaviour is in the first place. Here's my definition:
Software shows behaviour by transforming input data (request) into output data (response) while using further data from resources.
Software behaviour is expressed only in data. You might think it has to do with user experience of user interfaces, but I disagree. User interfaces are just the means to make data tangible, to enable the user to input data, trigger behaviour, and view output data.
User interfaces are a detail. They look this way today, but different tomorrow. You use this technology today, but another one tomorrow. The behaviour essentially is unaffected by that.
The what of behaviour thus should clearly be separated from the how of behaviour. Think of a behaviour layer beneath an outer user interface layer of software.
User interfaces are about the how, the looks, of triggering and watching behaviour. But they don't define the what of behaviour.
Sure, these two aspects are easily and often entwined. But I recommend separating them as clearly as possible. Otherwise it will be very hard to systematically create the desired behaviour and also have tests for it.
From that it should be clear that the typical user story is not enough. It's neither a description of the how nor a specification of the what. It's, well, just a nice story - which has to be listened to and interpreted. Precision looks different.
But precision is important, I think. "Programming is about details," says Robert C. Martin. And how exactly input, output, and resources should be structured and filled to be considered good behaviour are details, important details. Precision is needed to get it right.
The part of the system to create a certain behaviour thus needs to provide means to feed it data - input as well as resources in a certain state. And it needs to provide means to receive data from it, output as well as resources in a changed state.
That's all you need to start with TDD. As you can see, this has nothing to do with DIP, IoC, any test framework or whatnot. It's about focusing on behaviour, i.e. something tangible to the user, something the user has an opinion about and can give feedback on.
TDD starts with compiling a list of desired behaviours - which then of course need to be encoded so they can be checked automatically.
Without suggesting any specific tool such a compilation could look as simple as this:
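To illustrate with a deliberately trivial, hypothetical example - an addition function - such a compilation might be nothing more than a small table (think a plain Excel sheet) of inputs and expected outputs:

a      b      expected result
1      41     42
0      0      0
-1     1      0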
Sure, if data becomes more structured and larger in size and number, Excel won't do anymore. But what needs to be done stays the same: if you want to know what the system to build is supposed to do, don't be content with plain text explanations or pretty UX prototype pictures. You need to gather explicit data to describe the behaviour.
I'd even argue that as long as users/PO cannot or do not want to provide you with such exact data, you should not start into the next phase of programming (be that designing or coding). The reason is simple: you don't know when you'll be finished. You'll stay more dependent on the user/PO than necessary, which will lead to frustrating roundtrips and rework.
It's that simple: no data, no tests. Is that what you want?
Behaviour-defining data is crucial. What's not crucial is how you automate tests of the specified behaviour. In my view "developer tests", i.e. tests using a test framework like NUnit or JUnit, are sufficient. As Ian Cooper says: users won't really use testing tools themselves. They are not interested in learning the ins and outs of Fit or Cucumber. So you're stuck with encoding the data into tests yourself anyway.
As long as behaviour-specific data can be elicited, it should be easy to map it, free of errors, into a format you can use in developer tests. The user/PO has to trust you anyway. It thus should be sufficient to present him/her with a list of green and red tests to document the progress of the implementation.
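As a sketch of what such an encoding might look like with NUnit - the addition behaviour from above, with hypothetical names of my choosing:

[TestFixture]
public class Addition_behaviour_tests {
    // Each row is one behaviour sample agreed upon with the user/PO.
    [TestCase(1, 41, 42)]
    [TestCase(0, 0, 0)]
    [TestCase(-1, 1, 0)]
    public void Adds_two_numbers(int a, int b, int expected) {
        Assert.AreEqual(expected, MathService.Add(a, b));
    }
}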
Forget about the tools for a moment. Focus on not letting go of users/PO until they actually commit to a specific, automatically testable behaviour description.
2. Where to find the "public API" in a user story?
In order not to paint yourself into a corner with a host of tests, focus on keeping tests just on the "public API". That's very good advice from Ian Cooper. I wholeheartedly agree. But what's the "public API"?
Unless you're tasked with writing a library with a specific API, it might not be obvious what the "public API" is.
In my view an API defines the surface of a software system. It's what's visible to clients of the system; it's what has been explicitly designed for consumption.
By "consumption" I mean requesting behaviour. The "public API" (or API for short, because APIs to me are public by definition) defines how requests can be sent to the software system, and how responses are delivered back to a client. You see: APIs are about the how of behaviour.
An API thus can have very different shapes. I would not consider a user interface to be an API because it's not designed to be consumed by software but by humans. However, a web service answering REST calls offers an API. Or a micro-service subscribed to a RabbitMQ queue offers an API. Or an ETL service listening for changes in the file system offers an API.
As much as I agree with Ian Cooper (or Kent Beck) I see two difficulties with testing just the "public API" of software:
- Not all software has a "public API".
- A "public API" can take on very different shapes.
I've thus come to the conclusion not to test the "public API" itself. At least as long as the system to build isn't a library.
With a library the API is easily accessible: it's just a list of public functions. (In OO software those functions are methods on public classes.) And functions are simple to test with a test framework like NUnit or JUnit.
To me the "public API" of a non-library system is just a detail. It's pretty much non-essential and should not stand in the way of testing. Any "interface technology" - be it a user interface technology like WPF or a service interface technology like ASP.NET MVC for REST calls - is comparatively hard to test and quite likely to change. I thus prefer to keep the code layer involved with such API technology as thin as possible - and don't see much need to put it under full test automation.
I don't want to test stuff that's highly dependent on already tested stuff. ASP.NET MVC is doing its job fine, I trust. Why should I test a class tightly coupled to some framework? No, my job is to isolate such dependencies as much as possible from the really important code. That's a very basic architectural principle to me. That way I keep the decision for a particular technology at bay, or at least reversible.
Instead of the "public API" I recommend looking for the "API behind the API". And that's true for all interfaces, be they APIs or UIs. My view of a software system is this:
The interface to the environment consists of two layers: the outer layer is what clients of the system in the environment - be they human users or other software systems - directly interact with. That's where special interface technology is involved, e.g. WPF, ASP.NET MVC, RabbitMQ, etc. That's the real "public API" - but it should be just a very thin layer.
Behind this outermost layer, the "public API", lies a second API layer, the "internal API". It's still an API in that its shape is determined by the needs of the environment. But it's interface technology agnostic. It does not have a clue whether it gets triggered by a human user through a user interface or by another software system via a message over TCP.
Take this trivial web-service for example:
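A minimal sketch of such a service - assuming a hypothetical self-hosting framework that provides the [Service] and [EntryPoint] attributes mentioned below - might look like this:

[Service]
public class MathService {
    // Answers e.g. GET http://localhost:8000/api/add?a=1&b=41 with 42.
    [EntryPoint(HttpMethod.Get, "/api/add")]
    public int Add(int a, int b) {
        return a + b;
    }
}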
It can be tested with a call to http://localhost:8000/api/add?a=1&b=41 using curl or Insomnia or some REST client framework. But I'd rather not do automatic tests by that means. Much too slow and cumbersome.
Instead I'd separate the service into public and internal APIs like this:
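Again as a sketch with the same hypothetical attributes - a paper-thin public API delegating to a technology-agnostic internal API:

// Public API: a very thin layer bound to the interface technology.
[Service]
public class MathService {
    [EntryPoint(HttpMethod.Get, "/api/add")]
    public int Add(int a, int b) {
        return InternalAPI.MathService.Add(a, b); // nothing but delegation
    }
}

// Internal API: no trace of HTTP, routing, or hosting.
namespace InternalAPI {
    public static class MathService {
        public static int Add(int a, int b) {
            return a + b;
        }
    }
}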
And then I'd just put InternalAPI.MathService under permanent automatic tests. (Note how the internal API is static. No need to make its Add() an instance method; static methods are easier to test. But that's a topic for another time. The REST framework requires service methods to be non-static, though; an implementation detail.)
Sure, this is a trivial example, but I hope you get my point: the internal API is devoid of any hints as to which concrete technologies are needed for publication. That's what makes it easy to test.
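A test of the internal API is then just an ordinary unit test - no HTTP, no hosting, no mocks:

[TestFixture]
public class MathService_tests {
    [Test]
    public void Add_is_just_a_function_call() {
        Assert.AreEqual(42, InternalAPI.MathService.Add(1, 41));
    }
}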
No need to test if the framework which provides [Service] and [EntryPoint] is working correctly. And whether Get or /api/add are correctly applied can easily be checked once manually (or semi-automatically) - with all the dependencies hanging down from that entry point into the software system. No need for mocking.
What this boils down to for me is: testing the "public API" always just means testing an interface technology independent library. It's just about regular public functions (regardless of whether they are located in modules or classes).
It's the internal API which has to provide the means to send requests and receive responses. And the natural way to do that for programmers is... calling a function.
Preferably that's a single function per test case. A single function should be responsible for creating each behaviour. But maybe some other functions have to be called to set up the input and the initial resource state, and some other functions have to be called to retrieve the response and the final resource state. The fewer functions the better, though, I think.
Such an internal API is not only easier to test than a public API with all its technological dependencies. An internal API can also be reused with different public APIs. It's simply a matter of applying the SRP: serving the environment is a responsibility different from actually creating some behaviour. Behaviour creation is about data transformation. Publication is about technology. Different logic, different technologies, different rates of change.
And never mind that the user/PO does not know how to use an internal API. Since he/she is not involved with test frameworks in the first place he/she does not care what exactly you feed the test data into and where you retrieve results from. He/she is concerned about green vs red tests only.
Sure, the public part of the API is about details which also have to be correctly implemented. You need to check that. But what I'm concerned with here - and so is Ian Cooper, I'd say - are regression tests of the bulk of your software accessible through some form of API. It's not the API itself, but what's behind it.
The API to put under test is a list of functions hidden in User Stories and Use Cases. Look for where the environment triggers behaviour, try to come up with one internal API function to deliver each behaviour, then apply the collected sample behaviours to the internal API function in automated tests.
In the end it boils down to this: most of the tests that stay should exercise your software system from the outside. That way you are free to refactor it on the inside.
3. How to isolate without so many mocks?
Yes, tests should be run in isolation. No leftovers from a previous test should influence a later test. But doesn't that lead to heavy mocking?
I don't think so. Mocking should not be the default for test isolation. The reason is simple: mocking is not the real thing.
Mocking hides what's going on in a software system. And it introduces its own source of bugs.
Mocking is a form of waste. It's doing something again in a fake way which has already been done in a real way.
Don't get me wrong: mocking is a useful technique - sometimes. It should be applied carefully and in small doses.
Avoid broad-brush isolation of everything from everything else just because you want to put everything under permanent test. It's a recipe for disaster.
Test isolation first and foremost means data isolation. The output and/or final resource state of one test should not be used as input and/or initial resource state of another test. You'd lose the ability to run just selected tests or run tests in any order you like. (Suggestion: look for test frameworks which can run tests in random order; that way you quickly get feedback on whether you really isolated your tests from each other.)
It's ok to have several tests share a common context. But the effects of one test on that context should not carry over to the next test.
Test isolation thus is less about functional mocks and more about data. As long as you ensure that data is always "fresh" and specific to a test according to the specified behaviour, you should not mock functionality. Your tests on the internal API should exercise as much of the function stack as possible. You need the parts of your program to actually collaborate. Your tests are experiments to confirm (or falsify) the hypothesis that all you've coded actually creates the desired behaviour.
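A sketch of what that can mean in NUnit terms: every test gets its context built freshly in [SetUp] instead of reusing whatever a previous test left behind.

[TestFixture]
public class Isolation_example_tests {
    private List<string> _inbox; // stands in for some shared context/resource

    [SetUp]
    public void Provide_fresh_context() {
        // Rebuilt before every test: no leftovers from a previous test.
        _inbox = new List<string> { "hello" };
    }

    [Test]
    public void A_message_can_be_appended() {
        _inbox.Add("world");
        Assert.AreEqual(2, _inbox.Count);
    }

    [Test]
    public void The_inbox_initially_holds_one_message() {
        // Stays green in any test order because the context is rebuilt in SetUp.
        Assert.AreEqual(1, _inbox.Count);
    }
}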
To me that's the goal: no mocks, only a comparatively small number of tests at the outside which cover the whole functional stack.
However... sometimes you need to replace the real thing with a stand-in. Sometimes the real thing simply is too expensive during tests. It's the same in movies: the star actor is usually not used in dangerous action sequences or during boring lighting and camera adjustments.
If a test with real data becomes too expensive or cumbersome, then sparingly use some form of mocking. Replace an RDBMS with file system persistence or an in-memory database. Replace a TCP connection with data from a file, or whatever. Become creative. That's fine.
But again: I suggest you replace the least possible amount of code. Mocking should mostly happen at the boundary of a software system. Again, there is an interface involved:
Resources are accessed through their public API. In order to be able to replace them when needed, this API needs to be wrapped by an adapter and not be used directly all over the place within the software system. Resource clients should always go only through this adapter.
With this adapter in place, mocking can be done. Still, I wouldn't recommend it, but it's possible with relatively little pain.
DIP + IoC help you with that. It's no rocket science. A full-blown dependency injection container is not necessary in my experience. Try to avoid as much as possible of the additional complexity which is inevitably introduced by the desire for mocking.
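Here's a sketch of such an adapter with hand-rolled constructor injection - all names are hypothetical, just to show the shape:

// The adapter interface is owned by the software system, not by the resource.
public interface IOrderStore {
    void Save(string orderId, decimal total);
    decimal Load(string orderId);
}

// In production an adapter (e.g. SqlOrderStore) would wrap the real RDBMS.
// For tests a cheap stand-in honours the same contract:
public class InMemoryOrderStore : IOrderStore {
    private readonly Dictionary<string, decimal> _orders = new Dictionary<string, decimal>();
    public void Save(string orderId, decimal total) { _orders[orderId] = total; }
    public decimal Load(string orderId) { return _orders[orderId]; }
}

// Behaviour-creating code depends only on the interface (DIP) and gets the
// adapter handed in from the outside (IoC) - by hand, no container needed.
public class OrderProcessing {
    private readonly IOrderStore _store;
    public OrderProcessing(IOrderStore store) { _store = store; }

    public void RecordPayment(string orderId, decimal amount) {
        var remaining = _store.Load(orderId) - amount;
        _store.Save(orderId, remaining);
    }
}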
Probably the best advice with regard to test isolation is: avoid state. State as in object state, but also persistent state.
Sure, in the end your systems will need to persist data. But that should not mean large parts of your systems have to know about that. Making testing easier is not only about isolation and wrapping resource APIs. It's also about not even depending broadly on the resource wrapper at runtime.
4. Why is green so elusive?
In TDD, green is supposed to be achieved by the simplest means possible. Follow the KISS principle. Ian Cooper emphasises that green does not equal clean, though. In the green phase of TDD don't try to write beautiful, timeless code. Just focus on getting the job done. Focus on satisfying the behaviour expectation specified by the test.
That's good news, I'd say. It's an application of the SRP to your brain. Avoid multitasking. During green you have only one responsibility to live up to: create functionally correct code.
Unfortunately my own experience as well as that of developers I watch in my trainings is that even dirty code is not that easy to write.
Let's not kid ourselves with degenerate test cases, e.g. checking input parameters for null. And let's not kid ourselves with fake solutions, e.g. returning the expected result as a hard coded value. That's procrastination at the micro level. It does not bring a real solution any closer.
What I mean are test cases to be satisfied with non-trivial code. This code often is elusive. Why's that?
The problem to solve often is too big.
Take the "8 queens problem" for example:
The eight queens problem is the problem of placing eight queens on an 8×8 chessboard such that none of them attack one another (no two are in the same row, column, or diagonal).
Wikipedia shows an example output which a software solution should be able to produce: one valid placement of the eight queens.
If the system to develop looked like this:
public class NQueensSolution {
    public static int[] Solve(int numberOfQueens) {
        ...
    }
}
Then an acceptance test for that behaviour could look like this:
[TestFixture]
public class NQueensSolution_tests {
    [Test]
    public void Acceptance_test() {
        var result = NQueensSolution.Solve(8);
        Assert.AreEqual(new[]{2,12,17,31,32,46,51,61}, result);
    }
}
Wouldn't that be a valid, crystal clear requirement? Of course - but it's a big one. It requires the full-blown solution. If you had a hard time coming up with that even as very dirty code, I'd understand. And the point of TDD is to not even try.
Instead, find a simpler problem, a sub-problem hidden in the difficult big problem - and write a test for that first. Can the "8 queens problem" be reduced to a "1 queen problem"? Not really. The point is to have multiple queens. (Never mind that Wikipedia does not exclude this case.) Then maybe 2 queens. No, not possible. 3 queens? Not possible. Ah, but with 4 queens it's possible.
[TestFixture]
public class NQueensSolution_tests {
    ...
    [Test]
    public void Simpler_4_queens_problem() {
        var result = NQueensSolution.Solve(4);
        Assert.AreEqual(new[]{1,7,8,14}, result);
    }
}
But what now? Is that really, really a simpler problem? Sure, it's fewer queens to place and a smaller board. But the fundamental problem stays the same. I'd say the solution hasn't come much closer.
So what else could you do to break down the difficult problem into simpler ones which can be solved by the system under development? Keep in mind it needs to be a problem "of the same kind", just simpler.
If you take the "roman numeral conversion" problem, for example, the "big problem" could be to convert IV to 4. And if that's too hard for you to tackle in full, then a sub-problem could be to convert VI to 6 (no subtractions). And if that's still too difficult, start with converting I to 1 etc. (no addition, just single digits).
Each sub-problem is in and of itself a full problem (IV, VI, I are all roman numerals), just simpler in some regard. If you can decompose the user's requirements into such simpler requirements, then all's well with TDD. Work yourself incrementally from simple to complicated test cases.
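A sketch of such an incremental test list for a hypothetical converter:

[TestFixture]
public class RomanConversion_tests {
    [TestCase("I", 1)]   // single digits only
    [TestCase("VI", 6)]  // addition, no subtraction
    [TestCase("IV", 4)]  // subtraction - the original 'big problem'
    public void Converts_roman_to_arabic(string roman, int expected) {
        Assert.AreEqual(expected, RomanConverter.Convert(roman));
    }
}

In TDD you would of course add these cases one at a time, getting each to green before moving on to the next.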
To me, though, that does not seem possible with the "8 queens problem". It's always the full-blown problem. There's no way to approach the full solution incrementally. You either know the solution or you don't.
And that's my point: in real life, problems don't come packaged as nice little coding katas like the "roman numeral conversion". You don't really know how big they are. You cannot just look at required behaviour and derive simpler test cases from it.
Sure, the user/PO might help to classify behaviour. But in the end he/she is not concerned with difficulty. He/she just knows what kind of results should be produced. So you're pretty much alone in TDD land.
My recommendation: always assume you're confronted with an "8 queens problem" - until you have evidence that the problem really is simpler.
That means: Diligently encode the acceptance test cases you came up with together with the user/PO. Always start with overall behaviour. Yes, even if that means those test cases will stay red for quite a while. Having some ambiguity tolerance helps to cope with that.
And then try to solve the problem. Yes, solve the problem without writing tests. Solving does not mean coding. It just means you know how to transform input into output while using further data in resources. And that means you can state the explicit steps necessary. Again, no code is needed. Just steps, abstract steps, conceptual transformations.
Solving a problem without actually coding means modelling a solution.
How you find a model is a matter of creativity. But I know one thing: your model must not be (imperative) code - because that would be in contradiction with the meaning of model. Rather, your model must just define partial transformations and their relationships, from which a complete solution can be composed.
The originally required behaviour is created by "the one big" transformation. But since that's too big a problem to tackle at once, the overall transformation has to be decomposed into a number of partial transformations.
That's a venerable problem solving technique, but to me it seems as if many developers have forgotten about it.
TDD is great, but it does not serve you a solution on a silver platter. You still have to come up with that yourself. Only when the problem is simple enough (see the "roman numeral conversion" above) can you expect incremental TDD tests to help hammer out the actual logic needed for behaviour creation.
As long as that's not the case, though, don't expect TDD to help. Do your homework.
That's why I think green often is elusive: there is a misunderstanding. You think the problem is simple enough for TDD, but actually it's not. After a few degenerate and fake tests you're hitting a wall. Small wonder you then blame TDD for not being helpful. Wasn't TDD supposed to guide you with tests to functional and clean code? Well, yes - but that requires the problem to be of a certain kind.
Unfortunately this does not get communicated well in most TDD demonstrations. And even "Growing Object-Oriented Software, Guided by Tests" (by Steve Freeman and Nat Pryce) falls short in this regard for my taste.
If you don't want green tests to be elusive, please look closely at the problems at hand. In most cases you'll need to first decompose a comprehensive transformation into smaller, complementary ones. Maybe I can describe it like this: "Grow the function call tree from the root down, guided by tests." Because that's pretty much it, I guess: you start with a single function on the internal API which the user requires as a root to direct requests at; and from there grow out branches of function calls which together deliver the requested behaviour. The deeper a function call is nested, the smaller the problem to be solved.
And what about objects? Well, locate those functions on classes as you see fit. Use an OO approach you like. This won't change the basic structure of any software solution, which is a deep tree of function calls. Functional Programming does not change that either.
5. What to refactor to?
Finally, refactoring! You've found a solution, all tests are still green. Great! Now is the time for cleaning up your solution. Leave the duct tape programmer mode which was ok during the green phase. Enter the zen mode of a Japanese monk raking a rock garden.
This could be a time of pure delight. You bask in the light of a shiny, functionally correct solution. Relax. Now live out the craftsman in you. Clean up the code so posterity will look up to you as a responsible programmer. Or, well, maybe your fellow programmer will just utter a couple fewer "WTF!"s next time he/she has to work with the code you just wrote.
But, alas, refactoring is often skipped. It seems dispensable. And the next increment always seems more important than a good structure, because it's clear who will be happier with more functionality than with better structures: the paying customer.
Sooner or later, though, your productivity will drop because of accumulated dirt in your codebase. Hence you should really take the opportunity to do at least basic cleanup after each test you got to green. And then some thorough cleanup once you got the acceptance test to green as well. That's the test sitting at the root of your function tree, the one defined with the user/PO.
Really take the prescribed refactoring as a time to relax. Reflect on what you've accomplished. You've done well. Now make your code sculpture pretty.
I think many more developers would really like to do that - but they don't feel confident. What's clean code anyway? To what structure other than the one they created during the green phase should they refactor the code? There sure are a million ways to improve the code - but which of them to pursue?
What does SLA mean? How to detect an abstraction level? Where are the boundaries of responsibilities? They are important if the SRP is to be applied. How to slice and dice the code into classes/objects? Which rules to follow? Doesn't DIP help with testing? Maybe a couple more interfaces are in order then?
So many principles, so little time. Even if one developer could keep them all in his/her head, there wouldn't be enough time to apply them all, right?
I see great uncertainty and many misunderstandings when it comes to clean code among developers on all stacks. That's one of the reasons why technological frameworks are so popular: they suggest that all's gonna be well if you just use them properly. How often do I hear developers proudly exclaim "We've a Spring architecture!"...
Unfortunately the contrary is true, I'd say. You first have to be very clear about what clean code means; only then can you find the proper place for framework usage within a clean structure.
So the question remains: what to refactor code to? How to make refactoring easier?
There are a lot of great principles. But under pressure we tend to develop tunnel vision. And pressure is exactly what we feel during refactoring. We're not relaxed, but constantly feel the customer/PO with his/her need for more functionality breathing down our necks.
There's no chance that a lot of great principles can be applied during refactoring. But maybe one? Yes, focus on just one principle. That's not too much, is it?
Here's my choice for the one principle which should at least always be observed: the Integration Operation Segregation Principle (IOSP).
It's very, very easy to check whether code follows this principle. You don't even need to understand the solution. The IOSP is concerned with form only.
In IOSP code there are just two types of functions:
- Functions which contain logic, i.e. actually do something. They are called operations. They create behaviour.
- Functions which do not contain any logic but only call other functions. They are called integrations. Their purpose is to integrate parts into a whole, to compose a comprehensive solution from partial solutions.
Code structured according to the IOSP looks very different from what you're used to, I guess. It's lacking functional dependencies! There is no logic calling another of your functions to delegate some work and then consuming the result in some more logic.
This is how your code looks today: most methods contain logic as well as calls to other functions. They are hybrids, neither just integration nor just operation.
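For illustration, here's what such a hybrid might look like - a hypothetical invoice calculation, with names of my choosing:

// A hybrid: logic mixed with calls to other functions of mine.
public static decimal CalculateInvoiceTotal(IEnumerable<OrderLine> lines) {
    decimal net = 0;                        // logic
    foreach (var line in lines)             // logic
        net += line.Quantity * line.Price;  // logic
    var tax = CalculateTax(net);            // call to another of my functions...
    return Math.Round(net + tax, 2);        // ...whose result is consumed by more logic
}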
As usual as that is, it's not healthy in the long run. Several detrimental effects result from it, e.g.
- Functions grow indefinitely
- Logic in functions is hard to test
- Flow of behaviour creation is difficult to follow
- Changing levels of abstraction make understanding hard
Compare this to an IOSP function hierarchy:
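Sticking with the hypothetical invoice example from above, an IOSP version of it could look like this:

// Integration: contains no logic of its own, it only calls other functions.
public static decimal CalculateInvoiceTotal(IEnumerable<OrderLine> lines) {
    var net = SumNetAmounts(lines);
    var tax = CalculateTax(net);
    var gross = AddTax(net, tax);
    return ApplyRounding(gross);
}

// Operations: contain the logic, but call no other functions of mine.
static decimal SumNetAmounts(IEnumerable<OrderLine> lines) {
    return lines.Sum(l => l.Quantity * l.Price);
}
static decimal CalculateTax(decimal net) { return net * 0.19m; } // assumed flat tax rate
static decimal AddTax(decimal net, decimal tax) { return net + tax; }
static decimal ApplyRounding(decimal amount) { return Math.Round(amount, 2); }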
As you can see: more functions. But all functions are now focused on one formal responsibility (remember the SRP?): they are either operations or integrations.
That has several beneficial effects:
- Functions tend not to grow beyond 10 to 50 lines.
- Logic is easy to test because it does not depend on other functions.
- The āflow of operationā is easy to follow in each integration.
- Code naturally conforms to a stratified design which makes understanding and reuse easier.
Refactoring to the IOSP is not without its challenges, but the price to pay is small compared to the large benefits to gain.
First and foremost, though, the IOSP is easy to remember. Just a single acronym to keep in your head for refactoring after green. That should be doable, I guess.
But even better: if you get into the habit of thinking along the lines of the IOSP, you'll need less refactoring in the first place. You'll produce cleaner code from the start. The IOSP is a natural match for problem solving by stepwise decomposition. So if you do your homework of solving the problem before you start coding, then you very likely already have a list of integrations and operations to work through. This might go so far as to make implementation actually a boring task. What's really exciting, though, is... design, i.e. modelling a solution.
Summary
Yes, the understanding (or mainstream adoption) of TDD went wrong. But there is a way out of this pit of test frustration. Ian Cooper has presented a five step guideline to follow. And I've tried to fill in a couple of things I found missing. I hope in combination you'll find more happiness in doing TDD.