My Thoughts on Mocks
Up, down, Detroit, charm, inside out, strange, London, bottom up, outside in, mockist, classic, Chicago….
Do you remember the questions on standardized tests where they asked you to pick the thing that wasn’t like the others? Well, this isn’t a fair example as there are really two distinct groups of things in that list, but the names of TDD philosophies have become as meaningless to me as the names of quarks. At first I thought I’d use this post to try to sort it all out, but then I decided that I’m not the Académie française of TDD school names and I really don’t care that much. If the names interest you, I can suggest you read TDD – From the Inside Out or the Outside In. I’m not convinced that the author has all the grouping right (in particular, I started learning TDD from Kent Beck, Ron Jeffries and Bob Martin in Chicago, which is about as classic as you can get, and it was always what she calls outside in but without using mocks), but it’s a reasonable introduction.
Still, it felt like it was time to think about TDD again, so instead I went back to Ron Jeffries’ Thoughts on Mocks and a comment he made on the subject in his Google Groups Forum. In the posting, Ron speculated that architecture could push us toward a particular style of TDD. That feels right to me. He also suggested that writing systems that are largely “assemblies of OPC (Other People’s Code)” “are surely more complex” than the monolithic architectures that he’s used to from Smalltalk applications and that complexity might make observing the behavior of objects more valuable. That idea puzzles me more.
My own TDD style, which is probably somewhere between the Detroit school, which leans towards writing tests that don’t rely on mocks, and the London school, which leans towards using mocks to isolate each unit of the application, definitely evolved as a way to deal with the complexity I faced in trying to write all my code using TDD. When I first started out, I was working on what I believe would count as a monolithic application in that my team wrote all the code from the UI back to right before the database drivers. We started mocking out the database not particularly to improve the performance of the tests, but because the screens were customizable per user, the data for which was in a database, and the actual data that would be displayed was stored across multiple tables. It was really quite painful to try to get all the data set up correctly and we had to keep a lot more stuff in mind when we were trying to focus on getting the configurable part of the UI written. This was back in 1999 or 2000, and I don’t remember if someone saw an article on mocking, but we did eventually light on the idea of putting in a mock object that was much easier to set up than the actual database. In a sense, I think this is what Ron is talking about in the “Across an interface” section of his post, but it was all within our code. Could we have written that code more simply to avoid the complexity to start with? It was a long time ago and I can’t say whether or not I’d take the same approach now to solving that same problem, but I still do find a lot of advantages in using mocks.
I’ve been wanting to try using a NoSQL database and this seemed like a good opportunity to both try that technology and, after I read Ron’s post, try writing it entirely outside-in, which I always do anyway, and without using mocks, which is unusual for me. I started out writing my front-end using TDD and got to the point that I wanted to connect a persistence mechanism. In a sense, I suppose the simplest thing that could possibly work here would have been to keep my data in a flat file or something like that, but part of my purpose was to experiment with a NoSQL database. (I think this corresponds to the reasonably common situation of “the enterprise has Oracle/MS SQL Server/whatever, so you have to use it.”) I therefore started with one of the NoSQL implementations for .NET. Everything seemed fine for my first few unit tests. Then one of my earlier tests failed after my latest test started passing. Okay, this happens. I backed out the code I’d just written to make sure the failing test started passing, but the same test failed again. I backed out the last test I’d written, too. Now the failing test passed but a different one failed. After some reading and experimentation, I found that the NoSQL implementation I’d picked (honestly without doing a lot of research into it) worked asynchronously and it seemed that I’d just been lucky with timing before the tests started randomly failing. Okay, this is the point that I’d normally turn to a mocking framework and isolate the problematic stuff to a single class that I could either put the effort into unit testing or else live with it being tested through automated customer tests.
Because I felt more strongly about experimenting with writing tests without using mocks than with using a particular NoSQL implementation, I switched to a different implementation. That also proved to be a painful experience, largely because I hadn’t followed the advice I give to most people using mocks, which is to isolate the code for setting up the mock into an individual class that hides the details of how the data is set up. Had I been following that precept now that I was accessing a real persistence mechanism rather than a mock, I wouldn’t have needed to change my tests to the same degree. The interesting thing here was that I had to radically change both the test and the production code to change the backing store. As I worked through this, I found myself thinking that if only I’d used a mock for the data access part, I could have concentrated on getting the front-end code to do what I wanted without worrying about the persistence mechanism at all. This bothered me enough that I finally did end up decoupling the persistence mechanism entirely from the tests for the front-end code and focusing on one thing at a time instead of having to deal with the whole thing at once. I also ended up giving up on the NoSQL implementation for a more familiar relational database.
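The advice about hiding test-data setup behind one class can be sketched roughly as follows. This is only an illustration in Python rather than .NET, and every name in it (ScreenLayout, InMemoryLayoutStore, LayoutFixture, visible_headers) is invented for the example; none of it comes from the code described above.

```python
from dataclasses import dataclass


@dataclass
class ScreenLayout:
    """What the front-end cares about: which columns a user sees."""
    user: str
    columns: list


class InMemoryLayoutStore:
    """Hypothetical stand-in for whatever class wraps the real database."""

    def __init__(self):
        self._layouts = {}

    def save(self, layout):
        self._layouts[layout.user] = layout

    def load(self, user):
        return self._layouts[user]


class LayoutFixture:
    """Hides how test data is set up, so changing the backing store
    means changing this one class rather than every test."""

    def __init__(self, store=None):
        self.store = store if store is not None else InMemoryLayoutStore()

    def user_with_columns(self, user, *columns):
        self.store.save(ScreenLayout(user, list(columns)))
        return self.store


def visible_headers(store, user):
    """Front-end code under test: upper-cased column headers for a user."""
    return [column.upper() for column in store.load(user).columns]


# A test talks only in front-end terms; it never sees tables or documents.
store = LayoutFixture().user_with_columns("pat", "name", "balance")
print(visible_headers(store, "pat"))
```

Passing a different store object into LayoutFixture is the only change a test suite built this way should need when the persistence mechanism changes, assuming the new store offers the same save and load methods.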
So, where does all this leave my thoughts on mocks? Ron worried in his forum posting that using mocks creates more classes than testing directly and thus makes the system more complex. I certainly ended up with more classes than I could have, but that’s the lowest priority in Kent Beck’s criteria for simple design. Passing the tests is the highest priority, and that’s the one that became much easier when I switched back to using mocks. In this case, the mocks isolated me from the timing vagaries of the NoSQL implementations. In other cases, I’ve also found that they help isolate me from other random elements like other developers running tests that happen to modify the same database tables that my tests are modifying. I also felt like my tests became much more intention-revealing when I switched to mocks because they talked in terms of the high-level concepts that the front-end code dealt with instead of the low-level representation of the data that the persistence code needed to know about. This made me realize that the hard part was caused by the mismatch between the way the persistence mechanism (either a relational database or the document-oriented NoSQL database that I tried) represented the data and the way I thought of the data in my code. I have a feeling that if I’d just serialized my object graph to a file or used an object-oriented database instead of a document-oriented database, that complexity would go away. That’s for a future experiment, though. And, even if it’s true, I don’t know how much I can do about it when I’m required to use an existing persistence mechanism.
Ron also worried that the integration between the different components is not tested when using mocks. As Ron puts it in his forum message: “[T]here seems to be a leap of faith made: it’s not obvious to me that if we know that A sends the right messages to Mock B, and B sends the right messages to Mock A, A and B therefore work. There’s an indirection in that logic that makes me nervous. I want to see A and B getting along.” I don’t think I’ve ever actually had a problem with A and B not getting along when I’m using mocks, but I do recall having a lot of problems with it when I had to map between submitted HTML parameters and an object model. (This was back when one did have to write such code oneself.) It was just very easy to mistype names on either side and not realize it until actual user testing. This is actually the problem that led us to start doing automated customer testing. Although the automated customer tests don’t always have as much detail as the unit tests, I feel like they alleviate any concerns I might have that the wrong things are wired together or that the wiring doesn’t work.
It’s also worth mentioning that I really don’t like the style of using mocks that just checks whether a method was called rather than whether it was used correctly. Too often, I see test code like:
mock.Stub(m => m.Foo(Arg.Is.Anything, Arg.Is.Anything)).Return(0);
mock.AssertWasCalled(m => m.Foo(Arg.Is.Anything, Arg.Is.Anything));
I would never do something like this for a method that actually returns a value. I’d much rather set up the mock so that I can recognize that the calling class both sent the right parameters and correctly used the return value, not just that it called some method. The only time I’ll resort to asserting a method was called (with all the correct parameters) is when that method exists only to generate a side-effect. Even with those types of methods, I’ve been looking for more ways to test them as state changes rather than checking behavior. For example, I used to treat logging operations as side-effects: I’d set up a mock logger and assert that the appropriate methods were called with the right parameters. Lately, though, with Log4Net, I’ve been finding that I prefer to set up the logger with a memory appender and then inspect its buffer to make sure that the message I wanted got logged at the level I wanted.
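To make the contrast concrete, here is a rough sketch using Python’s standard unittest.mock and logging modules in place of a .NET mocking framework and Log4Net. The functions price_with_tax and withdraw, the rate_for method, and the ListHandler class are hypothetical names made up for the example; ListHandler only plays the role a memory appender would play.

```python
import logging
from unittest.mock import Mock


def price_with_tax(rates, region, net):
    """Hypothetical caller: looks up a rate and must actually use it."""
    return round(net * (1 + rates.rate_for(region)), 2)


# Weak style: any argument is accepted and only the call itself is checked.
loose = Mock()
loose.rate_for.return_value = 0
price_with_tax(loose, "anywhere", 100)
loose.rate_for.assert_called()  # passes even if the caller sent nonsense


# Stronger style: answer only the expected argument with a distinctive
# value, then check that the value shows up in the caller's result.
def only_illinois(region):
    if region != "IL":
        raise AssertionError("unexpected region: " + region)
    return 0.25


strict = Mock()
strict.rate_for.side_effect = only_illinois
assert price_with_tax(strict, "IL", 100) == 125.0


class ListHandler(logging.Handler):
    """Keeps log records in memory so a test can inspect them as state."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def withdraw(balance, amount, log):
    """Hypothetical code under test that logs a refused withdrawal."""
    if amount > balance:
        log.warning("overdraft refused: %s", amount)
        return balance
    return balance - amount


log = logging.getLogger("mock-style-example")
log.setLevel(logging.DEBUG)
log.propagate = False
handler = ListHandler()
log.addHandler(handler)

withdraw(10, 50, log)
logged = [(r.levelname, r.getMessage()) for r in handler.records]
assert logged == [("WARNING", "overdraft refused: 50")]
```

The second half checks both the message and the level by reading what was recorded, with no assertion about which logger method was invoked.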
In his Forum posting, Ron is surely right in saying about the mocking versus non-mocking approaches to writing tests: “Neither is right or wrong, in my opinion, any more than I’m right to prefer BMW over Mercedes and Chet is wrong to prefer Mercedes over BMW. The thing is to have an approach to building software that works, in an effective and consistent way that the team evolves on its own.” My own style has certainly changed over the years and I hope it will continue to adapt to the circumstances in which I find myself working. Right now I find myself working with a lot of legacy code that would be extremely hard to get under test if I couldn’t break it up and substitute mocks for collaborators that are nearly impossible to get set up correctly. Hopefully I’ll also be able to use mocks less, as I find more projects that allow me to avoid the impedance mismatch between the code’s concept of the model and that of external systems.
In Diamond in the Rough I talked some about the similarity I found between David Gries’ work on proving programs correct in The Science of Programming and actually doing TDD. I’ve since gone back to re-read Science. Honestly it hasn’t gotten any easier to read since I last read it so long ago. It’s still a worthwhile read, though, if you can find a copy. It’s a nice corrective to an issue I’ve seen with a lot of developers over the years: they just don’t have the skills or maybe the inclination to reason about their code.
This seems most evident in an over-reliance on debuggers. For myself, I only use a debugger when I have a specific question to answer, and I can’t answer it immediately from the code. “What’s the run-time type of this visitor given these circumstances?” or “Which of the five different objects on the silly call chain was null when it was used?” (that’s a little lazy, since I could refactor the code to make the answer clear without the debugger, but I’m generally trying to find the unit test to write to expose the problem before I fix it, and I don’t want to refactor until I have a green bar that includes the fix to the actual problem). Those are the types of questions I might turn to the debugger to help answer, particularly when the cost of setting up a test to get that type of feedback is unusually high compared to using a debugger. This is common when my code is interacting with third-party code in a server container. Trying to set up some kind of integration or functional test to get the very specific information I want can be a horrible rabbit hole. (Although it may still be worth setting up the functional test to prove a problem is actually solved.)
So I do recognize times when one can get good feedback more effectively from a debugger than from immediately trying to write a unit test or just reasoning about the code one is looking at, but it worries me when the debugger is the first tool a developer uses when something doesn’t act as they expect. Even when I try asking some developers the questions that I ask myself when trying to understand the problem, the first response is to fire up the debugger, even if it’s patently not a question that needs a debugger to answer (“What kind of type is that method returning?” when the actual method declaration is no more than twenty lines down and declares that it’s returning a concrete type). And, most egregiously, I’ve often seen people debugging their unit tests.
That’s really worrying since the unit tests are meant to help us understand our code. It concerns me when I see someone’s first response to a unit test failure (even a test they’re just writing) is to run it through the debugger. Which is not to say that I never do so, but for me it’s always a case of “I know this test failed because this variable wasn’t what I expected it to be. What was it?” Again, for me the debugger is a means for getting a specific piece of information rather than something I try to use to help me understand the code. And that seems a much more efficient and effective way to get the code to do what I want it to do.
The more general case for turning to the debugger seems to be when one doesn’t understand the code. It’s a little more understandable when trying to understand someone else’s code. Even then, though, I’m not convinced that the debugger is the best option for trying to really understand what the code is doing. In a case like this, I’d comment out the code and write unit tests to make me uncomment one small part at a time. This forces me to really understand what the code is doing because setting up the tests correctly helps me look at the various conditions that affect the code. Gries’ techniques come into play here, too. It’s unconscious for me now, but the ability to reason formally about the code helps lead me into each new test that will make me uncomment the smallest possible bit of code.
So, how do we learn to reason about our code rather than turning to the debugger as an oracle that will hopefully explain it? Part of it may be the skills one learns in Gries’ Science, even if they’re not formally applied. The stronger influence, however, may be the way I learned to practice TDD. I do genuinely test-drive my code and when I first learned TDD it was drilled into me to write my failing test and state why the test would fail. After not really writing one’s tests before the code, not asking that simple question seems the biggest failure in how I’ve seen TDD practiced. That might be the better way to learn to really reason about what one’s code is doing. While I still respect and appreciate the techniques that Gries described in Science, it’s probably both easier and more efficient to learn the discipline of really writing tests before the code, asking why the test should fail and thinking about it when it fails differently than one expects.
I like to attempt minor DIY projects around the house because 1) it saves money and 2) it’s enjoyable to solve technical issues that don’t involve staring at a computer. Recently, I decided to wire the living room with recessed lights. There is no existing light fixture in the ceiling so I had to use power from a receptacle and wire in a new switch to control the lights.
I didn’t want to cut through the wall board, run wire and then find out that I didn’t know how to actually wire into the receptacle. So, I decided to approach the problem much like I do when faced with a daunting programming problem – by unit testing.
I pulled out the receptacle, examined the wiring, and scratched out a plan on a piece of paper. Then I wired a light switch and cheap single bulb fixture off of the receptacle. Luckily, it worked without major adjustments or shocks. And it entertained the kids for about 3 minutes.
I then was able to confidently cut through the wall and wire in the switch, solving one piece of the puzzle and allowing me to focus on installing the overhead lights.
As a design technique, Test-Driven Development (TDD) allows us to break down complex systems into smaller, more manageable chunks. Once you’ve written tests to satisfy a cohesive set of requirements, you commit the code and move on to the next set.
A good example of this can be found in the book, Practices of an Agile Developer (Subramaniam, Hunt). One of the practices is described as Attacking Problems in Isolation. As the authors explain:
“Large systems are complicated – many factors are involved in the way they execute. While working with the entire system, it’s hard to separate the details that have an effect on your particular problem from the ones that don’t.”
Isolating the problem is also useful when debugging a system issue that is buried under layers of UI, database and middle-tier abstractions. Remove each layer until you’ve discovered the likely culprit. Or build a simple prototype and isolate the misbehaving module.
It’s easy to get overwhelmed by a complex system when trying to decide where to begin. It may feel like a house-of-cards, teetering on the verge of collapse with the next interruption. Consider TDD as not just a test fixture, but as a design technique that helps narrow the scope.
And as for wiring a home – keep it simple and remember to cut off the power.
“A long time ago, in a university far away, a few professors had the idea that they might teach programmers to think a bit before hitting the keycaps. Nearly a lost cause, of course, even though Edsger Dijkstra and David Gries championed the movement, but the progress they made was astonishing. They showed how to create programs (of a certain category) without error, by thinking about the properties of the problem and deriving the program as a small exercise in simple logic. The code produced by the Dijkstra-Gries approach is tight, fast and clear, about as good as you can get, for those types of problems.”
To which Ron responds:
“I looked at Alistair’s article and got two things out of it. First, I understood the problem immediately. It seemed interesting. Second, I got that Alistair was suggesting doing rather a lot of analysis before beginning to write tests. He seems to have done that because Seb reported that some of his Kata players went down a rat hole by choosing the wrong thing to test.
“Alistair calls upon the gods, Dijkstra and Gries, who both championed doing a lot of thinking before coding. Recall that these gentlemen were writing in an era where the biggest computer that had ever been built was less powerful than my telephone. And Dijkstra, in particular, often seemed to insist on knowing exactly what you were going to do before doing it.
“I read both these men’s books back in the day and they were truly great thinkers and developers. I learned a lot from them and followed their thoughts as best I could.”
Ron’s article goes on to solve the same programming problem with TDD and no particular up-front thinking that Cockburn solved in what he calls a combination of “the Dijkstra-Gries approach” and TDD. On the whole, I would tend more towards the pure TDD approach that Ron takes because it got him feedback earlier and more frequently, while Cockburn’s approach, with more upfront thinking, didn’t provide him any feedback until he really did start writing his tests. If Cockburn had gone down a blind alley with his thinking, he wouldn’t have gotten any concrete feedback on it until much later in the game.
But that’s not what I actually want to think about. I did read both Dijkstra’s A Discipline of Programming and Gries’ The Science of Programming “back in the day” as well (last read in 1991 and 2001, respectively, although it was a second reading of Gries; I remember finding Dijkstra almost impossible to understand, but I did keep it, so it may be time to try it again), but I didn’t remember the emphasis on up front thinking that both Ron & Cockburn seemed to claim for them. I dug out my copies of both books and did a quick flip through both of them, and I still feel that the emphasis is much more on proving the correctness of one’s code rather than doing a lot of up-front thinking. I’d previously had the feeling that there was a similarity between Gries’ proofs and doing TDD. As I poke around in chapter 13 of Gries’ book, where he introduces his methodology, I find myself believing it even more strongly.
Gries starts out asking “What is a proof?” His answer?
“A proof, according to Webster’s Third New International Dictionary, is ‘the cogency of evidence that compels belief by the mind of a truth or fact.’ It is an argument that convinces the reader of the truth of something.
“The definition of proof does not imply the need for formalism or mathematics. Indeed, programmers try to prove their programs correct in this sense of proof, for they certainly try to present evidence that compels their own belief. Unfortunately, most programmers are not adept at this, as can be seen by looking at how much time is spent debugging. The programmer must indeed feel frustrated at the lack of mastery of the subject!”
Doesn’t TDD provide that for us, at least when practiced correctly? Oh, and the first principle that Gries gives in this chapter is: “A program and its proof should be developed hand-in-hand, with the proof usually leading the way.” Hmm, sounds familiar, no?
Admittedly, Gries does speak out against what he calls “test-case analysis:”
“‘Development by test case’ works as follows. Based on a few examples of what the program is to do, a program is developed. More test cases are then exhibited – and perhaps run – and the program is modified to take the results into account. This process continues, with program modification at each step, until it is believed that enough test cases have been checked.”
On the face of it, this does sound like a condemnation of TDD, but does it really represent what we do when we really practice TDD? Sort of, but it overlooks the critical questions of how we choose the test cases and the speed at which we can get feedback from them. If we’re talking about randomly picking a bunch of test cases and getting feedback from them in a matter of days or hours, then I’d agree that it would be a poor way to develop software. When we’re practicing TDD, though, we should be looking for that next simplest test case that helps us think about what we’re doing. Let’s turn to Gries’ “Coffee Can Problem” as an example.
“A coffee can contains some black beans and some white beans. The following process is to be repeated as long as possible.
“Randomly select two beans from the can. If they have the same color, throw them out, but put another black bean in. (Enough extra black beans are available to do this.) If they are different colors, place the white one back into the can and throw the black one away.”
“Execution of this process reduces the number of beans in the can by one. Repetition of the process must terminate with exactly one bean in the can, for then two beans cannot be selected. The question is: what, if anything, can be said about the color of the final bean based on the number of white beans and the number of black beans initially in the can?”
Gries suggests we take ten minutes on the problem and then goes on to claim that “[i]t doesn’t help much to try test cases!” But the test cases he enumerates are not the ones we’d likely try were we trying to solve this with TDD. He suggests test cases for a black bean and a white bean to start with and then two black beans. Doing TDD, we’d probably start with a single bean in the can. That’s really the simplest case. What’s the color of the final bean in the can if I start with only a single black bean? Well, it’s black. And it’s going to be white if the only bean in the can is white. Okay, what happens if I start with two black beans? I should end up with a black bean. Two white beans wouldn’t make me change my code, so let’s try starting with a black and a white bean. Ah, I would end up with a white bean in that case. Can I draw any conclusions from this?
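The cases walked through above can be written down as executable checks against a small simulation of the process. This is only a sketch in Python: the function final_bean, its parameters, and the list-of-strings representation of the can are choices made for the example, not anything from Gries’ book.

```python
import random

BLACK, WHITE = "black", "white"


def final_bean(black, white, rng=None):
    """Run the coffee can process and return the color of the last bean."""
    if black < 0 or white < 0 or black + white == 0:
        raise ValueError("the can needs at least one bean")
    rng = rng or random.Random()
    can = [BLACK] * black + [WHITE] * white
    while len(can) > 1:
        rng.shuffle(can)                 # pick two beans at random
        first, second = can.pop(), can.pop()
        if first == second:
            can.append(BLACK)            # same color: both out, one black in
        else:
            can.append(WHITE)            # different: white back, black out
    return can[0]


# The simplest cases first, in the order they were considered above.
assert final_bean(1, 0) == BLACK   # a single black bean
assert final_bean(0, 1) == WHITE   # a single white bean
assert final_bean(2, 0) == BLACK   # two black beans
assert final_bean(0, 2) == BLACK   # two white beans
assert final_bean(1, 1) == WHITE   # one of each
```

Each of these small cases allows only one possible first draw, so the checks pass no matter how the random selection falls.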
I did actually think about those test cases before I read Gries’ description of his process:
“Perhaps there is a simple property of the beans in the can that remains true as the beans are removed and that, together with the fact that only one bean remains, can give the answer. Since the property will always be true, we will call it an invariant. Well, suppose upon termination there is one black bean and no white beans. What property is true upon termination, which could generalize, perhaps, to be our invariant? One is an odd number, so perhaps the oddness of the number of black beans remains true. No, this is not the case, in fact the number of black beans changes from even to odd or odd to even with each move.”
There’s more to it, but this is enough to make me wonder if this is really different from writing a test case. Actually it is: he’s reasoning about the problem, and by extension the code he’d write. But he is still testing his hypotheses; it’s just in his head rather than in code. And there I would suggest that TDD, as opposed to using randomly selected test cases, allows us to do that same kind of reasoning with working code and extremely rapid feedback. (To be fair, I believe this is what Ron was saying, too. I just want to highlight the similarity to what Gries was saying, while Ron seems to be suggesting more of a difference.)
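Testing a hypothesis like Gries’ in code rather than in one’s head might look something like the sketch below. It enumerates every move the process allows from small starting states and asks whether a candidate property survives each one. The names (moves, holds_after_every_move, black_is_odd, white_is_odd) are invented for the example, and the second candidate, the oddness of the number of white beans, is simply one further hypothesis to try and is not quoted from the passage above.

```python
from itertools import product


def moves(black, white):
    """All states reachable in one step from (black, white)."""
    result = []
    if black >= 2:
        result.append((black - 1, white))      # two black out, one black in
    if white >= 2:
        result.append((black + 1, white - 2))  # two white out, one black in
    if black >= 1 and white >= 1:
        result.append((black - 1, white))      # black out, white put back
    return result


def holds_after_every_move(prop, limit=6):
    """True if prop has the same value before and after every legal move
    from every state with fewer than `limit` beans of each color."""
    return all(
        prop(b, w) == prop(nb, nw)
        for b, w in product(range(limit), repeat=2)
        for nb, nw in moves(b, w)
    )


def black_is_odd(black, white):
    return black % 2 == 1


def white_is_odd(black, white):
    return white % 2 == 1


print(holds_after_every_move(black_is_odd))  # False: black parity flips
print(holds_after_every_move(white_is_odd))  # True: white parity is kept
```

A check like this only covers the small states it enumerates, so it supports a hypothesis rather than proving it; the reasoning about why each move keeps or flips a parity is still ours to do.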
What might get lost in TDD, at least when it’s not practiced well, is that idea of reasoning about the code. There’s an art to picking that next simplest test to write, and I suspect that that’s where much of the reasoning really happens. If we write too much code in response to a single test, we’re losing some of the reasoning. If we write our tests after the code, we’ve probably lost it entirely. And that’s something I do believe is lacking in many programmers today, evidenced, as Gries suggests, by the amount of time spent fumbling around in debuggers and randomly adding code “to see what will happen.” But that’s for another time.