Integrated Tests Are A Scam

In general, I would probably change the Acceptance Test for "buy one piece of X" into an Acceptance Test for "buy n pieces of X", for whatever value of n would make the Customer feel confident in the system.

I would probably then add microtests for the typical "collection" special cases: 0, 1, many, lots (in case a stress test is useful), oops (exception).

Finally, I would decide whether it's worth adding Acceptance Tests for the stress test ("lots") or one of the failure cases ("oops"). If there are many failure cases and we handle them all the same way, then one Acceptance Test for failure would certainly suffice.

Here's the principle: we write Acceptance Tests to help the Customer feel confident that we understand what they wanted and have built it; we write Programmer Tests (microtests, mostly) to help ourselves (the programmers) feel confident in each piece of code that we write.

I see a risk in this function: it does the work of two layers in the same place. Namely, it turns application-level data into domain-level data (layer 1), and then does the domain-level work (layer 2). These two kinds of behavior are completely independent of each other: finding a vessel by ID and creating a container from `container_data` don't care about what happens to the resulting vessel and container; and loading a container onto a vessel doesn't care where the vessel and container came from. When you put these two layers of behavior in the same place, however, you create an integrated cluster with the usual risks: as each layer becomes more complicated, the effort to test them both together increases combinatorially (worse than exponentially).
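
To make that concrete, here's roughly the shape I'm describing. This is only a sketch: I'm inventing names like `Container.from_data()` and `vessel_repository`, since I can see only a fragment of your code.

```python
# A sketch of the two layers living in one function (all names invented).

class Container:
    def __init__(self, contents):
        self.contents = contents

    @classmethod
    def from_data(cls, container_data):
        # application-level dict -> domain-level value
        return cls(contents=container_data["contents"])


class Vessel:
    def __init__(self, vessel_id):
        self.vessel_id = vessel_id
        self.containers = []

    def load(self, container):
        # the (currently trivial) domain-level work
        self.containers.append(container)


def load_container_onto_vessel(vessel_id, container_data, vessel_repository):
    # Layer 1: turn application-level data into domain-level data.
    vessel = vessel_repository.find_by_id(vessel_id)
    container = Container.from_data(container_data)
    # Layer 2: do the domain-level work.
    vessel.load(container)
    return vessel
```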

Even though the risk is there, the current cost is quite low, because there are only two layers and they're extremely simple. I would probably just live with it until the code starts to grow. Each of the three pieces of behavior is very simple: find the vessel, create the container, load the container onto the vessel. If I trust all three pieces well, then I might not bother microtesting them when I put them together. One or two Customer Tests (Acceptance Tests) would probably give me enough confidence.

This is a version of the typical functional programming quandary: if I trust function composition (the standard library provides it) and I trust each function individually, then how much do I need to test composing them? I probably expect the composition to do the thing right (the pieces will fit together and nothing bad will happen) and I would only check if it did the right thing (did I choose the right functions to compose?). In this case, it might only take two Customer Tests to convince the Customer that we correctly load containers onto vessels: one case for the happy path and another to show how we handle not finding the vessel. (I can't tell from this code whether creating the container can go wrong.)

The big risk lies in ignoring the design when the domain work becomes more complicated than simply `vessel.load(container)`. If we don't relentlessly push all the domain work down into the domain layer of the design, then we end up with some domain behavior in `load_container_onto_vessel()` and some in the "proper" domain layer. This would force us to test some of the pure domain work in the place where we have to worry about where we got the vessel and container, even though the pure domain work probably operates directly on a vessel and a container without worrying where they came from! (This is an "irrelevant details" problem in the tests... why do I have to stub the vessel repository to return this vessel, when I just want to instantiate the vessel and call methods on it?!) As long as this function always simply calls exactly one domain method, the cost remains low, even though the risk I've identified remains, and "I have to stub the vessel repository to return a mock vessel" remains the symptom of that risk. As long as that chain of mocks doesn't become more complicated, I can live with the result. I simply prefer, in most cases, to avoid it altogether.
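
Continuing the sketch from above, the symptom looks something like this (using `unittest.mock`; the names remain my inventions):

```python
import unittest
from unittest.mock import Mock


class LoadContainerTest(unittest.TestCase):
    def test_forced_to_stub_the_repository(self):
        # If domain behavior hides inside load_container_onto_vessel(), then
        # checking it forces me to stub the repository just to get a vessel...
        vessel = Vessel(vessel_id=17)
        vessel_repository = Mock()
        vessel_repository.find_by_id.return_value = vessel

        load_container_onto_vessel(17, {"contents": "bananas"}, vessel_repository)

        self.assertEqual(1, len(vessel.containers))

    def test_what_i_would_rather_write(self):
        # ...when all I really want is to instantiate a vessel and call load().
        vessel = Vessel(vessel_id=17)
        vessel.load(Container(contents="bananas"))
        self.assertEqual(1, len(vessel.containers))
```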

The fix, in this case, seems crazy, even though in principle it's "correct". I recommend getting some coffee before you read on.

The (risky) duplication in this case really comes from the overall embedded algorithm of "transform application data into domain data, then run a domain script". This structure ends up copy/paste duplicated in all the Transaction Script/Controller functions that we use as Adapters between our Application Framework and the Domain. It becomes boilerplate, probably because removing the duplication just looks too strange.

As long as it is boilerplate, and as long as all the interesting details remain in small, well-tested functions, I might not bother testing the Controller layer. They ultimately all have the same structure: call this group of functions (each transforms app data into domain data), then call this one function with all those return values as arguments (run the domain script with all the domain data we just "transformed"). If you extract this into a function, then you can test it with mocks (it's a big Template Method/Function) and it's pretty straightforward: error handling is the only complication, and even that can be as simple as "decide whether to throw a 'data transformation error' or a 'domain script error' based on where you caught the underlying error."

It feels like the beginning of a universal application controller library. Then every controller looks like the entry point of a microapplication: it "composes" a bunch of "transform app data" functions with a single "run domain script" function. Code has turned into data and there's nothing left to test. It'll just work. (I wrote more about this in "Designing the Entry Point" at http://blog.thecodewhispere... )
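
Sketched in Python, that extraction might look something like this; every name here is mine, invented purely for illustration.

```python
class DataTransformationError(Exception):
    pass


class DomainScriptError(Exception):
    pass


def run_controller(transform_app_data, run_domain_script, *app_data):
    """Template: transform each piece of application data into domain data,
    then run the domain script with all the resulting domain values."""
    try:
        domain_data = [transform(datum)
                       for transform, datum in zip(transform_app_data, app_data)]
    except Exception as error:
        raise DataTransformationError() from error
    try:
        return run_domain_script(*domain_data)
    except Exception as error:
        raise DomainScriptError() from error
```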

(I should say that one of my training students recently showed me an example of this kind of design. I really like it. In the place where they register controllers with the application framework, each controller is a little composition of pieces, each looking like the entry point of a microapplication. The equivalent of your `load_container_onto_vessel()` controller function wouldn't even exist, because there'd be no need to name it. It would just be a "composition"--in the OO, not FP, sense--of a bunch of app data transformers and a single domain model function, and that domain model function would be a lambda wrapping vessel.load(container) or even just the function Vessel::load(). This student had no specific controller-level tests. There was no need. When they put the pieces together to expose them to the application as a controller, They Just Worked.)
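
In Python-flavored pseudocode, the registration might look vaguely like this. The framework API (`app.register`) is something I just made up to show the shape, and `functools.partial` plays the role of the "composition".

```python
from functools import partial


def register_controllers(app, vessel_repository):
    app.register(
        "POST /vessels/{vessel_id}/containers",
        partial(run_controller,
                [vessel_repository.find_by_id, Container.from_data],  # app data -> domain data
                Vessel.load))                                          # the domain "script"
```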

If we're not prepared to go down that road and actually extract the duplication so that we can test it, then we have three choices (that I see off the top of my head): duplicate our tests (in the case of integrated tests), stub an object to return a mock so that we can check it (this leads to another kind of duplication between all those tests), or trust the boilerplate (this works only if we diligently make sure that all additional behavior gets squeezed out of the controller into the pieces). This is the problem with the Mediator or Template Method: either you test all the steps together as one big cluster or you make all the steps purely abstract so that you can test the algorithm without testing the steps. Which of these options fits your mind better?

Of course, most people try to choose the last option and just trust the boilerplate. And that works... for a while. And then some programmer somewhere lets domain work creep into the controller, because, you know, we just have to get this hotfix out the door. And then some other programmer sees that and decides it's OK to do it, too. And then... and then... and then... eventually some Senior Programmer decides that enough is enough and decrees that we have to test it all, but it's all a big tangle, and nobody wants to write those tests... I guess you've seen this movie before. :) Frankly, I'd rather just do the crazy thing, because after going through this loop a few times, the crazy thing seems a lot less crazy.

Regarding your last paragraph, "... when I try to use mockist style with functional paradigm in domain layer, since values are immutable so functions returns new values."

I hope my article "Beyond Mock Objects" (http://blog.thecodewhispere... ) helps you. I think you can still work in the outside-in style, although you won't need mocks so often, because your design will tend more in the direction that I described in my other (bigger) comment. Instead of layers delegating smaller pieces down to lower layers, you'll have layers composing functions from lower layers, and you don't have to test the composition, because it just works.

Great article, and I couldn't agree more. I would even add that automated integration tests are also a lot slower to execute, so when a company wants to move toward a continuous integration and/or continuous delivery practice, those tests become a huge hindrance.

To me, integrated tests are either a way to assert that modules communicate with one another the way we intend them to, or a way to test the unit tests themselves (if an integrated test fails, it might be because the unit tests are missing something).

Yes, that matches exactly the experiences that I had which led me originally to start thinking about this around 2005. More and more people asked me for help starting with the question "Our tests are so slow... some people want to give up TDD... what should we do?"

Your second paragraph shows two common things: (1) the problem with the words "integrated" and "integration" here and (2) how the scam happens. :) When I want to check that module A talks to module B using interface C, I call that an INTEGRATION test, because it checks ONLY the point of integration between A and B. However, I prefer not to connect A to B in the tests, because that becomes an INTEGRATED test and the whole scam situation applies. Instead, I check that A talks correctly to a perfect implementation of C (collaboration tests) and that B implements C correctly (contract tests), and that's how I avoid the integrated tests scam. When we start using integrated tests to "find holes in our microtests", then we start falling victim to the scam.
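
A tiny sketch of that split, with names I've invented for illustration: A is `ReportPrinter`, C is the interface `find_transactions(date)`, and B would be whatever real store implements C.

```python
import unittest
from unittest.mock import Mock


class ReportPrinter:
    def __init__(self, transaction_store):
        self.transaction_store = transaction_store

    def print_report(self, date):
        return [f"{date}: {t}" for t in self.transaction_store.find_transactions(date)]


class ReportPrinterCollaborationTest(unittest.TestCase):
    # Collaboration test: A talks correctly to a perfect implementation of C.
    def test_prints_one_line_per_transaction_found(self):
        store = Mock()
        store.find_transactions.return_value = ["t1", "t2"]

        report = ReportPrinter(store).print_report("2013-04-01")

        store.find_transactions.assert_called_once_with("2013-04-01")
        self.assertEqual(2, len(report))


class TransactionStoreContract(unittest.TestCase):
    # Contract test: every implementation of C must pass these. Subclass this
    # once per implementation of C (the real B, an in-memory fake, ...).
    def create_store_containing(self, transactions):
        raise NotImplementedError

    def test_returns_empty_list_when_nothing_matches(self):
        if type(self) is TransactionStoreContract:
            self.skipTest("abstract contract test")
        store = self.create_store_containing([])
        self.assertEqual([], store.find_transactions("2013-04-01"))
```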

Now, if we write an integrated test for a Customer (my Customer tests are usually integrated tests, because that's usually what it takes to make the Customer interested and confident) and that test happens to find a problem that isn't a Customer/Programmer disagreement, then that is probably an integration problem. Probably, A's collaboration tests disagree with B's contract tests about interface C. Maybe we missed some tests on either side of that integration or the tests disagree with one another (we stub foo() to return null, but it throws an exception instead). In this case, it's nice that a test found the error--even an integrated test--but when we start to RELY on integrated tests to help programmers find those errors, the scam kills us exactly the way you describe in your first paragraph: slow, brittle tests smother us. They hinder other practices, such as continuous delivery.

Very illuminating article. I am facing this exact situation at a company I recently joined (which is why I am posting as a guest). There are only integrated tests, where the first part is setting up a scenario in the database, then executing the whole pipeline, and then checking the database again for the assertions. I tried to write some unit tests for an area that I will be working on, but it is nearly impossible because of the myriad of events being generated and the dependencies on objects created by a service locator (factory).
Here's where I am having a hard time explaining why this may not be good - for all intents and purposes "the system works". They see tests being broken when they make changes, so in their eyes the process is working. The product is in production, being used by many people and companies. Any attempt to turn this product into something testable (and with a better architecture) would mean rewriting the darn thing.

Maybe what I am really wanting to ask is: should I just go with it and learn to be productive in this environment (and I think I can go with this flow), or should I start raising the issue and risk becoming the "curmudgeon" on the team?

Short answer: Yes, probably.

If they don't feel any pain, then they probably interpret your dissatisfaction as pointless perfectionism.

You might feel the temptation to "show them the pain". I don't recommend it. You could engineer a crisis to encourage them to change something, but that might create more problems than it solves. It is a risky move. Do you believe the situation is so bad that they need this crisis to wake them up, or is the situation "merely" annoying to you because it doesn't match your preferences? "Being a team player" (in a meaningful sense, and not in the typical American "can-do" sense) means putting the team's needs above your own personal ones. The team has other problems to solve, at least for now.

Of course, if the bedroom is on fire and they just act like they don't see it, then that's a different story. In that case, throw one of your colleagues into the fire and see what happens. If, however, it's just that the chair is at a strange angle, and they seem to like it that way, then don't move the chair yet.

If you want to slowly, gradually (and silently!) improve the design in a way that doesn't disrupt everyone, then do that instead. Maybe they just don't realize how much better it can be. Maybe they're all in Plato's cave. Maybe you can find some feature area to work on that lets you add something alongside the existing system, so that you can design things "better" without disrupting what is already working. If you do this, and it makes some things easier, and they notice how much easier it is, then they might ask you to show them how you did it. In this case, you're not manufacturing interest, but instead you are reacting to real interest.

Wise words, thank you. I have been in this business long enough to choose which battles to fight, and I agree with you. I will do what I can in my area, and hopefully something might transpire during code reviews (yes, we do have pull requests and code reviews, fortunately). I guess it is the old adage: lead by example. It is just too easy and unfair to be the new guy who drops in and starts claiming "everything sucks". Thank you again!

Could we say that integrated and isolated (state/collaboration/contract) tests correspond to the classic (integrated) and mockist (isolated) test styles?

Broadly speaking, yes, depending on what you intend to do with that correspondence. :)

Insensitive photo of the nuclear bombing of Nagasaki that killed many innocent civilians casually dropped into an article about testing???

Next time use the World Trade Center...

Fixed. I'm sorry.

Also: http://dhemery.com/articles... particularly the part about looking for three interpretations.

For me, this opinion seems similar to the eternal battle between the "London school" and the "Chicago school". Integrated tests (as you stated, not integration tests, which are for a complex system, not modules, more black-box oriented) should follow the behaviour between components. They SHOULD focus on contract, not data. As I managed to learn, it's all about the approach. The system under test should be very well defined. You will not need tons of tests. You must decide WHAT you want covered and keep in mind the Pareto principle. It should not be a race to 100% coverage.

Indeed. I dislike picking a number, but when people ask, I often suggest 85%. Somewhere around checking the most interesting 85% of the system, we reach the point of diminishing returns.

Oddly enough, however, you are now saying "integrated" where I would say "integration". :) How does an integrated test follow behavior between components? As I use the term, an integrated test runs many components together and treats that as a single unit (a "black box"). Can you give an example about how to use these tests to check the behavior between components? It seems to me that these tests can only check the collective behavior of the components.

If I swap "integrated test" and "integration test" in your description, then I agree completely. Did you mean that instead?

“You say it yourself, that "unit tests should be a [...] developer tool". Exactly right: they are, and that's how I use them. I use them specifically to drive the design and help me identify design risks. They help me build components that I can confidently and safely arrange in a way to solve business problems.”

For me, I think this might be getting closer to my trouble with trying to use unit tests in some circumstances. Sometimes, when I write a component that I intend to be reused, that has a nice API, that needs to behave in a particular way, and that is naturally isolatable, the first thing I do is write a bunch of unit tests, because I need to consume that component from other components and have confidence that it will behave correctly. For example, something which converts string representations within my application without talking to external systems.
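
Something like this invented stand-in, just to show the kind of naturally isolatable component I mean:

```python
import unittest


def to_display_name(machine_name: str) -> str:
    """Convert 'shipping_address_line_1' to 'Shipping Address Line 1'."""
    return " ".join(word.capitalize() for word in machine_name.split("_"))


class ToDisplayNameTest(unittest.TestCase):
    def test_converts_underscores_to_spaces_and_capitalizes(self):
        self.assertEqual("Shipping Address Line 1",
                         to_display_name("shipping_address_line_1"))
```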

However, when I am writing glue code which builds an Entity Framework LINQ query from user inputs, I can't figure out how to write a unit test properly isolated from the database, because I find that code impossible to decouple from the database. I feel like if I wrote a unit test in that situation, my tests would be something like "compare this LINQ expression tree to the expected one"--when I actually care more about whether the expression tree is capable of getting me the right rows from the database than whether it looks like what I think will give me the right rows from the database. For this system, if I were to try to decouple components and write unit tests for the part that creates my SQL/LINQ query, the unit test would not be useful, because all it would be verifying is that my method is implemented a certain way. I.e., the unit tests would basically be like using Jest's snapshot feature and running `jest --updateSnapshot` without taking the time to read the changes made to the snapshot.
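
Here is a rough analogue of the dilemma in Python rather than C#/LINQ (names invented just to show the shape of the test I would end up with):

```python
import unittest


def build_order_query(min_total=None, customer_id=None):
    # Glue code: turn user inputs into a query.
    clauses, params = [], []
    if min_total is not None:
        clauses.append("total >= ?")
        params.append(min_total)
    if customer_id is not None:
        clauses.append("customer_id = ?")
        params.append(customer_id)
    where = " AND ".join(clauses) or "1 = 1"
    return f"SELECT * FROM orders WHERE {where}", params


class BuildOrderQueryTest(unittest.TestCase):
    def test_snapshot_of_generated_query(self):
        # This only pins how the query is built, not whether it returns the
        # right rows -- which is the part I actually care about.
        self.assertEqual(
            ("SELECT * FROM orders WHERE total >= ? AND customer_id = ?", [100, 42]),
            build_order_query(min_total=100, customer_id=42))
```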

My reply blew up into an article of its own. I published it here:

https://experience.jbrains....

You just throw out this argument, but what is your proposal for a solution? Any good advice?
If you do not have any idea how to solve a problem and you keep arguing, you are not giving anything good to anyone; you are just complaining like an old man.

I understand. I have written extensively on this subject for over ten years. You can find those articles on this site by clicking "Past Articles" or "Integrated Tests Are a Scam".

I also teach this concept in all my training courses on software design, such as The World's Best Intro to TDD, which I teach in Europe in person every year and for which there is an online training course that you can join whenever you like. https://tdd.training

I also describe my solution in the video linked in this article. I know that not everyone enjoys watching videos, so I hope that the other articles here provide enough information in case you don't want to watch a video.

Other people also write about this topic, so you can search the web for terms like "collaboration tests" and "contract tests" to read more about them.

" The link is useful thanks for sharing here."?

I've generally found that arguments such as this work really well as long as you don't have to refactor your code.

Whenever I refactor code, I find dozens (if not hundreds) of unit tests that fail (particularly in the "Mockito" style) with messages that say, "This function should have called this other function!" when in reality the answer is, "No, it doesn't call that other function, because the entire logic surrounding that operation changed and made the call unnecessary."

And around and around and around it goes. Hour after hour goes by, "fixing" tests, but never fixing a single bug.
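
The failing tests look roughly like this; here I sketch it with Python's unittest.mock standing in for Mockito, and every name is invented:

```python
import unittest
from unittest.mock import Mock


def place_order(order, inventory, notifier):
    inventory.reserve(order)      # refactor this away (e.g. reserve lazily elsewhere)...
    notifier.order_placed(order)


class PlaceOrderTest(unittest.TestCase):
    def test_reserves_inventory_when_order_placed(self):
        inventory, notifier = Mock(), Mock()

        place_order("order-1", inventory, notifier)

        # ...and this assertion fails, even though nothing a user could
        # observe has changed: "it should have called reserve()!"
        inventory.reserve.assert_called_once_with("order-1")
```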

Integration tests (to me) have always been of the variety of, "If the user types 2+2 into the calculator, the answer had better be 4. You can refactor all you want, BUT ... if you return anything other than a 4, you broke it."

Now you're right--when one of those tests fails, it tells you nothing about *what* failed. But when they fail, it can usually only mean one thing--YOU BROKE IT.

Integration tests protect your code from introduced bugs. Unit tests protect your code from change.