December 2010

disqus_kxzh7T4ZRq

J.B.,

1. Having read most of your blog posts on integrated tests, I have a far less allergic reaction to your "Integrated tests are a scam" position.

2. I didn't understand from my earlier reading that you hold Acceptance Tests in higher esteem than Integrated Tests. With your definition of Integrated Tests defined in a lot of detail, I understand how and why you're making the distinctions between Integrated Tests and Acceptance tests. That diffuses most of my objections to the statements that I thought you were making (through my own misinterpretation / over-simplification of your position).

3. I think the 6 seconds vs. 60 seconds is a real issue and you illustrated it well. To state the obvious, Acceptance Tests trigger the 60-second problem much less often than frequently run Integrated Tests do.

4. Aslak Hellesoy is absolutely right when he points out that testers can effectively choose high-value tests through combinatorial and pairwise software test case designs. This test design strategy can often detect the significant majority of defects in a System Under Test with fewer than 1% of the total possible number of tests executed. Trying to write truly comprehensive Integrated Tests would be lunacy. See, e.g.:
http://www.combinatorialtes... and
http://hexawise.com/case-st...

5. If you had your way, everyone worldwide involved in developing and testing software would understand the significant costs and serious limitations of conducting Integrated Testing. If I had my way, those same people would all understand the significant efficiency and effectiveness gains they could achieve through pairwise and combinatorial testing methods. It amazes me that such a large percentage of the software development and software testing community remains unaware of such a powerful approach.

6. I used to think our views were mostly opposed. (Again, mea culpa: it was because I didn't take the time to fully understand what you were and were /not/ saying). I now think our views are mostly consistent.

- Justin Hunter

April 2012

Ron_662

And instead just do functional tests?

1 reply
April 2012 ▶ Ron_662

jbrains

I tried that. I noticed that those tests don't exert enough pressure on the design; they don't provide enough warning about tangled dependencies.

Besides, how often are functional tests not integrated? As far as I can tell, the concept of customer unit tests never really took off.

1 reply
June 2012

tomekkaczanowski

Hello, many thanks for this blog post and for the Agile 2009 presentation, which is one of the best I've ever seen.

A short comment on my side. One thing I try to fight against is the idea of creating a test fixture by bringing the whole Spring context to life, instead of simply creating a few objects in some setup() method. This of course makes tests run much longer and requires the team to maintain numerous XML Spring config files. I know this is not exactly what you are talking about, but for me this is just another example of blurring the border between unit (focused) and integration tests, mainly because someone is too lazy to create objects in test code.

Cheers!

2 replies
June 2012 ▶ tomekkaczanowski

jbrains

Dziękuję, Tomek! I have had the same experience with Spring, and so when someone asks me about that, I give them a simple Novice Rule: never load a Spring configuration in an automated programmer test. Eventually a person will reach the point where he wants to test that he has configured Spring correctly, but understands not to use Spring to test the classes he instantiates with Spring.

June 2012 ▶ tomekkaczanowski

yevgenbushuyev

Hi,
That's what makes the whole idea fail. Whenever setup is an essential part of the application, and in most cases it is, you can't avoid integration tests. Logging, security, transactions, or anything you can put into an aspect/interceptor must be tested somehow, and only an integration test can prove it works. And as soon as you have one integration test, you add more and more. Then add automated UI/Selenium testing, and pure test cases will never be written by your team. Doh.

July 2012

disqus_Si5QRIeOmA

By your definition, aren't acceptance tests a form of integration tests? Do you keep those or tend to throw them out? Or is the point simply to do the testing you have to do (i.e., acceptance tests for feature specification) but not go down the rabbit hole of testing just because it gives you some kind of invalid personal assurance that your app will be bug-free?

1 reply
July 2012 ▶ disqus_Si5QRIeOmA

jbrains

Hello, Ryan. Thank you for your comment.

Yes, most people implement most acceptance tests as integrated tests. I have experimented with "customer unit tests" as James Shore has called them, in which I take examples from a customer and run them directly against the smallest cluster of objects (sometimes a single one) that implements them, but those tend to make up perhaps 1% of my test suites.

I don't use integrated tests to show basic correctness (see http://link.jbrains.ca/OcKoSm for more), and anyway this entire discussion relates to programmer tests, not customer tests. I use integrated tests only where I want to check integration: system-level tests, smoke tests (very few in number), performance and reliability tests... but not to show that an object would compute the right answer running on a Turing machine.

I hope this helps clarify things for you. I have said this for years in my training courses, but whenever I try to write about it, it grows to thousands of words, and I don't want to inflict too many such articles on the world.

1 reply
July 2012 ▶ jbrains

disqus_Si5QRIeOmA

Thanks so much for your response. A colleague and I were discussing this post, and while I agree with it, I found the concept of throwing out customer acceptance tests challenging.

1 reply
July 2012 ▶ disqus_Si5QRIeOmA

jbrains

It appears I need to write this in bigger, brighter letters. :)

September 2012 ▶ jbrains

ian_dunn

So, what do you recommend instead of integrated tests? You spent the whole article saying why they're bad (and made valid points), but didn't propose any kind of alternative.

2 replies
September 2012 ▶ ian_dunn

jbrains

This is only the first article in a series, but sadly, the link to the rest of the articles is broken. Until I fix it, try these Google keywords: "site:blog.thecodewhisperer.com integrated tests"

1 reply
September 2012 ▶ ian_dunn

jbrains

Link fixed. Thanks.

September 2012 ▶ jbrains

ian_dunn

Awesome, I'll check those out :)

November 2013

rmibourgarel

Do you treat ORM / Data Framework as trivial behavior ? With other words : do you always put your data access strategy behind a layer ?

1 reply
November 2013 ▶ rmibourgarel

jbrains

I assume that I should hide data access behind a layer, then wait for evidence that I should change that decision. I make this call based on past experience; I learned this one very early: http://dddcommunity.org/cas... As a result, I sometimes forget to reconsider it when things have changed, and I need other people to remind me to reconsider it. :)

1 reply
November 2013 ▶ jbrains

rmibourgarel

This decision is driven by the utopia of the "persistence independent" model, isn't it? Do you think it's realistic? Would you be able to build your domain and then say "OK, I'll store it with either XML serialization or NHibernate or RavenDB or JSON files or Entity Framework..."? Persistence is the main bottleneck in most applications; hiding it behind a layer means you'll have less control once you want to improve your response time.

http://ayende.com/blog/4567...

1 reply
November 2013 ▶ rmibourgarel

jbrains

I don't want to be able to switch ORMs, but I also don't want my persistence services to pervade the rest of my system, because then it's *really* hard to change how some part of it behaves.

"...hiding it behind a layer means you'll have less control once you want to improve your response time." I think the opposite: hiding it behind a layer means I have total control over improving response time, because there's almost no chance that clients could depend on the implementation details of the persistence services, do they can change almost any way they want, as long as that doesn't break the basic behavior of find and save.

March 2014

robpocklington

Seems REA Australia has a Ruby gem to enforce client/server contract tests - worth a look.

https://github.com/realesta...

1 reply
April 2014

krisztinahirth

I just learned about this blog post and now I would really like to know your opinion about Ian Connor's way: http://vimeo.com/68375232

1 reply
September 2014

jbrains

They are integra*tion* tests, but not integra*ted* tests, which explains why I changed the name of this talk several years ago.

September 2014 ▶ krisztinahirth

jbrains

I'm slow to respond. Sorry.

I haven't watched Ian's talk yet. I know that several programmers whom I respect don't use mock objects (or at least don't use message expectations, although they "fake" functions to return hardcoded responses), and I have not yet taken the time and opportunity to learn how they do things differently than I do.

I see many programmers use stubs/fakes as a way to implement a message expectation: they stub foo() to return 23, then call bar(18), which they know will call foo() and return its value plus 9, then they check that bar(18) returns 32. I find this risky: it uses indirect knowledge of bar()'s implementation in order to justify expecting 32 (23 + 9) at the end. I prefer simply to say "when I invoke bar(18), the result should be that something eventually invokes foo()", because, although it still encodes some implementation details, (1) it describes the *essential* result of bar(18) that we care about and (2) it describes this result quite abstractly ("something eventually invokes foo()"). I find this method expectation more stable and easier to understand than "I know that bar() returns foo() + 9 and I know that the stub makes foo() return 23, so bar(18) must return 32."
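
Roughly, the two styles look like this sketch (hypothetical Subject and collaborator, written with Python's unittest.mock; only foo(), bar(), 18, 23, and 9 come from the example above):

import unittest
from unittest.mock import Mock

class Subject:
    # Hypothetical subject under test: bar() delegates part of its work to a collaborator.
    def __init__(self, collaborator):
        self.collaborator = collaborator

    def bar(self, n):
        return self.collaborator.foo() + 9  # the "+ 9" is the implementation detail in question

class BarTests(unittest.TestCase):
    def test_stub_style_encodes_the_arithmetic(self):
        # Risky: expecting 32 only makes sense if you already know bar() adds 9 to foo().
        collaborator = Mock()
        collaborator.foo.return_value = 23
        self.assertEqual(32, Subject(collaborator).bar(18))

    def test_expectation_style_states_the_essential_result(self):
        # More abstract: "when I invoke bar(18), something eventually invokes foo()".
        collaborator = Mock()
        collaborator.foo.return_value = 23  # the specific value doesn't matter to this test
        Subject(collaborator).bar(18)
        collaborator.foo.assert_called_once()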

May 2015

davidleonhartsberger

You are wrong at about 19:30 of your talk, where you said that after refactoring a cluster of objects you would need more tests afterwards. If you refactor, you will not need more tests afterwards; otherwise you are not refactoring. This was a really horrible example IMO; otherwise the talk was OK/good.

1 reply
May 2015 ▶ davidleonhartsberger

jbrains

Pulling things apart generally means opening code up to the possibility of being used in a way that its current client (probably the entry point of the cluster) happens not to use it. This is a (generally) unavoidable consequence of removing code from its context. When we leave a block of code in its context and its client only sends it a limited set of inputs, we can safely avoid some of the tests we would otherwise think to write, and in the interest of time, we usually don't write those tests. When we separate that block of code from its context, it becomes liable to be sent those previously-unseen inputs and we have to decide whether to care about that. It seems generally (though not always) irresponsible never to add tests for those previously-unconsidered inputs.

In the process, we have refactored--we haven't changed the behavior of the system yet--but we need more tests in order to support reusing newly-available code in other contexts. We don't need to add those tests for the current system yet, but it's only a matter of time before we regret not adding them.

So it is that refactoring can make tests that we were once able to safely avoid less safe to avoid. Of course, in exchange for this risk, our refactoring opens up code for potential reuse that was not previously available for reuse, so if we don't intend to try to reuse that code, then we probably shouldn't separate it just yet.

This highlights some interesting tension between which set of tests does the current system need as a whole and which set of tests do the parts of the system need individually. On the one hand, we don't want to waste energy testing paths that the system as a whole does not execute; but on the other hand, if we don't write those tests, then we might run into latent mistakes (bugs) while adding features that use never-before-executed paths of existing code. I'd never thought of that in particular before. Another of the many tradeoffs that makes writing software complicated.

August 2015

jbrains

Thank you for your kind words.

First, when I count layers, I refer to frames in the call stack, focusing on just the code I need to test. In a typical application I have some framework "above" me (it calls my code), some libraries "below" me (I call them), and my stuff in the "middle". If I halt in some arbitrary spot in the code, I get a call stack and can look at how many frames of that call stack represent "my code" (or, more precisely, code I want to test). A "layer" means a level/frame of the call stack. My code might go to a maximum depth of 10 layers between the framework I deploy it into and the libraries I use. Broadly speaking, then, my code "is 10 layers deep". Of course, different code paths go through different numbers of layers, but in something like a typical MVC structure, most of the controllers have similar call stack depth and I'd use that most common call stack depth as a stand-in for "the number of layers".

The actual number matters less than the fact that it is an exponent!

As for "examples*paths^layers", I don't think I wrote that and couldn't find that. Broadly speaking, one needs to write one example per path, so examples=paths is roughly true and we need approximately paths^layers tests/examples to check the code thoroughly. Again, I've used this as a simplifying approximation: the number of examples is roughly the product of the number of paths through each layer, so if there are 5 paths through layer 1, 7 paths through layer 2, and 3 paths through layer 3, then there are close to 5*7*3 paths through the 3 layers when taken together, assuming that all 3 layers are involved in every path, which might not be the case. (Even if you can cut this in half or in thirds, in a typical system, it grows out of control quickly.) This is pretty close to 5^3, where 5 is the median number of paths through a single layer and 3 is the number of layers. Again, even if we can multiply this by a relatively small constant like 1/5 or 1/10, as the exponent grows slowly, the number of tests/examples we need explodes. Of course, this all applies to checking the code by running all the layers together, meaning an integrated test.

August 2015

jbrains

In the case of form fields, I would count the paths through this layer this way: how many different data formats do I have to worry about (dates, numbers, other special kinds of text) and how many different responses are there for each type (are there many success paths? are there any fundamentally different failure paths?). If you test mostly through integrated tests, then you'd have to write the same tests over and over again for every date field, every number field, and so on. Duplication. Ick. If you test the UI separately from handling the requests in the controllers, then you can avoid a lot of this duplication by extracting duplicate code into Helpers (or whatever replaces those these days) and testing them directly in isolation. Then when you say "this is another number field", you can just wire up the right Helper with confidence. (You might write a smoke test to double-check that you've wired a NumberFieldHelper to that number field.)

Each controller method will probably have multiple paths, so count those, including if (for example) 3 fundamentally different kinds of input happen to lead down the same path--that's 3 paths, not 1.

With Rails in particular, you need to focus on model methods with custom code, and if that custom code calls some of the ActiveRecord magic and you're worried about using ActiveRecord correctly, then you need to include the ActiveRecord layer in your test. When that's a straight pass-through, you can count Model + ActiveRecord as a single path through a single layer, as long as you feel confident that ActiveRecord isn't going to blow up on that path. It's when you do things like use scopes or complicated queries or complicated updates that you have to count more paths. It's even worse if you use ActiveRecord lifecycle callbacks. (Don't.)

Certainly, if you rely on end-to-end scenarios to check behavior in one part of the system (like a particular UI element or a particular complicated model update), then you'll see rampant duplication/redundancy and understand how big a waste of energy integrated tests can be. I simply won't volunteer to write all those tests. We have to choose between spending all our time writing redundant integrated tests or cutting corners, writing those tests "tactically", and hoping that we haven't missed anything important.

Rails does, however, make it a little better, because if you stick to scaffolding, then the whole thing acts like one big layer. This means fewer integrated tests, but it also means that you're relying on Rails' default behavior. As you deviate from its omakase settings, you "introduce" new layers in the sense that things that used to behave as a single layer don't any more. It's probably more accurate, then, to think about the Rails UI+controller+model+ActiveRecord beast as a single layer for the purposes of this calculation. The number of paths still goes up the same way (combinatoric), but it's clearer that we can treat large parts of the app as "a single layer". It means that you need thousands of tests instead of millions. (You still shouldn't need to write all those tests.)

I hope this helps.

August 2015

jbrains

You're most welcome. I hope it helps.

June 2016

alexanderworden

What makes you think that your Unit Tests are covering the permutations? They're not either. Your argument is flawed. Yes, there are billions of potential permutations, but not from a business requirements perspective. The main problem with Unit Tests - apart from the fact that they don't tell you if your code actually works in production - is that they're typically not based on business requirements, because they're too low level.

1 reply
June 2016 ▶ alexanderworden

jbrains

This ended up being a long answer, so let me summarize:

1. I know that my unit tests aren't covering all the permutations; I didn't claim that here.
2. The billions of permutations come from our design choices, and not the business needs, but when we ignore those permutations, we get stack traces and admin calls at 4 AM, so let's stop ignoring them.
3. What you call a "problem" with unit tests seems similar to saying that the problem with a cat is that it's not a dog. Well, yes: cats are cats and dogs are dogs.

The details will take longer to draft than I have time for right now, but I will post them in the coming days.

1 reply
June 2016 ▶ jbrains

alexanderworden

Thanks for the reply!

You simply can't cover every permutation, no matter the approach. It's just not a valid argument.

I agree that design is key. More importantly, the ability to refactor allows the implemented design to improve. Unit tests hinder refactoring. Integration tests enable and validate refactored code.

I'm not getting the cat analogy, sorry. It seems like you're so attached to the solution that it now trumps the requirements. If I asked for a pet that is faithful, likes to take walks and fetch sticks, and you give me a cat, then yes, that's a problem. ;-)

Writing performant and reliable Integration Tests is hard, but it's the right solution. Unit Tests should be a last resort or developer tool.

1 reply
June 2016 ▶ alexanderworden

jbrains

You're most welcome!

Alex, I don't quite understand what you mean by "valid argument" here. I don't claim to be able to cover every permutation, so I don't understand why you'd consider my argument invalid based on negating a claim that I still haven't made. Where is the wrong link in the chain?

"X is a bad approach" can be true even if there is no perfect approach. "X is a bad approach" can be true even if there is no good approach (trivially true, but still true). I agree that "X is a bad approach" is a weak argument if there are no better approaches, but even that wouldn't make the argument *invalid*. None of these match my argument.

I do not have "cover every permutation" as a goal, even though I do have the goal of "cover most of the most interesting permutations", because it generally leads to "have more confidence both in the code as is and in being able to change it safely when needed". I therefore prefer approaches that let me cover more permutations with fewer tests (less effort, less resistance). I get this with isolated object/module tests over integrated tests.

"Unit tests hinder refactoring" is just plain wrong, because it misidentifies the cause of the problem. Excessive coupling hinders refactoring. Yes, I see excessive coupling in a lot of code, and yes, some of that code is in what the programmer probably intended to be unit tests. When I use isolated tests to drive towards suitable abstractions, I just don't have this coupling problem and my tests don't hinder refactoring--at least not for long. If I see that my tests are threatening to hinder refactoring, then I look for the missing abstraction that provides the needed reduction in coupling, extract it, and the problem disappears. Indeed, this is how I learned to really understand how to engage the power of abstraction.

Integrated tests (not "integration tests"!) constrain the design less, which allows design freedom, but also provides less feedback about excess coupling. They tolerate weaker (harder to change, harder to understand) designs, and those get in the way of refactoring, usually due to high levels of hardwired interdependence. Isolated tests provide considerable feedback about unhealthy coupling, but a lot of programmers seem either to have too high a tolerance for this unhealthy coupling or don't see the signs. I teach them to see the signs and heed them.

The cat/dog thing is just this: cats are not better than dogs and dogs are not better than cats, but rather they are different and suit different situations. That a cat is not a dog isn't a "problem" with cats. Similarly, that an isolated programmer test doesn't do the job of a customer test (check requirements) isn't a problem with isolated programmer tests. You say it yourself, that "unit tests should be a [...] developer tool". Exactly right: they are, and that's how I use them. I use them specifically to drive the design and help me identify design risks. They help me build components that I can confidently and safely arrange in a way to solve business problems.

I don't understand why you'd think that I believe that "the solution... trumps the requirements". I don't. I believe that both matter and attend to both. I also use different tools for different needs: programmer tests help me with the design of the solution and customer tests help me check that we're meeting the needs of the users/customers. So yes, if I try to use programmer tests to talk to a customer, that'd be like giving you a cat to play fetch with. (Some cats play fetch, but I wouldn't bet on it.) Sadly, I see a lot of programmers do the opposite: they try to use their customer tests to check the correctness of tiny pieces of their design. This wastes a lot of time and energy. I recommend against it. I recommend approaching two different kinds of testing/checking differently. Specifically, I recommend not using integrated tests to check low-level details in the design.

"Writing performant and reliable integrated (I assume) tests... is the right solution." ...to which problem? I use them where I find them appropriate, but I don't find them appropriate when I really want to check that I've built the pieces correctly and that they will talk to each other correctly. Think about checking the wiring in your house: isn't it much more effective to just check the current on any wire at any point, so that you can isolate whether the problem is the wire or the ceiling fan? Why limit yourself to detaching and reattaching various ceiling fans in various rooms in order to isolate the problem to the wire or the ceiling fan? (This is a real problem we had recently in our house.) Why would such an indirect way to investigate the problem ever be "the (singular?) right solution"? Yes, if we can't justify digging into the walls, then it might be the most cost-effective solution in that situation, but now imagine that there are no walls! Why limit yourself then? Why act like there are walls to work around when there are no walls?

When we write software, we can just tear the walls down and rebuild them any time we want, so why would we volunteer to pretend that we can't touch them? I don't see the value in limiting ourselves in that way. I just don't. My programmer tests help me build components that I can recombine however I need, and if I find that a combination of components isn't behaving as expected, I can inspect each one on its own, find the unstated assumption I'm making, and add or fix a couple of tests in a few minutes. No need to pore through logs and long, integrated tests with excessive setup code to isolate the problem.

Now, of course, we do only discover some problems by putting everything together and seeing how it runs, so we absolutely need some integrated tests, but why would I ever volunteer to find *every* problem this way?! I would hope that finding a problem with integrated tests would be the exception, and not the rule.

2 replies
August 2016

mykezero

Hi, I was wondering if integration tests for database related classes are scams as well?

I've been tasked with building yet another crud screen, and have been using integrated tests to ensure queries return the right information, and commands result in the correct side effects.

The integrated tests seem to have shortened my feedback loop when developing the SQL statements since any negative changes to the SQL tell me if something has gone awry.

They also provide a type of living documentation as to the database environment that the class is living in (where I work, there is no version control for the database and no tracking of changes).

At least this way, when something changes, the test must change as well, keeping this documentation up to date (I've been in-lining the SQL statements in one project, and using an ORM in another one).

I feel like they are helping me to some extent, but they are slow to develop and slow to run. I still unit test most of the functionality. Am I doing something terribly wrong by testing those types of classes this way? Is there an alternative to this type of testing?

1 reply
August 2016

rafauyski

Great article. In the beginning I thought that mocking everything would lead me to complicated tests and would hinder refactoring. But after a few tests I noticed that it's pretty clean and refactoring is easy as long as I keep a good design (no excessive coupling), so it's not a problem with the TDD approach but with the design itself.

I have a question about acceptance tests driving our unit tests.
From what I understand, we shouldn't write more unit tests if an acceptance test is already passing. That means we have to write an acceptance test for every little feature we want to implement, which leads me to writing a lot of acceptance tests. Am I doing something wrong? My testing pyramid looks a bit like a block rather than a pyramid.

Another question is about collaborators that return other objects with non-trivial behaviour: should I mock both the collaborator that returns the object and the object itself? Or do I have a problem with my design if I have to do this?

2 replies
August 2016 ▶ mykezero

jbrains

Yes and no. I have a "standard talk" that I do to explain the details, and I don't have time to reproduce that here, but I appreciate the reminder to record it or write it down! :)

The "No" part comes from writing integrated tests only at the point of actually integrating with third-party software with the express goal of checking that integration. If you write integrated tests, then write them to check that the last layer of Your Stuff integrates well with Their Stuff.

The "Yes" part comes from ignoring the duplication in the production code at this integration point. I wrote about this in detail in _JUnit Recipes_, in the chapter on testing databases. You probably don't need to test 100 times that you can execute an SQL query or update and can clean up database resources and so on. (1) Check that once and (2) Use a library for it. (In the Java world, for example, I still like Spring JDBC templates, as long as we use them as a library, and don't inherit everything else from Spring.) So I recommend this to you: start relentlessly removing duplication among your modules that talk to the database directly, and see what happens. What kinds of duplication can you extract? Some of it will be specifically talking to the database without worrying about your domain model and some of it will be specifically working with your domain model without talking to the database. Both of these options are easier to test individually than putting both together. When we check database integration using our domain model or when we run the database just to check our domain model, that's where the scam returns. Don't do that.

But, as always, do whatever you find helpful until it becomes less helpful, then try removing more duplication. That almost always helps me. :)

1 reply
August 2016 ▶ jbrains

mykezero

Thanks for the reply! This definitely clears up a lot of confusion I was having.

August 2016 ▶ rafauyski

jbrains

"We shouldn't write more unit tests if acceptance test is already passing." I disagree, and even worse, following this rule is exactly a version of The Scam. Here's the problem: to change code confidently we need very fast feedback from a set of tests that check our code thoroughly; (1) to have very fast feedback from tests, we need tests to execute very quickly, but acceptance tests tend to execute more slowly AND (2) to check our code thoroughly requires a lot of tests AND acceptance tests tend to run more of the system, so the number of acceptance tests we need to cover code is much higher than the number of microtests (similar to unit tests) we need to cover the same code equally well. So it seems to me that using bigger tests in this way will create risk, and that risk will only increase over time until it becomes a problem, and when it becomes a problem, it will become a BIG problem.

This explains why I don't try to use one test for two purposes.

We write two kinds of tests: Customer Tests help the customer feel comfortable that we have built the feature that they asked for, and Programmer Tests help the programmer feel comfortable that they understand the behavior of each part of the code and can change it with confidence and ease. These happen to be two very different goals that we solve best with two different kinds of tests. These two kinds of tests happen to compete with each other: Customer Tests generally need to be long and run the whole system, and are therefore less helpful in telling us about what happens in smaller parts of the code. Programmer Tests generally need to be fast and zoom in on one small part of the system, and therefore aren't enough to give customers the confidence that we want to give them. Trying to use one kind of test for these two competing goals mostly doesn't work.

When the system is small, the two sets of tests look similar, and so we believe that we are needlessly duplicating effort. As the system grows and we remove more duplication and the structure becomes more complicated, these two sets of tests diverge more and more from each other, and the difference between the two kinds of tests becomes much clearer. Sadly, many people don't see this divergence, because the cost of the "testing block" starts to look too high, and many people lose patience and throw away the tests. They often don't stick with it long enough to see the point where the cost/benefit curve starts to bend. :P It's quite sad, really, because this makes me sound like I'm crazy and it makes the idea sound like a theoretical one, instead of the very practical one that it is. On the other hand, I got quite a few clients over the years because they reached a point where the tests felt "too expensive", asked themselves "Are we doing something strange?", and then contacted me.

If you are comfortable with the idea of having two kinds of tests for two different purposes and two different audiences, then The Scam won't kill you.

1 reply
August 2016 ▶ rafauyski

jbrains

Regarding "collaborators that returns other objects with non-trivial behaviour", this alone doesn't seem like a problem to me, as long as the current test focuses on one module's behavior at a time. The problem happens when you start to want to stub/mock A to return a B and then stub/mock B in the test in order to check the interesting behavior. I interpret this as a sign of a design problem. If X (the module you are checking now) only uses A to get to B, then I consider this dependency unhealthy. I would change the design so that X depends directly on B *without knowing where B came from*. This is an example of "pushing details up the call stack". The origin of B (that it comes from A) is the detail that X perhaps doesn't really need to know about. Read http://blog.thecodewhispere... for a deeper description.

If X uses A and B, then you might decide that X needs to know that it can ask A for B. I disagree. If X uses A and B, then it might just be a coincidence that I can ask A for B, and in this case, I imagine that A-provides-B is a property that can change over time, so I would change X to accept A and B in its constructor, and push the detail of A-provides-B up the call stack. If, on the other hand, A-provides-B sounds like an essential property of A and B-must-come-from-A sounds like an essential property of B, then I see another risk: if X needs both A and B, then X probably has too many responsibilities. I would expect to split X into Y and Z, where Y needs A and Z needs B.
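
Here's a minimal sketch of the simplest version of "pushing the detail up the call stack" (X, A, and B are just the placeholders from above):

# Before: X reaches through A to get B, so a test of X must stub A to return a stubbed B.
class XBefore:
    def __init__(self, a):
        self.a = a

    def do_work(self):
        return self.a.provide_b().interesting_answer()  # X knows the detail "B comes from A"

# After: X depends on B directly, without knowing where B came from.
class XAfter:
    def __init__(self, b):
        self.b = b

    def do_work(self):
        return self.b.interesting_answer()

# The detail "B comes from A" moves up the call stack, to whoever wires X together:
#     x = XAfter(a.provide_b())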

I can think of one more possibility: if X needs both A and B AND A-provides-B seems to make sense AND X seems like it has only one responsibility, then maybe X knows too much about the interaction between A and B. You can see this if you have to copy/paste a lot of stubs/mocks of A and B throughout your tests of X. You can also see this if you feel tempted to write comments in your tests for X that explain why you have to write so many stubs/mocks of A and B in those tests. In this situation, X probably wants to rely on some new abstraction C that summarizes the purpose of using A and B together, and X should depend on C and maybe not A nor B at all!

This happens to me when I notice that I start to have many unrelated steps to perform in the same place, and I notice that adding a feature means adding a new unrelated step to an ever-expanding algorithm. Each of these unrelated steps needs to happen, but they are quite independent of each other, and as I add more collaborators, I have tests with 3-5 important side-effect goals, one on each collaborator. What's happening? These are event handlers, and my X is really just firing an event; today I need 3 listeners for this event, but when I add new features, I often add a new listener for it. The missing abstraction--the C in the previous paragraph--is the event. When I introduce the event, X's tests simplify to "fire this event with the right data at the right time" and all the tests for A, B, and their friends simplify to "here are the various kinds of event data you can receive--how do you handle each of them?" All those complicated tests for X that mock 3, 4, 5 (and every month more...) different collaborators disappear.

I know that this sounds quite abstract. It's harder to describe without a concrete example, which I plan eventually to include in my next online TDD course (The World's Best Intro to TDD: Level 2... some time in 2017, I hope). I hope that, for now, you can find your situation in one of these cases. :)

...or maybe you meant something else entirely? I can try again.

1 reply
August 2016 ▶ jbrains

rafauyski

Firstly, I would like to thank you for your very detailed answer. I really appreciate that you find the time to help guys like me and others who cannot find answers on the web or in books.

Coming back to my question: I have a concrete example and it's actually pretty simple. What I tried to test was an "application service" in the DDD sense.
For example:


def load_container_onto_vessel(vessel_id, container_data, vessels):
    vessel = vessels.get(vessel_id)  # vessels is a repository
    vessel.load(Container(container_data['volume'], container_data['mass']))

This is where I had to mock `vessels.get` and also mock the `vessel.load` on the object returned by the `vessels.get` mock.
According to your answer this is a design problem, if I got it right, but I have no idea how to do it differently.

I also have a similar problem when I try to use the mockist style with a functional paradigm in the domain layer, since values are immutable, so functions return new values.

2 replies
August 2016 ▶ jbrains

rafauyski

The idea of tests with different purposes appeals to me. But then, how do you drive the implementation of features that are not covered by acceptance tests?
For example: "a client can buy one piece of X from the store" is acceptance-tested, but there is no acceptance test for the case "a client can buy two pieces of X", and we still have to implement it. Do we just have to remember to write a unit test for it, or do you write some more coarse-grained tests for that, but not at acceptance-test scale?

1 reply
August 2016 ▶ rafauyski

jbrains

In general, I would probably change the Acceptance Test for "buy one piece of X" into an Acceptance Test for "buy n pieces of X", for whatever value of n would make the Customer feel confident in the system.

I would probably then add microtests for the typical "collection" special cases: 0, 1, many, lots (in case a stress test is useful), oops (exception).
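
Those microtests might look something like this sketch (order_total() is a hypothetical stand-in for whatever domain calculation is involved):

import unittest

def order_total(unit_price, quantity):
    # Hypothetical domain calculation: total price for `quantity` pieces of X.
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return unit_price * quantity

class OrderTotalTests(unittest.TestCase):
    def test_zero_pieces(self):
        self.assertEqual(0, order_total(10, 0))

    def test_one_piece(self):
        self.assertEqual(10, order_total(10, 1))

    def test_many_pieces(self):
        self.assertEqual(70, order_total(10, 7))

    def test_lots_of_pieces(self):
        self.assertEqual(10_000_000, order_total(10, 1_000_000))

    def test_oops(self):
        with self.assertRaises(ValueError):
            order_total(10, -1)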

Finally, I would decide whether it's worth adding Acceptance Tests for the stress test ("lots") or one of the failure cases ("oops"). If there are many failure cases and we handle them all the same way, then one Acceptance Test for failure would certainly suffice.

Here's the principle: we write Acceptance Tests to help the Customer feel confident that we understand what they wanted and have built it; we write Programmer Tests (microtests, mostly) to help ourselves (the programmers) feel confident in each piece of code that we write.

August 2016 ▶ rafauyski

jbrains

I see a risk in this function: it does the work of two layers in the same place. Namely, it turns application-level data into domain-level data (layer 1), and then does the domain-level work (layer 2). These two kinds of behavior are completely independent of each other: finding a vessel by ID and creating a container from container_data don't care about what happens to the resulting vessel and container; and loading a container onto a vessel doesn't care where the vessel and container came from. When you put these two layers of behavior in the same place, however, you create an integrated cluster with the usual risks: as each layer becomes more complicated, the effort to test them both together increases combinatorially (worse than exponentially).

Even though the risk is there, the current cost is quite low, because there are only two layers and they're extremely simple. I would probably just live with it until the code starts to grow. Each of the three pieces of behavior is very simple: find the vessel, create the container, load the container onto the vessel. If I trust all three pieces very well, then I might not bother microtesting them when I put them together. One or two Customer Tests (Acceptance Tests) would probably give me enough confidence.

This is a version of the typical functional programming quandary: if I trust function composition (the standard library provides it) and I trust each function individually, then how much do I need to test composing them? I probably expect the composition to do the thing right (the pieces will fit together and nothing bad will happen) and I would only check if it did the right thing (did I choose the right functions to compose?). In this case, it might only take two Customer Tests to convince the Customer that we correctly load containers onto vessels: one case for the happy path and another to show how we handle not finding the vessel. (I can't tell from this code whether creating the container can go wrong.)

The big risk lies in ignoring the design when the domain work becomes more complicated than simply `vessel.load(container)`. If we don't relentlessly push all the domain work down into the domain layer of the design, then we end up with some domain behavior in `load_container_onto_vessel()` and some in the "proper" domain layer. This would force us to test some of the pure domain work in the place where we have to worry about where we got the vessel and container, even though the pure domain work probably operates directly on a vessel and a container without worrying where they came from! (This is an "irrelevant details problem" in the tests... why do I have to stub the vessel repository to return this vessel, when I just want to instantiate the vessel and call methods on it?!) As long as this function always simply calls exactly one domain method, the cost remains low, even though the risk I've identified remains, and "I have to stub vessels to return a mock vessel" remains the symptom of that risk. As long as that chain of mocks doesn't become more complicated, I can live with the result. I simply prefer, in most cases, to avoid it altogether.

The fix, in this case, seems crazy, even though in principle it's "correct". I recommend getting some coffee before you read on.

The (risky) duplication in this case really comes from the overall embedded algorithm of "transform application data into domain data, then run a domain script". This structure ends up copy/paste duplicated in all the Transaction Script/Controller functions that we use as Adapters between our Application Framework and the Domain. It becomes boilerplate, probably because removing the duplication just looks too strange. As long as it is boilerplate, and as long as all the interesting details remain in small, well-tested functions, I might not bother testing the Controller layer. They ultimately all have the same structure: call this group of functions (each transforms app data into domain data), then call this one function with all those return values as arguments (run the domain script with all the domain data we just "transformed"). If you extract this into a function, then you can test it with mocks (it's a big Template Method/Function) and it's pretty straightforward: error handling is the only complication, and even that can be as simple as "decide whether to throw a 'data transformation error' or a 'domain script error' based on where you caught the underlying error." It feels like the beginning of a universal application controller library. Then every controller looks like the entry point of a microapplication: it "composes" a bunch of "transform app data" functions with a single "run domain script" function. Code has turned into data and there's nothing left to test. It'll just work. (I wrote more about this in "Designing the Entry Point" at http://blog.thecodewhispere... )
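
Here's a rough sketch of that shape, reusing your vessel example (everything beyond Container, vessels, and vessel.load() from your snippet is hypothetical):

class DataTransformationError(Exception):
    pass

class DomainScriptError(Exception):
    pass

# A tiny "universal application controller": compose app-data transformers with one domain script.
def controller(transformers, domain_script):
    def handle(request):
        try:
            domain_args = [transform(request) for transform in transformers]
        except Exception as error:
            raise DataTransformationError() from error
        try:
            return domain_script(*domain_args)
        except Exception as error:
            raise DomainScriptError() from error
    return handle

# Registering the endpoint becomes pure composition; there is no named
# load_container_onto_vessel() function left to test.
def make_load_container_controller(vessels):
    return controller(
        transformers=[
            lambda request: vessels.get(request['vessel_id']),
            lambda request: Container(request['container_data']['volume'],
                                      request['container_data']['mass']),
        ],
        domain_script=lambda vessel, container: vessel.load(container),
    )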

(I should say that one of my training students recently showed me an example of this kind of design. I really like it. In the place where they register controllers with the application framework, each controller is a little composition of pieces, each looking like the entry point of a microapplication. The equivalent of your `load_container_onto_vessel()` controller function wouldn't even exist, because there'd be no need to name it. It would just be a "composition"--in the OO, not FP, sense--of a bunch of app data transformers and a single domain model function, and that domain model function would be a lambda wrapping vessel.load(container) or even just the function Vessel::load(). This student had no specific controller-level tests. There was no need. When they put the pieces together to expose them to the application as a controller, They Just Worked.)

If we're not prepared to go down that road and actually extract the duplication so that we can test it, then we have three choices (that I see off the top of my head): duplicate our tests (in the case of integrated tests), stub an object to return a mock so that we can check it (this leads to another kind of duplication between all those tests), or trust the boilerplate (this works only if we diligently make sure that all additional behavior gets squeezed out of the controller into the pieces). This is the problem with the Mediator or Template Method: either you test all the steps together as one big cluster or you make all the steps purely abstract so that you can test the algorithm without testing the steps. Which of these options fits your mind better?

Of course, most people try to choose the last option and just trust the boilerplate. And that works... for a while. And then some programmer somewhere lets domain work creep into the controller, because, you know, we just have to get this hotfix out the door. And then some other programmer sees that and decides it's OK to do it, too. And then... and then... and then... eventually some Senior Programmer decides that enough is enough and decrees that we have to test it all, but it's all a big tangle, and nobody wants to write those tests... I guess you've seen this movie before. :) Frankly, I'd rather just do the crazy thing, because after going through this loop a few times, the crazy thing seems a lot less crazy.

August 2016 ▶ rafauyski

jbrains

Regarding your last paragraph, "... when I try to use mockist style with functional paradigm in domain layer, since values are immutable so functions returns new values."

I hope my article "Beyond Mock Objects" (http://blog.thecodewhispere... ) helps you. I think you can still work in the outside-in style, although you won't need mocks so often, because your design will tend more in the direction that I described in my other (bigger) comment. Instead of layers delegating smaller pieces down to lower layers, you'll have layers composing functions from lower layers, and you don't have test composition, because it just works.

August 2016

simonberthiaume

Great article, and I couldn't agree more. I would add that automated integration tests are also a lot slower to execute, so when a company wants to move toward a continuous integration and/or continuous delivery practice, those tests become a huge hindrance.

To me, integrated tests are either a way to assert that modules communicate with one another the way we intend them to, or a way to test the unit tests themselves (if an integrated test fails, it might be because the unit tests are missing something).

1 reply
August 2016 ▶ simonberthiaume

jbrains

Yes, that matches exactly the experiences that I had which led me originally to start thinking about this around 2005. More and more people asked me for help starting with the question "Our tests are so slow... some people want to give up TDD... what should we do?"

Your second paragraph shows two common things: (1) the problem with the words "integrated" and "integration" here and (2) how the scam happens. :) When I want to check that module A talks to module B using interface C, I call that an INTEGRATION test, because it checks ONLY the point of integration between A and B. However, I prefer not to connect A to B in the tests, because that becomes an INTEGRATED test and the whole scam situation applies. Instead, I check that A talks correctly to a perfect implementation of C (collaboration tests) and that B implements C correctly (contract tests), and that's how I avoid the integrated tests scam. When we start using integrated tests to "find holes in our microtests", then we start falling victim to the scam.
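
Here's a minimal sketch of that split with hypothetical A, B, and C (Python's unittest.mock stands in for "a perfect implementation of C"):

import unittest
from unittest.mock import Mock

class C:
    """The interface through which A talks to B."""
    def lookup(self, key):
        raise NotImplementedError

class A:
    def __init__(self, c):
        self.c = c

    def describe(self, key):
        return "value: " + self.c.lookup(key)

class B(C):
    def __init__(self, values):
        self.values = values

    def lookup(self, key):
        return self.values[key]

# Collaboration tests: does A talk to C correctly?
class ACollaborationTests(unittest.TestCase):
    def test_describe_asks_its_collaborator_for_the_value(self):
        c = Mock(spec=C)
        c.lookup.return_value = "42"
        self.assertEqual("value: 42", A(c).describe("answer"))
        c.lookup.assert_called_once_with("answer")

# Contract tests: does B implement C correctly?
class BContractTests(unittest.TestCase):
    def test_lookup_returns_the_stored_value(self):
        self.assertEqual("42", B({"answer": "42"}).lookup("answer"))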

Now, if we write an integrated test for a Customer (my Customer tests are usually integrated tests, because that's usually what it takes to make the Customer interested and confident) and that test happens to find a problem that isn't a Customer/Programmer disagreement, then that is probably an integration problem. Probably, A's collaboration tests disagree with B's contract tests about interface C. Maybe we missed some tests on either side of that integration or the tests disagree with one another (we stub foo() to return null, but it throws an exception instead). In this case, it's nice that a test found the error--even an integrated test--but when we start to RELY on integrated tests to help programmers find those errors, the scam kills us exactly the way you describe in your first paragraph: slow, brittle tests smother us. They hinder other practices, such as continuous delivery.

September 2016

Otto

Very illuminating article. I am facing this exact situation at a company I recently joined (which is why I am posting as a guest). There are only integrated tests, where the first part is setting up a scenario in the database, then executing the whole pipeline, and then checking the database again for the assertions. I tried to write some unit tests for an area that I will be working on, but it is nearly impossible because of the myriad of events being generated and the dependencies on objects created by a service locator (factory).
Here's where I am having a hard time explaining why this may not be good: for all intents and purposes, "the system works". They see tests breaking when they make changes, so in their eyes the process is working. The product is in production, being used by many people and companies. Any attempt to turn this product into something testable (and with a better architecture) would mean rewriting the darn thing.

Maybe what I am really wanting to ask is - should I just go with it and learn to be productive in this environment (which I think I can go with this flow) or should I start raising the issue and risking becoming the "curmudgeon" in the team?

1 reply
September 2016 ▶ Otto

jbrains

Short answer: Yes, probably.

If they don't feel any pain, then they probably interpret your dissatisfaction as pointless perfectionism.

You might feel the temptation to "show them the pain". I don't recommend it. You could engineer a crisis to encourage them to change something, but that might create more problems than it solves. It is a risky move. Do you believe the situation is so bad that they need this crisis to wake them up? Or is the situation "merely" annoying to you, a matter of your preferences? "Being a team player" (in a meaningful sense, and not in the typical American "can-do" sense) means putting the team's needs above your own personal ones. The team has other problems to solve, at least for now.

Of course, if the bedroom is on fire and they just act like they don't see it, then that's a different story. In that case, throw one of your colleagues into the fire and see what happens. If, however, it's just that the chair is at a strange angle, and they seem to like it that way, then don't move the chair yet.

If you want to slowly, gradually (and silently!) improve the design in a way that doesn't disrupt everyone, then do that instead. Maybe they just don't realize how much better it can be. Maybe they're all in Plato's cave. Maybe you can find some feature area to work on that lets you add something alongside the existing system, so that you can design things "better" without disrupting what is already working. If you do this, and it makes some things easier, and they notice how much easier it is, then they might ask you to show them how you did it. In this case, you're not manufacturing interest, but instead you are reacting to real interest.

1 reply
September 2016 ▶ jbrains

Otto

Wise words, thank you. I have been long enough in this business to choose the battles to fight, and I agree with you. I will do what I can in my area, and hopefully something might transpire during code reviews (yes, we do have pull requests and code reviews, fortunately). I guess it is the old adage, lead by example. It is just too easy and unfair to be the new guy that drops in and starts claiming "everything sucks". Thank you again!

October 2016

guilherme_froes

Could we say that integrated and isolated (state/collaboration/contract) tests correspond to the classic (integrated) and mockist (isolated) test styles?

1 reply
October 2016 ▶ guilherme_froes

jbrains

Broadly speaking, yes, depending on what you intend to do with that correspondence. :)

October 2017

guest-reader

Insensitive photo of the nuclear bombing of Nagasaki, which killed many innocent civilians, casually dropped into an article about testing???

Next time use the world trade center...

1 reply
October 2017 ▶ guest-reader

jbrains

Fixed. I'm sorry.

Also: http://dhemery.com/articles... particularly the part about looking for three interpretations.

October 2017 ▶ jbrains

tilitaadrianflorin

For me, this opinion is similar to the eternal battle between the "London school" and the "Chicago school". Integrated (as you stated, not integration, which is for a complex system, not modules, more black-box oriented) should follow the behaviour between components. It SHOULD focus on contract, not data. As I have learned, it's all about the approach. A system under test should be very well defined. You will not need tons of tests. You must decide WHAT you want to be covered and keep the "Pareto" principle in mind. It should not be a race to 100% coverage.

1 reply
November 2017 ▶ tilitaadrianflorin

jbrains

Indeed. I dislike picking a number, but when people ask, I often suggest 85%. Somewhere around checking the most interesting 85% of the system, we reach the point of diminishing returns.

Oddly enough, however, you are now saying "integrated" where I would say "integration". :) How does an integrated test follow behavior between components? As I use the term, an integrated test runs many components together and treats that as a single unit (a "black box"). Can you give an example about how to use these tests to check the behavior between components? It seems to me that these tests can only check the collective behavior of the components.

If I swap "integrated test" and "integration test" in your description, then I agree completely. Did you mean that instead?

June 2018 ▶ jbrains

ohnobinki

“You say it yourself, that "unit tests should be a [...] developer tool".
Exactly right: they are, and that's how I use them. I use them
specifically to drive the design and help me identify design risks. They
help me build components that I can confidently and safely arrange in a
way to solve business problems.”

For me, I think this might be getting closer to my trouble with trying to use unit tests in some circumstances. Sometimes when I write a component that I intend to be reused, that has a nice API, that needs to behave in a particular way, and that is naturally isolatable, the first thing I do is write a bunch of unit tests, because I need to consume that component from other components and have confidence that it will behave correctly. For example, something which converts string representations within my application without talking to external systems.

However, when I am writing glue code which builds an Entity Framework LINQ query from user inputs, I can’t figure out how to make that a unit test properly isolated from a database, because I find it to be uncoupleable from the database. I feel like if I wrote a unit test in that situation, my tests would be something like “compare this LINQ Expression tree to the expected one”—when I actually care more about whether or not the expression tree is capable of getting me the right rows from the database than whether or not it looks like what I think will give me the right rows from the database. For this system, if I were to try to decouple components and write unit tests for the part that created my SQL/LINQ query, the unit test would not be useful, because all it would be verifying is that my method is implemented a certain way. I.e., the unit tests would basically be like using jest’s snapshot feature and running `jest --updateSnapshot` without taking the time to read the changes made to the snapshot.

1 reply
June 2018 ▶ ohnobinki

jbrains

My reply blew up into an article of its own. I published it here:

https://experience.jbrains....

August 2018

wsleyqueiroz

You just throw out this argument, but what is your proposal for a solution? Any good advice?
If you do not have any idea how to solve the problem and you keep arguing, you are not giving anything good to anyone; you are just complaining like an old man.

2 replies
August 2018 ▶ wsleyqueiroz

jbrains

I understand. I have written extensively on this subject for over ten years. You can find those articles here. Click "Past Articles" or "Integrated Tests are a Scam" and you can find them.

I also teach this concept in all my training courses on software design, such as The World's Best Intro to TDD, which I teach in Europe in person every year and for which there is an online training course that you can join whenever you like. https://tdd.training

I also describe my solution in the video linked in this article. I know that not everyone enjoys watching videos, so I hope that the other articles here provide enough information in case you don't want to watch a video.

Other people also write about this topic, so you can search the web for terms like "collaboration tests" and "contract tests" to read more about them.

January 2019 ▶ robpocklington

animehair

" The link is useful thanks for sharing here."?

July 2019

disqus_XMC15OQzEf

I've generally found that arguments such as this work really well as long as you don't have to refactor your code.

Whenever I refactor code, I find dozens (if not hundreds) of unit tests that fail because of failures (particularly of the "Mockito" variety) that say, "This function should have called this other function!" when in reality the answer is, "No, it doesn't call that other function because the entire logic surrounding that operation changed and made the call unnecessary."
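As a purely illustrative sketch of that failure mode (all names invented), a Mockito test of this shape fails with "Wanted but not invoked" the moment a refactoring removes or reroutes the call, even when the observable result stays the same:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;
import org.junit.jupiter.api.Test;

// Invented names, just to show the shape of the problem.
interface TaxCalculator {
    double taxOn(double amount);
}

class InvoiceService {
    private final TaxCalculator taxCalculator;

    InvoiceService(TaxCalculator taxCalculator) {
        this.taxCalculator = taxCalculator;
    }

    double totalFor(double amount) {
        return amount + taxCalculator.taxOn(amount);
    }
}

class InvoiceServiceTest {
    @Test
    void addsTaxToTheTotal() {
        TaxCalculator taxCalculator = mock(TaxCalculator.class);
        when(taxCalculator.taxOn(100.0)).thenReturn(13.0);

        double total = new InvoiceService(taxCalculator).totalFor(100.0);

        assertEquals(113.0, total, 0.001);
        // This last line pins the current implementation: refactor the tax lookup
        // away (or cache it, or batch it) and the test fails with Mockito's
        // "Wanted but not invoked", even though the total is still correct.
        verify(taxCalculator).taxOn(100.0);
    }
}
```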

And around and around and around it goes. Hour after hour goes by, "fixing" tests, but never fixing a single bug.

Integration tests (to me) have always been of the variety of, "If the user types 2+2 into the calculator, the answer had better be 4. You can refactor all you want, BUT ... if you return anything other than a 4, you broke it."

Now you're right--when one of those tests fails, it tells you nothing about *what* failed. But when they fail, it can usually only mean one thing--YOU BROKE IT.

Integration tests protect your code from introduced bugs. Unit tests protect your code from change.

1 reply
July 2019 ▶ disqus_XMC15OQzEf

jbrains

I understand your experience, I've seen dozens (hundreds?) of other people experience it, but I interpret it differently than you (and many of them) do. In particular, I would add something crucial to your last sentence: "Unit tests (sic) protect your code from change _if you don't pay attention to what they're trying to tell you_."

I interpret the problem you describe as test doubles doing their job, alerting me to accidental dependencies in my code. Many programmers react to this situation as "the test doubles are killing me", but I disagree: your design is killing you and the test doubles cause you to feel the pain. If you try to live with the pain long enough, then eventually it becomes debilitating, and you end up exactly where you describe: spending hours "fixing" tests but not fixing problems. What you have is indirection without (enough/suitable) abstraction. More-suitable abstraction helps me create useful, _stable_, hard boundaries between components. These boundaries become so stable that I _want_ to hear some alarm blaring somewhere when someone tries to change one, and test doubles act as that alarm.

Sometimes we improve the design by applying the DIP to remove an abstraction, replacing a Supplier with the data it supplies. This removes an abstraction that might have been helpful in the past, but is no longer. I provide a simple example that illustrates this case in the article: https://blog.thecodewhisper...
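A minimal sketch of that first refactoring, with invented names: the client stops asking a Supplier for the value and simply receives the value, so the stub in its test disappears.

```java
import java.util.function.Supplier;

// Before: the client reaches through an indirection to get a value,
// so its test needs a stubbed Supplier.
class ShippingCostBefore {
    double costFor(double weightInKg, Supplier<Double> ratePerKg) {
        return weightInKg * ratePerKg.get();
    }
}

// After: the caller supplies the value itself; the indirection (and the stub)
// is gone, and the test becomes plain input/output checking.
class ShippingCostAfter {
    double costFor(double weightInKg, double ratePerKg) {
        return weightInKg * ratePerKg;
    }
}
```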

Sometimes we improve the design by recognizing that excessive mocking (the kind that you probably have in your head) typically highlights duplication that, if removed, leaves behind an abstraction that marks a now-stable boundary between components. I don't have an article with a simple example of this, but it will become part of my next online training course, when I finally get around to producing it.

Yes, integrated (not integration!) tests can protect your code, but at a cost that rises super-exponentially (O(n!)) over time. This is the great thing about junk drawers: they're really convenient until suddenly you can't find the one damn thing you're looking for. When that happens, create a new drawer dedicated to that kind of thing and put that thing back in it. That's what I'm recommending here.

Why "integrated" over "integration" in this context? https://blog.thecodewhisper...

As for "you broke it", I don't find that quite as universally true as you seem to. When an integrated test fails, it often points to something that lies entirely outside my control. Did I break it? Maybe. Maybe not. Maybe someone else broke something. Should I depend directly on that thing that someone else controls? Maybe not. If I apply the DIP there, then I create indirection so that I can check my side while making reasonable assumptions about their side; and if that test becomes unwieldy, then I introduce a more-suitable abstraction. Matteo Vaccari wrote a nice example of this so that I didn't have to: https://medium.com/@xpmatte...
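For the "check my side while making reasonable assumptions about their side" part, a minimal sketch with invented names: the client depends on an interface that my side owns, and a collaboration test stubs that interface instead of calling the real external thing.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;
import java.util.Optional;
import org.junit.jupiter.api.Test;

// The interface my side owns; the thing outside my control hides behind it.
interface ExchangeRates {
    Optional<Double> rateFor(String fromCurrency, String toCurrency);
}

class PriceConverter {
    private final ExchangeRates exchangeRates;

    PriceConverter(ExchangeRates exchangeRates) {
        this.exchangeRates = exchangeRates;
    }

    double convert(double amount, String from, String to) {
        return exchangeRates.rateFor(from, to)
                .map(rate -> amount * rate)
                .orElseThrow(() -> new IllegalStateException("no rate for " + from + "/" + to));
    }
}

// Collaboration test: checks that *my* side uses the contract correctly,
// assuming the other side honors it. A matching contract test, run against
// the real adapter, checks that assumption separately.
class PriceConverterTest {
    @Test
    void convertsUsingTheQuotedRate() {
        ExchangeRates rates = mock(ExchangeRates.class);
        when(rates.rateFor("EUR", "CAD")).thenReturn(Optional.of(1.5));

        assertEquals(15.0, new PriceConverter(rates).convert(10.0, "EUR", "CAD"), 0.001);
    }
}
```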

When sit-ups hurt, let's not assume that the sit-ups are the problem, because more often the problem lies in the fat around our stomachs or our weak abdominal muscles. Test doubles are the sit-ups and the hardwired dependencies on concrete things are the fat/weak muscles.

November 2019 ▶ wsleyqueiroz

vignesh_nehru

I agree with Wesley; this article is misguiding. Mainly, with Agile project cycles and short releases, and with most companies moving away from a monolithic ESB gateway to more of a microservices architecture, services are deployed in several component-level groups and contracts are tested individually to make sure the component-level and unit-level tests work.
Apart from this, when those individual components are built together, we need at least some level of integration tests, either with mocks or without mocked data, to test the components together. So service-to-service integration tests are very important in business.

1 reply
November 2019 ▶ vignesh_nehru

jbrains

I find this comment very interesting! You say that my article "is misguiding", but: (1) you say that companies using a microservices architecture are testing components individually and checking contracts between components; and (2) you say that we need "integration tests with mocks or without mocked data" to test the components together.

So where is the misguiding part?

In point 1, you say that companies are doing exactly the thing that my article encourages them to do: to check components individually and to check carefully the contracts between components. I have noticed, too, that especially since Martin Fowler's article on Consumer-Driven Contract Testing, more companies are taking that advice seriously. They use tools like Pact to document and check the published behaviors of distributed services one by one, rather than all (or many) at once. I'm saying the same thing, except that I also do it within services: I think of it as "microservices in a single process".

In point 2, you say that we need "integration tests with mocks". Yes! An "integration test with mocks" is exactly the same thing as the collaboration tests that I'm talking about here: an integration test (checking the points of integration using something like mock objects, without running the whole system together) and not an integrated test (connecting the production implementations together). This confusion is why I stopped saying "integration tests are a scam" in 2010. See https://blog.thecodewhisper...
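And a hedged sketch of the contract-test half, again with invented names: one abstract test class states what any implementation of the interface must do, and each implementation (including a lightweight in-memory one usable as a test double elsewhere) gets a subclass that runs the same checks.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import org.junit.jupiter.api.Test;

// Assumed interface between two components (all names invented).
interface CustomerDirectory {
    void add(String id, String name);
    Optional<String> nameFor(String id);
}

// The contract, written once against the interface.
abstract class CustomerDirectoryContract {
    protected abstract CustomerDirectory createDirectory();

    @Test
    void remembersACustomerItHasSeen() {
        CustomerDirectory directory = createDirectory();
        directory.add("42", "ACME");
        assertEquals(Optional.of("ACME"), directory.nameFor("42"));
    }

    @Test
    void knowsNothingAboutCustomersItHasNotSeen() {
        assertEquals(Optional.empty(), createDirectory().nameFor("missing"));
    }
}

// A lightweight implementation, usable as a test double in collaboration tests...
class InMemoryCustomerDirectory implements CustomerDirectory {
    private final Map<String, String> names = new HashMap<>();
    public void add(String id, String name) { names.put(id, name); }
    public Optional<String> nameFor(String id) { return Optional.ofNullable(names.get(id)); }
}

// ...and its contract test. The adapter that talks to the real remote service
// would get its own subclass running exactly the same checks.
class InMemoryCustomerDirectoryTest extends CustomerDirectoryContract {
    @Override
    protected CustomerDirectory createDirectory() {
        return new InMemoryCustomerDirectory();
    }
}
```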

I suspect that you and I agree quite a lot.

1 reply
November 2019 ▶ jbrains

vignesh_nehru

Mm, yeah, I am clearer now. I think you need to change the title "integration tests are a scam"; in my company, each ops review produces an action item saying we need to increase integration-level tests, so maybe you could reword it a bit. Thanks for the explanation. Even unit tests don't test permutations; they only test the logic. TDD is a good way to test the code before even writing it fully. But in my experience, writing 100% TDD tests is good advice, yet with two-week sprints it is really challenging to achieve 100 percent integration testing.

1 reply
November 2019 ▶ vignesh_nehru

jbrains

I changed the title in 2010, because the phrase "integration tests" was not accurate. I changed it to "integrated tests" to clarify my position as I have explained it to you.

As you say, it is challenging to achieve 100% integrated testing. (I assume that you mean this.) I agree, so I don't try to achieve it. Instead, I verify that each layer can talk to the next layer correctly using special kinds of unit tests. I can do this at every layer from the beginning of the system to the end. I have been writing about these techniques and teaching them for 15 years. It is difficult to explain in only one paragraph. :)

October 2020

jarmandocrlez

I got to collaborator tests (test roles) from pursuing a great software architecture.
I had always ignored frameworks like Spock and the like, but now I am blown away by how many things *click* with this when your architecture is well designed.

I can understand why these tests can tell you whether your code is not well constructed, and I find it a little disappointing that advocates of collaborator tests aren't talking more about object-oriented architecture.

I invite you to hear from Jakub Nabrdalik; he shares his implementations of
Hexagonal Architecture https://www.youtube.com/wat...
Behavior Driven Development https://www.youtube.com/wat...

Those two concepts should be discussed together more often (at least in OO languages).

1 reply
October 2020 ▶ jarmandocrlez

jbrains

Thank you for these references!

Yes, I teach in my training about how architecture improves when we focus on collaboration and contract tests. We can start from both sides: strong architecture towards nice tests or nice tests towards strong architecture. I like to teach it from nice tests towards strong architecture so that "good design" seems less mysterious and easier to achieve.

The book _Growing Object-Oriented Software, Guided by Tests_ provides the first long-form discussion of this connexion. It influenced my thinking a lot.

October 2020

disqus_N3lCq2cl3c

The article talks specifically about the case where someone has written integration tests to cover all the functional logic.

Unit tests would not cover infra, configuration, or deployment issues, nor validate our assumptions about external systems (the assumptions we make while writing the stubs).

How can we trust our CI/CD pipelines without validating our assumptions about API contracts? We will definitely need some "system tests" or "end-to-end tests".

I think this article gives a more pragmatic look at why we need integration tests: https://martinfowler.com/bl... And contract tests might be the solution you are looking for if your data comes from an external system: https://martinfowler.com/bl...

1 reply
October 2020 ▶ disqus_N3lCq2cl3c

jbrains

Yes. This article acted as the manifesto and I never intended for it to describe the entire picture. I wrote it purely to draw attention to the problem as I saw it. As the years go on, I describe how I limit integrated tests to the boundaries of the system. I do this more aggressively than most people.

Even so, we don't need end-to-end tests _of our entire system_ to validate assumptions about dependencies on external systems. I avoid this by limiting integrated tests to the boundary and relentlessly removing duplication at that boundary. This results in a minimum of integrated tests. That itself isn't always my goal, but I'm glad that I know exactly how to do it when I need it.

More articles in this series reach conclusions similar to the ones Martin reaches.

December 2020

trentondadams

@jbrains:disqus I understand the exponential nature of branches to the power of the layers. What I don't understand is how this test methodology reduces the number of tests you have to write. The way you describe it in the video doesn't seem to decrease the number of tests more than marginally, when I draw out a tree of branching pathways.

The main benefit of mocking/stubbing in tests, as I see it, is that you're testing close to the component, rather than *needing* to consider all the implications of all the pathways underneath that component; i.e., you're more likely to think of all the potential use cases when the testing is narrowly focused on just that component.

1 reply
January 2021 ▶ trentondadams

jbrains

In essence, two things happen:

1. products turn into sums
2. an object in the tree never knows about objects lower in the tree than its (direct) children (because interfaces hide those details)

If you already do (2) quite well, then it matters less. If trees of collaborating objects have a height of no more than 3, then the "combinatorial explosion of tests" argument loses its teeth, and instead the arguments for locality of change take over.
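A made-up count shows the shape of the difference: if a request passes through three layers and each layer has 4 interesting code paths, then integrated tests that run the layers together need on the order of 4 × 4 × 4 = 64 cases, whereas collaboration tests plus contract tests need on the order of 4 + 4 + 4 = 12, because each layer is checked against an interface and never multiplies its neighbors' paths.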

1 reply
January 2021 ▶ jbrains

trentondadams

Interesting, I didn't do very many layers in my trees.

Have you seen any of Vladimir Khorikov's stuff? His views seem to be diametrically opposed to yours. He strongly believes in leaving real objects in the dependency tree, with the exception of volatile dependencies (databases, email, other systems, etc.). He also thinks mocks should almost never, if ever, be used. In reality, he creates his own mocks for re-use purposes, but if re-use isn't necessary, I find that a simple mocking library like Mockito is pretty succinct.

P.S.
I did note that you wrote an article (I don't recall which one) indicating that the need for mocks can be an indicator of badly designed code, which I too have found. But that doesn't mean you never use them and instead write classes to handle the mocking cases.

1 reply
January 2021 ▶ trentondadams

jbrains

I haven't seen Vladimir's work, but based on your description, I don't think we disagree at all. What you/he label "volatile" dependencies sound like Service implementation dependencies ("Service" in the DDD sense) as opposed to Values (again in the DDD sense). In essence, I freely put interfaces (and therefore test doubles) in front of Services and rarely in front of Values. We might differ in intent or framing, but otherwise it sounds pretty similar.

You might be referring to "Beyond Mock Objects" where I show the refactoring from a function that talks to a supplier of a value to a function that simply accepts the value as a parameter, which eliminates a stub (by eliminating a possibly unnecessary level of indirection). I tend not to think of test doubles as a sign of a design risk--it depends more on the presence of irrelevant details in a test, and needing to stub or mock something can be irrelevant just as the implementation details of a collaborator can be irrelevant.

December 2021

icetbr

I think what @trentondadams is alluding to is that Vladimir Khorikov and Eric Elliott, just to name a few, adhere to the Classic school, while you and Mark Seemann adhere to the London school. But hey, don't quote me on that; I don't mean to put words in other people's mouths, I just try to learn from everything you guys write.

It's an age-old debate, with many nuances, such as your focus on contract tests. The classic approach holds that it is a good thing not to mock a pure call to another function, because the more you call it, the bigger the safety net you create.

Both schools favor the test pyramid, so not having unit tests isn't what is at stake here. Unit tests drive quality code. It's how to test unit composition that is more of a hot topic.

1 reply
December 2021 ▶ icetbr

jbrains

First of all: broadly yes, with a few precise bits to add.

Indeed, I don't adhere to the London school, although I definitely studied there extensively. :slight_smile: Moreover, I think there's a misconception about the London school: it suggests using test doubles freely to check interactions with side effects. This is different from "test doubles are good, so use more of them", which many people seem to think it is claiming.

As I interpret things, the London school tolerates side-effects and suggests testing them with test doubles instead of by duplicating tests at ever-bigger scales of the system. It doesn’t demand inverting dependencies on side-effects in the way that the FP style seems to do. It doesn’t give in to integrated tests in situations where we decide to leave the side-effect where it is. The Classic/Chicago/Detroit school seems to tolerate duplication in tests at different scales of the system while eliminating test doubles, whereas the London school tolerates test doubles while eliminating duplication in tests at different scales of the system.

As you say, both schools prefer microtests. They merely favor/teach/recommend different coping strategies.

My current position is this: You Don't Hate Mocks; You Hate Side-Effects - The Code Whisperer

Broadly speaking, I don’t mind injecting an abstraction to insulate the Client against a problematic side-effect, although when I do this, I often eventually invert the dependency, moving the side-effect up and (more) out of the way. This has the result of gradually reducing the number of test doubles in my tests, even though I never have the explicit goal of eliminating (all) test doubles.

Even so, I recognize problems with certain tests containing test doubles, such as too many expectations or stubbing a function to return another stub to return a desired value. I interpret those as design risks and tend to refactor to eliminate them. However, when a test stubs 1 query and expects 1 action, I don’t rush to refactor this to a Logic Sandwich. I feel comfortable doing that when it becomes helpful in support of some other way of improving the design.
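For concreteness, a "stubs 1 query and expects 1 action" test might look like this minimal sketch (invented names):

```java
import static org.mockito.Mockito.*;
import org.junit.jupiter.api.Test;

// Invented collaborators: one query, one action.
interface SubscriptionRepository {
    boolean isExpired(String accountId);
}

interface Notifier {
    void sendRenewalReminder(String accountId);
}

class RenewalReminder {
    private final SubscriptionRepository subscriptions;
    private final Notifier notifier;

    RenewalReminder(SubscriptionRepository subscriptions, Notifier notifier) {
        this.subscriptions = subscriptions;
        this.notifier = notifier;
    }

    void remind(String accountId) {
        if (subscriptions.isExpired(accountId)) {
            notifier.sendRenewalReminder(accountId);
        }
    }
}

class RenewalReminderTest {
    @Test
    void remindsExpiredAccounts() {
        SubscriptionRepository subscriptions = mock(SubscriptionRepository.class);
        Notifier notifier = mock(Notifier.class);
        when(subscriptions.isExpired("acct-1")).thenReturn(true); // stub 1 query

        new RenewalReminder(subscriptions, notifier).remind("acct-1");

        verify(notifier).sendRenewalReminder("acct-1");            // expect 1 action
    }
}
```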