I see a risk in this function: it does the work of two layers in the same place. Namely, it turns application-level data into domain-level data (layer 1), and then does the domain-level work (layer 2). These two kinds of behavior are completely independent of each other: finding a vessel by ID and creating a container from container_data don't care about what happens to the resulting vessel and container; and loading a container onto a vessel doesn't care where the vessel and container came from. When you put these two layers of behavior in the same place, however, you create an integrated cluster with the usual risks: as each layer becomes more complicated, the effort to test them both together grows combinatorially (worse than exponentially).
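Concretely, I'm reading the function as having roughly this shape. This is my reconstruction, not your actual code; `vessel_repository` and `Container.from_dict` are placeholders for whatever your code really calls those things.

```python
# My reconstruction of the shape I'm reacting to, not the actual code;
# vessel_repository and Container.from_dict are placeholder names.
def load_container_onto_vessel(vessel_id, container_data):
    # Layer 1: turn application-level data into domain-level data.
    vessel = vessel_repository.find(vessel_id)
    container = Container.from_dict(container_data)
    # Layer 2: do the domain-level work.
    vessel.load(container)
```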
Even though the risk is there, the current cost is quite low, because there are only two layers and they're extremely simple. I would probably just live with it until the code starts to grow. Each of the three pieces of behavior is very simple: find the vessel, create the container, load the container onto the vessel. If I trust all three pieces, then I might not bother microtesting them when I put them together. One or two Customer Tests (Acceptance Tests) would probably give me enough confidence.
This is a version of the typical functional programming quandary: if I trust function composition (the standard library provides it) and I trust each function individually, then how much do I need to test their composition? I probably expect the composition to do the thing right (the pieces will fit together and nothing bad will happen), so I would only check whether it does the right thing (did I choose the right functions to compose?). In this case, it might only take two Customer Tests to convince the Customer that we correctly load containers onto vessels: one for the happy path and another to show how we handle not finding the vessel. (I can't tell from this code whether creating the container can go wrong.)
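Those two Customer Tests might read something like this. Every name here (the fixtures, `VesselNotFound`, `is_loaded_with`) is an assumption about your application, since I can only see the one function.

```python
import pytest

# Two hypothetical Customer Tests: the happy path and the missing vessel.
def test_loads_the_container_onto_the_vessel(existing_vessel, some_container_data):
    load_container_onto_vessel(existing_vessel.id, some_container_data)
    assert existing_vessel.is_loaded_with(some_container_data)

def test_reports_that_the_vessel_does_not_exist(some_container_data):
    with pytest.raises(VesselNotFound):
        load_container_onto_vessel("no-such-vessel-id", some_container_data)
```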
The big risk lies in ignoring the design when the domain work becomes more complicated than simply `vessel.load(container)`. If we don't relentlessly push all the domain work down into the domain layer of the design, then we end up with some domain behavior in `load_container_onto_vessel()` and some in the "proper" domain layer. This would force us to test some of the pure domain work in the place where we have to worry about where we got the vessel and container, even though the pure domain work probably operates directly on a vessel and a container without caring where they came from! (This is an "irrelevant details" problem in the tests... why do I have to stub the vessel repository to return this vessel, when I just want to instantiate the vessel and call methods on it?!) As long as this function always simply calls exactly one domain method, the cost remains low, even though the risk I've identified remains, and "I have to stub the repository to return a mock vessel" remains the symptom of that risk. As long as that chain of mocks doesn't become more complicated, I can live with the result. I simply prefer, in most cases, to avoid it altogether.
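Here is that symptom made concrete, with tiny stand-in classes so the example runs; the real domain model and controller wiring are, of course, yours, not these.

```python
import pytest
from unittest.mock import Mock

# Tiny stand-ins so this example runs; the real domain model is yours.
class OverCapacity(Exception): pass

class Container:
    def __init__(self, weight_in_kg): self.weight_in_kg = weight_in_kg

class Vessel:
    def __init__(self, capacity_in_kg): self.capacity_in_kg = capacity_in_kg
    def load(self, container):
        if container.weight_in_kg > self.capacity_in_kg:
            raise OverCapacity()

# A hypothetical controller that has let a domain rule leak in, with the
# repository passed in so that the test below can stub it.
def load_container_onto_vessel(vessel_repository, vessel_id, container_data):
    vessel = vessel_repository.find(vessel_id)
    vessel.load(Container(**container_data))

# The test I'd rather write: pure domain behavior, nothing to stub.
def test_refuses_an_overweight_container():
    with pytest.raises(OverCapacity):
        Vessel(capacity_in_kg=1000).load(Container(weight_in_kg=5000))

# The test the tangled design pushes me toward: stub the repository to
# return this vessel, even though the rule under test never needed it.
def test_refuses_an_overweight_container_through_the_controller():
    vessel_repository = Mock()
    vessel_repository.find.return_value = Vessel(capacity_in_kg=1000)
    with pytest.raises(OverCapacity):
        load_container_onto_vessel(
            vessel_repository, "any id", {"weight_in_kg": 5000})
```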
The fix, in this case, seems crazy, even though in principle it's "correct". I recommend getting some coffee before you read on.
The (risky) duplication in this case really comes from the overall embedded algorithm of "transform application data into domain data, then run a domain script". This structure ends up copy/paste duplicated in all the Transaction Script/Controller functions that we use as Adapters between our Application Framework and the Domain. It becomes boilerplate, probably because removing the duplication just looks too strange. As long as it is boilerplate, and as long as all the interesting details remain in small, well-tested functions, I might not bother testing the Controller layer. They ultimately all have the same structure: call this group of functions (each transforms app data into domain data), then call this one function with all those return values as arguments (run the domain script with all the domain data we just "transformed"). If you extract this into a function, then you can test it with mocks (it's a big Template Method/Function) and it's pretty straightforward: error handling is the only complication, and even that can be as simple as "decide whether to throw a 'data transformation error' or a 'domain script error' based on where you caught the underlying error." It feels like the beginning of a universal application controller library. Then every controller looks like the entry point of a microapplication: it "composes" a bunch of "transform app data" functions with a single "run domain script" function. Code has turned into data and there's nothing left to test. It'll just work. (I wrote more about this in "Designing the Entry Point" at http://blog.thecodewhispere... )
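Here's a sketch of that extracted boilerplate. Every name in it (`controller`, `TransformationError`, `DomainScriptError`) is invented for illustration, not taken from your code or any existing library.

```python
class TransformationError(Exception): pass
class DomainScriptError(Exception): pass

def controller(transformers, domain_script):
    """Compose 'transform app data into domain data' steps with one domain script."""
    def run(application_data):
        try:
            domain_arguments = [transform(application_data)
                                for transform in transformers]
        except Exception as error:
            raise TransformationError() from error
        try:
            return domain_script(*domain_arguments)
        except Exception as error:
            raise DomainScriptError() from error
    return run
```

You can microtest this structure once, with mocks standing in for the transformers and the domain script, and then never test it again for each individual controller.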
(I should say that one of my training students recently showed me an example of this kind of design. I really like it. In the place where they register controllers with the application framework, each controller is a little composition of pieces and looks like the entry point of a microapplication. The equivalent of your `load_container_onto_vessel()` controller function wouldn't even exist, because there'd be no need to name it. It would just be a "composition"--in the OO, not FP, sense--of a bunch of app data transformers and a single domain model function, and that domain model function would be a lambda wrapping vessel.load(container) or even just the function Vessel::load(). This student had no specific controller-level tests. There was no need. When they put the pieces together to expose them to the application as a controller, They Just Worked.)
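To make that more concrete, registering such a controller might read roughly like this, reusing the `controller()` sketch above; `application.register()`, `vessel_repository`, and `Container.from_dict` are stand-ins for whatever the real framework and domain model provide.

```python
# Hypothetical registration of a controller as a composition of pieces.
application.register(
    "POST /vessels/{vessel_id}/containers",
    controller(
        transformers=[
            lambda data: vessel_repository.find(data["vessel_id"]),
            lambda data: Container.from_dict(data["container"]),
        ],
        # ...or simply domain_script=Vessel.load
        domain_script=lambda vessel, container: vessel.load(container),
    ),
)
```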
If we're not prepared to go down that road and actually extract the duplication so that we can test it, then we have three choices (that I see off the top of my head): duplicate our tests (by covering each controller with integrated tests), stub an object to return a mock so that we can check it (this leads to another kind of duplication among all those tests), or trust the boilerplate (this works only if we diligently make sure that all additional behavior gets squeezed out of the controller into the pieces). This is the problem with the Mediator or the Template Method: either you test all the steps together as one big cluster or you make all the steps purely abstract so that you can test the algorithm without testing the steps. Which of these options fits your mind better?
Of course, most people try to choose the last option and just trust the boilerplate. And that works... for a while. And then some programmer somewhere lets domain work creep into the controller, because, you know, we just have to get this hotfix out the door. And then some other programmer sees that and decides it's OK to do it, too. And then... and then... and then... eventually some Senior Programmer decides that enough is enough and decrees that we have to test it all, but by now it's all a big tangle, and nobody wants to write those tests... I guess you've seen this movie before. :) Frankly, I'd rather just do the crazy thing, because after going through this loop a few times, the crazy thing seems a lot less crazy.