A Developer's Stance on Useful Test Cases
2020. 6. 15.

Seeing questions like "How do I test private methods?" or "Should I test private methods?" appear periodically in various communities made me think I should organize my thoughts on the topic someday. Quite some time has passed, and I'm finally writing them down. The topic itself is fairly simple, but opinions on it vary widely, even among developers internationally. At its core, the issue comes down to a related question: what makes a test case (hereafter TC) effective?
Private methods are an object-oriented notion, but functions hidden inside closures and reachable only through exposed functions fall into the same category. We're talking about anything encapsulated behind a module's external interface. From here on, I'll refer to all of it as internal implementation. Should internal implementation be tested? To get straight to the point: "No." Or, more accurately, "Yes." It may sound like nonsensical wordplay, but it's true. You should avoid writing TCs directly against internal implementation; testing should happen only through the public external interface. Which means that, in the end, internal implementation does get tested, just indirectly.
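To make this concrete, here is a minimal TypeScript sketch. The module, the helper, and the Jest-style describe/it/expect assertions are all hypothetical: the helper hidden behind the module boundary is never referenced by the TC, yet it is fully exercised through the exported function.

```typescript
// price.ts: the internal helper is not exported; only formatPrice is public.
function addThousandsSeparator(value: number): string {
  // internal implementation, hidden behind the module boundary
  return value.toLocaleString("en-US");
}

export function formatPrice(value: number): string {
  return `$${addThousandsSeparator(value)}`;
}

// price.test.ts: the TC only knows the public interface, yet every line
// of addThousandsSeparator is exercised through it.
import { formatPrice } from "./price";

describe("formatPrice", () => {
  it("formats an amount with thousands separators", () => {
    expect(formatPrice(1234567)).toBe("$1,234,567");
  });
});
```

If addThousandsSeparator later changes internally, or disappears entirely, this TC keeps working as long as formatPrice keeps its promise.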
Writing TCs for internal implementation "temporarily" to automate repetitive tests is acceptable, but eventually, only TCs testing the external interface should remain.
TCs might be written for the present, but the TCs that are kept should be left for the future.
What Matters Most
From now on, let's broadly define "testing" as automating the tests of a target module by writing TCs. I won't list all the benefits of testing, as that information is readily available. Opinions on those benefits vary, and I won't distinguish between unit tests and integration tests here. I'm simply referring to tests written as code and automated. Such testing may or may not utilize the TDD methodology.
It has been about seven years since I started seriously using TDD and test automation, and my thinking about testing has become clearer over time. The sole purpose of testing is to help the developer (or the project). It's an obvious statement. Frankly, however a TC comes to exist, if it helps the developer who wrote it in any way, I think that's fine. So whether to test internal implementation is up to the developer's own judgment; there's no need to ask others whether it's the right call. If testing private methods helps you personally, test them. You don't need the blessing of Kent Beck, Martin Fowler, or Uncle Bob if it helps you. But that only holds when you're developing alone. From the perspective of a project developed by multiple people, the story changes.
I sometimes write TCs for internal implementation too, but only temporarily, for the present moment: for example, a quick test that automates checking a routine I've just split out. In fact, following the TDD process often leads to situations where something I thought was a public interface ends up becoming part of the internal implementation; interfaces get removed or merged. As the module changes, TCs that directly test internal implementation must also be deleted or absorbed into tests of the external interface.
The Self-Satisfaction Trap
Directly testing internal implementation diminishes the future value of TCs. In the end, it doesn't help. What should be avoided is writing TCs while trapped in TDD's Red-Green-Refactor cycle, perhaps addicted to the dopamine hit of reaching 'Green', without understanding the real benefits TCs offer. Creating a TC doesn't automatically make it beneficial.
As you can experience for yourself, directly testing internal implementation is much easier than testing through the external interface. It also feels intuitive and satisfying, providing a quick sense of accomplishment. It's tempting to dash off TCs just to see 'Green' quickly, and a large pile of such TCs feels rewarding. But simply writing TCs doesn't guarantee they help. Fewer TCs are better: the more effect you get from the fewest TCs, the clearer the value of testing. In the opposite case, a mass of TCs will sap the project's agility and get in the way at every turn, a state that inevitably breeds resentment toward testing.
The more unnecessary TCs there are, the lower the value of genuinely helpful and essential TCs becomes, and the project's trust in its TCs hits rock bottom. The moment TCs lose trust, they become obstacles worse than legacy code. You reach a point where you can't stop writing TCs, nor can you easily delete them, yet they provide no help... or you mistakenly believe they do.
The Relationship Between Modules and Test Cases
The modules within an application, each bearing its own responsibility and collaborating with the others, may have parts of their behavior changed or be replaced outright by better-performing modules. They are like cogs in a large machine.
An old cog can be replaced with a lighter, faster, or more stable one. When this happens, the machine the cog belongs to only needs to know the shape of the teeth to ensure it meshes correctly; it doesn't need to know the material, color, or brand. Anything with matching teeth will suffice. Therefore, before running the new cog, it's enough to test whether it meshes properly with the other cogs it interacts with.
TCs guarantee that when a module's functionality is partially changed or added to, it can still faithfully perform its responsibilities within the existing system. And when a module's internal structure is refactored or entirely rewritten, TCs provide reliability that the module will mesh well within the larger application system.
Therefore, TCs should only verify that the target module can mesh and run correctly within the system, regardless of what it is. It doesn't matter what the cog looks like, whether its inside is hollow or solid, or if it's like a Matryoshka doll with cogs inside cogs. The user of the cog only needs to know the publicly exposed teeth shape to use it in the machine. That's the cog's role and responsibility. TCs are also users of the module. TCs should test the module's unchanging responsibilities, not the specific, changeable module itself. If the external interface, through which the module receives messages to perform its responsibilities appropriately, changes frequently, it's a sign of flawed design.
A module can always be replaced by another module with a different internal structure but the same external interface—meaning the same role and responsibilities. The TCs testing the module only need to know its abstracted interface. This ensures the module's polymorphism and autonomy. Doesn't this sound familiar? It's the Dependency Inversion Principle (DIP) from SOLID. While technically different, the purpose and effect are the same. TCs should also be viewed from the perspective of a user of the module, and TCs should not depend on the module's concrete details. They should depend on abstraction. Abstracted responsibilities should be tested, not the module itself. This makes TCs flexible and ultimately allows them to test any module.
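As a rough illustration of that DIP-like relationship, consider the TypeScript sketch below; the Cache interface, both implementations, and the Jest-style assertions are invented for the example. The suite depends only on the abstracted responsibility, so any module honoring the interface passes unchanged.

```typescript
// A hypothetical abstraction: the TCs depend on this interface,
// not on any concrete module.
interface Cache {
  set(key: string, value: string): void;
  get(key: string): string | undefined;
}

// Two interchangeable implementations with different internals.
class MapCache implements Cache {
  private store = new Map<string, string>();
  set(key: string, value: string) { this.store.set(key, value); }
  get(key: string) { return this.store.get(key); }
}

class ObjectCache implements Cache {
  private store: Record<string, string> = {};
  set(key: string, value: string) { this.store[key] = value; }
  get(key: string) { return this.store[key]; }
}

// One shared suite tests the responsibility, not the concrete module.
function describeCacheContract(name: string, create: () => Cache) {
  describe(`${name} fulfils the Cache responsibility`, () => {
    it("returns what was stored under the same key", () => {
      const cache = create();
      cache.set("answer", "42");
      expect(cache.get("answer")).toBe("42");
    });

    it("returns undefined for a key that was never stored", () => {
      expect(create().get("missing")).toBeUndefined();
    });
  });
}

describeCacheContract("MapCache", () => new MapCache());
describeCacheContract("ObjectCache", () => new ObjectCache());
```

Swapping in a third implementation later requires adding one line at the bottom, not rewriting the tests.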
Nevertheless, if you feel an internal implementation needs to be tested, it's highly likely a signal that this internal implementation should be extracted into a separate module with its own independent responsibility. By extracting the internal implementation into a class or module and changing the structure to use that module, you can write TCs for the extracted module and test it via its external interface. This is a good example of internal implementation being transformed into an external interface.
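As a sketch of that kind of extraction (hypothetical names, Jest-style assertions again): an email-validation helper that kept tempting us to test it directly can be promoted to its own module and tested through its new external interface.

```typescript
// email-validator.ts: a hypothetical helper that used to be buried inside
// signUp.ts as internal implementation. Extracting it gives it its own
// responsibility and its own public interface.
export function validateEmail(email: string): boolean {
  // deliberately simple rule, just for the sketch
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

// signUp.ts now imports and uses the extracted module:
//   import { validateEmail } from "./email-validator";

// email-validator.test.ts: what was untestable internal implementation
// is now tested through an external interface of its own.
import { validateEmail } from "./email-validator";

describe("validateEmail", () => {
  it("accepts a well-formed address", () => {
    expect(validateEmail("dev@example.com")).toBe(true);
  });

  it("rejects an address without a domain", () => {
    expect(validateEmail("dev@")).toBe(false);
  });
});
```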
Think of a TC as a user of the module. A user who knows nothing about the internal structure, only what's publicly exposed. That's the relationship between the TC and the module, and the TC itself is a module that has the module under test as a dependency.
What Makes a TC Useful?
Useful TCs can be considered from present and future perspectives.
TCs written for the present automate the testing of the code being developed. This reduces the time spent adjusting input values and checking results. In the process, one might temporarily test a method corresponding to internal implementation (or what was initially thought to be an external interface), simply because it helps from an automation standpoint. And as TCs accumulate, they prevent subsequently written code from breaking previous code due to side effects. While TDD doesn't directly aid in the structural design of the application, it does help in effectively designing the necessary interfaces for each module's roles and responsibilities within an already established collaboration structure. Through this process, the internal and external boundaries are clearly defined and refined. Writing TCs first means starting development from the perspective of using the module.
TCs for the future should serve as excellent documentation, not only explaining the target module's roles and responsibilities but also demonstrating specific usage patterns. Therefore, descriptions should be written so that simply reading them reveals what the TC expects and how. Concise yet comprehensive. Furthermore, when the target module's functionality is changed or added, TCs should automatically verify if the changes meet existing specifications, minimizing problems caused by the modified code. They should function correctly even if the target module is entirely replaced.
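A small, hypothetical example of what such documentation-like descriptions might look like, again as a Jest-style TypeScript sketch:

```typescript
// A hypothetical module under test.
export function discountPrice(price: number, rate: number): number {
  return Math.max(0, Math.round(price * (1 - rate)));
}

// The descriptions read like a specification: a reader learns what
// discountPrice promises without opening its source.
describe("discountPrice", () => {
  it("applies the given rate to the original price", () => {
    expect(discountPrice(10000, 0.1)).toBe(9000);
  });

  it("never returns a negative price, even for rates above 100%", () => {
    expect(discountPrice(10000, 1.5)).toBe(0);
  });
});
```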
Ultimately, whether using the TDD cycle or not, to get sufficient return on the time invested in writing TCs, those written for the present must evolve into TCs for the future as development progresses. Of course, one can write them for the future from the outset. The point is that as development proceeds, unnecessary TCs should be deleted or improved into better tests.
If a project adopts TDD or test automation, TCs, like modules, must be continuously improved and refactored. Modules can have their internal code made faster, simpler, and more understandable while maintaining their external interface precisely because TCs are supporting them. Just like modules, perfect code cannot emerge from the start, so TCs also need to be improved to become faster, simpler, and more understandable. A TC is a module that tests another module. It should be considered at least as important as a module. Just creating TCs doesn't mean they will eventually be helpful; they can actually be utterly useless and obstructive. One must constantly contemplate how to create helpful TCs and think about better testing methods.
I believe there's no "ultimate best way" to create useful TCs that works for every project and situation. There's only the best approach for each project's context. At the very least, knowing what help TCs provide or what kind of TC is helpful is the minimum preparation and starting point for creating TCs suitable for each project's situation.
Test Internal Implementation Only Through the Public Interface
As mentioned several times, TCs should not directly know about internal implementation. Testing must occur solely through the public interface. Internal implementation will be used by the public interface in some form. If not, that code should be deleted. While directly testing internal implementation might be easier, we don't write TCs for the sake of writing TCs. We write TCs to gain benefits through test automation.
In the debate about whether to test internal implementation, one argument is that testing internal parts via the external interface makes it difficult to grasp the completeness of the testing, because it's hard to know how much of the internal implementation is being tested. Let me tell you something here. The metric created for exactly this purpose is code coverage. It wasn't created to monitor whether developers are writing TCs properly.
It's a metric of the quantitative quality of tests, to be consulted when it's hard to judge a module's test scope from the TCs alone, such as when internal structure is tested through the public interface. It does not guarantee qualitative quality; you can reach nearly 100% coverage with utterly useless TCs. What it does provide is a way to check which parts of the module the TCs have exercised and where testing is still missing. No wonder it's called coverage. It says a lot when someone asks, eyes gleaming, what coverage percentage a TDD project is aiming for; consider the intention behind the question. For developers seriously adopting TDD or test automation in their projects, a more interesting question than another project's coverage number is how that project separated what is testable from what is not. Here, "how" spans criteria, methods, and team agreements, but I won't go deeper into that part.
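If a project happens to use a recent version of Jest, collecting coverage for this purpose can look roughly like the sketch below; the paths and reporters are assumptions, and no target percentage is being recommended.

```typescript
// jest.config.ts: a minimal sketch, assuming a recent Jest with TypeScript
// config support. Coverage is collected as a map of which parts of the
// module the public-interface TCs actually reach, not as a score to chase.
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  collectCoverageFrom: ["src/**/*.ts"],
  coverageReporters: ["text", "html"],
};

export default config;
```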
In Conclusion
Ultimately, my view is that TCs should be treated the same as the modules they develop: as modules responsible for testing other modules. If you think of the relationship between a TC and a module as a relationship between two modules, it becomes clear how a TC should handle the module under test.
From a front-end developer's perspective, test automation has long been a challenging task. TCs written as code are a good fit when inputs and outputs can be expressed as data and the output is distinct. For developers working in the visual domain, where results are side effects or hard to verify as data, it was an area that demanded a lot of thought. Fortunately, front-end testing has advanced along with the front-end itself, and these days the challenge is rather choosing which tool or method to adopt.
Naturally, there will be no final, definitive version in this field, and testing tools and methodologies, as a domain of software projects, will undergo continuous change and improvement. Amidst the changes, we must not forget that what truly matters is the efficiency and utility of tests. We must look at it objectively. Tests must be genuinely "helpful" to be helpful. Something that "seems helpful" might actually be detrimental.
with kakaopay
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.