Getting out of your ruts

This post is part three (and the last) of reflections on a conversation I had with Chelsea Troy about our testing and development heuristics.

I asked Chelsea, how do you get people to be less clingy about holding on to tests that don’t add value?

She speculates that this will require getting people to unpack the psychological baggage they hold around the value of their tests or code. Once people have written tests or code, they don’t want to get rid of them because that is the primary visible evidence that they’ve done something. Developers rarely get recognized for removing unused code or throwing away brittle or inconsequential tests.

So how can you turn this attitude around? Let’s be blunt. Developers do lots of things to improve their ability to write and maintain code. They shouldn’t be penalized for these activities. Professional software development isn’t just cranking out code. 

A Heuristic for Switching Things Up

I learned this guiding heuristic from a colleague, John Schwartz: If something you are doing isn’t buying you any new or useful information, stop doing it. Instead, move on to something that will. And, if that doesn’t work, try something else. Don’t settle.

John applied this heuristic to every aspect of software development and management. Applied to testing: If tests always pass (or frequently fail because of nitpicky things that don’t matter), throw them out. When you fix brittle UI tests, only to find that they fail with the next CSS style change, recognize that you aren’t making forward progress or learning anything new. Repeatedly fixing those brittle tests is simply busywork.

You are better off acknowledging that you are testing at the wrong (too low) a level and that to buy information you should be testing differently.

So, what happens when you throw out those tests?

Arguably, you will have less clutter and useless information to wade through when something breaks your build. And if you do miss some of those tests, you can always bring them back. Or decide to run them periodically.

Letting Go

I have seen clients hold on to coding and testing practices that slow them down. At one client, developers frequently left unused code as comments so they wouldn’t forget about it just in case it might be needed. When asked “Why not just use version control?” they worried that they would forget about the code. But how long had that code been frozen in comments? When’s the last time you unthawed some commented code and used it as is?

In another situation, I once reviewed some code for a function which had a parameter that was never used. At one time it was; but now that code was obsolete. Every new team member struggled to understand the reason for that parameter and what the code that handled it did. I tried to convince them to remove this useless code. It would take less than 20 minutes. I’m not sure whether they did. But at least I got them to recognize that this dead code was an impediment. Before I pointed out this out, they had just considered the time it took to understand that code to be an annoying rite of passage for new developers.

Documenting Your Actions

Chelsea is experimenting with ways to simplify and minimize code and tests. The more stuff you have to wade through, the harder it is to maintain and debug your code. Reducing extraneous stuff–whether it is overly complex code, unused code, or awkward tests–makes life easier. She is also trying to get her team to write commit messages to document these efforts. She wants to leave a documentation trail. Her heuristic: When you pare down of code and tests, document both what has changed and why.

Using commit records as documentation won’t work well unless everyone follows conventions. But is it realistic to expect everyone to be as diligent as Chelsea at documenting their work? Developing software is a team sport. Unless we agree on how to work together, and then follow through on our agreements, results will be inconsistent. Even so, people still bend or disregard working agreements. I suspect there are several reasons for this: maybe they didn’t fully buy into the practice, or maybe they didn’t understand or appreciate the reasons behind what we agreed to, or…

When this happens, whatever the reasons, something needs to change. And we won’t know exactly what’s going on unless we talk to each other.

Chelsea suggests that we could benefit greatly from uncovering each other’s assumptions instead of simply letting things slide. How can you do this? Be direct: Ask people to think really hard and to openly share their thoughts and values. Then decide what actions to take.

Getting Maximal Value out of Documentation with Minimal Effort

Chelsea sees a similar a problem with software documentation. No one wants to write it. But they do so because they think they should. But they don’t bother updating existing docs. So, documentation gets out of date and they end up with lots of inaccurate versions. I’ve seen this problem many times throughout my consulting career. As I know I’m one of those rare individuals who actually likes to write documentation… I know that you can’t fix this problem by telling developers to just try harder. (Again, rarely do developers get rewarded for writing documentation).

One useful heuristic for mitigating your out-of-date documentation problem is to create “living” documentation (instead of static documentation that is written once and never updated). Cyrille Martraire has written a great book, Living Documentation, that contains many heuristics (written as patterns), and examples demonstrating code to create it. Living documentation is generated by executing scripts that extract facts from your codebase or running system or repositories. Then, using that information, those scripts dynamically generates or updates your documentation. A central value underlying Cyrille’s heuristics is to make documentation integral to software development, rather than it being a separate activity. If you connect your documentation directly to your code, it will always be up-to-date with the latest code. He suggests starting small, then growing tooling and scripts as you find the need. The guiding heuristic behind all of Cyrille’s heuristics is: Don’t let software documentation become stale; proactively generate documentation with information that can be automatically refreshed.

Efforts to create living documentation, streamline software development, or making software codebases more sustainable, are often undervalued. They shouldn’t be. My guiding heuristic: Cut out the crap so you can focus your development attention on the important stuff.

Our Heuristics are Shaped Through Experience

This post is part two of some reflections on a conversation I had with Chelsea Troy about our testing heuristics. You may also want to read part one and Chelsea’s writeup.

I shared with Chelsea how my Smalltalk development background contributed to my testing and design heuristics. I was involved in the early days of Smalltalk at Tektronix as a principal engineer in the AI Machines group. After a yearlong stint managing the software group through product introduction, I switched back to full-time engineering. Among other things, I added features to Smalltalk including color graphics, fonts, and support for low-level OS calls. All our code was visible to our users, and we had a strong engineering culture.

I learned how to work effectively in the Smalltalk environment by studying existing code, figuring out what it did, and understanding its coding and design style. I also observed more experienced Smalltalk programmers. Kent Beck, along with Ward Cunningham, and other Tek Lab’s folks were some of the very earliest Smalltalk application programmers. Ward and Kent worked together, developing prototypes and exploring what Smalltalk was good at. Many ideas about Extreme Programming (and TDD) and object-design can be traced back to these programming experiences.

The Smalltalk image was always running. It contained the entire development environment and had a browser where you could look at existing code and add your own. Much of my time was spent experimenting with and reading existing code, then trying to fit my new code in. The code I wrote was a mix of new classes as well as extensions or modifications to existing ones. To show someone else how to use your code, you’d create a workspace—a scratchpad window—and put snippets of commented code for them to read, edit, and evaluate it. By convention, methods were categorized (see the third pane of the System Browser below, which shows the categories for the abstract Collection class). Other classes also had a testing category, but it was not used for what you might think! The testing category for the class Collection included methods for querying (i.e. testing) its contents.

What a typical Smalltalk-80 system image looked like courtesy of https://randoc.wordpress.com/2018/07/20/tektronix-smalltalk-workstations-4400-and-4300-series/

So how did programmers test Smalltalk code? I didn’t have any conventions to follow for organizing my tests (and not inconsequentially, leaving test code around would clutter up the Smalltalk image). Since I could highlight code anywhere and execute it, I tested code as I wrote it. I could step through code with a debugger, change it on the fly, and run it again. I tested my code into existence, but didn’t leave around any tests.

In an article I wrote about Color Smalltalk here’s how I described this experience: “… the workspace, lets programmers experiment with code without actually incorporating the experimental code into the valid, running environment. A programmer can write, execute and debug code in a workspace, then pull it into the Smalltalk application when the new code is tested and operational.”

While this statement is mostly true, it is also misleading. Anything I did as a programmer would add more objects to and change the state of the running Smalltalk image. Code you executed in a workspace changed the image (sometimes with catastrophic results, especially if you were tinkering with basic low-level system functionality as I was). But the Smalltalk environment and tools made it so easy to back up a step or two, revise your code, and try again, even with code that mucked with low-level stuff.

Kent’s Smalltalk experience heavily influenced how he thought about incremental development. But when it came to testing, I suspect he tried to boil down his Smalltalk experiences into practices that would be more “failsafe” for programmers who didn’t work in such a dynamic and forgiving development environment. Kent’s thinking about testing has evolved since he wrote his books. In an interview with Andrew Binstock in 2019, Kent and Andrew chat about this evolution:

Binstock: Do you still work on strictly a test-first basis?

Beck: No. Sometimes, yes.

Binstock: OK. Tell me how your thoughts have evolved on that. When I look at your book Extreme Programming Explained, there seems to be very little wiggle room in terms of that. Has your view changed?

Beck: Sure. So there’s a variable that I didn’t know existed at that time, which is really important for the trade-off about when automated testing is valuable. It is the half-life of the line of code. If you’re in exploration mode and you’re just trying to figure out what a program might do and most of your experiments are going to be failures and be deleted in a matter of hours or perhaps days, then most of the benefits of TDD don’t kick in, and it slows down the experimentation—a latency between “I wonder” and “I see.” You want that time to be as short as possible. If tests help you make that time shorter, fine, but often, they make the latency longer, and if the latency matters and the half-life of the line of code is short, then you shouldn’t write tests.

Binstock: Indeed, when exploring, if I run into errors, I may backtrack and write some tests just to get the code going where I think it’s supposed to go.

Beck: I learned there are lots of forms of feedback. Tests are just one form of feedback, and there are some really good things about them, but depending on the situation you’re in, there can also be some very substantial costs. Then you have to decide, is this one of these cases where the trade-off tips one way or the other? People want the rule, the one absolute rule, but that’s just sloppy thinking as far as I’m concerned.

Practicing TDD ensures developers write tests. The underlying value heuristic is, “any tests are better than no tests.” But if we take Kent’s more recent thoughts to heart, we shouldn’t test without thinking through some consequences. Kent’s more recent guiding heuristic: Test when it matters and when you need a safety net. Think through both the benefits and costs of testing. If you are exploring, don’t let testing slow you down.

There is no single “definitive” answer to the question, “when should I test?”

Develop Test Strategies Based on System Context

Chelsea asked, “So, how do you determine what kinds of tests to write?”

I don’t have a definitive answer to this question, either. So, I shared a few stories. I’ve worked with clients unschooled in TDD. They write code, test it a little, and then throw these initial tests away. They build successful products. The tests they tend to keep are regression tests, tests that demonstrate a quirky bug that has been fixed (and to ensure that it stays that way). It’s always a bet.

If code is stable, and the tests always pass, running tests all the time isn’t buying any new information. Even worse, passing tests can give you a false sense of security about your code’s quality. So why do we write tests?

I like to focus on writing tests that check that stable (relatively unchanging) system expectations still hold, and that demonstrate ways new capabilities can be safely added. I also try to write tests that capture expectations I have around my system’s behavior.

For complex systems, though, this can be difficult. Unforeseen side effects can pop up in strange places (changing code in one place unexpectedly causing other code to break in a distant part of the system). It’s impossible to test for every possible edge case and you don’t know all the dependencies.

I remember Kent Beck telling this oddball story of writing his first TDD code when he went to work at Facebook. His code, which passed all his tests, suddenly caused other tests for other parts of the system to fail. Rather than revert his code, those familiar with the system decided to throw out those failing tests. Seems weird, but they knew those tests were brittle, and making some wrong assumptions. When you find problems with tests, think carefully about whether it is appropriate to add additional tests to ensure that things don’t break, whether your existing tests are brittle, or whether your assumptions are wrong.

Data Scientists Have Different Testing Values

When you need to process massive amounts of data, and the code for processing of that data is predictable, there is little value in repeatedly running functional tests that always pass.

I worked for a number of years for a client doing healthcare analytics on patient medical data. Sometimes, their heuristic for verifying an algorithm would be: test that the new code works by comparing its results against code written in an entirely different system/programming language. They would take a massive cut of the data and run it through and compare the results.

Another heuristic they sometimes used to test new algorithms and capabilities was to run their code and compare their results against those reported in published research papers. Where the results differed, they need to reason about those differences (sometimes it was a problem with their code; at other times, it was that their code was more accurate at choosing cohorts or their statistical algorithms were better). Some person needed to critically analyze the results, reason why the discrepancies were there, and determine what, if anything, to do about them. This process couldn’t be automated.

Chelsea works with data scientists at Mozilla on sanitizing personal data for searches. The rules for this are complicated, language-specific, and sometimes people enter search terms in more than one language. She finds data scientists don’t share the same testing values as many software developers do.

Data scientists make informed assumptions about aggregated data. If those assumptions don’t hold, they reassess the data processing rules and revisit their assumptions. To them, testing is insufficient to ensure system quality. Monitoring actual system behavior against expected data characteristics, however, is critical. When the data characteristics being monitored fall outside of expected tolerances, this triggers developers to look into the situation. Developers then run some automated tests to determine if something is wrong with their code. If those automated tests pass, they then call on a data scientist to analyze a sample of the data and decide what to do. Something has changed and there likely needs to be some change to either in the assertions about the data’s characteristics or the rules for handling it.

Trialing new Heuristics

Chelsea and I appreciate what we can learn from people with different backgrounds: data scientists, QA folks, testers, and new colleagues. There are many different ways to test and design software. And if I don’t hold onto my preferred heuristics too tightly, I might learn something.

But how do I decide when to try out some new heuristics or to stick with what I know?

If things are going well, I’m not as motivated to try out new ideas. I need a small nudge. I’ll try new some new-to-me heuristics if I feel I have some wiggle room. Let me experiment, practice, and think through the consequences. Give me a bit of time to let new values and practices soak in.

When I start to work on a new system or folks from different backgrounds, that too, is another opportunity to try out new ways of working.

But under pressure, I find myself narrowing my focus and sticking with what I know best (even if it is a poor option). So, if I can, I catch myself and take a small step back from problem solving. I pause, take a breath, and ask: If my heuristics aren’t currently working for me, what are some options?

If I want to introduce a test-first TDDer to my testing approach, I might suggest a modest experiment: “Let’s work together on some design and coding problems and compare our two approaches. Let’s find out what tests we come up with following my test-driven development approach. Let’s try your test-first TDD on a similar problem and see what tests we come up with. Let’s see what we learn.”

At the very minimum, I hope we’d learn of our shared value: we both value tested code. We might learn from each other more about the kinds of tests we like to write. Or how many tests we think are needed. Or how we rework existing tests. We might share some heuristics for deciding what next test to write or what isn’t worth testing. Through experimentation and reflection, we can grow and learn from each other.

Testing, Testing…our Heuristics

We gather heuristics through conversations
We gather heuristics through storytelling and conversations

Recently Chelsea Troy and I chatted over Zoom about software testing heuristics. I met Chelsea last year at DDD Europe. In this and a couple of snack-sized posts, I will reflect on some highlights of our conversation. Chelsea has also written about our conversation.

A Leading Question Leads to Some Heuristics

I started by asking, “What is important about testing that people should get but don’t?”

Chelsea answered, that while Test Driven Development (TDD) is useful, it doesn’t solve all testing needs.  If developers are oversold on the benefits of TDD, they can become jaded on testing in general. They shouldn’t. TDD doesn’t include specific practices that address resilience, or reliability. But it is useful for developing and testing deterministic code.

Chelsea shared the experience of learning first-hand how TDD didn’t have all the answers to testing. She worked on a team of TDD enthusiasts developing a mobile app for a client. Although the team thought they knew how to develop quality software, their initial prototype developed following TDD didn’t address these challenging requirements: being usable under extreme weather conditions, having a simple UX, and functioning when only intermittently connected to the internet and their backend software. They needed to add more design and testing techniques to their toolbox, along with their TDD testing. Chelsea also said that she learned a lot about testing for these kinds requirements from their client’s QA team.

Some heuristics we’ve touched on:

  • Use TDD to develop and test functionality of deterministic software.
  • Use other strategies to design and test for software system qualities such as usability, performance, reliability, or resilience.
  • Match your testing strategies and tactics to your application’s development and execution context.

A Brief Introduction to Heuristics

I have been intrigued by software development heuristics, ever since I read Billy Vaughn Koen’s Discussion of the Method: Conducting the Engineer’s approach to problem solving. Koen defines a heuristic as, “anything that provides a plausible aid or direction in the solution of a problem but is in the final analysis unjustified, incapable of justification, and potentially fallible.” Heuristics are never guaranteed. When a heuristic fails, you back up and try another one.

I enjoy hunting for heuristics while designing and coding with others. Open-ended conversations where we swap stories and reflect on our heuristics are another great opportunity. Generally, I look for three kinds of heuristics:

  1. Action heuristics. Things we do to solve our immediate problem. There are many action heuristics. Design patterns are one well-known form of action heuristic. We know these heuristics by name because authors took the time to write up them as named software patterns. But there are many testing and development techniques both smaller and larger than patterns. For example, in Test-Driven Development (TDD), the practice of “write a test, then write code to pass the test” is a heuristic for incrementally designing and implementing tested code.
  1. Value Heuristics. Values motivate our actions. Underlying TDD is the value: Testing should be an integral part of design and coding.

Our values determine what actions seem appropriate. Because I value understandable code, I I take several actions to make my code more comprehensible: I give methods, functions, and variables meaningful names; keep code in methods short; and write code at the same level of detail in a method, factoring out lower-level details into helper methods.

Values depend on context. As the context shifts, so do our values. This doesn’t mean we are fickle; just pragmatic. Most of the time we aren’t conscious of making these shifts. When cutting and pasting code from stack overflow, I don’t value code understandability so much as I do the ability to quickly determine whether that code addresses my current problem. If it does, then I rewrite that code to make it clearer and to fit with the style in my existing codebase. In production code, I do value understandability.

  1. Guiding heuristics. Heuristics that lead to related actions. For example, Chelsea shared one guiding heuristic: Don’t treat test code the same as production code, instead, make each test understandable in isolation. This leads her to write self-contained test methods. She doesn’t like a test where she has to read the code that it calls on before she can understand the test. She also isn’t a fan of applying the DRY (Don’t Repeat Yourself) heuristic to test code.

Comparing competing heuristics

Chelsea mentioned that understandable tests can also serve as valuable design documentation and discovery tools. It’s easier to modify test code that is self-contained, rerun it, and explore how the software responds.

I asked Chelsea whether she would put aside her heuristic of keeping tests self-contained if there were compelling reasons. What if set up conditions for tests took a long time (for example, doing a cut of a database in order to build an in-memory cache of test data)? What if there was complex code that was repeated in similar tests but was slightly altered? Did someone make cut-and-paste-modify-and-reuse errors, or were there valid reasons for these differences?

Factoring common initialization code out of tests into common setup code, provides a “standard” execution context for a suite of tests. It also makes it easier to vary that context and rerun the test suite. Factoring out code common to several tests and clearly labeling what it does eliminates having to second guess reasons for slight variations in test code.

Depending on your situation and personal preferences, you may choose the heuristic, “Keep code in tests so you can understand and easily manipulate it,” or the other, “Factor out expensive or error prone code into common code shared by tests.” These heuristics compete with each other. Neither is better. They are simply alternative ways to structure your test code.

The Value of Knowing your Values

If people don’t know your values (and how they differ from their values), they may not understand why you prefer to work the way you do. For example, while I value testing, I don’t practice test-first development.

If you understand TDD to mean strictly write tests before writing any code, your TDD heuristic is: begin by writing a small test, then write code that proves that the test fails, then rewrite your code to pass that test. Don’t add any more code than necessary to make the test pass. Do this repeatedly until you’ve fully implemented your code.

At the end of a TDD cycle, you have a bunch of tests and fully functioning code that passes those tests. Working this way, you typically implement a single class at a time. You test and implement lower-level functionality, then repeat the process to develop the code that uses that functionality. Your software tends to grow from the “bottom” up.

I value testing, but typically design and implement several classes that work together at the same time. Once I prove to myself that my overall design hangs together (through some sort of simulation), I implement it. When finished, I check in code for several classes along with tests that demonstrate their behavior. My code is tested, but I don’t leave around lots of low-level tests.

For example, I may use a strategy pattern to calculate charges for different items on an invoice. I would initially implement each individual strategy class and check that it worked as I expected. But I’d remove most if not all tests for those individual strategies once I proved to myself that they worked. Their code is simple enough to read at a glance. Once I get low level classes working (especially if they don’t retain any state), I don’t need to keep tests around to ensure that they work. Once implemented, they rarely change. If I do need to revise them, at that point I might reconsider my testing heuristics (and add some tests that reflect these changes). The valuable tests I tend to preserve are those that determine which strategy to use, how to add new kinds of strategies, and different ways to apply discounts and special pricing.

Let’s contrast my testing heuristics with those of test-first TDDers.

We both share this value heuristic: Value code that has tests over code (even if it works) that doesn’t have tests.

Test-first TDDers apply this heuristic: Write tests as you incrementally design code. Interleave testing and coding, repeatedly. Start with the simplest test and the simplest implementation. Only implement enough functionality so that your latest test passes. Build functionality and tests in small increments; each increment moving you closer to your final tested design.

They also have this guiding heuristic: You produce a cleaner design if you write tests first before writing any code.

I don’t share that heuristic.

My heuristic for developing designed, tested code is: Consider the design of one or more classes working together to achieve some functionality. Model your design using some lightweight technique, such as CRC cards (Class-Responsibility-Collaborators) or whiteboard sketches. Once you know what each class’ responsibilities are and how they interact, then implement them. Write simple tests and debug as you implement, but remove them if they are low level (and other code that has tests exercises their functionality). Keep only a lean set of illustrative tests that demonstrate how the classes work together and ensure that your design will continue to function properly.

At the end of my design/development cycle, I may write additional tests, revise existing ones, or remove insignificant tests. I use this grooming and cleanup step, before committing my code, as one way to double check my work.

Chelsea summarized my TDD heuristic as: Put tests in at the right level of abstraction once you know what your design is about.

Chelsea cautions, however, that if you don’t know what the right level of abstraction is and you follow test-first TDD heuristics by rote, you end up with tests at too low a level. Also, if you don’t have heuristics for pruning them, you end up with too many.

I view most testing I do while I implement my design as temporary scaffolding. Since I’ve already sketched out design ideas before coding, tests are not my primary tool for design. I test to verify my design. If I need to adjust my design as I implement it (and I expect to), that’s OK. I keep tweaking it and my code, and continue testing.

I suspect the biggest difference in our two approaches is that test-first TDDers don’t view their tests as temporary scaffolding, and I don’t view the cycle of test-first TDD as the only (or best) way to understand what a design should be. We both value tested, well-designed code.

Bringing to light the different values that underlie competing heuristics can be illuminating. But how can we get others to appreciate and try out our heuristics? How can we approach new-to-us heuristics with an open mind? I’ll touch on these topics and more in my next post.

Shift Left: Testing Earlier in Development

One candidate for best experience report at XP 2015 was “High Level Test Driven Development – Shift Left” by Kristian Bjerek-Gustuen, Emil Wiik Larsen, Tor Stålhane, and Torgeir Dingsøyr. Their report describes testing strategies and tactics used on a large-scale IT development project in Norway. These authors called it “shifting left” because they wanted to move testing to as early as possible in development.

Their project was complex along several dimensions.

Coordinated work among independent teams: Several vendors were simultaneously developing and delivering systems that communicated with each other via service interfaces. Vendors did detailed design and development including unit, integration and sprint system testing of their deliverables. This work had to be integrated by the company. They performed acceptance testing after every sprint, system integration testing which ran in parallel with the vendors’ sprint system tests, as well user acceptance testing.

Significant coding and testing effort: Over 100,000 hours of testing and development went into the release.

Time pressure and project criticality: The delivery date was fixed. The system was an extension of an existing system for processing payments. It had to be delivered on time and with high quality.

The duration of the project they described was roughly 40 weeks, if I read the report correctly: 12 3-week sprints followed by 6 weeks of system testing. Faced with a time crunch, an existing group of 3 maintenance scrum teams were scaled up to 11 teams over four months. The customer who was receiving the software system didn’t have fulltime resources to dedicate to the scrum teams who were comprised of these roles: scrum master, developer, functional architect, technical architect, and QA/tester. My guess is that the customer was called upon as needed to provide clarifications, offer feedback at sprint demos, and to do all necessary testing and integration testing.

To ensure that each team had efficient and sufficient testing and quality assurance, a dedicated QA person/tester was assigned to each team. This reminds me of Stephanie Savoia’s experience report at Marchex where they embedded testers into their dev teams.

In the case of this Shift left report, the dedicated QA person (I prefer the term quality advocate) seemed very busy and vital: they ensured the quality of design documentation; decided whether the implementation was testable; prepared test data; worked with devs and the Scrum master to ensure high quality code. They also made sure test activities were performed as early as possible and that the customer provided necessary clarifications to the dev team.

Bridge, go-between, quality advocate, tester extraordinaire!

During sprint demos, in addition to demonstrating new functionality, teams also provided information on how testing was performed and any issues they had encountered in testing the implemented functionality.

There is more in the report about how they managed testing and dependencies between teams, identified and tracked high-risk modules changes and defects to aid test and development planning, and introduced exploratory testing using interdisciplinary teams. But I digress.

Back to those busy QA advocates. Not surprisingly, the experience reporters mentioned in passing that the dedicated QA advocate became one of the most central resources for the teams.

It seems that they were deeply appreciated by the whole team. That’s important. Where they ever overwhelmed by their responsibilities? Were they overworked?

Any experience report never answers all questions I have. I still find them thought provoking. Even though I’d love to sit down and talk with experience reporters and ask them more questions.

One pattern Joe Yoder and I have written about in our Shifting From QA to AQ pattern collection is called Pair With A Quality Advocate. If you purposefully pair up devs or other folks with a QA advocate, their expertise can “rub off” on less skilled/experienced testers and developers. Steadily (and sometimes stealthily), a quality mindset gets infused into the entire team. You still need quality advocates, but everyone takes on more responsibility for quality. And that’s a good thing.

Distinguishing between testing and checking

At Agile 2013 Matt Heusser presented a history of how agile testing ideas have evolved in “Twelve Years of Agile Testing: And What Do We Do Now?” The most intellectually challenging idea I came away from Matt’s talk was the notion that testing and checking are different. I’m still trying to wrap my head around this distinction.

Disclosure: I’m not a testing insider. However, along with effective design and architecture practices, pragmatic testing is a passion of mine. I have presented talks at Agile with my colleague Joe Yoder on pragmatic test driven design and quality scenarios.

Like most, I suspect, I have a hard time teasing out a meaningful distinction between checking and testing. When I looked up definitions for testing and checking there was significant overlap. Consider these two definitions:

Testing-the means by which the presence, quality, or genuineness of anything is determined

Testing-a particular process or method for trying or assessing.

And these for checking:

Checking-to investigate or verify as to correctness.

Checking-to make an inquiry into, search through, etc.

Using the first definition for testing, I can say, “By testing I determine what my software does.” For example, a test can determine the amount of interest calculated for a late payment or the number of transactions that are processed in an hour. Using the second meaning of testing, I can say that, “I perform unit testing by following the test first cycle of classic TDD” or that, “I write my test code to verify my class’ behavior after I’ve completed a first cut implementation that compiles.” Both are particular testing processes or methods.

I can say, “I check that my software correctly behaves according to some standard or specification (first meaning).” I can also perform a check (using the second definition) by writing code that measure how many transactions can be performed within a time period.

I can check my software by performing manual procedures and observing results.

I can check my software by writing test code and creating an automated test suite.

I might want to assess how my software works without necessarily verifying its correctness. When tests (or evaluations) are compared against a standard of expected behavior they also are checks. Testing is in some sense a larger concept or category that encompasses checking.

Confused by all this word play? I hope not.

Humans (and speakers of any native language) explore the dimensions and extent of categories by observing and learning from concrete examples. One thing that distinguishes a native speaker from a non-native speaker is that she knows the difference between similar categories, and uses the appropriate concept in context. To non-native speakers the edges and boundaries of categories seem arbitrary and unfathomable (meanings aren’t found by merely reading dictionary definitions).

I’ve been reading about categories and their nuances in Douglas Hofstadter and Emmanuel Sander’s Surfaces and Essences. (Just yesterday I read about subtle difference between the phrases, “Letting the cat out of the bag” and “Spilling the beans.”)

So what’s the big deal about making a distinction between testing and checking?

Matt pointed us to Michael Bolton’s blog entry, Testing vs. Checking. Along with James Bach, Michael has nudged the testing world to distinguish between automated “checks” that verify expected behaviors versus “testing” activities that require human guided investigation and intellect and aren’t automatable.

In James Bach’s blog, Testing and Checking Refined, they makee these distinctions:

“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.
(A test is an instance of testing.)

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
(A check is an instance of checking.)”

My first reaction was to throw up my hands and shout “Enough!” My reaction was that of a non-native speaker trying to understand a foreign idiom! But then I calmed down, let go of my urge to precisely know James and Michael’s meanings, accept some ambiguity, and looked for deeper insight.

When Michael explained,

“Checking is something that we do with the motivation of confirming existing beliefs” while, “Testing is something that we do with the motivation of finding new information.”

it suddenly became more clear. We might be doing what appears to be the same activity (writing code to probe our software), but if our intentions are different, we could either be checking or testing.

Why is this important?

The first time I write test code and execute it I learn something new (I also might confirm my expectations). When I repeatedly run that test code as part of a test suite, I am checking that my software continues to work as expected. I’m not really learning anything new. Still, it can be valuable to keep performing those checks. Especially when the code base is rapidly changing.

But I only need to execute checks repeated on code that has the potential to break. If my code is stable (and unchanging), perhaps I should question the value of (and false confidence gained by) repeatedly executing the same tired old automated tests. Maybe I should write new tests to probe even more corners of my software.

And if tests frequently break (even though the software is still working), perhaps I need to readjust my checks. I’m betting I’ll find test code that verifies details that should be hidden/aren’t really essential to my software’s behavior. Writing good checks that don’t break so easily makes it easier to change my software design. And that enables me to evolve my software with greater ease.

When test code becomes stale, it is precisely because it isn’t buying any new information. It might even be holding me back.

I have a long way to go to become a fluent native testing speaker. And I wish that James and Michael could have chosen different phrases to describe these two categories of “testing” (perhaps exploration and verification?).

But they didn’t.
Fair enough.