Distinguishing between testing and checking

At Agile 2013, Matt Heusser presented a history of how agile testing ideas have evolved in “Twelve Years of Agile Testing: And What Do We Do Now?” The most intellectually challenging idea I came away with from Matt’s talk was the notion that testing and checking are different. I’m still trying to wrap my head around this distinction.

Disclosure: I’m not a testing insider. However, along with effective design and architecture practices, pragmatic testing is a passion of mine. I have presented talks at Agile with my colleague Joe Yoder on pragmatic test driven design and quality scenarios.

Like most, I suspect, I have a hard time teasing out a meaningful distinction between checking and testing. When I looked up definitions for testing and checking there was significant overlap. Consider these two definitions:

Testing: the means by which the presence, quality, or genuineness of anything is determined.

Testing: a particular process or method for trying or assessing.

And these for checking:

Checking: to investigate or verify as to correctness.

Checking: to make an inquiry into, search through, etc.

Using the first definition for testing, I can say, “By testing I determine what my software does.” For example, a test can determine the amount of interest calculated for a late payment or the number of transactions that are processed in an hour. Using the second meaning of testing, I can say that, “I perform unit testing by following the test first cycle of classic TDD” or that, “I write my test code to verify my class’ behavior after I’ve completed a first cut implementation that compiles.” Both are particular testing processes or methods.

I can say, “I check that my software correctly behaves according to some standard or specification (first meaning).” I can also perform a check (using the second definition) by writing code that measures how many transactions can be performed within a time period.

I can check my software by performing manual procedures and observing results.

I can check my software by writing test code and creating an automated test suite.

I might want to assess how my software works without necessarily verifying its correctness. When tests (or evaluations) are compared against a standard of expected behavior they also are checks. Testing is in some sense a larger concept or category that encompasses checking.
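To make the difference between assessing and verifying concrete, here is a minimal sketch. The names transactions_per_second and meets_throughput_standard, the injectable clock parameter, and the threshold value are all hypothetical, invented for illustration. The first function only measures (an inquiry); the second compares that measurement against a standard, which is what turns it into a check.

```python
import time
from typing import Callable, Iterable


def transactions_per_second(
    process: Callable[[int], None],
    transactions: Iterable[int],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Assess: measure how fast transactions are processed. No verdict is given."""
    start = clock()
    count = 0
    for transaction in transactions:
        process(transaction)
        count += 1
    elapsed = clock() - start
    # Guard against a zero reading from a coarse clock.
    return count / elapsed if elapsed > 0 else float("inf")


def meets_throughput_standard(rate: float, minimum_per_second: float) -> bool:
    """Verify: compare the measurement against an expected standard."""
    return rate >= minimum_per_second


if __name__ == "__main__":
    # Hypothetical no-op transaction handler, used only to exercise the sketch.
    rate = transactions_per_second(lambda t: None, range(1000))
    print("measured rate:", rate)
    print("meets standard:", meets_throughput_standard(rate, minimum_per_second=100.0))
```

The clock is passed in as a parameter so that the measurement can be driven by a fake clock when a repeatable result is wanted.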

Confused by all this word play? I hope not.

Humans (and speakers of any native language) explore the dimensions and extent of categories by observing and learning from concrete examples. One thing that distinguishes a native speaker from a non-native speaker is that she knows the difference between similar categories, and uses the appropriate concept in context. To non-native speakers the edges and boundaries of categories seem arbitrary and unfathomable (meanings aren’t found by merely reading dictionary definitions).

I’ve been reading about categories and their nuances in Douglas Hofstadter and Emmanuel Sander’s Surfaces and Essences. (Just yesterday I read about the subtle difference between the phrases, “Letting the cat out of the bag” and “Spilling the beans.”)

So what’s the big deal about making a distinction between testing and checking?

Matt pointed us to Michael Bolton’s blog entry, Testing vs. Checking. Along with James Bach, Michael has nudged the testing world to distinguish between automated “checks” that verify expected behaviors versus “testing” activities that require human guided investigation and intellect and aren’t automatable.

In James Bach’s blog entry, Testing and Checking Refined, James and Michael make these distinctions:

“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.
(A test is an instance of testing.)

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
(A check is an instance of checking.)”

My first reaction was to throw up my hands and shout “Enough!” My reaction was that of a non-native speaker trying to understand a foreign idiom! But then I calmed down, let go of my urge to precisely know James and Michael’s meanings, accepted some ambiguity, and looked for deeper insight.

When Michael explained,

“Checking is something that we do with the motivation of confirming existing beliefs” while, “Testing is something that we do with the motivation of finding new information.”

it suddenly became clearer. We might be doing what appears to be the same activity (writing code to probe our software), but if our intentions are different, we could either be checking or testing.
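A small sketch may help show how the same probing code can serve either intention. It reuses the late payment interest example from earlier. The function late_payment_interest, its simple daily interest formula, and the sample values are assumptions made up for illustration, not anyone’s real billing code.

```python
from decimal import Decimal, ROUND_HALF_UP


def late_payment_interest(balance: Decimal, annual_rate: Decimal, days_late: int) -> Decimal:
    """Hypothetical product code: simple daily interest, rounded to cents."""
    interest = balance * annual_rate * days_late / Decimal(365)
    return interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_thirty_days_late() -> bool:
    """Checking: confirm an existing belief with a fixed decision rule."""
    observed = late_payment_interest(Decimal("1000.00"), Decimal("0.12"), 30)
    return observed == Decimal("9.86")


def probe_days_late() -> dict:
    """Testing: gather observations for a person to study; no verdict is built in."""
    balance, rate = Decimal("1000.00"), Decimal("0.12")
    return {days: late_payment_interest(balance, rate, days) for days in (0, 1, 30, 365, -5)}


if __name__ == "__main__":
    print("check passed:", check_thirty_days_late())
    for days, interest in probe_days_late().items():
        print(days, "days late ->", interest)
```

The check can only say pass or fail. The probe leaves the judgment to a person, who might notice that a negative number of days yields negative interest and ask whether that should be allowed at all.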

Why is this important?

The first time I write test code and execute it I learn something new (I also might confirm my expectations). When I repeatedly run that test code as part of a test suite, I am checking that my software continues to work as expected. I’m not really learning anything new. Still, it can be valuable to keep performing those checks, especially when the code base is rapidly changing.

But I only need to execute checks repeatedly on code that has the potential to break. If my code is stable (and unchanging), perhaps I should question the value of (and false confidence gained by) repeatedly executing the same tired old automated tests. Maybe I should write new tests to probe even more corners of my software.

And if tests frequently break (even though the software is still working), perhaps I need to readjust my checks. I’m betting I’ll find test code that verifies details that should be hidden or aren’t really essential to my software’s behavior. Writing good checks that don’t break so easily makes it easier to change my software design. And that enables me to evolve my software with greater ease.
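Here is a minimal sketch of what a fragile check might look like next to a sturdier one. The Ledger and RunningTotalLedger classes and both check functions are hypothetical, invented only to illustrate the point.

```python
class Ledger:
    """Hypothetical first design: keeps every entry in a private list."""

    def __init__(self) -> None:
        self._entries = []  # internal detail, not part of the behavior

    def record(self, amount: int) -> None:
        self._entries.append(amount)

    def balance(self) -> int:
        return sum(self._entries)


class RunningTotalLedger:
    """Hypothetical redesign: same public behavior, different internals."""

    def __init__(self) -> None:
        self._total = 0

    def record(self, amount: int) -> None:
        self._total += amount

    def balance(self) -> int:
        return self._total


def brittle_check(ledger) -> bool:
    # Verifies a hidden detail: how entries happen to be stored.
    return getattr(ledger, "_entries", None) == [100, -40]


def behavioral_check(ledger) -> bool:
    # Verifies only what callers can observe.
    return ledger.balance() == 60


if __name__ == "__main__":
    for ledger in (Ledger(), RunningTotalLedger()):
        ledger.record(100)
        ledger.record(-40)
        print(type(ledger).__name__, brittle_check(ledger), behavioral_check(ledger))
```

Both designs behave identically from the outside, yet the brittle check fails on the redesign even though nothing a caller cares about has changed. The behavioral check passes on both.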

When test code becomes stale, it is precisely because it isn’t buying any new information. It might even be holding me back.

I have a long way to go to become a fluent native testing speaker. And I wish that James and Michael could have chosen different phrases to describe these two categories of “testing” (perhaps exploration and verification?).

But they didn’t.
Fair enough.