Shift Left: Testing Earlier in Development

One candidate for best experience report at XP 2015 was “High Level Test Driven Development – Shift Left” by Kristian Bjerek-Gustuen, Emil Wiik Larsen, Tor Stålhane, and Torgeir Dingsøyr. Their report describes testing strategies and tactics used on a large-scale IT development project in Norway. The authors called it “shifting left” because they wanted to move testing as early in development as possible.

Their project was complex along several dimensions.

Coordinated work among independent teams: Several vendors were simultaneously developing and delivering systems that communicated with each other via service interfaces. Vendors did detailed design and development, including unit, integration, and sprint system testing of their deliverables. This work had to be integrated by the company, which performed acceptance testing after every sprint, system integration testing that ran in parallel with the vendors’ sprint system tests, and user acceptance testing.

Significant coding and testing effort: Over 100,000 hours of testing and development went into the release.

Time pressure and project criticality: The delivery date was fixed. The system was an extension of an existing system for processing payments. It had to be delivered on time and with high quality.

The duration of the project they described was roughly 40 weeks, if I read the report correctly: twelve 3-week sprints followed by 6 weeks of system testing. Faced with a time crunch, an existing group of 3 maintenance scrum teams was scaled up to 11 teams over four months. The customer receiving the software system didn’t have full-time resources to dedicate to the scrum teams, which comprised these roles: scrum master, developer, functional architect, technical architect, and QA/tester. My guess is that the customer was called upon as needed to provide clarifications, offer feedback at sprint demos, and do the necessary acceptance and integration testing.

To ensure that each team’s testing and quality assurance were efficient and sufficient, a dedicated QA person/tester was assigned to every team. This reminds me of Stephanie Savoia’s experience report from Marchex, where they embedded testers in their dev teams.

In the case of this Shift Left report, the dedicated QA person (I prefer the term quality advocate) seemed very busy and vital: they ensured the quality of design documentation; decided whether the implementation was testable; prepared test data; and worked with devs and the scrum master to ensure high-quality code. They also made sure test activities were performed as early as possible and that the customer provided necessary clarifications to the dev team.

Bridge, go-between, quality advocate, tester extraordinaire!

During sprint demos, in addition to demonstrating new functionality, teams provided information on how testing was performed and any issues they had encountered in testing the implemented functionality.

There is more in the report about how they managed testing and dependencies between teams, identified and tracked high-risk module changes and defects to aid test and development planning, and introduced exploratory testing using interdisciplinary teams. But I digress.

Back to those busy QA advocates. Not surprisingly, the experience reporters mentioned in passing that the dedicated QA advocate became one of the most central resources for the teams.

It seems that they were deeply appreciated by the whole team. That’s important. Were they ever overwhelmed by their responsibilities? Were they overworked?

No experience report ever answers all the questions I have. I still find them thought-provoking, even though I’d love to sit down with the reporters and ask them more questions.

One pattern Joe Yoder and I have written about in our Shifting From QA to AQ pattern collection is called Pair With A Quality Advocate. If you purposefully pair devs or other folks with a QA advocate, the advocate’s expertise can “rub off” on less skilled or experienced testers and developers. Steadily (and sometimes stealthily), a quality mindset gets infused into the entire team. You still need quality advocates, but everyone takes on more responsibility for quality. And that’s a good thing.

Distinguishing between testing and checking

At Agile 2013 Matt Heusser presented a history of how agile testing ideas have evolved in “Twelve Years of Agile Testing: And What Do We Do Now?” The most intellectually challenging idea I took away from Matt’s talk was the notion that testing and checking are different. I’m still trying to wrap my head around this distinction.

Disclosure: I’m not a testing insider. However, along with effective design and architecture practices, pragmatic testing is a passion of mine. I have presented talks at Agile conferences with my colleague Joe Yoder on pragmatic test-driven design and quality scenarios.

Like most people, I suspect, I have a hard time teasing out a meaningful distinction between checking and testing. When I looked up definitions for testing and checking, I found significant overlap. Consider these two definitions of testing:

Testing: the means by which the presence, quality, or genuineness of anything is determined.

Testing: a particular process or method for trying or assessing.

And these for checking:

Checking: to investigate or verify as to correctness.

Checking: to make an inquiry into, search through, etc.

Using the first definition of testing, I can say, “By testing I determine what my software does.” For example, a test can determine the amount of interest calculated for a late payment or the number of transactions that are processed in an hour. Using the second meaning, I can say, “I perform unit testing by following the test-first cycle of classic TDD,” or, “I write my test code to verify my class’s behavior after I’ve completed a first-cut implementation that compiles.” Both are particular testing processes or methods.
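To make that first sense concrete, here is a minimal sketch in Python; the function calculate_late_fee, the daily rate, and the dollar amounts are all invented for illustration:

```python
# Hypothetical late-fee calculator; the name, rate, and rounding rule
# are made up purely to illustrate the example in the text.
def calculate_late_fee(balance, days_late, daily_rate=0.0005):
    """Simple interest accrued on an overdue balance."""
    return round(balance * daily_rate * days_late, 2)

# Running this test determines what the software does: 30 days late
# on a $1,000.00 balance at 0.05% per day yields a $15.00 fee.
def test_late_fee_for_thirty_days():
    assert calculate_late_fee(1000.00, 30) == 15.00
```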

Using the first meaning of checking, I can say, “I check that my software behaves correctly according to some standard or specification.” I can also perform a check (using the second definition) by writing code that measures how many transactions can be performed within a time period.
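Here is a rough sketch of that second kind of check in Python; the one-second window and the 100-transaction target are made-up numbers, and process_one stands in for whatever processes a single transaction:

```python
import time

def transactions_per_window(process_one, window_seconds=1.0):
    """Count how many calls to process_one complete within the window."""
    deadline = time.monotonic() + window_seconds
    count = 0
    while time.monotonic() < deadline:
        process_one()
        count += 1
    return count

def test_throughput_meets_target():
    # Stand-in workload; a real check would drive the actual system.
    assert transactions_per_window(lambda: None) >= 100
```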

I can check my software by performing manual procedures and observing results.

I can check my software by writing test code and creating an automated test suite.

I might want to assess how my software works without necessarily verifying its correctness. When tests (or evaluations) are compared against a standard of expected behavior, they are also checks. Testing is in some sense a larger concept or category that encompasses checking.

Confused by all this word play? I hope not.

Humans, as native speakers of any language, explore the dimensions and extent of categories by observing and learning from concrete examples. One thing that distinguishes a native speaker from a non-native speaker is that she knows the difference between similar categories and uses the appropriate concept in context. To non-native speakers, the edges and boundaries of categories seem arbitrary and unfathomable (meanings aren’t found by merely reading dictionary definitions).

I’ve been reading about categories and their nuances in Douglas Hofstadter and Emmanuel Sander’s Surfaces and Essences. (Just yesterday I read about the subtle difference between the phrases “letting the cat out of the bag” and “spilling the beans.”)

So what’s the big deal about making a distinction between testing and checking?

Matt pointed us to Michael Bolton’s blog entry, Testing vs. Checking. Along with James Bach, Michael has nudged the testing world to distinguish between automated “checks” that verify expected behaviors and “testing” activities that require human-guided investigation and intellect and aren’t automatable.

In James Bach’s blog post, Testing and Checking Refined, they make these distinctions:

“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.
(A test is an instance of testing.)

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
(A check is an instance of checking.)”

My first reaction was to throw up my hands and shout “Enough!” My reaction was that of a non-native speaker trying to understand a foreign idiom! But then I calmed down, let go of my urge to know James and Michael’s meanings precisely, accepted some ambiguity, and looked for deeper insight.

When Michael explained,

“Checking is something that we do with the motivation of confirming existing beliefs” while, “Testing is something that we do with the motivation of finding new information.”

it suddenly became clearer. We might be doing what appears to be the same activity (writing code to probe our software), but if our intentions are different, we could be either checking or testing.

Why is this important?

The first time I write test code and execute it, I learn something new (I also might confirm my expectations). When I repeatedly run that test code as part of a test suite, I am checking that my software continues to work as expected. I’m not really learning anything new. Still, it can be valuable to keep performing those checks, especially when the code base is rapidly changing.

But I only need to execute checks repeatedly on code that has the potential to break. If my code is stable (and unchanging), perhaps I should question the value of (and the false confidence gained by) repeatedly executing the same tired old automated tests. Maybe I should write new tests to probe even more corners of my software.

And if tests frequently break (even though the software is still working), perhaps I need to readjust my checks. I’m betting I’ll find test code that verifies details that should stay hidden and aren’t really essential to my software’s behavior. Writing good checks that don’t break so easily makes it easier to change my software design, and that enables me to evolve my software with greater ease.
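As a sketch of what I mean, consider this toy Cart class (invented just for contrast): the first check pins down internal storage and breaks under harmless refactoring, while the second verifies only the essential, observable behavior:

```python
class Cart:
    """Toy shopping cart, invented only to make the contrast concrete."""
    def __init__(self):
        self._items = []  # internal detail, free to change

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Brittle check: verifies a detail that should stay hidden, so it fails
# if storage changes to a dict even though the behavior is intact.
def test_internal_storage_brittle():
    cart = Cart()
    cart.add("book", 10.00)
    assert cart._items == [("book", 10.00)]

# Sturdier check: verifies essential behavior, so the design can evolve.
def test_total_sums_prices():
    cart = Cart()
    cart.add("book", 10.00)
    cart.add("pen", 2.50)
    assert cart.total() == 12.50
```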

When test code becomes stale, it is precisely because it isn’t buying any new information. It might even be holding me back.

I have a long way to go before I’m a fluent native speaker of testing. And I wish James and Michael had chosen different phrases to describe these two categories of “testing” (perhaps exploration and verification?).

But they didn’t.
Fair enough.