Getting out of your ruts

This post is the third (and last) part of my reflections on a conversation I had with Chelsea Troy about our testing and development heuristics.

I asked Chelsea, “How do you get people to be less clingy about holding on to tests that don’t add value?”

She speculates that this will require getting people to unpack the psychological baggage they hold around the value of their tests or code. Once people have written tests or code, they don’t want to get rid of them because that is the primary visible evidence that they’ve done something. Developers rarely get recognized for removing unused code or throwing away brittle or inconsequential tests.

So how can you turn this attitude around? Let’s be blunt. Developers do lots of things to improve their ability to write and maintain code. They shouldn’t be penalized for these activities. Professional software development isn’t just cranking out code. 

A Heuristic for Switching Things Up

I learned this guiding heuristic from a colleague, John Schwartz: If something you are doing isn’t buying you any new or useful information, stop doing it. Instead, move on to something that will. And, if that doesn’t work, try something else. Don’t settle.

John applied this heuristic to every aspect of software development and management. Applied to testing: If tests always pass (or frequently fail because of nitpicky things that don’t matter), throw them out. When you fix brittle UI tests, only to find that they fail with the next CSS style change, recognize that you aren’t making forward progress or learning anything new. Repeatedly fixing those brittle tests is simply busywork.

You are better off acknowledging that you are testing at the wrong (too low) level and that, to buy useful information, you should be testing differently.

So, what happens when you throw out those tests?

Arguably, you will have less clutter and useless information to wade through when something breaks your build. And if you do miss some of those tests, you can always bring them back. Or decide to run them periodically.

Letting Go

I have seen clients hold on to coding and testing practices that slow them down. At one client, developers frequently left unused code in comments so they wouldn’t forget about it, just in case it might be needed. When asked “Why not just use version control?” they worried that they would forget about the code. But how long had that code been frozen in comments? When’s the last time you thawed out some commented code and used it as is?

In another situation, I once reviewed some code for a function that had a parameter that was never used. At one time it had been used, but the code that relied on it was now obsolete. Every new team member struggled to understand the reason for that parameter and what the code that handled it did. I tried to convince them to remove this useless code. It would have taken less than 20 minutes. I’m not sure whether they did. But at least I got them to recognize that this dead code was an impediment. Before I pointed this out, they had just considered the time it took to understand that code to be an annoying rite of passage for new developers.

Documenting Your Actions

Chelsea is experimenting with ways to simplify and minimize code and tests. The more stuff you have to wade through, the harder it is to maintain and debug your code. Reducing extraneous stuff, whether it is overly complex code, unused code, or awkward tests, makes life easier. She is also trying to get her team to write commit messages that document these efforts. She wants to leave a documentation trail. Her heuristic: When you pare down code and tests, document both what has changed and why.

Using commit records as documentation won’t work well unless everyone follows conventions. But is it realistic to expect everyone to be as diligent as Chelsea at documenting their work? Developing software is a team sport. Unless we agree on how to work together, and then follow through on our agreements, results will be inconsistent. Even so, people still bend or disregard working agreements. I suspect there are several reasons for this: maybe they didn’t fully buy into the practice, or maybe they didn’t understand or appreciate the reasons behind what we agreed to, or…

When this happens, whatever the reasons, something needs to change. And we won’t know exactly what’s going on unless we talk to each other.

Chelsea suggests that we could benefit greatly from uncovering each other’s assumptions instead of simply letting things slide. How can you do this? Be direct: Ask people to think really hard and to openly share their thoughts and values. Then decide what actions to take.

Getting Maximal Value out of Documentation with Minimal Effort

Chelsea sees a similar problem with software documentation. No one wants to write it, but people do so because they think they should. Then they don’t bother updating existing docs, so documentation gets out of date and they end up with lots of inaccurate versions. I’ve seen this problem many times throughout my consulting career. Although I’m one of those rare individuals who actually likes to write documentation, I know that you can’t fix this problem by telling developers to just try harder. (Again, developers rarely get rewarded for writing documentation.)

One useful heuristic for mitigating your out-of-date documentation problem is to create “living” documentation (instead of static documentation that is written once and never updated). Cyrille Martraire has written a great book, Living Documentation, that contains many heuristics (written as patterns) and examples demonstrating code to create it. Living documentation is generated by executing scripts that extract facts from your codebase, running system, or repositories. Using that information, those scripts then dynamically generate or update your documentation. A central value underlying Cyrille’s heuristics is to make documentation integral to software development, rather than a separate activity. If you connect your documentation directly to your code, it will always be up-to-date with the latest code. He suggests starting small, then growing tooling and scripts as you find the need. The guiding heuristic behind all of Cyrille’s heuristics is: Don’t let software documentation become stale; proactively generate documentation with information that can be automatically refreshed.
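To make the idea a little more concrete, here is a minimal sketch in Python (not code from Cyrille’s book): a script that reads facts straight from a module, namely its public function signatures and the first line of each docstring, and renders them as Markdown. Rerunning it after every change keeps the summary in step with the code. The `generate_api_doc` function and the tiny `pricing` module are hypothetical, built here only for illustration.

```python
import inspect
import types


def generate_api_doc(module: types.ModuleType) -> str:
    """Render a Markdown summary of a module's public functions, read from the code itself."""
    lines = [f"# API for `{module.__name__}`", ""]
    # getmembers returns (name, value) pairs sorted by name.
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(func) or "(undocumented)"
        summary = doc.splitlines()[0]
        lines.append(f"- `{name}{inspect.signature(func)}`: {summary}")
    return "\n".join(lines)


# Hypothetical module, built in memory so the example is self-contained.
pricing = types.ModuleType("pricing")
exec(
    "def total(prices, discount=0.0):\n"
    '    """Sum prices and apply a fractional discount."""\n'
    "    return sum(prices) * (1 - discount)\n"
    "\n"
    "def _round_cents(x):\n"
    "    return round(x, 2)\n",
    pricing.__dict__,
)

print(generate_api_doc(pricing))
```

A script like this might run as a build step and write its output next to the hand-written docs. The design choice is that nothing in the summary is typed by hand, so nothing in it can silently drift from the code.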

Efforts to create living documentation, streamline software development, or make software codebases more sustainable are often undervalued. They shouldn’t be. My guiding heuristic: Cut out the crap so you can focus your development attention on the important stuff.

Our Heuristics are Shaped Through Experience

This post is part two of some reflections on a conversation I had with Chelsea Troy about our testing heuristics. You may also want to read part one and Chelsea’s writeup.

I shared with Chelsea how my Smalltalk development background contributed to my testing and design heuristics. I was involved in the early days of Smalltalk at Tektronix as a principal engineer in the AI Machines group. After a yearlong stint managing the software group through product introduction, I switched back to full-time engineering. Among other things, I added features to Smalltalk including color graphics, fonts, and support for low-level OS calls. All our code was visible to our users, and we had a strong engineering culture.

I learned how to work effectively in the Smalltalk environment by studying existing code, figuring out what it did, and understanding its coding and design style. I also observed more experienced Smalltalk programmers. Kent Beck, Ward Cunningham, and other Tek Labs folks were some of the very earliest Smalltalk application programmers. Ward and Kent worked together, developing prototypes and exploring what Smalltalk was good at. Many ideas about Extreme Programming (and TDD) and object design can be traced back to these programming experiences.

The Smalltalk image was always running. It contained the entire development environment and had a browser where you could look at existing code and add your own. Much of my time was spent experimenting with and reading existing code, then trying to fit my new code in. The code I wrote was a mix of new classes as well as extensions or modifications to existing ones. To show someone else how to use your code, you’d create a workspace (a scratchpad window) and put in snippets of commented code for them to read, edit, and evaluate. By convention, methods were categorized (see the third pane of the System Browser below, which shows the categories for the abstract Collection class). Many classes had a testing category, but it was not used for what you might think! The testing category for the class Collection included methods for querying (i.e., testing) its contents.

What a typical Smalltalk-80 system image looked like courtesy of https://randoc.wordpress.com/2018/07/20/tektronix-smalltalk-workstations-4400-and-4300-series/

So how did programmers test Smalltalk code? I didn’t have any conventions to follow for organizing my tests (and not inconsequentially, leaving test code around would clutter up the Smalltalk image). Since I could highlight code anywhere and execute it, I tested code as I wrote it. I could step through code with a debugger, change it on the fly, and run it again. I tested my code into existence, but didn’t leave around any tests.

In an article I wrote about Color Smalltalk, here’s how I described this experience: “… the workspace, lets programmers experiment with code without actually incorporating the experimental code into the valid, running environment. A programmer can write, execute and debug code in a workspace, then pull it into the Smalltalk application when the new code is tested and operational.”

While this statement is mostly true, it is also misleading. Anything I did as a programmer would add more objects to and change the state of the running Smalltalk image. Code you executed in a workspace changed the image (sometimes with catastrophic results, especially if you were tinkering with basic low-level system functionality as I was). But the Smalltalk environment and tools made it so easy to back up a step or two, revise your code, and try again, even with code that mucked with low-level stuff.

Kent’s Smalltalk experience heavily influenced how he thought about incremental development. But when it came to testing, I suspect he tried to boil down his Smalltalk experiences into practices that would be more “failsafe” for programmers who didn’t work in such a dynamic and forgiving development environment. Kent’s thinking about testing has evolved since he wrote his books. In an interview with Andrew Binstock in 2019, Kent and Andrew chat about this evolution:

Binstock: Do you still work on strictly a test-first basis?

Beck: No. Sometimes, yes.

Binstock: OK. Tell me how your thoughts have evolved on that. When I look at your book Extreme Programming Explained, there seems to be very little wiggle room in terms of that. Has your view changed?

Beck: Sure. So there’s a variable that I didn’t know existed at that time, which is really important for the trade-off about when automated testing is valuable. It is the half-life of the line of code. If you’re in exploration mode and you’re just trying to figure out what a program might do and most of your experiments are going to be failures and be deleted in a matter of hours or perhaps days, then most of the benefits of TDD don’t kick in, and it slows down the experimentation—a latency between “I wonder” and “I see.” You want that time to be as short as possible. If tests help you make that time shorter, fine, but often, they make the latency longer, and if the latency matters and the half-life of the line of code is short, then you shouldn’t write tests.

Binstock: Indeed, when exploring, if I run into errors, I may backtrack and write some tests just to get the code going where I think it’s supposed to go.

Beck: I learned there are lots of forms of feedback. Tests are just one form of feedback, and there are some really good things about them, but depending on the situation you’re in, there can also be some very substantial costs. Then you have to decide, is this one of these cases where the trade-off tips one way or the other? People want the rule, the one absolute rule, but that’s just sloppy thinking as far as I’m concerned.

Practicing TDD ensures developers write tests. The underlying value heuristic is, “any tests are better than no tests.” But if we take Kent’s more recent thoughts to heart, we shouldn’t test without thinking through some consequences. Kent’s more recent guiding heuristic: Test when it matters and when you need a safety net. Think through both the benefits and costs of testing. If you are exploring, don’t let testing slow you down.

There is no single “definitive” answer to the question, “when should I test?”

Develop Test Strategies Based on System Context

Chelsea asked, “So, how do you determine what kinds of tests to write?”

I don’t have a definitive answer to this question, either. So, I shared a few stories. I’ve worked with clients unschooled in TDD. They write code, test it a little, and then throw these initial tests away. They build successful products. The tests they tend to keep are regression tests, tests that demonstrate a quirky bug that has been fixed (and to ensure that it stays that way). It’s always a bet.

If code is stable, and the tests always pass, running tests all the time isn’t buying any new information. Even worse, passing tests can give you a false sense of security about your code’s quality. So why do we write tests?

I like to focus on writing tests that check that stable (relatively unchanging) system expectations still hold, and that demonstrate ways new capabilities can be safely added. I also try to write tests that capture expectations I have around my system’s behavior.

For complex systems, though, this can be difficult. Unforeseen side effects can pop up in strange places (changing code in one place unexpectedly causing other code to break in a distant part of the system). It’s impossible to test for every possible edge case and you don’t know all the dependencies.

I remember Kent Beck telling an oddball story about writing his first TDD code when he went to work at Facebook. His code, which passed all his tests, suddenly caused other tests for other parts of the system to fail. Rather than revert his code, those familiar with the system decided to throw out those failing tests. Seems weird, but they knew those tests were brittle and made some wrong assumptions. When you find problems with tests, think carefully about whether it is appropriate to add additional tests to ensure that things don’t break, whether your existing tests are brittle, or whether your assumptions are wrong.

Data Scientists Have Different Testing Values

When you need to process massive amounts of data, and the code that processes that data is predictable, there is little value in repeatedly running functional tests that always pass.

I worked for a number of years for a client doing healthcare analytics on patient medical data. Sometimes, their heuristic for verifying an algorithm was to test that the new code worked by comparing its results against code written in an entirely different system or programming language. They would take a massive cut of the data, run it through both, and compare the results.
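A sketch of what such a comparison harness could look like, assuming both implementations can be called from Python. (In the client’s case the reference came from an entirely different system, so its results would more likely be loaded from exported output.) The BMI calculation, the record fields, and `compare_implementations` are hypothetical stand-ins, not the client’s code.

```python
import math
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


def compare_implementations(
    reference: Callable[[Record], float],
    candidate: Callable[[Record], float],
    records: List[Record],
    rel_tol: float = 1e-9,
) -> List[Tuple[Record, float, float]]:
    """Run both implementations over the same data and collect every disagreement."""
    mismatches = []
    for record in records:
        expected = reference(record)
        actual = candidate(record)
        # Exact float equality is too strict when two systems compute the same value differently.
        if not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append((record, expected, actual))
    return mismatches


def bmi_reference(record: Record) -> float:
    """Hypothetical trusted implementation."""
    return record["weight_kg"] / record["height_m"] ** 2


def bmi_candidate(record: Record) -> float:
    """Hypothetical new implementation that works in centimetres internally."""
    height_cm = record["height_m"] * 100
    return record["weight_kg"] / (height_cm / 100) ** 2


sample = [
    {"id": 1, "weight_kg": 70.0, "height_m": 1.75},
    {"id": 2, "weight_kg": 55.5, "height_m": 1.62},
]
print(compare_implementations(bmi_reference, bmi_candidate, sample))  # → []
```

An empty list only says the two implementations agree. When they don’t, the harness hands over the disagreeing records, and someone still has to decide which side is right.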

Another heuristic they sometimes used to test new algorithms and capabilities was to run their code and compare their results against those reported in published research papers. Where the results differed, they needed to reason about those differences (sometimes it was a problem with their code; at other times, it was that their code was more accurate at choosing cohorts or their statistical algorithms were better). Someone needed to critically analyze the results, reason about why the discrepancies were there, and determine what, if anything, to do about them. This process couldn’t be automated.

Chelsea works with data scientists at Mozilla on sanitizing personal data for searches. The rules for this are complicated, language-specific, and sometimes people enter search terms in more than one language. She finds data scientists don’t share the same testing values as many software developers do.

Data scientists make informed assumptions about aggregated data. If those assumptions don’t hold, they reassess the data processing rules and revisit their assumptions. To them, testing is insufficient to ensure system quality. Monitoring actual system behavior against expected data characteristics, however, is critical. When the data characteristics being monitored fall outside of expected tolerances, this triggers developers to look into the situation. Developers then run some automated tests to determine if something is wrong with their code. If those automated tests pass, they then call on a data scientist to analyze a sample of the data and decide what to do. Something has changed, and there likely needs to be a change either to the assertions about the data’s characteristics or to the rules for handling it.
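One way such monitoring might look, sketched in Python under assumptions: the metrics (the share of empty search terms and the mean term length) and their tolerance ranges are hypothetical examples, not Mozilla’s actual checks. The function only flags a batch; deciding what a flag means is left to people.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Expectation:
    """An expected range for one characteristic of a batch of data."""
    metric: Callable[[Sequence[str]], float]
    low: float
    high: float


def out_of_tolerance(batch: Sequence[str], expectations: Dict[str, Expectation]) -> List[str]:
    """Return a human-readable alert for every metric outside its expected range."""
    if not batch:
        return ["batch is empty"]
    alerts = []
    for name, exp in expectations.items():
        value = exp.metric(batch)
        if not exp.low <= value <= exp.high:
            alerts.append(f"{name}={value:.3f} outside [{exp.low}, {exp.high}]")
    return alerts


# Hypothetical characteristics of a batch of search terms.
EXPECTATIONS = {
    "empty_fraction": Expectation(
        metric=lambda b: sum(1 for term in b if not term.strip()) / len(b),
        low=0.0,
        high=0.05,
    ),
    "mean_length": Expectation(
        metric=lambda b: fmean(len(term) for term in b),
        low=3.0,
        high=40.0,
    ),
}

batch = ["weather berlin", "météo paris", "", "tiempo madrid"]
for alert in out_of_tolerance(batch, EXPECTATIONS):
    print(alert)  # an alert is a cue to investigate, not an automatic verdict
```

Here one empty term out of four pushes `empty_fraction` past its hypothetical 5% ceiling, which would prompt developers to run their tests and, if those pass, bring in a data scientist.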

Trialing new Heuristics

Chelsea and I appreciate what we can learn from people with different backgrounds: data scientists, QA folks, testers, and new colleagues. There are many different ways to test and design software. And if I don’t hold onto my preferred heuristics too tightly, I might learn something.

But how do I decide when to try out some new heuristics or to stick with what I know?

If things are going well, I’m not as motivated to try out new ideas. I need a small nudge. I’ll try some new-to-me heuristics if I feel I have some wiggle room. Let me experiment, practice, and think through the consequences. Give me a bit of time to let new values and practices soak in.

When I start to work on a new system or with folks from different backgrounds, that, too, is another opportunity to try out new ways of working.

But under pressure, I find myself narrowing my focus and sticking with what I know best (even if it is a poor option). So, if I can, I catch myself and take a small step back from problem solving. I pause, take a breath, and ask: If my heuristics aren’t currently working for me, what are some options?

If I want to introduce a test-first TDDer to my testing approach, I might suggest a modest experiment: “Let’s work together on some design and coding problems and compare our two approaches. Let’s find out what tests we come up with following my test-driven development approach. Let’s try your test-first TDD on a similar problem and see what tests we come up with. Let’s see what we learn.”

At the very minimum, I hope we’d learn of our shared value: we both value tested code. We might learn from each other more about the kinds of tests we like to write. Or how many tests we think are needed. Or how we rework existing tests. We might share some heuristics for deciding what next test to write or what isn’t worth testing. Through experimentation and reflection, we can grow and learn from each other.

Testing, Testing…our Heuristics

We gather heuristics through storytelling and conversations

Recently Chelsea Troy and I chatted over Zoom about software testing heuristics. I met Chelsea last year at DDD Europe. In this and a couple of snack-sized posts, I will reflect on some highlights of our conversation. Chelsea has also written about our conversation.

A Leading Question Leads to Some Heuristics

I started by asking, “What is important about testing that people should get but don’t?”

Chelsea answered that, while Test Driven Development (TDD) is useful, it doesn’t solve all testing needs. If developers are oversold on the benefits of TDD, they can become jaded about testing in general. They shouldn’t. TDD doesn’t include specific practices that address resilience or reliability. But it is useful for developing and testing deterministic code.

Chelsea shared the experience of learning first-hand how TDD didn’t have all the answers to testing. She worked on a team of TDD enthusiasts developing a mobile app for a client. Although the team thought they knew how to develop quality software, their initial prototype, developed following TDD, didn’t address these challenging requirements: being usable under extreme weather conditions, having a simple UX, and functioning when only intermittently connected to the internet and their backend software. They needed to add more design and testing techniques to their toolbox, alongside TDD. Chelsea also said that she learned a lot about testing for these kinds of requirements from their client’s QA team.

Some heuristics we’ve touched on:

  • Use TDD to develop and test functionality of deterministic software.
  • Use other strategies to design and test for software system qualities such as usability, performance, reliability, or resilience.
  • Match your testing strategies and tactics to your application’s development and execution context.

A Brief Introduction to Heuristics

I have been intrigued by software development heuristics, ever since I read Billy Vaughn Koen’s Discussion of the Method: Conducting the Engineer’s approach to problem solving. Koen defines a heuristic as, “anything that provides a plausible aid or direction in the solution of a problem but is in the final analysis unjustified, incapable of justification, and potentially fallible.” Heuristics are never guaranteed. When a heuristic fails, you back up and try another one.

I enjoy hunting for heuristics while designing and coding with others. Open-ended conversations where we swap stories and reflect on our heuristics are another great opportunity. Generally, I look for three kinds of heuristics:

  1. Action heuristics. Things we do to solve our immediate problem. There are many action heuristics. Design patterns are one well-known form of action heuristic. We know these heuristics by name because authors took the time to write them up as named software patterns. But there are many testing and development techniques both smaller and larger than patterns. For example, in Test-Driven Development (TDD), the practice of “write a test, then write code to pass the test” is a heuristic for incrementally designing and implementing tested code.
  2. Value heuristics. Values motivate our actions. Underlying TDD is the value: Testing should be an integral part of design and coding.

Our values determine what actions seem appropriate. Because I value understandable code, I take several actions to make my code more comprehensible: I give methods, functions, and variables meaningful names; keep code in methods short; and write code at the same level of detail in a method, factoring out lower-level details into helper methods.

Values depend on context. As the context shifts, so do our values. This doesn’t mean we are fickle, just pragmatic. Most of the time we aren’t conscious of making these shifts. When cutting and pasting code from Stack Overflow, I don’t value code understandability so much as the ability to quickly determine whether that code addresses my current problem. If it does, then I rewrite that code to make it clearer and to fit the style of my existing codebase. In production code, I do value understandability.

  3. Guiding heuristics. Heuristics that lead to related actions. For example, Chelsea shared one guiding heuristic: Don’t treat test code the same as production code; instead, make each test understandable in isolation. This leads her to write self-contained test methods. She doesn’t like a test where she has to read the code that it calls on before she can understand the test. She also isn’t a fan of applying the DRY (Don’t Repeat Yourself) heuristic to test code.

Comparing competing heuristics

Chelsea mentioned that understandable tests can also serve as valuable design documentation and discovery tools. It’s easier to modify test code that is self-contained, rerun it, and explore how the software responds.

I asked Chelsea whether she would put aside her heuristic of keeping tests self-contained if there were compelling reasons. What if set up conditions for tests took a long time (for example, doing a cut of a database in order to build an in-memory cache of test data)? What if there was complex code that was repeated in similar tests but was slightly altered? Did someone make cut-and-paste-modify-and-reuse errors, or were there valid reasons for these differences?

Factoring common initialization code out of tests into common setup code provides a “standard” execution context for a suite of tests. It also makes it easier to vary that context and rerun the test suite. Factoring out code common to several tests and clearly labeling what it does eliminates having to second-guess reasons for slight variations in test code.
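As an illustration of the two competing structures, here is a sketch using Python’s standard `unittest` module. In the first class, `setUpClass` builds an expensive fixture once and shares it across a suite; the last test keeps everything it needs inline, in the self-contained style Chelsea prefers. `build_test_cache` is a hypothetical stand-in for something slow, such as loading a cut of a database.

```python
import unittest


def build_test_cache() -> dict:
    """Hypothetical stand-in for an expensive step, such as loading a cut of a database."""
    return {"alice": {"plan": "pro"}, "bob": {"plan": "free"}}


class CustomerCacheTests(unittest.TestCase):
    """Shared context: the costly fixture is built once and clearly labeled."""

    @classmethod
    def setUpClass(cls) -> None:
        # Runs once for the whole class, not before every test.
        cls.cache = build_test_cache()

    def test_pro_customer_is_found(self) -> None:
        self.assertEqual(self.cache["alice"]["plan"], "pro")

    def test_unknown_customer_is_absent(self) -> None:
        self.assertNotIn("carol", self.cache)


class SelfContainedTest(unittest.TestCase):
    """Self-contained style: everything the test depends on is visible in the test."""

    def test_free_plan_is_readable_in_isolation(self) -> None:
        cache = {"bob": {"plan": "free"}}
        self.assertEqual(cache["bob"]["plan"], "free")
```

Run it with `python -m unittest` against the file. One caution with the shared version: tests must treat the shared fixture as read-only, or one test’s changes can leak into another’s results.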

Depending on your situation and personal preferences, you may choose the heuristic, “Keep code in tests so you can understand and easily manipulate it,” or the other, “Factor out expensive or error prone code into common code shared by tests.” These heuristics compete with each other. Neither is better. They are simply alternative ways to structure your test code.

The Value of Knowing your Values

If people don’t know your values (and how they differ from their values), they may not understand why you prefer to work the way you do. For example, while I value testing, I don’t practice test-first development.

If you understand TDD to mean strictly writing tests before writing any code, your TDD heuristic is: begin by writing a small test, run it to prove that it fails, then write code to pass that test. Don’t add any more code than necessary to make the test pass. Do this repeatedly until you’ve fully implemented your code.

At the end of a TDD cycle, you have a bunch of tests and fully functioning code that passes those tests. Working this way, you typically implement a single class at a time. You test and implement lower-level functionality, then repeat the process to develop the code that uses that functionality. Your software tends to grow from the “bottom” up.

I value testing, but typically design and implement several classes that work together at the same time. Once I prove to myself that my overall design hangs together (through some sort of simulation), I implement it. When finished, I check in code for several classes along with tests that demonstrate their behavior. My code is tested, but I don’t leave around lots of low-level tests.

For example, I may use a strategy pattern to calculate charges for different items on an invoice. I would initially implement each individual strategy class and check that it worked as I expected. But I’d remove most if not all tests for those individual strategies once I proved to myself that they worked. Their code is simple enough to read at a glance. Once I get low level classes working (especially if they don’t retain any state), I don’t need to keep tests around to ensure that they work. Once implemented, they rarely change. If I do need to revise them, at that point I might reconsider my testing heuristics (and add some tests that reflect these changes). The valuable tests I tend to preserve are those that determine which strategy to use, how to add new kinds of strategies, and different ways to apply discounts and special pricing.
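A minimal Python sketch of that arrangement, with hypothetical names throughout (`ChargeCalculator`, `FlatRate`, `BulkDiscount`, `FreeSample`). The individual strategies are small enough to check once and read at a glance; the one test kept around pins down strategy selection and the ability to add a new kind of strategy, rather than the arithmetic inside each strategy.

```python
from decimal import Decimal
from typing import Dict, Protocol


class ChargeStrategy(Protocol):
    """Any object with a charge() method can act as a pricing strategy."""

    def charge(self, quantity: int, unit_price: Decimal) -> Decimal: ...


class FlatRate:
    """Simple enough to verify once by reading it."""

    def charge(self, quantity: int, unit_price: Decimal) -> Decimal:
        return quantity * unit_price


class BulkDiscount:
    """Applies a fractional discount once quantity reaches a threshold."""

    def __init__(self, threshold: int, discount: Decimal) -> None:
        self.threshold = threshold
        self.discount = discount

    def charge(self, quantity: int, unit_price: Decimal) -> Decimal:
        total = quantity * unit_price
        if quantity >= self.threshold:
            total *= Decimal(1) - self.discount
        return total


class ChargeCalculator:
    """Chooses a strategy per kind of invoice item, falling back to a default."""

    def __init__(self, default: ChargeStrategy) -> None:
        self._default = default
        self._by_kind: Dict[str, ChargeStrategy] = {}

    def register(self, kind: str, strategy: ChargeStrategy) -> None:
        self._by_kind[kind] = strategy

    def strategy_for(self, kind: str) -> ChargeStrategy:
        return self._by_kind.get(kind, self._default)

    def line_charge(self, kind: str, quantity: int, unit_price: Decimal) -> Decimal:
        return self.strategy_for(kind).charge(quantity, unit_price)


# The kind of test worth keeping: it covers selection and extension.
def test_selection_and_extension() -> None:
    calc = ChargeCalculator(default=FlatRate())
    calc.register("paper", BulkDiscount(threshold=10, discount=Decimal("0.10")))
    assert isinstance(calc.strategy_for("toner"), FlatRate)
    assert calc.line_charge("paper", 10, Decimal("5.00")) == Decimal("45.00")

    class FreeSample:  # a new kind of strategy, added without touching the calculator
        def charge(self, quantity: int, unit_price: Decimal) -> Decimal:
            return Decimal("0")

    calc.register("sample", FreeSample())
    assert calc.line_charge("sample", 3, Decimal("9.99")) == Decimal("0")


test_selection_and_extension()
```

`Decimal` is used instead of `float` so that money arithmetic stays exact; that is a common choice for invoices, though not something the original example specifies.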

Let’s contrast my testing heuristics with those of test-first TDDers.

We both share this value heuristic: Value code that has tests over code (even if it works) that doesn’t have tests.

Test-first TDDers apply this heuristic: Write tests as you incrementally design code. Interleave testing and coding, repeatedly. Start with the simplest test and the simplest implementation. Only implement enough functionality so that your latest test passes. Build functionality and tests in small increments; each increment moving you closer to your final tested design.

They also have this guiding heuristic: You produce a cleaner design if you write tests first before writing any code.

I don’t share that heuristic.

My heuristic for developing designed, tested code is: Consider the design of one or more classes working together to achieve some functionality. Model your design using some lightweight technique, such as CRC cards (Class-Responsibility-Collaborators) or whiteboard sketches. Once you know what each class’s responsibilities are and how they interact, then implement them. Write simple tests and debug as you implement, but remove them if they are low level (and other code that has tests exercises their functionality). Keep only a lean set of illustrative tests that demonstrate how the classes work together and ensure that your design will continue to function properly.

At the end of my design/development cycle, I may write additional tests, revise existing ones, or remove insignificant tests. I use this grooming and cleanup step, before committing my code, as one way to double check my work.

Chelsea summarized my TDD heuristic as: Put tests in at the right level of abstraction once you know what your design is about.

Chelsea cautions, however, that if you don’t know what the right level of abstraction is and you follow test-first TDD heuristics by rote, you end up with tests at too low a level. Also, if you don’t have heuristics for pruning them, you end up with too many.

I view most testing I do while I implement my design as temporary scaffolding. Since I’ve already sketched out design ideas before coding, tests are not my primary tool for design. I test to verify my design. If I need to adjust my design as I implement it (and I expect to), that’s OK. I keep tweaking it and my code, and continue testing.

I suspect the biggest difference in our two approaches is that test-first TDDers don’t view their tests as temporary scaffolding, and I don’t view the cycle of test-first TDD as the only (or best) way to understand what a design should be. We both value tested, well-designed code.

Bringing to light the different values that underlie competing heuristics can be illuminating. But how can we get others to appreciate and try out our heuristics? How can we approach new-to-us heuristics with an open mind? I’ll touch on these topics and more in my next post.

2021 Year End Review

Kapa’a, Hawaii photo by Rebecca

Here’s a quick recap of blog posts I wrote in 2021.

Agile Experience Reports

Juggling Multiple Scrum Teams. I introduce Iuri Ilatanski’s experience report about life as a multi-tasking Scrum Master. Juggling involves meeting each team’s specific needs. I was Iuri’s “shepherd” (his sounding board and advocate) as he wrote this report, presented at Agile 2021. Thank you, Iuri, for being so open to discussion, reflection, and the hard work of revising your writing.

Agile Experience Reports: A Fresh Look at Timeless Content. I spent August organizing the vast Agile Alliance experience reports collection hosted on its website. The collection includes reports from 2014 to 2021 as well as from five XP conferences. Experience reports are personal stories that pack a punch. There are many gems of wisdom here.

Domain Driven Design

Splitting a Domain Across Multiple Bounded Contexts. Sometimes it can be more productive to meet the specific needs of individual users rather than to spend the time designing common abstractions in support of a “unified” model.

Design and Reality. We shouldn’t assume domain experts have all the language they need to describe their problem (and that all you need to do as a software designer is to “capture” that language and make those real-world concepts evident in your code).

Models and Metaphors. Listening to the language people use in modeling discussions can lead to new insights. Sometimes we find metaphors that, when pushed on, lead to a clearer understanding of the problem and clarity in our design.

Decision Making

Noisy Decisions After reading Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein, I wrote about noisy decisions in the context of software design and architecture. These authors define noise as undesirable variability in human judgment. Often, we want to reduce noise, and there are ways we can do so, even in the context of software.

Is it Noise or Euphony? At other times, however, we desire variability in judgments. In these situations variability isn’t noise, but instead an opportunity for euphony. And if you leverage that variability, you just might turn up some unexpected, positive results.

Heuristics Revisited

Too Much Salt? We build a more powerful heuristic toolkit when we learn the reasons why (and when) particular heuristics work the way they do. I now think it is equally important to seek the why behind what you are doing as you cultivate your personal heuristics.

Too Much Salt?

Practiced speakers and writers know that good examples rarely tell the whole story. Instead they shape their narratives to make the big ideas stand out. Stories are bent ever so slightly, plot details are pared down, leaving space for emphasis and audience impact.

I wouldn’t go so far as to say we invent fiction, but rather that we simplify our stories to make them compelling. Too many details and our audience would tune us out. And when we repeatedly tell these stories, we come to believe we’ve pared down the narrative to its essence. We’ve nailed it!

But what happens when you encounter information that sheds new light on such a story? What if the story you’ve told no longer rings quite true?

The past few years I’ve explored Billy Vaughn Koen’s definition of heuristics as they relate to software design and architecture. I’ve written blog posts and essays, presented talks, keynotes, and workshops about heuristics (for a gentle introduction to different kinds of heuristics see Growing Your Personal Design Heuristics Toolkit).

Along the way I’ve encouraged people to discover, distill, and own their personal heuristics. I advise them not to treat every bit of advice they find about software design as authoritative. Instead, they should question whether that advice applies to their specific context. They should also bring the heuristics they’ve accrued through experience to bear on the problem at hand.

I start most heuristics presentations with a story about my experience cooking my very first Blue Apron recipe for Za’atar Roasted Broccoli Salad (for details see Nothing Ever Goes Exactly by the Book). I jokingly point out all the places that the recipe suggests adding salt. I then postulate that if I just blindly followed Blue Apron instructions without applying any judgment, the dish would be way too salty.

Instead of following the recipe, I told how I used my past experiences to “modify” the instructions to fit with my understanding of what makes for a tasty dish. In short, I ignored lots of places where the recipe suggested adding salt.

My heuristic for this situation was to ignore advice on where to add salt if it seemed excessive and to add salt only to taste at the end. Following that heuristic, I most likely made a much blander dish that, while it looked great, lacked flavor.

But… achieving a tasty dish wasn’t the point of my original story!

Instead, it was to encourage using personal judgment and heuristics based on past experiences. I wanted to emphasize that we each have experiences and insights that we can and should draw on in many situations. Simply trusting and blindly following “experts” or “recipes” because they are published or credentialed can lead us astray—or to cooking inedible dishes. We should value and treasure our experiences and draw upon the heuristics we’ve accrued through those experiences.

Ta-da! Point made! Perhaps…

A week ago as I was waiting for surgery to repair my broken nose (that’s another story for another time) I started reading How to Taste, by Becky Selengut. At the time I was detached, slightly impatient, and resigned to just being there in the moment. The doctor was late and I had time to kill.

The introductory first chapter starts: “Telling you to ‘season to taste’ does nothing to teach you how to taste—and that is precisely the lofty goal of this book. Once you know the most common culprits when your dish is out of whack, you’ll save tons of time spinning your wheels grabbing for random solutions. You’ll start thinking like a chef. Some people are born knowing how to do this—they are few and far between and most likely have more Michelin stars than you or I; the rest of us need to be taught. I’ve got your back.”

Now that grabbed my attention!

Since I’m not superhuman, I can’t rely on instinct alone to become a better cook who knows when and how much seasoning or salt to add.

My experiences cooking have certainly been ad hoc. And the heuristic I applied for salting that Blue Apron dish came from who knows where. I never learned why I was doing what I was doing when I followed a recipe or ignored parts of it. Instead, I learned a few shortcuts and substitutions, largely by combing the internet. And while my technique may have improved over time, I haven’t developed the ability to craft a dish with nuanced flavors, let alone improvise one.

Becky suggests reading her book by “…start[ing] at the beginning, as I intend to build upon the concepts one puzzle piece at a time.” Each chapter presents fundamental facts, reinforced by a recipe that highlights the chapter’s important points, and then suggests Experiment Time activities intended to develop a reader’s palate.

Aha, again!

A good way to learn how to exercise judgment is to perform structured experiments after you’ve learned a bit of theory and why things—in this case, the chemistry of cooking—work the way they do.

I quickly read through the chapter on Salt and learned: Salt is a flavorant, bringing out the flavor of other ingredients. Salting early and often can improve taste dramatically. For example, adding salt to onions as they sauté can speed up the cooking process and cause them to sweat out water. And when you season a soup only at the end, no matter how much salt you add, the flavors of unsalted ingredients (for example, potatoes) fall flat. You end up oversalting the soup stock and still having tasteless, bland potatoes. Salt needs to be added at the right time, often at several steps in the cooking process, to have the desired result. And to my surprise, different kinds of salt (iodized, kosher, flaky, fine-grained sea salt) each have their own flavoring properties and ratios in recipes.

This brought to mind a whole new way of thinking about my Blue Apron cooking experience. Blue Apron didn’t have bad recipes, but their recipes didn’t make me a better cook either. This is because most recipes focus on the how, not the why. Their pretty little pictures and step-by-step instructions did nothing to help me understand how to make tasty dishes.

And that’s a problem if I want to get better at cooking tasty dishes and not simply at following recipes.

I’m afraid way too much information we absorb—whether it is about cooking or agile practices or software development—is presented as step-by-step lists of instructions, without any explanation of why it makes sense to do so or the consequences of not doing a particular step specifically as instructed.

Consequently, we learn a bunch of procedures, or simply cut and paste them. We follow instructions because somebody says this is what we should do. Over time we may build up a playbook of those procedures but our understanding of why these procedures work isn’t very deep or rich or adaptable.

If we want to truly gain proficiency in cooking (or software design or programming or running or gardening or basket weaving), instruction that emphasizes the why along with the how is what we need.

Teach me some facts that ground what I’m about to do in a bit of knowledge. Spark my curiosity. Inspire me. And then give me tasks that let me tinker and practice applying that knowledge. Only then will my actions become integrated with that knowledge, allowing me to build up adaptable heuristics that I can use in novel situations.

In hindsight, I now believe that the story I told about applying my personal heuristics and knowledge to a problem was OK. When you attempt a new task, it’s reasonable to be a healthy skeptic if someone says, “Just do as I say. Trust me.” Distilling your own heuristics from previous experiences and applying them in familiar situations is also good. And writing them down helps to bring them to your awareness.

But in addition, I now think it is equally important to seek the why behind what you are doing, and to loosen your grip on those simpler narratives you’ve held dear. They are not the whole story and they may be holding you back. Be open to new information that may reshape your stories and enhance your heuristic toolkit.

Perhaps one day, with enough knowledge and practice, I’ll be able to create a flavor profile for a dish instead of merely following the recipe.

Is it Noise or Euphony?

The book Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein has me thinking deeply about noisy decisions. In this context, noise is defined as undesirable variability in judgments. The authors explain two different kinds of noise: level noise (variability in the average level of judgments by different people) and pattern noise. Pattern noise is further broken down into the unique noise individuals bring into any decision and occasion noise (noise caused by the particular context surrounding particular decisions). Occasion noise can be influenced by our mood, the interactions with people we’re deciding with, what we ate for dinner last night, or even the weather.

So when is noise worth reducing? And what can we do to reduce that noise? How do we know our efforts at noise reduction have the desired effect?

Are there situations where variability might be desirable? I haven’t found a name in the literature for such desirable variation. Perhaps euphony—a harmonious succession of words having a pleasing sound—is one possibility. In these situations we’re favoring finding some euphony over conforming to a noise-free rigid standard for our judgments.

I’ll use the review of conference submissions of papers, talks, and workshops as an example of where both noise and euphony play a part in our decision-making, as it is one I am quite familiar with.

One major source of variability is when new reviewers join a review committee. Newcomers often look at submissions differently than experienced reviewers. But not all variability is noise. If some variability is welcomed, expected, and encouraged, the review process greatly benefits from fresh perspectives. This kind of variability adds spice.

And yet, there may be standards (whether formally written down or more loosely held) we’d like to uphold for what we consider a worthy submission. One way to reduce level noise in reviews is to ensure that reviewers understand these expectations. One way to convey this information is to hold a meeting where we discuss and present examples of submissions and exemplary reviews (reviews from prior years are a good source). Newcomers can learn what a reasonable proposal is and what is expected in a review. They also get to know their peers, ask questions and, in effect, “calibrate” their expectations for reviewing.

But this meeting is insufficient to remove another major source of noise—occasion noise caused by group interactions. Kahneman, Sibony, and Sunstein state: “Groups can go in all sorts of directions depending in part on factors that should be irrelevant. Who speaks first, who speaks last, who speaks with confidence, who is wearing black, who is seated next to whom, who smiles or frowns or gestures at the right moment—all these factors, and many more, affect outcomes.” Group dynamics introduce noise.

But there are several practical ways to further reduce the noise in group decisions. Oscar Nierstrasz wrote a set of patterns called Identify the Champion for reviewing academic papers. I encourage anyone running a conference to consider a review process along the lines of what Oscar introduces. I’ve adapted these patterns and process to non-academic conference reviewing with only a few minor tweaks.

The key ideas in these patterns are the roles of champion and detractor, and a structured process for discussing submissions. Champions are strong advocates for a submission who are prepared to discuss its merits; detractors disapprove of a submission and are prepared to discuss its weaknesses.

Submissions are discussed in groups according to their highest and lowest scores. Care is taken to identify proposals that received both very high and very low scores, and not to rank submissions numerically. If a submission has no champion, it isn’t discussed; it is rejected. Ranking submissions and then discussing them one by one in order would only add level noise (in fact, I find we get numbed by reviewing and tend to reject “lower ranked” submissions without enough consideration).

The review committee is asked to suspend final judgments until all championed submissions are presented. The champion is first invited to introduce the submission and explain why it should be accepted. Then detractors are invited to state their reasons. At the end of all presentations, discussion is opened to all and the committee tries to reach consensus.

In practice, following this discussion protocol, it is easy to accept outstanding submissions: they typically have plenty of champions. This leaves the bulk of our time to dig into the strengths and weaknesses of those championed submissions that have mixed reviews.
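The triage step is mechanical enough to sketch. What follows is a minimal Python illustration under stated assumptions, not Oscar’s patterns verbatim: it assumes a hypothetical four-level scale where A means “I will champion this” and D means “I will argue against it,” and the function name plan_discussion is made up. Submissions without an A are set aside undiscussed; the rest are grouped by their (highest, lowest) pair instead of being ranked. The order in which groups are taken up (unanimous champions first, then increasingly mixed reviews) is my own assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical four-level scale, best first:
# A = "I will champion this", B = "lean accept", C = "lean reject", D = "I will argue against it".
SCALE = "ABCD"


def plan_discussion(scores_by_submission: Dict[str, List[str]]):
    """Return (discussion groups in order, submissions rejected without discussion).

    Championed submissions are grouped by their (highest, lowest) score instead of
    being ranked; a submission with no "A" has no champion and is not discussed.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    rejected: List[str] = []
    for submission, scores in scores_by_submission.items():
        ordered = sorted(scores, key=SCALE.index)
        highest, lowest = ordered[0], ordered[-1]
        if highest != "A":
            rejected.append(submission)  # no champion: rejected, not discussed
            continue
        groups[(highest, lowest)].append(submission)
    # Assumed order: unanimous champions (quick accepts) first, then increasingly mixed reviews.
    order = sorted(groups, key=lambda key: SCALE.index(key[1]))
    return [(key, groups[key]) for key in order], rejected


groups, rejected = plan_discussion({
    "s1": ["A", "A"],
    "s2": ["A", "D", "B"],
    "s3": ["B", "C"],
    "s4": ["A", "B"],
})
print(groups)    # prints [(('A', 'A'), ['s1']), (('A', 'B'), ['s4']), (('A', 'D'), ['s2'])]
print(rejected)  # prints ['s3']
```

Within a group, submissions keep their original order; nothing in the sketch assigns them a numeric rank, which is the point of the protocol.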

The Identify the Champion process forces me to hit “pause” on my judgments and to not jump to premature conclusions. And the first thing we hear about any submission are its positive aspects. When detractors speak, I get a richer understanding of that submission. Although I might have had some initial impressions, I find they can and do change.

Sometimes I warm up to a submission. At other times, detractors’ perspectives grab my attention and make me revisit whether the submission is as strong as I had initially thought.

The cumulative weight of all this discussion has an even more profound effect. I find I am much more accepting of the outcome: what will happen will happen. Yes, there is unpredictability in this decision-making process. But we’re all trying to make reasonable decisions as a group. I end up actively engaged in making the outcome the best it can be and supportive of our collective decisions.

Although the Identify the Champion review process still has noise (it is hard to eliminate noise caused by group dynamics entirely), I believe it to be less noisy than most other review processes I’ve participated in.

One downside, however, is that it can be exhausting. To avoid having some occasion noise creep back in, it’s good to ensure that reviewers get sufficient breaks to meet their personal needs, and not get too tired or cranky or hungry.

One place I’ve applied my adaptation of the Identify the Champion pattern is for Agile Alliance experience report submissions. Experience report submissions are “pitches” for written experience reports. Only after a submission has been accepted does the actual writing begin. So as reviewers, we’re not only judging the topic of the pitch but also whether the submitter will be able to write a compelling report. Champions of experience reports also commit to shepherding the writing of the reports. These shepherd-champions commit to reviewing and commenting on drafts of reports as they are written over a period of several weeks. Now that’s real commitment! Frequently we have more championed submissions than room in the conference program. So our judgments come down to some difficult choices.

Before we hold our review meeting, we ask reviewers to give us two lists: submissions they’d like to shepherd and an optional list of submissions they’d like to see on the program (but do not want to shepherd). At our meeting, we then have a lively discussion where champions forcefully advocate for their proposals and gain others’ support. Once again, I find we spend most of our time discussing those submissions that have mixed reviews. But we also spend a lot of time listening to champions and then, as a group, making tradeoffs between submissions (remember, we have more good submissions than we have capacity to accept). The message we convey to all reviewers is that if you really want to shepherd a submission, we as a group will support your decision to be a shepherd-champion. But let’s discuss first.

We can’t guarantee the quality of any final report. We base our judgments on both what the submitters have written (in many cases, there has been a back and forth conversation between submitters and reviewers that we can all see that has led submitters to reshape and refine their proposals) as well as the convincing arguments of champions.

Judging conference submissions is subjective. Our process acknowledges that. We accept the risk of selecting a less-than-stellar report proposal over missing an opportunity for a novel or insightful report.

Is it our goal to eliminate noise in our decision-making? Where we can, yes. But that isn’t our only goal. If we tried to eliminate it entirely, we might end up establishing standards for experience report submissions that would inadvertently filter out newness or novelty. In our search for a bit of euphony, we stretch to accept a submission if there is a convincing champion. Consequently, we accept a little variability (and unpredictability) in our decision-making. However, at the end of our review process, reviewers are generally happy with the proposals we accept, happy with their shepherding assignments, and eager to begin working with their experience report authors. An important aspect of our process, which cannot be overstated, is that we also work hard to make good matches between each champion-shepherd and prospective authors. Not only do reviewers buy into the review process, they also commit to being ongoing champions.

Noise reduction is important in many situations, especially group decisions. Paying careful attention to how the group is informed, discusses, and then decides can reduce noise. Paying attention to the voices of champions is one way to turn up euphony. By tuning your decision-making processes you can achieve these goals.

Noisy Decisions

“The world is noisy and messy. You need to deal with the noise and uncertainty.”–Daphne Koller

I have tinnitus. When there isn’t much sound in my environment, for me it still isn’t quiet. I hear a constant background hum. It is hard to describe what this noise sounds like. I’ve lived with it for too long. Remembering back to when I first noticed it, I thought there was some nearby electrical device humming. Was it my phone plugged into the wall outlet? Or??? I remember getting up from bed to hunt for the source of that noise.

I can’t forget that noise or ignore it. It doesn’t go away. But it doesn’t dominate my headspace. I’ve learned to slip between that noise and my desire to sleep or to just be in a quiet place, and not let it distract me. I’ve learned to deal with tinnitus.

Recently I finished reading Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein.

The entire book is about the “noise” in human judgments and what we can do to lessen its effects. So what exactly is this noise? A simple definition: “noise” is undesirable variability in judgments. Call this system noise if you will.

Both recurring and one-time decisions are influenced by noise. The time of day, how well I slept last night, what others say, and even how we as a group decide how to decide all affect my judgment. This noise, in addition to any biases I have, affects all my judgments.

Kahneman, Sibony, and Sunstein introduce two different types of system noise: level noise and pattern noise. Let’s consider each in turn.

Level noise is easiest to understand. It is the variability in the average level of judgments by different people. People judge on different scales. Consider rating a talk at a conference. Perhaps you never give a conference speaker the highest possible rating because you believe they could do better. Or, if you are starstruck, maybe you always rate a presentation from a well-known speaker more highly. Personally, I know that I tend not to rate speakers either very high or very low, because, well…I’m sort of middling with my ratings. On average, humans aren’t average in their judgments.

The other kind of noise, pattern noise, is often an even bigger factor in our judgments. It comprises two parts: occasion noise and our own personal idiosyncratic tics. Occasion noise is the variability in judgment at different points in time. Depending on my mood, how stressful the situation is, how well I slept last night, or how the question is put to me, my judgment will vary. A simple example of occasion noise that software folks can relate to is estimating how long it will take to complete a task. My mind isn’t the same today as it was yesterday. Heck, from moment to moment, I might give a different answer simply because I am thinking about the task differently, or because I am hungry (and hence tend to come to a snap judgment), or I’m grumpy, or I’m happy.

The second source of pattern noise is our personal attitudes toward the particular judgment context. Consider, for example, this kind of noise when reviewing conference proposals for papers or talks. Some reviewers are harsher in their personal rating for some proposals and more lenient in others. This variability reflects a complex pattern in the individual attitudes of reviewers toward particular proposals. For example, one person may be relatively generous in their review of proposals on a particular topic. Another may be particularly keen on proposals that seem to break new ground but be a harsher judge of proposals on topics that are perceived to cover familiar territory.
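To make level noise and pattern noise concrete, here is a small Python sketch over an invented grid of conference ratings (the reviewers, submissions, and scores are made up). It follows the decomposition Kahneman, Sibony, and Sunstein describe, in which the square of system noise equals the square of level noise plus the square of pattern noise. It assumes every reviewer rates every submission exactly once; with only one rating per pair, the pattern term lumps a reviewer’s stable quirks together with occasion noise, since separating the two would require the same reviewer to rate the same submission more than once.

```python
from statistics import mean, pstdev

# Invented ratings on a 1-5 scale: reviewer -> {submission: rating}.
# Assumes every reviewer rated every submission exactly once.
RATINGS = {
    "ana":   {"s1": 4, "s2": 5, "s3": 3},
    "ben":   {"s1": 2, "s2": 3, "s3": 1},
    "chris": {"s1": 3, "s2": 2, "s3": 4},
}


def _means(ratings):
    """Grand mean plus per-reviewer and per-submission averages."""
    reviewers = list(ratings)
    submissions = list(ratings[reviewers[0]])
    grand = mean(v for row in ratings.values() for v in row.values())
    by_reviewer = {p: mean(ratings[p].values()) for p in reviewers}
    by_submission = {s: mean(ratings[p][s] for p in reviewers) for s in submissions}
    return reviewers, submissions, grand, by_reviewer, by_submission


def level_noise(ratings):
    """Spread of reviewers' average ratings: some are harsh overall, some lenient."""
    _, _, _, by_reviewer, _ = _means(ratings)
    return pstdev(list(by_reviewer.values()))


def pattern_noise(ratings):
    """What remains after removing each reviewer's level and each submission's average:
    reviewer-by-submission quirks (plus occasion noise, which one rating cannot separate)."""
    reviewers, submissions, grand, by_reviewer, by_submission = _means(ratings)
    residuals = [
        ratings[p][s] - by_reviewer[p] - by_submission[s] + grand
        for p in reviewers for s in submissions
    ]
    return mean(r * r for r in residuals) ** 0.5


def system_noise(ratings):
    """Spread of ratings around each submission's average, pooled over submissions."""
    reviewers, submissions, _, _, by_submission = _means(ratings)
    deviations = [ratings[p][s] - by_submission[s] for p in reviewers for s in submissions]
    return mean(d * d for d in deviations) ** 0.5


# system_noise**2 equals level_noise**2 + pattern_noise**2 (up to floating-point rounding).
print(level_noise(RATINGS), pattern_noise(RATINGS), system_noise(RATINGS))
```

In this made-up grid, no single reviewer is consistently “wrong”; the noise comes from both the different overall levels (ana generous, ben harsh) and from chris’s idiosyncratic preference for s3 over s2.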

As Kahneman, Sibony, and Sunstein state: “Noise in individual judgment is bad enough. But group decision making adds another layer to the problem. Groups can go in all sorts of directions depending in part on factors that should be irrelevant. Who speaks first, who speaks last, who speaks with confidence, who is wearing black, who is seated next to whom, who smiles or frowns or gestures at the right moment—all these factors, and many more, affect outcomes.”

Having been part of many conference review teams, as well as on the receiving end of their reviews over the years, I find the dynamics of group decisions to be a particularly salient example of system noise.

The information about system noise in general and noise in group decision making can be rather depressing. If we humans are naturally wired to be imperfect and flawed in our judgments, how can we hope to make reasonable decisions? And, once we become aware of our judgment errors, and try to be better decision-makers, the actions required to lessen the effects of noise these authors suggest seem surprisingly difficult to carry out.

Awareness is a good first step. But it’s not enough. In contrast to tinnitus, of which I’m constantly aware, the noise in our judgments is at first difficult to perceive. But once you become aware of sources of system noise, you start noticing them everywhere. How and when (and even whether) it is appropriate or feasible to mitigate these sources of noise is a topic for another day.

Design (Un)Certainty and Decision Trees

Billy Vaughn Koen, in The Discussion of The Method: Conducting the Engineer’s Approach to Problem Solving, says, “the absolute value of a heuristic is not established by conflict, but depends upon its usefulness in a specific context.”

Heuristics often compete and conflict with each other. Frequently I use examples I find in presentations or blog posts to illustrate this point.

For example, I took this photo of a presentation by Michiel Overeem at DDD Europe where he surveyed various approaches people employed to update event stores and their event schemas.

Five different alternatives for updating event stores



Event stores are often (though not exclusively) used in event-sourced architectures, where, instead of storing data in traditional databases, changes to application state are stored as a sequence of events in event stores. For an introduction to event-sourced architectures, see Martin Fowler’s article. An event store is composed of a collection of uniquely identified event streams, which contain collections of events. Events are typed, uniquely identified, and contain a set of attribute/value pairs.
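The structure just described can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production event store or any particular product’s API: the names Event, EventStore, and replay, and the shopping-cart events, are all hypothetical. Streams are append-only lists keyed by an identifier; events carry a type, a unique id, and attribute/value pairs; and current state is derived by replaying a stream.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class Event:
    event_type: str                      # events are typed...
    data: Dict[str, Any]                 # ...carry attribute/value pairs...
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # ...and are uniquely identified


class EventStore:
    """In-memory, append-only store: stream id -> ordered list of events."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}

    def append(self, stream_id: str, event: Event) -> None:
        self._streams.setdefault(stream_id, []).append(event)

    def read(self, stream_id: str) -> List[Event]:
        return list(self._streams.get(stream_id, []))  # a copy, so callers cannot rewrite history


def replay(events: List[Event], apply: Callable[[S, Event], S], initial: S) -> S:
    """Derive current state by folding each event into the state, oldest first."""
    state = initial
    for event in events:
        state = apply(state, event)
    return state


# Usage: a cart's total is never stored directly; it is recomputed from its events.
def apply_cart(total: int, event: Event) -> int:
    sign = 1 if event.event_type == "ItemAdded" else -1
    return total + sign * event.data["price"]


store = EventStore()
store.append("cart-42", Event("ItemAdded", {"sku": "tea", "price": 4}))
store.append("cart-42", Event("ItemAdded", {"sku": "mug", "price": 9}))
store.append("cart-42", Event("ItemRemoved", {"sku": "tea", "price": 4}))
print(replay(store.read("cart-42"), apply_cart, 0))  # prints 9
```

Because stored events are the source of truth, any change to their schema has to reckon with every event already written, which is what makes the update approaches below interesting.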

Michiel found five different ways designers addressed schema updates, each with different tradeoffs and constraints. The numbers across the top of the slide indicate how many of the 24 people surveyed used each approach. Several used more than one approach (the x/24 counts across the top of the slide add up to more than 24). Simply because you successfully updated your event store using one approach doesn’t mean you must do it the same way the next time. It is up to us as designers to sort things out and decide what to do next based on the current context. The nature of the schema change, the amount of data to be updated, the sensitivity of that data, and the amount of traffic that runs through apps that use the data all play into deciding which approach or approaches to take.

The “Weak schema” approach allows for additional data to be passed in an event record. The “upcaster” approach transforms incoming event data into the new format. The “copy transform” approach makes a copy and then updates that copy. Michiel found that these were the most common. “Versioned events” and “In-Place” updates were infrequently applied.
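Here is a rough Python sketch of how the three most common approaches differ, using a made-up change in which a name attribute is split into first and last. Events are plain dicts, and the “v” version field, the function names, and the sample data are assumptions for illustration; real event stores and frameworks implement these ideas in their own ways. The upcaster converts old events as they are read and leaves storage alone; the weak-schema reader takes only what it needs, ignoring extra attributes and defaulting missing ones; copy-transform writes converted copies into a new store and leaves the old one intact.

```python
from typing import Dict, List

Event = Dict[str, object]  # hypothetical: events as plain attribute/value dicts


def upcast(event: Event) -> Event:
    """Upcaster: convert a v1 event to the v2 shape at read time; storage is untouched."""
    if event.get("v", 1) == 1:
        first, _, last = str(event["name"]).partition(" ")
        return {"type": event["type"], "v": 2, "first": first, "last": last}
    return event


def read_customer(event: Event) -> Dict[str, object]:
    """Weak schema: pick out known attributes, ignore extras, default anything missing."""
    return {
        "first": event.get("first", ""),
        "last": event.get("last", ""),
        "tier": event.get("tier", "standard"),
    }


def copy_transform(store: List[Event]) -> List[Event]:
    """Copy-transform: build a new store of converted copies; the old store is not modified."""
    return [upcast(dict(event)) for event in store]


old_store: List[Event] = [
    {"type": "CustomerRegistered", "v": 1, "name": "Ada Lovelace"},
    {"type": "CustomerRegistered", "v": 2, "first": "Alan", "last": "Turing", "tier": "gold"},
]

customers = [read_customer(upcast(e)) for e in old_store]
new_store = copy_transform(old_store)
print(customers)
print(new_store[0])
```

An in-place update would instead overwrite the entries of old_store directly: simpler to run, but the original events are gone afterward.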

I always want to know more about what drives designers to choose one heuristic over another. So I was happy to read the research paper Michiel and colleagues wrote for SANER 2017 (the International Conference on Software Analysis, Evolution and Reengineering), appropriately titled The Dark Side of Event Sourcing: Managing Data Conversion. Their paper gives a fuller treatment of options for updating event stores and their data schemas. In the paper they characterized updates as being either basic (simple) or complex. Updates could be made to events, event streams, or the event store itself.

Basic event updates included adding or deleting an attribute, or changing the name or value of an attribute. Complex event updates included merging or splitting attributes.

Basic stream update operations were adding, deleting, or renaming events. Complex stream updates involved merging or splitting events, or moving attributes between events.

Basic store updates were adding, deleting or renaming a stream. Complex store updates were merging and splitting streams or moving an event from one stream to another.

An example of a simple event update might be deleting a discountCode attribute. An example of a complex event update might be splitting an address attribute into various subparts (street, number, etc.).
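The two examples can be sketched side by side in Python. The event shape, the function names, and the assumed “number street, city” address format are hypothetical. Deleting discountCode is a mechanical change; splitting address needs parsing rules that encode domain knowledge, and any stored address that doesn’t fit those rules will convert badly, which hints at why complex updates are harder.

```python
from typing import Dict

Event = Dict[str, str]  # hypothetical: attribute/value pairs


def delete_attribute(event: Event, name: str) -> Event:
    """Basic event update: a copy of the event without one attribute."""
    return {key: value for key, value in event.items() if key != name}


def split_address(event: Event) -> Event:
    """Complex event update: replace one attribute with several subparts.

    Assumes addresses look like "<number> <street>, <city>"; real data rarely cooperates.
    """
    street_part, _, city = event["address"].partition(", ")
    number, _, street = street_part.partition(" ")
    updated = delete_attribute(event, "address")
    updated.update({"number": number, "street": street, "city": city})
    return updated


order = {"type": "OrderPlaced", "discountCode": "SPRING10", "address": "221 Baker Street, London"}
print(delete_attribute(order, "discountCode"))
print(split_address(order))
```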

Importantly, their paper offered a decision framework for selecting an appropriate update strategy which walks through a set of decisions leading to approaches for updating the data, the applications that use that data, or both. I’ve recreated the decision tree they used to summarize their framework below:

Decision Tree for Upgrading an Event Store System from The Dark Side of Event Sourcing: Managing Data Conversion



The authors also tested their decision framework with three experts. Those experts noted that, in their experience, complex schema updates were rare. Hmm, does that mean that half of the decision tree isn’t very useful?

The experts also suggested that, instead of updating an event schema, they could use compensating techniques to accomplish similar results. For example, they might write code to create a new projection, used by queries, that contains an event attribute split into subparts; there is no need to update the event itself. Should the left half of their decision tree be expanded to include other compensating techniques (such as rewriting projection code logic)? Perhaps.
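A projection-based workaround might look like the following Python sketch. The CustomerMoved event, its address format (“street, city”), and the function names are hypothetical. The stored events keep their single address attribute; only the read model used by queries splits it, so changing how the split works means rebuilding the projection rather than converting the event store.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]


def project_current_addresses(events: Iterable[Event]) -> Dict[str, Dict[str, str]]:
    """Read model: each customer's latest address, split into street and city.

    The events themselves are never rewritten; the split lives only here.
    """
    view: Dict[str, Dict[str, str]] = {}
    for event in events:
        if event["type"] == "CustomerMoved":
            street, _, city = event["address"].partition(", ")
            view[event["customer"]] = {"street": street, "city": city}  # later moves win
    return view


def customers_in(view: Dict[str, Dict[str, str]], city: str) -> List[str]:
    """Query served by the projection: customers whose current address is in `city`."""
    return sorted(c for c, addr in view.items() if addr["city"] == city)


events = [
    {"type": "CustomerMoved", "customer": "c1", "address": "1 Elm St, Springfield"},
    {"type": "CustomerMoved", "customer": "c2", "address": "9 Oak Ave, Shelbyville"},
    {"type": "CustomerMoved", "customer": "c1", "address": "5 Pine Rd, Shelbyville"},
]
view = project_current_addresses(events)
print(customers_in(view, "Shelbyville"))  # prints ['c1', 'c2']
```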

Also, the experts preferred copy or in-place transformations over approaches that involved writing and maintaining lots of conversion code. I wonder: did they do this even for “basic” updates? If so, what led them to make this decision?

Decision trees are a good way to capture design options. But they can also be deceptive about their coverage of the problem/solution space. A decision table is considered balanced or complete if it includes every possible combination of input variables. We programmers are used to writing precise, balanced decision structures in our code based on a complete set of parameters. But the “variables” that go into deciding which event schema update approach to take aren’t so well-defined.
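Checking a decision table for completeness is mechanical once its input variables are fixed, as this Python sketch shows. The condition names, their values, and the approach assigned to each row are invented placeholders; they are not the framework from Michiel’s paper and not recommendations.

```python
from itertools import product
from typing import Dict, List, Tuple

# Invented condition variables and values; a real decision needs many more factors.
CONDITIONS: Dict[str, List[object]] = {
    "update_kind": ["basic", "complex"],
    "data_volume": ["small", "large"],
    "zero_downtime": [True, False],
}

# Invented, deliberately incomplete table: condition values -> approach.
# The approach assigned to each row is a placeholder, not a recommendation.
TABLE: Dict[Tuple[object, ...], str] = {
    ("basic", "small", True): "weak schema",
    ("basic", "small", False): "in-place",
    ("basic", "large", True): "upcaster",
    ("basic", "large", False): "copy-transform",
    ("complex", "small", False): "copy-transform",
}


def missing_rules(conditions, table):
    """List every combination of condition values the table does not cover.

    An empty result means the table is complete (balanced) for these conditions.
    """
    return [combo for combo in product(*conditions.values()) if combo not in table]


for combo in missing_rules(CONDITIONS, TABLE):
    print("no rule for", dict(zip(CONDITIONS, combo)))
```

The check can only report gaps among the variables someone thought to list. It is silent about factors such as data sensitivity or traffic volume that never made it into CONDITIONS, which is the harder problem with the schema-update decision tree.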

To me, initially this decision framework seemed comprehensive. But on further reflection, I believe it could benefit from reflecting the heuristics used by more designers and maintainers of production event-sourced systems. While this framework was a good first cut, the decision tree for schema updates needs more details (and rework) to capture more factors that go into schema update design. I don’t know of an easy way to capture these heuristics except through interviews, conversations, and observing what people actually do (rather than what they say they prefer).

For example, none of the experts preferred event record versioning, and yet, if it were done carefully, maybe event versions would require less maintenance. Is there ever a good time for “converting” an old event version to a newer one? And, if you need to delete an attribute on an event because it is no longer available or relevant, what are the implications of deprecating, rather than deleting it? Should it always be the event consumers’ responsibility to infer values for missing (deleted) attributes? Or is it ever advantageous to create and pass along default values for deleted attributes?

This led me to consider the myriad heuristics that go into making any data schema change (not just for event records). Under what conditions is it better to make a quick, cheap, easy-to-implement update to a schema, for example by deciding to pack on additional attributes and adopting a policy of never deleting any?

People in the database world have written much about gritty schema update approaches. While event stores are a relatively new technology, heuristics for data migration are not. But to benefit from these insights, we need to reach back in time, recover these heuristics, and repurpose them for event stores. Those who build production event-sourced architectures have likely done this, or else they’ve had to learn some practical data schema evolution heuristics through experience (and trial and error). As Billy Vaughn Koen states, “If this context changes, the heuristic may become uninteresting and disappear from view awaiting, perhaps, a change of fortune in the future.” He wasn’t talking about software architecture and design heuristics per se, but he could have been.

Reconciling New Design Approaches with What You Already Know


Change
Last week at the deliver:Agile conference in Nashville I attended a talk by Autumn Crossman explaining the benefits of functional programming to us old-timey object-oriented designers. I also attended the session led by Declan Whelan and Sean Button on “Overcoming dys-functional programming. Leverage & transcend years of OO know-how with FP.”

The implication in both talks was that although objects have strengths, they are often abused and not powerful enough for some of today’s problems. And that now is an opportune time for us OO designers to make some changes to our preferred ways of working.

Yet I find myself asking: when should I step away from what I’ve been doing and know how to do well and step into a totally new design approach?

No doubt, functional programming is becoming more popular. But objects aren’t going away, either.

Pure functional solutions offer real benefits for certain design problems. Pure functional programming solutions don’t have side effects. By designing small, single-purpose functions that operate over immutable data, you make stream-processing steps easily composable. You are still transforming data; it just isn’t mutated in place. In OO terms, you aren’t changing the internal state of objects; you are creating new objects with different internal state. By using map and reduce, you can avoid loop, iteration, and end-condition programming errors (letting powerful functions handle those details). There’s no need to define loop variables and counters. This is already familiar to Smalltalk programmers via the do:, collect:, select:, and inject:into: methods, which operate on collections (Ruby has its equivalents, too). And by operating on immutable data, multi-threading and parallelization get easier.
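Here is a minimal sketch of that style in Python, assuming a hypothetical order-line record. filter, map, and functools.reduce play roughly the roles of Smalltalk’s select:, collect:, and inject:into:, and NamedTuple’s _replace builds a new record instead of modifying the old one.

```python
from functools import reduce
from typing import NamedTuple, Tuple


class LineItem(NamedTuple):
    """An immutable record; its fields cannot be reassigned."""
    sku: str
    quantity: int
    unit_price_cents: int


# Small, single-purpose functions with no side effects.
def has_quantity(item: LineItem) -> bool:
    return item.quantity > 0


def line_total(item: LineItem) -> int:
    return item.quantity * item.unit_price_cents


def add(a: int, b: int) -> int:
    return a + b


def order_total(items: Tuple[LineItem, ...]) -> int:
    # filter -> map -> reduce: no loop counters, no end conditions, no mutation.
    return reduce(add, map(line_total, filter(has_quantity, items)), 0)


def discounted(item: LineItem, percent: int) -> LineItem:
    # A new LineItem with a different price; the original is untouched.
    return item._replace(unit_price_cents=item.unit_price_cents * (100 - percent) // 100)


items = (LineItem("pen", 3, 150), LineItem("pad", 0, 400), LineItem("ink", 2, 1000))
sale = tuple(discounted(i, 10) for i in items)
print(order_total(items))  # 2450 (3 * 150 + 2 * 1000)
print(order_total(sale))   # 2205 (3 * 135 + 2 * 900)
```

The original tuple of items still holds the undiscounted prices after the sale prices are computed, which is what makes steps like these safe to compose or run in parallel.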

I get that.

But I can create immutable data using OO technology, too. Ever hear of the Value Object pattern? Long ago I learned to create designs that included both stateful and immutable objects. And to understand when it is appropriate to do so. I discovered and tweaked my heuristics for when it made sense to stream over immutable data and when to modify data in place. But in complex systems (or when you are new to libraries) it can be difficult to suss out what others are doing (or in the case of libraries, what they are forcing you to do).
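A minimal Python sketch of that mix, assuming hypothetical Money and Account classes: a frozen dataclass serves as the Value Object, while the account is an ordinary stateful object whose balance reference is replaced rather than its Money mutated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Money:
    """A Value Object: compared by value, with no mutable state."""
    amount_cents: int
    currency: str

    def plus(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        # Return a new Money; self is never modified.
        return replace(self, amount_cents=self.amount_cents + other.amount_cents)


class Account:
    """A stateful entity that holds immutable values."""

    def __init__(self, owner: str, balance: Money) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: Money) -> None:
        # The account's state changes, but no Money value ever does.
        self.balance = self.balance.plus(amount)


acct = Account("sam", Money(500, "USD"))
start = acct.balance
acct.deposit(Money(250, "USD"))
print(acct.balance)  # Money(amount_cents=750, currency='USD')
print(start)         # Money(amount_cents=500, currency='USD')
```

Attempting to assign to a field of a frozen dataclass raises an error, so the immutability is enforced rather than merely conventional.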

But that’s not the point, really. The point is that once you understand how to use a technique, you learn, as you gain proficiency, when and where to exploit it.

But is pure functional programming really, finally the panacea we’ve all been looking for? Or is it just another powerful tool in our toolkit? How powerful is it? Where is it best applied?

I’m still working through my answers to these questions. My answers will most likely differ from yours (after all, your design context and experience are different). That’s OK.

Whenever we encounter new approaches we need to reconcile them with our current preferred ways of designing. We may find ourselves going against the grain of popular trends or going with the flow. Whatever. We shouldn’t be afraid of trying something new.

Yet we also shouldn’t too easily discount and discard approaches that have worked in the past (and that still work under many conditions). Or, worse yet, we shouldn’t feel anxious that the expertise we’ve acquired is dated or that our expertise can’t be transferred to new technologies and design approaches. We can learn. We can adapt. And, yet, we don’t have to throw out everything we know in order to become proficient in other design approaches. But we do have to have an open mind.

We also shouldn’t be seduced by promises of “silver bullets.” Be aware that evangelists, enthusiasts, and entrepreneurs frequently oversell the utility of technologies. To get us to adopt something new, they often encourage us to discard what has worked for us in the past.

While I like some aspects of functional programming, I see the value in multi-paradigm programming languages. I’m not a purist. Recently I’ve written some machine learning algorithms in Python for some Coursera courses I’ve taken. During that exercise, I rediscovered that powerful libraries make up for the shortcomings and quirks of any programming language. I still think Python has its fair share of quirks.

And while some consider Python to support functional programming, it isn’t a pure functional language. It takes the middle ground, as one Stack Overflow writer observes:

“And it should be noted that the “FP-ness” of a language isn’t binary, it’s on a continuum. Python doesn’t have built in support for efficient manipulation of immutable structures as far as I know. That’s one large knock against it, as immutability can be considered a strong aspect of FP. It also doesn’t support tail-call optimization, which can be a problem when dealing with recursive solutions. As you mentioned though, it does have first-class functions, and built-in support for some idioms, such as map/comprehensions/generator expressions, reduce and lazy processing of data.”
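The idioms the quote mentions can be sketched in a few lines of standard-library Python. The names here are illustrative only. The last part assumes CPython, which does not eliminate tail calls, so even tail-recursive code grows the stack until it hits the recursion limit.

```python
import sys
from functools import reduce
from itertools import islice


def squares():
    """An infinite, lazily evaluated generator."""
    n = 0
    while True:
        yield n * n
        n += 1


# Generator expression: no values are computed until they are pulled.
even_squares = (s for s in squares() if s % 2 == 0)
first_five = list(islice(even_squares, 5))  # [0, 4, 16, 36, 64]

# First-class functions plus reduce.
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1)  # 24

# Immutable built-ins exist (tuple, frozenset), but "updating" one copies it.
t = (1, 2, 3)
t2 = t + (4,)  # a new tuple; t is unchanged


def count_down(n: int) -> int:
    # Tail-recursive in form, but each call still adds a stack frame.
    return 0 if n == 0 else count_down(n - 1)


try:
    count_down(sys.getrecursionlimit() + 10)
    hit_limit = False
except RecursionError:
    hit_limit = True
```

In practice, Python code that needs deep iteration is usually rewritten as a loop or a generator rather than relying on recursion.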

Python’s a multi-paradigm language with incredible support for matrix operations and a wealth of open source machine learning libraries.

I haven’t had an opportunity to dial up the knobs and solve larger design problems in a pure functional style. I hope to do so soon. My current thinking about a pure functional style of programming is that it works well for streaming over large volumes of data. I’m not sure how it helps support quirky, ever-changing business rules and lots of behavioral variations based on system state. Reconciling my “go to” design approaches with new ways of working takes some mental lifting and initial discomfort. But when I do take the time to explore new design approaches, I have no doubt that I’ll find some new heuristics, polish existing ones, and learn more about design in the process.

What we say versus what we do

I’ve been hunting design heuristics for a couple of years. I’ve had conversations with designers in order to draw out their “go to” heuristics. I’ve joined design and programming sessions with experienced designers and captured on-the-fly what we were doing. My goal is to learn ways to effectively find heuristics in the wild, distill them, and then share them broadly.

But lately, I’ve been thinking about how to deal with this puzzle: What people say they do isn’t what they really do.

Let me give you an example. I joined the Cucumber folks last summer for several remote mobbing sessions. One heuristic they shared with me was this:

Heuristic: the person who has the most to learn (or knows the least about how to solve the problem) should take on the role of driver.

In “classic” mob programming as initially described, the person who is the driver and has his or her hands on the keyboard follows the guidance of the navigators: other mobbers who ostensibly tell the driver what to do in order to make progress.

“In this “Driver/Navigator” pattern, the Navigator is doing the thinking about the direction we want to go, and then verbally describes and discusses the next steps of what the code must do. The Driver is translating the spoken English into code. In other words, all code written goes from the brain and mouth of the Navigator through the ears and hands of the Driver into the computer.”

What I observed the Cucumber mob doing was somewhat different. Sometimes the driver had an initial design idea and was keen to try it out. In this case, they often actively navigated and drove at the same time. Occasionally others would comment and offer advice, but mostly they just watched the design and implementation unfold. Sometimes that eager driver asked the others, “Should we try this now?” But instead of waiting an uncomfortable length of time for them to chime in, the driver often continued on without any discussion. And I don’t think that driver was asking a rhetorical question. They wanted feedback if someone had any.

At other times the driver would stop to collect their thoughts and force a discussion. In this case the driver became uncomfortable when they didn’t get enough feedback. And sometimes they took themselves out of the driver’s role, asking someone else to fill in. In short, while I observed that the driver was often in control of the wheel (and of forward progress), they didn’t overly dominate. Drivers rotated. Everyone got their turn. But how these switches happened was very dynamic.

In all fairness, the mob programming website did touch on drivers and their participation in discussions:

“The main work is Navigators “thinking , describing, discussing, and steering” what we are designing/developing. The coding done by the Driver is simply the mechanics of getting actual code into the computer. The Driver is also often involved in the discussions, but her main job is to translate the ideas into code.”

While the main job of the driver may be “mechanics,” the small, fast-moving Cucumber team didn’t insist that getting the code into the computer be the driver’s main function. Now mind you, I suspect being remote affected their style of communication. They also knew each other well and knew each other’s common design approaches and preferences.

So why did the Cucumber mob behave this way? Did they believe one way but consciously act in another way? Did they intentionally lie about their heuristics? Or were they deceiving themselves? Are people wired to explain what they do through some kind of distortion field? How often do people believe one thing (and hold it up as an ideal) but then choose alternative heuristics? If so, is this OK?

I’m not sure the team was aware that their ways of driving/navigating deviated from the conventional driver/navigator roles until I shared my observations with them. I suspect that when they first started mobbing they were more rigorous about following the “rules” for these roles. Over time they found their own ways of working. And so the heuristics they collectively use to decide what to do, what design approach to try next, and how they interact with each other are much more fluid and nuanced than the simple descriptions of drivers and navigators on the mob programming website. They don’t exactly go “by the book.” And I suspect their heuristics for how they work together are still evolving.

So how should I as a heuristics hunter reconcile my simple goal of distilling essential heuristics with the messy realities I find on the ground?

Should I plunge into a concerted effort to sort out and formulate more nuanced heuristics? The short answer is yes. While I want to find and record both general and more particular heuristics, I’m not inclined to sort them into tidy, neat categories. After all, as Billy Vaughn Koen says, there is more than one way to solve any design problem and more than one heuristic that can work. By recording these nuances, I hope to gain richer insights into the different conditions, cases, and situations that lead designers to choose them.

This still leaves me with one nagging question: How can I reconcile what people say they do and believe with what they actually do? My (current) approach is that as I distill heuristics I also describe the context where I find them. Should it bother me that designers don’t do as they say they do all the time? Probably not. After all, we’re wonderfully creative problem solvers. And there are always options.