An Architect’s Dilemma: Should I Rework or Exploit Legacy Architecture?

I recently spoke with an architect who has been tuning up a legacy system built out of a patchwork quilt of technologies. Because of its age and lack of common design approaches, the system is difficult to maintain. Error and event logs are written (many of them, in fact), but they are inconsistent and scattered. It is extremely hard to collect data from the system and troubleshoot it when things go wrong.

The architect has instigated many architectural improvements to this system, but the one that struck me as absolutely brilliant was not insisting that the system be reworked to use a single common logging mechanism. Instead, logs were redirected to a NoSQL database that could then be intelligently queried to troubleshoot problems as they arose.

Rather than dive in and “fix” legacy code to be consistent, this was a “splice and intelligently interpret” solution that had minimal impact on working code. Yet this fairly simple fix made the lives of those troubleshooting the system much easier. No longer did they have to dig through various logs by hand. They could stare and compare a stream of correlated event data.
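The post doesn’t describe the log formats or the database that was used, so what follows is only a minimal Python sketch of the “splice and intelligently interpret” idea under stated assumptions: two hypothetical legacy log formats, a shared correlation ID (here a transaction ID), and a plain in-memory list standing in for the NoSQL document store. Each raw line is normalized into one common document shape on the way in; troubleshooting then becomes a query rather than a hunt through files.

```python
import json
import re
from datetime import datetime
from typing import Optional

# Hypothetical legacy format A (an assumption for illustration):
#   "2024-01-05 10:15:02 ERROR [txn=42] Payment rejected"
FORMAT_A = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) "
    r"\[txn=(?P<corr>\w+)\] (?P<msg>.*)"
)


def normalize(line: str, source: str) -> Optional[dict]:
    """Turn one raw log line into a common document shape, or None if unrecognized."""
    m = FORMAT_A.match(line)
    if m:
        return {
            "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            "level": m["level"].upper(),
            "corr": m["corr"],
            "msg": m["msg"],
            "source": source,
        }
    # Hypothetical legacy format B: one JSON object per line with ts/level/corr/msg keys.
    try:
        rec = json.loads(line)
        return {
            "ts": datetime.fromisoformat(rec["ts"]),
            "level": str(rec["level"]).upper(),
            "corr": str(rec["corr"]),
            "msg": rec["msg"],
            "source": source,
        }
    except (ValueError, KeyError, TypeError):
        # Not JSON, not a JSON object, or missing keys: keep it out of the store.
        return None


def correlated_events(store: list, corr: str) -> list:
    """Return every stored event that shares a correlation ID, oldest first."""
    return sorted((d for d in store if d["corr"] == corr), key=lambda d: d["ts"])


# Usage: splice two legacy logs into one queryable store.
raw = [
    ("billing", "2024-01-05 10:15:02 ERROR [txn=42] Payment rejected"),
    ("gateway", '{"ts": "2024-01-05T10:15:01", "level": "warn", "corr": 42, "msg": "Slow upstream"}'),
    ("billing", "unparseable noise"),
]
store = [doc for doc in (normalize(line, src) for src, line in raw) if doc is not None]
for event in correlated_events(store, "42"):
    print(event["ts"], event["source"], event["level"], event["msg"])
```

In a real deployment the list would be replaced by an actual document store and the query by that store’s own query language; the design point is that the legacy code writing the logs is left untouched.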

Early in my career I was often frustrated by discrepancies in systems I worked on. I envisioned a better world where the design conventions were consistently followed. I took pride in cleaning up crufty code. And in the spirit of redesigning for that new, improved world, I’d fix any inconsistencies that were under my control.

At a large scale, my individual clean-up efforts would be entirely impractical. Complex software isn’t the byproduct of a single mind. Often, it simply isn’t practical to rework large systems to make things consistent. It is far easier to spot and fix system warts early in their life than later, after myriad cowpaths have been paved and initial good design ideas have become warped and obfuscated. Making significant changes in legacy systems requires skill, tenacity, and courage. But sometimes you can avoid making significant changes if you twist the way you think about the problem.

If your infrastructure causes problems, find ways to fix it. Better yet (and here’s the twist): find ways to avoid or exploit its limitations. Solving a problem by avoiding major rework is just as rewarding as cleaning up cruft, even if it leaves a poor design intact. Such fixes breathe life into systems that by all measures should have been scrapped long ago. Fashioning fixes that don’t force the core of a fragile architecture to be revised is a real engineering accomplishment. In an ideal world I’d like time to clean up crufty systems and make them better. But not if I can get significant improvement with far less effort. Engineering, after all, is the art of making intelligent tradeoffs.

Agile Architecture Myths #4 Because you are agile you can change your system fast!

Agile designers embrace change. But that doesn’t mean change is always easy. Some things are harder to change than others. So it is good to know how to explain this to impatient product stakeholders, program managers, or product owners when they ask you to handle a new requirement that to them appears to be easy but isn’t.

Joe Yoder and Brian Foote, of Big Ball of Mud fame, provide insights into ways systems can change without too much friction. They drew inspiration from Stewart Brand’s How Buildings Learn. Brand explains that buildings are made of components organized into shearing layers. He identifies six layers: the site, the structure, the skin, the services, the space plan, and the physical stuff in the building.

Each shearing layer has its own value and speed of change, or pace. According to Brand, buildings are able to adapt because faster-changing layers (e.g., the services and space plan) are purposefully designed so as not to be obstructed by slower-changing layers. If you design your building well, it is fairly easy to change the plumbing. Much easier than revising the foundation. And it is even easier to rearrange the furniture. Sometimes designers go to extra effort to make a component easier to change. For example, most conference centers are designed so that sliding panels form walls that allow inside space to be quickly modified.

Brand’s ideas shouldn’t be surprising to software developers who follow good design practices that enable us to adapt our software: keep systems modular, remove unnecessary dependencies between components, and hide implementation details behind stable interfaces.
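In code, the same idea might look like the following minimal Python sketch. The pricing domain and every name in it (`DiscountPolicy`, `PercentOff`, `order_total`) are hypothetical: a slow-changing core function depends only on a small, stable interface, while the faster-changing detail (how discounts are computed) lives behind that interface and can be replaced without touching the core.

```python
from dataclasses import dataclass
from typing import List, Protocol


class DiscountPolicy(Protocol):
    """Stable interface between a slow-changing layer and a fast-changing one."""

    def discount(self, subtotal: float) -> float: ...


class NoDiscount:
    def discount(self, subtotal: float) -> float:
        return 0.0


@dataclass
class PercentOff:
    percent: float  # a detail expected to change often, e.g. per promotion

    def discount(self, subtotal: float) -> float:
        return subtotal * self.percent / 100


def order_total(prices: List[float], policy: DiscountPolicy) -> float:
    """Slow-changing core: adding a new policy never requires editing this function."""
    subtotal = sum(prices)
    return subtotal - policy.discount(subtotal)


print(order_total([10.0, 30.0], NoDiscount()))    # 40.0
print(order_total([10.0, 30.0], PercentOff(10)))  # 36.0
```

A new kind of discount is a new class that satisfies the interface; the “foundation” (`order_total`) stays as it is.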

Foote and Yoder’s advice for avoiding tangled, hard-to-change software is to, “Factor your system so that artifacts that change at similar rates are together.” They also present a chart of typical layers in a software system and their rates of change.

Frequently, we are asked to support new functionality that requires us to make changes deep in our system. We are asked to tinker with the underlying (supposedly slower-changing) layers that the rest of our software relies upon. And often, we do achieve this miraculous feat of engineering because interfaces between layers were stable and performed adequately. We got away with tinkering with the foundations without serious disruption. But sometimes we aren’t so lucky. A new requirement might demand significantly more capabilities of our underlying layers. These types of changes require significant architectural rework. And no matter how agile we are, major rework requires more effort.

Because we are agile, we recognize that change is inevitable. But embracing change doesn’t make it easier, just expected. I’d be interested in hearing your thoughts about Foote and Yoder’s shearing layers and ways you’ve found to ease the pain of making significant software changes.

Landing Zone Targets: Precision, Specificity, and Wiggle Room

A landing zone is a set of criteria used to monitor and characterize the “releasability” of a product. Landing zones allow you to take product features and system qualities and trade them off against each other to determine what an acceptable product has to be. Almost always these tradeoffs have architectural implications. If you’ve done something similar in the past, the criteria you should use to define your landing zone may be obvious. But for first-time landing zone builders, I recommend you task someone who knows about the product to take a first cut at establishing landing zone criteria that are then reviewed and vetted by a small, informed group.

A business architect, product owner, or lead engineer might prepare a “proposed landing zone” of reasonable values for landing zone criteria that are questioned, challenged, and then reviewed by a small group. On one program I was involved with, the chief business architect made this initial cut. He was a former techno geek who knew his technical limits. More important, he had deep business knowledge, product vision, and had a keen sense about where to be precise and where there should be a lot of flexibility in the landing zone values.

Some transaction criteria were very precise. Since they were in the business of processing a lot of transactions, they knew their past and knew where they needed to improve (based on projected increases in transaction volumes). For example, the transaction throughput target for a particular business process was based on extrapolations from the existing implementation (taking into account the new architecture and system deployment capabilities). This is a purposefully obfuscated example:

Example Landing Zone Attribute
Characteristic | Minimum | Target | Outstanding
Payment processing transactions per day | 3,250,000 | 4,000,000 | 5,500,000
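One way to make a numeric landing zone attribute like this checkable is to record its three values and classify each measurement against them. The following Python sketch is an illustration only, not a tool from the program described; the class name, the category labels, and the assumption that higher is better (true for throughput, not for something like response time) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LandingZoneAttribute:
    characteristic: str
    minimum: float
    target: float
    outstanding: float

    def assess(self, measured: float) -> str:
        """Place a measured value in the zone, assuming a higher value is better."""
        if measured >= self.outstanding:
            return "outstanding"
        if measured >= self.target:
            return "target"
        if measured >= self.minimum:
            return "minimum"
        return "below minimum"


# Values from the (purposefully obfuscated) example attribute above.
payments = LandingZoneAttribute(
    "Payment processing transactions per day", 3_250_000, 4_000_000, 5_500_000
)
for measured in (3_000_000, 3_800_000, 4_200_000, 6_000_000):
    print(f"{measured:,}: {payments.assess(measured)}")
```

Tracking a measurement like this from build to build is one way to tell whether you are on course for the landing zone, not just whether you have arrived.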

Some targets for explicit user tasks were very specific (one had a target of less than 4 hours with no errors, and an outstanding goal of 1 business day). On the other hand, many other landing zone criteria were only generally categorized as requiring either a patch, a new system release, or online update support. The definitions for what was a patch, a release or an online update were nailed down so that there was no ambiguity in what they meant.

For example, a patch was defined as a localized solution that took a month or less to implement and deploy. The goal was eventually to get closer to a week than a month, but they started out modestly. On the other hand, a release required coordination among several teams and an entire system redeployment. An online update was something a user could accomplish via an appropriate tool.

So, for example, the landing zone criteria for reconfiguring a workflow associated with a specific data update stream had minimum and target values of “release” and an outstanding value of “online update”.
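Criteria like this are categorical rather than numeric, but they can be assessed the same way once the categories are ranked. This Python sketch ranks them by how cheaply a change can be made: online update, then patch, then release. That ranking, and all names in the code, are assumptions for illustration; the post defines the categories but not how the program tracked them.

```python
from enum import IntEnum


class ChangeMechanism(IntEnum):
    # Ranked from most to least costly way to make a change (an assumption
    # drawn from the definitions above): release < patch < online update.
    RELEASE = 1
    PATCH = 2
    ONLINE_UPDATE = 3


def assess(actual: ChangeMechanism, minimum: ChangeMechanism,
           target: ChangeMechanism, outstanding: ChangeMechanism) -> str:
    """Classify how a change can actually be made against landing zone values."""
    if actual >= outstanding:
        return "outstanding"
    if actual >= target:
        return "target"
    if actual >= minimum:
        return "minimum"
    return "below minimum"


# The workflow-reconfiguration criterion: minimum and target are "release",
# outstanding is "online update".
zone = (ChangeMechanism.RELEASE, ChangeMechanism.RELEASE, ChangeMechanism.ONLINE_UPDATE)
for actual in ChangeMechanism:
    print(actual.name, "->", assess(actual, *zone))
```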

When defining a landing zone for an agile product or program, carefully consider how precise you need to be and how many criteria are in your zone. Less precision allows for more wiggle room. Without enough constraints, however, it’s hard to know what is good enough. The more precise landing zone criteria are, the easier it is to tell whether you are on track to meet them. But if those landing zone criteria are too narrowly defined, there’s a danger of ignoring broader architecture and design concerns in order to focus only on specifically achieving targets.

We live in a world where there needs to be a balance. I’ll write more about who might be best suited to defining and redefining landing zones in another post.

Software Decision Making Under Stress

I recently blogged about my discomfort with making software design decisions at “the last responsible moment” and suggested that deciding at the “most responsible moment” might be a better approach. To me, a slight semantic shift made a difference in how I felt. Deciding at the most responsible moment made me feel less stressed and more in control of the situation.

But is this because I am basically lazy, preferring to put off decisions until they absolutely, positively must be made (and then hating that gut wrenching feeling when I finally realize that I don’t have enough time to think things through)? Or is something else going on?

I admit that the decisions we make about software development day in and day out aren’t always made under extreme stress; yet I thought I’d see what researchers say about decision-making under stress. As a developer I benefit from some stress, but not too much. That’s why (reasonable) deadlines and commitments are important.

But first, a disclaimer. I have not exhaustively researched the literature on this topic. I suspect there are more relevant and recent publications than what I found. But the two papers I read got me thinking. And so, I want to share them.

Giora Keinan, in a 1987 Journal of Personality and Social Psychology article, reports on a study that examined whether “deficient decision making” under stress was largely due to not systematically considering all relevant alternatives. He exposed college student test subjects to “controllable stress”, “uncontrollable stress”, or no stress, and measured how it affected their ability to solve interactive decision problems. In a nutshell, being stressed didn’t affect their overall performance. However, those who were exposed to stress of any kind tended to offer solutions before they considered all available alternatives. And they did not systematically examine the alternatives.

Admittedly, the test subjects were college students doing word analogy puzzles. And the uncontrollable stress was the threat of a small random electric shock; but still, the study demonstrated that once you think you have a reasonable answer, you jump to it more quickly under stress. (Having majored in psychology and personally performed experiments on college students, I can anecdotally confirm that while college students are good test subjects, one should take care to not over-generalize any results.)

So, is stress “good” or “bad”? Is systematically considering all viable alternatives before making a decision a better strategy (or not)? Surely, we in the software world know we never have enough time to make perfectly researched decisions. And big-upfront-decision-making, without confirmation is discouraged these days. Agile developers believe that making just-in-time design decisions result in better software.

But what are the upsides or downsides to jumping to conclusions too hastily? What happens if you feel too stressed when you have to make a decision? To gain some insight into that, I turned to a summary article, Judgment and decision making under stress: an overview for emergency managers, by Kathleen M. Kowalski-Trakofler, Charles Vaught, and Ted Sharf of the National Institute for Occupational Safety and Health. These authors raised many questions about the status quo of stress research and the need for more grounded studies. However, they also drew three interesting conclusions:

1. Under extreme stress [think natural disasters, plane crashes, war and the like], successful teams tend to communicate among themselves. As the emergency intensifies, a flatter communication hierarchy develops with more (unsolicited) information coming from the field to the command centre. In stressful emergency situations, communication becomes streamlined and localized. Also, people take personal initiative to inform leaders about the situation.

2. Stress is affected by perception; it is the perceived experience of stress that an individual reacts to. When you perceive a situation as stressful, you start reacting as if it were stressful. What is stressful to you is what’s important. And not everyone reacts well under stress. Training helps emergency responders to not freak out in an emergency, but those of us in the software world aren’t nearly so well-trained to respond to software crises. When is the last time you had software crisis training?

3. Contrary to popular opinion, judgment is not always compromised under stress. Although stress may narrow the focus of attention (the data are inconclusive), this is not necessarily a negative consequence in decision making. Some studies show that the individual adopts a simpler mode of information processing that may help in focusing on critical issues. So, we can effectively make reasonable decisions if we find and focus on the critical issues. If we miss out on a critical issue, well, things may not work out so well.

Reading these papers confirmed some suspicions I had: Stress is something we perceive. It doesn’t matter whether others share your perception or not. If you feel stressed, you are stressed. And you are more likely to make decisions without considering every alternative. That can be appropriate if your decisions are localized, you have experience, and you have a means of backing out of a decision if it turns out to be a bad one. But under extreme stress things start to break down. And then, if you haven’t had emergency training, how you respond is somewhat unpredictable.

I hope that we can start some ongoing discussions within the software community about design decisions and decision-making in general. How do you, your team or your management react to, avoid, or thrive on stress? Do you think agile practices help or hinder decision-making? If so, why? If not, why not?

Agile Architecture Myths #2 Architecture Decisions Should Be Made At the Last Responsible Moment

In Lean Software Development: An Agile Toolkit, Mary and Tom Poppendieck describe “the last responsible moment” for making decisions:

Concurrent development makes it possible to delay commitment until the last responsible moment, that is, the moment at which failing to make a decision eliminates an important alternative.

And Jeff Atwood, in a thought-provoking blog post, argues that “we should resist our natural tendency to prepare too far in advance” especially in software development. Rather than carry along too many unused tools and excess baggage, Jeff admonishes,

Deciding too late is dangerous, but deciding too early in the rapidly changing world of software development is arguably even more dangerous. Let the principle of Last Responsible Moment be your guide.

And yet, something about the principle of the last responsible moment has always made me feel slightly uneasy. I’ve blogged about a related topic (just barely enough design) before. And be aware that in both my personal and professional life I am not known as someone who plans things far in advance. As a consequence, I rarely use frequent flyer miles because I don’t anticipate vacation plans far enough in advance. I am not known to get to the airport hours ahead of my flight either. My just-in-time decision-making and actions have been known to make my travelling companions a bit uneasy. They prefer a less tight timeline.

But what about software development? Well, if I find an approach that seems worth pursuing, I’ll typically go for it. I like to knock off decisions that have architecturally relevant impacts early so I can get on to the grunt work. A lot of code follows after certain architectural choices are made and common approaches are agreed upon. My ideal is to make a rational decision and then vet it, not constantly revisit it.

Yet I know too-early architecture decisions are particularly troublesome as they may have to be undone (or if not, result in less-than-optimal architecture).

So what is it about forcing decision-making to be just-in-time at the last responsible moment that bugs me, the notorious non-planner? Well, one thing I’ve observed on complex projects is that it takes time to disseminate decisions. And decisions that initially appear to be localized (and not to impact others who are working in other areas) can and frequently do have ripple effects outside their initially perceived sphere of influence. And, sometimes, in the thick of development, it can be hard to consciously make any decisions whatsoever. How I’ve coded up something for one story may inadvertently dictate the preferred style for implementing future stories, even though it turns out to be wrongheaded. The last responsible moment mindset can at times lull me (erroneously) into thinking that I’ll always have time to change my mind if I need to. I’m ever the optimist. Yet in order to work well with others and to produce habitable software I sometimes need a little more forethought.

And so, I think I operate more effectively if I make decisions at the “most responsible moment” instead of the “last responsible moment”.

I’m not a good enough designer (or maybe I am too much of an optimist) to know when the last responsible moment is. Just having a last-responsible-moment mindset leaves me open to making late decisions. I’m sure this is not what Mary and Tom intended at all.

So I prefer to make decisions when they have positive impacts. Making decisions early that are going to have huge implications isn’t bad or always wasteful. Just be sure they are vetted and revisited if need be. Deferring decisions until you know more is OK, too. Just don’t dawdle or keep changing your mind. And don’t make decisions only to eliminate alternatives; make them to keep others from being delayed or bogged down waiting for you to get your act together. Remember you are collaborating with others. Delaying decisions may put others in a bind.

In short: make decisions when the time is right. Which can be hard to figure out sometimes. That’s what makes development challenging. Decisions shouldn’t be forced or delayed, but taken up when the time is right. And to help me find the right times, I prefer the mindset of “the most responsible moment” not the “last responsible one.”

Agile Architecture Myths #1 Simple Design is Always Better

Over the next few weeks I plan to blog about some agile software architecture and design beliefs that can cause confusion and dissent on agile teams (and angst for project and program managers). Johanna Rothman and I have jointly drawn up a list of misconceptions we’ve encountered as we’ve been working on our new agile architecture workshop. However, I take full responsibility for any ramblings and rants about them on my blog.

The first belief I want to challenge is this: simple designs are always better designs. If you want to quibble, you might say that I am being too strict in my wording. Perhaps I should hedge this claim by stating, “simple design is almost always better”. The corollary: more complex designs are never better. Complex design solutions aren’t as good as simpler solutions because they are (pick one): harder to test, harder to extend, harder to understand, or harder to maintain.

To break down the old bad habits of doing overly speculative design (and wasting a lot of time and effort over-engineering), keep designs simple. Who can argue against simplicity?

I can and will. My problem with an overly narrow “keep it simple” mindset is that it fosters the practice of “keeping designs stupidly simple.” Never allowing time to explore design alternatives before jumping in and coding something that works can lead to underperforming, bulky code. Never allowing developers to rework code that already works to handle more nuances only serves to reinforce ill-formed management decisions to continually push for writing more code at the expense of designing better solutions. What we’ve done with this overemphasis on simplicity is to replace speculation with hasty coding.

Development may appear to go full throttle for a while with this absurdly simple practice, but for more complex projects, eventually any lack of concerted design effort can cause things to falter. Sometimes more complex solutions lead to increased design flexibility and far less code. But you will never know until you try to design and build them.
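As a hedged illustration (not an example from the original post), consider fee rules that would otherwise grow as an ever-longer chain of if/elif branches. Spending a little design effort on a rules table adds one abstraction, but it leaves less code to touch each time a case is added. The shipping domain, the rule values, and all names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    applies: Callable[[Dict], bool]
    fee: float


# Hypothetical rules; each new case is one line of data, not another branch.
SHIPPING_RULES: List[Rule] = [
    Rule(lambda order: order["express"], 25.0),
    Rule(lambda order: order["country"] != "US", 15.0),
    Rule(lambda order: order["weight_kg"] > 20, 10.0),
]


def shipping_fee(order: Dict, rules: List[Rule] = SHIPPING_RULES,
                 default: float = 5.0) -> float:
    """Return the fee of the first rule that applies, or the default fee."""
    for rule in rules:
        if rule.applies(order):
            return rule.fee
    return default


print(shipping_fee({"express": False, "country": "US", "weight_kg": 2}))   # 5.0
print(shipping_fee({"express": False, "country": "FR", "weight_kg": 30}))  # 15.0
```

Whether the table pays for itself depends on how many cases the queue of upcoming stories actually holds, which is exactly the kind of judgment that calls for some design thinking up front.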

One of the hardest things for agile developers is to achieve an appropriate balance between programming for today and anticipating tomorrow. The more speculative any solution is, the more chance it has of being impacted by changing requirements. But sometimes, spending time looking at that queue of user stories and other acceptance criteria can lead you to consider more complex, scalable solutions earlier, rather than way too late. And therein lies the challenge: Doing enough design thinking, coding and experimentation at opportune times.

Draw a Tree

I often use a short icebreaker to introduce design storytelling in talks and classes. I hand out an index card and ask people to draw a tree in 60 seconds. I’ve adapted this from Thiagi’s 99-second Draw a Tree exercise. I ask attendees to draw a tree, any old tree, and to be sure to autograph it as there will be a prize. At the conclusion of the exercise I pick someone at random to receive a small gift or book.

I have collected hundreds of drawings; some are very beautiful. Rarely, I get drawings of bamboo.

Invariably one nerd who wants to win the prize and show off his computer geekiness draws a directed graph. After all, he doesn’t know the criteria I’ll use to choose a winner (there are none; it is a random drawing).

But most draw physical trees.

I get canonical tree shapes: mainly deciduous trees, with and without leaves, and a few conifers.

After I’ve collected the drawings, I ask how many drew roots and if so, why? If not, why not? Invariably, as Thiagi observes, most do not include roots, but some include roots or hints of root structures.

When asked why they didn’t draw any roots, invariably the answer is, “Because I drew what I can see. No need to show what’s below ground.” When asked why they included roots, those who did answer, “Because trees have roots.” Some software folks are very detailed and want to show everything. I’ve even received trees with tree parts labeled.

And there is my hook into the art of design storytelling. Whether you should include roots depends upon your audience and your goal for telling your story. There’s no “right” answer. Depending upon what your audience already knows and what you want to focus on, it is perfectly OK to leave out certain details.

The art of effectively drawing or describing any aspect of a software design is knowing what to leave out. It’s just as important to know what to omit as it is to know what to include. There’s always more detail. Effective design storytelling leaves out unessential details so that the important stuff can get emphasized.

Design For Test

It sounds straightforward. Write your test code first, then write code to pass that test. Don’t write an inch of code without writing a test first. That is what test-driven development (TDD) is about: Use tests to drive out the design and implementation. Rinse and repeat. Repeat many times a day.

I know a number of top-notch developers who are masters at their craft. Yet they don’t program day to day in a pedal-to-the-metal, test-first-only, write-the-barest-amount-of-code-to-pass-the-test style. Still, they value testing. Testing, to them, is integral to programming. I asked a few of my programmer buddies who value testing, even if they aren’t TDD followers, what it means to design for test (I have my ideas, but I don’t make my daily living writing production code).

And the bottom line is this: code that is designed for test must continually be tested (not necessarily at the speed of a TDDer). If you want to make testing feasible, you often need to make code-under-test easy to isolate from its production environment, tunable, and measurable. Not easy stuff. Just like design flexibility, testing doesn’t come for free. It’s part of a disciplined development process.
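To make “isolated, tunable, and measurable” concrete, here is a small, hypothetical Python component written with testing in mind; it is an illustration, not code from the column. The remote call and the sleep function are injected so a test can replace them, the retry count and delay are constructor parameters, and the component counts its own attempts so a test can observe its behavior.

```python
import time
from typing import Callable, Optional


class RetryingFetcher:
    """Hypothetical component designed for test:
    isolated (the remote call and sleep are injected), tunable (retries and
    delay are parameters), and measurable (it counts its attempts)."""

    def __init__(self, fetch: Callable[[], str], retries: int = 3,
                 delay_s: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.fetch = fetch
        self.retries = retries
        self.delay_s = delay_s
        self.sleep = sleep
        self.attempts = 0

    def get(self) -> str:
        last_error: Optional[Exception] = None
        for attempt in range(self.retries):
            self.attempts += 1
            try:
                return self.fetch()
            except ConnectionError as err:
                last_error = err
                if attempt < self.retries - 1:  # no pointless wait after the final try
                    self.sleep(self.delay_s)
        raise RuntimeError(f"gave up after {self.attempts} attempts") from last_error


# A test can run this instantly: a fake fetch fails twice, and sleeps are recorded.
outcomes = iter([ConnectionError("down"), ConnectionError("down"), "ok"])


def flaky_fetch() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


recorded_sleeps = []
fetcher = RetryingFetcher(flaky_fetch, retries=3, delay_s=0.1, sleep=recorded_sleeps.append)
print(fetcher.get(), fetcher.attempts, recorded_sleeps)  # ok 3 [0.1, 0.1]
```

None of these seams come for free: the extra constructor parameters are design work done so that the code can be exercised away from its production environment.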

Read more about Design For Test in my latest IEEE Design Column.

I’d like to hear your reactions…

The Value of Design Documentation

Recently I asked students to tell me what kinds of requirements they start with and what (if any) design documents they produce. Several students said that they produced documentation just because it was part of their development process. As a consequence, they felt that the documents were rarely read, were hard to keep up to date with the real code, and were expensive to generate. I know that not everyone is free to change their process…but if something is expensive to do and doesn’t add value, especially in this economic climate: look for a better, less expensive alternative.

My advice is to keep the precision at a low enough level that you don’t have to keep updating it with every small change. Last year I helped one client develop a 2-page high-level view of the architecture for IT applications. On the first page was a component diagram. On the back was a high-level statement of each component’s responsibilities. Although they produced other docs during development, these high-level overviews were intended to orient people who were going to maintain these applications. They were pleased when this high-level view was well-received by those developers.

Simply because a tool lets you reverse-engineer an implementation into detailed class or sequence diagrams doesn’t mean you should create lots of implementation-level diagrams. On another project where we used TogetherJ, we pruned sequence diagrams (to omit detail) so that business analysts could understand the basic flow without having to know everything. These edited diagrams didn’t end up in any permanent design document. Instead they helped explain our working prototype.

To be effective, design documents have to communicate valued information to their intended audience. So if you find yourself creating documents that aren’t useful…well, think about suggesting cost cutting and process improvement ideas to your team and management. This is the right economic climate to suggest such positive changes.

Sustainable Design

In my most recent IEEE Column, Creating Sustainable Designs, I explore what it means to create software that can be maintained without too many growing pains. I have been intrigued by Christopher Alexander’s writings, particularly the first two volumes of The Nature of Order, where he explains the properties of designed (or architected) things that have life, and the processes for creating life.

It is just as important to look at the process of creating good software as it is to consider what properties make software habitable for those who have to maintain it and keep it alive. While I appreciate process (and I think the agile community has given us a lot to consider) I am more keenly interested in exploring what makes complex software “living”.

Alexander identifies these properties (or qualities) of living things: levels of scale, strong centers, boundaries, alternating repetition, positive space, good shape, local symmetries, deep interlock and ambiguity, contrast, gradients, roughness, echoes, the void, simplicity and inner calm, and not-separateness.

It can be easy pickings to draw connections between certain “good” software design properties and Alexander’s list. For example, good shape, as others have pointed out, can be a matter as simple as properly indenting a method. Or it can be more profound than that, leading you to break up a method into substeps and invoke helper methods, just to keep every step at a similar level of abstraction. I’m only giving you a taste to see whether you are interested in exploring these ideas further.
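Here is a small, hypothetical Python illustration of that second reading of good shape (an illustrative example, not one from Alexander or the column): the top-level function reads as three steps at the same level of abstraction, and each step’s details live in a helper. The order data and the validity rule are made up for the sketch.

```python
from typing import Dict, List


def monthly_report(orders: List[Dict]) -> str:
    """Top-level steps, all at one level of abstraction."""
    valid = valid_orders(orders)
    totals = totals_by_customer(valid)
    return format_report(totals)


def valid_orders(orders: List[Dict]) -> List[Dict]:
    # Detail: what counts as a valid order (a hypothetical rule).
    return [o for o in orders if o["amount"] > 0]


def totals_by_customer(orders: List[Dict]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for o in orders:
        totals[o["customer"]] = totals.get(o["customer"], 0) + o["amount"]
    return totals


def format_report(totals: Dict[str, float]) -> str:
    return "\n".join(f"{name}: {amount}" for name, amount in sorted(totals.items()))


orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": -3},
    {"customer": "ann", "amount": 2},
]
print(monthly_report(orders))
```

The same logic could be written as one long loop; splitting it this way costs a few extra function names but lets a reader grasp the whole in three lines before diving into any part.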

If you are, read my column, and also take a look at Jim Coplien’s C++ Report article, “Space: The Final Frontier,” which introduces Alexander’s notion of centers and how they relate to software structure, and peruse several very good blog entries by Regis Medina.

And if you are interested in exploring these ideas further, perhaps by taking a deep look into working code or frameworks or software that you consider to be particularly alive (or not)… let me know. I am considering finding a venue where software developers, designers, and philosophers could concretely explore Alexander’s properties more thoroughly. I am not satisfied just to make those simple, easy connections and call it good. I want to challenge our best designs and see whether Alexander’s properties really do apply (or if we have some other properties that are more relevant).