Software Diagrams Aren’t Always Correct and That’s OK

on November 24, 2020

This morning I saw an interesting twitter thread:

Sorry, this makes no sense. Most valuable architectural depictions are high-level and abstract and can't be auto-generated from code, for example; hence they're manually created.
— Andrei Pacurariu (@AndreiPacurariu) November 23, 2020

Tudor is the driving-force behind gtoolkit.com, a modern and innovative programming environment that builds upon and extends classic Smalltalk development concepts. Tudor is someone worth listening to, but his tweet was something that I really couldn’t agree with. I found myself more in agreement with Andrei Pacurariu’s reply.

I started to tweet a response but quickly found that my thoughts were too complex to make a good tweet stream. Instead, I wrote the following mini essay.

Concretely, software is just bits in electronic storage that control and/or are manipulated by processors. Abstractions are the building blocks that enable humans to design and build complex software systems out of bits. Abstractions are products of out minds—they allow us to assign meaning to clusters (some large, some small) of bits. They allow us to build software systems without thinking about billions of bits or how processors work.

We manifest some useful and generally simple abstractions (instructions, statements, functions, classes, modules, etc.) as “code” using other abstractions we call “languages.” Languages give us a common vocabulary for us to communicate about those abstract building blocks and to produce the corresponding bits. There are many useful tools that can and should be created to help us understand the code-level operation of a system.

But most systems we build today are too complex to be fully understood at the level of code. In designing them we must use higher-level abstractions to conceptualize, compose, and organize code. Abstract machines, frameworks, patterns, roles, stereotypes, heuristics, constraints, etc. are examples of such higher-level abstractions.

The languages we commonly use provide few, if any, mechanisms for directly identifying such higher-level abstractions. These abstractions may manifest as naming or other coding conventions but recognizing them as such depends upon a pre-existing shared understanding between the writer and readers of the code.

If such conventions have been adequately formalized and followed, a tool may be able to assist a human identify and display the usage of higher-level abstractions within a code base. But traditionally, such tools are rare and often unreliable. One problem is that abstractions can overlap. So we (or the tools) not only need to identify and classify abstractions but also identify the clustering and organization of abstractions. Sometimes this results in multiple views that can provide totally different conceptions of the operation of the code we are looking at.

Software designers/architects often use informal diagrams to capture their conceptualization of the structure and interactions of the higher-level abstractions of their designs. Such diagrams can serve as maps that help developers understand the territory of the code.

But developers are often skeptical of such diagrams. Experience (and folklore) has taught them that design diagrams likely do not accurately reflect the actual state of a code base. So, how do we familiarize ourselves with a new software code base that we wish to modify or extend? Most commonly we just try to reverse engineer its architecture/design by examining the code. We tell ourselves that the code provides the ground truth and that we or our tools can generate truthful design docs from the code because the code doesn’t lie. When humans read code we can easily miss or misinterpret the higher-level abstractions that are lurking among and above the code. It’s a real challenge to imagine a tool that can do better.

Lacking design documents or even a good sense of a system’s overall design we then often make local code changes without correctly understanding the design context within which that code operates. Such local changes may “work” to solve some immediate problem. But as they accumulate over long periods such changes erode the design integrity of the overall system. And they contribute to the invalidation of any pre-existing design documents and diagrams that may have at one point in time provided an accurate map of the system.

We definitely need better tools for integrating higher-level system architecture/design abstraction with code—or at least for integrating documentation. But don’t tell me that creating such documents are a waste of time or that any such documents should be ignored. I’ve spent too many decades trying to guess how a complex program was intended to work by trying to use the code to get into the mind of the original designers and developers. Any design documentation is useful documentation. It will probably need to be verified against the current state of the code but even when the documents are no longer the truth they often provide insights that help our understanding of the system. When I asked Rebecca Wirfs-Brock to review this little essay, she had an experience to relate:

I remember doing a code/architecture review for a company that built software to perform high reliability backups…and I was happy that the architectural documents were say 70% accurate as they gave me a good look into the mind of the architect who engineered these frameworks. And I didn’t think it worthwhile to update it to reflect truth. As it was a snapshot in time. Sometimes updating may be worthwhile, but as in archeological digs, you’ve gotta be careful about this.

Diagrams and other design documents probably won’t be up to date when somebody reads them in the future. But that’s not an excuse to not create them. If you care about the design integrity of you software system you need to provide future developers (including yourself) a map of your abstract design as you currently know it.