Reframing the problem through design.
“The transition to a really deep model is a profound shift in your thinking and demands a major change to the design.” — Domain-Driven Design, Eric Evans
There is a fallacy about how domain modelling works. The misconception is that we can design software by discovering all the relevant concepts in the domain, turn them into concepts in our design, add some behaviors, and voilà, we’ve solved our problem. It’s a simplistic perception of how design works: a linear path from A to B:
- understand the problem,
- apply design,
- end up with a solution.
That idea was so central to early Object-Oriented Design, that one of us (Rebecca) thought to refute it in her book:
“Early object design books including [my own] Designing Object-Oriented Software [Wirfs-Brock et al 1990], speak of finding objects by identifying things (noun phrases) written in a design specification. In hindsight, this approach seems naive. Today, we don’t advocate underlining nouns and simplistically modeling things in the real world. It’s much more complicated than that. Finding good objects means identifying abstractions that are part of your application’s domain and its execution machinery. Their correspondence to real-world things may be tenuous, at best. Even when modeling domain concepts, you need to look carefully at how those objects fit into your overall application design.” — Object Design, Rebecca Wirfs-Brock
Domain language vs Ubiquitous Language
The idea has persisted in many naive interpretations of Domain-Driven Design as well. Domain language and Ubiquitous Language are often conflated. They’re not the same.
Domain language is what is used by people working in the domain. It’s a natural language, and therefore messy. It’s organic: concepts are introduced out of necessity, without deliberation, without agreement, without precision. Terminology spreads across the organization or fades out. Meaning shifts. People adapt old terms into new meanings, or terms acquire multiple, ambiguous meanings. It exists because it works, at least well enough for human-to-human communication. A domain language (like all language) only works in the specific context it evolved in.
For us system designers, messy language is not good enough. We need precise language with well understood concepts, and explicit context. This is what a Ubiquitous Language is: a constructed, formalized language, agreed upon by stakeholders and designers, to serve the needs of our design. We need more control over this language than we have over the domain language. The Ubiquitous Language has to be deeply connected to the domain language, or there will be discord. The level of formality and precision in any Ubiquitous Language depends on its environment: a meme sharing app and an oil rig control system have different needs.
Talking of oil rigs:
Rebecca was invited to consult for a company that makes hardware and software for oil rigs. She was asked to help with object design and modelling, working on redesigning the control system that monitors and manages sensors and equipment on the oil rig. Drilling causes a lot of friction, and “drilling mud” (a proprietary chemical substance) is used as a lubricant. It’s also used as a carrier for the rocks and debris you get from drilling, lifting it all up and out of the hole. Equipment monitors the drilling mud pressure, and by changing the composition of the mud during drilling, you can control that pressure. Too much pressure is a really bad thing.
And then an oil rig in the gulf exploded.
As the news stories were coming out, the team found out that the rig was using a competitor’s equipment. Whew! The team started speculating about what could have happened, and were thinking about how something like that could happen with their own systems. Was it faulty equipment, sensors, the telemetry, communications between various components, the software?
When in doubt, look for examples. The team ran through scenarios. What happens when a catastrophic condition occurs? How do people react? When something fails, it’s a noisy environment for the oil rig engineers: sirens blaring, alarms going off, … We discovered that when a problem couldn’t be fixed immediately, the engineers, in order to concentrate, would turn off the alarms after a while. When a failure is easy to fix, the control system logs reflect that the alarm went on and was turned off a few minutes later.
But for more consequential failures, even though these problems take much longer to resolve, it still shows up on the logs as being resolved within minutes. Then, when people study the logs, it looks like the failure was resolved quickly. But that’s totally inaccurate. This may look like a software bug, but it’s really a flaw in the model. And we should use it as an opportunity to improve that model.
The initial modeling assumption is that alarms are directly connected to the emergency conditions in the world. However, the system’s perception of the world is distorted: when the engineers turn off the alarm, the system believes the emergency is over. But it’s not, turning an alarm off doesn’t change the emergency condition in the world. The alarms are only indirectly connected to the emergency. If it’s indirectly connected, there’s something else in between, that doesn’t exist in our model. The model is an incomplete representation of a fact of the world, and that could be catastrophic.
The team explored scenarios, specifically the weird ones, the awkward edge cases where nobody really knows how the system behaves, or even how it should behave. One such scenario is when two separate sensor measurements raise alarms at the same time. The alarm sounds, an engineer turns it off, but what happens to the second alarm? Should the alarm still sound or not? Should turning off one turn off the other? If it didn’t turn off, would the engineers think the off switch didn’t work and just push it again?
By working through these scenarios, the team figured out there was a distinction between the alarm sounding, and the state of alertness. Now, in this new model, when measurements from the sensors exceed certain thresholds or exhibit certain patterns, the system doesn’t sound the alarm directly anymore. Instead, it raises an alert condition, which is also logged. It’s this alert condition that is associated with the actual problem. The new alert concept is now responsible for sounding the alarm (or not). The alarm can still be turned off, but the alert condition remains. Two alert conditions with different causes can coexist without being confused by the single alarm. This model decouples the emergency from the sounding of the alarm.
The old model didn’t make that distinction, and therefore it couldn’t handle edge cases very well. When at last the team understood the need for separating alert conditions from the alarms, they couldn’t unsee it. It’s one of those aha-moments that seem obvious in retrospect. Such distinctions are not easily unearthed. It’s what Eric Evans calls a Breakthrough.
An Act of Creation
There was a missing concept, and at the first the team didn’t know something was missing. It wasn’t obvious at first, because there wasn’t a name for “alert condition” in the domain language. The oil rig engineers’ job isn’t designing software or creating a precise language, they just want to be able to respond to alarms and fix problems in peace. Alert conditions didn’t turn up in a specification document, or in any communication between the oil rig engineers. The concept was not used implicitly by the engineers or the software; no, the whole concept did not exist.
Then where did the concept come from?
People in the domain experienced the problem, but without explicit terminology, they couldn’t express the problem to the system designers. So it’s us, the designers, who created it. It’s an act of creative modeling. The concept is invented. In our oil rig monitoring domain, it was a novel way to perceive reality.
Of course, in English, alert and alarm exist. They are almost synonymous. But in our Ubiquitous Language, we agreed to make them distinct. We designed our Ubiquitous Language to fit our purpose, and it’s different from the domain language. After we introduced “alert conditions”, the oil rig engineers incorporated it in their language. This change in the domain is driven by the design. This is a break with the linear, unidirectional understanding of moving from problem to solution through design. Instead, through design, we reframed the problem.
Is it a better model?
How do we know that this newly invented model is in fact better (specifically, more fit for purpose)? We find realistic scenarios and test them against the alert condition model, as well as other candidate models. In our case, with the new model, the logs will be more accurate, which was the original problem.
But in addition to helping with the original problem, a deeper model often opens new possibilities. This alert conditions model suggests several:
- Different measurements can be associated with the same alert.
- Alert conditions can be qualified.
- We can define alarm behaviors for simultaneous alert conditions, for example by spacing the alarms, or picking different sound patterns.
- Critical alerts could block less critical ones from hogging the alarm.
- Alert conditions can be lowered as the situation improves, without resolving them.
These new options are relevant, and likely to bring value. Yet another sign we’d hit on a better model is that we had new conversations with the domain experts. A lot of failure scenarios became easier to detect and respond to. We started asking, what other alert conditions could exist? What risks aren’t we mitigating yet? How should we react?
Design Creates New Realities
In a world-centric view of design, only the sensors and the alarms existed in the real world, and the old software model reflected that accurately. Therefore it was an accurate model. The new model that includes alerts isn’t more “accurate” than the old one, it doesn’t come from the real world, it’s not more realistic, and it isn’t more “domain-ish”. But it is more useful. Sensors and alarms are objective, compared to alert conditions. Something is an alert condition because in this environment, we believe it should be an alert condition, and that’s subjective.
The model works for the domain and is connected to it, but it is not purely a model of the problem domain. It better addresses the problems in the contexts we envision. The solution clarified the problem. Having only a real world focus for modelling blinds us to better options and innovations.
These creative introductions of novel concepts into the model are rarely discussed in literature about modelling. Software design books talk about turning concepts into types and data structures, but what if the concept isn’t there yet? Forming distinctions, not just abstractions, however, can help clarify a model. These distinctions create opportunities.
The model must be radically about its utility in solving the problem.
“Our measure of success lies in how clearly we invent a software reality that satisfies our application’s requirements—and not in how closely it resembles the real world.” — Object Design, Rebecca Wirfs-Brock
Written by Mathias Verraes and Rebecca Wirfs-Brock. Special thanks to Eric Evans for the spot on feedback and constructive advice.