I recently spoke with an architect who has been tuning up a legacy system that is built out of a patchwork quilt of technologies. As a consequence of its age and lack of common design approaches, the system is difficult to maintain. Error and event logs are written (in fact, many are), but they are inconsistent and scattered. It is extremely hard to collect data from and troubleshoot the system when things go wrong.
The architect has instigated many architectural improvements to this system, but one that to me was absolutely brilliant was to not insist that the system be reworked to use a single common logging mechanism. Instead, logs were redirected to a NoSQL database that could then be intelligently queried to troubleshoot problems as they arose.
Rather than dive in and “fix” legacy code to be consistent, this was a “splice and intelligently interpret” solution that had minimal impact on working code. Yet this fairly simple fix made the lives of those troubleshooting the system much easier. No longer did they have to dig through various logs by hand. They could stare and compare a stream of correlated event data.
Early in my career I was often frustrated by discrepancies in systems I worked on. I envisioned a better world where the design conventions were consistently followed. I took pride in cleaning up crufty code. And in the spirit of redesigning for that new, improved world, I'd fix any inconsistencies that were under my control. At a large scale, my individual clean up efforts would be entirely impractical. Complex software isn’t the byproduct of a single mind. Often, it simply isn't practical to rework large systems make things consistent. It is far easier to spot and fix system warts early in their life than later after myriad cowpaths have been paved and initial good design ideas have become warped and obsfucated. Making significant changes in legacy systems requires skill, tenacity, and courage. But sometimes you can avoid making significant changes if you twist the way you think about the problem.
If your infrastructure causes problems, find ways to fix it. Better yet (and here's the twist): find ways to avoid or exploit its limitations. Solving a problem by avoiding major rework is equally as rewarding as cleaning up cruft. Even if it leaves a poor design intact. Such fixes breathe life into systems that by all measures should have been scrapped long ago. Fashioning fixes that don’t force the core of a fragile architecture to be revised is a real engineering accomplishment. In an ideal world I'd like time to clean up crufty systems and make them better. But not if I can get significant improvement with far less effort. Engineering, after all, is the art of making intelligent tradeoffs.