Behold the un-credentialed and un-esteemed BD00’s taxonomy of software-intensive system complexity:
How many “M”s does the system you’re working on have? If the answer is three, should it really be two? If the answer is two, should it really be one? How do you know what number of “M”s your system design should have? When tacking on another “M” to your system design because you “have to“, what newly emergent property is the largest complexity magnifier?
Now, replace the inorganic legend at the top of the page with the following organic one and contemplate how the complexity and “success” curves are affected:
In “Software’s Hidden Clockwork: A General Theory of Software Defects”, Les Hatton presents these two interesting charts:
The thing I find hard to believe is that Les has concluded that there is no obvious significant relationship between defect density and the choice of programming language. But notice that he doesn’t seem to have any data points on his first chart for the relatively newer, less “tricky“, and easier-to-program languages like Java, C#, Ruby, Python, et al.
So, do you think Les might have jumped the gun by asserting that defect density is virtually independent of the choice of programming language?
When a failure occurs in a complex, networked, socio-technical system, the probability is high that the root cause is located far away from the failure detection point in time, space, or both. The progression in time goes something like this:
fault —> error —> error —> error —> failure discovered!
An unanticipated fault begets an error, which begets more errors, which beget still more, until the failure manifests as a loss of life or money somewhere, and sometime, downstream in the system. In a software system, the time from fault to catastrophic failure may be mere milliseconds, but the distance between fault and failure can span hundreds of thousands of lines of source code sprinkled across multiple machines and networks.
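To make that fault/error/failure chain concrete, here is a toy Python sketch; every name in it is hypothetical and invented for illustration. A single latent fault (returning None instead of an empty list) rides along quietly through a couple of hops until the failure finally surfaces in code that looks completely innocent:

```python
def load_orders(customer_id):
    # FAULT: on the "no orders yet" path this returns None instead of [].
    if customer_id == "new-customer":
        return None
    return [{"id": 1, "total": 42.0}]

def cache_orders(cache, customer_id):
    # ERROR: the bad value is now stored; nothing blows up yet.
    cache[customer_id] = load_orders(customer_id)

def monthly_totals(cache):
    # ERROR propagates: the None rides along inside an innocent-looking dict,
    # and the FAILURE is finally detected here, nowhere near the root cause.
    return {cid: sum(order["total"] for order in orders)
            for cid, orders in cache.items()}

cache = {}
cache_orders(cache, "new-customer")
# ... thousands of lines, milliseconds, or machines later ...
print(monthly_totals(cache))  # TypeError: 'NoneType' object is not iterable
```

The traceback fingers monthly_totals(), while the real culprit is load_orders(), several calls (and, in a real system, possibly several machines) away.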
Let’s face it. Envisioning, designing, coding, and testing for end-to-end “system level” error conditions in software systems is unglamorous and tedious (unless you’re using Erlang, which was thoughtfully designed to lessen the pain). It’s usually one of the first things to get jettisoned when the pressure is ratcheted up to meet some arbitrary schedule premised on a baseless, one-time estimate elicited under duress when the project was kicked off. Bummer.
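For readers who haven’t met Erlang, its trick is supervision: let a worker crash, then restart it from a known-good state, instead of defensively handling every conceivable error in-line. The sketch below is a rough (and much weaker) Python analogue of that idea; the names and restart policy are invented for illustration and are not Erlang’s actual machinery.

```python
import time

def supervise(worker, max_restarts=3, backoff_s=0.1):
    """Run `worker` and restart it from scratch whenever it crashes,
    up to `max_restarts` times; loosely mimics an Erlang supervisor's
    'let it crash, then restart' stance."""
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                # Escalate to whoever supervises the supervisor.
                raise RuntimeError("worker keeps failing") from exc
            time.sleep(backoff_s * restarts)  # crude linear backoff

# Illustrative flaky worker: fails twice, then succeeds.
attempts = {"count": 0}
def flaky_worker():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient glitch")
    return "done"

print(supervise(flaky_worker))  # -> "done" after two supervised restarts
```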
In my travels through the whacky world of software development, I’ve found that bottom-up design (a.k.a. code first and “discover” the real design later) can lead to lots of unessential complexity getting baked into the code. Once this unessential complexity, in the form of extraneous classes and a rat’s nest of unneeded dependencies, gets baked into the system and the contraption “appears to work” in a narrow range of scenarios, the baker(s) will tend to irrationally defend the “emergent” design to the death. After all, the alternative would be to swallow their pride and perform lots of risky, embarrassing, and ego-busting disentanglement rework. Of course, all of those behaviors are socially unacceptable inside orgs with macho cultures, where publicly admitting you’re wrong is a career-ending move. And that, my dear, is how we created all those lovable “legacy” systems that run the world today.
Don’t get BD00 wrong here. The esteemed one thinks that bottom-up design and coding is fine for small-scoped systems with around 7 ± 2 classes (Miller’s number), but this “easy and fun and fast” approach sure doesn’t scale well. Even more ominously, the bottom-up coding and emergent design strategy is unacceptable for long-lived, safety-critical systems that will be scrutinized “later on” by external technical inspectors.
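To caricature what that baked-in, unessential complexity can look like in code, here is a hedged Python sketch; every class name is invented for illustration, not lifted from any real codebase. The essential behavior is one line of arithmetic, but the bottom-up “emergent” design has welded it to five hard-wired collaborators, so nothing can be tested or swapped in isolation:

```python
# Hypothetical bottom-up accretion: five concrete collaborators hard-wired
# into a class whose essential job was always just one line of arithmetic.
class PriceRepo: ...
class TaxTableCache: ...
class CurrencyConverter: ...
class AuditLogger: ...
class MetricsEmitter: ...

class InvoiceTotaler:
    def __init__(self):
        self.prices = PriceRepo()          # unneeded dependency
        self.taxes = TaxTableCache()       # unneeded dependency
        self.fx = CurrencyConverter()      # unneeded dependency
        self.audit = AuditLogger()         # unneeded dependency
        self.metrics = MetricsEmitter()    # unneeded dependency

    def total(self, line_items):
        # The essential part was only ever this:
        return sum(qty * price for qty, price in line_items)

# The disentangled alternative needs no classes at all:
def invoice_total(line_items):
    return sum(qty * price for qty, price in line_items)

print(invoice_total([(2, 9.99), (1, 4.50)]))  # 24.48
```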
I recently dug up and re-read the classic Parnas/Clements 1986 paper, “A Rational Design Process: How and Why to Fake It”. Despite people’s desperate desire to believe that the process of design is “rational”, it never is. The authors knew there is no such thing as a sequential, rational design process in which:
- There’s always a good reason behind each successive design decision.
- Each step taken can be shown to be the best way to get to a well-defined goal.
The culprit that will always doom a rational design process is “learning“:
Many of the details only become known to us as we progress in the implementation (of a design). Some of the things that we learn invalidate our design and we must backtrack (multiple times during the process). The resulting design may be one that would not result from a rational design process. – Parnas/Clements
Since “learning“, in the form of going backwards to repair discovered mistakes, is a punishable offense in social command & control hierarchies where everyone is expected to know everything and constantly march forward, the best strategy is to cover up mistakes and fake a rational design process when the time comes to formally present a “finished” design to other stakeholders.
Even though it’s unattainable, Spock-like rationality is, for some strange reason, revered by most orgs. Thus, everyone in org-land plays the “fake-it” game, whether they know it or not. To expect the world to run on rationality is irrational.
Executives preach “evidence-based decision-making“, but in reality they practice “decision-based evidence-making“.