I’m not a fan of “emergent global architecture”, but I AM a fan of “emergent local design”. To mitigate downstream technical and financial risk, I believe that one has to generate and formally document an architecture at a high level of abstraction before starting to write code. To do otherwise would be irresponsible.
The figure below shows a portion of an initial “local” design that I plucked out of a more “global” architectural design. When I started coding and unit testing the cluster of classes in the snippet, I “discovered” that the structure wasn’t going to work out. The API of the architectural framework within which the class cluster runs wouldn’t allow it to work without some major internal restructuring and retesting of the framework itself.
After wrestling with the dilemma for a bit, the following workable local design emerged out of the learning acquired via several wretched attempts to make the original design work. Of course, I had to throw away a bunch of previously written skeletal product and test code, but that’s life. Now I’m back on track and moving forward again. W00t!
Assume we have a valuable, revenue-critical software system in operation. The figure below shows one nice and tidy, PowerPoint-worthy way to model the system: as a static, enumerated set of executables and libraries.
Given the model above, we can express the size of the system as:
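One plausible way to write that sum, assuming the system has $N_E$ executables and $N_L$ libraries (the symbols are illustrative and not taken from the original figure):

```latex
\mathrm{Size}_{\mathrm{system}}
  = \sum_{i=1}^{N_E} \mathrm{Size}(\mathrm{Exe}_i)
  + \sum_{j=1}^{N_L} \mathrm{Size}(\mathrm{Lib}_j)
```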
Now, say we run a tool on the code base and it spits out a system size of 200K “somethings” (lines of code, function points, loops, branches, etc.).
What does this 200K number of “somethings” absolutely tell us about the non-functional qualities of the system? It tells us absolutely nothing. All we know at the moment is that the system is operating and supporting the critical, revenue-generating processes of our borg. Even relatively speaking, when we compare our 200K “somethings” system against a 100K “somethings” system, it still doesn’t tell us squat about the qualities of our system.
So, what’s missing here? One missing link is that our nice and tidy enumerations view and equation don’t tell us nuttin’ about what Russ Ackoff calls “the product of the interactions of the parts” (e.g., Lib-to-Lib, Exe-to-Exe). To remedy the situation, let’s update our nice and tidy model with the part-to-part associations that enable our heap of individual parts to behave as a system:
Our updated model is still nice and tidy, but just not as nice and tidy as before. But wait! We are still missing something important. We’re missing a visual cue of our system’s interactions with “other” systems external to us; you know, “them”. The “them” we blame when something goes wrong during operation with the supra-system containing us and them.
Our updated model is once again still nice and tidy, but just not as nice and tidy as before. Next, let’s take a single snapshot of the flow of (red) “blood” in our system at a given point in time:
Finally, if we superimpose the astronomical number of all possible blood flow snapshots onto one diagram, we get:
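To put a rough number on that explosion, here is a minimal sketch. It assumes a hypothetical system of 4 executables and 6 libraries (made-up part names, not the ones in the figures), treats every pair of parts as a potential association, and models a snapshot as the subset of associations carrying “blood” at one instant.

```python
from itertools import combinations

# Hypothetical parts; names and counts are illustrative only.
parts = [f"Exe{i}" for i in range(1, 5)] + [f"Lib{i}" for i in range(1, 7)]

# Every unordered pair of parts is a potential part-to-part association.
associations = list(combinations(parts, 2))

# A snapshot is modeled as the set of associations that are active at one
# instant, so each association is either on or off: 2 ** len(associations).
snapshot_count = 2 ** len(associations)

print(len(parts), len(associations), snapshot_count)
```

Under these assumptions, ten parts already allow 45 potential associations and roughly 3.5 x 10^13 on/off snapshots, and that ignores the external systems and any ordering of the flows.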
D’oh! We’re not so nice and tidy anymore. Time for some heroic debugging on the bleeding mess. Is there a doctor in da house?
Suppose you’re developing a software-intensive product and you have to choose between two competing infrastructure platforms on which to write your app code:
Well, duh. I think I’ll take the candidate on the left. That way, if the code I write ends up being costly to maintain, it’s all my fault. I wasn’t “forced” to write crappy, jaggy code by having to comply with the platform:
But wait! Suppose either the clean infrastructure doesn’t exist or (more likely) you’re “mandated” to write your apps on top of the jaggy infrastructure. In this situation, here’s the best and worst we can do:
In both cases, our code has some unwanted “jagginess” to it – some forced upon us by the platform and some we introduced ourselves.
In summary, our code can take on one of the forms below. The two on the left, written on top of the clean infrastructure, are less costly to maintain than the two on the right, written on top of the jaggy infrastructure.
So, what’s the purpose of this post? Uh, I dunno. I started sketching out the graphics first and then I thought some interesting insight would pop up as I wrote the accompanying words. But other than the utterly obvious advice to “choose a clean infrastructure over a jaggy infrastructure when you can”, nothing arose.
Writing is sometimes like that. You have nothing to say, but you write and babble away anyway. In case you haven’t noticed, I do that a lot. Bummer.
Regardless of which methodology you use to develop software, the following technical allocation chain must occur to arrive at working source code from some form of requirements:
The figure below shows a 2/6/13 end result of the allocation chain for a hypothetical example project. How the 2/6/13 combo was arrived at is person- and domain-specific. Given the same set of requirements to N different, domain-knowledgeable people, N different designs will no doubt be generated. Person A may create a 3/6/9 design and person B may conjure up a 4/8/16 design.
Given a set of static or evolving requirements, how should one allocate components to namespaces and libraries? The figure below shows extreme 1/1/13 and 13/13/13 cases for our hypothetical 13 component example.
As the number of components, N, in the system design gets larger, the mindless N/N/N strategy becomes unscalable because of an increasing dependency management nightmare. In addition to deciding which K logical components to use in their application, library users must link all K physical libraries with their application code. In the mindless 1/1/N strategy, only one library must be linked with the application code, but because of the single namespace, the design may be harder to logically comprehend.
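The linking side of this trade-off can be sketched in a few lines of Python. The sketch assumes the triplets read libraries/namespaces/components; the component names, the two-library grouping, and the `libs_to_link` helper are all hypothetical, and namespaces are not modeled.

```python
# Hypothetical 13-component system; names are illustrative only.
components = [f"C{i}" for i in range(1, 14)]

# N/N/N: every component lives in its own library.
one_lib_each = {c: f"lib_{c.lower()}" for c in components}

# 1/1/N: every component lives in one big library.
one_big_lib = {c: "lib_everything" for c in components}

# An in-between allocation: two libraries, split by (assumed) cohesion.
two_libs = {c: ("lib_core" if i < 7 else "lib_io")
            for i, c in enumerate(components)}


def libs_to_link(allocation, used):
    """Return the sorted libraries an application must link in order
    to use the given components under the given allocation."""
    return sorted({allocation[c] for c in used})


used = ["C1", "C2", "C3", "C9"]  # the K components one application needs
for name, alloc in [("one library per component", one_lib_each),
                    ("one big library", one_big_lib),
                    ("two libraries", two_libs)]:
    print(name, libs_to_link(alloc, used))
```

With one library per component, the link list grows with K; with one big library it stays at one but says nothing about how the components relate; the grouped allocation sits between the two.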
Expectedly, the solution to the allocation problem lies somewhere in between the two extremes. Arriving at an elegant architecture/design requires a proactive effort with some upfront design thinking. Domain knowledge and skillful application of the coupling-cohesion heuristic can do the trick. For large-scale systems, letting a design emerge won’t.
Emergent design works in nature because evolution has had the luxury of millions of years to get it “right”. Even so, according to angry atheist Richard Dawkins, approximately 99% of all “deployed” species have gone extinct – that’s a lot of failed projects. In software development efforts, we don’t have the luxury of million-year schedules or the patience for endless, random tinkering.
Before you unquestioningly accept the gospel of the “evolutionary architecture” and “emergent design” priesthood, please at least pause to consider these admonitions:
Give me six hours to chop down a tree and I will spend the first four sharpening the axe – Abe Lincoln
Measure twice, cut once – Unknown
If I had an hour to save the world, I would spend 59 minutes defining the problem and one minute finding solutions – Albert Einstein
100% test coverage is insufficient. 35% of the faults are missing logic paths – Robert Glass
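Glass’s point can be illustrated with a small sketch. The `average` function and its test are hypothetical; the single test executes every line, yet the function has no path for an empty list, and a coverage tool cannot flag code that was never written.

```python
def average(values):
    """Return the arithmetic mean of values (missing: the empty-list case)."""
    return sum(values) / len(values)


# This one test executes every line of average(): 100% line coverage.
assert average([2, 4, 6]) == 4

# The missing logic path only shows up when the untested input arrives.
try:
    average([])
    failed = False
except ZeroDivisionError:
    failed = True

print(failed)
```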
Out of the bazillions of definitions of “software architecture” out in the wild, my favorite is:
“the initial set of decisions that are costly to change downstream”.
It’s my fave because it encompasses the “whole” product development ecosystem and not just the structure and behavior of the product itself. Here are some example decisions that come to mind:
- Programming language(s)
- Third party libraries
- Architectural pattern/style
- Build system selection
- Targeted operating system(s)
- Version control system
- Automated testing framework
- Communication middleware
- GUI framework
- Requirements management tool
- Non-functional requirements (latency, throughput, availability, security, safety)
- Development process
Once your production code gets intimately bound with the set of items on the list, there comes a point of no return on the project timeline where changing any of them is pragmatically impossible. Once this mysterious time threshold is crossed, changing the product source code may be easier than changing anything else on the list.
Got any other items to add to the list?
The figure below shows a layered view of the latest distributed system product that I’m working on. Customer teams compose their applications by writing their own system-specific Crumbs and linking them with our pre-written, pre-tested Crumbs. In the ideal case, customers don’t have to write a single line of Crumb code. They simply compose, compile, link, configure, and deploy an amalgamation of our off-the-shelf Crumbs as a set of application components that meets their needs.
Note that we are using C++11 to build the system. Also note the third party, open source libraries that we are building upon. Crumb developers use Poco directly, but they don’t directly use the OpenDDS or ACE/TAO APIs. Our Crumb “Tray” serves as a wrapper/facade that hides the complexity of those inter-process communication facilities.
My role on the development team is as a Libs team “Crumb” designer/writer. If I gave you any more product views or disclosed anything more concrete, then I’d either get fired or I’d have to kill you, or both.
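The wrapper/facade idea can be sketched as follows. This is not the product’s code, and it is in Python rather than C++11 for brevity: `Tray`, `publish`, `subscribe`, and the in-process `_FakeTransport` are hypothetical stand-ins, and a real Tray would delegate to the actual middleware rather than to a Python dictionary.

```python
from collections import defaultdict


class _FakeTransport:
    """Stand-in for the real pub-sub middleware (an assumption of this sketch)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register_listener(self, topic, callback):
        self._listeners[topic].append(callback)

    def write_sample(self, topic, sample):
        # Deliver the sample to every listener registered on the topic.
        for callback in self._listeners[topic]:
            callback(sample)


class Tray:
    """Facade: Crumbs see only publish/subscribe, never the transport API."""

    def __init__(self, transport=None):
        self._transport = transport or _FakeTransport()

    def subscribe(self, topic, callback):
        self._transport.register_listener(topic, callback)

    def publish(self, topic, sample):
        self._transport.write_sample(topic, sample)


# A hypothetical Crumb wiring: it touches only the Tray.
received = []
tray = Tray()
tray.subscribe("track_updates", received.append)
tray.publish("track_updates", {"id": 7, "range_km": 12.5})
print(received)
```

Because Crumb code depends only on the facade, the transport behind it could in principle be swapped without touching the Crumbs.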
What are you currently working on?
Comprehensiveness is the enemy of comprehensibility – Martin Fowler
Martin’s quote may be the main reason why this preference was written into the Agile Manifesto…
Working software over comprehensive documentation
Obviously, it doesn’t say “Working software and no documentation”. I’d bet my house that Martin and his colleagues who conjured up the manifesto intentionally stuck the word “comprehensive” in there for a reason. And the reason is that “good” documentation reduces costs in both the short and long runs. In addition, check out what the Grade-ster has to say:
The code tells the story, but not the whole story – Grady Booch
Now that the context for this post has been set, I’d like to put in a plug for Simon Brown’s terrific work on the subject of lightweight software architecture documentation. In tribute to Simon, I decided to hoist a few of his slides that resonate with me.
Note that the last graphic is my (and perhaps Simon’s?) way of promoting standardized UML-sketching for recording and communicating software architectures. Of course, if you don’t record and communicate your software architectures, then reading this post was a waste of your time; and I’m sorry for that.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system. – John Gall (1975, p.71)
This law is essentially an argument in favour of underspecification: it can be used to explain the success of systems like the World Wide Web and Blogosphere, which grew from simple to complex systems incrementally, and the failure of systems like CORBA, which began with complex specifications. – Wikipedia
We can add the Strategic Defense Initiative (Star Wars), the FBI’s Virtual Case File system (VCF), FCS, and prolly a boatload of other high falutin’ defense projects to the list of wreckage triggered by violations of Gall’s law. Do you have any other majestic violations you’d like to share? Can you cite any counter-examples that attempt to refute the law?
One of the great tragedies of life is the murder of a beautiful theory by a gang of brutal facts – Benjamin Franklin
C++, which started out simply as “C With Classes”, is a successful complex “system”. Java, which started out as a simple and pure object-oriented system, has evolved into a successful complex system that now includes a mix of functional and generic programming features. Linux, which started out as a simple college operating system project, has evolved into a monstrously successful complex system. DDS, which started out as a convergence of two similar, field-tested, pub-sub messaging implementations from Thales Inc. and RTI Inc., has evolved into a successful complex system (in spite of being backed by the OMG). Do you have any other law abiding citizens you’d like to share?
Gall’s law sounds like a, or thee, platform for Fred Brooks’ “plan to throw one away” admonition and Grady Booch’s “evolution through a series of stable intermediate forms” advice.
Here are two questions to ponder: Is your org in the process of trying to define/develop a grand system design from scratch? Scanning your project portfolio, can you definitively know if you’re about to, or currently are, attempting a frontal assault on Gall’s galling law – and would it matter if you did know?