Plucked from his deliciously titled “Real Architecture: Engineering or Pompous Bullshit?” slide deck, I give you Tom Gilb‘s personal principles of software architecture engineering:
Tom’s proactive approach seems like a far cry from the reactive approaches of the “emergent architecture” and TDA (Test Driven Architecture) communities, doesn’t it?
:)! Tom’s list actually uses the words “engineering” and “the architect“. Maybe that’s why I have always appreciated his work so much.
The figure below depicts an architectural view of a real-time embedded sub-system that I and a team of 8 others built and delivered 10 (freakin!) years ago. At revision number 9, the diagram ended up being the final “as-built” model of the 20,000+ lines-of-code system. Since the software was written in C and, thus, not object-oriented, I chose not to use UML to capture the design at the time. Doing so would have introduced an impedance mismatch and a large intellectual gap of misunderstanding between the procedural C code base and the OO design artifacts. I used structured analysis and functional decomposition to concoct the design and I employed “pseudo” Data Flow Diagrams (DFD) instead.
At the beginning of this “waterfall” project, I created revision 0 of the diagram as the first “build-to” snapshot. Of course, as learning accrued and the system evolved throughout development, I diligently kept the diagram updated and synchronized with the code base in true PAYGO fashion.
As you can see from the picture, the system of 30+ asynchronous application tasks ran under the tutelage of the industrial-strength VxWorks Real Time Operating System (RTOS). Asynchronous inter-task communication was performed via message passing through a series of lock-protected queues. The embedded physical board was powered by a Motorola PowerPC CPU (remember those dinosaurs?). The board housed a myriad of serial and ethernet interface ports for communication to other external sub-systems.
The above diagram was not the sole artifact that I used to record the design. It was simply the highest level, catch-all, overview of the system. I also developed a complementary set of lower level functional diagrams; each of which captured a sliced view of an end-to-end strand of critical functionality. One of these diagrams, the “Uplink/Downlink Processing View“, is shown below. Note that the final “as-built” diagram settled out as revision number 5.
The purpose of this post was simply to give you a taste of how I typically design and evolve a non-trivial software-intensive system that I can’t entirely keep in my head. I use the same PAYGO process for all of my efforts regardless of whether the project is being managed as an agile or waterfall endeavor. To me, project process is way over-emphasized and overblown. “Business Value” creation ultimately distills down to architecture, design, coding, and testing at all levels of abstraction.
The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise. — Edsger Dijkstra
With Edsger’s delicious quote in mind, let’s explore seven levels of abstraction that can be used to reason about big, distributed, systems:
At level zero, we have the finest grained, most concrete unit of design, a single puny line of “source code“. At level seven, we have the coarsest grained, most abstract unit of design, the mysterious and scary “system” level. A line of code is simple to reason about, but a “system” is not. Just when you think you understand what a system does, BAM! It exhibits some weird, perhaps dangerous, behavior that is counter-intuitive and totally unexpected – especially when humans are the key processing “nodes” in the beast.
Here are some questions to ponder regarding the seven level stack: Given that you’re hired to build a big, distributed system, at what level would you start your development effort? Would you start immediately coding up classes using the much revered TDD “best practice” and let all the upper levels of abstraction serendipitously “emerge”? Relatively speaking, how much time “up front” should you spend specifying, designing, recording, communicating the structures and behaviors of the top 3 levels of the stack? Again, relatively speaking, how much time should be allocated to the unit, integration, functional, and system levels of testing?
I’m not a fan of “emergent global architecture“, but I AM a fan of “emergent local design“. To mitigate downstream technical and financial risk, I believe that one has to generate and formally document an architecture at a high level of abstraction before starting to write code. To do otherwise would be irresponsible.
The figure below shows a portion of an initial “local” design that I plucked out of a more “global” architectural design. When I started coding and unit testing the cluster of classes in the snippet, I “discovered” that the structure wasn’t going work out. The API of the architectural framework within which the class cluster runs wouldn’t allow it to work without some major, internal, restructuring and retesting of the framework itself.
After wrestling with the dilemma for a bit, the following workable local design emerged out of the learning acquired via several wretched attempts to make the original design work. Of course, I had to throw away a bunch of previously written skeletal product and test code, but that’s life. Now I’m back on track and moving forward again. W00t!
Assume we have a valuable, revenue-critical software system in operation. The figure below shows one nice and tidy, powerpoint-worthy way to model the system; as a static, enumerated set of executables and libraries.
Given the model above, we can express the size of the system as:
Now, say we run a tool on the code base and it spits out a system size of 200K “somethings” (lines of code, function points, loops, branches, etc).
What does this 200K number of “somethings” absolutely tell us about the non-functional qualities of the system? It tells us absolutely nothing. All we know at the moment is that the system is operating and supporting the critical, revenue generating processes of our borg. Even relatively speaking, when we compare our 200K “somethings” system against a 100K “somethings” system, it still doesn’t tell us squat about the qualities of our system.
So, what’s missing here? One missing link is that our nice and tidy enumerations view and equation don’t tell us nuttin’ about what Russ Ackoff calls “the product of the interactions of the parts” (e.g Lib-to-Lib, Exe-Exe). To remedy the situation, let’s update our nice and tidy model with the part-to-part associations that enable our heap of individual parts to behave as a system:
Our updated model is still nice and tidy, but just not as nice and tidy as before. But wait! We are still missing something important. We’re missing a visual cue of our system’s interactions with “other” systems external to us; you know, “them”. The “them” we blame when something goes wrong during operation with the supra-system containing us and them.
Our updated model is once again still nice and tidy, but just not as nice and tidy as before. Next, let’s take a single snapshot of the flow of (red) “blood” in our system at a given point of time:
Finally, if we super-impose the astronomic number of all possible blood flow snapshots onto one diagram, we get:
D’oh! We’re not so nice and tidy anymore. Time for some heroic debugging on the bleeding mess. Is there a doctor in da house?
Suppose you’re developing a software-intensive product and you have to choose to write your app code on top of two competing infrastructure platforms:
Well, duh. I think I’ll take the candidate on the left. That way, if the code I write ends up being costly to maintain, it’s all my fault. I wasn’t “forced” to write crappy, jaggy code by having to comply with the platform:
But wait! Suppose either the clean infrastructure doesn’t exist or (more likely) you’re “mandated” to write your apps on top of the jaggy infrastructure. In this situation, here’s the best and worst we can do:
In both cases, our code has some unwanted “jagginess” to it – some forced upon us by the platform and some we introduced ourselves.
In summary, our code can take on one of the forms below. The two on the left, written on top of the clean infrastructure, are less costly to maintain than the two written on the right.
So, what’s the purpose of this post? Uh, I dunno. I started sketching out the graphics first and then I thought some interesting insight would pop up as I wrote the accompanying words. But other than the utterly obvious advice to “choose a clean infrastructure over a jaggy infrastructure when you can“, nothing arose.
Writing is sometimes like that. You have nothing to say, but you write and babble away anyway. In case you haven’t noticed, I do that a lot. Bummer.
Regardless of which methodology you use to develop software, the following technical allocation chain must occur to arrive at working source code from some form of requirements:
The figure below shows a 2/6/13 end result of the allocation chain for a hypothetical example project. How the 2/6/13 combo was arrived at is person and domain-specific. Given the same set of requirements to N different, domain-knowledgeable people, N different designs will no doubt be generated. Person A may create a 3/6/9 design and person B may conjure up 4/8/16 design.
Given a set of static or evolving requirements, how should one allocate components to namespaces and libraries? The figure below shows extreme 1/1/13 and 13/13/13 cases for our hypothetical 13 component example.
As the number of components, N, in the system design gets larger, the mindless N/N/N strategy becomes unscalable because of an increasing dependency management nightmare. In addition to deciding which K logical components to use in their application, library users must link all K physical libraries with their application code. In the mindless 1/1/N strategy, only one library must be linked with the application code, but because of the single namespace, the design may be harder to logically comprehend.
Expectedly, the solution to the allocation problem lies somewhere in between the two extremes. Arriving at an elegant architecture/design requires a proactive effort with some upfront design thinking. Domain knowledge and skillful application of the coupling-cohesion heuristic can do the trick. For large scale systems, letting a design emerge won’t.
Emergent design works in nature because evolution has had the luxury of millions of years to get it “right“. Even so, according to angry atheist Richard Dawkins, approximately 99% of all “deployed” species have gone extinct – that’s a lot of failed projects. In software development efforts, we don’t have the luxury of million year schedules or the patience for endless, random tinkering.
Before you unquestioningly accept the gospel of the “evolutionary architecture” and “emergent design” priesthood, please at least pause to consider these admonitions:
Give me six hours to chop down a tree and I will spend the first four sharpening the axe - Abe Lincoln
Measure twice, cut once – Unknown
If I had an hour to save the world, I would spend 59 minutes defining the problem and one minute finding solutions – Albert Einstein
100% test coverage is insufficient. 35% of the faults are missing logic paths – Robert Glass
Out of the bazillions of definitions of “software architecture” out in the wild, my favorite is:
“the initial set of decisions that are costly to change downstream“.
It’s my fave because it encompasses the “whole” product development ecosystem and not just the structure and behavior of the product itself. Here are some example decisions that come to mind:
- Programming language(s)
- Third party libraries
- Architectural pattern/style
- Build system selection
- Targeted operating system(s)
- Version control system
- Automated testing framework
- Communication middleware
- GUI framework
- Requirements management tool
- Non-functional requirements (latency, throughput, availability, security, safety)
- Development process
Once your production code gets intimately bound with the set of items on the list, there comes a point of no return on the project timeline where changing any of them is pragmatically impossible. Once this mysterious time threshold is crossed, changing the product source code may be easier than changing anything else on the list.
Got any other items to add to the list?