Work started (on the 503 Mark II software system) with a team of fifteen programmers and the deadline for delivery was set some eighteen months ahead in March 1965.
Although I was still managerially responsible for the 503 Mark II software, I gave it less attention than the company’s new products and almost failed to notice when the deadline for its delivery passed without event.
The programmers revised their implementation schedules and a new delivery date was set some three months ahead in June 1965. Needless to say, that day also passed without event.
I asked the senior programmers once again to draw up revised schedules, which again showed that the software could be delivered within another three months. I desperately wanted to believe it but I just could not. I disregarded the schedules and began to dig more deeply into the project.
The entire Elliott 503 Mark II software project had to be abandoned, and with it, over thirty man-years of programming effort, equivalent to nearly one man’s active working life, and I was responsible, both as designer and as manager, for wasting it.
Mr. Hoare’s classic speech is the source of a few great quotes that have transcended time:
I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult…. No committee will ever do this until it is too late.
A feature which is included before it is fully understood can never be removed later.
At first I hoped that such a technically unsound project would collapse but I soon realized it was doomed to success.
The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
The mistakes which have been made in the last twenty years are being repeated today on an even grander scale. (1980)
Dontcha think that last quote can be restated today as:
The mistakes which have been made in the last fifty years are being repeated today on an even grander scale.
Since man’s ability to cope with complexity is relentlessly being dwarfed by his propensity to create ever greater complexity, the same statement will probably still be true 50 years hence, no?
When a failure occurs in a complex, networked, socio-technical system, the probability is high that the root cause is located far away from the failure detection point in time, space, or both. The progression in time goes something like this:
fault ---> error ---> error ---> error ---> failure discovered!
An unanticipated fault begets an error, which begets one or more further errors, which beget still more errors, and so on, until the failure manifests itself as a loss of life or money somewhere, and some time later, downstream in the system. In a software system, the time from fault to catastrophic failure may be only milliseconds, but the fault and the failure can be separated by hundreds of thousands of lines of source code spread across multiple machines and networks.
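To make the fault-to-failure chain a bit more concrete, here's a minimal Python sketch. Everything in it (the three-stage pipeline, `parse_config`, `load_timeout`, `schedule_job`, `PipelineError`, `root_cause`) is hypothetical and invented for illustration. The fault is a config value written as "30s" instead of a plain number of milliseconds; it slips silently past the first stage, becomes an error in the second, and is only discovered as a failure in the third. Because each stage re-raises with `raise ... from ...`, the chain of causes back to the original fault is preserved rather than lost.

```python
class PipelineError(Exception):
    """Hypothetical error type; the underlying cause is chained via __cause__."""


def parse_config(text: str) -> dict:
    # Stage 1: the fault slips in here. A malformed value ("30s") is
    # accepted without validation and passed along as if it were fine.
    key, _, value = text.partition("=")
    return {key.strip(): value.strip()}


def load_timeout(config: dict) -> int:
    # Stage 2: the latent fault turns into an error when the value is used.
    try:
        return int(config["timeout_ms"])
    except (KeyError, ValueError) as exc:
        raise PipelineError("could not read timeout_ms") from exc


def schedule_job(config_text: str) -> int:
    # Stage 3: the failure is finally discovered, far from the fault.
    try:
        timeout = load_timeout(parse_config(config_text))
    except PipelineError as exc:
        raise PipelineError("job scheduling failed") from exc
    return timeout * 2


def root_cause(exc: BaseException) -> BaseException:
    # Walk the __cause__ chain back toward the original fault.
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


if __name__ == "__main__":
    try:
        schedule_job("timeout_ms = 30s")
    except PipelineError as exc:
        print(exc)                           # job scheduling failed
        print(type(root_cause(exc)).__name__)  # ValueError
```

The message at the point of discovery ("job scheduling failed") says nothing about a unit suffix in a config file. Only by following the chain back do you reach the fault, and in a real system that chain often crosses process and machine boundaries where it isn't preserved at all.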
Let’s face it. Envisioning, designing, coding, and testing for end-to-end “system level” error conditions in software systems is unglamorous and tedious (unless you’re using Erlang, which is thoughtfully designed to lessen the pain). It’s usually one of the first things to get jettisoned when the pressure is ratcheted up to meet some arbitrary schedule premised on a baseless, one-time estimate elicited under duress when the project was kicked off. Bummer.
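Part of what lessens the pain in Erlang is the supervisor idea: rather than anticipating every error condition inside a worker, you let the worker crash and have a supervisor restart it, escalating only if it keeps crashing. The sketch below is a rough Python analogy, loosely modeled on the restart-limit notion in Erlang/OTP supervisors. It is not Erlang's API, it ignores OTP's time window for counting restarts, and the names `supervise`, `RestartLimitExceeded`, and `max_restarts` are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RestartLimitExceeded(Exception):
    """Hypothetical: raised when the worker keeps crashing, so the supervisor escalates."""


def supervise(worker: Callable[[], T], max_restarts: int = 3) -> T:
    # Run the worker; if it raises, record the crash and start it again
    # from scratch instead of handling every error inside the worker.
    crashes: list[Exception] = []
    for _ in range(max_restarts + 1):  # first run plus up to max_restarts restarts
        try:
            return worker()
        except Exception as exc:
            crashes.append(exc)
    # Too many crashes: give up and escalate, chaining the last crash as the cause.
    raise RestartLimitExceeded(f"worker crashed {len(crashes)} times") from crashes[-1]
```

With `max_restarts=3`, a worker that fails twice with a transient error and then succeeds returns its result normally; a worker that always fails is run four times before `RestartLimitExceeded` is raised. The design choice being illustrated is that error handling moves out of the worker and into one small, reusable, testable place.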