Raymond's reading list: The Mythical Man-Month, The Design of Everyday Things, and Systemantics
The first two of these books are probably
on everybody else's reading list,
but I'm going to mention them anyway
since I consider them required reading for managers and designers.
The Mythical Man-Month
is over 30 years old, but the lessons contained therein are as true now
as they were back in 1975,
such as what is now known as
Brooks' law:
Adding manpower to a late software project makes it later.
I much preferred the original title for
The Design of Everyday Things, namely,
The Psychology of Everyday Things,
but I'm told that booksellers ended up mistakenly filing the book
in the psychology section.
Once you've read this book, you will never look at a door the same way again.
And you'll understand the inside joke when I say,
"I bet it won an award."
The third book is the lesser-known
Systemantics: How Systems Work and Especially How They Fail.
The book was originally published in 1978,
then reissued under the slightly less catchy title,
Systemantics: The Underground Text of Systems Lore,
and re-reissued under the completely soul-sucking title
The Systems Bible.
I reject all the retitling and continue to refer to the book as
Systemantics.
Systemantics is very much like The Mythical Man-Month,
but with a lot more attitude.
The most important lessons I learned
are a reinterpretation of Le Chatelier's Principle for
complex systems
("Every complex system resists its proper functioning")
and the Fundamental Failure-Mode Theorem
("Every complex system is operating in an error mode").
You've all experienced the
Fundamental Failure-Mode Theorem:
You're investigating a problem and along the way you find
some function that never worked.
A cache has a bug that results in cache misses when there should be hits.
A request for an object that should be there somehow always fails.
And yet the system still worked in spite of these errors.
Eventually you trace the problem to a recent change that exposed
all of the other bugs.
Those bugs were always there, but the system kept on working
because there was enough redundancy that one component was able
to compensate for the failure of another component.
Sometimes this chain of errors and compensation continues for
several cycles, until finally the last protective layer fails and the
underlying errors are exposed.
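
Here is a minimal sketch of that kind of masked failure, in Python. Every name in it (BuggyCache, slow_lookup, fetch) is hypothetical and invented for illustration; it is not taken from any real system. The cache stores entries under a normalized key but looks them up under the raw key, so it never produces a hit, yet callers always get the right answer because the fallback path quietly compensates.

```python
class BuggyCache:
    """Hypothetical cache whose lookups never hit because of a key mismatch."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        # Entries are stored under the lowercased key...
        self._entries[key.lower()] = value

    def get(self, key):
        # ...but looked up under the raw key (the bug), so any key
        # containing an uppercase letter can never be found.
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None


def slow_lookup(key):
    """Stand-in for the authoritative (and slow) data source."""
    return f"value-for-{key}"


def fetch(cache, key):
    value = cache.get(key)
    if value is None:
        # The fallback silently compensates for the broken cache.
        value = slow_lookup(key)
        cache.put(key, value)
    return value


cache = BuggyCache()
results = [fetch(cache, "Widget") for _ in range(3)]
print(results, "hits:", cache.hits, "misses:", cache.misses)
```

Every call returns the correct value, so nothing looks wrong from the outside. The only evidence is a hit counter stuck at zero, which nobody checks until some later change removes or slows down the fallback.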
That's why I'm skeptical of people who look at some
catastrophic failure of a complex system and say,
"Wow, the odds of this happening are astronomical.
Five different safety systems had to fail simultaneously!"
What they don't realize is that one or two of those
systems are failing all the time, and it's up to the
other three systems to prevent the failure from turning into a disaster.
You never see a news story that says
"A gas refinery did not explode today because simultaneous
failures in the first,
second, fourth, and fifth safety systems
did not lead to a disaster thanks to a correctly functioning third system."
The roles of failure and savior may change over time,
until eventually all of the systems choose to have a bad day
all on the same day,
and something goes boom.
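
To put rough numbers on that argument, here is a small back-of-the-envelope calculation in Python. The figures are assumptions chosen purely for illustration, not measurements: five safety systems, each independently out of order on a given day with some fixed probability, observed over ten years. Real failures are rarely independent, so treat this as a sketch of the reasoning rather than a model of any actual plant.

```python
def p_all_down(p_down, n_systems):
    """Chance that every system is down on the same day (assumes independence)."""
    return p_down ** n_systems


def p_at_least_one_bad_day(p_down, n_systems, days):
    """Chance of at least one day in the period on which every system is down."""
    p_day = p_all_down(p_down, n_systems)
    return 1.0 - (1.0 - p_day) ** days


N_SYSTEMS = 5
DAYS = 3650  # roughly ten years of daily operation (assumed)

# The "astronomical odds" view: each system fails about one day in a thousand.
assumed_rare = 0.001
# The view argued above: each system is quietly broken about one day in five.
assumed_routine = 0.2

rare_decade = p_at_least_one_bad_day(assumed_rare, N_SYSTEMS, DAYS)
routine_decade = p_at_least_one_bad_day(assumed_routine, N_SYSTEMS, DAYS)
expected_down = N_SYSTEMS * assumed_routine  # average number broken per day

print(f"rare failures:    {rare_decade:.3g} chance of disaster in ten years")
print(f"routine failures: {routine_decade:.3g} chance of disaster in ten years")
print(f"average systems down on a given day (routine case): {expected_down:.1f}")
```

Under the first assumption a disaster really is vanishingly unlikely. Under the second, where on average one of the five systems is down at any given moment, a day on which all five are down within the decade is more likely than not.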