There is an excellent post by Joel Spolsky on getting to the root cause of a problem. Joel suggests that rather than addressing just the obvious symptom, you dig deeper and, as each cause comes up, ask why, until you reach a problem that, when addressed, will prevent the initial problem from occurring. It’s a hard thing to explain, so Joel’s example, slightly amended, should help.
- Our link to a server went down
- Why? Our switch appears to have put the port in a failed state
- Why? After some discussion with the operations centre, we speculate that it was quite possibly caused by an Ethernet speed / duplex mismatch
- Why? The switch interface was set to auto-negotiate instead of being manually configured
- Why? We were fully aware of problems like this, and have been for many years, but we do not have a written standard and verification process for production switch configurations
- Why? Documentation is often thought of as an aid for when the sysadmin isn’t around or for other members of the operations team, whereas it should really be thought of as a checklist
Five levels of digging may be too many or too few; it all depends on the situation. The aim of the exercise is to fix the incident and prevent it from occurring again, so dig as deep as you need.
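The "written standard and verification process" from the example above can be as small as a checklist expressed in code. What follows is only a rough sketch under my own assumptions: the `STANDARD` values, the `PortConfig` class and the `violations` function are hypothetical names invented for illustration, and a real check would read the settings from your switches rather than from a hand-written list.

```python
from dataclasses import dataclass

# Hypothetical written standard: every production port must be manually
# configured with these values (assumed here purely for illustration).
STANDARD = {"speed": "1000", "duplex": "full"}


@dataclass
class PortConfig:
    name: str
    speed: str   # e.g. "1000" or "auto"
    duplex: str  # e.g. "full", "half" or "auto"


def violations(ports):
    """Return one readable line for each setting that breaks the standard."""
    problems = []
    for port in ports:
        for setting, expected in STANDARD.items():
            actual = getattr(port, setting)
            if actual != expected:
                problems.append(
                    f"{port.name}: {setting} is {actual!r}, standard says {expected!r}"
                )
    return problems


if __name__ == "__main__":
    ports = [
        PortConfig("Gi0/1", speed="1000", duplex="full"),
        PortConfig("Gi0/2", speed="auto", duplex="auto"),
    ]
    for line in violations(ports):
        print(line)
```

Run against every production switch after each change, a check like this turns the documentation into the checklist that the last "why" asks for.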
As a society we have a tendency to view failure as something to be ignored or as an outcome to be punished. I’m not suggesting that we praise failure, or condone malicious actions or incompetence, but we do learn more from failure than from success, so we must use the experience to our advantage. In Millionaire Upgrade, Richard Parkes Cordock talks about how entrepreneurs view failure differently from most of society and how they analyse the experience, make adjustments and carry on. Thomas Edison is quoted as saying, whilst experimenting to develop the storage battery, “I have not failed, I’ve just found 10,000 ways that won’t work”. With this mindset dig away, not to cast blame but to fix the problem so it doesn’t happen again.
Young children love the word why. It can drive parents mad as each answer given results in the response “Why?” Infuriating as it is, this is simply the child’s thirst for knowledge. They aren’t embarrassed about not understanding; they keep on digging until they understand or their parents collapse with exhaustion.
So, investigate problems like children. Don’t give up until you understand completely, and embrace recognising and learning from failure.