What Went Wrong?

Communications of the ACM, November 2021, Vol. 64 No. 11, Pages 94-96
Practice
By Poul-Henning Kamp

In April, 39 postmasters and sub-postmasters were cleared of wrongdoing by a court in the U.K. after having been convicted of various forms of fraud, some of them serving multiyear prison sentences.

In total, around 700 people have been prosecuted based on the “evidence” from a single IT system installed by the U.K. Post Office, and while some of them probably did embezzle money, it looks like the majority did not. They were sentenced based on evidence from an IT system, which … ehhh … to be honest, we don’t know what that IT system did, except we know it did it really, really badly.

Press reports have contained various mumblings and hand-waving about the shortcomings of the IT system, but nobody sat down and documented precisely what went wrong and what can be learned from it so that nobody ever makes a mistake like this again.

Had this been a ship sinking, a train derailing, or a plane crash, one of the U.K.’s official accident investigation boards would have come in and written a report everybody would be allowed to read, explaining what went wrong and how to avoid it ever happening again. But because no ships, trains, or airplanes were involved, there will be no such report.

For well over a decade, I have been arguing that governments should create IT accident investigation boards for the exact same reasons they have done so for ships, railroads, planes, and in many cases, automobiles.

Denmark got its Railroad Accident Investigation Board because too many people were maimed and killed by steam trains, and it has kept the board around because a thousand tons of steel hurtling along at 180 km/h, just below a 25 kV power line, can do a lot more damage than a steam locomotive with wooden wagons ever could.

The U.K.’s Air Accidents Investigation Branch was created for pretty much the same reasons, but, specifically, because when the airlines investigated themselves, nobody was any the wiser.

Does that sound slightly familiar in any way?

The crucial feature of any accident investigation board is that it focuses only on what went wrong and how to avoid it happening again, and not on whom to blame.

About the Author:

Poul-Henning Kamp spent more than a decade as one of the primary developers of the FreeBSD operating system before creating the Varnish HTTP Cache software, through which around one-fifth of all Web traffic passes. He is an independent contractor; one of his most recent projects was a supercomputer cluster to stop the stars from twinkling in the mirrors of ESO’s new Extremely Large Telescope (ELT).

See also:

  • Cybersecurity & Infrastructure Security Agency (CISA), Cyber Safety Review Board: “The Executive Order establishes a Cybersecurity Safety Review Board, co-chaired by government and private sector leads, that may convene following a significant cyber incident to analyze what happened and make concrete recommendations for improving cybersecurity. Too often organizations repeat the mistakes of the past and do not learn lessons from significant cyber incidents. When something goes wrong, the Administration and private sector need to ask the hard questions and make the necessary improvements. This board is modeled after the National Transportation Safety Board, which is used after airplane crashes and other incidents.”
  • Cyber Safety Review Board – Review of the December 2021 Log4j Event (PDF): The board’s first report, published 11 July 2022, examined the Log4j library and the Log4Shell vulnerability.