Single-digit mistake causes massive IT crash

A sign at Dunedin Hospital advises that the SDHB computer system is down. Photo by Shawn McAvinue.
A single incorrect digit entered into a computer system caused the massive IT crash in southern hospitals in February, a report from the Southern District Health Board says.

An incorrectly entered internet protocol (IP) address caused an outage that put hospital systems into chaos for about 36 hours.

While human error caused the crash, the underlying cause was a lack of maintenance, and staff have since been given more time for system checks.

The health board considered a report on the matter at a closed-door audit and risk meeting in Dunedin this week, and later released a summary to the Otago Daily Times.

''The root cause of the outage event was the incorrect configuration of the monitoring and alerting system on the storage area network equipment. This was due to an incorrect IP address being entered into a configuration field on the alerting software and is most certainly to have been caused by human error. A single digit in a 12-digit address was incorrect.''
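A short sketch helps show why a mistake of this kind is easy to miss: an address with one wrong digit is still a perfectly valid address, so a format check alone would not flag it. The two addresses below are hypothetical and are not taken from the report; the helper `differing_digits` is likewise invented for illustration.

```python
import ipaddress


def differing_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical 12-digit addresses, assumed for illustration only.
intended = "192.168.100.125"  # where alerts were meant to go
entered = "192.168.100.135"   # the same address with one digit wrong

# Both strings parse without error, so a syntax check passes for each.
intended_addr = ipaddress.IPv4Address(intended)
entered_addr = ipaddress.IPv4Address(entered)

mismatch = differing_digits(intended, entered)
same_destination = intended_addr == entered_addr

print(f"digits that differ: {mismatch}")
print(f"same destination: {same_destination}")
```

Under these assumptions, only a comparison against the intended value, or a test alert sent end to end, would reveal the error.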

For several months, system administrators had no need to use the console, and were not seeing disk errors on it.

''As disks failed, the storage area network automatically reassigned spare disks to cover these failures.

''In the normal course of events, these failed disks would have been physically replaced with new ones that would then have been marked as spares.

''As there was no indication that the disks had failed and spares had been used, this was not occurring.''

When all available disk spares were used, the system shut down to preserve its data. Morning checks, carried out Monday to Friday, were now being strictly followed, and administrators had been given extra time for them.
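The failure sequence described in the summary can be sketched as a small simulation. This is a simplified model, not the behaviour of the board's actual storage equipment: the class `StorageArray`, the spare count of three and the rule that the array goes offline as soon as its last spare is consumed are all assumptions made for illustration.

```python
class StorageArray:
    """Toy model of a disk array with a pool of hot spares (hypothetical)."""

    def __init__(self, spares: int) -> None:
        self.spares = spares
        self.failed_unreplaced = 0
        self.online = True

    def disk_failed(self) -> None:
        """A disk fails; a spare silently takes its place."""
        if not self.online or self.spares == 0:
            return
        self.spares -= 1
        self.failed_unreplaced += 1
        # Assumption: with no spares left, the array shuts down
        # to protect its data.
        if self.spares == 0:
            self.online = False

    def replace_failed_disk(self) -> None:
        """A technician swaps a failed disk; the new one becomes a spare."""
        if self.online and self.failed_unreplaced > 0:
            self.failed_unreplaced -= 1
            self.spares += 1


def run(failures: int, spares: int, replace_promptly: bool) -> bool:
    """Return True if the array is still online after the given failures."""
    array = StorageArray(spares)
    for _ in range(failures):
        array.disk_failed()
        if replace_promptly:
            array.replace_failed_disk()
    return array.online


# No alerts arrive, so nobody replaces the failed disks.
unattended = run(failures=3, spares=3, replace_promptly=False)
# Alerts arrive and each failed disk is replaced before the next failure.
maintained = run(failures=3, spares=3, replace_promptly=True)

print(f"online without replacements: {unattended}")
print(f"online with replacements: {maintained}")
```

In this model the same three disk failures take the array offline only when the failed disks are never replaced, which is the situation the summary attributes to the missing alerts.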

A ''phone home'' system had been set up to alert IBM of problems directly, and other measures had been taken to strengthen the system.

Dunedin North MP David Clark said he remained ''frustrated'' that the SDHB had still not released the full report. The summary had explained some of the ''what'' behind the IT outage, but had not explained why this had occurred.

Health Minister Tony Ryall had said last week he was happy for it to become public.

Dr Clark acknowledged the SDHB had put some precautionary measures in place, but said many questions would remain unanswered until the full report became public. It remained unclear whether the SDHB had previously been warned about potential IT risks, how robust the board's IT system was, and whether there were risks of future problems.

He intended to continue asking questions, and expected that the full report would be promptly released.

