Chris Gabriel looks at recent data calculation errors and suggests that CIOs should address certain basics in the light of rapidly growing big data volumes.
Anyone interested in economics will have noticed the discussion surrounding a research paper entitled “Growth in a Time of Debt” produced by Carmen M. Reinhart and Kenneth S. Rogoff for the National Bureau of Economic Research.
Significantly, policymakers, particularly in Europe, have drawn extensively on their work to provide an intellectual basis for the fiscal austerity policies that are central to their response to the continuing Eurozone crisis.
Reinhart and Rogoff’s paper has made the front pages in recent weeks because flaws have been found in the data – specifically, spreadsheet errors.
It appears that a simple formula error led to key calculations drawing on data from 15 countries, rather than the intended 20. As a result, a central plank of the academics’ paper – that growth falls to -0.1% when debt hits 90% of GDP – was far less definite and universal than it seemed. Revised figures, with the formula error corrected, suggest that growth falls to 2.2%, and that there are many exceptions to the ‘rule’.
We will, of course, never know whether policy responses to sovereign debt crises would have been different had the results been correct the first time round. But the story highlights just how easy it is to get things wrong and, although this was not a large set of data, how an error can be amplified by the size of the numbers involved.
And these are not rare incidents.
At the end of last year a flaw in a “gigantic, highly complicated Excel spreadsheet with complex mathematical equations dotted all over it” cost UK taxpayers £100 million when risk modelling for a bid to run a regional train service was revealed as being incorrectly calculated.
Similarly, in 2010 Barclays inadvertently entered into 179 contracts when a lawyer mistakenly highlighted details in a spreadsheet as part of the purchase of Lehman Brothers.
It is not just users who get it wrong; Microsoft had to announce that Excel 2007 had twelve specific calculation errors.
But these issues are not, of course, limited to Excel. In 1962 NASA’s Mariner 1 was destroyed minutes after launch due to a missing hyphen in the code for transmitting navigation instructions.
But what can we learn from these issues? Perhaps that, with hardware and software becoming increasingly reliable and ubiquitous, it is easy to take reliability for granted – a dangerous assumption given how huge data volumes can amplify even the smallest error.
The point is this: technology can help business, but we must not forget that a key link between the two is people. This raises two business issues. First, we cannot blindly treat technology as ‘plug and play’ and assume everything is fine simply because technology is in place – more often than not, it is human, expert input that drives value from technology.
Second, people are not infallible – mistakes happen, so it is important to have the right checks and balances in place so that small mistakes do not have big consequences.
Those checks and balances are many and will vary according to the tools used and type of data, but at a minimum CIOs should be considering the:
- skills of the user base
- reliability of the source data
- idiosyncrasies of the toolset
- types of data quality checks
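
As an illustration of the last item, one of the simplest data quality checks is to confirm that a calculation covers every row it was meant to. The Python sketch below is only that, a sketch: the function name `checked_mean` and the growth figures are hypothetical, invented for illustration, and are not taken from any of the cases above. It refuses to produce an average when the number of values does not match the number expected, which is the kind of guard that would flag a formula covering 15 rows instead of 20.

```python
from statistics import mean


def checked_mean(values, expected_count):
    """Average the values, but fail loudly if any expected rows are missing."""
    if len(values) != expected_count:
        raise ValueError(
            f"expected {expected_count} values, got {len(values)}"
        )
    return mean(values)


# Hypothetical growth figures for 20 countries (illustrative, not real data).
growth = [2.0] * 15 + [4.0] * 5

# All 20 rows are present, so the average is returned.
full = checked_mean(growth, expected_count=20)

# A range that stops five rows short is rejected instead of silently averaged.
try:
    checked_mean(growth[:15], expected_count=20)
    truncated_caught = False
except ValueError:
    truncated_caught = True
```

A check like this does not make the underlying data right, but it turns a silent omission into a visible failure that someone has to look at.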
Finally, we can probably learn something from Lt Col Stanislav Petrov, who in 1983 was a duty officer monitoring the Soviet Union's early-warning satellite system. Alarm bells sounded, indicating that the US had launched five ballistic missiles at the Soviet Union – in fact the satellite had picked up the sun’s reflection off the cloud tops and interpreted it as a missile launch. Thankfully he followed his gut instinct rather than the early-warning system, and he is credited as the man who averted a nuclear war. In his words: “I had a funny feeling in my gut that this was a false alarm.”