Nathalie Jonsson

Science Writer 

Is big data useless just because it disrupts scientific tradition?


I was once in a room of scientists discussing big data. The discussion went something like this:

Big data is too big. How would you know what to look for? In traditional scientific investigation, you have one hypothesis and gather data to either prove or disprove it. Anything else you think you see in the data is only a suspicion. To know for sure, you need to conduct a whole new experiment for each individual suspicion.

Big data is not very precise. One of the most important measures used to validate traditional scientific data is statistical significance, which indicates whether your result is likely to be a real effect or just a coincidence.

Then the discussion moved on to how Google had ambitiously tried to predict flu epidemics by tracking searches made by the public and failed.

And at that moment, when big data had almost been written off as the latest trend that would never add value to anyone, I realised that we were all wrong. The fact that we cannot apply our traditional measures of accuracy to large volumes of data does not make that data useless.

Today, big data does play an important part in tracking epidemics, and it is helping to track Ebola right now. In fact, it seems that the HealthMap algorithm, which mines the social web for mentions of Ebola, managed to detect the epidemic nine days before the World Health Organization.

So it is quite harsh to deem an entire science useless just because it gets predictions wrong. Try telling that to Francis Galton, the English scientist born in 1822 who constructed the world’s first weather map.
