Intuitive questions about overreaching with big data


From college campuses to independent think tanks, people are intuitively realizing something important, and concerning, about big data: it is easy to get tangled up and misled by misreading and misapplying the results.

Jared Rosen, a sophomore advertising and marketing management major at Syracuse University, humorously identified in his Daily Orange blog last week the pitfalls of drawing unwarranted correlations from voter behavior in the run-up to today's presidential election.

He said that integrating big data analytics into political campaigns may help a campaign uncover the electorate's core beliefs more accurately and shape its message accordingly. But, he argued, it is a fallacy to believe that if you identify the cross section of buyers who care about the environment and chart them against the percentage of pretzel-eating people in their mid-20s, the result necessarily reveals those voters' stance on offshore drilling. Rosen correctly captured what so many fear about big data: "Overanalyzing data can lead to numbness across an audience."

Consulting company KPMG also said recently that companies need to avoid over-analysis in order to realize the benefits of big data. Eddie Short, partner and head of business intelligence at KPMG Management Consulting, said the volume of information generated minute by minute creates a genuine danger: companies may spend their time focusing on the data itself and miss the business improvements it can bring. And even then, he said, it is easy to select only the data that suits your own hypothesis.

And over-analysis is just one of the moral hazards of big data. Seven more pertain to the very act of measurement, which is what big data is at its core. These seven moral hazards of measurement were identified by Ron Baker, co-founder of the VeraSage Institute, a think tank for professional knowledge firms. They apply not just to big data but to any kind of analysis with measurement at its core.

Baker's bottom line on measurement is: "[The] exact measurements of the wrong things can drive out good judgments of the right things, imperiling our future." He adds that believing we can manage something just because we can measure it allows people to "substitute statistics for thinking" and gives them a "false sense of security where there should exist more doubt."

The seven hazards are:
- believing that because you can measure consumers, you are measuring people;
- forgetting that, in some cases, a version of Heisenberg's Uncertainty Principle applies and the act of measuring changes what you measure;
- letting entrenched measurements harden into conventional wisdom that goes unchallenged;
- assuming that measurements are always reliable;
- treating comparisons as apples to apples when they may not be;
- forgetting that ideas can't be measured; and
- relying on data that mostly provides lagging indicators, which say where we have been, not where we are going.

Baker explains each of these moral hazards in more detail in his own write-up. They are good rules of thumb for data scientists and business analysts working with big data.

For more:
- See Baker's Seven Moral Hazards of Measurement.
