The past is an untrustworthy guide to the future

People like to talk about the downside to big data. Those who do typically focus on privacy or simply mock the human race itself by highlighting how shallow much of the data we create is. I worry about the former and can't argue fully against the latter. But that's the thing about big data; it's going to tell us what the reality is--like it or not--and if the reality is that we are primarily a species of shallow, self-indulgent, consumption-motivated morons, then the problem is not with big data. It's with you-know-who.

When I think about the downside to big data, I also tend to drift toward our human shortcomings. But I don't think about our idiot selves; I think about our clever, interested and motivated selves. And I think about how even the smartest, most well-intentioned people are fallible, and how these clever, sometimes brilliant people are the ones who will be trying to make sense of the planetoids of data we will produce.

It's not the technology that makes big data useful--some hope essential--it's the people designing the analysis and extraction tools. It's the data scientists and their amazing algorithms. It's the experts. Bless their souls, but they aren't always perfect.

In my November 17 issue of ScienceNews--print edition if you must know--is a good example of what worries me. It is the complexity of our societies and our data that forces our escalating reliance on experts and their assumptions. And the example shows how assumptions, used as the basis for inferring trends, can become accepted constants that lead us down the path to errors--some small, some big.

The article, by Bruce Bower, is a small one. The problem it highlights is big. It doesn't really have to do with big data per se, although the solution to the problem appears to be big data with better primary assumptions. It shows how banks confused risk with uncertainty, and botched their predictions of dollar and euro valuations, in the years leading up to, during and after the recent financial crisis.

Bower cited research by psychologist Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin, showing that the industry's "complex risk models consistently flub predictions about the relative values of the dollar and the euro in the coming year." (That Gigerenzer is a psychologist says something else about big data, which we will explore later.) Gigerenzer found that annual forecasts of currency values from December 2001 to December 2010, which guided banks' investment decisions, badly missed the mark nine out of 10 times, and said it would be "hard to predict currency values worse than the banks did."

The problem, as Gigerenzer sees it, is that most economists and other risk modelers don't distinguish between risk and uncertainty, and that economic models assume--there's that word again--that the financial world consists of known risks that can be calculated based on prior behavior of stock markets and other elements of the monetary system. But, he says, uncertainty rules in the real financial world, where risks can't be known in advance because a complex tangle of factors triggers new, extremely unlikely hazards.

Gigerenzer asserts that this assumption caused errors in forecasting, and he adds that "confusing risk with uncertainty was one of the causes of the financial crisis."

These are the experts the world came to rely on for measuring its financial stability. They are whizzes at evaluating complex equations and modeling the consequences. Their flaw, if Gigerenzer is correct, was in their initial assumption that the financial world consists of known risks that can be calculated based on prior behavior of stock markets and other elements of the monetary system.

As it turns out, Bower wrote, "In an uncertain environment, the past is an untrustworthy guide to the future."

As we move forward with big data and come to rely on the insights it brings, a close eye must be kept on our initial assumptions and the constants we use when designing queries. If big data can dig as deep as its proponents say it can, perhaps it will uncover these flaws as well. Until then, assume nothing.

Now, about Gigerenzer being a psychologist. What does this say about the future of big data? That anyone with enough data and clever algorithms can be a data scientist? Not to disparage psychologists, but how will we know who the experts are? Will we all be running around trying to confirm or refute someone else's conclusions? We have already seen what the Internet has done for the reliability of information. And we have seen what the volume of messaging has done to our brains and attention spans. Well, we ain't seen nothin' yet. Another fire hose is being turned our way. How will our small minds stand up to the next big deluge? How will we know what's credible?

Next week, we turn from assumptions to conclusions. Halfway through my new coffee table book, "The Human Face of Big Data," I am finding that there is a lot of subjectivity and relativism in the meaning being found through big data. I thought big data was supposed to reduce that. - Tim