Is big data the crack cocaine of millennial scientists?


This is a very baby-boomer thing to say, and I don't mean that in a good way, but I will say it anyway. I have sometimes wondered if the upcoming millennials have what it takes to carry on the world's great scientific traditions. It is also quite an unfair criticism, as I certainly don't have what it takes myself, as evidenced by my comparatively lowly status as a reporter.

Nonetheless, the activity of science does not seem compatible with the personality traits, mental gifts and shortcomings of millennials. Science is more often than not a painstakingly slow process. Millennials, in turn, as described in a recent survey, are obsessed with instant gratification. It's not their fault; thanks in no small part to technology, they were raised that way. As I pondered whether they really have the patience to continue the slow slog of knowledge accumulation that has progressed through the scientific method nugget by nugget for centuries, along came big data, and a new kind of scientist was born.

I thought the stars had aligned in a cosmic twist of fate--also known as a coincidence--to create the perfect match between science and the next generation of young minds who could now feed their desire for immediacy while advancing our true understanding of so many big issues. Problem solved.

Then I thought, oh great, big data will be like smoking crack to these young data scientists. Talk about instant gratification. New data scientists are already being billed as the sexiest new professionals around. With the money that will soon be thrown at them, I could see these scientific rock stars snorting data sets just to get out of bed in the morning.

I also could see their impatience and impertinence--traits identified in a study by Richard Sweeney of the New Jersey Institute of Technology--leading us down the path of easy answers, half-truths and unverifiable conclusions, as they conducted all their science in the cloud and with a keyboard. I wondered if millennial impatience fueled by the narcotic of unfettered access to data and some pretty exceptional algorithmic skills would lead to a reliance on computing the likelihood of things and creating models in lieu of doing the physical work and experimentation required for true understanding. After all, rumor has it these kids never get off the couch. Would we begin to accept probabilities and big data outputs as sufficient scientific evidence for everything?

Then, I heard my parents talking in my head, telling me about their five-mile trek to school every day, barefoot in the snow and uphill both ways, and realized that not only were these all very toothless baby-boomer things to say (and think), but that the world's great scientific traditions would be in very capable hands as the most educated generation in history takes over, and data scientists begin to uncover deeper truths and find answers to perplexing questions about life and business.

As it turns out, millennials are not only tailor-made for big data--provided they do the math--they also have a few other qualities that will serve them well in a complex, diverse world. Sweeney says they are also experiential learners who prefer to learn by doing. And they aren't solitary number-crunching simulation addicts; they also love to collaborate and learn by interacting. They are interested in processes and services that really work and speed their interactions.

But when it comes to big data, something else is lacking for all generations: perspective. We could all use an appreciation of where big data concepts came from and how long people have been working on ways to improve the management of data. Hint: it goes back more than 6,000 years.

You can get that perspective by watching the keynote address of Mark Madsen, president and co-founder of Third Nature, given this week at the O'Reilly Strata conference in London. He gave a fascinating talk on the history of data storage and retrieval going back to the invention of metadata some 4,000 years ago in Babylon. Don't roll your eyes before you've seen it.

Madsen took the audience on a trip from the clay boxes that first contained contracts, to clay tablets that were indexes for a library of clay tablets, to what he calls scroll technology and Paper Tech 1.0 and Paper Tech 2.0, which arrived with movable type. And, he related it to today's advancements in big data. Interesting note: 10 percent of the clay tablets that have been found contain stories and religion, but 90 percent describe the daily grind of accounting.

I can't do the presentation justice, but you can watch it here. Every new data scientist and every old curmudgeon ought to take a look. - Tim