Supplementing big data with crowdsourcing

Tools

A developing image of big data is that of the lone but brilliant data scientist, employed as the seer and overlord of all corporate data, creating algorithms that manipulate libraries full of data in an instant and bringing forth remarkable new insights. Yeah, that's not how it happens.

The models for big data will evolve over the next few years. Some may resemble the one above; others will be more team-oriented, housed in a data-science department; still others will have business leaders running applications built by third-party data scientists. One form, or at least a related one, will be crowdsourcing, which puts the "big" more on the pool of analysts than on the data.

A recent example of crowdsourcing applied to a difficult problem is the immune system problem addressed by Dr. Ramy Arnaout of Beth Israel Deaconess Medical Center at Harvard Medical School, which was featured this week in The Boston Globe.

His problem was analyzing the makeup of genes that produce proteins involved in the immune system's ability to identify microbes. His approach was to extend the problem-solving to software programmers around the world rather than keep it contained within his circle of Harvard colleagues and fellow biologists.

He found those programmers on a platform called TopCoder, adopting crowdsourcing, a practice more common in business than in medical research. TopCoder has 461,886 programmers in its community.

Crowdsourcing lets researchers expand their talent pool. Karim Lakhani, associate professor in technology and operations management at Harvard Business School, and one of the leaders of the research, said in the Globe article that the amount of data generated in biology and other fields is growing much faster than the workforce of computational specialists.

This experiment suggests that academic researchers can sometimes benefit from thinking outside the box when it comes to solving hard problems, including enlisting minds outside of their field to do some of the thinking.

Researchers at Beth Israel Deaconess offered $6,000 in prize money and quickly got 102 people to submit software code attempting to solve the problem. Sixteen of the submissions were more accurate than the code Arnaout had written.

The researchers have already applied this model to other problems, including predicting hot spots of medical need in Boston and tackling other big data biomedical research problems in HIV and genomics.

For more:
- see The Boston Globe article