Beware: The Black Hats are coming to data science
The Internet has flourished despite the openness and foundation of loose consensus that both enable it and make it vulnerable, said Alistair Croll, founder of Solve for Interesting and host at O'Reilly Media, in a webcast this week. But big data is about to raise the level of vulnerability, and the more successful big data becomes, the more serious the threat to its integrity.
"Big data is the Layer 8 protocol. It is where humanity's rubber meets technology's road and it has great power, but demands great responsibility. And it presents a great deal of risk unless we do it right," Croll said.
Kicking off a webcast called "Data warfare: Data is our foundation. What happens when it's under attack?" Croll warned that the abuse of big data technologies has far-reaching consequences that grow more serious as our dependency on information becomes more acute.
Because they can, hackers, malicious programmers, thieves and, increasingly with big data, manipulators will soon begin to corrupt information, blind well-intentioned algorithms and inject falsehoods into queries and data sets. While overtly malicious activity will grow, the new threat may be more subtle, such as using false data to undermine a competitor or even flip an election.
We know this, Croll said, because every time a technology is invented or discovered, whether it be fire, knives or guns, someone finds a way to do evil with it. In the era of big data, the technology may be used, and in some ways already has been used, to "reduce the level of reasoned discourse and justify a hard-line stance," he said, citing studies that show how the tone of comments, particularly an angry tone, can polarize a debate and prevent any chance for compromise.
Croll said that an armistice in big data warfare is many years away. "We will see a lot of catastrophic and very visible examples of people swaying elections or bringing down competitors or basically fudging the numbers in subtle, optimized, data science-driven ways."
Who are these new evildoers? Joseph Turian, an expert on machine learning and natural language processing and head of MetaOptimize, a consultancy for predictive analytics and business intelligence, calls them Black Hats.
Black Hats have not yet been identified in the realm of data science, but they're coming, Turian said.
"Most industry practitioners of data science are in fact Grey Hats. If you are working on advertising and you prioritize advertiser needs over consumer privacy, then you are a grey hat," Turian said. He describes Grey Hats as, at best, indifferent and/or negligent towards ethical concerns.
There is an entire spectrum of Grey Hat activity, and Turian identified companies such as RapLeaf, Mahalo, Zynga and Demand Media as being among the darker grey.
Turian scoffed at the notion that data scientist is a sexy new profession and compared the allure more to the Gordon Gekko type of sexy from the 1987 film "Wall Street," a character who today is universally reviled. If you were an evil genius in the 21st century, what job would you take? If you were Gordon Gekko, you would probably aspire to be a data scientist.
"If our predictions are correct and the big data market becomes increasingly lucrative, we will attract more unscrupulous actors because unscrupulous actors will be drawn to any large market they could exploit for profit and gain," Turian said. "If we don't have Black Hat data scientists, we will get [them] soon because the money will be there."
The task facing the industry is to figure out what the attack vectors will be and to reduce or eliminate the vulnerabilities.
- see O'Reilly Strata Webcast