Could little data derail big data?
People stupid enough to let their lives be exposed in reality shows worry about privacy only after their 15 minutes of fame have run their course. People stupid enough to wind up in prison find out what privacy used to be when they share a commode with their "cellies" in an eight-by-eight, poorly ventilated room. So, a rhetorical question: what do you call those of us who claim to be too busy to read and understand the privacy policies of the companies we do business with?
Verizon Wireless will begin using (and sharing) mobile usage information and consumer information for certain business and marketing reports and to make the mobile ads we see more relevant. The company gives instructions for opting out of the program. But here is its promise: "Under these programs, we will not share any information that identifies you personally." In a second statement, Verizon Wireless clarified that promise, saying "we will not share outside of Verizon any information that identifies you personally."
The questions left in the mind of anyone who reads this are: How can you prove that neither you nor the companies you partner with are identifying me personally? And if those connections are being made within Verizon, will they become readily available to others, either by force of law (CALEA) or as privacy laws change in the future?
To be honest, I am confident that protections can be put in place so that data shared with other entities is not personally identifiable. In fact, the business models for mobile data and for big data depend on it. I am less confident that data compiled and analyzed within Verizon, and connected to me personally, won't eventually find its way out.
The real concern here is that sharing usage and consumer data outside a provider's domain, and commingling it with other data, is at the heart of the high-stakes analytics game known as big data. If a battle escalates between privacy advocates and networked service providers and their partners, it could delay or derail big data before it gets off the ground. A lot of big money says that won't happen, but many of the data sets required for making useful science out of big data come from smaller data sources like this. Those sources could be dammed upriver, leaving Hadoop and the NoSQL databases stuck on a dry mesa somewhere.
So as mobile operators and other companies with lots of information about us begin to exploit the gold mines that we are, they should try to do it in a way that brings the public along with them and not be dictatorial about it. - Tim