An interesting take that includes a prediction about the future of big data storage.
It's that time of year when everyone predicts what we will all be grappling with next year. Some predictions are more notable and more likely to come true than others, of course, and those from the IEEE Computer Society fall in that category.
Managing storage effectively is an ongoing challenge for businesses of any size. Certainly, having a centrally managed, intelligent way to handle a mix of storage elements should be a big help.
"By far the biggest use for Hadoop to date has been as a 'poor person's ETL'--that is, a form of data integration, at the risk of oversimplifying--rather than all the big, sexy data science we see constantly hyped," writes Matt Asay in his ReadWrite post. But that's changing, according to a survey showing that a significant number of enterprises are beginning to do considerably more with Hadoop.
Facebook decided that Hive was not fast enough, since it relies so heavily on MapReduce, a relatively sluggish batch-processing system. So Facebook created Presto to speed things along. But some in the industry are rejecting Presto, saying Hive is plenty fast--even though Cloudera, Hortonworks, and others are working hard to speed up the Hive query engine. Yes, there's drama in this tale.
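The speed gap comes down to execution models: Hive (on classic MapReduce) materializes a full intermediate result between the map, shuffle, and reduce phases, while Presto pipelines rows through operators in memory. Here's a toy Python sketch of that structural difference--this is illustrative only, not Hive or Presto internals, and the function names are made up for the example:

```python
from collections import defaultdict
from itertools import groupby

records = ["error", "ok", "error", "warn", "ok", "error"]

def mapreduce_count(records):
    """MapReduce style: each phase materializes its full output
    before the next phase starts (a batch barrier, like Hive on MR)."""
    mapped = [(r, 1) for r in records]           # map phase: emit (key, 1) pairs
    mapped.sort(key=lambda kv: kv[0])            # shuffle phase: sort/group by key
    return {k: sum(v for _, v in group)          # reduce phase: sum per key
            for k, group in groupby(mapped, key=lambda kv: kv[0])}

def pipelined_count(records):
    """Pipelined style: each record flows straight through to the
    aggregate with no intermediate barrier (closer to Presto's model)."""
    counts = defaultdict(int)
    for r in records:
        counts[r] += 1
    return dict(counts)
```

Both produce identical counts; the difference is that the batch version pays for sorting and materializing the intermediate list (on disk, in real MapReduce), which is where interactive-query latency goes to die.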
The one weak point of Apache Hadoop is that the file system lacks security. "That's because of the underlying architecture, so we re-architected it," said Jack Norris, CMO of MapR. "Because if the architecture isn't right, no matter what you layer on top of it, issues will remain."
"The assumption was that supercomputers were cliché five years ago. People thought, 'I can run my simulation on my laptop,'" said Barry Bolding, a Cray vice president, at the company's Seattle headquarters. "That may have been true, so long as the data associated wasn't growing as well. But raw data is being created in exabytes as we sit here. More data means bigger computer, bigger computer means more data."
The space agency collects hundreds of terabytes of data every hour--"the equivalent of all the data flowing on the Internet every two days," according to JPL--which creates extreme challenges in data storage, processing, and access. Here's how they manage it all and stay abreast of the deluge.
"AWS is far and away the biggest player in the market, but in terms of pure performance and computing power I don't think there is anyone that can compete with Bigstep," Hreninciuc said.
First on the agenda was an aggressive goal: make analytics affordable and accessible to all agencies, large and small. But to get there, they first had to strategically consolidate data centers and launch a host of cloud initiatives. It appears they've finally reached the goal post.