A day is coming when data dashboards will be used for far more flexible and innovative reporting at the whim of a single user. The democratization of data will enable some truly wondrous things, but only if the dashboard can meet each user's unique demands.
While big data analytics is getting better, things don't appear to be improving much in storage. Sure, storage has gotten cheaper, but that does little to help storage teams straighten out the mess.
Lukas Biewald, CEO of the startup Crowdflower, has an interesting post on "the three levels of big data" that succinctly categorizes data set sizes, thus explaining what is and is not "big" in the universe of data.
Amazon Web Services announced Kinesis, a new service for real-time processing of streaming data. It was released last Thursday as a limited preview with pay-as-you-go pricing, but it already looks to be a substantial differentiator in the cloud provider race.
You might ask: wasn't weather prediction always a data-driven business, with predictive analytics at its core? Yes, it was. But now those efforts are super-sized.
Facebook decided that Hive is not fast enough because it relies so heavily on MapReduce, a relatively sluggish batch processing system. So Facebook created Presto to speed things along. But some in the industry are rejecting Presto, saying Hive is plenty fast enough, even though Cloudera, Hortonworks and others are working hard to speed up the Hive query engine. Yes, there's drama in this tale.
Data always has a dollar value assigned to it, because no one doubts that it has value. But is that price calculated correctly, or is it just a number plucked from the air? In other words, are data sellers charging too much or too little, and are buyers getting a heck of a deal or simply getting fleeced?
Hunk is a new software product from Splunk designed to explore, analyze and visualize data in Hadoop. Among its more interesting features are virtual indexing (it doesn't require real indexing, yet Splunk users will find the familiar setup comfortable and easy to use), schema-on-the-fly, and customizable visualizations.
In an effort to bring you as much usable news as possible in a single issue, here's a quick roundup of announcements that were not showcased in this week's newsletter but will be covered in more detail in future posts.
"For the first time data scientists can work directly in Hadoop or Teradata databases," David Smith, vice president of marketing at Revolution R and a data scientist himself, told me. "It saves a lot of time leaving the data where it lives and not having to move it at all."