The great SQL sequel


As part of a live preview for the upcoming Strata Conference next week in Santa Clara, Calif., Tim O'Brien, consultant and database expert, jumped into the SQL versus NoSQL debate in the world of big data. His position? SQL is not the bogeyman.

O'Brien calls SQL an expressive query language and says that it may just be going through some growing pains as it learns to scale, but even without the scale of new databases, it can still handle 90 percent of the requirements for data analysis by enterprise users. He will give a more complete argument at the conference, but here is some of what he had to say about the two main database structures: SQL and NoSQL.

He said he isn't attacking big data. O'Brien believes Hadoop and other big data technologies are essential for solving some problems, and that some data sets are so huge and so diverse that SQL can never handle them. But SQL is not the bogeyman, he said, and it doesn't deserve to be dismissed the way too many people are dismissing it today.

"More and more I see something of a backlash to the idea that the relational database is dead," O'Brien said. "For a handful of companies and some well funded startups, yes, relational databases are dead. But the majority of companies still have a massive investment in SQL."

He added that NoSQL is great when it's appropriate, but said SQL is still the tip of the market, driving the peaks and trends for the majority of developers, and that most businesses still use SQL for the majority of their apps. Even Google's AdWords application was running on MySQL until a year ago, he said. "They ran it in a difficult, incredibly complex configuration, but AdWords resembles the traditional business applications that 99 percent of you are building. It requires transactions and consistency and it speaks SQL," O'Brien said.

O'Brien provided a survey of database migration over the last decade or so, starting from the time when open source Java first came onto the scene, Linux was still a question, and people solved scale problems by simply adding more apps and servers, and finally clusters of servers. At the time, they were focused on building their entire enterprise architecture and modeling in Unified Modeling Language.

By the end of 2005 and into early 2006, it all started to change. Open source Java finally killed the proprietary app server and companies stopped dropping a half-million dollars on a single vendor's app server. Open source databases were starting to make their challenge and Linux finally "arrived." There was also a cultural shift in which more people found open source acceptable.

"This was the beginning of the singularity," O'Brien said. "It became clear what the future looked like … and in 2007, if you were working at a large web site, you saw what the future was and it was not a relational database."

Then Google (NASDAQ: GOOG) ushered in the revolution of big data. But Google needs big data, O'Brien said. Facebook (NASDAQ: FB) needs big data. If you are a customer of Palantir or Tableau, you need big data. The IRS needs big data and the NSA needs big data. "These problems will never be solved with an RDBMS (relational database management system)," he said. "The challenge for big data is how to scale horizontally while providing the same interface most people are used to using."

For more:
- see O'Reilly's streaming previews
