Q&A with Infochimps CEO Jim Kaskade

Infochimps is a cloud services company that thinks big data shouldn't be so hard. Enterprises looking to leverage big data analytics to solve specific business problems should not have to get all wrapped up in implementing and managing a big data infrastructure stack. So, Infochimps does the wrapping by partnering with cloud infrastructure providers to provide access to streaming data and real-time analytics, and the interfaces for connecting applications to on-demand infrastructure.

Jim Kaskade, chairman and CEO of Infochimps, holds the new generation of developers and data scientists, who are enabling better ways to leverage the cloud and database technology, in high regard. He believes they are re-shaping the world in their image. And this is from a guy who did some reshaping himself as Entrepreneur-in-Residence at Xerox PARC, where he helped advance PARC's big data program.

He also served as chief of cloud at SIOS and CEO of StackIQ, a cloud startup and big data operating system provider. Additionally, Kaskade spent 10 years in analytics and data-warehousing at Teradata, where he initiated the company's in-database analytics and data mining programs.

While there, Kaskade became an addict of sorts. He was first turned on by one of Teradata's founders who put this monkey on his back: You could create big companies by using data in an intelligent way. He has been feeding that monkey ever since. That he would end up at a company called Infochimps is purely coincidental.

Recently, he spent a few minutes with FierceBigData to discuss the state of big data and Infochimps' strategy for advancing it.

FBD: What brought you to Infochimps?

JK: I joined because the team is all under the age of 30. They know everybody in Silicon Valley who is under the age of 30 and invented all of the data technology that is disrupting the world right now. I feel like I have a team that not only understands what will give companies like IBM (NYSE: IBM) and Oracle (NASDAQ: ORCL) a run for their money, but also understands the mindset of the new developer out there, the groups that will be creating intelligent applications that actually leverage data to a better extent than ever before because they're coming at it with a completely fresh mind.

When they need to analyze a billion rows of data to understand what's trending over the last week, they are not using an SQL database; they're using NoSQL. When they're using three years or five years of historical data and trying to do heavy decision-support queries against it, they are not leveraging the power of SQL on Teradata; they are leveraging Hadoop. These kids are using a whole new set of tools. They are using technology that is still in a rough stage, but with some work will really create some transformation in the data center.

FBD: How does that shape your company?

JK: What we do as a company that I think is extremely novel is really a function of a combination of this new know-how that takes advantage of what is happening out there and putting it into a go-to-market strategy. We were born on Amazon (NASDAQ: AMZN) like many startups, but we don't live in Amazon anymore. We are a cloud service provider. We operate in a network of Tier 4 data centers and sit right next to the data we and our customers are analyzing. We are in the trusted data centers of Fortune 1000 companies that have begun to outsource their data infrastructure and are taking advantage of big data technologies. We package cloud services and operate in a way that allows us to help answer questions that are mission-critical and are making a large revenue impact. We are not competing with Amazon or Rackspace by going after the public cloud; we're going after virtual private clouds.

I know I have the tools to help organizations gain insights because they're proven; these technologies we're deploying aren't kids' toys. They are developed by kids, but they are enterprise-class solutions powering some of the biggest brands in the world, companies that have run circles around the platform incumbents because they thought out of the box. They didn't have the baggage. They weren't like me. I wouldn't have been able to do this because I came from schema land where you can't write data into the data store until you have the data model. That's how I was taught and I was taught by some of the biggest minds. I was trained by Jack Shemer, the founder of Teradata. Schema is how I was born to think. SQL is my language. These kids? They could care less about structured query language. This is a schema-on-read generation versus a schema-on-write generation.
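The contrast Kaskade draws here, between fixing a data model before anything can be written and applying one only when the data is read, is commonly called schema-on-write versus schema-on-read. A minimal Python sketch of the difference follows. It is an illustration only, not Infochimps' implementation; the field names, the REQUIRED_FIELDS schema, and all function names are hypothetical.

```python
import json

# Hypothetical data model, fixed before any data can be stored.
REQUIRED_FIELDS = {"user_id": int, "amount": float}


def write_with_schema(store, record):
    """Schema-on-write: reject any record that does not fit the model."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected_type):
            raise ValueError(f"record rejected: bad or missing field {name!r}")
    store.append(record)


def write_raw(store, line):
    """Schema-on-read: store the raw line as-is, with no checks."""
    store.append(line)


def read_amounts(store):
    """Apply a schema only at query time, skipping lines that do not fit."""
    amounts = []
    for line in store:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON at all; ignore for this query
        if isinstance(doc, dict) and isinstance(doc.get("amount"), (int, float)):
            amounts.append(float(doc["amount"]))
    return amounts
```

In the first style, a record without the expected fields never reaches the store. In the second, everything is kept, and each query decides which fields it cares about.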

FBD: How is your cloud service different from a typical public cloud service and why is it best for big data?

JK: Our cloud service is pay as you drink, we manage it, it's still all the things that people define as cloud, but it is all single-tenant installations. I'm not putting you in a multi-tenant environment. Our big customers all want their own installation. That's why we are single-tenant for now. But our novel version of Hadoop, what we call our ephemeral or on-demand Hadoop, gives us the ability to provide multi-tenancy to multiple groups within the same organization; single tenant for one company--multi-tenant within the company.

I think people get hung up on multi-tenancy. In this early stage, it is just unnecessary. It is too early to have a multi-tenant environment for big data. You won't gain enough economies of scale right now.

We are powered by a special version of Cloudera enterprise because we believe there is importance in security and high availability, and being able to audit and manage with enterprises in mind, but we need to focus on doing that under the umbrella of the business problem we are trying to solve.

FBD: What trends are you seeing in cloud and big data?

JK: Most people who think of big data think of Hadoop. This is going to be the year where that changes. We are already starting to hear this and I have said it since the day I started that Hadoop is not your only solution. It's the 80/20 rule. You'll get 20 percent of your real solution with 80 percent hype, but the reality is you can't make a decision support or a complex query engine do what you need for every data query or data type use case. It's like a round peg in a square hole. Making Hadoop real-time? Come on. Everybody wants to make one data store the only data store. I say B.S. I say NoSQL is allowing us to do great things, but there are different flavors and NoSQL analyses have their own nuances.

We have created a cloud service that commercializes one of the most powerful stream processing technologies out there.  

You can connect to any data source of any volume, velocity or variety. If you have data, we can connect to it. We can asynchronously connect to any data source at any time and then bring that together in a very easy way to manage the data flows into our cloud or directly into your application. So if you have a requirement to operate against a decision that needs to be made based on data that you are capturing in real-time, we can do that.
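The general idea of connecting to several sources asynchronously and merging them into one flow can be sketched in Python with asyncio. This is a generic illustration and does not represent the Infochimps service or its API; the source names, delays, and the `handle` callback are all hypothetical.

```python
import asyncio


async def source(name, events, delay):
    """Hypothetical data source that emits events at its own pace."""
    for event in events:
        await asyncio.sleep(delay)
        yield (name, event)


async def merge(streams, handle):
    """Read all streams concurrently and pass each event to handle() on arrival."""
    queue = asyncio.Queue()

    async def pump(stream):
        async for item in stream:
            await queue.put(item)
        await queue.put(None)  # sentinel: this stream is finished

    tasks = [asyncio.create_task(pump(s)) for s in streams]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is None:
            remaining -= 1
        else:
            handle(item)
    await asyncio.gather(*tasks)


def run_demo():
    """Merge two made-up sources and return the events in arrival order."""
    seen = []

    async def main():
        await merge(
            [source("clicks", [1, 2], 0.01), source("orders", ["a"], 0.015)],
            seen.append,
        )

    asyncio.run(main())
    return seen
```

Each source is drained by its own task, so a slow source does not hold up a fast one, and the application sees a single merged stream of events.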

FBD: How is the open source developer community different from traditional vendor-based development?

JK: What's neat about these people who are inventing all these technologies in Silicon Valley is that none of them invented it with the idea of selling it. They invented it with the idea of solving a problem for themselves and created applications they are making their money on, but not the infrastructure. They gave the infrastructure away for free with the hope others would help them evolve it faster. The Open Source model is all about gaining more perspective and collaborating with your friends who are fellow developers. That is a tight community now. So that's the beauty of it. They are all talking to each other. That's the power of this generation.

FBD: What's next for Infochimps?

JK: Big data is all around solving specific problems around network analytics and optimizing around intelligent data services and self-organizing networks, and smarter wireless operations. Being mindful of the end goal and being able to provide a solution that supports it is what will get us ahead. So what you will see in our cloud over the next 12 months is not just about how our cloud services are great technically. We will be announcing a new solution to a business problem every one to two months. We will be educating the market on how to connect this data infrastructure to solve problems. Anybody who adopts and embraces the cloud delivery model is a customer candidate for us. We will make it easier for people to adopt the cloud delivery model.