Some of the first few people to work on the Druid open-source data store are today launching a new startup, Imply, with $2 million in seed funding from Khosla Ventures.
Think of this as the next big-data startup to spin out, in the vein of Hadoop-oriented Hortonworks (former Yahoo), Kafka startup Confluent (former LinkedIn), and Drill startup Dremio (former MapR). In this case, Imply is spinning out of advertising analytics startup Metamarkets.
Imply now has a few paying customers, although chief executive Fangjin Yang couldn’t name them. But without question, there’s a Druid user base to draw on: Cisco, eBay, Metamarkets, PayPal, Time Warner Cable, and Yahoo. (Not surprisingly, Imply itself also uses Druid.)
“There are now many tech companies using it, to the point where we started getting contacted … biweekly, like, ‘Hey, can you guys help us with installation? Do you guys do support?’ Those types of questions,” Imply cofounder and chief executive Fangjin Yang told VentureBeat in an interview.
At Metamarkets, Eric “Cheddar” Tschetter (now at Yahoo) wrote the first few lines of Druid code in 2011. Yang jumped onto the project shortly after that. Metamarkets released the project under an open-source license in 2012. They started working on it full time in 2013. The point was to provide a backend for data on the performance of ads at scale. From there, Metamarkets provided dashboards in its cloud software for customers. Now other companies will be able to create dashboards and other software based on Druid, which is classified as an OLAP data store.
Of course, there are other data stores and databases that companies can use. The Druid website offers comparisons with Apache Hive/Cloudera Impala/Spark SQL/Presto, Amazon Web Services’ Redshift data warehouse, HP Vertica, Apache Cassandra, Apache Hadoop, Apache Spark, and Elasticsearch. One could also compare Druid to Cloudera’s new Kudu storage engine. There’s LinkedIn’s Pinot analytics data store, too.
Druid is designed to work at scale. At Metamarkets, for example, the cluster holds 50PB of data and 50 trillion events across 200 nodes, Yang said.
“It’s pretty hard to set up, but hopefully we’ll be able to make that easier,” said Yang, who is joined by former Metamarkets employees Gian Merlino and Vadim Ogievetsky. The team could reach 10 over the next year, Yang said.
Imply’s first product is the Imply Analytics Platform, which includes Druid and other open-source components like a user interface and the PlyQL SQL-like query language. It’s available for download today from Imply’s website.
It’s possible that Imply will put together a hosted version of its software, Yang said.
The Druid paper is here.
An Imply blog post has more on today’s launch.