Exclusive

Databricks ropes in Alteryx to push Spark adoption for big data projects

George Mathew, president and chief operating officer of Alteryx, speaks at the company's "Inspire" user conference.

Image Credit: Alteryx

Databricks thinks the open-source Spark engine is the next big thing for big data processing — so it has teamed up with analytics firm Alteryx to supercharge the software.

The two data startups intend to drive Spark into the hands of more data analysts through a formal partnership, Databricks and Alteryx have revealed to VentureBeat. Alteryx will become a primary committer to SparkR, part of the open-source, in-memory Spark engine often seen as the leading candidate to replace MapReduce, the company said.


MapReduce, conceived at Google, is the original programming model for the Hadoop ecosystem of open-source tools for analyzing many kinds of data. But while MapReduce boasts strong scalability, fault tolerance, and throughput, it generally runs jobs in batches, which is quite limiting in terms of latency and accessibility, argued Alteryx chief operating officer George Mathew in a conversation with VentureBeat.

You need a custom MapReduce programmer every time you want to get something out of Hadoop, but that's not the case for Spark, said Mathew. Alteryx is working toward a standardized Spark interface for asking questions directly against data sets, which broadens Spark's accessibility from hundreds of thousands of data scientists to millions of data analysts: folks who know how to write SQL queries and model data effectively, but aren't experts at writing MapReduce jobs in Java.
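To give a sense of what that analyst-friendly interface looks like in practice, here is a minimal sketch using today's PySpark API (the file name, column names, and query are hypothetical, not part of either company's announcement): a plain SQL query runs directly against a data set registered as a table, with no Java MapReduce job involved.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session.
spark = SparkSession.builder.appName("analyst-demo").getOrCreate()

# Load a CSV file into a DataFrame; "sales.csv" and its columns are made up for illustration.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Register the DataFrame so it can be queried with ordinary SQL.
sales.createOrReplaceTempView("sales")

# An analyst who knows SQL can ask questions directly, with no MapReduce code required.
totals = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
totals.show()
```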


The Spark framework is well equipped to handle those queries, as it exploits the memory spread across all of the servers in a cluster. That means it can run analytics models at blazing speeds compared to MapReduce: programs can run as much as 100 times faster in memory or 10 times faster on disk. Those performance enhancements, and the subsequent customer demand, have prompted Hadoop distribution vendors like Cloudera and MapR to support Spark.
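The in-memory speedup comes from keeping working data resident in cluster memory between passes rather than rereading it from disk each time. A minimal sketch, again using today's PySpark API and the same hypothetical data set as above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# "sales.csv" is the same hypothetical file used in the earlier example.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Keep the DataFrame in cluster memory so repeated passes avoid rereading it from disk.
sales.cache()

# The first action materializes the cache; subsequent aggregations reuse the in-memory copy.
sales.count()
sales.groupBy("region").sum("amount").show()
sales.groupBy("region").avg("amount").show()
```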

Databricks, founded by the creators of Spark, today announced $33 million in new funding, bringing its total venture financing to $47 million. It also revealed a new service for running Spark jobs and visualizing data on a Databricks-owned cloud. That’s another move by Databricks to make Spark as accessible as possible, a goal the Alteryx partnership will help push forward.

“We want to create a whole new generation of data blenders and analytics modelers that were never able to touch this stuff before,” Mathew said. “We’re just really excited to be working on this together.”

Alteryx will focus primarily on SparkR, while Databricks will focus largely on SparkSQL, according to Mathew.
