IBM doubles down on Apache Spark with cloud service, open-source code, and tech center

Jordan Novet @jordannovet June 14, 2015 9:01 PM

Image Credit: Open Grid Scheduler / Grid Engine/Flickr

IBM today announced several initiatives related to Apache Spark, the open-source framework for processing and analyzing large volumes of many different kinds of data.

IBM is making Spark available as a cloud service on its Bluemix cloud platform. The company is open-sourcing its SystemML machine-learning software for the Spark community. IBM will open a Spark Technology Center in San Francisco. And IBM said it will teach Spark to more than 1 million data scientists and data engineers and direct more than 3,500 researchers and developers to work on projects involving Spark.

Legacy tech vendors have been slow to embrace Spark, which many see as a successor to the open-source Hadoop big data software, but that’s just one reason why IBM’s set of moves today is significant.

Data has been an area of interest for IBM, and the company has moved to productize Hadoop, but Spark until now has not been a priority. IBM in recent years has bet large amounts of money on areas like the Internet of Things, software-defined storage, and Watson. Now big data is once again a focus, even if there’s no dollar amount at the top of today’s press release.

IBM’s efforts represent a potential competitive threat to San Francisco startup Databricks, which claims to have committed more than 75 percent of the code added to Spark in the past year. The main commercial product from venture-backed Databricks is a cloud service that runs on top of the Amazon Web Services public cloud. By offering Spark on Bluemix, IBM is going directly after that business. And if IBM can get its people committing a sizable amount of code to Spark, that, too, could challenge Databricks.

IBM’s Spark announcements could benefit startups that have built software on top of Spark, including Adatao, Alpine Data Labs, and ClearStory Data.

But perhaps the biggest impact here is the likely increase in Spark adoption in general. Backing from Big Blue could make the project look suitable for big businesses, not just for startups.

“In the enterprise, I’m seeing almost no Spark adoption,” Nick Heudecker, a Gartner analyst, told VentureBeat in an interview last month. Going forward, that should change.
