Google launches beta of Cloud Dataproc, a managed service for Hadoop and Spark

Inside a Google data center in Georgia.

Image Credit: Google

It’s a new day, and Google has a new cloud service for storing and processing big data. Google Cloud Dataproc, which is being launched in beta today, is a managed service for running Hadoop and Spark.

Independent startups like Qubole, Altiscale, and Xplenty offer commercial software for running open-source Hadoop on top of public clouds, but now there’s an option that’s native to the Google Cloud Platform.

“Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don’t need them,” Google Cloud Platform product manager James Malone wrote in a blog post on the new service. “With less time and money spent on administration, you can focus on your jobs and your data.”

Microsoft Azure and Amazon Web Services, two other major public clouds, both have their own first-party services for running Hadoop, with HDInsight and Elastic MapReduce, respectively. Support for Spark — the open-source big data processing framework that’s seen as a successor to the MapReduce engine — has come to both. Now Google will have its own full-fledged tool to compete directly. And that’s important in the growing public cloud market, where ease of use and cost are both critical factors.

Developers can run batch and streaming jobs with the Google Cloud Dataflow service, but that’s a largely proprietary system not explicitly based on the widely used Hadoop open-source big data software.

Google already makes it possible to run core Apache Hadoop on its public cloud with the Cloud Launcher quick-start tool, but Cloud Dataproc goes further by making it easy to manage clusters once they’re running.

Like Google Compute Engine, the new service is priced by the minute after the first 10 minutes.
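
To make that billing model concrete, here is a minimal Python sketch. It reads "by the minute after the first 10 minutes" as a 10-minute minimum charge followed by one-minute increments, and it assumes partial minutes are rounded up. The function names and the per-minute rate in the usage lines are hypothetical and are not taken from Google's price list.

```python
import math

# From the article: billing is per minute after the first 10 minutes.
# Treating that as a 10-minute minimum is an interpretation.
MINIMUM_MINUTES = 10


def billable_minutes(runtime_seconds: float) -> int:
    """Return the minutes billed for a cluster that ran for runtime_seconds.

    Assumption: partial minutes round up to the next whole minute.
    """
    if runtime_seconds <= 0:
        return 0
    return max(MINIMUM_MINUTES, math.ceil(runtime_seconds / 60))


def cluster_cost_cents(runtime_seconds: float, rate_cents_per_minute: int) -> int:
    """Cost in cents at a hypothetical flat per-minute rate."""
    return billable_minutes(runtime_seconds) * rate_cents_per_minute


# Usage with an illustrative rate of 2 cents per minute:
short_job = cluster_cost_cents(180, 2)   # 3 minutes of runtime, billed as 10
longer_job = cluster_cost_cents(601, 2)  # just over 10 minutes, billed as 11
```

Under these assumptions, a job that finishes in three minutes costs the same as one that runs for ten, which is why turning clusters off promptly only saves money once a job runs past the minimum.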
