Google today announced that its Cloud Dataproc service — a fully managed tool based on the Hadoop and Spark open source big data software — is now generally available.
The service — which supports the MapReduce engine, the Pig platform for writing programs, and the Hive data warehousing software — first became available in beta in September. And Google has enhanced the tool since then.
“While in beta, Cloud Dataproc added several important features including property tuning, VM metadata and tagging, and cluster versioning,” Google product manager James Malone wrote in a blog post.
Cloud Dataproc complements a separate service, Google Cloud Dataflow, for batch and stream processing. The underlying technology for Dataflow has been accepted as an Apache incubator project under the name Apache Beam.
Now that Google Cloud Dataproc is available for anyone to use, at 1 cent per hour for each vCPU, the Google Cloud Platform is a bit better equipped to take on public cloud market leader Amazon Web Services as well as Microsoft Azure and the IBM public cloud. All three already have big-data services.
Meanwhile, there are also startups that provide Hadoop as a service. What distinguishes Cloud Dataproc is that it hooks into other Google cloud services, such as Google Cloud Storage, Google Cloud Bigtable, and BigQuery.
Documentation for the service is available on the Google Cloud Platform website.