Google upgrades its AI Hypercomputer for enterprise use at Cloud Next

Image credit: VentureBeat made with Midjourney V6



Back in December 2023 — an age in the tech industry — Google announced something it called an “AI Hypercomputer,” which it described as “groundbreaking supercomputer architecture that employs an integrated system of performance-optimized hardware, open software, leading ML frameworks, and flexible consumption models.”

The goal? To “boost efficiency and productivity across AI training, tuning, and serving” for customers of Google Cloud, its third-place cloud service competing with Microsoft and Amazon for enterprise customer dollars.

Essentially, paying Google Cloud customers would be able to access Google’s AI Hypercomputer software and hardware virtually to train their own AI models and applications.

At the time, Google noted how “customers like Salesforce and Lightricks are already training and serving large AI models with Google Cloud’s TPU v5p AI Hypercomputer — and already seeing a difference.”


Fast forward to today: at Google Cloud Next 2024, the division's annual conference taking place this week in Las Vegas, Google announced major upgrades to its AI Hypercomputer platform and boasted that even more high-profile customers are using it.

Google Cloud AI Hypercomputer gets more powerful and capable Nvidia chips

First up on the upgrade list: Google Cloud’s Tensor Processing Unit (TPU) v5p, its custom chip that it previously described as its “most powerful, scalable, and flexible AI accelerator thus far,” is entering general availability to Cloud customers.

Second, Google Cloud is upgrading its A3 virtual machine (VM) family, powered by Nvidia H100 Tensor Core GPUs (each packing 80 billion transistors on a single chip), to A3 Mega, which will be made available to Google Cloud customers starting in May.

Additionally, Google Cloud plans to incorporate Nvidia's latest Blackwell GPUs into its offerings to bolster high-performance computing (HPC) and AI workloads, specifically in the form of virtual machines powered by Nvidia HGX B200 and GB200 NVL72 GPUs. The former, Google Cloud says, is "designed for the most demanding AI, data analytics, and HPC workloads."

Meanwhile, the latter “liquid-cooled GB200 NVL72 GPU[s]” will offer “real-time LLM inference and massive-scale training performance for trillion-parameter scale models.”

While trillion-parameter AI models remain scarce for now (SambaNova's models and Google's own Switch Transformer research model come to mind), chipmakers including Nvidia and Cerebras are racing to offer hardware that can handle them, in anticipation of rapidly expanding model sizes from leading AI providers.

Already, though, some high-profile Google Cloud customers are reaping rewards from the current A3 setup and looking forward to Mega — among them, Character.AI, the consumer-facing chatbot company valued at more than $1 billion.

“Character.AI is using Google Cloud’s Tensor Processing Units (TPUs) and A3 VMs running on NVIDIA H100 Tensor Core GPUs to train and infer LLMs faster and more efficiently,” said Noam Shazeer, CEO of Character.AI, in a statement provided by Google. “The optionality of GPUs and TPUs running on the powerful AI-first infrastructure makes Google Cloud our obvious choice as we scale to deliver new features and capabilities to millions of users. It’s exciting to see the innovation of next-generation accelerators in the overall AI landscape, including Google Cloud TPU v5e and A3 VMs with H100 GPUs. We expect both of these platforms to offer more than 2X more cost-efficient performance than their respective previous generations.”

Riding the Google Cloud JetStream (…high above the whole scene)

On the software front, Google Cloud is introducing JetStream, a throughput- and memory-optimized inference engine for large language models.

This new tool is designed to enhance the performance per dollar on open models and is compatible with JAX and PyTorch/XLA frameworks, boosting efficiency and reducing costs.
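To make the idea of a throughput-optimized inference engine concrete, here is a minimal, purely illustrative JAX sketch (this is not JetStream's actual API; the toy "model" and all names are hypothetical): independent decode requests are batched with `vmap` and the fused step is compiled once with `jit`, the general pattern such engines use to raise tokens-per-second per dollar.

```python
import jax
import jax.numpy as jnp

# Toy "model": a fixed embedding table and projection standing in for a
# real LLM forward pass (hypothetical, for illustration only).
VOCAB, DIM = 32, 8
emb = jax.random.normal(jax.random.PRNGKey(0), (VOCAB, DIM))
proj = jax.random.normal(jax.random.PRNGKey(1), (DIM, VOCAB))

def next_token(token_id):
    # One greedy decode step: embed the token, project to logits, take argmax.
    logits = emb[token_id] @ proj
    return jnp.argmax(logits)

# Throughput optimization: vmap batches many independent requests, and jit
# compiles the whole batched step into a single fused computation.
batched_step = jax.jit(jax.vmap(next_token))

tokens = jnp.arange(4)        # four concurrent requests
out = batched_step(tokens)    # one compiled step serves the whole batch
print(out.shape)              # (4,)
```

Serving one compiled, batched step amortizes both compilation and kernel-launch overhead across requests, which is where most of the "performance per dollar" in batched LLM serving comes from.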

Storage wars

Storage solutions are also getting an upgrade. Google has improved its storage products with features like caching, which positions data closer to compute instances. This accelerates AI training and fine-tuning, optimizes GPU and TPU usage, and enhances energy efficiency and cost-effectiveness.

Google is introducing Hyperdisk ML, a new block storage service optimized for AI inference and serving workloads, capable of dramatically reducing model load times and increasing throughput, offering up to 12X faster model load times and significant cost efficiency.

It supports up to 2,500 instances per volume and delivers 1.2 TiB/s of throughput, surpassing similar offerings from Microsoft and AWS.

Cloud Storage FUSE, a file-based interface for Google Cloud Storage, is now generally available; Google says it increases training throughput by 2.9X and performance by 2.2X for foundation models.

Parallelstore, a high-performance parallel filesystem available in preview, now features caching that increases training speeds up to 3.9X and boosts training throughput up to 3.7X compared to traditional ML framework data loaders.

Meanwhile, the Filestore system is tailored for AI/ML models and allows simultaneous data access across all GPUs and TPUs in a cluster, improving training times by up to 56%.

A host of cloud software collaborations and upgrades

Google is introducing a host of new software updates for AI Hypercomputer, including new scalable implementations for diffusion models and language models built on JAX and optimized for both Cloud TPUs and NVIDIA GPUs.

The company said it will also support open-source code from PyTorch/XLA 2.3, introducing features such as auto-sharding and asynchronous distributed checkpointing to improve distributed training scalability.
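As a rough illustration of what asynchronous checkpointing buys (a hypothetical Python sketch, not PyTorch/XLA's actual API): the training loop snapshots its state and hands the write to a background thread, so the step loop is never blocked waiting on storage I/O.

```python
import json
import os
import tempfile
import threading

def async_checkpoint(state, path):
    """Snapshot state now, write it in the background (illustrative only)."""
    snapshot = dict(state)  # copy before training mutates it further

    def _write():
        with open(path, "w") as f:
            json.dump(snapshot, f)

    t = threading.Thread(target=_write)
    t.start()
    return t  # caller joins before taking the next checkpoint

state = {"step": 0, "loss": 1.0}
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")

writer = async_checkpoint(state, path)
state["step"] = 1       # training continues while the write happens
writer.join()           # ensure the write finished before reading back

with open(path) as f:
    saved = json.load(f)
print(saved["step"])    # 0 — the snapshot taken at checkpoint time
```

Real distributed implementations add sharding and consistency guarantees across workers, but the core idea is the same: decouple checkpoint I/O from the training step.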

In a collaboration with New York City-based AI code repository company and community Hugging Face, Google Cloud’s new Optimum-TPU will allow customers to optimize training and serving of Hugging Face AI models on Google’s TPUs.

In addition, Google models will be available as NVIDIA NIM inference microservices, facilitating a flexible platform for developers to train and deploy AI using preferred tools.

To complement these technical enhancements, Google Cloud is rolling out new flexible consumption models. A new feature called Dynamic Workload Scheduler allows Google Cloud AI Hypercomputer customers to reserve GPUs for 14 days at a time, up to eight weeks in advance, which could be a big boon for training new models.

Google says it “helps you optimize your spend for AI workloads by scheduling all the accelerators needed simultaneously, and for a guaranteed duration.”

Altogether, this raft of updates to AI Hypercomputer shows the practical business benefits to customers of Google’s ongoing research and innovation in creating a seamlessly integrated, efficient, and scalable AI training and inference environment.

Now for the big questions: how much does it all cost? Pricing wasn't immediately disclosed for AI Hypercomputer usage and the various offerings described above, but that is fairly typical for multifaceted, complex enterprise software-as-a-service (SaaS) offerings.

Also: who will bite, choosing it over Microsoft Azure or AWS for enterprise AI development and deployment? And perhaps most importantly: can Google, notorious for starting interesting efforts and then abandoning, shuttering, or changing them beyond recognition, convince customers that it will keep supporting and improving AI Hypercomputer going forward?