MLPerf: Google’s Cloud TPUs and Nvidia’s Tesla V100 break AI training records

Khari Johnson @kharijohnson July 10, 2019 10:00 AM

Nvidia and Google Cloud set AI training time performance records, according to the latest round of benchmark results from the MLPerf benchmark consortium. Benchmarks help AI practitioners adopt common standards for measuring the performance and speed of hardware used to train AI models.

MLPerf v0.6 examines the training performance of machine learning acceleration hardware in six popular usage categories. Among the results announced today: An Nvidia DGX SuperPOD using Nvidia’s Tesla V100 Tensor Core GPUs completed on-premises training of the ResNet-50 model for image classification in 80 seconds. By contrast, the same task took 8 hours on a DGX-1 station in 2017. Reinforcement learning with Minigo, an open source implementation of the AlphaGo Zero model, completed in 13.5 minutes, also a new record.
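For a sense of scale, the two ResNet-50 training times quoted above can be turned into a speedup ratio. The following is a minimal Python sketch of that arithmetic; the function name `speedup` and the constants are illustrative and not part of MLPerf. The ratio compares two different systems (a DGX-1 in 2017 and a DGX SuperPOD today), so it should not be read as a per-chip improvement.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("durations must be positive")
    return baseline_seconds / new_seconds


# Training times for ResNet-50 as reported in this article.
DGX1_2017_SECONDS = 8 * 60 * 60  # 8 hours on a DGX-1 station in 2017
SUPERPOD_2019_SECONDS = 80       # 80 seconds on a DGX SuperPOD

resnet_speedup = speedup(DGX1_2017_SECONDS, SUPERPOD_2019_SECONDS)
print(f"ResNet-50 training: {resnet_speedup:.0f}x faster")
```

By this rough measure, the 80-second run is 360 times faster than the 8-hour run.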

Nvidia attributes its latest training benchmark results primarily to advances in software.

“In just a matter of seven months on the same DGX-2 station, our customers can now enjoy up to 80% more performance, and that’s due to all the software improvements, all the work that our ecosystem is doing,” a company spokesperson said in a phone call.

Google Cloud’s TPU v3 Pods also set records, training the Transformer model for English-to-German machine translation in 51 seconds. TPU Pods also achieved record performance in the image classification benchmark with the ResNet-50 model on the ImageNet data set, and completed model training in an object detection category in 1 minute and 12 seconds.

Google Cloud TPU v3 Pods, which can harness the power of more than 1,000 TPU chips, were first made available in public beta in May.

Submissions to the latest round of training benchmark tests were made by Intel, Google, and Nvidia. When MLPerf shared the first training benchmark results in December 2018, Nvidia and Google demonstrated that they make some of the world’s fastest hardware for training AI models.

This news follows the launch of MLPerf’s inference benchmarks for computer vision and language translation last month. Results of the inaugural MLPerf inference benchmark will be reviewed in September and shared publicly in October, MLPerf Inference Working Group cochair David Kanter told VentureBeat in a phone interview.

MLPerf is a group of 40 organizations that play key roles in the AI hardware and model creation space, such as Amazon, Arm, Baidu, Google, and Microsoft.
