AI training benchmarks push the limits of hardware

Since 2018, the MLCommons consortium has been organizing a sort of Olympics for AI training. The competition, called MLPerf, consists of a set of tasks that require training specific AI models, on predefined datasets, to a specified accuracy. Essentially, these tasks, called benchmarks, test how well a given hardware and software stack can train a particular AI model.
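In essence, each result boils down to a time-to-train measurement: run training on the fixed dataset until the model clears the required accuracy threshold, and record how long that took. The minimal Python sketch below illustrates that idea with a toy model and synthetic data of my own choosing; it is not an actual MLPerf workload, and the accuracy target is illustrative.

import time
import numpy as np

# Fixed synthetic dataset: two well-separated Gaussian blobs, labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(500, 2)),
               rng.normal(2.0, 1.0, size=(500, 2))])
y = np.concatenate([np.zeros(500), np.ones(500)])

TARGET_ACCURACY = 0.97   # the "quality bar" a run must clear (illustrative value)
w, b, lr = np.zeros(2), 0.0, 0.1

start = time.perf_counter()
for epoch in range(1, 1001):
    # One full-batch gradient-descent step of logistic regression.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

    # Score the updated model; stop the clock once the target accuracy is reached.
    preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) >= 0.5
    accuracy = np.mean(preds == y)
    if accuracy >= TARGET_ACCURACY:
        elapsed = time.perf_counter() - start
        print(f"time to train: {elapsed:.4f} s ({epoch} epochs, accuracy {accuracy:.3f})")
        break

Real submissions measure the same quantity, wall-clock time to a quality target, but on full-scale models, datasets, and clusters.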

Twice a year, companies gather their submissions (usually clusters of CPUs and GPUs, plus software optimized for them) and compete to see which submission can train the models the fastest.

There is no doubt that since MLPerf’s inception, cutting-edge hardware for AI training has improved significantly. Over the years, Nvidia has released four new generations of GPUs that have become the industry standard (the latest, Nvidia’s Blackwell GPU, is not yet standard but is growing in popularity). Companies competing in MLPerf have also used ever-larger GPU clusters to complete the training tasks.

However, the MLPerf criteria have also become stricter. And this increased rigor is intentional: the benchmarks attempt to keep pace with the industry, says David Kanter, head of MLPerf. “The criteria are meant to be representative,” he says.

Intriguingly, the data shows that large language models and their precursors have grown in size faster than hardware has improved. As a result, each time a new benchmark is introduced, the fastest training time jumps back up. Hardware improvements then gradually bring the training time down, only for the next benchmark to push it up again, and the cycle repeats.

This article appears in the November 2025 print issue.
