Intel Takes On Nvidia With Habana-Based AWS EC2 Instances
‘With the company’s plans to introduce new EC2 instances featuring Gaudi for deep learning training, AWS will further reduce the cost of training AI datasets and lower total cost of operations for customers,’ an Intel exec says of the new EC2 instances, which AWS claims will have up to 40 percent better price-performance than Nvidia-based instances.
Intel’s Habana Gaudi accelerators are set to power new Amazon Web Services EC2 instances, providing what AWS said is up to 40 percent better price-performance than similar cloud instances running Nvidia GPUs for training deep learning models.
The new AI chips from the Santa Clara, Calif.-based chipmaker will arrive in EC2 instances in the first half of 2021, more than a year after Intel acquired the Israeli startup behind the new products, Habana Labs, for $2 billion and axed its competing Nervana product line in pursuit of AI leadership.
[Related: The 10 Coolest AI Chip Startups Of 2020]
But Intel’s cloud leadership and its pursuit of the AI chip crown are facing challenges on multiple fronts, including from AWS itself, which announced new EC2 instances powered by its Arm-based Graviton2 processors and its newly revealed Trainium machine learning chips alongside the new Intel Habana-based instances at the AWS re:Invent conference Tuesday.
AWS’ new Habana-based EC2 instances will feature up to eight Habana Gaudi accelerators, which began sampling with cloud service providers earlier this year. The accelerators are specifically designed for training deep learning models for things like natural language processing and object detection.
With eight Habana Gaudi cards — each featuring 32 GB of HBM2 memory and integrated RDMA over Converged Ethernet (RoCE) connectivity to link the processors — the new EC2 instance will be capable of processing about 12,000 images per second on the ResNet-50 image recognition training model using the TensorFlow framework, according to Habana Labs.
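Throughput figures like these are generally arrived at by timing training steps and dividing the number of images processed by the elapsed time. The sketch below shows one way that measurement could look in TensorFlow; it is illustrative only, with a placeholder batch size and synthetic data rather than the vendors' actual benchmark configurations.

```python
# Minimal sketch: estimating ResNet-50 training throughput (images/sec) in TensorFlow.
# Batch size and step count are illustrative placeholders, not the published benchmark settings.
import time
import tensorflow as tf

BATCH_SIZE = 256   # placeholder value
NUM_STEPS = 100

# Synthetic ImageNet-shaped data keeps the sketch self-contained.
images = tf.random.uniform((BATCH_SIZE, 224, 224, 3))
labels = tf.random.uniform((BATCH_SIZE,), maxval=1000, dtype=tf.int32)

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

train_step(images, labels)  # warm-up step to trace and compile the graph

start = time.time()
for _ in range(NUM_STEPS):
    train_step(images, labels)
elapsed = time.time() - start
print(f"throughput: {BATCH_SIZE * NUM_STEPS / elapsed:.0f} images/sec")
```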
In contrast, Nvidia’s DGX A100 system, which is outfitted with eight A100 GPUs, can reach more than 17,000 images per second on ResNet-50 using TensorFlow, according to benchmarks published to Nvidia’s website.
However, as Eitan Medina, chief business officer at Habana Labs, highlighted in a blog post, the Gaudi processors are meant to provide a better price-performance ratio, meaning customers pay less for a given amount of performance than they would with competing products. In this case, Habana-based EC2 instances can provide up to 40 percent better price-performance than EC2 instances running Nvidia GPUs, including instances based on the new A100, according to internal tests performed by AWS.
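Price-performance here means training throughput per dollar of instance cost, not raw speed. As a rough illustration of the arithmetic, the sketch below combines the throughput figures cited above with hypothetical hourly prices (AWS has not published pricing for the new instances); the point is the ratio calculation, not the specific numbers.

```python
# Rough illustration of a price-performance comparison: throughput per dollar.
# Hourly prices are hypothetical placeholders, NOT actual AWS pricing; the
# throughput figures are the vendors' published ResNet-50 numbers cited above.
gaudi_instance = {"images_per_sec": 12_000, "price_per_hour": 13.00}  # hypothetical price
a100_instance = {"images_per_sec": 17_000, "price_per_hour": 26.00}   # hypothetical price

def perf_per_dollar(instance):
    """Training images processed per dollar of hourly instance cost."""
    return instance["images_per_sec"] * 3600 / instance["price_per_hour"]

advantage = perf_per_dollar(gaudi_instance) / perf_per_dollar(a100_instance) - 1
print(f"Gaudi price-performance advantage: {advantage:.0%}")
# With these placeholder prices, the slower-but-cheaper instance still comes out
# roughly 40 percent ahead on throughput per dollar.
```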
“In this morning’s keynote, AWS CEO Andy Jassy underscored the massively expanding demand across industry sectors for high-performance, yet more affordable AI workloads,” Medina said. “With the company’s plans to introduce new EC2 instances featuring Gaudi for deep learning training, AWS will further reduce the cost of training AI datasets and lower total cost of operations for customers.”
Habana Labs, which operates independently within Intel’s Data Platforms Group, is working on second-generation Gaudi accelerator cards, which will rely on TSMC’s 7-nanometer manufacturing process and make AI training applications and services “even more accessible to a breadth of customers, data scientists and researchers,” according to Medina.
Kevin Krewell, principal analyst at Tirias Research, said while Nvidia A100 GPUs may score higher than Habana Gaudi cards for image recognition, AWS’ claim about the price-performance advantages of Gaudi means the Habana accelerator cards will likely provide a better total cost of ownership — or operations, in the case of renting cloud instances.
“I think the key difference is going to be in the cost to outfit those units in the data center. I think the Habana units are going to be a lot less expensive than the Nvidia Amperes,” he said, referring to the GPU architecture for Nvidia’s A100 products.
In other words, it’s not just about how fast a processor is; it’s about how the design and cost structure of the processor affect the overall cost of running a data center, which is an important consideration for many organizations, including AWS, according to Krewell.
It’s the same reason AWS is developing its own processors, like Graviton and Trainium, with cloud instances that have even lower price points than Intel- or AMD-based instances, he added.
“They’re deploying at a certain price point for a customer base and for internal use, and that TCO factor, the cost of procuring and operating, is very important to the price point they sell it at,” Krewell said.
While it’s important for Intel to rack up big customer wins like AWS with Habana’s Gaudi chips at first, Krewell said, he expects the chipmaker will eventually turn to channel partners to sell the new product type into a broader market.
“It’s a challenge. Nvidia’s got the market fairly well covered, but people are chipping away at it,” he said.