Arm Says New Neoverse V1, N2 Server CPUs Faster Than Intel, AMD
‘We know the alternative architecture isn’t standing still, but we’re very confident that Neoverse N2 will continue to represent the ultimate in per socket performance and Neoverse V1 will offer the ultimate per thread performance into the future,’ Arm executive Chris Bergey says of the newly disclosed server CPU designs.
Arm has revealed two next-generation Neoverse server CPU designs, saying that the new V1 core for maximum performance and N2 core for scale-out performance will deliver significantly higher performance than processors made by Intel or AMD.
The British chip designer made the claims as part of a major update for its Neoverse server CPU roadmap Tuesday, a little more than a week after Nvidia announced that it plans to acquire Arm for $40 billion from its current owner in Japan, SoftBank Group.
Arm’s new Neoverse disclosures were made as the company claimed significant progress in the data center market, with four of the world’s top seven hyperscalers adopting Arm-based processors for deployments and with Arm-based Fujitsu processors powering Fugaku, the newly minted fastest supercomputer in the world. Arm’s V1 design has already been provided to silicon partners, and the N2 design is already sampling with some partners, though full delivery won’t happen until next year.
“The emergence of Arm in the data center is being powered by many factors: customization, efficiency, ecosystem diversity, but all of that builds on top of performance,” Chris Bergey, senior vice president and general manager of Arm’s infrastructure business, said in a pre-briefing with journalists and analysts. “If Neoverse wasn’t delivering a significant measurable value proposition you would not see the market adoption and momentum that we are achieving.”
Bergey said the new V1 CPU core, previously code-named Zeus and part of Neoverse’s new V-series for high single-threaded and machine learning performance, provides a performance improvement of more than 50 percent over Arm’s N1 core. The N1 core is used in Amazon Web Services’ Graviton2 processors and Ampere’s upcoming 128-core Altra Max processors, both of which have claimed significant gains over Intel’s and AMD’s server processors.
V1, which is designed for 7- and 5-nanometer process technologies, will be Arm’s first core design to support the Scalable Vector Extension (SVE), with two 256-bit-wide vector units, which will make it well suited for high-performance computing and machine learning workloads, according to Bergey. The CPU will also support bfloat16, PCIe 5.0 connectivity, DDR5, HBM2e and CCIX 1.0 for bidirectional coherent communications between chips across sockets and in-package chiplets.
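To put the vector width in concrete terms, the short Python sketch below counts how many data elements fit across two 256-bit vectors for a few common element sizes. It is only illustrative arithmetic: the function name `elements_across_units` is hypothetical, and the count assumes fully packed vectors. It says nothing about real instruction throughput, which depends on each silicon partner's implementation.

```python
# Figures from the article: two vectors, each 256 bits wide.
VECTOR_BITS = 256
VECTOR_UNITS = 2

# Element widths in bits for a few common numeric formats.
ELEMENT_BITS = {"bfloat16": 16, "float32": 32, "float64": 64}


def elements_across_units(element_bits, vector_bits=VECTOR_BITS, units=VECTOR_UNITS):
    """Return how many packed elements fit across all vectors (hypothetical helper)."""
    return (vector_bits // element_bits) * units


for name, bits in ELEMENT_BITS.items():
    print(f"{name}: {elements_across_units(bits)} elements across both vectors")
```

Under these assumptions, halving the element size doubles the number of elements handled per pass, which is one reason narrower formats such as bfloat16 are attractive for machine learning workloads.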
“At the implementation level, our silicon partners have full control over SVE voltage and frequency transitions, so there doesn’t have to be a frequency drop,” Bergey said. “Fujitsu’s A64FX CPU is a great example of this: They can run full frequency all day long while executing SVE code.”
The new N2 CPU core, previously code-named Perseus and part of the N-series for scale-out and 5G applications, will provide a performance improvement of more than 50 percent over N1 in addition to supporting SVE and bfloat16. N2 will be designed for 5nm process technologies, and, like V1, it will support PCIe 5.0 and DDR5, but it will go further by supporting HBM3 for high-bandwidth memory as well as both CCIX 2.0 and CXL 2.0 for fabrics.
With a focus on scale-out deployments, N2 can support anywhere from 192 cores at 350 watts of thermal design power to eight cores at 20 watts, which means the CPU core is well suited for a wide range of applications, from cloud data centers to low-power edge gateways and routers.
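As a rough illustration of that range, the Python sketch below divides the quoted thermal design power by the core count at each endpoint. The helper name `watts_per_core` is hypothetical, and the result is only a back-of-the-envelope figure: TDP covers the whole chip (memory controllers, I/O, interconnect), not just the cores, so this is not a measurement of core power.

```python
# Endpoints quoted in the article: (core count, thermal design power in watts).
configs = {
    "192-core": (192, 350.0),
    "8-core": (8, 20.0),
}


def watts_per_core(cores, tdp_watts):
    """Divide chip-level TDP evenly across cores (a simplifying assumption)."""
    return tdp_watts / cores


for name, (cores, tdp) in configs.items():
    print(f"{name}: {watts_per_core(cores, tdp):.2f} W per core")
```

The two endpoints come out to roughly 1.82 and 2.50 watts per core, which suggests the power budget per core stays within the same order of magnitude across a 24-fold difference in core count.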
“[CXL] could involve sharing a large pool of memory across a set of connected nodes, or it could mean just attaching a large amount of emerging memory to a single node,” Bergey said. “CXL is proving to be the preferred way to attach accelerators, where accelerators and the host can coherently access each other’s memory. The most obvious use cases here are [machine learning] training and inference, but we expect new use cases to emerge by the time this hits the market.”
The biggest difference between the two new core designs is that V1 is designed to deliver the highest possible level of single-threaded performance, at the cost of higher power consumption, while N2 is designed to deliver higher core counts and an optimized performance-power ratio for scale-out deployments.
“If your application is very CPU and bandwidth demanding, then V1 will give you the best performance per thread. But if your application is more scale out and needing more cores, then N2 may be a better choice as you will find more instances with higher core counts,” Bergey said. “So that’s the beauty of Arm Neoverse, whether you need best per-thread performance or best scale-out performance, we think we’ve got you covered.”
Citing internal performance estimates, Bergey said an Arm N1 processor with 128 cores already provides higher performance per socket and higher performance per thread than AMD’s 64-core EPYC 7742 processor and Intel’s Xeon 8268 processor. With a 96-core V1 processor and a 128-core N2 processor, those gains over the AMD and Intel processors increase significantly, he added.
“We know the alternative architecture isn’t standing still, but we’re very confident that Neoverse N2 will continue to represent the ultimate in per socket performance and Neoverse V1 will offer the ultimate per thread performance into the future,” he said.
Along with Arm’s advances in CPU designs, Bergey said the company has also built a solid foundation of software support in the data center, citing support from Red Hat, VMware, SUSE, Oracle Linux, KVM, Kubernetes, Docker, MySQL, MongoDB, Apache and other players in the ecosystem.
“Arm’s decade-long efforts to build an ecosystem of foundational infrastructure software is finally being seen,” he said. “Arm is now a first-class citizen with the largest continuous integration, continuous deployment of platforms. Even though we’ve been very focused on cloud-native, we continue to build an impressive list of commercial [independent software vendor] applications and have a lot of exciting developments in the pipeline.”
Dominic Daninger, vice president of engineering at Nor-Tech, a Burnsville, Minn.-based high-performance computing system integrator, told CRN that some of his university customers expressed interest in Arm-based processors when the company shared a previous update, citing Arm’s reputation for enabling chips with high energy efficiency.
But for a system integrator like Nor-Tech to consider selling and supporting Arm-based processors, there needs to be a greater ecosystem around the products, including server vendors like Supermicro and Gigabyte, according to Daninger.
“First of all, we’d have to have a market interest in it and inquiries on it. And then you’ve got to have the infrastructure of some server motherboard manufacturer and CPU manufacturers to support it,” he said. “And you’re going to have to have some level of confidence that they’re going to be here in two years, that kind of thing. So all those things have to fall into place.”