Microsoft Azure continues to infuse its cloud platform with HPC- and AI-directed applied sciences. Right this moment the cloud providers purveyor introduced a brand new digital machine household geared toward “supercomputer-class AI,” backed by Nvidia A100 Ampere GPUs, AMD Epyc Rome CPUs, 1.6 Tbps HDR InfiniBand, and PCIe four.zero connectivity. The NDv4 VM situations are scalable to greater than 100 billion parameters and exaops of compute, based on Evan Burness, principal program supervisor for HPC & Huge Compute at Azure.
“In our continuum of Azure innovation, we’re excited to announce the brand new ND A100 v4 VM collection, our strongest and massively scalable AI VM, accessible on-demand from eight, to hundreds of interconnected Nvidia GPUs throughout a whole bunch of VMs,” mentioned Ian Finder, senior program supervisor, accelerated HPC infrastructure at Azure.
Earlier than constructing these situations into its Azure cloud service, Microsoft first designed and deployed an AI supercomputer for OpenAI out of comparable parts: Nvidia GPUs and AMD Epyc Rome chips. With greater than 285,000 CPU cores, 10,000 GPUs and 400 gigabits-per-second of community connectivity for every GPU server within the cluster, Microsoft claimed the system would place throughout the prime 5 echelon of the Prime500 record (though it didn’t seem on the June 2020 version of the bellwether record).
The supercomputer allowed researchers to ascertain OpenAI‘s 175-billion-parameter GPT-Three mannequin, which is capable of assist duties it wasn’t explicitly skilled for, together with composing poetry and language translation, advancing synthetic intelligence towards its foundational goal.
The brand new situations are a part of Azure’s NDs-series VMs, designed for the wants of AI and deep studying workloads.
NDv4 VMs are a comply with on to the NDv2-series digital machines, constructed on prime of the Nvidia HGX system, powered by eight Nvidia V100 GPUs with 32 GB of reminiscence every, 40 non-hyperthreaded Intel Xeon Platinum 8168 processor cores, and 672 GiB of system reminiscence. Azure NDv3 collection, at present in preview, function the Graphcore IPU, a novel structure that permits high-throughput processing of neural networks even at small batch sizes.
The ND A100 v4 VM collection brings Ampere A100 GPUs into the Azure cloud simply 4 months after their debut launch at GTC (Nvidia’s GPU Expertise Convention), illustrating the sped-up adoption cycle of AI- and HPC-class applied sciences flowing into the cloud. Google Cloud launched its A2 household, primarily based on A100 GPUs, lower than two months after Ampere’s arrival. Cloud big AWS has mentioned it would provide A100 GPUs.
“The ND A100 v4 VM collection is backed by an all-new Azure-engineered AMD Rome-powered platform with the newest hardware requirements like PCIe Gen4 constructed into all main system elements. PCIe Gen four and NVIDIA’s third-generation NVLink structure for the quickest GPU-to-GPU interconnection inside every VM retains information shifting by way of the system greater than 2x sooner than earlier than,” Finder said in a weblog put up.
He added most prospects can count on “a right away increase of 2x to 3x compute efficiency over the earlier technology of techniques primarily based on Nvidia V100 GPUs with no engineering work,” whereas leveraging A100 options, similar to multi-precision, sparsity acceleration and multi-instance GPU (MIG), prospects present as much as a 20x increase.
“Azure’s A100 situations allow AI at unimaginable scale within the cloud,” mentioned companion Nvidia. “To energy AI workloads of all sizes, its new ND A100 v4 VM collection can scale from a single partition of 1 A100 to an occasion of hundreds of A100s networked with Nvidia Mellanox interconnects.”
The accelerated compute chief added, “This [announcement] comes on the heels of prime server makers unveiling plans for greater than 50 A100-powered techniques and Google Cloud’s announcement of A100 availability.”
Azure ND A100 v4 machines can be found now in preview.
For extra particulars, see https://azure.microsoft.com/en-us/weblog/bringing-ai-supercomputing-to-customers/