Microsoft Azure continues to infuse its cloud platform with HPC- and AI-directed technologies. Today the cloud services purveyor announced a new virtual machine family aimed at “supercomputer-class AI,” backed by Nvidia A100 Ampere GPUs, AMD Epyc Rome CPUs, 1.6 Tbps HDR InfiniBand, and PCIe 4.0 connectivity. The NDv4 VM instances are scalable to more than 100 billion parameters and exaops of compute, according to Evan Burness, principal program manager for HPC & Big Compute at Azure.
“In our continuum of Azure innovation, we’re excited to announce the new ND A100 v4 VM series, our most powerful and massively scalable AI VM, available on-demand from eight, to thousands of interconnected Nvidia GPUs across hundreds of VMs,” said Ian Finder, senior program manager, accelerated HPC infrastructure at Azure.
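The scaling claim maps to simple arithmetic: GPUs come in units of eight per VM, and the 1.6 Tbps InfiniBand figure is per VM. The sketch below is a hypothetical back-of-the-envelope helper, not an Azure API. It assumes each ND A100 v4 VM carries eight A100 GPUs and eight 200 Gb/s HDR InfiniBand links (8 × 200 Gb/s = 1.6 Tbps). The function name `plan_cluster` and its output fields are illustrative only.

```python
"""Back-of-the-envelope sizing for an ND A100 v4 cluster (illustrative sketch only)."""
import math

# Assumptions (not an Azure API): each ND A100 v4 VM carries 8 A100 GPUs and
# 8 HDR InfiniBand links at 200 Gb/s each, i.e. 1,600 Gb/s = 1.6 Tbps per VM.
GPUS_PER_VM = 8
HDR_LINK_GBPS = 200
LINKS_PER_VM = 8


def plan_cluster(target_gpus: int) -> dict:
    """Return VM count, GPU count and InfiniBand bandwidth for a GPU target (hypothetical helper)."""
    if target_gpus < 1:
        raise ValueError("target_gpus must be positive")
    vms = math.ceil(target_gpus / GPUS_PER_VM)   # VMs are allocated whole, so round up
    per_vm_gbps = HDR_LINK_GBPS * LINKS_PER_VM     # 1,600 Gb/s per VM
    return {
        "vms": vms,
        "gpus": vms * GPUS_PER_VM,                  # GPUs actually provisioned
        "per_vm_tbps": per_vm_gbps / 1000,          # Gb/s -> Tb/s
        "aggregate_tbps": vms * per_vm_gbps / 1000, # sum over all VMs in the cluster
    }


if __name__ == "__main__":
    # From a single VM up to "thousands of GPUs across hundreds of VMs".
    for gpus in (8, 1024, 4000):
        print(gpus, plan_cluster(gpus))
```

Under these assumptions, 1,024 GPUs works out to 128 VMs. Rounding up to whole VMs reflects that the series is sold in eight-GPU units.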
Before building these instances into its Azure cloud service, Microsoft first designed and deployed an AI supercomputer for OpenAI out of similar parts: Nvidia GPUs and AMD Epyc Rome chips. With more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server in the cluster, Microsoft claimed the system would rank among the top five systems on the Top500 list (although it did not appear on the June 2020 edition of the bellwether list).
The supercomputer enabled researchers to build OpenAI’s 175-billion-parameter GPT-3 model, which can support tasks it wasn’t explicitly trained for, including composing poetry and language translation, advancing artificial intelligence toward its foundational goal.
The new instances are part of Azure’s ND-series VMs, designed for the needs of AI and deep learning workloads.
NDv4 VMs are a follow-on to the NDv2-series virtual machines, which are built on the Nvidia HGX system and powered by eight Nvidia V100 GPUs with 32 GB of memory each, 40 non-hyperthreaded Intel Xeon Platinum 8168 processor cores, and 672 GiB of system memory. The Azure NDv3 series, currently in preview, features the Graphcore IPU, a novel architecture that enables high-throughput processing of neural networks even at small batch sizes.
The ND A100 v4 VM series brings Ampere A100 GPUs into the Azure cloud just four months after their debut at GTC (Nvidia’s GPU Technology Conference), illustrating the accelerated adoption cycle of AI- and HPC-class technologies flowing into the cloud. Google Cloud launched its A2 family, based on A100 GPUs, less than two months after Ampere’s arrival. Cloud giant AWS has said it will offer A100 GPUs.
“The ND A100 v4 VM series is backed by an all-new Azure-engineered AMD Rome-powered platform with the latest standards like PCIe Gen4 built into all major system components. PCIe Gen 4 and NVIDIA’s third-generation NVLink architecture for the fastest GPU-to-GPU interconnection within each VM keeps data moving through the system more than 2x faster than before,” Finder stated in a blog post.
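Finder’s “more than 2x” figure covers PCIe Gen4 and third-generation NVLink together. The PCIe part alone can be sanity-checked with line-rate arithmetic. The sketch below is a rough illustration that computes theoretical one-direction bandwidth for an x16 link. It assumes only the standard PCIe parameters (8 GT/s for Gen3, 16 GT/s for Gen4, 128b/130b encoding for both) and ignores packet and protocol overhead, so real-world throughput is lower. The function name `pcie_gbytes_per_s` is hypothetical.

```python
# Theoretical one-direction PCIe bandwidth (illustrative sketch).
# Gen3 and Gen4 both use 128b/130b line encoding; Gen4 doubles the per-lane
# transfer rate from 8 GT/s to 16 GT/s. Packet/protocol overhead is ignored.
TRANSFER_RATE_GTPS = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbytes_per_s(gen: int, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s for a PCIe link of the given generation."""
    usable_gbits = TRANSFER_RATE_GTPS[gen] * ENCODING_EFFICIENCY * lanes  # usable Gb/s
    return usable_gbits / 8  # bits -> bytes


if __name__ == "__main__":
    g3 = pcie_gbytes_per_s(3)
    g4 = pcie_gbytes_per_s(4)
    print(f"Gen3 x16 ~ {g3:.1f} GB/s, Gen4 x16 ~ {g4:.1f} GB/s, ratio {g4 / g3:.1f}x")
```

On these assumptions an x16 link goes from roughly 15.8 GB/s to roughly 31.5 GB/s per direction, exactly double. Any gain beyond 2x in the quoted claim would have to come from the NVLink side rather than from PCIe.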
He added that most customers can expect “an immediate boost of 2x to 3x compute performance over the previous generation of systems based on Nvidia V100 GPUs with no engineering work,” while customers who leverage A100 features such as multi-precision, sparsity acceleration and multi-instance GPU (MIG) can see up to a 20x boost.
“Azure’s A100 instances enable AI at incredible scale in the cloud,” said partner Nvidia. “To power AI workloads of all sizes, its new ND A100 v4 VM series can scale from a single partition of one A100 to an instance of thousands of A100s networked with Nvidia Mellanox interconnects.”
The accelerated computing leader added, “This [announcement] comes on the heels of top server makers unveiling plans for more than 50 A100-powered systems and Google Cloud’s announcement of A100 availability.”
Azure ND A100 v4 machines are now available in preview.
For more details, see https://azure.microsoft.com/en-us/blog/bringing-ai-supercomputing-to-customers/