July 27, 2024


Accelerating your AI innovation at scale

With a full range of high-performance, cost-efficient AI inference options powered by GPUs and TPUs, Google Cloud is uniquely positioned to empower organizations to accelerate their AI workloads at scale:

“Our team is a huge fan of Google Cloud’s AI infrastructure solution, and we use Google Cloud G2 GPU VMs for the ‘AI Filter’ feature in our AI photo app, Remini, including for the latest filters – ‘Barbie and Ken.’ Using G2 VMs has allowed us to significantly reduce processing latency by up to 15 seconds per task. Google Cloud has also been instrumental in helping us seamlessly scale up to 32,000 GPUs at peak times – such as when our Remini app soared to the No. 1 overall position on the U.S. App Store – and back down to a daily average of 2,000 GPUs.” — Luca Ferrari, CEO and Co-Founder, Bending Spoons

“Cloud TPU v5e consistently delivered up to 4X better performance per dollar than comparable solutions on the market for running inference on our production model. The Google Cloud software stack is optimized for peak performance and efficiency, taking full advantage of the TPU v5e hardware that was purpose-built for accelerating the most advanced AI and ML models. This powerful and versatile combination of hardware and software dramatically accelerated our time-to-solution: instead of spending weeks hand-tuning custom kernels, within hours we optimized our model to meet and exceed our inference performance targets.” — Domenic Donato, VP of Technology, AssemblyAI

“YouTube is using the TPU v5e platform to serve recommendations on YouTube’s Homepage and WatchNext to billions of users. TPU v5e delivers up to 2.5x more queries for the same cost compared to the previous generation.” — Todd Beaupré, Director of Product Management, YouTube

To get started with Google Cloud GPUs and TPUs, reach out to your Google Cloud account manager or contact Google Cloud sales.


1. MLPerf™ v3.1 Inference Closed, multiple benchmarks as shown, Offline, 99%. Retrieved September 11, 2023 from mlcommons.org. Results 3.1-0106, 3.1-0107, 3.1-0120, 3.1-0143. Performance per dollar is not an MLPerf metric. TPU v4 results are Unverified: not verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
2. To derive performance per dollar for the Oracle BM GPU v2.8, we divided the QPS that Oracle submitted for the A100 results by $32.00, the publicly available server price per hour (US$). The Oracle system used 8 chips. To derive G2 performance per dollar, we divided the QPS from the L4 result by $0.85, the publicly available on-demand price per chip-hour (US$) for g2-standard-8 (a comparable Google instance type with a publicly available price point) in the us-central1 region. The L4 system used 1 chip.
3. To derive TPU v5e performance per dollar, we divided the QPS by the number of chips used (4) multiplied by $1.20, the publicly available on-demand price per chip-hour (US$) for TPU v5e in the us-west4 region. To derive TPU v4 performance per dollar, we divided the QPS (internal Google Cloud results, not verified by MLCommons Association) by the number of chips multiplied by $3.22, the publicly available on-demand price per chip-hour (US$) for TPU v4 in the us-central2 region.
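
As a worked illustration of the arithmetic in notes 2 and 3, the short Python sketch below computes performance per dollar as QPS divided by the product of chip count and on-demand chip-hour price. The prices are the figures quoted above; the QPS value is a hypothetical placeholder, not a published MLPerf result.

# Minimal sketch of the performance-per-dollar calculation described in
# footnotes 2 and 3: QPS / (number of chips x on-demand price per chip-hour).
# Prices are the figures quoted in the footnotes (US$ per chip-hour); the
# example QPS is a hypothetical placeholder, not a published benchmark number.

def perf_per_dollar(qps: float, num_chips: int, price_per_chip_hour_usd: float) -> float:
    """Queries per second delivered per US dollar of on-demand spend."""
    return qps / (num_chips * price_per_chip_hour_usd)

# On-demand prices cited in the footnotes.
TPU_V5E_US_WEST4 = 1.20    # TPU v5e, us-west4
TPU_V4_US_CENTRAL2 = 3.22  # TPU v4, us-central2
L4_G2_STANDARD_8 = 0.85    # L4 in g2-standard-8, us-central1

# Example with a placeholder throughput figure (for illustration only).
example_qps = 1_000.0
print(f"TPU v5e (4 chips): {perf_per_dollar(example_qps, 4, TPU_V5E_US_WEST4):.1f} QPS/$")
print(f"L4 (1 chip):       {perf_per_dollar(example_qps, 1, L4_G2_STANDARD_8):.1f} QPS/$")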
