AMD unveils ‘world’s fastest’ HPC accelerator
High-performance computing GPU is engineered for the exascale era
Advanced Micro Devices Inc. (AMD), Santa Clara, CA, has unveiled its Instinct MI100 accelerator, which the firm is calling the ‘world’s fastest’ HPC GPU and the first x86 server GPU to surpass the 10 teraflops (FP64) performance barrier.
Supported by new accelerated compute platforms from Dell, Gigabyte, HPE, and Supermicro, the MI100, combined with AMD EPYC CPUs and the ROCm 4.0 open software platform, is designed to propel new discoveries ahead of the exascale era.
Targeted toward scientific computing
Built on the new AMD CDNA architecture, the AMD Instinct MI100 GPU enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD EPYC processors. The MI100 offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS peak FP32 Matrix performance for AI and machine learning workloads. With new AMD Matrix Core technology, the MI100 also delivers a nearly 7x boost in FP16 theoretical peak floating point performance for AI training workloads compared to AMD’s prior generation accelerators.
“Today AMD takes a major step forward in the journey toward exascale computing as we unveil this device,” says Brad McCredie, corporate VP data center GPU and accelerated processing, AMD. “Squarely targeted toward the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC.”
Open software platform for the exascale era
The AMD ROCm developer software provides the foundation for exascale computing. As an open-source toolset consisting of compilers, programming APIs and libraries, ROCm is used by exascale software developers to create high-performance applications. ROCm 4.0 has been optimized to deliver performance at scale for MI100-based systems. In this release, the compiler has been upgraded to be open source and unified, supporting both OpenMP 5.0 and HIP. The PyTorch and TensorFlow frameworks, which have been optimized with ROCm 4.0, can now achieve higher performance with the MI100. ROCm 4.0 is the latest offering for HPC, ML and AI application developers, allowing them to create performance-portable software.
“We’ve received early access to the MI100 accelerator, and the preliminary results are very encouraging. We’ve typically seen significant performance boosts, up to 2-3x compared to other GPUs,” adds Bronson Messer, director of science, Oak Ridge Leadership Computing Facility. “What’s also important to recognize is the impact software has on performance. The fact that the ROCm open software platform and HIP developer tool are open source and work on a variety of platforms, it is something that we have been absolutely almost obsessed with since we fielded the very first hybrid CPU/GPU system.”