At Its Advancing AI Event, AMD Reveals New GPUs, Software, and Systems
AMD is building an end-to-end AI ecosystem, grounded in open standards, to keep pace with the next phase of AI.
At its 2025 Advancing AI event, AMD made a sweeping set of AI platform announcements, headlined by its Instinct MI350 Series GPUs, Helios rack-scale AI systems, and the ROCm 7 software stack. While Nvidia’s vertical integration has dominated recent AI infrastructure deployments, AMD hopes its platform offers an alternative built on open standards, chiplet modularity, and ecosystem partnerships.
CEO Dr. Lisa Su introduces the 2025 Advancing AI event.
Let’s take a look at some of the new releases and how they fit into AMD’s vision for the future of an open AI ecosystem.
The Instinct MI350 GPUs
AMD’s first big release is its Instinct MI350 Series, the first GPUs to implement the company’s CDNA 4 architecture. Built on TSMC’s N3P node, the MI350X and MI355X stack 3D hybrid-bonded XCD compute dies atop a 6 nm IOD, yielding a compact chiplet footprint. Each GPU features 256 compute units and 1,024 matrix cores, with support for mixed-precision formats including FP16, BF16, FP8, and the new microscaling FP6 and FP4 types. As a result, the MI355X can deliver up to 10 PFLOPS of sparse FP8 matrix performance, with FP4 support extending that to 20 PFLOPS, nearly double the throughput of the MI300X.
AMD’s Instinct MI350 Series GPU.
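As a quick sanity check on those throughput figures: at a fixed matrix-core count and clock, halving operand width roughly doubles peak operations per second. The back-of-envelope below applies that scaling rule to the quoted sparse FP8 number; the rule is a simplification for illustration, not AMD’s published methodology.

```python
# Back-of-envelope: halving operand width roughly doubles peak
# matrix throughput at a fixed core count and clock. This is a
# simplification for illustration, not AMD's spec methodology.
fp8_sparse_pflops = 10.0                    # MI355X sparse FP8, per AMD
fp4_estimate = fp8_sparse_pflops * (8 / 4)  # 8-bit -> 4-bit operands
print(f"Estimated sparse FP4 peak: {fp4_estimate:.0f} PFLOPS")  # ~20
```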
The architecture includes 288 GB of HBM3E memory across eight stacks, achieving 8 TB/s of memory bandwidth. AMD also doubled the unified translation cache (UTC) capacity and optimized the memory subsystem to boost read bandwidth per watt by 1.3x over the previous generation. Meanwhile, partitioning options such as CPX+NPS2 enable a single MI355X to serve eight instances of the LLaMA 3.1 70B model simultaneously, while SPX+NPS1 mode supports models of up to 520 billion parameters on a single GPU.
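A quick capacity check makes those partitioning claims plausible. The sketch below counts weight storage only, assumes 4-bit (FP4) weights, and ignores KV cache and activation memory; the precision choice is an assumption for illustration, not an AMD disclosure.

```python
# Weight-only memory check for the partitioning claims above.
# Assumes 4-bit (FP4) weights; KV cache and activations ignored.
HBM_GB = 288                                   # MI355X HBM3E capacity

def weights_gb(params_billion: float, bits: int) -> float:
    """Approximate weight footprint in GB."""
    return params_billion * 1e9 * bits / 8 / 1e9

per_instance = weights_gb(70, bits=4)          # ~35 GB per 70B model
print(f"8x LLaMA 3.1 70B: {8 * per_instance:.0f} GB of {HBM_GB} GB")

single_large = weights_gb(520, bits=4)         # ~260 GB
print(f"One 520B model:  {single_large:.0f} GB of {HBM_GB} GB")
```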
Helios Platform and the MI400 Roadmap
AMD also used the event to preview its upcoming Helios rack-scale system, built to accommodate next-generation workloads in distributed inference and agentic AI. Designed around future Instinct MI400 GPUs, Helios will pair these accelerators with Zen 6-based EPYC "Venice" CPUs and Pensando "Vulcano" NICs. AMD claims the MI400 will deliver 10x the inference performance of the MI300X when running Mixture-of-Experts models. While the MI400 architecture remains under wraps, AMD has disclosed that these racks will be key to achieving a 20x improvement in rack-scale energy efficiency by 2030.
To support this target, AMD modeled a scenario in which a training workload that currently requires over 275 racks of MI300X systems would complete on a single Helios rack using less than 5% of the electricity. The company attributes this leap to Helios’ projected gains in compute output per watt, memory bandwidth scaling, and the adoption of low-precision formats across AMD’s roadmap.
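Reading those numbers together gives a sense of what the model implies. If energy is approximated as rack count times per-rack power times runtime, a single rack finishing the same job in the same time at equal per-rack power would use about 0.36% of the energy; AMD’s sub-5% figure therefore leaves room for a denser, higher-power rack, a longer runtime, or both. A minimal sketch of that arithmetic, with all power and runtime specifics treated as unknowns:

```python
# Implied energy ratio in AMD's modeled training scenario.
# Energy is approximated as racks x per-rack power x runtime;
# absolute power and runtime are unpublished, so only ratios.
mi300x_racks = 275
energy_fraction = 0.05                   # "less than 5% of the electricity"

equal_power_fraction = 1 / mi300x_racks  # one rack, same power and runtime
headroom = energy_fraction / equal_power_fraction
print(f"Equal-power energy fraction: {equal_power_fraction:.2%}")    # ~0.36%
print(f"Headroom for rack power x runtime: up to ~{headroom:.0f}x")  # ~14x
```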
ROCm 7 Open-Source AI Software
ROCm 7, AMD’s latest open-source AI software stack, introduces kernel-level improvements for GEMM operations, optimized attention mechanisms, and expanded support for distributed inference. The update brings substantial speedups for inference workloads, with average performance increases of 3.2x to 3.8x for LLaMA 3.1 70B, Qwen2-72B, and DeepSeek R1 compared to ROCm 6. For training, ROCm 7 produces a 3x performance uplift on LLaMA 2 70B and Qwen 1.5 7B when paired with MI300X hardware.
ROCm 7 versus ROCm 6 training performance boosts.
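AMD has not published the benchmark harness behind those figures, but the shape of such a measurement is straightforward. Below is a minimal GEMM micro-benchmark sketch in PyTorch, which on ROCm builds drives AMD GPUs through the familiar torch.cuda API; the matrix size and iteration count are arbitrary choices for illustration, not AMD’s methodology.

```python
# Minimal GEMM timing sketch (illustrative, not AMD's harness).
# ROCm builds of PyTorch expose AMD GPUs via the torch.cuda API.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n = 8192
a = torch.randn(n, n, dtype=torch.float16, device=device)
b = torch.randn(n, n, dtype=torch.float16, device=device)

torch.matmul(a, b)                       # warm-up pass
if device == "cuda":
    torch.cuda.synchronize()

iters = 10
start = time.perf_counter()
for _ in range(iters):
    torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()
elapsed = (time.perf_counter() - start) / iters

tflops = 2 * n**3 / elapsed / 1e12       # 2*n^3 FLOPs per GEMM
print(f"FP16 GEMM: {tflops:.1f} TFLOPS on {device}")
```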
Notably, ROCm 7 integrates support for FP4 and FP6 to offer finer control over quantization and token cost. Hardware-based stochastic rounding, expanded LOP3 instruction sets, and improved local data share (LDS) bandwidth help lower memory contention and improve core utilization. Meanwhile, communication optimizations like rocSHMEM and GPU Direct Access facilitate cross-node data transfer and reinforce AMD’s competitiveness in distributed inference.
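Stochastic rounding is worth unpacking, since it underlies stable low-precision training: always rounding to nearest silently discards small updates, while rounding up or down with probability proportional to proximity keeps the quantized value unbiased in expectation. The NumPy sketch below illustrates the concept in software, using a uniform grid as a stand-in for a real (non-uniform) FP4/FP6 format; on MI350-class hardware this happens in silicon.

```python
# Software illustration of stochastic rounding. A uniform grid
# stands in for a real FP4/FP6 format, which is non-uniform.
import numpy as np

def stochastic_round(x: np.ndarray, step: float, rng) -> np.ndarray:
    """Round to a grid of spacing `step`, rounding up with
    probability equal to the fractional distance from below."""
    scaled = x / step
    lower = np.floor(scaled)
    round_up = rng.random(x.shape) < (scaled - lower)
    return (lower + round_up) * step

rng = np.random.default_rng(0)
x = np.full(100_000, 0.3)                 # sits between grid points 0 and 1
print(stochastic_round(x, step=1.0, rng=rng).mean())  # ~0.3, unbiased
# Round-to-nearest would map every value to 0.0, losing the signal.
```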
AMD also announced the global availability of the AMD Developer Cloud, which provides a managed environment for ROCm-based development with support for PyTorch, Triton, JAX, and open-source models like Instella and Viking. By combining ROCm 7 with this infrastructure, AMD hopes to position itself as a viable alternative to proprietary AI stacks from Nvidia.
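For developers kicking the tires, confirming that an environment is actually running a ROCm build of PyTorch takes only a few lines; torch.version.hip is populated on ROCm builds (and None on CUDA builds), while AMD GPUs surface through the usual torch.cuda calls.

```python
# Sanity check for a ROCm-backed PyTorch environment.
import torch

print("HIP runtime:", torch.version.hip)        # None on CUDA builds
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g., an Instinct GPU
```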
AMD’s Open Ecosystem Strategy
Throughout the event, the consistent throughline was AMD’s strategy of differentiating itself through openness and scale. Leading firms such as Meta, OpenAI, and Microsoft shared production insights from MI300X deployments and expressed enthusiasm for the MI350 Series. For example, Oracle Cloud Infrastructure announced plans to scale to 131,072 MI355X GPUs, with zettascale clusters optimized for inference and training at unprecedented volumes. Other groups, such as Red Hat, Marvell, and Astera Labs, contributed to AMD’s UALink-based interconnect ecosystem, which is designed to prevent vendor lock-in and promote interoperability across data center components.
With the MI350X platform delivering up to a 35x inference improvement over the MI300X and a 40% increase in tokens per dollar compared to competing solutions like Nvidia’s B200, AMD is aligning performance, cost, and openness to drive AI infrastructure over the next decade. Combined with a maturing ROCm stack and forward-looking rack-scale systems like Helios, AMD’s 2025 announcements position the company as a serious contender in the next phase of global AI deployment.
All images used courtesy of AMD.