December 2, 2024 | IBM Z

New IBM z AI Hardware and Processing Trends


Recent surveys indicate that 78–85% of organizations running IBM z systems are currently deploying or plan to deploy generative AI on the mainframe. Implementing mainframe AI for batch processing, innovation, security, administration, and other tasks will inevitably require significant hardware capacity updates. 

 

Here’s a quick rundown of the current state of IBM z system hardware capability for AI processing, including:  

  • Telum processors running on IBM z16 machines 
  • Telum II processors and IBM Spyre Accelerators coming in 2025 

Telum Processors for z16 Machines 

IBM z16 systems can use Telum processors for AI processing. 

 

Telum provides critical AI hardware capabilities for z systems. In addition to its traditional processing cores (CPUs), Telum adds AI accelerator cores to each chip. The AI accelerator cores are designed for running deep learning workloads, with over six teraflops (TFLOPS) of compute capacity on each chip. With two sets of cores, Telum was specifically designed to run AI workloads alongside batch and other processing. 

 

Telum features redesigned chip caching that increases cache size and reduces latency. Telum on z16 systems added Open Neural Network Exchange (ONNX) support for AI applications developed outside the mainframe. ONNX compatibility allows z16 machines to run cross-platform workloads and use deep learning models built on other platforms.  

 

Further reading: IBM z16 System Hardware for z/OS AI Processing 

Telum II Processors and IBM Spyre Accelerator (2025 availability) 

IBM made two big AI-related hardware technology announcements at the Hot Chips 2024 conference:  

 

1. Telum II processors
2. The IBM Spyre Accelerator 

 

Telum II and Spyre are separate technologies that can be combined to substantially increase AI acceleration, throughput, and performance. Both items are expected to be available for IBM z systems starting in 2025. 

Telum II 

Telum II is the next-generation z system processor. It offers more advanced AI features, built-in Data Processing Unit (DPU) capabilities, enhanced acceleration, and a significantly larger cache memory.  

 

Telum II processors build on and expand the AI capabilities of the z16 Telum processor set, including: 

  • Integrated AI Accelerator: Telum II comes with an integrated on-chip AI accelerator that offers high AI inferencing throughput at low latency. The accelerator runs at 24 Tera Operations per Second (TOPS), a 4x performance improvement over the original Telum processor. A drawer of eight Telum II processors would max out at 192 TOPS. 
  • Processing cores and caching: Each Telum II processor will feature eight high-performance cores running at 5.5GHz. On-chip caching grows to ten 36MB Level-2 (L2) caches: one assigned to each core, one to the DPU, and one for overall chip caching. Telum II represents a 40% increase in on-chip cache capacity compared to the current Telum processor. The virtual L3 and L4 caches introduced with the z16 Telum processor will grow to 360MB and 2.88GB, respectively, in Telum II. These larger caches will decrease latency, handle large datasets in memory more easily, and reduce memory fetches. 
  • DPU Integration/IO Accelerator: Telum II features an on-chip, low-latency, integrated Data Processing Unit (DPU) for I/O acceleration. Placing the DPU directly on the processor chip reduces latency and speeds data transfer within complex environments, while reducing its physical footprint and power usage. The DPU contains four processing clusters, each with eight programmable micro-controllers, plus an I/O accelerator that manages the clusters. The accelerator and clusters connect to each other through their own private 36MB L2 cache. The DPU connects both to the cache fabric and to the PCIe interfaces, which prevents bulk data transfers from overwhelming the L2 cache. The Telum II DPU also contains a separate L1 cache and a request manager for tracking outstanding requests. Together, these features should increase AI processing performance by decreasing data transfer latency.
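The throughput and cache figures above can be sanity-checked with simple arithmetic. This worked example uses only the numbers quoted in this article:

```python
# Worked arithmetic from the Telum II figures quoted above.

TOPS_PER_ACCELERATOR = 24   # on-chip AI accelerator throughput, in TOPS
PROCESSORS_PER_DRAWER = 8   # Telum II processors in one drawer
L2_CACHE_MB = 36            # size of each L2 cache, in MB
L2_CACHES_PER_CHIP = 10     # ten 36MB L2 caches per chip

drawer_tops = TOPS_PER_ACCELERATOR * PROCESSORS_PER_DRAWER
on_chip_l2_mb = L2_CACHE_MB * L2_CACHES_PER_CHIP

print(drawer_tops)    # 192 TOPS per drawer, matching the figure above
print(on_chip_l2_mb)  # 360MB of total L2, matching the 360MB virtual L3 figure
```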

IBM Spyre Accelerator 

The IBM Spyre Accelerator is a purpose-built AI accelerator chip, manufactured on a 5nm process node, that complements Telum II processors with additional AI compute capability. Each Spyre card is built with 32 AI accelerator cores for AI processing. Spyre cards are designed specifically to handle machine learning, large language models (LLMs), and other AI processing tasks. Spyre Accelerator processors also use a range of lower-precision numeric formats (such as int4 and int8) to make running AI models more energy efficient and far less memory intensive. 
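To see why lower-precision formats save memory, here is a minimal symmetric int8 quantization sketch in plain Python. This is an illustrative technique only, not Spyre's actual quantization scheme:

```python
# Minimal sketch of symmetric int8 quantization: each float32 weight
# (4 bytes) maps to one signed byte plus a shared scale factor.
# Illustrative only -- not IBM's Spyre implementation.

def quantize_int8(weights):
    """Map floats to the int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.08, 0.95]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)

# int8 storage is 1 byte per weight vs 4 bytes for float32: a 4x memory
# saving, at the cost of small rounding error in the recovered values.
print(q)  # → [42, -127, 8, 95]
print(approx)
```

int4 halves the storage again (two weights per byte) at the cost of a much coarser value grid, which is why it suits inference more than training.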

 

Multiple Spyre cards can be connected to the I/O subsystem via the PCIe interface and clustered to scale AI processing workloads across an IBM z system. Attach one Spyre card to a z system and you add 32 accelerator cores for AI processing. Attach two Spyre cards and add 64 accelerator cores. And so on. Spyre Accelerator cores come in addition to the on-chip AI accelerator included with each Telum II processor.  
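The scaling described above is purely additive, which a small helper (hypothetical, for illustration) makes explicit:

```python
# Accelerator-core arithmetic from the figures above: each attached
# Spyre card contributes 32 AI accelerator cores.
# Hypothetical helper for illustration.

SPYRE_CORES_PER_CARD = 32

def added_spyre_cores(num_cards):
    """AI accelerator cores added by attaching num_cards Spyre cards."""
    return SPYRE_CORES_PER_CARD * num_cards

print(added_spyre_cores(1))  # 32 additional cores
print(added_spyre_cores(2))  # 64 additional cores
print(added_spyre_cores(8))  # eight clustered cards add 256 cores
```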

 

Spyre Accelerators allow z systems to scale their AI processing to meet the next wave of large language models in transaction processing. Spyre will be an expandable way to add AI processing capacity to z systems as your needs grow and your budget allows. Combining Telum II and Spyre Accelerators can add significant amounts of AI acceleration, throughput, and performance to IBM z systems. 

Telum versus Telum II and Spyre Accelerator 

There are key differences between IBM’s Telum, Telum II, and Spyre Accelerator technologies. The original Telum processor contains many of the same capabilities as Telum II, but with a smaller cache and less advanced AI capabilities. 

 

Telum II is an upgraded version of the original Telum processor. It provides significant increases for AI processing in its integrated AI accelerator, processor cores, caching, and integrated DPU. The Spyre Accelerator will allow organizations to significantly ramp up AI processing capability while retaining the same traditional CPU capabilities used for batch processing and other tasks. Using Telum II alone or in combination with one or more Spyre Accelerators will allow IBM z systems to handle larger AI workloads. 

 

It’s unclear which IBM z systems will offer Telum, Telum II, and Spyre Accelerator options in 2025 and beyond. Their usage will be dictated by organizational needs, availability, and use case. Stay tuned for more information on how IBM will continue to roll out these technologies.  

 

Please contact SEA for more information about IBM Z report management, output management, and enterprise batch and DevOps tools.