August 28, 2024 | IBM Z

IBM z16 System Hardware for z/OS AI Processing

Although AI applications can run on z/OS using IBM z16, z15, or z14 systems, the best performance comes from IBM z16 machines built on Telum processors. This blog article examines some of the architectural advances IBM has made with the Telum processor and how those advances integrate AI processing into traditional z/OS processing.

Integrating AI into transactional and batch processing 

Released in 2022, the Telum chip was designed specifically to integrate AI processing alongside traditional IBM Z transactional and batch processing. Newer z16 machines with Telum processors are optimized for AI processing in a way that previous IBM Z machines simply cannot match.

Telum processors are also available in IBM’s LinuxONE 4 family of servers. Any business that needs to combine AI processing with transactional or batch processing may want to upgrade its Z hardware to newer z16 machines with embedded AI accelerators (IBM Documentation).

Hardware innovations for AI processing 

Telum chips used in z16 systems provide significantly enhanced AI processing inside z/OS systems, including:

  • Eight processing cores with redesigned chip caching 
  • New on-chip AI accelerator to integrate AI processing directly into IBM Z workloads 
  • Encrypted memory and an improved trusted execution environment
  • Reliability and availability improvements 

Processing cores and redesigned chip caching 

The Telum chip also has more horsepower for running the enterprise-class workloads found in z/OS systems. Each chip contains eight traditional processing cores (CPUs) clocked at more than 5 GHz. IBM totally redesigned how chip caching works, going from four levels of caching (L1 – L4) to two levels (L1 – L2). This redesign keeps data closer to the processor, increases L2 cache size, and reduces latency. Telum quadruples the L2 cache size of previous chips and optimizes its cache access patterns.

Eight Telum chips can be packaged and linked together in a flat drawer topology. Each chip in a drawer has a direct connection to every other chip in the drawer, allowing a processing core to access the 32 MB L2 caches on other chips with low latency.

New on-chip AI accelerator

With Telum, IBM has also added an on-chip AI accelerator to each chip. The accelerator is designed to efficiently run deep learning workloads, which demand high processing speeds and move large volumes of data. Per some sources, the AI accelerator delivers more than six teraflops (TFLOPS) of compute capacity per chip, and over 200 TFLOPS for a 32-chip system (roughly 6 TFLOPS × 32 chips).

Meanwhile, enterprise z/OS workloads keep the full horsepower of the traditional processing cores. The accelerator and the processing cores are tightly integrated on each chip, allowing fast data exchange between them. With Telum, IBM is attempting to enable AI processing in every workload transaction, and it is providing the hardware infrastructure to do so.

On-chip AI accelerators also reduce security concerns, because AI processing calls are no longer sent to external servers. Note that AI accelerator performance varies with the number of Telum chips installed in a z16 machine: adding more chips allows more inferences per millisecond.
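
To make the in-transaction pattern concrete, here is a minimal, hypothetical sketch of scoring each transaction against a locally loaded ONNX model from Python. The model file, feature layout, and threshold are illustrative assumptions, and the general-purpose onnxruntime package stands in for whichever ONNX runtime your platform provides; the point is that the inference call never leaves the machine.

```python
# Hypothetical in-transaction fraud scoring against a local ONNX model.
# "fraud_model.onnx", the feature layout, and the 0.95 threshold are
# illustrative assumptions; onnxruntime stands in for the platform's
# ONNX runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("fraud_model.onnx")
input_name = session.get_inputs()[0].name

def score_transaction(features: np.ndarray) -> bool:
    """Run one low-latency inference; the call never leaves the machine."""
    outputs = session.run(None, {input_name: features})
    fraud_probability = float(outputs[0][0, 1])  # assumes [not-fraud, fraud]
    return fraud_probability > 0.95

# One transaction with four numeric features (made-up values).
flagged = score_transaction(np.array([[120.0, 3.2, 0.0, 1.0]], dtype=np.float32))
print("flagged:", flagged)
```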

The z16 platform also supports the Open Neural Network Exchange (ONNX) for machine learning interoperability. ONNX is an open format for deep learning and machine learning models, complete with common file formats, tools, frameworks, runtimes, and compilers for developers to use.
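
As a brief illustration of that interoperability, the sketch below exports a toy PyTorch model to the ONNX format and validates the resulting file; the two-layer network is a made-up stand-in for whatever model you actually train.

```python
# Export a toy PyTorch model to ONNX and validate the file.
# The two-layer network is an illustrative stand-in for a real model.
import torch
import onnx

model = torch.nn.Sequential(
    torch.nn.Linear(4, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
model.eval()

dummy_input = torch.randn(1, 4)  # batch of one, four features
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["scores"],
)

# Confirm the exported graph is well formed before shipping it anywhere.
onnx.checker.check_model(onnx.load("model.onnx"))
print("model.onnx is valid ONNX")
```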

With ONNX support, AI models can be trained and developed anywhere, then compiled for the z16 platform using the IBM Z deep learning compiler, allowing z16 machines to run cross-platform workloads. Information about leveraging other frameworks such as TensorFlow or IBM Snap ML on IBM z16 systems can be found on the Leveraging the IBM z16 Integrated Accelerator for AI web page.
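
The IBM Z deep learning compiler is built on the open-source onnx-mlir project, which compiles an ONNX model into a native shared library. Below is a hedged sketch of calling such a compiled model from Python; the PyRuntime module and OMExecutionSession class follow onnx-mlir's documentation and may differ by release, and model.so is assumed to have been produced by the compiler.

```python
# Hedged sketch: invoke a model that the IBM Z deep learning compiler
# (onnx-mlir) has compiled into a native shared library, model.so.
# PyRuntime/OMExecutionSession names follow onnx-mlir docs and may
# vary by release.
import numpy as np
from PyRuntime import OMExecutionSession

session = OMExecutionSession("model.so")
print(session.input_signature())   # JSON description of expected inputs

features = np.random.rand(1, 4).astype(np.float32)
outputs = session.run([features])  # list of numpy arrays in, list out
print(outputs[0])
```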

Encrypted memory and improved trusted execution environment 

Telum chips also provide encryption for main memory. When containerized workloads run in trusted execution environments, Telum’s secure execution improvements ensure that z/OS data is protected from other users running on the same machine, whether in a managed service provider (MSP), cloud data center, or hybrid cloud environment.

Reliability and availability improvements 

Telum chips contain a new error correction and sparing mechanism that can recover data when an entire L2 cache array suffers a wipe-out error. The chip can correct the data and bring a spare array online as needed, driving availability beyond 99.99999% (seven nines).
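
For a sense of scale, a quick back-of-the-envelope calculation of what seven nines of availability allows per year:

```python
# Downtime implied by 99.99999% ("seven nines") availability over a
# 365.25-day year.
availability = 0.9999999
seconds_per_year = 365.25 * 24 * 3600
downtime_seconds = (1 - availability) * seconds_per_year
print(f"~{downtime_seconds:.1f} seconds of downtime per year")  # ~3.2 s
```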

z16 machines with the new Telum chip are also extensible: IBM can deliver new system upgrades, features, and functions through firmware updates. The hardware design also lends itself to enhancements in future generations of silicon.

>40% per-socket growth

IBM previously stated that, taken together, the Telum enhancements give z16 systems more than 40% per-socket performance growth. This headroom is both desirable and necessary given the growing demands of z/OS workloads, including the new on-chip AI processing enabled by Telum accelerators. Given this increase in workload demand and system capability, IBM Z shops may want to consider upgrading to z16 machines when adding AI processing to their workloads.