September 8, 2020 | IBM i

New IBM Power10 Systems to Deliver 3X Performance Boost


It’s official! IBM will begin shipping the first servers equipped with Power10 in the second half of 2021. When it arrives next year, the ground-breaking Power10 processor will provide a 3x performance gain and a 2.6x efficiency gain over Power9, which IBM says will bolster AI, hybrid cloud, and security workloads for years to come.

Power10 for IBM i, AIX, and Linux

As you will see, Power10 offers impressive hardware and performance improvements over existing Power9 hardware. But we will not know for sure how Power10 affects IBM i, AIX, and Linux processing until after the hardware is available next year. After hardware availability, IBM will release new OS versions or technology refreshes that take advantage of Power10 hardware capabilities, especially the improvements in processing speed, security, virtual machines, encryption, data transfer, and in-memory workloads. Expect more exciting OS-level capabilities built on Power10 next year.

Impressive architecture specs

The specs on Power10 are quite impressive, and the architecture is even more so, even if it is not what chip-watchers were expecting. Using a 7 nanometer (7nm) process, IBM was able to cram 18 billion transistors onto a 602 square-millimeter piece of silicon, up from 8 billion with Power9 (which was built on a 14nm process). The new chip features 128 MB of L3 cache and 32 MB of L2 cache, but where it really shines is in an assortment of advanced I/O and memory-sharing capabilities that will extend IBM’s lead in that area.
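To put the transistor figures in perspective, here is a minimal sketch of the arithmetic. It uses only the numbers quoted above; the function name is illustrative, and the result is an average over the whole die, not a statement about any particular region of the chip.

```python
def transistors_per_mm2(transistors: int, die_area_mm2: float) -> float:
    """Average transistor density over the whole die."""
    return transistors / die_area_mm2

# Figures quoted in the article.
POWER10_TRANSISTORS = 18_000_000_000
POWER9_TRANSISTORS = 8_000_000_000
POWER10_DIE_AREA_MM2 = 602

density = transistors_per_mm2(POWER10_TRANSISTORS, POWER10_DIE_AREA_MM2)
growth = POWER10_TRANSISTORS / POWER9_TRANSISTORS

print(f"Power10 average density: {density / 1e6:.1f} million transistors per mm^2")
print(f"Transistor count vs. Power9: {growth:.2f}x")
```

By this estimate, Power10 averages roughly 29.9 million transistors per square millimeter and carries 2.25 times as many transistors as Power9.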

 

Two versions of Power10 are being manufactured by IBM’s partner, Samsung Electronics. The high-performance SMT8 version supports eight simultaneous multithreading (SMT) threads per core and has 16 physical cores (although only 15 will be activated due to yield issues), while the SMT4 version supports four threads per core and has 32 physical cores (only 30 will be activated). The actual chip for SMT8 and SMT4 is identical; the difference in core counts matters primarily for software licensing.

 

IBM will package Power10 in single-chip modules, in systems with up to 16 sockets, each socket delivering 15 SMT8 cores running at 4 GHz. It is also building dual-chip modules, in systems with up to four sockets, each socket featuring up to 30 SMT8 cores running at 3.5 GHz. This allows IBM to deliver up to 240 threads per socket.
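The thread counts follow directly from the core and SMT figures above. The sketch below works through them; the function name and its parameters are illustrative, and the system-wide totals assume every socket is populated with the maximum number of active cores.

```python
def threads_per_socket(active_cores_per_chip: int,
                       threads_per_core: int,
                       chips_per_socket: int = 1) -> int:
    """Hardware threads exposed by one socket."""
    return active_cores_per_chip * threads_per_core * chips_per_socket

# Single-chip module: 15 active SMT8 cores.
single_chip_smt8 = threads_per_socket(15, 8)
# Dual-chip module: two chips, 30 active SMT8 cores in total.
dual_chip_smt8 = threads_per_socket(15, 8, chips_per_socket=2)
# The SMT4 variant: 30 active cores with four threads each.
single_chip_smt4 = threads_per_socket(30, 4)

print(single_chip_smt8)        # 120
print(dual_chip_smt8)          # 240
print(single_chip_smt4)        # 120

# Assumed fully populated systems.
print(16 * single_chip_smt8)   # 1920 threads across 16 single-chip sockets
print(4 * dual_chip_smt8)      # 960 threads across four dual-chip sockets
```

Note that the SMT8 and SMT4 variants expose the same number of threads per chip (120), which is consistent with the two being the same piece of silicon.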

Data transfer enhancements

IBM is providing an array of mechanisms for keeping the processors fed with data, and this is really where IBM’s new Power10 architecture shines. For starters, Power10 will feature two PCI-Express 5.0 controllers, each of which has 16 lanes running at 32 gigatransfers per second (GT/s). This works out to 128 GB per second across all 32 lanes. The PCI-Express 5.0 spec was finalized in 2019, and this is the first Power chip to support the new bus, which offers twice the bandwidth of PCI-Express 4.0.
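The 128 GB per second figure can be reproduced with a quick calculation. This sketch assumes each lane carries one bit per transfer and counts a single direction only; the function name is illustrative. The second figure applies the 128b/130b line encoding used by PCI-Express since version 3.0, which trims the usable rate slightly.

```python
def raw_gb_per_s(controllers: int, lanes_per_controller: int, gt_per_s: float) -> float:
    """Raw one-direction bandwidth in GB/s, assuming 1 bit per transfer per lane."""
    total_gbit_per_s = controllers * lanes_per_controller * gt_per_s
    return total_gbit_per_s / 8  # 8 bits per byte

pcie5_raw = raw_gb_per_s(2, 16, 32)   # two x16 controllers at 32 GT/s
pcie4_raw = raw_gb_per_s(2, 16, 16)   # same lane count at PCI-Express 4.0's 16 GT/s

# 128b/130b encoding: 128 payload bits for every 130 bits on the wire.
pcie5_after_encoding = pcie5_raw * 128 / 130

print(f"PCIe 5.0 raw:            {pcie5_raw:.0f} GB/s")
print(f"PCIe 5.0 after encoding: {pcie5_after_encoding:.2f} GB/s")
print(f"PCIe 5.0 vs 4.0:         {pcie5_raw / pcie4_raw:.0f}x")
```

The raw result matches the 128 GB per second quoted above, and the same lanes at the PCI-Express 4.0 rate deliver half as much.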

 

PCI-Express 5.0 is fast, but Power10 gets much faster. On two sides of the Power10 chip are a total of 16 Open Memory Interface (OMI) serializer/deserializer (SerDes) blocks, each of which is eight lanes wide (i.e., x8) and runs at 32 GT/s, for 1 TB per second of aggregate bandwidth. The OMI SerDes, which debuted with the Power9′ kicker, will talk to DDR4 memory (and DDR5 in the future).

 

Each Power10 chip contains four PowerAXON interfaces that can either connect to OpenCAPI devices (GPUs, FPGAs, ASICs, and other hardware accelerators) or cluster processors in a scale-up NUMA configuration. There are 16 PowerAXON SerDes blocks, each in an x8 configuration running at 32 GT/s, for 1 TB per second of aggregate bandwidth, just like the OMI blocks.
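The 1 TB per second figure for the OMI and PowerAXON blocks can be checked the same way. This is a rough sketch, assuming one bit per transfer per lane and ignoring any encoding overhead. The assumption that the quoted 1 TB per second counts traffic in both directions is ours, not something the article states; the function name is illustrative.

```python
def serdes_gb_per_s(blocks: int, lanes_per_block: int, gt_per_s: float,
                    directions: int = 1) -> float:
    """Aggregate SerDes bandwidth in GB/s, assuming 1 bit per transfer per lane."""
    total_gbit_per_s = blocks * lanes_per_block * gt_per_s * directions
    return total_gbit_per_s / 8  # 8 bits per byte

# 16 blocks, each x8, at 32 GT/s (the same for OMI and PowerAXON).
one_direction = serdes_gb_per_s(16, 8, 32)
both_directions = serdes_gb_per_s(16, 8, 32, directions=2)  # assumption: counted both ways

print(f"One direction:   {one_direction:.0f} GB/s")
print(f"Both directions: {both_directions:.0f} GB/s")
```

Under these assumptions, each set of 16 blocks moves 512 GB per second in one direction and 1,024 GB per second, or roughly 1 TB per second, when both directions are counted.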

An order-of-magnitude increase for memory

PowerAXON powers the new “Memory Inception” architecture that IBM is introducing with this chip. This new memory architecture, which some are calling the “Holy Grail” of microelectronics, will enable customers to share petabytes of memory among multiple Power10 systems, creating a single pool of clustered memory that scales to 2 PB, far outpacing the 64 TB of addressable memory of the biggest x86 systems. IBM says this new approach will deliver data at a fraction of the latency of other memory-sharing techniques, such as RDMA.
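For a sense of scale, the gap between the two memory limits can be worked out directly. This sketch assumes binary units (1 PB = 1,024 TB); with decimal units the ratio comes out slightly lower. The names are illustrative.

```python
TB_PER_PB = 1024  # assumption: binary units

def memory_ratio(pool_pb: float, baseline_tb: float) -> float:
    """How many times larger the clustered pool is than the baseline."""
    return pool_pb * TB_PER_PB / baseline_tb

ratio = memory_ratio(pool_pb=2, baseline_tb=64)
print(f"2 PB is {ratio:.0f}x the 64 TB limit")
```

By this measure, the 2 PB pool is 32 times the 64 TB addressable by the largest x86 systems cited above.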

 

This Memory Inception architecture will position Power10 as the preeminent platform for the toughest in-memory workloads, according to IBM, which highlighted German ERP vendor SAP and analytics software developer SAS Institute as potential beneficiaries of the new big breakthrough (or rather, their customers will be).

Machine learning workloads to be as common as commercial workloads

IBM is also including new dedicated accelerators on the Power10 die. This includes an embedded matrix math accelerator that should deliver 10x to 20x speedups for machine learning inference processing compared to Power9. Machine learning workloads, and in particular those based on the latest deep learning models, are also expected to be some of the most common workloads for the new processor, in addition to traditional transaction processing in Power Systems servers.

40% faster encryption and four times the number of encryption engines

Security is another focus with Power10. IBM says the expanded memory of Power10 processors will allow each chip to support four times as many AES encryption engines as Power9 and to encrypt data 40% faster.

New security and isolation for containers

IBM developed the new chip to maximize the security of containers, and to provide isolation among virtual machines running on the same chip. IBM developed the Power10’s new hardware-enforced container protection and isolation capabilities in concert with the Power firmware, which will be optimized to protect containers running within Red Hat OpenShift, thereby bolstering Power10’s usefulness in hybrid environments.

 

Power10 has been over five years in the making, and is the result of dozens of patents and hundreds of billions of dollars invested by IBM and Samsung. With its ground-breaking I/O and memory-sharing capabilities, Power10 almost certainly will reset expectations for processors for years to come.