IBM’s Next-Gen Z Processor Detailed: Telum Chip Based on 7nm Process, 22.5 Billion Transistors, 8 Cores Running Beyond 5 GHz Clocks

Written by Jeff Lampkin

IBM detailed its next-generation Telum chip, part of its Z processor lineup, at Hot Chips 33. The Telum chip features a brand-new core architecture and is designed with AI acceleration in mind.

IBM’s Next-Gen Z Processor: 7nm Telum Chip With 22.5 Billion Transistors, 8 Cores, 5 GHz+ Clocks & 6+ TFLOPs AI Acceleration

According to IBM, the newly optimized Z core, together with its brand-new cache and multi-chip fabric hierarchy, enables over 40% per-socket performance growth. The Telum chip comprises a total of 8 cores, each with its own dedicated L2 cache. The chip supports SMT2, which gives 16 threads per chip, while a maximum 4-drawer system scales to 32 chips, or 256 cores and 512 threads.


Clock speeds are said to exceed 5 GHz, and the Telum Z chip comes with redesigned branch prediction featuring an integrated first- and second-level BTB (branch target buffer), dynamic BTB entry reconfiguration, and more than 270K branch target table entries. The private L2 cache is 32 MB per core and has a 19-cycle load-use latency (~3.8 ns including TLB access).
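As a quick sanity check, converting a latency quoted in clock cycles to nanoseconds only requires dividing by the clock frequency in GHz. The sketch below assumes a flat 5.0 GHz clock (the article only says "more than 5 GHz"), and the helper name `cycles_to_ns` is hypothetical.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds.

    At f GHz one cycle lasts 1/f ns, so latency_ns = cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# Assumption: a flat 5.0 GHz clock for the 19-cycle L2 load-use latency.
print(f"{cycles_to_ns(19, 5.0):.1f} ns")  # 3.8 ns
```

The result lines up with the ~3.8 ns figure IBM quotes, which suggests the cycle count and the nanosecond number describe the same latency at roughly 5 GHz.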

Moving on to the shared L3 and L4 caches, the IBM Z Telum chip packs a virtual 256 MB on-chip L3 cache and a virtual 2 GB L4 cache spread across up to 8 chips. The L2 caches are linked by a 320 GB/s bidirectional ring interconnect, while the L3 cache is distributed through cooperation among the L2 caches and has an average latency of 12 ns. Together, the virtual L3 and L4 caches provide 1.5x more cache per core than the previous-generation z15.
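The capacity figures follow directly from pooling: eight 32 MB private L2 caches add up to the 256 MB virtual L3, and eight chips' worth of virtual L3 add up to the 2 GB virtual L4. The sketch below shows that arithmetic plus a deliberately simplified lookup order. The function names and the `lookup_level` model are hypothetical; it ignores coherence, replacement, and latency, and is not IBM's actual protocol.

```python
# Capacity figures quoted in the article.
L2_PER_CORE_MB = 32        # private L2 per core
CORES_PER_CHIP = 8
CHIPS_IN_L4_DOMAIN = 8     # "virtual 2 GB L4 across up to 8 chips"


def virtual_l3_mb(cores: int = CORES_PER_CHIP, l2_mb: int = L2_PER_CORE_MB) -> int:
    """Virtual L3 = pooled capacity of all private L2s on one chip."""
    return cores * l2_mb


def virtual_l4_mb(chips: int = CHIPS_IN_L4_DOMAIN) -> int:
    """Virtual L4 = pooled virtual L3 of every chip in the domain."""
    return chips * virtual_l3_mb()


def lookup_level(line: int, chip: int, core: int, l2s: dict) -> str:
    """Toy lookup order for a cache line (an assumption, not IBM's protocol).

    l2s maps (chip, core) -> set of line addresses held in that core's L2.
    """
    if line in l2s.get((chip, core), set()):
        return "L2"
    # Another L2 on the same chip acts as part of the virtual L3.
    if any(line in held for (c, _), held in l2s.items() if c == chip):
        return "virtual L3"
    # An L2 on another chip acts as part of the virtual L4.
    if any(line in held for held in l2s.values()):
        return "virtual L4"
    return "memory"


print(virtual_l3_mb())          # 256
print(virtual_l4_mb() / 1024)   # 2.0 (GB)

l2s = {(0, 0): {0x100}, (0, 1): {0x200}, (1, 0): {0x300}}
for addr in (0x100, 0x200, 0x300, 0x400):
    print(hex(addr), lookup_level(addr, chip=0, core=0, l2s=l2s))
```

For core 0 on chip 0, the four addresses resolve to L2, virtual L3, virtual L4, and memory respectively.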

AI acceleration performance is rated at over 6 TFLOPs per chip and over 200 TFLOPs in a 4-drawer system packing 32 Telum chips. The internal matrix array features 128 tiles with 8-way FP16 SIMD and high-density multiply-and-accumulate FPUs, while the activation array consists of 32 tiles with 8-way FP16/FP32 SIMD. A dual-chip configuration delivers 116,000 inferences per second at 1.1 ms latency, while a 32-chip configuration delivers 3,600,000 inferences per second at 1.2 ms latency.
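To illustrate what an "8-way FP16 SIMD multiply-and-accumulate" step does, here is a toy software model of one tile-wide operation. It is a sketch under stated assumptions: the lane count comes from the article, but the rounding behavior (inputs, product, and sum each rounded to FP16, i.e. unfused) is an assumption, and the function names are hypothetical. Real hardware may fuse operations or accumulate at higher precision.

```python
import struct

LANES = 8  # 8-way SIMD, as described for each matrix-array tile


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def simd_mac_fp16(acc: list[float], a: list[float], b: list[float]) -> list[float]:
    """One 8-lane multiply-accumulate step: acc[i] + a[i] * b[i] per lane.

    Inputs, the product and the sum are each rounded to FP16 (an unfused
    model; real hardware may round or accumulate differently).
    """
    if not (len(acc) == len(a) == len(b) == LANES):
        raise ValueError(f"expected {LANES} lanes")
    out = []
    for c, x, y in zip(acc, a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        out.append(to_fp16(to_fp16(c) + product))
    return out


print(simd_mac_fp16([0.0] * 8, [1.5] * 8, [2.0] * 8))     # eight lanes of 3.0
print(simd_mac_fp16([2048.0] * 8, [1.0] * 8, [1.0] * 8))  # eight lanes of 2048.0
```

The second call shows a limitation of FP16: with an 11-bit significand, adding 1.0 to 2048.0 rounds back to 2048.0, so long FP16 accumulations can silently drop small contributions.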

IBM Z Telum chips can be scaled up for even more performance, since there are both single-chip and dual-chip module designs. The dual-chip configuration uses a chiplet-style design with 2 Telum chips and offers 16 cores, 32 threads, and 512 MB of cache.
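The module totals are straightforward multiples of the per-chip figures (8 cores, SMT2, 32 MB of private L2 per core). The small model below derives them; the class names `TelumChip` and `TelumModule` are hypothetical and only encode the numbers given in the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TelumChip:
    cores: int = 8
    threads_per_core: int = 2   # SMT2
    l2_per_core_mb: int = 32    # private L2, pooled as the virtual L3

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def cache_mb(self) -> int:
        return self.cores * self.l2_per_core_mb


@dataclass(frozen=True)
class TelumModule:
    chips: int                  # 1 = single-chip module, 2 = dual-chip module
    chip: TelumChip = field(default_factory=TelumChip)

    @property
    def cores(self) -> int:
        return self.chips * self.chip.cores

    @property
    def threads(self) -> int:
        return self.chips * self.chip.threads

    @property
    def cache_mb(self) -> int:
        return self.chips * self.chip.cache_mb


for n in (1, 2):
    m = TelumModule(chips=n)
    print(f"{n}-chip module: {m.cores} cores, {m.threads} threads, {m.cache_mb} MB cache")
```

The dual-chip line reproduces the 16 cores, 32 threads, and 512 MB quoted above.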


The AI accelerator on the IBM Z Telum chip offers:

  • Very low and constant inference latency
  • Compute capacity for use at scale
  • Support for a variety of AI models, ranging from traditional ML to RNNs and CNNs
  • Security: enterprise-grade memory virtualization and protection
  • Extensibility with future firmware and hardware updates

The IBM Z Telum chip will be fabricated on Samsung’s 7nm process node and will feature a die size of 530 mm². It will house 22.5 billion transistors and is aimed at enterprise and embedded workloads.
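The two headline figures give a rough average transistor density. Note that this is an average over the whole die, including caches, I/O, and the AI accelerator, so it is not directly comparable to the peak logic-density figures foundries quote for a process node. The helper name is hypothetical.

```python
def avg_density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return transistors / die_area_mm2 / 1e6


print(f"{avg_density_mtr_per_mm2(22.5e9, 530):.1f} MTr/mm^2")  # 42.5 MTr/mm^2
```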
