Tag Archive: HPC

By Adele Hars

World’s New Fastest Supercomputer? That’s FinFETs-on-SOI in Action.

The CPUs in Summit, the world’s new fastest supercomputer, are built on 14nm FinFET-on-SOI technology. Yes, those IBM POWER9 CPUs are fabbed by GlobalFoundries (you’ll also find the same technology in the z14, the most recent in IBM’s z-series of servers – a series that’s been on various iterations of SOI since its launch in 2003, btw). Summit is at the U.S. Department of Energy’s Oak Ridge National Laboratory (ORNL) in Tennessee, USA. It is now the top US supercomputer, and it’s for science.

The IBM-built Summit currently holds the top spot on the Top500 list, and IBM bills it as the world’s smartest and most powerful supercomputer. “It is capable of performing 200 quadrillion calculations per second — or 200 petaflops — making it the fastest in the world,” says IBM’s Dr. John E. Kelly, III, SVP, Cognitive Solutions and IBM Research. “But this system has never been just about speed. Summit is also optimized for AI in a data-intense world. We designed a whole new heterogeneous architecture that integrates the robust data analysis of powerful IBM Power CPUs with the deep learning capabilities of GPUs. The result is unparalleled performance on critical new applications.”

And if that’s not impressive enough for you, it’s also #5 on the Green500 list of the world’s most energy-efficient computers, with a power efficiency of 13.889 GFlops/watt.
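To make the 13.889 GFlops/watt figure more concrete, the short sketch below converts it into the power needed per unit of sustained performance. It is plain arithmetic on the efficiency quoted above; the helper name `watts_for` is ours. It deliberately does not combine this figure with the 200-petaflop number in IBM's quote, since Green500 efficiency is measured against sustained (Linpack) performance rather than peak.

```python
# Summit's Green500 power efficiency as quoted above, in GFlops per watt.
GFLOPS_PER_WATT = 13.889


def watts_for(sustained_flops: float, gflops_per_watt: float = GFLOPS_PER_WATT) -> float:
    """Return the power in watts needed to sustain `sustained_flops` FLOP/s."""
    # GFlops/W * 1e9 gives FLOP/s per watt; divide the target rate by it.
    return sustained_flops / (gflops_per_watt * 1e9)


one_petaflop = 1e15  # 1 PFlop/s of sustained performance
print(f"~{watts_for(one_petaflop) / 1e3:.1f} kW per sustained PFlop/s")
```

At that efficiency, each sustained petaflop/s costs roughly 72 kW.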

Summit supercomputer nodes: The IBM-built Summit supercomputer is the world’s smartest and most powerful AI machine. It consists of 4,600 individual nodes. Each node contains two 22-core 3.07GHz IBM POWER9 CPUs, built on GlobalFoundries’ 14nm HP FinFET-on-SOI technology, as well as six NVIDIA Tesla GPUs. (Photo Credit: ORNL).

As GF noted when it announced the technology in the fall of 2017, its 14HP is the industry’s only technology to integrate a FinFET transistor architecture on SOI. Featuring a 17-layer metal stack and more than eight billion transistors per chip, the technology leverages embedded DRAM and other innovative features to deliver higher performance, reduced energy, and better area scaling than previous generations, addressing a wide range of deep computing workloads.

These technologies have long, deep histories (and were developed in close collaboration with SOI wafer leader Soitec). Here at ASN we have a fabulous archive of pieces contributed by IBM explaining the genesis of the technology – they’re great reads and still entirely pertinent.

The IBM POWER9 processor delivers unprecedented speeds for deep learning and AI workloads. IBM engineer Stefanie Chiras tests the IBM Power System server in Austin, Texas. (Photo Credit: Jack Plunkett/Feature Photo Service for IBM).

As ORNL noted in its press release, the first projects will apply machine learning and AI to astrophysics, materials science, cancer research and systems biology.

BTW, Summit also has a slightly smaller sister machine called Sierra, going in at Lawrence Livermore National Laboratory (part of the Department of Energy’s National Nuclear Security Administration). With 4,320 nodes (each also containing two 22-core 3.07GHz IBM POWER9 CPUs built on GlobalFoundries’ 14nm HP FinFET-on-SOI technology, but only four NVIDIA Tesla GPUs), Sierra has claimed the #3 spot on the June 2018 Top500 list of the world’s most powerful supercomputers.
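The per-node figures in this post are enough to estimate each machine's totals. The sketch below is just multiplication on those numbers (4,600 vs. 4,320 nodes, two 22-core POWER9 CPUs per node, six vs. four GPUs per node); the `System` class is a hypothetical helper for illustration, and because it uses the nominal node counts quoted here, the actual installed totals may differ slightly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Hypothetical container for the per-node figures quoted in the post."""
    name: str
    nodes: int
    cpus_per_node: int
    cores_per_cpu: int
    gpus_per_node: int

    @property
    def cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def cpu_cores(self) -> int:
        return self.cpus * self.cores_per_cpu

    @property
    def gpus(self) -> int:
        return self.nodes * self.gpus_per_node


SUMMIT = System("Summit", nodes=4_600, cpus_per_node=2, cores_per_cpu=22, gpus_per_node=6)
SIERRA = System("Sierra", nodes=4_320, cpus_per_node=2, cores_per_cpu=22, gpus_per_node=4)

for s in (SUMMIT, SIERRA):
    print(f"{s.name}: {s.cpus:,} POWER9 CPUs, {s.cpu_cores:,} CPU cores, {s.gpus:,} GPUs")
```

This works out to 9,200 CPUs, 202,400 CPU cores and 27,600 GPUs for Summit, versus 8,640 CPUs, 190,080 CPU cores and 17,280 GPUs for Sierra: the CPU side is similar, and most of the difference lies in the GPU count.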

And the POWER9 is now finding its way into major data centers, like Google’s. There have been some good pieces in the press about it, including in Forbes and The Motley Fool. So yes, clearly there are exciting markets for FinFETs on SOI!

By Gianni PRATA

Kalray considers FD-SOI for many-core processors (Electronics360)

Kalray is considering an FD-SOI version of its family of programmable multicore processors, reports Peter Clarke of Electronics360. Clarke says that Kalray’s director of solutions and software services told him that while the company is currently on 28nm bulk, it has had customer interest in an FD-SOI version of a planned 64-core chip for telecom, automotive and medical applications. The company says that its gigaflops-per-watt ratio is already among the world’s best, and an FD-SOI version would make it even better.

By Gianni PRATA

IBM’s SOI-based BlueGene/Q takes the top 5 spots on the latest “Green500” list of energy-efficient supercomputers.

By Gianni PRATA

The new 32nm SOI Bulldozer-based AMD Opteron™ Processors will power the National Science Foundation’s Blue Waters project

The new 32nm SOI Bulldozer-based AMD Opteron™ Processors will power the National Science Foundation’s Blue Waters project. Per the TOP500 Supercomputers list, more than two million AMD Opteron cores power many of the world’s fastest supercomputers across 14 countries.

By Gianni PRATA

AMD Bulldozer Architecture Leverages 32nm SOI

Designed with performance, efficiency, and power optimization as top priorities, AMD’s innovative Bulldozer architecture is built on 32nm SOI.

As of fall 2011, AMD is shipping both client and server CPUs based on the new Bulldozer architecture. The first APUs (CPU + GPU) incorporating Bulldozer modules will start shipping in 2012.

All of AMD’s new Bulldozer-based processors are built on 32nm SOI technology fabbed by GlobalFoundries.

Bulldozer is the code name for AMD’s next-generation CPU core, which targets the two key “heavy lifting” markets:

  • servers, and
  • the high-performance end of the client platform.

As indicated on the AMD roadmap, all of the company’s CPUs for the “server” market – chips in the Opteron family – are based on the 32nm SOI Bulldozer architecture.

The roadmap for “client” products also shows key chip families for desktop processors and high-performance notebooks based on the 32nm SOI Bulldozer architecture.

Opteron/Interlagos

Interlagos is the codename for AMD’s 12- or 16-core 32nm SOI server processors based on the new “Bulldozer” processor core. It carries the AMD Opteron™ 6200 Series processor brand and is supported by the AMD Opteron™ 6000 Series (“Maranello”) platform, which also supports the earlier Opteron™ 6100 Series.

Interlagos includes the world’s first 16-core x86 processors. The first Interlagos shipments began in August 2011 to large custom supercomputer installations: 25,000 to Oak Ridge labs and 38,000 to Los Alamos, for example.

FX

The latest AMD FX series marks the first retail availability of Bulldozer-based processors. Available in 8-, 6- and 4-core configurations, these CPUs target extreme multi-display gaming, mega-tasking and HD content creation for PC and digital enthusiasts.

The new FX includes the first-ever eight-core desktop processor, which took the Guinness World Record for the “Highest Frequency of a Computer Processor,” hitting a top speed of 8.429 GHz.

Up next: APUs

AMD says its new Fusion APUs usher in the era of “Personal Supercomputing”. The A-Series APUs (codenamed Llano) that started shipping in mid-2011 are built on 32nm SOI, but their CPU cores use the previous generation of AMD’s x86 CPU architecture, so they are not yet Bulldozer.

However, the next generation in the A-Series APUs, codenamed Trinity and scheduled for release in 2012, will also be based on 32nm SOI with next-generation Bulldozer CPU cores.

Power-optimized design

The Bulldozer architecture is based on “modules” of two cores each. AMD explains that this lets a module execute two simultaneous threads more efficiently than if both threads had to share a single integer core.

Bulldozer module (Courtesy: AMD)

Bulldozer specs⁴:
  • 2 cores per module
  • 213 million transistors per module
  • 11 metal layers, 32nm SOI, HKMG (high-k metal gate)
  • 0.8–1.3V operation
  • Area per module: 30.9mm² (2-core CPU module plus 2MB L2 cache)

For each two-core module, there is a shared 2MB L2 cache. The shared L3 cache varies from 8MB to 16MB, depending on the processor.
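Because cores come in pairs and each pair shares 2MB of L2, a processor’s module count and total L2 follow directly from its core count. The sketch below applies that rule to the core counts mentioned in this post (the FX’s 4, 6 and 8 cores and Interlagos’s 12 and 16) and also derives the transistor density implied by the module specs above. It assumes every core belongs to a fully enabled module and ignores the separate L3, which varies by processor; the function names are hypothetical.

```python
CORES_PER_MODULE = 2
L2_MB_PER_MODULE = 2           # shared L2 per two-core module
TRANSISTORS_PER_MODULE = 213e6
MODULE_AREA_MM2 = 30.9         # 2-core module including its 2MB L2


def modules_for(cores: int) -> int:
    """Number of two-core modules needed for `cores` (assumes full modules)."""
    if cores <= 0 or cores % CORES_PER_MODULE:
        raise ValueError("Bulldozer cores come in two-core modules")
    return cores // CORES_PER_MODULE


def total_l2_mb(cores: int) -> int:
    """Total L2 across all modules; the shared L3 is not included."""
    return modules_for(cores) * L2_MB_PER_MODULE


for cores in (4, 6, 8, 12, 16):
    print(f"{cores:>2} cores -> {modules_for(cores)} modules, {total_l2_mb(cores)} MB L2")

density = TRANSISTORS_PER_MODULE / MODULE_AREA_MM2 / 1e6
print(f"~{density:.1f} million transistors per mm^2 (module + L2)")
```

For example, a 16-core Interlagos part works out to eight modules and 16MB of L2, and the module specs imply roughly 6.9 million transistors per mm².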

The Bulldozer design is new from the ground up. It required co-development of power efficiency, timing, and functionality¹. Using module-level VSS (rather than VDD) power gating, first used in the 32nm Llano CPU, the team reduced a module’s leakage power by 95% when both of its cores are idle. SOI enables this to be done without extra processing steps².
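To see what a 95% per-module leakage reduction means at chip level, the sketch below models idle leakage for a four-module (eight-core) part in which some modules are fully idle and gated. It is a simplified illustration, not AMD’s model: leakage is normalized so that one ungated module contributes 1.0, all modules are assumed to leak equally, and the shared L3, northbridge and I/O are ignored. The function name `idle_leakage` is hypothetical.

```python
GATING_REDUCTION = 0.95  # leakage cut per module when both of its cores are idle


def idle_leakage(n_modules: int, n_gated: int, leak_per_module: float = 1.0) -> float:
    """Relative chip leakage when `n_gated` fully idle modules are power-gated.

    `leak_per_module` is a normalized, hypothetical value (1.0 = one ungated module).
    """
    if not 0 <= n_gated <= n_modules:
        raise ValueError("n_gated must be between 0 and n_modules")
    ungated = n_modules - n_gated
    gated_leak = leak_per_module * (1 - GATING_REDUCTION)
    return ungated * leak_per_module + n_gated * gated_leak


chip_modules = 4  # e.g. an eight-core part
for gated in range(chip_modules + 1):
    leak = idle_leakage(chip_modules, gated)
    print(f"{gated} of {chip_modules} modules gated: relative leakage {leak:.2f}")
```

Under these assumptions, gating three of four modules drops leakage from 4.00 to 1.15 units, and gating all four leaves 0.20, i.e. the 95% reduction applied chip-wide.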

The L1 caches use an 8T storage cell. According to the design team, the change from a 6T cell at 45nm to an 8T cell at 32nm improved low-voltage margin and read timing, and reduced power³.

This game-changing architecture on SOI promises an exciting new era of high performance and low power in systems ranging from sleek but powerful notebooks to the fastest supercomputers on the planet.

1. T. Fischer et al., “Design Solutions for the Bulldozer 32nm SOI 2-Core Processor Module in an 8-Core CPU,” IEEE ISSCC 2011, p. 78.
2. R. Jotwani et al., “An x86-64 Core Implemented in 32nm SOI CMOS,” IEEE ISSCC 2010, p. 106.
3. Fischer et al., ISSCC 2011 (see ref. 1).
4. Fischer et al., ISSCC 2011 (see ref. 1).

———-

Thank you to AMD for help on this article.