The CPUs in Summit, the world’s new fastest supercomputer, are built on 14nm FinFET-on-SOI technology. Yes, those IBM Power9 CPUs are fabbed by GlobalFoundries (you’ll also find this technology in the z14, the most recent in IBM’s z-series of servers – a series that’s been on various iterations of SOI since its launch in 2003, btw). Summit is at the U.S. Department of Energy’s Oak Ridge National Laboratory (ORNL) in Tennessee, USA. It is now the top US supercomputer, and it’s for science.
The IBM-built Summit currently holds the #1 spot on the Top500 list as the world’s most powerful supercomputer. “It is capable of performing 200 quadrillion calculations per second — or 200 petaflops — making it the fastest in the world,” says IBM’s Dr. John E. Kelly, III, SVP, Cognitive Solutions and IBM Research. “But this system has never been just about speed. Summit is also optimized for AI in a data-intense world. We designed a whole new heterogeneous architecture that integrates the robust data analysis of powerful IBM Power CPUs with the deep learning capabilities of GPUs. The result is unparalleled performance on critical new applications.”
And if that’s not impressive enough for you, it’s also #5 on the Green500 list of the world’s most energy-efficient supercomputers, with a power efficiency of 13.889 GFlops/watt.
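Green500 efficiency is computed from a machine’s sustained HPL (Linpack) result divided by its measured power, not from its peak rating, so it cannot simply be divided into the 200-petaflop peak figure above. As a rough illustration of what 13.889 GFlops/watt means, this sketch computes the power needed to sustain one petaflop at that efficiency; `watts_per_sustained_petaflop` is a hypothetical helper, not part of any Top500 or Green500 tool.

```python
def watts_per_sustained_petaflop(gflops_per_watt: float) -> float:
    """Power (watts) needed to sustain 1 PFLOP/s at the given efficiency.

    1 PFLOP/s = 1e6 GFLOP/s, so watts = 1e6 / (GFLOP/s per watt).
    """
    if gflops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1e6 / gflops_per_watt


summit_efficiency = 13.889  # GFlops/watt, the Green500 figure quoted above
kw = watts_per_sustained_petaflop(summit_efficiency) / 1e3
print(f"~{kw:.1f} kW per sustained petaflop")  # ~72.0 kW
```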
As GF noted when it announced the technology in the fall of 2017 (read the GF press release here), its 14HP is the industry’s only technology to integrate a FinFET transistor architecture on SOI. Featuring a 17-layer metal stack and supporting more than eight billion transistors per chip, the technology leverages embedded DRAM and other innovative features to deliver higher performance, reduced energy, and better area scaling than previous generations, addressing a wide range of deep computing workloads.
These technologies have long, deep histories (and were developed in close collaboration with SOI wafer leader Soitec). Here at ASN we have a fabulous archive of pieces contributed by IBM explaining the genesis of the technology – they’re great reads and still entirely pertinent:
As ORNL noted in its press release (you can read it here), the first projects will apply machine learning and AI to astrophysics, materials science, cancer research and systems biology.
BTW, Summit also has a slightly smaller sister machine called Sierra, being installed at the Lawrence Livermore National Laboratory (part of the Department of Energy’s National Nuclear Security Administration). With 4,320 nodes (each also containing two 22-core 3.07GHz IBM POWER9 CPUs built on GlobalFoundries’ 14nm HP FinFET-on-SOI technology, but just four NVIDIA Tesla GPUs), Sierra claimed the #3 spot on the June 2018 Top500 list of the world’s most powerful supercomputers.
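Multiplying out the node description gives machine-wide totals. This is simple arithmetic on the figures above and doesn’t account for any service or login nodes the system may also have.

```python
# Sierra node configuration as quoted above.
NODES = 4_320
CPUS_PER_NODE = 2    # IBM POWER9 sockets per node
CORES_PER_CPU = 22
GPUS_PER_NODE = 4    # NVIDIA Tesla GPUs per node

cpu_sockets = NODES * CPUS_PER_NODE
cpu_cores = cpu_sockets * CORES_PER_CPU
gpus = NODES * GPUS_PER_NODE

print(f"{cpu_sockets:,} POWER9 CPUs, {cpu_cores:,} CPU cores, {gpus:,} GPUs")
# 8,640 POWER9 CPUs, 190,080 CPU cores, 17,280 GPUs
```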
And the Power9 is now finding its way into major data centers, like Google’s (read about that here). There have been some good pieces in the press about it, including in Forbes and The Motley Fool. So yes, clearly there are exciting markets for FinFETs on SOI!
By: Tamer Ragheb, Digital Design Methodology Technical Manager at GlobalFoundries and Josefina Hobbs, Senior Manager of Strategic Alliances, Synopsys
It’s clear that getting an optimal balance of power and performance at the right cost is foremost in the minds of designers today. Designers who want either high performance or ultra low-power, or ideally both, have a choice to make when it comes to migrating to next generation nodes. For applications that push the envelope in performance, FinFET would be the optimal solution. For applications that require ultra low-power and more RF integration, FD-SOI is the right solution. The two technologies have different value propositions that need to be considered while designing for applications ranging from high-performance computing and servers to high-end mobile and Internet of Things (IoT).
GlobalFoundries 22FDX is the industry’s very first 22nm FD-SOI platform. The 22FDX technology is specifically designed to meet the ultra low-power requirements of the next generation of connected devices. The big advantage of this platform is its ability to provide software control at the transistor level through flexible body-biasing (Fig. 1). Being able to make real-time trade-offs between power and performance via software-controlled body-biasing of the transistor creates new options for the designer. For example, imagine designing a processor for a smartwatch that could match its power-performance trade-off to your typical use and adjust its performance based on how you’re using it that day.
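To make the smartwatch idea concrete, here is a minimal sketch of the kind of policy firmware might apply, assuming a hypothetical interface where software picks a bias mode from recent CPU utilization. The mode names follow the usual convention (forward body bias lowers Vt for speed, reverse body bias raises Vt to cut leakage); the thresholds and the `choose_bias` function are illustrative assumptions, not part of any GlobalFoundries or 22FDX API.

```python
from enum import Enum


class BiasMode(Enum):
    # Hypothetical modes; real bias voltages depend on the process and the design.
    RBB = "reverse body bias"  # raises Vt: lower leakage, lower speed
    ZERO = "no body bias"      # nominal Vt
    FBB = "forward body bias"  # lowers Vt: higher speed, higher leakage


def choose_bias(utilization: float, low: float = 0.2, high: float = 0.8) -> BiasMode:
    """Map recent CPU utilization (0.0 to 1.0) to a body-bias mode.

    Illustrative thresholds only: busy -> FBB for speed, idle -> RBB for leakage.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization >= high:
        return BiasMode.FBB
    if utilization <= low:
        return BiasMode.RBB
    return BiasMode.ZERO


# A made-up day of smartwatch activity: sleep, notifications, a workout, navigation.
for load in (0.02, 0.35, 0.9, 0.6):
    print(f"{load:.2f} -> {choose_bias(load).value}")
```

A real controller would also want some hysteresis so it doesn’t toggle the bias on every sample, and would have to allow time for the well voltages to settle after each change.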
The full impact of the body bias capability of 22FDX becomes clear when compared with incumbent high-performance process technologies (Fig. 2). Compared with a 28nm high-k metal gate (28HKMG) technology, 22FDX can provide up to 50% less power at the same frequency, or up to 40% higher performance at the same total power. In addition, 22FDX can be further optimized with forward body bias, shown on the blue curve, to further reduce power or boost speed in a turbo operation mode.
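One way to read the two “up to” figures is as energy per clock cycle (power divided by frequency), normalized to 28HKMG. This sketch assumes the same work gets done per cycle on both processes and treats the two figures as separate operating points; it does not model the additional forward-body-bias gains.

```python
def relative_energy_per_cycle(power_ratio: float, freq_ratio: float) -> float:
    """Energy per cycle relative to a baseline: (P / P0) / (f / f0)."""
    if freq_ratio <= 0:
        raise ValueError("frequency ratio must be positive")
    return power_ratio / freq_ratio


# Operating point 1: same frequency, up to 50% less power.
iso_frequency = relative_energy_per_cycle(power_ratio=0.5, freq_ratio=1.0)
# Operating point 2: same total power, up to 40% higher frequency.
iso_power = relative_energy_per_cycle(power_ratio=1.0, freq_ratio=1.4)

print(f"iso-frequency: {iso_frequency:.2f}x, iso-power: {iso_power:.2f}x the 28HKMG energy per cycle")
# iso-frequency: 0.50x, iso-power: 0.71x the 28HKMG energy per cycle
```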
In addition to the body bias, 22FDX offers capabilities for design flexibility and intelligent control that are not available in other technologies. These include:
Manufacturing success is highly sensitive to specific physical design features, with advanced nodes requiring more complex design rules and more attention to manufacturability issues on the part of designers. However, there are essentially no additional manufacturing requirements to design in 22FDX beyond what is required for 28nm designs.
There are four application optimized extensions available with 22FDX (Fig. 3). These are:
GlobalFoundries reference flow for 22FDX has been optimized to support forward and reverse body bias (FBB/RBB), which provides the design flexibility to optimize the performance/power trade-offs. The reference flow supports implant-aware and continuous diffusion-aware placement, tap insertion and body bias network connectivity according to high voltage rules, double-patterning aware parasitic extraction (PEX), and design for manufacturing (DFM). This provides designers with the flexibility to manage power, performance and leakage targets for the next-generation chips used in mainstream mobile, IoT and networking applications.
GlobalFoundries has been collaborating with Synopsys to enable and qualify their tools for the 22FDX Reference Flow. The recent qualification of Synopsys’ Galaxy™ Design Platform for the current version of GlobalFoundries’ 22FDX technology allows the designer to manage power, performance and leakage and achieve optimal energy efficiency and cost effectiveness. Synopsys’ Galaxy Design Platform supports body biasing techniques throughout the design flow, including both forward and reverse body bias, enabling power/performance trade-offs to be made dynamically and delivering up to 50% power reduction.
Key tools and features of the Galaxy Design Platform in the 22FDX reference flow include:
The 22FDX technology leverages existing design tools such as the Galaxy Design Platform, manufacturing infrastructure and the broader design ecosystem. This speeds time to market and enables the creation of differentiated products.
Citing SOI in the Power family of high-performance processors, Chipworks concludes that IBM is a major source of chip innovation. A recent EETimes article (read it here), which charts IBM developments at the transistor level over the last decade, notes that “…the 32 nm technology used to fabricate the IBM Power7+ represents an extraordinary technical achievement. IBM continues to be one of the technology leaders in the global semiconductor industry.” The article has excellent charts and pictures, and is recommended reading for anyone interested in the evolution of leading-edge SOI-based processors over the last decade.
IBM’s Watson supercomputer, which is based on SOI, has a new job in medical training. IBM announced that the team of researchers that created Watson will work with Cleveland Clinic clinicians, faculty and medical students to enhance the capabilities of Watson’s Deep Question Answering technology for the area of medicine. Over time, the expectation is that Watson will get “smarter” about medical language and how to assemble good chains of evidence from available
PCMag’s Michael Miller called IBM’s 22nm SOI Power8 “the most fascinating” of the high-end processors presented at this year’s Hot Chips conference. He noted that the chip “will have 12 cores, each capable of running up to eight threads, with 512KB of SRAM Level 2 cache per core (6MB total L2) and 96MB of shared embedded DRAM as a Level 3 cache.” He highlights the eDRAM, which ASN readers first learned about in a 2006 article by Subi Iyer, the IBM father of eDRAM, in which he explained, “The complexity adder is about half in SOI compared to bulk for deep trench based eDRAMs.”
Miller also says, “Compared with the previous generation Power 7+, which was manufactured on a 32nm SOI process, Power8 should have more than twice the memory bandwidth at 230GBps. IBM says each core should have 1.6 times the performance of Power7 on single-threaded applications and twice the SMT (symmetric multi-threaded) performance.”
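Miller’s figures can be cross-checked with a little arithmetic. This sketch totals the hardware threads and on-chip L2 plus L3 capacity (ignoring L1), and turns “more than twice the memory bandwidth” into an implied upper bound on Power7+ bandwidth; that bound is an inference from the quote, not a published Power7+ figure.

```python
CORES = 12
THREADS_PER_CORE = 8      # up to eight threads per core
L2_KB_PER_CORE = 512      # private SRAM L2 per core
L3_MB_SHARED = 96         # shared eDRAM L3
POWER8_BW_GBPS = 230      # memory bandwidth quoted for Power8

hw_threads = CORES * THREADS_PER_CORE
l2_total_mb = CORES * L2_KB_PER_CORE / 1024
l2_l3_total_mb = l2_total_mb + L3_MB_SHARED

# "More than twice" Power7+ implies Power7+ bandwidth below half of 230 GB/s.
power7plus_bw_upper_gbps = POWER8_BW_GBPS / 2

print(hw_threads, l2_total_mb, l2_l3_total_mb, power7plus_bw_upper_gbps)
# 96 6.0 102.0 115.0
```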
AMD has made two new 32nm SOI-based product announcements:
At the recent DATE Conference in Grenoble (DATE is like DAC, but in Europe, alternating yearly between Grenoble and Dresden), STMicroelectronics, CEA-Leti & Mentor Graphics joined forces for an FD-SOI presentation organized by CMP and sponsored by Mentor.
Here are some of the highlights (the complete presentations are all available from the CMP website).
Presented by Philippe Magarshack, Technology R&D Group Vice-President and Central CAD GM at STMicroelectronics
Presented by Jean-Marc Talbot, Senior Director of Engineering Analog & Mixed Signal at Mentor Graphics Grenoble R&D Center
The advantages of back-biasing increase as you shrink the SOI layers, so it will get even better with each node!
A few other notables from the DATE conference:
The YouTube video Introduction to FD-SOI by STMicroelectronics and ST-Ericsson has generated enormous coverage in the press as well as in-depth discussions across various LinkedIn user groups. In its first two weeks, it had over 3,000 YouTube views, and LinkedIn postings of it generated over 50 Likes and Comments in a single group.
As you no doubt know by now, at CES a few weeks ago ST-Ericsson showed the new NovaThor L8580, which integrates a 2.5GHz eQuad processor based on the ARM Cortex-A9, an Imagination PowerVR™ SGX544 GPU running at 600MHz and an advanced multimode LTE modem on a single 28nm FD-SOI die. Process technology and manufacturing credit goes to ST. In a live video from the show, the chip reached 2.8GHz in a high-performance demo, and in a low-power demo hit 1GHz using just 0.636V (which would take 1.1V on bulk).
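The voltage figures matter because dynamic (switching) power scales roughly as C·V²·f. Under the simplifying assumption that switched capacitance and activity are the same on both processes, running at 1GHz from 0.636V instead of 1.1V would cut dynamic power to about a third. This is a first-order estimate that ignores leakage, not a measured result.

```python
def dynamic_power_ratio(v_new: float, v_ref: float,
                        f_new: float = 1.0, f_ref: float = 1.0) -> float:
    """First-order ratio of switching power, P ~ C * V**2 * f, with equal C."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Same 1GHz clock: 0.636V on 28nm FD-SOI vs. the 1.1V quoted for bulk.
ratio = dynamic_power_ratio(0.636, 1.1)
print(f"FD-SOI dynamic power ~ {ratio:.0%} of bulk")  # ~33%
```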
Since then, Giorgio Cesana, Director of Technology Marketing at STMicroelectronics, has been everywhere, responding to questions from readers and correcting misunderstandings as they arise.
One of the top things people want to know more about is biasing in FD-SOI, which can provide a big performance boost or huge power savings.
In case you missed it, here’s what Giorgio had to say to questions posed in the big LinkedIn Semiconductor Professional’s Group:
Thank you all for this interesting discussion and for giving me the opportunity to provide more details about the ST 28nm FD-SOI technology. I hope this clarifies any misunderstandings.”
Body bias, or more properly back bias (because biasing is done on the back face of the transistor), is a way to electrically control the Vt of the device by controlling the polarization of the wells.
Conceptually, it is like having the planar transistor controlled by two gates: the real, “classical” gate, which we build with an HKMG gate-first manufacturing approach, and a virtual gate (represented in the video as a transparent gate below the transistor) that represents the capability to control the transistor through biasing.
The back gate is the “virtual” one. It does not require any extra manufacturing steps to be fabricated. It is created simply by polarizing the well.
The particular FD-SOI technology that ST is using, called UTBB (Ultra-Thin Body and BOX), benefits from an extremely thin (25nm) Buried Oxide (BOX), which enables extremely efficient control of the transistor threshold voltage through biasing, up to 80mV/V. In addition, because of the insulator in FD-SOI, biasing is not limited to 300mV as in bulk technologies, allowing an extremely wide dynamic range of control over the transistor Vt.
In terms of biasing efficiency, this past Dec 10th we published some figures for 600mV forward body bias in 28nm, showing up to a 45% speed increase when running cores at 0.6V. That said, exploiting body biasing is a matter of making a design that provides an independent supply to the wells, managed through the power supply controller, to optimize the Vt for proper energy efficiency, balancing the static and dynamic parts of the power consumption. Of course, biasing conditions should be considered during the design-optimization and sign-off phases.
Finally, body/back biasing in FinFETs simply does not work, because the transistor channel is vertical and the gate controls three sides of the channel. The fourth side (the one sitting on the substrate) is too narrow to be influenced through body biasing. Body biasing is simply not an option with FinFETs.
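Taken at face value, the body factor gives a quick way to estimate the Vt shift for a given well bias. This sketch assumes a purely linear model using the “up to 80mV/V” figure (real coefficients vary with device type and bias range) and the usual sign convention that forward bias lowers Vt and reverse bias raises it; `vt_shift_mv` is a hypothetical helper.

```python
def vt_shift_mv(back_bias_v: float, body_factor_mv_per_v: float = 80.0) -> float:
    """Estimated threshold-voltage shift (mV) under a linear back-bias model.

    Positive back_bias_v = forward body bias (Vt goes down, shift is negative);
    negative back_bias_v = reverse body bias (Vt goes up, shift is positive).
    """
    return -body_factor_mv_per_v * back_bias_v


# The 600mV forward body bias from the Dec. 10 figures, at the 80mV/V upper bound.
print(f"{vt_shift_mv(0.6):+.1f} mV")   # -48.0 mV
print(f"{vt_shift_mv(-0.6):+.1f} mV")  # +48.0 mV: same model, reverse bias
```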
Someone at one of the big programmable device companies then asked a follow-up question on the implementation. Giorgio responded:
In 28nm FD-SOI, threshold-voltage centering is a function of the gate work function, where the Vt is controlled by implanting a ground plane (GP) below the BOX (Buried Oxide). Depending on its type (N or P), Vt can be raised by more than 50mV, allowing the manufacturer to offer two device flavors: regular Vt and low Vt.
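To get a feel for what a 50mV Vt increase buys, recall that subthreshold leakage falls by roughly one decade for every S millivolts of Vt, where the subthreshold swing S is about 60mV/decade at room temperature in the ideal case. This sketch assumes S = 70mV/decade as an illustrative value; it is a textbook approximation, not ST’s characterization data.

```python
def leakage_reduction_factor(delta_vt_mv: float,
                             swing_mv_per_decade: float = 70.0) -> float:
    """Approximate I_off(before) / I_off(after) for a Vt increase of delta_vt_mv.

    Uses I_off ~ 10 ** (-Vt / S), so raising Vt by dVt divides I_off by 10 ** (dVt / S).
    """
    if swing_mv_per_decade <= 0:
        raise ValueError("subthreshold swing must be positive")
    return 10 ** (delta_vt_mv / swing_mv_per_decade)


# Regular-Vt vs. low-Vt flavors separated by ~50mV (the ground-plane figure above).
print(f"~{leakage_reduction_factor(50):.1f}x lower off-current")  # ~5.2x
```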
Threshold voltage is also statically controlled by modulating the gate length. ST’s multi-channel standard-cell library allows us to modulate the gate length up to +16nm, offering a static leakage control of up to 50x for a single Vt design, almost twice the leakage control offered by dual-Vt designs plus multi-channel libraries of competing bulk planar technologies.
Body bias is just one way to modulate the threshold voltage, and the dynamic nature of the control allows new and innovative design solutions to be implemented for extremely energy efficient designs.
I should note that body-bias usage is not mandatory in FD-SOI: we can make devices without using it and which still benefit from a good speed/power balance, low Vmin memories, better device variability, and all the other benefits FD-SOI processing offers. Chip architects can also decide to limit body-bias adoption only to some critical blocks/IPs in the SoC for the best trade-off between optimal energy efficiency and implementation simplicity. For further reference, you may read F. Arnaud, “Switching Energy Efficiency Optimization for Advanced CPU thanks to UTBB Technology,” IEDM 2012.
FD-SOI vs. PD-SOI: Ultra-Thin Body and Buried Oxide (UTBB) FD-SOI technology is very different from the Partially-Depleted technologies manufactured previously. Those partially-depleted technologies were affected by floating-body effects, where the body was subject to uncontrolled charging/discharging that made transistor behavior depend on previous transitions, i.e., they suffered from a kind of memory effect.
In UTBB FD-SOI technology, hybridization lets us contact the body, so it is not left floating, overcoming the problems of PD-SOI technologies.
Self-heating: Self-heating is another problem with Partially-Depleted SOI technologies, where the thick Buried Oxide (~150nm) thermally isolated the transistors from the substrate, leading to self-heating effects.
UTBB FD-SOI technology offers two advantages to overcome this self-heating:
– The Buried Oxide (BOX) is extremely thin (only 25nm thick in 28nm technology), offering significantly less thermal resistance;
– The big diodes, the drift MOS, the vertical bipolar, some resistors… are all implemented on the “hybrid” bulk part, eliminating even the thin BOX below them.
Wafer thickness: The ST process specification is for wafers with a 12nm-thick silicon film (±5Å). Manufacturing then “uses” part of the silicon film to build the transistors, leaving a final 7nm film below the transistors.
We are moving from a raw 12nm-thick silicon film (120Å ±5Å) to a final film of 7nm (70Å) under the transistors. This is a perfectly repeatable process and is already qualified for production at ST.
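The thickness figures are easier to compare in a single unit (1nm = 10Å). This sketch simply restates the numbers above: how much of the film the process consumes, and how tight the ±5Å specification is relative to the starting film.

```python
ANGSTROM_PER_NM = 10

raw_film_a = 12 * ANGSTROM_PER_NM    # 120 Å starting silicon film
spec_tolerance_a = 5                 # ± 5 Å on the starting film
final_film_a = 7 * ANGSTROM_PER_NM   # 70 Å left under the transistors

consumed_a = raw_film_a - final_film_a
relative_tolerance = spec_tolerance_a / raw_film_a

print(f"consumed by processing: {consumed_a} Å")             # 50 Å
print(f"starting-film tolerance: ±{relative_tolerance:.1%}")  # ±4.2%
```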
Wafer costs: UTBB FD-SOI technology manufacturing uses up to 15% fewer steps than our bulk planar 28LP HKMG gate-first technology. This process simplification, by itself, can fully compensate for the current substrate cost difference. Moreover, in high-volume production we expect UTBB FD-SOI die costs to be even lower than bulk planar, thanks to substrate-cost erosion and to UTBB FD-SOI’s better electrical yield compared with bulk planar.
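The claim that fewer steps can pay for the pricier substrate reduces to a simple break-even check. This sketch assumes wafer cost = substrate cost + processing cost, with processing cost proportional to step count; the per-wafer prices in the example are made-up placeholders, not ST or industry data, and die-level effects such as yield are left out.

```python
def fdsoi_cost_delta(bulk_substrate: float, soi_substrate: float,
                     bulk_process: float, step_reduction: float = 0.15) -> float:
    """FD-SOI wafer cost minus bulk wafer cost (negative means FD-SOI is cheaper).

    Assumes processing cost scales linearly with the number of process steps.
    """
    bulk_wafer = bulk_substrate + bulk_process
    fdsoi_wafer = soi_substrate + bulk_process * (1 - step_reduction)
    return fdsoi_wafer - bulk_wafer


def max_substrate_premium(bulk_process: float, step_reduction: float = 0.15) -> float:
    """Largest substrate premium the step savings can fully offset."""
    return bulk_process * step_reduction


# Placeholder numbers in arbitrary cost units, purely for illustration.
print(max_substrate_premium(bulk_process=2000))
print(fdsoi_cost_delta(bulk_substrate=100, soi_substrate=300, bulk_process=2000))
```

With these placeholders the 15% step reduction saves 300 units per wafer, so the 200-unit substrate premium in the example is more than offset.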
Manufacturability: as proof of manufacturability, ST-Ericsson’s recently announced NovaThor L8580, demonstrated at CES, can run its eQuad ARM cores at up to 2.8GHz while still fitting a smartphone thermal footprint, proving (if needed) the potential and the maturity of FD-SOI technology.
Additional recommended reading:
– O. Faynot et al., “Planar Fully Depleted SOI Technology: a powerful architecture for the 20nm node and beyond,” International Electron Devices Meeting Technical Digest, 2010
– Advantages of UTBB FD-SOI: A. Khakifirooz et al., “Extremely thin SOI for system-on-chip applications,” CICC 2012*, written by authors from IBM, STMicroelectronics, LETI, Renesas, and GLOBALFOUNDRIES.
*Editor’s note: ETSOI is what IBM calls its flavor of FD-SOI.
To keep up-to-date on the latest in SOI-related news, please join us at the Advanced Substrate News LinkedIn group.
As noted in ASN last year, the multi-core CPU in Nintendo’s new Wii U, which hit the shelves in November 2012, is fabbed by IBM on 45nm SOI.
With performance, efficiency, and power optimization as top priorities, AMD’s innovative Bulldozer architecture is built on 32nm SOI.
As of the Fall of 2011, AMD is shipping both client and server CPUs based on the new Bulldozer architecture. The first of the new APUs (CPU + GPU) incorporating Bulldozer modules will start shipping in 2012.
All of AMD’s innovative new Bulldozer architectures are built on 32nm SOI technology fabbed by GlobalFoundries.
Bulldozer is the code name for AMD’s next-generation CPU core, which targets the two key “heavy lifting” markets:
As indicated on the AMD roadmap, all of the company’s CPUs for the “server” market – chips in the Opteron family – are based on the 32nm SOI Bulldozer architecture.
The roadmap for “client” products also shows key chip families for desktop processors and high-performance notebooks based on the 32nm SOI Bulldozer architecture.
Interlagos is the codename for AMD’s 12- or 16-core 32nm SOI server processors based on the new “Bulldozer” processor core. It carries the AMD Opteron™ 6200 Series processor brand and is supported by the AMD Opteron™ 6000 Series (“Maranello”) platform.
Interlagos includes the world’s first 16-core x86 processors. The first Interlagos shipments began in August 2011 to large custom supercomputer installations: 25,000 to Oak Ridge labs and 38,000 to Los Alamos, for example.
The latest AMD FX series marks the first retail availability of Bulldozer-based processors. Available in 8-, 6- and 4-core configurations, these CPUs target extreme multi-display gaming, mega-tasking and HD content creation for PC and digital enthusiasts.
The new FX includes the first-ever eight-core desktop processor, which took the Guinness World Record for the “Highest Frequency of a Computer Processor,” hitting a top speed of 8.429 GHz.
AMD says its new Fusion APUs usher in the era of “Personal Supercomputing.” The A-Series APUs, codenamed Llano, which started shipping in mid-2011, are based on 32nm SOI, but their CPU cores use the previous generation of the x86 CPU architecture and so are not yet Bulldozer.
However, the next generation in the A-Series APUs, codenamed Trinity and scheduled for release in 2012, will also be based on 32nm SOI with next-generation Bulldozer CPU cores.
The Bulldozer architecture is based on “modules” of two cores each. AMD explains that this means two simultaneous threads can be executed more efficiently than two threads running on a single integer core.
For each two-core module, there is a shared 2MB L2 cache. The shared L3 cache varies from 8MB to 16MB, depending on the processor.
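Because cores come in two-core modules and each module has its own 2MB L2, total L2 capacity follows directly from the core count. This is a minimal sketch of that arithmetic for the 4-, 6-, 8- and 16-core parts mentioned above; it doesn’t cover L3, which varies by product.

```python
CORES_PER_MODULE = 2
L2_MB_PER_MODULE = 2


def bulldozer_l2_mb(cores: int) -> int:
    """Total L2 (MB) for a Bulldozer-based part with the given core count."""
    if cores <= 0 or cores % CORES_PER_MODULE:
        raise ValueError("core count must be a positive multiple of 2")
    modules = cores // CORES_PER_MODULE
    return modules * L2_MB_PER_MODULE


for cores in (4, 6, 8, 16):
    print(f"{cores:>2} cores -> {cores // CORES_PER_MODULE} modules, {bulldozer_l2_mb(cores)} MB L2")
```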
The Bulldozer design is new from the ground up. It required co-development of power efficiency, timing, and functionality¹. The team reduced leakage power by 95% when both cores are idle by using module-level VSS (rather than VDD) power gating, first used in the 32nm Llano CPU. SOI enables this to be done without extra processing steps².
The L1 caches use an 8T storage cell. The design team said that the change from a 6T cell in 45nm to 8T in 32nm improved the low-voltage margin and read timing and reduced power³.
This game-changing architecture on SOI promises an exciting new era of high performance and low power in systems ranging from sleek but powerful notebooks to the fastest supercomputers on the planet.
1. Tim Fischer et al., “Design Solutions for the Bulldozer 32nm SOI 2-Core Processor Module in an 8-Core CPU,” IEEE ISSCC 2011, p. 78.
2. Ravit Jotwani et al., “An x86-64 Core Implemented in 32nm SOI CMOS,” IEEE ISSCC 2010, p. 106.
3. Idem, Fischer et al.
4. Idem, Fischer et al.
Thank you to AMD for help on this article.