By: Tamer Ragheb, Digital Design Methodology Technical Manager at GlobalFoundries, and Josefina Hobbs, Senior Manager of Strategic Alliances, Synopsys
It’s clear that getting an optimal balance of power and performance at the right cost is foremost in the minds of designers today. Designers who want high performance, ultra-low power, or ideally both have a choice to make when migrating to next-generation nodes. For applications that push the envelope in performance, FinFET is the optimal solution. For applications that require ultra-low power and more RF integration, FD-SOI is the right solution. The two technologies have different value propositions that need to be weighed when designing for applications ranging from high-performance computing and servers to high-end mobile and the Internet of Things (IoT).
GlobalFoundries 22FDX is the industry’s first 22nm FD-SOI platform. The 22FDX technology is specifically designed to meet the ultra-low-power requirements of the next generation of connected devices. Its big advantage is the ability to provide software control at the transistor level through flexible body biasing (Fig. 1). Real-time trade-offs between power and performance, made through software-controlled body biasing of the transistor, create new options for the designer. For example, imagine designing a processor for a smartwatch that could match its power-performance trade-off to your typical use and adjust its performance based on how you are using it that day.
The full impact of the body-bias capability of 22FDX becomes clear when it is compared to incumbent high-performance process technologies (Fig. 2). Compared to a 28nm high-k metal gate (HKMG) technology, 22FDX can provide up to 50% less power at the same frequency, or 40% higher performance at the same total power. In addition, 22FDX can be optimized further with forward body bias, shown on the blue curve, to reduce power even more or to boost speed in a turbo operating mode.
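One way to read the two 22FDX-versus-28HKMG figures side by side is to compare energy per clock cycle, E = P/f. The short derivation below normalizes to the 28nm baseline, with P_0 and f_0 as symbols introduced here. It treats "40% faster" as 40% higher clock frequency and takes the quoted figures at face value:

```latex
$$
\frac{E}{E_0} = \frac{P/f}{P_0/f_0}
\quad\Longrightarrow\quad
\begin{cases}
\dfrac{0.5\,P_0/f_0}{P_0/f_0} = 0.5, & \text{same frequency, 50\% less power}\\[6pt]
\dfrac{P_0/(1.4\,f_0)}{P_0/f_0} = \dfrac{1}{1.4} \approx 0.71, & \text{same power, 40\% higher frequency}
\end{cases}
$$
```

Both operating points lower the energy per cycle. The iso-frequency point saves more energy, while the iso-power point spends part of that saving on speed.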
In addition to the body bias, 22FDX offers capabilities for design flexibility and intelligent control that are not available in other technologies. These include:
Manufacturing success is highly sensitive to specific physical design features, with advanced nodes requiring more complex design rules and more attention to manufacturability issues on the part of designers. However, there are essentially no additional manufacturing requirements to design in 22FDX beyond what is required for 28nm designs.
There are four application-optimized extensions available with 22FDX (Fig. 3). These are:
GlobalFoundries reference flow for 22FDX has been optimized to support forward and reverse body bias (FBB/RBB), which provides the design flexibility to optimize the performance/power trade-offs. The reference flow supports implant-aware and continuous diffusion-aware placement, tap insertion and body bias network connectivity according to high voltage rules, double-patterning aware parasitic extraction (PEX), and design for manufacturing (DFM). This provides designers with the flexibility to manage power, performance and leakage targets for the next-generation chips used in mainstream mobile, IoT and networking applications.
GlobalFoundries has been collaborating with Synopsys to enable and qualify its tools for the 22FDX Reference Flow. The recent qualification of Synopsys’ Galaxy™ Design Platform for the current version of GlobalFoundries’ 22FDX technology allows the designer to manage power, performance and leakage and achieve optimal energy efficiency and cost effectiveness. Synopsys’ Galaxy Design Platform supports body-biasing techniques throughout the design flow, including both forward and reverse body bias, enabling power/performance trade-offs to be made dynamically and delivering up to 50% power reduction.
Key tools and features of the Galaxy Design Platform in the 22FDX reference flow include:
The 22FDX technology leverages existing design tools such as the Galaxy Design Platform, manufacturing infrastructure and the broader design ecosystem. This speeds time to market and enables the creation of differentiated products.
This post was first published as part of Paul McLellan’s new Breakfast Bytes blog on the Cadence website.
~ ~ ~
Cadence recently issued a press release announcing that the Cadence implementation flow had been qualified on the GLOBALFOUNDRIES 22FDX process. At ARM TechCon, Joerg Winkler and Tamer Ragheb from the design enablement group of GLOBALFOUNDRIES in Dresden, Germany, provided much more detail. With German precision, their talk was titled “The Implementation of ARM® Cortex®-A17 Quad-Core in GLOBALFOUNDRIES 22FDX Technology Using Cadence Innovus Implementation System.” But I didn’t need to wait for the talk: I had a one-on-one meeting with Joerg earlier in the week, and we went into more detail.
One thing I had been confused about, which Joerg clarified when I asked him, was whether double patterning is used. I knew that the metal pitch was 80nm and so, in principle, could be single patterned. But that is only true if the layer is patterned in a single direction (vertical or horizontal, but not both). For metal1 and metal2, they wanted both directions so that Ls and Ts could be made inside the standard cells, which meant that they needed double patterning.
At some level, the details of the process don’t affect the implementation flow that much. The transistors are inside the standard cells and other blocks, so whether they are planar, FinFET or FD-SOI is secondary to where the pins are, how the coloring affects placement, and so on. So the basic flow through Genus, Innovus and the various signoff engines is the same as for any other process.
One area where FD-SOI is very different, as I went through in detail in my earlier blog post, is forward and reverse body bias (fbb/rbb). This is a voltage applied to the back of the thin buried oxide (the BOX). It doesn’t turn the transistor on or off, because the BOX, although thin, is still much thicker than the gate oxide, and the bias can only change slowly due to the high capacitance. However, it does affect the performance of the transistor. This allows various trade-offs: lower the supply voltage to reduce power and then speed the transistors up again with fbb, or reduce leakage in a hibernating IoT device with rbb. Basically, fbb increases performance, and this can be taken either as higher speed or as lower power at the same speed. Rbb reduces performance but also decreases leakage, so it can be used when high performance is not required but power is critical.
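Double-patterning decomposition is commonly modeled as two-coloring a conflict graph. Any two shapes closer than the minimum same-mask spacing must go on different masks, and an odd cycle of such conflicts cannot be split at all. Once Ls and Ts are allowed, neighbors in both directions can conflict, so the mask assignment is no longer a simple alternation along one axis. The Python sketch below illustrates only that idea. The rectangle model, the spacing threshold and the function names are hypothetical, and they are not GLOBALFOUNDRIES design rules or Cadence tool behavior.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned metal shape (hypothetical model); coordinates in nm.
    Each Rect is assumed to be a separate polygon, i.e. shapes do not touch."""
    x0: float
    y0: float
    x1: float
    y1: float


def spacing(a: Rect, b: Rect) -> float:
    """Euclidean edge-to-edge distance between two rectangles (0 if they overlap)."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return (dx * dx + dy * dy) ** 0.5


def assign_masks(shapes: list[Rect], min_same_mask_space: float) -> list[int] | None:
    """Two-color the conflict graph with BFS.

    Returns a mask index (0 or 1) per shape, or None if an odd conflict cycle
    means the layout cannot be decomposed onto two masks."""
    n = len(shapes)
    # Two shapes conflict if they are too close to print on the same mask.
    adj = [[j for j in range(n)
            if j != i and spacing(shapes[i], shapes[j]) < min_same_mask_space]
           for i in range(n)]
    color: list[int | None] = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if color[j] is None:
                    color[j] = 1 - color[i]  # neighbor goes on the other mask
                    queue.append(j)
                elif color[j] == color[i]:
                    return None  # odd cycle: not two-colorable
    return color
```

Three parallel wires alternate masks. Three mutually close shapes (a triangle of conflicts) cannot be decomposed, and a real flow would have to move or re-shape one of them.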
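The trade-offs above can be made concrete with a first-order model. Body bias shifts VT roughly linearly, and subthreshold leakage changes by one decade for every subthreshold swing of VT shift. Gate speed can be estimated with an alpha-power law for on-current. The Python sketch below assumes all parameter values (body factor, swing, nominal VT, alpha) for illustration only; they are not 22FDX data.

```python
# First-order body-bias model. All parameter values are assumed, illustrative
# numbers, not 22FDX data. Sign convention: v_bb > 0 is forward body bias (fbb),
# v_bb < 0 is reverse body bias (rbb).

BODY_FACTOR = 0.08   # VT shift per volt of body bias, V/V (assumed)
SWING = 0.075        # subthreshold swing, volts per decade of current (assumed)
VT0 = 0.35           # zero-bias threshold voltage, V (assumed)
ALPHA = 1.3          # alpha-power-law exponent for on-current (assumed)


def threshold_voltage(v_bb: float) -> float:
    """Linear VT model: forward bias lowers VT, reverse bias raises it."""
    return VT0 - BODY_FACTOR * v_bb


def leakage_ratio(v_bb: float) -> float:
    """Subthreshold leakage relative to zero bias: one decade per SWING volts of VT change."""
    delta_vt = threshold_voltage(v_bb) - VT0
    return 10 ** (-delta_vt / SWING)


def relative_speed(vdd: float, v_bb: float, vdd_ref: float) -> float:
    """Gate speed (1/delay) relative to (vdd_ref, zero bias).

    Uses delay ~ C * VDD / I_on with I_on ~ (VDD - VT) ** ALPHA."""
    vt = threshold_voltage(v_bb)
    if vdd <= vt or vdd_ref <= VT0:
        raise ValueError("model only valid above threshold")
    i_on = (vdd - vt) ** ALPHA
    i_on_ref = (vdd_ref - VT0) ** ALPHA
    return (i_on / vdd) / (i_on_ref / vdd_ref)


if __name__ == "__main__":
    # Lower VDD from 0.8 V to 0.7 V, then see what fbb or rbb does to speed and leakage.
    for v_bb in (-1.0, 0.0, 1.0):
        print(f"v_bb={v_bb:+.1f} V  leakage x{leakage_ratio(v_bb):.2f}  "
              f"speed at 0.7 V vs 0.8 V: {relative_speed(0.7, v_bb, 0.8):.2f}")
```

In this model, fbb buys back the speed lost by lowering VDD at the cost of exponentially higher leakage. Rbb does the opposite, which matches the hibernating-IoT use case.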
The challenge that Joerg and his team faced was to come up with an architecture for connecting the bias. It is a little like planning a power grid. There are local decisions such as how to actually make the connection, which metal layer to use, and so on. Then there are block-level issues such as how to distribute the signals without creating huge routing blockages. There isn’t really a chip-level issue as there is for power since, except perhaps for test chips, the bias is not expected to be applied externally through the package pins. Instead, it is generated on-chip with charge pumps and enabled or disabled under software control. The highest-level decision is how to partition the chip into areas where different biases can be applied, since the same bias is not needed everywhere.
The test vehicle chosen was a quad-core ARM Cortex-A17 processor. They decided to create five areas where the bias could be controlled independently: one for each of the four cores and one for everything else, which notably includes the L2 cache and its controller. The library used was an 8-track standard cell library from Invecas (GF’s IP development partner) with continuous RX and support for body biasing. The cache memories were built from the GF evaluation memory kit, with 14 different L1 cache memory macros and one L2 cache memory macro with support for body biasing of both the bitcell array and the memory periphery. An additional complication is that these areas also need to support power-down, so that cores can be powered off completely as well as biased. The body bias is all specified in the IEEE 1801 (UPF) power intent file, so that all the other tools can handle the chosen power policy. It is also specified in the script that drives the Innovus Implementation System during physical design to actually create the connections.
The body-bias nets needed to be connected to dedicated pins on the well-tap cells, the power switches and the memory macros. The cells were carefully aligned so that the body-bias connections were straight runs of metal, and then a body-bias net ring was placed around the perimeter of the module, as is often done with power nets. See the diagram below.
The actual connections can be seen in the diagram below. The 10 yellow lines running across are the 10 body bias signals for the 5 regions (they are in pairs, one for P transistors and one for N). The green vertical lines in the middle pick up the appropriate pair of bias signals and these, in turn, are connected to the actual well tap cells (where the signals effectively connect to the back gate).
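The bookkeeping behind those 10 signals can be sketched as a small data model: five independently biased domains, each with one bias net for N transistors and one for P transistors, and each well-tap cell picking up its own domain's pair. This is a hypothetical Python illustration. The real specification lives in the IEEE 1801 file and the Innovus script, and the net and pin names below are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasDomain:
    """A region whose body bias is controlled independently (hypothetical model)."""
    name: str

    @property
    def bias_n(self) -> str:
        return f"VBN_{self.name}"  # back-gate bias for N transistors (hypothetical net name)

    @property
    def bias_p(self) -> str:
        return f"VBP_{self.name}"  # back-gate bias for P transistors (hypothetical net name)


# Five domains as in the Cortex-A17 test vehicle: four cores plus everything else.
DOMAINS = [BiasDomain(f"core{i}") for i in range(4)] + [BiasDomain("rest")]


def bias_nets(domains: list[BiasDomain]) -> list[str]:
    """All body-bias nets that must be routed across the block, two per domain."""
    return [net for d in domains for net in (d.bias_n, d.bias_p)]


def tap_pin_connections(domain: BiasDomain) -> dict[str, str]:
    """Which net each bias pin of a well-tap cell in `domain` picks up (hypothetical pin names)."""
    return {"BIAS_N": domain.bias_n, "BIAS_P": domain.bias_p}
```

Keeping the pairing explicit per domain makes it easy to check that no tap in one core accidentally picks up another core's bias.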
In a very similar way, the bias for the memory is also picked up from a ring and then run through the core of the memories using straight runs of metal.
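The tap-cell alignment described earlier, which lets each body-bias pickup be a straight run of metal, can be sketched as placing tap columns at the same x positions in every row. This Python sketch assumes a single hypothetical maximum tap-spacing rule. Real tap placement comes from the foundry's design rules and the implementation tool's own tap-insertion commands, which are not modeled here.

```python
import math


def tap_column_xs(row_width: float, max_tap_spacing: float) -> list[float]:
    """Evenly spaced tap-column centers across a row of width row_width.

    Adjacent columns are at most max_tap_spacing apart, and every point in the
    row lies within max_tap_spacing / 2 of a column."""
    n_cols = max(1, math.ceil(row_width / max_tap_spacing))
    pitch = row_width / n_cols
    return [(k + 0.5) * pitch for k in range(n_cols)]


def place_taps(n_rows: int, row_width: float, max_tap_spacing: float) -> list[tuple[int, float]]:
    """Place well-tap cells at identical x positions in every row, so each column
    of taps can be strapped to its body-bias net with one straight vertical wire."""
    xs = tap_column_xs(row_width, max_tap_spacing)
    return [(row, x) for row in range(n_rows) for x in xs]
```

Spacing the columns evenly, rather than packing them at the maximum pitch from one edge, keeps the worst-case distance from any cell to its nearest tap as small as possible for a given column count.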
GLOBALFOUNDRIES also created an implementation of the smaller ARM Cortex-A9 processor, which they have used across several technology nodes to allow a comparison of power, performance and area (PPA). In this case they wanted to compare performance at different body biases against the 28SLP process. The result is that 22FDX with fbb has 30% higher frequency at the same power (along with a 45% area reduction), and with rbb has 45% lower power at the same frequency (with, of course, the same 45% area reduction). This means that the implementation can cover a huge range of power/performance with the same silicon, whereas at 28nm it would require a complete re-implementation (which is why 28nm appears as a single red dot rather than a curve).
~ ~ ~
This post was first published under the new Breakfast Bytes blog on the Cadence website. Many thanks to the folks at Cadence and to Paul McLellan for permission to repost it here on ASN.
In his recent piece, “A couple of misconceptions about FD-SOI” (3 September 2014), SemiWiki blogger and IP expert Eric Esteve corrects some assertions that have been surfacing about FD-SOI. He reminds designers that to really benefit from FD-SOI, they should leverage body biasing. He explains how ST has automated the IP conversion process so that it takes about half the time you’d normally expect. He also advocates FD-SOI for wearables and smartphones, as it provides both performance advantages and power savings.
By Ali Khakifirooz (Spansion)
One of the unique features of FD-SOI technology is the ability to use a wide range of body bias to modulate the transistor VT. In bulk planar technology, the maximum body bias is limited by p-n junction leakage and potential latch-up. In FD-SOI technology, by contrast, the full range of forward body bias (FBB) is available, owing to oxide isolation and the use of a flip-well structure [1].
Designers are familiar with the concept of body biasing and have been using it in different forms in bulk CMOS technology for many years. Even so, concerns are occasionally raised, often by non-designers, about the complexity and effectiveness of body biasing at advanced nodes.
Body biasing has been known for many years [2] and was in fact identified by industry leaders as a key technology enabler in the sub-0.1µm era [3]. Ironically, although the recent move to the FinFET structure removed this tool from the designers’ toolbox, the need for body biasing is still voiced [4].
Early studies demonstrated the effectiveness of body biasing in reducing leakage, improving performance, and reducing variability, and thereby worst-case power consumption, in complex circuits [5-7]. It was, however, pointed out that because of competing leakage mechanisms, such as band-to-band tunneling, the effectiveness of reverse body bias (RBB) in managing leakage diminishes with technology scaling [8]. Nonetheless, Intel continued using body biasing at least down to the 45nm node [9].
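The diminishing return of RBB can be illustrated with a toy model in which total standby leakage is the sum of two terms. Subthreshold leakage falls as reverse bias raises VT. A band-to-band-tunneling term rises as the reverse bias grows. The sum then has an optimum bias beyond which more RBB makes leakage worse. In the Python sketch below, every parameter and the exponential form of the tunneling term are simplifying assumptions, not data from any process.

```python
import math

# Toy standby-leakage model vs reverse body bias. Parameters are assumed,
# illustrative values in arbitrary current units, not data from any process.


def standby_leakage(v_rbb, i_sub0=1.0, body_factor=0.08, swing=0.075,
                    i_btbt0=0.01, v_btbt=0.15):
    """Total leakage at reverse body bias v_rbb >= 0 (volts).

    Subthreshold leakage falls one decade per `swing` volts of VT increase;
    the band-to-band-tunneling term is modeled (simplistically) as rising
    exponentially with reverse bias."""
    i_sub = i_sub0 * 10 ** (-body_factor * v_rbb / swing)
    i_btbt = i_btbt0 * math.exp(v_rbb / v_btbt)
    return i_sub + i_btbt


def optimum_rbb(v_max=2.0, steps=2001, **params):
    """Grid search for the reverse bias that minimizes total standby leakage."""
    grid = [v_max * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda v: standby_leakage(v, **params))
```

Raising the tunneling term (a larger `i_btbt0`) moves the optimum toward zero bias and shrinks the achievable saving. That is the qualitative effect the scaling argument above describes.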
Static Body Biasing
Device variability is one of the main factors limiting product yield. Historically, the desktop-driven semiconductor industry used product binning to turn this natural performance variability into profit. However, changes in market demand or in the process are known to cause significant imbalances between demand and inventory [10]. Moreover, with the emergence of mobile applications as the dominant technology driver [4] and their strict power requirements, binning is no longer effective. With the desire to reduce VDD below 0.8V to cut active power, managing device variability becomes increasingly important.
Body biasing has long been considered an effective and relatively easy way to compensate for some process variations. Not only does it lead to a tighter performance distribution and better yield, but by relaxing the guardband requirements for process corners and temperature variation, it also leads to better performance and a faster design cycle.
For example, in a 65nm media processor design, a 20% reduction in worst-case delay was achieved using an embedded FBB circuit [11]. While most body-biasing designs aim to keep VT constant, combining VT control with drive-current control has been shown to yield a significantly tighter distribution (an 85% reduction in variation) and a 25% reduction in total power [12]. These numbers are comparable to the power saving expected from scaling a design by one technology node. Given the concerns that cost scaling is saturating beyond 28nm, an FD-SOI design with a wide body-bias range is therefore very appealing.
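A per-die static scheme of the VT-compensating kind can be sketched as follows. Each die measures a delay monitor, such as a critical-path replica or ring oscillator, at a set of bias levels. The die is then assigned the most reverse bias that still meets timing, so slow dies get forward bias and fast, leaky dies get reverse bias. In this Python sketch, the measurement callback, the bias levels and the toy delay model are hypothetical. It does not implement the combined VT and drive-current scheme mentioned above.

```python
from typing import Callable, Optional, Sequence


def select_body_bias(measure_delay: Callable[[float], float],
                     target_delay: float,
                     bias_levels: Sequence[float]) -> Optional[float]:
    """Return the most reverse bias level at which the die still meets target_delay.

    Positive levels are forward body bias, negative levels reverse body bias.
    Assumes delay decreases monotonically toward forward bias, so the first
    passing level (scanning from most reverse) is the least leaky one.
    Returns None if the die fails timing even at maximum forward bias."""
    for v_bb in sorted(bias_levels):
        if measure_delay(v_bb) <= target_delay:
            return v_bb
    return None


def toy_die(nominal_delay: float) -> Callable[[float], float]:
    """Hypothetical die model: delay shrinks 10% per volt of forward bias."""
    return lambda v_bb: nominal_delay * (1.0 - 0.1 * v_bb)


LEVELS = [-1.0, -0.5, 0.0, 0.5, 1.0]
```

With these toy numbers, a fast die (nominal delay 0.85) settles at full reverse bias and a typical die (1.00) at zero bias. A slow die (1.04) needs 0.5 V of forward bias, and a very slow die (1.20) cannot be recovered.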
Dynamic Body Biasing
For applications with varied workloads, a more elaborate use of body bias is to adjust transistor performance to the workload. This can, of course, be combined with other known low-power techniques such as dynamic voltage and frequency scaling (DVFS), sleep transistors and power gating. In particular, when body bias is combined with DVFS, the optimum VT for each VDD can be used to minimize total power.
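Choosing the optimum VT for each VDD can be sketched as a brute-force search over (VDD, body bias) pairs. For a required clock frequency, keep only the pairs fast enough to meet it, then pick the one with the lowest dynamic-plus-leakage power. The models and every constant in this Python sketch are assumed and illustrative. Production DVFS controllers typically use characterized tables rather than an online search.

```python
import itertools

# Toy DVFS + body-bias search. All constants are assumed, illustrative values.
VT0 = 0.35          # zero-bias threshold voltage, V
BODY_FACTOR = 0.08  # VT shift per volt of body bias, V/V
SWING = 0.075       # subthreshold swing, V/decade
ALPHA = 1.3         # alpha-power-law exponent
F_SCALE = 2.0       # maps (VDD - VT)**ALPHA / VDD to frequency units (arbitrary)
C_EFF = 1.0         # effective switched capacitance (arbitrary units)
I_LEAK0 = 0.05      # zero-bias leakage current (arbitrary units)


def f_max(vdd, v_bb):
    """Maximum clock frequency in this model; v_bb > 0 is forward bias."""
    overdrive = vdd - (VT0 - BODY_FACTOR * v_bb)
    return F_SCALE * overdrive ** ALPHA / vdd if overdrive > 0 else 0.0


def total_power(vdd, v_bb, freq):
    """Dynamic C*V^2*f power plus VDD times subthreshold leakage."""
    dynamic = C_EFF * vdd ** 2 * freq
    leakage = vdd * I_LEAK0 * 10 ** (BODY_FACTOR * v_bb / SWING)
    return dynamic + leakage


def best_operating_point(freq, vdds, biases):
    """Return the minimum-power (power, VDD, bias) triple that meets freq, or None."""
    candidates = [
        (total_power(vdd, v_bb, freq), vdd, v_bb)
        for vdd, v_bb in itertools.product(vdds, biases)
        if f_max(vdd, v_bb) >= freq
    ]
    return min(candidates) if candidates else None


VDDS = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
BIASES = [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Because the zero-bias points are a subset of the search space, adding body bias as a second knob can never raise the minimum power found. It only helps when a biased point beats the best unbiased one.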
Design Complexity and Area Overhead
The design complexity and area overhead that body-bias generation circuits and routing may add are sometimes raised as concerns. Static body biasing is relatively easy to implement. Depending on the level of sophistication, it requires some sensing circuits (leakage, delay, skew, temperature, etc.), charge pump circuits to generate the body bias, and a network to distribute it across the chip. In typical designs, this adds no more than 1-2% area overhead. Design complexity is actually reduced, as fewer resources are needed to meet target performance across process and temperature corners. Notable bulk CMOS designs that used body bias to reduce variability include Samsung’s Exynos™ SoCs at both the 32nm and 28nm nodes [13-14] and Oracle’s SPARC processors at 40nm [15].
Dynamic body biasing, on the other hand, requires additional system and software development. However, we do not expect this to be more complex than implementing any other low-power technique, such as dynamic voltage scaling. An example is TI’s 45nm OMAP SoC, which used body bias as part of its SmartReflex technology (Figure 1) [16].
Figure 1. Example of combined dynamic body bias and voltage scaling in TI’s 45nm SoC [16]. The appropriate VDD and body bias are selected based on the power mode and process corner. (Courtesy: ISSCC, TI)
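The selection described in the caption, with VDD and body bias chosen from the power mode and the die's process corner, can be sketched as a table lookup with a per-corner correction. The following Python sketch is not TI's SmartReflex implementation. The modes, corners and voltage values are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    TURBO = "turbo"
    NOMINAL = "nominal"
    STANDBY = "standby"


class Corner(Enum):
    SLOW = "slow"
    TYPICAL = "typical"
    FAST = "fast"


# Hypothetical settings in volts. Body bias > 0 is forward, < 0 is reverse.
MODE_SETTINGS = {
    Mode.TURBO: (1.10, 0.2),     # high VDD plus some forward bias for speed
    Mode.NOMINAL: (0.90, 0.0),
    Mode.STANDBY: (0.70, -0.5),  # low VDD plus reverse bias to cut leakage
}
# Per-corner correction: slow silicon gets more forward bias, fast (leaky) silicon more reverse bias.
CORNER_OFFSET = {Corner.SLOW: 0.2, Corner.TYPICAL: 0.0, Corner.FAST: -0.2}


def operating_point(mode: Mode, corner: Corner) -> tuple[float, float]:
    """Look up (VDD, body bias) for a power mode on a die characterized as `corner`."""
    vdd, base_bias = MODE_SETTINGS[mode]
    return vdd, round(base_bias + CORNER_OFFSET[corner], 3)
```

Separating the mode table from the corner correction means the corner only has to be characterized once per die, for example at test, while software switches modes at run time.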
No Body Effect?
While many bulk CMOS designs used body bias in some form, at the other end of the spectrum are designs that used PD-SOI technology, where the majority of devices do not have a body contact. The lack of body effect in PD-SOI devices was claimed to help stacked transistors and pass gates, leading to a 15-25% speed improvement [17]. For designers who prefer a zero-body-effect style, moving to FinFET or a thick-BOX FD-SOI structure seems more natural. However, for mainstream applications where power and parametric yield are the main drivers, thin-BOX FD-SOI with body bias is the more sensible choice.
– – –
[1] D. Jacquet, et al., “A 3 GHz dual core processor ARM Cortex™-A9 in 28 nm UTBB FD-SOI CMOS with ultra-wide voltage range and energy efficiency optimization,” IEEE JSSC, p. 812, 2014.
[2] M. Kube, R. Hori, O. Minato, and K. Sato, “A threshold voltage controlling circuit for short channel MOS integrated circuits,” ISSCC, p. 54, 1976.
[3] S. Thompson, I. Young, J. Greason, and M. Bohr, “Dual threshold voltage and substrate bias: Keys to high performance, low power, 0.1 µm logic designs,” Symp. VLSI Tech., p. 69, 1997.
[4] G. Yeap, “Smart mobile SoCs driving the semiconductor industry: technology trend, challenges and opportunities,” IEDM Tech. Dig., p. 1.3.1, 2013.
[5] M. Miyazaki, et al., “A 1000-MIPS/W microprocessor using speed adaptive threshold-voltage CMOS with forward bias,” ISSCC, p. 420, 2000.
[6] S. Narendra, et al., “1.1V 1GHz communication router with on-chip body bias in 150nm CMOS,” ISSCC, p. 218, 2002.
[7] J. Tschanz, et al., “Adaptive body bias for reducing impact of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” ISSCC, p. 422, 2002.
[8] A. Keshavarzi, et al., “Technology scaling behavior of optimum reverse body bias for standby leakage power reduction in CMOS IC’s,” ISLPED, p. 252, 1999.
[9] F. Hamzaoglu, et al., “A 153Mb-SRAM design with dynamic stability enhancement and leakage reduction in 45nm high-k metal-gate CMOS technology,” ISSCC, p. 376, 2008.
[10] J.Y. Chen, “GPU technology trends and future requirements,” IEDM Tech. Dig., p. 3, 2009.
[11] S. Nomura, et al., “A 9.7mW AAC-decoding, 620mW H.264 720p 60fps decoding, 8-core media processor with embedded forward-body-biasing and power-gating circuit in 65nm CMOS technology,” ISSCC, p. 262, 2008.
[12] M. Sumita, et al., “Mixed body-bias technique with fixed Vt and Ids generation circuits,” ISSCC, p. 158, 2004.
[13] S.-H. Yang, et al., “A 32nm high-k metal gate application processor with GHz multi-core CPU,” ISSCC, p. 214, 2012.
[14] Y. Shin, et al., “28nm high-k metal-gate heterogeneous quad-core CPUs for high-performance and energy efficient mobile application processor,” ISSCC, p. 154, 2013.
[15] J.L. Shin, et al., “A 40nm 16-core 128-thread CMT SPARC SoC processor,” ISSCC, p. 98, 2010.
[16] G. Gammie, et al., “A 45nm 3.5G baseband-and-multimedia application processor using adaptive body-bias and ultra-low-power techniques,” ISSCC, p. 258, 2008.
[17] M. Canada, et al., “A 580MHz RISC microprocessor in SOI,” ISSCC, p. 430, 1999.
~ ~ ~