This post was first published as part of Paul McLellan’s new Breakfast Bytes blog on the Cadence website.
~ ~ ~
Cadence recently announced in a press release that its implementation flow had been qualified on the GLOBALFOUNDRIES 22FDX process. At ARM TechCon, Joerg Winkler and Tamer Ragheb from the design enablement group of GLOBALFOUNDRIES in Dresden, Germany, provided a lot more detail. With German precision, their talk was titled “The Implementation of ARM® Cortex®-A17 Quad-Core in GLOBALFOUNDRIES 22FDX Technology Using Cadence Innovus Implementation System.” But I didn’t need to wait for the talk: I got a 1:1 meeting with Joerg earlier in the week and we went into more detail.
One thing I had been confused about, and that Joerg clarified when I asked him, is whether double patterning is used. I knew that the metal pitch was 80nm and so, in principle, the layers could be single patterned. But that is only true if a layer is patterned in a single direction (vertical or horizontal but not both). For metal1 and metal2 they wanted to have both directions so that Ls and Ts could be made inside the standard cells, which meant that they needed to use double patterning on those layers.
At some level, the details of the process don’t affect the implementation flow that much. The transistors are inside the standard cells and other blocks, so whether they are planar, FinFET or FD-SOI is secondary to where the pins are, how the coloring affects placement, and so on. So the basic flow through Genus, Innovus and the various signoff engines is unchanged from any other process.
One area where FD-SOI is very different, as I went through in detail in my earlier blog, is the forward and reverse body bias (fbb/rbb). This is a voltage applied to the back of the thin buried oxide (the box). It doesn’t turn the transistor on or off (the box is too thick for that, and the bias can only change slowly due to the high capacitance), but it does affect the performance of the transistor. This allows various tradeoffs: lower the supply voltage to reduce power, and then speed the design up again with fbb; or reduce leakage in a hibernating IoT device with rbb. Basically, fbb increases the performance, and this can be taken purely as increased performance or as reduced power at the same performance. Conversely, rbb reduces performance but also decreases leakage, so it can be used when high performance is not required but power is critical.
The challenge that Joerg and his team faced was to come up with an architecture for how to connect up the bias. It is a little like planning a power grid. There are local decisions as to how to actually make the connections, what layer of metal to use and so on. Then there are block-level issues such as how to distribute the signals without creating huge blockages for routing. There isn’t really a chip-level issue as there is for power since, except perhaps for test chips, the bias is not expected to be applied externally through the package pins, but rather generated internally on the chip with charge pumps and enabled/disabled under software control. The highest-level decision is how to partition the chip into areas where different biases can be applied, since the same bias is not needed everywhere.
The test vehicle chosen was a Quad-Core ARM Cortex-A17 processor. They decided to create five areas where the bias could be controlled independently: each of the four cores, and then everything else, which notably includes the L2 cache and its controller. The standard cells came from an 8-track library from Invecas (GF’s IP development partner) with continuous RX and support for body biasing. The cache memories were built from the GF evaluation memory kit, with 14 different L1 cache memory macros and 1 L2 cache memory macro, with support for body biasing of the bitcell array and of the memory periphery. An additional complication is that these areas also need to support power down (so cores can be powered off completely as well as biased). The body bias is all specified in the IEEE 1801 power file (so that all the other tools can handle the power policy chosen) and in the script that drives the Innovus Implementation System during physical design to actually create the connections.
The body bias nets needed to be connected up to dedicated pins on the well-tap cells, the power switches and the memory macros. The cells were carefully aligned so that the body bias connections were straight runs of metal, and then a body-bias net ring was placed around the perimeter of the module as is often done with power nets. See the diagram below.
The actual connections can be seen in the diagram below. The 10 yellow lines running across are the 10 body bias signals for the 5 regions (they are in pairs, one for P transistors and one for N). The green vertical lines in the middle pick up the appropriate pair of bias signals and these, in turn, are connected to the actual well tap cells (where the signals effectively connect to the back gate).
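To make the bookkeeping concrete, here is a minimal Python sketch of how five independently biased regions, each with a P and an N bias net, add up to the 10 signals described above. The region names and the net naming scheme are hypothetical and chosen only for illustration; they are not taken from the GLOBALFOUNDRIES design, the IEEE 1801 file, or the Innovus script.

```python
# Hypothetical region and net names; only the counts (5 regions, paired
# P/N bias nets, 10 nets in total) come from the description above.
REGIONS = ["core0", "core1", "core2", "core3", "noncore"]  # noncore: L2 cache, controller, everything else
DEVICE_TYPES = ["p", "n"]  # one bias net for P transistors, one for N


def body_bias_nets(regions, device_types):
    """Return {region: {device_type: net_name}}, one net per region/device pair."""
    return {
        region: {dev: f"vbb_{dev}_{region}" for dev in device_types}
        for region in regions
    }


nets = body_bias_nets(REGIONS, DEVICE_TYPES)

# Flatten to the list of nets that would run across the block.
all_nets = [name for pair in nets.values() for name in pair.values()]

for region, pair in nets.items():
    print(region, pair["p"], pair["n"])
print(len(all_nets), "bias nets in total")
```

Keeping the nets keyed by region mirrors the partitioning decision: each well-tap cell only needs to look up the pair belonging to the region it sits in.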
In a very similar way, the bias for the memory is also picked up from a ring and then run through the core of the memories using straight runs of metal.
GLOBALFOUNDRIES also created an implementation of the smaller ARM Cortex-A9 processor. They have been using this microprocessor for several technology nodes to allow a comparison of power, performance, and area (PPA). In this particular case they wanted to compare 22FDX at different body biases against the 28SLP process. The result is that 22FDX with fbb delivers 30% higher frequency at the same power (along with a 45% area reduction), and with rbb delivers a 45% power reduction at the same frequency (with, obviously, the same 45% area reduction). This means that the implementation can vary over a huge range of power/performance with the same silicon, whereas at 28nm it would require a complete re-implementation (which is why 28SLP appears as a single red dot rather than a curve).
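The arithmetic behind those two operating points can be sketched in a few lines of Python. This assumes the 28SLP implementation is normalized to 1.0 for frequency, power, and area; the class and function names are hypothetical, and the frequency-per-power ratio is a derived illustration rather than a figure from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ppa:
    """Frequency, power, and area relative to a normalized baseline."""
    frequency: float
    power: float
    area: float

    @property
    def freq_per_power(self):
        # Derived metric for illustration only; not quoted in the talk.
        return self.frequency / self.power


# Assumption: the 28SLP implementation is the 1.0 reference on every axis.
BASELINE_28SLP = Ppa(frequency=1.0, power=1.0, area=1.0)

# Percentages quoted in the text for 22FDX versus 28SLP.
AREA_REDUCTION = 0.45
FBB_FREQ_GAIN = 0.30
RBB_POWER_REDUCTION = 0.45


def scale(base, freq_gain=0.0, power_reduction=0.0, area_reduction=0.0):
    """Apply fractional gains and reductions to a baseline PPA point."""
    return Ppa(
        frequency=base.frequency * (1.0 + freq_gain),
        power=base.power * (1.0 - power_reduction),
        area=base.area * (1.0 - area_reduction),
    )


fdx_fbb = scale(BASELINE_28SLP, freq_gain=FBB_FREQ_GAIN, area_reduction=AREA_REDUCTION)
fdx_rbb = scale(BASELINE_28SLP, power_reduction=RBB_POWER_REDUCTION, area_reduction=AREA_REDUCTION)

for name, point in [("28SLP", BASELINE_28SLP), ("22FDX fbb", fdx_fbb), ("22FDX rbb", fdx_rbb)]:
    print(f"{name:10s} f={point.frequency:.2f} P={point.power:.2f} "
          f"A={point.area:.2f} f/P={point.freq_per_power:.2f}")
```

The two 22FDX points are the ends of the range quoted above; the intermediate bias settings that would trace out the rest of the curve are not given in the text, so they are not modeled here.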
~ ~ ~
This post was first published under the new Breakfast Bytes blog on the Cadence website. The original is here. Many thanks to the folks at Cadence and to Paul McLellan for permission to repost it here on ASN.