Abstract—3D integration has the potential to increase performance and decrease energy consumption. However, there are many unsolved issues in the design of these systems. In this work we study the design of many-tier (more than 4 tiers stacked) 3D power-supply networks and demonstrate a technique specific to 3D systems that reduces IR-drop relative to a straightforward extension of traditional design techniques. Previous work on 3D power delivery network design has simply extended 2D techniques by treating through-silicon vias (TSVs) as extensions of the C4 bumps. By exploiting the smaller size and much higher interconnect density possible with TSVs, we demonstrate a reduction of nearly 50% in the IR-drop of our 3D design. Simulations also show that a 3-tier stack with the distributed TSV topology lowers IR-drop by 20% compared with a non-3D system, even though the non-3D system dissipates less power. Finally, we analyze the power distribution network of an envisioned 1000-core processor with 30 stacked dies and show scaling trends related to both increased stacking and power distribution TSVs. Our 3D analysis technique is validated using commercial-grade sign-off IR-drop software from a major EDA vendor.

I. INTRODUCTION

3D stacking of ICs has generated increasing interest from the VLSI community in recent years. The potential benefits of 3D integration include reduced power consumption from off-chip communication, reduced wirelength and delay, and lower-cost process integration. These benefits suggest that increasing the scale of 3D integration (the number of tiers stacked together) can continue to improve system performance. However, many challenges in the design of 3D ICs in general, and of large-scale (many-tier) 3D systems in particular, remain unsolved. For example, smaller footprints combined with larger package-level system power exacerbate power delivery problems. In this work we provide a layout-level examination of the design of many-tier 3D power delivery networks and demonstrate that the unique environment of 3D ICs can have a dramatic effect on IR-drop in these networks.

IR-drop (sometimes referred to as ground-bounce) is the resistive voltage drop in power and ground distribution networks caused by the current drawn by the dynamic and leakage power of ICs. IR-drop causes many problems in modern microprocessor and ASIC designs and was one of the causes of the end of the frequency-scaling era. As device scaling continues, ever-lower supply voltages increase the total current drawn for a given power while shrinking power-supply noise margins even further. As a result, a growing fraction of the available routing resources in high-performance designs is dedicated to power supply distribution, which can worsen congestion and reduce the amount of functionality that can be packed into a unit area.
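The voltage-scaling argument above can be made concrete with a simplified lumped model (our own illustration, not from the original analysis): for a fixed power P drawn through a fixed network resistance R, the current is P/V, so the IR-drop as a fraction of the supply is P·R/V². The sketch below, with hypothetical values, shows that halving the supply voltage quadruples the fraction of the supply lost to IR-drop.

```python
def relative_ir_drop(power_w, vdd, r_ohm):
    """Fraction of the supply lost to IR-drop in a simplified lumped model.

    Assumes a constant-power load drawing current power_w / vdd through a
    single lumped network resistance r_ohm (an illustrative simplification).
    """
    current_a = power_w / vdd
    return current_a * r_ohm / vdd  # equivalently power_w * r_ohm / vdd**2


# Hypothetical values: 1 W through 10 milliohms at 1.2 V and at 0.6 V.
high_v = relative_ir_drop(1.0, 1.2, 0.01)
low_v = relative_ir_drop(1.0, 0.6, 0.01)
print(f"{high_v:.2%} of Vdd at 1.2 V, {low_v:.2%} of Vdd at 0.6 V")
```

Because the fractional drop scales as 1/V², the noise budget erodes faster than the supply voltage itself falls.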

Many researchers have proposed optimization schemes for traditional IC power network design, but previous work on 3D power delivery networks has largely assumed a straightforward extension of 2D power delivery network design. Huang et al. [1] presented a physical model of 3D power distribution networks. Jain et al. [2] extended previous work to examine multi-story power delivery in 3D ICs. Yu et al. [3] demonstrated an optimization scheme for supply bump assignment and via insertion that simultaneously considers supply noise and temperature. All of this prior work assumes that TSVs are always aligned with the supply bumps.

All prior work has also been limited to 3D systems with a small number of stacked tiers. In this work we demonstrate that the number of tiers stacked together has a strong effect on the power-supply-noise performance of the resulting system. Accordingly, we examine large-scale systems with up to 30 stacked tiers. This is not meant as an argument for building such systems, but as a way to demonstrate the power-supply-noise scaling behavior of 3D stacking.

The overall goal of this work is to explore power delivery in 3D ICs and how it differs from traditional designs. Compared to prior efforts, we demonstrate the benefits of re-examining the unique capabilities of TSVs relative to package-level bumps. We also perform our analysis on layout-level designs and validate our results using commercial-grade sign-off IR-drop analysis software. The major contributions of this work are as follows:

- We present the first layout-level analysis of 3D power distribution networks that is validated using commercial tools.
- We demonstrate the potential IR-drop benefits of spreading power and ground distribution TSVs away from the power and ground supply bumps in designs with non-uniform power dissipation.
- We use this framework to examine scaling trends in 3D power distribution networks, demonstrating the potential for increased 3D stacking in an envisioned 1000-core system of 30 stacked tiers.

II. 3D AND FLIP-CHIP POWER NETS

High-performance 3D systems will generally use flip-chip packaging to increase off-chip interconnect density and reduce parasitics. Flip-chip power distribution systems are commonly laid out as grids. High-level metal layers are reserved for a coarse-grained grid of wide wires that connects a regular array of power and ground C4 bumps. A fine-grained mesh provides local distribution and connects to lower-level-metal power rings or standard-cell row distribution wiring. Most commercial products today have C4 bump pitches of around 100 to 200μm; however, researchers have demonstrated micro-bumps with pitches below 10μm.

In 3D systems, each tier contains a power distribution grid. All of the individual grids are connected by TSVs. TSVs can be manufactured in many different sizes. Diameters of less than 1μm have been shown in the literature. Power and ground TSVs should be large to have low resistance, but signal TSVs should be small to increase interconnect density and reduce parasitic capacitance. Manufacturing multiple TSV sizes on a single die would increase cost and reduce yield. Therefore, it will likely be necessary to use a single TSV size for both power distribution and signal wiring.

In this work we assume that only one TSV size is available and that it is optimized for signals. Several combinations of TSV placement could be used to deliver power; Figure 1 shows the two basic choices that we investigate in this paper.

- **clustered topology**: multiple small TSVs are clustered over the C4 pads for both power and ground distribution.
- **distributed topology**: multiple small TSVs are spread evenly throughout the die.

In both topologies, the combined resistance of all the TSVs is assumed to be the same. The figure depicts the TSV topologies for a single tile of the power/ground network; this tile is mirrored and replicated across the chip.
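The equal-combined-resistance constraint can be illustrated with a small sketch. The helper names, the 25-TSV count, the 5-site split, and the 35mΩ per-TSV value are hypothetical choices for illustration only. Clustering all TSVs at one site and spreading them evenly over several sites yield the same parallel combination, so the two topologies differ only in *where* current can cross between tiers.

```python
def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def tsv_sites(topology, n_tsvs, r_tsv, n_sites):
    """Per-site resistances for one tier-to-tier interface of a tile (hypothetical helper)."""
    if topology == "clustered":
        return [r_tsv / n_tsvs]                  # every TSV in parallel at the C4 site
    if n_tsvs % n_sites:
        raise ValueError("n_tsvs must divide evenly among the sites")
    per_site = n_tsvs // n_sites
    return [r_tsv / per_site] * n_sites          # TSVs spread evenly over all sites


clustered = tsv_sites("clustered", 25, 0.035, 5)
distributed = tsv_sites("distributed", 25, 0.035, 5)
print(parallel(clustered), parallel(distributed))  # equal combined resistance
```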

III. PROTOTYPE LAYOUT

The prototype layout used in our simulations is based on a design targeted at demonstrating extreme memory bandwidth using 3D interconnects. Our design is a many-core processor composed of an array of simple cores connected with a nearest-neighbor communication mesh. Each core has eight banks of dedicated SRAM directly stacked above it in two separate tiers. Each core tier contains a 10 × 10 array of cores. One grouping of one core tier and two SRAM tiers is defined to be one “set” of our scalable prototype layout. We envision stacking 10 sets together to form a 1000-core processor.

Our 1000-core processor was designed using a 130-nm standard cell library from Global Foundries. The layouts for a single core and a single memory tile are shown in Figure 2. We also highlight the areas in the layout reserved for ground TSV connections. For the distributed TSV topology, TSVs are located at all of the potential locations. In the clustered TSV topology, all of the TSVs are grouped into the center position, over the C4 bump. Each location is capable of accepting a 6μm diameter via-first TSV, while the locations over the C4 bumps (the center and near the corners) are capable of accepting 25 or more of these TSVs.

The single-core and single-tile layouts are both 560μm square, and the full 100-core and 100-tile layers are approximately 6mm square. The maximum total power dissipation per set (1 core tier + 2 memory tiers) is approximately 13.2W, so the 1000-core system (10 sets) has a total power dissipation of about 132W. Figure 3 shows the power map for a single core.
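The prototype's headline numbers follow from simple arithmetic, sketched below. The 13.2W per set, 10 sets, 10 × 10 core array, and 560μm tile size come from the text; the 1.5V nominal supply is an assumption inferred from the later statement that 150mV is a 10% noise margin, and the currents depend on it.

```python
P_SET_W = 13.2            # max power per set (1 core tier + 2 memory tiers), from the text
N_SETS = 10               # sets stacked in the 1000-core system
CORES_PER_TIER = 10 * 10  # 10 x 10 core array per core tier
TILE_UM = 560             # single-core / single-tile edge length (um)
V_NOM = 1.5               # ASSUMPTION: inferred from "10% noise margin of 150mV"

total_power_w = P_SET_W * N_SETS                    # whole 30-tier stack
tier_edge_mm = 10 * TILE_UM / 1000                  # 10 tiles per side
power_per_core_set_w = P_SET_W / CORES_PER_TIER     # one core plus its two SRAM tiles
current_per_core_set_a = power_per_core_set_w / V_NOM
total_current_a = total_power_w / V_NOM

print(f"{total_power_w:.1f} W total, {tier_edge_mm:.1f} mm tier edge, "
      f"{power_per_core_set_w * 1e3:.0f} mW and {current_per_core_set_a * 1e3:.0f} mA "
      f"per core set, {total_current_a:.0f} A total")
```

The 5.6mm edge is what the text rounds to "approximately 6mm", and the roughly 88A total (under the assumed 1.5V supply) shows why package-level power delivery becomes a concern for the full stack.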

IV. 3D IR-DROP ANALYSIS METHODOLOGY

A. Methodology

To compute layout-level IR-drop, we first divide gate-level power consumption values by the nominal supply voltage to obtain gate- and module-level current consumption values. Next, parasitic extraction is performed on the layout to obtain a SPICE netlist that models the power distribution network; our experiments use Cadence’s QRC transistor-level extraction tool. The current consumption values are then assigned to the appropriate circuit nodes, and the resulting netlist is simulated to obtain the node voltages from which IR-drop is measured.
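The first step of this flow, converting gate-level power into node currents, can be sketched as follows. The instance names, node names, power values, and the 1.5V nominal supply are hypothetical; the real flow reads these from the design's power reports and the extracted netlist, whose formats are not reproduced here.

```python
from collections import defaultdict

V_NOMINAL = 1.5  # ASSUMPTION: nominal supply inferred from the 150mV = 10% margin


def node_currents(instances, vdd=V_NOMINAL):
    """Aggregate per-instance current (power / vdd) onto power-grid nodes.

    instances: iterable of (instance_name, power_watts, grid_node) tuples,
    where grid_node is the (hypothetical) extracted-netlist node the
    instance taps into. Returns {grid_node: current_in_amperes}.
    """
    currents = defaultdict(float)
    for _name, power_w, node in instances:
        currents[node] += power_w / vdd
    return dict(currents)


# Hypothetical instances: two gates share grid node n1, one taps n2.
example = node_currents([("u1", 0.003, "n1"), ("u2", 0.0015, "n1"), ("u3", 0.0006, "n2")])
print(example)  # currents to attach as sinks at each node before simulation
```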

Simulating power distribution networks is generally difficult for traditional ICs because of their large size. Given the extreme regularity of the prototype design examined in this work, we reduce the memory and execution-time requirements of our simulations by considering only an area containing a single core and the tiers directly above it. Because the design is so regular, this reduction should affect the accuracy of our analysis only slightly: simulations indicate that it introduces approximately 3% error. Moreover, the error is systematic, so it should not affect the results of our scaling studies.
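The claim that a systematic error leaves the scaling comparisons intact can be checked in one line, under an assumption of ours that the text does not state explicitly: that the single-core reduction scales both topologies' maximum IR-drop, $V_c$ (clustered) and $V_d$ (distributed), by a common factor $(1+\epsilon)$ with $|\epsilon| \approx 0.03$. The relative improvement is then unchanged:

```latex
\frac{(1+\epsilon)V_c - (1+\epsilon)V_d}{(1+\epsilon)V_c} = \frac{V_c - V_d}{V_c}
```

Absolute IR-drop values, and therefore the exact tier count at which a fixed threshold such as 150mV is crossed, can still shift by roughly that error.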

B. Validation

To validate the IR-drop analysis flow described above, we compare its results for a 2D layout against Cadence’s VoltageStorm sign-off power noise analysis tool; the results of our flow are within 4% of the values reported by VoltageStorm. We were also able to devise a method for tricking VoltageStorm into performing 3D analysis of two-tier stacks.

First, we create an ICT file (a process technology description file) that describes all of the metal layers in both tiers. The metal and dielectric layers are renamed so that the tier number is embedded in the name; for example, “METAL1” becomes “METAL1_1” and “METAL1_2.” Next, a techfile is created from the new ICT file using Cadence’s TechGen, and we modify the LEF files that describe the technology, standard cells, and macros. The DEF and instance power files for the designs of each tier are modified in the same way: each file is essentially duplicated so that there is one version for the first tier and one for the second. Using this method, the results of our 3D analysis flow matched VoltageStorm’s 3D IR-drop results to within 4%. Our experiments use a face-to-back 3D design; however, the technique is general enough to apply to face-to-face 3D designs as well.
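The tier-renaming convention described above can be sketched with a few lines of text processing. This only illustrates the naming scheme from the text ("METAL1" to "METAL1_1" and "METAL1_2"). The helper names are hypothetical, and the "LAYER METAL1 ;" strings are LEF-like placeholders, not a parser for real ICT, LEF, or DEF syntax.

```python
import re


def tier_layer_names(layers, tiers=2):
    """Per-tier copies of each layer name, e.g. METAL1 -> METAL1_1, METAL1_2."""
    return [f"{layer}_{t}" for t in range(1, tiers + 1) for layer in layers]


def rename_layers(text, layers, tier):
    """Append _<tier> to every whole-word occurrence of a listed layer name."""
    names = sorted(layers, key=len, reverse=True)  # longest first, e.g. METAL10 before METAL1
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(1)}_{tier}", text)


print(tier_layer_names(["METAL1", "METAL2"]))
print(rename_layers("LAYER METAL1 ;", ["METAL1"], 2))
```

Word boundaries keep a name such as METAL1 from matching inside METAL10, which matters when the same text is duplicated once per tier.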

V. EXPERIMENTAL RESULTS

For our baseline analysis we assume copper via-first TSVs with a 6μm square diameter, 20μm depth, and 35mΩ resistance. For simplicity, we present only the results for the ground distribution network; simulations show that the power distribution network follows the same trends, with only the location of the maximum IR-drop peak shifted. In real designs, it is the difference between the actual supply and ground voltages that determines the performance of the gates. Because we simulate only a single core and the tiers above it, we use a lumped package model for the C4 bumps, with a C4 resistance of 5mΩ. Each memory tier in our simulations consumes about 0.7× the power of a core tier, so the term “low-power tier” is somewhat relative.

A. IR-drop Comparison: Clustered vs Distributed

Figure 4 shows the effect on IR-drop of stacking more sets of the scalable prototype together. As the number of stacked sets grows, the distributed TSV topology yields much lower IR-drop. Compared with the clustered topology, it also allows up to six more tiers to be stacked before IR-drop crosses the 10% noise margin of 150mV. The basic reason for this improvement is that the distributed TSV topology allows the tiers with the most IR-drop to route their current through the networks of tiers with lower IR-drop. In effect, the distributed topology uses the “IR-drop slack” of the low-power tiers to lower the maximum system-level IR-drop.
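The mechanism can be explored with a toy nodal-analysis model. This is our own simplified sketch, not the authors' extraction-and-simulation flow. Each tier is reduced to a 1D chain of nodes (roughly half of a mirrored tile), with the C4 bump at node 0 of the bottom tier. The clustered topology connects adjacent tiers only at node 0, while the distributed topology connects them at every node, with the same combined TSV resistance per interface. The 0.7× memory-tier current ratio comes from the text; every resistance and current value is a hypothetical placeholder, so the printed numbers illustrate the mechanics only and are not expected to reproduce Figure 4.

```python
def solve_linear(g, rhs):
    """Solve g @ v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[r]] for r, row in enumerate(g)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (a[r][n] - sum(a[r][c] * v[c] for c in range(r + 1, n))) / a[r][r]
    return v


def max_ground_bounce(tier_currents, topology, nodes=5, r_lateral=0.05,
                      r_tsv_total=0.035, r_c4=0.005):
    """Worst-case ground-network voltage (V) in a toy stack of 1D tiers.

    tier_currents[t] is the current (A) sunk at every node of tier t.
    Tier 0 sits on the C4 bump, modeled as r_c4 from its node 0 to ideal ground.
    All resistances (ohms) are hypothetical placeholders.
    """
    if topology not in ("clustered", "distributed"):
        raise ValueError("topology must be 'clustered' or 'distributed'")
    tiers = len(tier_currents)
    size = tiers * nodes
    g = [[0.0] * size for _ in range(size)]

    def idx(t, k):
        return t * nodes + k

    def stamp(a, b, r):
        # Standard nodal-analysis stamp for a resistor r between nodes a and b.
        y = 1.0 / r
        g[a][a] += y
        g[b][b] += y
        g[a][b] -= y
        g[b][a] -= y

    for t in range(tiers):                       # lateral grid wiring in each tier
        for k in range(nodes - 1):
            stamp(idx(t, k), idx(t, k + 1), r_lateral)

    # Clustered: one TSV site over the C4 (node 0); distributed: a site at every node.
    sites = [0] if topology == "clustered" else list(range(nodes))
    r_site = r_tsv_total * len(sites)             # same combined TSV resistance per interface
    for t in range(tiers - 1):
        for k in sites:
            stamp(idx(t, k), idx(t + 1, k), r_site)

    g[idx(0, 0)][idx(0, 0)] += 1.0 / r_c4         # lumped C4 to ideal ground
    injected = [tier_currents[t] for t in range(tiers) for _ in range(nodes)]
    return max(solve_linear(g, injected))


# Usage: sets of one core tier plus two memory tiers at 0.7x the core current.
CORE_I = 0.02                                     # hypothetical per-node current (A)
one_set = [CORE_I, 0.7 * CORE_I, 0.7 * CORE_I]
for sets in (1, 3, 5):
    stack = one_set * sets
    vc = max_ground_bounce(stack, "clustered")
    vd = max_ground_bounce(stack, "distributed")
    print(f"{3 * sets} tiers: clustered {vc * 1e3:.2f} mV, distributed {vd * 1e3:.2f} mV")
```

Varying the per-tier currents or the TSV and lateral resistances in this sketch shows how the additional vertical paths of the distributed topology let lightly loaded tiers carry part of the current from heavily loaded ones.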

For systems with fewer stacked sets, Figure 4 shows that the clustered and distributed topologies result in very similar IR-drop values. Figure 5 shows the percentage improvement in IR-drop of the distributed topology over the clustered topology for three TSV site resistance values; the default is 35mΩ. These values are the resistance of each possible TSV location, called a TSV site, and can represent multiple TSVs in parallel at each location. Figure 5 shows that TSV site resistance has a significant impact on the relative IR-drop, with the distributed topology always showing lower IR-drop when more sets are stacked.
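The two quantities behind Figure 5 can be written out explicitly. This is a sketch under our own reading of the text: a TSV site holding n identical TSVs in parallel has resistance r_tsv/n, and the percentage improvement is taken relative to the clustered IR-drop, so negative values mean the distributed topology is worse. The function names and example drop values are hypothetical.

```python
def site_resistance(r_tsv, tsvs_per_site):
    """Resistance of one TSV site holding identical TSVs in parallel."""
    return r_tsv / tsvs_per_site


def percent_improvement(drop_clustered, drop_distributed):
    """IR-drop improvement of distributed over clustered, in percent of the clustered drop."""
    return 100.0 * (drop_clustered - drop_distributed) / drop_clustered


# With a hypothetical 35 mOhm TSV, a 1 mOhm site would need 35 TSVs in parallel.
print(site_resistance(0.035, 35))
print(percent_improvement(0.20, 0.10))   # hypothetical drops in volts
```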

B. Decreasing C4 Bump Pitch

The Euclidean distance between neighboring C4 bumps in our default layout is 283μm. Given the low power dissipation of a single core, this is sufficient for low-tier systems; however, for our 1000-core system the IR-drop is still above the 10%
[Figure 5 (plot not reproduced): percentage improvement in IR-drop of the distributed over the clustered topology versus number of tiers stacked (3 to 30), for TSV site resistances of 1 mΩ, 35 mΩ, and 100 mΩ.]

VI. CONCLUSION

In this work we have explored many-tier 3D power delivery network design and shown that IR-drop in these systems can be improved by exploiting the attributes of power supply TSVs that distinguish them from C4 supply bumps. Previous works have assumed a straightforward extension of traditional power supply network design in which TSVs are treated as extensions of the C4 bumps. We advocate a design style in which power network TSVs are distributed across the entire surface of the layout. This increases the coupling between the power distribution networks of the tiers in the 3D stack and allows the IR-drop slack of the lower-power tiers to be used to reduce overall system-level IR-drop.

We designed and analyzed a 1000-core 3D processor across 30 stacked tiers at the layout level to support our claims. Our 3D IR-drop analysis method was validated against commercial-grade sign-off IR-drop analysis software from a major EDA vendor at both the 2D and two-tier 3D level. Detailed simulations of stacking scaling and TSV resistance scaling demonstrate that the distributed TSV topology generally provides much lower IR-drop. In our baseline system with 30 stacked tiers, the distributed topology provides nearly 50% lower IR-drop than the clustered topology. For low-tier systems the savings are still significant: the distributed TSV topology lowers IR-drop for a 3-tier system by 20% compared to a non-3D system, even though the total power consumption of the 3-tier system is higher.

REFERENCES