METHOD FOR PERFORMING SIZING-DRIVEN PLACEMENT
Background of the Invention
1. Field of the Invention
The present invention is directed to digital logic design systems. More particularly, the invention is directed to automated digital logic synthesis and placement systems.
2. Background of the Related Art
Prior art computer aided design (CAD) systems for the design of integrated circuits and the like assist in the design thereof by providing a user with a set of software tools running on a digital computer. In the prior art, the process of designing an integrated circuit on a typical CAD system was done in several discrete steps using different software tools.
First, a schematic diagram of the integrated circuit is entered interactively to produce a digital representation of the integrated circuit elements and their interconnections. This representation may initially be in a hardware description language such as Verilog and then translated into a register transfer level (RTL) description in terms of pre-designed functional blocks, such as memories and registers. This may take the form of a data structure called a net list.
Next, a logic compiler receives the net list and, using a component database, puts all of the information necessary for layout, verification and simulation into object files whose formats are optimized specifically for those functions.
Afterwards, a logic verifier checks the schematic for design errors, such as multiple outputs connected together, overloaded signal paths, etc., and generates error indications if any such design problems exist. In many cases, the IC designer improperly connected or improperly placed a physical item within one or more cells. In this case, these errors are flagged to the IC designer so that the layout cells may be fixed so that the layout cells perform their proper logical operation. Also, the verification process checks the hand-laid-out cells to determine if a plurality of design rules have been observed. Design rules are provided to integrated circuit designers to ensure that a part can be manufactured with greater yield. Most design rules include hundreds of parameters and, for example, include pitch between metal lines, spacing between diffusion regions in the substrate, sizes of conductive regions to ensure proper contacting without electrical short circuiting, minimum widths of conductive regions, pad sizes, and the like. If a design rules violation is identified, this violation is flagged to the IC designer so that the IC designer can properly correct the cells so that the cells are in accordance with the design rules.
Then, using a simulator the user of the CAD system prepares a list of vectors representing real input values to be applied to the simulation model of the integrated circuit. This representation is translated into a form which is best suited to simulation. This representation of the integrated circuit is then operated upon by the simulator which produces numerical outputs analogous to the response of a real circuit with the same inputs applied. By viewing the simulation results, the user may then determine if the represented circuit will perform correctly when it is constructed. If not, he or she may re-edit the
schematic of the integrated circuit, re-compile and re-simulate. This process is performed iteratively until the user is satisfied that the design of the integrated circuit is correct.
Then, the human IC designer presents as input to a logic synthesis tool a cell library and a behavioral model. The behavioral circuit model is typically a file in memory which looks very similar to a computer program. The behavioral circuit model contains instructions which define logically the operation of the integrated circuit. The logic synthesis tool receives as input the instructions from the behavioral circuit model and the library cells from the library. The synthesis tool maps the instructions from the behavioral circuit model to one or more logic cells from the library to transform the behavioral circuit model to a gate schematic net list of interconnected cells. A gate schematic net list is a database having interconnected logic cells which perform a logical function in accordance with the behavioral circuit model instructions. Once the gate schematic net list is formed, it is provided to a place and route tool.
The place and route tool is used to access the gate schematic net list and the library cells to position the cells of the gate schematic net list in a two-dimensional format within a surface area of an integrated circuit die perimeter. The output of the place and route step is a two-dimensional physical design file which indicates the layout interconnection and two-dimensional IC physical arrangements of all gates/cells within the gate schematic net list.
Prior art placement techniques, however, deal with actual, fixed cells, and the netlist given to the placing routine is fixed.
Summary of the Invention
The present invention has been made with the above problems of the prior art in mind, and a first object of the present invention is to provide a method for constant delay placement of cells and nets within a core which is capable of meeting a target core area utilization.
The above object is achieved according to an aspect of the invention by providing a method for constant delay cell and net placement in which cells are sized according to the loads they drive, while maintaining the delay of each cell constant. This is done by an initial placement of cells and nets based on parameters from a cell library followed by an iterative process of coarsely partitioning the cells and nets into buckets, sizing the cells based on the partitioning and checking whether the placement meets specified utilization criteria. In this way, an optimal placement can be obtained which meets constant delay utilization requirements.
In this way, a flexible constant delay placement system can be implemented in which the delay across nets is fixed and the size of cells is chosen according to the load they drive.
Brief Description of the Drawings
These and other objects, features, and advantages of the invention are better understood by reading the following detailed description of the preferred embodiment, taken in conjunction with the accompanying drawings, in which:
FIGURE 1 is a flowchart of a first part of a placement process according to a preferred embodiment of the invention;
FIGURE 2 is a diagram of buckets in a core undergoing placement in the embodiment;
FIGURE 3 is a diagram of a slicing structure corresponding to the core of FIG. 2;
FIGURE 4 is a flowchart of a second part of a placement process according to the preferred embodiment; and
FIGURE 5 shows a pairwise refinement process in the preferred embodiment.
Detailed Description Of The Presently Preferred Exemplary Embodiment
In one model, the delay through a single logic gate can be represented as
$d = g \cdot h + p$ (1)
where d is the delay, g is a parameter called the "logical effort" of the gate, h is a parameter called the "electrical effort" of the gate, and p is the parasitic or fixed part of the delay. g, in turn, is defined by
$g = \frac{R_{gate\_min} C_{gate\_min}}{R_{inv\_min} C_{inv\_min}}$ (2)
where gate_min refers to a minimum-sized gate and inv_min to a minimum-sized inverter. h, in turn, is defined by
$h = \frac{c_{out}}{c_{in}}$ (3)
where $c_{out}$ is the capacitance out of the gate and $c_{in}$ is the capacitance into the gate.
In a constant delay approach to cell placement, the pin-to-pin delay of each cell is fixed early on in the optimization flow. This delay is maintained independently of the load a cell drives. In order to keep delay constant, the size of the cell is adjusted according to the load that it drives. As a result, the area of each cell in the design varies with the load that it drives. The area of each cell is
$a = b + s \cdot c_{out}$ (4)
where b and s are constants related to the logic of the cell and the chosen constant delay for the cell. Thus, the total area of the netlist is, in matrix notation for plural gates,
$A = B + SC$ (5)
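As an illustration of Equations (1), (3) and (4), the following minimal Python sketch holds the delay d of a cell fixed and derives the electrical effort, the input capacitance and the area from the load the cell drives. The function name and all numeric values are hypothetical and chosen only for the example; they are not taken from any cell library.

```python
def size_for_constant_delay(d, g, p, c_out, b, s):
    """Return (h, c_in, area) for a cell whose delay d is held constant."""
    if d <= p:
        raise ValueError("target delay must exceed the parasitic delay p")
    h = (d - p) / g        # Equation (1) solved for the electrical effort
    c_in = c_out / h       # Equation (3) solved for the input capacitance
    area = b + s * c_out   # Equation (4): area grows linearly with the load
    return h, c_in, area


# Hypothetical values: doubling c_out doubles c_in while d stays the same.
print(size_for_constant_delay(d=5.0, g=1.0, p=1.0, c_out=8.0, b=2.0, s=0.5))
print(size_for_constant_delay(d=5.0, g=1.0, p=1.0, c_out=16.0, b=2.0, s=0.5))
```

Because h is fixed once d, g and p are chosen, a heavier load can only be absorbed by a larger input capacitance, that is, by a larger cell.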
Approximating the input load at each pin of a fanout cell k by $c_k / h_k$, the total load at the output of a cell i is
$c_i = \sum_k \frac{c_k}{h_k} + d_i$ (7)
where the sum runs over the fanouts k of cell i and $d_i$ is the wire load. That is, the total load at the output of cell i is the sum of all its fanout loads plus the load of the wire connecting the cell to its fanouts. In matrix notation for plural gates,
C = HC + D (8)
(I - H)C = D (9)
Setting G = I - H,
$GC = D$ (10)
$C = G^{-1} D$ (11)
where C is the output capacitance of all gates in the circuit and D is the wire load. Thus, according to the last equation above, the output load of the cells in the circuit can be found once the placement is known. Then, the size of each cell can be found to keep its delay constant. The area of each cell in the netlist denoted by A is
$A = K_1 + K_2^T C$ (12)
and substituting Equation (11) gives
$A = K_1 + K_2^T G^{-1} D$ (13)
Since the load of each wire can be represented as $d = u \cdot l$, where d is the wire load, u is the load per unit length of wire and l is the total length of the wire,
$A = K_1 + u K_2^T G^{-1} L$ (14)
and combining constant terms,
$A = K_1 + WL$ (15)
Thus, in order to minimize the circuit area one can minimize WL, where the matrix W may be viewed as a set of weights of the wire lengths L. Generally, the cell is modeled as a rectangle, with the height of the rectangle being the height of a standard cell row. Thus, the width and therefore the area of the cell are functions of the load.
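The relation C = HC + D can be illustrated with a small Python sketch. It assumes an acyclic netlist, in which case repeated substitution reaches the exact solution after as many sweeps as there are cells, so no matrix inverse is formed explicitly. The entry of H for a cell i and its fanout k is taken to be 1/h_k, following the per-pin load approximation in the text. The three-cell chain and all numeric values are hypothetical.

```python
def output_loads(fanouts, h, d):
    """Solve C = HC + D by repeated substitution.

    fanouts[i] lists the cells driven by cell i, h[k] is the electrical
    effort of cell k, and d[i] is the wire load on the output of cell i.
    Assumes an acyclic netlist, so len(d) sweeps give the exact solution.
    """
    n = len(d)
    c = list(d)
    for _ in range(n):
        c = [d[i] + sum(c[k] / h[k] for k in fanouts[i]) for i in range(n)]
    return c


def cell_areas(c, b, s):
    """Per-cell form of Equation (4): a_i = b_i + s_i * c_i."""
    return [bi + si * ci for bi, si, ci in zip(b, s, c)]


# Hypothetical chain: cell 0 drives cell 1, which drives cell 2.
fanouts = [[1], [2], []]
loads = output_loads(fanouts, h=[4.0, 4.0, 4.0], d=[1.0, 1.0, 8.0])
areas = cell_areas(loads, b=[2.0, 2.0, 2.0], s=[0.5, 0.5, 0.5])
print(loads, areas)
```

The wire load on the last cell propagates backwards through the chain, attenuated by each electrical effort, which is why the wire lengths alone determine every cell area once the delays are fixed.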
The preferred embodiment of the present invention processes a data structure representative of the circuit being placed and routed. Preferably, this is done on a digital computer as is known in the art. The data structure may be a netlist or other suitable structure known in the art; however, it is preferably a data model of the type disclosed in the United States Patent Application to Van Ginneken et al. entitled "Method For Storing Multiple Levels Of Design Data In A Common Database", attorney docket number 53455/253032, the contents of which are incorporated herein by reference.
The overall flow of a placement process according to a preferred embodiment of the present invention is shown in FIG. 1. Since a design constraint of the placement process is that the delay across a net be constant, the area of a cell is dependent on the load it drives. In turn, the load of a wire is not known with certainty until the placement process is finished. Thus, to make an initial placement of cells within the core some initial estimations must be made. Each cell is assigned a pin-to-pin constant delay in Step 10. Appropriate techniques will be readily apparent to those skilled in the art; however, pin delay assignment is preferably done according to the technique described in the United States Patent Application by Buch filed on April 21, 1999 and entitled "Generalized Theory of Logical Effort for Look-Up Table Based Delay Models", incorporated herein by reference. Throughout the placement process, this delay will be maintained constant and the area of the cell varied according to the load it drives in order to achieve the assigned delay. To make the initial cell placement, the area of each cell is calculated in Step 20 using wire loads obtained from the cell library in Step 15 and substituted into Equation 15. Although cells of varying power levels are available only in discrete steps in the target cell library, this phase of the technique proceeds as if a continuous spectrum of cell power levels is available and selects a cell from the library closest to the size ultimately selected as one of the final steps of the process. The total cell area $A_{total}$ is determined by adding up the areas of all the cells in Step 25, and the sizes of the cells are scaled to achieve a target percentage of core utilization in Step 30. Based on this, standard cell rows are created.
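Steps 25 and 30 can be sketched in Python as follows. The sketch assumes that the scaling is a single uniform factor applied to every cell area, which the text does not state explicitly; the function name and the numbers are hypothetical.

```python
def scale_to_utilization(cell_areas, core_area, target_utilization):
    """Scale all cell areas so their sum equals target_utilization * core_area."""
    total = sum(cell_areas)                      # Step 25: total cell area
    factor = target_utilization * core_area / total
    return [a * factor for a in cell_areas], factor   # Step 30: scaled sizes


# Hypothetical example: three cells, a core of area 100, 75% target utilization.
scaled, factor = scale_to_utilization([10.0, 20.0, 30.0], 100.0, 0.75)
print(scaled, factor)
```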
More specifically, the core 100 where the cells are placed is divided into coarse placement regions called buckets 110 (see FIG. 2). Each bucket 110 is a small rectangular region within the core 100. Buckets 110 have equal dimensions, but the placeable area within a bucket 110 depends on the presence of blockages such as macros in the bucket 110. A bucket 110 can accommodate about fifty average-sized standard cells 120. Then, in Step 40 a slicing structure or binary tree 130 is built whose leaves 140 are the coarse buckets 110. For example, a core 100 having a 4x4 matrix of buckets 110 imposed thereon (of course, in practice there will be a much greater number of buckets 110) as shown in FIG. 2 can be represented by the slicing structure 130 shown in FIG. 3.
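A slicing structure over a grid of buckets can be sketched in Python as a recursive bisection. The rule used here for choosing the cut direction (always cut the wider side, vertical on ties) is an assumption made for the example; the tree of FIG. 3 may order its cuts differently. The dictionary layout and function names are likewise hypothetical.

```python
def build_slicing_tree(x0, y0, x1, y1):
    """Bisect the bucket-index rectangle [x0, x1) x [y0, y1) down to single buckets."""
    width, height = x1 - x0, y1 - y0
    if width == 1 and height == 1:
        return {"bucket": (x0, y0)}          # leaf: one coarse bucket
    if width >= height:                      # vertical cut through the wider side
        mid = x0 + width // 2
        kids = [build_slicing_tree(x0, y0, mid, y1),
                build_slicing_tree(mid, y0, x1, y1)]
        return {"cut": "vertical", "children": kids}
    mid = y0 + height // 2                   # otherwise a horizontal cut
    kids = [build_slicing_tree(x0, y0, x1, mid),
            build_slicing_tree(x0, mid, x1, y1)]
    return {"cut": "horizontal", "children": kids}


def leaves(node):
    """Collect bucket coordinates in left-to-right tree order."""
    if "bucket" in node:
        return [node["bucket"]]
    return [b for child in node["children"] for b in leaves(child)]


tree = build_slicing_tree(0, 0, 4, 4)        # the 4x4 example of FIG. 2
print(len(leaves(tree)))                     # one leaf per bucket
```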
Cells 120 are assigned to the buckets 110 so that the total area of cells 120 within each bucket 110 closely matches the area of that bucket 110. This is done by an iterative bipartitioning of the data model. First, a horizontal or vertical cut of the core 100 is chosen. The total area available on each side of the partition is computed. Cells 120 are divided using quadratic placement (see below) and a mincut technique (see, e.g., Fiduccia et al., "A Linear Time Heuristic for Improving Network Partitions", ACM/IEEE Design Automation Conference, 1982, pp. 175-81, incorporated herein by reference) on each side so that total wire length is minimized. This iteration continues until a desired resolution, e.g., a bucket 110, is reached.
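One level of the area-driven bipartitioning can be sketched in Python as below. The sketch only orders cells by a coordinate (standing in for the quadratic placement result) and splits them in proportion to the area available on each side of the cut; the mincut refinement cited above is not implemented. The function name, the tuple layout and the sample data are hypothetical.

```python
def split_by_capacity(cells, left_capacity, right_capacity):
    """Split cells across a cut in proportion to the available area.

    cells is a list of (name, coordinate, area) tuples. Cells are taken in
    coordinate order and sent to the left side until its share of the total
    cell area is filled; all remaining cells go to the right side.
    """
    ordered = sorted(cells, key=lambda cell: cell[1])
    total = sum(cell[2] for cell in ordered)
    left_target = total * left_capacity / (left_capacity + right_capacity)
    left, right, used = [], [], 0.0
    for name, _coord, area in ordered:
        # A cell stays on the left only while at least half of it fits.
        if not right and used + area / 2.0 <= left_target:
            left.append(name)
            used += area
        else:
            right.append(name)
    return left, right


cells = [("a", 1.0, 2.0), ("b", 5.0, 2.0), ("c", 2.0, 2.0), ("d", 9.0, 2.0)]
print(split_by_capacity(cells, left_capacity=50.0, right_capacity=50.0))
```

Applying the same split recursively to each side, alternating or choosing the cut direction as needed, continues until each group corresponds to a single bucket.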
Next, each cell 120 is assigned to one of the buckets 110 using a partitioning technique in Step 45 as shown in FIG. 4. A good placement of cells 120 is one that can be easily routed and satisfies the given timing constraints for the logic circuit. Quadratic placement, and in particular Gordian quadratic placement, finds a legal placement while minimizing the total squared wire length in the circuit and is the placement technique preferably used. Gordian quadratic placement is well-known in the art as shown by, e.g., Kleinhans et al., "GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization", IEEE Trans. on Computer-Aided Design, v. 10, n. 3 (Mar. 1991), pp. 356-365 (incorporated herein by reference), and for simplicity will be generally described below. The problem is independently solved for the x and y coordinates. Briefly describing the process for the x coordinates (the process for the y coordinates is similar), quadratic placement solves the following equation subject to a constraint Hx = t (to account for physical realities such as overlapping cells and the like) to minimize total wire length during placement:
$\frac{1}{2}\left(\sum k_n (x_i - x_j)^2 + \sum k_n (x_i - b_j)^2\right)$ (16)
which in matrix form is
$\frac{1}{2} x^T A x - x^T d + \text{constant}$ (17)
x is the location of cells 120 and star nets. Star nets are nets with more than fifteen pins. A star net is treated like a cell 120. All cells 120 attached to a star net are considered to be attached to the center of the net through a two-pin net. Star nets are used to reduce the number of fill-ins in the matrix A. The weight $k_n$ of a net n is 2/(number of pins). The weight of a net connecting a cell 120 to the center of a star net is 1. b has the locations of fixed points. Fixed points are pins of pads or macros. The diagonal elements of A are non-zero and are computed as follows:
$a_{ii} = \sum k_n$ (18)
Any cell 120 connecting to cell i through a non-star net and a star net connecting to cell i contribute to the summation. The element $a_{ij}$ is non-zero if cells i and j are connected through a net:
$a_{ij} = -\sum k_n$ (19)
The contribution comes from the nets connecting cells i and j.
$d_i = \sum b_j k_n$ (20)
The contribution comes from all constant pins attached to cell i. The x coordinates for a placement that minimizes the total wire length are obtained by solving
$Ax = d$ (21)
The initial constraints for quadratic placement assume that the center of mass for all cells 120 on the chip is the center of the chip. If the area of each cell i is $a_i$, $\sum a_i x_i = x_{center}$ forms the first constraint for quadratic placement.
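The assembly of A and d according to Equations (18) to (20) and the solution of Equation (21) can be sketched in Python for one coordinate. This is a simplified illustration, not the GORDIAN procedure itself: it handles two-pin connections only, omits star nets and omits the constraint Hx = t, so it yields the unconstrained minimum. The function names and the sample circuit are hypothetical, and a small dense Gaussian elimination stands in for the sparse solver a real placer would use.

```python
def solve_linear(a, d):
    """Solve a x = d by Gaussian elimination with partial pivoting."""
    n = len(d)
    m = [row[:] + [d[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quadratic_place_1d(n_cells, movable_nets, fixed_nets):
    """movable_nets: (i, j, weight); fixed_nets: (i, fixed_position, weight)."""
    a = [[0.0] * n_cells for _ in range(n_cells)]
    d = [0.0] * n_cells
    for i, j, k in movable_nets:
        a[i][i] += k              # Equation (18): diagonal collects net weights
        a[j][j] += k
        a[i][j] -= k              # Equation (19): off-diagonal is minus the weight
        a[j][i] -= k
    for i, b, k in fixed_nets:
        a[i][i] += k              # a fixed pin also contributes to the diagonal
        d[i] += k * b             # Equation (20)
    return solve_linear(a, d)     # Equation (21)


# Hypothetical row: pad at x=0, cell 0, cell 1, pad at x=3, all weights 1.
print(quadratic_place_1d(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 3.0, 1.0)]))
```

With equal weights the two cells spread evenly between the pads, which is the expected minimum of the sum of squared two-pin lengths.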
Now that the cells 120 have been coarsely partitioned in the buckets 110 in Step 45, more accurate wire loads can be calculated based on the coarse placement in Step 50, and the cell areas can be recalculated in Step 55 using Equation 15 with the new wire loads substituted therein. At this point, the cell placement will likely be somewhat unbalanced. This imbalance may take several forms:
- widely varying cell utilization percentages: for example, if the core utilization before the cell area recalculation is 90%, after recalculation some buckets 110 will have higher utilization percentages and some buckets 110 will have lower utilization percentages. This is undesirable because, for example, overutilized buckets 110 may present obstacles to wire routing or usage of pads.
- cell recalculation enlarges the size of some cells 120 so that they do not fit within their buckets 110, or so that they overlap other cells 120.
- cell recalculation results in too much wasted area, i.e., unutilized core area.
To correct these problems, an iterative procedure is used. First, the current layout is checked in Step 60 to see if it meets given utilization constraints such as core utilization percentage. If so, the placement procedure is complete and this part of the routine ends. If not, i.e., if the total area $A_{total}$ of the cells 120 does not fit in the core 100 within the given predetermined utilization constraints, the procedure returns to Step 45 where repartitioning is conducted by coarse placement based on the last-determined cell areas from Step 55, and the repartition-recalculation-checking loop is iteratively executed again based on the newly-calculated cell areas and wire loads to further converge toward an acceptable placement.
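The check of Step 60 can be sketched in Python as a utilization report. The sketch assumes that the constraint is a single upper limit on overall core utilization and that each bucket has a known placeable area; the per-bucket figures are reported only to expose the imbalance described above. All names and numbers are hypothetical.

```python
def utilization_report(bucket_cell_areas, bucket_capacity, core_limit):
    """Return (per-bucket utilization, core utilization, constraint met?)."""
    per_bucket = {
        name: sum(areas) / bucket_capacity[name]
        for name, areas in bucket_cell_areas.items()
    }
    total_cells = sum(sum(areas) for areas in bucket_cell_areas.values())
    core_utilization = total_cells / sum(bucket_capacity.values())
    return per_bucket, core_utilization, core_utilization <= core_limit


# Hypothetical state after the recalculation of Step 55.
cells = {"b0": [30.0, 30.0], "b1": [10.0, 20.0]}
capacity = {"b0": 50.0, "b1": 50.0}
print(utilization_report(cells, capacity, core_limit=0.9))
```

In this example the core as a whole meets a 90% limit even though bucket b0 is overutilized, which is the situation the later bucket balancing steps are meant to resolve.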
Additional analysis shows that it is always possible to find a floor plan where the total area of the cells 120 matches the core area. Consider a coarse placement where each cell 120 has a location $(x_i, y_i)$ and an area based on the load it drives as outlined above. From Equation 15 above, the total area of the design is
$A_{cell} = K_1 + WL$ (22)
Now, assume both x and y directions are stretched by a factor of α. The length of each wire is increased by α, and since the cell area is linearly dependent on the wire length,
$A_{scaledCell} = \alpha (K_1 + WL)$ (23)
However, by scaling the core 100 by a factor of α its area will increase quadratically:
$A_{scaledCore} = \alpha^2 A_{core}$ (24)
Since the core area increases more rapidly than the cell area as they are scaled, at some point the core area will be equal to and then exceed the cell area. This point can be found by setting the scaled core area equal to the total scaled cell area and solving for α:
$\alpha^2 A_{core} = \alpha (K_1 + WL)$ (25)
$\alpha = \frac{K_1 + WL}{A_{core}}$ (26)
This is the factor by which the core 100 must be enlarged to accommodate the total cell area.
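A short numeric check of Equations (23) to (26) in Python, using hypothetical values for the constant area term, the weighted wire length and the core area:

```python
def core_scale_factor(k1, wl, core_area):
    """Equation (26): factor at which scaled core area equals scaled cell area."""
    return (k1 + wl) / core_area


k1, wl, core_area = 100.0, 100.0, 100.0           # hypothetical values
alpha = core_scale_factor(k1, wl, core_area)
scaled_cell_area = alpha * (k1 + wl)              # Equation (23)
scaled_core_area = alpha ** 2 * core_area         # Equation (24)
print(alpha, scaled_cell_area, scaled_core_area)  # the two areas coincide
```

For any larger factor the quadratic growth of the core outpaces the linear growth of the cells, so the cells fit with room to spare.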
After a satisfactory placement has been found in Step 60, the cell area in individual buckets 110 is balanced to balance routing resource usage and area usage among all buckets 110. First, a global router assigns routes to all nets in Step 65 and an analysis of routing resources on the core 100 determines congested areas. In Step 70, cells 120 in the most congested areas are "padded" by arbitrarily increasing their areas slightly, and cells 120 in the most underutilized areas are "shrunk" by arbitrarily reducing their areas slightly. This tends to increase the rate at which cells 120 migrate from overutilized areas to underutilized areas.
Next, a bucket equalization process is applied to the cells 120 to move cells 120 from overutilized buckets 110 to underutilized ones in Step 75. This is a sort of "bucket brigade" movement in which a cell 120 moves at most from one bucket 110 to an adjacent bucket 110. For example, in a series of ten consecutively numbered buckets 110 on a horizontal path, if cells 120 need to be moved from bucket 1 to bucket 10, some cells 120 are moved from bucket 1 to bucket 2; some from bucket 2 to bucket 3, etc. As cells 120 move from one bucket 110 to another, the loads of nets attached to them change. This causes a corresponding change in the area of other cells 120 in the design, and these are corrected locally rather than through a global recalculation process. To ensure that changes to cell areas are minimized, cell movements along many different paths are examined and only the best are used.
Finally, in the pairwise refinement process of Step 80, a mincut process is applied between adjacent buckets 110 in a sweeping fashion as shown in FIG. 5. Starting from the topmost corner of core 100, each bucket 110 and its immediate neighbors to the right and bottom are repartitioned in order to reduce the number of crossing nets. One full pass of the repartitioning ends when the bottom rightmost bucket 110 is reached. At this point, the total wire length in the circuit is computed in Step 85 and the areas of all cells 120 are readjusted in Step 90. If there is an improvement in wire length, another iteration through the process is begun at Step 65; if not, the process is complete, and the circuit has been placed and routed.
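The neighbor-to-neighbor movement of the bucket equalization in Step 75 can be sketched in Python for a single row of buckets. The sketch works with total cell area per bucket rather than individual cells, equalizes every bucket to the row average (an assumption made for the example), and does not model the local area corrections or the comparison of alternative paths described above. Function names and numbers are hypothetical.

```python
def brigade_flows(loads):
    """Area that must cross each boundary between neighbouring buckets.

    flows[i] > 0 means area moves from bucket i to bucket i + 1;
    flows[i] < 0 means it moves from bucket i + 1 to bucket i.
    """
    target = sum(loads) / len(loads)
    flows, carried = [], 0.0
    for load in loads[:-1]:
        carried += load - target      # running surplus left of this boundary
        flows.append(carried)
    return flows


def apply_flows(loads, flows):
    """Apply the neighbour-to-neighbour transfers and return the new loads."""
    result = list(loads)
    for i, flow in enumerate(flows):
        result[i] -= flow
        result[i + 1] += flow
    return result


row = [10.0, 4.0, 4.0, 2.0]           # hypothetical cell area per bucket
flows = brigade_flows(row)
print(flows, apply_flows(row, flows))
```

Surplus area leaving the first bucket reaches the last one only by passing through every bucket in between, so no transfer ever skips a neighbor.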
The above description of the preferred embodiment of the present invention has been given for purposes of illustration only, and the invention is not so limited. Modifications and variations thereof will become readily apparent to those skilled in the art, and these too are within the scope of the invention. Thus, the present invention is limited only by the scope of the appended claims.