IC Design : ASIC DESIGN

ASIC Design Flow

Step 1: Prepare a Requirement Specification.

Step 2: Create a Micro-Architecture Document.

Step 3: RTL Design and Development of IPs.

Step 4: Perform functional verification of all the IPs, check that the RTL is free from linting errors, and analyze whether the RTL is synthesis friendly.

Step 4a: Perform cycle-based (functional) verification to verify the protocol behaviour of the RTL.

Step 4b: Perform property checking to verify that the RTL implementation matches the understanding of the specification.

Step 5: Prepare the design constraints file needed for synthesis: clock definitions (frequency, uncertainty, jitter), I/O delay definitions, output pad load definitions, and false/multicycle paths. This file is usually called an SDC (Synopsys Design Constraints) file, a format that originated with the Synopsys synthesis tool (Design Compiler).

Step 6: Perform synthesis for the IP. The inputs to the tool are the library file for the target technology (which holds the functional and timing information for the standard-cell library, plus wire-load models that estimate wire delay from fanout), the RTL files, and the design constraints file. The synthesis tool translates the RTL, then maps and optimizes it to meet the design constraints. As part of the synthesis flow, scan-chain connectivity is then built according to the DFT (Design for Test) requirements; the tool that builds the scan chain is Test Compiler.

Step 7: Check whether the Design is meeting the requirements (Functional/Timing/Area/Power/DFT) after synthesis.

Step 7a: Perform the Netlist-level Power Analysis, to know whether the design is meeting the power targets.

Step 7b: Perform Gate-level Simulation with the Synthesized Netlist to check whether the design is meeting the functional requirements.

Step 7c: Perform formal verification between the RTL and the synthesized netlist to confirm that the synthesis tool has not altered the functionality. (Tool: Formality)

Step 7d: Perform STA (Static Timing Analysis) with the SDF (Standard Delay Format) file and the synthesized netlist file, to check whether the design meets the timing requirements. (Tool: PrimeTime)

Step 7e: Perform scan tracing in the DFT tool, to check whether the scan chain is built according to the DFT requirement.

Step 8: Once synthesis is complete, the synthesized netlist file (VHDL/Verilog format) and the SDC constraints file are passed as inputs to the placement and routing tool to perform the back-end activities.

Step 9: The next step is floor-planning: placing the IPs based on their connectivity, placing the memories, creating the pad ring, and placing the pads (signal pads, power pads, transfer cells to switch voltage domains, and corner pads for proper accessibility for package routing). The floorplan must meet the SSN (Simultaneous Switching Noise) requirements, so that a switching high-speed bus does not create noise-related problems, and it should be optimised so that the design meets the utilization targets of the chip.

Step 9a: Release the floor-planned information to the package team, to perform the package feasibility analysis for the pad ring.

Step 9b: In the placement tool, rows are cut and blockages are created where the tool is prevented from placing cells; then the physical placement of the cells is performed based on the timing/area requirements. The power grid is built to meet the power targets of the chip.

Step 10: The next step is routing: first global routing, then detailed routing, meeting the DRC (Design Rule Check) requirements set by the fabrication process.

Step 11: After routing, the routed Verilog netlist and the standard-cell LEF/DEF files are taken to the extraction tool, which extracts the parasitic (RLC) values of the chip and generates a file in SPEF (Standard Parasitic Exchange Format). (Tool: StarRC)

Step 12: Check whether the Design is meeting the requirements (Functional/Timing/Area/Power/DFT/DRC/LVS/ERC/ESD/SI/IR-Drop) after Placement and Routing step.

Step 12a: Perform the Routed Netlist-level Power Analysis, to know whether the design has met the power targets.

Step 12b: Perform Gate-level Simulation with the routed Netlist to check whether the design is meeting the functional requirement .

Step 12c: Perform Formal-verification between RTL vs routed Netlist to confirm that the place & route Tool has not altered the functionality.

Step 12d: Perform STA(Static Timing Analysis) with the SPEF file and routed netlist file, to check whether the Design is meeting the timing-requirements.

Step 12e: Perform scan tracing in the DFT tool to check whether the scan chain is built according to the DFT requirement, run fault-coverage analysis with the DFT tool, and generate the ATPG test vectors.

Step 12f: Convert the ATPG test vectors to a tester-understandable format (WGL).

Step 12g: Perform DRC (Design Rule Check) verification, part of physical verification, to confirm that the design meets the fabrication requirements.

Step 12h: Perform the LVS (Layout vs. Schematic) check, a part of verification that converts the routed design to SPICE (call it SPICE-R), converts the synthesized netlist to SPICE (call it SPICE-S), and compares the two to confirm that they match.

Step 12i: Perform the ERC (Electrical Rule Check) to confirm that the design meets the ERC requirements.

Step 12j: Perform the ESD check, so that proper back-to-back diodes are placed and proper guarding is in place if the chip has both analog and digital portions. Separate power and ground supplies are used for the digital and analog portions to reduce substrate noise.

Step 12k: Perform a separate STA (Static Timing Analysis) run to verify the signal integrity of the chip. For this run, the routed netlist and the SPEF file (parasitics including coupling capacitance values) are fed to the STA tool. This check is important because signal-integrity effects can cause crosstalk delay and crosstalk noise, which can break the functionality or timing of the design.

Step 12l: Perform IR-drop analysis to confirm that the power grid is robust enough to withstand the static and dynamic voltage drops within the design, and that the IR drop is within the target limits.

Step 13: Once the routed design is verified against the design constraints, the next step is the chip-finishing activities (such as metal slotting and placing decoupling caps).

Step 14: The chip design is now ready to go to the fabrication unit. Release the files in a format the fab can understand: a GDS file.

Step 15: After the GDS file is released, perform the LAPO check to confirm that the database released to the fab is correct.

Step 16: Perform the Package wire-bonding, which connects the chip to the Package.

Synthesis is the process of converting RTL (synthesizable Verilog code) to a technology-specific gate-level netlist (which includes nets, sequential and combinational cells, and their connectivity).

Goals of Synthesis

1. To get a gate level netlist

2. Inserting clock gates

3. Logic optimization

4. Inserting DFT logic

5. Logic equivalence between RTL and netlist should be maintained

Input files required

1. Tech related:

· .tf- technology related information.

· .lib-timing info of standard cell & macros

2. Design related:

· .v- RTL code.

· SDC- Timing constraints.

· UPF- power intent of the design.

· Scan config- Scan related info like scan chain length, scan IO, which flops are to be considered in the scan chains.

3. For Physical aware:

· RC co-efficient file (tluplus).

· LEF/FRAM- abstract view of the cell.

· Floorplan DEF- locations of IO ports and macros.

Synthesis Interview Questions

What is Synthesis?

Explain what role the Synopsys DesignWare libraries fulfill in the synthesis process.

What is the difference between a high-level synthesis tool (as represented by Synopsys Behavioral Compiler) and a logic synthesis tool (as represented by Synopsys Design Compiler)?

Explain what it means for a Synopsys DesignWare component to be ‘inferred’ by a synthesis tool.

What are different power reduction techniques?

How do you perform Synthesis activities in Multi vt libraries?

What are the advantages of clock gating?

You are given a circuit in which one of the inputs, X, has a high toggle rate. What steps would you take to reduce the power in that circuit?

You are asked to realize a Boolean equation. The follow-up question is how to achieve efficient power usage in that circuit.

You are given a circuit and asked to set timing exception commands on a particular path.

What is the difference in PT timing analysis between pre-layout and post-layout designs?

What do you mean by FSM states?

Draw the timing waveforms for the given circuit.

What are the effects of setup time and hold time on circuit behavior in different situations?

What is the difference between the constraints files used pre-layout and post-layout?

What is SPEF? Have you used it? How can you use it?

What differences did you find (or can you find) in the netlist and the timing behavior when performing timing analysis pre-layout and post-layout?

What is clock uncertainty, clock skew and clock jitter?

What is the reason for skew and jitter?

What is clock tree synthesis?

What are the timing-related commands with respect to clocks?

In the front end, you set ideal network conditions on certain pins/clocks etc. Why? In the back end, how is this taken care of?

Which library have you used?

What differences can you find between TSMC and IBM libraries?

Draw the LSSD cell structure in TSMC and IBM libraries.

Every tool has some drawbacks. What drawbacks do you find in PrimeTime?

What differences do you find when you switch from 130nm to 90nm?

Explain the basic ASIC design flow. Where does your work start? What is your role?

What does 90nm technology mean?

What are the issues you faced in your designs?

Perform the setup and hold check for the given circuit.

Why are setup and hold required for a flop?

Did you have any timing margin between synthesis and P&R? How much should the margin be?

What are the inputs for synthesis and timing analysis from the RTL and P&R teams? Are there any inputs for changing the scripts?

How will you fix the setup and hold violation?

What are the constraints you used for the synthesis? Who decides the constraints?

What is uncertainty?

What are false paths and multicycle paths? Give examples. For the given false path example, what will you do for timing analysis?

What strategies did you use for power optimization in your recent project?

Why are max and min capacitance constraints required?

You have two different frequencies for launch (say 75 MHz) and capture (say 100 MHz). What will happen to the data? Draw the waveform. If there is a hold problem, what will you do?

What is Metastability? How to overcome metastability? If metastable condition exists which frequency you will use as clock- faster or slower? Why?

Have you used formality? For a given block what checks it will do? How it verifies inside the block?

If you changed the port names during the synthesis how will you inform Formality?

Why do you use Power Compiler? What is clock gating? What are the advantages and disadvantages of clock gating? Draw the clock gating circuit and explain it.

How will you control clock gating inference for a block of registers? Write the command for it.

Write the total power equation. What is leakage power? Write the equation for it.

For clock-gated and non-clock-gated flop outputs connected to an AND gate, what problem can you expect? How do you avoid it?

Draw the state diagram of a sequence detector that detects 10. How will you optimize it? Write the Verilog code for it.

What is jitter? What causes it? How do you account for it? What is the command for that?

What is clock latency? How do you specify it? What is the command for that?

What is dynamic timing analysis? How does it differ from static timing analysis? Which is more accurate, and why?

Give an example of dynamic timing analysis. Do you know anything about GCL simulation?

What is free running clock?

What type of operating conditions do you consider for post-layout timing analysis?

What is the one-hot encoding technique? What are its advantages? What are the types of encoding?

Which scripting languages do you know?

Constant folding

Verilog constructs in synthesis

Unsupported constructs in synthesis

Synthesis inputs

Outputs to be given to the PD team

Power optimization

Area optimization

DRC

Auto ungrouping

SPEF

DEF

How will you analyze the timing of the different modes in a design? How many modes did you have in your design? What are the clock frequencies?

What does your script contain?

Design the digital circuit for the following condition: "whenever the data changes from one to zero or zero to one, the circuit should generate a pulse one clock period long".

Have you come across any design with latches? What is the problem in timing analysis if you have a latch in your design?

Have you come across any multiple-clock design? What are the issues in multiple-clock designs?

What do you mean by synthesis strategies?

Latency,clock skew

clock constraint

ideal clock

Constraining register paths

Multiple output paths -constraints

combo path constraining

global skew,local skew

positive skew,negative skew,useful skew

Timing reports

group/ungroup

Boundary optimization

Top-down design flow vs. bottom-up design flow: when is each preferred?

DesignWare: why is it used?

Translation, mapping, optimization

min delay, max delay

Timing exceptions

Operating conditions: how to pick them for synthesis?

Latches in ASICs: problems

Logical library vs. physical library

LEF vs. DEF: how significant?

Scan synthesis: use of lock-up latches

Reset synchronizer: reset recovery and removal times

Network conditions on certain pins/clocks: why?

Pipelining as an optimization technique: Verilog used

derate factor

cross talk

Is it true that synthesis transformations take less time at the top abstraction levels?

Is it true that synthesis transformations give refined results at the top abstraction levels?

What will a well formed case statement synthesize to?

What will happen to a design that is synthesized without any constraints?

STATIC TIMING ANALYSIS

Dynamic Timing Analysis

Advantages:

1. Extends coverage of circuit simulation (edges to region).

2. Evaluates worst-case timing using both minimum and maximum delay values for components.

3. Uses the same test stimulus as logic simulation.

4. Does not report false errors.

Disadvantages:

1. It is not complete.

2. It is not path oriented.

3. It is slower than logic simulation and may require additional test stimulus.

4. It requires functional behavioral models.

Dynamic timing analysis extends logic simulation by reporting violations in terms of simulation times and states. To test circuit timing under worst-case conditions, dynamic timing analysis evaluates the circuit using minimum and maximum propagation delays for each component in the design.

Since dynamic timing analysis performs a simulation, it can use the same stimulus as a logic simulation. Because the stimulus functionally exercises the design, unused or uninteresting paths are not exercised, so no false errors are reported on them.

Note that a timing simulation reports results differently than a logic simulation.

A logic simulation reports results as edge times, whereas a timing simulation reports results as regions of ambiguity. The results of a timing simulation do not specify exactly when an event occurs; they specify a range of time in which an event can occur.
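The region of ambiguity can be illustrated with a small calculation. The following is a minimal Python sketch, not taken from any simulator: the function name arrival_window and the gate delays are hypothetical, and it assumes each stage contributes an independent (min, max) propagation delay.

```python
from typing import List, Tuple


def arrival_window(stage_delays: List[Tuple[float, float]],
                   launch_time: float = 0.0) -> Tuple[float, float]:
    """Return the (earliest, latest) arrival time after a chain of stages.

    Each stage is a (min_delay, max_delay) pair in ns.
    """
    earliest = launch_time + sum(d_min for d_min, _ in stage_delays)
    latest = launch_time + sum(d_max for _, d_max in stage_delays)
    return earliest, latest


# Hypothetical three-gate chain; the delays (in ns) are made up for illustration.
chain = [(0.5, 1.0), (0.25, 0.75), (1.0, 1.5)]
window = arrival_window(chain)
print(window)  # (1.75, 3.25)
```

The output edge is not reported at a single time; it can occur anywhere between 1.75 ns and 3.25 ns, and that interval is the region of ambiguity.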

Static Timing Analysis

Advantages:

1. It resembles manual analysis methods.

2. It is path oriented and finds all setup and hold violations.

3. It does not require stimulus or functional models.

4. It is faster than simulation (for the same amount of coverage).

Disadvantages:

1. It can report false errors.

2. It cannot detect timing errors related to logical operation.

Static timing analysis tools typically use timing models at the logic primitive level. The timing parameters are typically similar among different timing tools. The following are some of the common timing parameters for primitive logic gates, flip-flop and latch.

Timing Measurements for Primitive Gates

Transition time is the time between one specified voltage level and another voltage level for a given signal. Transition rise time is the time between a specified low voltage level and a specified high voltage level. Transition fall time is the time between a specified high voltage level and a specified low voltage level.

Setup time: the minimum time for which the data input must be held stable before the triggering edge of the clock.

Hold time: the minimum time for which the data input must be held stable after the triggering edge of the clock.

Metastability: if the setup/hold window is violated, the flip-flop can enter a metastable state in which the output does not settle to a valid logic level for an unbounded time; it may hover at an intermediate voltage or oscillate before resolving to 0 or 1.

To recover from metastability there are a number of techniques available:

One of them is the use of a 2-flip-flop or 3-flip-flop synchronizer, chosen according to the required MTBF (Mean Time Between Failures), to give the metastable signal enough time to settle at the output.

Force the flip-flop into a valid logic state so that it does not enter metastability, or wait at the output until the circuit comes out of metastability on its own.

Proper use of the mux recirculation technique and mesochronous synchronizers also reduces metastability.
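The link between synchronizer depth and MTBF can be sketched with the commonly quoted first-order model MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time available for the signal to resolve, tau is the flip-flop's resolution time constant and T_w is its metastability window. The Python below is only an illustration: the function name and every numeric value are assumptions, not library data.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order MTBF estimate in seconds (times in s, frequencies in Hz)."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical numbers for illustration only.
period = 5e-9        # 200 MHz destination clock
t_setup = 0.2e-9     # setup time of the second flip-flop
tau = 50e-12         # resolution time constant
t_window = 100e-12   # metastability window
f_clk = 200e6
f_data = 10e6        # rate of asynchronous data transitions

# Two-flop synchronizer: roughly one clock period minus setup to resolve.
mtbf_2ff = synchronizer_mtbf(period - t_setup, tau, t_window, f_clk, f_data)
# A third flop adds about one more clock period of resolution time.
mtbf_3ff = synchronizer_mtbf(2 * period - t_setup, tau, t_window, f_clk, f_data)

print(mtbf_3ff > mtbf_2ff)  # True
```

Because the resolution time sits in the exponent, each extra synchronizer stage multiplies the MTBF by about exp(period / tau), which is why a third flop is added only when the two-flop MTBF is not high enough.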

Reducing the clock frequency, or buffering long nets, can also help in reducing setup violations. In general, timing on a logic circuit is calculated on four different kinds of paths.

The paths are:-

Data path

Clock path

Clock gating path

Asynchronous path.

1) How to solve setup and hold violations in the design

To address setup time violations, you can:

Use larger/stronger cells to drive paths with high capacitance, which reduces the transition time on sluggish nets.

Adjust the clock skew at the startpoint or endpoint of the violating path so that the capture clock arrives later relative to the launch clock (useful skew).

Move gates to shorten the total distance between cells in the violating path (less capacitance to drive = faster transitions).

Insert pipeline (retiming) flops on the path, if the design allows it (perform the operation in two clock cycles instead of one).

Reduce the overall clock frequency.

For hold time violations:

Skew the clock at the startpoint/endpoint (the reverse of the setup fix) so that the endpoint clock arrives earlier.

Insert cells along the path to increase the propagation time (for example, chains of buffers).

Reduce the drive strength of cells on the path to increase the transition time.
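The effect of these fixes can be checked with the usual flop-to-flop slack equations. This is a minimal Python sketch under simplifying assumptions (single clock, ideal edges, no derating); the function names and all delay values are hypothetical.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, comb_max, t_setup, uncertainty=0.0):
    """Setup slack: data must arrive before the next capture edge minus setup."""
    arrival = launch_latency + clk_to_q + comb_max
    required = period + capture_latency - t_setup - uncertainty
    return required - arrival


def hold_slack(launch_latency, capture_latency,
               clk_to_q, comb_min, t_hold, uncertainty=0.0):
    """Hold slack: data must stay stable until the hold time after the same edge."""
    arrival = launch_latency + clk_to_q + comb_min
    required = capture_latency + t_hold + uncertainty
    return arrival - required


# Hypothetical path, all values in ns.
base_setup = setup_slack(period=4.0, launch_latency=0.0, capture_latency=0.0,
                         clk_to_q=0.5, comb_max=3.5, t_setup=0.25)
base_hold = hold_slack(launch_latency=0.0, capture_latency=0.0,
                       clk_to_q=0.5, comb_min=0.25, t_hold=0.25)

# Delay the capture clock by 0.5 ns to fix setup.
skewed_setup = setup_slack(period=4.0, launch_latency=0.0, capture_latency=0.5,
                           clk_to_q=0.5, comb_max=3.5, t_setup=0.25)
skewed_hold = hold_slack(launch_latency=0.0, capture_latency=0.5,
                         clk_to_q=0.5, comb_min=0.25, t_hold=0.25)

print(base_setup, base_hold)      # -0.25 0.5
print(skewed_setup, skewed_hold)  # 0.25 0.0
```

In this example, delaying the capture clock turns a 0.25 ns setup violation into 0.25 ns of positive slack, but it uses up the entire hold margin. This is why the hold fix is described as the reverse of the setup fix.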

Why Timing Constraints?

Timing constraints are an important part of designing ASICs or FPGAs. Generally, we want to make sure that the design is functionally correct, using verification methods, and that it will behave correctly after manufacturing, using timing analysis.

How Timing Analysis?

There are two ways to perform Timing Analysis

Dynamic Timing Analysis requires a set of input vectors to check the timing characteristics of the paths in the design. If we have N inputs then we need to make 2^N simulation combinations to get full timing analysis.

Static Timing Analysis checks timing violations without simulations. This is faster but doesn't check functionality issues.
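To make the contrast concrete, here is a minimal Python sketch of the core idea behind static timing analysis: worst-case arrival times are propagated once through the netlist graph, with no input vectors. The netlist, node names and delays are hypothetical, and real STA tools additionally handle slews, clocks and min/max analysis.

```python
from collections import defaultdict


def latest_arrivals(edges, start_times):
    """Propagate worst-case (latest) arrival times through a combinational DAG.

    edges: iterable of (source, destination, delay) timing arcs.
    start_times: arrival time at each primary input.
    """
    fanin = defaultdict(list)
    nodes = set(start_times)
    for src, dst, delay in edges:
        fanin[dst].append((src, delay))
        nodes.update((src, dst))

    arrival = {}

    def visit(node):
        # Each node is computed once; its arrival is the max over its fan-in arcs.
        if node not in arrival:
            arcs = fanin.get(node, [])
            if arcs:
                arrival[node] = max(visit(src) + delay for src, delay in arcs)
            else:
                arrival[node] = start_times.get(node, 0.0)
        return arrival[node]

    for node in nodes:
        visit(node)
    return arrival


# Hypothetical two-input netlist; delays in ns.
arcs = [("a", "g1", 1.0), ("b", "g1", 0.5),
        ("g1", "g2", 1.5), ("b", "g2", 0.25),
        ("g2", "out", 0.5)]
arrivals = latest_arrivals(arcs, {"a": 0.0, "b": 0.0})
exhaustive_vectors = 2 ** 2  # vectors an exhaustive simulation of 2 inputs would need

print(arrivals["out"], exhaustive_vectors)  # 3.0 4
```

The static pass visits each node once, whereas the number of exhaustive simulation vectors doubles with every added input.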

ASIC Design Flow: the flow is referred to as the RTL2GDSII flow, and the process of generating GDSII is termed tapeout. The ASIC digital flow is divided into logical and physical flows, i.e. the frontend and the backend.

Logical design

A- RTL Design

Specification >>System Architecture >> RTL Design>> Functional Verification

The flow starts with the high-level design specification, in which the designer specifies the area, speed and power requirements. The designer then defines the chip architecture.

RTL (Register Transfer Level) design describes the functional behavior using an HDL (hardware description language) such as VHDL, Verilog or SystemVerilog.

Functional verification checks the functionality using simulation.

B- Synthesis

Synthesis >>DFT >> Equivalence Checking >> Static Timing Analysis

Synthesis, the first step, converts the RTL to a gate-level netlist based on timing, power and area constraints.

DFT prepares the design for testability. Scan insertion is a common technique that helps to make all registers in the design controllable and observable.

Equivalence checking verifies the functionality of the gate netlist against the RTL description using formal verification techniques.

STA (static timing analysis) is a method of checking, statically and without simulation, whether the design meets its timing requirements.

The designer is responsible for specifying timing constraints that model how the design needs to be constrained, and the STA tool checks that the design meets those timing requirements.

The designer uses an industry-standard format, SDC (Synopsys Design Constraints).

STA at this stage acts as the bridge between logical and physical design.

Physical design

A-Layout

Floor Planning >>Placement >> Clock Tree Synthesis >> Routing

The flow starts with floor planning, where the logical blocks of the design are placed considering many optimization factors to account for area, speed and power. Then placement assigns the standard cells to legal locations within the floorplan. Placement is followed by clock tree synthesis, which distributes the clock and reduces clock skew between different parts of the design. Routing, which creates the physical connections between cells and blocks, is the final step to generate the layout.

During the physical design, STA may be done multiple times to perform a more accurate timing analysis.

B-Tapeout

LVS >>DRC >> Signoff STA >> GDSII release

Two steps are needed to verify the layout. LVS (Layout versus Schematic) matches the layout against the netlist generated after synthesis.

DRC (Design Rule Checking) confirms that all the rules laid out by the foundry that will fabricate the chip are adhered to. Then signoff static timing analysis is performed. Finally comes the GDSII release; fabs manufacture chips based on the GDSII.

Timing Constraints

From a timing perspective, the designer creates timing constraints for synthesis: a series of constraints applied to a given set of paths or nets that dictate the desired performance of the design. Constraints may be period, frequency, net skew, maximum delay between end points, or maximum net delay. The designer uses an industry-standard format, SDC (Synopsys Design Constraints).

Static Timing Analysis

EDA tools check setup, hold and removal constraints, clock gating constraints, maximum frequency and any other design rules. They take design netlist, timing libraries, delay information and timing constraints as Inputs to perform static timing analysis.

Static Timing Analysis is a method for determining whether a circuit meets its timing constraints without having to simulate it, so it is much faster than timing-driven gate-level simulation. STA, as well as equivalence checking, is performed at many steps in the digital design flow: after synthesis, scan insertion, placement, clock tree synthesis and routing.

Timing Paths

The paths are:-

Data path

Clock path

Clock gating path

Asynchronous path.

There are four types of timing paths:

Input to Register path

Register to Register path

Register to Output path

Input to Output path

We also can divide timing constraints into 3 categories:

Clocking Requirements

Boundaries Settings

Timing Exceptions

Clocks

A clock needs to be defined as follows:

Clock source: may be a port, a net, a pin, or virtual

Clock Period

Duty Cycle

Clock Skew, Uncertainty

Clock Latency, due to clock tree propagation

Rise & Fall time

Example on defining clocks

# clock A: 10 ns period with 70% duty cycle

create_clock -period 10 -name ClkA -waveform {0 7} [get_ports A]

# clock B: 20 ns period with 50% duty cycle and a 5 ns phase shift w.r.t. clock A

create_clock -period 20 -name ClkB -waveform {5 15} [get_ports B]
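The duty cycles quoted in the comments follow directly from the -waveform edge lists. A minimal Python sketch of that arithmetic, assuming the waveform is given as {rise fall} within one period (the helper name duty_cycle is hypothetical):

```python
def duty_cycle(period, rise_edge, fall_edge):
    """Fraction of the period the clock is high, given its first rise/fall edges."""
    high_time = (fall_edge - rise_edge) % period
    return high_time / period


print(duty_cycle(10, 0, 7))   # 0.7  (period 10, waveform {0 7})
print(duty_cycle(20, 5, 15))  # 0.5  (period 20, waveform {5 15})
```

The first rising edge also gives the phase: a waveform of {5 15} rises 5 ns after time zero, which is the 5 ns shift mentioned for the second clock.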

Port Delays

Timing constraints are applied on input and output ports. The main goal is to leave a time budget for the signal outside the block. The designer should specify the time at which inputs become available at the block and, for outputs, the time the signal needs to travel outside the block.

Input Delay

Input arrival time should be considered in timing constraints as described in the following example

# assume that T_CLKtoQ+TM = 10ns

set_input_delay -clock CLOCK -max 10 [get_ports D]

Output Delay

Output required time should be considered in timing constraints as described in the following example

# assume that TN+T_setup = 2ns

set_output_delay -clock CLOCK -max 2 [get_ports D]
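The I/O delays determine how much of the clock period is left for logic inside the block. The sketch below is a hypothetical budget calculation in Python: the 20 ns period, the internal setup time and the internal clock-to-Q delay are assumptions made for illustration, while the 10 ns and 2 ns external delays are the ones assumed in the examples above.

```python
def input_path_budget(period, input_delay, t_setup):
    """Time left for logic between an input port and the first internal flop."""
    return period - input_delay - t_setup


def output_path_budget(period, output_delay, clk_to_q):
    """Time left for logic between the last internal flop and an output port."""
    return period - output_delay - clk_to_q


period = 20.0  # assumed clock period in ns (not given in the text)
in_budget = input_path_budget(period, input_delay=10.0, t_setup=0.5)
out_budget = output_path_budget(period, output_delay=2.0, clk_to_q=0.5)

print(in_budget, out_budget)  # 9.5 17.5
```

A larger external delay leaves a smaller internal budget, so over-constraining the ports makes the block harder to close, and under-constraining them hides violations until top-level integration.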

Combinational Paths

Sometimes there are input-to-output paths which are completely combinational. In such cases the designer uses the set_max_delay and set_min_delay constraints.

False Paths

The first timing exception is the false path, where changes at the source (inputs or registers) do not affect the destination, so the path is excluded from timing analysis.

Changes on {a} and {b} do not affect {c_d}. Declaring this path a false path tells the synthesis tool not to optimize it and tells the STA tool to ignore any violations on it.

set_false_path -from [get_ports {a b}] -to [get_ports {c_d}]

Multi Cycle Paths

The second timing exception is for multi-cycle paths. Sometimes a designer needs to allow additional clock cycles before the data is captured. Because such a path is given more than one cycle, it does not limit the system frequency, so we declare another timing exception.

The slow logic path is relaxed and given 2 cycles to reach the output. The setup check at {FF5} for data launched from {FF4} moves out to the second capture edge, and the hold multiplier of 1 moves the hold check back so that it stays at the launch edge.

set_multicycle_path -setup 2 -from [get_cells FF4] -to [get_cells FF5]

set_multicycle_path -hold 1 -from [get_cells FF4] -to [get_cells FF5]
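The edges used for the setup and hold checks can be worked out from the two multipliers. This is a minimal Python sketch for the single-clock case with the launch edge at time 0; the helper name and the 10 ns period are assumptions, and it does not cover paths between clocks of different frequencies.

```python
def multicycle_check_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) capture times for a launch at time 0.

    The setup multiplier moves the setup check to the Nth capture edge.
    By default the hold check sits one cycle before that edge; the hold
    multiplier moves it back by that many additional cycles.
    """
    setup_edge = setup_mult * period
    hold_edge = (setup_mult - 1 - hold_mult) * period
    return setup_edge, hold_edge


print(multicycle_check_edges(10))                             # (10, 0)
print(multicycle_check_edges(10, setup_mult=2))               # (20, 10)
print(multicycle_check_edges(10, setup_mult=2, hold_mult=1))  # (20, 0)
```

With only -setup 2 the hold check would land at 10 ns and force the data path to be at least one cycle long, which is why the -hold 1 exception usually accompanies it.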

If hold violation exists in design, is it OK to sign off design? If not, why?

How are timing constraints developed?

Explain timing closure flow/methodology/issues/fixes.

Explain SDF (Standard Delay Format) back annotation/ SPEF (Standard Parasitic Exchange Format) timing correlation flow.

Given a timing path in multi-mode multi-corner, how is STA (Static Timing Analysis) performed in order to meet timing in both modes and corners, how are PVT (Process-Voltage-Temperature)/derate factors decided and set in the Primetime flow?

With respect to clock gate, what are various issues you faced at various stages in the physical design flow?

What are synthesis strategies to optimize timing?

Explain ECO (Engineering Change Order) implementation flow. Given post routed database and functional fixes, how will you take it to implement ECO (Engineering Change Order) and what physical and functional checks you need to perform?

In building the timing constraints, do you need to constrain all IO (Input-Output) ports?

Can a single port be multi-clocked? How do you set delays for such ports?

How is scan DEF (Design Exchange Format) generated?

What is purpose of lockup latch in scan chain?

Explain short circuit current.

Short Circuit Power

What are pros/cons of using low Vt, high Vt cells?

Multi Threshold Voltage Technique

Issues With Multi Height Cell Placement in Multi Vt Flow

How do you set inter clock uncertainty?

set_clock_uncertainty <value> -from clock1 -to clock2

In DC (Design Compiler), how do you constrain clocks, IO (Input-Output) ports, maxcap, max tran?

What are differences in clock constraints from pre CTS (Clock Tree Synthesis) to post CTS (Clock Tree Synthesis)?

The clock uncertainty values differ, and clocks are propagated post CTS.

Post CTS, the clock uncertainty is reduced so that it models mainly clock jitter (plus margin), because skew is now captured by the propagated clock network.

How is clock gating done?

What constraints you add in CTS (Clock Tree Synthesis) for clock gates?

Make the clock gating cells as through pins.

What is trade off between dynamic power (current) and leakage power (current)?

Leakage Power Trends

Dynamic Power

How do you reduce standby (leakage) power?

Low Power Design Techniques

Explain top level pin placement flow? What are parameters to decide?

Given block-level netlists, timing constraints, libraries and macro LEFs (Library Exchange Format), how will you start floor planning?

With a net length of 1000um, how will you compute RC values using equations/tech file info?

What do noise reports represent?

What do glitch reports contain?

What are the CTS (Clock Tree Synthesis) steps in IC Compiler?

What does a clock constraints file contain?

How do you analyze clock tree reports?

What do IR drop VoltageStorm reports represent?

Where /when do you use DCAP (Decoupling Capacitor) cells?

What are various power reduction techniques?

What is setup/hold? What are setup and hold time impacts on timing? How will you fix setup and hold violations?

Explain the function of a muxed FF (Multiplexed Flip-Flop) / scan FF (Scan Flip-Flop).

What is tested in DFT (Design for Testability)?

In equivalence checking, how do you handle the scanen signal?

In terms of CMOS (Complementary Metal Oxide Semiconductor), explain the physical parameters that affect the propagation delay.

What are the power dissipation components? How do you reduce them?

How is delay affected by PVT (Process-Voltage-Temperature)?

Process-Voltage-Temperature (PVT) Variations and Static Timing Analysis (STA)

Why is power signal routed in top metal layers?

How do you minimize clock skew/ balance clock tree?

Given 11 minterms and asked to derive the logic function.

Given C1= 10pf, C2=1pf connected in series with a switch in between, at t=0 switch is open and one end having 5v and other end zero voltage; compute the voltage across C2 when the switch is closed?

Explain the modes of operation of a CMOS (Complementary Metal Oxide Semiconductor) inverter. Show the IO (Input-Output) characteristics curve.

Implement a ring oscillator.

How to slow down ring oscillator?

How do you optimize power at various stages in the physical design flow?

What timing optimization strategies you employ in pre-layout /post-layout stages?

What are process technology challenges in physical design?

Design divide by 2, divide by 3, and divide by 1.5 counters. Draw timing diagrams.

What are multi-cycle paths, false paths? How to resolve multi-cycle and false paths?

Given a flop to flop path with combo delay in between and output of the second flop fed back to combo logic. Which path is fastest path to have hold violation and how will you resolve?

What RTL (Register Transfer Level) coding styles should be adopted to yield an optimal backend design?

Draw timing diagrams to represent the propagation delay, set up, hold, recovery, removal, minimum pulse width.

Cell delay calculation, net delay calculation

Output transition calculation

Uncertainty

False path, MCP

Latency, skew, jitter

How to minimize jitter

Min delay, max delay

Input delay, output delay

Best case, worst case

OCV, AOCV

Exceptions

disable timing

Timing arcs

Derate

CRPR

Clock gating

Multi frequency clocks

Static verification flow

inputs to STA

Slack analysis

Setup, hold analysis

stages of STA

Timing reports

Data arrival, data required

setup condition

Hold Condition

clock latency

source latency

network latency

path from slow to fast clock

path from fast to slow clock

Half cycle path

sdc constraints

variation sources

Global process variations

Local variations: how are they handled in STA?

Glitch detection.

What are the inputs you get for Block level Physical Design?

Netlist (.v /.vhd)

Timing Libraries (.lib/.db)

Library Exchange Format (LEF)

Technology files (.tf/.tech.lef)

Constraints (SDC)

Power Specification File

Clock Tree Constraints

Optimization requirements

IO Ports file

Floorplan file

What are the different checks you do on the input netlist?

Floating Pins

Unconstrained pins

Undriven input ports

Unloaded output ports

Pin direction mismatches

Multiple Drivers

Zero wire load Timing checks

Issues with respect to the Library file, Timing Constraints, IOs and Optimization requirements.

How to do macro Placement in a block

Analyse the fly-line for connectivity between Macros to Macros and between the Macros to IO ports.

Group and Place the same hierarchy Macros together.

Calculate/Estimate the Channel length required between Macros.

Avoid odd shapes

Place macros around the block periphery, so that core area will have common logic.

Keep enough room around Macros for IO routing.

Give necessary blockages around the Macros like Halo around the macros.

What are the issues you see if floorplan is bad.

Congestion near Macro corners due to insufficient placement blockage.

Standard cell placement in narrow channels leads to congestion.

Macros of same partition which are placed far apart can cause timing violation.

What are different optimization techniques?

Cell Sizing: Size up or down to meet timing/area.

Vt Swapping

Cloning: fanout reduction

Buffering: Buffers are added in the middle of long net paths to reduce the delay.

Logical restructuring: Breaking complex cells to simpler cells or vice versa

Pin swapping

What are the inputs for the CTS.

CTS SDC

Max Skew

Max and Min Insertion Delay

Max Transition, Capacitance, Fanout

No of Buffer levels

Buffer/Inverter list

Clock Tree Routing Metal Layers

Clock tree Root pin, Leaf Pin, Preserve pin, through pin and exclude pin

What is Metal Fill

Metal Density Rule helps to avoid Over Etching or Metal Erosion.

Fill the empty metal tracks with metal shapes to meet the metal density rules.

There are two types of Metal Fill

Floating Metal Fill: Does not completely shield the aggressor nets, so SI will be there.

Grounded Metal Fill: Completely shield the aggressor nets, less SI

Why the Metal Fill is required

If there is lot of gap between the routed metal layers (empty tracks), during the process of Etching the etching material used will fall more in this gap due to which Over Etching of existing metal occurs which may create opens. So in order to have uniform Metal Density across the chip, Dummy Metal is added in these empty tracks.

What are the reasons for routing congestion

Inefficient floorplan

Macro placement or macro channels are not proper.

Placement blockages not given

No Macro to Macro channel space given.

High cell density

High local utilization

A high number of complex cells with high pin counts, such as AOI/OAI cells, are placed together.

Placement of std cells near macros

Logic optimization is not properly done.

Pin density is more on edge of block

Too many buffers added during optimization

IO ports are crisscrossed; they need to be properly aligned in order.

What are the different methods to reduce congestion.

Review the floorplan/macro placements according to the block size and port placement.

Add proper placement blockages in channels and around the macro boundaries.

Reduce the local density using the percentage utilization/density screens.

Cell padding is applied for high pin density cells, like AOI/OAI.

Check and reorder scan chain if needed.

Run the congestion driven placement with high effort.

Check that the power network is proper and on the routing track. If it is not on track, adjacent routing tracks may not be usable, which can lead to congestion.

Why power stripes routed in the top metal layers?

The top metal layers have lower resistance, so less IR drop is seen in the power distribution network. If power stripes are routed in the lower metal layers, they use a good amount of the lower routing resources and can therefore create routing congestion.

Why do you use alternate routing approach HVH/VHV (Horizontal-Vertical-Horizontal/ Vertical-Horizontal-Vertical)?

This approach allows routability of the design and better usage of routing resources.

What are several factors to improve propagation delay of standard cell?

Improve the input transition to the cell under consideration by up sizing the driver.

Reduce the load seen by the cell under consideration, either by placement refinement or buffering.

If allowed increase the drive strength or replace with LVT (low threshold voltage) cell.

How do you compute net delay (interconnect delay) / decode RC values present in tech file?

What are various ways of timing optimization in synthesis tools?

Logic optimization: buffer sizing, cell sizing, level adjustment, dummy buffering etc.

Fewer logic levels between flip-flops speed up the design.

Optimize drive strength of the cell , so it is capable of driving more load and hence reducing the cell delay.

Better selection of DesignWare components (select timing-optimized DesignWare components).

Use LVT (Low threshold voltage) and SVT (standard threshold voltage) cells if allowed.

What would you do in order to not use certain cells from the library?

Set the dont_use attribute on those library cells (for example, with set_dont_use in Design Compiler).

How delays are characterized using WLM (Wire Load Model)?

For a given wire load model, the delay is estimated based on the fanout of the cell driving the net.

Fanout vs net length is tabulated in WLMs.

Values of unit resistance R and unit capacitance C are given in technology file.

Net length varies based on the fanout number.

Once the net length is known delay can be calculated; Sometimes it is again tabulated.
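The wire load model steps above can be sketched in a few lines of Python. This is only an illustration: the fanout-to-length table, the unit R and C values, the extrapolation slope and the function names are all assumed for the example, not taken from any real library or tech file.

```python
# Hypothetical wire load model: fanout -> estimated net length in um.
FANOUT_TO_LENGTH_UM = {1: 10.0, 2: 25.0, 3: 45.0, 4: 70.0}
SLOPE_UM_PER_FANOUT = 30.0   # assumed extrapolation slope beyond the table
R_PER_UM_OHM = 0.5           # assumed unit resistance (ohm per um)
C_PER_UM_PF = 0.0002         # assumed unit capacitance (pF per um)


def wlm_net_length_um(fanout):
    """Look up the estimated net length for a fanout, extrapolating past the table."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout in FANOUT_TO_LENGTH_UM:
        return FANOUT_TO_LENGTH_UM[fanout]
    max_fanout = max(FANOUT_TO_LENGTH_UM)
    extra = (fanout - max_fanout) * SLOPE_UM_PER_FANOUT
    return FANOUT_TO_LENGTH_UM[max_fanout] + extra


def wlm_net_delay_ps(fanout, pin_cap_pf):
    """Lumped RC estimate: R_net * (C_net + total pin cap). ohm * pF gives ps."""
    length = wlm_net_length_um(fanout)
    r_net = R_PER_UM_OHM * length
    c_net = C_PER_UM_PF * length
    return r_net * (c_net + fanout * pin_cap_pf)


print(wlm_net_length_um(6))
print(wlm_net_delay_ps(2, 0.002))
```

A lumped RC product is the simplest possible estimate; real tools may tabulate the delay directly, as the text notes.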

What are various techniques to resolve congestion/noise?

Routing and placement congestion all depend upon the connectivity in the netlist , a better floor plan can reduce the congestion.

Noise can be reduced by optimizing the overlap of nets in the design.

Let’s say there are enough routing resources available and timing is fine. Can you increase clock buffers in the clock network? If so, will there be any impact on other parameters?

No. You should not increase clock buffers in the clock network. More clock buffers mean more area and more power. When everything is fine, why would you want to touch the clock tree?

How do you optimize skew/insertion delays in CTS (Clock Tree Synthesis)?

Better skew targets and insertion delay values provided while building the clocks.

Choose appropriate tree structure – either based on clock buffers or clock inverters or mix of clock buffers or clock inverters.

For multi clock domain, group the clocks while building the clock tree so that skew is balanced across the clocks. (Inter clock skew analysis).

What are pros/cons of latch/FF (Flip Flop)?

How do you go about fixing timing violations for latch-to-latch paths?

As an engineer, let’s say your manager comes to you and asks for next project die size estimation/projection, giving data on RTL size, performance requirements.

How do you go about figuring out the die size, considering physical aspects?

How will you design inserting voltage island scheme between macro pins crossing core and are at different power wells? What is the optimal resource solution?

What are various formal verification issues you faced and how did you resolve?

How do you calculate maximum frequency given setup, hold, clock and clock skew?

What are effects of metastability?

Metastability

Consider a timing path crossing from fast clock domain to slow clock domain. How do you design synchronizer circuit without knowing the source clock frequency?

How to solve cross clock timing path?

How to determine the depth of FIFO/ size of the FIFO?

FIFO Depth

What are the challenges you faced in place and route, FV (Formal Verification), ECO (Engineering Change Order) areas?

How long is the design cycle for your designs?

Which parts of physical design are your areas of interest?

Explain ECO (Engineering Change Order) methodology.

Explain CTS (Clock Tree Synthesis) flow.

What kind of routing issues have you faced?

How is STA (Static Timing Analysis) done in OCV (On Chip Variation) conditions? How do you set OCV (On Chip Variation) in IC Compiler? How is timing correlation done before and after place and route?

Process-Voltage-Temperature (PVT) Variations and Static Timing Analysis (STA)

If there are too many pins of the logic cells in one place within core, what kind of issues would you face and how will you resolve?

Define hash/ @array in perl.

Using TCL (Tool Command Language, Tickle) how do you set variables?

What is ICC (IC Compiler) command for setting derate factor/ command to perform physical synthesis?

What are nanoroute options for search and repair?

What were your design skew/insertion delay targets?

How is IR drop analysis done? What are various statistics available in reports?

Explain pin density/ cell density issues, hotspots?

How will you relate routing grid with manufacturing grid and judge if the routing grid is set correctly?

What is the command for setting multi cycle path?

If hold violation exists in design, is it OK to sign off design? If not, why?

How are timing constraints developed?

Explain timing closure flow/methodology/issues/fixes.

Explain SDF (Standard Delay Format) back annotation/ SPEF (Standard Parasitic Exchange Format) timing correlation flow.

With respect to clock gate, what are various issues you faced at various stages in the physical design flow?

What are synthesis strategies to optimize timing?

In building the timing constraints, do you need to constrain all IO (Input-Output) ports?

Can a single port have multi-clocked? How do you set delays for such ports?

How is scan DEF (Design Exchange Format) generated?

What is purpose of lockup latch in scan chain?

Explain short circuit current.

Short Circuit Power

What are pros/cons of using low Vt, high Vt cells?

Multi Threshold Voltage Technique

Issues With Multi Height Cell Placement in Multi Vt Flow

How do you set inter clock uncertainty?

set_clock_uncertainty <uncertainty_value> -from clock1 -to clock2

In DC (Design Compiler), how do you constrain clocks, IO (Input-Output) ports, maxcap, max tran?

What are differences in clock constraints from pre CTS (Clock Tree Synthesis) to post CTS (Clock Tree Synthesis)?

The clock uncertainty values differ; clocks are ideal pre CTS and propagated post CTS.

Post CTS, the clock latency constraint is modified to model clock jitter.

How is clock gating done?

What constraints you add in CTS (Clock Tree Synthesis) for clock gates?

Make the clock gating cells as through pins.

What is trade off between dynamic power (current) and leakage power (current)?

Leakage Power Trends

Dynamic Power

How do you reduce standby (leakage) power?

Low Power Design Techniques

Explain top level pin placement flow? What are parameters to decide?

Given block level netlists, timing constraints, libraries, macro LEFs (Layout Exchange Format/Library Exchange Format), how will you start floor planning?

With net length of 1000um how will you compute RC values, using equations/tech file info?

What do noise reports represent?

What do glitch reports contain?

What are CTS (Clock Tree Synthesis) steps in IC compiler?

What does the clock constraints file contain?

How to analyze clock tree reports?

What do IR drop Voltagestorm reports represent?

Where /when do you use DCAP (Decoupling Capacitor) cells?

What are various power reduction techniques?

Low Power Design Techniques


1. You might want to include things like what are the different ways of fixing antenna violations
-- there are about 4 methods

2. Why non-leaf clock nets are routed on top-most layers

3. Does jitter affect setup/hold paths?

BACKEND DESIGN INTERVIEWS

* What is signal integrity? How does it affect timing?
* What is IR drop? How do you avoid it? How does it affect timing?
* What is EM and what are its effects?
* What is floor plan and power plan?
* What are types of routing?
* What is a grid? Why do we need it, and what are the different types of grids?
* What is core and how will you decide the w/h ratio for the core?
* What is effective utilization and chip utilization?
* What is latency? Give the types?
* What is LEF?
* What is DEF?
* What are the steps involved in designing an optimal pad ring?
* What are the steps that you have done in the design flow?
* What are the issues in floor plan?
* How can you estimate area of block?
* How much aspect ratio should be kept (or have you kept) and what is the utilization?
* How to calculate core ring and stripe widths?
* What if hot spot found in some area of block? How you tackle this?
* After adding stripes also if you have hot spot what to do?
* What is threshold voltage? How does it affect timing?
* What are the contents of lib, lef, sdc?
* What is meant by 9 track, 12 track standard cells?
* What is scan chain? What if scan chain not detached and reordered? Is it compulsory?
* What is setup and hold? Why are they there? What if setup and hold are violated?
* In a circuit, for a reg to reg path, Tclktoq is 50 ps, Tcombo 50 ps, Tsetup 50 ps, Tskew is 100 ps. Then what is the maximum operating frequency?
* How do R and C values affect timing?
* How are ohm (R) and farad (C) related to second (T)?
* What is transition? What if transition time is more?
* What is difference between normal buffer and clock buffer?
* What is antenna effect? How it is avoided?
* What is ESD?
* What is cross talk? How can you avoid?
* How double spacing will avoid cross talk?
* What is difference between HFN synthesis and CTS?
* What is hold problem? How can you avoid it?
* For an iteration we have 0.5ns of insertion delay and 0.1 skew and for other iteration 0.29ns insertion delay and 0.25 skew for the same circuit then which one you will select? Why?
* What is partial floor plan?
* What parameters (or aspects) differentiate Chip Design & Block level design?
* How do you place macros in a full chip design?
* Differentiate between a Hierarchical Design and flat design?
* Which is more complicated when you have a 48 MHz and 500 MHz clock design?
* Name few tools which you used for physical verification?
* What input files will you give for PrimeTime correlation?
* What are the algorithms used while routing? Will it optimize wire length?
* How will you decide the Pin location in block level design?
* If the routing congestion exists between two macros, then what will you do?
* How will you place the macros?
* How will you decide the die size?
* If a lengthy metal layer is connected to diffusion and poly, then which one will be affected by the antenna problem?
* If the full chip design is routed by 7 layer metal, why macros are designed using 5LM instead of using 7LM?
* In your project what is die size, number of metal layers, technology, foundry, number of clocks?
* How many macros in your design?
* What is each macro size and no. of standard cell count?
* How did you handle the clock in your design?
* What are the input needs for your design?
* What does the SDC constraint file contain?
* How did you do power planning?
* How to find total chip power?
* How to calculate core ring width, macro ring width and strap or trunk width?
* How to find number of power pad and IO power pads?
* What are the problems faced related to timing?
* How did you resolve the setup and hold problem?
* If 10000 or more problems come up in your design, then what will you do?
* Which layer do you prefer for clock routing and why?
* If your design has a reset pin, then will it affect the input pin or output pin or both?
* During power analysis, if you are facing an IR drop problem, then how did you avoid it?
* Define the antenna problem. How did you resolve it?
* How delays vary with different PVT conditions? Show the graph.
* Explain the flow of physical design and inputs and outputs for each step in flow.
* What is cell delay and net delay?
* What are delay models and what is the difference between them?
* What is wire load model?
* What do SDC constraints contain?
* Why higher metal layers are preferred for Vdd and Vss?
* What is logic optimization and give some methods of logic optimization.
* What is the significance of negative slack?
* How the width of metal and number of straps calculated for power and ground?
* What is negative slack ? How it affects timing?
* What is track assignment?
* What is gridded and gridless routing?
* What is a macro and standard cell?
* What is congestion?
* Whether congestion is related to placement or routing?
* What are clock trees?
* What are clock tree types?
* Which layer is used for clock routing and why?
* What is cloning and buffering?
* What are placement blockages?
* How do slow and fast transitions at the inputs affect timing for gates?
* What is antenna effect?
* What are DFM issues?
* What is .lib, LEF, DEF, .tf?
* What is the difference between synthesis and simulation?
* What is metal density, metal slotting rule?
* What is OPC, PSM?
* Why clock is not synthesized in DC?
* What are high-Vt and low-Vt cells?
* What do corner cells contain?
* What is the difference between core filler cells and metal fillers?
* How to decide number of pads in chip level design?
* What are tie-high and tie-low cells and where are they used?

Physical Design Questions and Answers

What parameters (or aspects) differentiate Chip Design and Block level design?

Chip design has I/O pads; block design has pins.

Chip design uses all metal layers available; block design may not use all metal layers.

Chip is generally rectangular in shape; blocks can be rectangular, rectilinear.

Chip design requires packaging; block design ends in a macro.

How do you place macros in a full chip design?

First check flylines, i.e. check net connections from macro to macro and macro to standard cells.

If there are more connections from macro to macro, place those macros nearer to each other, preferably nearer to the core boundaries.

If an input pin is connected to a macro, it is better to place the macro nearer to that pin or pad.

If a macro has more connections to standard cells, spread the macros inside the core.

Avoid crisscross placement of macros.

Use soft or hard blockages to guide placement engine.

Differentiate between a Hierarchical Design and flat design?

Hierarchical design has blocks and subblocks in a hierarchy; flattened design has no subblocks and has only leaf cells.

Hierarchical design takes more run time; Flattened design takes less run time.

Which is more complicated when you have a 48 MHz and 500 MHz clock design?

500 MHz, because it is more constrained (i.e. it has a shorter clock period) than the 48 MHz design.

Name few tools which you used for physical verification?

Hercules from Synopsys, Calibre from Mentor Graphics.

What input files will you give for PrimeTime correlation?

Netlist, Technology library, Constraints, SPEF or SDF file.

If the routing congestion exists between two macros, then what will you do?

Provide soft or hard blockage

How will you decide the die size?

By checking the total area of the design you can decide die size.
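As a rough sketch of how total area turns into a die size, the snippet below scales the cell and macro area by a target utilization and then applies an aspect ratio and an IO margin. The utilization, aspect ratio, margin and all numbers are assumptions for illustration; real estimates also account for power routing, pad ring and package limits.

```python
import math


def estimate_die_size_um(std_cell_area_um2, macro_area_um2, utilization,
                         aspect_ratio=1.0, io_margin_um=0.0):
    """Return (die_width, die_height) in um.

    utilization is (cell + macro area) / core area; aspect_ratio is
    core height / core width; io_margin_um is added on every side.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = (std_cell_area_um2 + macro_area_um2) / utilization
    core_height = math.sqrt(core_area * aspect_ratio)
    core_width = core_area / core_height
    return core_width + 2 * io_margin_um, core_height + 2 * io_margin_um


# Hypothetical block: 0.5 mm^2 of cells, 0.22 mm^2 of macros, 72% utilization.
width, height = estimate_die_size_um(500000.0, 220000.0, 0.72, io_margin_um=50.0)
print(width, height)
```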

If a lengthy metal layer is connected to diffusion and poly, then which one will be affected by the antenna problem?

Poly

If the full chip design is routed by 7 layer metal, why macros are designed using 5LM instead of using 7LM?

Because top two metal layers are required for global routing in chip design. If top metal layers are also used in block level it will create routing blockage.

In your project what is die size, number of metal layers, technology, foundry, number of clocks?

Die size: tell in mm, e.g. 1 mm x 1 mm; remember 1 mm = 1000 micron, which is a big size!

Metal layers: See your tech file. generally for 90nm it is 7 to 9.

Technology: Again look into tech files.

Foundry:Again look into tech files; eg. TSMC, IBM, ARTISAN etc

Clocks: Look into your design and SDC file !

How many macros in your design?

You know it well as you have designed it ! A SoC (System On Chip) design may have 100 macros also !!!!

What is each macro size and number of standard cell count?

Depends on your design.

What are the input needs for your design?

For synthesis: RTL, Technology library, Standard cell library, Constraints

For Physical design: Netlist, Technology library, Constraints, Standard cell library

What does the SDC constraint file contain?

Clock definitions

Timing exception-multicycle path, false path

Input and Output delays

How did you do power planning? How to calculate core ring width, macro ring width and strap or trunk width? How to find number of power pad and IO power pads? How the width of metal and number of straps calculated for power and ground?

Get the total core power consumption and convert it to total current (power divided by supply voltage); get the metal layer current density value from the tech file; divide the total current by the number of sides of the chip; divide the obtained value by the current density to get the core power ring width. Then calculate the number of straps using some more equations. This will be explained in detail later.
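The ring width calculation above can be sketched as follows. The conversion from power to current uses the supply voltage, and the function name, units and example numbers are assumptions for illustration only; the current density limit must come from the actual tech file.

```python
def core_ring_width_um(total_power_w, vdd_v, num_sides, j_max_ma_per_um):
    """Estimate the core power ring width in um.

    total_power_w: total core power in watts
    vdd_v: supply voltage in volts
    num_sides: number of chip sides feeding the ring (typically 4)
    j_max_ma_per_um: allowed current per um of metal width (from the tech file)
    """
    total_current_ma = total_power_w / vdd_v * 1000.0
    current_per_side_ma = total_current_ma / num_sides
    return current_per_side_ma / j_max_ma_per_um


# Hypothetical numbers: 1.2 W core at 1.2 V, 4 sides, 2 mA/um limit.
print(core_ring_width_um(1.2, 1.2, 4, 2.0))
```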

How to find total chip power?

Total chip power = standard cell power consumption + macro power consumption + pad power consumption.

What are the problems faced related to timing?

Prelayout: Setup, Max transition, max capacitance

Post layout: Hold

How did you resolve the setup and hold problem?

Setup: upsize the cells

Hold: insert buffers

In which layer do you prefer for clock routing and why?

Next lower layer to the top two metal layers(global routing layers). Because it has less resistance hence less RC delay.

If your design has a reset pin, then will it affect the input pin or output pin or both?

Output pin.

During power analysis, if you are facing IR drop problem, then how did you avoid?

Increase power metal layer width.

Go for higher metal layer.

Spread macros or standard cells.

Provide more straps.

Define the antenna problem. How did you resolve it?

Long nets can accumulate charge during manufacturing of the device because of the ionisation processes used. If such a net is connected to the gate of a MOSFET, the charge can damage the dielectric property of the gate so that the gate conducts, damaging the MOSFET. This is the antenna problem.

Decrease the length of the net by providing more vias and layer jumping.

Insert antenna diode.

How delays vary with different PVT conditions? Show the graph.

P increase->delay increase

P decrease->delay decrease

V increase->delay decrease

V decrease->delay increase

T increase->delay increase

T decrease->delay decrease

Explain the flow of physical design and inputs and outputs for each step in flow.


What is cell delay and net delay?

Gate delay

Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output.[Magma]

Gate delay =function of(i/p transition time, Cnet+Cpin).

Cell delay is also same as Gate delay.

Cell delay

For any gate it is measured between 50% of input transition to the corresponding 50% of output transition.

Intrinsic delay

Intrinsic delay is the delay internal to the gate. Input pin of the cell to output pin of the cell.

It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is predominantly caused by the internal capacitance associated with its transistors.

This delay is largely independent of the size of the transistors forming the gate, because increasing the size of the transistors also increases the internal capacitances.

Net Delay (or wire delay)

The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.

It is due to the finite resistance and capacitance of the net. It is also known as wire delay.

Wire delay =fn(Rnet , Cnet+Cpin)
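One common first-order way to evaluate a function of Rnet, Cnet and Cpin is the Elmore delay of a segmented RC line. The text does not name a specific model, so the sketch below is an assumed choice, with hypothetical function and parameter names.

```python
def elmore_wire_delay_ps(r_net_ohm, c_net_pf, c_pin_pf, segments=10):
    """Elmore delay of a wire split into equal RC segments driving one pin.

    Each segment is a series resistor followed by a capacitor to ground.
    Every resistor is multiplied by the capacitance downstream of it.
    ohm * pF gives ps.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    r_seg = r_net_ohm / segments
    c_seg = c_net_pf / segments
    delay = 0.0
    for i in range(segments):
        # Wire capacitance after resistor i, plus the receiving pin.
        downstream_cap = c_seg * (segments - i) + c_pin_pf
        delay += r_seg * downstream_cap
    return delay


# One segment reduces to the lumped estimate Rnet * (Cnet + Cpin).
print(elmore_wire_delay_ps(100.0, 0.2, 0.01, segments=1))
print(elmore_wire_delay_ps(100.0, 0.2, 0.01, segments=10))
```

With more segments the wire term approaches Rnet * Cnet / 2, which is why a lumped model is pessimistic for long nets.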

What are delay models and what is the difference between them?

Linear Delay Model (LDM)

Non Linear Delay Model (NLDM)
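To make the NLDM idea concrete: the cell delay is stored as a table indexed by input transition and output load, and values between table points are interpolated. The sketch below uses bilinear interpolation on a made-up 3x3 table; the axes, delay values and function names are assumptions, not data from a real .lib file.

```python
from bisect import bisect_right

# Hypothetical NLDM table: rows are input slew (ns), columns are load (pF).
SLEW_AXIS_NS = [0.01, 0.05, 0.20]
LOAD_AXIS_PF = [0.001, 0.010, 0.050]
DELAY_NS = [
    [0.020, 0.035, 0.100],
    [0.030, 0.045, 0.110],
    [0.060, 0.080, 0.150],
]


def _bracket(axis, x):
    """Index i of the axis interval [axis[i], axis[i+1]] used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_delay_ns(slew_ns, load_pf):
    """Bilinear interpolation in the table; points outside it are extrapolated."""
    i = _bracket(SLEW_AXIS_NS, slew_ns)
    j = _bracket(LOAD_AXIS_PF, load_pf)
    tx = (slew_ns - SLEW_AXIS_NS[i]) / (SLEW_AXIS_NS[i + 1] - SLEW_AXIS_NS[i])
    ty = (load_pf - LOAD_AXIS_PF[j]) / (LOAD_AXIS_PF[j + 1] - LOAD_AXIS_PF[j])
    d00, d01 = DELAY_NS[i][j], DELAY_NS[i][j + 1]
    d10, d11 = DELAY_NS[i + 1][j], DELAY_NS[i + 1][j + 1]
    return ((1 - tx) * (1 - ty) * d00 + (1 - tx) * ty * d01
            + tx * (1 - ty) * d10 + tx * ty * d11)


print(nldm_delay_ns(0.03, 0.0055))
```

A linear delay model would replace the table with a single equation in slew and load, which is simpler but less accurate at small geometries.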

What is wire load model?

A wire load model is a statistical model that estimates the R and C of a net from its fanout before layout. It is not a cell delay model like NLDM.

Why higher metal layers are preferred for Vdd and Vss?

Because it has less resistance and hence leads to less IR drop.

What is logic optimization and give some methods of logic optimization.

Upsizing

Downsizing

Buffer insertion

Buffer relocation

Dummy buffer placement

What is the significance of negative slack?

Negative slack ==> there is a setup violation ==> the design can fail.
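A minimal sketch of how slack is usually computed for a flop to flop path, assuming the standard definitions (slack is required time minus arrival time for setup, and arrival minus required for hold). Parameter names and the example values are hypothetical; all times are in ps.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, comb_max, t_setup, uncertainty=0):
    """Setup slack = data required time - data arrival time."""
    arrival = launch_latency + clk_to_q + comb_max
    required = period + capture_latency - t_setup - uncertainty
    return required - arrival


def hold_slack(launch_latency, capture_latency,
               clk_to_q, comb_min, t_hold, uncertainty=0):
    """Hold slack = data arrival time - data required time (same clock edge)."""
    arrival = launch_latency + clk_to_q + comb_min
    required = capture_latency + t_hold + uncertainty
    return arrival - required


# Hypothetical path: negative setup slack at a 100 ps period.
print(setup_slack(100, 0, 0, 50, 50, 50))    # negative: setup violation
print(hold_slack(0, 100, 50, 20, 30))        # negative: hold violation
```

A setup violation can be removed by lowering the clock frequency, while a hold violation is independent of the clock period in this formulation.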

What is signal integrity? How does it affect timing?

IR drop, electromigration (EM), crosstalk and ground bounce are signal integrity issues.

If IR drop is more ==> delay increases.

Crosstalk ==> there can be setup as well as hold violations.

What is IR drop? How do you avoid it? How does it affect timing?

There is a resistance associated with each metal layer. Current flowing through this resistance causes a voltage drop, i.e. IR drop.

If IR drop is more ==> delay increases.

What is EM and what are its effects?

Due to high current flow in the metal, atoms of the metal can be displaced from their original place. When this happens in larger amounts, the metal can open or bulging of the metal layer can happen. This effect is known as electromigration.

Effects: either a short or an open of the signal line or power line.

What are types of routing?

Global Routing

Track Assignment

Detail Routing

What is latency? Give the types?

Source Latency

It is defined as "the delay from the clock origin point to the clock definition point in the design".

Delay from clock source to beginning of clock tree (i.e. clock definition point).

The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.

Network latency

It is also known as insertion delay. It is defined as "the delay from the clock definition point to the clock pin of the register".

The time clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.
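The two latency components add up to the total clock latency at a register, and the spread of network latencies across registers gives the skew. A small sketch with hypothetical pin names and values in ns:

```python
def total_clock_latency(source_latency, network_latency):
    """Total latency at a register clock pin = source + network latency."""
    return source_latency + network_latency


def clock_skew(network_latencies):
    """Skew = largest minus smallest network latency among the clock pins."""
    values = list(network_latencies.values())
    return max(values) - min(values)


# Hypothetical network (insertion) delays per register clock pin, in ns.
latencies_ns = {"reg_a/CK": 0.50, "reg_b/CK": 0.62, "reg_c/CK": 0.55}
print(total_clock_latency(0.30, latencies_ns["reg_a/CK"]))
print(clock_skew(latencies_ns))
```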

What is track assignment?

Second stage of the routing wherein particular metal tracks (or layers) are assigned to the signal nets.

What is congestion?

If the number of routing tracks available for routing is less than the required tracks then it is known as congestion.
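The definition above amounts to comparing track demand against track capacity per routing region. A minimal sketch, where the region names and track counts are made up for illustration:

```python
def routing_overflow(demand_tracks, capacity_tracks):
    """Number of tracks by which demand exceeds what is available (0 if none)."""
    return max(0, demand_tracks - capacity_tracks)


def congested_regions(regions):
    """regions maps a region name to (demand, capacity); return overflow per congested region."""
    return {
        name: routing_overflow(demand, capacity)
        for name, (demand, capacity) in regions.items()
        if demand > capacity
    }


# Hypothetical global routing cells with (required tracks, available tracks).
regions = {"gcell_0_0": (8, 10), "gcell_0_1": (14, 10), "gcell_1_1": (11, 10)}
print(congested_regions(regions))
```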

Whether congestion is related to placement or routing?

Routing

What are clock trees?

Distribution of clock from the clock source to the sync pin of the registers.

What are clock tree types?

H tree, Balanced tree, X tree, Clustering tree, Fish bone

What is cloning and buffering?

Cloning is a method of optimization that decreases the load of a heavily loaded cell by replicating the cell.

Buffering is a method of optimization that inserts buffers in high fanout nets to decrease the delay.

What is the difference between soft macro and hard macro?

What is the difference between hard macro, firm macro and soft macro?

What are IPs?

Hard macro, firm macro and soft macro are all known as IP (Intellectual Property). They are optimized for power, area and performance. They can be purchased and used in your ASIC or FPGA design implementation flow. Soft macro is flexible for all types of ASIC implementation. Hard macro can be used in a pure ASIC design flow, not in an FPGA flow. Before buying any IP it is very important to evaluate its advantages and disadvantages against the alternatives, its hardware compatibility with your design blocks (such as I/O standards), and its reusability for other designs.

Soft macros

Soft macros are in synthesizable RTL. Soft macros are more flexible than firm or hard macros.

Soft macros are not specific to any manufacturing process. Soft macros have the disadvantage of being somewhat unpredictable in terms of performance, timing, area, or power. Soft macros carry greater IP protection risks because RTL source code is more portable and therefore less easily protected than either a netlist or physical layout data.

From the physical design perspective, a soft macro is any cell that has been placed and routed in a placement and routing tool such as Astro. (This is the definition given in the Astro Rail user manual!) Soft macros are editable and can contain standard cells, hard macros, or other soft macros.

Firm macros

Firm macros are in netlist format. Firm macros are optimized for performance/area/power using a specific fabrication technology.

Firm macros are more flexible and portable than hard macros. Firm macros are more predictable in performance and area than soft macros.

Hard macro

Hard macros are generally in the form of hardware IPs (or we term them hardware IPs!). Hard macros are targeted at a specific IC manufacturing technology. Hard macros are block level designs which are silicon tested and proven. Hard macros have been optimized for power or area or timing.

In physical design you can only access the pins of hard macros, unlike soft macros, which can be manipulated in different ways. You have the freedom to move, rotate and flip a hard macro, but you can't touch anything inside it.

A very common example of a hard macro is a memory. It can be any design which carries a dedicated single functionality (in general); for example, it can be an MP4 decoder. Be aware of the features and characteristics of a hard macro before you use it in your design. Other than power, timing and area, you should also know pin properties like sync pin, I/O standards, etc. The LEF and GDS2 file formats allow easy usage of macros in different tools.

From the physical design (backend) perspective:

A hard macro is a block that is generated with a methodology other than place and route (i.e. a full-custom design methodology) and is brought into the physical design database (e.g. Milkyway in Synopsys; Volcano in Magma) as a GDS2 file.

Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ different algorithms to accomplish this task while also meeting power and area targets. There are several research papers available on these subjects.

What is difference between normal buffer and clock buffer?

The clock net is one of the high fanout nets (HFNs). Clock buffers are designed with special properties such as high drive strength and low delay. Clock buffers have equal rise and fall times, which prevents the duty cycle of the clock signal from changing as it passes through a chain of clock buffers.

Normal buffers are designed with a W/L ratio such that the sum of rise time and fall time is minimum. They too are designed for higher drive strength.
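The duty-cycle argument above can be illustrated with a little arithmetic. The sketch below is a simplified model, not a timing engine: it assumes each non-inverting buffer delays the rising edge by `rise_delay` and the falling edge by `fall_delay` (hypothetical numbers in ns), so the high pulse grows or shrinks by the difference at every stage.

```python
def duty_cycle_after_chain(period, high_time, rise_delay, fall_delay, stages):
    """Duty cycle seen after `stages` identical non-inverting buffers.

    Simplified model (an assumption for illustration): the rising edge is
    delayed by rise_delay and the falling edge by fall_delay in each stage,
    so the high pulse width changes by (fall_delay - rise_delay) per stage.
    """
    high = high_time + stages * (fall_delay - rise_delay)
    high = max(0.0, min(period, high))  # a pulse cannot be negative or exceed the period
    return high / period


# Balanced (clock-buffer-like) cells: equal edge delays, duty cycle is preserved.
balanced = duty_cycle_after_chain(period=10.0, high_time=5.0,
                                  rise_delay=0.10, fall_delay=0.10, stages=20)

# Unbalanced cells: 50 ps mismatch per stage accumulates over 20 stages.
skewed = duty_cycle_after_chain(period=10.0, high_time=5.0,
                                rise_delay=0.10, fall_delay=0.15, stages=20)

print(f"balanced chain duty cycle: {balanced:.2f}")
print(f"skewed chain duty cycle:   {skewed:.2f}")
```

With these assumed numbers the balanced chain keeps a 50% duty cycle, while the unbalanced chain drifts to roughly 60%, which is the effect that balanced clock buffers are meant to avoid.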

What is difference between HFN synthesis and CTS?

HFNs are also synthesized in the front end, but at that point no placement information for the standard cells is available, so the backend tool collapses the synthesized HFNs. It resynthesizes the HFNs based on placement information and inserts buffers appropriately. The target of this synthesis is to meet delay requirements, i.e. setup and hold.

For the clock, no synthesis is carried out in the front end because no placement information for the flip-flops is available, so synthesis could not meet true skew targets. In the backend, clock tree synthesis tries to meet skew targets. It inserts clock buffers (which, unlike normal buffers, have equal rise and fall times). There is no skew information for the other HFNs.

LEC is significant because designers need to compare the synthesis netlist (revised design) with the RTL (reference design), to make sure synthesis optimization and scan insertion do not alter the designers’ intent.

Physical designers need to compare the PnR netlist (revised design) with the synthesis netlist (reference design), to make sure the PnR results and timing ECOs do not change the functionality of the synthesis netlist.

Designers may perform ECOs for new feature additions and bug fixes. There should not be any mismatch between the ECOed RTL and the synthesis netlist.

The most common LEC tools include Cadence Conformal LEC and Synopsys Formality.

Formal verification is often used as a synonym for logic equivalence checking (LEC), for which the tools are Formality by Synopsys and Conformal LEC by Cadence. Strictly speaking, LEC is for RTL vs. netlist comparison, while the other branch of formal verification is property checking.

Formal verification can be classified into 2 types:

1. Logic equivalent.

2. Property check.

LEC steps

In LEC setup, there are several steps that designers need to follow:

Read and elaborate reference design

Read and elaborate revised design

Specify “notranslate modules” for blackboxing if the module is a macro, or if the module has already been LECed at block level.

Set certain constraints; for example, when comparing RTL against the synthesis netlist, set case analysis to ignore scan ports.

LEC key point mapping is the next step.

In this step, Conformal LEC will first identify primary inputs and outputs, DFF, Latch, Blackboxes, etc. as the key points in both reference design and revised design; then pair corresponding reference and revised key points, and perform LEC comparison after that.

Mapping methods:

There are two LEC mapping methods: name-based and function-based. By default, Conformal LEC will first do name-based mapping, followed by function-based mapping. This approach typically works really well.
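The two-pass idea (names first, then function) can be sketched in a few lines. This is only a toy model and not how Conformal is implemented: each key point is represented by a hypothetical name and a truth-table tuple standing in for its logic cone, and the function-based pass pairs leftovers only when a signature is unique on both sides.

```python
def map_key_points(golden, revised):
    """Pair key points by name first, then by function signature.

    golden / revised: dict of key-point name -> hashable function signature
    (here a truth-table tuple; an assumption made for illustration).
    Returns (mapped, unmapped_golden, unmapped_revised).
    """
    mapped = {}

    # Pass 1: name-based mapping.
    for name in golden:
        if name in revised:
            mapped[name] = name

    used_revised = set(mapped.values())
    left_golden = [n for n in golden if n not in mapped]
    left_revised = [n for n in revised if n not in used_revised]

    def group_by_signature(names, table):
        groups = {}
        for n in names:
            groups.setdefault(table[n], []).append(n)
        return groups

    # Pass 2: function-based mapping, only for unambiguous signatures.
    golden_groups = group_by_signature(left_golden, golden)
    revised_groups = group_by_signature(left_revised, revised)
    for signature, g_names in golden_groups.items():
        r_names = revised_groups.get(signature, [])
        if len(g_names) == 1 and len(r_names) == 1:
            mapped[g_names[0]] = r_names[0]

    used_revised = set(mapped.values())
    unmapped_golden = sorted(n for n in golden if n not in mapped)
    unmapped_revised = sorted(n for n in revised if n not in used_revised)
    return mapped, unmapped_golden, unmapped_revised


# Hypothetical key points: synthesis renamed count_reg[1] and added a scan cell.
golden_points = {"count_reg[0]": (0, 1, 1, 0), "count_reg[1]": (0, 0, 0, 1), "dbg_reg": (1, 1, 1, 1)}
revised_points = {"count_reg[0]": (0, 1, 1, 0), "count_reg_1_": (0, 0, 0, 1), "scan_lockup": (0, 1, 0, 1)}

mapping, only_golden, only_revised = map_key_points(golden_points, revised_points)
print(mapping)
print(only_golden, only_revised)
```

In this example the renamed register is recovered by the function-based pass, while `dbg_reg` and `scan_lockup` remain unmapped and would show up in an unmapped-points report.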

In LEC report generation, designers will typically dump the following reports:

report unmapped points (-unreachable, -notmapped, -extra)

report compare data (-class nonequivalent, -class abort)

report black box (check if there’s any unexpected black boxes)

report ignored inputs and outputs

report pin constraints

report output stuck at

report floating signals

report renaming rule (this guides how LEC performs key point mapping)

LEC non-equivalence debug will be the last step.

Debugging typically starts from unmapped points, and possible root causes include:

Unmapped BBOX pins causing NEQs (use a renaming rule if the pin names do not match)

Unmapped DFF/DLATCH/CUT/PI causing NEQs (optimize and merge DFF/DLATCH in LEC; resolve unbalanced loop cutting; constrain test/scan signals)

Incorrect mapping causing NEQs (remap the incorrectly mapped pairs manually)

After resolving all mapping issues, designers can start debugging real mismatches, and possible root causes include:

Unbalanced floating signals causing NEQs (tie off floating pins in RTL, or use “add tie signal” to individually tie each floating Z)

Logic optimized differently between LEC and implementation, causing NEQs

Phase mapping (recommended: turn off phase inversion in synthesis, or use auto analysis and remap with phase)

Balanced but opposite optimization constant

Constraints depend on the stage of LEC.

1. RTL to Pre-scan: no constraints. Just read the design, map and compare.

2. Pre-Scan vs. Post-Scan: Test constraints to mask the test logic. Map and compare.

3. Post-scan vs. post-routed/CTS netlist: no constraints.

Is the do file generated after synthesis sufficient to do LEC?

A do file is nothing but your own way of scripting the commands:

1. Set the log file

2. Read the .lib

3. Read the golden and revised files.

4. Set the mapping rules.

5. Compare.

6. Write the reports.
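As a rough sketch of an automated flow, the helper below assembles a do file in the order of the six steps above. It only strings together commands that this article lists in its "do file commands" section; the file names are hypothetical, mapping is left at the tool defaults, and the exact options should be checked against your Conformal version.

```python
def build_dofile(log_file, library, golden, revised):
    """Return the lines of a minimal LEC do file (illustrative sketch only)."""
    return [
        f"set log file {log_file} -replace",          # 1. set the log file
        f"read library {library} -verilog -both",     # 2. read the library
        f"read design {golden} -verilog -golden",     # 3. read the golden design
        f"read design {revised} -verilog -revised",   #    and the revised design
        "set system mode lec",                        # 4. leave setup mode; default mapping runs
        "add compare points -all",                    # 5. compare
        "compare",
        "report verification",                        # 6. write the reports
    ]


# Hypothetical file names for a small counter block.
lines = build_dofile("counter.log", "cells.v", "counter_rtl.v", "counter_netlist.v")
dofile_text = "\n".join(lines) + "\n"
print(dofile_text)
```

Generating the file from a function like this keeps the command order fixed, so the setup-mode commands always appear before the switch to LEC mode.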

If you have an automated flow for generating the do file for LEC, that is a good place to start. If you are using Formality for formal verification, the SVF file is useful, whereas Conformal LEC does not support SVF constructs.

do file commands

Invoke Conformal LEC inside the equivalence checking directory in non-GUI mode by using the command “lec -xl -nogui -color -64 -dofile counter.do”

-xl :- Launches Encounter® Conformal® L with Datapath and advanced equivalence checking capabilities

-nogui :- Starts the session in non-GUI mode

-color :-Turn on color-coded messaging when in non-GUI mode

-64 :- Runs the Encounter® Conformal® software in 64-bit mode

-dofile <filename> :- Runs the script <filename> after starting LEC

set log file <filename.log> -replace

Saves the log file, replacing any existing log file with the same name.

Read the Verilog library by entering:

read library <filename> -verilog -both

[Both Verilog and Liberty formats can be used, but Verilog format is preferred. The steps to generate a .v file from a .lib file using Conformal are given at the end of this section.]

-verilog :- to indicate that library is in Verilog format

-both :- use same library to model or structure both golden and revised design.

Read the Golden Design (RTL)

read design <filename> -verilog -golden

-verilog :- to indicate that RTL is coded in Verilog

-golden :- to input the golden design

Read the Revised Design:

read design <filename> -verilog -revised

-verilog :- to indicate that netlist is in Verilog

-revised :- to input the revised design

Ignore the scan input (scan_in) and scan output (scan_out) pins (as these pins are not available in the golden design, and a primary output key point is a compare point)

add ignored inputs scan_in -revised [ignores scan_in pin]

add ignored outputs scan_out -revised [ignores scan_out pin]

Constrain the scan enable (SE) pin to zero to keep the revised design in functional mode.

add pin constraints 0 SE -revised [the tool keeps the design in functional mode and ignores the scan_in pin during compare. Also, scan_in is not a compare point]
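Why the SE constraint is needed can be seen from a small model of a mux-D scan flop. The sketch below assumes the usual mux-D structure, where SE selects scan_in over the functional D input, and checks exhaustively whether the scan flop's next state matches a plain flop's next state (which is just D).

```python
from itertools import product


def scan_flop_next_state(d, scan_in, se):
    """Next-state function of a mux-D scan flop (assumed structure)."""
    return scan_in if se else d


def equivalent_under_constraint(se_value):
    """True if the scan flop behaves like a plain D flop for every D and scan_in."""
    return all(
        scan_flop_next_state(d, scan_in, se_value) == d
        for d, scan_in in product((0, 1), repeat=2)
    )


functional_mode_ok = equivalent_under_constraint(0)  # SE tied to 0: scan_in has no effect
shift_mode_ok = equivalent_under_constraint(1)       # SE = 1: next state follows scan_in

print("SE=0 equivalent to RTL flop:", functional_mode_ok)
print("SE=1 equivalent to RTL flop:", shift_mode_ok)
```

With SE held at 0 the scan path drops out of every logic cone, which is why the constrained revised design can be compared against RTL that has no scan logic at all.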

Change the mode of operation from ‘setup’ to ‘lec’

set system mode lec

Conformal LEC has two modes of operation: SETUP mode and LEC mode. Setup mode is used to prepare the design to be compared. Any command that affects the way the design is modeled needs to be issued in this mode. LEC mode is where the designs get modeled, key points are mapped, and the compare process takes place.

Compare golden Vs. revised netlist

add compare points -all

compare

Once the compare process is completed, Conformal LEC will print a summary report that tells how many key points are equivalent, non-equivalent, aborted and not compared.

Generate verification report.

report verification

Reports a table of all violated checklist items

Use the command ‘set gui on’ to turn on the GUI window. In case of a mapping issue, comparison issue or non-equivalence, use the mapping manager, debug manager or schematic viewer in LEC to resolve the issue. The same flow is used to compare netlists generated at different stages of physical design. Use proper modelling directives and constraints.

Create .v from .lib

Invoke LEC by using the command “lec -xl -nogui -64”

Read the library in Liberty (.lib) format by using the command “read library <.lib file> -liberty -both”

Write out the Verilog file by using the command “write library <file name*> -verilog”

*file name can be any name with extension .v

Logical Equivalence Checking Netlist vs Netlist problem

Comparing a plain synthesis netlist vs. a low-power synthesis netlist: I have only used the clock gating low-power technique. There are unmapped points in the low-power netlist (red coloured "U") that belong to clock gating, since clock gating is not present in the plain synthesis netlist. I have enabled flatten model -clock gating.

You should not ignore these. The "plain synthesis netlist" is "Golden" and your "low-power synthesis netlist" is "Revised". Also, as you wrote, you can see the unmapped points at the CGC cells. Add the Verilog definitions of the CGC library cells on the golden side, where you read and parse all the files, to tell the tool to look for these cells and map them.

If you check module by module, the whole process becomes simpler. In LEC our main aim is to check whether the flip-flops and latches created in RTL are all required, or whether some extra flops and latches have been added that are not required.

In other terms, LEC is used to check whether the synthesized (gate-level) design, which will go on to the back end, matches the specification written in the form of RTL. If we find any mismatch, it is clear that the gate-level netlist does not fully match the RTL.

You can either do a "flat" LEC run, which first fully synthesizes the design, maps it and then compares, or a hierarchical run, which compares modules separately and can reuse those results elsewhere to save time. If your design is very large, you can think about dividing it into sub-designs or doing a hierarchical run.

One approach to RTL vs. RTL equivalence checking would be the use of equivalence checkers, which were traditionally oriented towards combinational equivalence checking but have some sequential capabilities. The advantage is that these tools are well known to designers and are fairly mature. However, the tools are still primarily oriented towards verifying the output of a synthesis tool. These tools are able to deal with some changes in state encoding, allowing a synthesis tool to perform some retiming operations. However, the debug environment is not ideally suited to RTL designers: the user is presented with a schematic displaying gate-level differences between the two designs, without the benefit of a waveform or stimulus.

Engineering Change Order (ECO) is the process of making local changes to the design netlist without re-running the entire synthesis and P&R from scratch.

• ECO Types:

• Functional ECO

• Change the functionality of the design

• Non-functional ECO:

• Fix timing, cross talk

• Stage:

• Pre-masks

• Usage of standard cells to implement the modifications

• Post-masks

• Base layer taped-out, metal fix using spare cells

As with other types of formal analysis, equivalence checking of a large design is a tough mathematical problem. For most ASICs and simple FPGAs, the problem can be simplified by mapping the state elements (memory and registers) between the two designs. If this is possible, combinational equivalence checking needs to consider only the combinational logic between the state elements.
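Once the state elements are mapped, each compare point reduces to checking two combinational cones over the same inputs. The sketch below does this by brute-force enumeration on made-up three-input cones; real tools use BDD or SAT engines rather than enumeration, so this is only a conceptual illustration.

```python
from itertools import product


def golden_cone(a, b, c):
    # Logic as written in the (hypothetical) RTL.
    return (a and b) or (a and c)


def revised_cone(a, b, c):
    # Factored form a synthesis tool might produce.
    return a and (b or c)


def buggy_cone(a, b, c):
    # A deliberately wrong implementation for comparison.
    return a or (b and c)


def compare_cones(f, g, n_inputs):
    """Return the first input vector where f and g differ, or None if equivalent."""
    for vector in product((False, True), repeat=n_inputs):
        if bool(f(*vector)) != bool(g(*vector)):
            return vector
    return None


good_result = compare_cones(golden_cone, revised_cone, 3)
bad_result = compare_cones(golden_cone, buggy_cone, 3)

print("golden vs revised:", "EQ" if good_result is None else f"NEQ at {good_result}")
print("golden vs buggy:  ", "EQ" if bad_result is None else f"NEQ at {bad_result}")
```

The counterexample returned for the buggy cone plays the role of the diagnosis vector an equivalence checker reports for a non-equivalent compare point.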

Sequential equivalence checking, in its purest form, treats two designs as black boxes in which the input and outputs must match but the internal state can be entirely different.

In practice, two designs may be similar but not close enough for combinational equivalence checking. This is commonly the case for FPGA synthesis, which may reorder state machines or migrate logic across registers to meet timing requirements. Thus, proving formal equivalence for FPGAs typically employs sequential checking for optimized parts of the design and combinational checking for the remainder.
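The black-box view of sequential equivalence can be sketched by exploring the product of two state machines and checking that their outputs agree in every reachable state pair. The example is hypothetical: a modulo-3 counter in binary encoding versus the same counter in one-hot encoding, so the state elements do not map one-to-one.

```python
from collections import deque


def golden_step(state, en):
    """Binary-encoded modulo-3 counter; output is high in state 2."""
    next_state = (state + 1) % 3 if en else state
    return next_state, int(state == 2)


def revised_step(state, en):
    """One-hot version: 0b001 -> 0b010 -> 0b100 -> 0b001; output high in 0b100."""
    next_state = (((state << 1) | (state >> 2)) & 0b111) if en else state
    return next_state, int(state == 0b100)


def buggy_step(state, en):
    """Same one-hot counter, but the output is decoded from the wrong state."""
    next_state, _ = revised_step(state, en)
    return next_state, int(state == 0b010)


def sequentially_equivalent(step_a, init_a, step_b, init_b, inputs=(0, 1)):
    """Breadth-first search over reachable state pairs of the product machine."""
    seen = {(init_a, init_b)}
    queue = deque(seen)
    while queue:
        state_a, state_b = queue.popleft()
        for x in inputs:
            next_a, out_a = step_a(state_a, x)
            next_b, out_b = step_b(state_b, x)
            if out_a != out_b:
                return False  # outputs differ on some reachable input sequence
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b))
    return True


same_behaviour = sequentially_equivalent(golden_step, 0, revised_step, 0b001)
different_behaviour = sequentially_equivalent(golden_step, 0, buggy_step, 0b001)
print(same_behaviour, different_behaviour)
```

Explicit state enumeration like this only scales to tiny designs; it is shown here purely to make the idea of matching inputs and outputs with different internal state concrete.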

How is equivalence checking related to other formal approaches?

It is common to divide formal verification into equivalence checking and property checking. The latter category includes both assertion-based verification and automated formal solutions such as apps. A simple way to conceptually unify these two categories is to think of equivalence checking as ABV on a model containing both designs to be compared as well as a set of assertions specifying that the outputs of the two designs must match. This does not necessarily represent how actual tools are built, but it may help in understanding.

Nothing can replace formal analysis. Simulation acceleration uses hardware to run faster, so it will execute more tests in a given time, but it still exercises very little design behavior. In-circuit emulation (ICE) and prototypes built using FPGAs are typically designed to be plugged into the end system in lieu of the actual chip under design. They may run even faster than acceleration and will exercise different aspects of the design. While ICE and prototypes are valuable for hardware-software co-verification, they are in no way a substitute for the exhaustive nature of formal verification.

It is no longer sufficient for FPGA designers to defer verification to the bring-up lab. Modern FPGAs are complex system-on-chip (SoC) devices with all the complexity of their ASIC counterparts. The difficulty of lab debug and long reprogramming cycles mandate that FPGA teams adopt ASIC-like verification processes. Formal verification tools work equally well on both technologies and have been adopted on many FPGA projects.

Equivalence checking is arguably even more important for FPGAs than for ASICs. The FPGA synthesis and place-and-route processes often make significant changes to the design structure, including moving logic across register stages, in order to maximize use of the underlying chip technology. These manipulations present some risk of altering design functionality. Formal equivalence checking can catch any such problems, but sequential equivalence checking is required since there is no 1-to-1 mapping between state elements in the RTL source and the optimized netlist.

Sequential logic equivalence checking (SLEC) is effective in finding bugs in new logic required to reduce dynamic power consumption, validating last minute ECOs, or verifying that design optimizations aren’t too aggressive. It is also very efficient in verifying safety mechanisms used in ISO 26262 and other fault mitigating designs.

SLEC’s effectiveness comes from using exhaustive formal verification algorithms, which do not require a testbench; and indeed are completely automated so the user does not need to know about formal technology themselves.

Given the formal-based nature of the analysis, SLEC can prove functional equivalence of the two designs for all inputs and all time, or identify any differences between the two designs. In contrast, simulation-based approaches cannot prove sequential equivalence. Indeed, even with well-written constrained-random testbenches, simulation may find functional differences depending on the quality of the testbenches but such analysis could still miss critical corner cases. As such, SLEC can save a lot of resimulation time after small modifications of the design.

Non-equivalence means that the points have different functionality. Unmapped means that Conformal cannot find the object in the Revised design according to the renaming rules. If your points are non-equivalent, check the correctness of the Revised design. If they are reported (REPORT UNMAPPED POINTS) as unmapped, do something like the following:

Do the following to manually map key points:

1. Click the primary output pin in the Golden design to highlight it.
2. Right-click and choose Set Target Mapping Point from the pop-up menu.
3. Click the primary output pin in the Revised design to highlight it.
4. Right-click and choose Add Mapping Point – Non-invert from the pop-up menu.

Modify the netlist to use either an input declaration or supply* on both sides. Then, with sign-off by another person, it should be safe enough.

Unmapped in lec

Unreachable means there is no path from that key point to a primary output through any sort of logic. Hence it cannot have an effect on the behaviour of the design. By default, Conformal will neither map nor compare such points. The typical causes:

1) unused code in RTL

2) spare gates

3) disabled logic (e.g. pre to post test)

We still like to show that you have them, though, because they could be a problem if they are unexpected.
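The definition of an unreachable key point is a plain graph property, which the sketch below checks with a backward traversal from the primary outputs. The netlist is a hypothetical fanout dictionary (node name to the nodes it drives); anything not reached by walking backwards from an output is reported as unreachable.

```python
def unreachable_key_points(fanout, key_points, primary_outputs):
    """Return key points with no path to any primary output.

    fanout: dict mapping each node to the list of nodes it drives
    (a made-up netlist representation used only for this illustration).
    """
    # Build the reverse (fan-in) relation.
    fanin = {}
    for source, destinations in fanout.items():
        for destination in destinations:
            fanin.setdefault(destination, []).append(source)

    # Walk backwards from the primary outputs.
    reached = set(primary_outputs)
    stack = list(primary_outputs)
    while stack:
        node = stack.pop()
        for source in fanin.get(node, []):
            if source not in reached:
                reached.add(source)
                stack.append(source)

    return sorted(k for k in key_points if k not in reached)


# ff_data feeds an output; ff_spare is a spare flop that drives nothing.
netlist = {
    "in_a": ["ff_data"],
    "ff_data": ["out_q"],
    "in_b": ["ff_spare"],
    "ff_spare": [],
}
unreachable = unreachable_key_points(netlist, ["ff_data", "ff_spare"], ["out_q"])
print(unreachable)
```

Here the spare flop is flagged, matching the "spare gates" cause listed above.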


Abort points

Abort points come from large fan-in logic cones, and from not following a few RTL rules, for example when X is used in RTL. In some cases modules have been blackboxed; you first need to figure out why they were blackboxed. They are also important.

Blackboxes causes and techniques

Blackboxes are produced in the following cases:

1) Either you have not provided a basic gate or common file, or you have provided an extra file in the LEC scripts that was not included in synthesis (the src_design.lst file).
2) Or there is a problem in your functionality. Check the parameters and null slices in your design. If you find null slices and remove them by changing the parameters, your blackboxes will be removed automatically.

After removing the blackboxes, the other problems will be resolved automatically.

Though there are different techniques available like Hierarchical flow, OVF flow, etc., for handling aborts, these have their own limitations. The technique discussed here helps identify common points that have common functionality in the RTL and Gate level netlist.

Adding CUT points at these common points helps to break the huge and complex combinational logic into smaller parts and this eventually helps in reducing the aborts.

The technique is user-friendly and easy to debug, has no logic equivalence coverage loss and results in reduced runtime. It also has negligible impact on the QoR of the gate-level netlist in terms of area and timing optimization. In short, it is a technique for identifying CUT points in LEC to resolve aborts.

With increasing design complexities that also incorporate complex data structures there is an increase in the number of aborts in the Logic Equivalence Checker (LEC), which is mainly due to the limitation of the LEC tool in handling the complex logic cone.

Aborts are an inconclusive result that the formal verification tool is not able to resolve. Aborts can be due to:

i) Complex datapath;

ii) Huge logic cone for comparison; and

iii) Large number of don’t cares.

The greater the number of aborts in a design, the lesser the LEC coverage and the greater the probability of missing some non-equivalence in a design. Though there are different techniques for resolving aborts, they either involve complex methodologies or a lot of manual processes. It is always imperative that any technique applied should have a shorter turn-around-time, possess minimum LEC coverage loss, and be designer friendly

Adding cut points is a preferred way of avoiding aborts and has an advantage of having no LEC coverage loss. Though the LEC tool itself has the capacity to add cut points at certain points in the combinational logic, the position of these cut points usually are not design friendly. The position of the cut point is necessary in order for a correct comparison and to avoid false non-equivalencies.

Adding a cut point to the hierarchy actually helps to see same point at the RTL and Gate-level netlist. LEC treats the input and output of the comparison elements separately; hence the input of a cut point is verified during the LEC comparison so adding cut points should not cause any problems. Also, adding a cut point partitions the data path and allows the LEC tool to see reduced data cone and hence be able to resolve the aborts.
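The effect of a cut point on cone size can be shown with a toy example. The functions below are hypothetical: a 7-input cone is built from two stages, and the intermediate signal `m` is the common point present in both designs. Comparing flat enumerates every input vector; comparing with the cut treats `m` as a free input to the second stage and checks the two smaller cones separately.

```python
from itertools import product


def stage1_golden(a, b, c, d):
    return (a & b) | (c & d)


def stage1_revised(a, b, c, d):
    # De Morgan form of the same function.
    return 1 - ((1 - (a & b)) & (1 - (c & d)))


def stage2_golden(m, e, f, g):
    return m ^ (e & f & g)


def stage2_revised(m, e, f, g):
    x = e & f & g
    return (m | x) & (1 - (m & x))  # XOR built from OR/AND/NOT


def flat_golden(a, b, c, d, e, f, g):
    return stage2_golden(stage1_golden(a, b, c, d), e, f, g)


def flat_revised(a, b, c, d, e, f, g):
    return stage2_revised(stage1_revised(a, b, c, d), e, f, g)


def equivalent(f, g, n_inputs):
    """Exhaustively compare f and g; return (is_equivalent, vectors_checked)."""
    checked = 0
    for vector in product((0, 1), repeat=n_inputs):
        checked += 1
        if f(*vector) != g(*vector):
            return False, checked
    return True, checked


flat_ok, flat_vectors = equivalent(flat_golden, flat_revised, 7)

# With a cut at m: verify the cone driving the cut, then the cone fed by it.
s1_ok, s1_vectors = equivalent(stage1_golden, stage1_revised, 4)
s2_ok, s2_vectors = equivalent(stage2_golden, stage2_revised, 4)
cut_ok, cut_vectors = s1_ok and s2_ok, s1_vectors + s2_vectors

print(f"flat: {flat_ok} after {flat_vectors} vectors")
print(f"cut:  {cut_ok} after {cut_vectors} vectors")
```

The flat check visits 128 vectors and the cut-based check visits 32. The saving grows quickly with cone size, but it relies on the cut signal having the same function in both designs; a badly placed cut makes the second-stage check see input combinations that can never occur, which is one source of the false non-equivalences mentioned above.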

Unmapped points in LEC

The simplest thing is to provide the SVF file from synthesis (when using Formality). This tells the tool what logic was removed, what was optimized and how it was optimized.
This resolves most issues.

How to enlarge the number of unmapped key points when doing LEC

Add the -max_unmap option after the remodel command.

When I perform LEC using the Cadence Conformal tool, there are too many unmapped key points in my design. This leads to a huge number of non-equivalent points when I compare the revised and golden designs (my gate-level netlist and RTL).

Try checking the constraints, libraries and options; you might need some renaming rules. Generate reports to get more information. You should clear the unmapped points before comparison. Start by category: Not-mapped, Unreachable and Extra.

If you are doing a formal check between RTL and netlist, it is possible that the synthesis tool has optimized some signals in the RTL and removed unused signals. In this case, if LEC is unable to understand the logic, you may get unmapped points.

Another case is when you insert DFT logic into the synthesis netlist (scan insertion etc.); you may get some extra ports or signals used for DFT. Go through the list of unmapped points and discuss them with the concerned RTL designer. Unmapped points are OK unless the synthesis tool has done some unwanted optimisation.

set mapping method -name first -unreach

An unreachable FF/latch in LEC is an FF/latch that does not affect the functionality of the design. So, if you know the functionality of the design very well, validate whether this FF really has no effect on the functionality of the design. If it is really unreachable, there is no need to worry. You can ignore this warning.

If you have very little information about the design, then try the following steps to map unreachable points.
You said that the unreachable points are in the golden design.

read verilog golden.v -golden -keep_unreach
read verilog revised.v -revised -keep_unreach

set mapping method -unreach

The scenario of the problem I am facing: 1. Read a golden design and a revised design, map the points and compare. 2. Read the same golden design and a different revised design.
This time the revised design has been created by changing some constraints in DC.

The problem I am facing is that some points in the golden design are unreachable in the first run, but they are NOT_MAPPED in the second run. How can this be?

Because Conformal says NOT_MAPPED unmapped points are reachable but do not have a corresponding point in the logic fan-in cone of the corresponding design.

If you look at the definition of 'Unreachable' points, they are those points that do not reach any of the compare points/outputs. Obviously, then, they do not influence any of the functionality. Therefore it is possible that synthesis (during optimization) has removed this point. Now it is unmapped, as there is no corresponding equivalent point in the revised design. This looks logical.

By the way, check the log file. If it says "DESIGNS EQUAL" then you are fine.

Sometimes the not-mapped points are due to sequential merging of logic during synthesis.
Look in your synthesis log file to see whether any sequential merge statements are present.
If they are, add the instance equivalence constraints to the do script and rerun LEC; this may solve the issue.

procedure to find this in log file

During compare, not-mapped points are a problem. They have to be debugged to ensure that they are mapped and compared. This could be due to many reasons: incorrect mapping, incorrect modelling, incorrect pin constraints, etc.

Unreachables could be clock gating latches etc. that are not present in the revised design, or points that do not reach a compare point; but these have to be reviewed as well, since this is your sign-off.

If you are performing RTL2Gate, that means you are using hierarchical compare (if not, please try this). Now check the not-mapped points: if a point is in the golden design, see in the synthesis log file whether it was optimized away, say as a constant or a sequential merge, in which case ensure that your LEC do file has the necessary "set flatten model" options to address that modelling.

Friday, December 6, 2019
