Clemson University
CPSC 464/664 Lecture Notes
Fall 2003
Mark Smotherman
Transistors and VLSI Chips
- Transistor Operation
- n-type transistor (Intel)
- Paul DeMone article on CMOS (part 1)
- Paul DeMone article on CMOS (part 2)
- Java CMOS basic gate demonstration
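As a companion to the gate demonstration linked above, here is a minimal switch-level sketch of CMOS gates. It treats each transistor as an ideal switch (an n-type device conducts when its gate is 1, a p-type device when its gate is 0) and ignores all analog behavior. The function names are illustrative only and are not taken from any of the linked pages.

```python
def nmos_conducts(gate: int) -> bool:
    """Ideal n-type switch: conducts when the gate input is high."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """Ideal p-type switch: conducts when the gate input is low."""
    return gate == 0


def resolve(pull_up: bool, pull_down: bool) -> int:
    """Output is 1 if tied to Vdd, 0 if tied to ground.

    In a well-formed static CMOS gate exactly one network conducts.
    """
    if pull_up == pull_down:
        raise ValueError("output is floating or shorted")
    return 1 if pull_up else 0


def cmos_inverter(a: int) -> int:
    # One pMOS to Vdd, one nMOS to ground.
    return resolve(pmos_conducts(a), nmos_conducts(a))


def cmos_nand(a: int, b: int) -> int:
    # pMOS devices in parallel (pull-up), nMOS devices in series (pull-down).
    pull_up = pmos_conducts(a) or pmos_conducts(b)
    pull_down = nmos_conducts(a) and nmos_conducts(b)
    return resolve(pull_up, pull_down)


def cmos_nor(a: int, b: int) -> int:
    # pMOS devices in series (pull-up), nMOS devices in parallel (pull-down).
    pull_up = pmos_conducts(a) and pmos_conducts(b)
    pull_down = nmos_conducts(a) or nmos_conducts(b)
    return resolve(pull_up, pull_down)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NAND:", cmos_nand(a, b), "NOR:", cmos_nor(a, b))
```

The pull-up and pull-down networks are complements of each other, which is why exactly one of them conducts for every input combination.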
- Chip fabrication
- Intel pages
- steps
- design and layout
- mask creation (one mask per layer, high fixed cost of masks
is a problem for low volume chips)
- wafer processing (light exposed through mask then wafer developed,
cleaned, and inspected) -- steps are repeated for each layer
- each die on wafer tested and then cut out
- good dies are inserted into package then tested
- delivery of remaining chips after burn-in and speed-binning
- Sematech page on how semiconductor chips are made
- Close-up view of interconnect in an SRAM chip (IBM); see also Semiconductor manufacture and interconnect packaging (IBM)
- Notes on Fabrication and Layout, Kenneth Yun, UCSD
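The point in the fabrication steps above about mask cost being a problem for low-volume chips can be illustrated with a simple amortization sketch. The dollar figures below are hypothetical placeholders chosen only to show the shape of the curve; they are not taken from these notes or from any vendor, and the model ignores yield, packaging, and test costs.

```python
def cost_per_chip(mask_set_cost: float, unit_cost: float, volume: int) -> float:
    """Fixed mask cost spread over the production volume, plus per-chip cost."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return mask_set_cost / volume + unit_cost


if __name__ == "__main__":
    MASK_SET_COST = 1_000_000.0  # hypothetical fixed cost of one mask set
    UNIT_COST = 20.0             # hypothetical per-chip manufacturing cost
    for volume in (1_000, 100_000, 1_000_000):
        total = cost_per_chip(MASK_SET_COST, UNIT_COST, volume)
        print(f"{volume:>9} chips: {total:.2f} per chip")
```

With these placeholder numbers the mask set dominates the per-chip cost at a volume of a thousand and is nearly negligible at a million.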
- Die photos
- Transistor-level design
- Modern VLSI Design, Wayne Wolf
- Notes on Cell Design and Layout, Kenneth Yun, UCSD
- MOSIS design rules
- HDL-level design example from UCR
- block diagram
- design hierarchy
- top level microprocessor design
- control unit design
- control state machine design
- Synthesis tools
- HDL synthesized into netlists, which are then checked and optimized
- netlists synthesized into fabrication masks
- combined HDL and block diagram displays
- T. Chan, et al., "Challenges of CAD Development for Datapath Design" (Intel Tech. Journal, Q1, 1999)
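To make the idea of a netlist that is "checked" more concrete, here is a toy sketch: a half adder expressed as a list of NAND gates, simulated and compared exhaustively against its behavioral specification. The netlist format, signal names, and functions are invented for this illustration and do not correspond to any real synthesis tool's output.

```python
from itertools import product

# Each entry: (output signal, gate type, input signals), in topological order.
HALF_ADDER_NETLIST = [
    ("n1", "NAND", ("a", "b")),
    ("n2", "NAND", ("a", "n1")),
    ("n3", "NAND", ("b", "n1")),
    ("sum", "NAND", ("n2", "n3")),    # XOR built from four NANDs
    ("carry", "NAND", ("n1", "n1")),  # NAND used as an inverter gives AND
]


def nand(*inputs: int) -> int:
    return 0 if all(inputs) else 1


GATES = {"NAND": nand}


def simulate(netlist, inputs):
    """Evaluate every gate once, in listed order, and return all signal values."""
    values = dict(inputs)
    for out, gate, ins in netlist:
        values[out] = GATES[gate](*(values[name] for name in ins))
    return values


def matches_spec(netlist) -> bool:
    """Exhaustively compare the netlist against sum = a XOR b, carry = a AND b."""
    for a, b in product((0, 1), repeat=2):
        v = simulate(netlist, {"a": a, "b": b})
        if v["sum"] != (a ^ b) or v["carry"] != (a & b):
            return False
    return True


if __name__ == "__main__":
    print("netlist matches behavioral spec:", matches_spec(HALF_ADDER_NETLIST))
```

Exhaustive simulation only works for tiny circuits; it is used here purely to show what checking a netlist against its HDL-level intent means.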
- Technological implications
- transistor density - Moore's Law
- performance measures
- transistor switching time is on the order of a few picoseconds
(varies according to process, etc.)
- a logic gate is built of several transistors, so the gate propagation
delay is larger (e.g., 25 ps for a FO4 gate in 180 nm process)
- each pipeline stage uses the equivalent of roughly 10 gate levels,
so the propagation delay through a pipe stage is larger still (e.g.,
on the order of 250 ps => 4 GHz)
- clock skew and jitter (e.g., 51 ps in 180 nm Pentium 4) -
usually assumed to be constant for a given process and thus
becomes a bigger percentage of the clock cycle time as the clock
frequency increases
- latch delay = 3 FO4 gate delays (e.g., 75 ps in 180 nm)
- logic delay = additional gate delays
- optimizations exist to reduce skew/jitter/latch overhead
(e.g., time borrowing circuits, domino pipelines)
- longest critical stage governs minimum clock cycle time (and thus
maximum clock frequency)
- but note, as in Pentium 4, some sections of the processor may run at
double, half, or quarter speed
- wire delay becoming as important as transistor switching speed
- range of wire at various clock speeds
- Pentium 4 devotes entire pipeline stages for chip crossings
- static power (i.e., leakage) becoming a major factor in power budget
- Stefan Rusu, Intel, "Trends and Challenges in VLSI Technology Scaling Towards 100nm"
- special issue, IBM Journal of Research and Development, "Scaling CMOS to the limit," Vol. 46, Nos. 2/3, 2002
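Returning to the performance measures listed above, the example figures for a 180 nm process (25 ps per FO4 gate, about 10 gate levels of logic per stage, 3 FO4 of latch delay, 51 ps of skew and jitter) can be combined into a rough cycle-time budget. Treating the cycle time as a simple sum of logic, latch, and skew/jitter terms is an assumption made for this sketch; real designs use the optimizations mentioned in the notes to reduce that overhead.

```python
def cycle_time_ps(logic_gate_levels: float,
                  fo4_ps: float = 25.0,
                  latch_fo4: float = 3.0,
                  skew_jitter_ps: float = 51.0) -> float:
    """Stage cycle time as logic delay + latch delay + skew/jitter (simple sum)."""
    return logic_gate_levels * fo4_ps + latch_fo4 * fo4_ps + skew_jitter_ps


def frequency_ghz(cycle_ps: float) -> float:
    """Convert a cycle time in picoseconds to a clock frequency in GHz."""
    return 1000.0 / cycle_ps


def max_frequency_ghz(stage_delays_ps) -> float:
    """The slowest (critical) stage sets the minimum cycle time."""
    return frequency_ghz(max(stage_delays_ps))


def overhead_fraction(logic_gate_levels: float) -> float:
    """Share of the cycle spent on latch delay and skew/jitter, not logic."""
    total = cycle_time_ps(logic_gate_levels)
    logic = cycle_time_ps(logic_gate_levels, latch_fo4=0.0, skew_jitter_ps=0.0)
    return (total - logic) / total


if __name__ == "__main__":
    print("logic only, 10 levels:", frequency_ghz(10 * 25.0), "GHz")
    for levels in (20, 10, 5):
        t = cycle_time_ps(levels)
        print(f"{levels:>2} levels: {t:.0f} ps, {frequency_ghz(t):.2f} GHz, "
              f"overhead {overhead_fraction(levels):.0%}")
```

Under this model the fixed overhead is 126 ps, so a 10-level stage takes 376 ps rather than 250 ps, and the overhead share grows as stages get shorter, which is the effect the notes describe for rising clock frequencies.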
- On-chip caches
- Mike Haertel: "The relationship between cache size and performance
tends to be vaguely logarithmic: if you double the size of the cache
(and magically manage to keep it at the same speed), performance
increases by +x%. Then you have to double the cache again to get
another +x%. Obviously this depends heavily on your workload, but
it's a reasonable rule of thumb. ...
Nowadays in many processors, both CISC and RISC, more than half the
die area is devoted to cache. For example, looking at a K8
(Sledgehammer) die photo, it looks like about 60% of the die is L2
cache and maybe another 10% is L1 cache. That leaves just 30% left
over for all the core logic. ...
Now, if the +x% gain for 2x cache growth is on the ballpark of 5-7%,
(which seems like a reasonable assumption looking at the SPEC database)
then a 1.2x increase in cache size is probably worth at most +2%
performance."
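The rule of thumb in the quote above can be checked numerically. The sketch below reads "vaguely logarithmic" as a gain of x percentage points per doubling, i.e. gain = x * log2(size ratio); that additive reading is an interpretation for illustration, not something the quote states precisely.

```python
import math


def cache_gain_percent(size_ratio: float, gain_per_doubling: float) -> float:
    """Estimated performance gain (%) for growing the cache by size_ratio.

    Assumes each doubling of cache size adds gain_per_doubling percent,
    at unchanged cache speed.
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return gain_per_doubling * math.log2(size_ratio)


if __name__ == "__main__":
    for x in (5.0, 7.0):
        gain = cache_gain_percent(1.2, x)
        print(f"x = {x}% per doubling: 1.2x cache gives about +{gain:.1f}%")
```

For x between 5% and 7%, a 1.2x cache comes out at roughly +1.3% to +1.8%, consistent with the "at most +2%" estimate in the quote.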
- Wire-exposed ("communications exposed") instruction sets
- RAW (MIT)
- TRIPS (UT Austin)
- Imagine (Stanford)
- Asynchronous logic
- University of Manchester Asynchronous Logic home page
- Sun Research Labs Asynchronous Design Group
Key Points
- transistors and wires are the basic building blocks in VLSI design
- most chip designers work at HDL level
- transistor switching time << gate propagation delay << clock cycle
- power and wire delay are now critical factors in VLSI and chip design
- most of the transistors in a current high-performance microprocessor are
used for on-chip caches
- most designs today are synchronous (i.e., clocked), but asynchronous
designs may become important
mark@cs.clemson.edu