Clemson University
CPSC 464/664 Lecture Notes
Fall 2003
Mark Smotherman


Transistors and VLSI Chips

  1. Transistor Operation
    1. n-type transistor (Intel)
    2. Paul DeMone article on CMOS (part 1)
    3. Paul DeMone article on CMOS (part 2)
    4. Java CMOS basic gate demonstration
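
    The gate demonstration above can be sketched in code: a static CMOS gate is a pull-up network of PMOS switches and a complementary pull-down network of NMOS switches. This minimal sketch models a 2-input NAND; the function name and structure are illustrative, not taken from the linked Java demo.

```python
# Truth table of a CMOS 2-input NAND gate, modeled as its two switch networks.

def cmos_nand(a: int, b: int) -> int:
    # PMOS conducts when its gate is 0; the NAND pull-up is two PMOS in parallel to VDD.
    pull_up = (a == 0) or (b == 0)
    # NMOS conducts when its gate is 1; the NAND pull-down is two NMOS in series to GND.
    pull_down = (a == 1) and (b == 1)
    # In static CMOS exactly one network conducts, so the output is never floating.
    assert pull_up != pull_down
    return 1 if pull_up else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", cmos_nand(a, b))
```

    The complementary-network structure is why static CMOS draws (almost) no current at rest: there is never a direct path from VDD to GND once the output settles.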

  2. Chip fabrication
    1. Intel pages
    2. steps
      1. design and layout
      2. mask creation (one mask per layer, high fixed cost of masks is a problem for low volume chips)
      3. wafer processing (light exposed through mask then wafer developed, cleaned, and inspected) -- steps are repeated for each layer
      4. each die on wafer tested and then cut out
      5. good dies are inserted into package then tested
      6. delivery of remaining chips after burn-in and speed-binning
    3. Sematech page on how semiconductor chips are made
    4. Close-up view of interconnect in an SRAM chip (IBM)
      see also Semiconductor manufacture and interconnect packaging (IBM)
    5. Notes on Fabrication and Layout, Kenneth Yun, UCSD
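
    The economics behind steps 4-5 (testing dies, keeping only the good ones) can be sketched with the standard textbook approximations for dies per wafer and die yield (as in Hennessy & Patterson). The wafer size, die area, and defect density below are illustrative numbers, not figures from the lecture.

```python
import math

def dies_per_wafer(wafer_diam_mm: float, die_area_mm2: float) -> int:
    r = wafer_diam_mm / 2
    gross = math.pi * r**2 / die_area_mm2                     # dies by raw area
    edge_loss = math.pi * wafer_diam_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)                             # partial dies at the edge are lost

def die_yield(defects_per_mm2: float, die_area_mm2: float, alpha: float = 4.0) -> float:
    # Negative-binomial defect model: bigger dies are hit much harder.
    return (1 + defects_per_mm2 * die_area_mm2 / alpha) ** (-alpha)

n = dies_per_wafer(300, 120)      # 300 mm wafer, 120 mm^2 die (illustrative)
y = die_yield(0.004, 120)         # 0.004 defects/mm^2 (illustrative)
print(n, "gross dies,", f"{y:.0%} yield,", int(n * y), "good dies")
```

    Since the mask set is a fixed cost, its price is amortized over good dies shipped; this is why low-volume chips (step 2 above) suffer most from high mask costs.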

  3. Die photos

  4. Transistor-level design
    1. Modern VLSI Design, Wayne Wolf
    2. Notes on Cell Design and Layout, Kenneth Yun, UCSD
    3. MOSIS design rules

  5. HDL-level design example from UCR
    1. block diagram
    2. design hierarchy
    3. top level microprocessor design
    4. control unit design
    5. control state machine design

  6. Synthesis tools
    1. HDL synthesized into netlists, which are then checked and optimized
    2. netlists placed and routed into physical layout, from which the fabrication masks are generated
    3. combined HDL and block diagram displays
    4. T. Chan, et al., "Challenges of CAD Development for Datapath Design" (Intel Tech. Journal, Q1, 1999)
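
    A netlist (step 1 above) is just a list of gate instances and the wires connecting them; tools check and optimize this structure before layout. A minimal sketch of the idea, with a made-up three-gate circuit evaluated in topological order:

```python
# A gate-level netlist as a plain data structure: (output wire, gate type, input wires),
# listed in topological order so each gate's inputs are computed before it is evaluated.
NETLIST = [
    ("n1", "NAND", ("a", "b")),
    ("n2", "NOT",  ("c",)),
    ("y",  "OR",   ("n1", "n2")),
]

GATES = {
    "NAND": lambda x, y: 1 - (x & y),
    "NOT":  lambda x: 1 - x,
    "OR":   lambda x, y: x | y,
}

def evaluate(netlist, inputs):
    wires = dict(inputs)                 # start from the primary input values
    for out, gate, ins in netlist:
        wires[out] = GATES[gate](*(wires[i] for i in ins))
    return wires

print(evaluate(NETLIST, {"a": 1, "b": 1, "c": 1})["y"])
```

    Real synthesis tools work on far richer representations, but the checks they run (unconnected wires, combinational loops, equivalence against the HDL) are all questions about a graph like this one.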

  7. Technological implications
    1. transistor density - Moore's Law
    2. performance measures
      1. transistor switching time is on the order of a few picoseconds (varies according to process, etc.)
      2. a logic gate is built of several transistors, so the gate propagation delay is larger (e.g., 25 ps for a FO4 gate in 180 nm process)
      3. each pipeline stage uses the equivalent of roughly 10 gate levels, so the propagation delay through a pipe stage is larger still (e.g., on the order of 250 ps => 4 GHz)
        1. clock skew and jitter (e.g., 51 ps in 180 nm Pentium 4) - usually assumed to be constant for a given process and thus becomes a bigger percentage of the clock cycle time as the clock frequency increases
        2. latch delay = 3 FO4 gate delays (e.g., 75 ps in 180 nm)
        3. logic delay = additional gate delays
        4. optimizations exist to reduce skew/jitter/latch overhead (e.g., time borrowing circuits, domino pipelines)
      4. longest critical stage governs minimum clock cycle time (and thus maximum clock frequency)
      5. but note, as in Pentium 4, some sections of the processor may run at double, half, or quarter speed
    3. wire delay becoming as important as transistor switching speed
      1. the distance a signal can travel over a wire in one clock cycle shrinks as clock frequency rises
      2. Pentium 4 devotes entire pipeline stages for chip crossings
    4. static power (i.e., leakage) becoming a major factor in power budget
    5. Stefan Rusu, Intel, "Trends and Challenges in VLSI Technology Scaling Towards 100nm"
    6. special issue, IBM Journal of Research and Development, "Scaling CMOS to the limit," Vol. 46, Nos. 2/3, 2002
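
    The timing numbers quoted above can be combined in a back-of-the-envelope cycle budget. This sketch uses the notes' 180 nm figures (25 ps FO4 delay, 3-FO4 latch, 51 ps skew/jitter); summing the components gives a somewhat longer cycle than the 250 ps headline number, but the point is how the fixed latch and skew overhead grows as a fraction of the cycle. The c/2 wire-speed figure is an optimistic assumption for illustration only.

```python
# Cycle-time budget from the 180 nm numbers in the notes above.
FO4_PS = 25.0              # one fanout-of-4 gate delay
logic = 10 * FO4_PS        # ~10 gate levels of useful logic per stage
latch = 3 * FO4_PS         # latch overhead = 3 FO4 = 75 ps
skew_jitter = 51.0         # ps, the 180 nm Pentium 4 figure

cycle_ps = logic + latch + skew_jitter
freq_ghz = 1000.0 / cycle_ps
overhead_pct = 100 * (latch + skew_jitter) / cycle_ps
print(f"cycle = {cycle_ps:.0f} ps -> {freq_ghz:.2f} GHz; "
      f"latch+skew = {overhead_pct:.0f}% of the cycle")

# Wire reach per cycle at an (optimistic) signal speed of c/2; real on-chip
# wires are RC-dominated and much slower, hence dedicated chip-crossing stages.
C_MM_PER_PS = 0.3          # speed of light is about 0.3 mm/ps
for f in (1.0, 2.0, 4.0):                   # GHz
    reach = 0.5 * C_MM_PER_PS * 1000 / f    # mm per cycle at c/2
    print(f"{f:.0f} GHz: at most ~{reach:.0f} mm of ideal wire per cycle")
```

    Note that the 126 ps of latch and skew overhead is fixed per stage, so halving the logic per stage (deeper pipelining) raises the clock by much less than 2x.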

  8. On-chip caches
    1. Mike Haertel: "The relationship between cache size and performance tends to be vaguely logarithmic: if you double the size of the cache (and magically manage to keep it at the same speed), performance increases by +x%. Then you have to double the cache again to get another +x%. Obviously this depends heavily on your workload, but it's a reasonable rule of thumb. ...

      Nowadays in many processors, both CISC and RISC, more than half the die area is devoted to cache. For example, looking at a K8 (Sledgehammer) die photo, it looks like about 60% of the die is L2 cache and maybe another 10% is L1 cache. That leaves just 30% for all the core logic. ... Now, if the +x% gain for 2x cache growth is in the ballpark of 5-7%, (which seems like a reasonable assumption looking at the SPEC database) then a 1.2x increase in cache size is probably worth at most +2% performance."
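
    The arithmetic behind Haertel's final estimate can be checked directly: if each doubling is worth x%, then a growth factor g contributes roughly log2(g) doublings, for a gain of about log2(g) * x. A minimal sketch using the 5-7% per-doubling range from the quote:

```python
import math

def cache_gain_pct(growth_factor: float, pct_per_doubling: float) -> float:
    # Logarithmic rule of thumb: gain scales with the number of doublings.
    return math.log2(growth_factor) * pct_per_doubling

for x in (5.0, 7.0):
    print(f"1.2x cache at {x}%/doubling -> about +{cache_gain_pct(1.2, x):.1f}%")
```

    With log2(1.2) ~= 0.26, the 1.2x growth yields +1.3% to +1.8%, consistent with the "at most +2%" figure in the quote.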

  9. Wire-exposed ("communications exposed") instruction sets
    1. RAW (MIT)
    2. TRIPS (UT Austin)
    3. Imagine (Stanford)

  10. Asynchronous logic
    1. University of Manchester Asynchronous Logic home page
    2. Sun Research Labs Asynchronous Design Group


Key Points

