Clemson University
CPSC 464/664 Lecture Notes
Fall 2003
Mark Smotherman


Introduction

  1. What is "Computer Architecture"?

    1. Brooks and Blaauw (IBM S/360 architects)
      1. architecture -- appearance to assembly language programmer and compiler writer
        1. instruction set
          1. register set(s)
          2. memory address space(s) (flat vs. segmented)
          3. data types (sizes, encoding, byte ordering, memory alignment)
          4. operations (arithmetic, logical, data movement, control transfer)
          5. instruction formats (lengths, fields, encoding)
          6. addressing modes (base register, index register, scaling, autoincrement, etc.)
          7. interrupts, faults, and exceptions
          8. execution modes (user and OS)
        2. software conventions (e.g., register usage)
        3. hardware exposed to OS (control registers, virtual memory mapping, and protection)
      2. implementation -- logical design and organization (e.g., ALUs, caches, buses)
      3. realization -- hardware specifics (e.g., logic family, level of integration, clock rate)
      4. separate concerns since many implementations of same architecture will share software
        1. same arch./impl., different realization: vacuum-tube IBM 709 and transistorized IBM 7090
        2. same arch., different impl.: IBM S/360 computer family in mid-1960's
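
Byte ordering and memory alignment, listed above under data types, are architectural properties that software can observe directly. The following is a minimal Python sketch, not tied to any machine in these notes; the value 0x0A0B0C0D and the helper name is_aligned are illustrative assumptions.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit integer chosen for illustration

# The same integer laid out in memory under the two common byte orderings.
big_endian = struct.pack(">I", value)     # most significant byte at lowest address
little_endian = struct.pack("<I", value)  # least significant byte at lowest address

print(big_endian.hex())     # 0a0b0c0d
print(little_endian.hex())  # 0d0c0b0a


def is_aligned(address, size):
    """Return True if an operand of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


print(is_aligned(0x1004, 4))  # True
print(is_aligned(0x1006, 4))  # False
```

A program that writes an integer to a file as raw bytes on one machine and reads it back on another sees exactly this difference, which is why byte ordering belongs to the architecture rather than to the implementation.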

    2. Hennessy and Patterson (RISC pioneers)
      1. instruction set architecture - now relatively rare to introduce new ISA
      2. organization/microarchitecture - now is main focus of processor "architect"
      3. hardware - chip designer will use CAD tools, simulators, etc.
      4. compiler - now designed in conjunction with processor, not as afterthought

  2. What is "Good Architecture"? (see chapter 2, section 16, Historical Perspective)
    1. 1960's - easy for assembly language programmer to understand and use
    2. 1970's - qualitative, divorced from implementation, "semantic gap"
    3. 1980's - quantitative, best performance from arch./impl./software tradeoffs (RISC)
    4. 1990's - emphasis on organization (memory/bus/functional units) in terms of cost, performance, packaging, power consumption, and time to market
    5. 2000's - emphasis on ILP? - HP/Intel EPIC

  3. Hardware vs. Software Design Tradeoffs
    1. any algorithm can be fully or partially committed to hardware
    2. software advantages (ease of debugging, changing, upgrading)
    3. hardware advantage (performance, but complexity delays time to market)
    4. custom computing machine reconfigures to implement algorithm using FPGAs


Historical Overview

  1. Influence of John von Neumann

    1. von Neumann (1903-1957) contributed to pure mathematics, mathematical logic, quantum mechanics, cybernetics, and automata theory; he is credited with inventing game theory and cellular automata, and he was active in the Manhattan Project (the atomic bomb)

    2. Burks, Goldstine, and von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," 1946 (though some also assign credit for the idea of the stored program computer to J. Presper Eckert and John Mauchly)

    3. von Neumann machine characteristics
      1. random-access, one-dimensional memory (vs. sequential memories)
      2. stored program, no distinction between instructions and data (vs. "Harvard architecture" with separate instruction and data memories)
      3. binary, parallel by word, two's complement (vs. decimal, serial-by-digit, signed magnitude)
      4. instruction fetch/execute cycle, branch by explicit change of PC (vs. following a link address from one instruction to the next)
      5. three-register arithmetic -- ACC, MQ, MBR
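
The choice of two's complement over signed magnitude can be illustrated with a short Python sketch. The 8-bit width and the function names are assumptions made for readability; the same functions work for any word length. The sketch shows that two's complement operands of either sign can be added with plain binary addition, which is one commonly cited reason for the choice.

```python
def twos_complement(value, bits):
    """Encode a signed integer as a `bits`-wide two's complement bit pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range")
    return value & ((1 << bits) - 1)


def signed_magnitude(value, bits):
    """Encode a signed integer as a sign bit followed by a (bits-1)-bit magnitude."""
    if abs(value) >= (1 << (bits - 1)):
        raise ValueError("value out of range")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def add_twos_complement(a, b, bits):
    """Add two encoded words with ordinary binary addition, discarding the carry out."""
    return (a + b) & ((1 << bits) - 1)


BITS = 8  # illustrative width only
print(format(twos_complement(-5, BITS), "08b"))   # 11111011
print(format(signed_magnitude(-5, BITS), "08b"))  # 10000101

# -5 + 3 = -2, computed without inspecting either sign
total = add_twos_complement(twos_complement(-5, BITS), twos_complement(3, BITS), BITS)
print(total == twos_complement(-2, BITS))  # True
```

With signed magnitude, the adder would first have to compare signs and magnitudes and then choose between addition and subtraction.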

    4. von Neumann's ideas were implemented in Princeton IAS machine
      1. standing in front of IAS machine
      2. another view of IAS machine
      3. one integer data type, 40 bits, binary, two's complement (can be viewed as a scaled fraction with implied binary point)
      4. 10-bit addressability; no addressing modes
      5. array access by explicitly updating the instruction address (self-modifying code: insert next element address into the address field of the array accessing instruction)
      6. single type of conditional jump: jump if ACC >= 0
      7. for loop is implemented by counting down from N-1 to 0
      8. procedure call by instruction modification (programmer inserts return address directly into return jump => no recursion)
      9. IAS instruction set
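
The array-access and loop idioms above can be sketched with a small Python interpreter. This is a hypothetical accumulator machine in the spirit of the IAS design, not the actual IAS instruction set: the opcode names (LOAD, ADD, SUB, STORE, STA_ADDR, JGE, HALT), the tuple encoding of instructions, and the memory layout are all assumptions. Instructions and data share one memory, the program patches the address field of its own ADD instruction to step through the array, and the loop counts down from N-1 using a jump taken when ACC >= 0.

```python
# Hypothetical memory layout: the 13-instruction program occupies cells 0-12.
SUM, PTR, COUNT, ONE, ARRAY = 13, 14, 15, 16, 17


def build_program(values):
    """Return one memory image holding both the program and its data (non-empty array)."""
    program = [
        ("LOAD", PTR),      # 0: ACC = address of the next array element
        ("STA_ADDR", 3),    # 1: patch that address into instruction 3
        ("LOAD", SUM),      # 2
        ("ADD", 0),         # 3: address field is rewritten on every pass
        ("STORE", SUM),     # 4
        ("LOAD", PTR),      # 5: advance the element address
        ("ADD", ONE),       # 6
        ("STORE", PTR),     # 7
        ("LOAD", COUNT),    # 8: count down from N-1
        ("SUB", ONE),       # 9
        ("STORE", COUNT),   # 10
        ("JGE", 0),         # 11: loop again while ACC >= 0
        ("HALT", 0),        # 12
    ]
    data = [0, ARRAY, len(values) - 1, 1]  # SUM, PTR, COUNT, ONE
    return program + data + list(values)


def run(memory):
    """Fetch/execute loop; instructions are (opcode, address) tuples, data are ints."""
    acc, pc = 0, 0
    while True:
        op, addr = memory[pc]  # instruction fetch
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "SUB":
            acc -= memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "STA_ADDR":  # overwrite the address field of another instruction
            memory[addr] = (memory[addr][0], acc)
        elif op == "JGE":       # branch by explicit change of PC
            if acc >= 0:
                pc = addr
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op}")


memory = run(build_program([10, 20, 30, 40]))
print(memory[SUM])  # 100
print(memory[3])    # ('ADD', 20): the instruction as last modified
```

The same patching trick, applied to the address field of a return jump, gives the procedure call convention described above, and it shows why recursion fails: a second call overwrites the only stored return address.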

    5. von Neumann foresaw
      1. floating point (IBM 704, 1954), but recommended programmer scaling
      2. indexing (Univ. of Manchester, 1949), "B-lines" along with "A-line" (ACC)
      3. multiple precision arithmetic
      4. hexadecimal notation
      5. policing of unassigned opcodes
      6. single-stepping for debugging
      7. pipelining (IBM Stretch, 1961)
      8. CPU-bound vs. I/O-bound behavior
      9. archival storage

    6. later innovations
      1. buffered I/O (Univac I, 1951)
      2. I/O interrupts and DMA (NBS DYSEAC, 1954)
      3. duplex processors (IBM SAGE, 1955)
      4. indirect addressing (IBM 704, 1956)
      5. general purpose register set (Ferranti Pegasus, 1956) [also R0 == 0]
      6. I/O channels (IBM 709, 1957)
      7. virtual memory (Univ. of Manchester Atlas, 1959)
      8. symmetric multiprocessor system (Burroughs D-825, 1960)
      9. pre-execution within instruction stream - decoupled access-execute architecture (IBM Stretch, 1961)
      10. out-of-order execution (CDC 6600, 1964)
      11. computer family (IBM S/360, 1964)
      12. cache (IBM S/360 model 85, 1967)
      13. superscalar (IBM ACS design, 1967; IBM RS/6000, 1989)

  2. Some Important Machines (a subjective list, many others could be included)

    1. early scientific computers, IBM 701/704/709/7090/7094 series
      1. 36-bit words, 6-bit characters, word addressability, 12-bit program counter in 701, 15-bit program counter in 704
      2. accumulator architecture, like IAS; three, then seven, 15-bit index registers were added in later models
      3. vacuum tubes in 7xx models, core memory in 704 and later models

    2. early business computers, IBM 702/705, IBM 1401
      1. oriented to decimal and variable-length data

    3. early supercomputer, IBM Stretch, 1961
      1. designed for Los Alamos, high-performance scientific computing (e.g., nuclear bomb design)
      2. 64-bit words, introduced 8-bit byte, bit addressability
      3. developed transistor technology for IBM, but cost more to build than its list price, so Watson, Jr., halted sales
      4. pre-executes subset of instruction stream dealing with index registers so it can start loads early
      5. predict untaken with branch mispredict recovery
      6. See also Dag Spicer's article on Stretch, and Gordon Bell's presentation on supercomputers

    4. mainframe, IBM S/360, 1964
      1. 32-bit words, 8-bit bytes, byte addressability, 24-bit program counter
      2. 16 general-purpose registers, four 64-bit floating-point registers
      3. combined scientific and business data processing orientations to provide "360 degrees of data processing"
      4. introduced idea of computer family -- multiple implementations at different price/performance points
      5. success of systems allowed IBM to dominate the computer market for many years
      6. Model 25 installation
      7. Model 50 installation
      8. Model 65 front panel
      9. See also Blaauw and Brooks, Structure of System/360

    5. supercomputer, CDC 6600, 1964
      1. 60-bit words for the central processor, word addressability, 18-bit program counter
      2. 8 arithmetic registers (60 bits), 8 address registers (18 bits), 8 index registers (18 bits)
      3. load/store architecture, 3-register instruction formats for arithmetic and logic operations (some consider it the first "RISC")
      4. ten functional units, dynamic instruction scheduling ("scoreboard"), and out-of-order execution
      5. dynamic memory scheduling ("stunt box")
      6. ten peripheral processing units (12-bit minicomputer architecture, implemented using single set of shared PPU hardware), these ran the OS and handled I/O
      7. See also Dag Spicer's article on the CDC 6600, Gordon Bell's presentation on Seymour Cray, Thornton, Parallel Operation in the Control Data 6600

    6. early minicomputer, PDP-8, 1965
      1. 12-bit words, 6-bit characters, word addressability, 12-bit program counter
      2. accumulator architecture

    7. minicomputer, PDP-11, 1970
      1. 16-bit words, 8-bit bytes, byte addressability
      2. 8 registers with stack pointer and program counter mapped into R6 and R7
      3. optional floating-point unit adds six 64-bit registers
      4. 12 addressing modes
      5. Ákos Varga's PDP-11 site, and Bell, et al., A New Architecture for Minicomputers

    8. early microcomputers, 1970s -- Intel 4004, 8008, 8080

    9. vector supercomputer, Cray 1, 1976
      1. 64-bit words, word addressability, 24-bit program counter
      2. eight 24-bit address registers (plus 64 24-bit backup registers), eight 64-bit scalar accumulators (plus 64 64-bit backup registers), eight vector registers (64 entries of 64 bits each), vector mask register, and vector length register
      3. instruction set similar to CDC 6600 but also includes vector-scalar and vector-vector instructions
      4. See also Computer History Museum article
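
The role of the vector length register can be sketched in Python. This is an illustration of the general idea, not Cray 1 code: the function names are assumptions, Python lists stand in for vector registers, and the technique of processing a long array in register-sized strips (often called strip mining) is named here for clarity rather than taken from the notes. Only the 64-element register size comes from the description above.

```python
MAX_VL = 64  # elements per vector register, as listed above


def vector_scalar_mul(scalar, v, vl):
    """Vector-scalar instruction: multiply the first `vl` elements by a scalar."""
    return [scalar * x for x in v[:vl]]


def vector_add(v1, v2, vl):
    """Vector-vector instruction: add the first `vl` elements pairwise."""
    return [a + b for a, b in zip(v1[:vl], v2[:vl])]


def axpy(a, x, y):
    """Compute a*x + y for arrays of any length, one strip of <= MAX_VL at a time."""
    result = []
    for start in range(0, len(x), MAX_VL):
        vl = min(MAX_VL, len(x) - start)  # value loaded into the vector length register
        vx = x[start:start + vl]          # vector load
        vy = y[start:start + vl]          # vector load
        result.extend(vector_add(vector_scalar_mul(a, vx, vl), vy, vl))
    return result


x = list(range(150))
y = [1] * 150
print(axpy(2, x, y)[:4])  # [1, 3, 5, 7]
```

For 150 elements the loop runs three times, with vector lengths of 64, 64, and 22; the last strip is the case the vector length register exists to handle.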

    10. PC microprocessor, Intel 8086, 1978
      1. 16-bit words, byte addressability, 20-bit addressability using segmentation scheme
      2. extended accumulator architecture, AX, BX, CX, DX, SP, BP, SI, DI registers (registers have special purposes, e.g., CX contains counts)
      3. ten addressing modes
      4. later versions: 286 added protected mode (segment-based memory protection); 386 moved to 32-bit architecture, made the eight registers more general purpose, and also added paging; 486 integrated the integer unit and FPU on one chip; Pentium was superscalar; recent instruction set extensions include MMX, SSE, SSE2
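
The 8086 segmentation scheme forms a 20-bit address from two 16-bit quantities by shifting the segment value left four bits and adding the offset. A minimal Python sketch follows; the function name and the example values are illustrative assumptions.

```python
def physical_address(segment, offset):
    """Form a 20-bit 8086 address from a 16-bit segment and a 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # only 20 address bits exist


# Different segment:offset pairs can name the same physical byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350

# A sum that exceeds 20 bits wraps around to low memory.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

Each segment therefore spans 64 KB, and successive segment values start only 16 bytes apart, so segments overlap heavily.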

    11. superminicomputer, DEC VAX-11/780, 1978 -- typical "CISC"
      1. 32-bit words, byte addressability, 32-bit program counter
      2. 16 general registers plus various control registers
      3. 16 addressing modes
      4. 243 instructions (e.g., 6 different forms of XOR)
      5. highly variable instruction format with 1-6 operands
      6. see Strecker, VAX-11/780

    12. workstation microprocessor, MIPS R2000, 1986 -- typical "RISC"
      1. 32-bit words, byte addressability, 32-bit program counter
      2. 32 integer registers, 32 floating-point registers
      3. two addressing modes
      4. fixed-length instructions
      5. load/store architecture, 3-register instruction formats for arithmetic and logic operations
      6. delayed branches with delay slot instructions to be filled (i.e., pipeline implementation shows through into the architecture)
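
The effect of a delayed branch on program behavior can be sketched with a toy interpreter in Python. The instruction names and the tuple encoding are assumptions for illustration, not MIPS encodings, and a branch placed in a delay slot is not modeled. With delayed=True, the instruction immediately after a jump executes before control reaches the jump target.

```python
def run(program, delayed):
    """Execute (op, arg) tuples and return a trace of the instructions executed."""
    trace = []
    pc = 0
    pending = None  # jump target waiting for the delay slot instruction to finish
    while pc < len(program):
        op, arg = program[pc]
        trace.append(f"{pc}:{op}")
        next_pc = pc + 1
        if pending is not None:  # this instruction was in the delay slot
            next_pc = pending
            pending = None
        if op == "j":
            if delayed:
                pending = arg    # take effect one instruction later
            else:
                next_pc = arg    # take effect immediately
        pc = next_pc
    return trace


PROGRAM = [
    ("add", None),  # 0
    ("j", 4),       # 1: jump to instruction 4
    ("sub", None),  # 2: delay slot
    ("or", None),   # 3: skipped in both cases
    ("and", None),  # 4: jump target
]

print(run(PROGRAM, delayed=False))  # ['0:add', '1:j', '4:and']
print(run(PROGRAM, delayed=True))   # ['0:add', '1:j', '2:sub', '4:and']
```

A compiler targeting such a machine tries to move a useful instruction into slot 2, or places a no-op there when none can be found.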

    13. minisupercomputers, 1980s -- vector: Convex; VLIW: Multiflow

    14. 64-bit processors, 1990s -- DEC Alpha, MIPS, SPARC v9

    15. Intel/HP 64-bit explicitly parallel architecture -- IA-64, Itanium

                       ENIAC     IBM 704      IBM S/360 M50  VAX 11-780  Sun SPARCStation 2  Dell 4600
date                   1946      1955         1965           1978        1992                2003
addition time          200 usec  24 usec      4 usec         400 nsec    25 nsec             208 psec
memory cycle time      -         12 usec      2 usec         200 nsec    80 nsec             3 nsec
standard memory size   -         168 KB       64 KB          128 KB      128 MB              256 MB
rental                 -         $48,000/mo.  $32,000/mo.    $6,000/mo.  -                   -
purchase               $500,000  $1,390,000   $409,000       $128,000    $15,000             $800
constant 2003 dollars  $4.7M     $9.5M        $2.4M          $360,000    $19,600             $800
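
The table's addition times and constant-dollar prices can be combined into rough speedup and price/performance figures. The Python sketch below uses only numbers from the table; the dictionary and function names are assumptions, and "additions per second per dollar" is an illustrative metric, not one defined in these notes.

```python
# Addition time in seconds, from the table.
add_time = {
    "ENIAC": 200e-6,
    "IBM 704": 24e-6,
    "IBM S/360 M50": 4e-6,
    "VAX 11-780": 400e-9,
    "Sun SPARCStation 2": 25e-9,
    "Dell 4600": 208e-12,
}

# Purchase price in constant 2003 dollars, from the table.
cost_2003 = {
    "ENIAC": 4.7e6,
    "IBM 704": 9.5e6,
    "IBM S/360 M50": 2.4e6,
    "VAX 11-780": 360_000,
    "Sun SPARCStation 2": 19_600,
    "Dell 4600": 800,
}


def speedup(old, new):
    """Ratio of addition times: how many times faster `new` adds than `old`."""
    return add_time[old] / add_time[new]


def additions_per_second_per_dollar(machine):
    """Peak additions per second divided by price in 2003 dollars."""
    return (1.0 / add_time[machine]) / cost_2003[machine]


print(f"{speedup('ENIAC', 'Dell 4600'):.3g}")  # about 9.6e5
for name in add_time:
    print(f"{name:20s} {additions_per_second_per_dollar(name):.3g}")
```

By this measure the 2003 machine adds nearly a million times faster than ENIAC, and the gain in additions per dollar is larger still because the price fell by several thousand times over the same period.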


Key Points



mark@cs.clemson.edu