Clemson University
CPSC 464/664 Lecture Notes
Fall 2003
Mark Smotherman
Introduction
- What is "Computer Architecture"?
- Brooks and Blaauw (IBM S/360 architects)
- architecture -- appearance to assembly language programmer and
compiler writer
- instruction set
- register set(s)
- memory address space(s) (flat vs. segmented)
- data types (sizes, encoding, byte ordering, memory alignment)
- operations (arithmetic, logical, data movement, control transfer)
- instruction formats (lengths, fields, encoding)
- addressing modes (base register, index register, scaling,
autoincrement, etc.)
- interrupts, faults, and exceptions
- execution modes (user and OS)
- software conventions (e.g., register usage)
- hardware exposed to OS (control registers, virtual memory mapping,
and protection)
- implementation -- logical design and organization
(e.g., ALUs, caches, buses)
- realization -- hardware specifics (e.g., logic family, level of
integration, clock rate)
- separate concerns since many implementations of same architecture
will share software
- same arch./impl.: vacuum-tube IBM 709 and transistorized IBM 7090
- same arch.: IBM S/360 computer family in mid-1960's
- Hennessy and Patterson (RISC pioneers)
- instruction set architecture - now relatively rare to introduce new ISA
- organization/microarchitecture - now is main focus of processor
"architect"
- hardware - chip designer will use CAD tools, simulators, etc.
- compiler - now designed in conjunction with processor, not as
afterthought
- What is "Good Architecture"? (see chapter 2, section 16,
Historical Perspective)
- 1960's - easy for assembly language programmer to understand and use
- 1970's - qualitative, divorced from implementation, "semantic gap"
- 1980's - quantitative, best performance from arch./impl./software
tradeoffs (RISC)
- 1990's - emphasis on organization (memory/bus/functional units) in terms
of cost, performance, packaging, power consumption, and time to market
- 2000's - emphasis on ILP? - HP/Intel EPIC
- Hardware vs. Software Design Tradeoffs
- any algorithm can be fully or partially committed to hardware
- software advantages (ease of debugging, changing, upgrading)
- hardware advantage (performance, but complexity delays time to market)
- custom computing machine reconfigures to implement algorithm using FPGAs
Historical Overview
- Influence of John von Neumann
- von Neumann (1903-1957) made contributions to pure math, mathematical
logic, quantum mechanics, cybernetics, and automata theory; he is
credited with inventing game theory and cellular automata; he was
also active in the Manhattan Project (the atomic bomb)
- Burks, Goldstine, and von Neumann, "Preliminary discussion of the
logical design of an electronic computing instrument," 1946
(though some also assign credit for the idea of the
stored program computer to J. Presper Eckert and John Mauchly)
- von Neumann machine characteristics
- random-access, one-dimensional memory (vs. sequential memories)
- stored program, no distinction between instructions and data
(vs. "Harvard architecture" with separate instruction and data
memories)
- binary, parallel by word, two's complement
(vs. decimal, serial-by-digit, signed magnitude)
- instruction fetch/execute cycle, branch by explicit change of PC
(vs. following a link address from one instruction to the next)
- three-register arithmetic -- ACC, MQ, MBR
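The contrast between two's complement and signed magnitude can be made concrete with a small sketch. The 8-bit default width and the function names are illustrative assumptions, chosen only to keep the bit patterns short; they are not taken from any particular machine.

```python
def to_twos_complement(value, bits=8):
    """Encode a signed integer as an unsigned two's complement bit pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits=8):
    """Decode a two's complement bit pattern back to a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def to_signed_magnitude(value, bits=8):
    """Encode a signed integer as a sign bit followed by its magnitude."""
    magnitude = abs(value)
    if magnitude >= (1 << (bits - 1)):
        raise ValueError("magnitude does not fit in the given width")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | magnitude


def add_twos_complement(a, b, bits=8):
    """Add two encoded words with a plain modular adder (no sign logic)."""
    return (a + b) & ((1 << bits) - 1)


minus_five = to_twos_complement(-5)          # 0xFB
three = to_twos_complement(3)                # 0x03
total = add_twos_complement(minus_five, three)
print(hex(minus_five), hex(to_signed_magnitude(-5)), from_twos_complement(total))
# prints: 0xfb 0x85 -2
```

The point of the sketch is that two's complement addition needs no separate handling of the sign, whereas a signed-magnitude adder must compare signs and magnitudes first.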
- von Neumann's ideas were implemented in Princeton IAS machine
- one integer data type, 40 bits, binary, two's complement
(can be viewed as a scaled fraction with implied binary point)
- 10-bit addressability; no addressing modes
- array access by explicitly updating the instruction address
(self-modifying code: insert next element address into the
address field of the array accessing instruction)
- single type of conditional jump; jump if ACC >= 0
- for loop is implemented by counting down from N-1 to 0
- procedure call by instruction modification (programmer inserts
return address directly into return jump => no recursion)
- IAS instruction set
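The array-access and loop conventions above can be sketched on a toy accumulator machine. This is not the real IAS instruction set: the opcode names, the tuple encoding, and the memory layout are all hypothetical, and the conditional jump is assumed to be taken when the accumulator is non-negative, which is what counting down from N-1 to 0 requires. Instructions and data share one memory, and the loop patches the address field of its own ADD instruction on every iteration.

```python
# Hypothetical memory layout: code in cells 0..10, then data.
SUM, I, ONE, BASE, ARRAY = 11, 12, 13, 14, 15


def sum_array_program(values):
    """Build a memory image that sums a non-empty list via self-modifying code."""
    if not values:
        raise ValueError("the countdown loop assumes at least one element")
    code = [
        ("LOAD", I),           # 0: ACC = i
        ("ADD", BASE),         # 1: ACC = address of element i
        ("STORE_ADDR", 4),     # 2: patch the address field of instruction 4
        ("LOAD", SUM),         # 3: ACC = running sum
        ("ADD", 0),            # 4: address field is overwritten each pass
        ("STORE", SUM),        # 5: save running sum
        ("LOAD", I),           # 6: ACC = i
        ("SUB", ONE),          # 7: ACC = i - 1
        ("STORE", I),          # 8: i = i - 1
        ("JUMP_NONNEG", 0),    # 9: loop again while i >= 0
        ("HALT", 0),           # 10: stop
    ]
    data = [0, len(values) - 1, 1, ARRAY]   # SUM, I, ONE, BASE
    return code + data + list(values)


def run(memory, max_steps=10_000):
    """Fetch/execute loop for the toy machine; returns the final memory."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        op, addr = memory[pc]
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "SUB":
            acc -= memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "STORE_ADDR":
            # Replace only the address field of the target instruction.
            memory[addr] = (memory[addr][0], acc)
        elif op == "JUMP_NONNEG":
            if acc >= 0:
                pc = addr
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")


final = run(sum_array_program([3, 5, 7]))
print(final[SUM])   # prints: 15
```

After the run, cell 4 holds `("ADD", 15)`, the last address the program wrote into its own code. The same patching trick is what makes the return jump non-reentrant, hence no recursion.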
- von Neumann foresaw
- floating point (IBM 704, 1954), but recommended programmer scaling
- indexing (Univ. of Manchester, 1949), "B-lines" along with "A-line"
(ACC)
- multiple precision arithmetic
- hexadecimal notation
- policing of unassigned opcodes
- single-stepping for debugging
- pipelining (IBM Stretch, 1961)
- CPU-bound vs. I/O-bound behavior
- archival storage
- later innovations
- buffered I/O (Univac I, 1951)
- I/O interrupts and DMA (NBS DYSEAC, 1954)
- duplex processors (IBM SAGE, 1955)
- indirect addressing (IBM 704, 1956)
- general purpose register set (Ferranti Pegasus, 1956) [also R0 == 0]
- I/O channels (IBM 709, 1957)
- virtual memory (Univ. of Manchester Atlas, 1959)
- symmetric multiprocessor system (Burroughs D-825, 1960)
- pre-execution within instruction stream - decoupled access-execute
architecture (IBM Stretch, 1961)
- out-of-order execution (CDC 6600, 1964)
- computer family (IBM S/360, 1964)
- cache (IBM S/360 model 85, 1967)
- superscalar (IBM ACS design, 1967; IBM RS/6000, 1989)
- Some Important Machines
(a subjective list, many others could be included)
- early scientific computers, IBM 701/704/709/7090/7094 series
- 36-bit words, 6-bit characters, word addressability,
12-bit program counter in 701, 15-bit program counter in 704
- accumulator architecture, like IAS;
three, then seven, 15-bit index registers were added in later models
- vacuum tubes in 7xx models, core memory in 704 and later models
- early business computers, IBM 702/705, IBM 1401
- oriented to decimal and variable-length data
- early supercomputer, IBM Stretch, 1961
- designed for Los Alamos, high-performance scientific computing
(e.g., nuclear bomb design)
- 64-bit words, introduced 8-bit byte, bit addressability
- developed transistor technology for IBM, but it cost more to build than
its list price, so Watson, Jr., halted sales
- pre-executes subset of instruction stream dealing with index
registers so it can start loads early
- predict untaken with branch mispredict recovery
- See also Dag Spicer's article on Stretch, and
Gordon Bell's presentation on supercomputers
- mainframe, IBM S/360, 1964
- 32-bit words, 8-bit bytes, byte addressability,
24-bit program counter
- 16 general-purpose registers, four 64-bit floating-point registers
- combined scientific and business data processing orientations
to provide "360 degrees of data processing"
- introduced idea of computer family -- multiple implementations at
different price/performance points
- success of systems allowed IBM to dominate the computer market for
many years
- See also Blaauw and Brooks, Structure of System/360
- supercomputer, CDC 6600, 1964
- 60-bit words for the central processor, word addressability,
18-bit program counter
- 8 arithmetic registers (60 bits), 8 address registers (18 bits),
8 index registers (18 bits)
- load/store architecture, 3-register instruction formats for
arithmetic and logic operations (some consider it the first "RISC")
- ten functional units, dynamic instruction scheduling ("scoreboard"),
and out-of-order execution
- dynamic memory scheduling ("stunt box")
- ten peripheral processing units (12-bit minicomputer architecture,
implemented using single set of shared PPU hardware), these ran
the OS and handled I/O
- See also Dag Spicer's article on the CDC 6600,
Gordon Bell's presentation on Seymour Cray, and
Thornton, Parallel Operation in the Control Data 6600
- early minicomputer, PDP-8, 1965
- 12-bit words, 6-bit characters, word addressability,
12-bit program counter
- accumulator architecture
- minicomputer, PDP-11, 1970
- 16-bit words, 8-bit bytes, byte addressability
- 8 registers with stack pointer and program counter mapped into
R6 and R7
- optional floating-point unit adds six 64-bit registers
- 12 addressing modes
- See also Ákos Varga's PDP-11 site, and
Bell, et al., A New Architecture for Minicomputers
- early microcomputers, 1970s -- Intel 4004, 8008, 8080
- vector supercomputer, Cray 1, 1976
- 64-bit words, word addressability, 24-bit program counter
- eight 24-bit address registers (plus 64 24-bit backup registers),
eight 64-bit scalar accumulators (plus 64 64-bit backup registers),
eight vector registers (64 entries of 64 bits each),
vector mask register, and vector length register
- instruction set similar to CDC 6600 but also includes vector-scalar
and vector-vector instructions
- See also Computer History Museum article
- PC microprocessor, Intel 8086, 1978
- 16-bit words, byte addressability, 20-bit addressability using
segmentation scheme
- extended accumulator architecture, AX, BX, CX, DX, SP, BP,
SI, DI registers (registers have special purposes, e.g., CX
contains counts)
- ten addressing modes
- later versions: 286 added protected mode with segment-level protection;
386 moved to 32-bit architecture, made the eight registers
more general purpose, and also added paging;
486 integrated the integer unit and FPU on one chip;
Pentium was superscalar; recent instruction set extensions
include MMX, SSE, SSE2
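The 8086's "20-bit addressability using segmentation scheme" comes down to one line of arithmetic: the 16-bit segment value is shifted left four bits and added to the 16-bit offset. A minimal sketch follows; the function name is an assumption for illustration, and the wrap to 20 bits reflects the original 8086, which has only 20 address lines.

```python
def physical_address(segment, offset):
    """8086 real-mode address: (segment * 16 + offset), wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))   # prints: 0x12350
print(hex(physical_address(0x1235, 0x0000)))   # prints: 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))   # prints: 0x0
```

The first two calls show that many segment:offset pairs alias the same physical byte; the third shows the wraparound at the top of the 1 MB address space.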
- superminicomputer, DEC VAX-11/780, 1978 -- typical "CISC"
- 32-bit words, byte addressability, 32-bit program counter
- 16 general registers plus various control registers
- 16 addressing modes
- 243 instructions (e.g., 6 different forms of XOR)
- highly variable instruction format with 1-6 operands
- See Strecker, VAX-11/780
- workstation microprocessor, MIPS R2000, 1986 -- typical "RISC"
- 32-bit words, byte addressability, 32-bit program counter
- 32 integer registers, 32 floating-point registers
- two addressing modes
- fixed-length instructions
- load/store architecture, 3-register instruction formats for
arithmetic and logic operations
- delayed branches with delay slot instructions to be filled
(i.e., pipeline implementation shows through into the architecture)
- minisupercomputers, 1980s -- vector: Convex; VLIW: Multiflow
- 64-bit processors, 1990s -- DEC Alpha, MIPS, SPARC v9
- Intel/HP 64-bit explicitly parallel architecture -- IA-64, Itanium
|                       | ENIAC    | IBM 704     | IBM S/360 M50 | VAX 11-780 | Sun SPARCStation 2 | Dell 4600 |
|-----------------------|----------|-------------|---------------|------------|--------------------|-----------|
| date                  | 1946     | 1955        | 1965          | 1978       | 1992               | 2003      |
| addition time         | 200 usec | 24 usec     | 4 usec        | 400 nsec   | 25 nsec            | 208 psec  |
| memory cycle time     |          | 12 usec     | 2 usec        | 200 nsec   | 80 nsec            | 3 nsec    |
| standard memory size  |          | 168 KB      | 64 KB         | 128 KB     | 128 MB             | 256 MB    |
| rental                |          | $48,000/mo. | $32,000/mo.   | $6,000/mo. |                    |           |
| purchase              | $500,000 | $1,390,000  | $409,000      | $128,000   | $15,000            | $800      |
| constant 2003 dollars | $4.7M    | $9.5M       | $2.4M         | $360,000   | $19,600            | $800      |
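One way to read the table is to turn it into ratios. The sketch below uses only the addition times and constant-2003-dollar prices listed above. Treating addition time as a stand-in for overall performance is a crude simplifying assumption, and the names `MACHINES`, `speedup`, and `adds_per_second_per_dollar` are made up for this example.

```python
# (addition time in seconds, price in constant 2003 dollars), from the table
MACHINES = {
    "ENIAC": (200e-6, 4.7e6),
    "IBM 704": (24e-6, 9.5e6),
    "IBM S/360 M50": (4e-6, 2.4e6),
    "VAX 11-780": (400e-9, 360_000),
    "Sun SPARCStation 2": (25e-9, 19_600),
    "Dell 4600": (208e-12, 800),
}


def speedup(old, new):
    """Ratio of addition times: how many times faster `new` adds than `old`."""
    return MACHINES[old][0] / MACHINES[new][0]


def adds_per_second_per_dollar(name):
    """Additions per second bought by one constant 2003 dollar."""
    add_time, price = MACHINES[name]
    return (1.0 / add_time) / price


for name in MACHINES:
    print(f"{name:20s} {adds_per_second_per_dollar(name):14.4g} adds/sec per dollar")

print(f"ENIAC -> Dell 4600 addition speedup: {speedup('ENIAC', 'Dell 4600'):,.0f}x")
```

On these figures the addition time improves by a factor of roughly a million from 1946 to 2003, and the price/performance ratio improves by several more orders of magnitude because the price falls at the same time.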
Key Points
- Any algorithm can be fully committed to hardware but it may not make
economic sense.
- An instruction set architecture lasts much longer than a particular
implementation.
- The types of computer systems we use today are credited to
John von Neumann.
mark@cs.clemson.edu