But recently ...
| | ENIAC | IBM 704 | IBM S/360 M50 | VAX 11-780 | Sun SPARCStation IPC | Dell 4600 |
---|---|---|---|---|---|---|
date | 1946 | 1955 | 1965 | 1978 | 1990 | 2003 |
addition time | 200 µsec | 24 µsec | 4 µsec | 400 nsec | 40 nsec | 208 psec |
memory cycle time | | 12 µsec | 2 µsec | 200 nsec | 80 or 100 nsec | 3 nsec |
standard memory size | | 168 KB | 64 KB | 128 KB | 8 MB (48 MB max) | 256 MB |
rental | | $48,000/mo. | $32,000/mo. | $6,000/mo. | | |
purchase | $500,000 | $1,390,000 | $409,000 | $128,000 | $9,995 (w/ color monitor) | $800 |
constant 2010 dollars | $5.6M | $11.3M | $2.8M | $428,000 | $16,670 | $950 |
CPU technology | 17,500 vacuum tubes | 5,000 vacuum tubes in CPU | ~ 25 K SLT circuits on 72 circuit boards (2x1 and 1x3 diodes, one transistor, and three printed resistors on AOI module; medium-speed gate was 30 nsec) | 1500 Schottky TTL chips on 20 circuit boards (1-4 gates or circuits per chip; 3-nsec gates) | approx. 1 M transistors in LSI S1C0010 IU (separate FPU) | 55 M transistors in Pentium 4 |
For fun - media representations of computing power
comparison of execution times (e.g., in seconds) on three machines
| | machine X | machine Y | machine Z |
---|---|---|---|
program 1 | 20 | 10 | 40 |
program 2 | 40 | 80 | 20 |
total execution time | 60 | 90 | 60 |
average execution time | 30 | 45 | 30 |
normalized speedups and means (relative to machine X)
| | machine X | machine Y | machine Z |
---|---|---|---|
program 1 | 1 | 2 | 0.5 |
program 2 | 1 | 0.5 | 2 |
arithmetic mean | 1 | 1.25 | 1.25 |
geometric mean | 1 | 1 | 1 |
normalized speedups and means (relative to machine Y)
| | machine X | machine Y | machine Z |
---|---|---|---|
program 1 | 0.5 | 1 | 0.25 |
program 2 | 2 | 1 | 4 |
arithmetic mean | 1.25 | 1 | 2.125 |
geometric mean | 1 | 1 | 1 |
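The two normalized tables above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `times`, `speedups`, `arithmetic_mean`, and `geometric_mean` are illustrative and do not come from the notes. It recomputes the speedups against each reference machine so the two kinds of mean can be compared side by side.

```python
from math import prod

# Execution times in seconds, taken from the first table above.
times = {
    "X": {"program 1": 20, "program 2": 40},
    "Y": {"program 1": 10, "program 2": 80},
    "Z": {"program 1": 40, "program 2": 20},
}

def speedups(machine, reference):
    """Per-program speedup of `machine` over `reference` (reference time / machine time)."""
    return [times[reference][p] / times[machine][p] for p in times[reference]]

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1 / len(values))

for reference in ("X", "Y"):
    print(f"relative to machine {reference}")
    for machine in times:
        s = speedups(machine, reference)
        print(f"  {machine}: speedups={s} "
              f"arithmetic={arithmetic_mean(s):.3f} "
              f"geometric={geometric_mean(s):.3f}")
```

The arithmetic mean of the normalized speedups depends on which machine is chosen as the reference, while the geometric mean comes out the same for all three machines under either reference.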
same comparisons done using rates (1/execution_time)
| | machine X | machine Y | machine Z |
---|---|---|---|
program 1 | 0.05 | 0.1 | 0.025 |
program 2 | 0.025 | 0.0125 | 0.05 |
arithmetic mean | 0.0375 | 0.05625 | 0.0375 |
harmonic mean | 0.0333 | 0.0222 | 0.0333 |
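The rate table can be checked the same way. The sketch below is an illustration under the same assumptions as the tables (two programs, equally weighted); the names `times` and `harmonic_mean` are hypothetical. It also prints the number of programs divided by total execution time, which is what the harmonic mean of the rates reduces to.

```python
# Execution times in seconds for program 1 and program 2, from the first table above.
times = {"X": [20, 40], "Y": [10, 80], "Z": [40, 20]}

def arithmetic_mean(values):
    return sum(values) / len(values)

def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

for machine, t in times.items():
    rates = [1 / seconds for seconds in t]  # programs per second
    am = arithmetic_mean(rates)
    hm = harmonic_mean(rates)
    # The harmonic mean of the rates equals (number of programs) / (total time).
    per_total = len(t) / sum(t)
    print(f"{machine}: arithmetic={am:.5f} harmonic={hm:.4f} n/total={per_total:.4f}")
```

The arithmetic mean of the rates ranks machine Y highest even though it has the longest total execution time (90 seconds versus 60); the harmonic mean of the rates agrees with the ranking by total execution time.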
CPU performance

$$\text{CPU\_time} = IC \times CPI \times CCT$$

$$i\ \text{insts} \times j\ \frac{\text{cycles}}{\text{inst}} \times k\ \frac{\text{secs}}{\text{cycle}} = ijk\ \text{secs}$$

$$\text{CPU\_time} = \frac{IC \times CPI}{\text{Clock\_Frequency}}$$

$$\text{CPU\_time} = \sum_i \left( IC_i \times CPI_i \right) \times CCT = IC \times \sum_i \left( \frac{IC_i}{IC} \times CPI_i \right) \times CCT$$

Consider the following instruction mix

operation | frequency (IC_i / IC) | cycle count (CPI_i) | weighted CPI_i |
---|---|---|---|
ALU ops | 50% | 1 | _____ |
Loads | 20% | 2 | _____ |
Stores | 10% | 3 | _____ |
Branches | 20% | 4 | _____ |

a) What is the average CPI? (Use the weighted arithmetic average.)

$$CPI = (0.5 \times 1) + (0.2 \times 2) + (0.1 \times 3) + (0.2 \times 4) = 0.5 + 0.4 + 0.3 + 0.8 = 2.0$$

b) What is the cycle count distribution according to operation type?

operation | frequency in inst stream | CPI | percentage of total cycle count |
---|---|---|---|
ALU ops | 50% | 1 | 25% (=.5/2) |
Loads | 20% | 2 | 20% (=.4/2) |
Stores | 10% | 3 | 15% (=.3/2) |
Branches | 20% | 4 | 40% (=.8/2) |
| | | avg = 2 | |

Alternatively, consider that out of every 100 instructions, 50 are ALU ops, taking one cycle each, etc.

| | ALU ops | loads | stores | branches | total |
---|---|---|---|---|---|
instructions | 50 | 20 | 10 | 20 | 100 |
cycles per instruction | 1 | 2 | 3 | 4 | |
cycles | 50 | 40 | 30 | 80 | 200 |

So, branches represent 20/100 of the instruction count (20%) but 80/200 of the cycle count (40%).

Note that on average each instruction takes two cycles (avg CPI = 2), but branches take twice the average number of cycles. So, it's not surprising that the branch cycle count percentage is twice the branch instruction count percentage.

c) If branch cycles are reduced to 2, do you have enough information to determine how much faster the modified processor will be than the original processor?
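Yes, provided the change leaves the instruction count and the clock cycle time unchanged, so that both cancel in the ratio:

$$\text{speedup} = \frac{\text{CPU\_time}_{old}}{\text{CPU\_time}_{new}} = \frac{IC \times CPI_{old} \times CCT}{IC \times CPI_{new} \times CCT} = \frac{CPI_{old}}{CPI_{new}}$$

$$CPI_{old} = 2.0$$

$$CPI_{new} = (0.5 \times 1) + (0.2 \times 2) + (0.1 \times 3) + (0.2 \times 2) = 0.5 + 0.4 + 0.3 + 0.4 = 1.6$$

$$\text{speedup} = \frac{2.0}{1.6} = 1.25$$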
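The arithmetic in parts (a) through (c) can be checked with a short Python sketch. The names `mix`, `average_cpi`, and `cycle_shares` are illustrative, not from the notes, and part (c) assumes, as the worked answer does, that instruction count and clock cycle time stay the same.

```python
# Instruction mix from the example:
# operation -> (fraction of instruction count, cycles per instruction).
mix = {
    "ALU ops":  (0.50, 1),
    "Loads":    (0.20, 2),
    "Stores":   (0.10, 3),
    "Branches": (0.20, 4),
}

def average_cpi(mix):
    """Weighted arithmetic average: sum over i of (IC_i / IC) * CPI_i."""
    return sum(freq * cpi for freq, cpi in mix.values())

def cycle_shares(mix):
    """Fraction of the total cycle count spent in each operation type."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}

# Part (a): average CPI of the original processor.
cpi_old = average_cpi(mix)

# Part (c): branches now take 2 cycles; IC and CCT are assumed unchanged,
# so the speedup reduces to CPI_old / CPI_new.
new_mix = dict(mix, Branches=(0.20, 2))
cpi_new = average_cpi(new_mix)

print(f"CPI_old = {cpi_old:.2f}")
for op, share in cycle_shares(mix).items():  # part (b)
    print(f"  {op}: {share:.0%} of cycles")
print(f"CPI_new = {cpi_new:.2f}, speedup = {cpi_old / cpi_new:.2f}")
```

Changing any entry of `mix` and rerunning gives the corresponding CPI and cycle distribution for a different instruction mix.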