Intel 80486 ("486") Case Study
- introduced in 1989
- five-stage integer pipeline (approach is called an AGI, or
address-generation interlock, pipeline)
- fetch - fetch 16 bytes of instructions from the single (unified)
physically-addressed 4-way set-associative 8 KB cache into
a prefetch buffer (providing about five instructions per fetch);
the two 16-byte prefetch buffers are either used in a double-buffered
manner or one is used for prefetching down a branch target path
- D1 (main decoding stage) - processes up to three instruction bytes
at a time; determines the length of the instruction and causes the
prefetch buffer to step to the next instruction; extra cycles for
prefix bytes or two-byte opcodes
- D2 (secondary decoding stage) - includes effective address calculation;
extra cycles when the instruction has both an immediate constant and a
memory displacement, or when an index register must be added to a
base register and a displacement
- EX (execution) - includes register operand fetch and data cache access
- data cache hit for either a load or store operation (i.e., MOV
mem to reg, MOV reg to mem) can be accomplished by the EX stage
in one cycle
- ALU operations with all operands/results in registers can be
performed in one cycle
- extra cycles required for complex instructions, e.g. reg-to-memory
add requires three EX cycles: one for data fetch from cache, one
for the add itself, and one for result store to cache -- this type
of instruction is common in x86-style code
- using forwarding, a loaded result is available for use in the very
next cycle; however, because address calculation occurs in a
previous stage (D2), there can be pointer load delays (that is,
a sequence of a load and then an instruction that uses the loaded
register as a base or index register will encounter a one-cycle
stall)
- WB - write back to registers
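
The EX-stage timings and the pointer load delay described above can be
sketched as a rough cycle count. This is a minimal model under stated
assumptions, not a 486 simulator: the Inst record, its field names, and the
three-instruction program are hypothetical, and the model ignores extra
D1/D2 cycles and cache misses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Inst:
    """Hypothetical instruction record for this sketch."""
    name: str
    ex_cycles: int = 1                 # cycles spent in the EX stage
    is_load: bool = False              # True if the instruction loads a register
    dest: Optional[str] = None         # register written, if any
    addr_regs: Tuple[str, ...] = ()    # base/index registers needed in D2


def pointer_load_stalls(program):
    """Count one-cycle stalls: a load followed immediately by an
    instruction that uses the loaded register as a base or index."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.is_load and prev.dest in cur.addr_regs:
            stalls += 1
    return stalls


def total_cycles(program, depth=5):
    """Pipeline fill (depth - 1) plus EX occupancy plus pointer load stalls."""
    ex_occupancy = sum(inst.ex_cycles for inst in program)
    return (depth - 1) + ex_occupancy + pointer_load_stalls(program)


program = [
    # load eax; the next instruction needs eax in D2 for addressing -> stall
    Inst("mov eax,[ebx]", is_load=True, dest="eax", addr_regs=("ebx",)),
    # load ecx; the next instruction uses ecx only as data -> forwarded, no stall
    Inst("mov ecx,[eax]", is_load=True, dest="ecx", addr_regs=("eax",)),
    # reg-to-memory add: data fetch, add, result store = three EX cycles
    Inst("add [edx],ecx", ex_cycles=3, addr_regs=("edx",)),
]

print(pointer_load_stalls(program), total_cycles(program))
```

Only the second instruction stalls, because it needs the loaded register
during address calculation in D2; the add consumes its loaded operand in EX,
where forwarding covers it.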
- branches
- predict-untaken (even for unconditional jumps)
- two-cycle mispredict penalty since change in the PC (caused by a
taken conditional jump or an unconditional jump) is determined
during the EX stage; contents of D1 and D2 must be flushed
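
The cost of predict-untaken with a two-cycle penalty can be estimated with a
short calculation. This is a sketch only: the function name and the branch
frequencies used in the example are assumptions made for illustration, not
measurements of 486 workloads.

```python
def branch_cpi_overhead(f_uncond, f_cond, p_taken, penalty=2):
    """Average extra cycles per instruction lost to flushing D1 and D2.

    f_uncond: fraction of instructions that are unconditional jumps
    f_cond:   fraction of instructions that are conditional jumps
    p_taken:  fraction of conditional jumps that are taken
    penalty:  cycles lost per redirected fetch (two, per the notes above)
    """
    # Every unconditional jump and every taken conditional jump is
    # mispredicted under predict-untaken.
    redirected = f_uncond + f_cond * p_taken
    return penalty * redirected


# Hypothetical instruction mix, chosen only to exercise the formula.
overhead = branch_cpi_overhead(f_uncond=0.05, f_cond=0.15, p_taken=0.6)
print(f"{overhead:.2f} extra cycles per instruction")
```

With this assumed mix, 14% of instructions redirect the fetch, so the
penalty adds about 0.28 cycles per instruction on top of the base CPI.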
- 4-ported register file (3 read, 1 write)
- eight-stage FP pipeline with integer stages fetch/D1/D2/EX followed
by FP stages X1 (execute-1), X2 (execute-2), WF (FP write-back), and ER
(error reporting)
- chief designer was John Crawford (IA-32 architects were
John Crawford and Patrick Gelsinger)
- J. Crawford, "The i486 CPU: Executing Instructions in One Clock Cycle,"
IEEE Micro, February 1990, pp. 27-36.
- B. Fu, A. Saini, and P. Gelsinger, "Performance and Microarchitecture
of the i486 Microprocessor," Intl. Conf. Computer Design, 1989,
pp. 182-187.
- E. Grochowski and K. Shoemaker, "Issues in the Implementation of the
i486 Cache and Bus," Intl. Conf. Computer Design, 1989, pp. 193-198.