Intel Pentium 4 Case Study
- adds SSE2 instructions
- decoupled microarchitecture: translates x86 instructions into internal
uops (micro-operations)
- treats translation as predecoding, stores up to 12K uops in a
trace cache which acts as the equivalent of an L1 icache
- 128 physical registers, separate from ROB
- 20+ pipeline stages (31 in the 90nm Prescott version)
- half-cycle ALU: simple integer ALUs run at twice the core clock
- schedulers assume cache hits and speculatively issue dependent uops;
a replay mechanism re-executes dependent uops on a cache miss
- 4K-entry BTB, GA_
- chief architect was Glenn Hinton
- G. Hinton, et al., "The Microarchitecture of the Pentium 4 Processor,"
Intel Tech. Journal, Q1 2001.
- D. Boggs, et al., "The Microarchitecture of the Intel Pentium 4
Processor on 90nm Technology,"
Intel Tech. Journal, vol. 8, issue 1, February 2004.
- Tom Shanley, The Unabridged Pentium 4
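
The speculative-issue and replay bullet above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not the actual
Pentium 4 scheduler: the latencies (2-cycle hit, 10-cycle miss, 1-cycle ALU),
the one-uop-per-cycle dispatch, and the names Uop and schedule are all
hypothetical, and a replayed uop is modeled as simply re-issuing once its
input really arrives.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative latencies in cycles (assumed values, not Pentium 4 figures).
L1_HIT_LATENCY = 2
L1_MISS_LATENCY = 10
ALU_LATENCY = 1


@dataclass
class Uop:
    name: str
    src: Optional[str] = None  # name of the producing uop, if any
    is_load: bool = False
    hits: bool = True          # only meaningful for loads


def schedule(uops):
    """Return {name: (speculative_issue, final_issue, replayed)}."""
    spec_ready = {}  # cycle at which the scheduler ASSUMES a result is ready
    real_ready = {}  # cycle at which the result is ACTUALLY ready
    report = {}
    for cycle, uop in enumerate(uops):  # one uop enters the scheduler per cycle
        # Speculative issue: wake up as if every load had hit in the cache.
        spec_issue = max(cycle, spec_ready.get(uop.src, 0))
        # The uop only produces a valid result once its input really exists;
        # if that is later than the speculative issue, it must be replayed.
        final_issue = max(spec_issue, real_ready.get(uop.src, 0))
        replayed = final_issue > spec_issue

        if uop.is_load:
            assumed = L1_HIT_LATENCY
            actual = L1_HIT_LATENCY if uop.hits else L1_MISS_LATENCY
        else:
            assumed = actual = ALU_LATENCY

        spec_ready[uop.name] = spec_issue + assumed
        real_ready[uop.name] = final_issue + actual
        report[uop.name] = (spec_issue, final_issue, replayed)
    return report


if __name__ == "__main__":
    # A load that misses, followed by a two-uop dependent chain.
    program = [
        Uop("ld", is_load=True, hits=False),
        Uop("add", src="ld"),
        Uop("sub", src="add"),
    ]
    for name, result in schedule(program).items():
        print(name, result)
```

In this model, when the load misses, both dependent uops issue early on the
hit assumption and are then replayed; when the load hits, nothing is replayed
and the chain runs back to back. That is the trade-off the bullet describes:
lower latency in the common hit case, paid for with wasted issue slots on a
miss.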