Instruction Predecoding and Decoded Instruction Caches
Mark Smotherman
Last updated: May 2025
UNDER CONSTRUCTION
Summary: ... tbd...
... intro to be written
... incomplete - lots of ideas and patents,
... will try to include the early ones ...
- decoded instruction queues
- IBM Stretch (1961) - predecoded up to two instructions at a time
after instruction fetch and held predecoded (and some pre-executed)
instructions in its four-level lookahead unit
- IBM S/370 M165 (1971) and 3033 (1978) - four-entry decoded
instruction queue between IPPF ("instruction pre-processing
function") and execution units; IPPF starts the fetching
of memory operands, similar to IBM Stretch
- decoded instruction caches
- LLNL S-1 (1978) - expanded 36-bit word to 56-bit icache format
to reduce instruction decoding time
- Jim Pomerene and Rudolph Rechtschaffen,
"Cache memory architecture with decoding"
US Patent 4,437,149 (filed Nov. 1980, granted to IBM March 1984)
- Dave Patterson, RISC-II (ISCA 1983 paper) -
"on the miss" expansion between memory and icache
- CCI Power 6/32 (1984)
- variable-length
instruction set similar to the VAX, in which the length of an
instruction cannot be determined from the opcode alone
- also sold as the Harris HCX-5, -7, and -9; the ICL Clan 5, 6, and 7;
and the Unisys 7000/40
- A 1998 TUHS post by Eric Edwards says that it was implemented
on five boards using AMD 2900 series bit slice processors, PLAs,
and 74F series parts.
A 1987 comp.arch post by "bjj" from Penn State says that the
4K-word instruction cache contained decoded instructions stored as
fixed-length 73-bit words.
Kevin McKean in a 1986 two-part series in UNIX/WORLD states
that with the decoded instruction cache and memory interleaving
turned on, the 10 MHz Power 6/32 was 11x faster
than a 5 MHz VAX 11/780 on a call-and-return-intensive benchmark.
- Yale Patt, HPS (1985)
- AT&T CRISP (1987) - variable-length source instructions were
predecoded into fixed-length entries and placed in a 32-entry decoded
instruction cache (DIC); also, each branch was folded into the previous
instruction's DIC entry by including a next-address field; conditional
branches were handled by including a second, alternate next-address
field and information for determining a misprediction
- IBM RS/6000 (1989) - instructions are predecoded into eight general
classes to assist in routing to the function units
(see also US Patents 5,828,895, filed 1995, and 6,286,094, filed 1999)
- ... lots recently ... (perhaps discuss MIPS R10000)
- instruction caches with instruction length info added
- ... early patents ...
- AMD K5 (1994) - adds 5 bits per instruction byte (start, end, prefix,
opcode, number of Rops); see US Patent 5,758,114 (parent filed 1995,
granted to AMD in 1998); only 2 bits added per byte in K8
- ...
- instruction caches with scheduling information added
- NS Swordfish (1991) - instruction pair dependency bit is contained
in each decoded i-cache entry; it is set on i-cache refill by
predecode hardware and yields LIW issue of independent instruction
pairs; no bits are used in the normal instruction format;
see US Patent 5,669,011 (parent filed 1990, granted to NS 1997)
- Minagawa/Saito/Aikawa (1991) - "Pre-decoding mechanism for superscalar
architecture," IEEE Pacific Rim Conf. on Comm., Comp., and Sig. Proc.,
pp. 22-24; on i-cache miss, a predecoder adds instruction grouping
("priority") and function unit assignment fields;
see US Patent 5,377,339 (parent filed 1991, granted 1994)
- ...
- trace caches
- Alex Peleg and Uri Weiser, "Dynamic flow instruction cache memory
organized around trace segments independent of virtual address line,"
US Patent 5,381,533 (parent filed 1992, granted to Intel 1995)
- ...
- ...
(US Patents - search subclass 213 under class 712)
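Illustrative sketch of the branch folding described in the AT&T CRISP entry
above: variable-length instructions are predecoded into fixed-format entries,
and a branch is absorbed into the preceding instruction's entry through a
next-address field, with a second, alternate next-address field for
conditional branches. This is only a toy model under stated assumptions. The
names Insn, DicEntry, branch_fields, and predecode, the three instruction
kinds, the instruction lengths, and the predict_taken flag are all
hypothetical and are not taken from the CRISP design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Insn:
    """Toy variable-length source instruction (hypothetical format)."""
    addr: int
    length: int                    # bytes occupied in memory
    kind: str                      # "op", "jmp" (unconditional), "br" (conditional)
    target: Optional[int] = None   # branch target, if any
    predict_taken: bool = False    # assumed static prediction hint


@dataclass
class DicEntry:
    """Toy fixed-format decoded entry with next-address fields."""
    addr: int
    op: str
    next_addr: int                       # address to fetch next (predicted path)
    alt_next_addr: Optional[int] = None  # other path of a conditional branch


def branch_fields(br: Insn) -> Tuple[int, Optional[int]]:
    """Return (next_addr, alt_next_addr) implied by a branch."""
    fall_through = br.addr + br.length
    if br.kind == "jmp":
        return br.target, None
    if br.predict_taken:
        return br.target, fall_through
    return fall_through, br.target


def predecode(insns: List[Insn]) -> List[DicEntry]:
    """Build decoded entries, folding each branch into the prior non-branch."""
    entries = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        if cur.kind == "op":
            entry = DicEntry(cur.addr, "op", cur.addr + cur.length)
            nxt = insns[i + 1] if i + 1 < len(insns) else None
            if nxt is not None and nxt.kind in ("jmp", "br"):
                # Fold: the branch gets no entry of its own.
                entry.next_addr, entry.alt_next_addr = branch_fields(nxt)
                i += 1
        else:
            # A branch with nothing to fold into keeps its own entry.
            next_addr, alt = branch_fields(cur)
            entry = DicEntry(cur.addr, cur.kind, next_addr, alt)
        entries.append(entry)
        i += 1
    return entries


program = [
    Insn(0x100, 2, "op"),
    Insn(0x102, 4, "br", target=0x200, predict_taken=True),
    Insn(0x106, 2, "op"),
    Insn(0x108, 6, "jmp", target=0x100),
]
entries = predecode(program)
for e in entries:
    print(e)
```

In this toy run the four source instructions become two decoded entries, so
neither branch occupies an issue slot of its own. The alternate next-address
field records where to restart if the predicted path of a conditional branch
turns out to be wrong.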
mark@cs.clemson.edu