HARP - Hatfield Advanced RISC Processor
Mark Smotherman. Last updated July 2011
HARP is a VLIW architecture dating from the late 1980s
that has been cited in many papers
and patents, and it may have influenced the Itanium design.
The HARP execution model characteristics include:
- parameterized architecture of 3-5 operations per long instruction word
(i.e., the number is fixed in any particular implementation)
- no fixed operation-type slotting in the long instruction word
(but see below about the iHARP implementation)
- 64 32-bit general registers (R0 is a constant 0)
- 16 1-bit Boolean registers (instead of condition codes;
B0 is a constant 0; note that iHARP later included
carry and overflow condition code bits for each ALU pipeline)
- each operation is predicated on a Boolean register and can
choose either a true or false predicate (the idea of
conditional execution / guarded execution of each operation was
derived from Acorn ARM, but ARM predicates on condition code values
rather than a set of Boolean registers)
- predicated computational operations can be located in the
same long instruction word as the predicate-setting operations
(this is not possible for loads/stores and branches)
- load/store architecture with RISC-like instruction formats
- single-cycle execution operations (no floating-point; integer
multiply-step and divide-step operations)
- effective address computation by logical OR rather than addition
(to remove the need for a pipe stage)
- four-stage pipeline that avoids load-use delays
(derived from study of the MIPS-X pipeline)
- IF - instruction fetch
- ID/RF - decode and read register file in phase 2 of cycle;
combine address components for branch or memory address;
resolve branch taken/untaken decision
- ALU/MEM - execute using ALU or 1-cycle access to data cache
- WB - write back in phase 1 of cycle
- destination register write-back-prohibit bits for loads and ALU
operations (used to reduce the number of write-backs when a value
can instead be forwarded across a register-bypass network;
however, these are overridden in iHARP during exception processing
by performing multiple write-back cycles so that a program can
correctly resume after exception handling; iHARP treats these bits as
register write-back permission bits rather than prohibit bits)
- instruction addresses denote long instruction words,
thus you cannot branch to a subset of the operations inside
a given long instruction word
- one long-instruction-word delay slot for branches, where
branches have both normal predication and a conditional
test on a Boolean register (which makes it easier to place branches
in a delay slot)
- (later) speculative loads and operations (with "pollution" bits
added to the register sets)
- code examples
// combined shift and add
liw1: ADD R3,R1(ASL#4),R2 // R3 := (R1<<4) + R2
// WAR dependence within a long instruction word:
// in liw3, R3 is assigned 3 and R5 is assigned 70
liw1: MOV R1,#1 MOV R2,#2
liw2: MOV R3,#30 MOV R4,#40
liw3: ADD R3,R1,R2 ADD R5,R3,R4
// simple predication example
liw1: EQ B1,R1,R2 // B1 := R1==R2
liw2: TB1 ADD R5,R3,R4 // if( B1 is true ) R5 := R3 + R4
// predication of computational operation within same long instruction word
liw1: EQ B1,R1,R2 TB1 ADD R5,R3,R4 // if( R1==R2 ) R5 := R3 + R4
// simple branching example (without predication)
liw1: EQ B1,R1,R2 // B1 := R1==R2
liw2: BT B1 target // if( B1 is true ) PC := target
liw3: ... // delay slot
liw4: ... // target liw if taken
// more complex branching example using predicates
// if( R1==R2) branch to target1
// else if( R3==R4 ) branch to target2
liw1: EQ B1,R1,R2
liw2: BT B1 target1 FB1 EQ B2,R3,R4
liw3: FB1 BT B2 target2
// write-back prohibit
liw1: ADD R3!,R1,R2 // forward the value of R1 + R2 to any use
// of R3 in next long instruction word,
// but do not write into R3 (unless an
// exception occurs)
liw2: ADD R5,R3,R4 // forwarded value used for R3 operand
// speculative load and operation moved above branch
liw1: LD! R1,8(SP) // should an exceptional condition arise,
// LD! sets a pollution bit for R1 instead
// of immediately triggering the exception
liw2: ADD! R3,R1,R2 // ADD! sets the pollution bit for R3 if a
// pollution bit for either R1 or R2 is set
...
... // conditional branch
...
liw_: ADD R5,R3,R4 // exception generated if a pollution bit for
// either R3 or R4 is set
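The ORed effective address calculation listed above can be illustrated with a short Python sketch. It rests on a plain arithmetic fact: OR and addition give the same result whenever no bit position is set in both operands, for example when a base address is aligned to a power of two larger than the offset. How HARP software arranged for this is not described here, so the alignment used in the example is an assumption, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ea_or(base: int, offset: int) -> int:
    """Effective address formed by bitwise OR (no carry chain needed)."""
    return (base | offset) & MASK32


def ea_add(base: int, offset: int) -> int:
    """Conventional effective address formed by 32-bit addition."""
    return (base + offset) & MASK32


def or_is_exact(base: int, offset: int) -> bool:
    """OR matches ADD exactly when no bit position is set in both operands."""
    return (base & offset) == 0


# Assumed case: a base aligned to 256 bytes, so any offset in 0..255 is safe.
aligned_base = 0x00012300
print(or_is_exact(aligned_base, 0x48), hex(ea_or(aligned_base, 0x48)))

# Counter-example: the base already has bit 3 set, so OR and ADD disagree.
unaligned_base = 0x00012308
print(or_is_exact(unaligned_base, 0x48),
      hex(ea_or(unaligned_base, 0x48)), hex(ea_add(unaligned_base, 0x48)))
```

The second case shows why the scheme constrains data layout: when the base and offset share a set bit, the OR silently produces a different address from the sum.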
A brief aside on names:
- The University of Hertfordshire was first established in the
early 1950s as Hatfield Technical College based on a gift of land
from the de Havilland Aircraft Company in Hatfield, England (in the
county of Hertfordshire). In 1967 it became Hatfield Polytechnic,
and in 1992 it became the University of Hertfordshire.
- HARP was first described as an acronym for HAtfield
RISC Processor;
later the 'A' was used to stand for "Advanced".
The people involved in the HARP design include: [incomplete]
- Gordon Steven - leader of the HARP and later HSA efforts, now retired;
served as Senior Lecturer and Reader in Computer Architecture
at University of Hertfordshire; BSE and MSE from Princeton in 1966
and 1967, PhD from Manchester University in 1969.
- students (some of whom took post-doctoral and faculty positions
at the university)
- Rod Adams - PhD (math) and MSc (computer science) from University
of Hertfordshire in 1983 and 1987; served as Senior Lecturer in
Computer Science at the University of Hertfordshire.
- Roger Collins - PhD from University of Hertfordshire in 1995.
- Shirley Davis
- Colin Egan - BSc and PhD from University of Hertfordshire in 1996 and
2000; currently a Senior Lecturer in Computer Science at the University
of Hertfordshire.
- Corrie Elston
- Paul Findlay (iHARP, EE department)
- James Finlay
- Sue Gray - MSc and PhD from University of Hertfordshire in 1984 and 1991.
(worked on various aspects of the project)
- Gordon Green (HARP simulator in ELLA HDL)
- Brian Johnson (iHARP, EE department)
- Dave McHale (EE department)
- Richard Potter - PhD from University of Hertfordshire in 1998.
- Fleur Steven - MSc and PhD from University of Hertfordshire in 1986
and 1989. (C compiler)
- Simon Trainis - PhD from University of Hertfordshire in 1994.
(iHARP)
- Daniel Tate
- Liang Wang - PhD from University of Hertfordshire in 1993.
(C compiler and RLS scheduler)
- David Whale - BSc (hons) from University of Hertfordshire in 1992.
(HARP simulator)
- other faculty members
- Bruce Christianson
- John Davis
- L.C.W. Dixon
- Paul Kaye
- Martin Loomes
- Steve Stott
The HARP design was started in the late 1980s by Gordon Steven
at Hatfield Polytechnic with the goals of "execut[ing] non-scientific
programs at a sustained instruction execution rate in excess of one
instruction per cycle" and "exploit[ing] the low-level parallelism
available in systems programs and general purpose computations".
[Microproc. and Microprog. paper, 1990]
Professor Steven and his students followed the approach of a
VLIW-like machine model (described above) and an optimizing compiler.
The HRC (HARP Research Compiler) compiled a subset of Modula-2 and
consisted of three major phases: sequential code generation,
local compaction (basic block scheduling), and
conditional compaction (global scheduling).
A gcc port was later developed to generate sequential HARP code
that could then be run through the two compaction phases.
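A rough idea of what local compaction involves is sketched below in Python. This is a simple greedy packer written for illustration, not the HRC algorithm; it assumes straight-line code with register operands only (no memory references, branches, or predicates), and the `Op` and `compact` names are hypothetical. It relies on one property shown in the code examples above: all operands in a long instruction word are read before any result is written, so an operation may share a word with an earlier reader of its destination register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str      # printable form of the operation
    dest: str      # destination register
    srcs: tuple    # source registers (immediates are omitted)


def compact(ops, width):
    """Pack sequential ops into long instruction words of `width` slots."""
    words = []        # each word is a list of ops
    last_write = {}   # register -> index of the word that last writes it
    last_read = {}    # register -> index of the latest word that reads it
    for op in ops:
        earliest = 0
        for src in op.srcs:              # true dependence: must follow the writer
            if src in last_write:
                earliest = max(earliest, last_write[src] + 1)
        if op.dest in last_write:        # output dependence: must follow the writer
            earliest = max(earliest, last_write[op.dest] + 1)
        if op.dest in last_read:         # anti-dependence: same word is allowed
            earliest = max(earliest, last_read[op.dest])
        w = earliest
        while w < len(words) and len(words[w]) >= width:
            w += 1                       # skip words with no free slot
        while len(words) <= w:
            words.append([])
        words[w].append(op)
        last_write[op.dest] = w
        for src in op.srcs:
            last_read[src] = max(last_read.get(src, 0), w)
    return words


# Sequential form of the WAR example: R5 is computed from the old R3.
program = [
    Op("MOV R1,#1", "R1", ()),
    Op("MOV R2,#2", "R2", ()),
    Op("MOV R3,#30", "R3", ()),
    Op("MOV R4,#40", "R4", ()),
    Op("ADD R5,R3,R4", "R5", ("R3", "R4")),
    Op("ADD R3,R1,R2", "R3", ("R1", "R2")),
]
for number, word in enumerate(compact(program, width=2), start=1):
    print(f"liw{number}:", "  ".join(op.text for op in word))
```

With two slots per word the six operations pack into three long instruction words, and the two ADDs end up together in the last word, as in the WAR example.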
David Whale wrote a C-based generic instruction set simulator as part
of his BSc project that was then used to implement the complete HARP model
to run the SPEC benchmarks for a number of the later papers.
The simulation model allowed the user to vary features such as
the number of pipelines and register bank sizes, and allowed the team
to explore the design space.
Students from the EE department at Hatfield Polytechnic, including
Simon Trainis, designed a VLSI implementation of
a four-pipeline instance of HARP. This implementation was called
the iHARP and featured reduced register counts (32 general registers
and 8 Boolean registers) and slotting of some of the pipeline
functions.
pipeline 0       | pipeline 1                    | pipeline 2       | pipeline 3
-----------------+-------------------------------+------------------+------------------------------
computational    | computational                 | computational    | computational
relational       | relational                    | relational       | relational
memory reference |                               | memory reference |
                 |                               | Boolean          |
                 | branch (1st priority)         |                  | branch (2nd priority)
                 | special purpose               |                  |
                 |                               |                  | traps
                 | 32-bit literal for pipeline 0 |                  | 32-bit literal for pipeline 2
There could be at most two branches per long instruction word, with
pipeline 1 checked first and thus given priority in the case that both
branches evaluated to be taken.
Also, the compiler was expected to generate predicated code in such a manner
that even though two load/stores are allowed in a given long instruction
word, there could be only one data cache access at run-time. Likewise,
using predication and write-back-permission there could be only two register
write-backs allowed per long instruction word at run-time. (The register
file had ten read ports and two write ports.) The four ALUs have a complete
set of forwarding paths to each other.
[Note that the pipeline functions and branch priority assignment differ
among the various papers; the above description comes from the 1995
IEE Proceedings paper.]
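The slotting rules and run-time limits described above can be expressed as a small checker. The following Python sketch is illustrative only: the pipeline assignments are transcribed from the table above (which follows the 1995 IEE Proceedings description), while the function names, the class labels, and the way an operation is represented are assumptions made for the example.

```python
# Operation classes each iHARP pipeline accepts, transcribed from the table above.
ALLOWED = {
    0: {"computational", "relational", "memory"},
    1: {"computational", "relational", "branch", "special", "literal"},
    2: {"computational", "relational", "memory", "boolean"},
    3: {"computational", "relational", "branch", "trap", "literal"},
}


def slot_errors(word):
    """Return messages for operations placed in a pipeline that cannot run them.

    `word` maps a pipeline number (0-3) to an operation class, or None for a nop.
    """
    return [
        f"pipeline {pipe}: {kind} not allowed"
        for pipe, kind in sorted(word.items())
        if kind is not None and kind not in ALLOWED[pipe]
    ]


def runtime_errors(executed):
    """Check the run-time limits for the operations whose predicates held.

    `executed` is a list of (operation class, writes_back) pairs.
    """
    errors = []
    if sum(1 for kind, _ in executed if kind == "memory") > 1:
        errors.append("more than one data cache access")
    if sum(1 for _, writes_back in executed if writes_back) > 2:
        errors.append("more than two register write-backs")
    return errors


def winning_branch(taken):
    """Pick the branch that takes effect: pipeline 1 is checked before pipeline 3."""
    for pipe in (1, 3):
        if taken.get(pipe):
            return pipe
    return None


# Two load/stores and two branches are legal in one word as far as slotting goes...
word = {0: "memory", 1: "branch", 2: "memory", 3: "branch"}
print(slot_errors(word))
# ...but the compiler's predicates must leave at most one memory access active.
print(runtime_errors([("memory", True), ("memory", True)]))
print(winning_branch({1: True, 3: True}))
```

Separating the static slot check from the run-time check mirrors the division of labour in the text: the instruction format limits what can be encoded, and the compiler's predication is expected to keep the executed operations within the cache and write-port limits.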
A Resource Limited Scheduler (RLS) was subsequently developed
specifically for the iHARP, and it incorporated loop unrolling
and interprocedural scheduling as well as local and global compaction.
An evaluation in 1994 of a simulated iHARP configuration reported
a 1.76 speedup over a simulated single-pipeline version. [EuroMicro94]
A slightly later study reported a 1.8 speedup. [IEE Proceedings 1995]
The 1994 evaluation found that the scheduled iHARP code was 134%
larger than code for a single-pipeline version of HARP, mainly
because of the nop-padding required for the long instruction words.
Because of that increase, an in-order superscalar version was
compared; it resulted in only an 18% increase in code size over
the single-pipeline version.
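The effect of nop-padding on code size can be shown with simple arithmetic. The Python sketch below assumes that every operation occupies one equally sized slot and that the single-pipeline code contains no nops; the schedule in the example is hypothetical and is not taken from the 1994 evaluation.

```python
def padded_size(occupancy, width):
    """Slots occupied by the scheduled code, nops included."""
    return len(occupancy) * width


def code_growth(occupancy, width):
    """Fractional size increase relative to one slot per useful operation."""
    useful = sum(occupancy)
    return padded_size(occupancy, width) / useful - 1.0


# Hypothetical schedule: useful operations in each of four 4-slot words.
occupancy = [2, 1, 2, 3]
nops = padded_size(occupancy, 4) - sum(occupancy)
print(nops, code_growth(occupancy, 4))
```

In this made-up schedule half of the sixteen slots hold nops, so the code doubles in size (a 100% increase) even though every long instruction word does some useful work.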
The HARP team also investigated speculative loads and ALU operations
using "pollution" bits added to the general registers (and also to the
Boolean registers) to indicate delayed exceptions.
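The behaviour of the pollution bits, as described in the speculative-load code example earlier on this page, can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions: only general registers are modelled, a faulting access is represented by an address missing from a dictionary, a clean write is assumed to clear the pollution bit, and all class and function names are hypothetical.

```python
class DeferredException(Exception):
    """Raised when a non-speculative operation consumes a polluted register."""


class RegisterFile:
    def __init__(self):
        self.value = {}        # register name -> 32-bit value
        self.polluted = set()  # registers whose pollution bit is set

    def write(self, reg, value, polluted=False):
        self.value[reg] = value & 0xFFFFFFFF
        if polluted:
            self.polluted.add(reg)
        else:
            self.polluted.discard(reg)  # assumption: a clean write clears the bit


def speculative_load(regs, dest, memory, address):
    """LD!: a faulting access sets the pollution bit instead of trapping."""
    if address in memory:
        regs.write(dest, memory[address])
    else:
        regs.write(dest, 0, polluted=True)  # placeholder value, an assumption


def add(regs, dest, src1, src2, speculative=False):
    """ADD! propagates pollution; a plain ADD raises the deferred exception."""
    tainted = bool({src1, src2} & regs.polluted)
    if tainted and not speculative:
        raise DeferredException(f"polluted operand feeding {dest}")
    total = regs.value.get(src1, 0) + regs.value.get(src2, 0)
    regs.write(dest, total, polluted=tainted)


regs = RegisterFile()
regs.write("R2", 1)
regs.write("R4", 2)
speculative_load(regs, "R1", memory={}, address=8)   # faults: R1 becomes polluted
add(regs, "R3", "R1", "R2", speculative=True)        # ADD!: R3 inherits the pollution
try:
    add(regs, "R5", "R3", "R4")                      # plain ADD: exception surfaces here
except DeferredException as exc:
    print("deferred exception:", exc)
```

The exception is reported only if the non-speculative use is actually reached, which is what allows the load and the dependent operation to be moved above a conditional branch.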
In 1992, the research team changed its name from HARP to
HSP (Hatfield Superscalar Processor) and then to
HSA (Hatfield Superscalar Architecture).
As part of this effort, a variable-length branch delay slot technique
was proposed that uses a count field within each branch operation.
Floating-point operations, as well as integer multiply and divide,
were included.
The HSA execution model was also extended from strictly in-order to
also encompass out-of-order techniques.
The Hatfield Superscalar Scheduler (HSS) was developed for the
new execution model.
An asynchronous processor with a five-stage pipeline
was also investigated under the project name "Hades".
Some of the more recent papers from the HSA team are listed on
the web page for the
Compiler Technology and Computer Architecture Research Group (CTCA)
at Hertfordshire.
References
[Note: there is a fair amount of repeated material across many of the
papers, so the best papers to read first are marked with **.]
Journal, conference, and periodical papers (not exhaustive)
- G.B. Steven,
"A novel effective address calculation for RISC microprocessors,"
Computer Architecture News,
vol. 16, no. 4, September 1988, pp. 150-156.
- ** G.B. Steven, S.M. Gray, and R.G. Adams,
"HARP: A parallel pipelined RISC processor,"
Microprocessors and Microsystems,
vol. 13, no. 9, November 1989, pp. 579-587.
- R.G. Adams, S.M. Gray, and G.B. Steven,
"Utilising low level parallelism in general purpose code:
The HARP project,"
Microprocessing and Microprogramming,
vol. 29, no. 3, October 1990, pp. 137-149.
- R. Adams and G.B. Steven,
"A parallel pipelined processor with conditional instruction execution,"
ACM SIGARCH Computer Architecture News,
vol. 19, no. 1, March 1991, pp. 135-142.
- P.A. Findlay, S.A. Trainis, G.B. Steven, and R. G. Adams,
"HARP: A VLIW RISC processor,"
Proceedings of the 5th Annual European Computer Conference
(CompEuro91), Bologna, Italy, May 1991, pp. 368-372.
- S.A. Trainis, P.A. Findlay, G.B. Steven, R.G. Adams, and D. McHale,
"iHARP: A multiple instruction processor chip incorporating RISC
and VLIW design features,"
Microelectronics Journal,
vol. 23, no. 2, April 1992, pp. 115-119.
- ** G.B. Steven, R.G. Adams, P.A. Findlay, and S.A. Trainis,
"iHARP: A multiple instruction issue processor,"
IEE Proceedings, Part E, Computers and Digital Techniques,
vol. 139, no. 5, September 1992, pp. 439-449.
- S.M. Gray, R.G. Adams, G.J. Green, and G.B. Steven,
"Static instruction scheduling for the HARP
multiple-instruction-issue architecture,"
Microprocessors and Microsystems,
vol. 17, no. 7, September 1993, pp. 415-424.
- G.B. Steven and F.L. Steven,
"ALU design and processor branch architecture,"
Microprocessing and Microprogramming,
vol. 36, no. 5, October 1993, pp. 259-278.
- F.L. Steven, R.G. Adams, G.B. Steven, L. Wang, and D.J. Whale,
"Addressing mechanisms for VLIW and superscalar processors,"
Microprocessing and Microprogramming,
vol. 39, nos. 2-5, December 1993, pp. 75-78.
- R.G. Adams, S.M. Gray, and G.B. Steven,
"HARP: A statically scheduled multiple-instruction issue
architecture and its compiler,"
Proceedings of the Second Euromicro Workshop on Parallel and
Distributed Processing, Malaga, Spain, January 1994, pp. 76-81.
- F.L. Steven, G.B. Steven, and L. Wang,
"An evaluation of the iHARP multiple instruction issue processor,"
Proceedings of the 20th EUROMICRO Conference on System Architecture
and Integration (EuroMicro 94), Liverpool, U.K., September 1994,
pp. 437-444.
- R. Collins and G.B. Steven,
"An explicitly declared delayed-branch mechanism for a
superscalar architecture,"
Microprocessing and Microprogramming,
vol. 40, nos. 10-12, December 1994, pp. 677-680.
- ** F.L. Steven, G.B. Steven, and L. Wang,
"Using a resource limited instruction scheduler to evaluate
the iHARP Processor,"
IEE Proceedings: Computers and Digital Techniques,
vol. 142, no. 1, January 1995, pp. 23-31.
- R. Adams and S. Gray,
"Using conditional execution to exploit instruction level concurrency,"
Software-Practice and Experience,
vol. 25, no. 9, September 1995, pp. 1003-1020.
- R. Potter and G.B. Steven,
"Investigating the limits of fine-grained parallelism in a
statically-scheduled superscalar architecture,"
Proceedings of the Second International Euro-Par Conference,
Lyon, France, August 1996, published as
EuroPar'96 Parallel Processing,
Lecture Notes in Computer Science, vol. 1124,
1996, pp. 779-788.
[an appendix contains producer-consumer latency matrices
that define how data dependencies are handled by iHARP]
- R. Collins and G.B. Steven,
"Instruction scheduling for a superscalar architecture,"
Proceedings of the 22nd EuroMicro Conference,
September 1996, pp. 643-650.
- G.B. Steven, B. Christianson, R. Collins, R. Potter, and F. Steven,
"A superscalar architecture to exploit instruction level parallelism,"
Microprocessors and Microsystems,
vol. 20, no. 7, March 1997, pp. 391-400.
- C. Egan, F.L. Steven, and G.B. Steven,
"Delayed branches versus dynamic branch prediction in
a high-performance superscalar architecture,"
Proceedings of the 23rd Euromicro Conference,
Budapest, Hungary, September 1997, pp. 266-271.
- D. Tate, G.B. Steven, and P. Findlay,
"The impact of a realistic cache structure on a statically scheduled
architecture,"
Proceedings of the 24th Euromicro Conference,
Vasteras, Sweden, August 1998, pp. 325-328.
- D. Tate, G.B. Steven, and F.L. Steven,
"Static scheduling for out-of-order instruction issue processors,"
Proceedings of the 5th Australasian Computer Architecture Conference
(ACAC-2000), Canberra, Australia,
January-February 2000, pp. 90-96.
- F.L. Steven, C. Egan, R.D. Potter, and G.B. Steven,
"Adding static data dependence collapsing to a
high-performance instruction scheduler,"
Journal of Systems Architecture,
vol. 47, no. 8, December 2001, pp. 727-745.
Additional technical reports and theses (not exhaustive; excludes the
technical report versions of the published papers above)
- S.M. Gray,
"Considerations in the design of an instruction pipeline for a
reduced instruction set computer,"
Computer Science Technical Report No. 83,
Hatfield Polytechnic, 1988.
- F.L. Steven,
"The impact of instruction set orthogonality and complexity
on compiler code construction,"
Ph.D. thesis,
Hatfield Polytechnic,
August 1989.
- S.A. Trainis,
"Discussion of results from a single HARP ALU,"
Computer Science Technical Note No. 90-N1,
Hatfield Polytechnic,
March 1990.
- G.J. Green,
"A simulation of the HARP architecture in ELLA,"
Computer Science Technical Report No. 104,
Hatfield Polytechnic,
July 1990.
- S.M. Gray,
"The implementation of procedure call and return
sequences in the HARP Research Compiler,"
Computer Science Technical Report,
Hatfield Polytechnic,
1990.
- S.M. Gray,
"The implementation of arrays in the HARP Research Compiler,"
Computer Science Technical Report No. 116,
Hatfield Polytechnic,
December 1990.
- G.B. Steven and S.M. Gray,
"Specification of a machine model for the HARP architecture
and instruction set: Version 3,"
Computer Science Technical Report No. 117,
Hatfield Polytechnic,
January 1991.
- S.A. Trainis,
"A device specification for the iHARP processor,"
Computer Science Technical Report No. 118,
Hatfield Polytechnic,
February 1991.
- L. Wang,
"Crafting a C compiler for the iHARP chip using the GNU compiler
compiler,"
Computer Science Technical Report No. 121,
Hatfield Polytechnic,
April 1991.
- G.B. Steven,
"iHARP instruction set specification: Version 4,"
Computer Science Technical Report No. 124,
Hatfield Polytechnic,
June 1991.
- S.A. Trainis,
"The hardware cost of full register bypassing,"
Computer Science Internal Report,
Hatfield Polytechnic,
July 1991.
- S.M. Gray,
"Code generation for a long instruction word architecture,"
Ph.D. thesis,
Hatfield Polytechnic,
December 1991.
- D.J. Whale,
"Development of a processor simulator for iHARP,"
Computer Science Technical Report,
University of Hertfordshire,
April 1992.
- R. Collins,
"Towards a minimal superscalar implementation,"
Computer Science Technical Note No. 93-N1,
University of Hertfordshire,
February 1993.
- R. Collins,
"Scheduling code for the delayed branch instructions in the
HSP architecture,"
Computer Science Technical Note,
University of Hertfordshire,
February 1993.
- F.L. Steven,
"An evaluation of the HARP ORed indexing addressing mechanism,"
Computer Science Technical Report No. 156,
University of Hertfordshire,
July 1993.
- F.L. Steven, G.B. Steven, and L. Wang,
"An evaluation of the architectural features of the iHARP processor,"
Computer Science Technical Report No. 170,
University of Hertfordshire,
December 1993.
- R. Collins,
"Developing a simulator for the Hatfield Superscalar Processor,"
Computer Science Technical Report No. 172,
University of Hertfordshire,
December 1993.
- L. Wang,
"Instruction scheduling for a family of
multiple-instruction-issue architectures,"
Ph.D. thesis,
University of Hertfordshire,
December 1993.
- G.B. Steven,
"The Hatfield Superscalar Architecture: Version 2,"
Computer Science Technical Report,
University of Hertfordshire,
September 1994.
- R. Collins,
"Exploiting instruction-level parallelism in a superscalar architecture,"
Ph.D. thesis,
University of Hertfordshire,
October 1995.
- F.L. Steven,
"An introduction to the Hatfield Superscalar Scheduler,"
Computer Science Technical Report No. 316,
University of Hertfordshire,
Spring 1998.
- R. Potter,
"Exploring the limitations of fine-grained parallelism
for a superscalar architecture,"
Ph.D. thesis,
University of Hertfordshire,
1998.
- D. Tate,
"Out-of-order instruction issue and its integration into the
Hatfield Superscalar Architecture,"
Computer Science Technical Report No. 330,
University of Hertfordshire,
April 1999.
- F.L. Steven and G.B. Steven,
"The anatomy of the Hatfield Superscalar Instruction Scheduler,"
Computer Science Technical Report No. 341,
University of Hertfordshire,
March 2000.
Acknowledgements
My thanks to Colin Egan for his help in collecting this information and
to David Whale for information about the simulator.
mark@cs.clemson.edu