CPSC 3300 - Fall 2014
Study Guide for Exam 2

Coverage: CPU implementation

1. Be able to define or match these terms

     register
     internal CPU bus
     tri-state buffer
     CPU data path
     control
     control signal
     hardwired control
     microprogrammed control
     control store
     pipelining
     pipeline stages
     pipeline latches
     structural hazard
     data dependencies
       RAW (read after write ordering must be preserved) - true dependency
       WAR (write after read ordering must be preserved) - anti-dependency
       WAW (write after write ordering must be preserved) - output dependency
     load-use data hazard
     pipeline stall
     register scoreboard
     forwarding
     multi-ported register file
     control hazard (branch hazard)
     branch target address (BTA)
     branch taken
     branch not taken (untaken)
     delayed branch
     branch delay slot
     branch prediction
     misprediction
     misprediction recovery
     flushing pipeline stages
     dynamic branch prediction
     branch target address cache (BTAC)
     branch history table (BHT)
     branch target buffer (BTB)
     branch history shift register (BHSR)
     gshare branch prediction algorithm
     multiple instruction issue
     superscalar
     very long instruction word (VLIW)
     explicitly parallel instruction computing (EPIC)
     reservation station
     reorder buffer (completion buffer)
     out-of-order execution
     in-order commit (retire)
     false dependencies
     register renaming
     architectural registers
     speculative execution
     precise exceptions
     imprecise exceptions
     loop unrolling
     predication
     multithreading
     multicore
     vector processor

2. Be able to:

   A. Given a high-level RTL or assembly language statement, give the
      necessary step-by-step RTL and/or control signals to implement
      that statement on a given datapath.

   B. Identify critical path(s) in a datapath when implementing a
      high-level RTL statement or assembly-language instruction.

   C. Given a code sequence, draw a data dependency graph.

   D. Describe in general what each stage in the 5-stage pipeline does.

   E. Given a code sequence, identify where any stalls occur and
      determine their duration on the 5-stage pipeline.

   F. Given a code sequence, identify where any forwarding actions
      occur on the 5-stage pipeline.

   G. Given a code sequence, draw the pipeline cycle diagram (stairstep
      diagram) for that code executing on the 5-stage pipeline.

   H. Explain how data hazards are detected and how forwarding paths
      work.

   I. Draw forwarding paths on a pipeline diagram.

   J. Describe how delayed branches work in the 5-stage pipeline.

   K. Describe how a BTB (or BTAC and BHT) provides a predicted next
      instruction address for branch instructions.

   L. Calculate the specific or average CPI for a given branching
      scheme. Alternatively, calculate misprediction penalties.

   M. Distinguish among superscalar, EPIC, and VLIW in terms of
      dependency checking, function unit assignment, and execution
      scheduling.

   N. Explain the hardware and software tradeoffs between a superscalar
      and VLIW. (E.g., control logic complexity, compiler complexity,
      code compatibility.)

   O. Given a code sequence and functional unit latencies, schedule the
      operations in a set of very long instruction words.

Be prepared to work problems as given in homework 3.


Example questions

Associate each term or statement below with aspects of ILP. Circle S or
V, for Superscalar or VLIW, respectively. Note some questions may
require both to be circled.

 1. S  V  often uses hardware register renaming

 2. S  V  the compiler makes decisions on what is in an issue packet

 3. S  V  can benefit from loop unrolling optimizations performed by a
          compiler

 4. Identify superscalar, EPIC, and VLIW (leaving one blank empty).

                       dependency   fn. unit     time to start
                       checking     assignment   execution
                       ----------   ----------   -------------
       _____________   hardware     hardware     hardware
       _____________   software     hardware     hardware
       _____________   software     software     hardware
       _____________   software     software     software

 5. Identify the type of organization that is used by each of the
    following processors: scalar pipeline, superscalar, VLIW, EPIC.
    (2 pts. each)

       486          _________________
       Itanium      _________________
       Pentium III  _________________
       Pentium 4    _________________

 6. Predication changes a ________ into a ________ dependency.

Fill in the blanks for the processing order and names of pipeline stages
in a dynamically-scheduled processor such as the Intel P6.

    steps in the front end of the pipeline are done in program order

 8. _instruction_fetch__    obtain instructions from the icache

 9. _decode_____________    determine the operation and operands

10. _register_renaming__    map register operand names to tags or
                            physical registers

11. _dispatch___________    allocate ROB entry and send decoded and
                            renamed instruction to the instruction
                            window

    steps in the middle of the pipeline are done out of program order

12. _issue______________    search instruction window and send ready
                            instructions to execution units

13. _execute____________    execution units perform the required
                            operation

14. _write_(or_complete)_   results from execution units are written to
                            the ROB

    steps at the end of the pipeline are done in program order

15. _retire_(or_commit)_    results from the oldest completed ROB
                            entries are examined for exceptions and
                            branch mispredictions, and if none are
                            found, the results are written to the
                            register file
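
The split between out-of-order completion (step 14) and in-order retirement (step 15) can be illustrated with a small sketch. This is only a toy model for review purposes, not a description of the P6 hardware: the names `RobEntry`, `ReorderBuffer`, `dispatch`, `complete`, and `retire` are hypothetical, and the exception and misprediction checks performed at retirement are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    """One reorder buffer slot (hypothetical, simplified layout)."""
    dest: str                    # architectural destination register
    value: Optional[int] = None  # result, filled in at write/complete
    done: bool = False           # set once an execution unit has written back


class ReorderBuffer:
    """Toy ROB: entries are allocated and retired in program order."""

    def __init__(self):
        self.entries = deque()   # oldest instruction sits at the left end
        self.regfile = {}        # architectural register file

    def dispatch(self, dest):
        # Step 11: allocate a ROB entry in program order.
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # Step 14: results may arrive in any order.
        entry.value = value
        entry.done = True

    def retire(self):
        # Step 15: only the oldest completed entries may update the
        # register file; a younger finished entry waits behind an
        # older unfinished one.
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.regfile[entry.dest] = entry.value
            retired.append(entry.dest)
        return retired


rob = ReorderBuffer()
i1 = rob.dispatch("r1")
i2 = rob.dispatch("r2")
i3 = rob.dispatch("r3")

rob.complete(i3, 30)      # youngest instruction finishes first
print(rob.retire())       # nothing retires: i1 is still pending
rob.complete(i1, 10)
print(rob.retire())       # only r1 retires; i2 blocks i3
rob.complete(i2, 20)
print(rob.retire())       # r2 and r3 retire together, in program order
```

Because the register file changes only at retirement, the architectural state always reflects a prefix of the program, which is what makes precise exceptions possible in this kind of design.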