CPSC 330 - Fall 2005 Study Guide for Exam 4

This exam will cover the RAM material in Appendix A and the cache material in Chapter 7.

1. Be able to define or match these terms.

   random access memory (RAM)
   read-only memory (ROM)
   static RAM (SRAM)
   dynamic RAM (DRAM)
   refresh
   memory array
   row decoder
   row address strobe
   row buffer
   column decoder
   column address strobe
   page-mode DRAM
   programmable ROM (PROM)
   erasable PROM (EPROM)
   memory bus
   split-transaction bus
   centralized arbitration
   daisy-chain arbitration
   decentralized arbitration
   ring (token) arbitration
   memory module
   interleaving
   memory hierarchy
   locality
   spatial locality
   temporal locality
   working set
   hit rate
   miss rate
   hit time
   miss penalty
   multilevel cache
   primary cache (level one, L1)
   secondary cache (level two, L2)
   fetch policy (demand fetch or prefetch)
   placement policy (fully associative, set-associative, direct-mapped)
   replacement policy (LRU, random, etc.)
   cache miss
   refill
   burst transfer over memory bus for refill
   write-hit policy (write-through or write-back)
   write-miss policy (write-allocate or write-no-allocate)
   cache lines (often called cache blocks)
   tag
   valid bit
   direct-mapped cache
   set-associative cache
   fully associative cache
   compulsory miss
   capacity miss
   conflict miss
   thrashing
   write-through
   write-back
   dirty bit
   write-allocate
   write-no-allocate
   least recently used (LRU)

2. Be able to:

   A. Give typical capacity, latency, and block size values for the memory hierarchy components.
   B. Draw a block diagram (high-level circuit) showing the two-dimensional organization of a RAM and/or identify the components and signals required to access RAM.
   C. Draw a timing diagram of bus and RAM chip activity for a memory read.
   D. Identify at least one example each of temporal and spatial locality of memory references.
   E. Given memory and cache parameters, give the tag, index, and offset field sizes within the main memory address.
   F. Given an address stream and cache parameters, determine the number of misses.
   G.
Explain how row versus column access to a matrix affects cache misses. (E.g., discuss how misses were reduced in the matrix multiply example.)

---

CPSC 330 - Fall 2005 Study Guide for Exam 3 (to be updated)

Coverage: datapath design, control unit implementation

1. Be able to define or match these terms.

   register
   clock
   internal CPU bus
   tri-state buffer
   CPU data path
   control
   control signal
   hardwired control
   microprogrammed control
   control store

2. Be able to:

   F. Given a high-level statement, derive the necessary RTL and/or control signals to implement that statement on a given datapath.
   G. Given a user-level instruction, derive the necessary RTL and/or control signals to implement the fetch/decode/execute of that instruction on a given datapath.
   H. Explain the purpose/use of the fields in a microinstruction in the microprogramming example we studied in class.

---

Coverage:

* Chapter 5
  - section 5.3 - register files and other datapath elements
  - section 5.9 - recent Pentium implementations
* Chapter 6
  - sections 6.1 + 6.2 - pipelining
  - sections 6.4 + 6.5 - data hazards, forwarding, and stalls
  - section 6.6 - branch hazards and branch prediction
  - section 6.9 - ILP and multiple-issue approaches
  - section 6.10 - Pentium 4 pipeline

1. Be able to define or match these terms.
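For the cache skills in the Exam 4 guide above (items E, F, and G), the field-size arithmetic and miss counting can be checked with a small simulation. This is a minimal sketch of a direct-mapped cache; the cache geometry (1 KB, 16-byte blocks, 32-bit addresses) and the 64x64 matrix are illustrative values, not parameters from the course.

```python
def field_sizes(cache_bytes, block_bytes, addr_bits):
    """Return (tag, index, offset) bit widths for a direct-mapped cache."""
    offset = (block_bytes - 1).bit_length()                # log2(block size)
    index = (cache_bytes // block_bytes - 1).bit_length()  # log2(number of lines)
    tag = addr_bits - index - offset                       # remaining high bits
    return tag, index, offset

def count_misses(addresses, cache_bytes, block_bytes):
    """Simulate a direct-mapped cache on a byte-address stream; count misses."""
    num_lines = cache_bytes // block_bytes
    lines = [None] * num_lines        # each entry holds the resident tag, or None
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_lines
        tag = block // num_lines
        if lines[index] != tag:       # compulsory, capacity, or conflict miss
            misses += 1
            lines[index] = tag        # refill replaces the old line
    return misses

# Row- versus column-order traversal of a 64x64 matrix of 4-byte elements
# (row-major storage): row order touches consecutive addresses and exploits
# spatial locality; column order strides by a full 256-byte row.
N, elem = 64, 4
row_order = [(r * N + c) * elem for r in range(N) for c in range(N)]
col_order = [(r * N + c) * elem for c in range(N) for r in range(N)]

print(field_sizes(cache_bytes=1024, block_bytes=16, addr_bits=32))  # (22, 6, 4)
print(count_misses(row_order, 1024, 16))  # 1024: one miss per 16-byte block
print(count_misses(col_order, 1024, 16))  # 4096: every reference misses
```

With these numbers, row order misses only on the first of each four elements sharing a block, while column order evicts every line before it is reused, the same effect the matrix multiply example addresses by reordering accesses.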
   instruction cache
   multi-ported register file
   sign-extension
   ALU
   data cache
   pipelining
   pipeline stages
   pipeline latches
   structural hazard
   dependence (data dependency)
   data hazard
   forwarding
   load-use data hazard
   pipeline stall
   register scoreboard
   control hazard (branch hazard)
   branch target address (BTA)
   branch taken
   branch not taken (untaken)
   delayed branch
   branch delay slot
   branch prediction
   misprediction
   misprediction recovery
   flushing pipeline stages
   dynamic branch prediction
   branch target address cache (BTAC)
   branch history table (BHT)
   branch target buffer (BTB)
   branch history shift register (BHSR)
   gshare branch prediction algorithm
   instruction-level parallelism (ILP)
   multiple issue
   static multiple issue
   very long instruction word (VLIW)
   issue slots
   predication
   speculation (speculative execution)
   issue packet
   explicitly parallel instruction computing (EPIC)
   loop unrolling
   dynamic multiple issue
   superscalar
   RAW (read-after-write ordering must be preserved) - true dependency
   WAR (write-after-read ordering must be preserved) - anti-dependency
   WAW (write-after-write ordering must be preserved) - output dependency
   false dependencies
   register renaming
   architectural registers
   commit unit
   reservation station
   reorder buffer (completion buffer)
   in-order commit (retire)
   out-of-order execution
   trace cache
   thread-level parallelism (TLP)
   multithreading
   simultaneous multithreading (SMT)
   multicore
   data-level parallelism (DLP)
   vector processor
   single-instruction-multiple-data (SIMD)
   multimedia ISA extension

2. Be able to:

   A. Describe in general what each stage in the 5-stage pipeline does.
   B. Identify and discuss hardware and software solutions to hazards.
   C. Given a code sequence, draw a data dependency graph.
   D. Given a code sequence, identify where any stalls occur and determine their duration on the 5-stage pipeline.
   E. Given a code sequence, identify where any forwarding actions occur on the 5-stage pipeline.
   F.
Given a code sequence, draw the pipeline cycle diagram (stairstep diagram) for that code executing on the 5-stage pipeline.
   G. Explain how data hazards are detected and how forwarding paths work.
   H. Draw forwarding paths on a pipeline diagram.
   I. Describe how delayed branches work in the 5-stage pipeline.
   J. Describe how a BTB (or BTAC and BHT) provides a predicted next instruction address for branch instructions.
   K. Calculate the specific or average CPI for a given branching scheme. Alternatively, calculate misprediction penalties.
   L. Calculate the prediction accuracy for a given branch prediction scheme.
   M. Distinguish among superscalar, EPIC, and VLIW in terms of dependency checking, function unit assignment, and execution scheduling.
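Items K and L above can be worked numerically. A minimal sketch, using illustrative values for branch frequency and misprediction penalty (the numbers below are not from the course): average CPI under a prediction scheme, and the measured accuracy of a 2-bit saturating-counter predictor on an outcome stream.

```python
def avg_cpi(base_cpi, branch_frac, accuracy, mispredict_penalty):
    """CPI = base CPI + (branch fraction) * (misprediction rate) * penalty."""
    return base_cpi + branch_frac * (1.0 - accuracy) * mispredict_penalty

def two_bit_accuracy(outcomes, state=0):
    """2-bit saturating counter (0..3): predict taken when state is 2 or 3.
    Returns the fraction of correct predictions over the outcome stream."""
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Saturating update: count up on taken, down on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A loop branch that is taken 7 times, then not taken once (loop exit),
# repeated 100 times.
outcomes = ([True] * 7 + [False]) * 100
acc = two_bit_accuracy(outcomes)
print(acc)                        # 0.8725: wrong only at each loop exit, plus warm-up
print(avg_cpi(1.0, 0.20, acc, 3))
```

After the counter warms up, the predictor mispredicts exactly once per loop iteration group (the exit), giving 7/8 accuracy in steady state; the two extra warm-up mispredictions pull the total to 698/800 = 0.8725. Plugging that into the CPI formula with base CPI 1.0, 20% branches, and a 3-cycle penalty adds 0.2 * 0.1275 * 3 = 0.0765 cycles per instruction.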