CPSC 330 - Fall 2011 - Exam 2 Name: ____ No calculators. 1. Consider the MIPS branch on equal (beq) instruction as implemented on the datapath given in Figure 4.2 below. Assume Reg[1] = 2 and Reg[3] = 2. beq $1, $3, 100 // if( Reg[1] == Reg[3] ) // PC <- ( PC + 4 ) + ( sign_ext(BranchOffset) << 2 ) Circle the correct value 0 or 1 for the control signals (a-e) and circle whether each of the three muxes (f-h) selects its upper input, lower input, or don't care. For the ALU operation (i) circle one of the function names. (12 pts.) (a) Branch = 0 1 (b) MemRead = 0 1 (c) RegWrite = 0 1 (d) Zero = 0 1 (e) MemWrite = 0 1 (f) Mux1 (upper left; output to PC) = upper, lower, don't care (g) Mux2 (upper middle; output to Regs) = upper, lower, don't care (h) Mux3 (lower middle; output to ALU) = upper, lower, don't care (i) ALU operation = and, or, add, subtract, set-on-less-than, nor
(a) 1 (b) 0 (c) 0 (d) 1 (e) 0 (f) upper (g) don't care (h) upper (i) subtract 2. Explain the purpose of a tri-state buffer, and state where you would find such buffers used in a datapath. (4 pts.) A tri-state buffer allows a signal to pass through as 0 or 1 as well as providing a third state of zero impedance to appear disconnected. Tri-state buffers are used when connecting multiple sources to a single shared communication medium, such as an internal bus. 3. Identify the primary advantage of hardwired control. (5 pts.) speed 4. Identify an advantage of microprogrammed control. (5 pts.) flexibility; contents of control store can typically be changed without redesigning or rewiring the rest of the hardware 5. Identify at least three difference between DRAM and SRAM. (6 pts.) SRAM is faster SRAM is more expensive SRAM is used for caches while DRAM is used for main memory (Cray-1 was exception) SRAM uses 6 transistors per bit (typically) while DRAM uses one SRAM bit storage based on latches while DRAM based on capacitance DRAM needs periodic refresh 6. Explain the difference between temporal locality and spatial locality. (8 pts.) temporal - use once, likely to use again spatial - use once, likely to use a neighboring location 7. For the MIPS instruction sequence below, draw the data dependency graph and identify any data dependencies as RAW, WAR, or WAW. (Destination registers are listed first.) (12 pts.) i1: sub $3, $1, $2 // Reg[3] <- Reg[1] - Reg[2] i2: add $6, $4, $5 // Reg[6] <- Reg[4] + Reg[5] i3: lw $2, 8($7) // Reg[2] <- Memory[ Reg[7] + 8 ] i4: add $8, $2, $6 // Reg[8] <- Reg[2] + Reg[6] $1 $2 v v .--------. | i1:sub | `--------' . v $3 . $2/WAR . . $7 $4 $5 v v v v .--------. .--------. | i3:lw | | i2:add | `--------' `--------' \ / $2/RAW \ / $6/RAW v v .--------. | i4:add | `--------' v $8 8. For the MIPS instruction sequence given in question 7, show the pipeline cycle diagrams for the standard 5-stage pipeline with forwarding. Assume register file writes occur in the first half cycle and reads in the second half cycle. (6 pts.) 1 2 3 4 5 6 7 8 9 i1:sub IF ID EX MEM WB i2:add IF ID EX MEM WB i3:lw IF ID EX MEM WB i4:add IF ID -- EX MEM WB - WAR will not cause a stall in a scalar pipeline - i4 reads $6 from regfile in cycle 6 (ID stage actions are repeated) - loaded value from i3 not available until end of cycle 6 - i4 gets forwarded value from i3 at start of cycle 7 9. Consider a memory burst of 5-1-1-1 that transfers a total of 16 bytes. If the bus is clocked at 500 MHz and there is no bus arbitration overhead, what is the bus bandwidth? (6 pts.) bus cycle time = 1/(500 MHz) = 1/(500x10^6 cycles/sec) = (1/500) * 10^(-6) sec/cycle = (1000/500) * 10^(-9) sec/cycle = 2 * 10^(-9) sec/cycle = 2 nsec/cycle 5-1-1-1 = 8 total bus cycles to transfer 16 bytes 16 bytes / 8 bus cycles = 2 bytes / cycle = 2 bytes / (1 cycle * 2 nsec/cycle) = 2 bytes / 2 nsec = 1 bytes / nsec = 1 bytes / 10^(-9) sec = 10^9 bytes / sec = 1 GB/sec 10. Consider a computer system that contains a cache with 2-nsec access time and a memory with 100-nsec access time. If the hit rate is 80% and a cache miss requires both a single cache access and a single memory access, what is the average memory access time (AMAT)? (6 pts.) 80% of time => 2 nsec 20% of time => 100 nsec + 2 nsec AMAT = 2 nsec + 20% * 100 nsec = 22 nsec 11. Consider a 4 GB byte-addressable main memory with a level 1 data cache that is three-way set-associative, 48 KB in size, and has an 16-byte line size. a) How many total lines are there in cache? (not just per bank) (2 pts.) b) How many lines are there in bank? (1 pts.) c) Show how the main memory address is partitioned into fields for the cache access and give the bit lengths of those fields. (9 pts.) a) 48 K bytes / (16 bytes/line) = 3 K lines = 3072 lines b) 3 K lines / 3 banks = 1 K lines/bank = 1024 lines per bank c) log_base_2(16) = 4 bits for line offset field log_base_2(1024) = 10 bits for line index field log_base_2(4G) = 32 bits for full address 32 - 4 - 10 = 18 bits for tag field +--------------------------+--------------+------+ | tag | index |offset| +--------------------------+--------------+------+ 18 bits 10 bits 4 bits 12. Assume a 256-byte main memory and a four-line cache with four bytes per line. The cache is initially empty. For the byte address reference stream (reads) given below circle which of the references are hits for the different cache placement schemes. Also, show the final contents of the cache. (The byte addresses are in decimal.) a) direct-mapped (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 mapping +--+ index=0| | 0- 3, 16-19, 32-35, ... +--+ index=1| | 4- 7, 20-23, 36-39, ... +--+ index=2| | 8-11, 24-27, 40-43, ... +--+ index=3| | 12-15, 28-31, 44-47, ... +--+ 2 hits: 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 hit hit final contents +-------+ 0| 32-35 | +-------+ 1| 20-23 | +-------+ 2| 8-11 | +-------+ 3|| +-------+ b) two-way set associative with LRU replacement (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 mapping +--+ +--+ index=0| | | | 0- 3, 8-11, 16-19, 24-27, 32-35, ... +--+ +--+ index=1| | | | 4- 7, 12-15, 20-23, 28-31, 36-39, ... +--+ +--+ 4 hits: 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 hit hit hit hit final contents bank 0 bank 1 +-------+ +-------+ 0| 32-35 |<-LRU | 8-11 | +-------+ +-------+ 1| 4- 7 |<-LRU | 20-23 | +-------+ +-------+ c) fully-associative with FIFO replacement (6 pts.) 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 5 hits: 4, 20, 5, 35, 6, 36, 7, 21, 8, 22 hit hit hit hit hit final contents +-------+ +-------+ +-------+ +-------+ | 8-11 | | 20-23 | | 32-35 | | 36-39 | +-------+ +-------+ +-------+ +-------+ next to replace XC. Give the control sequence to implement jsub. Include the instruction fetch. (up to 12 pts.) jsub -- mem[addr] <- updated pc; pc <- addr + 1 ACC_in = ACC <- bus ACC_out = bus <- ACC IR_in = IR <- bus IR_out = bus <- address bits of IR MAR_in = MAR <- bus MDR_out = bus <- MDR MDR_in = MAR <- bus PC_out = bus <- PC PC_in = PC <- bus TMP_out = bus <- TMP alu_add = TEMP <- MDR + bus alu_sub = TEMP <- MDR - bus read = MDR <- memory[MAR] write = memory[MAR] <- MDR pc_incr = PC <- PC + 1 br_table = jump to microcode subroutine based on opcode in IR .-------. .-------. .-------. pcincr->| PC | | MAR | | ACC |(=0)-> acceq0 `-------' `-------' `-------' | ^ ^ | ^ PC_out o o PC_in MAR_in o ACC_out o o ACC_in v | | v | .-----------*---------------------*--------------------*----------------. | internal datapath bus | `-----------*---------------------*--------------------*----------------' ^ | ^ | | ^ IR_out o o IR_in MDR_out o o MDR_in | o TEMP_out | v | v | | .-------. .-------. | | | IR | | MDR |-----. | | `-------' `-------' v v | ----- ----- | \ \______/ / | (memory signals) \ / | read alu_add->\ ALU / | write alu_sub-->\__________/ | v | .------. | | TEMP |---------' `------ // instruction fetch 1. PC_out, MAR_in // MAR <- PC 2. read, PC_incr // MDR <- memory[MAR]; PC <- PC + 1 3. MDR_out, IR_in // IR <- MDR 4. decode // memory[address] <- updated PC 5. IR_out, MAR_in // MAR <- address bits of IR 6. PC_out, MDR_in // MDR <- PC 7. write // memory[MAR] <- MDR // PC <- address + 1 8. IR_out, PC_in // PC <- address bits of IR 9. PC_incr // PC <- PC + 1