[Note: used different textbook in Fall 2008]

CPSC 330 - Fall 2008 - Exam 2 (with answers)

No calculators.

Consider a block diagram (high-level circuit) showing the two-dimensional
organization of a RAM, and identify the components and signals required to
access the RAM. Place the appropriate letter, a-j, of the correct component
or signal in the blanks numbered 1-10. (1 pt. each)

    a. address                         f. read/write control signal
    b. column decoder                  g. row decoder
    c. column address strobe (CAS)     h. row buffer
    d. data bits                       i. row address strobe (RAS)
    e. memory cell array               j. sense/write circuitry

                                    1) __g___      2) __e___
                                       +-+       +-------------+
    3) __i___ ------------------------>| |------>|             |
                     .--------/------->| |  ...  |   4K x 4K   |
                     |    high bits    | |------>|             |
    4) __a___ --/--<                   +-+       +-------------+
                     |                              | | ... | |
                     |                           +-------------+
                     |                           |  5) __j___  |<-- 6) __f___
                     |                           +-------------+
                     |                 +-+          | | ... | |
                     |    low bits     | |------>+-------------+
                     `--------/------->| |  ...  |  8) __h___  |
    7) __c___ ------------------------>| |------>+-------------+
                                       +-+          ^         ^
                                                    |   ...   |
                                    9) __b___       v         v
                                                    10) __d___

11. Identify at least three differences between DRAM and SRAM. (6 pts.)

    SRAM is faster
    SRAM is more expensive
    SRAM is used for caches while DRAM is used for main memory
      (Cray-1 was exception)
    SRAM uses 6 transistors per bit (typically) while DRAM uses one
    SRAM bit storage based on latches while DRAM based on capacitance
    DRAM needs periodic refresh

12. Consider a memory burst of 7-1-1-1 that transfers a total of 16 bytes.
    If the bus is clocked at 500 MHz and there is no bus arbitration
    overhead, what is the bus bandwidth? (6 pts.)

    7-1-1-1 means that 16 bytes are transferred in 10 bus clock cycles
      (four bytes after the initial seven cycles and then four bytes
      each for the next three cycles)

    500 MHz means that a bus clock cycle takes 2 ns

    16 bytes        16              8
    -------- = ----------- Bps = -- * 10^9 Bps = 0.8 GBps
    10 * 2ns   20 * 10^-9        10
                                                 (or 800 MBps)

13. Explain the difference between temporal locality and spatial locality.
    (8 pts.)
    temporal - a memory location that was just referenced is likely to be
      referenced again (e.g., a data item near the top of the stack)

    spatial - a memory location nearby one that was just referenced is
      likely to be referenced next (e.g., traversing an array)

14. Explain the difference between the three types of cache misses:
    compulsory, conflict, and capacity. (9 pts.)

    compulsory (a.k.a. cold miss) - first reference to a line (assumes
      that there is no prefetch)

    conflict - a reference to a line that has been in cache but was
      replaced because of a mapping conflict (more active lines map to
      the same index value than the degree of set associativity)

    capacity - a reference to a line that has been in cache but was
      replaced because of the size of the cache (the cache is too small
      to hold the working set and thus active lines are being replaced,
      or there are multiple working sets that cannot be co-resident
      because of their total size)

15. Consider a 1 TB byte-addressable main memory with a level 1 data cache
    that is four-way set-associative, 2 MB in size, has a 16-byte line
    size, and implements write-back, write-allocate, and pseudo-LRU
    replacement.

         bank 0     bank 1     bank 2     bank 3    replacement info
        +------+   +------+   +------+   +------+         +-+
        |      |   |      |   |      |   |      |         | |
          ...        ...        ...        ...            ...
        |      |   |      |   |      |   |      |         | |
        +------+   +------+   +------+   +------+         +-+

    a) How many total lines are there in cache? (not just per bank)
       (2 pts.)

       2 MB/cache   2^21
       ---------- = ---- lines/cache = 2^17 lines/cache
       16 B/line    2^4
                                     = 128 K lines/cache

    b) How many lines are there in each bank? (2 pts.)

       128 K lines/cache
       ----------------- = 32 K lines/bank
         4 banks/cache

    c) Show how the main memory address is partitioned into fields for the
       cache access and give the bit lengths of those fields. (9 pts.)
       log_2( 1 T )  = 40 address bits
       log_2( 16 )   =  4 offset bits
       log_2( 32 K ) = 15 index bits
       40 - 4 - 15   = 21 tag bits

       +---------------------+---------------+------+
       |         tag         |     index     |offset|
       +---------------------+---------------+------+
                 21                  15          4

    d) How many bits are in each line in a bank, including the tag and any
       other needed directory bits? Explain your calculation. (4 pts.)

       each line needs a valid bit, dirty bit (since the cache is write-
       back), and tag value along with the line contents

       = ( 1 + 1 + 21 ) bits + 16 bytes * 8 bits/byte
       = 23 + 128 = 151 bits/line

       (directory bits)
       +-+-+---------+-----------------------------------------+
       |v|d|   tag   |                contents                 |
       +-+-+---------+-----------------------------------------+
        1 1    21                       128

    Extra Credit) How many bits are needed to hold the replacement info for
       the complete cache? Explain your calculation. (4 pts.)

       pseudo-LRU for 4-way set associativity needs 3 bits per index value
       (that is, per line in a bank)

       3 bits/index * 32 K index values = 96 K bits

16. Assume a 256-byte main memory and a four-line cache with four bytes per
    line. The cache is initially empty. For the byte address reference
    stream (reads) given below, circle which of the references are hits for
    the different cache placement schemes. Also, show the final contents of
    the cache. (The byte addresses are in decimal.)

    a) direct-mapped (9 pts.)

         0,  16,   1,  31,   2,  32,   3,  17,   4,  18
                           hit                      hit

         +---------+
       0 |  16-19  |
         +---------+
       1 |   4- 7  |
         +---------+
       2 |         |
         +---------+
       3 |  28-31  |
         +---------+

    b) two-way set associative with LRU replacement (9 pts.)

         0,  16,   1,  31,   2,  32,   3,  17,   4,  18
                 hit       hit       hit            hit

         +---------+        +---------+
       0 |   0- 3  |<-LRU   |  16-19  |
         +---------+        +---------+
       1 |  28-31  |<-LRU   |   4- 7  |
         +---------+        +---------+

    c) fully-associative with FIFO replacement (9 pts.)

         0,  16,   1,  31,   2,  32,   3,  17,   4,  18
                 hit       hit       hit  hit       hit

       +---------+ +---------+ +---------+ +---------+
       |   4- 7  | |  16-19  | |  28-31  | |  32-35  |
       +---------+ +---------+ +---------+ +---------+
                        ^ FIFO

17.
    Explain the purpose of a tri-state buffer, and state where you would
    find such buffers used in a datapath. (3 pts.)

    can electrically disconnect a source from a shared circuit; used to
    connect multiple registers to a single shared bus

18. Identify the primary advantage of hardwired control. (3 pts.)

    speed

19. Identify an advantage of microprogrammed control. (3 pts.)

    can easily be changed (updated/corrected)

Consider the handout with the Figure 7.1 datapath and the example control
sequence for ADD (R3),R1, which implements R1 <- R1 + memory[R3].

20. Give the control sequence for a two-word add instruction ADD NUM,R1
    where the opcode ADD, register R1, and addressing mode are given in the
    first instruction word, and the address NUM is given in the second
    instruction word. The effect should be: R1 <- R1 + memory[NUM].
    (12 pts.)

    // was given as example on HW3

    1. PC_out, MAR_in, Read, Select_4, Add, Z_in    // inst. fetch
    2. Z_out, PC_in, Y_in, WMFC
    3. MDR_out, IR_in
    4. PC_out, MAR_in, Read, Select_4, Add, Z_in    // address fetch
    5. Z_out, PC_in, WMFC
    6. MDR_out, MAR_in, Read                        // data fetch
    7. R1_out, Y_in, WMFC                           // stage R1 into Y
    8. MDR_out, Select_Y, Add, Z_in                 // add memory operand
    9. Z_out, R1_in                                 // update R1

Extra Credit. Give the control sequence for a one-word unconditional branch
    instruction BR X where the opcode BR and a PC-relative offset are given
    in the instruction word. To calculate the branch target address X, you
    must add the updated PC (that is, the address of the next word beyond
    the branch) to the address field obtained from the IR (you can use the
    IR_address_out control signal to place the sign-extended offset on the
    internal processor bus). The effect should be: PC <- updated_PC + offset.
    (6 pts.)

    // was given in Figure 7.7 of textbook

    1. PC_out, MAR_in, Read, Select_4, Add, Z_in    // inst. fetch
    2. Z_out, PC_in, Y_in, WMFC                     // note staging of
    3. MDR_out, IR_in                               //   updated PC to Y
    4. IR_address_out, Select_Y, Add, Z_in
    5. Z_out, PC_in
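
Study note: the question 20 control sequence can be checked by stepping it
through a small register-transfer model. The Python sketch below is only an
illustration under stated assumptions, not the Figure 7.1 datapath itself:
the function name run_sequence, the dictionary-based registers and memory,
the memory contents, and the rule that WMFC completes the pending read of
memory[MAR] into MDR are all hypothetical simplifications. IR_address_out
and instruction decoding are not modeled.

```python
# Simplified single-bus datapath model (assumption: not the real Figure 7.1).
# Each step is a list of control signals that are active at the same time.
ADD_NUM_R1 = [
    ["PC_out", "MAR_in", "Read", "Select_4", "Add", "Z_in"],  # inst. fetch
    ["Z_out", "PC_in", "Y_in", "WMFC"],
    ["MDR_out", "IR_in"],
    ["PC_out", "MAR_in", "Read", "Select_4", "Add", "Z_in"],  # address fetch
    ["Z_out", "PC_in", "WMFC"],
    ["MDR_out", "MAR_in", "Read"],                            # data fetch
    ["R1_out", "Y_in", "WMFC"],                               # stage R1 into Y
    ["MDR_out", "Select_Y", "Add", "Z_in"],                   # add memory operand
    ["Z_out", "R1_in"],                                       # update R1
]


def run_sequence(steps, regs, memory):
    """Apply each control step to the register dictionary and return it."""
    for signals in steps:
        # Tri-state buffers allow at most one register to drive the bus.
        drivers = [s[:-len("_out")] for s in signals if s.endswith("_out")]
        assert len(drivers) <= 1, "two sources driving the shared bus"
        bus = regs[drivers[0]] if drivers else None

        # The ALU adds the bus value to either the constant 4 or register Y.
        alu = None
        if "Add" in signals:
            operand = 4 if "Select_4" in signals else regs["Y"]
            alu = operand + bus

        # Latch destinations: Z takes the ALU output, all others take the bus.
        for s in signals:
            if s.endswith("_in"):
                dest = s[:-len("_in")]
                regs[dest] = alu if dest == "Z" else bus

        # Assumption: WMFC completes the read that was started with Read.
        if "WMFC" in signals:
            regs["MDR"] = memory[regs["MAR"]]
    return regs


if __name__ == "__main__":
    # Hypothetical memory image: opcode word at 0, NUM = 100 at 4, data at 100.
    memory = {0: 0xADD, 4: 100, 100: 7}
    regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0, "R1": 5}
    run_sequence(ADD_NUM_R1, regs, memory)
    print(regs["R1"], regs["PC"])
```

With R1 initially 5 and memory[NUM] holding 7, the model leaves 12 in R1 and
advances the PC past both instruction words, which matches the intended
effect R1 <- R1 + memory[NUM].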
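
Study note: the hit patterns and final cache contents in question 16 can be
reproduced with a short simulation. The Python sketch below is one possible
way to do that; the function name simulate and its parameters are
hypothetical, and each set is modeled as a list ordered from the next
replacement victim to the newest (or most recently used) line.

```python
def simulate(refs, num_sets, ways, policy="LRU", line_size=4):
    """Return (hit addresses, final sets) for a byte-address read stream.

    Each set is a list of memory line numbers; index 0 is the next victim.
    policy is "LRU" or "FIFO" (irrelevant when ways == 1).
    """
    sets = [[] for _ in range(num_sets)]
    hits = []
    for addr in refs:
        line = addr // line_size          # memory line holding this byte
        current = sets[line % num_sets]   # set selected by the index bits
        if line in current:
            hits.append(addr)
            if policy == "LRU":
                # A hit makes the line most recently used; FIFO ignores hits.
                current.remove(line)
                current.append(line)
        else:
            if len(current) == ways:
                current.pop(0)            # evict the victim at the front
            current.append(line)
    return hits, sets


REFS = [0, 16, 1, 31, 2, 32, 3, 17, 4, 18]

if __name__ == "__main__":
    for name, num_sets, ways, policy in [
        ("direct-mapped", 4, 1, "LRU"),
        ("two-way LRU", 2, 2, "LRU"),
        ("fully-associative FIFO", 1, 4, "FIFO"),
    ]:
        hits, sets = simulate(REFS, num_sets, ways, policy)
        # Convert line numbers to byte ranges such as "16-19".
        ranges = [[f"{4 * l}-{4 * l + 3}" for l in s] for s in sets]
        print(name, "hits:", hits, "contents:", ranges)
```

Line numbers convert to byte ranges as 4*line through 4*line+3, so line 4 is
bytes 16-19. In the fully-associative case the front of the list is line 4,
which is the line FIFO would replace next.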