CPSC 330 - Fall 2004
Study Guide for Exam 4

This exam will cover Chapter 7 [not Chapters 8 and 9 in Fall 2004].

1. Be able to define or match these terms (usually noted in the textbook margins).

   p. 308      little-endian
   p. 308      big-endian
   p. 309      random access memory (RAM)
   p. 309      read-only memory (ROM)
   p. 310      memory latency
   p. 310      memory bandwidth
   p. 311      primary memory
   p. 311      secondary memory
   p. 312      block transfer
   p. 312      static RAM (SRAM)
   p. 312      dynamic RAM (DRAM)
   p. 312      refresh
   p. 322      row address strobe
   p. 322      column address strobe
   p. 325      page-mode DRAM
   p. 326      programmable ROM (PROM)
   p. 326      erasable PROM (EPROM)
   p. 334      single in-line memory module (SIMM)
               dual in-line memory module (DIMM)
   p. 334      memory bus
   p. 337      memory module interleaving
   p. 337      blocked memory modules
   p. 344      locality
   p. 344      spatial locality
   p. 344      temporal locality
   p. 344      working set
   p. 346      miss ratio
   p. 346      hit ratio
   p. 347      cache / main-memory latency ratio (cache regime)
   p. 347      main-memory / disk latency ratio (disk regime)
   p. 347      virtual memory
   p. 347      multiprogramming
   p. 347      primary cache (level one, L1)
   p. 347      secondary cache (level two, L2)
   p. 349      cache lines (often called cache blocks)
               fetch policy (demand fetch or prefetch)
   p. 348-349  placement policy (fully associative, set-associative, direct-mapped)
   p. 348-349  replacement policy (LRU, random, etc.)
   p. 350,356  read miss (refill)
               burst transfer over memory bus for refill
   p. 350,356  write-hit policy (write-through or write-back)
   p. 350,357  write-miss policy (write-allocate or write-no-allocate)
   p. 350      associative mapping
   p. 350      tag
   p. 350      valid bit
   p. 350      fully associative cache
   p. 351      associative memory
   p. 352      direct-mapped cache
               compulsory miss
               capacity miss
               conflict miss
   p. 353      thrashing
   p. 354      set-associative cache
   p. 356      write-through
   p. 356      write-back
   p. 356      dirty bit
   p. 357      write-allocate
   p. 357      write-no-allocate
   p. 357      least recently used (LRU)
   p. 357      random replacement
   p. 357      cache access time
   p. 358      speedup
   p. 358      memory management unit (MMU)
   p. 359      page fault
   p. 360      effective address
   p. 360      virtual address
   p. 360      physical address
   p. 362      demand paging
   p. 363      page table
   p. 366      translation lookaside buffer (TLB)

   [omit: ... disk seek rotational latency ... RAID ... ]

2. Be able to fill in the blanks to complete statements of key concepts.

   p. 372-373  chapter 7 summary

3. Be able to:

   A. Explain the memory performance parameters in Table 7.2.
   B. Give typical capacity, latency, and block size values for the memory
      hierarchy components as listed in Table 7.3.
   C. Identify and discuss the key design decisions for a two-level memory
      hierarchy as listed on pp. 348-349.
   D. Give the three advantages of virtual memory as listed on p. 360.
   E. Explain the operation of the memory hierarchy as depicted in Figure 7.44.
      E.g., explain what happens on a TLB miss but cache hit. (Hint: know this
      diagram well, to the point of being able to reproduce it.)
   F. Identify at least one example each of temporal and spatial locality of
      memory references.
   G. Show big-endian and little-endian storage of data.
   H. Given memory and cache parameters, give the tag, index, and offset field
      sizes within the main memory address.
   I. Explain how a TLB operates.
   J. Given an address stream and cache parameters, determine the number of misses.
   K. Explain how row versus column access to a matrix affects cache misses.
      (E.g., discuss how misses were reduced in the matrix multiply example.)

   [ omit:
   L. Identify the three trends in PC memory and I/O controller chip sets.
      1. more integration onto North-Bridge and South-Bridge controllers
      2. narrower buses to reduce pin count, but running pins at higher speed
      3. point-to-point connections rather than shared bus
   M. Identify the four components of disk read/write timing and give an
      approximate value for each.
      1. controller overhead
      2. disk seek
      3. rotational latency
      4. transfer rate
   N. Explain how a disk failure can be tolerated in a RAID system.
   O.
      Explain why a disk write is so much more expensive than a read in RAID 5.
   ]

4. Thought Questions.

   A. Why do we care about little-endian versus big-endian? What difference
      does it make?
   B. Can a program's locality of reference be improved? If so, describe one
      or two ways to do this. If not, explain why not.
   C. Can a TLB be called an address cache? Explain your answer.
   D. Many operating systems use a form of LRU for page replacement. Can the
      software-implemented replacement algorithm used by the OS also be used
      for cache line replacement? Explain why or why not.

330 Fall 2003 -- Exam 4                         Name: ____________________

Fill in the blank. (2 pts. each blank, 12 pts. total)

1. To supply fast access to a large amount of memory, large and _______________
   memories are attached to small ________ ones in a ________________ structure.

2. Virtual memory is the __________________________ and _________________ pair.

3. Slow disk access requires __________________________________ on a page fault.

Memory Performance Parameters - write matching letter in blank. (2 pts. each)

4. Access time       _____     a. time to access first of a sequence of words
5. Bandwidth         _____     b. time to access a memory word
6. Block access time _____     c. word transmission rate
7. Block size        _____     d. time from start of read to start of next
8. Cycle time        _____     e. number of words per block
9. Latency           _____     f. time to access entire block from start of read

10. Mark which functions are performed by each different type of cache. (9 pts.)

                           lookup   tag match   parallel compares
                             __        __              __
    Direct mapped           |__|      |__|            |__|
    Set associative         |__|      |__|            |__|
    Fully associative       |__|      |__|            |__|

11. Mark which bits are needed by each different type of cache. (9 pts.)

                                     valid bit   dirty bit   replacement choice
                                        __          __              __
    Direct mapped, write back          |__|        |__|            |__|
    Set associative, write through     |__|        |__|            |__|
    Fully associative, write back      |__|        |__|            |__|

True/False. Circle T or F. (2 pts. each)

12.
T / F  Locality of reference is a property of the CPU on which a program is
       running (e.g., how big the cache is).

13. T / F  A cache read miss causes the missing cache line to be brought into
    cache, and this action is called a refill.

14. T / F  A cache write miss will always cause a refill.

15. T / F  A burst transfer occurs when multiple words are transferred over
    the memory bus in response to one address.

16. T / F  Interleaved memory modules are required to support burst transfers.

17. T / F  The main memory / disk latency ratio is four to five orders of
    magnitude (i.e., a difference of 10,000 to 100,000 in speed).

18. T / F  A page fault occurs when a page in main memory has a set of bit
    errors that cannot be corrected by ECC.

19. Show the little-endian and big-endian byte ordering of the following data
    values. Start the byte numbering at address 100. (8 pts.)

       int a = 0x123;
       char c[4] = {'a', 'b', 'c', '\0'};

                      100  101  102  103  104  105  106  107
       big-endian:    ___  ___  ___  ___  ___  ___  ___  ___
       little-endian: ___  ___  ___  ___  ___  ___  ___  ___

20. If the main memory access time is 45 ns, and the cache measurements are:
    hit time of 5 ns, miss time of 55 ns, and hit rate of 90% - what is the
    memory access speedup of using a cache versus not using a cache? (8 pts.)

21. Consider a 1 GB byte-addressable main memory with a four-way
    set-associative cache of 256 KB and 16 bytes per line.
    a) How many total lines are there in cache? (not just per bank) (4 pts.)
    b) Show the field lengths within the main memory address. (6 pts.)

22. Give the three advantages of virtual memory as listed in your text.
    (3 pts. each)
    1)
    2)
    3)

23. Extra credit - Draw Figure 7.44 - Operation of the Memory Hierarchy -
    which shows access to the TLB, cache, and page table. (15 pts.)

330 Spring 2004 -- Exam 4                       Name: ____________________

Questions 1-2 -- fill in the blanks. (2 pts. per blank; 4 blanks)

1.
To supply fast access to a large amount of memory, large and _______________
   memories are attached to small ________ ones in a ________________ structure.

2. Slow disk access requires __________________________________ on a page fault.

Memory Performance Parameters - write matching letter in blank. (2 pts. each)

3. Access time       _____     a. time to access first of a sequence of words
4. Bandwidth         _____     b. time to access a memory word
5. Block access time _____     c. word transmission rate
6. Block size        _____     d. time from start of read to start of next
7. Cycle time        _____     e. number of words per block
8. Latency           _____     f. time to access entire block from start of read

Label the following with D (DRAM), S (SRAM), B (both), or N (neither). (2 pts. each)

9.  ___ a form of semiconductor memory
10. ___ used for cache memories
11. ___ used for main memories
12. ___ used for disk memories
13. ___ typically implemented with six transistors per bit
14. ___ needs periodic refresh

True/False. Circle T or F. (2 pts. each)

15. T / F  Locality of reference is a property of the CPU on which a program
    is running (e.g., how big the cache is).

16. T / F  In a write-through cache, a store instruction will set the dirty
    bit for the cache line that is changed.

17. T / F  A burst transfer occurs when multiple words are transferred over
    the memory bus in response to one address.

18. T / F  The main memory / disk latency ratio is four to five orders of
    magnitude (i.e., a difference of 10,000 to 100,000 in speed).

19. T / F  A physically-tagged cache requires that a TLB access be made
    before or at least in parallel with the cache access.

20. Show the little-endian and big-endian byte ordering of the following data
    values. Start the byte numbering at address 100. (8 pts.)

       int a = 0x10234;  /* int defaults to 32-bit word */
       char c[4] = {'a', 'b', 'c', '\0'};

                      100  101  102  103  104  105  106  107
       big-endian:    ___  ___  ___  ___  ___  ___  ___  ___
       little-endian: ___  ___  ___  ___  ___  ___  ___  ___

21.
Identify at least one example each of temporal and spatial locality of
    memory references. (6 pts.)

    temporal -

    spatial -

22. Explain how fast page mode on DRAMs relates to burst transfers. (8 pts.)

23. If the main memory access time is 70 ns, and the cache measurements are:
    hit time of 20 ns, miss time of 80 ns, and hit rate of 75% - what is the
    memory access speedup of using a cache versus not using a cache? (8 pts.)

24. Consider a 4 GB byte-addressable main memory with a two-way
    set-associative cache of 512 KB and 32 bytes per line.
    a) How many total lines are there in the cache? (not per bank) (2 pts.)
    b) How many lines are there per bank? (2 pts.)
    c) How many total tag comparators are there in the cache? (2 pts.)
    d) Show the field lengths within the main memory address. (4 pts.)

25. Assume a 256-byte main memory and a four-line cache with two bytes per
    line. The cache is initially empty. For the byte address reference stream
    (reads) given below, circle which of the references are hits for the
    different cache placement schemes.
    a) direct-mapped (6 pts.)
       0, 8, 1, 25, 9, 16, 8, 17, 24
    b) two-way set-associative with LRU replacement (6 pts.)
       0, 8, 1, 25, 9, 16, 8, 17, 24
    c) fully-associative with FIFO replacement (6 pts.)
       0, 8, 1, 25, 9, 16, 8, 17, 24

Extra credit - Draw Figure 7.44 - Operation of the Memory Hierarchy - which
shows access to the TLB, cache, and page table. (15 pts.)