Homework 4 examples Purposes: (1) perform average memory access time calculations; (2) determine partitioning of memory address fields for cache access; (3) determine cache hits and misses for a given memory access trace. Note: For main memory and cache sizes, interpret K, M, and G as powers of two. Otherwise, interpret K, M, and G as powers of ten. 1. Consider a computer system that contains a cache with 1-cycle access time and a memory with 10-cycle access time. If the hit rate is 75% and a cache miss requires both a single cache access and a single memory access, what is the average memory access time (AMAT)? AMAT = .75*(1 cycle) + .25*(1 cycle + 10 cycles) = 3.5 cycles 2. Consider a 16 MB main memory with a 32 KB direct-mapped cache and 8 bytes per line. (a) How many total lines are there in cache? # lines = 32K bytes / 8 bytes per line = 4K lines (b) Identify the main memory address fields and sizes. # memory address bits = log_base_2( 16 M ) = 24 # index bits = log_base_2( 4 K ) = 12 # line offset bits = log_base_2( 8 ) = 3 # tag bits = 24 - ( 12 + 3 ) = 9 23 15 14 3 2 1 0 +------------------+------------------------+------+ | 9-bit tag | 12-bit index |3-bit | | | |offset| +------------------+------------------------+------+ (c) How many comparators must operate in parallel to perform tag-matching? There is only one comparator in a direct-mapped cache. 3. Consider a word-addressed computer system with one-word instructions. If a program has an initial sequential execution section and then a loop with instruction addresses as follows .-----. | 0-10| `-----' .---->| | .-----. | |11-30| three iterations | `-----' `-----' the instruction fetch address stream is 0,1,...,9,10, 11,12,...,29,30, 11,12,...,29,30, 11,12,...,29,30 \iteration 1/ \iteration 2/ \iteration 3/ Consider a 16-word direct-mapped instruction cache with 4 words per line. Determine the number of cache hits and misses when executing the program above. index _________________ memory block mapping 0 |v|tag|__/__/__/__| 0- 3, 16-19, ... 1 |v|tag|__/__/__/__| 4- 7, 20-23, ... 2 |v|tag|__/__/__/__| 8-11, 24-27, ... 3 |v|tag|__/__/__/__| 12-15, 28-31, ... cache reference analysis cache block first second third start address iteration iteration iteration 0 .-----. 0 miss (1) | | 1 hit | 0-10| 2 hit | | 3 hit 4 | | 4 miss (2) | | 5 hit | | 6 hit | | 7 hit 8 | | 8 miss (3) | | 9 hit `-----' 10 hit .---->| | .-----. 11 hit 11 miss (9) 11 miss (13) 12 | | | 12 miss (4) 12 miss (10) 12 miss (14) | |11-30| 13 hit 13 hit 13 hit | | | 14 hit 14 hit 14 hit | | | 15 hit 15 hit 15 hit 16 | | | 16 miss (5) 16 hit 16 hit | | | 17 hit 17 hit 17 hit | | | 18 hit 18 hit 18 hit | | | 19 hit 19 hit 19 hit 20 | | | 20 miss (6) 20 hit 20 hit | | | 21 hit 21 hit 21 hit | | | 22 hit 22 hit 22 hit | | | 23 hit 23 hit 23 hit 24 | | | 24 miss (7) 24 miss (11) 24 miss (15) | | | 25 hit 25 hit 25 hit | | | 26 hit 26 hit 26 hit | | | 27 hit 27 hit 27 hit 28 | | | 28 miss (8) 28 miss (12) 28 miss (16) | | | 29 hit 29 hit 29 hit | `-----' 30 hit 30 hit 30 hit `-----' miss 1 - 0- 3 fills empty line 0 miss 2 - 4- 7 fills empty line 1 miss 3 - 8-11 fills empty line 2 miss 4 - 12-15 fills empty line 3 miss 5 - 16-19 replaces 0- 3 in line 0 miss 6 - 20-23 replaces 4- 7 in line 1 miss 7 - 24-27 replaces 8-11 in line 2 miss 8 - 28-31 replaces 12-15 in line 3 miss 9 - 8-11 replaces 24-27 in line 2 miss 10 - 12-15 replaces 28-31 in line 3 miss 11 - 24-27 replaces 8-11 in line 2 miss 12 - 28-31 replaces 12-15 in line 3 miss 13 - 8-11 replaces 24-27 in line 2 miss 14 - 12-15 replaces 28-31 in line 3 miss 15 - 24-27 replaces 8-11 in line 2 miss 16 - 28-31 replaces 12-15 in line 3 Thus there are 55 hits and 16 misses, for a total of 71 references. Assume for a moment that instruction execution time is based solely on the instruction fetch timiing. If the instruction cache access time is 1 cycle, the memory access time is 5 cycles, and cache misses require 1 cycle to miss and 5 cycles per word to refill, what is the instruction fetch timing? 55 cache hits @ 1 cycle = 55 cycles 16 cache misses @ 21 cycles (1+5*4)= 336 cycles total = 391 cycles If supported in the hardware, a burst transfer from memory could speed up the timing. If a 5-1-1-1 burst is available, what is the timing? 55 cache hits @ 1 cycle = 55 cycles 16 cache misses @ 9 cycles (1+5+1+1+1) = 144 cycles total = 199 cycles