CPSC 330 Fall 2005 Program 4 Due by 4:00 on Monday, December 12 Turn in using "handin.330.1 4 " Write a simulation of the three-state MSI (modified, shared, invalid) snoopy cache coherency protocol for a two-processor system. Each processor has a one-line cache (8 bytes per line = 2 words per line). Each cache is write- back and write-allocate. The cache coherency protocol is write-invalidate. Each individual cache line has a state: M = modified data in this cache line. The line is incoherent with memory, and the cache is said to own the line since the cache must respond to read requests rather than the memory. No other caches in the system may have a copy of this line. S = shared cache line. The line is coherent with memory and may be present in several caches. Caches must notify the rest of the system about any changes to this line. The main memory is said to own this cache line (but in some implementations a cache responds to read requests when possible since a cache-to-cache transfer is faster than a memory-to- cache refill.) I = invalid line. The cache line is not currently being used. Processor doing a read * If a hit on a Shared or Modified line, there is no bus action. * If a miss, first check what must be replaced. If a Modified line must be replaced, write-back to memory. Second, send a read request across the bus. Bring the line in and set it to Shared state. Processor doing a Write * If a hit on a Modified line, there is no bus action. * If a hit on a Shared line, send an invalidate request across the bus and set the line to Modified state. * If a miss, first check what must be replaced. If a Modified line must be replaced, write-back to memory. Second, send a read-with-intent-to- modify (RIM) request across bus. Bring the line in and set it to Modified state. Another processor seeing a read request * If the address on the bus matches a Shared line, there is no action. * If the address on the bus matches a Modified line, the cache must supply that line to the requestor directly (and write back to memory) and then change the state of that line in its own cache to Shared. Another processor seeing an invalidate request * If the address on the bus matches a line, set the state of that line to invalid. Another processor seeing a RIM request * If the address on the bus matches a Shared line, set that line to invalid. * If the address on the bus matches a Modified line, the cache must supply that line to the requestor directly (and write back to memory) and then change the state of that line in its own cache to invalid. Here are two tables summarizing the above. Of the nine possible global states for a given line ( I/S/M x I/S/M ), three are illegal (SxM, MxS, MxM) and thus not shown in the table. global processor reads processor writes state ------------------- ------------------- ------ bus this another bus this another ---- ---- ------- ---- ---- ------- IxI miss READ I->S I miss RIM I->M I IxS READ I->S S RIM I->M S->I IxM READ I->S WB M->S RIM I->M WB M->I ---- ---- ------- ---- ---- ------- SxI hit none S I hit INV S->M I SxS none S S INV S->M S->I MxI none M I none M I ---- ---- ------- ---- ---- ------- WB notes: (1) on a miss, if the line to be replaced has state M, then WB (2) preemptive WB for another cache holding the line in state M Assume that the main memory is composed of 512 bytes, addressed in hex from 000 to 1ff. Memory is accessed as words (big-endian addressing). Each word should start with a value of zero. Assume each write increments the addressed word by 1. The input should be one line for each processor read or write. The first character should be a '0' or '1' (for processor number), the second character should be an 'r' or 'w' (for read or write), and the rest of the line should be a memory word address (in hex). There are no spaces. The last line should start with something other than 0 or 1. (See the examples below.) (Note: a non-aligned address should be truncated to the word address.) For each processor read or write, you should print the bus action(s) as well as the processor action and cache contents for each processor. (See the examples below.) Also, at the end of the simulation, you should print counts for the various bus actions and cache read/write misses/hits. (See the examples below.) For the input file 0r100 0w100 1r108 1w108 0r100 1r108 0w100 1w108 2 you should get processor 0 processor 1 bus ---------------------------- ---------------------------- action action cache contents action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- READ read 100 S 100 0 0 I ----- ---- ---- INV write 100 M 100 1 0 I ----- ---- ---- READ M 100 1 0 read 108 S 108 0 0 INV M 100 1 0 write 108 M 108 1 0 (none) read 100 M 100 1 0 M 108 1 0 (none) M 100 1 0 read 108 M 108 1 0 (none) write 100 M 100 2 0 M 108 1 0 (none) M 100 2 0 write 108 M 108 2 0 Stats: bus processor 0 processor 1 -------- --------------- --------------- READs 2 read hits 1 read hits 1 RIMs 0 read misses 1 read misses 1 WBs 0 write hits 2 write hits 2 INVs 2 write misses 0 write misses 0 -------- --------------- --------------- total 4 hit rate 75.0% hit rate 75.0% while modifying this input file so that processor 1 uses address 0x104 rather that 0x108, 0r100 0w100 1r104 1w104 0r100 1r104 0w100 1w104 2 you should get (demonstrating the effects of false-sharing) processor 0 processor 1 bus ---------------------------- ---------------------------- action action cache contents action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- READ read 100 S 100 0 0 I ----- ---- ---- INV write 100 M 100 1 0 I ----- ---- ---- RD/WB S 100 1 0 read 104 S 100 1 0 INV I ----- ---- ---- write 104 M 100 1 1 RD/WB read 100 S 100 1 1 S 100 1 1 (none) S 100 1 1 read 104 S 100 1 1 INV write 100 M 100 2 1 I ----- ---- ---- RIM/WB I ----- ---- ---- write 104 M 100 2 2 Stats: bus processor 0 processor 1 -------- --------------- --------------- READs 3 read hits 0 read hits 1 RIMs 1 read misses 2 read misses 1 WBs 3 write hits 2 write hits 1 INVs 3 write misses 0 write misses 1 -------- --------------- --------------- total 10 hit rate 50.0% hit rate 50.0%