CPSC 330 - Spring 2013 Project 3 - Simple simulator for MSI cache coherency [4/28 - corrected handin command] Due on Monday, April 29 Grading Weight: 100% correct program Submission: handin.330.1 3 Write a simulation of "snoopy" cache coherency for a two-processor system using the MSI protocol. You can use C or any other programming language available on the School of Computing Linux servers. The program should read from standard input and write to standard output, and thus be capable of command-line I/O redirection. In a cache-coherent system based on snooping, there is typically a single memory bus, and each cache snoops (i.e., listens to) the bus traffic. (Often, each processor's cache will have a second set of cache tags dedicated for snooping.) The MSI protocol uses the modified and invalid bits in a cache line to encode three states: M = Modified cache line. The cache line is incoherent with memory, and no other caches in the system are allowed to have a copy of this cache line. S = Shared cache line. The cache line is coherent with memory and there may be copies of this cache line in one or more caches. Caches must notify the rest of the system about any changes to a cache line in Shared state. I = Invalid cache line. The cache line is empty. These are the cache actions in the active processor: Write back - A complete line is written back to memory when a cache line in Modified state is replaced. Read word - This can result in either a cache hit or miss. The refill action on a miss will first cause a WB if the cache line being replaced is in Modified state. Write word - This can result in either a cache hit or miss. The refill action on a miss will first cause a WB if the cache line being replaced is in Modified state. These are the bus actions: READ - Read a cache line (multiple words) from memory or another cache. RIM - Read with intent to modify - read a cache line in preparation for writing one of the words held within. INV - Tell other caches to invalidate a cache line. WBr - A complete line is written back to memory when a Modified cache line is replaced. In this simulation, a replacement-caused WB action will be a separate bus action that precedes the associated READ or RIM action. WBc - A complete line is written back to memory when a READ or RIM is recognized on the bus by a snoopy cache that holds the cache line in Modified state. In this simulation, a coherence-caused WB action will be combined with the associated READ or RIM action. These are the cache actions in the snooping processor: Read - If the address on the bus matches a cache line in Shared state, there is no change. If the address on the bus matches a cache line in Modified state, the cache will cause the bus action to abort, write the modified cache line back to memory, change the state of the cache line to Shared, and then allow the bus read to retry. In many systems, this sequence can be accelerated by having the the active cache grab the data as it is written back by the snooping cache. This will be designated in the simulation as a RD/WB action. RIM - If the address on the bus matches a cache line in Shared state, the cache line is invalidated. If the address on the bus matches a cache line in Modified state, the cache must cause the bus action to abort, write the modified cache line back to memory, invaidate the cache line, and then allow the bus read to retry. In many systems, this sequence can be accelerated by having the active cache grab the data as it is written back by the snooping cache. This will be designated in the simulation as a RIM/WB action. INV - If the address on the bus matches a cache line in Shared state, the cache line is invalidated. If the address on the bus matches a cache line in Modified state, this is an error. As a state machine, the active transitions (that is, transitions based on action by the active processor) are: cache hit, current state, action | new state / bus action ----------------------------------+-------------------------- miss I read | S READ miss I write | M RIM miss S read | S READ miss S write | M RIM miss M read | S WBr then READ miss M write | M WBr then RIM hit S read | S (none) hit S write | M INV hit M read | M (none) hit M write | M (none) (Note: You can't have a hit on an invalid cache line.) The bus-driven, snoopy transitions (that is, transitions based on an action by another processor) are: address, current state, bus action | new state / revised match | bus action ------------------------------------+------------------------ no x x | no change yes S READ | S yes S RIM | I yes S INV | I yes M READ | S RD/WB yes M RIM | I RIM/WB (Note: INV is only possible if the active cache has the cache line in state S. You can't have a cache line in one cache as S and the other cache as M; this would indicate a logic error.) The caches in your simulation are trivial - one cache line per processor, eight bytes per cache line (two words), and the processors only access aligned words. You should simulate a memory of 512 words (= 2K bytes). Each memory word should start with a value of zero, and is incremented by each processor write. In the simulation, write backs should update the simulated memory so that later reads with reload the cache line with the correct values. The input should be one line for each processor read or write. The first character is a '0' or '1' (for processor number), the second character is an 'r' or 'w' (for read or write), and the rest of the line should be a memory word address (in hex and word aligned). There are no spaces in an input line, and the last line should start with something other than 0 or 1. (See the examples below.) For each processor read or write, you should print the bus action(s) as well as the processor action and cache contents for each processor. Also, at the end of the simulation, you should print counts for the various bus actions and cache read/write misses/hits. For this input file: 0r100 0w100 0r200 1r100 0r100 1w100 1w300 2 you should get the following output: processor 0 bus processor 1 ---------------------------- ------ ---------------------------- action cache contents action action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- read 100 S 100 0 0 READ I ----- ---- ---- write 100 M 100 1 0 INV I ----- ---- ---- WBr read 200 S 200 0 0 READ I ----- ---- ---- S 200 0 0 READ read 100 S 100 1 0 read 100 S 100 1 0 READ S 100 1 0 I ----- ---- ---- INV write 100 M 100 2 0 WBr I ----- ---- ---- RIM write 300 M 300 1 0 Stats: processor 0 processor 1 bus --------------- --------------- -------- read hits 0 read hits 0 READs 4 read misses 3 read misses 1 RIMS 1 write hits 1 write hits 1 WBs 2 write misses 0 write misses 1 INVs 2 --------------- --------------- -------- hit rate 25.0% hit rate 33.3% total 9 For this file: 0r100 0w100 1r108 1w108 0r100 1r108 0w100 1w108 2 you should get: processor 0 bus processor 1 ---------------------------- ------ ---------------------------- action cache contents action action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- read 100 S 100 0 0 READ I ----- ---- ---- write 100 M 100 1 0 INV I ----- ---- ---- M 100 1 0 READ read 108 S 108 0 0 M 100 1 0 INV write 108 M 108 1 0 read 100 M 100 1 0 (none) M 108 1 0 M 100 1 0 (none) read 108 M 108 1 0 write 100 M 100 2 0 (none) M 108 1 0 M 100 2 0 (none) write 108 M 108 2 0 Stats: processor 0 processor 1 bus --------------- --------------- -------- read hits 1 read hits 1 READs 2 read misses 1 read misses 1 RIMS 0 write hits 2 write hits 2 WBs 0 write misses 0 write misses 0 INVs 2 --------------- --------------- -------- hit rate 75.0% hit rate 75.0% total 4 Modifying the second input file so that processor 1 uses address 0x104 rather than 0x108: 0r100 0w100 1r104 1w104 0r100 1r104 0w100 1w104 2 you should get the following (demonstrating the performance impact of false-sharing): processor 0 bus processor 1 ---------------------------- ------ ---------------------------- action cache contents action action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- read 100 S 100 0 0 READ I ----- ---- ---- write 100 M 100 1 0 INV I ----- ---- ---- S 100 1 0 RD/WB read 104 S 100 1 0 I ----- ---- ---- INV write 104 M 100 1 1 read 100 S 100 1 1 RD/WB S 100 1 1 S 100 1 1 (none) read 104 S 100 1 1 write 100 M 100 2 1 INV I ----- ---- ---- I ----- ---- ---- RIM/WB write 104 M 100 2 2 Stats: processor 0 processor 1 bus --------------- --------------- -------- read hits 0 read hits 1 READs 3 read misses 2 read misses 1 RIMS 1 write hits 2 write hits 1 WBs 3 write misses 0 write misses 1 INVs 3 --------------- --------------- -------- hit rate 50.0% hit rate 50.0% total 10 The following example input file has R/W sharing of the word at 0x100. 0r100 1r100 0w100 1w100 0r100 1r100 0w100 0r100 1w100 1r100 0r200 1w100 1r100 1w100 0r100 2 For this file, the simulation should produce: processor 0 bus processor 1 ---------------------------- ------ ---------------------------- action cache contents action action cache contents addr wrd0 wrd1 addr wrd0 wrd1 I ----- ---- ---- I ----- ---- ---- read 100 S 100 0 0 READ I ----- ---- ---- S 100 0 0 READ read 100 S 100 0 0 write 100 M 100 1 0 INV I ----- ---- ---- I ----- ---- ---- RIM/WB write 100 M 100 2 0 read 100 S 100 2 0 RD/WB S 100 2 0 S 100 2 0 (none) read 100 S 100 2 0 write 100 M 100 3 0 INV I ----- ---- ---- read 100 M 100 3 0 (none) I ----- ---- ---- I ----- ---- ---- RIM/WB write 100 M 100 4 0 I ----- ---- ---- (none) read 100 M 100 4 0 read 200 S 200 0 0 READ M 100 4 0 S 200 0 0 (none) write 100 M 100 5 0 S 200 0 0 (none) read 100 M 100 5 0 S 200 0 0 (none) write 100 M 100 6 0 read 100 S 100 6 0 RD/WB S 100 6 0 Stats: processor 0 processor 1 bus --------------- --------------- -------- read hits 1 read hits 3 READs 5 read misses 4 read misses 1 RIMS 2 write hits 2 write hits 2 WBs 4 write misses 0 write misses 2 INVs 2 --------------- --------------- -------- hit rate 42.9% hit rate 62.5% total 13