330 Spring 2010 -- Final Exam Name: ____________________ No calculators. Most problems can be worked with simple fractions. 1. (a) Find the execution time of a program that executes 10 billion instructions on a processor with an average CPI of 1.0 and a clock frequency of 2 GHz. (5 pts.) (b) What is the MIPS value for the machine described in part (a)? (5 pts.) 2. Use Amdahl's Law to determine what fraction of time a fifty times speedup must be used to obtain an overall speedup of twenty? Give your answer as a fraction. (5 pts.) 3. Consider a pipelined computer system with this workload and CPIs. (5 pts.) inst. type freq. cpi ------------------------- load/store .3 2 integer ALU op .4 1 taken branch .2 4 untaken branch .1 2 (a) What is the average CPI? (b) For 10 million instructions and a CPU cycle time of 0.5 nsec, what is the CPU execution time? 4. Consider a two-bit saturating counter used for branch prediction, in which 1 represents a taken branch and 0 represents an untaken branch. (a) give the state diagram (10 pts.) (b) give the truth table (there is no need for output columns) (10 pts.) (c) design a circuit (output is merely the state values) (up to 10 pts. extra credit) 5. For the MIPS instruction sequence below, draw the data dependency graph and identify any data dependencies as RAW, WAR, or WAW. (10 pts.) i1: lw $1, 0($2) // regs[1] <- memory[ regs[2] + 0 ] i2: addi $2, $2, 1 // regs[2] <- regs[2] + 1 i3: lw $3, 0($1) // regs[3] <- memory[ regs[1] + 0 ] 6. For the MIPS instruction sequence given in question 5, show the pipeline cycle diagrams for the standard 5-stage pipeline without forwarding. Assume register file writes occur in the first half cycle and reads in the second half cycle. (10 pts.) 7. Give at least two differences bewteen superscalar and VLIW. (5 pts.) 8. Consider a 4 GB byte-addressable main memory with a two-way set- associative cache of 256 KB and 16 bytes per line. (a) How many total lines are there in cache? (not just per bank) (1 pt.) (b) Show how the main memory address is partitioned into fields for the cache access and give the bit lengths of those fields. (3 pts.) (c) How many total tag comparators are there in the cache? (1 pt.) 9. Explain the following three terms for parallel processor systems. (5 pts. each) (a) UMA (b) NUMA (c) NORMA 10. Assume a 256-byte main memory and a four-line cache with four bytes per line. (Note that this is larger than the example exam cache.) The cache is initially empty. For the byte address reference stream (reads) given below circle which of the references are hits for the different cache placement schemes. Also, show the final contents of the cache. (The byte addresses are in decimal.) (a) direct-mapped (5 pts.) 4, 20, 5, 38, 6, 36, 7, 21, 8, 22 (b) two-way set associative with LRU replacement (5 pts.) 4, 20, 5, 38, 6, 36, 7, 21, 8, 22 (c) fully-associative with FIFO replacement (5 pts.) 4, 20, 5, 38, 6, 36, 7, 21, 8, 22