Homework 6 examples

Purposes: (1) perform calculations using Amdahl's Law; (2) perform calculations to determine MIPS and MFLOPS; (3) consider instruction execution patterns for multithreading.

1. Consider enhancing a scalar machine by providing a vector mode, which is 4 times faster than the normal mode of operation.

(a) If the percentage of vectorization is 25%, what is the overall speedup?

$$\text{speedup} = \frac{1}{\left(1 - \frac{1}{4}\right) + \frac{1/4}{4}} = \frac{1}{\frac{3}{4} + \frac{1}{16}} = \frac{1}{\frac{13}{16}} = \frac{16}{13} \approx 1.23$$

(b) What percent of vectorization is needed to achieve an overall speedup of 2?

With f the fraction of the program that is vectorized:

$$2 = \frac{1}{(1 - f) + \frac{f}{4}} \;\Rightarrow\; (1 - f) + \frac{f}{4} = \frac{1}{2}$$

Multiplying both sides by 4:

$$4 - 4f + f = 2 \;\Rightarrow\; -3f = -2 \;\Rightarrow\; f = \frac{2}{3}$$

so the percentage of vectorization needed is about 67%.

2. Consider a program that executes 100 million instructions in 5 seconds. What is the MIPS rating for this program?

$$\text{MIPS} = \frac{100 \times 10^6 \text{ instructions}}{5 \text{ s} \times 10^6} = 20 \text{ MIPS}$$

Consider a processor with a CPI value of 10 cycles/instruction and a clock frequency of 200 MHz. What is the MIPS rating for this processor?

$$\text{MIPS} = \frac{200 \times 10^6 \text{ cycles/s}}{10 \text{ cycles/instruction} \times 10^6} = 20 \text{ MIPS}$$

3. Consider the program in question 2 that executes 100 million instructions in 5 seconds. If 15% of these instructions are floating-point operations, what is the MFLOPS rating for this program?

$$\text{MFLOPS} = \frac{0.15 \text{ flops/instruction} \times 100 \times 10^6 \text{ instructions}}{5 \text{ s} \times 10^6} = 3 \text{ MFLOPS}$$

4. Consider the following two threads acting on a shared variable "sv":

    initially:  sv = 0;
    thread 1:   sv++;
    thread 2:   sv = 2;

When compiled, the relevant portions of these threads are:

    thread 1:                     thread 2:
    (1.1) ld   r1, sv             (2.1) st r2, sv   // assume that r2 has
    (1.2) addi r1, r1, 1                            // been preloaded
    (1.3) st   r1, sv                               // with the value 2

How many different interleavings are possible for the four instructions? The answer is less than 4! (4 factorial) because the ordering within thread 1 must be preserved.
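The count can be checked with a short Python sketch (Python 3.8 or later is assumed for `math.comb`). Interleaving two instruction streams of lengths m and n while keeping each stream's internal order gives C(m+n, m) orderings; the helper name `count_interleavings` is hypothetical, chosen only for this illustration.

```python
from math import comb, factorial

def count_interleavings(m, n):
    """Orderings of m + n instructions that keep each thread's program order.

    Pick which m of the m + n slots hold thread 1's instructions; thread 2's
    n instructions fill the remaining slots in their fixed order.
    """
    return comb(m + n, m)

# Thread 1 has 3 instructions (1.1-1.3); thread 2 has 1 (2.1).
print(count_interleavings(3, 1))  # 4
print(factorial(4))               # 24, if program order were ignored
```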
If you consider four slots for the four instructions executed, you can assign (without loss of generality) thread 1's instructions to three of the slots, in 4 choose 3 ways, and assign the single instruction from thread 2 to the remaining slot, in 1 choose 1 way:

$$\#\text{interleavings} = \binom{4}{3} \binom{1}{1} = 4 \times 1 = 4$$

What are the possible values of sv that can result?

    interleaving 1   interleaving 2   interleaving 3   interleaving 4
    sv = 0           sv = 0           sv = 0           sv = 0
    1.1  r1 = 0      1.1  r1 = 0      1.1  r1 = 0      2.1  sv = 2
    1.2  r1 = 1      1.2  r1 = 1      2.1  sv = 2      1.1  r1 = 2
    1.3  sv = 1      2.1  sv = 2      1.2  r1 = 1      1.2  r1 = 3
    2.1  sv = 2      1.3  sv = 1      1.3  sv = 1      1.3  sv = 3
    sv = 2           sv = 1           sv = 1           sv = 3

The first and last rows give the initial and final values of sv, so the possible final values are 1, 2, and 3.
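As a cross-check of all four answers, here is a minimal Python sketch. The helper names (`amdahl_speedup`, `fraction_needed`, `mips_from_time`, `mips_from_cpi`, `mflops`, `interleavings`, `run`) are hypothetical. Problem 1 uses `fractions.Fraction` so the results stay exact. Problem 4 is modeled with a toy interpreter that assumes each labeled instruction executes atomically and that r2 already holds 2, as stated in the handout; it is not a model of any real memory system.

```python
from fractions import Fraction
from itertools import combinations

# ---- Problems 1-3: Amdahl's Law, MIPS, MFLOPS ----

def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work runs s times faster."""
    f, s = Fraction(f), Fraction(s)          # exact arithmetic, e.g. 16/13
    return 1 / ((1 - f) + f / s)

def fraction_needed(target, s):
    """Solve target = 1 / ((1 - f) + f/s) for the enhanced fraction f."""
    target, s = Fraction(target), Fraction(s)
    # (1 - f) + f/s = 1/target  =>  f * (1 - 1/s) = 1 - 1/target
    return (1 - 1 / target) / (1 - 1 / s)

def mips_from_time(instructions, seconds):
    """Millions of instructions executed per second."""
    return instructions / (seconds * 10**6)

def mips_from_cpi(clock_hz, cpi):
    """Clock rate divided by cycles per instruction, in millions."""
    return clock_hz / (cpi * 10**6)

def mflops(instructions, fp_fraction, seconds):
    """Millions of floating-point operations per second."""
    return fp_fraction * instructions / (seconds * 10**6)

# ---- Problem 4: enumerate interleavings of the two threads ----

THREAD1 = ["1.1", "1.2", "1.3"]  # ld r1, sv ; addi r1, r1, 1 ; st r1, sv
THREAD2 = ["2.1"]                # st r2, sv  (r2 preloaded with 2)

def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):  # slots used by list a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(order):
    """Execute one interleaving step by step; return the final value of sv."""
    sv, r1, r2 = 0, None, 2        # sv starts at 0; r2 holds the constant 2
    for step in order:
        if step == "1.1":
            r1 = sv                # ld   r1, sv
        elif step == "1.2":
            r1 = r1 + 1            # addi r1, r1, 1
        elif step == "1.3":
            sv = r1                # st   r1, sv
        elif step == "2.1":
            sv = r2                # st   r2, sv
    return sv

if __name__ == "__main__":
    print(amdahl_speedup(Fraction(1, 4), 4))   # 16/13
    print(fraction_needed(2, 4))               # 2/3
    print(mips_from_time(100e6, 5), mips_from_cpi(200e6, 10))
    print(mflops(100e6, 0.15, 5))
    for order in interleavings(THREAD1, THREAD2):
        print(" ".join(order), "-> sv =", run(order))
```

Because `itertools.combinations` emits slot choices in lexicographic order, the four interleavings come out in the same order as the columns of the table above, with final values 2, 1, 1, 3.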