CPSC 3300 - Spring 2015
Project 1 - Compiler Optimizations

[updated 1/13/15 - revised part 2, added recording of system info for part 3, changed -O2 to -O3]

Due date: Wednesday, January 21, at class time
Submission: turn in printed copy of paper

Grading Weights:
  40%  part 1 - description of the techniques
  15%  part 2 - description of gcc optimization
  15%  part 3 - speedup experiment
  20%  part 4 - debugging and distributing optimized code
  10%  bibliographic references and citation style

This is to be an individual or team of two assignment.

Write a 6- to 8-page paper that briefly describes and explores some common compiler optimizations.

Part 1. Define and describe each of the following optimization techniques, perhaps giving example code before and after the optimization if such examples are concise enough.

  1. loop unrolling
  2. loop interchange
  3. loop fusion
  4. loop invariant code motion
  5. strength reduction
  6. basic block and basic block scheduling
  7. trace and trace scheduling
  8. cache blocking (tiling)

Part 2. Identify at least eight additional optimization techniques available in gcc, especially among those that are enabled by the -O3 flag. There is no need to identify them all nor to describe them.

Part 3. Give the speedup in performance between gcc and gcc -O3 for this C program and this input file. You can use the time command.

  people.cs.clemson.edu/~mark/330/predictors.c
  people.cs.clemson.edu/~mark/330/pred.in

  % gcc predictors.c
  % time ./a.out < pred.in
  predictor analysis for trace with 1000 entries
  << program output >>
  8.216u 0.016s 0:08.76 93.8% 0+0k 0+0io 0pf+0w

Use the elapsed time value and show your calculation. (In the above case, 8.76 seconds is the elapsed time.) Run each case three times, and use the median value for your calculation. (That is, ignore the two outlying results for each case, but make a note if there is significant variance.)
You should also record a minimum of system information such as CPU type, CPU frequency, memory size, and OS version. For a Linux system, run these commands:

  % cat /proc/cpuinfo | grep name
  % cat /proc/meminfo | grep Total
  % uname -a

Extra credit. See if you can increase the performance of the program using selected optimization flags beyond -O3. (Credit will depend upon the amount of additional speedup.)

Part 4. Explain why compiler optimizations are not enabled by default. For example, is code harder to debug after optimization? If so, why? Is commercial code distributed as shrink-wrapped binaries typically in optimized form or not? Why?

Your audience should be other junior-level computer science students.

Please remember that you need to indicate quotations taken from other sources by using block-quoting (i.e., indentation) for multiple-sentence excerpts or quotation marks for shorter phrases and for single sentences appearing in-line in the text.

Feel free to use tables or diagrams in your report, but, if you cut and paste a table or a diagram from the web or another source, be sure to cite the source both in a caption for the table/diagram and in the references section.

For bibliographic style, see www.ieee.org/documents/ieeecitationref.pdf

Here are some lecture notes and papers that can help you start:

  Maria Garzaran, "Compiler Optimizations," Univ. Illinois lecture notes, 2006, http://www.cs.uiuc.edu/class/fa06/cs498dp/notes/optimizations.pdf

  R. Bader, H. Bast, and G. Hager, "Strategies and Methods for Code Optimization with focus on the Intel Itanium Architecture," lecture notes, 2008, http://blogs.fau.de/hager/files/2010/07/optimization_08.pdf