CPSC 3300 - Spring 2016
Homework 1
Due Friday, January 22

Each student must turn in a separate set of homework solutions, but you may work together in study groups with other students from the class. Include the names of your study group members on the solution set you submit. (It is acceptable for the group to prepare one document and then for each group member to turn in a copy of that document with his or her name highlighted or circled.) I suggest that in the interest of better participation you limit your study group size to at most four. Also, please provide sufficient space for your calculations and answers so that grading will be easier.

(10 points each part)

1. Define these compiler optimizations:
   (a) if conversion
   (b) loop-invariant code motion
   (c) machine-specific peephole optimization

2. Define these attributes of loops:
   (a) loop-carried dependency
   (b) reduction

3. Take a program you have written in the past that requires at least 15 seconds to run as unoptimized code (or find the source code to one such C program). Run all experiments below on one of the school servers and record the elapsed time using the time command.
   (a) Compile using gcc only and run the program using the time command. E.g.,
         % gcc myprog.c
         % time ./a.out
   (b) Compile using gcc -Og and run using the time command.
   (c) Compile using gcc -O3 and run using the time command.
   (d) If possible, find a major loop within the source code that does not have any loop-carried dependencies, that does not perform reductions, and that does not have any data races on shared variables. If found, insert the following statement just prior to the loop:
         #pragma omp parallel for
       Compile using gcc -fopenmp and run using the time command. If not found, then using excerpts from the code, show why the major loop had a loop-carried dependence, performed a reduction, and/or had a data race on a shared variable that would lead to an incorrect answer if parallelized.
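
For part 3(d), the placement of the pragma can be sketched as follows. This is only a minimal illustration and not taken from any particular program: the function scale_array and its parameters are hypothetical names, and the loop is assumed to be safe to parallelize because each iteration reads only in[i] and writes only out[i].

```c
#include <stddef.h>

/* Hypothetical example (not from the assignment): multiply every element
   of in[] by factor and store the result in out[]. Each iteration touches
   only its own in[i] and out[i], which is the assumption that makes the
   loop a candidate for the pragma. */
void scale_array(const double *in, double *out, size_t n, double factor)
{
    long i;  /* signed loop index, accepted by all OpenMP versions */

    /* The pragma goes on the line immediately before the for statement. */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        out[i] = factor * in[i];
    }
}
```

When the file is compiled without -fopenmp, gcc ignores the pragma and the loop runs serially, so the same source can be timed both ways with the time command.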
   (OpenMP tutorials can be found at
   www.embedded.com/design/mcus-processors-and-socs/4007154/Use-OpenMP-for-programming-parallel-threads-in-multicore-applications-Part-1,
   www.compunity.org/training/tutorials/openmp_Boston.pdf, and
   people.math.umass.edu/~johnston/PHI_WG_2014/OpenMPSlides_tamu_sc.pdf)

4. Explain why the 462.libquantum SPECint2006 benchmark experiences such high performance in recent SPEC reports. What optimization(s) enable this? Be specific in your answer and cite your sources. (A mere quote that the benchmark is "broken" is not acceptable.)