Software Pipelining Example for IA-64 Architecture

C Code Example

    #define N 5
    int x[N],y[N];
    for (i=0;i<N;i++)
        y[i] = x[i] + ...;

By pipeline stage:

    stage 1:    ld-.      ld-.      ld-.      ld-.      ld-.      ...       ...
                   |         |         |         |         |
    stage 2: ...   `-> add-. `-> add-. `-> add-. `-> add-. `-> add-.     ...
                           |         |         |         |         |
    stage 3: ...       ... `---> st  `---> st  `---> st  `---> st  `---> st

    iteration:       1       2       3       4       5       6       7
    predicate register contents
      p16            1       1       1       1       1       0       0
      p17            0       1       1       1       1       1       0
      p18            0       0       1       1       1       1       1
    effect of branch
      lc          4->3    3->2    2->1    1->0       0       0       0
      ec             3       3       3       3    3->2    2->1    1->0
      pr63         ->1     ->1     ->1     ->1     ->0     ->0     ->0
      then rotate (decrement rrb)

lc starts at N-1 = 4 and ec at 3, the number of pipeline stages, so the loop body runs N + 3 - 1 = 7 times. Rotating (decrementing rrb) renames the rotating registers: after a rotation the value that was in r32 is read as r33, r33 as r34, and so on, and p63 moves into p16, p16 into p17 and p17 into p18.

By array element:

                  x[0]  x[1]  x[2]  x[3]  x[4]
                   |     |     |     |     |
                   V     |     |     |     |
    iteration 1:   ld    |     |     |     |     lc 4->3   prolog
                   |     V     |     |     |
    iteration 2:   add   ld    |     |     |     lc 3->2     "
                   |     |     V     |     |
    iteration 3:   st    add   ld    |     |     lc 2->1   kernel
                   |     |     |     V     |
    iteration 4:   |     st    add   ld    |     lc 1->0     "
                   |     |     |     |     V
    iteration 5:   |     |     st    add   ld    ec 3->2     "
                   |     |     |     |     |
    iteration 6:   |     |     |     st    add   ec 2->1   epilog
                   |     |     |     |     |
    iteration 7:   |     |     |     |     st    ec 1->0     "
                   |     |     |     |     |
                   V     V     V     V     V
                  y[0]  y[1]  y[2]  y[3]  y[4]

register contents initially

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  ----  ----  ----  ----  ----  ----  pr: 1  0  0   ar: 4  3

enter at loop

first iteration
body of loop actions - only the load into r32 is executed since only p16=1

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: x[0]  ----  ----  ----  ----  ----  ----  pr: 1  0  0   ar: 4  3
        LOAD

br actions - decrements lc, sets p63=1, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[0]  ----  ----  ----  ----  ----  pr: 1  1  0   ar: 3  3

branch to loop

second iteration
body of loop actions - only the load and the add into r34 are executed since only p16=p17=1

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: x[1]  x[0]  y[0]  ----  ----  ----  ----  pr: 1  1  0   ar: 3  3
        LOAD  ---ADD---

br actions - decrements lc, sets p63=1, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[1]  x[0]  y[0]  ----  ----  ----  pr: 1  1  1   ar: 2  3

branch to loop

third iteration
body of loop actions - all three instructions are executed

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: x[2]  x[1]  y[1]  y[0]  ----  ----  ----  pr: 1  1  1   ar: 2  3
        LOAD  ---ADD---   STORE

br actions - decrements lc, sets p63=1, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[2]  x[1]  y[1]  y[0]  ----  ----  pr: 1  1  1   ar: 1  3

branch to loop

fourth iteration
body of loop actions - all three instructions are executed

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: x[3]  x[2]  y[2]  y[1]  y[0]  ----  ----  pr: 1  1  1   ar: 1  3
        LOAD  ---ADD---   STORE

br actions - decrements lc, sets p63=1, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[3]  x[2]  y[2]  y[1]  y[0]  ----  pr: 1  1  1   ar: 0  3

branch to loop

fifth iteration
body of loop actions - all three instructions are executed

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: x[4]  x[3]  y[3]  y[2]  y[1]  y[0]  ----  pr: 1  1  1   ar: 0  3
        LOAD  ---ADD---   STORE

br actions - decrements ec since lc=0, sets p63=0, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[4]  x[3]  y[3]  y[2]  y[1]  y[0]  pr: 0  1  1   ar: 0  2

branch to loop

sixth iteration
body of loop actions - only the add and store instructions are executed since only p17=p18=1

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  x[4]  y[4]  y[3]  y[2]  y[1]  y[0]  pr: 0  1  1   ar: 0  2
              ---ADD---   STORE

br actions - decrements ec since lc=0, sets p63=0, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  ----  x[4]  y[4]  y[3]  y[2]  y[1]  pr: 0  0  1   ar: 0  1

branch to loop

seventh iteration
body of loop actions - only the store instruction is executed since only p18=1

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  ----  x[4]  y[4]  y[3]  y[2]  y[1]  pr: 0  0  1   ar: 0  1
                          STORE

br actions - decrements ec since lc=0, sets p63=0, and rotates register files

         32    33    34    35    36    37    38       16 17 18      lc ec
    gr: ----  ----  ----  x[4]  y[4]  y[3]  y[2]  pr: 0  0  0   ar: 0  0

branch falls through since lc=ec=0