SPARC Subroutines

Mark Smotherman. Last updated September 2002.

Introduction

SPARC

   SPARC designers wanted to eliminate as many memory accesses as possible

   a memory-stack-based call and return requires at least four memory accesses
      1. the return address (pc) is pushed by the call instruction
      2. the old fp is pushed at the top of the subroutine
      3. the old fp is popped at the bottom of the subroutine, and
      4. the return address is popped by the return instruction

   then there are two memory accesses for each register saved (push to save
      and then pop to restore).
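
   The count above can be written as a small formula. The following is a
   minimal sketch in C; the function name stack_call_accesses is made up for
   illustration and is not part of any SPARC tool or library.

```c
/* Memory accesses for one call/return pair on a memory-stack-based
   machine, following the count given in the notes above.
   (Hypothetical helper, for illustration only.) */
unsigned stack_call_accesses(unsigned saved_regs)
{
    unsigned linkage = 4;              /* push and pop of return address and old fp */
    return linkage + 2 * saved_regs;   /* each saved register: one push, one pop */
}
```

   For example, a subroutine that saves three registers would cost
   4 + 2*3 = 10 memory accesses under this count.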

   the SPARC designers provided overlapped register windows instead


    CPU +--------------------+             +-------+ main memory
        | +---+              |             |.......|
        | |   | register     |             | insts.|
        | |   | windows      |     bus     |.......|
        | |...|              |=============| data  |
        | |   |<-----.       |             |.......|
        | |...|      |       |             | heap  |
        | |   |      |       |             |.......|
        | |   |      |   PSR |             | stack |
        | |   |  +---|-----+ |             |.......|
        | |   |  |...CWP...| |             |       |
        | +---+  +---------+ |             |       |
        +--------------------+             |       |
                                           |       |
      the current register window is       |       |
      selected by the current window       |       |
      pointer (CWP) in the processor       |       |
      status register (PSR)                |       |
                                           +-------+

   unless one of the following cases occurs, there are no memory accesses
   required by a subroutine call and return

      (a) the subroutine nesting level exceeds the depth of the windows
      (b) the number of parameters exceeds six
      (c) the subroutine accesses memory-allocated data or a local variable
          that cannot be register-allocated

Dedicated registers

        %sp - stack pointer (another name for %o6)
        %fp - frame pointer (another name for %i6)

Subroutine instructions

        call subr

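        call %o0  =  jmpl %o0,%o7
        ret       =  jmpl %i7+8,%g0
        retl      =  jmpl %o7+8,%g0

        jmpl saves its own address in the destination register rD and loads
        the effective address rS1+rS2 or rS1+signed_immediate into the pc

        call and jmpl each have a delay slot: the instruction that follows
        them executes before control reaches the target, which is why ret and
        retl return to the saved address + 8 (past the call and its delay slot)

        jmpl rS1+rS2,rD
        jmpl rS1+signed_immediate,rD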
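
        The behavior of jmpl can be sketched in C. This is only an
        illustrative model under stated assumptions: the struct cpu, the
        pc/npc pair used to model the delay slot, and the function names
        jmpl and step are invented here, and passing NULL stands in for
        using %g0 as the destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two program counters behind delayed transfers. */
struct cpu {
    uint32_t pc;    /* address of the instruction being executed */
    uint32_t npc;   /* address of the instruction to execute next */
};

/* jmpl rS1+offset,rD: *rd receives the address of the jmpl itself; the
   delay-slot instruction runs next, and only then the target.
   rd == NULL models %g0 as the destination (the address is discarded). */
void jmpl(struct cpu *c, uint32_t rs1, int32_t offset, uint32_t *rd)
{
    uint32_t self = c->pc;
    if (rd != NULL)
        *rd = self;
    c->pc  = c->npc;                   /* delay slot executes next */
    c->npc = rs1 + (uint32_t)offset;   /* then the jump target */
}

/* Any instruction that is not a control transfer. */
void step(struct cpu *c)
{
    c->pc = c->npc;
    c->npc += 4;
}
```

        In this model, a jmpl executed at address 0x1000 with %o7 as the
        destination leaves 0x1000 in %o7, and a later retl (jmpl %o7+8,%g0)
        resumes at 0x1008, just past the original delay slot.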

Special instructions for register windows

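        save      advances register window to allocate a new set of regs.;
                  also adds its two source operands (read in the old window)
                  and writes the sum to its destination (in the new window)
        restore   restores old register window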

Parameter passing

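   up to six parameters are passed in registers: the caller places them in
   %o0-%o5 and, because the windows overlap, the subroutine sees them as
   %i0-%i5 after its save instruction; additional parameters are passed in
   the caller's stack frame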

Register windows

   windows      before save       after save

                . . . .           . . . .         save instruction changes
                .     .           .     .         the register mapping (CWP)
                +-----+           .     .
        %i0-%i7 |     |           .     .         restore instruction changes
        %l0-%l7 |     |           .     .         back to previous mapping
                +-----+           +-----+
        %o0-%o7 |     |<->%i0-%i7 |     |         on window overflow, registers
                |/////|   %l0-%l7 |     |         must be written into memory
                +-----+           +-----+
                .     .   %o0-%o7 |     |
                .     .           |/////|
                .     .           +-----+
                .     .           .     .
                . . . .           . . . .

                +-----+           +-----+
        %g0-%g7 |     |           |     |
                +-----+           +-----+
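
   When overflow happens can be illustrated with simple bookkeeping. The
   sketch below is hypothetical: it only counts overflow and underflow
   events for a given number of usable windows and does not model the real
   trap mechanism. The struct windows and the names do_save and do_restore
   are invented for illustration.

```c
/* Hypothetical bookkeeping for window overflow and underflow. */
struct windows {
    int usable;     /* windows available to the program (assumed value) */
    int resident;   /* frames whose registers are in the register file */
    int spilled;    /* frames whose registers were written to memory */
    int traps;      /* overflow + underflow events so far */
};

void do_save(struct windows *w)
{
    if (w->resident == w->usable) {   /* overflow: oldest window goes to memory */
        w->resident--;
        w->spilled++;
        w->traps++;
    }
    w->resident++;
}

void do_restore(struct windows *w)
{
    w->resident--;
    if (w->resident == 0 && w->spilled > 0) {   /* underflow: reload caller */
        w->spilled--;
        w->resident++;
        w->traps++;
    }
}
```

   Starting with one resident window and seven usable windows (an assumed
   figure), six nested calls cause no memory traffic in this model; the
   seventh save forces the oldest window out to memory.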


   more detailed view of overlap

          before save     after save

          +--------+         . . . . ..
      %i7 |        |         .        .\
      %i6 | old fp |         .        . | subroutine cannot access these
      %i5 |        |         .        . |   registers of the calling program
      %i4 |        |         .        . |
      %i3 |        |         .        . |
      %i2 |        |         .        . |
      %i1 |        |         .        . |
      %i0 |        |         .        . |
          +--------+         . . . . .. |
      %l7 |        |         .        . |
      %l6 |        |         .        . |
      %l5 |        |         .        . |
      %l4 |        |         .        . |
      %l3 |        |         .        . |
      %l2 |        |         .        . |
      %l1 |        |         .        . |
      %l0 |        |         .        ./
          +--------+         +--------+
      %o7 | & call | <-> %i7 | & call |-- ret inst. will return to %i7 + 8
      %o6 | old sp | <-> %i6 | new fp |-- old sp becomes new fp
      %o5 | parm 6 | <-> %i5 | parm 6 |\
      %o4 | parm 5 | <-> %i4 | parm 5 | |
      %o3 | parm 4 | <-> %i3 | parm 4 | |
      %o2 | parm 3 | <-> %i2 | parm 3 | |
      %o1 | parm 2 | <-> %i1 | parm 2 | | up to six parameters
      %o0 | parm 1 | <-> %i0 | parm 1 |/
          +--------+         +--------+
          .        .     %l7 |        |\
          .        .     %l6 |        | | fresh set of local registers
          .        .     %l5 |        | |   for use by subroutine without
          .        .     %l4 |        | |   need for memory traffic
          .        .     %l3 |        | |
          .        .     %l2 |        | |
          .        .     %l1 |        | |
          .        .     %l0 |        |/
          . . . . ..         +--------+   %o7 is available for saved address if
          .        .     %o7 |        |-- subroutine has any nested calls
          .        .     %o6 | new sp |-- new sp = old sp + value in save inst.
          .        .     %o5 |        |\
          .        .     %o4 |        | |
          .        .     %o3 |        | | available for parameters if
          .        .     %o2 |        | |   subroutine has any nested calls
          .        .     %o1 |        | |
          .        .     %o0 |        |/
          . . . . ..         +--------+
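
   The overlap shown above can be modeled as an index calculation into one
   circular register file. The following C sketch is an illustration under
   assumptions: NWINDOWS = 8 is an assumed window count, the register
   numbering (%o0 = 8, %l0 = 16, %i0 = 24) follows the usual SPARC
   encoding, and the names phys, get_reg, set_reg, save, and restore are
   invented here. Overflow handling is omitted.

```c
#include <stdint.h>

#define NWINDOWS 8                /* assumed; implementations vary */
#define NPHYS    (NWINDOWS * 16)  /* 16 new registers per window */

enum { O0 = 8, SP = 14, O7 = 15, L0 = 16, I0 = 24, FP = 30, I7 = 31 };

struct regfile {
    uint32_t g[8];        /* %g0-%g7, shared by all windows */
    uint32_t r[NPHYS];    /* windowed registers, used circularly */
    unsigned cwp;         /* current window pointer */
};

/* Physical slot of windowed register n (8..31) in window cwp.
   The ins of window w land on the same slots as the outs of window w+1. */
static unsigned phys(unsigned cwp, unsigned n)
{
    return (cwp * 16 + (n - 8)) % NPHYS;
}

uint32_t get_reg(const struct regfile *rf, unsigned n)
{
    if (n < 8)
        return n == 0 ? 0 : rf->g[n];   /* %g0 always reads as zero */
    return rf->r[phys(rf->cwp, n)];
}

void set_reg(struct regfile *rf, unsigned n, uint32_t v)
{
    if (n == 0)
        return;                          /* writes to %g0 are discarded */
    if (n < 8)
        rf->g[n] = v;
    else
        rf->r[phys(rf->cwp, n)] = v;
}

/* save rS1,imm,rD: sum is read in the old window, written in the new one. */
void save(struct regfile *rf, unsigned rs1, int32_t imm, unsigned rd)
{
    uint32_t sum = get_reg(rf, rs1) + (uint32_t)imm;
    rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS;   /* move to new window */
    set_reg(rf, rd, sum);
}

/* restore with no operands: only the window changes back. */
void restore(struct regfile *rf)
{
    rf->cwp = (rf->cwp + 1) % NWINDOWS;
}
```

   With this model, a value the caller writes to %o0 is read by the
   subroutine as %i0 after save(rf, SP, -96, SP), the caller's %sp becomes
   the subroutine's %fp, and a result left in %i0 is visible to the caller
   in %o0 after restore.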


Calling program structure

           ...
           move parameter1 to %o0
           move parameter2 to %o1
           call subroutine           ! places address of call instruction in %o7
           nop                       ! delay slot (may hold a useful instruction)
           ...

Non-leaf procedure structure

        /* prologue */
           save %sp,-96,%sp       ! new window; new %sp = old %sp - 96

        /* body */
           ... body of subroutine, in which you access parameters in %i0,%i1,...
               and register-allocated local vars in %l0,%l1,...  ...

        /* epilogue */
           ret                    ! return to %i7+8
           restore                ! in delay slot of ret; back to caller's window

Leaf procedure structure

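A leaf procedure (one that calls no other subroutine) executes no save, so it runs in the caller's register window. It can use %g1-%g4 and %o0-%o5 without having to save registers.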

        /* no prologue in leaf subroutine */

        /* body */
           ... body of subroutine, in which you access parameters in %o0,%o1,...
               and register-allocated local vars in the remaining output
               registers (thru %o5) or in the global registers (%g1-%g4) ...

        /* epilogue */
           retl                  ! return to %o7+8
           nop                   ! delay slot of retl

Stack frame layout


      %sp -->+------------+
             |    ...     |
             | space for  |  16 words: %l0-%l7 and %i0-%i7 are saved here
             | reg window |  on window overflow
             |    ...     |
             +------------+<-- %sp+64
             | rtn struct |  address for a returned struct
             +------------+<-- %sp+68
             |    ...     |
             | space for  |  where a subroutine called from this frame can
             |  %i0-%i5   |  store its six register parameters
             |    ...     |
             +------------+<-- %sp+92
             |    pad     |
             +------------+<-- %sp+96
             |    ...     |
             |   locals   |
             |    ...     |
      %fp -->+------------+

   the expected size of a returned struct is placed in line after the call

   more space is allocated starting at %sp+92 whenever a call from this frame
   passes more than six parameters; the caller stores the extra parameters at
   %sp+92, ...

   because the caller's %sp is the subroutine's %fp, the subroutine finds its
   input parameter slots at %fp+68, %fp+72, ..., %fp+88 and any extra
   parameters at %fp+92, ..., all in the caller's frame
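
   The offsets in the layout above can be turned into a small calculation.
   The sketch below is an assumption-laden illustration: the names
   frame_size and param_offset are invented, rounding to a multiple of 8
   bytes is inferred from the pad word in the diagram, and a real compiler
   may allocate more space than this minimum.

```c
/* Offsets from %sp taken from the layout in the notes (32-bit words). */
enum {
    STRUCT_PTR_OFFSET  = 64,   /* after the 16-word window save area */
    PARAM_HOME_OFFSET  = 68,   /* six words: %sp+68 .. %sp+88 */
    EXTRA_PARAM_OFFSET = 92    /* seventh and later parameters */
};

/* Hypothetical minimum frame size for a procedure whose largest call
   passes max_outgoing_params words and which needs local_bytes of locals. */
unsigned frame_size(unsigned max_outgoing_params, unsigned local_bytes)
{
    unsigned extra = max_outgoing_params > 6 ? max_outgoing_params - 6 : 0;
    unsigned bytes = EXTRA_PARAM_OFFSET + 4 * extra + local_bytes;
    return (bytes + 7u) & ~7u;   /* round up to a multiple of 8 (assumed) */
}

/* Offset from the callee's %fp of the memory slot for parameter k (1-based). */
unsigned param_offset(unsigned k)
{
    return PARAM_HOME_OFFSET + 4 * (k - 1);
}
```

   With no locals and no more than six outgoing parameters, frame_size
   yields 96, matching the save %sp,-96,%sp shown in the non-leaf
   prologue; param_offset(7) yields 92, the first extra-parameter slot.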

Generated assembly code for SPARC from gcc (optimization turned on)

Note the use of registers to pass parameters, delay slot scheduling for control transfers (call and retl), and leaf optimizations in swap().

        void main() {         main:
          void swap();          save %sp, -120, %sp
          int a,b;              mov  5, %o0
          a = 5;                st   %o0, [%fp-20]
          b = 44;               mov  44, %o0
          swap(&a,&b);          st   %o0, [%fp-24]
        }                       add  %fp, -20, %o0
                                call swap, 0
                                add  %fp, -24, %o1  ! in delay slot of call
                                ret
                                restore             ! in delay slot of ret

        void swap(x,y)        swap:
        int *x,*y;              ld   [%o0], %g3
        {                       ld   [%o1], %g2
          int temp;             st   %g2, [%o0]
          temp = *x;            retl
          *x = *y;              st   %g3, [%o1]     ! scheduled in
          *y = temp;                                !    delay slot
          return;                                   !    of retl
        }
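
The C source in the listing uses pre-ANSI (K&R) declarations. As a sketch only, the same swap routine can be written with a prototype, which lets a compiler check the argument types at the call site. This version is an illustration and is not the source from which the assembly above was generated.

```c
/* Exchange the two integers that x and y point to.
   Same logic as swap() in the listing, written with an ANSI C prototype. */
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}
```

Called as swap(&a, &b) with a = 5 and b = 44, it leaves a = 44 and b = 5.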

mark@cs.clemson.edu