Mark Smotherman. Last updated September 2002.
(under construction)
SPARC

SPARC designers wanted to eliminate as many memory accesses as possible

a memory-stack-based call requires at least four memory accesses

  1. the return address (pc) is pushed by the call instruction
  2. the old fp is pushed at the top of the subroutine
  3. the old fp is popped at the bottom of the subroutine, and
  4. the return address is popped by the return instruction

then there are two memory accesses for each register saved (push to
save and then pop to restore).

the SPARC designers provided overlapped register windows instead

 CPU +--------------------+             +-------+ main memory
     | +---+              |             |.......|
     | |   | register     |             | insts.|
     | |   | windows      |     bus     |.......|
     | |...|              |=============| data  |
     | |   |<-----.       |             |.......|
     | |...|      |       |             | heap  |
     | |   |      |       |             |.......|
     | |   |      | PSR   |             | stack |
     | |   |  +---|-----+ |             |.......|
     | |   |  |...CWP...| |             |       |
     | +---+  +---------+ |             |       |
     +--------------------+             |       |
                                        |       |
      the current register window is    |       |
      selected by the current window    |       |
      pointer (CWP) in the processor    |       |
      status register (PSR)             |       |
                                        +-------+

unless one of the following cases occurs, there are no memory accesses
required by a subroutine call and return

  (a) the subroutine nesting level exceeds the depth of the windows
  (b) the number of parameters exceeds six
  (c) access to memory-allocated data or to a local variable that
      cannot be register-allocated
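The access counts above can be written out as a small C sketch. The function names and parameters here are hypothetical, chosen only for illustration. The second function simply encodes cases (a) through (c) as a predicate.

```c
#include <assert.h>

/* Memory accesses for one call/return pair on a memory stack:
 * push + pop of the return address, push + pop of the old fp,
 * plus one push and one pop for each saved register. */
int stack_call_accesses(int saved_regs)
{
    assert(saved_regs >= 0);
    return 4 + 2 * saved_regs;
}

/* Hypothetical predicate: nonzero if a windowed call/return
 * must touch memory, following cases (a)-(c) in the text. */
int window_call_needs_memory(int nesting_depth, int window_depth,
                             int num_params, int has_memory_data)
{
    return nesting_depth > window_depth   /* (a) nesting exceeds windows */
        || num_params > 6                 /* (b) more than six params    */
        || has_memory_data;               /* (c) memory-allocated data   */
}
```

For example, a stack-based call that saves three registers costs 4 + 2*3 = 10 memory accesses, while the windowed call costs none as long as none of the three cases applies.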
%sp - stack pointer (%o6)
%fp - frame pointer (%i6)
call subr
call %o0   =  jmpl %o0,%o7
ret        =  jmpl %i7+8,%g0
retl       =  jmpl %o7+8,%g0

jmpl saves its own address in the destination register rD and loads the
effective address rS1+rS2 or rS1+signed_immediate into the pc

        jmpl rS1+rS2,rD
        jmpl rS1+signed_immediate,rD
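The effect of jmpl can be sketched in C as follows. This is a simplified model, not the architectural definition: the Cpu struct and function names are hypothetical, the 32 registers are those visible through the current window (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins), and the delay slot is ignored.

```c
#include <stdint.h>

enum { G0 = 0, O7 = 15, I7 = 31 };   /* %g0, %o7, %i7 as register numbers */

/* Hypothetical machine state for this sketch only. */
typedef struct {
    uint32_t pc;
    uint32_t r[32];
} Cpu;

/* jmpl rS1+imm,rD: save the address of the jmpl itself in rD,
 * then load the effective address rS1+imm into the pc. */
void jmpl(Cpu *cpu, int rs1, int32_t imm, int rd)
{
    /* compute the target first, in case rd and rs1 are the same register */
    uint32_t target = cpu->r[rs1] + (uint32_t)imm;
    if (rd != G0)                 /* %g0 always reads as zero; writes are discarded */
        cpu->r[rd] = cpu->pc;
    cpu->pc = target;
}

/* The synthetic instructions from the list above. */
void ret(Cpu *cpu)  { jmpl(cpu, I7, 8, G0); }
void retl(Cpu *cpu) { jmpl(cpu, O7, 8, G0); }
```

The offset of 8 skips over the call instruction (4 bytes) and the instruction in its delay slot (4 bytes), since the saved address is that of the call itself.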
save      advances register window to allocate a new set of regs.
restore   restores old register window
... overlapping ...
windows      before save             after save

               .     .                 .     .
               .     .                 .     .      save instruction changes
               .     .                 .     .      the register mapping (CWP)
               +-----+                 .     .
      %i0-%i7  |     |                 .     .      restore instruction changes
      %l0-%l7  |     |                 .     .      back to previous mapping
               +-----+                 +-----+
      %o0-%o7  |     |<->%i0-%i7       |     |      on window overflow, registers
               |/////|   %l0-%l7       |     |      must be written into memory
               +-----+                 +-----+
               .     .   %o0-%o7       |     |
               .     .                 |/////|
               .     .                 +-----+
               .     .                 .     .
               .     .                 .     .
               .     .                 .     .
               +-----+                 +-----+
      %g0-%g7  |     |                 |     |
               +-----+                 +-----+

more detailed view of overlap

          before save            after save

          +--------+             . . . . ..
      %i7 |        |             .        .\
      %i6 | old fp |             .        . |  subroutine cannot access these
      %i5 |        |             .        . |  registers of the calling program
      %i4 |        |             .        . |
      %i3 |        |             .        . |
      %i2 |        |             .        . |
      %i1 |        |             .        . |
      %i0 |        |             .        . |
          +--------+             . . . . .. |
      %l7 |        |             .        . |
      %l6 |        |             .        . |
      %l5 |        |             .        . |
      %l4 |        |             .        . |
      %l3 |        |             .        . |
      %l2 |        |             .        . |
      %l1 |        |             .        . |
      %l0 |        |             .        ./
          +--------+             +--------+
      %o7 | & call |  <->    %i7 | & call |--  ret inst. will return to %i7 + 8
      %o6 | old sp |  <->    %i6 | new fp |--  old sp becomes new fp
      %o5 | parm 6 |  <->    %i5 | parm 6 |\
      %o4 | parm 5 |  <->    %i4 | parm 5 | |
      %o3 | parm 4 |  <->    %i3 | parm 4 | |
      %o2 | parm 3 |  <->    %i2 | parm 3 | |
      %o1 | parm 2 |  <->    %i1 | parm 2 | |  up to six parameters
      %o0 | parm 1 |  <->    %i0 | parm 1 |/
          +--------+             +--------+
          .        .         %l7 |        |\
          .        .         %l6 |        | |  fresh set of local registers
          .        .         %l5 |        | |  for use by subroutine without
          .        .         %l4 |        | |  need for memory traffic
          .        .         %l3 |        | |
          .        .         %l2 |        | |
          .        .         %l1 |        | |
          .        .         %l0 |        |/
          . . . . ..             +--------+    %o7 is available for saved address if
          .        .         %o7 |        |--  subroutine has any nested calls
          .        .         %o6 | new sp |--  new sp = old sp + value in save inst.
          .        .         %o5 |        |\
          .        .         %o4 |        | |
          .        .         %o3 |        | |  available for parameters if
          .        .         %o2 |        | |  subroutine has any nested calls
          .        .         %o1 |        | |
          .        .         %o0 |        |/
          . . . . ..             +--------+
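The overlap in the diagrams can be modeled with a small register-file sketch in C. Everything named here (RegFile, reg, save_window, restore_window, and the choice of 8 windows) is an assumption for illustration. The number of windows is implementation-dependent, and window overflow/underflow traps, the hardwired zero in %g0, and the add performed by save are not modeled.

```c
#include <stdint.h>

#define NWINDOWS 8   /* assumed window count for this sketch */

typedef struct {
    uint32_t globals[8];
    uint32_t windowed[NWINDOWS * 16];  /* each window adds 16 registers */
    int cwp;                           /* current window pointer */
} RegFile;

/* Map register number r (0-7 globals, 8-15 outs, 16-23 locals,
 * 24-31 ins) to its storage under the current window. */
uint32_t *reg(RegFile *rf, int r)
{
    if (r < 8)
        return &rf->globals[r];

    int base = rf->cwp * 16;
    int idx;
    if (r < 24)
        idx = base + (r - 8);          /* outs, then locals, of this window */
    else
        idx = base + 16 + (r - 24);    /* ins = outs of the caller's window */
    return &rf->windowed[idx % (NWINDOWS * 16)];
}

/* save moves to a new window; restore moves back to the old one. */
void save_window(RegFile *rf)    { rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS; }
void restore_window(RegFile *rf) { rf->cwp = (rf->cwp + 1) % NWINDOWS; }
```

In this model a value written to %o0 (r = 8) before save_window is read back as %i0 (r = 24) afterwards with no copying, and a result the subroutine leaves in %i0 appears in the caller's %o0 after restore_window.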
        ...
        move parameter1 to %o0
        move parameter2 to %o1
        call subroutine         ! places address of call instruction in %o7
        ...
/* prologue */
        save    %sp,-96,%sp

/* body */
        ... body of subroutine, in which you access parameters
            in %i0,%i1,... and register-allocated local vars
            in %l0,%l1,...
        ...

/* epilogue */
        ret             ! return to %i7+8
        restore
A leaf procedure can use %g1-%g4 and %o0-%o5 without having to save registers.
/* no prologue in leaf subroutine */

/* body */
        ... body of subroutine, in which you access parameters
            in %o0,%o1,... and register-allocated local vars
            in the remaining output registers (thru %o5) or
            in the global registers (%g1-%g4)
        ...

/* epilogue */
        retl            ! return to %o7+8
        nop
%sp -->+------------+
       |    ...     |
       | space for  |
       | reg window |
       |    ...     |
       +------------+<-- %sp+64
       | rtn struct |           expected size of return struct will
       +------------+<-- %sp+68 be placed in line after call
       |    ...     |
       | space for  |           more space allocated (below %sp+92)
       |  %i0-%i5   |           whenever there are more than six
       |    ...     |           parameters in a call from this frame
       +------------+<-- %sp+92
       |    pad     |           (input parameters stored in %fp+68,
       +------------+<-- %sp+96 %fp+72, ..., %fp+88, and extra parms
       |    ...     |           in %fp+92, ... => in caller's frame;
       |   locals   |           caller stores extra parms in %sp+92,
       |    ...     |           ...)
%fp -->+------------+
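The offsets in the frame layout add up to the 96 used in the save instruction of the non-leaf template. A minimal sketch of that arithmetic in C, assuming 4-byte words and a stack pointer kept doubleword (8-byte) aligned, which is what the pad word suggests. The function and constant names are hypothetical.

```c
/* Fixed areas of the frame, taken from the layout above. */
enum {
    WINDOW_SAVE_AREA = 16 * 4,  /* space for reg window: %sp+0  .. %sp+64 */
    STRUCT_RET_WORD  = 4,       /* rtn struct:           %sp+64 .. %sp+68 */
    PARAM_HOME_AREA  = 6 * 4,   /* space for %i0-%i5:    %sp+68 .. %sp+92 */
    STACK_ALIGN      = 8        /* assumed doubleword alignment of %sp    */
};

/* Minimum frame size in bytes for a frame that passes extra_param_words
 * parameters beyond the first six and holds local_bytes of locals. */
int frame_size(int extra_param_words, int local_bytes)
{
    int size = WINDOW_SAVE_AREA + STRUCT_RET_WORD + PARAM_HOME_AREA
             + 4 * extra_param_words + local_bytes;
    /* round up to the next multiple of STACK_ALIGN (this adds the pad) */
    return (size + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}
```

With no extra parameters and no locals the sum is 64 + 4 + 24 = 92, which rounds up to 96. This is only a lower bound: a compiler may reserve more than this minimum.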
Note the use of registers to pass parameters, delay slot scheduling for control transfers (call and retl), and leaf optimizations in swap().
void main()
{                               main:
   void swap();                         save    %sp, -120, %sp
   int a,b;                             mov     5, %o0
   a = 5;                               st      %o0, [%fp-20]
   b = 44;                              mov     44, %o0
   swap(&a,&b);                         st      %o0, [%fp-24]
}                                       add     %fp, -20, %o0
                                        call    swap, 0
                                        add     %fp, -24, %o1   ! scheduled in
                                        ret                     ! delay slot
                                        restore                 ! of call

void swap(x,y)                  swap:
   int *x,*y;                           ld      [%o0], %g3
{                                       ld      [%o1], %g2
   int temp;                            st      %g2, [%o0]
   temp = *x;                           retl
   *x = *y;                             st      %g3, [%o1]      ! scheduled in
   *y = temp;                                                   ! delay slot
   return;                                                      ! of retl
}
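The C source above uses old-style (K&R) parameter declarations. For readers who want to compile the example with a current compiler, a prototype-style equivalent of swap() might look like the following. This is a restatement of the same routine, not the code that produced the assembly listing.

```c
/* Exchange the two ints that x and y point to. */
void swap(int *x, int *y)
{
    int temp = *x;   /* corresponds to: ld [%o0], %g3 */
    *x = *y;         /* ld [%o1], %g2 ; st %g2, [%o0] */
    *y = temp;       /* st %g3, [%o1] (in the retl delay slot) */
}
```

Called as swap(&a, &b) with a = 5 and b = 44, it leaves a = 44 and b = 5.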
mark@cs.clemson.edu