A Programmer's View of the SPARC Architecture (Version 7)

Mark Smotherman
Clemson University

1. Introduction

  Scalable Processor Architecture (descendant of Berkeley RISC)
  defines IU and FPU; IU generates addresses for IU and FPU load/stores
  IU conditionally branches based on IU or FPU condition codes

2. The Architecture

2.1 Address Space - flat, 32-bit virtual address

2.2 Fundamental Data Types
  byte, unsigned byte, halfword, unsigned halfword, word, unsigned word,
  floating point: single, double, and extended precision
  big-endian, aligned

2.3 Register Set
  register windows providing access to 32 32-bit registers at a time
    8 global, 8 ins, 8 local, and 8 outs
      (outs of previous window overlap ins of current window)
    switch windows by explicit SAVE and RESTORE instructions,
      trap on underflow or overflow
    (purpose is to reduce loads and stores on procedure entry/exit,
      process switch needs to save windows)
  PSR (processor status register)
    + current and previous user/supervisor bits
    + condition codes: N,Z,V,C
    + priority interrupt level (4 bits)
    + FPU and coprocessor enable bits
    + CWP (5-bit current window pointer)
    + implementation and version numbers (8 bits total)
  TBR (trap base register)
    + address of interrupt vector table (per process)
    + trap type field (8 bits)
  WIM (window invalid mask)
  Y (multiply step register) - used to create 64-bit products
  PC and nPC - current instruction address and next instruction address
  FPU has 32 32-bit floating pt. registers (pairs for double precision)
    (criticized for no IU register<->FPU register connection)
  FSR (FPU status register) has condition codes, enables, etc.

2.4 Core Instruction Set - load/store architecture, 69 basic instructions
  instruction formats - fixed 32-bit length, 3-register format
  standard data movement and ALU instructions
    SETHI - set high 22 bits with immediate
    explicit opcode bit to enable setting of condition code
    MULSCC - multiply step
      (suggested multiplication and division routines given in arch.
      manual)
  control instructions
    delayed branches, annul bit on branch says that delay slot should
      not be executed if branch not taken
      (architecture states that taken branch should be faster than
      untaken if any difference)
    CALL - store return address in %o7
    JMPL - store return address in rd
    TICC - trap on IU condition code, low 7 bits of EA placed in TBR
      trap type field, used for OS entry
    RETT - return from trap, restores only S and CWP of PSR

2.6 Condition Codes
  N,Z,V,C similar to MC68000
  parallel forms of most arithmetic operations - one form sets CC

2.7 Extended Instructions
  read and write control registers (Y,PSR,WIM,TBR)
  multiprocessor synchronization using SWAP and LDSTUB (test and set)
  cache flush

2.8 Two Addressing Modes
  M[ reg1 + reg2 ]
  M[ reg1 + signed_13_bit_constant ]

2.9 IEEE-754 Floating Point provided by FPU

5. Systems Programming

5.1 Multiple Execution Modes - supervisor/user
  execution mode change upon interrupt
  privileged instructions include load/store alternate spaces,
    which can be used to access MMU, I/O, etc.

5.2 Interrupts
  16 priority levels, interrupt vector table specified by TBR
  new window automatically allocated upon interrupt and the two PCs
    saved in new local registers

5.4 Memory Management Unit - not defined in architecture

6. Pipelining and Multiple Function Unit Optimizations
  delayed branches and interlocked, delayed loads
  at least one multiplier and one adder in FPU for concurrent operation
  FPU has register scoreboard, also inst.
    queue to provide precise interrupts
  IU sends floating point operations to FPU queue until it is full,
    then IU stalls on next FP op


SPARC V7 instruction set

mnemonic  action (* = privileged)       (alternate versions)
--------  ----------------------------  --------------------
ldsb      load signed byte              (ldsba = alternate space *)
ldsh      load signed halfword          (ldsha = alternate space *)
ldub      load unsigned byte            (lduba = alternate space *)
lduh      load unsigned halfword        (lduha = alternate space *)
ld        load word                     (lda = alternate space *)
ldd       load doubleword               (ldda = alternate space *)
ldf       load floating-point
lddf      load double floating-point
ldfsr     load flt-pt state register
ldc       load coprocessor
lddc      load double coprocessor
ldcsr     load coprocessor state reg.
stb       store byte                    (stba = alternate space *)
sth       store halfword                (stha = alternate space *)
st        store word                    (sta = alternate space *)
std       store doubleword              (stda = alternate space *)
stf       store floating-point
stdf      store double floating-point
stfsr     store flt-pt state register
stdfq     store double flt-pt queue *
stc       store coprocessor
stdc      store double coprocessor
stcsr     store coprocessor state reg.
stdcq     store double coproc. queue *
ldstub    atomic ld-st unsigned byte    (ldstuba = alternate space *)
swap      swap register with memory     (swapa = alternate space *)
add       add                           (addcc = add and set condition code)
addx      add with carry                (addxcc = add with carry and set cc)
taddcc    tagged add and set cc         (taddcctv = tadd, set cc, trap on ovf.)
sub       subtract                      (subcc = sub and set condition code)
subx      subtract with carry           (subxcc = sub with carry and set cc)
tsubcc    tagged subtract and set cc    (tsubcctv = tsub, set cc, trap on ovf.)
mulscc    multiply step and set cc
and       and                           (andcc = and and set cc)
andn      and not                       (andncc = and not and set cc)
or        or                            (orcc = or and set cc)
orn       or not                        (orncc = or not and set cc)
xor       exclusive or                  (xorcc = xor and set cc)
xnor      exclusive nor                 (xnorcc = xnor and set cc)
sll       shift left logical
srl       shift right logical
sra       shift right arithmetic
sethi     set high 22 bits of register
save      save caller's window
restore   restore caller's window
bicc      branch on integer cond. code  (ba = branch always)
                                        (bn = branch never)
                                        (bne = branch on not equal, or bnz)
                                        (be = branch on equal, or bz)
                                        (bg = branch on greater)
                                        (ble = branch on less or equal)
                                        (bge = branch on greater or equal)
                                        (bl = branch on less)
                                        (bgu = branch on greater unsigned)
                                        (bleu = br on less or equal unsigned)
                                        (bcc = branch on carry clear, or bgeu)
                                        (bcs = branch on carry set, or blu)
                                        (bpos = branch on positive)
                                        (bneg = branch on negative)
                                        (bvc = branch on overflow clear)
                                        (bvs = branch on overflow set)
bicc,a    branch on icc and annul       (see bicc alternatives above)
fbfcc     branch on flt-pt cond. code   (fba = branch always)
                                        (fbn = branch never)
                                        (fbu = branch on unordered)
                                        (fbg = branch on greater)
                                        (fbug = branch on unordered or greater)
                                        (fbl = branch on less)
                                        (fbul = branch on unordered or less)
                                        (fblg = branch on less or greater)
                                        (fbne = branch on not equal, or fbnz)
                                        (fbe = branch on equal, or fbz)
                                        (fbue = branch on unordered or equal)
                                        (fbge = branch on greater or equal)
                                        (fbuge = branch on unordered or
                                          greater or equal)
                                        (fble = branch on less or equal)
                                        (fbule = branch on unordered or
                                          less or equal)
                                        (fbo = branch on ordered)
fbfcc,a   branch on fpcc and annul      (see fbfcc alternatives above)
          branch on coproc. cond. code  (not listed here, see
                                          architecture manual)
call      call
jmpl      jump and link
rett      return from trap *
ticc      trap on integer cond. code    (see bicc alternatives above)
rdy       read y register
rdpsr     read processor state reg. *
rdwim     read window invalid mask *
rdtbr     read trap base register *
wry       write y register
wrpsr     write processor state reg. *
wrwim     write window invalid mask *
wrtbr     write trap base register *
unimp     unimplemented instruction
iflush    instruction cache flush
          floating-point operate        (fitos = convert integer to single)
                                        (fitod = convert integer to double)
                                        (fitox = convert integer to extended)
                                        (fstoi = convert single to integer)
                                        (fdtoi = convert double to integer)
                                        (fxtoi = convert extended to integer)
                                        (fstod = convert single to double)
                                        (fstox = convert single to extended)
                                        (fdtos = convert double to single)
                                        (fdtox = convert double to extended)
                                        (fxtos = convert extended to single)
                                        (fxtod = convert extended to double)
                                        (fmovs = move single/one word)
                                        (fnegs = negate / toggle sign bit)
                                        (fabss = absolute value / clear sign bit)
                                        (fsqrts = square root single)
                                        (fsqrtd = square root double)
                                        (fsqrtx = square root extended)
                                        (fadds = add single)
                                        (faddd = add double)
                                        (faddx = add extended)
                                        (fsubs = subtract single)
                                        (fsubd = subtract double)
                                        (fsubx = subtract extended)
                                        (fmuls = multiply single)
                                        (fmuld = multiply double)
                                        (fmulx = multiply extended)
                                        (fdivs = divide single)
                                        (fdivd = divide double)
                                        (fdivx = divide extended)
                                        (fcmps = compare single, or fcmpes =
                                          cmp single and exception if unordered)
                                        (fcmpd = compare double, or fcmped =
                                          cmp double and exception if unordered)
                                        (fcmpx = compare extended, or fcmpex =
                                          cmp ext and exception if unordered)
          coprocessor operate           (not listed here, see
                                          architecture manual)
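

Worked example: building a 32-bit constant with sethi

The sethi entry and the signed 13-bit constant of section 2.8 are usually used together: with a fixed 32-bit instruction length no single instruction can carry a full 32-bit constant, so the high 22 bits are set with sethi and the low 10 bits are supplied by a following or (or by the displacement of a load/store). The Python sketch below models only that 32-bit arithmetic. The helper names (hi, lo, sethi, sign_extend_13, load_constant, effective_address) are assumptions chosen for illustration and do not come from the architecture manual; hi and lo are meant to mirror the split that SPARC assemblers commonly expose as %hi and %lo.

```python
MASK32 = 0xFFFFFFFF  # registers and addresses are 32 bits wide


def hi(value):
    """High 22 bits of a 32-bit value (the part a sethi immediate carries)."""
    return (value & MASK32) >> 10


def lo(value):
    """Low 10 bits of a 32-bit value (always fits a signed 13-bit constant)."""
    return value & 0x3FF


def sethi(imm22):
    """Model of sethi: imm22 goes to the high 22 bits, low 10 bits are zero."""
    return ((imm22 & 0x3FFFFF) << 10) & MASK32


def sign_extend_13(field):
    """Interpret a 13-bit instruction field as a signed constant."""
    field &= 0x1FFF
    return field - 0x2000 if field & 0x1000 else field


def load_constant(value):
    """Register contents after 'sethi hi(value)' followed by 'or lo(value)'."""
    reg = sethi(hi(value))
    reg |= lo(value)
    return reg


def effective_address(reg1, simm13_field):
    """Second addressing mode: M[ reg1 + signed_13_bit_constant ]."""
    return (reg1 + sign_extend_13(simm13_field)) & MASK32


if __name__ == "__main__":
    value = 0xDEADBEEF
    print(hex(hi(value)), hex(lo(value)), hex(load_constant(value)))
    # A 13-bit field of all ones is the constant -1.
    print(sign_extend_13(0x1FFF))
    # Base register plus a negative displacement of -4.
    print(hex(effective_address(0x1000, 0x1FFC)))
```

The 22/10 split is the reason the low part never needs sign handling: a 10-bit value is at most 1023, well inside the positive range of the signed 13-bit constant (-4096 to 4095).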