Subroutine and procedure call support

Mark Smotherman
Last updated: March 2004

Summary: TBD

.. portions under construction ..

Early history

1842 - Ada Augusta King, Countess of Lovelace - described the idea of a set of reusable instruction for the Analytic Engine
1946 - Turing's ACE
(from Bill Findlay) "Turing wrote a design study for the ACE in the late 1940s that envisaged a stack of return addresses, partly managed by software, and facilitated by the equivalent of a macro-assembler! Some of his ideas were later incorporated in the pilot ACE (at NPL) and the commercially-produced DEUCE. ... It is a remarkable document, discussing everything from the programming system to whether the delay lines should be filled with alcohol or mercury."
A.M. Turing, "Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)." Report E882, Executive Committee, NPL February 1946. Reprinted April 1972 as NPL Report Com. Sci 57.
(I believe this is either a draft of the paper or is the cited paper itself. It discusses performing subsidiary operations that are invoked by BURY instructions and returning control by UNBURY instructions, along with a stack (LIFO) store of instruction addresses in main memory and a stack pointer in temporary storage [i.e., "TS 31"].)
```
        We also wish to be able to arrange for the splitting up of
        operations into subsidiary operations.  This should be done in
        such a way that once we have written down how an operation is
        done we can use it as a subsidiary to any other operation.

        ...

        When we wish to start on a subsidiary operation we need only
        make a note of where we left off the major operation and then
        apply the first instruction of the subsidiary.  When the subsidiary
        is over we look up the note and continue with the major operation.
        Each subsidiary operation can end with instructions for this
        recovery of the note.  How is the burying and disinterring of the
        note to be done?  There are of course many ways.  One is to keep a
        list of these notes in one or more standard size delay lines (1024),
        with the most recent last.  The position of the most recent of these
        will be kept in a fixed TS, and this reference will be modified
        every time a subsidiary is started or finished.  The burying and
        disinterring processes are fairly elaborate, but there is
        fortunately no need to repeat the instructions involved, each time,
        the burying being done through a standard instruction table BURY,
        and the disinterring by the table UNBURY.
     
```
[pp. 11-12]
```
        BURY (General Description). The content of TS 1 with 1 added is
        transferred to the position indicated in TS 31, and 1 is added
        to the reference in TS 31.  We then proceed to carry out the
        instruction in TS 1.

        UNBURY (General Description). The minor cycle whose position is
        given in TS 31 is taken to be position of the next instruction.
     
```
[p. 30. Notes: TS 1 contains the address of the currently executing instruction. "minor cycle" = word.]
see also extract from Andrew Hodges book on Turing
(from http://www.bshs.org.uk/conf/ace2000.html)
"The Pilot Model ACE ran its first program on May 10, 1950, at the National Physical Laboratory. With a clock speed of 1 MHz, Pilot Model ACE was for some time the fastest computer in the world. DEUCE, the production version of the Pilot Model, was constructed by the English Electric Company. In total more than 30 were sold. NPL's full-scale ACE began work in 1958. The fundamentals of Turing's ACE design were used in the Bendix G15 computer. The G15 was arguably the first personal computer and over 400 were sold worldwide. DEUCE and the G15 remained in use until about 1970. Another computer deriving from Turing's ACE design, the MOSAIC, played a role in Britain's air defences during the Cold War period; other derivatives include the Packard-Bell PB250 (1961)."
- Deuce manuals
- Bendix G15 manual (pdf)
(my thanks to Bill Findlay for telling me about the ACE)
1947 - EDSAC - Wheeler Jump, named for David Wheeler. An instruction loads itself into the accumulator; thus the value in the accumulator is a bit pattern that contains the address of this instruction. A second instruction then adds a predetermined constant to the accumulator to transform the bit pattern into a branch to the first instruction's address plus some number of words needed to cause a proper return branch. Then a third instruction stores the new "return" branch at the bottom of the subroutine.
```
         location   contents (ignoring length code)

           m      :   A  m      - add the number in memory location m to the
                                  accumulator (i.e., it loads itself)

           m + 1  :   G  n      - branch to memory location n (the start of
                                  the subroutine

           m + 2  :   ...       - control will return here


           ...


           n      :   A  3      - add the number in memory location 3 to the
                                  accumulator (it is a predetermined constant
                                  that will change "A m" to "G m+2")

           n + 1  :   T  p      - store new return instruction at bottom of
                                  this subroutine

           ...

           p      :   ----      - location for return branch
     
```
adapted from EDSAC Tutorial Guide, p. 24 (pdf)
"Originally, to simplify the rather complex programming process, subroutines were simply stringed together on the tape. The major advance came when Wheeler thought up the idea of the sub-routine. This meant that, rather than simply executing subroutine after subroutine, a program could execute a subroutine, and after the subroutine had run, the program would again gain control where it left off. This was called the 'Wheeler Jump', and is an optimal programming technique which forms the basis of the way programs are written today." [http://hoc.co.umist.ac.uk/storylines/compdev/electronic/edsac.html]
EDSAC simulator (Java applet)
1950 - Zuse Z4 - Instructions for a program were fetched one-by-one from a punched tape. A single subroutine could be executed by reading instructions from separate punched tape on a second tape reader. The Z4 included call and return instructions to transfer control between tape readers.
1950 - Bell Labs Model VI - 200 "hardwired subroutines" are provided to improve programmer productivity (see this description) [check this]
1950/1951 - NBS SEAC - two program counters provide co-routine structure [check date first implemented - feature at least described in Leiner's 1951 paper for expansion to SEAC]
1951 - Ferranti Mark 1 - A store return address instruction was provided. [check this! Blaauw and Brooks assert this, but I can't find corroboration in Lavington]
1951 - Univac I - A return instruction when paired with a branch inserts the updated instruction address into a branch at the bottom of a subroutine.
1955 - IBM 704 - The TSX instruction stores its own address in an index register. Parameters can be passed in-line after the TSX instruction and can be accessed by use of the index register. Subroutine return is accomplished by branching back to the address in the index register (which has been updated after each parameter access).
1959 - Bull Gamma 60 - Uses a normal branch but the first instruction of the subroutine can save the return address by reading and storing the contents of a CPU address register into a memory word. An indirect branch placed at bottom of the subroutine can use this stored address to accomplish the return.
1961 - English Electric KDF9 - had a 16-deep return address stack (along with a a 16-deep arithmetic stack and a bank of 16 counting/indexing registers) for each of 4 processes, driven by blocking and unblocking on I/O transfer termination status, an implemented entirely in hardware (the stacks *and* the dispatching).
(thanks to Bill Findlay)
1963 - Burroughs B5000 - A stack-based computer with support for block-structured languages like Algol. Parameters and return address are stored on the stack, but subroutine entry is a fairly complex operation.

Linkage considerations

Depending on language and architecture, registers may hold:

- uplevel addressing link
- exception handler pointer (for Ada)
- frame pointer
- program counter
- stack pointer
- actual parameters 
- return values
- constant pool pointer
- spare space for called routines to trash
- registers reserved for link-time-generated code to use

(list from Don Lindsay usenet post)

Memory-based linkages

Early subroutine linkages stored the return address in memory, e.g., as the first word of subroutine.

Example linkages

Other systems with memory-based linkage include the EDSAC (using the Wheeler jump), (check Ferranti Mark 1 / Atlas used extracodes for subroutine calls), Univac I, Bull Gamma 60, IBM Stretch, CDC 160 and 6600 PPU, and Univac 1103A.

Register-based linkages

The IBM 704 appears to be the first computer to store the return address in a register. This approach, popular for many years because of the in-line passing of parameters, fell out of favor when memory stack linkages became popular in the 1970s. It then regained favor when RISC architects wanted to avoid traffic to a memory stack on each call and return.

Example linkages

The Cray-1 also used a register-based linkage.

Stack-based linkages

... store return address in a memory stack ...

Typical procedure support in 1970s

   dedicated registers

        sp - stack pointer, top of stack
        fp - frame pointer, anchors the stack frame for this subroutine
           invocation (will not change as things are pushed onto the stack)

   memory stack frame (subroutine call activation record)

            | / / / / / /|
            +------------+
      sp -->|    ...     |
            |   locals   |  fp-c is usually an address of a local variable
            |    ...     |
            +------------+
      fp -->|   old fp   |
            | saved regs |
            |  rtn addr  |
            +------------+
            |    ...     |
            | parameters |  fp+k is usually an address of a parameter
            |    ...     |
            +------------+
            | / / / / / /|

   calling program

        push parameter2 on stack       ! typically done in reverse order
        push parameter1 on stack
        call subroutine                ! pushes return address onto the stack
        clean parameters off stack     ! add constant to stack pointer after
                                       !    return

   subroutine (callee save convention to save registers)

        /* prologue */
           push registers to save
           push old fp
           set new fp as current sp
           subtract from sp to make room for locals

        /* body */
           ... body of subroutine, in which you access parameters with memory
               addresses fp+k and local vars with memory addresses fp-c ...

        /* epilogue */
           set sp to current fp
           pop fp
           pop registers to restore
           return                      ! pops return address from stack

S.C. Johnson and D.M. Ritchie, "The C Language Calling Sequence," Bell Labs, CS Tech. Rept. No. 102, September, 1981.

Example linkages - stack machines

Burroughs B-5000 and B-6700

Other stack machines include the ICL 2900 series and the HP-3000 (pcal and scal).

Example linkages - general-register machines

Register-window-based linkages

... desire to reduce memory traffic ...

   SPARC designers wanted to eliminate as many memory accesses as possible

   memory-stack-based call requires at least four memory accesses
      1. the return address (pc) is pushed by the call instruction
      2. the old fp is pushed at the top of the subroutine
      3. the old fp is popped at the bottom of the subroutine, and
      4. the return address is popped by the return instruction

   then there are two memory accesses for each register saved (push to save
      and then pop to restore).

   the SPARC designers provided overlapped register windows instead

... design choices:
fixed size vs. variable size
overlapping vs. non-overlapping

Some folks see the idea of register windows in the scratchpad registers of the Fairchild F8 (you changed which of its 64 8-bit registers you could easily access by changing the contents of a 6-bit indirect scratchpad address register) and in the memory-mapped registers of the TI TMS-9900 (you changed the memory mapping of its 16 general registers by changing the contents of the workspace pointer register).

... also Pyramid 90x and Celerity C12000 ...

John Mashey's comments on register windows (Usenet posting to comp.arch)

Example linkages

Call-gate-based linkages

TO DO ... desire to provide different protection domains ...
Multics
Intel x86 call gates

Capability-based linkages

TO DO ... desire to provide different protection domains ...
M68020 callm and retm
Intel 432 call and call_thru_domain
Plessey 250, S/38 (AS/400), ...

Hank Levy's Capability-Based Computer Systems book

Acknowledgements

My thanks go to several folks who helped me understand the subroutine linkages on the described machines: John Ahlstrom, Bill Findlay, Jim Hull, and Doug Jones. (My apologies to these individuals for any misunderstandings on my part about the information they have shared with me; the errors in the descriptions are entirely mine.)

[History page] [Mark's homepage] [CPSC homepage] [Clemson Univ. homepage]

mark@cs.clemson.edu