Mark Smotherman. Last updated January 24, 2018
-- U.S. Patent 3,577,189 (Cocke, Randell, Schorr, and Sussenguth)
The present invention offers several advantages over prior art branching
apparatus and methods. Primary among these are the ability to anticipate
branches, the ability to reduce the number of branches and the ability
to reduce branch delays. With these abilities, the branching apparatus
and method according to the invention can enable a highly parallel
computer to overlap most branches such that branching takes a minimal
delay, and essentially full data rate is maintained across the branch points.
24-bit (short) instruction format:
opcode | S | R1 | R2 | R3 |
48-bit (long) instruction format:
opcode | S | R1 | X2 | X3 | 24-bit literal value |
(The opcode list below is taken from Lynn Conway's simulator notes of April 1967, which differs slightly from the ACS-1 MPM Instruction Manual of January 1968. The two columns are used to indicate to which decoder the instruction is sent. L or S after the opcode indicates long or short format.)
op code
index unit
arithmetic unit
Load/store 001 S LXH - load index (halfword format) 002 L LX - load index 003 L LXA - load index using alternate key 004 S STXH - store index (halfword format) 005 L STX - store index 006 L STXA - store index using alternate key 007 L LXC - load index and count 008 L LXCA - load index using alternate key 009 S LAH - load arithmetic (halfword format) (both) 010 L LA - load arithmetic (both; as replace for A-unit) 011 L LAA - load arithmetic using alternat key (both; replace) 012 S STAH - store arithmetic (halfword format) (both) 013 L STA - store arithmetic (both) 014 L STAA - store arithmetic using alternate key (both) 015 S LDH - load arithmentic double (halfword format) (both; replace) 016 L LD - load arithmetic double (both; replace) 017 S STDH - store arithmetic double (halfword format) (both) 018 L STD - store arithmetic double (both) 019 S LATH - load arithmetic (scaled indexing, halfword format) (both; replace) 020 L LAT - load arithmetic (scaled indexing) (both; replace) 021 S STATH - store arithmetic (scaled indexing, halfword format) (both) 022 L STAT - store arithmetic (scaled indexing) (both) 023 L LL - load arithmentic, left half (both; replace) 024 L LR - load arithmetic (right half) (both; replace) 025 L STL - store arithmetic (left half) (both) 026 L STR - store arithmetic (right half) (both) 027 L LMX - load multiple index 028 L STMX - store multiple index 029 L LMA - load multiple arithmetic (both; replace) 030 L STMA - store multiple arithmetic (both) 031 L LMS - load multiple special 032 L STMS - store multiple special 033 L STMZ - store multiple zeros 034 L STMZA - store multiple zeros using alternate key Move 038 S MXA - move index to arithmetic (both) 039 S MAX - move arithmetic to index (both) 040 L MKL - move constant to left half arithmentic (both) 041 L MKR - move constant to right half arithmentic (both) 042 S MLX - move location to index 043 S MXS - move index to special 044 S MSX - move special to index 045 S MSXZ - move special to index and zero 046 S MSXO - move special to index by or 047 S MXC - move index bit to condition bit 048 S MCX - move condition bit to index bit 049 S MLC - move location bit to condition bit (both) 050 S MAC - move arithmentic bit to condition bit (both) 051 MXP . move index to (P?) 052 MKP - move constant to (P?) Arithmetic 056 S AN - add normalized 057 S ADN - add double normalized 058 S AR - add rounded 059 S ADR - add double rounded 060 S AU - add unnormalized 061 S ADU - add double unnormalized 062 S SN - subtract normalized 063 S SDN - subtract double normalized 064 S SR - subtract rounded 065 S SDR - subtract double rounded 066 S SU - subtract unnormalized 067 S SDU - subtract double unnormalized 068 S MN - multiply normalized 069 S MDN - multiply double normalized 070 S MR - multiply rounded 071 S MDR - multiply double rounded 072 S MU - multiply unnormalized 073 S MDU - multiply double unnormalized 074 S MMN - multiply mixed normalized 075 S MMU - multiply mixed unnormalized 076 S DN - divide normalized 077 S DDN - divide double normalized 078 S DR - divide rounded 079 S DDR - divide double rounded 080 S DMN - divide mixed normalized 081 S DMR - divide mixed rounded 082 S RND - round 083 S SPF - set positive, floating 084 S SNF - set negative, floating 085 S CVS - convert to short floating 086 S CVF - convert to full floating 089 S AI - add integer 090 S SI - subtract integer 091 S MI - multiply integer 092 S MMI - multiply mixed-length integer 093 S DI - divide integer 094 S DMI - divide mixed-length integer 095 S ACL - add continued low-order 096 S SCL - subtract continued low-order 097 S ACH - add continued high-order 098 S SCH - subtract continued high-order 099 S SPI - set positive integer 100 S SNI - set negative integer 101 S CVN - convert to normalized 102 S CVI - convert to integer 106 S AX - add index 107 S SX - subtract index 108 S MX - multiply index 109 S DRX - divide with remainder index 110 S DX - divide index 111 S RX - remainder index 112 S AXC - add index to short constant 113 L AXK - add index to constant 114 L MXK - multiply index by constant 115 L DRXK - divide with remainder index by constant 116 L DXK - divide index by constant 117 L RXK - remainder index by constant 118 S SPX - set positive index 119 S SNX - set negative index Compare 123 S CGEN - compare GE normalized (both) 124 S CEQN - compare EQ normalized (both) 125 S CGED - compare GE double (both) 126 S CEQD - compare EQ double (both) 127 S CMGEN - compare magnitude GE normalized (both) 128 S CMEQN - compare magnitude EQ normalized (both) 129 S CMGED - compare magnitude GE double (both) 130 S CMEQD - compare magnitude EQ double (both) 131 S CGEI - compare GE integer (both) 132 S CEQI - compare EQ integer (both) 133 S CUGEI - compare unsigned GE integer (both) 134 S CGEX - compare GE index 135 S CEQX - compare EQ index 136 S CUGEX - compare unsigned GE index 137 L CGEXK - compare GE index with constant 138 L CEQXK - compare EQ index with constant 139 L CUGEXK - compare unsigned GE index with constant 140 S CBA - compare bytes arithmetic (both) 141 S CBMA - compare bytes multiple arithmetic (both) 142 S CBX - compare bytes index 143 S CBMX - compare bytes multiple index Shift 147 S SHA - shift arithmetic 148 S SHX - shift index 149 S SHAC - shift arithmetic by constant 150 S SHXC - shift index by constant 151 S SHD - shift double 152 S SHDX - shift double index 153 S SHDC - shift double by constant 154 S SHDXC - shift double index by constant 155 L SWA - swap storage with arithmentic (both) 156 L SWX - swap storage with index 157 S IFA - insert field arithmetic 158 S IFX - insert field index 159 S IFZA - insert field and zero arithmetic 160 S IFZX - insert field and zero index 161 S SIA - shift integer arithmetic 162 S SIX - shift integer index 163 S SIAC - shift integer arithmetic by constant 164 S SIXC - shift integer index by constant 165 S SID - shift integer double 166 S SIDC - shift integer double by constant Logical 170 S ANDA - and arithmetic 171 S TAFA - and-not arithmetic (true and false) 172 S FAFA - nor arithmetic (false and false) 173 S ORA - or arithmetic 174 S TOFA - or-not arithmetic (true or false) 175 S FOFA - nand arithmetic (false or false) 176 S EQA - equal arithmetic 177 S XORA - exclusive-or arithmetic 178 S ANDX - and index 179 S TAFX - and-not index (true and false) 180 S FAFX - nor index (false and false) 181 S ORX - or index 182 S TOFX - or-not index (true or false) 183 S FOFX - nand index (false or false) 184 S EQX - equal index 185 S XORX - exclusive-or index 186 S ANDC - and condition 187 S TAFC - and-not condition (true and false) 188 S FAFC - nor condition (false and false) 189 S ORC - or condition 190 S TOFC - or-not condition (true or false) 191 S FOFC - nand condition (false or false) 192 S EQC - equal condition 193 S XORC - exclusive-or condition 194 S CNTT - count total ones arithmetic 195 S CNTAA - count leading alike arithmetic 196 S CNTDA - count leading different arithmetic 197 S CNTAX - count leading different index 198 S CNTDX - count leading different index Branch 202 L BAND - branch at exit and 203 L BTAF - branch at exit and-not (true and false) 204 L BFAF - branch at exit nor (false and false) 205 L BOR - branch at exit or 206 L BTOF - branch at exit or-not (true or false) 207 L BFOF - branch at exit nand (false or false) 208 L BEQ - branch at exit equal 209 L BXOR - branch at exit exclusive-or 210 S BU - branch at exit unconditionally 211 S EXIT - exit (change inst. addr. reg.) (both) 212 S EXITL - exit, save location and stop skipping (both) 213 S EXITA - ? (both) 214 S EXITP - ? (both) 215 S SKAND - skip on and (both) 216 S SKTAF - skip on and-not (true and false) (both) 217 S SKFAF - skip on nor (false and false) (both) 218 S SKOR - skip on or (both) 219 S SKTOF - skip on or-not (true or false) (both) 220 S SKFOF - skip on nand (false or false) (both) 221 S SKEQ - skip on equal (both) 222 S SKXOR - skip on exclusive-or (both) 223 L IVIB - invalidate instruction buffers and branch (both) 224 S? NOP - no operation (both) Control 225 S PAUSE - pause (interrupt/exception fence) 226 S PI - pause with exception 227 L SCAN - scan in of interrupted state 228 S SVC - supervisor call 229 S SVR - supervisor return 230 S IC - interrupt call (internally generated) 231 S IR - interrupt return 235 L SIO - start i/o 236 S HIO - halt i/o 237 S TCH - test channel 238 S MTX - move T register to index 239 S MXT - move index to T register 240 S MZT - move zero to T register bit 241 S MOT - move one to T register bit 242 S ITUMA - invalidate tag and update memory using alternate key 243 S ITUMP - invalidate tag and update memory using PDA 244 ? IDA - ? 245 ? LDA - ? 246 ? LDHAA - ? 247 ? LDHBA - ? 248 ? LDHCA - ? 249 ? LDHDA - ? 250 ? STHAA - ? 251 ? STHBA - ? 252 ? STHCA - ? 253 ? STHDA - ?
Instructions to support commercial data processing, such as decimal and variable-field-length instructions, were ruled out by the designers. Other S/360 instruction set features such as the execute instruction (i.e., a single-instruction subroutine) and the register-memory (RX) instruction format were also ruled out.
Notes:
Recognizing that they could lose half or more of the design's performance on branching, the designers took several aggressive steps to reduce the number of branches:
Example:
In a conventional instruction set two separate branch instructions would be needed, one for each comparison. Here, there is only one change of the instruction counter and thus only one pipeline disruption.CEQX 10,1,2 // compare index regs // X1 and X2 and set // condition C10 if // X1==X2 CGEX 11,3,4 // compare index regs // X3 and X4 and set // condition C11 if // X3>=X4 BOR 10,11,0,LABEL // establish a branch // predicate of // (C10 or C11) and // set corresponding // branch target // address to LABEL EXIT // perform transfer of // control if branch // predicate is true
The ACS-1 combined the first two actions in its "branch" instruction and used the "exit" instruction to accomplish the third action. This is in effect a prepare-to-branch approach that allows a variable number of branch delay slots to be filled (termed within ACS as "anticipating a branch"). Perhaps even more importantly, it provides for multiway branch specification. That is, multiple branch instructions could be executed and only the first successful branch instruction would be used by the exit instruction to transfer control.
Example:
The above is equivalent to a nested if-then-else structure... BOR 1,2,0,ALPHA // establish the first // branch predicate // of (C1 or C2) and // set corresponding // branch target // address to ALPHA ... BAND 3,4,0,BRAVO // establish the second // branch predicate // of (C3 and C4) and // set corresponding // branch target // address to BRAVO BEQ 5,6,0,BRAVO // establish the third // branch predicate // of (C5 == C6) and // set corresponding // branch target // address to BRAVO // - thus BRAVO if // (C3andC4)or(C5==C6) ... BAND 7,7,0,CHARLIE // establish the fourth // branch predicate // of (C7) and set // corresponding // branch target // address to CHARLIE ... EXIT // perform transfer of // control to a branch // target address based // on the first true // branch predicate; // if none are true, // then fall through; // as part of this // instruction the exit // condition is reset to // no true predicates
See US Patent 3,577,189, John Cocke, Brian Randell, Herb Schorr, and Ed Sussenguth, "Apparatus and method in a digital computer for allowing improved program branching with branch anticipation reduction of the number of branches and reduction of branch delays," filed January 1969, and issued May 1971. Note that an "advance branch" instruction with "from" and "to" address fields was proposed in US Patent 3,551,895, Graham Driscoll, Jr., "Look-ahead branch detection system," filed January 1968, issued December 1970, and also assigned to IBM. US Aif( condition_1 || condition_2 ) goto ALPHA else if( condition_3 && condition_4 ) goto BRAVO else if( condition_5 == condition_6 ) goto BRAVO else if( condition_7 && condition_7 ) goto CHARLIE
Example:
becomes (where the * following an opcode indicate that the skip flag in that instruction should be set)CGEN 10,1,2 // compare arithmetic // regs A1 and A2 // and set condition // C10 if A1>=A2 BAND 10,10,0,LABEL // establish a branch // predicate of (C10) // and set corresponding // branch target address // to LABEL EXIT // perform transfer of // control if required AN 5,3,4 // add arithmetic regs // A3 and A4 and place // result in A5 LABEL: ...
Thus there is no branch and therefore no pipeline disruption. There can only be one skip predicate at a time. It remains in effect until one of the following instructions is executed: another skip, a special form of an exit instruction, or a supervisor call instruction.CGEN 10,1,2 // compare arithmetic // regs A1 and A2 // and set condition // C10 if A1>=A2 SKAND 10,10 // establish a skip // predicate of (C10) AN* 5,3,4 // add arithmetic regs // A3 and A4 and place // result in A5 if the // skip predicate is // not true
See US Patent 3,577,190, John Cocke, Phil Dauber, Herb Schorr, and Ed Sussenguth, "Apparatus in a digital computer for allowing the skipping of predetermined instructions in a sequence of instructions in response to the occurrence of certain conditions," filed June 1968, and issued May 1971.
The example ACS-1 code segment shown below is adapted from the MPM Timing Simulation report of August 1967. It is described as part of a Crout reduction program for matrix decomposition.
LOOP: CGEX 2,4,3 // compare index regs // X4 and X3 and set // condition C2 if // X4>=X3 BAND 2,2,0,LOOP // establish a branch // predicate of (C2) // and set corresponding // branch target address // to LOOP CGEN 1,1,2 // compare arithmetic regs // A1 and A2 and set // condition C1 if // A1>=A2 AXK 3,3,0,1 // add constant of 1 to // index reg X3 AN 1,1,8 // add arithmetic regs // A1 and A8 and place // result in A1 LA 8,0,0,1000 // load arithmetic reg A8 AN 2,2,9 // add arithmetic regs // A2 and A9 and place // result in A2 LA 9,0,0,2000 // load arithmetic reg A9 SKOR 1,1 // set skip predicate to // (C1) MN* 1,1,2 // multiply arithmetic regs // A1 and A2 and place // result in A1 if skip // predicate is not true EXIT // perform transfer of // control if required STA 1,0,0,1000 // store arithmetic reg A1 STOP
Navigation within IBM ACS pages:
Next sections: Logical Organization / Instruction Issue / Memory Hierarchy / Maintenance