Mark Smotherman
last updated December 30, 2016
In order to isolate the effects of changes in the CPU design and to
modularize the experiment, the ACS compiler classified the optimizations
as machine independent or machine dependent. ... With the exception of
[instruction] scheduling, most of the other analyses and transformations
had appeared in previous IBM compilers ... [but in ACS] the techniques
were generalized and isolated as much as possible from source and machine
constructs.
Out of this project, therefore, came numerous advances in compiler technology,
and, more importantly, the foundations of the theory of program analysis and
optimization.
-- Fran Allen, "The history of language processor technology in IBM" (1981)
From the beginning of the project it was recognized that the only way to
realistically realize the performance goals and make them accessible to
the user was to design the compiler and the computer at the same time.
In this way features would not be put in the hardware that the software
could not use or which the software could support more effectively.
An aggressive optimizing compiler was an integral part of the Project Y and ACS plans, with machine-independent and machine-dependent optimizations as well as profile-guided optimization (PGO). The compiler was planned to include front-ends for PL/I and an extended FORTRAN.
The FORTRAN extensions included many of the OS/360 FORTRAN IV extensions, as well as:
The compiler included the following techniques.
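As a rough illustration of the machine-independent class of transformation mentioned above (a sketch only; the subroutine and variable names are invented here, and this is not reconstructed ACS compiler output), loop-invariant code motion rewrites a loop entirely at the source level, with no reference to any target machine:

      SUBROUTINE SCALE(X, A, N, S, T)
      INTEGER N, I
      REAL X(N), A(N), S, T, C
c     As written by the programmer, S*T would be recomputed on
c     every iteration:
c         DO 10 I = 1, N
c            X(I) = A(I) * (S * T)
c      10 CONTINUE
c     After code motion, the loop-invariant product is evaluated
c     once outside the loop:
      C = S * T
      DO 20 I = 1, N
         X(I) = A(I) * C
   20 CONTINUE
      RETURN
      END

A machine-dependent pass, by contrast, works in terms of the target's registers, instruction timings, and issue constraints.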
Fran Allen published some of her work on the compiler in 1966 in an article in the Annual Review of Automatic Programming. John Cocke later worked with Jacob Schwartz, and they wrote a famous textbook draft entitled "Programming languages and their compilers". Jim Beatty worked on optimization techniques and register allocation; he published some of this work in 1972 in the Journal of the ACM and in the book Design and Optimization of Compilers, and in 1974 in the IBM Journal of Research and Development.
The ACS-1 was designed in a period of increasing software expectations on the part of the high-performance user community. This was unlike earlier in the 1960s, when initial users of the CDC 6600 eagerly accepted a role in developing the software infrastructure for the system. [1]
A new kernel-based operating system was planned for ACS-1 with much of the work contracted to the Computer Sciences Corporation. The OS design envisioned a protected-mode kernel, called the "nucleus", and a user-mode set of additional services, called the "extended nucleus". It was anticipated that many of the customers might want to change parts of the extended nucleus, so the nucleus was defined to be the essential OS requirements across all customers.
The nucleus was to include task management, interrupt handling, explicit memory hierarchy management and protection, I/O processor interface, and file handling and protection. The extended nucleus was to include job scheduling, an I/O control system, and data management (e.g., access methods, blocking, and buffering). In some instances, documents classified the language processors (e.g., compilers, loader, link editor, library routines, command language interpreter) as part of the extended nucleus services.
An interesting design decision was that demand paging was to be the responsibility of the extended nucleus rather than the nucleus. Extended nucleus services like demand paging would have to invoke the protected-mode nucleus services in order to make changes in the memory hierarchy placement of objects. (A similar design approach was taken some twenty years later with the Mach kernel at CMU; however, the IPC calls required to implement external pagers in Mach typically resulted in poor performance.)
ACS-1 hardware was not anticipated to be available to coders until 1970, so most of the coding was to be done using the ACS simulator running on the project's S/360 Model 75.
The ACS-1 architecture team was planning to be extensively involved with specifying and coding the nucleus, and then specifying and monitoring the progress and quality of the extended nucleus and language processors. The plans called for most of the OS and language processing modules to be written in the ACS extended FORTRAN.
Estimates in 1967 called for 21K lines of code for the nucleus, over 70K lines for the extended nucleus, and 60K lines for the language processors. Man-years for the software were estimated at 45 for the ACS architecture team members and 200 for CSC contract employees, with a labor cost estimate of $11 million.
As of November 1967, the software schedule was:
-- Performance goal from Schorr's 1968 paper
It would have accomplished four to five instructions per cycle on
linear algebra-type problems. But, because of cache latency and the
inability to almost always anticipate the correct branch flow, we would
not have achieved near that rate on more general problems.
-- John Cocke, Turing Award Lecture (CACM, March 1988)
Special emphasis was given to several Livermore and Los Alamos benchmark kernels by both the design and compiler groups. Performance comparisons were made with the IBM 7090, CDC 6600, and IBM System/360 Model 91 on Lagrangian Hydrodynamics Calculation (LHC) and Neutron Diffusion (ND), and these were presented in January 1967 to the IBM Science Advisory Board. At this point in time, the issue rates were projected to be 2-out-of-6 for both the index and arithmetic units.
1967 Performance Comparisons | IBM 7090 | CDC 6600 | S/360 M91 | ACS-1 |
---|---|---|---|---|
Relative performance on LHC | 1 | 50 | 110 | 2500 |
Relative performance on ND | 1 | 21 | 72 | 1286 |
The following table gives statistics projected in 1967 for the ND kernel.
1967 Detailed Performance Comparison using ND Kernel | IBM 7090 | CDC 6600 | S/360 M91 | ACS-1 |
---|---|---|---|---|
Arithmetic instructions | 26 | 38 | 24 | 32 |
Index instructions | 2 | 5 | 3 | 6 |
Load/store instructions | 61 | 29 | 31 | 14 |
Branch instructions | 11 | 9 | 11 | 7 |
Total instructions | 78 | 81 | 63 | 59 |
Storage references | 139 | 59 | 57 | 14 |
Total cycles | 295 | 296 | 148 | 45 |
Cycles in critical path | 110 | 76 | 30 | 37 |
Percentage in critical path | 37% | 26% | 20% | 82% |
Sustained insts. per cycle (IPC) | 0.26 | 0.27 | 0.4 | 1.3 |
Relative performance | 1 | 21 | 72 | 1286 |
Performance limiter | sequential machine | inst. fetch | branches | arithmetic |
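The derived rows follow from the raw counts; the check below is my arithmetic using standard published cycle times (roughly 2.18 usec for the 7090, 100 nsec for the CDC 6600, and the 60 nsec Model 91 cycle mentioned later), not figures from the ACS documents. Sustained IPC is total instructions divided by total cycles, while relative performance compares total execution time against the 7090, so cycle time enters as well:

\[
\text{IPC} = \frac{\text{total instructions}}{\text{total cycles}},
\qquad
\text{ACS-1: } \frac{59}{45} \approx 1.3
\]
\[
\text{relative performance} =
\frac{N_{\text{cyc}}^{7090} \times t_{\text{cyc}}^{7090}}
     {N_{\text{cyc}} \times t_{\text{cyc}}},
\qquad
\text{M91: } \frac{295 \times 2.18\ \mu\text{s}}{148 \times 60\ \text{ns}} \approx 72
\]

Working the same ratio backward for the ACS-1 column (relative performance 1286 at 45 cycles) implies a projected ACS-1 cycle time of roughly 11 nsec.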
A 1967 summary of MFLOPS projections was split into two cases, independent operations and sequentially dependent operations; it reveals the major impact of overlap in the high-end machines.
1967 MFLOPS Comparison | IBM 7090 | CDC 6600 | S/360 M91 | ACS-1 |
---|---|---|---|---|
MFLOPS (independent) | .046 | 4 | 11 | 160 |
MFLOPS (seq. dependent) | .046 | 1.3 | 6.7 | 27 |
Overlap factor | 1 | 3 | 1.6 | 5.9 |
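The overlap factor row is simply the ratio of the two MFLOPS rows; it measures how much of a machine's overlapped throughput is lost when each floating-point operation must wait on the previous result:

\[
\text{overlap factor} = \frac{\text{MFLOPS}_{\text{independent}}}{\text{MFLOPS}_{\text{dependent}}},
\qquad
\text{ACS-1: } \frac{160}{27} \approx 5.9,
\qquad
\text{CDC 6600: } \frac{4}{1.3} \approx 3
\]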
By 1968, performance projections were significantly higher with the 3-out-of-8 index and 3-out-of-8 arithmetic issue rates: IPC on the ND kernel had increased to 1.8 and relative performance to 1608.
1968 Performance Comparison using ND Kernel | IBM 7090 | CDC 6600 | S/360 M91 | ACS-1 |
---|---|---|---|---|
Sustained insts. per cycle (IPC) | 0.26 | 0.27 | 0.4 | 1.8 |
Relative performance | 1 | 21 | 72 | 1608 |
An analysis of several machines normalized relative performance with respect to the circuit speed of each machine and the number of circuits in each machine. The result was called a relative "architectural performance" factor, although average memory access times (and thus the presence or absence of cache memory) affected the results. For example, note the effect of the M195 having cache memory whereas the M91 does not. (The M195 also has a roughly 10% faster clock than the M91: 54 nsec cycle time versus 60 nsec.) Based on this analysis, the superscalar issue, decoupled access and execute, branch avoidance techniques, and cache memory of the ACS-1 design gave it a factor of 5 "architectural" improvement.
Design Comparison | 7090 | Stretch | CDC 6600 | S/360 M91 | S/360 M195 | ACS-1 |
---|---|---|---|---|---|---|
Relative performance | 1 | 3.5 | 23 | 61 | 96 | 1230 |
Performance normalized to circuit speed | 1 | 3.5 | 11.5 | 14.6 | 23 | 98 |
Performance normalized to circuit speed and count | 1 | 1.2 | 1.1 | 1.1 | 1.7 | 5.2 |
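The two normalized rows can be read as successive divisions of relative performance by a relative circuit-speed factor s and a relative circuit-count factor c, both taken with respect to the 7090. The factors themselves are not stated here, but the table implies them; the arithmetic below is mine, shown for the ACS-1 column:

\[
P_{\text{arch}} = \frac{P_{\text{rel}}}{s \cdot c},
\qquad
\text{ACS-1: } s \approx \frac{1230}{98} \approx 12.6,
\quad
c \approx \frac{98}{5.2} \approx 19
\]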
Sidebar - Benchmark programs
LHC was part of the Livermore CORONET program. The kernel of CORONET became loop 18 in the Livermore Loops. The 1981 Fortran version is:
c
c*******************************************************************************
c***  KERNEL 18     2-D EXPLICIT HYDRODYNAMICS FRAGMENT
c*******************************************************************************
c
c
 1018          T= 0.003700d0
               S= 0.004100d0
              KN= 6
              JN= n
      DO 70  k= 2,KN
      DO 70  j= 2,JN
        ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k))
     1          *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1))
        ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k))
     1          *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k))
   70 CONTINUE
c
      DO 72  k= 2,KN
      DO 72  j= 2,JN
        ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k))
     1                     -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k))
     2                     -ZB(j,k)   *(ZZ(j,k)-ZZ(j,k-1))
     3                     +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1)))
        ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k))
     1                     -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k))
     2                     -ZB(j,k)   *(ZR(j,k)-ZR(j,k-1))
     3                     +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1)))
   72 CONTINUE
c
      DO 75  k= 2,KN
      DO 75  j= 2,JN
        ZR(j,k)= ZR(j,k)+T*ZU(j,k)
        ZZ(j,k)= ZZ(j,k)+T*ZV(j,k)
   75 CONTINUE
Donald MacKenzie briefly discusses CORONET in "The Influence of the Los Alamos and Livermore National Laboratories on the Development of Supercomputing," IEEE Annals of the History of Computing, Vol. 13, No. 2, April 1991, as well as the use of the kernel in specifying the Livermore workload as part of the design of the CDC Star-100 in the late 1960s. MacKenzie states that the CORONET kernel was "judged untypical of Livermore weapons code because it contained no data-dependent branches."
ND was part of the Los Alamos DTF code, which is a Fortran version of Bengt Carlson's difference-equation method to solve a linear, time-independent Boltzmann equation for particle transport.
[1] Boelie Elzen and Donald MacKenzie, in "The social limits of speed: The development and use of supercomputers," IEEE Annals of the History of Computing, vol. 16, no. 1, 1994, discuss how this change in expectations of supercomputer manufacturers additionally providing system software affected Cray Research, Inc., in the late 1970s and early 1980s.