Intel 432 (1975-1985)

Mark Smotherman

Summary

UNDER CONSTRUCTION

previous project names: SSO, 8816, 8800

The 432 was originally intended as Intel's 16-bit microprocessor; see interview of Dave House

[regarding Colwell, et al.] They identify several design mistakes that hampered performance:

instruction set design weaknesses, including
- bit-aligned instructions
- no literals (immediate data) allowed in instructions
enter_environment management (i.e., make a new object-oriented environment accessible, specifically to change capability lists)
compiler weaknesses, including
- code optimization, e.g.,
  - no flow analysis to determine if access to an object requires the execution of an expensive enter_environment instruction to establish initial access rights or if the access rights from a previous access are still valid; the compiler instead generated an enter_environment instruction for each access, even when the instruction was a loop invariant
  - no common subexpression analysis, resulting in, e.g., no reuse of array index calculations even when the same array element was used on both sides of an assignment statement
- blanket use of the expensive, generalized procedure call, even to intra-package procedures, when less expensive branch_and_link instructions are available in the ISA (and for fans of the VAX-11/780 call instructions, so pointedly panned in Hennessy and Patterson: a typical 432 call takes 982 cycles plus 40 memory accesses, making it about ten times slower than even the infamous VAX CALLS)
- implementing every IN OUT parameter as call by value/result, even for large arrays, resulting in, e.g., the parameter passing overhead in Dhrystone being ten times the amount of time to run the rest of the benchmark
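
As a rough illustration of two of the compiler weaknesses listed above (this is not taken from Colwell et al.), the Python sketch below counts how many expensive operations a naive code generator would issue compared with an optimizing one. All names here (OpCounts, sum_naive, sum_hoisted, bump_value_result, bump_by_reference) are hypothetical, and the counters model only how often an operation occurs, not actual 432 cycle costs.

```python
from dataclasses import dataclass


@dataclass
class OpCounts:
    """Hypothetical tally of expensive operations (not real 432 cycle counts)."""
    enter_environment: int = 0   # access-establishing operations issued
    elements_copied: int = 0     # array elements moved for parameter passing


def sum_naive(obj, n_iters, counts):
    """Re-establish access before every use, as the naive compiler did."""
    total = 0
    for i in range(n_iters):
        counts.enter_environment += 1    # issued on every iteration
        total += obj[i % len(obj)]
    return total


def sum_hoisted(obj, n_iters, counts):
    """Establish access once, since it is loop-invariant."""
    counts.enter_environment += 1        # hoisted out of the loop
    total = 0
    for i in range(n_iters):
        total += obj[i % len(obj)]
    return total


def bump_value_result(arr, counts):
    """IN OUT parameter by value/result: copy in, update, copy out."""
    local = list(arr)                    # copy in
    counts.elements_copied += len(arr)
    local[0] += 1
    arr[:] = local                       # copy out
    counts.elements_copied += len(arr)


def bump_by_reference(arr, counts):
    """IN OUT parameter by reference: no copying at all."""
    arr[0] += 1


if __name__ == "__main__":
    naive, hoisted = OpCounts(), OpCounts()
    sum_naive([1, 2, 3], 100, naive)
    sum_hoisted([1, 2, 3], 100, hoisted)
    print("enter_environment:", naive.enter_environment, "vs", hoisted.enter_environment)

    by_value, by_ref = OpCounts(), OpCounts()
    bump_value_result([0] * 1000, by_value)
    bump_by_reference([0] * 1000, by_ref)
    print("elements copied:", by_value.elements_copied, "vs", by_ref.elements_copied)
```

The copy count for value/result grows with the array size while the useful work stays constant, which is the kind of imbalance the Dhrystone observation above describes.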

They estimate that compiler weaknesses account for a 25-35% loss of throughput, while instruction set design weaknesses account for another 5-10%. They also point out that these losses are independent of the object-oriented nature of the 432.

They postulate that even better performance could have been obtained by

adding local data registers
expanding buses to 32 bits
expanding the top-of-stack register to 32 bits
widening the microinstruction bus
adding an access descriptor cache
adding a memory clear instruction

Yet, in spite of the corrections and additions, they see a factor of two to three performance hit for the 432 style of object orientation.

Robert Colwell writes in two postings to comp.arch from April 1995:

   Robert Colwell wrote:
	There were some really neat ideas in that system. As I pointed out
	in my thesis in '85, the i432 was a wonderful research project
	masquerading as a bad product.

   I believe the research was done at CMU as C.mmp, Wulf, Jones, et al.
Let's be very careful here. The pioneering work into capability machines was indeed done at (among other places) CMU. But the i432 went far beyond where the CMU work left off. For instance, in the i432, even the physical processors were themselves objects, with SW-readable data records describing their states. Processes were also objects, and were managed just like any other object. The 432 folks extended the object paradigm to an unprecedented extent. It was almost breathtaking when you first realized what they'd attempted; a kind of grand unification of computer systems.
I think it's a shame that the failure of the product (due at least partly to its very low performance) so obscured the contributions this design might have made. (I say that because my research convinced me that this slowness was not intrinsic to the design philosophy. The poor performance was due to a combination of other factors.) This project could have turned out quite differently.

and

   I was never impressed by the i432. It seemed to suffer from some of the same
   architectural excesses and lack of attention to basic performance as the
   Burroughs 1700.

   In a previous job, I was asked to analyse the 432 as a competitor product at 
   the time it was coming out. The state of the project could be gleaned from
   the published ISSCC paper on the 432. This paper described a hardware LRU 
   algorithm for deciding which cache line to flush, and then went on to admit 
   that they had only two words of cache.
The 432 address cache had 4 entries, and when that cache missed, a 7-levels-of-indirection table walk ensued. Gordon Bell picked on this as "it's obvious that this will cripple the performance, no wonder it's slow." In the research I did, I never saw this effect. It's not so simple as blaming the 432's addressing for its low performance.
   My conclusions were:

   a) They were in a hurry to complete the project and they cut out cache to
      make it fit on the chip. A panic of this magnitude usually suggests that
      things are not really in control.
1) I don't think that cache was too small (I had lots of simulations to back that up)
2) I have no evidence that they chopped it down due to die size pressure
3) Even if they did have to downsize that cache, I don't think it qualifies necessarily as "panic"; it's a normal part of chip design.
   b) Given all the complicated addressing stuff, without adequate cache, it
      was not going to perform.
So you're in Gordon's camp. You missed the real lessons of the i432. Read the paper again.
   My conclusion was that we did not need to worry about the 432, and I was
   right.
You were right, but for the wrong reasons. That's not a trivial distinction.
   P.S. I always saw that paper by Doug Jensen and his students as being CMU 
   defending its own.
You're wrong. It's a technical paper. Speculating on why I might have done the work is pointless, and has nothing to do with the technical merits of its contribution. But in case you really are interested, my interest in doing the research, and then in writing it up, was this: the field of computer architecture has a very difficult time applying the canonical scientific approach. We never get to do a design twice, first one way, and then another way, so that we can see the effects of a given design choice. The i432 cost Intel $100M's. All we can do is analyze the final result, and incrementally tweak the design (via its simulator) to see if we can extract the various influences of the myriad design decisions embodied therein.
So I did that analysis to see why such a radical design had gone so far wrong. I became convinced that the field as a whole was missing the point: the 432 wasn't slow because it was object-oriented. It was slow because it got some basic things wrong. That's an important lesson for ANY chip development.

References

Robert P. Colwell, The Performance Effects of Functional Migration and Architectural Complexity in Object-Oriented Systems. Ph.D. dissertation, Carnegie-Mellon, 1985, tech. report CMU-CS-85-159.
Robert P. Colwell, Edward F. Gehringer, and E. Douglas Jensen, "Performance effects of architectural complexity in the Intel 432," ACM TOCS, Vol. 6, No. 3, Aug. 1988, pp. 296-339.
George W. Cox, William M. Corwin, Konrad K. Lai, and Fred J. Pollack, "Interprocess communication and processor dispatching on the Intel 432," ACM TOCS, Vol. 1, No. 1, Feb. 1983, pp. 45-66.
Edward F. Gehringer and Robert P. Colwell, "Fast object-oriented procedure calls: lessons from the Intel 432," Proc. ISCA-13, Tokyo, 1986, pp. 92-101.
D. Johnson, "The Intel 432: A VLSI Architecture for fault-tolerant computer systems," Computer, Vol. 17, No. 8, Aug. 1984, pp. 40-48.
Henry Levy, Capability-Based Computer Systems. Digital Press, 1984. [432 is the topic of chapter 9]
Glenford J. Myers, Advances in Computer Architecture (2nd ed.). Wiley, 1982. [432 is the topic of Part VI, pp. 335-417]
Elliot I. Organick, A Programmer's View of the Intel 432 System. McGraw-Hill, 1983.
Eric Smith, http://www.brouhaha.com/~eric/retrocomputing/intel/iapx432/

Acknowledgements


mark@cs.clemson.edu