Intel 432 analysis highlights from April 1995 comp.arch traffic

----

From: mark@hubcap.clemson.edu (Mark Smotherman)
Date: 14 Apr 95 14:34:55 GMT
Newsgroups: comp.arch

[snip]

See the excellent "post mortem" on the 432 by Bob Colwell, Ed Gehringer,
and Doug Jensen, "Performance Effects of Architectural Complexity in the
Intel 432," ACM Trans. on Computer Systems, vol. 6, no. 3, August 1988,
pp. 296-339. [Ed Gehringer and Bob Colwell also wrote more specifically
about procedure calls in the 432 in "Fast Object-Oriented Procedure
Calls: Lessons From the Intel 432," ISCA-13, Tokyo, 1986, pp. 92-101.]

They identify several design mistakes that hampered performance:

  enter_environment management (i.e., make a new object-oriented
    environment accessible, specifically to change capability lists)
  compiler weaknesses, including
    - code optimization
    - optimizing procedure calls by using branch and link where
      appropriate rather than relying on the heavy-duty protected call
    - parameter-passing
  instruction set design weaknesses, including
    - bit-aligned instructions
    - no literals (immediate data) allowed in instructions

They estimate that compiler weaknesses account for 25-35% loss of
throughput, while inst. set design weaknesses account for another 5-10%;
and, they point out that these are independent of the object-oriented
nature of the 432.

They postulate even better performance could have been obtained by

  adding local data registers
  expanding buses to 32 bits
  expanding top-of-stack register to 32 bits
  wider microinstruction bus
  access descriptor cache
  memory clear instruction

Yet, in spite of the corrections and additions, they see a factor of
two to three performance hit for the 432 style of object orientation.

----

From: mark@hubcap.clemson.edu (Mark Smotherman)
Newsgroups: comp.arch
Date: 17 Apr 95 17:43:30 GMT

>Hmmm. Does this mean the particular compiler (an Ada compiler I
>vaguely remember) was weak, or that compiler technology wasn't (or
>isn't) up to the task?
>
>If the latter, then it sounds like an architectural problem. If
>you're designing (or building or delivering) architectures that aren't
>good targets for today's compilers, then you're making a mistake.
>Sure, we've got to push the state of the compiler art, but while
>you're waiting for those research results to trickle in, your
>architecture will be dying on the vine.

It was an Ada compiler, but the benchmarks they studied avoided the
more complicated features of Ada. They identify several examples in
the paper of the former problem (poor compiler implementation),
including

  no flow analysis to determine if access to an object requires the
  execution of an expensive enter_environment instruction to establish
  initial access rights or if the access rights from a previous access
  are still valid; the compiler instead generated an enter_environment
  instruction for each access, even when the instruction was a loop
  invariant

  no common subexpression analysis, resulting in, e.g., no reuse of
  array index calculations even when the same array element was used
  on both sides of an assignment statement

  blanket use of the expensive, generalized procedure call, even to
  intra-package procedures, when less expensive branch_and_link
  instructions are available in the ISA (and for fans of the VAX 11/780
  call instructions, so pointedly panned in Hennessy and Patterson, a
  typical 432 call takes 982 cycles plus 40 memory accesses, making it
  about ten times slower than even the infamous VAX CALLS)

  implementing every IN OUT parameter as call by value/result, even
  for large arrays, resulting in, e.g., the parameter passing overhead
  in Dhrystone being ten times the amount of time to run the rest of
  the benchmark

----

From: ... (Robert Colwell)
Newsgroups: comp.arch
Subject: Re: i432
Date: 27 Apr 95 21:26:16
Organization: Intel Corp., Hillsboro, Oregon

In article <3ngeqn$rvs@obvious.ictv.com> emil@ictv.com (Emil Rojas) writes:

In article , Robert Colwell wrote:

There were some really neat ideas in that system. As I pointed out in
my thesis in '85, the i432 was a wonderful research project
masquerading as a bad product.

I believe the research was done at CMU as C.MMP, Wulf, Jones, et al.

Let's be very careful here. The pioneering work into capability
machines was indeed done at (among other places) CMU. But the i432
went far beyond where the CMU work left off.

For instance, in the i432, even the physical processors were
themselves objects, with SW-readable data records describing their
states. Processes were also objects, and were managed just like any
other object. The 432 folks extended the object paradigm to an
unprecedented extent. It was almost breathtaking when you first
realized what they'd attempted; a kind of grand unification of
computer systems.

I think it's a shame that the failure of the product (due at least
partly to its very low performance) so obscured the contributions this
design might have made. (I say that because my research convinced me
that this slowness was not intrinsic to the design philosophy. The
poor performance was due to a combination of other factors.) This
project could have turned out quite differently.

----

From: ... (Robert Colwell)
Newsgroups: comp.arch
Subject: Re: i432
Date: 27 Apr 95 21:42:41
Organization: Intel Corp., Hillsboro, Oregon

In article rich@dcache.uucp (Richard J Taylor (System Architect)) writes:

I was never impressed by the i432. It seemed to suffer from some of
the same architectural excesses and lack of attention to basic
performance as the Burroughs 1700. In a previous job, I was asked to
analyse the 432 as a competitor product at the time it was coming out.
The state of the project could be gleaned from the published ISSCC
paper on the 432. This paper described a hardware LRU algorithm for
deciding which cache line to flush, and then went on to admit that they
had only two words of cache.

The 432 address cache had 4 entries, and when that cache missed, a
7-levels-of-indirection table walk ensued. Gordon Bell picked on this
as "it's obvious that this will cripple the performance, no wonder it's
slow." In the research I did, I never saw this effect. It's not so
simple as blaming the 432's addressing for its low performance.

My conclusions were:

a) They were in a hurry to complete the project and they cut out cache
   to make it fit on the chip. A panic of this magnitude usually
   suggests that things are not really in control.

1) I don't think that cache was too small (I had lots of simulations
   to back that up)
2) I have no evidence that they chopped it down due to die size
   pressure
3) Even if they did have to downsize that cache, I don't think it
   qualifies necessarily as "panic"; it's a normal part of chip design.

b) Given all the complicated addressing stuff, without adequate cache,
   it was not going to perform.

So you're in Gordon's camp. You missed the real lessons of the i432.
Read the paper again.

My conclusion was that we did not need to worry about the 432, and I
was right.

You were right, but for the wrong reasons. That's not a trivial
distinction.

P.S. I always saw that paper by Doug Jensen and his students as being
CMU defending its own.

You're wrong. It's a technical paper. Speculating on why I might have
done the work is pointless, and has nothing to do with the technical
merits of its contribution.

But in case you really are interested, my interest in doing the
research, and then in writing it up, was this: the field of computer
architecture has a very difficult time applying the canonical
scientific approach.
We never get to do a design twice, first one way, and then another way,
so that we can see the effects of a given design choice. The i432 cost
Intel $100M's. All we can do is analyze the final result, and
incrementally tweak the design (via its simulator) to see if we can
extract the various influences of the myriad design decisions embodied
therein.

So I did that analysis to see why such a radical design had gone so far
wrong. I became convinced that the field as a whole was missing the
point: the 432 wasn't slow because it was object-oriented. It was slow
because it got some basic things wrong. That's an important lesson for
ANY chip development.

----
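
Editorial sketch (not from the posts): the 17 Apr 95 summary above says
the compiler generated an enter_environment instruction for each
access, even when the instruction was a loop invariant. The toy cost
model below is one way to see why that matters. The cycle figures and
the function names (naive_cost, hoisted_cost) are made up for
illustration; they are assumptions, not numbers from the Colwell,
Gehringer, and Jensen paper.

```python
# Toy cost model for loop-invariant code motion.
# ASSUMPTION: both cycle counts are invented for illustration only;
# they are not measurements of the Intel 432.
ENTER_ENV_COST = 100  # hypothetical cost of one enter_environment
ACCESS_COST = 5       # hypothetical cost of one access once rights exist


def naive_cost(iterations: int) -> int:
    """Cost when enter_environment is emitted inside the loop body."""
    total = 0
    for _ in range(iterations):
        total += ENTER_ENV_COST  # access rights re-established every pass
        total += ACCESS_COST
    return total


def hoisted_cost(iterations: int) -> int:
    """Cost when enter_environment is hoisted ahead of the loop.

    The hoisted instruction runs once, even if the loop body runs
    zero times.
    """
    total = ENTER_ENV_COST  # executed once, before the loop
    for _ in range(iterations):
        total += ACCESS_COST
    return total


if __name__ == "__main__":
    n = 1000
    print("naive:  ", naive_cost(n))
    print("hoisted:", hoisted_cost(n))
```

Under these assumed costs the per-iteration overhead dominates the
naive version, which is the shape of the problem the posts describe.
The real ratio depends on the actual instruction costs, which this
sketch does not attempt to reproduce.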