Mark Smotherman
Last updated: April 2024
Summary: Instruction-level parallelism (ILP) dates back to the 1940s, and various attempts have been made to exploit it over the years.
... under construction ...
... intro to be written ...
               dependency   fn. unit     time to start
               checking     assignment   execution
               ----------   ----------   -------------
superscalar    hardware     hardware     hardware
EPIC           software     hardware     hardware
dynamic VLIW   software     software     hardware
VLIW           software     software     software

[Diagram: code generation feeds a "compiler scheduling" column and a "hardware control" column, each with three stages: dependency checking (annotated O(n^2)), fn. unit assignment, and time to start execution. Superscalar crosses from compiler to hardware before dependency checking, EPIC after dependency checking, dynamic VLIW after fn. unit assignment, and VLIW after time to start execution. The hardware control column leads to the hardware fn. units.]
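The O(n^2) annotation on dependency checking can be illustrated with a small sketch. It assumes that n is the number of instructions examined together and that each instruction names one destination and a few source registers; the source defines neither, and the `Instr` type and `check_dependencies` function are hypothetical names used only for illustration. Comparing every instruction against every earlier one takes n(n-1)/2 comparisons, which grows quadratically.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: one destination register, some sources."""
    dest: str
    srcs: Tuple[str, ...]


def check_dependencies(window):
    """Compare each instruction with every earlier one in the window.

    Returns the number of pairwise comparisons and the list of
    (earlier, later) index pairs that must stay in order.
    """
    comparisons = 0
    deps = []
    for j in range(len(window)):
        for i in range(j):
            comparisons += 1
            earlier, later = window[i], window[j]
            raw = earlier.dest in later.srcs   # read after write
            waw = earlier.dest == later.dest   # write after write
            war = later.dest in earlier.srcs   # write after read
            if raw or waw or war:
                deps.append((i, j))
    return comparisons, deps


window = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),  # reads r1, written by instruction 0
    Instr("r6", ("r7", "r8")),  # independent of the others so far
    Instr("r2", ("r6", "r9")),  # reads r6; also overwrites r2 read by 0
]
comparisons, deps = check_dependencies(window)
print(comparisons, deps)
```

Whether this pairwise work is done by the compiler or by the hardware is the distinction the first column of the table draws.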
5.4 ... The logical procedure to avoid these long durations, consists of
telescoping operations, that is of carrying out simultaneously as
many as possible. ...
Such accelerating, telescoping procedures are being used in all existing
devices. ... However, they save time only at exactly the rate at which
they multiply the necessary equipment, that is the number of elements
in the device: Clearly if a duration is halved by systematically carrying
out two additions at once, double adding equipment will be required (even
assuming that it can be used without disproportionate control facilities
and fully efficiently), etc.
This way of gaining time by increasing equipment is fully justified in
non vacuum tube element devices, where gaining time is of the essence,
and extensive engineering experience is available regarding the handling
of involved devices containing many elements. A really all-purpose automatic
digital computing system constructed along these lines must, according to
all available experience, contain over 10,000 elements.
5.5
For a vacuum tube element device on the other hand, it would seem that the
opposite procedure holds more promise. ...
5.6 ...
Thus it seems worth while to consider the following viewpoint: The device
should be as simple as possible, that is, contain as few elements as
possible. This can be achieved by never performing two operations
simultaneously, if this would cause a significant increase in the number
of elements required. The result will be that the device will work more
reliably and the vacuum tubes can be driven to shorter reaction times
than otherwise.
5.3
At this point there arises another question of principle. In all existing
devices where the element is not a vacuum tube the reaction time of the
element is sufficiently long to make a certain telescoping
of the steps involved in addition, subtraction, and still more in
multiplication and division, desirable. ...
The full parallelism of [the distributed control version of]
ENIAC was seldom used for two reasons. First, hardly any problems
lent themselves to such extensive parallelism of operation. ...
Second, in the case of ENIAC, its slow manual method of problem
setup and its relatively slow input-output operations significantly
decreased the payoff for optimizing parallelism.
... When two or more chains of operation were to proceed in parallel,
and were to be followed by another chain dependent upon them, the
operator would determine the computation time of each of the parallel
chains, and would connect the last program control of the longest
parallel chain to the first program control of the following chain.
It was not always possible, however, to figure out in advance the
computation time for a given chain. ...
In the case of the [divide/square-root] unit, therefore, each
program was provided with an extra program pulse input, called
the "interlock input," ... [to show] that a parallel sequence of
operations was complete ... The card-reader program control also
had an interlock ...
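As a loose modern analogy only, and not a model of the ENIAC hardware, the interlock idea can be sketched with two events: the following chain proceeds only after it has received both its ordinary program pulse and an interlock pulse showing that the parallel chain is complete. The names `run_chain` and `run_with_interlock` and the use of threads are assumptions made for illustration.

```python
import threading


def run_chain(name, steps, done, log):
    """Stand-in for a chain of operations of unknown duration."""
    for _ in range(steps):
        pass
    log.append(name)
    done.set()  # emit this chain's completion pulse


def run_with_interlock(steps_a, steps_b):
    log = []
    program_pulse = threading.Event()    # completion of chain A
    interlock_pulse = threading.Event()  # completion of parallel chain B
    a = threading.Thread(target=run_chain,
                         args=("chain A", steps_a, program_pulse, log))
    b = threading.Thread(target=run_chain,
                         args=("chain B", steps_b, interlock_pulse, log))
    a.start()
    b.start()
    # The following chain waits for both pulses, so neither chain's
    # computation time has to be known in advance.
    program_pulse.wait()
    interlock_pulse.wait()
    log.append("following chain")
    a.join()
    b.join()
    return log


log = run_with_interlock(1000, 100000)
print(log)
```

The order of the first two log entries depends on which chain finishes first, but the following chain always comes last.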
My thanks to David Hemmendinger for pointing me to the Harvard Mark I and other early machines with overlaps among the function units.
mark@cs.clemson.edu