control unit implementation  (more detailed approach in Appendix D)


datapath - CPU registers, ALU, and interconnecting buses

register transfer language (RTL) - precisely describes datapath activities

control signals - enabling signals sent to the datapath that control the
  registers and that select the operation of the ALU

control unit - generates the control signals that activate the datapath

control points - points in the datapath that enable register input/output

rule of shared buses - only one value on a shared bus at a time

tri-state buffer - used on each register output line to connect/disconnect
  a register to a shared bus

     E A | B              |\
    -----+---             | \
     0 0 | Z       A ---->|  >----> B
     0 1 | Z              | /
     1 0 | 0              |/
     1 1 | 1               |
                   E ------'

  E = enable = control signal to output the contents of a register onto a bus
  Z = high-impedance = electrically disconnected

tri-state buffers allow multiple sources to be connected to a single shared
  interconnect (internal bus)

higher-level block diagram using registers as blocks

                              bus
  +------+  n  R1_enable      .-.
  |  R1  |--/------o--------->| |
  +------+                    | |
                              | |
  +------+  n  R2_enable      | |
  |  R2  |--/------o--------->| |
  +------+                    | |
    ...                       | |
                              ...

actually implemented using flip-flops, tri-state buffers, etc.

             R1_enable     bus
    R1           |
  +-----+      .-*        d0  d1  ...
  |+---+|    |\  |        |   |
  ||FF0|-----| >----------*   |
  |+---+|    |/  |        |   |
  |     |      .-*        |   |
  |+---+|    |\  |        |   |
  ||FF1|-----| >----------|---*
  |+---+|    |/  |        |   |
  | ... |       ...      ... ...

loading a register can be controlled by a multiplexer select signal that
  either recirculates the current flip-flop values or selects external
  values (an alternative without muxes is to AND the clock signal to the
  FFs with the select signal)

higher-level block diagram using registers as blocks

          n    R1_in   +------+
  input --/------o---->|  R1  |-->
                       +------+

actually implemented using flip-flops, muxes, etc.

                     R1_in
                       |
           .---------------------.
           |      .----*         |
           |      v    |         |
           |  +------+ |  +---+  |
           `->|2-to-1|--->|FF0|--*-->
  input_0 --->| mux  | |  +---+
              +------+ |
                       |
           .---------------------.
           |      .----*         |
           |      v    |         |
           |  +------+ |  +---+  |
           `->|2-to-1|--->|FF1|--*-->
  input_1 --->| mux  | |  +---+
              +------+ |
                      ...


general register datapath

consider an 8-bit instruction with four 2-bit fields

  opcode, Rsrc1, Rsrc2, Rdest

a three-bus datapath might be structured as follows

      +-------+
      |2-to-4 |<--------------------------------------------- 2-bit
      |decoder|<--------------------------------------------- Rdest
      +-------+
       | | | |
       v | | |    +----+  n
  .----o--------->| R0 |--/--*-----------.
  |      v | |    +----+  n  |           |
  *------o------->| R1 |--/----*-----------.
  |        v |    +----+  n  | |         | |
  *--------o----->| R2 |--/------*-----------.
  |          v    +----+  n  | | |       | | |
  *----------o--->| R3 |--/--------*-----------.
  |               +----+     | | | |     | | | |
  |                          | | | |     v v v v
  |                          | | | |    +-------+
  |                          | | | |    |4-to-1 |<----------- 2-bit
  |                          | | | |    |  mux  |<----------- Rsrc2
  |                          v v v v    +-------+
  |                         +-------+       |
  |                         |4-to-1 |<----------------------- 2-bit
  |                         |  mux  |<----------------------- Rsrc1
  |                         +-------+       |
  |                             |           |
  |                             / n         / n
  |                             v           v
  |                          +------     ------+  +-------+
  |                           \     \___/     /<--|       |
  |                            \             /<---|2-to-4 |<-- 2-bit
  |                             \    ALU    /<----|decoder|<-- opcode
  |                              \_________/<-----|       |
  |      n bits wide                  |           +-------+
  `------------/----------------------'


timing

  +----------+                    +---------------+        +-------------+
  |  source  |--|>-------bus------| combinational |--------| destination |
  | register |  |                 |     logic     |     .--|  register   |
  +----------+  |                 +---------------+     |  +-------------+
              R_out                                    R_in

  |<---propagation---->|<----logic---->|<--setup-->|<--hold-->|
  |   delay for bus          delay         time        time   |
  |                                                           |
  |<----------------minimum clock cycle time----------------->|

for clock timing, find the delay along the longest active path in the
  datapath; the component delays can typically be identified separately,
  as in the diagram above

setup and hold times - define how long the input to a register must remain
  valid (see Figure C.8.6)

if each identified component of the delay time above is, e.g., 1 nsec, then
  the minimum clock
  cycle time would be 4 nsec and the maximum clock rate would be
  1/(4 nsec) = 250 MHz


an example of a simple datapath and control

                          .-.   +-------------+           incrementer will
  .-----------------------| |-->| incrementer |--.        always increment
  | R1_in +------+ R1_out | |   +-------------+  o W_in   value on bus
  `---o-->|  R1  |---o--->| |                    v
          +------+        | | W_out          +-------+
  .-----------------------| |<--o------------|   W   |
  | R2_in +------+ R2_out | |                +-------+
  `---o-->|  R2  |---o--->| |                             each line (and
          +------+        | |-----------------------.     the bus) is
  .-----------------------| |                       |     n bits wide
  | R3_in +------+ R3_out | | Y_in +-------+        |
  `---o-->|  R3  |---o--->| |--o-->|   Y   |        |
          +------+        | |      +-------+        |
  .-----------------------| |          |            |
  | R4_in +------+ R4_out | |          v            v
  `---o-->|  R4  |---o--->| |       -------      -------
          +------+        | |        \     \____/     /
                          | |         \              /    adder will always
            ...           | |          \   adder    /     add value in Y
                          | |           \__________/      with value on bus
                          | |                 |
                          | |                 o Z_in
                          | |                 v
                          | | Z_out       +-------+
                          | |<--o---------|   Z   |
                          `-'             +-------+
                          bus

to implement R[3] <- R[1] + R[2] + 1

  step-by-step RTL        corresponding control signal sequence
  --------------------    -------------------------------------
  1) W <- R[2] + 1;       1) W_in, R[2]_out;   // W_in subsumes the +1
  2) Y <- W;              2) Y_in, W_out;
  3) Z <- R[1] + Y;       3) Z_in, R[1]_out;   // Z_in subsumes the add
  4) R[3] <- Z;           4) R[3]_in, Z_out;

  step one accomplishes the +1 in a single step
  step two "stages" one operand for the add into the Y register
  step three provides the second operand for the add using the bus
  step four stores the addition result back into a general register


a memory bus is different from an internal bus

internal bus - data lines only
             - provides interconnection among registers and functional units

memory bus - address, data, and control (read/write) lines
           - connected on one end to MAR and MDR in the CPU and on the
             other end to the main memory


instruction execution on simple computer with single internal bus
(see http://www.cs.clemson.edu/~mark/uprog.html and section 7.2 in text)

          +-------+             +-------+            +-------+
  pcincr->|  PC   |             |  MAR  |            |  ACC  |(=0)-> acceq0
          +-------+             +-------+            +-------+
            |   ^                   ^                  |   ^
     PC_out o   o PC_in      MAR_in o          ACC_out o   o ACC_in
            v   |                   |                  v   |
  ------------*---------------------*--------------------*------------------
            ^   |                 ^   |                  |          ^
     IR_out o   o IR_in   MDR_out o   o MDR_in           |          o TEMP_out
            |   v                 |   v                  |          |
          +-------+             +-------+                |          |
          |  IR   |             |  MDR  |-----.          |          |
          +-------+             +-------+     v          v          |
                                           -------    -------       |
                                            \     \__/     /        |
  (memory signals)                           \            /         |
    read                             aluadd-->\   ALU    /          |
    write                                      \________/           |
                                                   |                |
  (timing signals)                                 v                |
    T0, T1, ...                                 +------+            |
                                                | TEMP |------------'
                                                +------+

load instruction:    T0: PC_out, MAR_in
                     T1: read, pcincr
                     T2: MDR_out, IR_in
                     T3: time step for decoding opcode in IR
                     T4: IR_out(addr part), MAR_in
                     T5: read
                     T6: MDR_out, ACC_in, reset to T0

add instruction:     T0: PC_out, MAR_in
                     T1: read, pcincr
                     T2: MDR_out, IR_in
                     T3: time step for decoding opcode in IR
                     T4: IR_out(addr part), MAR_in
                     T5: read
                     T6: ACC_out, aluadd
                     T7: TEMP_out, ACC_in, reset to T0

store instruction:   T0: PC_out, MAR_in
                     T1: read, pcincr
                     T2: MDR_out, IR_in
                     T3: time step for decoding opcode in IR
                     T4: IR_out(addr part), MAR_in
                     T5: ACC_out, MDR_in
                     T6: write, reset to T0

brz instruction:     T0: PC_out, MAR_in
(branch on zero)     T1: read, pcincr
                     T2: MDR_out, IR_in
                     T3: time step for decoding opcode in IR
                     T4: if (acceq0) then { IR_out(addr part), PC_in }
                     T5: reset to T0


hardwired implementation (see section 7.4 in text)

develop an inverted table showing when each control signal is active

  PC_out = T0
  MAR_in = T0 + load*T4 + add*T4 + store*T4
  ...

implement these logic expressions with random logic or with a PLA

                    instruction register
                      +--------+-----+
                      | opcode | ... |
                      +--------+-----+
                          | |
                      +--------------+
                      |2-to-4 decoder|
                      +--------------+
                        |   |   |   |    load,add,store,brz
                        v   v   v   v
       +------+      +----------------+
  clk->| ring |->T0->|                |  random logic
       | cntr |->T1->| implementation |      or
       |      | ...  |                |      PLA
       +------+      +----------------+
                        |   |  ...  |
                        v   v       v
                       control signals


microprogrammed implementation (see section 7.5 in text)

control store contents

 addr  control signals   next addr  or    control signal names
     +-----------------+-----------+---+
  0  | 000001000100000 | 0010 (=2) | 0 |  MAR_in, PC_out          decoding
  1  | 000010001000000 | 0000 (=0) | 0 |  IR_out, PC_in            table
  2  | 000000000011000 | 0011 (=3) | 0 |  pc_incr, read           op addr
  3  | 000100010000000 | 0100 (=4) | 0 |  IR_in, MDR_out            +----+
  4  | 000000000000001 | 0000 (=0) | 0 |  br_table -------------> 00|0101|
  5  | 000011000000000 | 0110 (=6) | 0 |  IR_out, MAR_in          01|1000|
  6  | 000000000001000 | 0111 (=7) | 0 |  read                    10|1100|
  7  | 100000010000000 | 0000 (=0) | 0 |  ACC_in, MDR_out         11|1111|
  8  | 000011000000000 | 1001 (=9) | 0 |  IR_out, MAR_in            +----+
  9  | 000000000001000 | 1010 (=a) | 0 |  read
  a  | 011000000000000 | 1011 (=b) | 0 |  ACC_out, alu_add
  b  | 100000000000100 | 0000 (=0) | 0 |  ACC_in, TMP_out
  c  | 000011000000000 | 1101 (=d) | 0 |  IR_out, MAR_in
  d  | 010000100000000 | 1110 (=e) | 0 |  ACC_out, MDR_in
  e  | 000000000000010 | 0000 (=0) | 0 |  write
  f  | 000000000000000 | 0000 (=0) | 1 |  or_addr  (low bit of next addr is
     +-----------------+-----------+---+           ORed with ACC==0 condition)

example trace of steps

  cycle   PC   IR  MAR  MDR  ACC  TMP  CSAR  CSIR
  ---------------------------------------------------------------------------
     1:    0    0    0    0    0    0     0  00000100010000020  MAR_in PC_out
     2:    0    0    0    0    0    0     2  00000000001100030  pc_incr read
     3:    1    0    0    0    0    0     3  00010001000000040  IR_in MDR_out
     4:    1    0    0    0    0    0     4  00000000000000100  br_table
     5:    1    0    0    0    0    0     5  00001100000000060  IR_out MAR_in
     6:    1    0    0    0    0    0     6  00000000000100070  read
     7:    1    0    0    0    0    0     7  10000001000000000  ACC_in MDR_out
  ---------------------------------------------------------------------------
     8:    1    0    0    0    0    0     0  00000100010000020  MAR_in PC_out
     9:    1    0    1    0    0    0     2  00000000001100030  pc_incr read
    10:    2    0    1  c12    0    0     3  00010001000000040  IR_in MDR_out
    11:    2  c12    1  c12    0    0     4  00000000000000100  br_table
    12:    2  c12    1  c12    0    0     f  00000000000000001  or_addr
    13:    2  c12    1  c12    0    0     1  00001000100000000  IR_out PC_in
  ---------------------------------------------------------------------------
  ...
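

the control store sequencing above can be checked with a short simulation.
the following is a minimal Python sketch, not part of the original notes: it
copies the control store and decoding table from above and steps CSAR through
one instruction. the bit-to-signal ordering in SIGNALS is inferred from the
table rows, and the names SIGNALS, CONTROL_STORE, DECODE_TABLE,
active_signals, and run_instruction, as well as the convention of stopping
when CSAR returns to 0, are assumptions made for illustration. it models only
the sequencing, not the datapath registers.

```python
# Control-signal names by bit position, left to right.
# ASSUMPTION: this ordering is inferred from the rows of the control store table.
SIGNALS = [
    "ACC_in", "ACC_out", "alu_add", "IR_in", "IR_out", "MAR_in", "MDR_in",
    "MDR_out", "PC_in", "PC_out", "pc_incr", "read", "TMP_out", "write",
    "br_table",
]

# (control bits, next address, or_addr flag), indexed by control store address.
CONTROL_STORE = [
    ("000001000100000", 0x2, 0),  # 0: MAR_in, PC_out
    ("000010001000000", 0x0, 0),  # 1: IR_out, PC_in
    ("000000000011000", 0x3, 0),  # 2: pc_incr, read
    ("000100010000000", 0x4, 0),  # 3: IR_in, MDR_out
    ("000000000000001", 0x0, 0),  # 4: br_table
    ("000011000000000", 0x6, 0),  # 5: IR_out, MAR_in
    ("000000000001000", 0x7, 0),  # 6: read
    ("100000010000000", 0x0, 0),  # 7: ACC_in, MDR_out
    ("000011000000000", 0x9, 0),  # 8: IR_out, MAR_in
    ("000000000001000", 0xA, 0),  # 9: read
    ("011000000000000", 0xB, 0),  # a: ACC_out, alu_add
    ("100000000000100", 0x0, 0),  # b: ACC_in, TMP_out
    ("000011000000000", 0xD, 0),  # c: IR_out, MAR_in
    ("010000100000000", 0xE, 0),  # d: ACC_out, MDR_in
    ("000000000000010", 0x0, 0),  # e: write
    ("000000000000000", 0x0, 1),  # f: or_addr
]

# Decoding table: 2-bit opcode -> first control store address of its routine.
DECODE_TABLE = {0b00: 0x5, 0b01: 0x8, 0b10: 0xC, 0b11: 0xF}


def active_signals(bits):
    """Return the names of the control signals whose bit is 1."""
    return [name for name, bit in zip(SIGNALS, bits) if bit == "1"]


def run_instruction(opcode, acc_is_zero):
    """Step CSAR from 0 until it returns to 0; return [(CSAR, signals), ...]."""
    csar = 0
    steps = []
    while True:
        bits, next_addr, or_addr = CONTROL_STORE[csar]
        signals = active_signals(bits)
        if or_addr:
            signals.append("or_addr")
        steps.append((csar, signals))
        if "br_table" in signals:
            csar = DECODE_TABLE[opcode]            # dispatch on the opcode
        elif or_addr:
            csar = next_addr | int(acc_is_zero)    # OR ACC==0 into the low bit
        else:
            csar = next_addr
        if csar == 0:                              # back at the fetch routine
            return steps


if __name__ == "__main__":
    for csar, signals in run_instruction(0b11, acc_is_zero=True):
        print(format(csar, "x"), " ".join(signals))
```

for example, run_instruction(0b11, True) visits control store addresses
0, 2, 3, 4, f, 1, which matches the CSAR column in cycles 8 through 13 of the
trace; with acc_is_zero=False the final step at address 1 is skipped, so PC
is not overwritten and the branch is not taken.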