System diagram for a PC (see also Figure 6.9)

                            +------------+
                            |    DRAM    |
                            |   memory   |
     processor              +------------+
  e.g., Pentium 4               ||  ||           +-----------+
+-----------------+         +------------+       | graphics  |   +-------+
|+--------+ +----+|   FSB   |            |=======|accelerator|===|monitor|
||backside|=|CPU ||=========|North-Bridge|  AGP  |(with local|   +-------+
||L2 cache|=|core|| 800 MHz | controller |       |  memory)  |
|+--------+ +----+|         |            |       +-----------+
+-----------------+         |(memory hub)|
                            |            |====== 1 GBit Ethernet
                            +------------+
                                  ||
                            +------------+
      serial ATA            |            |   South-Bridge has dedicated ports
       disk ================|            |   for legacy devices like mouse,
                            |South-Bridge|   keyboard, floppy; also has the
      parallel ATA          | controller |   serial and parallel ports
      CD/DVD ===============|            |
                            | (I/O hub)  |   South-Bridge typically contains
      10 MBit Ethernet =====|            |   flash-eprom holding the BIOS,
                            |            |   real-time clock, CMOS memory
      PCI bus ==============|            |   (w/ independent battery backup)
                            +------------+

  note that recent processors have incorporated North-Bridge functions

  also note that recent South-Bridge chips have extended functions, such as
    including a RAID controller


Memory terms (see section B.9)

  SRAM - static RAM
    6 transistors per bit
    used for caches

  DRAM - dynamic RAM
    1 transistor per bit (capacitor charge needs periodic refreshing)
    physically smaller, less expensive, less power, cooler than SRAM
    used for main memory


DRAM packaging

  SIMM - single inline memory module - 32-bit-wide memory width
    (72 pins, 36 bits using parity)

  DIMM - dual inline memory module - 64-bit-wide memory width
    (168 pins, 72 bits using ECC)
    (note: DDR2, DDR3, DDR4 SDRAM DIMM packages use up to 288 pins)


Main memory technology

  memory cell array
    word line - row decoder used to enable access to all bits in given row
    bit line  - column decoder

                     bit
                      |
                      |
    word -------------*-----------------------------
                      |         |
                      *----transistor----capacitor----supply voltage
                      |

  example 4 Mbit DRAM (Figure B.9.6)

                            row decoder  memory cell array
                                +-+       +-------------+
RAS --------------------------->| |------>|             |
                .--------/----->| |  ...  |   2K x 2K   |
                | high 11 bits  | |------>|             |
--/-------------<               +-+       +-------------+
11-bit address  |                           | | ... | |
                |                         +-------------+
                |                         | sense/write |<-- R/W
                |                         +-------------+
                |                           | | ... | |
                |                   +-+     | | ... | |
                | low 11 bits       | |-->+-------------+
                `-------/---------->| |...|     row     |
CAS ------------------------------->| |-->+-------------+
                                    +-+          ^
                                 column          |
                                 decoder         v
                                             data bit

  square (n x n) array organization to reduce the amount of sense/write
    and decoding circuitry

  can also multiplex fewer address lines into the chip
    1) pass row address when RAS signal asserted (row address strobe)
    2) pass column address when CAS signal asserted (column address strobe)

  (note in DRAM terminology, row is also called a page)

  bus cycles        -  |-----|-- ... wait states ... --|-----|

  bus control       -  read                              ack
  shared lines      -  addr                              data
  memory controller -  latch addr    RAS      CAS        transfer
    actions
  DRAM actions      -                read     select
                                     row      cols


faster DRAM

  fast page mode - memory controller or DRAM chip itself strobes columns
    (multiple CAS selections after one RAS selection)

      bus control  - read                 ack
      shared lines - addr                 data data data data
      DRAM actions -       RAS       CAS  CAS  CAS  CAS

  burst transfer - same effect from data in a saved buffer (but without
    repeated CAS selections)

  word-by-word bus transfer timing would be 2+w cycles per word (w = wait)

      ---- word 0 ----   ---- word 1 ----   ---- word 2 ----
    /     transfer    \/     transfer    \/     transfer    \
    +-----+      +-----+-----+      +-----+-----+      +-----+
    |addr0|.wait.|data0|addr1|.wait.|data1|addr2|.wait.|data2| ...
    +-----+      +-----+-----+      +-----+-----+      +-----+

  burst timing is (2+w)-1-1-1 (with zero wait states within burst)

      ---- four-word burst transfer ----      requires four-way interleaving
    /                                    \    or automatic column switching
    +-----+      +-----+-----+-----+-----+
    |addr0|.wait.|data0|data1|data2|data3|    typically sized to match the
    +-----+      +-----+-----+-----+-----+    cache line size


DRAM bus cycles

    1 - cpu to memory controller
    1 - lookup in controller memory map / arbitrate
    1 - controller to DRAM chips
    2 - DRAM chip response (2 for row hit, 4 for row miss)
    1 - back through memory controller
    1 - memory controller to cpu
   ---
    7

  burst mode cycles
    row hit   7-1-1-1  (10 bus cycles for four 64-bit doublewords vs. 28)
    row miss  9-1-1-1

  cache miss on a system with a 100 MHz bus (10ns / bus cycle) can require
    over a hundred processor cycles (1 GHz CPU => 100-120 cycle miss penalty)


Synchronous DRAM (SDRAM)

  synchronous with bus clock rather than asynchronous handshaking
  64-bit wide data path - standard DIMM packaging
  burst length is programmable

  memory chips from different manufacturers were not interchangeable
    so Intel drew up strict specifications for 100MHz bus - PC100

  c-d-p-r-T timing notation for SDRAM (in clock cycles)
    c - CAS latency
    d - RAS to CAS delay
    p - row precharge
    r - row active
    T - total latency


DDR RAM

  DDR - double data rate - two words are read/written at a time vs. only
    one in plain SDRAM; data is transferred on both leading and trailing
    edges of bus clock signal

  DDR2 - four words are read/written at a time vs. only one in plain SDRAM;
    bus is clocked at twice the rate that memory cells are clocked

  now DDR3 and DDR4

  e.g., DDR4-2133 SDRAM - has 17 GB/sec bandwidth

  memory controller may be dual or triple channel - two or three memory
    banks, each accessed in parallel by controller (another way of
    increasing the bandwidth)


Flash memory

  like EEPROM (electrically erasable programmable ROM)
  nonvolatile
  immune to vibration
  deteriorates after a million writes or so (insulating oxide layer
    around gates breaks down)


Overview of new storage technologies

  Burr, et al., "Overview of candidate device technologies for
  storage-class memory," IBM Journal of Research and Development,
  v. 52, no. 4/5, 2008.
  http://www.research.ibm.com/journal/rd/524/burr.html

  flash
  SONOS flash (silicon-oxide-nitride-oxide-silicon flash)
  nanocrystal flash
  FeRAM (ferroelectric RAM)
  FeFET
  MRAM (magnetic RAM)
  racetrack
  PCRAM (phase-change RAM)
  RRAM (resistive RAM)
  solid electrolyte
  organic memory
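

Worked arithmetic for the DRAM figures in these notes

  The row/column address split of the 4 Mbit DRAM example, the burst timings
  (7-1-1-1 and 9-1-1-1), the miss penalty, and the DDR4-2133 bandwidth figure
  can all be checked with a few lines of arithmetic. The following is only a
  minimal sketch: the function names are hypothetical, the 8-byte transfer
  width is an assumption taken from the 64-bit DIMM data path, and 1 GB is
  assumed to mean 10^9 bytes.

```python
def split_dram_address(addr, row_bits=11, col_bits=11):
    """Split a cell address into (row, column) for a multiplexed DRAM.

    Defaults model the 2K x 2K array above: the high 11 bits go to the
    row decoder (sent with RAS), the low 11 bits to the column decoder
    (sent with CAS).
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range for this array")
    row = addr >> col_bits                 # high bits, row address
    col = addr & ((1 << col_bits) - 1)     # low bits, column address
    return row, col


def burst_cycles(first, words=4, per_extra_word=1):
    """Bus cycles for one burst, e.g. 7-1-1-1 is 7 + 3 * 1."""
    return first + (words - 1) * per_extra_word


def word_by_word_cycles(first, words=4):
    """Bus cycles when every word pays the full first-word latency."""
    return first * words


def miss_penalty_cpu_cycles(bus_cycles, bus_mhz, cpu_mhz):
    """Convert a bus-cycle count into processor cycles."""
    return bus_cycles * cpu_mhz // bus_mhz


def peak_bandwidth_gb_per_s(mega_transfers_per_s, bus_bytes=8):
    """Peak transfer rate, assuming bus_bytes per transfer (assumption)."""
    return mega_transfers_per_s * bus_bytes / 1000


if __name__ == "__main__":
    print(split_dram_address(0x3FFFFF))            # last cell in the array
    print(burst_cycles(7), word_by_word_cycles(7)) # row hit: burst vs. not
    print(miss_penalty_cpu_cycles(burst_cycles(7), 100, 1000))
    print(miss_penalty_cpu_cycles(burst_cycles(9), 100, 1000))
    print(peak_bandwidth_gb_per_s(2133))
```

  With these assumptions the row-hit burst comes to 10 bus cycles against 28
  for word-by-word transfer, the miss penalty on a 100 MHz bus with a 1 GHz
  CPU comes to 100 cycles (row hit) or 120 cycles (row miss), and DDR4-2133
  comes to roughly 17 GB/sec, matching the figures quoted in the notes.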