IBM Power5 Processor

The Power5 processor has the same instruction pipeline as the Power4 but adds simultaneous multithreading (SMT) and a different memory hierarchy. (There are also improvements to power management, reliability/availability/serviceability (RAS), etc.) This page highlights the Power5's memory hierarchy.


Memory hierarchy in general


Cache comparisons (Power5 vs. other processors)


Details on Power5 memory hierarchy levels

            cycles  lmbench
level       (2004)  (2005)   size             line size  associativity  write policy   sharing                transfer rate
----------  ------  -------  ---------------  ---------  -------------  -------------  ---------------------  ---------------
registers   1                32+32 (120+120)                                           per thread (per core)
L1 Icache   1                64 KB            128 bytes  2-way                         per core
L1 Dcache   2       2        32 KB            128 bytes  4-way          write-through  per core               2 words/cycle
L2 cache    13      32       1.875 MB         128 bytes  10-way         write back     shared                 4 words/cycle
L3 cache    87      92       36 MB            256 bytes  12-way         write back     shared                 < 1 word/cycle
memory      220     403      4 GB per node
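
As a rough illustration of how the latencies in the table combine, the sketch below computes an average load latency from the cycles (2004) column. The hit fractions are hypothetical placeholders, not measurements, and the model assumes that each listed latency is the total cost of a load satisfied at that level. The function name average_load_latency is only illustrative, and effects such as prefetching and SMT contention are ignored.

```python
# Latencies in cycles, copied from the "cycles (2004)" column of the table.
LATENCY_CYCLES = {"L1 Dcache": 2, "L2 cache": 13, "L3 cache": 87, "memory": 220}
CACHE_LEVELS = ("L1 Dcache", "L2 cache", "L3 cache")


def average_load_latency(hit_fraction):
    """Average cycles per load for the given per-level hit fractions.

    hit_fraction[level] is the fraction of loads *reaching* that level
    that hit there (a local hit rate). Loads that miss every cache are
    charged the memory latency.
    """
    remaining = 1.0   # fraction of loads not yet satisfied
    total = 0.0
    for level in CACHE_LEVELS:
        satisfied = remaining * hit_fraction[level]
        total += satisfied * LATENCY_CYCLES[level]
        remaining -= satisfied
    total += remaining * LATENCY_CYCLES["memory"]
    return total


# HYPOTHETICAL hit fractions, chosen only to exercise the arithmetic.
example = {"L1 Dcache": 0.90, "L2 cache": 0.80, "L3 cache": 0.50}
print(f"{average_load_latency(example):.2f} cycles per load")
```

Even with these made-up fractions, the small share of loads that reaches memory contributes more to the average than all L1 hits together, which is why the lower levels of the hierarchy matter for performance.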

All caches on the Power5 use LRU replacement.
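
To make the replacement policy concrete, here is a minimal model of textbook LRU within a single set of an N-way set-associative cache. It is a sketch only: the class name LRUSet and its interface are invented for illustration, and this page does not say whether the hardware keeps exact LRU ordering or an approximation of it.

```python
from collections import OrderedDict


class LRUSet:
    """One set of an N-way set-associative cache with exact LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        # Keys are line tags, ordered from least to most recently used.
        self.lines = OrderedDict()

    def access(self, tag):
        """Touch a line; return (hit, evicted_tag_or_None)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # now the most recently used
            return True, None
        evicted = None
        if len(self.lines) >= self.ways:
            # Set is full: drop the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted


# A 4-way set, matching the associativity listed for the L1 Dcache.
s = LRUSet(ways=4)
for tag in (0, 1, 2, 3):
    s.access(tag)          # four cold misses fill the set
s.access(0)                # hit: tag 0 becomes most recently used
hit, victim = s.access(4)  # miss: the least recently used tag is evicted
print(hit, victim)
```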


set/line/associativity/slice organization

  (note 1: the directory bits needed for valid/tag/writeback are not shown below)
  (note 2: some books use "bank" to mean what the IBM authors have called a slice;
     slices are used to avoid L2 access conflicts between the two CPUs)


L2     one of three slices (640 KB per slice)
         __________________________________     N-way set associativity
        /                                  \      => N banks operating
         bank 0     bank 1          bank 9           in parallel
        +-------+  +-------+       +-------+
     0  | line  |  |       |  ...  |       |    512 lines/bank
     1  |       |  |       |       |       |     * 128 bytes/line
   ...                                           * 10 banks/slice
   511  |       |  |       |       |       |     * 3 slices
        +-------+  +-------+       +-------+    = 1920 KB = 1.875 MB
        |<-128->|


L3                one of three slices (12 MB per slice)
         _______________________________________________________
        /                                                       \
             bank 0            bank 1                bank 11
        +--------------+  +--------------+       +--------------+
     0  |  sb0 / sb1   |  |              |  ...  |              |
     1  |              |  |              |       |              |  4096 lines/bank
     2  |              |  |              |       |              |   * 256 bytes/line
   ...                                                              * 12 banks/slice
  4094  |              |  |              |       |              |   * 3 slices
  4095  |              |  |              |       |              |  = 36 MB
        +--------------+  +--------------+       +--------------+
        |<----256----->|
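
The capacity arithmetic in the two diagrams can be checked with a short script. This is a sketch: the CacheGeometry class is invented for illustration, and the bit counts assume plain power-of-two indexing within a slice. How the hardware selects a slice from an address is not described on this page and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    lines_per_bank: int    # rows in the diagram (one set per row)
    line_bytes: int
    banks_per_slice: int   # one bank per way of associativity
    slices: int

    @property
    def slice_bytes(self):
        return self.lines_per_bank * self.line_bytes * self.banks_per_slice

    @property
    def total_bytes(self):
        return self.slice_bytes * self.slices

    @property
    def offset_bits(self):
        # Bits needed to address a byte within a line (line size is a power of 2).
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self):
        # Bits needed to select a row (set) within a slice.
        return self.lines_per_bank.bit_length() - 1


KB, MB = 1024, 1024 * 1024

# Numbers taken directly from the diagrams.
L2 = CacheGeometry(lines_per_bank=512, line_bytes=128, banks_per_slice=10, slices=3)
L3 = CacheGeometry(lines_per_bank=4096, line_bytes=256, banks_per_slice=12, slices=3)

for name, g in (("L2", L2), ("L3", L3)):
    print(name, g.slice_bytes / KB, "KB/slice,", g.total_bytes / MB, "MB total,",
          g.offset_bits, "offset bits,", g.index_bits, "index bits")
```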


Power5 performance considerations


Other general (non-Power5-specific) memory hierarchy issues


Power5 links


Power4 optimization (Power5 has same instruction pipeline as Power4)


last updated: March 2006
Mark Smotherman, Dept. of Computer Science, Clemson University