The Power5 processor has the same instruction pipeline as the Power4, but adds simultaneous multithreading (SMT) and has a different memory hierarchy. (There are also improvements to power management, reliability/availability/serviceability (RAS), and other areas.) This page highlights the Power5's memory hierarchy.
level | latency in cycles (2004) | latency per lmbench (2005) | size | line size | associativity | write policy | sharing | transfer rate |
---|---|---|---|---|---|---|---|---|
registers | 1 | | 32+32 (120+120) | | | | per thread (per core) | |
L1 Icache | 1 | | 64 KB | 128 bytes | 2-way | | per core | |
L1 Dcache | 2 | 2 | 32 KB | 128 bytes | 4-way | write-through | per core | 2 words/cycle |
L2 cache | 13 | 32 | 1.875 MB | 128 bytes | 10-way | write back | shared | 4 words/cycle |
L3 cache | 87 | 92 | 36 MB | 256 bytes | 12-way | write back | shared | < 1 word/cycle |
memory | 220 | 403 | 4 GB per node | | | | | |

All caches on the Power5 use LRU replacement.
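
As a rough illustration of what LRU replacement means within a single set, the sketch below models one 4-way set (the associativity listed for the L1 Dcache). The class name `LRUSet`, the helper `replay`, and the tags `A` through `E` are hypothetical; hardware tracks recency with status bits rather than an ordered list, so this only shows the order in which lines would be replaced.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines and least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, ordered oldest to newest

    def access(self, tag):
        """Touch `tag`; return (hit, evicted_tag_or_None)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) == self.ways:
            evicted, _ = self.lines.popitem(last=False)  # drop the LRU line
        self.lines[tag] = None
        return False, evicted


def replay(ways, tags):
    """Run a sequence of tags through one set and collect the results."""
    s = LRUSet(ways)
    return [s.access(t) for t in tags]


if __name__ == "__main__":
    # Hypothetical tags that all map to the same 4-way set.
    for tag, (hit, evicted) in zip("ABCDAEB", replay(4, "ABCDAEB")):
        print(tag, "hit" if hit else "miss", "evicts", evicted)
```

In this trace the second access to `A` hits and refreshes its position, so the later miss on `E` evicts `B` (the least recently used line) rather than `A`.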
set/line/associativity/slice organization

(note 1: the directory bits needed for valid/tag/writeback are not shown below)

(note 2: some books use "bank" to mean what the IBM authors have called a slice; slices are used to avoid L2 access conflicts between the two CPUs)

    L2      one of three slices (640 KB per slice)
          __________________________________       N-way set associativity
         /                                  \       => N banks operating
          bank 0    bank 1         bank 9              in parallel
         +-------+ +-------+     +-------+
       0 | line  | |       | ... |       |         512 lines/bank
       1 |       | |       |     |       |       * 128 bytes/line
     ...                                         * 10 banks/slice
     511 |       | |       |     |       |       * 3 slices
         +-------+ +-------+     +-------+       = 1920 KB = 1.875 MB
         |<-128->|

    L3      one of three slices (12 MB per slice)
          _______________________________________________________
         /                                                       \
              bank 0           bank 1                bank 11
         +--------------+ +--------------+     +--------------+
       0 |  sb0 / sb1   | |              | ... |              |
       1 |              | |              |     |              |     4096 l/b
       2 |              | |              |     |              |   * 256 B/l
     ...                                                          * 12 b/s
    4094 |              | |              |     |              |   * 3 s
    4095 |              | |              |     |              |   = 36 MB
         +--------------+ +--------------+     +--------------+
         |<----256----->|

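
The per-slice figures in the L2 and L3 diagrams can be cross-checked against the table with a little arithmetic. The following is a minimal sketch; the helper names `sets_per_cache` and `capacity_bytes` are made up for illustration, and directory bits are ignored, as in the diagrams.

```python
KB = 1024
MB = 1024 * KB


def sets_per_cache(size_bytes, line_bytes, ways):
    """Number of sets = capacity / (line size * associativity)."""
    return size_bytes // (line_bytes * ways)


def capacity_bytes(lines_per_bank, line_bytes, banks_per_slice, slices):
    """Capacity as drawn: lines/bank * bytes/line * banks/slice * slices."""
    return lines_per_bank * line_bytes * banks_per_slice * slices


# Values taken from the table and the L2/L3 diagrams.
L2_BYTES = capacity_bytes(512, 128, 10, 3)
L3_BYTES = capacity_bytes(4096, 256, 12, 3)

if __name__ == "__main__":
    print("L2:", L2_BYTES // KB, "KB =", L2_BYTES / MB, "MB")
    print("L3:", L3_BYTES // MB, "MB")
    print("L2 sets:", sets_per_cache(L2_BYTES, 128, 10))
    print("L3 sets:", sets_per_cache(L3_BYTES, 256, 12))
    print("L1 Dcache sets:", sets_per_cache(32 * KB, 128, 4))
    print("L1 Icache sets:", sets_per_cache(64 * KB, 128, 2))
```

With these inputs the L2 works out to 1536 sets (512 per slice) and the L3 to 12288 sets (4096 per slice), which agrees with the row counts drawn in the diagrams.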
Example loop nest (row sums of a, with the outer loop parallelized by the c$doacross directive):

          integer m, n, i, j
          real a(m,n), s(m)
    c$doacross local(i,j), shared(s,a)
          do i = 1,m
             s(i) = 0.0
             do j = 1, n
                s(i) = s(i) + a(i,j)
             enddo
          enddo

last updated: March 2006
Mark Smotherman, Dept. of Computer Science, Clemson University