The Power5 processor has the same instruction pipeline as the Power4, but adds simultaneous multithreading (SMT) and has a different memory hierarchy. (There are also improvements to power management, RAS, etc.) This page highlights the Power5's memory hierarchy.
| | cycles (2004) | lmbench (2005) | size | line size | associativity | write policy | sharing | transfer rate |
|---|---|---|---|---|---|---|---|---|
| registers | 1 | | 32+32 (120+120) | | | | per thread (per core) | |
| L1 Icache | 1 | | 64 KB | 128 bytes | 2-way | | per core | |
| L1 Dcache | 2 | 2 | 32 KB | 128 bytes | 4-way | write-through | per core | 2 words/cycle |
| L2 cache | 13 | 32 | 1.875 MB | 128 bytes | 10-way | write-back | shared | 4 words/cycle |
| L3 cache | 87 | 92 | 36 MB | 256 bytes | 12-way | write-back | shared | < 1 word/cycle |
| memory | 220 | 403 | 4 GB per node | | | | | |
All caches on the Power5 use LRU replacement.
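The size, line size, and associativity columns together fix the number of sets in each cache, since sets = capacity / (line size × ways). The following is a minimal sketch of that arithmetic using the table's values; the names `CACHES`, `num_sets`, and `SETS` are illustrative only, and the L2 and L3 figures are totals across all three slices.

```python
# Cache parameters copied from the table: (capacity in bytes, line size in bytes, ways).
KB = 1024
MB = 1024 * KB

CACHES = {
    "L1 Icache": (64 * KB, 128, 2),
    "L1 Dcache": (32 * KB, 128, 4),
    "L2 cache": (1920 * KB, 128, 10),  # 1.875 MB
    "L3 cache": (36 * MB, 256, 12),
}


def num_sets(size_bytes, line_bytes, ways):
    """Number of sets = capacity / (line size * associativity)."""
    sets, remainder = divmod(size_bytes, line_bytes * ways)
    if remainder:
        raise ValueError("capacity is not a whole number of sets")
    return sets


SETS = {name: num_sets(*params) for name, params in CACHES.items()}

if __name__ == "__main__":
    for name, sets in SETS.items():
        print(f"{name}: {sets} sets")
```

The L2 and L3 results (1536 and 12288 sets) are three times 512 and 4096, which matches a three-slice organization with 512 and 4096 lines per bank.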
set/line/associativity/slice organization
(note 1: the directory bits needed for valid/tag/writeback are not shown below)
(note 2: some books use "bank" to mean what the IBM authors have called a slice;
slices are used to avoid L2 access conflicts between the two CPUs)
L2    one of three slices (640 KB per slice)
       __________________________________     N-way set associativity
      /                                  \      => N banks operating
       bank 0    bank 1           bank 9           in parallel
      +-------+ +-------+        +-------+
    0 | line  | |       |  ...   |       |     512 lines/bank
    1 |       | |       |        |       |   * 128 bytes/line
  ...                                        *  10 banks/slice
  511 |       | |       |        |       |   *   3 slices
      +-------+ +-------+        +-------+   = 1920 KB = 1.875 MB
      |<-128->|
L3    one of three slices (12 MB per slice)
       _______________________________________________________
      /                                                       \
           bank 0           bank 1                  bank 11
      +--------------+ +--------------+        +--------------+
    0 |  sb0 / sb1   | |              |  ...   |              |
    1 |              | |              |        |              |     4096 lines/bank
    2 |              | |              |        |              |   *  256 bytes/line
  ...                                                             *   12 banks/slice
 4094 |              | |              |        |              |   *    3 slices
 4095 |              | |              |        |              |   = 36 MB
      +--------------+ +--------------+        +--------------+
      |<----256----->|
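The two diagrams can be read as a recipe for splitting an address within one slice: the low bits give the byte offset in the line, and the next bits pick the row (set). The sketch below works through that under simplifying assumptions. It takes plain low-order address bits for the offset and row, and it does not model how the hardware chooses a slice, which these notes do not describe. All function and variable names are hypothetical.

```python
def log2_exact(value):
    """Return log2(value), insisting that value is a power of two."""
    bits = value.bit_length() - 1
    if value <= 0 or (1 << bits) != value:
        raise ValueError(f"{value} is not a power of two")
    return bits


def split_address(addr, lines_per_bank, line_bytes):
    """Split addr into (tag, row, offset) for one slice.

    Assumption: the low-order bits are the byte offset and the next bits
    select the row (set). Slice selection is not modelled.
    """
    offset_bits = log2_exact(line_bytes)
    row_bits = log2_exact(lines_per_bank)
    offset = addr & (line_bytes - 1)
    row = (addr >> offset_bits) & (lines_per_bank - 1)
    tag = addr >> (offset_bits + row_bits)
    return tag, row, offset


def slice_bytes(lines_per_bank, line_bytes, banks):
    """Capacity of one slice: rows * line size * banks (ways)."""
    return lines_per_bank * line_bytes * banks


# Geometry read off the diagrams.
L2_SLICE = dict(lines_per_bank=512, line_bytes=128, banks=10)
L3_SLICE = dict(lines_per_bank=4096, line_bytes=256, banks=12)

if __name__ == "__main__":
    print(slice_bytes(**L2_SLICE) // 1024, "KB per L2 slice")
    print(slice_bytes(**L3_SLICE) // (1024 * 1024), "MB per L3 slice")
    print(split_address(0x12345, 512, 128))
```

Under this model an L2 slice uses 7 offset bits and 9 row bits, and an L3 slice uses 8 offset bits and 12 row bits.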
      integer m, n, i, j
      real a(m,n), s(m)
c     row sums of a; the directive parallelizes the outer loop over i
c$doacross local(i,j), shared(s,a)
      do i = 1,m
         s(i) = 0.0
         do j = 1, n
            s(i) = s(i) + a(i,j)
         enddo
      enddo
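One way to relate this loop to the 128-byte line size in the table is to work out where its accesses land. The sketch below does so under two labeled assumptions: a default REAL occupies 4 bytes, and s(1) starts on a line boundary. Column-major storage of `a` is standard Fortran. The helper names are illustrative only.

```python
REAL_BYTES = 4    # assumption: a default REAL occupies 4 bytes
LINE_BYTES = 128  # L1/L2 line size from the table


def a_byte_offset(i, j, m, elem_bytes=REAL_BYTES):
    """Byte offset of a(i,j) from a(1,1) in column-major order (1-based indices)."""
    return ((j - 1) * m + (i - 1)) * elem_bytes


def inner_loop_stride(m, elem_bytes=REAL_BYTES):
    """Bytes between a(i,j) and a(i,j+1), the step taken by the inner j loop."""
    return a_byte_offset(1, 2, m, elem_bytes) - a_byte_offset(1, 1, m, elem_bytes)


def s_line(i, elem_bytes=REAL_BYTES, line_bytes=LINE_BYTES):
    """Index of the cache line holding s(i), assuming s(1) starts a line."""
    return ((i - 1) * elem_bytes) // line_bytes


if __name__ == "__main__":
    print("elements of s per line:", LINE_BYTES // REAL_BYTES)
    print("inner-loop stride for m = 1000:", inner_loop_stride(1000), "bytes")
```

For m = 1000 the inner loop steps 4000 bytes per iteration, so under these assumptions consecutive accesses to `a` fall in different 128-byte lines, while s(1) through s(32) share a single line even when different processors update them.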
last updated: March 2006
Mark Smotherman, Dept. of Computer Science, Clemson University