Scalable Working Set Estimation Method For Chip Multicores Using Tagged
Bloom Filter And Its Application
Aparna Mandke, Bharadwaj Amrutur, Y.N.Srikant
Abstract:
In chip multicore platforms (CMPs), the leakage power of large on-chip
caches has already become a major component of the power consumed by
the memory subsystem. Leakage power can be saved by switching off
over-allocated ways in an associative cache. However, state-of-the-art
heuristics based on average memory latency or cache miss rate fail to
achieve near-optimal energy savings, either because of the dispersed
nature of large caches or because they are not fast enough to respond to
changes in working set size (WSS), especially when the cache is
over-provisioned. Hence, we first propose a new kind of Bloom filter,
which we call a tagged Bloom filter (TBF). We implement the TBF implicitly
in the last-level cache of a scalable tiled chip multicore platform. The
TBF is then used to estimate the WSS of an application and to switch off
over-allocated cache ways accordingly in Static and Dynamic Non-Uniform
Cache Architectures (SNUCA and DNUCA). In our implementation of
way-adaptable SNUCA and DNUCA caches, the associativity decision is made
locally by each L2 controller, which makes the scheme scalable with the
number of tiles on the CMP. On SNUCA, it achieves on average 22% and 23%
more energy-delay product (EDP) savings than the average memory latency
and cache miss rate heuristics, respectively.
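The abstract does not describe the internal structure of the TBF, so the following is only a minimal sketch of the general idea behind Bloom-filter-based WSS estimation. It uses a plain (untagged) Bloom filter rather than the authors' TBF. During a sampling interval, each accessed cache-line address is hashed into a bit vector. The number of distinct lines is then estimated from the fraction of set bits using the standard formula n ≈ -(m/k) ln(1 - X/m), where m is the number of bits, k is the number of hashes, and X is the number of set bits. A single cache bank keeps just enough ways to hold that estimate, ceil(n / sets). The names BloomWSSEstimator and ways_needed, the SHA-256 double hashing, the 64-byte line size, and the "keep just enough ways" policy are all hypothetical assumptions, not the paper's design.

```python
import hashlib
import math


class BloomWSSEstimator:
    """Plain Bloom filter over cache lines touched in one sampling interval.

    Hypothetical illustration only: this is a standard Bloom filter, NOT the
    tagged Bloom filter (TBF) from the paper, whose structure is not given here.
    """

    def __init__(self, num_bits: int, num_hashes: int, line_size: int = 64):
        self.m = num_bits
        self.k = num_hashes
        self.line_size = line_size          # assumed cache-line size in bytes
        self.bits = bytearray(num_bits)     # one byte per bit, for clarity

    def _positions(self, line_addr: int) -> list[int]:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(line_addr.to_bytes(8, "little")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1   # force odd stride
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def access(self, byte_addr: int) -> None:
        # Repeated accesses to the same line set the same bits, so only
        # distinct lines contribute to the estimate.
        line_addr = byte_addr // self.line_size
        for p in self._positions(line_addr):
            self.bits[p] = 1

    def estimate_lines(self) -> float:
        # Standard estimate of distinct insertions: n ~ -(m/k) * ln(1 - X/m).
        x = sum(self.bits)
        if x == 0:
            return 0.0
        if x >= self.m:
            return float("inf")             # filter saturated
        return -(self.m / self.k) * math.log(1 - x / self.m)

    def reset(self) -> None:
        # Clear the filter at the start of a new sampling interval.
        self.bits = bytearray(self.m)


def ways_needed(estimated_lines: float, num_sets: int,
                max_ways: int, min_ways: int = 1) -> int:
    # Hypothetical local policy for one cache bank: enable just enough ways
    # to hold the estimated working set, clamped to [min_ways, max_ways].
    if math.isinf(estimated_lines):
        return max_ways
    ways = math.ceil(estimated_lines / num_sets)
    return max(min_ways, min(max_ways, ways))


if __name__ == "__main__":
    est = BloomWSSEstimator(num_bits=8192, num_hashes=3)
    for addr in range(0, 300 * 64, 64):     # touch 300 distinct 64-byte lines
        est.access(addr)
        est.access(addr + 8)                # same line again: no new bits
    lines = est.estimate_lines()
    print(f"estimated lines: {lines:.1f}")
    print("ways to keep on:", ways_needed(lines, num_sets=128, max_ways=16))
```

In this toy run the estimate lands close to 300 lines, so a 128-set, 16-way bank would keep about 3 ways powered and could switch off the rest. Because each bank decides from its own filter, no global coordination is needed in this sketch. This mirrors the locality of the decision described above, although the real TBF-based mechanism may differ.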