Efficient Compilation of Stream Programs for Heterogeneous Architectures:
A Model-Checking Based Approach
Rajesh Kumar Thakur, Y. N. Srikant
Abstract:
Stream programming based on the synchronous data flow (SDF) model naturally exposes
data, task and pipeline parallelism. Statically scheduling stream programs for
homogeneous architectures has been an area of extensive research. With graphics
processing units (GPUs) now emerging as general-purpose co-processors, scheduling
and distributing these stream programs onto heterogeneous architectures (having
both GPUs and CPUs) is a challenging research problem. Exploiting the abundant
parallelism in such hardware while providing a scalable solution is hard.
In this paper we describe a coarse-grained software-pipelined scheduling algorithm
for stream programs that statically schedules a stream graph onto heterogeneous
architectures. We formulate the problem of partitioning the work between the CPU
cores and the GPU as a model-checking problem. The partitioning process takes into
account the costs of the required buffer layout transformations associated with the
partitioning and the distribution of the stream graph. The solution trace resulting
from model checking provides a map for the distribution of actors across
different processors/cores. This solution is then divided into stages, and
coarse-grained software-pipelined code is generated. We use CUDA streams to map
these programs synergistically onto the CPU and GPUs. We use a performance model for
data transfers to determine the optimal number of CUDA streams on GPUs. Our
software-pipelined schedule yields a speedup of up to 55.86X and a geometric mean
speedup of 9.62X over a single-threaded CPU.
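
The step of dividing a partitioned stream graph into pipeline stages can be illustrated with a minimal sketch. It is not the algorithm of the paper: the function name assign_stages, the actor and processor names, and the rule that a consumer mapped to a different processor runs one stage after its producer are all assumptions made for illustration. The processor map (proc_of) stands in for the actor distribution that the model checker would produce.

```python
from collections import deque

def assign_stages(edges, proc_of):
    """Assign a software-pipeline stage to each actor of an acyclic stream graph.

    edges:   list of (producer, consumer) actor pairs
    proc_of: dict mapping every actor to the processor it was partitioned onto
    (Both inputs are hypothetical stand-ins for a real partitioning result.)
    """
    actors = list(proc_of)
    succs = {a: [] for a in actors}
    indeg = {a: 0 for a in actors}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    stage = {a: 0 for a in actors}
    ready = deque(a for a in actors if indeg[a] == 0)
    visited = 0
    # Kahn's topological traversal: an actor is processed only after all of
    # its producers, so its stage is final when it is popped.
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            # Assumed rule: a consumer on the same processor may share the
            # producer's stage; one on another processor starts a stage later.
            gap = 0 if proc_of[u] == proc_of[v] else 1
            stage[v] = max(stage[v], stage[u] + gap)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(actors):
        raise ValueError("stream graph must be acyclic")
    return stage
```

For a hypothetical four-actor chain src, fir, fft, sink with src and sink on a CPU core and fir and fft on the GPU, the sketch places src in stage 0, fir and fft in stage 1, and sink in stage 2. A real scheduler would likely also account for data-transfer and buffer-layout costs when forming stages.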