%T A Comparison Of Data\-Parallel Programming Systems With Accelerator
%A Alex Cole, Alistair A. McEwan, Satnam Singh
%E Peter H. Welch, Adam T. Sampson, Jan Bækgaard Pedersen, Jon Kerridge, Jan F. Broenink, Frederick R. M. Barnes
%B Communicating Process Architectures 2011
%X Data\-parallel programming provides an accessible model for exploiting
the power of parallel computing elements without resorting to the explicit
use of low\-level programming techniques based on locks, threads and
monitors.
The emergence of GPUs with hundreds or thousands of processing cores has
made data\-parallel computing available to a wider class of programmers.
GPUs can be used not only for accelerating the processing of computer
graphics but also for general\-purpose data\-parallel programming.
Low\-level data\-parallel programming languages based on CUDA provide an
approach for developing programs for GPUs, but these languages require
explicit creation and coordination of threads and careful data layout and
movement. This has created a demand for higher\-level programming
languages and libraries which raise the abstraction level of
data\-parallel programming and increase programmer productivity.
The Accelerator system was developed by Microsoft for writing
data\-parallel code in a high\-level manner which can execute on GPUs,
multicore processors using SSE3 vector instructions, and FPGA chips.
This paper compares the performance and development effort of the
high\-level Accelerator system against lower\-level systems which are more
difficult to use but may yield better results. Specifically, we compare
against the NVIDIA CUDA compiler and sequential C++ code, considering both
the level of abstraction in the implementation code and the execution
models. We compare the performance of these systems using several case
studies. For some classes of problems, Accelerator has a performance
comparable to CUDA, but for others its performance is significantly
reduced. In all cases, however, it provides a model which is easier to
use and allows for greater programmer productivity.