Sanchayan Maity
2fcc51c2c1
While at it also add the libpthread static library amd m5op_x86 for matrix multiplication test code as well. Note that the splash2 benchmark code does not comply with gem5 coding guidelines. Academic guys never seem to follow 80 columns and no whitespace guideline :(.
124 lines
4.9 KiB
Text
124 lines
4.9 KiB
Text
We are pleased to announce the release of the SPLASH-2 suite of
|
|
multiprocessor applications. SPLASH-2 is the successor to the SPLASH
|
|
suite that we previously released, and the programs in it are also
|
|
written assuming a coherent shared address space communication model.
|
|
SPLASH-2 contains several new applications, as well as improved versions
|
|
of applications from SPLASH. The suite is currently available via
|
|
anonymous ftp to
|
|
|
|
www-flash.stanford.edu (in the pub/splash2 subdirectory)
|
|
|
|
and via the World-Wide-Web at
|
|
|
|
http://www-flash.stanford.edu/apps/SPLASH/
|
|
|
|
Several programs are currently available, and a few others will be added
|
|
shortly. The programs fall into two categories: full applications and
|
|
kernels. Additionally, we designate some of these as "core programs"
|
|
(see below). The applications and kernels currently available in the
|
|
SPLASH-2 suite include:
|
|
|
|
Applications:
|
|
Ocean Simulation
|
|
Ray Tracer
|
|
Hierarchical Radiosity
|
|
Volume Renderer
|
|
Water Simulation with Spatial Data Structure
|
|
Water Simulation without Spatial Data Structure
|
|
Barnes-Hut (gravitational N-body simulation)
|
|
Adaptive Fast Multipole (gravitational N-body simulation)
|
|
|
|
Kernels:
|
|
FFT
|
|
Blocked LU Decomposition
|
|
Blocked Sparse Cholesky Factorization
|
|
Radix Sort
|
|
|
|
Programs that will appear soon include:
|
|
|
|
PSIM4 - Particle Dynamics Simulation (full application)
|
|
Conjugate Gradient (kernel)
|
|
LocusRoute (standard cell router from SPLASH)
|
|
Protein Structure Prediction
|
|
Protein Sequencing
|
|
Parallel Probabilistic Inference
|
|
|
|
In some cases, we provide both well-optimized and less-optimized versions
|
|
of the programs. For both the Ocean simulation and the Blocked LU
|
|
Decomposition kernel, less optimized versions of the codes are currently
|
|
available.
|
|
|
|
There are important differences between applications in the SPLASH-2 suite
|
|
and applications in the SPLASH suite. These differences are noted in the
|
|
README.SPLASH2 file in the pub/splash2 directory. It is *VERY IMPORTANT*
|
|
that you read the README.SPLASH2 file, as well as the individual README
|
|
files in the program directories, before using the SPLASH-2 programs.
|
|
These files describe how to run the programs, provide commented annotations
|
|
about how to distribute data on a machine with physically distributed main
|
|
memory, and provides guidelines on the baseline problem sizes to use when
|
|
studying architectural interactions through simulation.
|
|
|
|
Complete documentation of SPLASH2, including a detailed characterization
|
|
of performance as well as memory system interactions and synchronization
|
|
behavior, will appear in the SPLASH2 report that is currently being
|
|
written.
|
|
|
|
|
|
OPTIMIZATION STRATEGY:
|
|
----------------------
|
|
|
|
For each application and kernel, we note potential features or
|
|
enhancements that are typically machine-specific. These potential
|
|
enhancements are encapsulated within comments in the code starting with
|
|
the string "POSSIBLE ENHANCEMENT." The potential enhancements which we
|
|
identify are:
|
|
|
|
(1) Data Distribution
|
|
|
|
We note where data migration routines should be called in order to
|
|
enhance locality of data access. We do not distribute data by
|
|
default as different machines implement migration routines in
|
|
different ways, and on some machines this is not relevant.
|
|
|
|
(2) Process-to-Processor Assignment
|
|
|
|
We note where calls can be made to "pin" processes to specific
|
|
processors so that process migration can be avoided. We do not
|
|
do this by default, since different machines implement this
|
|
feature in different ways.
|
|
|
|
In addition, to facilitate simulation studies, we note points in the
|
|
codes where statistics gathering routines should be turned on so that
|
|
cold-start and initialization effects can be avoided.
|
|
|
|
For two programs (Ocean and LU), we provide less-optimized versions of
|
|
the codes. The less-optimized versions utilize data structures that
|
|
lead to simpler implementations, but which do not allow for optimal data
|
|
distribution (and can generate false-sharing).
|
|
|
|
|
|
CORE PROGRAMS:
|
|
--------------
|
|
|
|
Since the number of programs has increased over SPLASH, and since not
|
|
everyone may be able to use all the programs in a given study, we
|
|
identify some of the programs as "core" programs that should be used
|
|
in most studies for comparability. In the currently available set,
|
|
these core programs include:
|
|
|
|
(1) Ocean Simulation
|
|
(2) Hierarchical Radiosity
|
|
(3) Water Simulation with Spatial data structure
|
|
(4) Barnes-Hut
|
|
(5) FFT
|
|
(6) Blocked Sparse Cholesky Factorization
|
|
(7) Radix Sort
|
|
|
|
The less optimized versions of the programs, when available, should be
|
|
used only in addition to these.
|
|
|
|
The base problem sizes that we recommend are provided in the README files
|
|
for individual applications. Please use at least these for experiments
|
|
with upto 64 processors. If changes are made to these base parameters
|
|
for further experimentation, these changes should be explicitly stated
|
|
in any results that are presented.
|