125 lines
4.9 KiB
Text
125 lines
4.9 KiB
Text
|
We are pleased to announce the release of the SPLASH-2 suite of
|
||
|
multiprocessor applications. SPLASH-2 is the successor to the SPLASH
|
||
|
suite that we previously released, and the programs in it are also
|
||
|
written assuming a coherent shared address space communication model.
|
||
|
SPLASH-2 contains several new applications, as well as improved versions
|
||
|
of applications from SPLASH. The suite is currently available via
|
||
|
anonymous ftp to
|
||
|
|
||
|
www-flash.stanford.edu (in the pub/splash2 subdirectory)
|
||
|
|
||
|
and via the World-Wide-Web at
|
||
|
|
||
|
http://www-flash.stanford.edu/apps/SPLASH/
|
||
|
|
||
|
Several programs are currently available, and a few others will be added
|
||
|
shortly. The programs fall into two categories: full applications and
|
||
|
kernels. Additionally, we designate some of these as "core programs"
|
||
|
(see below). The applications and kernels currently available in the
|
||
|
SPLASH-2 suite include:
|
||
|
|
||
|
Applications:
|
||
|
Ocean Simulation
|
||
|
Ray Tracer
|
||
|
Hierarchical Radiosity
|
||
|
Volume Renderer
|
||
|
Water Simulation with Spatial Data Structure
|
||
|
Water Simulation without Spatial Data Structure
|
||
|
Barnes-Hut (gravitational N-body simulation)
|
||
|
Adaptive Fast Multipole (gravitational N-body simulation)
|
||
|
|
||
|
Kernels:
|
||
|
FFT
|
||
|
Blocked LU Decomposition
|
||
|
Blocked Sparse Cholesky Factorization
|
||
|
Radix Sort
|
||
|
|
||
|
Programs that will appear soon include:
|
||
|
|
||
|
PSIM4 - Particle Dynamics Simulation (full application)
|
||
|
Conjugate Gradient (kernel)
|
||
|
LocusRoute (standard cell router from SPLASH)
|
||
|
Protein Structure Prediction
|
||
|
Protein Sequencing
|
||
|
Parallel Probabilistic Inference
|
||
|
|
||
|
In some cases, we provide both well-optimized and less-optimized versions
|
||
|
of the programs. For both the Ocean simulation and the Blocked LU
|
||
|
Decomposition kernel, less optimized versions of the codes are currently
|
||
|
available.
|
||
|
|
||
|
There are important differences between applications in the SPLASH-2 suite
|
||
|
and applications in the SPLASH suite. These differences are noted in the
|
||
|
README.SPLASH2 file in the pub/splash2 directory. It is *VERY IMPORTANT*
|
||
|
that you read the README.SPLASH2 file, as well as the individual README
|
||
|
files in the program directories, before using the SPLASH-2 programs.
|
||
|
These files describe how to run the programs, provide commented annotations
|
||
|
about how to distribute data on a machine with physically distributed main
|
||
|
memory, and provides guidelines on the baseline problem sizes to use when
|
||
|
studying architectural interactions through simulation.
|
||
|
|
||
|
Complete documentation of SPLASH2, including a detailed characterization
|
||
|
of performance as well as memory system interactions and synchronization
|
||
|
behavior, will appear in the SPLASH2 report that is currently being
|
||
|
written.
|
||
|
|
||
|
|
||
|
OPTIMIZATION STRATEGY:
|
||
|
----------------------
|
||
|
|
||
|
For each application and kernel, we note potential features or
|
||
|
enhancements that are typically machine-specific. These potential
|
||
|
enhancements are encapsulated within comments in the code starting with
|
||
|
the string "POSSIBLE ENHANCEMENT." The potential enhancements which we
|
||
|
identify are:
|
||
|
|
||
|
(1) Data Distribution
|
||
|
|
||
|
We note where data migration routines should be called in order to
|
||
|
enhance locality of data access. We do not distribute data by
|
||
|
default as different machines implement migration routines in
|
||
|
different ways, and on some machines this is not relevant.
|
||
|
|
||
|
(2) Process-to-Processor Assignment
|
||
|
|
||
|
We note where calls can be made to "pin" processes to specific
|
||
|
processors so that process migration can be avoided. We do not
|
||
|
do this by default, since different machines implement this
|
||
|
feature in different ways.
|
||
|
|
||
|
In addition, to facilitate simulation studies, we note points in the
|
||
|
codes where statistics gathering routines should be turned on so that
|
||
|
cold-start and initialization effects can be avoided.
|
||
|
|
||
|
For two programs (Ocean and LU), we provide less-optimized versions of
|
||
|
the codes. The less-optimized versions utilize data structures that
|
||
|
lead to simpler implementations, but which do not allow for optimal data
|
||
|
distribution (and can generate false-sharing).
|
||
|
|
||
|
|
||
|
CORE PROGRAMS:
|
||
|
--------------
|
||
|
|
||
|
Since the number of programs has increased over SPLASH, and since not
|
||
|
everyone may be able to use all the programs in a given study, we
|
||
|
identify some of the programs as "core" programs that should be used
|
||
|
in most studies for comparability. In the currently available set,
|
||
|
these core programs include:
|
||
|
|
||
|
(1) Ocean Simulation
|
||
|
(2) Hierarchical Radiosity
|
||
|
(3) Water Simulation with Spatial data structure
|
||
|
(4) Barnes-Hut
|
||
|
(5) FFT
|
||
|
(6) Blocked Sparse Cholesky Factorization
|
||
|
(7) Radix Sort
|
||
|
|
||
|
The less optimized versions of the programs, when available, should be
|
||
|
used only in addition to these.
|
||
|
|
||
|
The base problem sizes that we recommend are provided in the README files
|
||
|
for individual applications. Please use at least these for experiments
|
||
|
with upto 64 processors. If changes are made to these base parameters
|
||
|
for further experimentation, these changes should be explicitly stated
|
||
|
in any results that are presented.
|