gem5/splash2/codes/apps/ocean/README.ocean

GENERAL INFORMATION:

The OCEAN program simulates large-scale ocean movements based on eddy and
boundary currents, and is an enhanced version of the SPLASH Ocean code.
A description of the functionality of this code can be found in the 
original SPLASH report.  The implementations contained in SPLASH-2 
differ from the original SPLASH implementation in the following ways:

  (1) The SPLASH-2 implementations are written in C rather than 
      FORTRAN.
  (2) Grids are partitioned into square-like subgrids rather than 
      groups of columns to improve the communication to computation 
      ratio.
  (3) The SOR solver in the SPLASH Ocean code has been replaced with a
      restricted Red-Black Gauss-Seidel Multigrid solver based on that 
      presented in:

      Brandt, A. Multi-Level Adaptive Solutions to Boundary-Value Problems.
           Mathematics of Computation, 31(138):333-390, April 1977.

      The solver is restricted so that each processor has as least two
      grid points in each dimension in each grid subpartition.

Two implementations are provided in the SPLASH-2 distribution:

  (1) Non-contiguous partition allocation

      This implementation (contained in the non_contiguous_partitions
      subdirectory) implements the grids to be operated on with
      two-dimensional arrays.  This data structure prevents partitions 
      from being allocated contiguously, but leads to a conceptually 
      simple programming implementation.

  (2) Contiguous partition allocation

      This implementation (contained in the contiguous_partitions 
      subdirectory) implements the grids to be operated on with
      3-dimensional arrays.  The first dimension specifies the processor
      which owns the partition, and the second and third dimensions 
      specify the x and y offset within a partition.  This data structure 
      allows partitions to be allocated contiguously and entirely in the 
      local memory of processors that "own" them, thus enhancing data
      locality properties.

The contiguous partition allocation implementation is described in:

Woo, S. C., Singh, J. P., and Hennessy, J. L.  The Performance Advantages
     of Integrating Message Passing in Cache-Coherent Multiprocessors.
     Technical Report CSL-TR-93-593, Stanford University, December 1993.

A detailed description of both versions will appear in the SPLASH-2 report.
The non-contiguous partition allocation implementation is conceptually
similar, except for the use of statically allocated 2-dimensional arrays.

These programs work under both the Unix FORK and SPROC models.

RUNNING THE PROGRAM:

To see how to run the program, please see the comment at the top of the
file main.C, or run the application with the "-h" command line option.
Five command line parameters can be specified, of which the ones which
would normally be changed are the number of grid points in each dimension,
and the number of processors.  The number of grid points must be a
(power of 2+2) in each dimension (e.g. 130, 258, etc.).  The number of
processors must be a power of 2.  Timing information is printed out at 
the end of the program.  The first timestep is considered part of the 
initialization phase of the program, and hence is not included in the 
"Total time without initialization."

BASE PROBLEM SIZE:

The base problem size for an upto-64 processor machine is a 258x258 grid.
The default values should be used for other parameters (except the number
of processors, which can be varied).  In addition, sample output files 
for the default parameters for each version of the code are contained in 
the file correct.out in each subdirectory.

DATA DISTRIBUTION:

Our "POSSIBLE ENHANCEMENT" comments in the source code tell where one
might want to distribute data and how.  Data distribution has an impact
on performance on the Stanford DASH multiprocessor.
Commit splash2 benchmark While at it also add the libpthread static library amd m5op_x86 for matrix multiplication test code as well. Note that the splash2 benchmark code does not comply with gem5 coding guidelines. Academic guys never seem to follow 80 columns and no whitespace guideline :(. 2017-04-26 17:20:15 +02:00			`GENERAL INFORMATION:`

			`The OCEAN program simulates large-scale ocean movements based on eddy and`
			`boundary currents, and is an enhanced version of the SPLASH Ocean code.`
			`A description of the functionality of this code can be found in the`
			`original SPLASH report. The implementations contained in SPLASH-2`
			`differ from the original SPLASH implementation in the following ways:`

			`(1) The SPLASH-2 implementations are written in C rather than`
			`FORTRAN.`
			`(2) Grids are partitioned into square-like subgrids rather than`
			`groups of columns to improve the communication to computation`
			`ratio.`
			`(3) The SOR solver in the SPLASH Ocean code has been replaced with a`
			`restricted Red-Black Gauss-Seidel Multigrid solver based on that`
			`presented in:`

			`Brandt, A. Multi-Level Adaptive Solutions to Boundary-Value Problems.`
			`Mathematics of Computation, 31(138):333-390, April 1977.`

			`The solver is restricted so that each processor has as least two`
			`grid points in each dimension in each grid subpartition.`

			`Two implementations are provided in the SPLASH-2 distribution:`

			`(1) Non-contiguous partition allocation`

			`This implementation (contained in the non_contiguous_partitions`
			`subdirectory) implements the grids to be operated on with`
			`two-dimensional arrays. This data structure prevents partitions`
			`from being allocated contiguously, but leads to a conceptually`
			`simple programming implementation.`

			`(2) Contiguous partition allocation`

			`This implementation (contained in the contiguous_partitions`
			`subdirectory) implements the grids to be operated on with`
			`3-dimensional arrays. The first dimension specifies the processor`
			`which owns the partition, and the second and third dimensions`
			`specify the x and y offset within a partition. This data structure`
			`allows partitions to be allocated contiguously and entirely in the`
			`local memory of processors that "own" them, thus enhancing data`
			`locality properties.`

			`The contiguous partition allocation implementation is described in:`

			`Woo, S. C., Singh, J. P., and Hennessy, J. L. The Performance Advantages`
			`of Integrating Message Passing in Cache-Coherent Multiprocessors.`
			`Technical Report CSL-TR-93-593, Stanford University, December 1993.`

			`A detailed description of both versions will appear in the SPLASH-2 report.`
			`The non-contiguous partition allocation implementation is conceptually`
			`similar, except for the use of statically allocated 2-dimensional arrays.`

			`These programs work under both the Unix FORK and SPROC models.`

			`RUNNING THE PROGRAM:`

			`To see how to run the program, please see the comment at the top of the`
			`file main.C, or run the application with the "-h" command line option.`
			`Five command line parameters can be specified, of which the ones which`
			`would normally be changed are the number of grid points in each dimension,`
			`and the number of processors. The number of grid points must be a`
			`(power of 2+2) in each dimension (e.g. 130, 258, etc.). The number of`
			`processors must be a power of 2. Timing information is printed out at`
			`the end of the program. The first timestep is considered part of the`
			`initialization phase of the program, and hence is not included in the`
			`"Total time without initialization."`

			`BASE PROBLEM SIZE:`

			`The base problem size for an upto-64 processor machine is a 258x258 grid.`
			`The default values should be used for other parameters (except the number`
			`of processors, which can be varied). In addition, sample output files`
			`for the default parameters for each version of the code are contained in`
			`the file correct.out in each subdirectory.`

			`DATA DISTRIBUTION:`

			`Our "POSSIBLE ENHANCEMENT" comments in the source code tell where one`
			`might want to distribute data and how. Data distribution has an impact`
			`on performance on the Stanford DASH multiprocessor.`