GENERAL INFORMATION:

This code renders a three-dimensional volume onto a two-dimensional
image plane using an optimized ray casting technique developed by 
Marc Levoy.  A hierarchical octree data structure is used to represent 
the scene for efficient access, and early ray termination and antialiasing 
are implemented.   The best description of the algorithm can be found in:

Jason Nieh and Marc Levoy, "Volume Rendering on Scalable Shared-Memory
MIMD Architectures", Proc. Boston Workshop on Volume Visualization,
October 1992.

A briefer description can also be found in:

Jaswinder Pal Singh, Anoop Gupta and Marc Levoy, "Parallel
Visualization Algorithms: Performance and Architectural Implications",
IEEE Computer, July 1994.

Further references to the sequential algorithm are contained in those
papers, and a detailed description will be in the SPLASH-2 report. 
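
For intuition, the following is a minimal sketch of the early ray
termination idea used in front-to-back compositing along a ray.  All
names and the cutoff value are illustrative stand-ins, not identifiers
taken from the application's ray tracing code.

    /* Early ray termination during front-to-back compositing (sketch). */
    #define OPACITY_CUTOFF 0.96f     /* stop once the ray is nearly opaque */

    float trace_ray(int nsamples,
                    const float *sample_opacity,  /* opacity at each sample  */
                    const float *sample_color)    /* shaded color per sample */
    {
        float color   = 0.0f;        /* accumulated pixel intensity       */
        float opacity = 0.0f;        /* accumulated opacity along the ray */

        for (int i = 0; i < nsamples; i++) {
            /* later samples are attenuated by the opacity already
               accumulated in front of them */
            color   += (1.0f - opacity) * sample_opacity[i] * sample_color[i];
            opacity += (1.0f - opacity) * sample_opacity[i];

            /* early ray termination: once the ray is nearly opaque,
               the remaining samples contribute almost nothing */
            if (opacity > OPACITY_CUTOFF)
                break;
        }
        return color;
    }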

RUNNING THE PROGRAM:

To see how to run the program, please see the comment at the top 
of the main.C file or run it as "VOLREND -h".  The base problem we 
recommend does not use adaptive pixel sampling, so it would not use 
the -a flag.  

One compile-time parameter that can be specified in the Makefile
is FLIP.  It is used only in the I/O routines in file.C, and
specifies whether to reverse byte ordering when reading or writing
files.  If input files were written with one byte ordering and are
read assuming another, the program will fail with an error message
("Can't load version ... file").  If this happens, define or undefine
FLIP so that the byte ordering in the input file matches what the
program expects.
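
As an illustration of what FLIP controls, reversing the byte order of
a 4-byte word looks like the sketch below.  This is a generic example,
not the code in the I/O routines themselves; swap32 and CORRECT are
hypothetical names.

    #include <stdint.h>

    /* Reverse the byte order of a 32-bit value. */
    uint32_t swap32(uint32_t v)
    {
        return ((v & 0x000000ffu) << 24) |
               ((v & 0x0000ff00u) <<  8) |
               ((v & 0x00ff0000u) >>  8) |
               ((v & 0xff000000u) >> 24);
    }

    #ifdef FLIP
    #define CORRECT(v) swap32(v)   /* reverse byte order on read/write */
    #else
    #define CORRECT(v) (v)         /* file and host byte order already agree */
    #endif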

The file user_options.H contains a number of compile-time options that
dictate what the program does; that file describes how to use them.
Basically, there are two phases in the application: the first
preprocesses the input .den file and generates the octree, the normal
table, and the opacity table; the second takes these structures as
inputs and does the parallel rendering.  One mode of the program is to
only render, reading in the octree, opacity table and normal table
from precomputed input files (.pyr, .opc and .norm, respectively).
This is the default mode of the program, and is obtained by setting
the RENDER_ONLY compile-time flag.  If RENDER_ONLY is not defined, the
program does not use these files but starts from the .den file itself.
In this case, there are two options: if PREPROCESS is defined, the
program does not render, but creates the .norm, .opc and .pyr files
from the .den file for a future run of the program to render.  If
PREPROCESS is not defined either, the program does not produce these
intermediate files: it starts from the .den file, creates the normal
and opacity tables etc. as internal data structures, and renders from
them directly.

The SERIAL_PREPROC option controls whether the preprocessing phases
(computing the normal table or .norm file, etc.) are done serially or
in parallel, when they are done at all.

If none of the three options are defined (RENDER_ONLY, PREPROCESS and 
SERIAL_PREPROC), the default is to do parallel preprocessing and 
rendering without storing intermediate array values.
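
The overall effect of these flags can be summarized by the hedged
sketch below.  The function names are hypothetical stand-ins for the
corresponding steps in the application, not routines from the source.

    #include <stdio.h>

    static void load_precomputed_files(void)  { puts("read .pyr/.opc/.norm"); }
    static void build_tables_from_den(void)   { puts("build tables from .den"); }
    static void write_intermediate_files(void){ puts("write .norm/.opc/.pyr"); }
    static void render_frames(void)           { puts("render frames"); }

    int main(void)
    {
    #ifdef RENDER_ONLY
        /* default mode: read the precomputed octree, opacity table and
           normal table, then render */
        load_precomputed_files();
        render_frames();
    #elif defined(PREPROCESS)
        /* build the tables from the .den file and write .norm, .opc and
           .pyr for a later RENDER_ONLY run; no rendering */
        build_tables_from_den();        /* serial if SERIAL_PREPROC is set */
        write_intermediate_files();
    #else
        /* start from the .den file, keep the tables as internal data
           structures, and render directly */
        build_tables_from_den();        /* serial if SERIAL_PREPROC is set */
        render_frames();
    #endif
        return 0;
    }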

For the base problem, we recommend doing the preprocessing off-line
first (perhaps serially), and then timing only the parallel rendering
part.

We provide three input .den files.  head.den is a file containing a
256-by-256-by-126 voxel head.  head-scaleddown2.den scales this down
by a factor of two in each dimension, and head-scaleddown4.den by 
a factor of 4 in each dimension (head-scaleddown4 is thus clearly
only for testing).

There is a run-time command-line flag (-a) that controls whether 
adaptive pixel sampling is done.  We do not do it in our base problem.
If you use it, please say so explicitly in any results you present. 
Please also report the block size used for adaptive sampling (see code,
HBOXLEN parameter in user_options.H) in this case. 
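
For reference, adaptive pixel sampling of this kind typically casts
rays only at the corners of a block of pixels and subdivides the block
only where the corner values differ by more than a tolerance.  The
sketch below illustrates the idea; all names, constants, and the
stand-in trace_pixel function are hypothetical, not taken from the
application's adaptive sampling code.

    #include <math.h>

    #define IMG_LEN   256        /* image is IMG_LEN x IMG_LEN pixels     */
    #define TOLERANCE 8.0f       /* corner-difference threshold (made up) */

    static float image[IMG_LEN][IMG_LEN];

    /* stand-in for casting one ray through pixel (x, y) */
    static float trace_pixel(int x, int y) { return (float)((x * 31 + y) % 97); }

    static void sample_block(int x, int y, int len)
    {
        /* cast rays only at the four corners of the block */
        float c00 = trace_pixel(x,           y);
        float c10 = trace_pixel(x + len - 1, y);
        float c01 = trace_pixel(x,           y + len - 1);
        float c11 = trace_pixel(x + len - 1, y + len - 1);

        float cmax = fmaxf(fmaxf(c00, c10), fmaxf(c01, c11));
        float cmin = fminf(fminf(c00, c10), fminf(c01, c11));

        if (len <= 2 || cmax - cmin < TOLERANCE) {
            /* corners agree closely: fill the interior by bilinear
               interpolation instead of casting a ray per pixel */
            for (int j = 0; j < len; j++)
                for (int i = 0; i < len; i++) {
                    float u = (len > 1) ? (float)i / (len - 1) : 0.0f;
                    float v = (len > 1) ? (float)j / (len - 1) : 0.0f;
                    image[y + j][x + i] =
                        (1 - v) * ((1 - u) * c00 + u * c10) +
                        v       * ((1 - u) * c01 + u * c11);
                }
        } else {
            /* corners disagree: subdivide and sample each quadrant */
            int half = len / 2;
            sample_block(x,        y,        half);
            sample_block(x + half, y,        half);
            sample_block(x,        y + half, half);
            sample_block(x + half, y + half, half);
        }
    }

    int main(void)
    {
        sample_block(0, 0, IMG_LEN);   /* sample the whole image adaptively */
        return 0;
    }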

There are several other compile-time parameters in the code, mainly in
user_options.H and const.H. They have default values which we describe 
below.  These are the values that we recommend for the base problem.  If 
you change them, please say so explicitly in any results you present. 
The main parameters of interest are:

user_options.H:

  BLOCK_LEN is the size of an image tile (in pixels: BLOCK_LEN by 
	BLOCK_LEN) which is the unit of task stealing (see papers
	cited above).  We recommend leaving this at 4.
  DIM: if set, DIM specifies that the volume should be rotated (from
	one frame to the next) in each of the three spatial dimensions.
	The default is to rotate only in one preset dimension (the Y
	dimension).  We recommend not setting DIM in the base problem.

const.H:
  
  ROTATE_STEPS:  The number of frames to be rendered (if DIM is set,
	then 3*ROTATE_STEPS frames are rendered).
  STEP_SIZE: the degree of rotation between frames.
  Other parameters should be left as they are. 

The volrend program generates a number of tiff files, one per frame,
that can then be viewed on a workstation.   Thus, the program
needs a tiff library to run (see Makefile).  The PBMPLUS tiff library 
works with this application.
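
For reference, writing one 8-bit grayscale frame with libtiff looks
roughly like the sketch below.  This is a generic libtiff usage
example (write_frame is a hypothetical name), not the application's
actual output routine.

    #include <tiffio.h>

    int write_frame(const char *name, const unsigned char *image,
                    int width, int height)
    {
        TIFF *tif = TIFFOpen(name, "w");
        if (tif == NULL)
            return -1;

        TIFFSetField(tif, TIFFTAG_IMAGEWIDTH,      width);
        TIFFSetField(tif, TIFFTAG_IMAGELENGTH,     height);
        TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE,   8);
        TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 1);
        TIFFSetField(tif, TIFFTAG_PHOTOMETRIC,     PHOTOMETRIC_MINISBLACK);
        TIFFSetField(tif, TIFFTAG_PLANARCONFIG,    PLANARCONFIG_CONTIG);
        TIFFSetField(tif, TIFFTAG_ROWSPERSTRIP,    1);

        for (int row = 0; row < height; row++) {
            /* each scanline is one row of 8-bit pixel intensities */
            if (TIFFWriteScanline(tif, (void *)(image + row * width),
                                  row, 0) < 0) {
                TIFFClose(tif);
                return -1;
            }
        }

        TIFFClose(tif);
        return 0;
    }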

BASE PROBLEM SIZE:

As stated above, we provide a 256-by-256-by-126 voxel human head data set, 
and two scaled down versions of the same data set.  The base problem
we recommend is the 256-by-256-by-126 data set. We shall soon provide 
a way to generate larger data sets.  Our base problem renders 4 frames,
with DIM turned off (i.e. rotating only in the Y dimension).  The
STEP_SIZE is 3 degrees, and BLOCK_LEN (the length of the square tile
of pixels that is the unit of task stealing) is 4.
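
Pulling these recommendations together, one way to express the base
problem as compile-time settings is sketched below.  The actual macro
layout in user_options.H and const.H may differ slightly; treat this
only as a summary of the values stated above.

    /* user_options.H (base problem) */
    #define RENDER_ONLY          /* render from precomputed .pyr/.opc/.norm */
    #define BLOCK_LEN    4       /* 4x4-pixel tile as the unit of task stealing */
    /* DIM left undefined: rotate only in the Y dimension */
    /* adaptive sampling off: do not pass -a at run time */

    /* const.H (base problem) */
    #define ROTATE_STEPS 4       /* render 4 frames */
    #define STEP_SIZE    3       /* 3 degrees of rotation between frames */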

DATA DISTRIBUTION:

Data distribution is difficult in a shared address space version
of volume rendering (and, if it is required, as in a message-passing
abstraction, a different algorithm is needed: see the paper
cited above).  One could place a processor's portion of the image
plane in its local memory. For the volume, however, particularly given 
changing viewpoints from which the scene might be rendered,
it is very difficult to place scene data appropriately in physically 
distributed main memory. Data distribution, however, does not make 
much difference to performance on the Stanford DASH multiprocessor.