Running the Tests
=================

All the tests are executed using the "Run" script in the top-level directory.

The simplest way to generate results is with the command:

    ./Run

This will run a standard "index" test (see "The BYTE Index" below), and
save the report in the "results" directory, with a filename like

    hostname-2007-09-23-01

An HTML version is also saved.
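
For example, to review the reports from such a run afterwards (the exact
filename depends on your hostname, the date, and a sequence number; the
HTML report typically has the same name with an ".html" suffix):

    cat results/hostname-2007-09-23-01
    xdg-open results/hostname-2007-09-23-01.html   # or open it in any web browser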

If you want to generate both the basic system index and the graphics index,
then do:

    ./Run gindex

If your system has more than one CPU, the tests will be run twice -- once
with a single copy of each test running at once, and once with N copies,
where N is the number of CPUs.  Some categories of tests, however (currently
the graphics tests), will only run with a single copy.

Since the tests are based on constant time (variable work), a "system"
run usually takes about 29 minutes; the "graphics" part, about 18 minutes.
A "gindex" run on a dual-core machine will do 2 "system" passes (single-
and dual-processing) and one "graphics" run, for a total of around one and
a quarter hours.

============================================================================

Detailed Usage
==============

The Run script takes a number of options which you can use to customise a
test, and you can specify the names of the tests to run.  The full usage
is:

    Run [ -q | -v ] [-i <n>] [-c <n> [-c <n> ...]] [test ...]

The option flags are:

  -q            Run in quiet mode.
  -v            Run in verbose mode.
  -i <count>    Run <count> iterations for each test -- slower tests
                use <count> / 3, but at least 1.  Defaults to 10 (3 for
                slow tests).
  -c <n>        Run <n> copies of each test in parallel.

The -c option can be given multiple times; for example:

    ./Run -c 1 -c 4

will run a single-streamed pass, then a 4-streamed pass.  Note that some
tests (currently the graphics tests) will only run in a single-streamed pass.

The remaining non-flag arguments are taken to be the names of tests to run.
The default is to run "index".  See "Tests" below.
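
For example, a quick two-copy run of just a couple of named tests (an
illustrative command line; any test names from the "Tests" section below
can be used) might look like:

    ./Run -i 1 -c 2 dhry2reg whetstone-double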

When running the tests, I do *not* recommend switching to single-user mode
("init 1").  This seems to change the results in ways I don't understand,
and it's not realistic (unless your system will actually be running in this
mode, of course).  However, if using a windowing system, you may want to
switch to a minimal window setup (for example, log in to a "twm" session),
so that randomly-churning background processes don't randomise the results
too much.  This is particularly true for the graphics tests.

============================================================================

Tests
=====

The available tests are organised into categories; when generating index
scores (see "The BYTE Index" below) the results for each category are
produced separately.  The categories are:

  system        The original Unix system tests (not all are actually
                in the index)
  2d            2D graphics tests (not all are actually in the index)
  3d            3D graphics tests
  misc          Various non-indexed tests

The following individual tests are available:

system:
  dhry2reg            Dhrystone 2 using register variables
  whetstone-double    Double-Precision Whetstone
  syscall             System Call Overhead
  pipe                Pipe Throughput
  context1            Pipe-based Context Switching
  spawn               Process Creation
  execl               Execl Throughput
  fstime-w            File Write 1024 bufsize 2000 maxblocks
  fstime-r            File Read 1024 bufsize 2000 maxblocks
  fstime              File Copy 1024 bufsize 2000 maxblocks
  fsbuffer-w          File Write 256 bufsize 500 maxblocks
  fsbuffer-r          File Read 256 bufsize 500 maxblocks
  fsbuffer            File Copy 256 bufsize 500 maxblocks
  fsdisk-w            File Write 4096 bufsize 8000 maxblocks
  fsdisk-r            File Read 4096 bufsize 8000 maxblocks
  fsdisk              File Copy 4096 bufsize 8000 maxblocks
  shell1              Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
  shell8              Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
  shell16             Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")

2d:
  2d-rects            2D graphics: rectangles
  2d-lines            2D graphics: lines
  2d-circle           2D graphics: circles
  2d-ellipse          2D graphics: ellipses
  2d-shapes           2D graphics: polygons
  2d-aashapes         2D graphics: aa polygons
  2d-polys            2D graphics: complex polygons
  2d-text             2D graphics: text
  2d-blit             2D graphics: images and blits
  2d-window           2D graphics: windows

3d:
  ubgears             3D graphics: gears

misc:
  C                   C Compiler Throughput ("looper 60 $cCompiler cctest.c")
  arithoh             Arithoh (huh?)
  short               Arithmetic Test (short) (this is arith.c configured for
                      "short" variables; ditto for the ones below)
  int                 Arithmetic Test (int)
  long                Arithmetic Test (long)
  float               Arithmetic Test (float)
  double              Arithmetic Test (double)
  dc                  Dc: sqrt(2) to 99 decimal places (runs
                      "looper 30 dc < dc.dat", using your system's copy of "dc")
  hanoi               Recursion Test -- Tower of Hanoi
  grep                Grep for a string in a large file, using your system's
                      copy of "grep"
  sysexec             Exercise fork() and exec().
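
Individual tests can be run by naming them on the command line; for
example (an illustrative command -- any of the names above will work):

    ./Run hanoi grep sysexec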

The following pseudo-test names are aliases for combinations of other
tests:

  arithmetic          Runs arithoh, short, int, long, float, double,
                      and whetstone-double
  dhry                Alias for dhry2reg
  dhrystone           Alias for dhry2reg
  whets               Alias for whetstone-double
  whetstone           Alias for whetstone-double
  load                Runs shell1, shell8, and shell16
  misc                Runs C, dc, and hanoi
  speed               Runs the arithmetic and system groups
  oldsystem           Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
                      spawn, and syscall
  system              Runs oldsystem plus shell1, shell8, and shell16
  fs                  Runs fstime-w, fstime-r, fstime, fsbuffer-w,
                      fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
  shell               Runs shell1, shell8, and shell16

  index               Runs the tests which constitute the official index:
                      the oldsystem group, plus dhry2reg, whetstone-double,
                      shell1, and shell8
                      See "The BYTE Index" below for more information.
  graphics            Runs the tests which constitute the graphics index:
                      2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
                      2d-window, and ubgears
  gindex              Runs the index and graphics groups, to generate both
                      sets of index results

  all                 Runs all tests
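
These aliases can be used anywhere a test name can; for example, to run
just the filesystem group, or the arithmetic group with 4 copies of each
test (illustrative commands):

    ./Run fs
    ./Run -c 4 arithmetic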

============================================================================

The BYTE Index
==============

The purpose of this test is to provide a basic indicator of the performance
of a Unix-like system; hence, multiple tests are used to test various
aspects of the system's performance.  These test results are then compared
to the scores from a baseline system to produce an index value, which is
generally easier to handle than the raw scores.  The entire set of index
values is then combined to make an overall index for the system.

Since 1995, the baseline system has been "George", a SPARCstation 20-61
with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
were set at 10.0.  (So a system which scores 520 is 52 times faster than
this machine.)  Since the numbers are really only useful in a relative
sense, there's no particular reason to update the base system, so for the
sake of consistency it's probably best to leave it alone.  George's scores
are in the file "pgms/index.base"; this file is used to calculate the
index scores for any particular run.
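
Roughly speaking, each test's index value is the raw result divided by
George's result for that test, multiplied by 10.  For example (using
made-up numbers), if a run measures 2,500 lps on a test whose baseline
value in "pgms/index.base" is 500 lps, that test's index value is
(2500 / 500) * 10 = 50.0 -- i.e. five times as fast as George on that
test.  The overall index is then a combination (a geometric mean) of the
individual index values.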

Over the years, various changes have been made to the set of tests in the
index.  Although there is a desire for a consistent baseline, various tests
have been determined to be misleading, and have been removed; and a few
alternatives have been added.  These changes are detailed in the README,
and should be borne in mind when looking at old scores.

A number of tests are included in the benchmark suite which are not part of
the index, for various reasons; these tests can of course be run manually.
See "Tests" above.

============================================================================

Graphics Tests
==============

As of version 5.1, UnixBench now contains some graphics benchmarks.  These
are intended to give a rough idea of the general graphics performance of
a system.

The graphics tests are in categories "2d" and "3d", so the index scores
for these tests are separate from the basic system index.  This seems
like a sensible division, since the graphics performance of a system
depends largely on the graphics adaptor.

The tests currently consist of some 2D "x11perf" tests and "ubgears".

* The 2D tests are a selection of the x11perf tests, using the host
  system's x11perf command (which must be installed and in the search
  path).  Only a few of the x11perf tests are used, in the interests
  of completing a test run in a reasonable time; if you want to do
  detailed diagnosis of an X server or graphics chip, then use x11perf
  directly.

* The 3D test is "ubgears", a modified version of the familiar "glxgears".
  This version runs for 5 seconds to "warm up", then performs a timed
  run and displays the average frames-per-second.
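
Before running the graphics tests, you can confirm that x11perf is
available in your search path with, for example:

    command -v x11perf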

On multi-CPU systems, the graphics tests will only run in single-processing
mode.  This is because the meaning of running two copies of a test at once
is dubious; and the test windows tend to overlay each other, meaning that
the window behind isn't actually doing any work.

============================================================================

Multiple CPUs
=============

If your system has multiple CPUs, the default behaviour is to run the selected
tests twice -- once with one copy of each test program running at a time,
and once with N copies, where N is the number of CPUs.  (You can override
this with the "-c" option; see "Detailed Usage" above.)  This is designed to
allow you to assess:

  - the performance of your system when running a single task
  - the performance of your system when running multiple tasks
  - the gain from your system's implementation of parallel processing
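
For example, to override the default and choose the number of copies
yourself (illustrative; "nproc" is a GNU coreutils command that prints the
number of available CPUs):

    ./Run -c 1 -c "$(nproc)" index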

The results, however, need to be handled with care.  Here are the results
of two runs on a dual-processor system, one in single-processing mode, one
dual-processing:

  Test                    Single     Dual   Gain
  --------------------    ------   ------   ----
  Dhrystone 2              562.5   1110.3    97%
  Double Whetstone          320.0    640.4   100%
  Execl Throughput          450.4    880.3    95%
  File Copy 1024            759.4    595.9   -22%
  File Copy 256             535.8    438.8   -18%
  File Copy 4096           1261.8   1043.4   -17%
  Pipe Throughput           481.0    979.3   104%
  Pipe-based Switching      326.8   1229.0   276%
  Process Creation          917.2   1714.1    87%
  Shell Scripts (1)        1064.9   1566.3    47%
  Shell Scripts (8)        1567.7   1709.9     9%
  System Call Overhead      944.2   1445.5    53%
  --------------------    ------   ------   ----
  Index Score:              678.2   1026.2    51%

As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
execl, pipe throughput, process creation -- show close to 100% gain when
running 2 copies in parallel.

The Pipe-based Context Switching test measures context switching overhead
by sending messages back and forth between 2 processes.  I don't know why
it shows such a huge gain with 2 copies (i.e. 4 processes total) running,
but it seems to be consistent on my system.  I think this may be an issue
with the SMP implementation.

The System Call Overhead shows a lesser gain, presumably because it uses a
lot of CPU time in single-threaded kernel code.  The shell scripts test with
8 concurrent processes shows no gain -- because the test itself runs 8
scripts in parallel, it's already using both CPUs, even when the benchmark
is run in single-stream mode.  The same test with one process per copy
shows a real gain.

The filesystem throughput tests show a loss, instead of a gain, when
multi-processing.  That there's no gain is to be expected, since the tests
are presumably constrained by the throughput of the I/O subsystem and the
disk drive itself; the drop in performance is presumably down to the
increased contention for resources, and perhaps greater disk head movement.

So what tests should you use, how many copies should you run, and how should
you interpret the results?  Well, that's up to you, since it depends on
what it is you're trying to measure.

Implementation
--------------

The multi-processing mode is implemented at the level of test iterations.
During each iteration of a test, N slave processes are started using fork().
Each of these slaves executes the test program using fork() and exec(),
reads and stores the entire output, times the run, and prints all the
results to a pipe.  The Run script reads the pipes for each of the slaves
in turn to get the results and times.  The scores are added, and the times
averaged.
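
Conceptually, the effect is similar to this shell sketch (illustrative only
-- the actual logic lives in the Run script; "testprog" here stands in for
any of the compiled test programs in "pgms/"):

    N=4                                        # number of copies to run
    for i in $(seq 1 "$N"); do
        ./pgms/testprog > "result.$i" 2>&1 &   # start one copy in the background
    done
    wait                                       # wait for all N copies to finish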

The result is that each test program has N copies running at once.  They
should all finish at around the same time, since they run for constant time.

If a test program itself starts off K multiple processes (as with the shell8
test), then the effect will be that there are N * K processes running at
once.  This is probably not very useful for testing multi-CPU performance.

============================================================================

The Language Setting
====================

The $LANG environment variable determines how programs and library
routines interpret text.  This can have a big impact on the test results.

If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
it is set to en_US.UTF-8, for example, then text is treated as being
encoded in UTF-8, which is more complex and therefore slower.  Setting
it to other languages can have varying results.

To ensure consistency between test runs, the Run script now (as of version
5.1.1) sets $LANG to "en_US.utf8".

This setting is configured with the variable "$language".  You should not
change this if you want to share your results to allow comparisons between
systems; however, you may want to change it to see how different language
settings affect performance.

Each test report now includes the language settings in use.  The reported
language is what is set in $LANG, and is not necessarily supported by the
system; but we also report the character mapping and collation order which
are actually in use (as reported by "locale").
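
To see which character mapping and collation order are actually in effect
on your system, you can run the standard "locale" command yourself, e.g.:

    locale | grep -E 'LANG|LC_CTYPE|LC_COLLATE'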

============================================================================

Interpreting the Results
========================

Interpreting the results of these tests is tricky, and totally depends on
what you're trying to measure.

For example, are you trying to measure how fast your CPU is?  Or how good
your compiler is?  Because these tests are all recompiled using your host
system's compiler, the performance of the compiler will inevitably impact
the performance of the tests.  Is this a problem?  If you're choosing a
system, you probably care about its overall speed, which may well depend
on how good its compiler is; so including that in the test results may be
the right answer.  But you may want to ensure that the right compiler is
used to build the tests.

On the other hand, with the vast majority of Unix systems being x86 / PC
compatibles, running Linux and the GNU C compiler, the results will tend
to be more dependent on the hardware; but the versions of the compiler and
OS can make a big difference.  (I measured a 50% gain between SUSE 10.1
and OpenSUSE 10.2 on the same machine.)  So you may want to make sure that
all your test systems are running the same version of the OS; or at least
publish the OS and compiler versions with your results.  Then again, it may
be compiler performance that you're interested in.
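
A convenient way to capture that information alongside your results is
something like the following (illustrative; the output filename is
arbitrary, and you should adjust the compiler command to match your own):

    uname -a > results/environment.txt
    gcc --version >> results/environment.txt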

The C test is very dubious -- it tests the speed of compilation.  If you're
running the exact same compiler on each system, OK; but otherwise, the
results should probably be discarded.  A slower compilation doesn't say
anything about the speed of your system, since the compiler may simply be
spending more time to super-optimise the code, which would actually make it
faster.

This will be particularly true on architectures like IA-64 (Itanium etc.)
where the compiler spends huge amounts of effort scheduling instructions
to run in parallel, with a resultant significant gain in execution speed.

Some tests are even more dubious in terms of host-dependency -- for example,
the "dc" test uses the host's version of dc (a calculator program).  The
version of this which is available can make a huge difference to the score,
which is why it's not in the index group.  Read through the release notes
for more on these kinds of issues.

Another age-old issue is that of the benchmarks being too trivial to be
meaningful.  With compilers getting ever smarter, and performing more
wide-ranging flow path analyses, the danger of parts of the benchmarks
simply being optimised out of existence is always present.

All in all, the "index" and "gindex" tests (see above) are designed to
give a reasonable measure of overall system performance; but the results
of any test run should always be used with care.