Add initial support for creating an ARM system with a Ruby-based
memory system. This support is currently experimental and limited to
the new VExpress_GEM5_V1 platform.
Change-Id: I36baeb68b0d891e34ea46aafe17b5e55217b4bfa
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brad Beckmann <brad.beckmann@amd.com>
An ARM big.LITTLE system consists of two cpu clusters: the big
CPUs are typically complex out-of-order cores and the little
CPUs are simpler in-order ones. The fs_bigLITTLE.py script
can run a full system simulation with various number of big
and little cores and cache hierarchy. The commit also includes
two example device tree files for booting Linux on the
bigLITTLE system.
Change-Id: I6396fb3b2d8f27049ccae49d8666d643b66c088b
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This patch provides the example test script to configure different HMC
architecture and run traffic through traffic generator.
Committed by Jason Lowe-Power <jason@lowepower.com>
Eliminate the VSZ constant that defined the Wavefront size (in numbers of work
items); replaced it with a parameter in the GPU.py configuration script.
Changed all data structures dependent on the Wavefront size to be dynamically
sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at
initialization time.
This patch introduces the ability of making the coherent crossbar the
point of coherency. If so, the crossbar does not forward packets where
a cache with ownership has already committed to responding, and also
does not forward any coherency-related packets that are not intended
for a downstream memory controller. Thus, invalidations and upgrades
are turned around in the crossbar, and the memory controller only sees
normal reads and writes.
In addition this patch moves the express snoop promotion of a packet
to the crossbar, thus allowing the downstream cache to check the
express snoop flag (as it should) for bypassing any blocking, rather
than relying on whether a cache is responding or not.
This patch allows the ruby random tester to use ruby ports that may only
support instr or data requests. This patch is similar to a previous changeset
(8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
This current patch implements the support in a more straight-forward way.
Since retries are now tested when running the ruby random tester, this patch
splits up the retry and drain check behavior so that RubyPort children, such
as the GPUCoalescer, can perform those operations correctly without having to
duplicate code. Finally, the patch also includes better DPRINTFs for
debugging the tester.
This patch adds changes to the configuration scripts to support elastic
tracing and replay.
The patch adds a command line option to enable elastic tracing in SE mode
and FS mode. When enabled the Elastic Trace cpu probe is attached to O3CPU
and a few O3 CPU parameters are tuned. The Elastic Trace probe writes out
both instruction fetch and data dependency traces. The patch also enables
configuring the TraceCPU to replay traces using the SE and FS script.
The replay run is designed to resume from checkpoint using atomic cpu to
restore state keeping it consistent with FS run flow. It then switches to
TraceCPU to replay the input traces.
Added the missing types EthernetAddr and Current to the JSON/INI file
reader example configs/example/read_config.py.
Also added __str__ to EthernetAddr to make values appear in the same form
in JSON an INI files.
This patch adds yet another twist to the memtest cache hierarchy, in that
the writeback_clean option is toggled at every level to match the
clusivity of the downstream cache.
This patch adds an new twist to the memtest cache hierarchy, in that
it switches from mostly inclusive to mostly exclusive at every level
in the tree. This has helped weed out plenty issues, and serves as a
good stress tests.
This patch enables modeling a complete Hybrid Memory Cube (HMC) device. It
highly reuses the existing components in gem5's general memory system with some
small modifications. This changeset requires additional patches to model a
complete HMC device.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Adds SMT support to the "simple" CPU models so that they can be
used with other SMT-supported CPUs. Example usage: this enables
the TimingSimpleCPU to be used to warmup caches before swapping to
detailed mode with the in-order or out-of-order based CPU models.
Open up for other subclasses to BaseCache and transition to using the
explicit Cache subclass.
--HG--
rename : src/mem/cache/BaseCache.py => src/mem/cache/Cache.py
This patch takes the final step in removing the is_top_level parameter
from the cache. With the recent changes to read requests and write
invalidations, the parameter is no longer needed, and consequently
removed.
This also means that asymmetric cache hierarchies are now fully
supported (and we are actually using them already with L1 caches, but
no table-walker caches, connected to a shared L2).
This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
This patch adds a random option to memtest.py which allows the user to
easily test valid random tree topologies. The patch also adds a
wrapper script to run soak tests using the newly introduced option.
We also adjust the progress interval and progress limit check to make
the output less noisy, and avoid false positives.
Bring on the pain.
This patch enables users to speficy --os-type on the command
line. This option is used to take specific actions for an OS type,
such as changing the kernel command line. This patch is part of the
Android KitKat enablement.
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.
Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
This is a rather unfortunate copy of the memtest.py example script,
that actually stresses the system with true sharing as opposed to the
false sharing of the MemTest. To do so it uses TrafficGen instances to
generate the reads/writes, and MemCheckerMonitor combined with the
MemChecker to check the validity of the read/written values.
As a bonus, this script also enables the addition of prefetchers, and
the traffic is created to have a mix of random addresses and linear
strides. We use the TaggedPrefetcher since the packets do not have a
request with a PC.
At the moment the code is almost identical to the memtest.py script,
and no effort has been made to factor out the construction of the
tree. The challenge is that the instantiation and connection of the
testers and monitors is done as part of the tree building.
This patch revamps the memtest example script and allows for the
insertion of testers at any level in the cache hierarchy. Previously
all created topologies placed testers only at the very top, and the
tree was thus entirely symmetric. With the changes made, it is possible
to not only place testers at the leaf caches (L1), but also to connect
testers at the L2, L3 etc.
As part of the changes the object hierarchy is also simplified to
ensure that the visual representation from the DOT printing looks
sensible. Using SubSystems to group the objects is one of the key
features.
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.
The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.
In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
when trying to dual boot on arm build_drive_system will only use the default
values for the dtb file, number of processors, and disk image. if you are using
the non-default files by passing values on the command line for example, or by
making a new entry in Benchmarks.py, the build config scripts will still look
for the default files. this will lead to the wrong system files being used, or
the simulator will fail if you do not have them.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
More documentation at http://gem5.org/Simpoints
Steps to profile, generate, and use SimPoints with gem5:
1. To profile workload and generate SimPoint BBV file, use the
following option:
--simpoint-profile --simpoint-interval <interval length>
Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.
2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)
3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:
--take-simpoint-checkpoint=<simpoint file path>,<weight file
path>,<interval length>,<warmup length>
<simpoint file> and <weight file> is generated by SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.
4. To resume from gem5 SimPoint checkpoints, use the following option:
--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint
checkpoint path>
<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint
#0.
Both options accept template which will, through python string formatting,
have "mem", "disk", and "script" values substituted in from the mdesc.
Additional values can be used on a case by case basis by passing them as
keyword arguments to the fillInCmdLine function. That makes it possible to
have specialized parameters for a particular ISA, for instance.
The first option lets you specify the template directly, and the other lets
you specify a file which has the template in it.
In fs.py the io port controller was being attached to the iobus multiple
times. This should be done only once. In se.py, the the option use_map
was being set which no longer exists.
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
Both ruby and the system used to maintain memory copies. With the changes
carried for programmed io accesses, only one single memory is required for
fs simulations. This patch sets the copy of memory that used to reside
with the system to null, so that no space is allocated, but address checks
can still be carried out. All the memory accesses now source and sink values
to the memory maintained by ruby.
This changes the default ARM system to a Versatile Express-like system that supports
2GB of memory and PCI devices and updates the default kernels/file-systems for
AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some
platforms that are no longer supported have been pruned from the configuration files.
In addition a set of 64-bit ARM regressions have been added to the regression system.
This patch adds the ability to load in config.ini files generated from
gem5 into another instance of gem5 built without Python configuration
support. The intended use case is for configuring gem5 when it is a
library embedded in another simulation system.
A parallel config file reader is also provided purely in Python to
demonstrate the approach taken and to provided similar functionality
for as-yet-unknown use models. The Python configuration file reader
can read both .ini and .json files.
C++ configuration file reading:
A command line option has been added for scons to enable C++ configuration
file reading: --with-cxx-config
There is an example in util/cxx_config that shows C++ configuration in action.
util/cxx_config/README explains how to build the example.
Configuration is achieved by the object CxxConfigManager. It handles
reading object descriptions from a CxxConfigFileBase object which
wraps a config file reader. The wrapper class CxxIniFile is provided
which wraps an IniFile for reading .ini files. Reading .json files
from C++ would be possible with a similar wrapper and a JSON parser.
After reading object descriptions, CxxConfigManager creates
SimObjectParam-derived objects from the classes in the (generated with this
patch) directory build/ARCH/cxx_config
CxxConfigManager can then build SimObjects from those SimObjectParams (in an
order dictated by the SimObject-value parameters on other objects) and bind
ports of the produced SimObjects.
A minimal set of instantiate-replacing member functions are provided by
CxxConfigManager and few of the member functions of SimObject (such as drain)
are extended onto CxxConfigManager.
Python configuration file reading (configs/example/read_config.py):
A Python version of the reader is also supplied with a similar interface to
CxxConfigFileBase (In Python: ConfigFile) to config file readers.
The Python config file reading will handle both .ini and .json files.
The object construction strategy is slightly different in Python from the C++
reader as you need to avoid objects prematurely becoming the children of other
objects when setting parameters.
Port binding also needs to be strictly in the same port-index order as the
original instantiation.
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.
As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.
--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
This patch fixes scripts related to ruby by adding the ruby clock domain.
Now the L1 controllers and the Sequencer shares the cpu clock domain,
while the rest of the components use the ruby clock domain.
Before this patch, running simulations with the cpu clock set at 2GHz or
1GHz will output the same time results and could distort power measurements.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
A recent changeset altered the default memory class to DRAMCtrl. In se mode,
ruby uses the physical memory to check if a given address is within the bounds
of the physical memory. SimpleMemory is enough for this. Moreover,
SimpleMemory does not check whether it is connected or not, something which
DRAMCtrl does.
The code that creates test and drive systems is being moved to separate
functions so as to make the code more readable. Ultimately the two
functions would be combined so that the replicated code is eliminated.
The patch removes the ruby_fs.py file. The functionality is being moved to
fs.py. This would being ruby fs simulations in line with how ruby se
simulations are started (using --ruby option). The alpha fs config functions
are being combined for classing and ruby memory systems. This required
renaming the piobus in ruby to iobus. So, we will have stats being renamed
in the stats file for ruby fs regression.