This patch takes the final step in removing the is_top_level parameter
from the cache. With the recent changes to read requests and write
invalidations, the parameter is no longer needed, and consequently
removed.
This also means that asymmetric cache hierarchies are now fully
supported (and we are actually using them already with L1 caches, but
no table-walker caches, connected to a shared L2).
This patch adds a parameter to the BaseCache to enable a read-only
cache, for example for the instruction cache, or table-walker cache
(not for x86). A number of checks are put in place in the code to
ensure a read-only cache does not end up with dirty data.
A follow-on patch adds suitable read requests to allow a read-only
cache to explicitly ask for clean data.
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:
* The system register based generic timer can only be simulated by
the host kernel. Workaround: Use a memory mapped timer instead to
simulate the timer in gem5.
* Simulating devices (e.g., the generic timer) in the host kernel
requires that the host kernel also simulates the GIC.
* ID registers in the host and in gem5 must match for switching
between simulated CPUs and KVM. This is particularly important
for ID registers describing memory system capabilities (e.g.,
ASID size, physical address size).
* Switching between a virtualized CPU and a simulated CPU is
currently not supported if in-kernel device emulation is
used. This could be worked around by adding support for switching
to the gem5 (e.g., the KvmGic) side of the device models. A
simpler workaround is to avoid in-kernel device models
altogether.
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.
There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.
Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued. As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined). The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value. To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean. If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
When using gem5 as a slave simulator, it will not advance the
clock on its own and depends on the master simulator calling
simulate(). This new option lets us use the Python scripts
to do all the configuration while stopping short of actually
simulating anything.
This patch adds a random option to memtest.py which allows the user to
easily test valid random tree topologies. The patch also adds a
wrapper script to run soak tests using the newly introduced option.
We also adjust the progress interval and progress limit check to make
the output less noisy, and avoid false positives.
Bring on the pain.
This patch enables users to speficy --os-type on the command
line. This option is used to take specific actions for an OS type,
such as changing the kernel command line. This patch is part of the
Android KitKat enablement.
This patch modifies FSConfig.py to look for 'android' only in disk
image name. Before this patch, 'android' was searched in full
disk path.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.
Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
Previously, the user would have to manually set access_backing_store=True
on all RubyPorts (Sequencers) in the config files.
Now, instead there is one global option that each RubyPort checks on
initialization.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This is a rather unfortunate copy of the memtest.py example script,
that actually stresses the system with true sharing as opposed to the
false sharing of the MemTest. To do so it uses TrafficGen instances to
generate the reads/writes, and MemCheckerMonitor combined with the
MemChecker to check the validity of the read/written values.
As a bonus, this script also enables the addition of prefetchers, and
the traffic is created to have a mix of random addresses and linear
strides. We use the TaggedPrefetcher since the packets do not have a
request with a PC.
At the moment the code is almost identical to the memtest.py script,
and no effort has been made to factor out the construction of the
tree. The challenge is that the instantiation and connection of the
testers and monitors is done as part of the tree building.
This patch revamps the memtest example script and allows for the
insertion of testers at any level in the cache hierarchy. Previously
all created topologies placed testers only at the very top, and the
tree was thus entirely symmetric. With the changes made, it is possible
to not only place testers at the leaf caches (L1), but also to connect
testers at the L2, L3 etc.
As part of the changes the object hierarchy is also simplified to
ensure that the visual representation from the DOT printing looks
sensible. Using SubSystems to group the objects is one of the key
features.
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.
The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.
In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
Although you can put a list of colon-separated directory names
in M5_PATH, the current code just takes the first one that
exists and assumes all files must live there. This change
makes the code search the specified list of directories
for each individual binary or disk image that's requested.
The main motivation is that the x86/Alpha binaries and the
ARM binaries are in separate downloads, and thus naturally
end up in separate directories. With this change, you can
have M5_PATH point to those two directories, then run any
FS regression test without changing M5_PATH. Currently,
you either have to merge the two download directories
or change M5_PATH (or do something else I haven't figured out).
This patch uses the recently added XOR hashing capabilities for the
DRAM channel interleaving. This avoids channel biasing due to strided
access patterns.
This patch changes the DRAM channel interleaving default behaviour to
be more representative. The default address mapping (RoRaBaCoCh) moves
the channel bits towards the least significant bits, and uses 128 byte
as the default channel interleaving granularity.
These defaults can be overridden if desired, but should serve as a
sensible starting point for most use-cases.
Fix the makeArmSystem routine to reflect recent changes that support kernel
commandline option when running android. Without this fix, trying to run
android encounters a 'reference before assignment' error.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
DMA Controller was not being connected to the network for the MESI_Three_Level
protocol as was being done in the other protocol config files. Without this
patch, this protocol segfaults during startup.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
when trying to dual boot on arm build_drive_system will only use the default
values for the dtb file, number of processors, and disk image. if you are using
the non-default files by passing values on the command line for example, or by
making a new entry in Benchmarks.py, the build config scripts will still look
for the default files. this will lead to the wrong system files being used, or
the simulator will fail if you do not have them.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).
The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
This patch adds the --memchecker option, to denote that a MemChecker
should be instantiated for the system. The exact usage of the MemChecker
depends on the system configuration.
For now CacheConfig.py makes use of the option, adding MemCheckerMonitor
instances between CPUs and D-Caches.
Note, however, that currently this only provides limited checking on a
running system; other parts of the system, such as I/O devices are not
monitored, and may cause warnings to be issued by the monitor.
More documentation at http://gem5.org/Simpoints
Steps to profile, generate, and use SimPoints with gem5:
1. To profile workload and generate SimPoint BBV file, use the
following option:
--simpoint-profile --simpoint-interval <interval length>
Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.
2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)
3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:
--take-simpoint-checkpoint=<simpoint file path>,<weight file
path>,<interval length>,<warmup length>
<simpoint file> and <weight file> is generated by SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.
4. To resume from gem5 SimPoint checkpoints, use the following option:
--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint
checkpoint path>
<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint
#0.
Both options accept template which will, through python string formatting,
have "mem", "disk", and "script" values substituted in from the mdesc.
Additional values can be used on a case by case basis by passing them as
keyword arguments to the fillInCmdLine function. That makes it possible to
have specialized parameters for a particular ISA, for instance.
The first option lets you specify the template directly, and the other lets
you specify a file which has the template in it.
In fs.py the io port controller was being attached to the iobus multiple
times. This should be done only once. In se.py, the the option use_map
was being set which no longer exists.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Ruby's functional accesses are not guaranteed to succeed as of now. While
this is not a problem for the protocols that are currently in the mainline
repo, it seems that coherence protocols for gpus rely on a backing store to
supply the correct data. The aim of this patch is to make this backing store
configurable i.e. it comes into play only when a particular option:
--access-backing-store is invoked.
The backing store has been there since M5 and GEMS were integrated. The only
difference is that earlier the system used to maintain the backing store and
ruby's copy was write-only. Sometime last year, we moved to data being
supplied supplied by ruby in SE mode simulations. And now we have patches on
the reviewboard, which remove ruby's copy of memory altogether and rely
completely on the system's memory to supply data. This patch adds back a
SimpleMemory member to RubySystem. This member is used only if the option:
access-backing-store is set to true. By default, the memory would not be
accessed.
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.