Add support for KVM in the big.LITTLE(tm) example configuration. This
replaces the --atomic option with a --cpu-type option that can be used
to switch between atomic, kvm, and timing simulation.
When running in KVM mode, the simulation script automatically assigns
separate event queues (threads) to each of the simulated CPUs. All
simulated devices, including CPU child devices (e.g., interrupt
controllers and caches), are assigned to event queue 0.
Change-Id: Ic9a3f564db91f5a3d3cb754c5a02fdd5c17d5fdf
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2561
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Weiping Liao <weipingliao@google.com>
The vanilla bL configuration file and the dist-gem5 configuration file
use slightly different code paths when restoring from
checkpoints. Unify this by passing the parsed options to the
instantiate() method and adding an optional checkpoint keyword
argument for checkpoint directories (only used by the dist-gem5
script).
Change-Id: I9943ec10bd7a256465e29c8de571142ec3fbaa0e
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2560
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Weiping Liao <weipingliao@google.com>
If output redirection is activated, the error message is printed in
simout. This change ensure it will be printed in simerr.
Change-Id: Ie661ac6b6978bf2e4aaaccdf23134795d764d459
Signed-off-by: Pierre-Yves Péneau <pierre-yves.peneau@lirmm.fr>
Reviewed-on: https://gem5-review.googlesource.com/2221
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The EIOProcess class was removed recently and it was the only other class
which derived from Process. Since every Process invocation is also a
LiveProcess invocation, it makes sense to simplify the organization by
combining the fields from LiveProcess into Process.
This patch extends the example big.LITTLE configuration to enable
dist-gem5 simulations of big.LITTLE systems.
Change-Id: I49c095ab3c737b6a082f7c6f15f514c269217756
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This patch prevents the body of the script getting executed when
the script is imported as a module.
Change-Id: I70a50f6295f1e7a088398017f5fa9d06fe90476a
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This patch prepares future extensions and customisation of the example
big.LITTLE configuration script. It breaks out the major phases into
functions so they can be called from other python scripts.
Change-Id: I2cb7c207c410fe14602cf17af7482719abba6c24
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
A KVM VM is typically a child of the System object already, but for
solving future issues with configuration graph resolution, the most
logical way to keep track of this object is for it to be an actual
parameter of the System object.
Change-Id: I965ded22203ff8667db9ca02de0042ff1c772220
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Names of DRAM configurations were updated to reflect both
the channel and device data width.
Previous naming format was:
<DEVICE_TYPE>_<DATA_RATE>_<CHANNEL_WIDTH>
The following nomenclature is now used:
<DEVICE_TYPE>_<DATA_RATE>_<n>x<w>
where n = The number of devices per rank on the channel
x = Device width
Total channel width can be calculated by n*w
Example:
A 64-bit DDR4, 2400 channel consisting of 4-bit devices:
n = 16
w = 4
The resulting configuration name is:
DDR4_2400_16x4
Updated scripts to match new naming convention.
Added unique configurations for DDR4 for:
1) 16x4
2) 8x8
3) 4x16
Change-Id: Ibd7f763b7248835c624309143cb9fc29d56a69d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
The current TLM bridge only provides a Slave Port that allows the gem5
world to send request to the SystemC world. This patch series refractors
and cleans up the existing code, and adds a Master Port that allows the
SystemC world to send requests to the gem5 world.
This patch:
* Restructure the existing sources in preparation of the addition of the
* new
Master Port.
* Refractor names to allow for distinction of the slave and master port.
* Replace the Makefile by a SConstruct.
Testing Done: The examples provided in util/tlm (now
util/tlm/examples/slave_port) still compile and run error free.
Reviewed at http://reviews.gem5.org/r/3527/
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Bugfix for Elastic Traces
This patch fixes the bug when elastic traces are used:
build/ARM/gem5.opt \
configs/example/fs.py \
--cpu-type=arm_detailed \
--num-cpu=1 \
--mem-type=SimpleMemory \
--mem-size=512MB \
--mem-channels=1 \
--caches \
--elastic-trace-en \
--data-trace-file=data.proto.gz \
--inst-trace-file=inst.proto.gz \
--machine-type=VExpress_EMM \
--dtb-filename=vexpress.aarch32.ll_20131205.0-gem5.1cpu.dtb \
--kernel=vmlinux.aarch32.ll_20131205.0-gem5 \
--disk-image=linux-aarch32-ael.img
NameError: global name 'CpuConfig' is not defined
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
Some configuration scripts need periodic stat dumps. One of the ways
this can be achieved is by using the pariodicStatDump helper
function. This function was previously only exported in the internal
name space. Export it as a normal function in m5.stat instead.
Change-Id: Ic88bf1fd33042a62ab436d5944d8ed778264ac98
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
This patch detects garnet network deadlock by monitoring
network interfaces. If a network interface continuously
fails to allocate virtual channels for a message, a
possible deadlock is detected.
This patch adds an IOCache to the example bigLITTLE
configuration. An IOCache is required for correct DMA
transfers when we have caches in the system.
Change-Id: Ifeddc1b360aacbb16b1393f361dd98873c834012
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This change adds the option to use the memcheck with random memory
hierarchies at the moment limited to a maximum depth of 3 allowing
testing with uncommon topologies.
Change-Id: Id2c2fe82a8175d9a67eb4cd7f3d2e2720a809b60
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
If the cache access mode is parallel, i.e. "sequential_access" parameter
is set to "False", tags and data are accessed in parallel. Therefore,
the hit_latency is the maximum latency between tag_latency and
data_latency. On the other hand, if the cache access mode is
sequential, i.e. "sequential_access" parameter is set to "True",
tags and data are accessed sequentially. Therefore, the hit_latency
is the sum of tag_latency plus data_latency.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This patch adds the ability for an application to request dist-gem5 to begin/
end synchronization using an m5 op. When toggling on sync, all nodes agree
on the next sync point based on the maximum of all nodes' ticks. CPUs are
suspended until the sync point to avoid sending network messages until sync has
been enabled. Toggling off sync acts like a global execution barrier, where
all CPUs are disabled until every node reaches the toggle off point. This
avoids tricky situations such as one node hitting a toggle off followed by a
toggle on before the other nodes hit the first toggle off.
this patch adds an ordered response buffer to the GM pipeline
to ensure in-order data delivery. the buffer is implemented as
a stl ordered map, which sorts the request in program order by
using their sequence ID. when requests return to the GM pipeline
they are marked as done. only the oldest request may be serviced
from the ordered buffer, and only if is marked as done.
the FIFO response buffers are kept and used in OoO delivery mode
This patch breaks out the most basic configuration options into a set
of base options, to allow them to be used also by scripts that do not
involve any ISA, and thus no actual CPUs or devices.
The patch also fixes a few modules so that they can be imported in a
NULL build, and avoid dragging in FSConfig every time Options is
imported.
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Continue along the same line as the recent patch that made the
Ruby-related config scripts Python packages and make also the
configs/common directory a package.
All affected config scripts are updated (hopefully).
Note that this change makes it apparent that the current organisation
and naming of the config directory and its subdirectories is rather
chaotic. We mix scripts that are directly invoked with scripts that
merely contain convenience functions. While it is not addressed in
this patch we should follow up with a re-organisation of the
config structure, and renaming of some of the packages.
This patch moves the addition of network options into the Ruby module
to avoid the regressions all having to add it explicitly. Doing this
exposes an issue in our current config system though, namely the fact
that addtoPath is relative to the Python script being executed. Since
both example and regression scripts use the Ruby module we would end
up with two different (relative) paths being added. Instead we take a
first step at turning the config modules into Python packages, simply
by adding a __init__.py in the configs/ruby, configs/topologies and
configs/network subdirectories.
As a result, we can now add the top-level configs directory to the
Python search path, and then use the package names in the various
modules. The example scripts are also updated, and the messy
path-deducing variations in the scripts are unified.
This patch adds port direction names to the links during topology
creation, which can be used for better printed names for the links
or for users to code up their own adaptive routing algorithms.
It also adds support for every router to have an independent latency
value to support heterogeneous topologies with the subsequent
garnet2.0 patch.
This patch makes the internal links within the network topology
unidirectional, thus allowing any deadlock-free routing algorithms to
be specified from the topology itself using weights.
This patch also renames Mesh.py and MeshDirCorners.py to
Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing).
It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
networktest is essentially a collection of synthetic traffic patterns
for the network. The protocol name and the tester having the same name
led to multiple python configuration files with the same name, adding
confusion. This patch renames networktest to garnet_synthetic_traffic,
and also adds more synthetic traffic patterns.
Over the past 6 years, we realized that the protocol is essentially used
to run the garnet network in a standalone manner, and feed standard synthetic
traffic patterns through it.
Adjust the traffic generator time-out so that the script works out of
the box
Change-Id: I6b3b6b11f98b094ae3acdbe09488c26e4aeb0ab4
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This patch refactors the configuration file to use a more
object-oriented design.
Change-Id: I44ac2d063c2b5901f385544fb6ce3f259459cb05
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Gabor Dozsa <gabor.dozsa@arm.com>
Add support for using KVM to accelerate APU simulations. The intended use
case is to fast-forward through runtime initialization until the first
kernel launch.
This patch changes the default behaviour of the SystemXBar, adding a
snoop filter. With the recent updates to the snoop filter allocation
behaviour this change no longer causes problems for the regressions
without caches.
Change-Id: Ibe0cd437b71b2ede9002384126553679acc69cc1
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Ruby on ARM is currently very experimental. Fail with a fatal error
that explains this to make sure users are aware of the limitations (it
doesn't actually work yet!).
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Add initial support for creating an ARM system with a Ruby-based
memory system. This support is currently experimental and limited to
the new VExpress_GEM5_V1 platform.
Change-Id: I36baeb68b0d891e34ea46aafe17b5e55217b4bfa
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brad Beckmann <brad.beckmann@amd.com>
There are cases where we want to put boot ROMs on the PIO bus. Ruby
currently doesn't support functional accesses to such memories since
functional accesses are always assumed to go to physical memory. Add
the required support for routing functional accesses to the PIO bus.
Change-Id: Ia5b0fcbe87b9642bfd6ff98a55f71909d1a804e3
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brad Beckmann <brad.beckmann@amd.com>
Reviewed-by: Michael LeBeane <michael.lebeane@amd.com>
An ARM big.LITTLE system consists of two cpu clusters: the big
CPUs are typically complex out-of-order cores and the little
CPUs are simpler in-order ones. The fs_bigLITTLE.py script
can run a full system simulation with various number of big
and little cores and cache hierarchy. The commit also includes
two example device tree files for booting Linux on the
bigLITTLE system.
Change-Id: I6396fb3b2d8f27049ccae49d8666d643b66c088b
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
At the moment the SPARC FS machine configuration comes with a hardcoded
value for using the Solaris 10 disk image from the OpenSPARC tarball. The
--disk-image option is completely ignored for SPARC. This simple patch
modifies the behavior so that --disk-image option is both taken into
account and also required. This makes it possible to easily change SPARC FS
images without having to modify the configuration files.
This patch provides the example test script to configure different HMC
architecture and run traffic through traffic generator.
Committed by Jason Lowe-Power <jason@lowepower.com>
In this new hmc configuration we have used the existing components in gem5
mainly [SerialLink] [NoncoherentXbar]& [DRAMCtrl] to define 3 different
architecture for HMC.
Highlights
1- It explores 3 different HMC architectures
2- It creates 4-HMC crossbars and attaches 16 vault controllers with it.
This will connect vaults to serial links
3- From the previous version, HMCController with round robin funtionality
is being removed and all the serial links are being accessible directly
from user ports
4- Latency incorporated by HMCController (in previous version) is being
added to SerialLink
Committed by Jason Lowe-Power <jason@lowepower.com>
This patch ensures a walker cache is instantiated if specfied.
Change-Id: I2c6b4bf3454d56bb19558c73b406e1875acbd986
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Mitch Hayenga <mitch.hayenga@arm.com>
Eliminate the VSZ constant that defined the Wavefront size (in numbers of work
items); replaced it with a parameter in the GPU.py configuration script.
Changed all data structures dependent on the Wavefront size to be dynamically
sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at
initialization time.
Disable the default snoop filter in the SystemXBar so that the
typical membus does not have a snoop filter by default. Instead,
add the snoop filter only when there are caches added to the system
(with the caches / l2cache options).
The underlying problem is that the snoop filter grows without
bounds (for now) if there are no caches to tell it that lines have
been evicted. This causes slow regression runs for all the atomic
regressions. This patch fixes this behaviour.
--HG--
extra : source : f97c20511828209757440839ed48d741d02d428f
According to the Intel Multi Processor Specification rev 1.4 (-006) (*),
section 4.3.2 Bus Entries, Bus type strings are >>6-character ASCII
(blank-filled) strings<<.
This patch properly pads the entries with the missing spaces at the end.
(*) http://www.intel.com/design/pentium/datashts/24201606.pdf
Committed by Jason Lowe-Power <power.jg@gmail.com>
Distributed gem5 is the result of the convergence effort between
multi-gem5 and pd-gem5. It relies on the base multi-gem5 infrastructure
for packet forwarding, synchronisation and checkpointing but combines
those with the elaborated network switch model from pd-gem5.