Adds two public domain algorithms for determining number of set bits and also
whether a value is a power of two, uses the builtin that is available in GCC
and clang for popcount.
This is a first cut at a simple snoop filter that tracks presence of lines in
the caches "above" it. The snoop filter can be applied at any given cache
hierarchy and will then handle the caches above it appropriately; there is no
need to use this only in the last-level bus.
This design currently has some limitations: missing stats, no notion of clean
evictions (these will not update the underlying snoop filter, because they are
not sent from the evicting cache down), no notion of capacity for the snoop
filter and thus no need for invalidations caused by capacity pressure in the
snoop filter. These are planned to be added on top with future change sets.
There are cases where users might by accident / intention specify less voltage
operating points thatn frequency points. We consider one of these cases
special: giving only a single voltage to a voltage domain effectively renders
it as a static domain. This patch adds additional logic in the auxiliary parts
of the functionality to handle these cases properly (simple driver asking for
N>1 operating levels, we should return the same voltage for all of them) and
adds error checking code in the voltage domain.
This patch provides an Energy Controller device that provides software
(driver) access to a DVFS handler. The device is currently residing in
the dev/arm tree, but there is nothing inherently ARM specific in the
behaviour. It is currently only tested and supported for ARM Linux,
hence the location.
These additions allow easier interoperability with and querying from an
additional controller which will be in a separate patch. Also adding warnings
for changing the enabled state of the handler across checkpoint / resume and
deviating from the state in the configuration.
Contributed-by: Akash Bagdia <akash.bagdia@arm.com>
Added the following parameter to the DRAMCtrl class:
- bank_groups_per_rank
This defaults to 1. For the DDR4 case, the default is overridden to indicate
bank group architecture, with multiple bank groups per rank.
Added the following delays to the DRAMCtrl class:
- tCCD_L : CAS-to-CAS, same bank group delay
- tRRD_L : RAS-to-RAS, same bank group delay
These parameters are only applied when bank group timing is enabled. Bank
group timing is currently enabled only for DDR4 memories.
For all other memories, these delays will default to '0 ns'
In the DRAM controller model, applied the bank group timing to the per bank
parameters actAllowedAt and colAllowedAt.
The actAllowedAt will be updated based on bank group when an ACT is issued.
The colAllowedAt will be updated based on bank group when a RD/WR burst is
issued.
At the moment no modifications are made to the scheduling.
Add the following delay to the DRAM controller:
- tCS : Different rank bus turnaround delay
This will be applied for
1) read-to-read,
2) write-to-write,
3) write-to-read, and
4) read-to-write
command sequences, where the new command accesses a different rank
than the previous burst.
The delay defaults to 2*tCK for each defined memory class. Note that
this does not correspond to one particular timing constraint, but is a
way of modelling all the associated constraints.
The DRAM controller has some minor changes to prioritize commands to
the same rank. This prioritization will only occur when the command
stream is not switching from a read to write or vice versa (in the
case of switching we have a gap in any case).
To prioritize commands to the same rank, the model will determine if there are
any commands queued (same type) to the same rank as the previous command.
This check will ensure that the 'same rank' command will be able to execute
without adding bubbles to the command flow, e.g. any ACT delay requirements
can be done under the hoods, allowing the burst to issue seamlessly.
Add new DRAM_ROTATE mode to traffic generator.
This mode will generate DRAM traffic that rotates across
banks per rank, command types, and ranks per channel
The looping order is illustrated below:
for (ranks per channel)
for (command types)
for (banks per rank)
// Generate DRAM Command Series
This patch also adds the read percentage as an input argument to the
DRAM sweep script. If the simulated read percentage is 0 or 100, the
middle for loop does not generate additional commands. This loop is
used only when the read percentage is set to 50, in which case the
middle loop will toggle between read and write commands.
Modified sweep.py script, which generates DRAM traffic.
Added input arguments and support for new DRAM_ROTATE mode.
The script now has input arguments for:
1) Read percentage
2) Number of ranks
3) Address mapping
4) Traffic generator mode (DRAM or DRAM_ROTATE)
The default values are:
100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode
For the DRAM traffic mode, added multi-rank support.
This patch adds support for 9p filesystem proxying over VirtIO. It can
currently operate by connecting to a 9p server over a socket
(VirtIO9PSocket) or by starting the diod 9p server and connecting over
pipe (VirtIO9PDiod).
*WARNING*: Checkpoints are currently not supported for systems with 9p
proxies!
This patch adds support for VirtIO over the PCI bus. It does so by
providing the following new SimObjects:
* VirtIODeviceBase - Abstract base class for VirtIO devices.
* PciVirtIO - VirtIO PCI transport interface.
A VirtIO device is hooked up to the guest system by adding a PciVirtIO
device to the PCI bus and connecting it to a VirtIO device using the
vio parameter.
New VirtIO devices should inherit from VirtIODevice base and
implementing one or more VirtQueues. The VirtQueues are usually
device-specific and all derive from the VirtQueue class. Queues must
be registered with the base class from the constructor since the
device assumes that the number of queues stay constant.
The terminal currently assumes that the transport to the guest always
inherits from the Uart class. This assumption breaks when
implementing, for example, a VirtIO consoles. This patch removes this
assumption by adding pointer to the from the terminal to the uart and
replacing it with a more general callback interface. The Uart, or any
other class using the terminal, class implements an instance of the
callbacks class and registers it with the terminal.
This patch does a bit of housekeeping on the string helper functions
and relies on the C++11 standard library where possible. It also does
away with our custom string hash as an implementation is already part
of the standard library.
There are two primary issues with this code which make it deserving of deletion.
1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
It's basically a worse version of the stride prefetcher.
It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
This patch 'completes' .json config files generation by adding in the
SimObject references and String-valued parameters not currently
printed.
TickParamValues are also changed to print in the same tick-value
format as in .ini files.
This allows .json files to describe a system as fully as the .ini files
currently do.
This patch adds a new function config_value (which mirrors ini_str) to
each ParamValue and to SimObject. This function can then be explicitly
changed to give different .json and .ini printing behaviour rather than
being written in terms of ini_str.
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
The changeset ad9c042dce54 made changes to the structures under the network
directory to use a map of buffers instead of vector of buffers.
The reasoning was that not all vnets that are created are used and we
needlessly allocate more buffers than required and then iterate over them
while processing network messages. But the move to map resulted in a slow
down which was pointed out by Andreas Hansson. This patch moves things
back to using vector of message buffers.
This patch fixes cases where uncacheable/memory type flags are not set
correctly on a memory op which is split in the LSQ. Without this
patch, request->request if freely used to check flags where the flags
should actually come from the accumulation of request fragment flags.
This patch also fixes a bug where an uncacheable access which passes
through tryToSendRequest more than once can increment
LSQ::numAccessesInMemorySystem more than once.
This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
The ProbeListener base class automatically registers itself with a
probe manager. Currently, the class does not unregister a itself when
it is destroyed, which makes removing probes listeners somewhat
cumbersome. This patch adds an automatic call to
manager->removeListener in the ProbeListener destructor, which solves
the problem.
Parsing vectorparams from the command was slightly broken
in that it wouldn't accept the input that the help message
provided to the user and it didn't do the conversion
on the second code path used to convert the string input
to the actual internal representation. This patch fixes these bugs.
Static analysis revealed that BaseGlobalEvent::barrier was never
deallocated. This changeset solves this leak by making the barrier
allocation a part of the BaseGlobalEvent instead of storing a pointer
to a separate heap-allocated barrier.
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
The PC platform has a single IO range that is used both legacy IO and PCI IO
while other platforms may use seperate regions. Provide another mechanism to
configure the legacy IO base address range and set it to the PCI IO address
range for x86.
Change the default kernel for AArch64 and since it supports PCI devices
remove the hack that made it use CF. Unfortunately, there isn't really
a half-way here and we need to switch. Current users will get an error
message that the kernel isn't found and hopefully go download a new
kernel that supports PCI.
This change adds support for a generic pci host bus driver that
has been included in recent Linux kernel instead of the more
bespoke one we've been using to date. It also works with
aarch64 so it provides PCI support for 64-bit ARM Linux.
To make this work a new configuration option pci_io_base is added
to the RealView platform that should be set to the start of
the memory used as memory mapped IO ports (IO ports that are
memory mapped, not regular memory mapped IO). And a parameter
pci_cfg_gen_offsets which specifies if the config space
offsets should be used that the generic driver expects.
To use the pci-host-generic device you need to:
pci_io_base = 0x2f000000 (Valid for VExpress EMM)
pci_cfg_gen_offsets = True
and add the following to your device tree:
pci {
compatible = "pci-host-ecam-generic";
device_type = "pci";
#address-cells = <0x3>;
#size-cells = <0x2>;
#interrupt-cells = <0x1>;
//bus-range = <0x0 0x1>;
// CPU_PHYSICAL(2) SIZE(2)
// Note, some DTS blobs only support 1 size
reg = <0x0 0x30000000 0x0 0x10000000>;
// IO (1), no bus address (2), cpu address (2), size (2)
// MMIO (1), at address (2), cpu address (2), size (2)
ranges = <0x01000000 0x0 0x00000000 0x0 0x2f000000 0x0 0x10000>,
<0x02000000 0x0 0x40000000 0x0 0x40000000 0x0 0x10000000>;
// With gem5 we typically use INTA/B/C/D one per device
interrupt-map = <0x0000 0x0 0x0 0x1 0x1 0x0 0x11 0x1
0x0000 0x0 0x0 0x2 0x1 0x0 0x12 0x1
0x0000 0x0 0x0 0x3 0x1 0x0 0x13 0x1
0x0000 0x0 0x0 0x4 0x1 0x0 0x14 0x1>;
// Only match INTA/B/C/D and not BDF
interrupt-map-mask = <0x0000 0x0 0x0 0x7>;
};
The new configuration scripts need the ability to splice
a simobject between a pair of ports that are already connected.
The primary use case is when a CommMonitor needs to be
created after the system is configured and then spliced between
the pair of ports it will monitor.
Updated the stat_config.ini files to reflect new structure.
Moved to a more generic stat naming scheme that can easily handle
multiple CPUs and L2s by letting the script replace pre-defined #
symbols to CPU or L2 ids.
Removed the previous per_switch_cpus sections. Still can be used by
spelling out the stat names if necessary. (Resuming from checkpoints
no longer use switch_cpus. Only fast-forwarding does.)
This patch changes the perlbmk regression script from the large to the
medium dataset to reduce the regression run time. For all ISAs and CPU
models, the total perlbmk host CPU time with the large dataset is
roughly 12 hours (constituting >30% of the total regression host
time). There is, most likely, almost no added value in terms of code
coverage for this rather excessive run time.