The patch removes the ruby_fs.py file. The functionality is being moved to
fs.py. This would being ruby fs simulations in line with how ruby se
simulations are started (using --ruby option). The alpha fs config functions
are being combined for classing and ruby memory systems. This required
renaming the piobus in ruby to iobus. So, we will have stats being renamed
in the stats file for ruby fs regression.
Modifies FSConfig.py to enable ARMv8 compatibility.
To boot gem5 with ARMv8:
Download the v8 kernel, .dtb file, and root FS from: http://gem5.org/Download
Download the ARMv8 toolchain, and add the bin dir to your path:
http://www.linaro.org/engineering/engineering-projects/armv8
Build gem5 for ARM
Build the v8 bootloader (in gem5/system/arm/aarch64_bootloader)
Make script in gem5/system/arm/aarch64_bootloader will require v8 toolchain,
drop the produced boot_emm.arm64 in $(M5_PATH)/binaries/
Run:
$ build/ARM/gem5.fast configs/example/fs.py --machine-type=VExpress_EMM64 \
--kernel=/path/to/kernel/vmlinux-linaro-tracking \
--dtb-filename=/path/to/dtb/rtsm_ve-aemv8a.dtb \
--disk-image=/path/to/img/linaro-minimal-armv8.img
This patch adds DRAMSim2 as a memory controller by wrapping the
external library and creating a sublass of AbstractMemory that bridges
between the semantics of gem5 and the DRAMSim2 interface.
The DRAMSim2 wrapper extracts the clock period from the config
file. There is no way of extracting this information from DRAMSim2
itself, so we simply read the same config file and get it from there.
To properly model the response queue, the wrapper keeps track of how
many transactions are in the actual controller, and how many are
stacking up waiting to be sent back as responses (in the wrapper). The
latter requires us to move away from the queued port and manage the
packets ourselves. This is due to DRAMSim2 not having any flow control
on the response path.
DRAMSim2 assumes that the transactions it is given are matching the
burst size of the choosen memory. The wrapper checks to ensure the
cache line size of the system matches the burst size of DRAMSim2 as
there are currently no provisions to split the system requests. In
theory we could allow a cache line size smaller than the burst size,
but that would lead to inefficient use of the DRAM, so for not we
fatal also in this case.
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.
Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.
Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
For some reason, the default x86 kernel is specified in
tests/configs/x86_generic.py and not in configs/common/FSConfig.py,
where the kernels for all the other ISAs are. This means that
running configs/example/fs.py for x86 fails because no kernel
is specified. Moving the specification over fixes this problem.
There is another problem that this uncovers, which is that going
past the init stage (i.e., past where the regression test stops)
fails because the fsck test on the disk device fails, but that's
a separate issue.
the current implementation of the fetch buffer in the o3 cpu
is only allowed to be the size of a cache line. some
architectures, e.g., ARM, have fetch buffers smaller than a cache
line, see slide 22 at:
http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf
this patch allows the fetch buffer to be set to values smaller
than a cache line.
This Python script generates an ARM DS-5 Streamline .apc project based
on gem5 run. To successfully convert, the gem5 runs needs to be run
with the context-switch-based stats dump option enabled (The guest
kernel also needs to be patched to allow gem5 interrogate its task
information.) See help for more information.
A couple of recent changesets added/deleted/edited some variables
that are needed for running the example ruby scripts. This changeset
edits these scripts to bring them to a working state.
In order to support m5ops in virtualized environments, we need to use
a memory mapped interface. This changeset adds support for that by
reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR
interface for m5ops. The mapping is done in the
X86ISA::TLB::finalizePhysical() which means that it just works for all
of the CPU models, including virtualized ones.
The previous changeset (9816) that fixes the use of max ticks introduced the
variable cpt_starttick, which is used for setting the relative max tick.
Unfortunately, with checkpointing at an instruction count or with simpoints,
the checkpoint tick is not stored conveniently, so to ensure that cpt_starttick
is initialized, set it to 0. Also, if using --rel-max-tick, check the use of
instruction counts or simpoints to warn the user that the max tick setting does
not include the checkpoint ticks.
This patch adds support for specifying multi-channel memory
configurations on the command line, e.g. 'se/fs.py
--mem-type=ddr3_1600_x64 --mem-channels=4'. To enable this, it
enhances the functionality of MemConfig and moves the existing
makeMultiChannel class method from SimpleDRAM to the support scripts.
The se/fs.py example scripts are updated to make use of the new
feature.
This patch adds the notion of voltage domains, and groups clock
domains that operate under the same voltage (i.e. power supply) into
domains. Each clock domain is required to be associated with a voltage
domain, and the latter requires the voltage to be explicitly set.
A voltage domain is an independently controllable voltage supply being
provided to section of the design. Thus, if you wish to perform
dynamic voltage scaling on a CPU, its clock domain should be
associated with a separate voltage domain.
The current implementation of the voltage domain does not take into
consideration cases where there are derived voltage domains running at
ratio of native voltage domains, as with the case where there can be
on-chip buck/boost (charge pumps) voltage regulation logic.
The regression and configuration scripts are updated with a generic
voltage domain for the system, and one for the CPUs.
This patch moves the instantiation of the memory controller outside
FSConfig and instead relies on the mem_ranges to pass the information
to the caller (e.g. fs.py or one of the regression scripts). The main
motivation for this change is to expose the structural composition of
the memory system and allow more tuning and configuration without
adding a large number of options to the makeSystem functions.
The patch updates the relevant example scripts to maintain the current
functionality. As the order that ports are connected to the memory bus
changes (in certain regresisons), some bus stats are shuffled
around. For example, what used to be layer 0 is now layer 1.
Going forward, options will be added to support the addition of
multi-channel memory controllers.
This patch contains three fixes to max tick options handling in Options.py and
Simulation.py:
1) Since the global simulator frequency isn't bound until m5.instantiate()
is called, the maxtick resolution needs to happen after this call, since
changes to the global frequency will cause m5.simulate() to misinterpret the
maxtick value. Shuffling this also requires tweaking the checkpoint directory
handling to signal the checkpoint restore tick back to run(). Fixing this
completely and correctly will require storing the simulation frequency into
checkpoints, which is beyond the scope of this patch.
2) The maxtick option in Options.py was defaulted to MaxTicks, so the old code
would always skip over the maxtime part of the conditionals at the beginning
of run(). Change the maxtick default to None, and set the maxtick local
variable in run() appropriately.
3) To clarify whether max ticks settings are relative or absolute, split the
maxtick option into separate options, for relative and absolute. Ensure that
these two options and the maxtime option are handled appropriately to set the
maxtick variable in Simulation.py.
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).
The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.
The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
This patch adds a 'sys_clock' command-line option and use it to assign
clocks to the system during instantiation.
As part of this change, the default clock in the System class is
removed and whenever a system is instantiated a system clock value
must be set. A default value is provided for the command-line option.
The configs and tests are updated accordingly.
This patch adds a 'cpu_clock' command-line option and uses the value
to assign clocks to components running at the CPU speed (L1 and L2
including the L2-bus). The configuration scripts are updated
accordingly.
The 'clock' option is left unchanged in this patch as it is still used
by a number of components. In follow-on patches the latter will be
disambiguated further.
The --restore-with-cpu option didn't use CpuConfig.cpu_names() to
determine which CPU names are valid, instead it used a static list of
known CPU names. This changeset makes the option parsing code use the
CPU list from the CpuConfig module instead.
This patch changes the class names of the variuos DRAM configurations
to better reflect what memory they are based on. The speed and
interface width is now part of the name, and also the alias that is
used to select them on the command line.
Some minor changes are done to the actual parameters, to better
reflect the named configurations. As a result of these changes the
regressions change slightly and the stats will be bumped in a separate
patch.
This patch adds a typical (leaning towards fast) LPDDR3 configuration
based on publically available data. As expected, it looks very similar
to the LPDDR2-S4 configuration, only with a slightly lower burst time.
This patch removes the explicit memset as it is redundant and causes
the simulator to touch the entire space, forcing the host system to
allocate the pages.
Anonymous pages are mapped on the first access, and the page-fault
handler is responsible for zeroing them. Thus, the pages are still
zeroed, but we avoid touching the entire allocated space which enables
us to use much larger memory sizes as long as not all the memory is
actually used.
having separate params for the local/globalHistoryBits and the
local/globalPredictorSize can lead to inconsistencies if they
are not carefully set. this patch dervies the number of bits
necessary to index into the local/global predictors based on
their size.
the value of the localHistoryTableSize for the ARM O3 CPU has been
increased to 1024 from 64, which is more accurate for an A15 based
on some correlation against A15 hardware.
This patch enables selection of the memory controller class through a
mem-type command-line option. Behind the scenes, this option is
treated much like the cpu-type, and a similar framework is used to
resolve the valid options, and translate the short-hand description to
a valid class.
The regression scripts are updated with a hardcoded memory class for
the moment. The best solution going forward is probably to get the
memory out of the makeSystem functions, but Ruby complicates things as
it does not connect the memory controller to the membus.
--HG--
rename : configs/common/CpuConfig.py => configs/common/MemConfig.py
This patch is based on http://reviews.m5sim.org/r/1474/ originally written by
Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout
folder) based on start and end addresses of basic blocks.
Some comments to the original patch are addressed and hooks are added to create
and resume from checkpoints based on instruction counts dictated by external
SimPoint analysis tools.
SimPoint creation/resuming options will be implemented as a separate patch.
In Simulation.py, calls to m5.simulate(num_ticks) will run the simulated system
for num_ticks after the current tick. Fix calls to m5.simulate in
scriptCheckpoints() and benchCheckpoints() to appropriately handle the maxticks
variable.
As of now, we mark the top 1MB of memory space as unusable. Part of
it is actually usable and is required to be marked so by some of the
newer versions of linux kernel. This patch marks the top 639KB as usable.
This value was chosen by looking at QEMU's output for bios memory map.
The default cache configuration script currently import the O3_ARM_v7a
model configuration, which depends on the O3 CPU. This breaks if gem5
has been compiled without O3 support. This changeset removes the
dependency by only importing the model if it is requested by the
user. As a bonus, it actually removes some code duplication in the
configuration scripts.
CPU switching consists of the following steps:
1. Drain the system
2. Switch out old CPUs (cpu.switchOut())
3. Change the system timing mode to the mode the new CPUs require
4. Flush caches if switching to hardware virtualization
5. Inform new CPUs of the handover (cpu.takeOverFrom())
6. Resume the system
m5.switchCpus() previously only did step 2 & 5. Since information
about the new processors' memory system requirements is now exposed,
do all of the steps above.
This patch adds automatic memory system switching and flush (if
needed) to switchCpus(). Additionally, it adds optional draining to
switchCpus(). This has the following implications:
* changeToTiming and changeToAtomic are no longer needed, so they have
been removed.
* changeMemoryMode is only used internally, so it is has been renamed
to be private.
* switchCpus requires a reference to the system containing the CPUs as
its first parameter.
WARNING: This changeset breaks compatibility with existing
configuration scripts since it changes the signature of
m5.switchCpus().
The CPUs supported by the configuration scripts used to be
hard-coded. This was not ideal for several reasons. For example, the
configuration scripts depend on all CPU models even though only a
subset might have been compiled.
This changeset adds a new module to the configuration scripts that
automatically discovers the available CPU models from the compiled
SimObjects. As a nice bonus, the use of introspection allows us to
automatically generate a list of available CPU models suitable for
printing. This list is augmented with the Python doc string from the
underlying class if available.
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:
* memory_mode -- Return a string describing the current memory mode
(invalid/atomic/timing).
* require_caches -- Does the CPU model require caches?
* support_take_over -- Does the CPU support CPU handover?
The run() method in Simulation.py used to call sys.exit() when the
simulator exits. This is undesirable when user has requested the
simulator to be run in interactive mode since it causes the simulator
to exit rather than entering the interactive Python environment.
This patch moves the default DRAM parameters from the SimpleDRAM class
to two different subclasses, one for DDR3 and one for LPDDR2. More can
be added as we go forward.
The regressions that previously used the SimpleDRAM are now using
SimpleDDR3 as this is the most similar configuration.
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.
This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.
--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
Used as a command in full-system scripts helps the user ensure the benchmarks have finished successfully.
For example, one can use:
/path/to/benchmark args || /sbin/m5 fail 1
and thus ensure gem5 will exit with an error if the benchmark fails.
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
This patch generalises the address range resolution for the I/O cache
and I/O bridge such that they do not assume a single memory. The patch
involves adding a parameter to the system which is then defined based
on the memories that are to be visible from the I/O subsystem, whether
behind a cache or a bridge.
The change is needed to allow interleaved memory controllers in the
system.
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>