This patch enables the use of the generator behaviours outside the
TrafficGen module. This is useful e.g. to allow packet replay modes
for other devices in the system without having to replace them with a
TrafficGen in the configuration files.
This change also enables more specific behaviours to be composed as
specific modules, e.g. BaseBandModem can use a number of generators
and have application-specific parameters based around a specific set
of generators.
This changeset adds support for m5 pseudo-ops when running in
kvm-mode. Unfortunately, we can't trap the normal gem5 co-processor
entry in KVM (it doesn't seem to be possible to trap accesses to
non-existing co-processors). We therefore use BZJ instructions to
cause a trap from virtualized mode into gem5. The BZJ instruction is
becomes a normal branch to the gem5 fallback code when running in
simulated mode, which means that this patch does not need to change
the ARM ISA-specific code.
Note: This requires a patched host kernel.
Architecture specific limitations:
* LPAE is currently not supported by gem5. We therefore panic if LPAE
is enabled when returning to gem5.
* The co-processor based interface to the architected timer is
unsupported. We can't support this due to limitations in the KVM
API on ARM.
* M5 ops are currently not supported. This requires either a kernel
hack or a memory mapped device that handles the guest<->m5
interface.
Add support for using the CPU cycle counter instead of a normal POSIX
timer to generate timed exits to gem5. This should, in theory, provide
better resolution when requesting timer signals.
The perf-based timer requires a fairly recent kernel since it requires
a working PERF_EVENT_IOC_PERIOD ioctl. This ioctl has existed in the
kernel for a long time, but it used to be completely broken due to an
inverted match when the kernel copied things from user
space. Additionally, the ioctl does not change the sample period
correctly on all kernel versions which implement it. It is currently
only known to work reliably on kernel version 3.7 and above on ARM.
Reduce the number of KVM->TC synchronizations by overloading the
getContext() method and only request an update when the TC is
requested as opposed to every time KVM returns to gem5.
This changeset introduces the architecture independent parts required
to support KVM-accelerated CPUs. It introduces two new simulation
objects:
KvmVM -- The KVM VM is a component shared between all CPUs in a shared
memory domain. It is typically instantiated as a child of the
system object in the simulation hierarchy. It provides access
to KVM VM specific interfaces.
BaseKvmCPU -- Abstract base class for all KVM-based CPUs. Architecture
dependent CPU implementations inherit from this class
and implement the following methods:
* updateKvmState() -- Update the
architecture-dependent KVM state from the gem5
thread context associated with the CPU.
* updateThreadContext() -- Update the thread context
from the architecture-dependent KVM state.
* dump() -- Dump the KVM state using (optional).
In order to deliver interrupts to the guest, CPU
implementations typically override the tick() method and
check for, and deliver, interrupts prior to entering
KVM.
Hardware-virutalized CPU currently have the following limitations:
* SE mode is not supported.
* PC events are not supported.
* Timing statistics are currently very limited. The current approach
simply scales the host cycles with a user-configurable factor.
* The simulated system must not contain any caches.
* Since cycle counts are approximate, there is no way to request an
exact number of cycles (or instructions) to be executed by the CPU.
* Hardware virtualized CPUs and gem5 CPUs must not execute at the
same time in the same simulator instance.
* Only single-CPU systems can be simulated.
* Remote GDB connections to the guest system are not supported.
Additionally, m5ops requires an architecture specific interface and
might not be supported.
Add the options 'panic_on_panic' and 'panic_on_oops' to the
LinuxArmSystem SimObject. When these option are enabled, the simulator
panics when the guest kernel panics or oopses. Enable panic on panic
and panic on oops in ARM-based test cases.
Previously, nextCycle() could return the *current* cycle if the current tick was
already aligned with the clock edge. This behavior is not only confusing (not
quite what the function name implies), but also caused problems in the
drainResume() function. When exiting/re-entering the sim loop (e.g., to take
checkpoints), the CPUs will drain and resume. Due to the previous behavior of
nextCycle(), the CPU tick events were being rescheduled in the same ticks that
were already processed before draining. This caused divergence from runs that
did not exit/re-entered the sim loop. (Initially a cycle difference, but a
significant impact later on.)
This patch separates out the two behaviors (nextCycle() and clockEdge()),
uses nextCycle() in drainResume, and uses clockEdge() everywhere else.
Nothing (other than name) should change except for the drainResume timing.
This patch is based on http://reviews.m5sim.org/r/1474/ originally written by
Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout
folder) based on start and end addresses of basic blocks.
Some comments to the original patch are addressed and hooks are added to create
and resume from checkpoints based on instruction counts dictated by external
SimPoint analysis tools.
SimPoint creation/resuming options will be implemented as a separate patch.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
Currently the commit stage keeps a local copy of the interrupt object.
Since the interrupt is usually handled several cycles after the commit
stage becomes aware of it, it is possible that the local copy of the
interrupt object may not be the interrupt that is actually handled.
It is possible that another interrupt occurred in the
interval between interrupt detection and interrupt handling.
This patch creates a copy of the interrupt just before the interrupt
is handled. The local copy is ignored.
This patch changes the port in the CPU classes to use MasterPort
instead of the derived CpuPort. The functions of the CpuPort are now
distributed across the relevant subclasses. The port accessor
functions (getInstPort and getDataPort) now return a MasterPort
instead of a CpuPort. This simplifies creating derivative CPUs that do
not use the CpuPort.
This patch comments out the inclusion of the inorder TLBUnit which is
only used in the 9-stage pipeline. With the TLBUnit present, gcc >=
4.6 in combination with LTO ends up throwing away the definition of
the TLBUnit destructor, and consequently fail to link. See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808 for more details
about the bug, and http://gcc.gnu.org/ml/gcc/2012-06/msg00397.html for
the discussion thread that also touches on similar issues seen with
clang.
The traffic generator used to incorrectly determine the next state in
when state 0 had a non-zero probability. Due to the way the next
transition was determined, state 0 could never be entered other than
as an initial state. This changeset updates the transitition() method
to correctly handle such cases and cases where the transition matrix
is a 1x1 matrix.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
This patch moves the 16x APIC clock divider to the Python code to
avoid the post-instantiation modifications to the clock. The x86 APIC
was the only object setting the clock after creation time and this
required some custom functionality and configuration. With this patch,
the clock multiplier is moved to the Python code and the objects are
instantiated with the appropriate clock.
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.
There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
setMiscReg currently makes a new entry for each write to a misc reg without
checking for duplicates, this can cause a triggering of the assert if an
instruction get replayed and writes to the same misc regs multiple times.
This fix prevents duplicate entries and instead updates the value.
The rename can mis-handle serializing instructions (i.e. strex) if it gets
into a resource constrained situation and the serializing instruction has
to be placed on the skid buffer to handle blocking. In this situation the
instruction informs the pipeline it is serializing and logs that the next
instruction must be serialized, but since we are blocking the pipeline
defers this action to place the serializing instruction and
incoming instructions into the skid buffer. When resuming from blocking,
rename will pull the serializing instruction from the skid buffer and
the current logic will see this as the "next" instruction that has to
be serialized and because of flags set on the serializing instruction,
it passes through the pipeline stage as normal and resets rename to
non-serializing. This causes instructions to follow the serializing inst
incorrectly and eventually leads to an error in the pipeline. To fix this
rename should check first if it has to block before checking for serializing
instructions.
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less
than the dispatch. This conflated the decode ticks when back pressure built
in the pipeline.
- now picks up tick on entry.
Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ
blocking)
Allows selection by tick range (previously this caused an infinite loop)
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to
the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
CPUs need to test that the memory system is in the right mode in two
places, when the CPU is initialized (unless it's switched out) and on
a drainResume(). This led to some code duplication in the CPU
models. This changeset introduces the verifyMemoryMode() method which
is called by BaseCPU::init() if the CPU isn't switched out. The
individual CPU models are responsible for calling this method when
resuming from a drain as this code is CPU model specific.
Checker CPUs currently don't inherit from the CheckerCPU in the Python
object hierarchy. This has two consequences:
* It makes CPU model discovery from the Python world somewhat
complicated as there is no way of testing if a CPU is a checker.
* Parameters are duplicated in the checker configuration
specification.
This changeset makes all checker CPUs inherit from the base checker
CPU class.
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:
* memory_mode -- Return a string describing the current memory mode
(invalid/atomic/timing).
* require_caches -- Does the CPU model require caches?
* support_take_over -- Does the CPU support CPU handover?
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a
single cycle sending both signals to fetch which causes an assert or worse.
The previous check could never work before since the status was set to Blocked
before a test for the status being Unblocking was executed.
Check if an instruction just enabled interrupts and we've previously had an
interrupt pending that was not handled because interrupts were subsequently
disabled before the pipeline reached a place to handle the interrupt. In that
case squash now to make sure the interrupt is handled.
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.
This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.
--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
There was an issue w/ the rename logic, which would assign a previous physical
register to the ZeroReg architectural register in x86. This issue was giving
problems for instructions squashed in threads w/ ID different from 0,
sometimes allowing non-mispredicted instructions to obtain a value different
from zero when reading the zeroReg.
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.
This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.
Move the increment/decrement of wbOutstanding outside of the comparison
in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc
4.4.7, which incorrectly optimizes "-- ==" as "-=".
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
Cleanup the serialization code for the simple CPUs and the O3 CPU. The
CPU-specific code has been replaced with a (un)serializeThread that
serializes the thread state / context of a specific thread. Assuming
that the thread state class uses the CPU-specific thread state uses
the base thread state serialization code, this allows us to restore a
checkpoint with any of the CPU models.
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent
stale translations when doing repeated switching. Additionally, the
TLB flushing functionality is exported to the Python to make debugging
of switching/checkpointing easier.
A simulation script will typically use the TLB flushing functionality
to generate a reference trace. The following sequence can be used to
simulate a handover (this depends on how drain is implemented, but is
generally the case) between identically configured CPU models:
m5.drain(test_sys)
[ cpu.flushTLBs() for cpu in test_sys.cpu ]
m5.resume(test_sys)
The generated trace should normally be identical to a trace generated
when switching between identically configured CPU models or
checkpointing and resuming.
Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.
Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
Currently, the atomic CPU can be in the middle of a microcode sequence
when it is drained. This leads to two problems:
* When switching to a hardware virtualized CPU, we obviously can't
execute gem5 microcode.
* Since curMacroStaticInst is populated when executing microcode,
repeated switching between CPUs executing microcode leads to
incorrect execution.
After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset fixes a bug
where the multiple switches to the same atomic CPU sometimes corrupts
the target state because of dangling pointers to the currently
executing microinstruction.
Note: This changeset moves tick event descheduling from switchOut() to
drain(), which makes timing consistent between just draining a system
and draining /and/ switching between two atomic CPUs. This makes
debugging quite a lot easier (execution traces get the same timing),
but the latency of the last instruction before a drain will not be
accounted for correctly (it will always be 1 cycle).
Note 2: This changeset removes so_state variable, the locked variable,
and the tickEvent from checkpoints since none of them contain state
that needs to be preserved across checkpoints. The so_state is made
redundant because we don't use the drain state variable anymore, the
lock variable should never be set when the system is drained, and the
tick event isn't scheduled.
Currently, the timing CPU can be in the middle of a microcode sequence
or multicycle (stayAtPC is true) instruction when it is drained. This
leads to two problems:
* When switching to a hardware virtualized CPU, we obviously can't
execute gem5 microcode.
* If stayAtPC is true we might execute half of an instruction twice
when restoring a checkpoint or switching CPUs, which leads to an
incorrect execution.
After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset also fixes a bug
where the timing CPU sometimes switches out with while stayAtPC is
true, which corrupts the target state after a CPU switch or
checkpoint.
Note: This changeset removes the so_state variable from checkpoints
since the drain state isn't used anymore.
The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.
The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.
Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
This patch introduces the following sanity checks when switching
between CPUs:
* Check that the set of new and old CPUs do not overlap. Having an
overlap between the set of new CPUs and the set of old CPUs is
currently not supported. Doing such a switch used to result in the
following assertion error:
BaseCPU::takeOverFrom(BaseCPU*): \
Assertion `!new_itb_port->isConnected()' failed.
* Check that all new CPUs are in the switched out state.
* Check that all old CPUs are in the switched in state.
This patch cleans up the CPU switching functionality by making sure
that CPU models consistently call the parent on switchOut() and
takeOverFrom(). This has the following implications that might alter
current functionality:
* The call to BaseCPU::switchout() in the O3 CPU is moved from
signalDrained() (!) to switchOut().
* A call to BaseSimpleCPU::switchOut() is introduced in the simple
CPUs.
The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.
This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.
This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
Some architectures map registers differently depending on their mode
of operations. There is currently no architecture independent way of
accessing all registers. This patch introduces a flat register
interface to the ThreadContext class. This interface is useful, for
example, when serializing or copying thread contexts.
After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.
Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.
This patch adds checks to all CPU models to make sure that the memory
system is in the correct mode at startup and when resuming after a
drain. Previously, we only checked that the memory system was in the
right mode when resuming. This is inadequate since this is a
configuration error that should be detected at startup as well as when
resuming. Additionally, since the check was done using an assert, it
wasn't performed when NDEBUG was set (e.g., the fast target).
This patch adds support for reading input traces encoded using
protobuf according to what is done in the CommMonitor.
A follow-up patch adds a Python script that can be used to convert the
previously used ASCII traces to protobuf equivalents. The appropriate
regression input is updated as part of this patch.
This patch encapsulates the traffic generator input in a stream class
such that the parsing is not visible to the trace generator. The
change takes us one step closer to using protobuf-based input traces
for the trace replay.
The functionality of the current input stream is identical to what it
was, and the ASCII format remains the same for now.
This patch fixes the computation that determines whether to perform a
read or a write such that the two corner cases (0 and 100) are both
more efficient and handled correctly.
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
isSyscall was originally created because during handling of a syscall in SE
mode the threadcontext had to be updated. However, in many places this is used
in FS mode (e.g. fault handlers) and the name doesn't make much sense. The
boolean actually stops gem5 from squashing speculative and non-committed state
when a write to a threadcontext happens, so re-name the variable to something
more appropriate
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
The directed tester supports only generating only read or only write accesses. The
patch modifies the tester to support streams that have both read and write accesses.
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
SWIG needs a complete declaration of all wrapped objects. This patch
adds a header file with the DerivO3CPU class and includes it in the
SWIG interface.
--HG--
rename : src/cpu/o3/cpu_builder.cc => src/cpu/o3/deriv.cc
In order to create reliable SWIG wrappers, we need to include the
declaration of the wrapped class in the SWIG file. Previously, we
didn't expose the declaration of checker CPUs. This patch adds header
files for such CPUs and include them in the SWIG wrapper.
--HG--
rename : src/cpu/dummy_checker_builder.cc => src/cpu/dummy_checker.cc
rename : src/cpu/o3/checker_builder.cc => src/cpu/o3/checker.cc
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
This patch enables dumping statistics and Linux process information on
context switch boundaries (__switch_to() calls) that are used for
Streamline integration (a graphical statistics viewer from ARM).
The Memtest tester allows for only one request to be outstanding for a
particular physical address. The check has been written separately for
reads and writes. This patch moves the check earlier than its current
position so that it need not be written separately for reads and writes.
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
This patch addresses a number of smaller issues identified by the code
inspection utility cppcheck. There are a number of identified leaks in
the arm/linux/system.cc (although the function only get's called once
so it is not a major problem), a few deletes in dev/x86/i8042.cc that
were not array deletes, and sprintfs where the character array had one
element less than needed. In the IIC tags there was a function
allocating an array of longs which is in fact never used.
This patch changes the CoherentBus between the L1s and L2 to use the
CPU clock and also four times the width compared to the default
bus. The parameters are not intending to fit every single scenario,
but rather serve as a better startingpoint than what we previously
had.
Note that the scripts that do not use the addTwoLevelCacheHiearchy are
not affected by this change.
A separate patch will update the stats.
This patch adds a traffic generator to the code base. The generator is
aimed to be used as a black box model to create appropriate use-cases
and benchmarks for the memory system, and in particular the
interconnect and the memory controller.
The traffic generator is a master module, where the actual behaviour
is captured in a state-transition graph where each state generates
some sort of traffic. By constructing a graph it is possible to create
very elaborate scenarios from basic generators. Currencly the set of
generators include idling, linear address sweeps, random address
sequences and playback of traces (recording will be done by the
Communication Monitor in a follow-up patch). At the moment the graph
and the states are described in an ad-hoc line-based format, and in
the future this should be aligned with our used of e.g. the Google
protobufs. Similarly for the traces, the format is currently a
simplistic ad-hoc line-based format that merely serves as a starting
point.
In addition to being used as a black-box model for system components,
the traffic generator is also useful for creating test cases and
regressions for the interconnect and memory system. In future patches
we will use the traffic generator to create DRAM test cases for the
controller model.
The patch following this one adds a basic regressions which also
contains an example configuration script and trace file for playback.
This patch takes the final plunge and transitions from the templated
Range class to the more specific AddrRange. In doing so it changes the
obvious Range<Addr> to AddrRange, and also bumps the range_map to be
AddrRangeMap.
In addition to the obvious changes, including the removal of redundant
includes, this patch also does some house keeping in preparing for the
introduction of address interleaving support in the ranges. The Range
class is also stripped of all the functionality that is never used.
--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
The profileEvent pointer is tested against NULL in various places, but
it is not initialized unless running in full-system mode. In SE mode, this
can result in segmentation faults when profileEvent default intializes to
something other than NULL.
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.
An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
This patch tightens up the semantics around port binding and checks
that the ports that are being bound are currently not connected, and
similarly connected before unbind is called.
The patch consequently also changes the order of the unbind and bind
for the switching of CPUs to ensure that the rules are adhered
to. Previously the ports would be "over-written" without any check.
There are no changes in behaviour due to this patch, and the only
place where the unbind functionality is used is in the CPU.
This patch updates how the checker CPU handles the ports such that the
regressions will once again run without causing a panic.
A minor amount of tidying up was also done as part of this patch.
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.
The inorder CPU is particularly interesting as it uses a different
name for the parameter, and never make any use of it internally.
This patch makes the Tick unsigned and removes the UTick typedef. The
ticks should never be negative, and there was only one major issue
with removing it, caused by the o3 CPU using a -1 as an initial value.
The patch has no impact on any regressions.
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.
The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.
The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.
This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
This replaces a (potentially uninitialized) string
field with a virtual function so that we can have
a safe interface without requiring changes to the
eio code.
when using the checker i ran into problems where an instruction reading the
cpu id register failed because the ids did not match, and hence, the result
of the instruction did not match. this patch ensures that the ids match so
this instruction does not fail. this problem only seemed to manifest itself
when multiple cores were in the system, either multi-core, or extra switched-
out cores present in the system.
This patch is a first step to align the port names used in the Python
world and the C++ world. Ultimately it serves to make the use of
config.json together with output from the simulation easier, including
post-processing of statistics.
Most notably, the CPU, cache, and bus is addressed in this patch, and
there might be other ports that should be updated accordingly. The
dash name separator has also been replaced with a "." which is what is
used to concatenate the names in python, and a separation is made
between the master and slave port in the bus.
This patch is the last part of moving all protocol-related
functionality out of the Port base class. All the send/recv functions
are already moved, and the retry (which still governs all the timing
transport functions) is the only part that remained in the base class.
The only point where this currently causes a bit of inconvenience is
in the bus where the retry list is global and holds Port pointers (not
Master/SlavePort). This is about to change with the split into a
request/response bus and will soon be removed anyway.
The patch has no impact on any regressions.
This patch is the result of static analysis identifying a number of
memory leaks. The leaks are all benign as they are a result of not
deallocating memory in the desctructor. The fix still has value as it
removes false positives in the static analysis.
Add new flag (named pushedRAS) in the PredictorHistory structure.
This flag tracks whether the RAS has been pushed or not during a prediction.
Then, in the squash function it is used to pop the RAS if necessary.
initCPU() will be called to initialize switched out CPUs for the simple and
inorder CPU models. this patch prevents those CPUs from being initialized
because they should get their state from the active CPU when it is switched
out.
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
This patch introduces a class hierarchy of buses, a non-coherent one,
and a coherent one, splitting the existing bus functionality. By doing
so it also enables further specialisation of the two types of buses.
A non-coherent bus connects a number of non-snooping masters and
slaves, and routes the request and response packets based on the
address. The request packets issued by the master connected to a
non-coherent bus could still snoop in caches attached to a coherent
bus, as is the case with the I/O bus and memory bus in most system
configurations. No snoops will, however, reach any master on the
non-coherent bus itself. The non-coherent bus can be used as a
template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses,
and is typically used for the I/O buses.
A coherent bus connects a number of (potentially) snooping masters and
slaves, and routes the request and response packets based on the
address, and also forwards all requests to the snoopers and deals with
the snoop responses. The coherent bus can be used as a template for
modelling QPI, HyperTransport, ACE and coherent OCP buses, and is
typically used for the L1-to-L2 buses and as the main system
interconnect.
The configuration scripts are updated to use a NoncoherentBus for all
peripheral and I/O buses.
A bit of minor tidying up has also been done.
--HG--
rename : src/mem/bus.cc => src/mem/coherent_bus.cc
rename : src/mem/bus.hh => src/mem/coherent_bus.hh
rename : src/mem/bus.cc => src/mem/noncoherent_bus.cc
rename : src/mem/bus.hh => src/mem/noncoherent_bus.hh
This patch removes the Packet::NodeID typedef and unifies it with the
Port::PortId. The src and dest fields in the packet are used to hold a
port id (e.g. in the bus), and thus the two should actually be the
same.
The typedef PortID is now global (in base/types.hh) and aligned with
the ThreadID in terms of capitalisation and naming of the
InvalidPortID constant.
Before this patch, two flags were used for valid destination and
source, rather than relying on a named value (InvalidPortID), and
this is now redundant, as the src and dest field themselves are
sufficient to tell whether the current value is a valid port
identifier or not. Consequently, the VALID_SRC and VALID_DST are
removed.
As part of the cleaning up, a number of int parameters and local
variables are updated to use PortID.
Note that Ruby still has its own NodeID typedef. Furthermore, the
MemObject getMaster/SlavePort still has an int idx parameter with a
default value of -1 which should eventually change to PortID idx =
InvalidPortID.
This will allow it to be specialized by the ISAs. The existing caching scheme
is provided by the BasicDecodeCache in the GenericISA namespace and is built
from the generalized components.
--HG--
rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.
--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.
For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
This patch introduces the PortId type, moves the definition of
INVALID_PORT_ID to the Port class, and also gives every port an id to
reflect the fact that each element in a vector port has an
identifier/index.
Previously the bus and Ruby testers (and potentially other users of
the vector ports) added the id field in their port subclasses, and now
this functionality is always present as it is moved to the base class.
Put the { on the same line as the if and put a space between the if and the
open paren. Also, use the # format modifier which puts a 0x in front of hex
values automatically. If the ExtMachInst type isn't integral and actually
prints something more complicated, the # falls away harmlessly and we aren't
left with a phantom 0x followed by a bunch of unrelated text.
This patch simplifies future patches by changing the pointer type used
in a number of the Ruby testers to use MasterPort instead of using a
derived CpuPort class. There is no reason for using the more
specialised pointers, and there is no longer a need to do any casting.
With the latest changes to the tester, organising ports as readers and
writes, things got a bit more complicated, and the "type" now had to
be removed to be able to fall back to using MasterPort rather than
CpuPort.
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.
Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.
In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.
The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.
This patch addresses a number of minor issues that cause problems when
compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it
avoids using the deprecated ext/hash_map and instead uses
unordered_map (and similarly so for the hash_set). To make use of the
new STL containers, g++ and clang has to be invoked with "-std=c++0x",
and this is now added for all gcc versions >= 4.6, and for clang >=
3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1
unordered_map to avoid the deprecation warning.
The addition of c++0x in turn causes a few problems, as the
compiler is more stringent and adds a number of new warnings. Below,
the most important issues are enumerated:
1) the use of namespaces is more strict, e.g. for isnan, and all
headers opening the entire namespace std are now fixed.
2) another other issue caused by the more stringent compiler is the
narrowing of the embedded python, which used to be a char array,
and is now unsigned char since there were values larger than 128.
3) a particularly odd issue that arose with the new c++0x behaviour is
found in range.hh, where the operator< causes gcc to complain about
the template type parsing (the "<" is interpreted as the beginning
of a template argument), and the problem seems to be related to the
begin/end members introduced for the range-type iteration, which is
a new feature in c++11.
As a minor update, this patch also fixes the build flags for the clang
debug target that used to be shared with gcc and incorrectly use
"-ggdb".
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.
All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.
Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.
--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
This patch removes the physmem_port from the Atomic CPU and instead
uses the system pointer to access the physmem when using the fastmem
option. The system already keeps track of the physmem and the valid
memory address ranges, and with this patch we merely make use of that
existing functionality. As a result of this change, the overloaded
getMasterPort in the Atomic CPU can be removed, thus unifying the CPUs.
This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.
The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.
This patch unifies where initMemProxies is called, in the init()
method of each BaseCPU subclass, before TheISA::initCPU is
called. Moreover, it also ensures that initMemProxies is called in
both full-system and syscall-emulation mode, thus unifying also across
the modes. An additional check is added in the ThreadState to ensure
that initMemProxies is only called once.
This patch removes the overriding of "-Werror" in a handful of
cases. The code compiles with gcc 4.6.3 and clang 3.0 without any
warnings, and thus without any errors. There are no functional changes
introduced by this patch. In the future, rather than ypassing
"-Werror", address the warnings.
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
This patch makes the code compile with clang 2.9 and 3.0 again by
making two very minor changes. Firt, it maintains a strict typing in
the forward declaration of the BaseCPUParams. Second, it adds a
FullSystemInt flag of the type unsigned int next to the boolean
FullSystem flag. The FullSystemInt variable can be used in
decode-statements (expands to switch statements) in the instruction
decoder.
Making the CheckerCPU a runtime time option requires the code to be compatible
with ISAs other than ARM. This patch adds the appropriate function
stubs to allow compilation.
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
This patch adds a creation-time check to the CPU to ensure that the
interrupt controller is created for the cases where it is needed,
i.e. if the CPU is not being switched in later and not a checker CPU.
The patch also adds the "createInterruptController" call to a number
of the regression scripts.
This patch simplfies the master ports used by RubyDirectedTester and
RubyTester by avoiding the use of SimpleTimingPort. Neither tester
made any use of the functionality offered by SimpleTimingPort besides
a trivial implementation of recvFunctional (only snoops) and
recvRangeChange (not relevant since there is only one master).
The patch does not change or add any functionality, it merely makes
the introduction of a master/slave port easier (in a future patch).
This patch moves the readBlob/writeBlob/memsetBlob from the Port class
to the PortProxy class, thus making a clear separation of the basic
port functionality (recv/send functional/atomic/timing), and the
higher-level functional accessors available on the port proxies.
There are only a few places in the code base where the blob functions
were used on ports, and they are all for peeking into the memory
system without making a normal memory access (in the memtest, and the
malta and tsunami pchip). The memtest also exemplifies how easy it is
to create a non-translating proxy if desired. The malta and tsunami
pchip used a slave port to perform a functional read, and this is now
changed to rely on the physProxy of the system (to which they already
have a pointer).
This patch is adding a clearer design intent to all objects that would
not be complete without a port proxy by making the proxies members
rathen than dynamically allocated. In essence, if NULL would not be a
valid value for the proxy, then we avoid using a pointer to make this
clear.
The same approach is used for the methods using these proxies, such as
loadSections, that now use references rather than pointers to better
reflect the fact that NULL would not be an acceptable value (in fact
the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional
composition where a NULL pointer is never valid" could be done on a
much broader scale throughout the code base, but for now it is only
done in the locations affected by the proxies.
This patch moves all port creation from the getPort method to be
consistently done in the MemObject's constructor. This is possible
thanks to the Swig interface passing the length of the vector ports.
Previously there was a mix of: 1) creating the ports as members (at
object construction time) and using getPort for the name resolution,
or 2) dynamically creating the ports in the getPort call. This is now
uniform. Furthermore, objects that would not be complete without a
port have these ports as members rather than having pointers to
dynamically allocated ports.
This patch also enables an elaboration-time enumeration of all the
ports in the system which can be used to determine the masterId.
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.
The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
1. Updates the Branch Predictor correctly to the state
just after a mispredicted branch, if a squash occurs.
2. If a BTB does not find an entry, the branch is predicted not taken.
The global history is modified to correctly reflect this prediction.
3. Local history is now updated at the fetch stage instead of
execute stage.
4. In the Update stage of the branch predictor the local predictors are
now correctly updated according to the state of local history during
fetch stage.
This patch also improves performance by as much as 17% on some benchmarks
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
The delayed commit flag is used in conjunction with interrupt pending flag to
figure out whether or not fetch stage should get more instructions. This patch
clears this flag when instructions are squashed. Also, in case an interrupt is
pending, currently it is not possible to access the instruction cache. This
patch allows accessing the cache in case this flag is set.
The condition for handling interrupts is to check whether or not the cpu's
instruction list is empty. As observed, this can lead to cases in which even
though the instruction list is empty, interrupts are handled when they should
not be. The condition is being strengthened so that interrupts get handled only
when the last committed microop did not had IsDelayedCommit set.
This patch adds a function to the ROB that will get the squashing instruction
from the ROB's list of instructions. This squashing instruction is used for
figuring out the macroop from which the fetch stage should fetch the microops.
Further, a check has been added that if the instructions are to be fetched
from the cache maintained by the fetch stage, then the data in the cache should
be valid and the PC of the thread being fetched from is same as the address of
the cache block.
This pointer was only being stored in code that came from SE mode. The system
pointer is always meaningful and available, so it should always be stored.
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
This patch is a trivial simplification, removing the cpu pointer from
SimpleThread and relying on the baseCpu pointer in ThreadState. The
patch does not add or change any functionality, it merely cleans up
the code.
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
This patch cleans up forward declarations and a member-function
prototype that still referred to the old FunctionalPort, VirtualPort
and TranslatingPort. There is no change in functionality.
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system. By default, the x86 architecture will have TSO.
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
This patch removes the inheritance of EventManager from the ports and
moves all responsibility for event queues to the owner. Eventually the
event manager should be the interface block, which could either be the
structural owner or a subblock like a LSQ in the O3 CPU for example.
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
Adaptations to make gem5 compile and run on OSX 10.7.2, with a stock
gcc 4.2.1 and the remaining dependencies from macports, i.e. python
2.7,.2 swig 2.0.4, mercurial 2.0. The changes include an adaptation of
the SConstruct to handle non-library linker flags, and Darwin-specific
code to find the memory usage of gem5. A number of Ruby files relied
on ambigious uint (without the 32 suffix) which caused compilation
errors.
There are two lines in O3CPU.py that set the dcache and icache
tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr.
This patch removes these hardcoded lines from O3CPU.py and sets the default
L1 cache mshr targets to 20.
--HG--
extra : rebase_source : 6f92d950e90496a3102967442814e97dc84db08b