Maintain all information about an instruction's fault in the DynInst object rather
than any cpu-request object. Also, if there is a fault during the execution stage
then just save the fault inside the instruction and trap once the instruction
tries to graduate
Give fetch unit it's own parameterizable fetch buffer to read from. Very inefficient
(architecturally and in simulation) to continually fetch at the granularity of the
wordsize. As expected, the number of fetch memory requests drops dramatically
instead of having one cache-unit class be responsible for both data and code
accesses, separate code that is just for fetch in it's own derived class off the
original base class. This makes the code easier to manage as well as handle
future cases of special fetch handling
allow the user to specify how many instructions a pipeline stage can process
on any given cycle (stageWidth...i.e.bandwidth) by setting the parameter through
the python interface rather than compile the code after changing the *.cc file.
(we always had the parameter there, but still used the static 'ThePipeline::StageWidth'
instead)
-
Since StageWidth is now dynamically defined, change the interstage communication
structure to use a vector and get rid of array and array handling index (toNextStageIndex)
since we can just make calls to the list for the same information
use skidbuffer as only location for instructions between stages. before,
we had the insts queue from the prior stage and the skidbuffer for the
current stage, but that gets confusing and this consolidation helps
when handling squash cases
manage insertion and deletion like a queue but will need
access to internal elements for future changes
Currently, skidbuffer manages any instruction that was
in a stage but could not complete processing, however
we will want to manage all blocked instructions (from prev stage
and from cur. stage) in just one buffer.
Previous code was marking CPU activity on almost every cycle due to a bug in
tracking the status of pipeline stages. This disables the CPU from sleeping
on long latency stalls and increases simulation time
This makes sure that the address ranges requested for caches and uncached ports
don't conflict with each other, and that accesses which are always uncached
(message signaled interrupts for instance) don't waste time passing through
caches.
The disk image to use was always being forced to a particular value. This
change changes what disk image is selected as the default based on the
architecture being built. In the future, a more sophisticated system might be
used that selected a path based on certain rules instead of relying on one off
file names.
Moving the definition of NoFault into fault.hh doesn't bring any new
dependencies with it, and allows some files to include just fault.hh which has
less baggage. NoFault will still be available to everything that includes
faults.hh because it includes fault.hh.
M5 skips over any simulated time where it doesn't have any work to do. When
the simulation is active, the time skipped is short and the work done at any
point in time is relatively substantial. If the time between events is long
and/or the work to do at each event is small, it's possible for simulated time
to pass faster than real time. When running a benchmark that can be good
because it means the simulation will finish sooner in real time. When
interacting with the real world through, for instance, a serial terminal or
bridge to a real network, this can be a problem. Human or network response time
could be greatly exagerated from the perspective of the simulation and make
simulated events happen "too soon" from an external perspective.
This change adds the capability to force the simulation to run no faster than
real time. It does so by scheduling a periodic event that checks to see if
its simulated period is shorter than its real period. If it is, it stalls the
simulation until they're equal. This is called time syncing.
A future change could add pseudo instructions which turn time syncing on and
off from within the simulation. That would allow time syncing to be used for
the interactive parts of a session but then turned off when running a
benchmark using the m5 utility program inside a script. Time syncing would
probably not happen anyway while running a benchmark because there would be
plenty of work for M5 to do, but the event overhead could be avoided.
Any change of control flow now resets the itstate to 0 mask and 0 condition,
except where the control flow alteration write into the cpsr register. These
case, for example return from an iterrupt, require the predecoder to recover
the itstate.
As there is a window of opportunity between the return from an interrupt
changing the control flow at the head of the pipe and the commit of the update
to the CPSR, the predecoder needs to be able to grab the ITstate early. This
is now handled by setting the forcedItState inside a PCstate for the control
flow altering instruction.
That instruction will have the correct mask/cond, but will not have a valid
itstate until advancePC is called (note this happens to advance the execution).
When the new PCstate is copy constructed it gets the itstate cond/mask, and
upon advancing the PC the itstate becomes valid.
Subsequent advancing invalidates the state and zeroes the cond/mask. This is
handled in isolation for the ARM ISA and should have no impact on other ISAs.
Refer arch/arm/types.hh and arch/arm/predecoder.cc for the details.
Without this change 0 is always used for the youngest sequence number if
a squash occured and the ROB was empty (E.g. an instruction is marked
serializeAfter or a fetch stall prevents other instructions from issuing).
Using 0 there is a race to rename where an instruction that committed the
same cycle as the squashing instruction can have it's renamed state undone
by the squash using sequence number 0.
I'm not positive this is the correct fix, but it's working right now.
Either we need to do something like this, prevent the misc reg from being renamed at all,
or there something else going on. We need to find the root cause as to why
this is only a problem sometimes.
The squash inside the fetch unit should not attempt to remove them from the
branch predictor as non-control instructions are not pushed into the predictor.
When this condition occurs the cpu should restart the fetch stage to fetch from
the original execution path. Fault handling in the commit stage is cleaned up a
little bit so the control flow is simplier. Finally, if an instruction is being
used to carry a fault it isn't executed, so the fault propagates appropriately.