Don't assert that the response packet is marked as a response
since it won't always be so for functional accesses.
Also cleanup code to refer to functional accesses rather
than "probes" (old terminology), and mention in the
DPRINTF which type of access we're doing.
Without this flag set, page-crossing requests were not split into two mem
request.
Depending on the alignment bit in the SCTLR, misaligned access could
raise a fault. However it seems unnecessary to implement that.
This fault can used to flush the pipe, not including the faulting instruction.
The particular case I needed this was for a self-modifying code. It needed to
drain the store queue and force the following instruction to refetch from
icache. DCCMVAC cp15 mcr instruction is modified to raise this fault.
When decoding a srs instruction, invalid mode encoding returns invalid instruction.
This can happen when garbage instructions are fetched from mispredicted path
Allow some loads that update the base register to use just two micro-ops. three
micro-ops are only used if the destination register matches the offset register
or the PC is the destination regsiter. If the PC is updated it needs to be
the last micro-op otherwise O3 will mispredict.
inUserMode now can take either a threadcontext or a CPSR value directly. If
given a thread context it just extracts the CPSR and calls the other version.
An inPrivelegedMode function was also implemented which just returns the
opposite of inUserMode.
This is to help tidy up arch/x86. These files should not be used external to
the ISA.
--HG--
rename : src/arch/x86/apicregs.hh => src/arch/x86/regs/apic.hh
rename : src/arch/x86/floatregs.hh => src/arch/x86/regs/float.hh
rename : src/arch/x86/intregs.hh => src/arch/x86/regs/int.hh
rename : src/arch/x86/miscregs.hh => src/arch/x86/regs/misc.hh
rename : src/arch/x86/segmentregs.hh => src/arch/x86/regs/segment.hh
This single parameter replaces the collection of bools that set up various
flavors of microops. A flag parameter also allows other flags to be set like
the serialize before/after flags, etc., without having to change the
constructor.
Since miscellaneous registers bypass wakeup logic, force serialization
to resolve data dependencies through them
* * *
ARM: adding non-speculative/serialize flags for instructions change CPSR
THis allows the CPU to handle predicated-false instructions accordingly.
This particular patch makes loads that are predicated-false to be sent
straight to the commit stage directly, not waiting for return of the data
that was never requested since it was predicated-false.
This allows one two different OS requirements for the same ISA to be handled.
Some OSes are compiled for a virtual address and need to be loaded into physical
memory that starts at address 0, while other bare metal tools generate
images that start at address 0.
This was being done in read(), but if readBytes was called directly it
wouldn't happen. Also, instead of setting the memory blob being read to -1
which would (I believe) require using memset with -1 as a parameter, this now
uses bzero. It's hoped that it's more specialized behavior will make it
slightly faster.
bkt size isn't evenly divisible by max-min and it would round down,
it's possible to sample a distribution and have no place to put the sample.
When this case occured the simulator would assert.
This patch allows messages to be stalled in their input buffers and wait
until a corresponding address changes state. In order to make this work,
all in_ports must be ranked in order of dependence and those in_ports that
may unblock an address, must wake up the stalled messages. Alot of this
complexity is handled in slicc and the specification files simply
annotate the in_ports.
--HG--
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/StallAndWaitStatementAST.py
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/WakeUpDependentsStatementAST.py
Patch allows each individual message buffer to have different recycle latencies
and allows the overall recycle latency to be specified at the cmd line. The
patch also adds profiling info to make sure no one processor's requests are
recycled too much.
The main purpose for clearing stats in the unserialize process is so
that the profiler can correctly set its start time to the unserialized
value of curTick.
This patch allows one to disable migratory sharing for those cache blocks that
are accessed by atomic requests. While the implementations are different
between the token and hammer protocols, the motivation is the same. For
Alpha, LLSC semantics expect that normal loads do not unlock cache blocks that
have been locked by LL accesses. Therefore, locked blocks should not transfer
write permissions when responding to these load requests. Instead, only they
only transfer read permissions so that the subsequent SC access can possibly
succeed.
Added drain functions to the RTC and 8254 timer so that periodic interrupts
stop when the system is draining. This patch is needed to checkpoint in
timing mode. Otherwise under certain situations, the event queue will never
be completely empty.
This patch fixes several bugs related to previous inconsistent assumptions on
how many tokens the Owner had. Mike Marty should have fixes these bugs years
ago. :)
Previously, the MOESI_hammer protocol calculated the same latency for L1 and
L2 hits. This was because the protocol was written using the old ruby
assumption that L1 hits used the sequencer fast path. Since ruby no longer
uses the fast-path, the protocol delays L2 hits by placing them on the
trigger queue.
The previous slower ruby latencies created a mismatch between the faster M5
cpu models and the much slower ruby memory system. Specifically smp
interrupts were much slower and infrequent, as well as cpus moving in and out
of spin locks. The result was many cpus were idle for large periods of time.
These changes fix the latency mismatch.
This patch adds back to ruby the capability to understand the response time
for messages that hit in different levels of the cache heirarchy.
Specifically add support for the MI_example, MOESI_hammer, and MOESI_CMP_token
protocols.
This patch adds DMA testing to the Memtester and is inherits many changes from
Polina's old tester_dma_extension patch. Since Ruby does not work in atomic
mode, the atomic mode options are removed.
Replace direct call to unserialize() on each SimObject with a pair of
calls for better control over initialization in both ckpt and non-ckpt
cases.
If restoring from a checkpoint, loadState(ckpt) is called on each
SimObject. The default implementation simply calls unserialize() if
there is a corresponding checkpoint section, so we get backward
compatibility for existing objects. However, objects can override
loadState() to get other behaviors, e.g., doing other programmed
initializations after unserialize(), or complaining if no checkpoint
section is found. (Note that the default warning for a missing
checkpoint section is now gone.)
If not restoring from a checkpoint, we call the new initState() method
on each SimObject instead. This provides a hook for state
initializations that are only required when *not* restoring from a
checkpoint.
Given this new framework, do some cleanup of LiveProcess subclasses
and X86System, which were (in some cases) emulating initState()
behavior in startup via a local flag or (in other cases) erroneously
doing initializations in startup() that clobbered state loaded earlier
by unserialize().
The separate restoreCheckpoint() call is gone; just pass
the checkpoint dir as an optional arg to instantiate().
This change is a precursor to some more extensive
reworking of the startup code.
The old code for handling SimObject children was kind of messy,
with children stored both in _values and _children, and
inconsistent and potentially buggy handling of SimObject
vectors. Now children are always stored in _children, and
SimObject vectors are consistently handled using the
SimObjectVector class.
Also, by deferring the parenting of SimObject-valued parameters
until the end (instead of doing it at assignment), we eliminate
the hole where one could assign a vector of SimObjects to a
parameter then append to that vector, with the appended objects
never getting parented properly.
This patch induces small stats changes in tests with data races
due to changes in the object creation & initialization order.
The new code does object vectors in order and so should be more
stable.
Orphan SimObjects (not in the config hierarchy) could get
created implicitly if they have a port connection to a SimObject
that is in the hierarchy. This means that there are objects on
the C++ SimObject list (created via the C++ SimObject
constructor call) that are unknown to Python and will get
skipped if we walk the hierarchy from the Python side (as we are
about to do). This patch detects this situation and prints an
error message.
Also fix the rubytester config script which happened to rely on
this behavior.
Enforce that the Python Root SimObject is instantiated only
once. The C++ Root object already panics if more than one is
created. This change avoids the need to track what the root
object is, since it's available from Root.getInstance() (if it
exists). It's now redundant to have the user pass the root
object to functions like instantiate(), checkpoint(), and
restoreCheckpoint(), so that arg is gone. Users who use
configs/common/Simulate.py should not notice.
Clean up some minor things left over from the default responder
change in rev 9af6fb59752f. Mostly renaming the 'responder_set'
param to 'use_default_range' to actually reflect what it does...
old name wasn't that descriptive in the first place, but now
it really doesn't make sense at all.
Also got rid of the bogus obsolete assignment to 'bus.responder'
which used to be a parameter but now is interpreted as an
implicit child assignment, and which was giving me problems in
the config restructuring to come. (A good argument for not
allowing implicit child assignments, IMO, but that's water under
the bridge, I'm afraid.)
Also moved the Bus constructor to the .cc file since that's
where it should have been all along.
printMemData is only used in DPRINTFs. If those are removed by compiling
m5.fast, that function is unused, gcc generates a warning, that gets turned
into an error, and the build fails. This change surrounds the function
definition with #if TRACING_ON so it only gets compiled in if the DPRINTFs do
to.
When a request is NO_ACCESS (x86 CDA microinstruction), the memory op
doesn't go to the cache, so TimingSimpleCPU::completeDataAccess needs
to handle the case where the current status of the CPU is Running
and not DcacheWaitResponse or DTBWaitResponse
switching between O3 and another CPU, O3's tick event might still be scheduled
in the event queue (as squashed). Therefore, check for a squashed tick event
as well as a non-scheduled event when taking over from another CPU and deal
with it accordingly.
It would be nice if python had a tree class that would do this for real,
but since we don't, we'll just keep a sorted list of keys and update
it on demand.
If the user sets the environment variable M5_OVERRIDE_PY_SOURCE to
True, then imports that would normally find python code compiled into
the executable will instead first check in the absolute location where
the code was found during the build of the executable. This only
works for files in the src (or extras) directories, not automatically
generated files.
This is a developer feature!
This tidbit was pulled from a larger patch for Tim's sake, so
the comment reflects functions that haven't been exported yet.
I hope to commit them soon so it didn't seem worth cleaning up.
m5 doesnt do stats specific to binary and this resource request stat is probably only
useful for people who really know the ins/outs of the model anyway
replace priority queue with vector of lists(1 list per stage) and place inside a class
so that we have more control of when an instruction uses a particular schedule entry
...
also, this is the 1st step toward making the InOrderCPU fully parameterizable. See the
wiki for details on this process
- use InOrderBPred instead of Resource for DPRINTFs
- account for DELAY SLOT in updating RAS and in squashing
- don't let squashed instructions update the predictor
- the BTB needs to use the ASID not the TID to work for multithreaded programs
- add stats for BTB hits
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
One big difference is that PrioHeap puts the smallest element at the
top of the heap, whereas stl puts the largest element on top, so I
changed all comparisons so they did the right thing.
Some usage of PrioHeap was simply changed to a std::vector, using sort
at the right time, other usage had me just use the various heap functions
in the stl.
This was somewhat tricky because the RefCnt API was somewhat odd. The
biggest confusion was that the the RefCnt object's constructor that
took a TYPE& cloned the object. I created an explicit virtual clone()
function for things that took advantage of this version of the
constructor. I was conservative and used clone() when I was in doubt
of whether or not it was necessary. I still think that there are
probably too many instances of clone(), but hopefully not too many.
I converted several instances of const MsgPtr & to a simple MsgPtr.
If the function wants to avoid the overhead of creating another
reference, then it should just use a regular pointer instead of a ref
counting ptr.
There were a couple of instances where refcounted objects were created
on the stack. This seems pretty dangerous since if you ever
accidentally make a reference to that object with a ref counting
pointer, bad things are bound to happen.
Expand the help text on the --remote-gdb-port option so
people know you can use it to disable remote gdb without
reading the source code, and thus don't waste any time
trying to add a separate option to do that.
Clean up some gdb-related cruft I found while looking
for where one would add a gdb disable option, before
I found the comment that told me that I didn't need
to do that.
Spec2k benchmarks seem to run with atomic or timing mode simple
CPUs. Fixed up some constants, handling of 64 bit arguments,
and marked a few more syscalls ignoreFunc.
This will help keep the high level decode together and not have it spread into
the subordinate decode stuff. The ##include lines still need to be on a line
by themselves, though.