This patch clarifies the packet timings annotated
when going through a crossbar.
The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.
The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.
For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
This patch adds some much-needed clarity in the specification of the
cache timing. For now, hit_latency and response_latency are kept as
top-level parameters, but the cache itself has a number of local
variables to better map the individual timing variables to different
behaviours (and sub-components).
The introduced variables are:
- lookupLatency: latency of tag lookup, occuring on any access
- forwardLatency: latency that occurs in case of outbound miss
- fillLatency: latency to fill a cache block
We keep the existing responseLatency
The forwardLatency is used by allocateInternalBuffer() for:
- MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer);
- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
- MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR
queue)
It is our assumption that the time for the above three buffers is the
same. Similarly, for snoop responses passing through the cache we use
forwardLatency.
The m5format command didn't actually work due to parameter handling
issues and missing language detection. This changeset fixes those
issues and cleans up some of the code to shared between the style
checker and the format checker.
The style used to support the option -w to automatically fix white
space issues. However, this option was actually wired up to fix all
styles issues the checker encountered. This changeset cleans up the
code that handles automatic fixing and adds an option to fix all
issues, and separate options for white spaces and include ordering.
This patch revamps the memtest example script and allows for the
insertion of testers at any level in the cache hierarchy. Previously
all created topologies placed testers only at the very top, and the
tree was thus entirely symmetric. With the changes made, it is possible
to not only place testers at the leaf caches (L1), but also to connect
testers at the L2, L3 etc.
As part of the changes the object hierarchy is also simplified to
ensure that the visual representation from the DOT printing looks
sensible. Using SubSystems to group the objects is one of the key
features.
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.
The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.
In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.
--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
Gcc and clang both provide an attribute that can be used to flag a
function as deprecated at compile time. This changeset adds a gem5
compiler macro for that compiler feature. The macro can be used to
indicate that a legacy API within gem5 has been deprecated and provide
a graceful migration to the new API.
This patch fixes the CompoundFlag constructor, ensuring that it does
not dereference NULL. Doing so has undefined behaviuor, and both clang
and gcc's undefined-behaviour sanitiser was rather unhappy.
The Platform base class contains a pointer to an instance of the
System which is never initialized. This can lead to subtle bugs since
some architecture-specific platform implementations contain their own
system pointer which is normally used. However, if the platform is
accessed through a pointer to its base class, the dangling pointer
will be used instead.
Although you can put a list of colon-separated directory names
in M5_PATH, the current code just takes the first one that
exists and assumes all files must live there. This change
makes the code search the specified list of directories
for each individual binary or disk image that's requested.
The main motivation is that the x86/Alpha binaries and the
ARM binaries are in separate downloads, and thus naturally
end up in separate directories. With this change, you can
have M5_PATH point to those two directories, then run any
FS regression test without changing M5_PATH. Currently,
you either have to merge the two download directories
or change M5_PATH (or do something else I haven't figured out).
This patch adds a bit of clarification around the assumptions made in
the cache when packets are sent out, and dirty responses are
pending. As part of the change, the marking of an MSHR as in service
is simplified slightly, and comments are added to explain what
assumptions are made.
This patch uses the recently added XOR hashing capabilities for the
DRAM channel interleaving. This avoids channel biasing due to strided
access patterns.
This patch extends the current address interleaving with basic hashing
support. Instead of directly comparing a number of address bits with a
matching value, it is now possible to use two independent set of
address bits XOR'ed together. This avoids issues where strided address
patterns are heavily biased to a subset of the interleaved ranges.
This patch changes the DRAM channel interleaving default behaviour to
be more representative. The default address mapping (RoRaBaCoCh) moves
the channel bits towards the least significant bits, and uses 128 byte
as the default channel interleaving granularity.
These defaults can be overridden if desired, but should serve as a
sensible starting point for most use-cases.
As of August 2014, the gem5 style guide mandates that a source file's
primary header is included first in that source file. This helps to
ensure that the header file does not depend on include file ordering
and avoids surprises down the road when someone tries to reuse code.
In the new order, include files are grouped into the following blocks:
* Primary header file (e.g., foo.hh for foo.cc)
* Python headers
* C system/stdlib includes
* C++ stdlib includes
* Include files in the gem5 source tree
Just like before, include files within a block are required to be
sorted in alphabetical order.
This changeset updates the style checker to enforce the new order.
The method Event::initialized() tests if this != NULL as a part of the
expression that tests if an event is initialized. The only case when
this check could be false is if the method is called on a null
pointer, which is illegal and leads to undefined behavior (such as
eating your pets) according to the C++ standard. Because of this,
modern compilers (specifically, recent versions of clang) warn about
this which we treat as an error. This changeset removes the redundant
check to fix said warning.
Work around a bug in scons that causes the param wrappers being
compiled twice. The easiest way for us to do so is to tell scons to
ignore implicit command dependencies.
This patch changes how the timing CPU deals with processing responses,
always scheduling an event, even if it is for the current tick. This
helps to avoid situations where a new request shows up before a
response is finished in the crossbar, and also is more in line with
any realistic behaviour.
The Float param was not settable on the command line
due to a typo in the class definition in
python/m5/params.py. This corrects the typo and allows
floats to be set on the command line as intended.
Fix the makeArmSystem routine to reflect recent changes that support kernel
commandline option when running android. Without this fix, trying to run
android encounters a 'reference before assignment' error.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
While the IsFirstMicroop flag exists it was only occasionally used in the ARM
instructions that gem5 microOps and therefore couldn't be relied on to be correct.
When gem5 is a slave to another simulator and the Python is only used
to initialize the configuration (and not perform actual simulation), a
"debug start" (--debug-start) event will get freed during or immediately
after the initial Python frame's execution rather than remaining in the
event queue. This tricky patch fixes the GC issue causing this.
This patch takes the final step in removing the src and dest fields in
the packet. These fields were rather confusing in that they only
remember a single multiplexing component, and pushed the
responsibility to the bridge and caches to store the fields in a
senderstate, thus effectively creating a stack. With the recent
changes to the crossbar response routing the crossbar is now
responsible without relying on the packet fields. Thus, these
variables are now unused and can be removed.
This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.
This patch aligns how the response routing is done in the RubyPort,
using the SenderState for both memory and I/O accesses. Before this
patch, only the I/O used the SenderState, whereas the memory accesses
relied on the src field in the packet. With this patch we shift to
using SenderState in both cases, thus not relying on the src field any
longer.
This patch removes the need for a source and destination field in the
packet by shifting the onus of the tracking to the crossbar, much like
a real implementation. This change in behaviour also means we no
longer need a SenderState to remember the source/dest when ever we
have multiple crossbars in the system. Thus, the stack that was
created by the SenderState is not needed, and each crossbar locally
tracks the response routing.
The fields in the packet are still left behind as the RubyPort (which
also acts as a crossbar) does routing based on them. In the succeeding
patches the uses of the src and dest field will be removed. Combined,
these patches improve the simulation performance by roughly 2%.
This patch fixes a minor issue in the X86 page table walker where it
ended up sending new request packets to the crossbar before the
response processing was finished (recvTimingResp is directly calling
sendTimingReq). Under certain conditions this caused the crossbar to
see illegal combinations of request/response overlap, in turn causing
problems with a slightly modified crossbar implementation.
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
DMA Controller was not being connected to the network for the MESI_Three_Level
protocol as was being done in the other protocol config files. Without this
patch, this protocol segfaults during startup.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
The ppCommit should notify the attached listener every time the cpu commits
a microop or non microcoded insturction. The listener can then decide
whether it will process only the last microop (eg. SimPoint probe).
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch removes the three MIPS and SPARC regressions that use the
deprecated InOrderCPU.
This is the first step in completely removing the code from the tree,
avoiding confusion, and focusing all development efforts on the
MinorCPU. Brave new world.
This patch fixes a bug where the DRAM controller tried to access the
system cacheline size before the system pointer was initialised. It
also fixes a bug where the granularity is 0 (no interleaving).