Commit graph

564 commits

Author SHA1 Message Date
Andreas Hansson 407233f5d8 mem: Avoid using invalid iterator in cache lock list traversal
Fix up an issue highlighted by Valgrind and the clang Address Sanitizer.
2016-02-15 03:40:04 -05:00
Andreas Hansson 83a5977481 mem: Be less conservative in clearing load locks in the cache
Avoid being overly conservative in clearing load locks in the cache,
and allow writes to the line if they are from the same context. This
is in line with ALPHA and ARM.
2016-02-10 04:08:25 -05:00
Andreas Hansson 92f021cbbe mem: Move the point of coherency to the coherent crossbar
This patch introduces the ability of making the coherent crossbar the
point of coherency. If so, the crossbar does not forward packets where
a cache with ownership has already committed to responding, and also
does not forward any coherency-related packets that are not intended
for a downstream memory controller. Thus, invalidations and upgrades
are turned around in the crossbar, and the memory controller only sees
normal reads and writes.

In addition this patch moves the express snoop promotion of a packet
to the crossbar, thus allowing the downstream cache to check the
express snoop flag (as it should) for bypassing any blocking, rather
than relying on whether a cache is responding or not.
2016-02-10 04:08:25 -05:00
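
For illustration, a minimal sketch of the forwarding decision described
above, using simplified stand-in types rather than the actual crossbar
code:

    // Simplified stand-in for the relevant packet state (hypothetical
    // names, not the gem5 Packet interface).
    struct Pkt {
        bool cacheResponding;  // an upstream cache has committed to respond
        bool coherencyOnly;    // e.g. an invalidation or upgrade
    };

    // A crossbar configured as the point of coherency turns coherency
    // traffic around instead of forwarding it to the memory controller.
    bool forwardDownstream(const Pkt &pkt, bool isPointOfCoherency)
    {
        if (!isPointOfCoherency)
            return true;
        if (pkt.cacheResponding || pkt.coherencyOnly)
            return false;  // turned around in the crossbar
        return true;       // normal reads and writes reach memory
    }
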
Andreas Hansson f84ee031cc mem: Align cache behaviour in atomic when upstream is responding
Adopt the same flow as in timing mode, where the caches on the path to
memory get to keep the line (if present), and we use the
responderHadWritable flag to determine if we need to forward the
(invalidating) packet or not.
2016-02-10 04:08:24 -05:00
Andreas Hansson 986214f181 mem: Align how snoops are handled when hitting writebacks
This patch unifies the snoop handling in case of hitting writebacks
with how we handle snoops hitting in the tags. As a result, we end up
using the same optimisation as the normal snoops, where we inform the
downstream cache if we encounter a line in Modified (writable and
dirty) state, which enables us to avoid sending out express snoops to
invalidate any Shared copies of the line. A few regressions
consequently change, as some transactions are sunk higher up in the
cache hierarchy.
2016-02-10 04:08:24 -05:00
Andreas Hansson fbdeb60316 mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be
forwarded from the memory side to the CPU side. Instead of having a
parameter, the cache now looks at the port connected on the CPU side,
and if it is a snooping port, then snoops are forwarded. This is less
error prone, and leaves fewer parameters to worry about.

The patch also tidies up the CPU classes to ensure that their I-side
port is not snooping by removing overrides to the snoop request
handler, such that snoop requests will panic via the default
MasterPort implementation.
2016-02-10 04:08:24 -05:00
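
The gist of the deduction, sketched with a simplified port type (the
real cache inspects the port actually connected on its CPU side):

    // Simplified stand-in for a connected port.
    struct Port {
        bool snooping;
        bool isSnooping() const { return snooping; }
    };

    // Instead of a forward_snoops parameter, derive the value from
    // whatever is wired up on the CPU side.
    bool deduceForwardSnoops(const Port &cpuSidePort)
    {
        return cpuSidePort.isSnooping();
    }
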
Steve Reinhardt 5592798865 style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06 17:21:19 -08:00
Steve Reinhardt 6caa2c9b4e mem: add CacheVerbose debug flag, filter noisy DPRINTFs
Some of the DPRINTFs added to the classic cache in cset 45df88079f04,
while useful to those unfamiliar with the cache code, end up being
noise when you're familiar with the code but are trying to debug tricky
protocol issues.  (Particularly getting two messages from each cache
as it receives a snoop request then declares that there was no match.)

This patch introduces a CacheVerbose debug flag, and moves a subset of
the added DPRINTFs into that category, so that Cache by itself returns
to being a more succinct summary of cache activity.

Also added a CacheAll compound flag to turn on all the cache-related
debug flags (other than CacheTags, which you *really* have to want badly
to turn on, IMO).
2015-12-31 09:32:09 -08:00
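
A schematic of the flag split, with a stand-in for gem5's DPRINTF macro
(the real macro filters on compiled-in debug flags):

    #include <cstdio>

    // Stand-in: gem5's DPRINTF tests the named debug flag.
    #define DPRINTF(flag, ...) \
        do { if (flag) std::printf(__VA_ARGS__); } while (0)

    bool Cache = true;          // succinct summary of cache activity
    bool CacheVerbose = false;  // noisy per-snoop chatter

    void example(unsigned long long addr)
    {
        DPRINTF(Cache, "Handling request for %#llx\n", addr);
        DPRINTF(CacheVerbose, "Snoop miss for %#llx\n", addr);
    }
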
Andreas Hansson 7fca994d04 mem: Do not allocate space for packet data if not needed
This patch looks at the request and response command to determine if
either actually has any data payload, and if not, we do not allocate
any space for packet data.

The only tricky case is where the command type is changed as part of
the MSHR functionality. In these cases where the original packet had
no data, but the new packet does, we need to explicitly call
allocate().
2015-12-31 09:33:39 -05:00
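
The gist, sketched with a simplified packet type (the actual check
inspects the request and response command attributes):

    // Simplified command descriptor: does this command carry a payload?
    struct Cmd { bool hasData; };

    struct Pkt {
        Cmd cmd;
        unsigned char *data = nullptr;
        unsigned size = 64;
        void allocate() { data = new unsigned char[size]; }
        ~Pkt() { delete[] data; }
    };

    // Only allocate a payload when the command actually carries one.
    // When an MSHR turns a no-data command into one with data, the
    // cache must call allocate() explicitly at that point.
    void allocateIfNeeded(Pkt &pkt)
    {
        if (pkt.cmd.hasData)
            pkt.allocate();
    }
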
Andreas Hansson f1ec326be5 mem: Do not alter cache block state on uncacheable snoops
This patch ensures we do not respond with a Modified (dirty and
writable) line if the request is uncacheable, and that the cache
responding retains the line without modifying the state (even if
responding).
2015-12-31 09:33:25 -05:00
Andreas Hansson 0fcb376e5f mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed to responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
2015-12-31 09:32:58 -05:00
Andreas Hansson f5c4a45889 mem: Explicitly check MSHR snoops for cases not dealt with
Add a sanity check to make it explicit that we currently do not allow
an I/O coherent agent to directly issue writes into the coherent part
of the memory system (it has to go via a cache, and get transformed
into a read ex, upgrade or invalidation).
2015-12-28 11:14:18 -05:00
Andreas Hansson f6525ff221 mem: Remove unused cache squash functionality
This patch removes the unused squash function from the MSHR queue, and
the associated (and also unused) threadNum member from the MSHR.
2015-12-28 11:14:16 -05:00
Andreas Hansson fbf3987c7b mem: Avoid unnecessary checks when creating HardPFReq in cache
The checks made before sending out a HardPFReq were unnecessarily
complex, and checked for cases that never occur. This patch
tidies them up.
2015-12-28 11:14:15 -05:00
Andreas Hansson b93a9d0d51 mem: Do not use sender state to track forwarded snoops in cache
This patch changes how the cache tracks which snoops are forwarded,
and which ones are created locally. Previously the identification was
based on an empty sender state of a specific class, but this method
fails to distinguish which cache actually attached the sender
state. Instead we use the same mechanism as the crossbar, and keep
track of the requests that have outstanding snoops.
2015-12-28 11:14:14 -05:00
Andreas Hansson 036263e280 mem: Fix cache sender state handling and add clarification
This patch addresses a bug in how the cache attached the MSHR as a
sender state. Rather than overwriting any existing sender state it now
pushes a new one. The handling of upward snoops is also clarified.
2015-12-28 11:14:10 -05:00
Andreas Hansson 97887eb6dc mem: Fix memory allocation bug in deferred snoop handling
This patch fixes a corner case in the deferred snoop handling, where
requests ended up being used by multiple packets with different
lifetimes, and inadvertently got deleted while they were still in use.
2015-12-17 17:07:11 -05:00
Andreas Sandberg 2a6fe97092 arm: Add missing explicit overrides for classic caches
Make clang happy when compiling on OSX.
2015-11-15 21:28:00 +00:00
Andreas Hansson 7433d77fcf mem: Add an option to perform clean writebacks from caches
This patch adds the necessary commands and cache functionality to
allow clean writebacks. This functionality is crucial, especially when
having exclusive (victim) caches. For example, if read-only L1
instruction caches are not sending clean writebacks, there will never
be any spills from the L1 to the L2. At the moment the cache model
defaults to not sending clean writebacks, and this should possibly be
re-evaluated.

The implementation of clean writebacks relies on a new packet command
WritebackClean, which acts much like a Writeback (renamed
WritebackDirty), and also much like a CleanEvict. On eviction of a
clean block the cache either sends a clean evict, or a clean
writeback, and if any copies are still cached upstream the clean
evict/writeback is dropped. Similarly, if a clean evict/writeback
reaches a cache where there are outstanding MSHRs for the block, the
packet is dropped. In the typical case though, the clean writeback
allocates a block in the downstream cache, and marks it writable if
the evicted block was writable.

The patch changes the O3_ARM_v7a L1 cache configuration and the
default L1 caches in config/common/Caches.py
2015-11-06 03:26:43 -05:00
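
A sketch of the eviction choice this introduces, assuming a
writeback_clean control as described above (simplified, not the actual
cache code):

    enum class Evict { CleanEvict, WritebackClean, WritebackDirty };

    Evict evictionFor(bool blkDirty, bool writebackClean)
    {
        if (blkDirty)
            return Evict::WritebackDirty;
        // Clean blocks either send a clean evict or a clean writeback;
        // either is dropped if copies remain cached upstream.
        return writebackClean ? Evict::WritebackClean
                              : Evict::CleanEvict;
    }
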
Andreas Hansson 654266f39c mem: Add cache clusivity
This patch adds a parameter to control the cache clusivity, that is if
the cache is mostly inclusive or exclusive. At the moment there is no
intention to support strict policies, and thus the options are: 1)
mostly inclusive, or 2) mostly exclusive.

The choice of policy guides the behaviour on a cache fill, and a new
helper function, allocOnFill, is created to encapsulate the decision
making process. For the timing mode, the decision is annotated on the
MSHR on sending out the downstream packet, and in atomic we directly
pass the decision to handleFill. We (ab)use the tempBlock in cases
where we are not allocating on fill, leaving the rest of the cache
unaffected. Simple and effective.

This patch also makes it more explicit that multiple caches are
allowed to consider a block writable (this was the case
also before this patch). That is, for a mostly inclusive cache,
multiple caches upstream may also consider the block exclusive. The
caches considering the block writable/exclusive all appear along the
same path to memory, and from a coherency protocol point of view it
works due to the fact that we always snoop upwards in zero time before
querying any downstream cache.

Note that this patch does not introduce clean writebacks. Thus, for
clean lines we are essentially removing a cache level if it is made
mostly exclusive. For example, lines from the read-only L1 instruction
cache or table-walker cache are always clean, and simply get dropped
rather than being passed to the L2. If the L2 is mostly exclusive and
does not allocate on fill it will thus never hold the line. A follow
on patch adds the clean writebacks.

The patch changes the L2 of the O3_ARM_v7a CPU configuration to be
mostly exclusive (and stats are affected accordingly).
2015-11-06 03:26:41 -05:00
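
A sketch of the allocOnFill decision under the two policies
(simplified; the real helper also considers the command type):

    enum class Clusivity { MostlyInclusive, MostlyExclusive };

    bool allocOnFill(Clusivity clusivity, bool cmdForcesAllocation)
    {
        // A mostly inclusive cache always allocates on fill; a mostly
        // exclusive one uses the tempBlock instead, unless the command
        // itself demands allocation.
        return clusivity == Clusivity::MostlyInclusive
            || cmdForcesAllocation;
    }
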
Ali Jafri 52c8ae5187 mem: Enforce insertion order on the cache response path
This patch enforces insertion order transmission of packets on the
response path in the cache. Note that the logic to enforce order is
already present in the packet queue, this patch simply turns it on for
queues in the response path.

Without this patch, there are corner cases where a request-response is
faster than a response-response forwarded through the cache. This
violation of queuing order causes problems in the snoop filter leaving
it with inaccurate information. This causes assert failures in the
snoop filter later on.

A follow on patch relaxes the order enforcement in the packet queue to
limit the performance impact.
2015-11-06 03:26:37 -05:00
Andreas Hansson 8e55d51aaa mem: Do not treat CleanEvict as a write operation
This patch changes the CleanEvict command type to not be considered a
write. Initially it was made a zero-sized write to match the writeback
command, but as things developed it became clear that it causes more
problems than it solves. For example, the memory modules (and bridge)
should not consider the CleanEvict as a write, but instead discard
it. With this patch it will be neither a read, nor write, and as it
does not need a response the slave will simply sink it.
2015-11-06 03:26:33 -05:00
Andreas Hansson ac1368df50 mem: Unify delayed packet deletion
This patch unifies how we deal with delayed packet deletion, where the
receiving slave is responsible for deleting the packet, but the
sending agent (e.g. a cache) is still relying on the pointer until the
call to sendTimingReq completes. Previously we used a mix of a
deletion vector and a construct using unique_ptr. With this patch we
ensure all slaves use the latter approach.
2015-11-06 03:26:21 -05:00
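
The unified idiom, sketched (this mirrors the unique_ptr construct
described above; ownership passes to the receiving slave):

    #include <memory>

    struct Packet { /* payload elided */ };

    struct Slave {
        // Holds the most recently received packet; resetting it deletes
        // the *previous* one, so the sender's pointer remains valid
        // until its call to sendTimingReq has completed.
        std::unique_ptr<Packet> pendingDelete;

        void recvTimingReq(Packet *pkt)
        {
            // ... handle the request ...
            pendingDelete.reset(pkt);
        }
    };
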
Andreas Hansson 2cb5467e85 misc: Appease clang static analyzer
A few minor fixes to issues identified by the clang static analyzer.
2015-11-06 03:26:16 -05:00
Andreas Hansson d8b7a652e1 mem: Clarify cache MSHR handling on fill
This patch addresses the upgrading of deferred targets in the MSHR,
and makes it clearer by explicitly calling out what is happening
(deferred targets are promoted if we get exclusivity without asking
for it).
2015-10-29 08:48:20 -04:00
Andreas Hansson 2ac04c11ac misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using
"-Wall" with clang >= 3.5, the latter now part of the most recent
XCode. The patch consequently removes "virtual" for those methods
where "override" is added. The latter should be enough of an
indication.

As part of this patch, a few minor issues that clang >= 3.5 complains
about are also resolved (unused methods and variables).
2015-10-12 04:08:01 -04:00
Andreas Hansson 22c04190c6 misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
2015-10-12 04:07:59 -04:00
Andreas Hansson a9a7002a3b mem: Add check for block status on WriteLineReq fill
More checks to help with understanding the functionality.
2015-09-25 07:26:58 -04:00
Andreas Hansson 012dd52dc2 mem: Fix WriteLineReq fill behaviour
This patch fixes issues in the interactions between deferred snoops
and WriteLineReq. More specifically, the patch addresses an issue
where deferred snoops caused assertion failures when being serviced on
the arrival of an InvalidateResp. The response packet was perceived to
be invalidating, when actually it is not for the cache that sent out
the original invalidation request.
2015-09-25 07:26:58 -04:00
Ali Jafri ceec2bb02c mem: Add snoops for CleanEvicts and Writebacks in atomic mode
This patch mirrors the logic in timing mode which sends up snoops to
check for cached copies before sending CleanEvicts and Writebacks down
the memory hierarchy. In case there is a copy in a cache above,
discard CleanEvicts and set the BLOCK_CACHED flag in Writebacks so
that writebacks do not reset the cache residency bit in the snoop
filter below.
2015-09-25 07:26:57 -04:00
Andreas Hansson 462c288a75 mem: Make the coherent crossbar account for timing snoops
This patch introduces the concept of a snoop latency. Given the
requirement to snoop and forward packets in zero time (due to the
coherency mechanism), the latency is accounted for later.

On a snoop, we establish the latency, and later add it to the header
delay of the packet. To allow multiple caches to contribute to the
snoop latency, we use a separate variable in the packet, and then take
the maximum before adding it to the header delay.
2015-09-25 07:13:54 -04:00
Andreas Hansson 419d437385 mem: Avoid setting markPending if not needed
In cases where a newly added target does not have any upstream MSHR to
mark as downstreamPending, remember that nothing is marked. This
allows us to avoid attempting to find the MSHR as part of the clearing
of downstreamPending.
2015-09-04 13:14:03 -04:00
Andreas Hansson 2c50a83ba2 mem: Tidy up CacheSet
Minor tweaks and housekeeping.
2015-09-04 13:14:01 -04:00
Andreas Hansson 76088fb9ca mem: Tidy up the snoop state-transition logic
Remove broken and unused option to pass dirty data on non-exclusive
snoops. Also beef up the comments a bit.
2015-09-04 13:13:58 -04:00
Andreas Hansson 6eb434c8a2 arm, mem: Remove unused CLEAR_LL request flag
Cleaning up dead code. The CLREX stores zero directly to
MISCREG_LOCKFLAG and so the request flag is no longer needed. The
corresponding functionality in the cache tags is also removed.
2015-08-21 07:03:25 -04:00
Andreas Hansson bda79817c8 mem: Remove unused cache squash functionality
Tidying up.
2015-08-21 07:03:24 -04:00
Andreas Hansson ddfa96cf45 mem: Add explicit Cache subclass and make BaseCache abstract
Open up for other subclasses to BaseCache and transition to using the
explicit Cache subclass.

--HG--
rename : src/mem/cache/BaseCache.py => src/mem/cache/Cache.py
2015-08-21 07:03:23 -04:00
Andreas Hansson 1bf389a2bf mem: Move cache_impl.hh to cache.cc
There is no longer any need to keep the implementation in a header.
2015-08-21 07:03:20 -04:00
Andreas Sandberg 53e777d683 base: Declare a type for context IDs
Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
2015-08-07 09:59:13 +01:00
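
The introduced definitions, roughly (assumed to live alongside the
other basic types, e.g. in src/base/types.hh):

    typedef int ContextID;
    const ContextID InvalidContextID = -1;
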
Andreas Hansson 6fac40ceb0 mem: Add missing clean eviction on uncacheable access
This patch adds a missing clean eviction, occurring when an uncacheable
access flushes and invalidates an existing block.
2015-07-30 03:42:25 -04:00
Andreas Hansson 540e59fd70 mem: Remove unused RequestCause in cache
This patch removes the RequestCause, and also simplifies how we
schedule the sending of packets through the memory-side port. The
deassertion of bus requests is removed as it is not used.
2015-07-30 03:41:43 -04:00
David Guillen-Fandos 0c89c15b23 mem: Make caches way aware
This patch makes cache sets aware of the way number. This enables
some nice features such as the ability to restrict way allocation. The
implemented mechanism allows setting a maximum number of ways 'k' that
may be allocated, which must fulfill 0 < k <= N (where N is the number
of ways). In the future more sophisticated mechanisms can be
implemented.
2015-07-30 03:41:42 -04:00
Andreas Hansson 5a18e181ff mem: Transition away from isSupplyExclusive for writebacks
This patch changes how writebacks communicate whether the line is
passed as modified or owned. Previously we relied on the
isSupplyExclusive mechanism, which was originally designed to avoid
unnecessary snoops.

For normal cache requests we use the sharedAsserted mechanism to
determine if a block should be marked writable or not, and with this
patch we transition the writebacks to also use this
mechanism. Conceptually this is cleaner and more consistent.
2015-07-30 03:41:40 -04:00
Andreas Hansson 5902e29e84 mem: Tidy up CacheBlk class
This patch modernises and tidies up the CacheBlk, removing dead code.
2015-07-30 03:41:39 -04:00
Andreas Hansson 5410660919 mem: Fix (ab)use of emplace to avoid temporary object creation
2015-07-13 08:46:28 -04:00
Andreas Sandberg ed38e3432c sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
2015-07-07 09:51:05 +01:00
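
A sketch of what a simple Drainable object looks like under the new
API (simplified stand-ins; DrainState as introduced alongside this
change):

    enum class DrainState { Running, Draining, Drained };

    struct MyObject /* : public Drainable */ {
        bool busy = false;

        DrainState drain() /* override */
        {
            // Either quiescent already, or ask for more time and call
            // signalDrainDone() once the last activity retires.
            return busy ? DrainState::Draining : DrainState::Drained;
        }

        void drainResume() /* override */
        {
            // Restart any suspended activity.
        }
    };
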
Andreas Sandberg f16c0a4a90 sim: Decouple draining from the SimObject hierarchy
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.

This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.

While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.

A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
2015-07-07 09:51:05 +01:00
Andreas Sandberg e9c3d59aae sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 76cd4393c0 sim: Refactor the serialization base class
Objects that can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

  * Add a set of APIs to serialize into a subsection of the current
    object. Previously, objects that needed this functionality would
    use ad-hoc solutions using nameOut() and section name
    generation. In the new world, an object that implements the
    interface has the methods serializeSection() and
    unserializeSection() that serialize into a named /subsection/ of
    the current object. Calling serialize() serializes an object into
    the current section.

  * Move the name() method from Serializable to SimObject as it is no
    longer needed for serialization. The fully qualified section name
    is generated by the main serialization code on the fly as objects
    serialize sub-objects.

  * Add a ScopedCheckpointSection helper class. Some objects
    need to serialize data structures, that are not deriving from
    Serializable, into subsections. Previously, this was done using
    nameOut() and manual section name generation. To simplify this,
    this changeset introduces a ScopedCheckpointSection() helper
    class. When this class is instantiated, it adds a new /subsection/
    and subsequent serialization calls during the lifetime of this
    helper class happen inside this section (or a subsection in case
    of nested sections).

  * The serialize() call is now const which prevents accidental state
    manipulation during serialization. Objects that rely on modifying
    state can use the serializeOld() call instead. The default
    implementation simply calls serialize(). Note: The old-style calls
    need to be explicitly called using the
    serializeOld()/serializeSectionOld() style APIs. These are used by
    default when serializing SimObjects.

  * Both the input and output checkpoints now use their own named
    types. This hides underlying checkpoint implementation from
    objects that need checkpointing and makes it easier to change the
    underlying checkpoint storage code.
2015-07-07 09:51:03 +01:00
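
A schematic of serializing into a named subsection with the new API,
using stand-ins for the checkpoint types (the real storage is hidden;
SERIALIZE_SCALAR and the section bookkeeping are elided):

    #include <iostream>
    #include <string>

    // Stand-in: the real CheckpointOut hides the underlying storage.
    using CheckpointOut = std::ostream;

    struct ScopedCheckpointSection {
        // The real helper pushes a new subsection name for its
        // lifetime; this stand-in only illustrates the shape.
        ScopedCheckpointSection(CheckpointOut &, const std::string &) {}
    };

    struct MyObject {
        int counter = 0;

        void serialize(CheckpointOut &cp) const  // note: now const
        {
            // Scalars land in the current section (SERIALIZE_SCALAR
            // in the real API).
            cp << "counter=" << counter << "\n";

            // Nested state goes into a named subsection.
            ScopedCheckpointSection sec(cp, "helper");
            cp << "nested=1\n";
        }
    };
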
Andreas Hansson db85ddca1a mem: Delay responses in the crossbar before forwarding
This patch changes how the crossbar classes deal with
responses. Instead of forwarding responses directly and burdening the
neighbouring modules in paying for the latency (through the
pkt->headerDelay), we now queue them before sending them.

The coherency protocol is not affected as requests and any snoop
requests/responses are still passed on in zero time. Thus, the
responses end up paying for any header delay accumulated when passing
through the crossbar. Any latency incurred on the request path will be
paid for on the response side, if no other module has dealt with it.

As a result of this patch, responses are returned at a later
point. This affects the number of outstanding transactions, and quite
a few regressions see an impact in blocking due to no MSHRs, increased
cache-miss latencies, etc.

Going forward we should be able to use the same concept also for snoop
responses, and any request that is not an express snoop.
2015-07-03 10:14:44 -04:00
Andreas Hansson b93c912013 mem: Remove redundant is_top_level cache parameter
This patch takes the final step in removing the is_top_level parameter
from the cache. With the recent changes to read requests and write
invalidations, the parameter is no longer needed, and consequently
removed.

This also means that asymmetric cache hierarchies are now fully
supported (and we are actually using them already with L1 caches, but
no table-walker caches, connected to a shared L2).
2015-07-03 10:14:43 -04:00
Andreas Hansson 71856cfbbc mem: Split WriteInvalidateReq into write and invalidate
WriteInvalidateReq ensures that a whole-line write does not incur the
cost of first doing a read exclusive, only to later overwrite the
data. This patch splits the existing WriteInvalidateReq into a
WriteLineReq, which is done locally, and an InvalidateReq that is sent
out throughout the memory system. The WriteLineReq re-uses the normal
WriteResp.

The change allows us to better express the difference between the
cache that is performing the write, and the ones that are merely
invalidating. As a consequence, we no longer have to rely on the
isTopLevel flag. Moreover, the actual memory in the system does not
see the initial write, only the writeback. We were marking the
written line as dirty already, so there is really no need to also push
the write all the way to the memory.

The overall flow of the write-invalidate operation remains the same,
i.e. the operation is only carried out once the response for the
invalidate comes back. This patch adds the InvalidateResp for this
very reason.
2015-07-03 10:14:41 -04:00
Andreas Hansson 0ddde83a47 mem: Add ReadCleanReq and ReadSharedReq packets
This patch adds two new read requests packets:

ReadCleanReq - For a cache to explicitly request clean data. The
response is thus exclusive or shared, but not owned or modified. The
read-only caches (see previous patch) use this request type to ensure
they do not get dirty data.

ReadSharedReq - We add this to distinguish cache read requests from
those issued by other masters, such as devices and CPUs. Thus, devices
use ReadReq, and caches use ReadCleanReq, ReadExReq, or
ReadSharedReq. For the latter, the response can be any state, shared,
exclusive, owned or even modified.

Both ReadCleanReq and ReadSharedReq re-use the normal ReadResp. The
two transactions are aligned with the emerging cache-coherent TLM
standard and the AMBA nomenclature.

With this change, the normal ReadReq should never be used by a cache,
and is reserved for the actual (non-caching) masters in the system. We
thus have a way of identifying if a request came from a cache or
not. The introduction of ReadSharedReq thus removes the need for the
current isTopLevel hack, and also allows us to stop relying on
checking the packet size to determine if the source is a cache or
not. This is fixed in follow-on patches.
2015-07-03 10:14:40 -04:00
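
A sketch of the resulting choice of read command (a simplified
decision, not the actual cache code):

    enum class ReadCmd { ReadReq, ReadCleanReq, ReadSharedReq, ReadExReq };

    ReadCmd readCmdFor(bool fromCache, bool readOnlyCache, bool needsWritable)
    {
        if (!fromCache)
            return ReadCmd::ReadReq;       // devices and CPUs only
        if (needsWritable)
            return ReadCmd::ReadExReq;     // intend to write the line
        if (readOnlyCache)
            return ReadCmd::ReadCleanReq;  // must not be handed dirty data
        return ReadCmd::ReadSharedReq;     // any state is acceptable
    }
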
Andreas Hansson 893533a126 mem: Allow read-only caches and check compliance
This patch adds a parameter to the BaseCache to enable a read-only
cache, for example for the instruction cache, or table-walker cache
(not for x86). A number of checks are put in place in the code to
ensure a read-only cache does not end up with dirty data.

A follow-on patch adds suitable read requests to allow a read-only
cache to explicitly ask for clean data.
2015-07-03 10:14:39 -04:00
Ali Jafri a262908acc mem: Add clean evicts to improve snoop filter tracking
This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory hierarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.

The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory hierarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.

We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implementation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.

We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.
2015-07-03 10:14:37 -04:00
Andreas Hansson 578a7f20c6 mem: Fix snoop packet data allocation bug
This patch fixes an issue where the snoop packet did not properly
forward the data pointer in case of static data.
2015-06-09 09:21:17 -04:00
Stephan Diestelhorst 2847d5f517 mem: Create a request copy for deferred snoops
Sometimes, we need to defer an express snoop in an MSHR, but the original
request might complete and deallocate the original pkt->req.  In those cases,
create a copy of the request so that someone who is inspecting the delayed
snoop can still inspect the request.  All of this is rather hacky, but the
allocation / linking and general life-time management of Packet and Request is
rather tricky.  Deleting the copy is another tricky area, testing so far has
shown that the right copy is deleted at the right time.
2015-03-17 11:50:55 +00:00
Andreas Hansson 36f29496a0 mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.

The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
2015-05-05 03:22:29 -04:00
Andreas Hansson 14e5b2ea55 mem: Pass shared downstream through caches
This patch ensures that we pass on information about a packet being
shared (rather than exclusive), when forwarding a packet downstream.

Without this patch there is a risk that a downstream cache considers
the line exclusive when it really isn't.
2015-05-05 03:22:26 -04:00
Ali Jafri 3d33432136 mem: Add forward snoop check for HardPFReqs
We should always check whether the cache is supposed to be forwarding snoops
before generating snoops.
2015-05-05 03:22:25 -04:00
Andreas Hansson 0ebbf3f951 mem: Add missing stats update for uncacheable MSHRs
This patch adds a missing counter update for the uncacheable
accesses. By updating this counter we also get a meaningful average
latency for uncacheable accesses (previously inf).
2015-05-05 03:22:24 -04:00
Andreas Hansson 33e3e370f2 mem: Tidy up BaseCache parameters
This patch simply tidies up the BaseCache parameters and removes the
unused "two_queue" parameter.
2015-05-05 03:22:22 -04:00
David Guillen 5287945a8b mem: Remove templates in cache model
This patch changes the cache implementation to rely on virtual methods
rather than using the replacement policy as a template argument.

There is no impact on the simulation performance, and overall the
changes make it easier to modify (and subclass) the cache and/or
replacement policy.
2015-05-05 03:22:21 -04:00
Stephan Diestelhorst cb8856f580 mem: Support any number of master-IDs in stride prefetcher
The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs)
that it could handle. Since master IDs need to be unique per system, and
every core, cache etc. requires a separate master port, a static limit on
these does not make much sense.

Instead, this patch adds a small hash map that will map all master IDs to
the right prefetch state and dynamically allocates new state for new master
IDs.
2015-03-27 04:56:03 -04:00
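
The shape of the change, sketched (a map keyed by master ID replacing
a fixed-size array; names simplified):

    #include <unordered_map>

    typedef unsigned MasterID;

    struct StrideState { /* per-master PC table, elided */ };

    struct StridePrefetcher {
        std::unordered_map<MasterID, StrideState> perMaster;

        // Default-constructs state the first time a master ID is seen,
        // so any number of masters is supported dynamically.
        StrideState &stateFor(MasterID id) { return perMaster[id]; }
    };
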
Andreas Hansson 0197e580e5 mem: Allocate cache writebacks before new MSHRs
This patch changes the order of writeback allocation such that any
writebacks resulting from a tag lookup (e.g. for an uncacheable
access), are added to the writebuffer before any new MSHR entries are
allocated. This ensures that the writebacks logically precedes the new
allocations.

The patch also changes the uncacheable flush to use proper timed (or
atomic) writebacks, as opposed to functional writes.
2015-03-27 04:56:02 -04:00
Andreas Hansson 24763c2177 mem: Cleanup flow for uncacheable accesses
This patch simplifies the code dealing with uncacheable timing
accesses, aiming to align it with the existing miss handling. Similar
to what we do in atomic, a timing request now goes through
Cache::access (where the block is also flushed), and then proceeds to
ignore any existing MSHR for the block in question. This unifies the
flow for cacheable and uncacheable accesses, and for atomic and timing.
2015-03-27 04:56:01 -04:00
Andreas Hansson a7a1e6004a mem: Ignore uncacheable MSHRs when finding matches
This patch changes how we search for matching MSHRs, ignoring any MSHR
that is allocated for an uncacheable access. By doing so, this patch
fixes a corner case in the MSHRs where incorrect data ended up being
copied into a (cacheable) read packet due to a first uncacheable MSHR
target of size 4, followed by a cacheable target to the same MSHR of
size 64. The latter target was filled with nonsense data.
2015-03-27 04:56:00 -04:00
Andreas Hansson 801ce65eae mem: Remove redundant allocateUncachedReadBuffer in cache
This patch removes the no-longer-needed
allocateUncachedReadBuffer. Besides the checks it is exactly the same
as allocateMissBuffer and thus provides no value.
2015-03-27 04:55:59 -04:00
Andreas Hansson fe806a0dd7 mem: Modernise MSHR iterators to C++11
This patch updates the iterators in the MSHR and MSHR queues to use
C++11 range-based for loops. It also does a bit of additional
housekeeping.
2015-03-27 04:55:57 -04:00
Andreas Hansson 7bae98459c mem: Align all MSHR entries to block boundaries
This patch aligns all MSHR queue entries to block boundaries to
simplify checks for matches. Previously there were corner cases that
could lead to existing entries not being identified as matches.

There are, rather alarmingly, a few regressions that change with this
patch.
2015-03-27 04:55:55 -04:00
Ali Jafri 15f0d9ff14 mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED
This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more
generic BLOCK_CACHED flag. Future patches implementing cache eviction
messages can use the BLOCK_CACHED flag in almost the same manner as
hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The
PREFETCH_SNOOP_SQUASH flag is set if the prefetch target is found in the tags
or the MSHRs in any state, so we are simply replacing calls to
setPrefetchSquashed() with setBlockCached(). The case of where the
prefetch target is found in the writeback MSHRs of upper level caches
continues to be covered by the MEM_INHIBIT flag.
2015-03-27 04:55:54 -04:00
Andreas Hansson 5275c9d740 mem: Use emplace front/back for deferred packets
Embrace C++11 for the deferred packets as we actually store the
objects in the data structure, and not just pointers.
2015-03-19 04:06:11 -04:00
Steve Reinhardt e57ab463cf mem: remove redundant test in Cache::recvTimingResp()
For some reason we were checking mshr->hasTargets() even though
we had already called mshr->getTarget() unconditionally earlier
in the same function (which asserts if there are no targets).
Get rid of this useless check, and while we're at it get rid
of the redundant call to mshr->getTarget(), since we still have
the value saved in a local var.
2015-02-11 10:48:53 -08:00
Steve Reinhardt 89bb03a1a6 mem: add local var in Cache::recvTimingResp()
The main loop in recvTimingResp() uses target->pkt all over
the place.  Create a local tgt_pkt to help keep lines
under the line length limit.
2015-02-11 10:48:52 -08:00
Steve Reinhardt ccef61d1cc mem: clean up write buffer check in Cache::handleSnoop()
The 'if (writebacks.size)' check was redundant, because
writeBuffer.findMatches() would return false if the
writebacks list was empty.

Also renamed 'mshr' to 'wb_entry' in this context since
we are pointing at a writebuffer entry and not an MSHR
(even though it's the same C++ class).
2015-03-14 06:51:07 -07:00
Andreas Hansson fc315901ff mem: Unify all cache DPRINTF address formatting
This patch changes all the DPRINTF messages in the cache to use
'%#llx' every time a packet address is printed. The inclusion of '#'
ensures '0x' is prepended, and since the address type is a uint64_t %x
really should be %llx.
2015-03-02 04:00:56 -05:00
Andreas Hansson 88e2963951 mem: Fix cache MSHR conflict determination
This patch fixes a rather subtle issue in the sending of MSHR requests
in the cache, where the logic previously did not check for conflicts
between the MSRH queue and the write queue when requests were not
ready. The correct thing to do is to always check, since not having a
ready MSHR does not guarantee that there is no conflict.

The underlying problem seems to have slipped past due to the symmetric
timings used for the write queue and MSHR queue. However, with the
recent timing changes the bug caused regressions to fail.
2015-03-02 04:00:54 -05:00
Stephan Diestelhorst ecef1612b8 mem: Add option to force in-order insertion in PacketQueue
By default, the packet queue is ordered by the ticks of the to-be-sent
packets. With the recent modifications of packets sinking their header time
when their response leaves the caches, there could be cases of MSHR targets
being allocated and ordered A, B, but their responses being sent out in the
order B,A. This led to inconsistencies in bus traffic, in particular the snoop
filter observing first a ReadExResp and later a ReadRespWithInv.  Logically,
these were ordered the other way around behind the MSHR, but due to the timing
adjustments when inserting into the PacketQueue, they were sent out in the
wrong order on the bus, confusing the snoop filter.

This patch adds a flag (off by default) such that these special cases can
request in-order insertion into the packet queue, which might offset timing
slightly. This is expected to occur rarely and not affect timing results.
2015-03-02 04:00:49 -05:00
Marco Balboni d4ef8368aa mem: Downstream components consume new crossbar delays
This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.
2015-03-02 04:00:48 -05:00
Andreas Hansson 987de4f5cc mem: Tidy up the cache debug messages
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02 04:00:37 -05:00
Andreas Hansson f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing issue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fix
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
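
The per-class interface after the split, roughly (abstract stand-ins
with the method names as described above; each packet class now has
its own retry callback):

    // Master side: retries for requests and snoop responses.
    struct MasterPortIface {
        virtual void recvReqRetry() = 0;        // after a failed sendTimingReq
        virtual void recvRetrySnoopResp() = 0;  // after a failed snoop response
        virtual ~MasterPortIface() {}
    };

    // Slave side: retries for responses.
    struct SlavePortIface {
        virtual void recvRespRetry() = 0;       // after a failed sendTimingResp
        virtual ~SlavePortIface() {}
    };
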
Ali Jafri 6ebe8d863a mem: Fix prefetchSquash + memInhibitAsserted bug
This patch resolves a bug with hardware prefetches. Before a hardware prefetch
is sent towards the memory, the system generates a snoop request to check all
caches above the prefetch generating cache for the presence of the prefetch
target. If the prefetch target is found in the tags or the MSHRs of the upper
caches, the cache sets the prefetchSquashed flag in the snoop packet. When the
snoop packet returns with the prefetchSquashed flag set, the prefetch
generating cache deallocates the MSHR reserved for the prefetch. If the
prefetch target is found in the writeback buffer of the upper cache, the cache
sets the memInhibit flag, which signals the prefetch generating cache to
expect the data from the writeback. When the snoop packet returns with the
memInhibitAsserted flag set, it marks the allocated MSHR as inService and
waits for the data from the writeback.

If the prefetch target is found in multiple upper level caches, specifically
in the tags or MSHRs of one upper level cache and the writeback buffer of
another, the snoop packet will return with both prefetchSquashed and
memInhibitAsserted set, while the current code is not written to handle such
an outcome. Current code checks for the prefetchSquashed flag first, if it
finds the flag, it deallocates the reserved MSHR. This leads to assert failure
when the data from the writeback appears at cache. In this fix, we simply
switch the order of checks. We first check for memInhibitAsserted and then for
prefetch squashed.
2015-03-02 04:00:34 -05:00
Marco Balboni 268d9e59c5 mem: Clarification of packet crossbar timings
This patch clarifies the packet timings annotated
when going through a crossbar.

The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.

The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to process the payload of the packet.

For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
2015-02-11 10:23:47 -05:00
Marco Balboni e2828587b3 mem: Clarify usage of latency in the cache
This patch adds some much-needed clarity in the specification of the
cache timing. For now, hit_latency and response_latency are kept as
top-level parameters, but the cache itself has a number of local
variables to better map the individual timing variables to different
behaviours (and sub-components).

The introduced variables are:
- lookupLatency: latency of tag lookup, occurring on any access
- forwardLatency: latency that occurs in case of outbound miss
- fillLatency: latency to fill a cache block
We keep the existing responseLatency.

The forwardLatency is used by allocateInternalBuffer() for:
- MSHR allocateWriteBuffer (uncached write forwarded to WriteBuffer);
- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
- MSHR allocateUncachedReadBuffer (uncached read allocated in MSHR
  queue)
It is our assumption that the time for the above three buffers is the
same. Similarly, for snoop responses passing through the cache we use
forwardLatency.
2015-02-11 10:23:36 -05:00
Andreas Hansson 461a80beb3 mem: Clarify express snoop behaviour
This patch adds a bit of documentation with insights around how
express snoops really work.
2015-02-03 14:26:02 -05:00
Andreas Hansson 193325ff60 mem: Clarify cache behaviour for pending dirty responses
This patch adds a bit of clarification around the assumptions made in
the cache when packets are sent out, and dirty responses are
pending. As part of the change, the marking of an MSHR as in service
is simplified slightly, and comments are added to explain what
assumptions are made.
2015-02-03 14:25:59 -05:00
Andreas Hansson 15c64035ed mem: Remove Packet source from ForwardResponseRecord
This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.
2015-01-22 05:01:30 -05:00
Andreas Hansson 6096e2f9c1 mem: Fix bug in cache request retry mechanism
This patch ensures that inhibited packets that are about to be turned
into express snoops do not update the retry flag in the cache.
2015-01-20 08:12:01 -05:00
Mitch Hayenga b2342c5d9a mem: Change prefetcher to use random_mt
Prefetchers previously used rand() to generate random numbers.
2014-12-23 09:31:19 -05:00
Curtis Dunham 516e6046ae mem: Hide WriteInvalidate requests from prefetchers
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
2014-12-23 09:31:19 -05:00
Mitch Hayenga bd4f901c77 mem: Fix event scheduling issue for prefetches
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready.  However, a prefetch being ready does not guarantee
that it can obtain an MSHR.  So, when all MSHRs are full,
the simulation ends up unnecessarily scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated.  The event is rescheduled as soon as a MSHR becomes
available.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 4acd4a2055 mem: Fix bug relating to writebacks and prefetches
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
requests in certain circumstances.
2014-12-23 09:31:18 -05:00
Mitch Hayenga df82a2d003 mem: Rework the structuring of the prefetchers
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a customizable lookup table
(sets/ways) rather than the previous fully associative structure.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 6cb58b2bd2 mem: Add parameter to reserve MSHR entries for demand access
Adds a new parameter that reserves some number of MSHR entries for demand
accesses.  This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
2014-12-23 09:31:18 -05:00
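
A sketch of the reservation check (simplified; the real queue tracks
its allocated entries against a reserve parameter for demand accesses):

    // Prefetches see a smaller effective queue, so demand requests can
    // always claim the reserved entries.
    bool canAllocate(int allocated, int numEntries, int demandReserve,
                     bool isDemand)
    {
        const int limit = isDemand ? numEntries
                                   : numEntries - demandReserve;
        return allocated < limit;
    }
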
Curtis Dunham 5d22250845 mem: Support WriteInvalidate (again)
This patch takes a clean-slate approach to providing WriteInvalidate
(write streaming, full cache line writes without first reading)
support.

Unlike the prior attempt, which took an aggressive approach of directly
writing into the cache before handling the coherence actions, this
approach follows the existing cache flows as closely as possible.
2014-12-02 06:08:19 -05:00
Curtis Dunham 7ca27dd3cc mem: Remove WriteInvalidate support
Prepare for a different implementation following in the next patch.
2014-12-02 06:08:17 -05:00
Andreas Hansson ea5ccc7041 mem: Clean up packet data allocation
This patch attempts to make the rules for data allocation in the
packet explicit, understandable, and easy to verify. The constructor
that copies a packet is extended with an additional flag "alloc_data"
to enable the call site to explicitly say whether the newly created
packet is short-lived (a zero-time snoop), or has an unknown life-time
and therefore should allocate its own data (or copy a static pointer
in the case of static data).

The tricky case is the static data. In essence this is a
copy-avoidance scheme where the original source of the request (DMA,
CPU etc) does not ask the memory system to return data as part of the
packet, but instead provides a pointer, and then the memory system
carries this pointer around, and copies the appropriate data to the
location itself. Thus any derived packet actually never copies any
data. As the original source does not copy any data from the response
packet when arriving back at the source, we must maintain the copy of
the original pointer to not break the system. We might want to revisit
this one day and pay the price for a few extra memcpy invocations.

All in all this patch should make it easier to grok what is going on
in the memory system and how data is actually copied (or not).
2014-12-02 06:07:54 -05:00
Andreas Hansson f012166bb6 mem: Cleanup Packet::checkFunctional and hasData usage
This patch cleans up the use of hasData and checkFunctional in the
packet. The hasData function is unfortunately suggesting that it
checks if the packet has a valid data pointer, when it does in fact
only check if the specific packet type is specified to have a data
payload. The confusion led to a bug in checkFunctional. The latter
function is also tidied up to avoid name overloading.
2014-12-02 06:07:52 -05:00
Andreas Hansson a2ee51f631 mem: Make the requests carried by packets const
This adds a basic level of sanity checking to the packet by ensuring
that a request is not modified once the packet is created. The only
issue that had to be worked around is the relaying of
software-prefetches in the cache. The specific situation is now solved
by first copying the request, and then creating a new packet
accordingly.
2014-12-02 06:07:50 -05:00
Andreas Hansson 3d6ec81e66 mem: Add checks and explanation for assertMemInhibit usage
2014-12-02 06:07:46 -05:00