Commit graph

497 commits

Author SHA1 Message Date
Andreas Hansson 801ce65eae mem: Remove redundant allocateUncachedReadBuffer in cache
This patch removes the no-longer-needed
allocateUncachedReadBuffer. Besides the checks it is exactly the same
as allocateMissBuffer and thus provides no value.
2015-03-27 04:55:59 -04:00
Andreas Hansson fe806a0dd7 mem: Modernise MSHR iterators to C++11
This patch updates the iterators in the MSHR and MSHR queues to use
C++11 range-based for loops. It also does a bit of additional house
keeping.
2015-03-27 04:55:57 -04:00
Andreas Hansson 7bae98459c mem: Align all MSHR entries to block boundaries
This patch aligns all MSHR queue entries to block boundaries to
simplify checks for matches. Previously there were corner cases that
could lead to existing entries not being identified as matches.

There are, rather alarmingly, a few regressions that change with this
patch.
2015-03-27 04:55:55 -04:00
Ali Jafri 15f0d9ff14 mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED
This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more
generic BLOCK_CACHED flag. Future patches implementing cache eviction
messages can use the BLOCK_CACHED flag in almost the same manner as
hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The
PREFTECH_SNOOP_FLAG is set if the prefetch target is found in the tags
or the MSHRs in any state, so we are simply replacing calls to
setPrefetchSquashed() with setBlockCached(). The case of where the
prefetch target is found in the writeback MSHRs of upper level caches
continues to be covered by the MEM_INHIBIT flag.
2015-03-27 04:55:54 -04:00
Andreas Hansson 5275c9d740 mem: Use emplace front/back for deferred packets
Embrace C++11 for the deferred packets as we actually store the
objects in the data structure, and not just pointers.
2015-03-19 04:06:11 -04:00
Steve Reinhardt e57ab463cf mem: remove redundant test in in Cache::recvTimingResp()
For some reason we were checking mshr->hasTargets() even though
we had already called mshr->getTarget() unconditionally earlier
in the same function (which asserts if there are no targets).
Get rid of this useless check, and while we're at it get rid
of the redundant call to mshr->getTarget(), since we still have
the value saved in a local var.
2015-02-11 10:48:53 -08:00
Steve Reinhardt 89bb03a1a6 mem: add local var in Cache::recvTimingResp()
The main loop in recvTimingResp() uses target->pkt all over
the place.  Create a local tgt_pkt to help keep lines
under the line length limit.
2015-02-11 10:48:52 -08:00
Steve Reinhardt ccef61d1cc mem: clean up write buffer check in Cache::handleSnoop()
The 'if (writebacks.size)' check was redundant, because
writeBuffer.findMatches() would return false if the
writebacks list was empty.

Also renamed 'mshr' to 'wb_entry' in this context since
we are pointing at a writebuffer entry and not an MSHR
(even though it's the same C++ class).
2015-03-14 06:51:07 -07:00
Andreas Hansson fc315901ff mem: Unify all cache DPRINTF address formatting
This patch changes all the DPRINTF messages in the cache to use
'%#llx' every time a packet address is printed. The inclusion of '#'
ensures '0x' is prepended, and since the address type is a uint64_t %x
really should be %llx.
2015-03-02 04:00:56 -05:00
Andreas Hansson 88e2963951 mem: Fix cache MSHR conflict determination
This patch fixes a rather subtle issue in the sending of MSHR requests
in the cache, where the logic previously did not check for conflicts
between the MSRH queue and the write queue when requests were not
ready. The correct thing to do is to always check, since not having a
ready MSHR does not guarantee that there is no conflict.

The underlying problem seems to have slipped past due to the symmetric
timings used for the write queue and MSHR queue. However, with the
recent timing changes the bug caused regressions to fail.
2015-03-02 04:00:54 -05:00
Stephan Diestelhorst ecef1612b8 mem: Add option to force in-order insertion in PacketQueue
By default, the packet queue is ordered by the ticks of the to-be-sent
packages. With the recent modifications of packages sinking their header time
when their resposne leaves the caches, there could be cases of MSHR targets
being allocated and ordered A, B, but their responses being sent out in the
order B,A. This led to inconsistencies in bus traffic, in particular the snoop
filter observing first a ReadExResp and later a ReadRespWithInv.  Logically,
these were ordered the other way around behind the MSHR, but due to the timing
adjustments when inserting into the PacketQueue, they were sent out in the
wrong order on the bus, confusing the snoop filter.

This patch adds a flag (off by default) such that these special cases can
request in-order insertion into the packet queue, which might offset timing
slighty. This is expected to occur rarely and not affect timing results.
2015-03-02 04:00:49 -05:00
Marco Balboni d4ef8368aa mem: Downstream components consumes new crossbar delays
This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.
2015-03-02 04:00:48 -05:00
Andreas Hansson 987de4f5cc mem: Tidy up the cache debug messages
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02 04:00:37 -05:00
Andreas Hansson f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
Ali Jafri 6ebe8d863a mem: Fix prefetchSquash + memInhibitAsserted bug
This patch resolves a bug with hardware prefetches. Before a hardware prefetch
is sent towards the memory, the system generates a snoop request to check all
caches above the prefetch generating cache for the presence of the prefetth
target. If the prefetch target is found in the tags or the MSHRs of the upper
caches, the cache sets the prefetchSquashed flag in the snoop packet. When the
snoop packet returns with the prefetchSquashed flag set, the prefetch
generating cache deallocates the MSHR reserved for the prefetch. If the
prefetch target is found in the writeback buffer of the upper cache, the cache
sets the memInhibit flag, which signals the prefetch generating cache to
expect the data from the writeback. When the snoop packet returns with the
memInhibitAsserted flag set, it marks the allocated MSHR as inService and
waits for the data from the writeback.

If the prefetch target is found in multiple upper level caches, specifically
in the tags or MSHRs of one upper level cache and the writeback buffer of
another, the snoop packet will return with both prefetchSquashed and
memInhibitAsserted set, while the current code is not written to handle such
an outcome. Current code checks for the prefetchSquashed flag first, if it
finds the flag, it deallocates the reserved MSHR. This leads to assert failure
when the data from the writeback appears at cache. In this fix, we simply
switch the order of checks. We first check for memInhibitAsserted and then for
prefetch squashed.
2015-03-02 04:00:34 -05:00
Marco Balboni 268d9e59c5 mem: Clarification of packet crossbar timings
This patch clarifies the packet timings annotated
when going through a crossbar.

The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.

The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.

For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
2015-02-11 10:23:47 -05:00
Marco Balboni e2828587b3 mem: Clarify usage of latency in the cache
This patch adds some much-needed clarity in the specification of the
cache timing. For now, hit_latency and response_latency are kept as
top-level parameters, but the cache itself has a number of local
variables to better map the individual timing variables to different
behaviours (and sub-components).

The introduced variables are:
- lookupLatency: latency of tag lookup, occuring on any access
- forwardLatency: latency that occurs in case of outbound miss
- fillLatency: latency to fill a cache block
We keep the existing responseLatency

The forwardLatency is used by allocateInternalBuffer() for:
- MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer);
- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
- MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR
  queue)
It is our assumption that the time for the above three buffers is the
same. Similarly, for snoop responses passing through the cache we use
forwardLatency.
2015-02-11 10:23:36 -05:00
Andreas Hansson 461a80beb3 mem: Clarify express snoop behaviour
This patch adds a bit of documentation with insights around how
express snoops really work.
2015-02-03 14:26:02 -05:00
Andreas Hansson 193325ff60 mem: Clarify cache behaviour for pending dirty responses
This patch adds a bit of clarification around the assumptions made in
the cache when packets are sent out, and dirty responses are
pending. As part of the change, the marking of an MSHR as in service
is simplified slightly, and comments are added to explain what
assumptions are made.
2015-02-03 14:25:59 -05:00
Andreas Hansson 15c64035ed mem: Remove Packet source from ForwardResponseRecord
This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.
2015-01-22 05:01:30 -05:00
Andreas Hansson 6096e2f9c1 mem: Fix bug in cache request retry mechanism
This patch ensures that inhibited packets that are about to be turned
into express snoops do not update the retry flag in the cache.
2015-01-20 08:12:01 -05:00
Mitch Hayenga b2342c5d9a mem: Change prefetcher to use random_mt
Prefechers has used rand() to generate random numers previously.
2014-12-23 09:31:19 -05:00
Curtis Dunham 516e6046ae mem: Hide WriteInvalidate requests from prefetchers
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
2014-12-23 09:31:19 -05:00
Mitch Hayenga bd4f901c77 mem: Fix event scheduling issue for prefetches
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready.  However, a prefetch being ready does not guarentee
that it can obtain an MSHR.  So, when all MSHRs are full,
the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated.  The event is rescheduled as soon as a MSHR becomes
available.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 4acd4a2055 mem: Fix bug relating to writebacks and prefetches
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.
2014-12-23 09:31:18 -05:00
Mitch Hayenga df82a2d003 mem: Rework the structuring of the prefetchers
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 6cb58b2bd2 mem: Add parameter to reserve MSHR entries for demand access
Adds a new parameter that reserves some number of MSHR entries for demand
accesses.  This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
2014-12-23 09:31:18 -05:00
Curtis Dunham 5d22250845 mem: Support WriteInvalidate (again)
This patch takes a clean-slate approach to providing WriteInvalidate
(write streaming, full cache line writes without first reading)
support.

Unlike the prior attempt, which took an aggressive approach of directly
writing into the cache before handling the coherence actions, this
approach follows the existing cache flows as closely as possible.
2014-12-02 06:08:19 -05:00
Curtis Dunham 7ca27dd3cc mem: Remove WriteInvalidate support
Prepare for a different implementation following in the next patch
2014-12-02 06:08:17 -05:00
Andreas Hansson ea5ccc7041 mem: Clean up packet data allocation
This patch attempts to make the rules for data allocation in the
packet explicit, understandable, and easy to verify. The constructor
that copies a packet is extended with an additional flag "alloc_data"
to enable the call site to explicitly say whether the newly created
packet is short-lived (a zero-time snoop), or has an unknown life-time
and therefore should allocate its own data (or copy a static pointer
in the case of static data).

The tricky case is the static data. In essence this is a
copy-avoidance scheme where the original source of the request (DMA,
CPU etc) does not ask the memory system to return data as part of the
packet, but instead provides a pointer, and then the memory system
carries this pointer around, and copies the appropriate data to the
location itself. Thus any derived packet actually never copies any
data. As the original source does not copy any data from the response
packet when arriving back at the source, we must maintain the copy of
the original pointer to not break the system. We might want to revisit
this one day and pay the price for a few extra memcpy invocations.

All in all this patch should make it easier to grok what is going on
in the memory system and how data is actually copied (or not).
2014-12-02 06:07:54 -05:00
Andreas Hansson f012166bb6 mem: Cleanup Packet::checkFunctional and hasData usage
This patch cleans up the use of hasData and checkFunctional in the
packet. The hasData function is unfortunately suggesting that it
checks if the packet has a valid data pointer, when it does in fact
only check if the specific packet type is specified to have a data
payload. The confusion led to a bug in checkFunctional. The latter
function is also tidied up to avoid name overloading.
2014-12-02 06:07:52 -05:00
Andreas Hansson a2ee51f631 mem: Make the requests carried by packets const
This adds a basic level of sanity checking to the packet by ensuring
that a request is not modified once the packet is created. The only
issue that had to be worked around is the relaying of
software-prefetches in the cache. The specific situation is now solved
by first copying the request, and then creating a new packet
accordingly.
2014-12-02 06:07:50 -05:00
Andreas Hansson 3d6ec81e66 mem: Add checks and explanation for assertMemInhibit usage 2014-12-02 06:07:46 -05:00
Andreas Hansson 5df96cb690 mem: Remove redundant Packet::allocate calls
This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.

The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
2014-12-02 06:07:41 -05:00
Andreas Hansson 9779ba2e37 mem: Add const getters for write packet data
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
2014-12-02 06:07:36 -05:00
Ali Saidi b31d9e93e2 arm, mem: Fix drain bug and provide drain prints for more components. 2014-10-29 23:18:26 -05:00
Curtis Dunham 4024fab7fc mem: don't inhibit WriteInv's or defer snoops on their MSHRs
WriteInvalidate semantics depend on the unconditional writeback
or they won't complete.  Also, there's no point in deferring snoops
on their MSHRs, as they don't get new data at the end of their life
cycle the way other transactions do.

Add comment in the cache about a minor inefficiency re: WriteInvalidate.
2014-10-21 17:04:41 -05:00
Curtis Dunham 46f9f11a55 mem: have WriteInvalidate obsolete MSHRs
Since WriteInvalidate directly writes into the cache, it can
create tricky timing interleavings with reads and writes to the
same cache line that haven't yet completed.  This patch ensures
that these requests, when completed, don't overwrite the newer
data from the WriteInvalidate.
2014-10-29 23:18:24 -05:00
Andreas Hansson df973abef3 mem: Dynamically determine page bytes in memory components
This patch takes a step towards an ISA-agnostic memory
system by enabling the components to establish the page size after
instantiation. The swap operation in the memory is now also allowing
any granularity to avoid depending on the IntReg of the ISA.
2014-10-16 05:49:43 -04:00
Andreas Hansson f4a538f862 mem: Add packet sanity checks to cache and MSHRs
This patch adds a number of asserts to the cache, checking basic
assumptions about packets being requests or responses.
2014-10-09 17:51:56 -04:00
Andreas Hansson de62aedabc misc: Fix a bunch of minor issues identified by static analysis
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
2014-09-27 09:08:29 -04:00
Andreas Hansson 1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00
Mitch Hayenga 3e5bf0c922 mem: Remove the GHB prefetcher from the source tree
There are two primary issues with this code which make it deserving of deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
   It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
2014-09-20 17:17:44 -04:00
Andreas Hansson f615c4aeb0 misc: Remove assertions ensuring unsigned values >= 0 2014-09-19 10:35:07 -04:00
Andreas Hansson 38646d48eb mem: Add checks to sendTimingReq in cache
A small fix to ensure the return value is not ignored.
2014-09-19 10:35:04 -04:00
Andreas Hansson da4539dc74 misc: Fix a number of unitialised variables and members
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
2014-09-09 04:36:31 -04:00
Curtis Dunham f6f63ec0aa mem: write streaming support via WriteInvalidate promotion
Support full-block writes directly rather than requiring RMW:
 * a cache line is allocated in the cache upon receipt of a
   WriteInvalidateReq, not the WriteInvalidateResp.
 * only top-level caches allocate the line; the others just pass
   the request along and invalidate as necessary.
 * to close a timing window between the *Req and the *Resp, a new
   metadata bit tracks whether another cache has read a copy of
   the new line before the writeback to memory.
2014-06-27 12:29:00 -05:00
Andreas Hansson 3be4f4b846 mem: Fix a bug in the cache port flow control
This patch fixes a bug in the cache port where the retry flag was
reset too early, allowing new requests to arrive before the retry was
actually sent, but with the event already scheduled. This caused a
deadlock in the interactions with the O3 LSQ.

The patche fixes the underlying issue by shifting the resetting of the
flag to be done by the event that also calls sendRetry(). The patch
also tidies up the flow control in recvTimingReq and ensures that we
also check if we already have a retry outstanding.
2014-09-03 07:42:50 -04:00
Curtis Dunham 5d029463ee cpu, mem: Make software prefetches non-blocking
Previously, they were treated so much like loads that they could stall
at the head of the ROB.  Now they are always treated like L1 hits.
If they actually miss, a new request is created at the L1 and tracked
from the MSHRs there if necessary (i.e. if it didn't coalesce with
an existing outstanding load).
2014-05-13 12:20:49 -05:00
Geoffrey Blake b404ffde60 cache: Fix handling of LL/SC requests under contention
If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data.  This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus.  If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends and SCUpgrade
they will race to see who will win the SC access.  In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction.  If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock.  This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
2014-09-03 07:42:31 -04:00
Andreas Hansson e1ac962939 arch: Cleanup unused ISA traits constants
This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
2014-09-03 07:42:21 -04:00
Mitch Hayenga f6f6ae461e mem: Properly set cache block status fields on writebacks
When a cacheline is written back to a lower-level cache,
tags->insertBlock() sets various status parameters. However these
status bits were cleared immediately after calling. This patch makes
it so that these status fields are not cleared by moving them outside
of the tags->insertBlock() call.
2014-08-13 06:57:24 -04:00
Anthony Gutierrez a628afedad mem: refactor LRU cache tags and add random replacement tags
this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first, if none are available a
replacement candidate is chosen at random.

this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:

accessBlock()
findVictim()
insertBlock()
invalidate()
2014-07-28 12:23:23 -04:00
Mitch Hayenga a15b713cba mem: Squash prefetch requests from downstream caches
This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu.  It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.
2014-05-09 18:58:46 -04:00
Mitch Hayenga a0d30f36a6 mem: Don't print out the data of a cache block
This never actually worked since it was printing out only a word
of the cache block and not the entire thing and doubly didn't work
csprintf overrides the %#x specifier and assumes a char* array is
actually a string.
2014-04-01 14:24:36 -05:00
Prakash Ramrakhyani e88cffb30a mem: Fix incorrect assert failure in the Cache
This patch fixes an assert condition that is not true at all
times. There are valid situations that arise in dual-core
dual-workload runs where the assert condition is false. The function
call following the assert however needs to be called only when the
condition is true (a block cannot be invalidated in the tags structure
if has not been allocated in the structure, and the tempBlock is never
allocated). Hence the 'assert' has been replaced with an 'if'.
2014-03-07 15:56:23 -05:00
Andreas Hansson 969b436243 mem: Filter cache snoops based on address ranges
This patch adds a filter to the cache to drop snoop requests that are
not for a range covered by the cache. This fixes an issue observed
when multiple caches are placed in parallel, covering different
address ranges. Without this patch, all the caches will forward the
snoop upwards, when only one should do so.
2014-02-18 05:50:58 -05:00
Mitch Hayenga 96317d466e mem: Add additional tolerance to stride prefetcher
Forces the prefetcher to mispredict twice in a row before resetting the
confidence of prefetching.  This helps cases where a load PC strides by a
constant factor, however it may operate on different arrays at times.
Avoids the cost of retraining.  Primarily helps with small iteration loops.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga 771c864bf4 mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches.  However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train).  In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher.  This patch enables this option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E) 95735e10e7 mem: prefetcher: add options, support for unaligned addresses
This patch extends the classic prefetcher to work on non-block aligned
addresses.  Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected.  For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern.  This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.

It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.

Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7%  for SPECFP (geomean).
2014-01-29 23:21:25 -06:00
Amin Farmahini ffbdaa7cce mem: Remove redundant findVictim() input argument
The patch
(1) removes the redundant writeback argument from findVictim()
(2) fixes the description of access() function

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:50 -06:00
Giacomo Gabrielli aefe9cc624 mem: Add support for a security bit in the memory system
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
2014-01-24 15:29:30 -06:00
Timothy M. Jones 427ceb57a9 Cache: Collect very basic stats on tag and data accesses
Adds very basic statistics on the number of tag and data accesses within the
cache, which is important for power modelling.  For the tags, simply count
the associativity of the cache each time.  For the data, this depends on
whether tags and data are accessed sequentially, which is given by a new
parameter.  In the parallel case, all data blocks are accessed each time, but
with sequential accesses, a single data block is accessed only on a hit.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Matt Horsnell ca89eba79e mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
2014-01-24 15:29:30 -06:00
Matt Horsnell 6decd70bfb cpu: add consistent guarding to *_impl.hh files. 2013-10-17 10:20:45 -05:00
Andreas Hansson 19a5b68db7 arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.

The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.

--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-09-04 13:22:57 -04:00
Andreas Hansson d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Xiangyu Dong 4e8ecd7c6f mem: Add cache class destructor to avoid memory leaks
Make valgrind a little bit happier
2013-07-18 08:29:47 -04:00
Prakash Ramrakhyani ac515d7a9b mem: Reorganize cache tags and make them a SimObject
This patch reorganizes the cache tags to allow more flexibility to
implement new replacement policies. The base tags class is now a
clocked object so that derived classes can use a clock if they need
one. Also having deriving from SimObject allows specialized Tag
classes to be swapped in/out in .py files.

The cache set is now templatized to allow it to contain customized
cache blocks with additional informaiton. This involved moving code to
the .hh file and removing cacheset.cc.

The statistics belonging to the cache tags are now including ".tags"
in their name. Hence, the stats need an update to reflect the change
in naming.
2013-06-27 05:49:50 -04:00
Andreas Hansson 0d68d36b9d mem: Remove the cache builder
This patch removes the redundant cache builder class.
2013-06-27 05:49:50 -04:00
Andreas Hansson 33a8d777ad mem: Align cache timing to clock edges
This patch changes the cache timing calculations such that the results
are aligned to clock edges.

Plenty stats change as a results of this patch.
2013-06-27 05:49:49 -04:00
Andreas Hansson 368f50a0a1 mem: Cycles converted to Ticks in atomic cache accesses
This patch fixes an outstanding issue in the cache timing calculations
where an atomic access returned a time in Cycles, but the port
forwarded it on as if it was in Ticks.

A separate patch will update the regression stats.
2013-06-27 05:49:49 -04:00
Andreas Hansson f330b3c28d mem: Remove a redundant heap allocation for a snoop packet
This patch changes the updards snoop packet to avoid allocating and
later deleting it. As the code executes in 0 time and the lifetime of
the packet does not extend beyond the block there is no reason to heap
allocate it.
2013-06-27 05:49:49 -04:00
Andreas Hansson 7da851d1a8 mem: Spring cleaning of MSHR and MSHRQueue
This patch does some minor tidying up of the MSHR and MSHRQueue. The
clean up started as part of some ad-hoc tracing and debugging, but
seems worthwhile enough to go in as a separate patch.

The highlights of the changes are reduced scoping (private) members
where possible, avoiding redundant new/delete, and constructor
initialisation to please static code analyzers.
2013-05-30 12:54:11 -04:00
Andreas Hansson 42191522cc mem: Fix MSHR print format
This patch fixes an incorrect print format string by adding an
additional string element.
2013-05-30 12:54:09 -04:00
Uri Wiener a8fbfefb5e mem: Adding verbose debug output in the memory system
This patch provides useful printouts throughut the memory system. This
includes pretty-printed cache tags and function call messages
(call-stack like).
2013-04-22 13:20:33 -04:00
Mitch Hayenga 4920f0d7e5 mem: Fix cache latency bug
Fixes a latency calculation bug for accesses during a cache line fill.

Under a cache miss, before the line is filled, accesses to the cache are
associated with a MSHR and marked as targets.  Once the line fill completes,
MSHR target packets pay an additional latency of
"responseLatency + busSerializationLatency".  However, the "whenReady"
field of the cache line is only set to an additional delay of
"busSerializationLatency".  This lacks the responseLatency component of
the fill.  It is possible for accesses that occur on the cycle of
(or briefly after) the line fill to respond without properly paying the
responseLatency.  This also creates the situation where two accesses to the
same address may be serviced in an order opposite of how they were received
by the cache.  For stores to the same address, this means that although the
cache performs the stores in the order they were received, acknowledgements
may be sent in a different order.

Adding the responseLatency component to the whenReady field preserves the
penalty that should be paid and prevents these ordering issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-03-27 18:36:09 -05:00
Rene de Jong 87089175cc mem: Cancel cache retry event when blocking port
This patch solves the corner case scenario where the sendRetryEvent could be
scheduled twice, when an io device stresses the IOcache in the system. This
should not be possible in the cache system.
2013-03-26 14:46:51 -04:00
Andreas Hansson da950caed2 mem: Fix sender state bug and delay popping
This patch fixes a newly introduced bug where the sender state was
popped before checking that it should be. Amazingly all regressions
pass, but Linux fails to boot on the detailed CPU with caches enabled.
2013-02-19 12:57:47 -05:00
Andreas Hansson c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Andreas Hansson 860155a5fc mem: Enforce strict use of busFirst- and busLastWordTime
This patch adds a check to ensure that the delay incurred by
the bus is not simply disregarded, but accounted for by someone. At
this point, all the modules do is to zero it out, and no additional
time is spent. This highlights where the bus timing is simply dropped
instead of being paid for.

As a follow up, the locations identified in this patch should add this
additional time to the packets in one way or another. For now it
simply acts as a sanity check and highlights where the delay is simply
ignored.

Since no time is added, all regressions remain the same.
2013-02-19 05:56:06 -05:00
Andreas Hansson 40d0e6c899 mem: Change accessor function names to match the port interface
This patch changes the names of the cache accessor functions to be in
line with those used by the ports. This is done to avoid confusion and
get closer to a one-to-one correspondence between the interface of the
memory object (the cache in this case) and the port itself.

The member function timingAccess has been split into a snoop/non-snoop
part to avoid branching on the isResponse() of the packet.
2013-02-19 05:56:06 -05:00
Andreas Hansson b3fc8839c4 mem: Make packet bus-related time accounting relative
This patch changes the bus-related time accounting done in the packet
to be relative. Besides making it easier to align the cache timing to
cache clock cycles, it also makes it possible to create a Last-Level
Cache (LLC) directly to a memory controller without a bus inbetween.

The bus is unique in that it does not ever make the packets wait to
reflect the time spent forwarding them. Instead, the cache is
currently responsible for making the packets wait. Thus, the bus
annotates the packets with the time needed for the first word to
appear, and also the last word. The cache then delays the packets in
its queues before passing them on. It is worth noting that every
object attached to a bus (devices, memories, bridges, etc) should be
doing this if we opt for keeping this way of accounting for the bus
timing.
2013-02-19 05:56:06 -05:00
Andreas Hansson 362160c8ae mem: Add deferred packet class to prefetcher
This patch removes the time field from the packet as it was only used
by the preftecher. Similar to the packet queue, the prefetcher now
wraps the packet in a deferred packet, which also has a tick
representing the absolute time when the packet should be sent.
2013-02-19 05:56:06 -05:00
Andreas Hansson 7cd49b24d2 sim: Make clock private and access using clockPeriod()
This patch makes the clock member private to the ClockedObject and
forces all children to access it using clockPeriod(). This makes it
impossible to inadvertently change the clock, and also makes it easier
to transition to a situation where the clock is derived from e.g. a
clock domain, or through a multiplier.
2013-02-19 05:56:06 -05:00
Sascha Bischoff 86a4d09269 mem: Fix SenderState related cache deadlock
This patch fixes a potential deadlock in the caches. This deadlock
could occur when more than one cache is used in a system, and
pkt->senderState is modified in between the two caches. This happened
as the caches relied on the senderState remaining unchanged, and used
it for instantaneous upstream communication with other caches.

This issue has been addressed by iterating over the linked list of
senderStates until we are either able to cast to a MSHR* or
senderState is NULL. If the cast is successful, we know that the
packet has previously passed through another cache, and therefore
update the downstreamPending flag accordingly. Otherwise, we do
nothing.
2013-02-19 05:56:06 -05:00
Andreas Hansson 0622f30961 mem: Add predecessor to SenderState base class
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.

There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
2013-02-19 05:56:05 -05:00
Andreas Hansson f6550b3d20 mem: Tighten up cache constness and scoping
This patch merely adopts a more strict use of const for the cache
member functions and variables, and also moves a large portion of the
member functions from public to protected.
2013-02-15 17:40:10 -05:00
Andreas Sandberg b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Anthony Gutierrez af0f8b31db cache: remove drainManager because it's not used
the cache drainManager is set but never cleared, this is because
the cache itself does not need to be drained and thus never
triggers a signalDrainDone(). because the drainManager variable
is not used properly and does not appear to be necessary it has
been removed with this patch.
2013-01-28 20:19:42 -05:00
Mitch Hayenga c7dbd5e768 mem: Make LL/SC locks fine grained
The current implementation in gem5 just keeps a list of locks per cacheline.
Due to this, a store to a non-overlapping portion of the cacheline can cause an
LL/SC pair to fail.  This patch simply adds an address range to the lock
structure, so that the lock is only invalidated if the store overlaps the lock
range.
2013-01-08 08:54:07 -05:00
Andreas Sandberg 964aa49d15 mem: Fix guest corruption when caches handle uncacheable accesses
When the classic gem5 cache sees an uncacheable memory access, it used
to ignore it or silently drop the cache line in case of a
write. Normally, there shouldn't be any data in the cache belonging to
an uncacheable address range. However, since some architecture models
don't implement cache maintenance instructions, there might be some
dirty data in the cache that is discarded when this happens. The
reason it has mostly worked before is because such cache lines were
most likely evicted by normal memory activity before a TLB flush was
requested by the OS.

Previously, the cache model would invalidate cache lines when they
were accessed by an uncacheable write. This changeset alters this
behavior so all uncacheable memory accesses cause a cache flush with
an associated writeback if necessary. This is implemented by reusing
the cache flushing machinery used when draining the cache, which
implies that writebacks are performed using functional accesses.
2013-01-07 13:05:47 -05:00
Andreas Sandberg d44f2f611f mem: Remove the IIC replacement policy
The IIC replacement policy seems to be unused and has probably
gathered too much bit rot to be useful. This patch removes the IIC and
its associated cache parameters.
2013-01-07 13:05:39 -05:00
Andreas Hansson 921490a060 sim: Fatal if a clocked object is set to have a clock of 0
This patch adds a check to the clocked object constructor to ensure it
is not configured to have a clock period of 0.
2013-01-07 13:05:39 -05:00
Ali Saidi 9a645d6e9b cache: add note about where conflicts are handled 2013-01-07 13:05:32 -05:00
Andreas Sandberg ddd6af414c mem: Add support for writing back and flushing caches
This patch adds support for the following optional drain methods in
the classical memory system's cache model:

memWriteback() - Write back all dirty cache lines to memory using
functional accesses.

memInvalidate() - Invalidate all cache lines. Dirty cache lines
are lost unless a writeback is requested.

Since memWriteback() is called when checkpointing systems, this patch
adds support for checkpointing systems with caches. The serialization
code now checks whether there are any dirty lines in the cache. If
there are dirty lines in the cache, the checkpoint is flagged as bad
and a warning is printed.
2012-11-02 11:32:02 -05:00
Andreas Sandberg b81a977e6a sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
2012-11-02 11:32:01 -05:00
Andreas Sandberg c0ab52799c sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.

This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
2012-11-02 11:32:01 -05:00
Andreas Hansson 2a740aa096 Port: Add protocol-agnostic ports in the port hierarchy
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.

The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
2012-10-15 08:12:35 -04:00