If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive. This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
Corrects an oversight in cset f97b62be544f. The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache. To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
Allow lower-level caches (e.g., L2 or L3) to pass exclusive
copies to higher levels (e.g., L1). This eliminates a lot
of unnecessary upgrade transactions on read-write sequences
to non-shared data.
Also some cleanup of MSHR coherence handling and multiple
bug fixes.
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.
I was asserting that the only reason you would defer targets is if
a write came in while you had an outstanding read miss, but there's
another case where you could get a read access after you've snooped
an invalidation and buffered it because it applies to a prior
outstanding miss.
if a prior write miss arrived while an even earlier
read miss was still outstanding.
--HG--
extra : convert_revision : 4924e145829b2ecf4610b88d33f4773510c6801a
where we defer a response to a read from a far-away cache A, then later
defer a ReadExcl from a cache B on the same bus as us. We'll assert
MemInhibit in both cases, but in the latter case MemInhibit will keep
the invalidation from reaching cache A. This special response tells
cache A that it gets the block to satisfy its read, but must immediately
invalidate it.
--HG--
extra : convert_revision : f85c8b47bb30232da37ac861b50a6539dc81161b
Not so much noise on failed sends, and more complete
info when grepping a trace using an address.
--HG--
extra : convert_revision : 05a8261c9452072ca08b906200c6322b33e2b9f1
SimObjects not yet updated:
- Process and subclasses
- BaseCPU and subclasses
The SimObject(const std::string &name) constructor was removed. Subclasses
that still rely on that behavior must call the parent initializer as
: SimObject(makeParams(name))
--HG--
extra : convert_revision : d6faddde76e7c3361ebdbd0a7b372a40941c12ed
Make sure not to keep processing functional accesses
after they've been responded to.
Also use checkFunctional() return value instead of checking
packet command field where possible, mostly just for consistency.
--HG--
extra : convert_revision : 29fc76bc18731bd93a4ed05a281297827028ef75
Turns out DeferredSnoop isn't quite the right bit of info
we needed... see new comment in cache_impl.hh.
--HG--
extra : convert_revision : a38de8c1677a37acafb743b7074ef88b21d3b7be
If the invalidation beats the upgrade at a lower level
then the upgrade must be converted to a read exclusive
"in the field".
Restructure target list & deferred target list to
factor out some common code.
--HG--
extra : convert_revision : 7bab4482dd6c48efdb619610f0d3778c60ff777a
- Add "deferred snoop" flag to Packet so upper-level caches
can distinguish whether lower-level cache request was
in-service or not at the time of the original snoop.
- Revamp response handling to properly handle deferred snoops
on non-cache-fill requests (i.e. upgrades).
- Make sure forwarded writebacks are kept in write buffer at
lower-level caches so they get snooped properly.
--HG--
extra : convert_revision : 17f8a3772a1ae31a16991a53f8225ddf54d31fc9
src/cpu/simple/timing.cc:
Fix another SC problem.
src/mem/cache/cache_impl.hh:
Forgot to call makeTimingResponse() on uncached timing responses.
--HG--
extra : convert_revision : 5a5a58ca2053e4e8de2133205bfd37de15eb4209
Stats pretty much line up with old code, except:
- bug in old code included L1 latency in L2 miss time, making it too high
- UniCoherence did cache-to-cache transfers even from non-owner caches,
so occasionally the icache would get a block from the dcache not the L2
- L2 can now receive ReadExReq from L1 since L1s have coherence
--HG--
extra : convert_revision : 5052c1a1767b5a662f30a88f16012165a73b791c
Change target overflow from assertion to warning.
src/mem/cache/cache_impl.hh:
Change target overflow from assertion to warning.
--HG--
extra : convert_revision : ceca990ed916bbf96dedd4836c40df522803f173
src/mem/cache/cache_impl.hh:
Handle grants with no packet.
src/mem/cache/miss/mshr.cc:
Fix MSHR snoop hit handling.
--HG--
extra : convert_revision : f365283afddaa07cb9e050b2981ad6a898c14451
sure we don't re-request bus prematurely. Use callback to
avoid calling sendRetry() recursively within recvTiming.
--HG--
extra : convert_revision : a907a2781b4b00aa8eb1ea7147afc81d6b424140
timing mode still broken.
configs/example/memtest.py:
Revamp options.
src/cpu/memtest/memtest.cc:
No need for memory initialization.
No need to make atomic response... memory system should do that now.
src/cpu/memtest/memtest.hh:
MemTest really doesn't want to snoop.
src/mem/bridge.cc:
checkFunctional() cleanup.
src/mem/bus.cc:
src/mem/bus.hh:
src/mem/cache/base_cache.cc:
src/mem/cache/base_cache.hh:
src/mem/cache/cache.cc:
src/mem/cache/cache.hh:
src/mem/cache/cache_blk.hh:
src/mem/cache/cache_builder.cc:
src/mem/cache/cache_impl.hh:
src/mem/cache/coherence/coherence_protocol.cc:
src/mem/cache/coherence/coherence_protocol.hh:
src/mem/cache/coherence/simple_coherence.hh:
src/mem/cache/miss/SConscript:
src/mem/cache/miss/mshr.cc:
src/mem/cache/miss/mshr.hh:
src/mem/cache/miss/mshr_queue.cc:
src/mem/cache/miss/mshr_queue.hh:
src/mem/cache/prefetch/base_prefetcher.cc:
src/mem/cache/tags/fa_lru.cc:
src/mem/cache/tags/fa_lru.hh:
src/mem/cache/tags/iic.cc:
src/mem/cache/tags/iic.hh:
src/mem/cache/tags/lru.cc:
src/mem/cache/tags/lru.hh:
src/mem/cache/tags/split.cc:
src/mem/cache/tags/split.hh:
src/mem/cache/tags/split_lifo.cc:
src/mem/cache/tags/split_lifo.hh:
src/mem/cache/tags/split_lru.cc:
src/mem/cache/tags/split_lru.hh:
src/mem/packet.cc:
src/mem/packet.hh:
src/mem/physical.cc:
src/mem/physical.hh:
src/mem/tport.cc:
More major reorg. Seems to work for atomic mode now,
timing mode still broken.
--HG--
extra : convert_revision : 7e70dfc4a752393b911880ff028271433855ae87
src/mem/cache/cache_impl.hh:
src/mem/cache/coherence/simple_coherence.hh:
Get rid of old invalidate propagation logic in preparation
for new multilevel snoop protocol.
src/mem/cache/coherence/coherence_protocol.cc:
L2 cache now has protocol, so protocol must handle ReadExReq
coming in from the CPU side.
src/mem/cache/miss/mshr_queue.cc:
Assertion is failing, so let's take it out for now.
src/mem/packet.cc:
src/mem/packet.hh:
Add WritebackAck command.
Reorganize enum to put responses next to corresponding requests.
Get rid of unused WriteReqNoAck.
--HG--
extra : convert_revision : 24c519846d161978123f9aa029ae358a41546c73
Compiles but doesn't work... committing just so I can merge
(stupid bk!).
src/mem/bridge.cc:
Get rid of SNOOP_COMMIT.
src/mem/bus.cc:
src/mem/packet.hh:
Get rid of SNOOP_COMMIT & two-pass snoop.
First bits of EXPRESS_SNOOP support.
src/mem/cache/base_cache.cc:
src/mem/cache/base_cache.hh:
src/mem/cache/cache.hh:
src/mem/cache/cache_impl.hh:
src/mem/cache/miss/blocking_buffer.cc:
src/mem/cache/miss/miss_queue.cc:
src/mem/cache/prefetch/base_prefetcher.cc:
Big reorg of ports and port-related functions & events.
src/mem/cache/cache.cc:
src/mem/cache/cache_builder.cc:
src/mem/cache/coherence/SConscript:
Get rid of UniCoherence object.
--HG--
extra : convert_revision : 7672434fa3115c9b1c94686f497e57e90413b7c3
src/dev/io_device.cc:
extra printing and assertions
src/mem/bridge.hh:
deal with packets only satisfying part of a request by making many requests
src/mem/cache/cache_impl.hh:
make the cache try to satisfy a functional request from the cache above it before checking itself
--HG--
extra : convert_revision : 1df52ab61d7967e14cc377c560495430a6af266a
add seperate response buffers and request queue sizes in bus bridge
add delay to respond to a nack in the bus bridge
src/dev/i8254xGBe.cc:
src/dev/ide_ctrl.cc:
src/dev/ns_gige.cc:
src/dev/pcidev.hh:
src/dev/sinic.cc:
add backoff delay parameters
src/dev/io_device.cc:
src/dev/io_device.hh:
add a backoff algorithm when nacks are received.
src/mem/bridge.cc:
src/mem/bridge.hh:
add seperate response buffers and request queue sizes
add a new parameters to specify how long before a nack in ready to go after a packet that needs to be nacked is received
src/mem/cache/cache_impl.hh:
assert on the
src/mem/tport.cc:
add a friendly assert to make sure the packet was inserted into the list
--HG--
extra : convert_revision : 3595ad932015a4ce2bb72772da7850ad91bd09b1
In this way a MemoryObject can keep a functional port around and give it to anyone who wants to do functional accesses rather
than creating a new one each time.
src/mem/bus.cc:
src/mem/bus.hh:
src/mem/cache/cache_impl.hh:
only keep around one func port we give to anyone who wants it. Otherwise we can run out of port ids reasonably quickly if
a lot of functional accesses are happening (e.g. remote debugging, dprintk, etc)
--HG--
extra : convert_revision : 6a9e3e96f51cedaab6de1b36cf317203899a3716
into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2
src/cpu/base_dyn_inst.hh:
Hand merge. Line is no longer needed because it's handled in the ISA.
--HG--
extra : convert_revision : 0be4067aa38759a5631c6940f0167d48fde2b680
1. Update packet's flags properly when a snoop happens
2. Don't allow accesses to read a block's data if the block has outstanding MSHRs. This avoids a RAW hazard in MP systems that the memory system was not detecting properly earlier (a write required a block to upgrade, and while the upgrade was outstanding, a read came along and read old data).
3. Update MSHR's request upon a response being handled. If the MSHR has more targets than it can respond to in one cycle, then its request must be properly updated to the new head of the targets list.
src/mem/bus.cc:
Update packet's flags properly upon snoop.
src/mem/cache/cache_impl.hh:
Be sure to not allow accesses to a block with outstanding MSHRs.
src/mem/cache/miss/miss_queue.cc:
Update MSHR's request upon a response being handled.
--HG--
extra : convert_revision : 76a9abc610ca3f1904f075ad21637148a41982d6
src/cpu/memtest/memtest.cc:
Add the [] to a delete to make it work correctly
src/mem/cache/cache_impl.hh:
Fix one of the memory leaks
--HG--
extra : convert_revision : 64c7465c68a084efe38a62419205518b24d852a7