This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.
The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.
This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.
The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.
This patch decouples the queueing and the port interactions to
simplify the introduction of the master and slave ports. By separating
the queueing functionality from the port itself, it becomes much
easier to distinguish between master and slave ports, and still retain
the queueing ability for both (without code duplication).
As part of the split into a PacketQueue and a port, there is now also
a hierarchy of two port classes, QueuedPort and SimpleTimingPort. The
QueuedPort is useful for ports that want to leave the packet
transmission of outgoing packets to the queue and is used by both
master and slave ports. The SimpleTimingPort inherits from the
QueuedPort and adds the implemention of recvTiming and recvFunctional
through recvAtomic.
The PioPort and MessagePort are cleaned up as part of the changes.
--HG--
rename : src/mem/tport.cc => src/mem/packet_queue.cc
rename : src/mem/tport.hh => src/mem/packet_queue.hh
The block is never inserted because it's the one extra block in the cache, but
it can be invalidated twice in a row. In that case the block doesn't have a
new master id (beacuse it was never inserted), however it is valid and
the accounting goes wrong at that point.
This patch splits the two cache ports into a master (memory-side) and
slave (cpu-side) subclass of port with slightly different
functionality. For example, it is only the CPU-side port that blocks
incoming requests, and only the memory-side port that schedules send
events outside of what the transmit list dictates.
This patch simplifies the two classes by relying further on
SimpleTimingPort and also generalises the latter to better accommodate
the changes (introducing trySendTiming and scheduleSend). The
memory-side cache port overrides sendDeferredPacket to be able to not
only send responses from the transmit list, but also send requests
based on the MSHRs.
A follow on patch further simplifies the SimpleTimingPort and the
cache ports.
This patch fixes the cache stats to use the new request ids.
Cache stats also display the requestor names in the vector subnames.
Most cache stats now include "nozero" and "nonan" flags to reduce the
amount of excessive cache stat dump. Also, simplified
incMissCount()/incHitCount() functions.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
This patch is a very straight-forward simplification, removing the
unecessary otherPort pointer from the cache port. The pointer was only
used to forward range changes, and the address range is fixed for the
cache. Removing the pointer simplifies the transition to master/slave
ports.
The functional ports are no longer used and this patch cleans up the
legacy that is still present in buses, memories, CPUs etc. Note that
this does not refer to the class FunctionalPort (already removed), but
rather ports with the name (and use) functional.
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
This patch removes the inheritance of EventManager from the ports and
moves all responsibility for event queues to the owner. Eventually the
event manager should be the interface block, which could either be the
structural owner or a subblock like a LSQ in the O3 CPU for example.
This patch changes the functionalAccess member function in the cache
model such that it is aware of what port the access came from, i.e. if
it came from the CPU side or from the memory side. By adding this
information, it is possible to respect the 'forwardSnoops' flag for
snooping requests coming from the memory side and not forward
them. This fixes an outstanding issue with the IO bus getting accesses
that have no valid destination port and also cleans up future changes
to the bus model.
Check that we're not currently writing back an address the prefetcher is trying
to prefetch before issuing it. We previously checked the mshrQueue and the cache
itself, but forgot to check the writeBuffer. This fixes a memory corrucption
issue with an L2 prefetcher.
Prefetch requests issued from the L2 or below wouldn't check if valid data is
present higher in the system. If a prefetch into the L2 occured at the same
time as writeback from a higher-level cache the dirty data could be replaced
in by unmodified data in memory.
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive. This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
Corrects an oversight in cset f97b62be544f. The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache. To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
Allow lower-level caches (e.g., L2 or L3) to pass exclusive
copies to higher levels (e.g., L1). This eliminates a lot
of unnecessary upgrade transactions on read-write sequences
to non-shared data.
Also some cleanup of MSHR coherence handling and multiple
bug fixes.
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.