Commit graph

564 commits

Author SHA1 Message Date
Andreas Hansson 5df96cb690 mem: Remove redundant Packet::allocate calls
This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.

The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
2014-12-02 06:07:41 -05:00
Andreas Hansson 9779ba2e37 mem: Add const getters for write packet data
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
2014-12-02 06:07:36 -05:00
Ali Saidi b31d9e93e2 arm, mem: Fix drain bug and provide drain prints for more components. 2014-10-29 23:18:26 -05:00
Curtis Dunham 4024fab7fc mem: don't inhibit WriteInv's or defer snoops on their MSHRs
WriteInvalidate semantics depend on the unconditional writeback
or they won't complete.  Also, there's no point in deferring snoops
on their MSHRs, as they don't get new data at the end of their life
cycle the way other transactions do.

Add comment in the cache about a minor inefficiency re: WriteInvalidate.
2014-10-21 17:04:41 -05:00
Curtis Dunham 46f9f11a55 mem: have WriteInvalidate obsolete MSHRs
Since WriteInvalidate directly writes into the cache, it can
create tricky timing interleavings with reads and writes to the
same cache line that haven't yet completed.  This patch ensures
that these requests, when completed, don't overwrite the newer
data from the WriteInvalidate.
2014-10-29 23:18:24 -05:00
Andreas Hansson df973abef3 mem: Dynamically determine page bytes in memory components
This patch takes a step towards an ISA-agnostic memory
system by enabling the components to establish the page size after
instantiation. The swap operation in the memory is now also allowing
any granularity to avoid depending on the IntReg of the ISA.
2014-10-16 05:49:43 -04:00
Andreas Hansson f4a538f862 mem: Add packet sanity checks to cache and MSHRs
This patch adds a number of asserts to the cache, checking basic
assumptions about packets being requests or responses.
2014-10-09 17:51:56 -04:00
Andreas Hansson de62aedabc misc: Fix a bunch of minor issues identified by static analysis
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
2014-09-27 09:08:29 -04:00
Andreas Hansson 1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00
Mitch Hayenga 3e5bf0c922 mem: Remove the GHB prefetcher from the source tree
There are two primary issues with this code which make it deserving of deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
   It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
2014-09-20 17:17:44 -04:00
Andreas Hansson f615c4aeb0 misc: Remove assertions ensuring unsigned values >= 0 2014-09-19 10:35:07 -04:00
Andreas Hansson 38646d48eb mem: Add checks to sendTimingReq in cache
A small fix to ensure the return value is not ignored.
2014-09-19 10:35:04 -04:00
Andreas Hansson da4539dc74 misc: Fix a number of unitialised variables and members
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
2014-09-09 04:36:31 -04:00
Curtis Dunham f6f63ec0aa mem: write streaming support via WriteInvalidate promotion
Support full-block writes directly rather than requiring RMW:
 * a cache line is allocated in the cache upon receipt of a
   WriteInvalidateReq, not the WriteInvalidateResp.
 * only top-level caches allocate the line; the others just pass
   the request along and invalidate as necessary.
 * to close a timing window between the *Req and the *Resp, a new
   metadata bit tracks whether another cache has read a copy of
   the new line before the writeback to memory.
2014-06-27 12:29:00 -05:00
Andreas Hansson 3be4f4b846 mem: Fix a bug in the cache port flow control
This patch fixes a bug in the cache port where the retry flag was
reset too early, allowing new requests to arrive before the retry was
actually sent, but with the event already scheduled. This caused a
deadlock in the interactions with the O3 LSQ.

The patche fixes the underlying issue by shifting the resetting of the
flag to be done by the event that also calls sendRetry(). The patch
also tidies up the flow control in recvTimingReq and ensures that we
also check if we already have a retry outstanding.
2014-09-03 07:42:50 -04:00
Curtis Dunham 5d029463ee cpu, mem: Make software prefetches non-blocking
Previously, they were treated so much like loads that they could stall
at the head of the ROB.  Now they are always treated like L1 hits.
If they actually miss, a new request is created at the L1 and tracked
from the MSHRs there if necessary (i.e. if it didn't coalesce with
an existing outstanding load).
2014-05-13 12:20:49 -05:00
Geoffrey Blake b404ffde60 cache: Fix handling of LL/SC requests under contention
If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data.  This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus.  If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends and SCUpgrade
they will race to see who will win the SC access.  In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction.  If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock.  This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
2014-09-03 07:42:31 -04:00
Andreas Hansson e1ac962939 arch: Cleanup unused ISA traits constants
This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
2014-09-03 07:42:21 -04:00
Mitch Hayenga f6f6ae461e mem: Properly set cache block status fields on writebacks
When a cacheline is written back to a lower-level cache,
tags->insertBlock() sets various status parameters. However these
status bits were cleared immediately after calling. This patch makes
it so that these status fields are not cleared by moving them outside
of the tags->insertBlock() call.
2014-08-13 06:57:24 -04:00
Anthony Gutierrez a628afedad mem: refactor LRU cache tags and add random replacement tags
this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first, if none are available a
replacement candidate is chosen at random.

this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:

accessBlock()
findVictim()
insertBlock()
invalidate()
2014-07-28 12:23:23 -04:00
Mitch Hayenga a15b713cba mem: Squash prefetch requests from downstream caches
This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu.  It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.
2014-05-09 18:58:46 -04:00
Mitch Hayenga a0d30f36a6 mem: Don't print out the data of a cache block
This never actually worked since it was printing out only a word
of the cache block and not the entire thing and doubly didn't work
csprintf overrides the %#x specifier and assumes a char* array is
actually a string.
2014-04-01 14:24:36 -05:00
Prakash Ramrakhyani e88cffb30a mem: Fix incorrect assert failure in the Cache
This patch fixes an assert condition that is not true at all
times. There are valid situations that arise in dual-core
dual-workload runs where the assert condition is false. The function
call following the assert however needs to be called only when the
condition is true (a block cannot be invalidated in the tags structure
if has not been allocated in the structure, and the tempBlock is never
allocated). Hence the 'assert' has been replaced with an 'if'.
2014-03-07 15:56:23 -05:00
Andreas Hansson 969b436243 mem: Filter cache snoops based on address ranges
This patch adds a filter to the cache to drop snoop requests that are
not for a range covered by the cache. This fixes an issue observed
when multiple caches are placed in parallel, covering different
address ranges. Without this patch, all the caches will forward the
snoop upwards, when only one should do so.
2014-02-18 05:50:58 -05:00
Mitch Hayenga 96317d466e mem: Add additional tolerance to stride prefetcher
Forces the prefetcher to mispredict twice in a row before resetting the
confidence of prefetching.  This helps cases where a load PC strides by a
constant factor, however it may operate on different arrays at times.
Avoids the cost of retraining.  Primarily helps with small iteration loops.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga 771c864bf4 mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches.  However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train).  In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher.  This patch enables this option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E) 95735e10e7 mem: prefetcher: add options, support for unaligned addresses
This patch extends the classic prefetcher to work on non-block aligned
addresses.  Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected.  For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern.  This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.

It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.

Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7%  for SPECFP (geomean).
2014-01-29 23:21:25 -06:00
Amin Farmahini ffbdaa7cce mem: Remove redundant findVictim() input argument
The patch
(1) removes the redundant writeback argument from findVictim()
(2) fixes the description of access() function

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:50 -06:00
Giacomo Gabrielli aefe9cc624 mem: Add support for a security bit in the memory system
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
2014-01-24 15:29:30 -06:00
Timothy M. Jones 427ceb57a9 Cache: Collect very basic stats on tag and data accesses
Adds very basic statistics on the number of tag and data accesses within the
cache, which is important for power modelling.  For the tags, simply count
the associativity of the cache each time.  For the data, this depends on
whether tags and data are accessed sequentially, which is given by a new
parameter.  In the parallel case, all data blocks are accessed each time, but
with sequential accesses, a single data block is accessed only on a hit.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Matt Horsnell ca89eba79e mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
2014-01-24 15:29:30 -06:00
Matt Horsnell 6decd70bfb cpu: add consistent guarding to *_impl.hh files. 2013-10-17 10:20:45 -05:00
Andreas Hansson 19a5b68db7 arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.

The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.

--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-09-04 13:22:57 -04:00
Andreas Hansson d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Xiangyu Dong 4e8ecd7c6f mem: Add cache class destructor to avoid memory leaks
Make valgrind a little bit happier
2013-07-18 08:29:47 -04:00
Prakash Ramrakhyani ac515d7a9b mem: Reorganize cache tags and make them a SimObject
This patch reorganizes the cache tags to allow more flexibility to
implement new replacement policies. The base tags class is now a
clocked object so that derived classes can use a clock if they need
one. Also having deriving from SimObject allows specialized Tag
classes to be swapped in/out in .py files.

The cache set is now templatized to allow it to contain customized
cache blocks with additional informaiton. This involved moving code to
the .hh file and removing cacheset.cc.

The statistics belonging to the cache tags are now including ".tags"
in their name. Hence, the stats need an update to reflect the change
in naming.
2013-06-27 05:49:50 -04:00
Andreas Hansson 0d68d36b9d mem: Remove the cache builder
This patch removes the redundant cache builder class.
2013-06-27 05:49:50 -04:00
Andreas Hansson 33a8d777ad mem: Align cache timing to clock edges
This patch changes the cache timing calculations such that the results
are aligned to clock edges.

Plenty stats change as a results of this patch.
2013-06-27 05:49:49 -04:00
Andreas Hansson 368f50a0a1 mem: Cycles converted to Ticks in atomic cache accesses
This patch fixes an outstanding issue in the cache timing calculations
where an atomic access returned a time in Cycles, but the port
forwarded it on as if it was in Ticks.

A separate patch will update the regression stats.
2013-06-27 05:49:49 -04:00
Andreas Hansson f330b3c28d mem: Remove a redundant heap allocation for a snoop packet
This patch changes the updards snoop packet to avoid allocating and
later deleting it. As the code executes in 0 time and the lifetime of
the packet does not extend beyond the block there is no reason to heap
allocate it.
2013-06-27 05:49:49 -04:00
Andreas Hansson 7da851d1a8 mem: Spring cleaning of MSHR and MSHRQueue
This patch does some minor tidying up of the MSHR and MSHRQueue. The
clean up started as part of some ad-hoc tracing and debugging, but
seems worthwhile enough to go in as a separate patch.

The highlights of the changes are reduced scoping (private) members
where possible, avoiding redundant new/delete, and constructor
initialisation to please static code analyzers.
2013-05-30 12:54:11 -04:00
Andreas Hansson 42191522cc mem: Fix MSHR print format
This patch fixes an incorrect print format string by adding an
additional string element.
2013-05-30 12:54:09 -04:00
Uri Wiener a8fbfefb5e mem: Adding verbose debug output in the memory system
This patch provides useful printouts throughut the memory system. This
includes pretty-printed cache tags and function call messages
(call-stack like).
2013-04-22 13:20:33 -04:00
Mitch Hayenga 4920f0d7e5 mem: Fix cache latency bug
Fixes a latency calculation bug for accesses during a cache line fill.

Under a cache miss, before the line is filled, accesses to the cache are
associated with a MSHR and marked as targets.  Once the line fill completes,
MSHR target packets pay an additional latency of
"responseLatency + busSerializationLatency".  However, the "whenReady"
field of the cache line is only set to an additional delay of
"busSerializationLatency".  This lacks the responseLatency component of
the fill.  It is possible for accesses that occur on the cycle of
(or briefly after) the line fill to respond without properly paying the
responseLatency.  This also creates the situation where two accesses to the
same address may be serviced in an order opposite of how they were received
by the cache.  For stores to the same address, this means that although the
cache performs the stores in the order they were received, acknowledgements
may be sent in a different order.

Adding the responseLatency component to the whenReady field preserves the
penalty that should be paid and prevents these ordering issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-03-27 18:36:09 -05:00
Rene de Jong 87089175cc mem: Cancel cache retry event when blocking port
This patch solves the corner case scenario where the sendRetryEvent could be
scheduled twice, when an io device stresses the IOcache in the system. This
should not be possible in the cache system.
2013-03-26 14:46:51 -04:00
Andreas Hansson da950caed2 mem: Fix sender state bug and delay popping
This patch fixes a newly introduced bug where the sender state was
popped before checking that it should be. Amazingly all regressions
pass, but Linux fails to boot on the detailed CPU with caches enabled.
2013-02-19 12:57:47 -05:00
Andreas Hansson c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Andreas Hansson 860155a5fc mem: Enforce strict use of busFirst- and busLastWordTime
This patch adds a check to ensure that the delay incurred by
the bus is not simply disregarded, but accounted for by someone. At
this point, all the modules do is to zero it out, and no additional
time is spent. This highlights where the bus timing is simply dropped
instead of being paid for.

As a follow up, the locations identified in this patch should add this
additional time to the packets in one way or another. For now it
simply acts as a sanity check and highlights where the delay is simply
ignored.

Since no time is added, all regressions remain the same.
2013-02-19 05:56:06 -05:00
Andreas Hansson 40d0e6c899 mem: Change accessor function names to match the port interface
This patch changes the names of the cache accessor functions to be in
line with those used by the ports. This is done to avoid confusion and
get closer to a one-to-one correspondence between the interface of the
memory object (the cache in this case) and the port itself.

The member function timingAccess has been split into a snoop/non-snoop
part to avoid branching on the isResponse() of the packet.
2013-02-19 05:56:06 -05:00
Andreas Hansson b3fc8839c4 mem: Make packet bus-related time accounting relative
This patch changes the bus-related time accounting done in the packet
to be relative. Besides making it easier to align the cache timing to
cache clock cycles, it also makes it possible to create a Last-Level
Cache (LLC) directly to a memory controller without a bus inbetween.

The bus is unique in that it does not ever make the packets wait to
reflect the time spent forwarding them. Instead, the cache is
currently responsible for making the packets wait. Thus, the bus
annotates the packets with the time needed for the first word to
appear, and also the last word. The cache then delays the packets in
its queues before passing them on. It is worth noting that every
object attached to a bus (devices, memories, bridges, etc) should be
doing this if we opt for keeping this way of accounting for the bus
timing.
2013-02-19 05:56:06 -05:00
Andreas Hansson 362160c8ae mem: Add deferred packet class to prefetcher
This patch removes the time field from the packet as it was only used
by the preftecher. Similar to the packet queue, the prefetcher now
wraps the packet in a deferred packet, which also has a tick
representing the absolute time when the packet should be sent.
2013-02-19 05:56:06 -05:00
Andreas Hansson 7cd49b24d2 sim: Make clock private and access using clockPeriod()
This patch makes the clock member private to the ClockedObject and
forces all children to access it using clockPeriod(). This makes it
impossible to inadvertently change the clock, and also makes it easier
to transition to a situation where the clock is derived from e.g. a
clock domain, or through a multiplier.
2013-02-19 05:56:06 -05:00
Sascha Bischoff 86a4d09269 mem: Fix SenderState related cache deadlock
This patch fixes a potential deadlock in the caches. This deadlock
could occur when more than one cache is used in a system, and
pkt->senderState is modified in between the two caches. This happened
as the caches relied on the senderState remaining unchanged, and used
it for instantaneous upstream communication with other caches.

This issue has been addressed by iterating over the linked list of
senderStates until we are either able to cast to a MSHR* or
senderState is NULL. If the cast is successful, we know that the
packet has previously passed through another cache, and therefore
update the downstreamPending flag accordingly. Otherwise, we do
nothing.
2013-02-19 05:56:06 -05:00
Andreas Hansson 0622f30961 mem: Add predecessor to SenderState base class
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.

There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
2013-02-19 05:56:05 -05:00
Andreas Hansson f6550b3d20 mem: Tighten up cache constness and scoping
This patch merely adopts a more strict use of const for the cache
member functions and variables, and also moves a large portion of the
member functions from public to protected.
2013-02-15 17:40:10 -05:00
Andreas Sandberg b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Anthony Gutierrez af0f8b31db cache: remove drainManager because it's not used
the cache drainManager is set but never cleared, this is because
the cache itself does not need to be drained and thus never
triggers a signalDrainDone(). because the drainManager variable
is not used properly and does not appear to be necessary it has
been removed with this patch.
2013-01-28 20:19:42 -05:00
Mitch Hayenga c7dbd5e768 mem: Make LL/SC locks fine grained
The current implementation in gem5 just keeps a list of locks per cacheline.
Due to this, a store to a non-overlapping portion of the cacheline can cause an
LL/SC pair to fail.  This patch simply adds an address range to the lock
structure, so that the lock is only invalidated if the store overlaps the lock
range.
2013-01-08 08:54:07 -05:00
Andreas Sandberg 964aa49d15 mem: Fix guest corruption when caches handle uncacheable accesses
When the classic gem5 cache sees an uncacheable memory access, it used
to ignore it or silently drop the cache line in case of a
write. Normally, there shouldn't be any data in the cache belonging to
an uncacheable address range. However, since some architecture models
don't implement cache maintenance instructions, there might be some
dirty data in the cache that is discarded when this happens. The
reason it has mostly worked before is because such cache lines were
most likely evicted by normal memory activity before a TLB flush was
requested by the OS.

Previously, the cache model would invalidate cache lines when they
were accessed by an uncacheable write. This changeset alters this
behavior so all uncacheable memory accesses cause a cache flush with
an associated writeback if necessary. This is implemented by reusing
the cache flushing machinery used when draining the cache, which
implies that writebacks are performed using functional accesses.
2013-01-07 13:05:47 -05:00
Andreas Sandberg d44f2f611f mem: Remove the IIC replacement policy
The IIC replacement policy seems to be unused and has probably
gathered too much bit rot to be useful. This patch removes the IIC and
its associated cache parameters.
2013-01-07 13:05:39 -05:00
Andreas Hansson 921490a060 sim: Fatal if a clocked object is set to have a clock of 0
This patch adds a check to the clocked object constructor to ensure it
is not configured to have a clock period of 0.
2013-01-07 13:05:39 -05:00
Ali Saidi 9a645d6e9b cache: add note about where conflicts are handled 2013-01-07 13:05:32 -05:00
Andreas Sandberg ddd6af414c mem: Add support for writing back and flushing caches
This patch adds support for the following optional drain methods in
the classical memory system's cache model:

memWriteback() - Write back all dirty cache lines to memory using
functional accesses.

memInvalidate() - Invalidate all cache lines. Dirty cache lines
are lost unless a writeback is requested.

Since memWriteback() is called when checkpointing systems, this patch
adds support for checkpointing systems with caches. The serialization
code now checks whether there are any dirty lines in the cache. If
there are dirty lines in the cache, the checkpoint is flagged as bad
and a warning is printed.
2012-11-02 11:32:02 -05:00
Andreas Sandberg b81a977e6a sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
2012-11-02 11:32:01 -05:00
Andreas Sandberg c0ab52799c sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.

This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
2012-11-02 11:32:01 -05:00
Andreas Hansson 2a740aa096 Port: Add protocol-agnostic ports in the port hierarchy
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.

The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
2012-10-15 08:12:35 -04:00
Andreas Hansson 93a159875a Fix: Address a few minor issues identified by cppcheck
This patch addresses a number of smaller issues identified by the code
inspection utility cppcheck. There are a number of identified leaks in
the arm/linux/system.cc (although the function only get's called once
so it is not a major problem), a few deletes in dev/x86/i8042.cc that
were not array deletes, and sprintfs where the character array had one
element less than needed. In the IIC tags there was a function
allocating an array of longs which is in fact never used.
2012-10-15 08:12:23 -04:00
Andreas Hansson 88554790c3 Mem: Use cycles to express cache-related latencies
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.

The stat blocked_cycles that actually counter in ticks is now updated
to count in cycles.

As the timing is now rounded to the clock edges of the cache, there
are some regressions that change. Plenty of them have very minor
changes, whereas some regressions with a short run-time are perturbed
quite significantly. A follow-on patch updates all the statistics for
the regressions.
2012-10-15 08:10:54 -04:00
Djordje Kovacevic 80a26a3e39 MEM: Put memory system document into doxygen 2012-09-25 11:49:41 -05:00
Mrinmoy Ghosh 6fc0094337 Cache: add a response latency to the caches
In the current caches the hit latency is paid twice on a miss. This patch lets
a configurable response latency be set of the cache for the backward path.
2012-09-25 11:49:41 -05:00
Andreas Hansson ffb6aec603 AddrRange: Transition from Range<T> to AddrRange
This patch takes the final plunge and transitions from the templated
Range class to the more specific AddrRange. In doing so it changes the
obvious Range<Addr> to AddrRange, and also bumps the range_map to be
AddrRangeMap.

In addition to the obvious changes, including the removal of redundant
includes, this patch also does some house keeping in preparing for the
introduction of address interleaving support in the ranges. The Range
class is also stripped of all the functionality that is never used.

--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
2012-09-19 06:15:44 -04:00
Andreas Hansson 292d8252a4 clang: Fix issues identified by the clang static analyzer
This patch addresses a few minor issues reported by the clang static
analyzer.

The analysis was run with:

scan-build -disable-checker deadcode \
           -enable-checker experimental.core \
           -disable-checker experimental.core.CastToStruct \
           -enable-checker experimental.cpluscplus
2012-09-11 14:15:47 -04:00
Lena Olson 584eba3ab6 Cache: Split invalidateBlk up to seperate block vs. tags
This seperates the functionality to clear the state in a block into
blk.hh and the functionality to udpate the tag information into the
tags.  This gets rid of the case where calling invalidateBlk on an
already-invalid block does something different than calling it on a
valid block, which was confusing.
2012-09-11 14:14:49 -04:00
Andreas Hansson 287ea1a081 Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.

An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
2012-09-07 12:34:38 -04:00
Andreas Hansson c60db56741 Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).

The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.

Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
2012-08-22 11:39:59 -04:00
Andreas Hansson e317d8b9ff Port: Extend the QueuedPort interface and use where appropriate
This patch extends the queued port interfaces with methods for
scheduling the transmission of a timing request/response. The methods
are named similar to the corresponding sendTiming(Snoop)Req/Resp,
replacing the "send" with "sched". As the queues are currently
unbounded, the methods always succeed and hence do not return a value.

This functionality was previously provided in the subclasses by
calling PacketQueue::schedSendTiming with the appropriate
parameters. With this change, there is no need to introduce these
extra methods in the subclasses, and the use of the queued interface
is more uniform and explicit.
2012-08-22 11:39:56 -04:00
Anthony Gutierrez 0b3897fc90 O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.

This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
2012-08-15 10:38:08 -04:00
Anthony Gutierrez 7bf14aedbf cache: don't allow dirty data in the i-cache
removes the optimization that forwards an exclusive copy to a requester on a
read, only for the i-cache. this optimization isn't necessary because we
typically won't be writing to the i-cache.
2012-07-27 16:08:04 -04:00
Andreas Hansson b265d9925c Port: Align port names in C++ and Python
This patch is a first step to align the port names used in the Python
world and the C++ world. Ultimately it serves to make the use of
config.json together with output from the simulation easier, including
post-processing of statistics.

Most notably, the CPU, cache, and bus is addressed in this patch, and
there might be other ports that should be updated accordingly. The
dash name separator has also been replaced with a "." which is what is
used to concatenate the names in python, and a separation is made
between the master and slave port in the bus.
2012-07-09 12:35:39 -04:00
Andreas Hansson 46d9adb68c Port: Make getAddrRanges const
This patch makes getAddrRanges const throughout the code base. There
is no reason why it should not be, and making it const prevents adding
any unintentional side-effects.
2012-07-09 12:35:34 -04:00
Andreas Hansson 49407d76aa Port: Add isSnooping to slave port (asking master port)
This patch adds isSnooping to the slave port, and thus avoids going
through getMasterPort to be able to ask the master. Over the course of
the next few patches, all getMasterPort/getSlavePort in Port and
MemObject are to be protocol agnostic, and the snooping is part of the
protocol layer.

The function is already present on the master port, where it is
implemented by the module itself, e.g. a cache. On the slave side, it
is merely asking the connected master port. The same name is used by
both functions despite their difference in behaviour. The initial
design used isMasterSnooping on the slave port side, but the more
verbose function name was later changed.
2012-07-09 12:35:32 -04:00
Andreas Hansson 17f9270dad Port: Move retry from port base class to Master/SlavePort
This patch is the last part of moving all protocol-related
functionality out of the Port base class. All the send/recv functions
are already moved, and the retry (which still governs all the timing
transport functions) is the only part that remained in the base class.

The only point where this currently causes a bit of inconvenience is
in the bus where the retry list is global and holds Port pointers (not
Master/SlavePort). This is about to change with the split into a
request/response bus and will soon be removed anyway.

The patch has no impact on any regressions.
2012-07-09 12:35:31 -04:00
Andreas Hansson ff5718f042 Fix: Address a few benign memory leaks
This patch is the result of static analysis identifying a number of
memory leaks. The leaks are all benign as they are a result of not
deallocating memory in the desctructor. The fix still has value as it
removes false positives in the static analysis.
2012-07-09 12:35:30 -04:00
Lena Olson d2ebade5a5 Cache: Fix the LRU policy for classic memory hierarchy
The LRU policy always evicted the least recently touched way, even if it
contained valid data and another way was invalid, as can happen if a block has
been invalidated by coherance.  This can result in caches never warming up even
though they are replacing blocks.  This modifies the LRU policy to move blocks
to LRU position on invalidation.
2012-06-29 11:21:58 -04:00
Dam Sunwoo 7cbe0cf564 Mem: fix master id assertion in cache_impl.hh
The assertion was applied to the wrong packet.
This patch fixes the issue rerported by Xiang Jiang on the gem5-dev mailing list.
2012-06-29 11:19:07 -04:00
Ali Saidi 8d1e56bdcd Cache: Only invalidate a line in the cache when an uncacheable write is seen. 2012-06-29 11:18:29 -04:00
Ali Saidi c80cd4136e mem: Delay deleting of incoming packets by one call.
This patch is a temporary fix until Andreas' four-phase patches
get reviewed and committed. Removing FastAlloc seems to have exposed
an issue which previously was reasonable rare in which packets are freed
before the sending cache is done with them. This change puts incoming packets
no a pendingDelete queue which are deleted at the start of the next call and
thus breaks the dependency between when the caller returns true and when the
packet is actually used by the sending cache.

Running valgrind on a multi-core linux boot and the memtester results in no
valgrind warnings.
2012-06-07 10:59:03 -04:00
Ali Saidi 1b370431d0 sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
2012-06-05 01:23:08 -04:00
Andreas Hansson 5880fbe96d Bus: Turn the PortId into a transport function parameter
The main aim of this patch is to arrive at a suitable port interface
for vector ports, including both the packet and the port id. This
patch changes the bus transport functions
(recvFunctional/Atomic/Timing) to require a PortId parameter
indicating the source port. Previously this information was passed by
setting the source field of the packet, and this is only required in
the case of a timing request.

With this patch, the use of the source and destination field is also
more restrictive, as they are only needed for timing accesses. The
modifications to these fields for atomic snoops is now removed
entirely, also making minor modifications to the cache.
2012-05-30 05:30:24 -04:00
Andreas Hansson cad802761a Packet: Unify the use of PortID in packet and port
This patch removes the Packet::NodeID typedef and unifies it with the
Port::PortId. The src and dest fields in the packet are used to hold a
port id (e.g. in the bus), and thus the two should actually be the
same.

The typedef PortID is now global (in base/types.hh) and aligned with
the ThreadID in terms of capitalisation and naming of the
InvalidPortID constant.

Before this patch, two flags were used for valid destination and
source, rather than relying on a named value (InvalidPortID), and
this is now redundant, as the src and dest field themselves are
sufficient to tell whether the current value is a valid port
identifier or not. Consequently, the VALID_SRC and VALID_DST are
removed.

As part of the cleaning up, a number of int parameters and local
variables are updated to use PortID.

Note that Ruby still has its own NodeID typedef. Furthermore, the
MemObject getMaster/SlavePort still has an int idx parameter with a
default value of -1 which should eventually change to PortID idx =
InvalidPortID.
2012-05-30 05:29:42 -04:00
Andreas Hansson 49da0497d3 Cache: Remove dangling doWriteback declaration
This patch removes the declaration of doWriteback as there is no
implementation for this member function.
2012-05-24 04:09:19 -04:00
Ali Saidi c02dc07424 Cache: restructure code that actually isn't a loop 2012-05-10 18:04:27 -05:00
Ali Saidi 4f66bcdd2e gem5: fix some iterator use and erase bugs 2012-05-10 18:04:27 -05:00
Ali Saidi 8cee4dacc8 gem5: Fix a number of incorrect case statements 2012-05-10 18:04:26 -05:00
Ali Saidi f6895e8bd4 Cache: Panic if you attempt to create a checkpoint with a cache in the system 2012-05-10 18:04:26 -05:00
Andreas Hansson 3fea59e162 MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.

For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).

The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.

With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
2012-05-01 13:40:42 -04:00
Andreas Hansson 750f33a901 MEM: Remove the Broadcast destination from the packet
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.

Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).

The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.

In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
2012-04-14 05:45:55 -04:00
Andreas Hansson dccca0d3a9 MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.

Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.

Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.

Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.

The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.

In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.
2012-04-14 05:45:07 -04:00
Andreas Hansson b00949d88b MEM: Enable multiple distributed generalized memories
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.

All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.

To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.

Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.

--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
2012-04-06 13:46:31 -04:00
William Wang f9d403a7b9 MEM: Introduce the master/slave port sub-classes in C++
This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.

The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.

The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.

The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.
2012-03-30 09:40:11 -04:00
Andreas Hansson c2d2ea99e3 MEM: Split SimpleTimingPort into PacketQueue and ports
This patch decouples the queueing and the port interactions to
simplify the introduction of the master and slave ports. By separating
the queueing functionality from the port itself, it becomes much
easier to distinguish between master and slave ports, and still retain
the queueing ability for both (without code duplication).

As part of the split into a PacketQueue and a port, there is now also
a hierarchy of two port classes, QueuedPort and SimpleTimingPort. The
QueuedPort is useful for ports that want to leave the packet
transmission of outgoing packets to the queue and is used by both
master and slave ports. The SimpleTimingPort inherits from the
QueuedPort and adds the implemention of recvTiming and recvFunctional
through recvAtomic.

The PioPort and MessagePort are cleaned up as part of the changes.

--HG--
rename : src/mem/tport.cc => src/mem/packet_queue.cc
rename : src/mem/tport.hh => src/mem/packet_queue.hh
2012-03-22 06:36:27 -04:00
Ali Saidi eaa994e7f6 cache: Allow main memory to be at disjoint address ranges. 2012-03-09 09:59:25 -05:00
Ali Saidi d907d0ec72 Cache: Fix an issue with LRU when bonus block is used to complete transaction.
The block is never inserted because it's the one extra block in the cache, but
it can be invalidated twice in a row. In that case the block doesn't have a
new master id (beacuse it was never inserted), however it is valid and
the accounting goes wrong at that point.
2012-03-01 17:26:31 -06:00
Andreas Hansson 0cd0a8fdd3 MEM: Simplify cache ports preparing for master/slave split
This patch splits the two cache ports into a master (memory-side) and
slave (cpu-side) subclass of port with slightly different
functionality. For example, it is only the CPU-side port that blocks
incoming requests, and only the memory-side port that schedules send
events outside of what the transmit list dictates.

This patch simplifies the two classes by relying further on
SimpleTimingPort and also generalises the latter to better accommodate
the changes (introducing trySendTiming and scheduleSend). The
memory-side cache port overrides sendDeferredPacket to be able to not
only send responses from the transmit list, but also send requests
based on the MSHRs.

A follow on patch further simplifies the SimpleTimingPort and the
cache ports.
2012-02-24 11:52:49 -05:00
Andreas Hansson 5a9a743cfc MEM: Introduce the master/slave port roles in the Python classes
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.

The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.

Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
2012-02-13 06:43:09 -05:00
Dam Sunwoo 230540e655 mem: fix cache stats to use request ids correctly
This patch fixes the cache stats to use the new request ids.
Cache stats also display the requestor names in the vector subnames.
Most cache stats now include "nozero" and "nonan" flags to reduce the
amount of excessive cache stat dump. Also, simplified
incMissCount()/incHitCount() functions.
2012-02-12 16:07:39 -06:00
Ali Saidi 8aaa39e93d mem: Add a master ID to each request object.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
2012-02-12 16:07:38 -06:00
Mrinmoy Ghosh 7e104a1af2 prefetcher: Make prefetcher a sim object instead of it being a parameter on cache 2012-02-12 16:07:38 -06:00
Gabe Black ea8b347dc5 Merge with head, hopefully the last time for this batch. 2012-01-31 22:40:08 -08:00
Koan-Sin Tan 7d4f187700 clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).

clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
2012-01-31 12:05:52 -05:00
Andreas Hansson 4590b91fb8 MEM: Remove the otherPort from the cache ports
This patch is a very straight-forward simplification, removing the
unecessary otherPort pointer from the cache port. The pointer was only
used to forward range changes, and the address range is fixed for the
cache. Removing the pointer simplifies the transition to master/slave
ports.
2012-01-31 11:51:19 -05:00
Gabe Black c3d41a2def Merge with the main repo.
--HG--
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
2012-01-28 07:24:01 -08:00
William Wang e731cf4c1d MEM: Remove the functional ports from the memory system
The functional ports are no longer used and this patch cleans up the
legacy that is still present in buses, memories, CPUs etc. Note that
this does not refer to the class FunctionalPort (already removed), but
rather ports with the name (and use) functional.
2012-01-17 12:55:09 -06:00
Andreas Hansson 07cf9d914b MEM: Separate queries for snooping and address ranges
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
2012-01-17 12:55:09 -06:00
Andreas Hansson 142380a373 MEM: Remove Port removeConn and MemObject deletePortRefs
Cleaning up and simplifying the ports and going towards a more strict
elaboration-time creation and binding of the ports.
2012-01-17 12:55:09 -06:00
Andreas Hansson de34e49d15 MEM: Simplify ports by removing EventManager
This patch removes the inheritance of EventManager from the ports and
moves all responsibility for event queues to the owner. Eventually the
event manager should be the interface block, which could either be the
structural owner or a subblock like a LSQ in the O3 CPU for example.
2012-01-17 12:55:09 -06:00
Andreas Hansson 13ef7a5647 MEM: Differentiate functional cache accesses from CPU and memory
This patch changes the functionalAccess member function in the cache
model such that it is aware of what port the access came from, i.e. if
it came from the CPU side or from the memory side. By adding this
information, it is possible to respect the 'forwardSnoops' flag for
snooping requests coming from the memory side and not forward
them. This fixes an outstanding issue with the IO bus getting accesses
that have no valid destination port and also cleans up future changes
to the bus model.
2012-01-17 12:55:07 -06:00
Gabe Black 36a822f08e Merge with main repository. 2012-01-07 02:10:34 -08:00
Gabe Black 85424bef19 SE/FS: Get rid of includes of config/full_system.hh. 2011-11-18 02:20:22 -08:00
Gabe Black 71c4534ce9 SE/FS: Get rid of FULL_SYSTEM in mem. 2011-11-07 01:13:43 -08:00
Gabe Black d735abe5da GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions.
2011-10-31 01:09:44 -07:00
Ali Saidi 0c29a97ba9 Prefetch: Don't prefetch if address is in the write queue.
Check that we're not currently writing back an address the prefetcher is trying
to prefetch before issuing it. We previously checked the mshrQueue and the cache
itself, but forgot to check the writeBuffer. This fixes a memory corrucption
issue with an L2 prefetcher.
2011-09-13 12:06:13 -05:00
Lisa Hsu f6a2ef22ff Fix build for gcc-4.2 opt/fast
Even though the code is safe, compiler flags a warning here, which are treated as errors for fast/opt. I know it's redundant but it has no side effects and fixes the compile.
2011-09-01 15:25:54 -07:00
Ali Saidi c9c2d979b8 Mem: Put prefetcher notify call before packet is deleted. 2011-08-19 15:08:08 -05:00
Ali Saidi 6779bd3e5d Prefetcher: Fix some memory leaks with the prefetcher. 2011-08-19 15:08:05 -05:00
Ali Saidi 147095cb08 Mem: Fix issue with prefetches originating at non-L1 caches getting stale data
Prefetch requests issued from the L2 or below wouldn't check if valid data is
present higher in the system. If a prefetch into the L2 occured at the same
time as writeback from a higher-level cache the dirty data could be replaced
in by unmodified data in memory.
2011-07-15 11:53:35 -05:00
Nathan Binkert 2b1aa35e20 scons: rename TraceFlags to DebugFlags 2011-06-02 17:36:21 -07:00
Nathan Binkert 63371c8664 stats: rename stats so they can be used as python expressions 2011-04-19 18:45:21 -07:00
Nathan Binkert eddac53ff6 trace: reimplement the DTRACE function so it doesn't use a vector
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing.  This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
2011-04-15 10:44:32 -07:00
Nathan Binkert 39a055645f includes: sort all includes 2011-04-15 10:44:06 -07:00
Ali Saidi a432d8e085 Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
2011-03-17 19:20:19 -05:00
Ali Saidi f05f35df99 Includes: Don't include isa_traits.hh and use the TheISA namespace unless really needed. 2011-02-23 15:10:49 -06:00
Steve Reinhardt 6f1187943c Replace curTick global variable with accessor functions.
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.
2011-01-07 21:50:29 -08:00
Ali Saidi e1b9a815dd SCons: Support building without an ISA 2010-11-19 18:00:39 -06:00
Steve Reinhardt 45aebaccde cache: minor SC assertion fix
Thanks to Joe Gross for finding/testing this.
2010-10-18 13:05:15 -07:00
Gabe Black 930c653270 Mem: Change the CLREX flag to CLEAR_LL.
CLREX is the name of an ARM instruction, not a name for this generic flag.
2010-10-13 01:57:31 -07:00
Steve Reinhardt e918536380 cache: improve coherence handling of writebacks
If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive.  This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
2010-09-21 23:07:34 -07:00
Nathan Binkert afafaf1dcb style: fix sorting of includes and whitespace in some files 2010-09-10 14:58:04 -07:00
Steve Reinhardt 1249728494 cache: fail SC when invalidated while waiting for bus
Corrects an oversight in cset f97b62be544f.  The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache.  To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
2010-09-09 14:40:19 -04:00
Steve Reinhardt 6dc599ea9b mem: fix functional accesses to deal with coherence change
We can't just obliviously return the first valid cache block
we find any more... see comments for details.
2010-09-09 14:40:19 -04:00
Steve Reinhardt 71aca6d29e cache: coherence protocol enhancements & bug fixes
Allow lower-level caches (e.g., L2 or L3) to pass exclusive
copies to higher levels (e.g., L1).  This eliminates a lot
of unnecessary upgrade transactions on read-write sequences
to non-shared data.

Also some cleanup of MSHR coherence handling and multiple
bug fixes.
2010-09-09 14:40:18 -04:00
Steve Reinhardt 3ffc4505f7 mem: fix m5.fast compile bug in previous cset 2010-08-26 08:03:20 -07:00
Steve Reinhardt 1bf944be62 cache: fix a bug in atomic multilevel snoops 2010-08-25 21:55:55 -07:00
Steve Reinhardt 62c06c1403 mem: fix dumb typo in copyrights 2010-08-25 14:08:27 -07:00
Gene Wu d6736384b2 MEM: Make CLREX a first class request operation and clear locks in caches when it in received 2010-08-23 11:18:41 -05:00
Gene Wu 23626d99af ARM: Make sure that software prefetch instructions can't change the state of the TLB 2010-08-23 11:18:41 -05:00
Ali Saidi ac575a9d82 Compiler: Fixes for GCC 4.5. 2010-08-23 11:18:39 -05:00
Timothy M. Jones 28a5ea3f99 Port: Only indicate that a SimpleTimingPort is drained if its send event is
not scheduled, as well as the transmit list being empty.
2010-07-22 18:54:37 +01:00
Steve Reinhardt 897247d63b cache: fix bug in SC upgrade handling
This bug was introduced with the recent rework of SC
failure handling in cset f97b62be544f.
2010-07-08 17:56:13 -07:00
Steve Reinhardt de2321de81 cache: fix longstanding prefetcher bug
Thanks to Joe Gross for pointing this out (again?).
Apologies to anyone who pointed it out earlier and
we didn't listen.
2010-06-22 21:29:43 -07:00
Steve Reinhardt f24ae2ec2a cache: fail store conditionals when upgrade loses race
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
2010-06-16 15:25:57 -07:00
Steve Reinhardt 57f2b7db11 cache: fix dirty bit setting
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS).  This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
2010-06-16 15:25:57 -07:00
Nathan Binkert 86a93fe7b9 stats: only consider a formula initialized if there is a formula 2010-06-15 01:18:36 -07:00
Lisa Hsu 7f3cd9a9fd cache stats: account for writebacks and/or device occupancy in the cache.
Plus, a minor bugfix that neglects to update blk->contextSrc in certain cases on a cache insert.
2010-02-24 13:46:55 -08:00
Lisa Hsu 1d3228481f cache: Make caches sharing aware and add occupancy stats.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
2010-02-23 09:34:22 -08:00
Lisa Hsu 2ad386f104 cache: pull CacheSet out of LRU so that other tags can use associative sets. 2010-02-23 09:33:09 -08:00
Lisa Hsu 8b4e8690b7 cache: make tags->insertBlock() and tags->accessBlock() context aware so that the cache can make context-specific decisions within their various tag policy implementations. 2010-01-12 10:53:02 -08:00
Steve Reinhardt f679630788 Minor cleanup: Use the blockAlign() method where it applies in the cache. 2009-09-26 10:50:50 -07:00
Steve Reinhardt 72cfed4164 Force prefetches to check cache and MSHRs immediately prior to issue.
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
2009-09-26 10:50:50 -07:00
Nathan Binkert d9f39c8ce7 arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh 2009-09-23 08:34:21 -07:00
Steve Reinhardt a13a706a20 Fix setting of INST_FETCH flag for O3 CPU.
It's still broken in inorder.
Also enhance DPRINTFs in cache and physical memory so we
can see more easily whether it's getting set or not.
2009-08-01 22:50:14 -07:00
Nathan Binkert 6faf377b53 types: clean up types, especially signed vs unsigned 2009-06-04 23:21:12 -07:00
Nathan Binkert 47877cf2db types: add a type for thread IDs and try to use it everywhere 2009-05-26 09:23:13 -07:00
Nathan Binkert 8d2e51c7f5 includes: sort includes again 2009-05-17 14:34:52 -07:00
Nathan Binkert 709d859530 includes: use base/types.hh not inttypes.h or stdint.h 2009-05-17 14:34:51 -07:00
Nathan Binkert eef3a2e142 types: Move stuff for global types into src/base/types.hh
--HG--
rename : src/sim/host.hh => src/base/types.hh
2009-05-17 14:34:50 -07:00
Steve Reinhardt 6629d9b2bc mem: use single BadAddr responder per system.
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond.  Now there is just one on
the main memory bus.  The default bus responder on all other buses
is now the downstream cache's cpu_side port.  Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
2008-07-16 11:10:33 -07:00
Steve Reinhardt 3083268d60 request: rename INST_READ to INST_FETCH. 2009-04-20 18:54:02 -07:00
Gabe Black bd6f2bb538 Mem: Change isLlsc to isLLSC. 2009-04-19 21:44:15 -07:00
Gabe Black 3e5f487663 Memory: Rename LOCKED for load locked store conditional to LLSC. 2009-04-19 04:25:01 -07:00
Steve Reinhardt 758bfe4eb5 cache: set dirty bit on swaps (oops!) 2009-03-11 23:05:26 -07:00
Steve Reinhardt a94c68228a prefetch: don't panic on requests w/o contextID (e.g., writebacks). 2009-03-10 17:37:15 -07:00
Nathan Binkert cc95b57390 stats: Fix all stats usages to deal with template fixes 2009-03-05 19:09:53 -08:00
Steve Reinhardt 89a7fb0393 Fixes to get prefetching working again.
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.
2009-02-16 08:56:40 -08:00
Steve Reinhardt 640b415688 Cache: get rid of obsolete Tag methods.
I think readData() and writeData() were used for Erik's compression
work, but that code is gone, these aren't called anymore, and they
don't even really do what their names imply.
2008-11-14 14:14:35 -08:00
Steve Reinhardt 42bd460d7f Cache: Refactor packet forwarding a bit.
Makes adding write-through operations easier.
2008-11-10 14:10:28 -08:00
Lisa Hsu c68032ddcb decouple eviction from insertion in the cache. 2008-11-04 11:35:58 -05:00
Lisa Hsu 4ab52cb986 Change the findBlock(addr, lat) to accessBlock, which I think has better connotations for what is really happening and how it should be used. 2008-11-04 11:35:57 -05:00
Lisa Hsu dd99ff23c6 get rid of all instances of readTid() and getThreadNum(). Unify and eliminate
redundancies with threadId() as their replacement.
2008-11-04 11:35:42 -05:00
Lisa Hsu d857faf073 Add in Context IDs to the simulator. From now on, cpuId is almost never used,
the primary identifier for a hardware context should be contextId().  The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.
2008-11-02 21:57:07 -05:00
Lisa Hsu 8788d703f8 s/cpu_id/cpuId in o3 (to be consistent and match style), also fix some typos in
comments.
2008-10-23 16:49:17 -04:00
Lisa Hsu 546a6c0c1b probe function no longer used anywhere. 2008-10-23 16:49:13 -04:00
Lisa Hsu 7a28ab2d18 remove the totally obsolete split cache 2008-10-23 16:11:28 -04:00
Lisa Hsu 90e40ca982 This function declaration isn't used anywhere.
HG: user: Lisa Hsu <hsul@eecs.umich.edu> HG: branch default HG: changed
src/mem/cache/cache.hh
2008-10-14 17:22:03 -04:00
Nathan Binkert e06321091d eventq: convert all usage of events to use the new API.
For now, there is still a single global event queue, but this is
necessary for making the steps towards a parallelized m5.
2008-10-09 04:58:24 -07:00
Ali Saidi 3a3e356f4e style: Remove non-leading tabs everywhere they shouldn't be. Developers should configure their editors to not insert tabs 2008-09-10 14:26:15 -04:00
Steve Reinhardt caaac16803 Backed out changeset 94a7bb476fca: caused memory leak. 2008-06-28 13:19:38 -04:00
Steve Reinhardt 6b45238316 Generate more useful error messages for unconnected ports.
Force all non-default ports to provide a name and an
owner in the constructor.
2008-06-21 01:04:43 -04:00
Steve Reinhardt 024ec4c5c3 Get rid of bogus cache assertion.
I was asserting that the only reason you would defer targets is if
a write came in while you had an outstanding read miss, but there's
another case where you could get a read access after you've snooped
an invalidation and buffered it because it applies to a prior
outstanding miss.
2008-06-13 01:29:20 -04:00
Ali Saidi e71a5270a2 Make sure that output files are always checked success before they're used.
Make OutputDirectory::resolve() private and change the functions using
resolve() to instead use create().

--HG--
extra : convert_revision : 36d4be629764d0c4c708cec8aa712cd15f966453
2008-05-15 19:10:26 -04:00
Steve Reinhardt 29be31ce31 Fix handling of writeback-induced writebacks in atomic mode.
--HG--
extra : convert_revision : 4fa64f8a929f1aa36a9d5a71b8d1816b497aca4c
2008-03-25 10:01:21 -04:00
Steve Reinhardt 93ab43288a Don't FastAlloc MSHRs since we don't allocate them on the fly.
--HG--
extra : convert_revision : 02775cfb460afe6df0df0938c62cccd93a71e775
2008-03-24 01:08:02 -04:00
Steve Reinhardt 407710d387 Fix cache problem with writes to tempBlock
getting wrong writeback address.

--HG--
extra : convert_revision : 023dfb69c227c13a69bfe2744c6af75a45828b0b
2008-03-22 22:17:15 -04:00
Steve Reinhardt b051ae6acc Fix a few Packet memory leaks.
--HG--
extra : convert_revision : 00db19f0698c0786f0dff561eea9217860a5a05a
2008-03-17 03:08:28 -04:00
Steve Reinhardt 19c367fa8f Fix subtle cache bug where read could return stale data
if a prior write miss arrived while an even earlier
read miss was still outstanding.

--HG--
extra : convert_revision : 4924e145829b2ecf4610b88d33f4773510c6801a
2008-03-15 05:03:55 -07:00
Steve Reinhardt e6d6adc731 Revamp cache timing access mshr check to make stats sane again.
--HG--
extra : convert_revision : 37009b8ee536807073b5a5ca07ed1d097a496aea
2008-02-26 22:03:28 -08:00
Steve Reinhardt bdf3323915 Cache: better comments particularly regarding writeback situation.
--HG--
extra : convert_revision : 59ff9ee63ee0fec5a7dfc27b485b737455ccf362
2008-02-26 20:17:26 -08:00
Steve Reinhardt 4597a71cef Make L2+ caches allocate new block for writeback misses
instead of forwarding down the line.

--HG--
extra : convert_revision : b0d6e7862c92ea7a2d21f817d30398735e7bb8ba
2008-02-16 14:58:03 -05:00
Steve Reinhardt 9d7a69c582 Fix #include lines for renamed cache files.
--HG--
extra : convert_revision : b5008115dc5b34958246608757e69a3fa43b85c5
2008-02-10 14:45:25 -08:00