Do some minor cleanup of some recently added comments, a warning, and change
other instances of stack extension to be like what's now being done for x86.
Even though the code is safe, compiler flags a warning here, which are treated as errors for fast/opt. I know it's redundant but it has no side effects and fixes the compile.
In the current implementation of Functional Accesses, it's very hard to
implement broadcast or snooping protocols where the memory has no idea if it
has exclusive access to a cache block or not. Without this knowledge, making
sure the RW vs. RO permissions are right are next to impossible. So we add a
new state called Backing_Store to enable the conveyance that this is the backup
storage for a block, so that it can be written if it is the only possibly RW
block in the system, or written even if there is another RW block in the
system, without causing problems.
Also, a small change to actually set the m_name field for each Controller so
that debugging can be easier. Now you can access a controller's name just by
controller->getName().
This patch replaces RUBY with PROTOCOL in all the SConscript files as
the environment variable that decides whether or not certain components
of the simulator are compiled.
This patch drops RUBY as a compile time option. Instead the PROTOCOL option
is used to figure out whether or not to build Ruby. If the specified protocol
is 'None', then Ruby is not compiled.
Currently, functions associated with a controller go into separate files.
This patch puts all the functions in the controller's .cc file. This should
hopefully take away some time from compilation.
Prefetch requests issued from the L2 or below wouldn't check if valid data is
present higher in the system. If a prefetch into the L2 occured at the same
time as writeback from a higher-level cache the dirty data could be replaced
in by unmodified data in memory.
This makes it possible to use the grammar multiple times and use the multiple
instances concurrently. This makes implementing an include statement as part
of a grammar possible.
Addition of functional access support to Ruby necessitated some changes to
the way coherence protocols are written. I had forgotten to update the
Network_test protocol. This patch makes those updates.
This patch rpovides functional access support in Ruby. Currently only
the M5Port of RubyPort supports functional accesses. The support for
functional through the PioPort will be added as a separate patch.
The code for Set class was written under the assumption that
std::numeric_limits<long>::digits returns the number of bits used for
data type long, which was presumed to be either 32 or 64. But return value
is actually one less, that is, it is either 31 or 63. The value is now
being incremented by 1 so as to correctly set it.
The access permissions for the directory entries are not being set correctly.
This is because pointers are not used for handling directory entries.
function. get and set functions for access permissions have been added to the
Controller state machine. The changePermission() function provided by the
AbstractEntry and AbstractCacheEntry classes has been exposed to SLICC
code once again. The set_permission() functionality has been removed.
NOTE: Each protocol will have to define these get and set functions in order
to compile successfully.
Currently, the machine name is appended before any of the functions
defined with in the sm files. This is not necessary and it also
means that these functions cannot be used outside the sm files.
This patch does away with the prefixes. Note that the generated
C++ files in which the code for these functions is present are
still named such that the machine name is the prefix.
Re-enabling implicit parenting (see previous patch) causes current
Ruby config scripts to create some strange hierarchies and generate
several warnings. This patch makes three general changes to address
these issues.
1. The order of object creation in the ruby config files makes the L1
caches children of the sequencer rather than the controller; these
config ciles are rewritten to assign the L1 caches to the
controller first.
2. The assignment of the sequencer list to system.ruby.cpu_ruby_ports
causes the sequencers to be children of system.ruby, generating
warnings because they are already parented to their respective
controllers. Changing this attribute to _cpu_ruby_ports fixes this
because the leading underscore means this is now treated as a plain
Python attribute rather than a child assignment. As a result, the
configuration hierarchy changes such that, e.g.,
system.ruby.cpu_ruby_ports0 becomes system.l1_cntrl0.sequencer.
3. In the topology classes, the routers become children of some random
internal link node rather than direct children of the topology.
The topology classes are rewritten to assign the routers to the
topology object first.
The virtual channels within "response" vnets are made buffers_per_data_vc
deep (default=4), while virtual channels within other vnets are made
buffers_per_ctrl_vc deep (default = 1). This is for accurate power estimates.
Identifying response vnets versus other vnets will allow garnet to
determine which vnets will carry data packets, and which will carry
ctrl packets, and use appropriate buffer sizes (since data packets are larger
than ctrl packets). This in turn allows the orion power model to accurately
estimate buffer power.
Renamed (message) class to vnet for consistency with rest of ruby.
Moved some parameters specific to fixed/flexible garnet networks into their
corresponding py files.
The RubyMemory flag wasnt used in the code, creating large gaps in trace output. Replace cprintfs w/dprintfs
using RubyMemory in memory controller. DPRINTF also deprecate the usage of the setDebug() pure virtual
function in the AbstractMemoryOrCache Class as well the m_debug/cprintf functions in MemoryControl.hh/cc
The simple network's endpoint bandwidth value is used to adjust the overall
bandwidth of the network. Specifically, the ration between endpoint bandwidth
and the MESSAGE_SIZE_MULTIPLIER determines the increase. By setting the value
to 1000, that means the bandwdith factor specified in the links translates to
the link bandwidth in bytes. Previously, it was increasing that value by 10.
This patch will likely require a reset of the ruby regression tester stats.
Moved the buffer_size, endpoint_bandwidth, and adaptive_routing params out of
the top-level parent network object and to only those networks that actually
use those parameters.
This patch ensures that both Garnet and the simple networks use the bw value
specified in the topology. To do so, the patch generalizes the specification
of bw for basic links. This value is then translated to the specific value
used by the simple and Garnet networks. Since Garent does not support
non-uniformed link bandwidth, the patch also adds a check to ensure all bws are
equal.
--HG--
rename : src/mem/ruby/network/BasicLink.cc => src/mem/ruby/network/simple/SimpleLink.cc
rename : src/mem/ruby/network/BasicLink.hh => src/mem/ruby/network/simple/SimpleLink.hh
rename : src/mem/ruby/network/BasicLink.py => src/mem/ruby/network/simple/SimpleLink.py
This patch converts links and switches from second class simobjects that were
virtually ignored by the networks (both simple and Garnet) to first class
simobjects that directly correspond to c++ ojbects manipulated by the
topology and network classes. This is especially true for Garnet, where the
links and switches directly correspond to specific C++ objects.
By making this change, many aspects of the Topology class were simplified.
--HG--
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/BasicLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/BasicLink.hh
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetRouter_d.py
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetRouter.py
Moved the Topology class to the top network directory because it is shared by
both the simple and Garnet networks.
--HG--
rename : src/mem/ruby/network/simple/Topology.cc => src/mem/ruby/network/Topology.cc
rename : src/mem/ruby/network/simple/Topology.hh => src/mem/ruby/network/Topology.hh
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
Fixed an error reguarding DMA for uninprocessor systems. Basically removed an
overly agressive optimization that lead to inconsistent state between the
cache and the directory.
This function duplicates the functionality of allocate() exactly, except that it does not return
a return value. In protocols where you just want to allocate a block
but do not want that block to be your implicitly passed cache_entry, use this function.
Otherwise, SLICC will complain if you do not consume the pointer returned by allocate(),
and if you do a dummy assignment Entry foo := cache.allocate(address), the C++
compiler will complain of an unused variable. This is kind of a hack to get around
those issues, but suggestions welcome.
Before this changeset, all local variables of type Entry and TBE were considered
to be pointers, but an immediate use of said variables would not be automatically
deferenced in SLICC-generated code. Instead, deferences occurred when such
variables were passed to functions, and were automatically dereferenced in
the bodies of the functions (e.g. the implicitly passed cache_entry).
This is a more general way to do it, which leaves in place the
assumption that parameters to functions and local variables of type AbstractCacheEntry
and TBE are always pointers, but instead of dereferencing to access member variables
on a contextual basis, the dereferencing automatically occurs on a type basis at the
moment a member is being accessed. So, now, things you can do that you couldn't before
include:
Entry foo := getCacheEntry(address);
cache_entry.DataBlk := foo.DataBlk;
or
cache_entry.DataBlk := getCacheEntry(address).DataBlk;
or even
cache_entry.DataBlk := static_cast(Entry, pointer, cache.lookup(address)).DataBlk;
This is a substitute for MessageBuffers between controllers where you don't
want messages to actually go through the Network, because requests/responses can
always get reordered wrt to one another (even if you turn off Randomization and turn on Ordered)
because you are, after all, going through a network with contention. For systems where you model
multiple controllers that are very tightly coupled and do not actually go through a network,
it is a pain to have to write a coherence protocol to account for mixed up request/response orderings
despite the fact that it's completely unrealistic. This is *not* meant as a substitute for real
MessageBuffers when messages do in fact go over a network.
It is useful for Ruby to understand from whence request packets came.
This has all request packets going into Ruby pass the contextId value, if
it exists. This supplants the old libruby proc_id value passed around in
all the Messages, so I've also removed the unused unsigned proc_id; member
generated by SLICC for all Message types.
The goal of the patch is to do away with the CacheMsg class currently in use
in coherence protocols. In place of CacheMsg, the RubyRequest class will used.
This class is already present in slicc_interface/RubyRequest.hh. In fact,
objects of class CacheMsg are generated by copying values from a RubyRequest
object.
The tester code is in testers/networktest.
The tester can be invoked by configs/example/ruby_network_test.py.
A dummy coherence protocol called Network_test is also addded for network-only simulations and testing. The protocol takes in messages from the tester and just pushes them into the network in the appropriate vnet, without storing any state.
I had recently committed a patch that removed the WakeUp*.py files from the
slicc/ast directory. I had forgotten to remove the import calls for these
files from slicc/ast/__init__.py. This resulted in error while running
regressions on zizzer. This patch does the needful.
This patch fixes the problem where Ruby would fail to call sendRetry on ports
after it nacked the port. This patch is particularly helpful for bursty dma
requests which often include several packets.
In SLICC, in order to define a type a data type for which it should not
generate any code, the keyword external_type is used. For those data types for
which code should be generated, the keyword structure is used. This patch
eliminates the use of keyword external_type for defining structures. structure
key word can now have an optional attribute external, which would be used for
figuring out whether or not to generate the code for this structure. Also, now
structures can have functions as well data members in them.
In order to add stall and wait facility for protocols, a keyword
wake_up_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
In order to add stall and wait facility for protocols, a keyword
wake_up_all_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
None of the code in the ruby tester directory is compiled or referred to
outside of that directory. This change eliminates it. If it's needed in the
future, it can be revived from the history. In the mean time, this removes
clutter and the only use of the GEMS_ROOT scons variable.
There may not be a formally correct spelling for the past tense of mmap, but
mmapped is the spelling Google doesn't try to autocorrect. This makes sense
because it mirrors the past tense of map->mapped and not the past tense of
cape->caped.
--HG--
rename : src/arch/alpha/mmaped_ipr.hh => src/arch/alpha/mmapped_ipr.hh
rename : src/arch/arm/mmaped_ipr.hh => src/arch/arm/mmapped_ipr.hh
rename : src/arch/mips/mmaped_ipr.hh => src/arch/mips/mmapped_ipr.hh
rename : src/arch/power/mmaped_ipr.hh => src/arch/power/mmapped_ipr.hh
rename : src/arch/sparc/mmaped_ipr.hh => src/arch/sparc/mmapped_ipr.hh
rename : src/arch/x86/mmaped_ipr.hh => src/arch/x86/mmapped_ipr.hh
At a couple of places in PerfectSwitch.cc and MessageBuffer.cc, DPRINTF()
has not been provided with correct number of arguments. The patch fixes these
bugs.
This patch removes the store buffer from Ruby. It is not in use currently.
Since libruby is being and store buffer makes calls to libruby, it is not
possible to maintain it until substantial changes are made.
This patch changes Address.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depend on
RubySystem have been moved to Address.cc file.
This patch changes DataBlock.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depende on
RubySystem have been moved to DataBlock.cc file.
This patch integrates permissions with cache and memory states, and then
automates the setting of permissions within the generated code. No longer
does one need to manually set the permissions within the setState funciton.
This patch will faciliate easier functional access support by always correctly
setting permissions for both cache and memory states.
--HG--
rename : src/mem/slicc/ast/EnumDeclAST.py => src/mem/slicc/ast/StateDeclAST.py
rename : src/mem/slicc/ast/TypeFieldEnumAST.py => src/mem/slicc/ast/TypeFieldStateAST.py
"executing" isnt a very descriptive debug message and in going through the
output you get multiple messages that say "executing" but nothing to help
you parse through the code/execution.
So instead, at least print out the name of the action that is taking
place in these functions.
Overall, continue to progress Ruby debug messages to more of the normal M5
debug message style
- add a name() to the Ruby Throttle & PerfectSwitch objects so that the debug output
isn't littered w/"global:" everywhere.
- clean up messages that print over multiple lines when possible
- clean up duplicate prints in the message buffer
In certain actions of the L1 cache controller, while creating an outgoing
message, the machine type was not being set. This results in a
segmentation fault when trace is collected. Joseph Pusudesris provided
his patch for fixing this issue.
Currently the wakeup function for the PerfectSwitch contains three loops -
loop on number of virtual networks
loop on number of incoming links
loop till all messages for this (link, network) have been routed
With an 8 processor mesh network and Hammer protocol, about 11-12% of the
was observed to have been spent in this function, which is the highest
amongst all the functions. It was found that the innermost loop is executed
about 45 times per invocation of the wakeup function, when each invocation
of the wakeup function processes just about one message.
The patch tries to do away with the redundant executions of the innermost
loop. Counters have been added for each virtual network that record the
number of messages that need to be routed for that virtual network. The
inner loops are only executed when the number of messages for that particular
virtual network > 0. This does away with almost 80% of the executions of the
innermost loop. The function now consumes about 5-6% of the total execution
time.
The patch changes the order in which L1 dcache and icache are looked up when
a request comes in. Earlier, if a request came in for instruction fetch, the
dcache was looked up before the icache, to correctly handle self-modifying
code. But, in the common case, dcache is going to report a miss and the
subsequent icache lookup is going to report a hit. Given the invariant -
caches under the same controller keep track of disjoint sets of cache blocks,
we can move the icache lookup before the dcache lookup. In case of a hit in
the icache, using our invariant, we know that the dcache would have reported
a miss. In case of a miss in the icache, we know that icache would have
missed even if the dcache was looked up before looking up the icache.
Effectively, we are doing the same thing as before, though in the common case,
we expect reduction in the number of lookups. This was empirically confirmed
for MOESI hammer. The ratio lookups to access requests is now about 1.1 to 1.
The TBE pointer in the MESI CMP implementation was not being set to NULL
when the TBE is deallocated. This resulted in segmentation fault on testing
the protocol when the ProtocolTrace was switched on.
The code for Orion 2.0 makes use of printf() at several places where there as
an error in configuration of the model. These have been replaced with fatal().
By stalling and waiting the mandatory queue instead of recycling it, one can
ensure that no incoming messages are starved when the mandatory queue puts
signficant of pressure on the L1 cache controller (i.e. the ruby memtester).
--HG--
rename : src/mem/slicc/ast/WakeUpDependentsStatementAST.py => src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py
The packet now identifies whether static or dynamic data has been allocated and
is used by Ruby to determine whehter to copy the data pointer into the ruby
request. Subsequently, Ruby can be told not to update phys memory when
receiving packets.
Separate data VCs and ctrl VCs in garnet, as ctrl VCs have 1 buffer per VC,
while data VCs have > 1 buffers per VC. This is for correct power estimations.
The purpose of this patch is to change the way CacheMemory interfaces with
coherence protocols. Currently, whenever a cache controller (defined in the
protocol under consideration) needs to carry out any operation on a cache
block, it looks up the tag hash map and figures out whether or not the block
exists in the cache. In case it does exist, the operation is carried out
(which requires another lookup). As observed through profiling of different
protocols, multiple such lookups take place for a given cache block. It was
noted that the tag lookup takes anything from 10% to 20% of the simulation
time. In order to reduce this time, this patch is being posted.
I have to acknowledge that the many of the thoughts that went in to this
patch belong to Brad.
Changes to CacheMemory, TBETable and AbstractCacheEntry classes:
1. The lookup function belonging to CacheMemory class now returns a pointer
to a cache block entry, instead of a reference. The pointer is NULL in case
the block being looked up is not present in the cache. Similar change has
been carried out in the lookup function of the TBETable class.
2. Function for setting and getting access permission of a cache block have
been moved from CacheMemory class to AbstractCacheEntry class.
3. The allocate function in CacheMemory class now returns pointer to the
allocated cache entry.
Changes to SLICC:
1. Each action now has implicit variables - cache_entry and tbe. cache_entry,
if != NULL, must point to the cache entry for the address on which the action
is being carried out. Similarly, tbe should also point to the transaction
buffer entry of the address on which the action is being carried out.
2. If a cache entry or a transaction buffer entry is passed on as an
argument to a function, it is presumed that a pointer is being passed on.
3. The cache entry and the tbe pointers received __implicitly__ by the
actions, are passed __explicitly__ to the trigger function.
4. While performing an action, set/unset_cache_entry, set/unset_tbe are to
be used for setting / unsetting cache entry and tbe pointers respectively.
5. is_valid() and is_invalid() has been made available for testing whether
a given pointer 'is not NULL' and 'is NULL' respectively.
6. Local variables are now available, but they are assumed to be pointers
always.
7. It is now possible for an object of the derieved class to make calls to
a function defined in the interface.
8. An OOD token has been introduced in SLICC. It is same as the NULL token
used in C/C++. If you are wondering, OOD stands for Out Of Domain.
9. static_cast can now taken an optional parameter that asks for casting the
given variable to a pointer of the given type.
10. Functions can be annotated with 'return_by_pointer=yes' to return a
pointer.
11. StateMachine has two new variables, EntryType and TBEType. EntryType is
set to the type which inherits from 'AbstractCacheEntry'. There can only be
one such type in the machine. TBEType is set to the type for which 'TBE' is
used as the name.
All the protocols have been modified to conform with the new interface.
This patch changes the manner in which data is copied from L1 to L2 cache in
the implementation of the Hammer's cache coherence protocol. Earlier, data was
copied directly from one cache entry to another. This has been broken in to
two parts. First, the data is copied from the source cache entry to a
transaction buffer entry. Then, data is copied from the transaction buffer
entry to the destination cache entry.
This has been done to maintain the invariant - at any given instant, multiple
caches under a controller are exclusive with respect to each other.
Ran all the source files through 'perl -pi' with this script:
s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|;
s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|;
s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;
Also did a little manual editing on some of the arch/*/isa_traits.hh files
and src/SConscript.
Two functions in src/mem/ruby/system/PerfectCacheMemory.hh, tryCacheAccess()
and cacheProbe(), end with calls to panic(). Both of these functions have
return type other than void. Any file that includes this header file fails
to compile because of the missing return statement. This patch adds dummy
values so as to avoid the compiler warnings.
This diff is for changing the way ASSERT is handled in Ruby. m5.fast
compiles out the assert statements by using the macro NDEBUG. Ruby uses the
macro RUBY_NO_ASSERT to do so. This macro has been removed and NDEBUG has
been put in its place.
These flags were being used to identify what alignment a request needed, but
the same information is available using the request size. This change also
eliminates the isMisaligned function. If more complicated alignment checks are
needed, they can be signaled using the ASI_BITS space in the flags vector like
is currently done with ARM.
If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive. This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
Corrects an oversight in cset f97b62be544f. The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache. To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
Allow lower-level caches (e.g., L2 or L3) to pass exclusive
copies to higher levels (e.g., L1). This eliminates a lot
of unnecessary upgrade transactions on read-write sequences
to non-shared data.
Also some cleanup of MSHR coherence handling and multiple
bug fixes.
This patch allows messages to be stalled in their input buffers and wait
until a corresponding address changes state. In order to make this work,
all in_ports must be ranked in order of dependence and those in_ports that
may unblock an address, must wake up the stalled messages. Alot of this
complexity is handled in slicc and the specification files simply
annotate the in_ports.
--HG--
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/StallAndWaitStatementAST.py
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/WakeUpDependentsStatementAST.py
Patch allows each individual message buffer to have different recycle latencies
and allows the overall recycle latency to be specified at the cmd line. The
patch also adds profiling info to make sure no one processor's requests are
recycled too much.
The main purpose for clearing stats in the unserialize process is so
that the profiler can correctly set its start time to the unserialized
value of curTick.
This patch allows one to disable migratory sharing for those cache blocks that
are accessed by atomic requests. While the implementations are different
between the token and hammer protocols, the motivation is the same. For
Alpha, LLSC semantics expect that normal loads do not unlock cache blocks that
have been locked by LL accesses. Therefore, locked blocks should not transfer
write permissions when responding to these load requests. Instead, only they
only transfer read permissions so that the subsequent SC access can possibly
succeed.
This patch fixes several bugs related to previous inconsistent assumptions on
how many tokens the Owner had. Mike Marty should have fixes these bugs years
ago. :)
Previously, the MOESI_hammer protocol calculated the same latency for L1 and
L2 hits. This was because the protocol was written using the old ruby
assumption that L1 hits used the sequencer fast path. Since ruby no longer
uses the fast-path, the protocol delays L2 hits by placing them on the
trigger queue.
The previous slower ruby latencies created a mismatch between the faster M5
cpu models and the much slower ruby memory system. Specifically smp
interrupts were much slower and infrequent, as well as cpus moving in and out
of spin locks. The result was many cpus were idle for large periods of time.
These changes fix the latency mismatch.
This patch adds back to ruby the capability to understand the response time
for messages that hit in different levels of the cache heirarchy.
Specifically add support for the MI_example, MOESI_hammer, and MOESI_CMP_token
protocols.
This patch adds DMA testing to the Memtester and is inherits many changes from
Polina's old tester_dma_extension patch. Since Ruby does not work in atomic
mode, the atomic mode options are removed.
Clean up some minor things left over from the default responder
change in rev 9af6fb59752f. Mostly renaming the 'responder_set'
param to 'use_default_range' to actually reflect what it does...
old name wasn't that descriptive in the first place, but now
it really doesn't make sense at all.
Also got rid of the bogus obsolete assignment to 'bus.responder'
which used to be a parameter but now is interpreted as an
implicit child assignment, and which was giving me problems in
the config restructuring to come. (A good argument for not
allowing implicit child assignments, IMO, but that's water under
the bridge, I'm afraid.)
Also moved the Bus constructor to the .cc file since that's
where it should have been all along.
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
One big difference is that PrioHeap puts the smallest element at the
top of the heap, whereas stl puts the largest element on top, so I
changed all comparisons so they did the right thing.
Some usage of PrioHeap was simply changed to a std::vector, using sort
at the right time, other usage had me just use the various heap functions
in the stl.
This was somewhat tricky because the RefCnt API was somewhat odd. The
biggest confusion was that the the RefCnt object's constructor that
took a TYPE& cloned the object. I created an explicit virtual clone()
function for things that took advantage of this version of the
constructor. I was conservative and used clone() when I was in doubt
of whether or not it was necessary. I still think that there are
probably too many instances of clone(), but hopefully not too many.
I converted several instances of const MsgPtr & to a simple MsgPtr.
If the function wants to avoid the overhead of creating another
reference, then it should just use a regular pointer instead of a ref
counting ptr.
There were a couple of instances where refcounted objects were created
on the stack. This seems pretty dangerous since if you ever
accidentally make a reference to that object with a ref counting
pointer, bad things are bound to happen.
Further cleanup should probably be done to make this class be non-Ruby
specific and put it in src/base.
There are probably several cases where this class is used, std::bitset
could be used instead.
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
Previously, the set size was set to 4. This was mostly do to the fact that a
crazy graduate student use to create networks with 256 l2 cache banks. Now it
is far more likely that users will create systems with less than 64 of any
particular controller type. Therefore Ruby should be optimized for a set size
of 1.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
This patch includes the necessary regression updates to test the new ruby
configuration system. The patch includes support for multiple ruby protocols
and adds the ruby random tester. The patch removes atomic mode test for
ruby since ruby does not support atomic mode acceses. These tests can be
added back in when ruby supports atomic mode for real.
--HG--
rename : tests/quick/50.memtest/test.py => tests/quick/60.rubytest/test.py
Removed the dummy power function implementations so that Orion can implement
them correctly. Since Orion lacks modular design, this patch simply enables
scons to compile it. There are no python configuration changes in this patch.
Renamed the MESI directory file to be consistent with all other protocols.
--HG--
rename : src/mem/protocol/MESI_CMP_directory-mem.sm => src/mem/protocol/MESI_CMP_directory-dir.sm
Cleaned up the ruby profilers by moving the memory controller profiling code
out of the main profiler object and into a separate object similar to the
current CacheProfiler. Both the CacheProfiler and MemCntrlProfiler are
specific to a particular Ruby object, CacheMemory and MemoryControl
respectively. Therefore, these profilers should not be SimObjects and
created by the python configuration system, but instead private objects. This
simplifies the creation of these profilers.
Reorganized ruby python configuration so that protocol and ruby memory system
configuration code can be shared by multiple front-end configuration files
(i.e. memory tester, full system, and hopefully the regression tester). This
code works for memory tester, but have not tested fs mode.
Modified ruby's tracing support to no longer rely on the RubySystem map
to convert a sequencer string name to a sequencer pointer. As a
temporary solution, the code uses the sim_object find function.
Eventually, we should develop a better fix.
This patch includes a rather substantial change to the memory controller
profiler in order to work with the new configuration system. Most
noteably, the mem_cntrl_profiler no longer uses a string map, but instead
a vector. Eventually this support should be removed from the main
profiler and go into a separate object. Each memory controller should have
a pointer to that new mem_cntrl profile object.
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
The necessary companion conversion of Ruby objects generated by SLICC
are converted to M5 SimObjects in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
Though OutPort's message type is not used to generate code, this fix checks
that the programmer's intent is correct. Eventually, we may want to
remove the message type from the OutPort declaration statement.
1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, but adding an function to do optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
Added error messages when:
- a state does not exist in a machine's list of known states.
- an event does not exist in a machine
- the actions of a certain machine have not been declared
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
and be forwarded to a special pio bus connecting to device pio
ports.
Right now .cc and .hh files are handled separately, but then
they're just munged together at the end by scons, so it
doesn't buy us anything. Might as well munge from the start
since we'll eventually be adding generated Python files
to the list too.
This mostly was a matter of changing the license owner to Princeton
which is as it should have been. The code was originally licensed
under the GPL but was relicensed as BSD by Li-Shiuan Peh on July 27,
2009. This relicensing was in an explicit e-mail to Nathan Binkert,
Brad Beckmann, Mark Hill, David Wood, and Steve Reinhardt.
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
This is simply a translation of the C++ slicc into python with very minimal
reorganization of the code. The output can be verified as nearly identical
by doing a "diff -wBur".
Slicc can easily be run manually by using util/slicc
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
This changeset contains a lot of different changes that are too
mingled to separate. They are:
1. Added MOESI_CMP_directory
I made the changes necessary to bring back MOESI_CMP_directory,
including adding a DMA controller. I got rid of MOESI_CMP_directory_m
and made MOESI_CMP_directory use a memory controller. Added a new
configuration for two level protocols in general, and
MOESI_CMP_directory in particular.
2. DMA Sequencer uses a generic SequencerMsg
I will eventually make the cache Sequencer use this type as well. It
doesn't contain an offset field, just a physical address and a length.
MI_example has been updated to deal with this.
3. Parameterized Controllers
SLICC controllers can now take custom parameters to use for mapping,
latencies, etc. Currently, only int parameters are supported.
The inconsistency was causing a subtle bug with some of the
constructors where the params had the same name as the fields.
This is also a first step to switching the accessors over to
our new "standard", e.g., getVaddr() -> vaddr().
Caches are now responsible for their own statistic gathering. This
requires a direct callback from the protocol on misses, and so all
future protocols need to take this into account.
The DMASequencer was still using a parameter from the old RubyConfig,
causing an offset error when the requested data wasn't block aligned.
This changeset also includes a fix to MI_example for a similar bug.
2. Reintroduced RMW_Read and RMW_Write
3. Defined -2 in the Sequencer as well as made a note about mandatory queue
Did not address the issues in the slicc because remaking the atomics altogether to allow
multiple processors to issue atomic requests at once
This also includes a change to the default Ruby random seed, which was
previously set using the wall clock. It is now set to 1234 so that
the stat files don't change for the regression tester.
This was done with an automated process, so there could be things that were
done in this tree in the past that didn't make it. One known regression
is that atomic memory operations do not seem to work properly anymore.
This changeset also includes a lot of work from Derek Hower <drh5@cs.wisc.edu>
RubyMemory is now both a driver for Ruby and a port for M5. Changed
makeRequest/hitCallback interface. Brought packets (superficially)
into the sequencer. Modified tester infrastructure to be packet based.
and Ruby can be used together through the example ruby_se.py
script. SPARC parallel applications work, and the timing *seems* right
from combined M5/Ruby debug traces. To run,
% build/ALPHA_SE/m5.debug configs/example/ruby_se.py -c
tests/test-progs/hello/bin/alpha/linux/hello -n 4 -t
1. removed checks from tester files
2. removed else clause in Sequencer and DirectoryMemory else clause is
needed by the tester, it is up to Derek to revive it elsewhere when he
gets to it
Also:
1. Changed m_entries in DirectoryMemory to a map
2. And replaced SIMICS_read_physical_memory with a call to now-dummy
Derek's-to-be readPhysMem function
Add the PROTOCOL sticky option sets the coherence protocol that slicc
will parse and therefore ruby will use. This whole process was made
difficult by the fact that the set of files that are output by slicc
are not easily known ahead of time. The easiest thing wound up being
to write a parser for slicc that would tell me. Incidentally this
means we now have a slicc grammar written in python.
This basically means changing all #include statements and changing
autogenerated code so that it generates the correct paths. Because
slicc generates #includes, I had to hard code the include paths to
mem/protocol.
1) Removing files from the ruby build left some unresovled
symbols. Those have been fixed.
2) Most of the dependencies on Simics data types and the simics
interface files have been removed.
3) Almost all mention of opal is gone.
4) Huge chunks of LogTM are now gone.
5) Handling 1-4 left ~hundreds of unresolved references, which were
fixed, yielding a snowball effect (and the massive size of this
delta).
I did the macro cleanup because I was worried that the SCons scanner
would get confused. This code will hopefully go away soon anyway.
--HG--
rename : src/mem/ruby/config/config.include => src/mem/ruby/config/config.hh
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.