Re-enabling implicit parenting (see previous patch) causes current
Ruby config scripts to create some strange hierarchies and generate
several warnings. This patch makes three general changes to address
these issues.
1. The order of object creation in the ruby config files makes the L1
caches children of the sequencer rather than the controller; these
config ciles are rewritten to assign the L1 caches to the
controller first.
2. The assignment of the sequencer list to system.ruby.cpu_ruby_ports
causes the sequencers to be children of system.ruby, generating
warnings because they are already parented to their respective
controllers. Changing this attribute to _cpu_ruby_ports fixes this
because the leading underscore means this is now treated as a plain
Python attribute rather than a child assignment. As a result, the
configuration hierarchy changes such that, e.g.,
system.ruby.cpu_ruby_ports0 becomes system.l1_cntrl0.sequencer.
3. In the topology classes, the routers become children of some random
internal link node rather than direct children of the topology.
The topology classes are rewritten to assign the routers to the
topology object first.
The virtual channels within "response" vnets are made buffers_per_data_vc
deep (default=4), while virtual channels within other vnets are made
buffers_per_ctrl_vc deep (default = 1). This is for accurate power estimates.
Identifying response vnets versus other vnets will allow garnet to
determine which vnets will carry data packets, and which will carry
ctrl packets, and use appropriate buffer sizes (since data packets are larger
than ctrl packets). This in turn allows the orion power model to accurately
estimate buffer power.
Renamed (message) class to vnet for consistency with rest of ruby.
Moved some parameters specific to fixed/flexible garnet networks into their
corresponding py files.
The RubyMemory flag wasnt used in the code, creating large gaps in trace output. Replace cprintfs w/dprintfs
using RubyMemory in memory controller. DPRINTF also deprecate the usage of the setDebug() pure virtual
function in the AbstractMemoryOrCache Class as well the m_debug/cprintf functions in MemoryControl.hh/cc
The simple network's endpoint bandwidth value is used to adjust the overall
bandwidth of the network. Specifically, the ration between endpoint bandwidth
and the MESSAGE_SIZE_MULTIPLIER determines the increase. By setting the value
to 1000, that means the bandwdith factor specified in the links translates to
the link bandwidth in bytes. Previously, it was increasing that value by 10.
This patch will likely require a reset of the ruby regression tester stats.
Moved the buffer_size, endpoint_bandwidth, and adaptive_routing params out of
the top-level parent network object and to only those networks that actually
use those parameters.
This patch ensures that both Garnet and the simple networks use the bw value
specified in the topology. To do so, the patch generalizes the specification
of bw for basic links. This value is then translated to the specific value
used by the simple and Garnet networks. Since Garent does not support
non-uniformed link bandwidth, the patch also adds a check to ensure all bws are
equal.
--HG--
rename : src/mem/ruby/network/BasicLink.cc => src/mem/ruby/network/simple/SimpleLink.cc
rename : src/mem/ruby/network/BasicLink.hh => src/mem/ruby/network/simple/SimpleLink.hh
rename : src/mem/ruby/network/BasicLink.py => src/mem/ruby/network/simple/SimpleLink.py
This patch converts links and switches from second class simobjects that were
virtually ignored by the networks (both simple and Garnet) to first class
simobjects that directly correspond to c++ ojbects manipulated by the
topology and network classes. This is especially true for Garnet, where the
links and switches directly correspond to specific C++ objects.
By making this change, many aspects of the Topology class were simplified.
--HG--
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/BasicLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/BasicLink.hh
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetRouter_d.py
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetRouter.py
Moved the Topology class to the top network directory because it is shared by
both the simple and Garnet networks.
--HG--
rename : src/mem/ruby/network/simple/Topology.cc => src/mem/ruby/network/Topology.cc
rename : src/mem/ruby/network/simple/Topology.hh => src/mem/ruby/network/Topology.hh
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
Fixed an error reguarding DMA for uninprocessor systems. Basically removed an
overly agressive optimization that lead to inconsistent state between the
cache and the directory.
This function duplicates the functionality of allocate() exactly, except that it does not return
a return value. In protocols where you just want to allocate a block
but do not want that block to be your implicitly passed cache_entry, use this function.
Otherwise, SLICC will complain if you do not consume the pointer returned by allocate(),
and if you do a dummy assignment Entry foo := cache.allocate(address), the C++
compiler will complain of an unused variable. This is kind of a hack to get around
those issues, but suggestions welcome.
Before this changeset, all local variables of type Entry and TBE were considered
to be pointers, but an immediate use of said variables would not be automatically
deferenced in SLICC-generated code. Instead, deferences occurred when such
variables were passed to functions, and were automatically dereferenced in
the bodies of the functions (e.g. the implicitly passed cache_entry).
This is a more general way to do it, which leaves in place the
assumption that parameters to functions and local variables of type AbstractCacheEntry
and TBE are always pointers, but instead of dereferencing to access member variables
on a contextual basis, the dereferencing automatically occurs on a type basis at the
moment a member is being accessed. So, now, things you can do that you couldn't before
include:
Entry foo := getCacheEntry(address);
cache_entry.DataBlk := foo.DataBlk;
or
cache_entry.DataBlk := getCacheEntry(address).DataBlk;
or even
cache_entry.DataBlk := static_cast(Entry, pointer, cache.lookup(address)).DataBlk;
This is a substitute for MessageBuffers between controllers where you don't
want messages to actually go through the Network, because requests/responses can
always get reordered wrt to one another (even if you turn off Randomization and turn on Ordered)
because you are, after all, going through a network with contention. For systems where you model
multiple controllers that are very tightly coupled and do not actually go through a network,
it is a pain to have to write a coherence protocol to account for mixed up request/response orderings
despite the fact that it's completely unrealistic. This is *not* meant as a substitute for real
MessageBuffers when messages do in fact go over a network.
It is useful for Ruby to understand from whence request packets came.
This has all request packets going into Ruby pass the contextId value, if
it exists. This supplants the old libruby proc_id value passed around in
all the Messages, so I've also removed the unused unsigned proc_id; member
generated by SLICC for all Message types.
The goal of the patch is to do away with the CacheMsg class currently in use
in coherence protocols. In place of CacheMsg, the RubyRequest class will used.
This class is already present in slicc_interface/RubyRequest.hh. In fact,
objects of class CacheMsg are generated by copying values from a RubyRequest
object.
The tester code is in testers/networktest.
The tester can be invoked by configs/example/ruby_network_test.py.
A dummy coherence protocol called Network_test is also addded for network-only simulations and testing. The protocol takes in messages from the tester and just pushes them into the network in the appropriate vnet, without storing any state.
I had recently committed a patch that removed the WakeUp*.py files from the
slicc/ast directory. I had forgotten to remove the import calls for these
files from slicc/ast/__init__.py. This resulted in error while running
regressions on zizzer. This patch does the needful.
This patch fixes the problem where Ruby would fail to call sendRetry on ports
after it nacked the port. This patch is particularly helpful for bursty dma
requests which often include several packets.
In SLICC, in order to define a type a data type for which it should not
generate any code, the keyword external_type is used. For those data types for
which code should be generated, the keyword structure is used. This patch
eliminates the use of keyword external_type for defining structures. structure
key word can now have an optional attribute external, which would be used for
figuring out whether or not to generate the code for this structure. Also, now
structures can have functions as well data members in them.
In order to add stall and wait facility for protocols, a keyword
wake_up_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
In order to add stall and wait facility for protocols, a keyword
wake_up_all_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
None of the code in the ruby tester directory is compiled or referred to
outside of that directory. This change eliminates it. If it's needed in the
future, it can be revived from the history. In the mean time, this removes
clutter and the only use of the GEMS_ROOT scons variable.
There may not be a formally correct spelling for the past tense of mmap, but
mmapped is the spelling Google doesn't try to autocorrect. This makes sense
because it mirrors the past tense of map->mapped and not the past tense of
cape->caped.
--HG--
rename : src/arch/alpha/mmaped_ipr.hh => src/arch/alpha/mmapped_ipr.hh
rename : src/arch/arm/mmaped_ipr.hh => src/arch/arm/mmapped_ipr.hh
rename : src/arch/mips/mmaped_ipr.hh => src/arch/mips/mmapped_ipr.hh
rename : src/arch/power/mmaped_ipr.hh => src/arch/power/mmapped_ipr.hh
rename : src/arch/sparc/mmaped_ipr.hh => src/arch/sparc/mmapped_ipr.hh
rename : src/arch/x86/mmaped_ipr.hh => src/arch/x86/mmapped_ipr.hh
At a couple of places in PerfectSwitch.cc and MessageBuffer.cc, DPRINTF()
has not been provided with correct number of arguments. The patch fixes these
bugs.
This patch removes the store buffer from Ruby. It is not in use currently.
Since libruby is being and store buffer makes calls to libruby, it is not
possible to maintain it until substantial changes are made.
This patch changes Address.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depend on
RubySystem have been moved to Address.cc file.
This patch changes DataBlock.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depende on
RubySystem have been moved to DataBlock.cc file.
This patch integrates permissions with cache and memory states, and then
automates the setting of permissions within the generated code. No longer
does one need to manually set the permissions within the setState funciton.
This patch will faciliate easier functional access support by always correctly
setting permissions for both cache and memory states.
--HG--
rename : src/mem/slicc/ast/EnumDeclAST.py => src/mem/slicc/ast/StateDeclAST.py
rename : src/mem/slicc/ast/TypeFieldEnumAST.py => src/mem/slicc/ast/TypeFieldStateAST.py
"executing" isnt a very descriptive debug message and in going through the
output you get multiple messages that say "executing" but nothing to help
you parse through the code/execution.
So instead, at least print out the name of the action that is taking
place in these functions.
Overall, continue to progress Ruby debug messages to more of the normal M5
debug message style
- add a name() to the Ruby Throttle & PerfectSwitch objects so that the debug output
isn't littered w/"global:" everywhere.
- clean up messages that print over multiple lines when possible
- clean up duplicate prints in the message buffer
In certain actions of the L1 cache controller, while creating an outgoing
message, the machine type was not being set. This results in a
segmentation fault when trace is collected. Joseph Pusudesris provided
his patch for fixing this issue.
Currently the wakeup function for the PerfectSwitch contains three loops -
loop on number of virtual networks
loop on number of incoming links
loop till all messages for this (link, network) have been routed
With an 8 processor mesh network and Hammer protocol, about 11-12% of the
was observed to have been spent in this function, which is the highest
amongst all the functions. It was found that the innermost loop is executed
about 45 times per invocation of the wakeup function, when each invocation
of the wakeup function processes just about one message.
The patch tries to do away with the redundant executions of the innermost
loop. Counters have been added for each virtual network that record the
number of messages that need to be routed for that virtual network. The
inner loops are only executed when the number of messages for that particular
virtual network > 0. This does away with almost 80% of the executions of the
innermost loop. The function now consumes about 5-6% of the total execution
time.
The patch changes the order in which L1 dcache and icache are looked up when
a request comes in. Earlier, if a request came in for instruction fetch, the
dcache was looked up before the icache, to correctly handle self-modifying
code. But, in the common case, dcache is going to report a miss and the
subsequent icache lookup is going to report a hit. Given the invariant -
caches under the same controller keep track of disjoint sets of cache blocks,
we can move the icache lookup before the dcache lookup. In case of a hit in
the icache, using our invariant, we know that the dcache would have reported
a miss. In case of a miss in the icache, we know that icache would have
missed even if the dcache was looked up before looking up the icache.
Effectively, we are doing the same thing as before, though in the common case,
we expect reduction in the number of lookups. This was empirically confirmed
for MOESI hammer. The ratio lookups to access requests is now about 1.1 to 1.
The TBE pointer in the MESI CMP implementation was not being set to NULL
when the TBE is deallocated. This resulted in segmentation fault on testing
the protocol when the ProtocolTrace was switched on.
The code for Orion 2.0 makes use of printf() at several places where there as
an error in configuration of the model. These have been replaced with fatal().
By stalling and waiting the mandatory queue instead of recycling it, one can
ensure that no incoming messages are starved when the mandatory queue puts
signficant of pressure on the L1 cache controller (i.e. the ruby memtester).
--HG--
rename : src/mem/slicc/ast/WakeUpDependentsStatementAST.py => src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py
The packet now identifies whether static or dynamic data has been allocated and
is used by Ruby to determine whehter to copy the data pointer into the ruby
request. Subsequently, Ruby can be told not to update phys memory when
receiving packets.
Separate data VCs and ctrl VCs in garnet, as ctrl VCs have 1 buffer per VC,
while data VCs have > 1 buffers per VC. This is for correct power estimations.
The purpose of this patch is to change the way CacheMemory interfaces with
coherence protocols. Currently, whenever a cache controller (defined in the
protocol under consideration) needs to carry out any operation on a cache
block, it looks up the tag hash map and figures out whether or not the block
exists in the cache. In case it does exist, the operation is carried out
(which requires another lookup). As observed through profiling of different
protocols, multiple such lookups take place for a given cache block. It was
noted that the tag lookup takes anything from 10% to 20% of the simulation
time. In order to reduce this time, this patch is being posted.
I have to acknowledge that the many of the thoughts that went in to this
patch belong to Brad.
Changes to CacheMemory, TBETable and AbstractCacheEntry classes:
1. The lookup function belonging to CacheMemory class now returns a pointer
to a cache block entry, instead of a reference. The pointer is NULL in case
the block being looked up is not present in the cache. Similar change has
been carried out in the lookup function of the TBETable class.
2. Function for setting and getting access permission of a cache block have
been moved from CacheMemory class to AbstractCacheEntry class.
3. The allocate function in CacheMemory class now returns pointer to the
allocated cache entry.
Changes to SLICC:
1. Each action now has implicit variables - cache_entry and tbe. cache_entry,
if != NULL, must point to the cache entry for the address on which the action
is being carried out. Similarly, tbe should also point to the transaction
buffer entry of the address on which the action is being carried out.
2. If a cache entry or a transaction buffer entry is passed on as an
argument to a function, it is presumed that a pointer is being passed on.
3. The cache entry and the tbe pointers received __implicitly__ by the
actions, are passed __explicitly__ to the trigger function.
4. While performing an action, set/unset_cache_entry, set/unset_tbe are to
be used for setting / unsetting cache entry and tbe pointers respectively.
5. is_valid() and is_invalid() has been made available for testing whether
a given pointer 'is not NULL' and 'is NULL' respectively.
6. Local variables are now available, but they are assumed to be pointers
always.
7. It is now possible for an object of the derieved class to make calls to
a function defined in the interface.
8. An OOD token has been introduced in SLICC. It is same as the NULL token
used in C/C++. If you are wondering, OOD stands for Out Of Domain.
9. static_cast can now taken an optional parameter that asks for casting the
given variable to a pointer of the given type.
10. Functions can be annotated with 'return_by_pointer=yes' to return a
pointer.
11. StateMachine has two new variables, EntryType and TBEType. EntryType is
set to the type which inherits from 'AbstractCacheEntry'. There can only be
one such type in the machine. TBEType is set to the type for which 'TBE' is
used as the name.
All the protocols have been modified to conform with the new interface.
This patch changes the manner in which data is copied from L1 to L2 cache in
the implementation of the Hammer's cache coherence protocol. Earlier, data was
copied directly from one cache entry to another. This has been broken in to
two parts. First, the data is copied from the source cache entry to a
transaction buffer entry. Then, data is copied from the transaction buffer
entry to the destination cache entry.
This has been done to maintain the invariant - at any given instant, multiple
caches under a controller are exclusive with respect to each other.
Ran all the source files through 'perl -pi' with this script:
s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|;
s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|;
s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;
Also did a little manual editing on some of the arch/*/isa_traits.hh files
and src/SConscript.
Two functions in src/mem/ruby/system/PerfectCacheMemory.hh, tryCacheAccess()
and cacheProbe(), end with calls to panic(). Both of these functions have
return type other than void. Any file that includes this header file fails
to compile because of the missing return statement. This patch adds dummy
values so as to avoid the compiler warnings.
This diff is for changing the way ASSERT is handled in Ruby. m5.fast
compiles out the assert statements by using the macro NDEBUG. Ruby uses the
macro RUBY_NO_ASSERT to do so. This macro has been removed and NDEBUG has
been put in its place.
These flags were being used to identify what alignment a request needed, but
the same information is available using the request size. This change also
eliminates the isMisaligned function. If more complicated alignment checks are
needed, they can be signaled using the ASI_BITS space in the flags vector like
is currently done with ARM.
If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive. This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
Corrects an oversight in cset f97b62be544f. The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache. To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
Allow lower-level caches (e.g., L2 or L3) to pass exclusive
copies to higher levels (e.g., L1). This eliminates a lot
of unnecessary upgrade transactions on read-write sequences
to non-shared data.
Also some cleanup of MSHR coherence handling and multiple
bug fixes.
This patch allows messages to be stalled in their input buffers and wait
until a corresponding address changes state. In order to make this work,
all in_ports must be ranked in order of dependence and those in_ports that
may unblock an address, must wake up the stalled messages. Alot of this
complexity is handled in slicc and the specification files simply
annotate the in_ports.
--HG--
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/StallAndWaitStatementAST.py
rename : src/mem/slicc/ast/CheckAllocateStatementAST.py => src/mem/slicc/ast/WakeUpDependentsStatementAST.py
Patch allows each individual message buffer to have different recycle latencies
and allows the overall recycle latency to be specified at the cmd line. The
patch also adds profiling info to make sure no one processor's requests are
recycled too much.
The main purpose for clearing stats in the unserialize process is so
that the profiler can correctly set its start time to the unserialized
value of curTick.
This patch allows one to disable migratory sharing for those cache blocks that
are accessed by atomic requests. While the implementations are different
between the token and hammer protocols, the motivation is the same. For
Alpha, LLSC semantics expect that normal loads do not unlock cache blocks that
have been locked by LL accesses. Therefore, locked blocks should not transfer
write permissions when responding to these load requests. Instead, only they
only transfer read permissions so that the subsequent SC access can possibly
succeed.
This patch fixes several bugs related to previous inconsistent assumptions on
how many tokens the Owner had. Mike Marty should have fixes these bugs years
ago. :)
Previously, the MOESI_hammer protocol calculated the same latency for L1 and
L2 hits. This was because the protocol was written using the old ruby
assumption that L1 hits used the sequencer fast path. Since ruby no longer
uses the fast-path, the protocol delays L2 hits by placing them on the
trigger queue.
The previous slower ruby latencies created a mismatch between the faster M5
cpu models and the much slower ruby memory system. Specifically smp
interrupts were much slower and infrequent, as well as cpus moving in and out
of spin locks. The result was many cpus were idle for large periods of time.
These changes fix the latency mismatch.
This patch adds back to ruby the capability to understand the response time
for messages that hit in different levels of the cache heirarchy.
Specifically add support for the MI_example, MOESI_hammer, and MOESI_CMP_token
protocols.
This patch adds DMA testing to the Memtester and is inherits many changes from
Polina's old tester_dma_extension patch. Since Ruby does not work in atomic
mode, the atomic mode options are removed.
Clean up some minor things left over from the default responder
change in rev 9af6fb59752f. Mostly renaming the 'responder_set'
param to 'use_default_range' to actually reflect what it does...
old name wasn't that descriptive in the first place, but now
it really doesn't make sense at all.
Also got rid of the bogus obsolete assignment to 'bus.responder'
which used to be a parameter but now is interpreted as an
implicit child assignment, and which was giving me problems in
the config restructuring to come. (A good argument for not
allowing implicit child assignments, IMO, but that's water under
the bridge, I'm afraid.)
Also moved the Bus constructor to the .cc file since that's
where it should have been all along.
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
One big difference is that PrioHeap puts the smallest element at the
top of the heap, whereas stl puts the largest element on top, so I
changed all comparisons so they did the right thing.
Some usage of PrioHeap was simply changed to a std::vector, using sort
at the right time, other usage had me just use the various heap functions
in the stl.
This was somewhat tricky because the RefCnt API was somewhat odd. The
biggest confusion was that the the RefCnt object's constructor that
took a TYPE& cloned the object. I created an explicit virtual clone()
function for things that took advantage of this version of the
constructor. I was conservative and used clone() when I was in doubt
of whether or not it was necessary. I still think that there are
probably too many instances of clone(), but hopefully not too many.
I converted several instances of const MsgPtr & to a simple MsgPtr.
If the function wants to avoid the overhead of creating another
reference, then it should just use a regular pointer instead of a ref
counting ptr.
There were a couple of instances where refcounted objects were created
on the stack. This seems pretty dangerous since if you ever
accidentally make a reference to that object with a ref counting
pointer, bad things are bound to happen.
Further cleanup should probably be done to make this class be non-Ruby
specific and put it in src/base.
There are probably several cases where this class is used, std::bitset
could be used instead.
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
Previously, the set size was set to 4. This was mostly do to the fact that a
crazy graduate student use to create networks with 256 l2 cache banks. Now it
is far more likely that users will create systems with less than 64 of any
particular controller type. Therefore Ruby should be optimized for a set size
of 1.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
This patch includes the necessary regression updates to test the new ruby
configuration system. The patch includes support for multiple ruby protocols
and adds the ruby random tester. The patch removes atomic mode test for
ruby since ruby does not support atomic mode acceses. These tests can be
added back in when ruby supports atomic mode for real.
--HG--
rename : tests/quick/50.memtest/test.py => tests/quick/60.rubytest/test.py
Removed the dummy power function implementations so that Orion can implement
them correctly. Since Orion lacks modular design, this patch simply enables
scons to compile it. There are no python configuration changes in this patch.
Renamed the MESI directory file to be consistent with all other protocols.
--HG--
rename : src/mem/protocol/MESI_CMP_directory-mem.sm => src/mem/protocol/MESI_CMP_directory-dir.sm
Cleaned up the ruby profilers by moving the memory controller profiling code
out of the main profiler object and into a separate object similar to the
current CacheProfiler. Both the CacheProfiler and MemCntrlProfiler are
specific to a particular Ruby object, CacheMemory and MemoryControl
respectively. Therefore, these profilers should not be SimObjects and
created by the python configuration system, but instead private objects. This
simplifies the creation of these profilers.
Reorganized ruby python configuration so that protocol and ruby memory system
configuration code can be shared by multiple front-end configuration files
(i.e. memory tester, full system, and hopefully the regression tester). This
code works for memory tester, but have not tested fs mode.
Modified ruby's tracing support to no longer rely on the RubySystem map
to convert a sequencer string name to a sequencer pointer. As a
temporary solution, the code uses the sim_object find function.
Eventually, we should develop a better fix.
This patch includes a rather substantial change to the memory controller
profiler in order to work with the new configuration system. Most
noteably, the mem_cntrl_profiler no longer uses a string map, but instead
a vector. Eventually this support should be removed from the main
profiler and go into a separate object. Each memory controller should have
a pointer to that new mem_cntrl profile object.
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
The necessary companion conversion of Ruby objects generated by SLICC
are converted to M5 SimObjects in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
Though OutPort's message type is not used to generate code, this fix checks
that the programmer's intent is correct. Eventually, we may want to
remove the message type from the OutPort declaration statement.
1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, but adding an function to do optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
Added error messages when:
- a state does not exist in a machine's list of known states.
- an event does not exist in a machine
- the actions of a certain machine have not been declared
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
and be forwarded to a special pio bus connecting to device pio
ports.
Right now .cc and .hh files are handled separately, but then
they're just munged together at the end by scons, so it
doesn't buy us anything. Might as well munge from the start
since we'll eventually be adding generated Python files
to the list too.
This mostly was a matter of changing the license owner to Princeton
which is as it should have been. The code was originally licensed
under the GPL but was relicensed as BSD by Li-Shiuan Peh on July 27,
2009. This relicensing was in an explicit e-mail to Nathan Binkert,
Brad Beckmann, Mark Hill, David Wood, and Steve Reinhardt.
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
This is simply a translation of the C++ slicc into python with very minimal
reorganization of the code. The output can be verified as nearly identical
by doing a "diff -wBur".
Slicc can easily be run manually by using util/slicc
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
This changeset contains a lot of different changes that are too
mingled to separate. They are:
1. Added MOESI_CMP_directory
I made the changes necessary to bring back MOESI_CMP_directory,
including adding a DMA controller. I got rid of MOESI_CMP_directory_m
and made MOESI_CMP_directory use a memory controller. Added a new
configuration for two level protocols in general, and
MOESI_CMP_directory in particular.
2. DMA Sequencer uses a generic SequencerMsg
I will eventually make the cache Sequencer use this type as well. It
doesn't contain an offset field, just a physical address and a length.
MI_example has been updated to deal with this.
3. Parameterized Controllers
SLICC controllers can now take custom parameters to use for mapping,
latencies, etc. Currently, only int parameters are supported.
The inconsistency was causing a subtle bug with some of the
constructors where the params had the same name as the fields.
This is also a first step to switching the accessors over to
our new "standard", e.g., getVaddr() -> vaddr().
Caches are now responsible for their own statistic gathering. This
requires a direct callback from the protocol on misses, and so all
future protocols need to take this into account.
The DMASequencer was still using a parameter from the old RubyConfig,
causing an offset error when the requested data wasn't block aligned.
This changeset also includes a fix to MI_example for a similar bug.
2. Reintroduced RMW_Read and RMW_Write
3. Defined -2 in the Sequencer as well as made a note about mandatory queue
Did not address the issues in the slicc because remaking the atomics altogether to allow
multiple processors to issue atomic requests at once
This also includes a change to the default Ruby random seed, which was
previously set using the wall clock. It is now set to 1234 so that
the stat files don't change for the regression tester.
This was done with an automated process, so there could be things that were
done in this tree in the past that didn't make it. One known regression
is that atomic memory operations do not seem to work properly anymore.
This changeset also includes a lot of work from Derek Hower <drh5@cs.wisc.edu>
RubyMemory is now both a driver for Ruby and a port for M5. Changed
makeRequest/hitCallback interface. Brought packets (superficially)
into the sequencer. Modified tester infrastructure to be packet based.
and Ruby can be used together through the example ruby_se.py
script. SPARC parallel applications work, and the timing *seems* right
from combined M5/Ruby debug traces. To run,
% build/ALPHA_SE/m5.debug configs/example/ruby_se.py -c
tests/test-progs/hello/bin/alpha/linux/hello -n 4 -t
1. removed checks from tester files
2. removed else clause in Sequencer and DirectoryMemory else clause is
needed by the tester, it is up to Derek to revive it elsewhere when he
gets to it
Also:
1. Changed m_entries in DirectoryMemory to a map
2. And replaced SIMICS_read_physical_memory with a call to now-dummy
Derek's-to-be readPhysMem function
Add the PROTOCOL sticky option sets the coherence protocol that slicc
will parse and therefore ruby will use. This whole process was made
difficult by the fact that the set of files that are output by slicc
are not easily known ahead of time. The easiest thing wound up being
to write a parser for slicc that would tell me. Incidentally this
means we now have a slicc grammar written in python.
This basically means changing all #include statements and changing
autogenerated code so that it generates the correct paths. Because
slicc generates #includes, I had to hard code the include paths to
mem/protocol.
1) Removing files from the ruby build left some unresovled
symbols. Those have been fixed.
2) Most of the dependencies on Simics data types and the simics
interface files have been removed.
3) Almost all mention of opal is gone.
4) Huge chunks of LogTM are now gone.
5) Handling 1-4 left ~hundreds of unresolved references, which were
fixed, yielding a snowball effect (and the massive size of this
delta).
I did the macro cleanup because I was worried that the SCons scanner
would get confused. This code will hopefully go away soon anyway.
--HG--
rename : src/mem/ruby/config/config.include => src/mem/ruby/config/config.hh
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
This frees up needed space for more public flags. Also:
- remove unused Request accessor methods
- make Packet use public Request accessors, so it need not be a friend
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.
Bogus calls to ChunkGenerator with negative size were triggering
a new assertion that was added there.
Also did a little renaming and cleanup in the process.
I think readData() and writeData() were used for Erik's compression
work, but that code is gone, these aren't called anymore, and they
don't even really do what their names imply.
I did some of the flags and assertions wrong. Thanks to Brad Beckmann
for pointing this out. I should have run the opt regressions instead
of the fast. I also screwed up some of the logical functions in the Flags
class.
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.
Since the early days of M5, an event needed to know which event queue
it was on, and that data was required at the time of construction of
the event object. In the future parallelized M5, this sort of
requirement does not work well since the proper event queue will not
always be known at the time of construction of an event. Now, events
are created, and the EventQueue itself has the schedule function,
e.g. eventq->schedule(event, when). To simplify the syntax, I created
a class called EventManager which holds a pointer to an EventQueue and
provides the schedule interface that is a proxy for the EventQueue.
The intent is that objects that frequently schedule events can be
derived from EventManager and then they have the schedule interface.
SimObject and Port are examples of objects that will become
EventManagers. The end result is that any SimObject can just call
schedule(event, when) and it will just call that SimObject's
eventq->schedule function. Of course, some objects may have more than
one EventQueue, so this interface might not be perfect for those, but
they should be relatively few.
This appears to work, but I don't want to commit it until it gets tested a lot more.
I haven't deleted the functionality in this patch that will come later, but one question
is how to enforce encourage objects that call getVirtPort() to not cache the virtual port
since if the CPU changes out from under them it will be worse than useless. Perhaps a null
function like delVirtPort() is still useful in that case.
It runs out that if a MemObject turns around and does a send in its
receive callback, and there are other sends already scheduled, then
it could observe a state where it's not at the head of the list but
the bus's sendEvent is not scheduled (because we're still in the
middle of processing the prior sendEvent).
I was asserting that the only reason you would defer targets is if
a write came in while you had an outstanding read miss, but there's
another case where you could get a read access after you've snooped
an invalidation and buffered it because it applies to a prior
outstanding miss.
Make OutputDirectory::resolve() private and change the functions using
resolve() to instead use create().
--HG--
extra : convert_revision : 36d4be629764d0c4c708cec8aa712cd15f966453
if a prior write miss arrived while an even earlier
read miss was still outstanding.
--HG--
extra : convert_revision : 4924e145829b2ecf4610b88d33f4773510c6801a
where we defer a response to a read from a far-away cache A, then later
defer a ReadExcl from a cache B on the same bus as us. We'll assert
MemInhibit in both cases, but in the latter case MemInhibit will keep
the invalidation from reaching cache A. This special response tells
cache A that it gets the block to satisfy its read, but must immediately
invalidate it.
--HG--
extra : convert_revision : f85c8b47bb30232da37ac861b50a6539dc81161b
Don't mark upstream MSHR as pending if downstream MSHR is already in service.
--HG--
extra : convert_revision : e1c135ff00217291db58ce8a06ccde34c403d37f
Not so much noise on failed sends, and more complete
info when grepping a trace using an address.
--HG--
extra : convert_revision : 05a8261c9452072ca08b906200c6322b33e2b9f1
SimObjects not yet updated:
- Process and subclasses
- BaseCPU and subclasses
The SimObject(const std::string &name) constructor was removed. Subclasses
that still rely on that behavior must call the parent initializer as
: SimObject(makeParams(name))
--HG--
extra : convert_revision : d6faddde76e7c3361ebdbd0a7b372a40941c12ed
The page table now stores actual page table entries. It is still a templated
class here, but this will be corrected in the near future.
--HG--
extra : convert_revision : 804dcc6320414c2b3ab76a74a15295bd24e1d13d
Make sure not to keep processing functional accesses
after they've been responded to.
Also use checkFunctional() return value instead of checking
packet command field where possible, mostly just for consistency.
--HG--
extra : convert_revision : 29fc76bc18731bd93a4ed05a281297827028ef75
creation and initialization now happens in python. Parameter objects
are generated and initialized by python. The .ini file is now solely for
debugging purposes and is not used in construction of the objects in any
way.
--HG--
extra : convert_revision : 7e722873e417cb3d696f2e34c35ff488b7bff4ed
Turns out DeferredSnoop isn't quite the right bit of info
we needed... see new comment in cache_impl.hh.
--HG--
extra : convert_revision : a38de8c1677a37acafb743b7074ef88b21d3b7be
If the invalidation beats the upgrade at a lower level
then the upgrade must be converted to a read exclusive
"in the field".
Restructure target list & deferred target list to
factor out some common code.
--HG--
extra : convert_revision : 7bab4482dd6c48efdb619610f0d3778c60ff777a
- Add "deferred snoop" flag to Packet so upper-level caches
can distinguish whether lower-level cache request was
in-service or not at the time of the original snoop.
- Revamp response handling to properly handle deferred snoops
on non-cache-fill requests (i.e. upgrades).
- Make sure forwarded writebacks are kept in write buffer at
lower-level caches so they get snooped properly.
--HG--
extra : convert_revision : 17f8a3772a1ae31a16991a53f8225ddf54d31fc9
Move check for loops outside, since half the call sites
end up working around it anyway. Return integer port ID
instead of port object pointer.
--HG--
extra : convert_revision : 4c31fe9930f4d1aa4919e764efb7c50d43792ea3
Note that we should *not* print pointer values in DPRINTFs as
these needlessly clutter tracediff output.
--HG--
extra : convert_revision : 25a448f1b3ac8d453a717a104ad6dc0112fb30bb
src/cpu/simple/timing.cc:
Fix another SC problem.
src/mem/cache/cache_impl.hh:
Forgot to call makeTimingResponse() on uncached timing responses.
--HG--
extra : convert_revision : 5a5a58ca2053e4e8de2133205bfd37de15eb4209
Stats pretty much line up with old code, except:
- bug in old code included L1 latency in L2 miss time, making it too high
- UniCoherence did cache-to-cache transfers even from non-owner caches,
so occasionally the icache would get a block from the dcache not the L2
- L2 can now receive ReadExReq from L1 since L1s have coherence
--HG--
extra : convert_revision : 5052c1a1767b5a662f30a88f16012165a73b791c
Change target overflow from assertion to warning.
src/mem/cache/cache_impl.hh:
Change target overflow from assertion to warning.
--HG--
extra : convert_revision : ceca990ed916bbf96dedd4836c40df522803f173
src/mem/cache/cache_impl.hh:
Handle grants with no packet.
src/mem/cache/miss/mshr.cc:
Fix MSHR snoop hit handling.
--HG--
extra : convert_revision : f365283afddaa07cb9e050b2981ad6a898c14451
sure we don't re-request bus prematurely. Use callback to
avoid calling sendRetry() recursively within recvTiming.
--HG--
extra : convert_revision : a907a2781b4b00aa8eb1ea7147afc81d6b424140
src/cpu/memtest/memtest.cc:
Need to set packet source field so that response from cache
doesn't run into assertion failure when copying source to dest.
src/mem/packet.hh:
Copy source field when copying packets.
Assert that source is valid before copying it to dest
when turning packets around.
--HG--
extra : convert_revision : 09e3cfda424aa89fe170e21e955b295746832bf8
supposed to and make sure parameters have the right type.
Also make sure that any object that should be an intermediate
type has the right options set.
--HG--
extra : convert_revision : d56910628d9a067699827adbc0a26ab629d11e93
into vm1.(none):/home/stever/bk/newmem-cache2
configs/example/memtest.py:
Hand merge redundant changes.
--HG--
extra : convert_revision : a2e36be254bf052024f37bcb23b5209f367d37e1
timing mode still broken.
configs/example/memtest.py:
Revamp options.
src/cpu/memtest/memtest.cc:
No need for memory initialization.
No need to make atomic response... memory system should do that now.
src/cpu/memtest/memtest.hh:
MemTest really doesn't want to snoop.
src/mem/bridge.cc:
checkFunctional() cleanup.
src/mem/bus.cc:
src/mem/bus.hh:
src/mem/cache/base_cache.cc:
src/mem/cache/base_cache.hh:
src/mem/cache/cache.cc:
src/mem/cache/cache.hh:
src/mem/cache/cache_blk.hh:
src/mem/cache/cache_builder.cc:
src/mem/cache/cache_impl.hh:
src/mem/cache/coherence/coherence_protocol.cc:
src/mem/cache/coherence/coherence_protocol.hh:
src/mem/cache/coherence/simple_coherence.hh:
src/mem/cache/miss/SConscript:
src/mem/cache/miss/mshr.cc:
src/mem/cache/miss/mshr.hh:
src/mem/cache/miss/mshr_queue.cc:
src/mem/cache/miss/mshr_queue.hh:
src/mem/cache/prefetch/base_prefetcher.cc:
src/mem/cache/tags/fa_lru.cc:
src/mem/cache/tags/fa_lru.hh:
src/mem/cache/tags/iic.cc:
src/mem/cache/tags/iic.hh:
src/mem/cache/tags/lru.cc:
src/mem/cache/tags/lru.hh:
src/mem/cache/tags/split.cc:
src/mem/cache/tags/split.hh:
src/mem/cache/tags/split_lifo.cc:
src/mem/cache/tags/split_lifo.hh:
src/mem/cache/tags/split_lru.cc:
src/mem/cache/tags/split_lru.hh:
src/mem/packet.cc:
src/mem/packet.hh:
src/mem/physical.cc:
src/mem/physical.hh:
src/mem/tport.cc:
More major reorg. Seems to work for atomic mode now,
timing mode still broken.
--HG--
extra : convert_revision : 7e70dfc4a752393b911880ff028271433855ae87
using a divide in order to not loop forever after resuming from a checkpoint
--HG--
extra : convert_revision : 4bbc70b1be4e5c4ed99d4f88418ab620d5ce475a
Makes page table cache scheme actually work
src/mem/page_table.cc:
src/mem/page_table.hh:
fix caching scheme to actually work and improve performance
--HG--
extra : convert_revision : 443a8d8acbee540b26affcfdfbf107b8e735d1bd
Oops... forgot to update call site after changing
function argument semantics.
src/mem/tport.cc:
Oops... forgot to update call site after changing
function argument semantics.
--HG--
extra : convert_revision : 9234b991dc678f062d268ace73c71b3d13dd17dc
- factor out checkFunctional() code so it can be
called from derived classes
- use EventWrapper for sendEvent, move event handling
code from event to port where it belongs
- make sendEvent a pointer so derived classes can
override it
- replace std::pair with new class for readability
--HG--
extra : convert_revision : 5709de2daacfb751a440144ecaab5f9fc02e6b7a