Cleaned up the ruby profilers by moving the memory controller profiling code
out of the main profiler object and into a separate object similar to the
current CacheProfiler. Both the CacheProfiler and MemCntrlProfiler are
specific to a particular Ruby object, CacheMemory and MemoryControl
respectively. Therefore, these profilers should not be SimObjects and
created by the python configuration system, but instead private objects. This
simplifies the creation of these profilers.
Reorganized ruby python configuration so that protocol and ruby memory system
configuration code can be shared by multiple front-end configuration files
(i.e. memory tester, full system, and hopefully the regression tester). This
code works for memory tester, but have not tested fs mode.
Modified ruby's tracing support to no longer rely on the RubySystem map
to convert a sequencer string name to a sequencer pointer. As a
temporary solution, the code uses the sim_object find function.
Eventually, we should develop a better fix.
This patch includes a rather substantial change to the memory controller
profiler in order to work with the new configuration system. Most
noteably, the mem_cntrl_profiler no longer uses a string map, but instead
a vector. Eventually this support should be removed from the main
profiler and go into a separate object. Each memory controller should have
a pointer to that new mem_cntrl profile object.
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
The necessary companion conversion of Ruby objects generated by SLICC
are converted to M5 SimObjects in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
Though OutPort's message type is not used to generate code, this fix checks
that the programmer's intent is correct. Eventually, we may want to
remove the message type from the OutPort declaration statement.
1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, but adding an function to do optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
Added error messages when:
- a state does not exist in a machine's list of known states.
- an event does not exist in a machine
- the actions of a certain machine have not been declared
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
and be forwarded to a special pio bus connecting to device pio
ports.
Right now .cc and .hh files are handled separately, but then
they're just munged together at the end by scons, so it
doesn't buy us anything. Might as well munge from the start
since we'll eventually be adding generated Python files
to the list too.
This mostly was a matter of changing the license owner to Princeton
which is as it should have been. The code was originally licensed
under the GPL but was relicensed as BSD by Li-Shiuan Peh on July 27,
2009. This relicensing was in an explicit e-mail to Nathan Binkert,
Brad Beckmann, Mark Hill, David Wood, and Steve Reinhardt.
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
This is simply a translation of the C++ slicc into python with very minimal
reorganization of the code. The output can be verified as nearly identical
by doing a "diff -wBur".
Slicc can easily be run manually by using util/slicc
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
This changeset contains a lot of different changes that are too
mingled to separate. They are:
1. Added MOESI_CMP_directory
I made the changes necessary to bring back MOESI_CMP_directory,
including adding a DMA controller. I got rid of MOESI_CMP_directory_m
and made MOESI_CMP_directory use a memory controller. Added a new
configuration for two level protocols in general, and
MOESI_CMP_directory in particular.
2. DMA Sequencer uses a generic SequencerMsg
I will eventually make the cache Sequencer use this type as well. It
doesn't contain an offset field, just a physical address and a length.
MI_example has been updated to deal with this.
3. Parameterized Controllers
SLICC controllers can now take custom parameters to use for mapping,
latencies, etc. Currently, only int parameters are supported.
The inconsistency was causing a subtle bug with some of the
constructors where the params had the same name as the fields.
This is also a first step to switching the accessors over to
our new "standard", e.g., getVaddr() -> vaddr().
Caches are now responsible for their own statistic gathering. This
requires a direct callback from the protocol on misses, and so all
future protocols need to take this into account.
The DMASequencer was still using a parameter from the old RubyConfig,
causing an offset error when the requested data wasn't block aligned.
This changeset also includes a fix to MI_example for a similar bug.
2. Reintroduced RMW_Read and RMW_Write
3. Defined -2 in the Sequencer as well as made a note about mandatory queue
Did not address the issues in the slicc because remaking the atomics altogether to allow
multiple processors to issue atomic requests at once
This also includes a change to the default Ruby random seed, which was
previously set using the wall clock. It is now set to 1234 so that
the stat files don't change for the regression tester.
This was done with an automated process, so there could be things that were
done in this tree in the past that didn't make it. One known regression
is that atomic memory operations do not seem to work properly anymore.
This changeset also includes a lot of work from Derek Hower <drh5@cs.wisc.edu>
RubyMemory is now both a driver for Ruby and a port for M5. Changed
makeRequest/hitCallback interface. Brought packets (superficially)
into the sequencer. Modified tester infrastructure to be packet based.
and Ruby can be used together through the example ruby_se.py
script. SPARC parallel applications work, and the timing *seems* right
from combined M5/Ruby debug traces. To run,
% build/ALPHA_SE/m5.debug configs/example/ruby_se.py -c
tests/test-progs/hello/bin/alpha/linux/hello -n 4 -t
1. removed checks from tester files
2. removed else clause in Sequencer and DirectoryMemory else clause is
needed by the tester, it is up to Derek to revive it elsewhere when he
gets to it
Also:
1. Changed m_entries in DirectoryMemory to a map
2. And replaced SIMICS_read_physical_memory with a call to now-dummy
Derek's-to-be readPhysMem function
Add the PROTOCOL sticky option sets the coherence protocol that slicc
will parse and therefore ruby will use. This whole process was made
difficult by the fact that the set of files that are output by slicc
are not easily known ahead of time. The easiest thing wound up being
to write a parser for slicc that would tell me. Incidentally this
means we now have a slicc grammar written in python.
This basically means changing all #include statements and changing
autogenerated code so that it generates the correct paths. Because
slicc generates #includes, I had to hard code the include paths to
mem/protocol.
1) Removing files from the ruby build left some unresovled
symbols. Those have been fixed.
2) Most of the dependencies on Simics data types and the simics
interface files have been removed.
3) Almost all mention of opal is gone.
4) Huge chunks of LogTM are now gone.
5) Handling 1-4 left ~hundreds of unresolved references, which were
fixed, yielding a snowball effect (and the massive size of this
delta).
I did the macro cleanup because I was worried that the SCons scanner
would get confused. This code will hopefully go away soon anyway.
--HG--
rename : src/mem/ruby/config/config.include => src/mem/ruby/config/config.hh
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
This frees up needed space for more public flags. Also:
- remove unused Request accessor methods
- make Packet use public Request accessors, so it need not be a friend
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.
Bogus calls to ChunkGenerator with negative size were triggering
a new assertion that was added there.
Also did a little renaming and cleanup in the process.
I think readData() and writeData() were used for Erik's compression
work, but that code is gone, these aren't called anymore, and they
don't even really do what their names imply.
I did some of the flags and assertions wrong. Thanks to Brad Beckmann
for pointing this out. I should have run the opt regressions instead
of the fast. I also screwed up some of the logical functions in the Flags
class.
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.
Since the early days of M5, an event needed to know which event queue
it was on, and that data was required at the time of construction of
the event object. In the future parallelized M5, this sort of
requirement does not work well since the proper event queue will not
always be known at the time of construction of an event. Now, events
are created, and the EventQueue itself has the schedule function,
e.g. eventq->schedule(event, when). To simplify the syntax, I created
a class called EventManager which holds a pointer to an EventQueue and
provides the schedule interface that is a proxy for the EventQueue.
The intent is that objects that frequently schedule events can be
derived from EventManager and then they have the schedule interface.
SimObject and Port are examples of objects that will become
EventManagers. The end result is that any SimObject can just call
schedule(event, when) and it will just call that SimObject's
eventq->schedule function. Of course, some objects may have more than
one EventQueue, so this interface might not be perfect for those, but
they should be relatively few.
This appears to work, but I don't want to commit it until it gets tested a lot more.
I haven't deleted the functionality in this patch that will come later, but one question
is how to enforce encourage objects that call getVirtPort() to not cache the virtual port
since if the CPU changes out from under them it will be worse than useless. Perhaps a null
function like delVirtPort() is still useful in that case.
It runs out that if a MemObject turns around and does a send in its
receive callback, and there are other sends already scheduled, then
it could observe a state where it's not at the head of the list but
the bus's sendEvent is not scheduled (because we're still in the
middle of processing the prior sendEvent).
I was asserting that the only reason you would defer targets is if
a write came in while you had an outstanding read miss, but there's
another case where you could get a read access after you've snooped
an invalidation and buffered it because it applies to a prior
outstanding miss.
Make OutputDirectory::resolve() private and change the functions using
resolve() to instead use create().
--HG--
extra : convert_revision : 36d4be629764d0c4c708cec8aa712cd15f966453
if a prior write miss arrived while an even earlier
read miss was still outstanding.
--HG--
extra : convert_revision : 4924e145829b2ecf4610b88d33f4773510c6801a