It's confusing (especially to new users), when you are setting some standard
parameters (as defined in Options.py) and they aren't reflected in the simulations
so we might as well link the settings in CacheConfig.py to those in Options.py
This way things that don't care about work count options and/or aren't called
by something that has those command line options set up doesn't have to build
a fake object to carry in inert values.
This makes sure that the address ranges requested for caches and uncached ports
don't conflict with each other, and that accesses which are always uncached
(message signaled interrupts for instance) don't waste time passing through
caches.
The disk image to use was always being forced to a particular value. This
change changes what disk image is selected as the default based on the
architecture being built. In the future, a more sophisticated system might be
used that selected a path based on certain rules instead of relying on one off
file names.
M5 skips over any simulated time where it doesn't have any work to do. When
the simulation is active, the time skipped is short and the work done at any
point in time is relatively substantial. If the time between events is long
and/or the work to do at each event is small, it's possible for simulated time
to pass faster than real time. When running a benchmark that can be good
because it means the simulation will finish sooner in real time. When
interacting with the real world through, for instance, a serial terminal or
bridge to a real network, this can be a problem. Human or network response time
could be greatly exagerated from the perspective of the simulation and make
simulated events happen "too soon" from an external perspective.
This change adds the capability to force the simulation to run no faster than
real time. It does so by scheduling a periodic event that checks to see if
its simulated period is shorter than its real period. If it is, it stalls the
simulation until they're equal. This is called time syncing.
A future change could add pseudo instructions which turn time syncing on and
off from within the simulation. That would allow time syncing to be used for
the interactive parts of a session but then turned off when running a
benchmark using the m5 utility program inside a script. Time syncing would
probably not happen anyway while running a benchmark because there would be
plenty of work for M5 to do, but the event overhead could be avoided.
This patch adds an option to the script Ruby.py for setting the parameter
m_random_seed used for randomizing delays in the memory system. The option
can be specified as "--random_seed <seed value>".
Most of the messages in the config scripts that report a time value already
print "@ tick" followed by the current tick value, but a few were printing
"@ cycle". Since this is a distinction that's frequently confusing to new
users, this changes those message to the more accurate and consistent "@ tick".
Patch allows each individual message buffer to have different recycle latencies
and allows the overall recycle latency to be specified at the cmd line. The
patch also adds profiling info to make sure no one processor's requests are
recycled too much.
This patch allows one to disable migratory sharing for those cache blocks that
are accessed by atomic requests. While the implementations are different
between the token and hammer protocols, the motivation is the same. For
Alpha, LLSC semantics expect that normal loads do not unlock cache blocks that
have been locked by LL accesses. Therefore, locked blocks should not transfer
write permissions when responding to these load requests. Instead, only they
only transfer read permissions so that the subsequent SC access can possibly
succeed.
The previous slower ruby latencies created a mismatch between the faster M5
cpu models and the much slower ruby memory system. Specifically smp
interrupts were much slower and infrequent, as well as cpus moving in and out
of spin locks. The result was many cpus were idle for large periods of time.
These changes fix the latency mismatch.
This patch adds DMA testing to the Memtester and is inherits many changes from
Polina's old tester_dma_extension patch. Since Ruby does not work in atomic
mode, the atomic mode options are removed.
This patch attaches ruby objects to the system before the topology is
created so that their simobject names read their meaningful variable
names instead of their topology name.
The separate restoreCheckpoint() call is gone; just pass
the checkpoint dir as an optional arg to instantiate().
This change is a precursor to some more extensive
reworking of the startup code.
Enforce that the Python Root SimObject is instantiated only
once. The C++ Root object already panics if more than one is
created. This change avoids the need to track what the root
object is, since it's available from Root.getInstance() (if it
exists). It's now redundant to have the user pass the root
object to functions like instantiate(), checkpoint(), and
restoreCheckpoint(), so that arg is gone. Users who use
configs/common/Simulate.py should not notice.
The patch creates a specific mesh network where directories are at the corners.
The patch is a good example of how to create an arbitrary network, similar to
the old file specified network, while leveraging scripts and loops when
possible.
Most of these frontend configurations share cache configuration code, pull it out so that
changes to caches don't have to require changing multiple config files.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
Based on Steve's suggestion, the ugly if-elif statement and multiple protocol
module import calls are removed and replaced with exec statements using the
protocol string.
Cleaned up the ruby profilers by moving the memory controller profiling code
out of the main profiler object and into a separate object similar to the
current CacheProfiler. Both the CacheProfiler and MemCntrlProfiler are
specific to a particular Ruby object, CacheMemory and MemoryControl
respectively. Therefore, these profilers should not be SimObjects and
created by the python configuration system, but instead private objects. This
simplifies the creation of these profilers.
Reorganized ruby python configuration so that protocol and ruby memory system
configuration code can be shared by multiple front-end configuration files
(i.e. memory tester, full system, and hopefully the regression tester). This
code works for memory tester, but have not tested fs mode.
This patch includes a rather substantial change to the memory controller
profiler in order to work with the new configuration system. Most
noteably, the mem_cntrl_profiler no longer uses a string map, but instead
a vector. Eventually this support should be removed from the main
profiler and go into a separate object. Each memory controller should have
a pointer to that new mem_cntrl profile object.
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
The necessary companion conversion of Ruby objects generated by SLICC
are converted to M5 SimObjects in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
and be forwarded to a special pio bus connecting to device pio
ports.
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
fix bug with 'numThreads=len(workloads)' which was counting characters of command-line not counting threads as intended.
Update numThreads for inorder/o3 cases and default to 1 for all other cases.
-option to allow threads to run to a max_inst_any_thread which is more useful/quicker in a lot of
cases then always having to figure out what tick to run your simulation to.
This changeset also includes a lot of work from Derek Hower <drh5@cs.wisc.edu>
RubyMemory is now both a driver for Ruby and a port for M5. Changed
makeRequest/hitCallback interface. Brought packets (superficially)
into the sequencer. Modified tester infrastructure to be packet based.
and Ruby can be used together through the example ruby_se.py
script. SPARC parallel applications work, and the timing *seems* right
from combined M5/Ruby debug traces. To run,
% build/ALPHA_SE/m5.debug configs/example/ruby_se.py -c
tests/test-progs/hello/bin/alpha/linux/hello -n 4 -t
this was double scheduling itself (once in constructor and once in cpu code). also add support for stopping / starting
progress events through repeatEvent flag and also changing the interval of the progress event as well
Previously there was one per bus, which caused some coherence problems
when more than one decided to respond. Now there is just one on
the main memory bus. The default bus responder on all other buses
is now the downstream cache's cpu_side port. Caches no longer need
to do address range filtering; instead, we just have a simple flag
to prevent snoops from propagating to the I/O bus.
- Add the option of redirecting stderr to a file. With the old
behaviour, stderr would follow stdout if stdout was to a file, but
stderr went to the host stderr if stdout went to the host stdout. The
new default maintains stdout and stderr going to the host. Now the
two can specify different files, but they will share a file descriptor
if the name of the files is the same.
- Add --output and --errout options to se.py to go with --input.
into vm1.(none):/home/stever/bk/newmem-cache2
configs/example/memtest.py:
Hand merge redundant changes.
--HG--
extra : convert_revision : a2e36be254bf052024f37bcb23b5209f367d37e1
timing mode still broken.
configs/example/memtest.py:
Revamp options.
src/cpu/memtest/memtest.cc:
No need for memory initialization.
No need to make atomic response... memory system should do that now.
src/cpu/memtest/memtest.hh:
MemTest really doesn't want to snoop.
src/mem/bridge.cc:
checkFunctional() cleanup.
src/mem/bus.cc:
src/mem/bus.hh:
src/mem/cache/base_cache.cc:
src/mem/cache/base_cache.hh:
src/mem/cache/cache.cc:
src/mem/cache/cache.hh:
src/mem/cache/cache_blk.hh:
src/mem/cache/cache_builder.cc:
src/mem/cache/cache_impl.hh:
src/mem/cache/coherence/coherence_protocol.cc:
src/mem/cache/coherence/coherence_protocol.hh:
src/mem/cache/coherence/simple_coherence.hh:
src/mem/cache/miss/SConscript:
src/mem/cache/miss/mshr.cc:
src/mem/cache/miss/mshr.hh:
src/mem/cache/miss/mshr_queue.cc:
src/mem/cache/miss/mshr_queue.hh:
src/mem/cache/prefetch/base_prefetcher.cc:
src/mem/cache/tags/fa_lru.cc:
src/mem/cache/tags/fa_lru.hh:
src/mem/cache/tags/iic.cc:
src/mem/cache/tags/iic.hh:
src/mem/cache/tags/lru.cc:
src/mem/cache/tags/lru.hh:
src/mem/cache/tags/split.cc:
src/mem/cache/tags/split.hh:
src/mem/cache/tags/split_lifo.cc:
src/mem/cache/tags/split_lifo.hh:
src/mem/cache/tags/split_lru.cc:
src/mem/cache/tags/split_lru.hh:
src/mem/packet.cc:
src/mem/packet.hh:
src/mem/physical.cc:
src/mem/physical.hh:
src/mem/tport.cc:
More major reorg. Seems to work for atomic mode now,
timing mode still broken.
--HG--
extra : convert_revision : 7e70dfc4a752393b911880ff028271433855ae87
Make clocks more reasonable.
Fix bug in sense of options.timing flag.
configs/example/memtest.py:
Fix bug in sense of options.timing flag.
configs/example/memtest.py:
Make clocks more reasonable.
--HG--
extra : convert_revision : 3715697988c56e92a4da129b42026d0623f5e85e
configs/example/memtest.py:
PhysicalMemory has vector of uniform ports instead of one special one.
Other updates to fix obsolete brokenness.
src/mem/physical.cc:
src/mem/physical.hh:
src/python/m5/objects/PhysicalMemory.py:
Have vector of uniform ports instead of one special one.
src/python/swig/pyobject.cc:
Add comment.
--HG--
extra : convert_revision : a4a764dcdcd9720bcd07c979d0ece311fc8cb4f1
cache blocks that get dmaed ARE NOT marked invalid in the caches so it's a performance issue here
src/mem/bridge.cc:
src/mem/bridge.hh:
hopefully the final hacky change to make the bus bridge work ok
--HG--
extra : convert_revision : 62cbc65c74d1a84199f0a376546ec19994c5899c
set the latency parameter in terms of a latency
add caches to tsunami-simple configs
configs/common/Caches.py:
tests/configs/memtest.py:
tests/configs/o3-timing-mp.py:
tests/configs/o3-timing.py:
tests/configs/simple-atomic-mp.py:
tests/configs/simple-timing-mp.py:
tests/configs/simple-timing.py:
set the latency parameter in terms of a latency
configs/common/FSConfig.py:
give the bridge a default latency too
src/mem/cache/cache_builder.cc:
src/python/m5/objects/BaseCache.py:
remove hit_latency and make latency do the right thing
tests/configs/tsunami-simple-atomic-dual.py:
tests/configs/tsunami-simple-atomic.py:
tests/configs/tsunami-simple-timing-dual.py:
tests/configs/tsunami-simple-timing.py:
add caches to tsunami-simple configs
--HG--
extra : convert_revision : 37bef7c652e97c8cdb91f471fba62978f89019f1
figure out the block size from devices attached to the bus otherwise use a default block size when no devices that care are attached
configs/common/FSConfig.py:
src/mem/bridge.cc:
src/mem/bridge.hh:
src/python/m5/objects/Bridge.py:
fix partial writes with a functional memory hack
src/mem/bus.cc:
src/mem/bus.hh:
src/python/m5/objects/Bus.py:
figure out the block size from devices attached to the bus otherwise use a default block size when no devices that care are attached
src/mem/packet.cc:
fix WriteInvalidateResp to not be a request that needs a response since it isn't
src/mem/port.hh:
by default return 0 for deviceBlockSize instead of panicing. This makes finding the block size the bus should use easier
--HG--
extra : convert_revision : 3fcfe95f9f392ef76f324ee8bd1d7f6de95c1a64
fix script to reflect new benchmark directory sturcture
configs/boot/spec-surge-client.rcS:
fix script to reflect new benchmark directory sturcture
--HG--
extra : convert_revision : 45f9d8aebabd1f3f8d1e826e07840e2365511a35
directly configured by python. Move stuff from root.(cc|hh) to
core.(cc|hh) since it really belogs there now.
In the process, simplify how ticks are used in the python code.
--HG--
extra : convert_revision : cf82ee1ea20f9343924f30bacc2a38d4edee8df3