This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.
Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
Previously, the user would have to manually set access_backing_store=True
on all RubyPorts (Sequencers) in the config files.
Now, instead there is one global option that each RubyPort checks on
initialization.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Ruby's functional accesses are not guaranteed to succeed as of now. While
this is not a problem for the protocols that are currently in the mainline
repo, it seems that coherence protocols for gpus rely on a backing store to
supply the correct data. The aim of this patch is to make this backing store
configurable i.e. it comes into play only when a particular option:
--access-backing-store is invoked.
The backing store has been there since M5 and GEMS were integrated. The only
difference is that earlier the system used to maintain the backing store and
ruby's copy was write-only. Sometime last year, we moved to data being
supplied supplied by ruby in SE mode simulations. And now we have patches on
the reviewboard, which remove ruby's copy of memory altogether and rely
completely on the system's memory to supply data. This patch adds back a
SimpleMemory member to RubySystem. This member is used only if the option:
access-backing-store is set to true. By default, the memory would not be
accessed.
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
Both ruby and the system used to maintain memory copies. With the changes
carried for programmed io accesses, only one single memory is required for
fs simulations. This patch sets the copy of memory that used to reside
with the system to null, so that no space is allocated, but address checks
can still be carried out. All the memory accesses now source and sink values
to the memory maintained by ruby.
This patch is the final patch in a series of patches. The aim of the series
is to make ruby more configurable than it was. More specifically, the
connections between controllers are not at all possible (unless one is ready
to make significant changes to the coherence protocol). Moreover the buffers
themselves are magically connected to the network inside the slicc code.
These connections are not part of the configuration file.
This patch makes changes so that these connections will now be made in the
python configuration files associated with the protocols. This requires
each state machine to expose the message buffers it uses for input and output.
So, the patch makes these buffers configurable members of the machines.
The patch drops the slicc code that usd to connect these buffers to the
network. Now these buffers are exposed to the python configuration system
as Master and Slave ports. In the configuration files, any master port
can be connected any slave port. The file pyobject.cc has been modified to
take care of allocating the actual message buffer. This is inline with how
other port connections work.
This helps in configuring the network interfaces from the python script and
these objects no longer rely on the network object for the timing information.
The patch removes the ruby_fs.py file. The functionality is being moved to
fs.py. This would being ruby fs simulations in line with how ruby se
simulations are started (using --ruby option). The alpha fs config functions
are being combined for classing and ruby memory systems. This required
renaming the piobus in ruby to iobus. So, we will have stats being renamed
in the stats file for ruby fs regression.
The routers are created before the network class. This results in the routers
becoming children of the first link they are connected to and they get generic
names like int_node and node_b. This patch creates the network object first
and passes it to the topology creation function. Now the routers are children
of the network object and names are much more sensible.
The number of transitions per cycle that a controller can carry out is
a proxy for the number of ports that a controller has. This value is
currently 32 which is way too high. The patch introduces an option
for the number of ports and uses this option in the protocol files
to set the number of transitions. The default value is being set to
4. None of the se regressions change. Ruby stats for the fs regression
change and are being updated.
This patch moves the instantiation of the memory controller outside
FSConfig and instead relies on the mem_ranges to pass the information
to the caller (e.g. fs.py or one of the regression scripts). The main
motivation for this change is to expose the structural composition of
the memory system and allow more tuning and configuration without
adding a large number of options to the makeSystem functions.
The patch updates the relevant example scripts to maintain the current
functionality. As the order that ports are connected to the memory bus
changes (in certain regresisons), some bus stats are shuffled
around. For example, what used to be layer 0 is now layer 1.
Going forward, options will be added to support the addition of
multi-channel memory controllers.
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).
The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.
The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
The Topology class in Ruby does not need to inherit from SimObject class.
This patch turns it into a regular class. The topology object is now created
in the constructor of the Network class. All the parameters for the topology
class have been moved to the network class.
This patch simplifies the Range object hierarchy in preparation for an
address range class that also allows striping (e.g. selecting a few
bits as matching in addition to the range).
To extend the AddrRange class to an AddrRegion, the first step is to
simplify the hierarchy such that we can make it as lean as possible
before adding the new functionality. The only class using Range and
MetaRange is AddrRange, and the three classes are now collapsed into
one.
This patch moves instantiateTopology into Ruby.py and removes the
mem/ruby/network/topologies directory. It also adds some extra inheritance to
the topologies to clean up some issues in the existing topologies.
Instead of just passing a list of controllers to the makeTopology function
in src/mem/ruby/network/topologies/<Topo>.py we pass in a function pointer
which knows how to make the topology, possibly with some extra state set
in the configs/ruby/<protocol>.py file. Thus, we can move all of the files
from network/topologies to configs/topologies. A new class BaseTopology
is added which all topologies in configs/topologies must inheirit from and
follow its API.
--HG--
rename : src/mem/ruby/network/topologies/Crossbar.py => configs/topologies/Crossbar.py
rename : src/mem/ruby/network/topologies/Mesh.py => configs/topologies/Mesh.py
rename : src/mem/ruby/network/topologies/MeshDirCorners.py => configs/topologies/MeshDirCorners.py
rename : src/mem/ruby/network/topologies/Pt2Pt.py => configs/topologies/Pt2Pt.py
rename : src/mem/ruby/network/topologies/Torus.py => configs/topologies/Torus.py
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.
All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.
Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.
--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
With recent changes to the memory system, a port cannot be assigned a peer
port twice. While making use of the Ruby memory system in FS mode, DMA
ports were assigned peer twice, once for the classic memory system
and once for the Ruby memory system. This patch removes this double
assignment of peer ports.
This patch removes the physMemPort from the RubySequencer and instead
uses the system pointer to access the physmem. The system already
keeps track of the physmem and the valid memory address ranges, and
with this patch we merely make use of that existing functionality. The
memory is modified so that it is possible to call the access functions
(atomic and functional) without going through the port, and the memory
is allowed to be unconnected, i.e. have no ports (since Ruby does not
attach it like the conventional memory system).
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This patch moves the connection of the system port to create_system in
Ruby.py. Thereby it allows the failing Ruby test (and other Ruby
systems) to run again.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
This patch adds a fault model, which provides the probability of a number of
architectural faults in the interconnection network (e.g., data corruption,
misrouting). These probabilities can be used to realistically inject faults
in GARNET and faithfully evaluate the effectiveness of novel resilient NoC
architectures.
This patch rpovides functional access support in Ruby. Currently only
the M5Port of RubyPort supports functional accesses. The support for
functional through the PioPort will be added as a separate patch.
Re-enabling implicit parenting (see previous patch) causes current
Ruby config scripts to create some strange hierarchies and generate
several warnings. This patch makes three general changes to address
these issues.
1. The order of object creation in the ruby config files makes the L1
caches children of the sequencer rather than the controller; these
config ciles are rewritten to assign the L1 caches to the
controller first.
2. The assignment of the sequencer list to system.ruby.cpu_ruby_ports
causes the sequencers to be children of system.ruby, generating
warnings because they are already parented to their respective
controllers. Changing this attribute to _cpu_ruby_ports fixes this
because the leading underscore means this is now treated as a plain
Python attribute rather than a child assignment. As a result, the
configuration hierarchy changes such that, e.g.,
system.ruby.cpu_ruby_ports0 becomes system.l1_cntrl0.sequencer.
3. In the topology classes, the routers become children of some random
internal link node rather than direct children of the topology.
The topology classes are rewritten to assign the routers to the
topology object first.
This patch ensures that both Garnet and the simple networks use the bw value
specified in the topology. To do so, the patch generalizes the specification
of bw for basic links. This value is then translated to the specific value
used by the simple and Garnet networks. Since Garent does not support
non-uniformed link bandwidth, the patch also adds a check to ensure all bws are
equal.
--HG--
rename : src/mem/ruby/network/BasicLink.cc => src/mem/ruby/network/simple/SimpleLink.cc
rename : src/mem/ruby/network/BasicLink.hh => src/mem/ruby/network/simple/SimpleLink.hh
rename : src/mem/ruby/network/BasicLink.py => src/mem/ruby/network/simple/SimpleLink.py
This patch converts links and switches from second class simobjects that were
virtually ignored by the networks (both simple and Garnet) to first class
simobjects that directly correspond to c++ ojbects manipulated by the
topology and network classes. This is especially true for Garnet, where the
links and switches directly correspond to specific C++ objects.
By making this change, many aspects of the Topology class were simplified.
--HG--
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/BasicLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/BasicLink.hh
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetLink_d.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/fixed-pipeline/GarnetRouter_d.py
rename : src/mem/ruby/network/Network.cc => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.cc
rename : src/mem/ruby/network/Network.hh => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.hh
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetLink.py
rename : src/mem/ruby/network/garnet/fixed-pipeline/GarnetNetwork_d.py => src/mem/ruby/network/garnet/flexible-pipeline/GarnetRouter.py
This patch adds an option to the script Ruby.py for setting the parameter
m_random_seed used for randomizing delays in the memory system. The option
can be specified as "--random_seed <seed value>".
Patch allows each individual message buffer to have different recycle latencies
and allows the overall recycle latency to be specified at the cmd line. The
patch also adds profiling info to make sure no one processor's requests are
recycled too much.