This patch adds support to different entities in the ruby memory system
for more reliable functional read/write accesses. Only the simple network
has been augmented as of now. Later on Garnet will also support functional
accesses.
The patch adds functional access code to all the different types of messages
that protocols can send around. These messages are functionally accessed
by going through the buffers maintained by the network entities.
The patch also rectifies some of the bugs found in coherence protocols while
testing the patch.
With this patch applied, functional writes always succeed. But functional
reads can still fail.
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.
The stat blocked_cycles that actually counter in ticks is now updated
to count in cycles.
As the timing is now rounded to the clock edges of the cache, there
are some regressions that change. Plenty of them have very minor
changes, whereas some regressions with a short run-time are perturbed
quite significantly. A follow-on patch updates all the statistics for
the regressions.
The memtest.py script used to connect the system port directly to the
SimpleMemory, but the latter is now single ported. Since the system
port is not used for anything in this particular example, a quick fix
is to attach it to the functional bus instead.
In order to ensure correct functionality of switch CPUs, the TLB walker ports
must be connected to the Ruby system in x86 simulation.
This fixes x86 assertion failures that the TLB walker ports are not connected
during the CPU switch process.
This patch allows for specifying multiple programs via command line. It also
adds an option for specifying whether to use of SMT. But SMT does not work for
the o3 cpu as of now.
This patch removes the NACKing in the bridge, as the split
request/response busses now ensure that protocol deadlocks do not
occur, i.e. the message-dependency chain is broken by always allowing
responses to make progress without being stalled by requests. The
NACKs had limited support in the system with most components ignoring
their use (with a suitable call to panic), and as the NACKs are no
longer needed to avoid protocol deadlocks, the cleanest way is to
simply remove them.
The bridge is the starting point as this is the only place where the
NACKs are created. A follow-up patch will remove the code that deals
with NACKs in the endpoints, e.g. the X86 table walker and DMA
port. Ultimately the type of packet can be complete removed (until
someone sees a need for modelling more complex protocols, which can
now be done in parts of the system since the port and interface is
split).
As a consequence of the NACK removal, the bridge now has to send a
retry to a master if the request or response queue was full on the
first attempt. This change also makes the bridge ports very similar to
QueuedPorts, and a later patch will change the bridge to use these. A
first step in this direction is taken by aligning the name of the
member functions, as done by this patch.
A bit of tidying up has also been done as part of the simplifications.
Surprisingly, this patch has no impact on any of the
regressions. Hence, there was never any NACKs issued. In a follow-up
patch I would suggest changing the size of the bridge buffers set in
FSConfig.py to also test the situation where the bridge fills up.
This patch changes the se and fs script to use the clock option and
not simply set the CPUs clock to 2 GHz. It also makes a minor change
to the assignment of the switch_cpus clock to allow different clocks.
This patch changes the simple memory to have a single slave port
rather than a vector port. The simple memory makes no attempts at
modelling the contention between multiple ports, and any such
multiplexing and demultiplexing could be done in a bus (or crossbar)
outside the memory controller. This scenario also matches with the
ongoing work on a SimpleDRAM model, which will be a single-ported
single-channel controller that can be used in conjunction with a bus
(or crossbar) to create a multi-port multi-channel controller.
There are only very few regressions that make use of the vector port,
and these are all for functional accesses only. To facilitate these
cases, memtest and memtest-ruby have been updated to also have a
"functional" bus to perform the (de)multiplexing of the functional
memory accesses.
Instead of just passing a list of controllers to the makeTopology function
in src/mem/ruby/network/topologies/<Topo>.py we pass in a function pointer
which knows how to make the topology, possibly with some extra state set
in the configs/ruby/<protocol>.py file. Thus, we can move all of the files
from network/topologies to configs/topologies. A new class BaseTopology
is added which all topologies in configs/topologies must inheirit from and
follow its API.
--HG--
rename : src/mem/ruby/network/topologies/Crossbar.py => configs/topologies/Crossbar.py
rename : src/mem/ruby/network/topologies/Mesh.py => configs/topologies/Mesh.py
rename : src/mem/ruby/network/topologies/MeshDirCorners.py => configs/topologies/MeshDirCorners.py
rename : src/mem/ruby/network/topologies/Pt2Pt.py => configs/topologies/Pt2Pt.py
rename : src/mem/ruby/network/topologies/Torus.py => configs/topologies/Torus.py
As status matrix, MIPS fs does not work. Hence, these options are not
required. Secondly, the function is setting param values for a CPU class.
This seems strange, should probably be done in a different way.
This patch introduces a class hierarchy of buses, a non-coherent one,
and a coherent one, splitting the existing bus functionality. By doing
so it also enables further specialisation of the two types of buses.
A non-coherent bus connects a number of non-snooping masters and
slaves, and routes the request and response packets based on the
address. The request packets issued by the master connected to a
non-coherent bus could still snoop in caches attached to a coherent
bus, as is the case with the I/O bus and memory bus in most system
configurations. No snoops will, however, reach any master on the
non-coherent bus itself. The non-coherent bus can be used as a
template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses,
and is typically used for the I/O buses.
A coherent bus connects a number of (potentially) snooping masters and
slaves, and routes the request and response packets based on the
address, and also forwards all requests to the snoopers and deals with
the snoop responses. The coherent bus can be used as a template for
modelling QPI, HyperTransport, ACE and coherent OCP buses, and is
typically used for the L1-to-L2 buses and as the main system
interconnect.
The configuration scripts are updated to use a NoncoherentBus for all
peripheral and I/O buses.
A bit of minor tidying up has also been done.
--HG--
rename : src/mem/bus.cc => src/mem/coherent_bus.cc
rename : src/mem/bus.hh => src/mem/coherent_bus.hh
rename : src/mem/bus.cc => src/mem/noncoherent_bus.cc
rename : src/mem/bus.hh => src/mem/noncoherent_bus.hh
Multithreaded programs did not run by just specifying the binary once on the
command line of SE mode.The default mode is multi-programmed mode. Added
check in SE mode to run multi-threaded programs in case only one program is
specified with multiple CPUS. Default mode is still multi-programmed mode.
Added the options to Options.py for FS mode with backward compatibility. It is
good to provide an option to specify the disk image and the memory size from
command line since a lot of disk images are created to support different
benchmark suites as well as per user needs. Change in program also leads to
change in memory requirements. These options provide the interface to provide
both disk image and memory size from the command line and gives more
flexibility.
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.
All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.
Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.
--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
With recent changes to the memory system, a port cannot be assigned a peer
port twice. While making use of the Ruby memory system in FS mode, DMA
ports were assigned peer twice, once for the classic memory system
and once for the Ruby memory system. This patch removes this double
assignment of peer ports.
This patch removes the physmem_port from the Atomic CPU and instead
uses the system pointer to access the physmem when using the fastmem
option. The system already keeps track of the physmem and the valid
memory address ranges, and with this patch we merely make use of that
existing functionality. As a result of this change, the overloaded
getMasterPort in the Atomic CPU can be removed, thus unifying the CPUs.
I am not too happy with the way options are added in files se.py and fs.py
currently. This patch moves all the options to the file Options.py, functions
from which are called when required.
With the SE/FS merge, interrupt controller is created irrespective of the
mode. This patch creates the interrupt controller when Ruby is used and
connects its ports.
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
This patch cleans up a number of remaining uses of bus.port which
is now split into bus.master and bus.slave. The only non-trivial change
is the memtest where the level building now has to be aware of the role
of the ports used in the previous level.
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This patch moves the connection of the system port to create_system in
Ruby.py. Thereby it allows the failing Ruby test (and other Ruby
systems) to run again.
This patch fixes the currently broken fs.py by specifying the size of
the bridge range rather than the end address. This effectively
subtracts one when determining the address range for the IO bridge
(from IO bus to membus), and thus avoids the overlapping ranges.
This patch makes the bus bridge uni-directional and specialises the
bus ports to be a master port and a slave port. This greatly
simplifies the assumptions on both sides as either port only has to
deal with requests or responses. The following patches introduce the
notion of master and slave ports, and would not be possible without
this split of responsibilities.
In making the bridge unidirectional, the address range mechanism of
the bridge is also changed. For the cases where communication is
taking place both ways, an additional bridge is needed. This causes
issues with the existing mechanism, as the busses cannot determine
when to stop iterating the address updates from the two bridges. To
avoid this issue, and also greatly simplify the specification, the
bridge now has a fixed set of address ranges, specified at creation
time.