This patch updates the stats to reflect the change in how cache
latencies are expressed. In addition, the latencies are now rounded to
multiples of the clock period, thus also affecting other stats.
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.
The stat blocked_cycles that actually counter in ticks is now updated
to count in cycles.
As the timing is now rounded to the clock edges of the cache, there
are some regressions that change. Plenty of them have very minor
changes, whereas some regressions with a short run-time are perturbed
quite significantly. A follow-on patch updates all the statistics for
the regressions.
This patch changes the memtest clock from 1THz (the default) to 2GHz,
similar to the CPUs in the other regressions. This is useful as the
caches will adopt the same clock as the CPU. The bus clock rate is
scaled accordingly, and the L1-L2 bus is kept at the CPU clock while
the memory bus is at half that frequency.
A separate patch updates the affected stats.
This patch unifies the full-system regression config scripts and uses
the BaseCPU convenience method addTwoLevelCacheHierarchy to connect up
the L1s and L2, and create the bus inbetween.
The patch is a step on the way to use the clock period to express the
cache latencies, as the CPU is now the parent of the L1, L2 and L1-L2
bus, and these modules thus use the CPU clock.
The patch does not change the value of any stats, but plenty names,
and a follow-up patch contains the update to the stats, chaning
system.l2c to system.cpu.l2cache.
In the current caches the hit latency is paid twice on a miss. This patch lets
a configurable response latency be set of the cache for the backward path.
This patch merely adds a clock other than the default 1 Tick for the
CPUs of both the test system and drive system for the twosys-tsunami
regression.
The CPU frequency of the driver system is choosed to be twice that of
the test system to ensure it is not the bottleneck (although in this
case it mostly serves as a demonstration of a two-system setup),
This patch adds a basic regression for the traffic generator. The
regression also serves as an example of the file formats used. More
complex regressions that make use of a DRAM controller model will
follow shortly.
This patch simply bumps the stats to avoid having failing
regressions. Someone with more insight in the changes should verify
that these differences all make sense.
This patch removes the NACKing in the bridge, as the split
request/response busses now ensure that protocol deadlocks do not
occur, i.e. the message-dependency chain is broken by always allowing
responses to make progress without being stalled by requests. The
NACKs had limited support in the system with most components ignoring
their use (with a suitable call to panic), and as the NACKs are no
longer needed to avoid protocol deadlocks, the cleanest way is to
simply remove them.
The bridge is the starting point as this is the only place where the
NACKs are created. A follow-up patch will remove the code that deals
with NACKs in the endpoints, e.g. the X86 table walker and DMA
port. Ultimately the type of packet can be complete removed (until
someone sees a need for modelling more complex protocols, which can
now be done in parts of the system since the port and interface is
split).
As a consequence of the NACK removal, the bridge now has to send a
retry to a master if the request or response queue was full on the
first attempt. This change also makes the bridge ports very similar to
QueuedPorts, and a later patch will change the bridge to use these. A
first step in this direction is taken by aligning the name of the
member functions, as done by this patch.
A bit of tidying up has also been done as part of the simplifications.
Surprisingly, this patch has no impact on any of the
regressions. Hence, there was never any NACKs issued. In a follow-up
patch I would suggest changing the size of the bridge buffers set in
FSConfig.py to also test the situation where the bridge fills up.
This patch changes the simple memory to have a single slave port
rather than a vector port. The simple memory makes no attempts at
modelling the contention between multiple ports, and any such
multiplexing and demultiplexing could be done in a bus (or crossbar)
outside the memory controller. This scenario also matches with the
ongoing work on a SimpleDRAM model, which will be a single-ported
single-channel controller that can be used in conjunction with a bus
(or crossbar) to create a multi-port multi-channel controller.
There are only very few regressions that make use of the vector port,
and these are all for functional accesses only. To facilitate these
cases, memtest and memtest-ruby have been updated to also have a
"functional" bus to perform the (de)multiplexing of the functional
memory accesses.
This patch bumps all the stats to reflect the bus changes, i.e. the
introduction of the state variable, the division into a request and
response layer, and the new default bus width of 8 bytes.
This patch introduces a class hierarchy of buses, a non-coherent one,
and a coherent one, splitting the existing bus functionality. By doing
so it also enables further specialisation of the two types of buses.
A non-coherent bus connects a number of non-snooping masters and
slaves, and routes the request and response packets based on the
address. The request packets issued by the master connected to a
non-coherent bus could still snoop in caches attached to a coherent
bus, as is the case with the I/O bus and memory bus in most system
configurations. No snoops will, however, reach any master on the
non-coherent bus itself. The non-coherent bus can be used as a
template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses,
and is typically used for the I/O buses.
A coherent bus connects a number of (potentially) snooping masters and
slaves, and routes the request and response packets based on the
address, and also forwards all requests to the snoopers and deals with
the snoop responses. The coherent bus can be used as a template for
modelling QPI, HyperTransport, ACE and coherent OCP buses, and is
typically used for the L1-to-L2 buses and as the main system
interconnect.
The configuration scripts are updated to use a NoncoherentBus for all
peripheral and I/O buses.
A bit of minor tidying up has also been done.
--HG--
rename : src/mem/bus.cc => src/mem/coherent_bus.cc
rename : src/mem/bus.hh => src/mem/coherent_bus.hh
rename : src/mem/bus.cc => src/mem/noncoherent_bus.cc
rename : src/mem/bus.hh => src/mem/noncoherent_bus.hh
This patch updates the stats for parser to be aligned with the most
up-to-date behaviour. Somehow the wrong results got committed as part
of 8800b05e1cb3 (see details below) when fixing the no_value -> nan
stats.
changeset: 8983:8800b05e1cb3
user: Nathan Binkert <nate@binkert.org>
summary: stats: update stats for no_value -> nan
The kernel originally used to generate the stats is different from the one
at use on zizzer. This patch updates the stats with the correct kernel in
use.