This patch unified the L1 and L2 caches used throughout the
regressions instead of declaring different, but very similar,
configurations in the different scripts.
The patch also changes the default L2 configuration to match what it
used to be for the fs and se scripts (until the last patch that
updated the regressions to also make use of the cache config). The
MSHRs and targets per MSHR are now set to a more realistic default of
20 and 12, respectively.
As a result of both the aforementioned changes, many of the regression
stats are changed. A follow-on patch will bump the stats.
This patch updates the stats to reflect the change in the default
system clock from 1 THz to 1GHz. The changes are due to the DMA
devices now injecting requests at a lower pace.
This patch bumps the stats to match the use of SimpleDRAM instead of
SimpleMemory in all inorder and O3 regressions, and also all
full-system regressions. A number of performance-related stats change,
and a whole bunch of stats are added for the memory controller.
This patch favours using SimpleDRAM with the default timing instead of
SimpleMemory for all regressions that involve the o3 or inorder CPU,
or are full system (in other words, where the actual performance of
the memory is important for the overall performance).
Moving forward, the solution for FSConfig and the users of fs.py and
se.py is probably something similar to what we use to choose the CPU
type. I envision a few pre-set configurations SimpleLPDDR2,
SimpleDDR3, etc that can be choosen by a dram_type option. Feedback on
this part is welcome.
This patch changes plenty stats and adds all the DRAM controller
related stats. A follow-on patch updates the relevant statistics. The
total run-time for the entire regression goes up with ~5% with this
patch due to the added complexity of the SimpleDRAM model. This is a
concious trade-off to ensure that the model is properly tested.
This patch uses the common L1, L2 and IOCache configuration for the
regressions that all share the same cache parameters. There are a few
regressions that use a slightly different configuration (memtest,
o3-timing=mp, simple-atomic-mp and simple-timing-mp), and the latter
are not changed in this patch. They will be updated in a future patch.
The common cache configurations are changed to match the ones used in
the regressions, and are slightly changed with respect to what they
were. Hopefully this means we can converge on a common base
configuration, used both in the normal user configurations and
regressions.
As only regressions that shared the same cache configuration are
updated, no regressions are affected.
This patch updates the stats to reflect the change in how cache
latencies are expressed. In addition, the latencies are now rounded to
multiples of the clock period, thus also affecting other stats.
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.
The stat blocked_cycles that actually counter in ticks is now updated
to count in cycles.
As the timing is now rounded to the clock edges of the cache, there
are some regressions that change. Plenty of them have very minor
changes, whereas some regressions with a short run-time are perturbed
quite significantly. A follow-on patch updates all the statistics for
the regressions.
This patch changes the memtest clock from 1THz (the default) to 2GHz,
similar to the CPUs in the other regressions. This is useful as the
caches will adopt the same clock as the CPU. The bus clock rate is
scaled accordingly, and the L1-L2 bus is kept at the CPU clock while
the memory bus is at half that frequency.
A separate patch updates the affected stats.
This patch unifies the full-system regression config scripts and uses
the BaseCPU convenience method addTwoLevelCacheHierarchy to connect up
the L1s and L2, and create the bus inbetween.
The patch is a step on the way to use the clock period to express the
cache latencies, as the CPU is now the parent of the L1, L2 and L1-L2
bus, and these modules thus use the CPU clock.
The patch does not change the value of any stats, but plenty names,
and a follow-up patch contains the update to the stats, chaning
system.l2c to system.cpu.l2cache.
In the current caches the hit latency is paid twice on a miss. This patch lets
a configurable response latency be set of the cache for the backward path.
This patch merely adds a clock other than the default 1 Tick for the
CPUs of both the test system and drive system for the twosys-tsunami
regression.
The CPU frequency of the driver system is choosed to be twice that of
the test system to ensure it is not the bottleneck (although in this
case it mostly serves as a demonstration of a two-system setup),
This patch adds a basic regression for the traffic generator. The
regression also serves as an example of the file formats used. More
complex regressions that make use of a DRAM controller model will
follow shortly.
This patch simply bumps the stats to avoid having failing
regressions. Someone with more insight in the changes should verify
that these differences all make sense.
This patch removes the NACKing in the bridge, as the split
request/response busses now ensure that protocol deadlocks do not
occur, i.e. the message-dependency chain is broken by always allowing
responses to make progress without being stalled by requests. The
NACKs had limited support in the system with most components ignoring
their use (with a suitable call to panic), and as the NACKs are no
longer needed to avoid protocol deadlocks, the cleanest way is to
simply remove them.
The bridge is the starting point as this is the only place where the
NACKs are created. A follow-up patch will remove the code that deals
with NACKs in the endpoints, e.g. the X86 table walker and DMA
port. Ultimately the type of packet can be complete removed (until
someone sees a need for modelling more complex protocols, which can
now be done in parts of the system since the port and interface is
split).
As a consequence of the NACK removal, the bridge now has to send a
retry to a master if the request or response queue was full on the
first attempt. This change also makes the bridge ports very similar to
QueuedPorts, and a later patch will change the bridge to use these. A
first step in this direction is taken by aligning the name of the
member functions, as done by this patch.
A bit of tidying up has also been done as part of the simplifications.
Surprisingly, this patch has no impact on any of the
regressions. Hence, there was never any NACKs issued. In a follow-up
patch I would suggest changing the size of the bridge buffers set in
FSConfig.py to also test the situation where the bridge fills up.