Go to file
Radhika Jagtap 7d3600c309 cpu: Add TraceCPU to playback elastic traces
This patch defines a TraceCPU that replays trace generated using the elastic
trace probe attached to the O3 CPU model. The elastic trace is an execution
trace with data dependencies and ordering dependencies annoted to it. It also
replays fixed timestamp instruction fetch trace that is also generated by the
elastic trace probe.

The TraceCPU inherits from BaseCPU as a result of which some methods need
to be defined. It has two port subclasses inherited from MasterPort for
instruction and data ports. It issues the memory requests deducing the
timing from the trace and without performing real execution of micro-ops.
As soon as the last dependency for an instruction is complete,
its computational delay, also provided in the input trace is added. The
dependency-free nodes are maintained in a list, called 'ReadyList',
ordered by ready time. Instructions which depend on load stall until the
responses for read requests are received thus achieving elastic replay. If
the dependency is not found when adding a new node, it is assumed complete.
Thus, if this node is found to be completely dependency-free its issue time is
calculated and it is added to the ready list immediately. This is encapsulated
in the subclass ElasticDataGen.

If ready nodes are issued in an unconstrained way there can be more nodes
outstanding which results in divergence in timing compared to the O3CPU.
Therefore, the Trace CPU also models hardware resources. A sub-class to model
hardware resources is added which contains the maximum sizes of load buffer,
store buffer and ROB. If resources are not available, the node is not issued.
The 'depFreeQueue' structure holds nodes that are pending issue.

Modeling the ROB size in the Trace CPU as a resource limitation is arguably the
most important parameter of all resources. The ROB occupancy is estimated using
the newly added field 'robNum'. We need to use ROB number as sequence number is
at times much higher due to squashing and trace replay is focused on correct
path modeling.

A map called 'inFlightNodes' is added to track nodes that are not only in
the readyList but also load nodes that are executed (and thus removed from
readyList) but are not complete. ReadyList handles what and when to execute
next node while the inFlightNodes is used for resource modelling. The oldest
ROB number is updated when any node occupies the ROB or when an entry in the
ROB is released. The ROB occupancy is equal to the difference in the ROB number
of the newly dependency-free node and the oldest ROB number in flight.

If no node dependends on a non load/store node then there is no reason to track
it in the dependency graph. We filter out such nodes but count them and add a
weight field to the subsequent node that we do include in the trace. The weight
field is used to model ROB occupancy during replay.

The depFreeQueue is chosen to be FIFO so that child nodes which are in
program order get pushed into it in that order and thus issued in the in
program order, like in the O3CPU. This is also why the dependents is made a
sequential container, std::set to std::vector. We only check head of the
depFreeQueue as nodes are issued in order and blocking on head models that
better than looping the entire queue. An alternative choice would be to inspect
top N pending nodes where N is the issue-width. This is left for future as the
timing correlation looks good as it is.

At the start of an execution event, first we attempt to issue such pending
nodes by checking if appropriate resources have become available. If yes, we
compute the execute tick with respect to the time then. Then we proceed to
complete nodes from the readyList.

When a read response is received, sometimes a dependency on it that was
supposed to be released when it was issued is still not released. This occurs
because the dependent gets added to the graph after the read was sent. So the
check is made less strict and the dependency is marked complete on read
response instead of insisting that it should have been removed on read sent.

There is a check for requests spanning two cache lines as this condition
triggers an assert fail in the L1 cache. If it does then truncate the size
to access only until the end of that line and ignore the remainder.
Strictly-ordered requests are skipped and the dependencies on such requests
are handled by simply marking them complete immediately.

The simulated seconds can be calculated as the difference between the
final_tick stat and the tickOffset stat. A CountedExitEvent that contains
a static int belonging to the Trace CPU class as a down counter is used to
implement multi Trace CPU simulation exit.
2015-12-07 16:42:15 -06:00
build_opts scons: Do not build the InOrderCPU 2015-01-20 08:12:45 -05:00
configs dev: Rewrite PCI host functionality 2015-12-05 00:11:24 +00:00
ext ext: fix SST connector 2015-10-06 18:08:28 -05:00
src cpu: Add TraceCPU to playback elastic traces 2015-12-07 16:42:15 -06:00
system arm: Add support for ARMv8 (AArch64 & AArch32) 2014-01-24 15:29:34 -06:00
tests stats: Update to reflect changes to PCI handling 2015-12-05 00:11:25 +00:00
util util: term: drop CC from Makefile 2015-12-04 17:25:45 -06:00
.hgignore misc: ignore object files and static libs in util/m5 2015-11-13 17:03:48 -05:00
.hgtags Added tag stable_2015_09_03 for changeset 60eb3fef9c2d 2015-09-03 15:38:46 -05:00
COPYING copyright: Add code for finding all copyright blocks and create a COPYING file 2011-06-02 17:36:07 -07:00
LICENSE copyright: Add code for finding all copyright blocks and create a COPYING file 2011-06-02 17:36:07 -07:00
README misc: README direct to website for dependencies 2014-08-26 10:12:04 -04:00
SConstruct sim: Add support for generating back traces on errors 2015-12-04 00:12:58 +00:00

This is the gem5 simulator.

The main website can be found at http://www.gem5.org

A good starting point is http://www.gem5.org/Introduction, and for
more information about building the simulator and getting started
please see http://www.gem5.org/Documentation and
http://www.gem5.org/Tutorials.

To build gem5, you will need the following software: g++ or clang,
Python (gem5 links in the Python interpreter), SCons, SWIG, zlib, m4,
and lastly protobuf if you want trace capture and playback
support. Please see http://www.gem5.org/Dependencies for more details
concerning the minimum versions of the aforementioned tools.

Once you have all dependencies resolved, type 'scons
build/<ARCH>/gem5.opt' where ARCH is one of ALPHA, ARM, NULL, MIPS,
POWER, SPARC, or X86. This will build an optimized version of the gem5
binary (gem5.opt) for the the specified architecture. See
http://www.gem5.org/Build_System for more details and options.

With the simulator built, have a look at
http://www.gem5.org/Running_gem5 for more information on how to use
gem5.

The basic source release includes these subdirectories:
   - configs: example simulation configuration scripts
   - ext: less-common external packages needed to build gem5
   - src: source code of the gem5 simulator
   - system: source for some optional system software for simulated systems
   - tests: regression tests
   - util: useful utility programs and files

To run full-system simulations, you will need compiled system firmware
(console and PALcode for Alpha), kernel binaries and one or more disk
images. Please see the gem5 download page for these items at
http://www.gem5.org/Download

If you have questions, please send mail to gem5-users@gem5.org

Enjoy using gem5 and please share your modifications and extensions.