Commit graph

1691 commits

Author SHA1 Message Date
Rekai Gonzalez Alberquilla 21f8242430 cpu: Change literal integer constants to meaningful labels
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those
FUs are busy at the moment" when requesting for a FU and replying. This
patch introduces new constants NoCapableFU and NoFreeFU respectively.

In addition, the condition (idx == -2 || idx != -1) is equivalent to
(idx != -1), so this patch also simplifies that.

--HG--
extra : rebase_source : 4833717b9d1e09d7594d1f34f882e13fc4b86846
2015-05-05 16:47:24 +01:00
Andreas Sandberg 4f303785dc kvm: Shutdown KVM and disconnect performance counters on fork
We can't/shouldn't use KVM after a fork since the child and parent
probably point to the same VM. Knowing the exact effects of this is
hard, but they are likely to be messy. We also disconnect the
performance counters attached to the guest. This works around what
seems to be a kernel bug where spurious SIGIOs get delivered to the
forked child process.

Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
[sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version]
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
[andreas.sandberg@arm.com: Fatal if entering KVM in child process ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-11-27 14:52:10 +00:00
Andreas Sandberg 5383e1ada4 base: Add support for changing output directories
This changeset adds support for changing the simulator output
directory. This can be useful when the simulation goes through several
stages (e.g., a warming phase, a simulation phase, and a verification
phase) since it allows the output from each stage to be located in a
different directory. Relocation is done by calling core.setOutputDir()
from Python or simout.setOutputDirectory() from C++.

This change affects several parts of the design of the gem5's output
subsystem. First, files returned by an OutputDirectory instance (e.g.,
simout) are of the type OutputStream instead of a std::ostream. This
allows us to do some more book keeping and control re-opening of files
when the output directory is changed. Second, new subdirectories are
OutputDirectory instances, which should be used to create files in
that sub-directory.

Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
[sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version]
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-11-27 14:41:59 +00:00
Stephan Diestelhorst f703160e5a mem, cpu: Add assertions to snoop invalidation logic
This patch adds assertions that enforce that only invalidating snoops
will ever reach into the logic that tracks in-order load completion and
also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds
some comments to MSHR::replaceUpgrades().
2015-08-10 11:25:52 +01:00
Krishnendra Nathella cabd4768c7 cpu: Fix LLSC atomic CPU wakeup
Writes to locked memory addresses (LLSC) did not wake up the locking
CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU,
recvAtomicSnoop was checking if the incoming packet was an invalidation
(isInvalidate) and only then handled a locked snoop. But, writes are
seen instead of invalidates when running without caches (fast-forward
configurations). As as simple fix, now handleLockedSnoop is also called
even if the incoming snoop packet are from writes.
2015-07-19 15:03:30 -05:00
Matteo Andreozzi 496a8c6c92 cpu: TraceGen fix for tick frequency check
Bug fix for check on protobuf file frequency being different than
global frequency.

The ASCII encoder script is also fixed, and the example trace used in
the regressions is updated.
2016-02-24 04:16:55 -05:00
Andreas Hansson 4619f0ee8b scons: Add missing override to appease clang
Make clang happy...again.
2016-02-23 03:27:20 -05:00
Andreas Hansson 0d50979888 misc: Add missing overrides to appease clang
Since the last round of fixes a few new issues have snuck in. We
should consider switching the regression runs to clang.
2016-02-15 03:40:32 -05:00
Andreas Hansson fbdeb60316 mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be
forwarded from the memory side to the CPU side. Instead of having a
parameter, the cache now looks at the port connected on the CPU side,
and if it is a snooping port, then snoops are forwarded. Less error
prone, and less parameters to worry about.

The patch also tidies up the CPU classes to ensure that their I-side
port is not snooping by removing overrides to the snoop request
handler, such that snoop requests will panic via the default
MasterPort implement
2016-02-10 04:08:24 -05:00
Steve Reinhardt f6b828d068 style: eliminate explicit boolean comparisons
Result of running 'hg m5style --skip-all --fix-control -a' to get
rid of '== true' comparisons, plus trivial manual edits to get
rid of '== false'/'== False' comparisons.

Left a couple of explicit comparisons in where they didn't seem
unreasonable:
invalid boolean comparison in src/arch/mips/interrupts.cc:155
>>        DPRINTF(Interrupt, "Interrupts OnCpuTimerINterrupt(tc) == true\n");<<
invalid boolean comparison in src/unittest/unittest.hh:110
>>            "EXPECT_FALSE(" #expr ")", (expr) == false)<<
2016-02-06 17:21:20 -08:00
Steve Reinhardt 5592798865 style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06 17:21:19 -08:00
Steve Reinhardt dc8018a5c3 style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-02-06 17:21:18 -08:00
Steve Reinhardt 1b6355c895 cpu. arch: add initiateMemRead() to ExecContext interface
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.

This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.

This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().

For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level.  For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
2016-01-17 18:27:46 -08:00
Steve Reinhardt 707275265f cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system.  Thus there's no need to provide
a data pointer; its existence is historical.

Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface.  Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).
2016-01-17 18:27:46 -08:00
Andreas Hansson 12eb034378 scons: Enable -Wextra by default
Make best use of the compiler, and enable -Wextra as well as
-Wall. There are a few issues that had to be resolved, but they are
all trivial.
2016-01-11 05:52:20 -05:00
Andreas Hansson 0fcb376e5f mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
2015-12-31 09:32:58 -05:00
Brad Beckmann 173a786921 ruby: more flexible ruby tester support
This patch allows the ruby random tester to use ruby ports that may only
support instr or data requests.  This patch is similar to a previous changeset
(8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
This current patch implements the support in a more straight-forward way.
Since retries are now tested when running the ruby random tester, this patch
splits up the retry and drain check behavior so that RubyPort children, such
as the GPUCoalescer, can perform those operations correctly without having to
duplicate code.  Finally, the patch also includes better DPRINTFs for
debugging the tester.
2015-07-20 09:15:18 -05:00
Radhika Jagtap 54519fd51f cpu: Support virtual addr in elastic traces
This patch adds support to optionally capture the virtual address and asid
for load/store instructions in the elastic traces. If they are present in
the traces, Trace CPU will set those fields of the request during replay.
2015-12-07 16:42:16 -06:00
Radhika Jagtap 3080bbcc36 cpu: Create record type enum for elastic traces
This patch replaces the booleans that specified the elastic trace record
type with an enum type. The source of change is the proto message for
elastic trace where the enum is introduced. The struct definitions in the
elastic trace probe listener as well as the Trace CPU replace the boleans
with the proto message enum.

The patch does not impact functionality, but traces are not compatible with
previous version. This is preparation for adding new types of records in
subsequent patches.
2015-12-07 16:42:16 -06:00
Radhika Jagtap 7d3600c309 cpu: Add TraceCPU to playback elastic traces
This patch defines a TraceCPU that replays trace generated using the elastic
trace probe attached to the O3 CPU model. The elastic trace is an execution
trace with data dependencies and ordering dependencies annoted to it. It also
replays fixed timestamp instruction fetch trace that is also generated by the
elastic trace probe.

The TraceCPU inherits from BaseCPU as a result of which some methods need
to be defined. It has two port subclasses inherited from MasterPort for
instruction and data ports. It issues the memory requests deducing the
timing from the trace and without performing real execution of micro-ops.
As soon as the last dependency for an instruction is complete,
its computational delay, also provided in the input trace is added. The
dependency-free nodes are maintained in a list, called 'ReadyList',
ordered by ready time. Instructions which depend on load stall until the
responses for read requests are received thus achieving elastic replay. If
the dependency is not found when adding a new node, it is assumed complete.
Thus, if this node is found to be completely dependency-free its issue time is
calculated and it is added to the ready list immediately. This is encapsulated
in the subclass ElasticDataGen.

If ready nodes are issued in an unconstrained way there can be more nodes
outstanding which results in divergence in timing compared to the O3CPU.
Therefore, the Trace CPU also models hardware resources. A sub-class to model
hardware resources is added which contains the maximum sizes of load buffer,
store buffer and ROB. If resources are not available, the node is not issued.
The 'depFreeQueue' structure holds nodes that are pending issue.

Modeling the ROB size in the Trace CPU as a resource limitation is arguably the
most important parameter of all resources. The ROB occupancy is estimated using
the newly added field 'robNum'. We need to use ROB number as sequence number is
at times much higher due to squashing and trace replay is focused on correct
path modeling.

A map called 'inFlightNodes' is added to track nodes that are not only in
the readyList but also load nodes that are executed (and thus removed from
readyList) but are not complete. ReadyList handles what and when to execute
next node while the inFlightNodes is used for resource modelling. The oldest
ROB number is updated when any node occupies the ROB or when an entry in the
ROB is released. The ROB occupancy is equal to the difference in the ROB number
of the newly dependency-free node and the oldest ROB number in flight.

If no node dependends on a non load/store node then there is no reason to track
it in the dependency graph. We filter out such nodes but count them and add a
weight field to the subsequent node that we do include in the trace. The weight
field is used to model ROB occupancy during replay.

The depFreeQueue is chosen to be FIFO so that child nodes which are in
program order get pushed into it in that order and thus issued in the in
program order, like in the O3CPU. This is also why the dependents is made a
sequential container, std::set to std::vector. We only check head of the
depFreeQueue as nodes are issued in order and blocking on head models that
better than looping the entire queue. An alternative choice would be to inspect
top N pending nodes where N is the issue-width. This is left for future as the
timing correlation looks good as it is.

At the start of an execution event, first we attempt to issue such pending
nodes by checking if appropriate resources have become available. If yes, we
compute the execute tick with respect to the time then. Then we proceed to
complete nodes from the readyList.

When a read response is received, sometimes a dependency on it that was
supposed to be released when it was issued is still not released. This occurs
because the dependent gets added to the graph after the read was sent. So the
check is made less strict and the dependency is marked complete on read
response instead of insisting that it should have been removed on read sent.

There is a check for requests spanning two cache lines as this condition
triggers an assert fail in the L1 cache. If it does then truncate the size
to access only until the end of that line and ignore the remainder.
Strictly-ordered requests are skipped and the dependencies on such requests
are handled by simply marking them complete immediately.

The simulated seconds can be calculated as the difference between the
final_tick stat and the tickOffset stat. A CountedExitEvent that contains
a static int belonging to the Trace CPU class as a down counter is used to
implement multi Trace CPU simulation exit.
2015-12-07 16:42:15 -06:00
Radhika Jagtap de4dc50e95 proto, probe: Add elastic trace probe to o3 cpu
The elastic trace is a type of probe listener and listens to probe points
in multiple stages of the O3CPU. The notify method is called on a probe
point typically when an instruction successfully progresses through that
stage.

As different listener methods mapped to the different probe points execute,
relevant information about the instruction, e.g. timestamps and register
accesses, are captured and stored in temporary InstExecInfo class objects.
When the instruction progresses through the commit stage, the timing and the
dependency information about the instruction is finalised and encapsulated in
a struct called TraceInfo. TraceInfo objects are collected in a list instead
of writing them out to the trace file one a time. This is required as the
trace is processed in chunks to evaluate order dependencies and computational
delay in case an instruction does not have any register dependencies. By this
we achieve a simpler algorithm during replay because every record in the
trace can be hooked onto a record in its past. The instruction dependency
trace is written out as a protobuf format file. A second trace containing
fetch requests at absolute timestamps is written to a separate protobuf
format file.

If the instruction is not executed then it is not added to the trace.
The code checks if the instruction had a fault, if it predicated
false and thus previous register values were restored or if it was a
load/store that did not have a request (e.g. when the size of the
request is zero). In all these cases the instruction is set as
executed by the Execute stage and is picked up by the commit probe
listener. But a request is not issued and registers are not written.
So practically, skipping these should not hurt the dependency modelling.

If squashing results in squashing younger instructions, it may happen that
the squash probe discards the inst and removes it from the temporary
store but execute stage deals with the instruction in the next cycle which
results in the execute probe seeing this inst as 'new' inst. A sequence
number of the last processed trace record is used to trap these cases and
not add to the temporary store.

The elastic instruction trace and fetch request trace can be read in and
played back by the TraceCPU.
2015-12-07 16:42:15 -06:00
Radhika Jagtap eb19fc2976 probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.

A probe point is added in the Fetch stage for probing when a fetch request is
sent. Notify is fired on the probe point when a request is sent succesfully in
the first attempt as well as on a retry attempt.

Probe points are added in the IEW stage when an instruction begins to execute
and when execution is complete. This points can be used for monitoring the
execution time of an instruction.

Probe points are added in the Rename stage to probe renaming of source and
destination registers and when there is squashing. These probe points can be
used to track register dependencies and remove when there is squashing.

A probe point for squashing is added in Commit to probe squashed instructions.
2015-12-07 16:42:15 -06:00
Pau Cabre abfb997800 cpu: fix unitialized variable which may cause assertion failure
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to
NULL (I got the assertion failure several times without the fix).

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-12-04 17:54:03 -06:00
Nathanael Premillieu 488128dab2 cpu: Fix base FP and CC register index in o3 insertThread()
Note that the method is not used, and could possibly be deleted.
2015-11-22 05:10:19 -05:00
Andreas Hansson 949437d559 cpu: Fix memory leak in traffic generator
In cases where we discard the packet, make sure to also delete it and
the associated request.
2015-11-22 05:10:16 -05:00
Andreas Sandberg d57a855e40 cpu: Enforce 1 interrupt controller per thread
Consider it a fatal configuration error if the number of interrupt
controllers doesn't match the number of threads in an SMT
configuration.
2015-11-20 14:50:17 -06:00
Nilay Vaish 1d268a1f2d o3: drop unused statistic wbPenalized and wbPenalizedRate 2015-11-16 04:57:52 -06:00
Andreas Hansson 2ac04c11ac misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using
"-Wall" with clang >= 3.5, the latter now part of the most recent
XCode. The patch consequently removes "virtual" for those methods
where "override" is added. The latter should be enough of an
indication.

As part of this patch, a few minor issues that clang >= 3.5 complains
about are also resolved (unused methods and variables).
2015-10-12 04:08:01 -04:00
Andreas Hansson 22c04190c6 misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
2015-10-12 04:07:59 -04:00
Rekai Gonzalez Alberquilla d3d159749a isa: Add parameter to pick different decoder inside ISA
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
2015-10-09 14:50:54 -05:00
Steve Reinhardt 2511490c9c sim: add ExecMacro to Exec* compound debug flags
Really should have been there in the first place, IMO.
Makes debugging x86 execution a lot easier.
2015-10-06 17:26:50 -07:00
Curtis Dunham 02881a7bf3 base: remove Trace::enabled flag
The DTRACE() macro tests both Trace::enabled and the specific flag. This
change uses the same administrative interface for enabling/disabling
tracing, but masks the SimpleFlags settings directly. This eliminates a
load for every DTRACE() test, e.g. DPRINTF.
2015-09-30 15:21:55 -05:00
Mitch Hayenga 9e07a7504c cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
2015-09-30 11:14:19 -05:00
Mitch Hayenga a5c4eb3de9 isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
2015-09-30 11:14:19 -05:00
Mitch Hayenga fafa83ed32 cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT.
2015-09-30 11:14:19 -05:00
Mitch Hayenga 582a0148b4 config,cpu: Add SMT support to Atomic and Timing CPUs
Adds SMT support to the "simple" CPU models so that they can be
used with other SMT-supported CPUs. Example usage: this enables
the TimingSimpleCPU to be used to warmup caches before swapping to
detailed mode with the in-order or out-of-order based CPU models.
2015-09-30 11:14:19 -05:00
Mitch Hayenga 52d521e433 cpu: Change thread assignments for heterogenous SMT
Trying to run an SE system with varying threads per core (SMT cores + Non-SMT
cores) caused failures due to the CPU id assignment logic.  The comment
about thread assignment (worrying about core 0 not having tid 0) seems
not to be valid given that our configuration scripts initialize them in
order.

This removes that constraint so a heterogenously threaded sytem can work.
2015-09-30 11:14:19 -05:00
Andrew Lukefahr 543efd5ca6 cpu: pred: Local Predictor Reset in Tournament Predictor
When a branch gets squashed, it's speculative branch predictor state should get
rolled back in squash().  However, only the globalHistory state was being
rolled back.  This patch adds (at least some) support for rolling back the
local predictor state also.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-15 08:14:07 -05:00
Hongil Yoon fb0f9884e2 cpu, o3: consider split requests for LSQ checksnoop operations
This patch enables instructions in LSQ to track two physical addresses for
corresponding two split requests. Later, the information is used in
checksnoop() to search for/invalidate the corresponding LD instructions.

The current implementation has kept track of only the physical address that is
referenced by the first split request. Thus, for checksnoop(), the line
accessed by the second request has not been considered, causing potential
correctness issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-15 08:14:06 -05:00
Nilay Vaish 4727fc26f8 ruby: eliminate type uint64 and int64
These types are being replaced with uint64_t and int64_t.
2015-08-29 10:19:23 -05:00
Andreas Hansson daaae3744d mem: Reflect that packet address and size are always valid
This patch simplifies the packet, and removes the possibility of
creating a packet without a valid address and/or size. Under no
circumstances are these fields set at a later point, and thus they
really have to be provided at construction time.

The patch also fixes a case there the MinorCPU creates a packet
without a valid address and size, only to later delete it.
2015-08-21 07:03:27 -04:00
Andreas Hansson ae06e9a5c6 cpu: Move invldPid constant from Request to BaseCPU
A more natural home for this constant.
2015-08-21 07:03:14 -04:00
Nilay Vaish 2f44dada68 ruby: reverts to changeset: bf82f1f7b040 2015-08-19 10:02:01 -05:00
Nilay Vaish a6f3f38f2c ruby: eliminate type uint64 and int64
These types are being replaced with uint64_t and int64_t.
2015-08-14 19:28:43 -05:00
Nilay Vaish 91a84c5b3c ruby: replace Address by Addr
This patch eliminates the type Address defined by the ruby memory system.
This memory system would now use the type Addr that is in use by the
rest of the system.
2015-08-14 12:04:51 -05:00
Nilay Vaish 759fe30d9f ruby: drop some redundant includes 2015-08-11 11:39:23 -05:00
Andreas Sandberg 53e777d683 base: Declare a type for context IDs
Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
2015-08-07 09:59:13 +01:00
David Hashe 8f71e667b3 cpu: Fixed a bug on where to fetch the next instruction from
Figure out if the next instruction to fetch comes from the micro-op ROM
or not. Otherwise, wrong instructions may be fetched.
2015-07-20 09:15:18 -05:00
Andreas Sandberg f789d729b5 cpu: Update debug message from Fetch1 isDrained() in Minor
Fix a spurious %s and include the state of the Fetch1 stage in the
debug printout.
2015-07-31 17:04:59 +01:00
Andreas Sandberg f73b05431a cpu: Fix Minor drain issues when switched out
The Minor CPU currently doesn't drain properly when it is switched
out. This happens because Fetch 1 expects to be in the FetchHalted
state when it is drained. However, because the CPU is switched out, it
is stuck in the FetchWaitingForPC state. Fix this by ignoring drain
requests and returning DrainState::Drained from MinorCPU::drain() if
the CPU is switched out. This is always safe since a switched out CPU,
by definition, doesn't have any instructions in flight.
2015-07-31 17:04:59 +01:00
Andreas Sandberg ff8195235e cpu: Only activate thread 0 in Minor if the CPU is active
Minor currently activates thread 0 in startup() to work around an
issue where activateContext() is called from LiveProcess before the
process entry point is known. When activateContext() is called, Minor
creates a branch instruction to the process's entry point. The first
time it is called, the branch points to an undefined location (0). The
call in startup() updates the branch to point to the actual entry
point.

When instantiating a switched out Minor CPU, it still tries to
activate thread 0. This is clearly incorrect since a switched out CPU
can't have any active threads. This changeset adds a check to ensure
that the thread is active before reactivating it.
2015-07-30 10:15:50 +01:00
Andreas Sandberg 473a0dcc63 cpu: Fix drain issues in the Minor CPU
The drain refactor patches introduced a couple of bugs in the way
Minor handles draining. This patch fixes an incorrect assert and a
case of infinite recursion when the CPU signals drain done.
2015-07-30 10:15:50 +01:00
Andreas Hansson 85a44e24dc cpu: Fix issue identified by UBSan 2015-07-30 03:41:22 -04:00
Nilay Vaish aafa5c3f86 revert 5af8f40d8f2c 2015-07-28 01:58:04 -05:00
Nilay Vaish 608641e23c cpu: implements vector registers
This adds a vector register type.  The type is defined as a std::array of a
fixed number of uint64_ts.  The isa_parser.py has been modified to parse vector
register operands and generate the required code.  Different cpus have vector
register files now.
2015-07-26 10:21:20 -05:00
Nilay Vaish 6e354e82d9 cpu: o3: slight correction to identation in rename_impl.hh 2015-07-26 10:20:07 -05:00
Brandon Potter bfe7ee96ad ruby: replace global g_abs_controls with per-RubySystem var
This is another step in the process of removing global variables
from Ruby to enable multiple RubySystem instances in a single simulation.

The list of abstract controllers is per-RubySystem and should be
represented that way, rather than as a global.

Since this is the last remaining Ruby global variable, the
src/mem/ruby/Common/Global.* files are also removed.
2015-07-10 16:05:24 -05:00
Andreas Sandberg ed38e3432c sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
2015-07-07 09:51:05 +01:00
Andreas Sandberg e9c3d59aae sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 76cd4393c0 sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

  * Add a set of APIs to serialize into a subsection of the current
    object. Previously, objects that needed this functionality would
    use ad-hoc solutions using nameOut() and section name
    generation. In the new world, an object that implements the
    interface has the methods serializeSection() and
    unserializeSection() that serialize into a named /subsection/ of
    the current object. Calling serialize() serializes an object into
    the current section.

  * Move the name() method from Serializable to SimObject as it is no
    longer needed for serialization. The fully qualified section name
    is generated by the main serialization code on the fly as objects
    serialize sub-objects.

  * Add a scoped ScopedCheckpointSection helper class. Some objects
    need to serialize data structures, that are not deriving from
    Serializable, into subsections. Previously, this was done using
    nameOut() and manual section name generation. To simplify this,
    this changeset introduces a ScopedCheckpointSection() helper
    class. When this class is instantiated, it adds a new /subsection/
    and subsequent serialization calls during the lifetime of this
    helper class happen inside this section (or a subsection in case
    of nested sections).

  * The serialize() call is now const which prevents accidental state
    manipulation during serialization. Objects that rely on modifying
    state can use the serializeOld() call instead. The default
    implementation simply calls serialize(). Note: The old-style calls
    need to be explicitly called using the
    serializeOld()/serializeSectionOld() style APIs. These are used by
    default when serializing SimObjects.

  * Both the input and output checkpoints now use their own named
    types. This hides underlying checkpoint implementation from
    objects that need checkpointing and makes it easier to change the
    underlying checkpoint storage code.
2015-07-07 09:51:03 +01:00
Nilay Vaish 11a48faeb4 o3: correct the number of cc registers in rename map 2015-07-04 10:43:46 -05:00
Andreas Sandberg 7c4eb3b4d8 kvm, arm: Add support for aarch64
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:

   * The system register based generic timer can only be simulated by
     the host kernel. Workaround: Use a memory mapped timer instead to
     simulate the timer in gem5.

   * Simulating devices (e.g., the generic timer) in the host kernel
     requires that the host kernel also simulates the GIC.

   * ID registers in the host and in gem5 must match for switching
     between simulated CPUs and KVM. This is particularly important
     for ID registers describing memory system capabilities (e.g.,
     ASID size, physical address size).

   * Switching between a virtualized CPU and a simulated CPU is
     currently not supported if in-kernel device emulation is
     used. This could be worked around by adding support for switching
     to the gem5 (e.g., the KvmGic) side of the device models. A
     simpler workaround is to avoid in-kernel device models
     altogether.
2015-06-01 19:44:19 +01:00
Andreas Sandberg dbfd6effe0 kvm, arm, dev: Add an in-kernel GIC implementation
This changeset adds a GIC implementation that uses the kernel's
built-in support for simulating the interrupt controller. Since there
is currently no support for state transfer between gem5 and the
kernel, the device model does not support serialization and CPU
switching (which would require switching to a gem5-simulated GIC).
2015-06-01 19:44:17 +01:00
Andreas Sandberg 8e7c0575dc kvm: Handle inst events at the current instruction count
There are cases (particularly when attaching GDB) when instruction
events are scheduled at the current instruction tick. This used to
trigger an assertion error in kvm. This changeset adds a check for
this condition and forces KVM to do a quick entry that completes any
pending IO operations, but does not execute any new instructions,
before servicing the event. We could check if we need to enter KVM at
all, but forcing a quick entry is makes the code slightly cleaner and
does not hurt correctness (performance is hardly an issue in these
cases).
2015-06-01 19:43:41 +01:00
Andreas Sandberg 06cf5cc60b kvm, arm: Move ARM-specific files to arch/arm/kvm/
This changeset moves the ARM-specific KVM CPU implementation to
arch/arm/kvm/. This change is expected to keep the source tree
somewhat cleaner as we start adding support for ARMv8 and KVM
in-kernel interrupt controller simulation.

--HG--
rename : src/cpu/kvm/ArmKvmCPU.py => src/arch/arm/kvm/ArmKvmCPU.py
rename : src/cpu/kvm/arm_cpu.cc => src/arch/arm/kvm/arm_cpu.cc
rename : src/cpu/kvm/arm_cpu.hh => src/arch/arm/kvm/arm_cpu.hh
2015-06-01 19:43:40 +01:00
Andrew Bardsley cea1d14a93 cpu: Fix a bug in counting issued instructions in MinorCPU
The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.

Fixed by checking for a bubble in one new place.
2015-05-26 03:21:37 -04:00
Andreas Sandberg 5435f25ec8 kvm: Fix dumping code for large registers
The register dumping code in kvm tries to print the bytes in large
registers (128 bits and larger) instead of printing them as hex. This
changeset fixes that.
2015-05-23 13:37:22 +01:00
Andreas Sandberg ed447bbff9 kvm, x86: Guard x86-specific APIs in KvmVM
Protect x86-specific APIs in KvmVM with compile-time guards to avoid
breaking ARM builds.
2015-05-23 13:37:20 +01:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seems to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
Andreas Hansson 36f29496a0 mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.

The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
2015-05-05 03:22:29 -04:00
Andreas Hansson d0d933facc cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
2015-05-05 03:22:19 -04:00
Nilay Vaish 4333549575 cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued.  As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined).  The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value.  To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean.  If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
2015-04-29 22:35:22 -05:00
Nilay Vaish 0dbd696aae cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle
on x86.  This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
2015-04-29 22:35:22 -05:00
Brandon Potter a70a83155b cpu: remove conditional check (count > 0) on o3 IQ squashes
The o3 cpu instruction queue model uses the count variable to track the number
of unissued instructions in the queue. Previously, the squash method used
this variable to avoid executing the doSquash method when there were no
unissued instructions in the pipeline.  A corner case problem exists when
only issued instructions exist in the pipeline and a squash occurs; the
doSquash code is not invoked and subsequently does not clean up state properly.
2015-04-22 07:52:03 -07:00
Andreas Hansson cd76e34056 cpu: Remove the InOrderCPU from the tree
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.

The MinorCPU is now used to model an in-order microarchitecture, and
long term the MinorCPU will eventually be renamed InOrderCPU.
2015-04-20 12:46:35 -04:00
Malek Musleh 826f69b470 config, cpu: fix progress interval for switched CPUs
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-14 11:01:10 -05:00
Dibakar Gope 34ad1123ee cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-13 17:33:57 -05:00
Nikos Nikoleris 305e29b98e cpu: fix system total instructions accounting
The totalInstructions counter is only incremented when the whole instruction is
commited and not on every microop. It was incorrectly reset in atomic and
timing cpus.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
2015-04-03 11:42:10 -05:00
Andreas Hansson a196dbe3bf cpu: Fix InstPBTrace inheritance
This patch fixes an issue that prevented gem5 to be built with C++
config and without Python.
2015-03-26 11:16:43 -04:00
Steve Reinhardt 6677b9122a mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW
Makes x86-style locked operations even more distinct from
LLSC operations.  Using "locked" by itself should be
obviously ambiguous now.
2015-03-23 16:14:20 -07:00
Wendy Elsasser 9b4d8030e6 cpu: Fix TrafficGen message format
Fix erroneous message format for fatal error.
Previously, code did not have type indicator (% instead of %d).

Also removed redundant fatal check.

Ran modified sweep.py with in range and out of range values to test.
2015-03-19 04:06:12 -04:00
Steve Reinhardt ee0b52404c mem: restructure Packet cmd initialization a bit more
Refactor the way that specific MemCmd values are generated for packets.
The new approach is a little more elegant in that we assign the right
value up front, and it's also more amenable to non-heap-allocated
Packet objects.

Also replaced the code in the Minor model that was still doing it the
ad-hoc way.

This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-02-11 10:48:50 -08:00
Nilay Vaish e5fbc67e16 cpu: o3: another assert instead of check 2015-03-09 09:39:08 -05:00
Nilay Vaish 5003ed5f7a cpu: o3: Remove unused code in iew, add assert instead. 2015-03-09 09:39:08 -05:00
Nilay Vaish 4e1d10a3cf cpu: o3: commit: mark pipeline delay variable as consts 2015-03-09 09:39:08 -05:00
Nilay Vaish 53de2512b1 cpu: o3: remove unused stat variables. 2015-03-09 09:39:08 -05:00
Nilay Vaish 54bc67f619 cpu: o3: combine if with same condition 2015-03-09 09:39:07 -05:00
Nilay Vaish 61edd5ac97 cpu: o3: remove member variable squashCounter
The variable is used in only one place and a whole new function setNextStatus()
has been defined just to compute the value of the variable.  Instead of calling
the function, the value is now computed in the loop that preceded the function
call.
2015-03-09 09:39:07 -05:00
Nilay Vaish f69a74fda6 cpu: o3: remove unused function annotateMemoryUnits() 2015-03-09 09:39:07 -05:00
Andreas Hansson 36dc93a5fa mem: Move crossbar default latencies to subclasses
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.

Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
2015-03-02 04:00:47 -05:00
Andreas Hansson d64b34bef8 arm: Share a port for the two table walker objects
This patch changes how the MMU and table walkers are created such that
a single port is used to connect the MMU and the TLBs to the memory
system. Previously two ports were needed as there are two table walker
objects (stage one and stage two), and they both had a port. Now the
port itself is moved to the Stage2MMU, and each TableWalker is simply
using the port from the parent.

By using the same port we also remove the need for having an
additional crossbar joining the two ports before the walker cache or
the L2. This simplifies the creation of the CPU cache topology in
BaseCPU.py considerably. Moreover, for naming and symmetry reasons,
the TLB walker port is connected through the stage-one table walker
thus making the naming identical to x86. Along the same line, we use
the stage-one table walker to generate the master id that is used by
all TLB-related requests.
2015-03-02 04:00:42 -05:00
Rekai 3d5434022a cpu: o3 register renaming request handling improved
Now, prior to the renaming, the instruction requests the exact amount of
registers it will need, and the rename_map decides whether the instruction is
allowed to proceed or not.
2015-03-02 04:00:38 -05:00
Andreas Hansson f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
Stephan Diestelhorst de46eeade7 cpu: Add a PC-value to the traffic generator requests
Have the traffic generator add its masterID as the PC address to the
requests. That way, prefetchers (and other components) that use a PC
for request classification will see per-tester streams of requests.
This enables us to test strided prefetchers with the memchecker, too.
2015-03-02 04:00:31 -05:00
Andreas Hansson 8c78aa31ea cpu: TrafficGen sinks snoops without complaining
To be able to use the TrafficGen in a system with caches we need to
allow it to sink incoming snoop requests. By default the master port
panics, so silently ignore any snoops.
2015-02-16 03:34:55 -05:00
Andreas Hansson d0e1b8a19c arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-16 03:33:28 -05:00
Ali Saidi 4eff4fa12e cpu: add support for outputing a protobuf formatted CPU trace
Doesn't support x86 due to static instruction representation.

--HG--
rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py
2015-02-16 03:32:38 -05:00
Andreas Hansson 6563ec8634 cpu: Tidy up the MemTest and make false sharing more obvious
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.

The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.

In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
2015-02-11 10:23:28 -05:00
Andreas Sandberg 550c318490 sim: Move the BaseTLB to src/arch/generic/
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.

--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-11 10:23:27 -05:00