Commit graph

745 commits

Author SHA1 Message Date
Brandon Potter 7a8dda49a4 style: [patch 1/22] use /r/3648/ to reorganize includes 2016-11-09 14:27:37 -06:00
Arthur Perais 1664625db8 cpu: Resolve targets of predicted 'taken' decode for O3
The target of taken conditional direct branches does not
need to be resolved in IEW: the target can be computed at
decode, usually using the decoded instruction word and the PC.

The higher-than-necessary penalty is taken only on conditional
branches that are predicted taken but miss in the BTB. Thus,
this is mostly inconsequential on IPC if the BTB is big/associative
enough (fewer capacity/conflict misses). Nonetheless, what gem5
simulates is not representative of how conditional branch targets
can be handled.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:05:24 -06:00
Arthur Perais e5fb6752d6 cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3
cachePorts currently constrains the number of store packets written to the
D-Cache each cycle), but loads currently affect this variable. This leads
to unexpected congestion (e.g., setting cachePorts to a realistic 1 will
in fact allow a store to WB only if no loads have accessed the D-Cache
this cycle). In the absence of arbitration, this patch decouples how many
loads can be done per cycle from how many stores can be done per cycle.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:04:06 -06:00
Fernando Endo 6c72c35519 cpu, arm: Distinguish Float* and SimdFloat*, create FloatMem* opClass
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15 14:58:45 -05:00
Rekai Gonzalez-Alberquilla ad296b068c cpu: Fix the O3 CPU Drain
The drain did not wait until stages were ready again. Therefore, as a
result of messages in the TimeBuffer being drain, the state after the
drain was not consistent and asserts fired in some places when the
draining happened after a stage got blocked, but before the notification
arrived to the previous stages.

Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-22 10:49:10 +01:00
Michael LeBeane 458d4a3c7b sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of
quiesceTick() and quiesce() functions on ThreadContext.  This patch also
gets rid of the FS guards, since suspending a CPU is also a valid
operation for SE mode.
2016-09-13 23:17:42 -04:00
David Guillen Fandos fb5fc11da4 pwr: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU
into a low-power idle state whenever all threads in it are idle.

Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
2016-06-06 17:16:43 +01:00
David Guillen Fandos 70798b1ba0 stats: Fixing regStats function for some SimObjects
Fixing an issue with regStats not calling the parent class method
for most SimObjects in Gem5. This causes issues if one adds new
stats in the base class (since they are never initialized properly!).

Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-06-06 17:16:43 +01:00
Mitch Hayenga c75ff71139 mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.

This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
2016-04-07 09:30:20 -05:00
Andreas Sandberg be28d96510 Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:

e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject

Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>

--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
2016-04-06 19:43:31 +01:00
Mitch Hayenga 8615b27174 mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
2016-04-05 12:39:21 -05:00
Akash Bagdia 1c34ee20df power: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power
idle state whenever all threads in it are idle.
2014-12-09 10:42:08 +00:00
Rekai Gonzalez Alberquilla 21f8242430 cpu: Change literal integer constants to meaningful labels
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those
FUs are busy at the moment" when requesting for a FU and replying. This
patch introduces new constants NoCapableFU and NoFreeFU respectively.

In addition, the condition (idx == -2 || idx != -1) is equivalent to
(idx != -1), so this patch also simplifies that.

--HG--
extra : rebase_source : 4833717b9d1e09d7594d1f34f882e13fc4b86846
2015-05-05 16:47:24 +01:00
Andreas Sandberg 5383e1ada4 base: Add support for changing output directories
This changeset adds support for changing the simulator output
directory. This can be useful when the simulation goes through several
stages (e.g., a warming phase, a simulation phase, and a verification
phase) since it allows the output from each stage to be located in a
different directory. Relocation is done by calling core.setOutputDir()
from Python or simout.setOutputDirectory() from C++.

This change affects several parts of the design of the gem5's output
subsystem. First, files returned by an OutputDirectory instance (e.g.,
simout) are of the type OutputStream instead of a std::ostream. This
allows us to do some more book keeping and control re-opening of files
when the output directory is changed. Second, new subdirectories are
OutputDirectory instances, which should be used to create files in
that sub-directory.

Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
[sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version]
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-11-27 14:41:59 +00:00
Stephan Diestelhorst f703160e5a mem, cpu: Add assertions to snoop invalidation logic
This patch adds assertions that enforce that only invalidating snoops
will ever reach into the logic that tracks in-order load completion and
also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds
some comments to MSHR::replaceUpgrades().
2015-08-10 11:25:52 +01:00
Krishnendra Nathella cabd4768c7 cpu: Fix LLSC atomic CPU wakeup
Writes to locked memory addresses (LLSC) did not wake up the locking
CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU,
recvAtomicSnoop was checking if the incoming packet was an invalidation
(isInvalidate) and only then handled a locked snoop. But, writes are
seen instead of invalidates when running without caches (fast-forward
configurations). As as simple fix, now handleLockedSnoop is also called
even if the incoming snoop packet are from writes.
2015-07-19 15:03:30 -05:00
Andreas Hansson fbdeb60316 mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be
forwarded from the memory side to the CPU side. Instead of having a
parameter, the cache now looks at the port connected on the CPU side,
and if it is a snooping port, then snoops are forwarded. Less error
prone, and less parameters to worry about.

The patch also tidies up the CPU classes to ensure that their I-side
port is not snooping by removing overrides to the snoop request
handler, such that snoop requests will panic via the default
MasterPort implement
2016-02-10 04:08:24 -05:00
Steve Reinhardt 5592798865 style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06 17:21:19 -08:00
Steve Reinhardt dc8018a5c3 style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-02-06 17:21:18 -08:00
Steve Reinhardt 707275265f cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system.  Thus there's no need to provide
a data pointer; its existence is historical.

Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface.  Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).
2016-01-17 18:27:46 -08:00
Andreas Hansson 0fcb376e5f mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
2015-12-31 09:32:58 -05:00
Radhika Jagtap 54519fd51f cpu: Support virtual addr in elastic traces
This patch adds support to optionally capture the virtual address and asid
for load/store instructions in the elastic traces. If they are present in
the traces, Trace CPU will set those fields of the request during replay.
2015-12-07 16:42:16 -06:00
Radhika Jagtap 3080bbcc36 cpu: Create record type enum for elastic traces
This patch replaces the booleans that specified the elastic trace record
type with an enum type. The source of change is the proto message for
elastic trace where the enum is introduced. The struct definitions in the
elastic trace probe listener as well as the Trace CPU replace the boleans
with the proto message enum.

The patch does not impact functionality, but traces are not compatible with
previous version. This is preparation for adding new types of records in
subsequent patches.
2015-12-07 16:42:16 -06:00
Radhika Jagtap de4dc50e95 proto, probe: Add elastic trace probe to o3 cpu
The elastic trace is a type of probe listener and listens to probe points
in multiple stages of the O3CPU. The notify method is called on a probe
point typically when an instruction successfully progresses through that
stage.

As different listener methods mapped to the different probe points execute,
relevant information about the instruction, e.g. timestamps and register
accesses, are captured and stored in temporary InstExecInfo class objects.
When the instruction progresses through the commit stage, the timing and the
dependency information about the instruction is finalised and encapsulated in
a struct called TraceInfo. TraceInfo objects are collected in a list instead
of writing them out to the trace file one a time. This is required as the
trace is processed in chunks to evaluate order dependencies and computational
delay in case an instruction does not have any register dependencies. By this
we achieve a simpler algorithm during replay because every record in the
trace can be hooked onto a record in its past. The instruction dependency
trace is written out as a protobuf format file. A second trace containing
fetch requests at absolute timestamps is written to a separate protobuf
format file.

If the instruction is not executed then it is not added to the trace.
The code checks if the instruction had a fault, if it predicated
false and thus previous register values were restored or if it was a
load/store that did not have a request (e.g. when the size of the
request is zero). In all these cases the instruction is set as
executed by the Execute stage and is picked up by the commit probe
listener. But a request is not issued and registers are not written.
So practically, skipping these should not hurt the dependency modelling.

If squashing results in squashing younger instructions, it may happen that
the squash probe discards the inst and removes it from the temporary
store but execute stage deals with the instruction in the next cycle which
results in the execute probe seeing this inst as 'new' inst. A sequence
number of the last processed trace record is used to trap these cases and
not add to the temporary store.

The elastic instruction trace and fetch request trace can be read in and
played back by the TraceCPU.
2015-12-07 16:42:15 -06:00
Radhika Jagtap eb19fc2976 probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.

A probe point is added in the Fetch stage for probing when a fetch request is
sent. Notify is fired on the probe point when a request is sent succesfully in
the first attempt as well as on a retry attempt.

Probe points are added in the IEW stage when an instruction begins to execute
and when execution is complete. This points can be used for monitoring the
execution time of an instruction.

Probe points are added in the Rename stage to probe renaming of source and
destination registers and when there is squashing. These probe points can be
used to track register dependencies and remove when there is squashing.

A probe point for squashing is added in Commit to probe squashed instructions.
2015-12-07 16:42:15 -06:00
Pau Cabre abfb997800 cpu: fix unitialized variable which may cause assertion failure
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to
NULL (I got the assertion failure several times without the fix).

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-12-04 17:54:03 -06:00
Nathanael Premillieu 488128dab2 cpu: Fix base FP and CC register index in o3 insertThread()
Note that the method is not used, and could possibly be deleted.
2015-11-22 05:10:19 -05:00
Nilay Vaish 1d268a1f2d o3: drop unused statistic wbPenalized and wbPenalizedRate 2015-11-16 04:57:52 -06:00
Andreas Hansson 2ac04c11ac misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using
"-Wall" with clang >= 3.5, the latter now part of the most recent
XCode. The patch consequently removes "virtual" for those methods
where "override" is added. The latter should be enough of an
indication.

As part of this patch, a few minor issues that clang >= 3.5 complains
about are also resolved (unused methods and variables).
2015-10-12 04:08:01 -04:00
Andreas Hansson 22c04190c6 misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
2015-10-12 04:07:59 -04:00
Rekai Gonzalez Alberquilla d3d159749a isa: Add parameter to pick different decoder inside ISA
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
2015-10-09 14:50:54 -05:00
Mitch Hayenga 9e07a7504c cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
2015-09-30 11:14:19 -05:00
Mitch Hayenga a5c4eb3de9 isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
2015-09-30 11:14:19 -05:00
Mitch Hayenga fafa83ed32 cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT.
2015-09-30 11:14:19 -05:00
Hongil Yoon fb0f9884e2 cpu, o3: consider split requests for LSQ checksnoop operations
This patch enables instructions in LSQ to track two physical addresses for
corresponding two split requests. Later, the information is used in
checksnoop() to search for/invalidate the corresponding LD instructions.

The current implementation has kept track of only the physical address that is
referenced by the first split request. Thus, for checksnoop(), the line
accessed by the second request has not been considered, causing potential
correctness issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-15 08:14:06 -05:00
Andreas Sandberg 53e777d683 base: Declare a type for context IDs
Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
2015-08-07 09:59:13 +01:00
David Hashe 8f71e667b3 cpu: Fixed a bug on where to fetch the next instruction from
Figure out if the next instruction to fetch comes from the micro-op ROM
or not. Otherwise, wrong instructions may be fetched.
2015-07-20 09:15:18 -05:00
Nilay Vaish aafa5c3f86 revert 5af8f40d8f2c 2015-07-28 01:58:04 -05:00
Nilay Vaish 608641e23c cpu: implements vector registers
This adds a vector register type.  The type is defined as a std::array of a
fixed number of uint64_ts.  The isa_parser.py has been modified to parse vector
register operands and generate the required code.  Different cpus have vector
register files now.
2015-07-26 10:21:20 -05:00
Nilay Vaish 6e354e82d9 cpu: o3: slight correction to identation in rename_impl.hh 2015-07-26 10:20:07 -05:00
Andreas Sandberg ed38e3432c sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
2015-07-07 09:51:05 +01:00
Andreas Sandberg e9c3d59aae sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 76cd4393c0 sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

  * Add a set of APIs to serialize into a subsection of the current
    object. Previously, objects that needed this functionality would
    use ad-hoc solutions using nameOut() and section name
    generation. In the new world, an object that implements the
    interface has the methods serializeSection() and
    unserializeSection() that serialize into a named /subsection/ of
    the current object. Calling serialize() serializes an object into
    the current section.

  * Move the name() method from Serializable to SimObject as it is no
    longer needed for serialization. The fully qualified section name
    is generated by the main serialization code on the fly as objects
    serialize sub-objects.

  * Add a scoped ScopedCheckpointSection helper class. Some objects
    need to serialize data structures, that are not deriving from
    Serializable, into subsections. Previously, this was done using
    nameOut() and manual section name generation. To simplify this,
    this changeset introduces a ScopedCheckpointSection() helper
    class. When this class is instantiated, it adds a new /subsection/
    and subsequent serialization calls during the lifetime of this
    helper class happen inside this section (or a subsection in case
    of nested sections).

  * The serialize() call is now const which prevents accidental state
    manipulation during serialization. Objects that rely on modifying
    state can use the serializeOld() call instead. The default
    implementation simply calls serialize(). Note: The old-style calls
    need to be explicitly called using the
    serializeOld()/serializeSectionOld() style APIs. These are used by
    default when serializing SimObjects.

  * Both the input and output checkpoints now use their own named
    types. This hides underlying checkpoint implementation from
    objects that need checkpointing and makes it easier to change the
    underlying checkpoint storage code.
2015-07-07 09:51:03 +01:00
Nilay Vaish 11a48faeb4 o3: correct the number of cc registers in rename map 2015-07-04 10:43:46 -05:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seems to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
Andreas Hansson 36f29496a0 mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.

The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
2015-05-05 03:22:29 -04:00
Andreas Hansson d0d933facc cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
2015-05-05 03:22:19 -04:00
Nilay Vaish 4333549575 cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued.  As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined).  The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value.  To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean.  If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
2015-04-29 22:35:22 -05:00
Nilay Vaish 0dbd696aae cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle
on x86.  This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
2015-04-29 22:35:22 -05:00