Commit graph

640 commits

Author SHA1 Message Date
Dam Sunwoo 84f8fe637c cpu: Add O3 CPU width checks
O3CPU has a compile-time maximum width set in o3/impl.hh, but checking
the configuration against this limit was not implemented anywhere
except for fetch. Configuring a wider pipe than the limit can silently
cause various issues during the simulation. This patch adds the proper
checking in the constructor of the various pipeline stages.
2014-04-23 05:18:18 -04:00
Faissal Sleiman a1570f544f o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries()
should be thread-specific.

In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob
has instructions from thread 0 (isEmpty() returns false), and none from
thread 1. If we are trying to squash all of thread 1, then
readTailInst(thread 1) will be called because rob->isEmpty() returns
false. The result is end_it is not in the list and the while
statement loops indefinitely back over the cpu's instList.

In iew_impl.hh, all threads are told they have the entire remaining IQ, when
each thread actually has a certain allocation. The result is extra stalls at
the iew dispatch stage which the rename stage usually takes care of.

In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only
contains instructions from thread 0. This returns a dummyInst (which may work
since we are trying to squash all instructions, but hardly seems like the right
way to do it).

In rob_impl.hh this fix skips the rest of the function more frequently and is
more efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-19 09:00:30 -05:00
Marco Elver b884fcf412 cpu: o3: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.

Prior to this patch, the following case could be violated:

 P0         | P1          ;
 MOV [x],mail=/usr/spool/mail/nilay | MOV EAX,[y] ;
 MOV [y],mail=/usr/spool/mail/nilay | MOV EBX,[x] ;

exists (1:EAX=1 /\ 1:EBX=0) [is a violation]

The problem was found using litmus [http://diy.inria.fr].

Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-03-25 13:15:04 -05:00
Paul Rosenfeld 32bf74cb8e alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-12 07:03:22 -05:00
Andreas Hansson 62fe81e9c1 cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by
making them const where appropriate.
2014-03-07 15:56:23 -05:00
Mitch Hayenga b9a9d99b22 scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions.
2014-03-07 15:56:23 -05:00
Geoffrey Blake 9633282fc8 checker: CheckerCPU handling of MiscRegs was incorrect
The CheckerCPU model in pre-v8 code was not checking the
updates to miscellaneous registers due to some methods
for setting misc regs were not instrumented.  The v8 patches
exposed this by calling the instrumented misc reg update
methods and then invoking the checker before the main CPU had
updated its misc regs, leading to false positives about
register mismatches. This patch fixes the non-instrumented
misc reg update methods and places calls to the checker in
the proper places in the O3 model.
2014-01-24 15:29:30 -06:00
Ali Saidi 7d0344704a arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id  results in accessing different
registers depending on the current mode of the processor. This patch adds
the same orthogonality to the misc register file as the others (int, float, cc).
For all the othre ISAs this is currently a null-implementation.

Additionally, a system variable is added to all the ISA objects.
2014-01-24 15:29:30 -06:00
Giacomo Gabrielli 3436de0c2a cpu: Add support for Memory+Barrier instruction types in O3 cpu. 2014-01-24 15:29:30 -06:00
Ali Saidi 90b1775a8f cpu: Add support for instructions that zero cache lines. 2014-01-24 15:29:30 -06:00
Ali Saidi 6bed6e0352 cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Matt Horsnell 739c6df94e base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
  the regProbeListeners is called on each SimObject.  this hooks up the probe
  point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
  about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
  type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
  1:1, 1:N, N:M relationship. They become useful when a number of modules
  listen to the same probe points. The idea being that you can add a small
  number of probes into the source code and develop a larger number of useful
  analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
  module (outputting assembler), you could re-use this to gather instruction
  distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
  functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
  relatively minor impact. Profiling has suggested even with a large number of
  probes (60) the impact of them (when not active) is very minimal (<1%).
2014-01-24 15:29:30 -06:00
Matt Horsnell ca89eba79e mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
2014-01-24 15:29:30 -06:00
Andreas Hansson 7db542c0dd cpu: Relax check on squashed non-speculative instructions
This patch relaxes the check performed when squashing non-speculative
instructions, as it caused problems with loads that were marked ready,
and then stalled on a blocked cache. The assertion is now allowing
memory references to be non-faulting.
2014-01-24 15:29:29 -06:00
Nilay Vaish 5800e83223 cpu: call BaseCPU startup() function in o3 cpu 2013-12-03 10:36:04 -06:00
Anthony Gutierrez 8a53da22c2 cpu: allow the fetch buffer to be smaller than a cache line
the current implementation of the fetch buffer in the o3 cpu
is only allowed to be the size of a cache line. some
architectures, e.g., ARM, have fetch buffers smaller than a cache
line, see slide 22 at:
http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf

this patch allows the fetch buffer to be set to values smaller
than a cache line.
2013-11-15 13:21:15 -05:00
Faissal Sleiman 397dc784fd cpu: Construct ROB with cpu params struct instead of each variable
Most other structures/stages get passed the cpu params struct.
2013-10-31 13:41:13 -05:00
Ali Saidi 79f81e2641 cpu: Fix O3 issuse with load+barrier instructions.
Fix a problem in the O3 CPU for instructions that are both
memory loads and memory barriers (e.g. load acquire) and
to uncacheable memory. This combination can confuse the
commit stage into commitng an instruction that hasn't
executed and got it's value yet. At the same time refactor
the code slightly to remove duplication between two of
the cases.
2013-10-31 13:41:13 -05:00
Matt Horsnell 6decd70bfb cpu: add consistent guarding to *_impl.hh files. 2013-10-17 10:20:45 -05:00
Faissal Sleiman 1746eb4a11 cpu: Removing an unused variable in rename 2013-10-17 10:20:45 -05:00
Faissal Sleiman 9195f1fbfd cpu: Change IEW DPRINTF to use IEW debug flag
IEW DPRINTF uses Decode debug flag, which appears to be a copying error. This
patch changes this to the IEW Debug flag.
2013-10-17 10:20:45 -05:00
Faissal Sleiman e516531bd0 cpu: Put in assertions to check for maximum supported LQ/SQ size
LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to
 256 entries (including the sentinel entry). Sending packets to memory with a
higher index than 255 truncates the index, such that the response matches the
wrong entry. For instance, this can result in a deadlock if a store completion
does not clear the head entry.
2013-10-17 10:20:45 -05:00
Yasuko Eckert 1bb293d1e7 arch/x86: add support for explicit CC register file
Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.

Nilay Vaish also contributed to this patch.
2013-10-15 14:22:44 -04:00
Yasuko Eckert 2c293823aa cpu: add a condition-code register class
Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.
2013-10-15 14:22:44 -04:00
Steve Reinhardt 5526221847 cpu/o3: clean up rename map and free list
Restructured rename map and free list to clean up some
extraneous code and separate out common code that can
be reused across different register classes (int and fp
at this point).  Both components now consist of a set
of Simple* objects that are stand-alone rename map &
free list for each class, plus a Unified* object that
presents a unified interface across all register
classes and then redirects accesses to the appropriate
Simple* object as needed.

Moved free list initialization to PhysRegFile to better
isolate knowledge of physical register index mappings
to that class (and remove the need to pass a number
of parameters to the free list constructor).

Causes a small change to these stats:
  cpu.rename.int_rename_lookups
  cpu.rename.fp_rename_lookups
because they are now categorized on a per-operand basis
rather than a per-instruction basis.
That is, an instruction with mixed fp/int/misc operand
types will have each operand categorized independently,
where previously the lookup was categorized based on
the instruction type.
2013-10-15 14:22:44 -04:00
Steve Reinhardt 219c423f1f cpu: rename *_DepTag constants to *_Reg_Base
Make these names more meaningful.

Specifically, made these substitutions:

s/FP_Base_DepTag/FP_Reg_Base/g;
s/Ctrl_Base_DepTag/Misc_Reg_Base/g;
s/Max_DepTag/Max_Reg_Index/g;
2013-10-15 14:22:43 -04:00
Steve Reinhardt 9bd017b8ae cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor
parameters) thet it didn't really use, and the array
initialization was needlessly verbose.

Also just hardwired the getReg() method to aleays
return true for misc regs, rather than having an array
of bits that we always kept marked as ready.
2013-10-15 14:22:43 -04:00
Steve Reinhardt c009d0eb2a cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or
have a pointer back to the CPU.  Also made some methods
for checking the physical register type (int vs. float)
based on the phys reg index, which will come in handy later.
2013-10-15 14:22:43 -04:00
Steve Reinhardt 7aa423acad cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping
of unified architectural register indices to register
classes is hardcoded all over to one where there's an
enum for the register classes and a function that
encapsulates the mapping.
2013-10-15 14:22:42 -04:00
Joel Hestness a1f9081bab cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's
object, but it would only initialize numThreads LSQUnits as specified by the
user. This had the effect of leaving some LSQUnits uninitialized when the
number of threads was less than MaxThreads, and when adding statistics to the
LSQUnit that must be initialized, this caused the stats initialization check to
fail. By dynamically instantiating LSQUnits, they are all initialized and this
avoids uninitialized LSQUnits from floating around during runtime.
2013-09-11 15:34:50 -05:00
Andreas Hansson ea40297018 cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or
out-of-order CPU as one of the available CPU models and therefore
should not be used in the BaseCPU. This patch moves the parameter to
the relevant CPU classes.
2013-09-04 13:22:56 -04:00
Andreas Hansson 9b2effd9e2 cpu: Fix a bug in the O3 CPU introduced by the cache line patch
This patch fixes a bug in the O3 fetch stage that was introduced when
the cache line size was moved to the system. By mistake, the
initialisation and resetting of the fetch stage was merged and put in
the constructor. The resetting is now re-added where it should be.
2013-08-19 03:52:24 -04:00
Andreas Hansson d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Akash Bagdia 7d7ab73862 sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.

The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).

The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.

All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.

The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
2013-06-27 05:49:49 -04:00
Andreas Hansson 10650fc525 cpu: Consider instructions waiting for FU completion in draining
This patch changes the IEW drain check to include the FU pool as there
can be instructions that are "stored" in FU completion events and thus
not covered by the existing checks. With this patch, we simply include
a check to see if all the FUs are considered non-busy in the next
tick.

Without this patch, the pc-switcheroo-full regression fails after
minor changes to the cache timing (aligning to clock edge).
2013-06-27 05:49:49 -04:00
Dam Sunwoo e8381142b0 sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was
already aligned with the clock edge. This behavior is not only confusing (not
quite what the function name implies), but also caused problems in the
drainResume() function. When exiting/re-entering the sim loop (e.g., to take
checkpoints), the CPUs will drain and resume. Due to the previous behavior of
nextCycle(), the CPU tick events were being rescheduled in the same ticks that
were already processed before draining. This caused divergence from runs that
did not exit/re-entered the sim loop. (Initially a cycle difference, but a
significant impact later on.)

This patch separates out the two behaviors (nextCycle() and clockEdge()),
uses nextCycle() in drainResume, and uses clockEdge() everywhere else.
Nothing (other than name) should change except for the drainResume timing.
2013-04-22 13:20:31 -04:00
Ali Saidi c9e4678c16 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-04-22 13:20:31 -04:00
Nilay Vaish ac778b1d02 o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object.
Since the interrupt is usually handled several cycles after the commit
stage becomes aware of it, it is possible that the local copy of the
interrupt object may not be the interrupt that is actually handled.
It is possible that another interrupt occurred in the
interval between interrupt detection and interrupt handling.

This patch creates a copy of the interrupt just before the interrupt
is handled. The local copy is ignored.
2013-03-29 14:05:26 -05:00
Andreas Hansson 08c1835bef cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort
instead of the derived CpuPort. The functions of the CpuPort are now
distributed across the relevant subclasses. The port accessor
functions (getInstPort and getDataPort) now return a MasterPort
instead of a CpuPort. This simplifies creating derivative CPUs that do
not use the CpuPort.
2013-03-26 14:46:42 -04:00
Ali Saidi f205d83359 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-03-04 23:33:47 -05:00
Andreas Hansson c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Geoffrey Blake ca96e7bff1 cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without
checking for duplicates, this can cause a triggering of the assert if an
instruction get replayed and writes to the same misc regs multiple times.
This fix prevents duplicate entries and instead updates the value.
2013-02-15 17:40:10 -05:00
Geoffrey Blake 8e79c68936 cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets
into a resource constrained situation and the serializing instruction has
to be placed on the skid buffer to handle blocking.  In this situation the
instruction informs the pipeline it is serializing and logs that the next
instruction must be serialized, but since we are blocking the pipeline
defers this action to place the serializing instruction and
incoming instructions into the skid buffer. When resuming from blocking,
rename will pull the serializing instruction from the skid buffer and
the current logic will see this as the "next" instruction that has to
be serialized and because of flags set on the serializing instruction,
it passes through the pipeline stage as normal and resets rename to
non-serializing.  This causes instructions to follow the serializing inst
incorrectly and eventually leads to an error in the pipeline. To fix this
rename should check first if it has to block before checking for serializing
instructions.
2013-02-15 17:40:10 -05:00
Matt Horsnell e88e7d88b9 o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less
  than the dispatch. This conflated the decode ticks when back pressure built
  in the pipeline.
- now picks up tick on entry.

Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ
  blocking)

Allows selection by tick range (previously this caused an infinite loop)
2013-02-15 17:40:09 -05:00
Andreas Sandberg b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Andreas Sandberg 1eec115c31 cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two
places, when the CPU is initialized (unless it's switched out) and on
a drainResume(). This led to some code duplication in the CPU
models. This changeset introduces the verifyMemoryMode() method which
is called by BaseCPU::init() if the CPU isn't switched out. The
individual CPU models are responsible for calling this method when
resuming from a drain as this code is CPU model specific.
2013-02-15 17:40:08 -05:00
Andreas Sandberg 7f1263f144 cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python
object hierarchy. This has two consequences:
 * It makes CPU model discovery from the Python world somewhat
   complicated as there is no way of testing if a CPU is a checker.
 * Parameters are duplicated in the checker configuration
   specification.

This changeset makes all checker CPUs inherit from the base checker
CPU class.
2013-02-15 17:40:08 -05:00
Andreas Sandberg 7cd1fd4324 cpu: Add CPU metadata om the Python classes
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:

 * memory_mode -- Return a string describing the current memory mode
                  (invalid/atomic/timing).

 * require_caches -- Does the CPU model require caches?

 * support_take_over -- Does the CPU support CPU handover?
2013-02-15 17:40:08 -05:00
Ali Saidi 4412046041 cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pickup set from else where,
one version of gcc 4.7 complains, so explictly add it.
2013-02-15 17:40:08 -05:00