Commit graph

613 commits

Author SHA1 Message Date
Steve Reinhardt
9bd017b8ae cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor
parameters) thet it didn't really use, and the array
initialization was needlessly verbose.

Also just hardwired the getReg() method to aleays
return true for misc regs, rather than having an array
of bits that we always kept marked as ready.
2013-10-15 14:22:43 -04:00
Steve Reinhardt
c009d0eb2a cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or
have a pointer back to the CPU.  Also made some methods
for checking the physical register type (int vs. float)
based on the phys reg index, which will come in handy later.
2013-10-15 14:22:43 -04:00
Steve Reinhardt
7aa423acad cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping
of unified architectural register indices to register
classes is hardcoded all over to one where there's an
enum for the register classes and a function that
encapsulates the mapping.
2013-10-15 14:22:42 -04:00
Joel Hestness
a1f9081bab cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's
object, but it would only initialize numThreads LSQUnits as specified by the
user. This had the effect of leaving some LSQUnits uninitialized when the
number of threads was less than MaxThreads, and when adding statistics to the
LSQUnit that must be initialized, this caused the stats initialization check to
fail. By dynamically instantiating LSQUnits, they are all initialized and this
avoids uninitialized LSQUnits from floating around during runtime.
2013-09-11 15:34:50 -05:00
Andreas Hansson
ea40297018 cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or
out-of-order CPU as one of the available CPU models and therefore
should not be used in the BaseCPU. This patch moves the parameter to
the relevant CPU classes.
2013-09-04 13:22:56 -04:00
Andreas Hansson
9b2effd9e2 cpu: Fix a bug in the O3 CPU introduced by the cache line patch
This patch fixes a bug in the O3 fetch stage that was introduced when
the cache line size was moved to the system. By mistake, the
initialisation and resetting of the fetch stage was merged and put in
the constructor. The resetting is now re-added where it should be.
2013-08-19 03:52:24 -04:00
Andreas Hansson
d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Akash Bagdia
7d7ab73862 sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.

The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).

The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.

All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.

The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
2013-06-27 05:49:49 -04:00
Andreas Hansson
10650fc525 cpu: Consider instructions waiting for FU completion in draining
This patch changes the IEW drain check to include the FU pool as there
can be instructions that are "stored" in FU completion events and thus
not covered by the existing checks. With this patch, we simply include
a check to see if all the FUs are considered non-busy in the next
tick.

Without this patch, the pc-switcheroo-full regression fails after
minor changes to the cache timing (aligning to clock edge).
2013-06-27 05:49:49 -04:00
Dam Sunwoo
e8381142b0 sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was
already aligned with the clock edge. This behavior is not only confusing (not
quite what the function name implies), but also caused problems in the
drainResume() function. When exiting/re-entering the sim loop (e.g., to take
checkpoints), the CPUs will drain and resume. Due to the previous behavior of
nextCycle(), the CPU tick events were being rescheduled in the same ticks that
were already processed before draining. This caused divergence from runs that
did not exit/re-entered the sim loop. (Initially a cycle difference, but a
significant impact later on.)

This patch separates out the two behaviors (nextCycle() and clockEdge()),
uses nextCycle() in drainResume, and uses clockEdge() everywhere else.
Nothing (other than name) should change except for the drainResume timing.
2013-04-22 13:20:31 -04:00
Ali Saidi
c9e4678c16 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-04-22 13:20:31 -04:00
Nilay Vaish
ac778b1d02 o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object.
Since the interrupt is usually handled several cycles after the commit
stage becomes aware of it, it is possible that the local copy of the
interrupt object may not be the interrupt that is actually handled.
It is possible that another interrupt occurred in the
interval between interrupt detection and interrupt handling.

This patch creates a copy of the interrupt just before the interrupt
is handled. The local copy is ignored.
2013-03-29 14:05:26 -05:00
Andreas Hansson
08c1835bef cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort
instead of the derived CpuPort. The functions of the CpuPort are now
distributed across the relevant subclasses. The port accessor
functions (getInstPort and getDataPort) now return a MasterPort
instead of a CpuPort. This simplifies creating derivative CPUs that do
not use the CpuPort.
2013-03-26 14:46:42 -04:00
Ali Saidi
f205d83359 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-03-04 23:33:47 -05:00
Andreas Hansson
c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Geoffrey Blake
ca96e7bff1 cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without
checking for duplicates, this can cause a triggering of the assert if an
instruction get replayed and writes to the same misc regs multiple times.
This fix prevents duplicate entries and instead updates the value.
2013-02-15 17:40:10 -05:00
Geoffrey Blake
8e79c68936 cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets
into a resource constrained situation and the serializing instruction has
to be placed on the skid buffer to handle blocking.  In this situation the
instruction informs the pipeline it is serializing and logs that the next
instruction must be serialized, but since we are blocking the pipeline
defers this action to place the serializing instruction and
incoming instructions into the skid buffer. When resuming from blocking,
rename will pull the serializing instruction from the skid buffer and
the current logic will see this as the "next" instruction that has to
be serialized and because of flags set on the serializing instruction,
it passes through the pipeline stage as normal and resets rename to
non-serializing.  This causes instructions to follow the serializing inst
incorrectly and eventually leads to an error in the pipeline. To fix this
rename should check first if it has to block before checking for serializing
instructions.
2013-02-15 17:40:10 -05:00
Matt Horsnell
e88e7d88b9 o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less
  than the dispatch. This conflated the decode ticks when back pressure built
  in the pipeline.
- now picks up tick on entry.

Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ
  blocking)

Allows selection by tick range (previously this caused an infinite loop)
2013-02-15 17:40:09 -05:00
Andreas Sandberg
b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Andreas Sandberg
1eec115c31 cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two
places, when the CPU is initialized (unless it's switched out) and on
a drainResume(). This led to some code duplication in the CPU
models. This changeset introduces the verifyMemoryMode() method which
is called by BaseCPU::init() if the CPU isn't switched out. The
individual CPU models are responsible for calling this method when
resuming from a drain as this code is CPU model specific.
2013-02-15 17:40:08 -05:00
Andreas Sandberg
7f1263f144 cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python
object hierarchy. This has two consequences:
 * It makes CPU model discovery from the Python world somewhat
   complicated as there is no way of testing if a CPU is a checker.
 * Parameters are duplicated in the checker configuration
   specification.

This changeset makes all checker CPUs inherit from the base checker
CPU class.
2013-02-15 17:40:08 -05:00
Andreas Sandberg
7cd1fd4324 cpu: Add CPU metadata om the Python classes
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:

 * memory_mode -- Return a string describing the current memory mode
                  (invalid/atomic/timing).

 * require_caches -- Does the CPU model require caches?

 * support_take_over -- Does the CPU support CPU handover?
2013-02-15 17:40:08 -05:00
Ali Saidi
4412046041 cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pickup set from else where,
one version of gcc 4.7 complains, so explictly add it.
2013-02-15 17:40:08 -05:00
Ali Saidi
7ae06a3b3b cpu: fix case with o3 cpu blocking and unblocking decode in cycle
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a
single cycle sending both signals to fetch which causes an assert or worse.
The previous check could never work before since the status was set to Blocked
before a test for the status being Unblocking was executed.
2013-02-15 17:40:08 -05:00
Ali Saidi
b84bd3028c cpu: Fix a livelock in the o3 cpu.
Check if an instruction just enabled interrupts and we've previously had an
interrupt pending that was not handled because interrupts were subsequently
disabled before the pipeline reached a place to handle the interrupt. In that
case squash now to make sure the interrupt is handled.
2013-02-15 17:40:07 -05:00
Nilay Vaish ext:(%2C%20Timothy%20Jones%20%3Ctimothy.jones%40cl.cam.ac.uk%3E)
dbeabedaf0 branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.

This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.

--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
2013-01-24 12:28:51 -06:00
Andrea Pellegrini
11d5ffa108 o3 cpu: fix zero reg problem
There was an issue w/ the rename logic, which would assign a previous physical
register to the ZeroReg architectural register in x86.  This issue was giving
problems for instructions squashed in threads w/ ID different from 0,
sometimes allowing non-mispredicted instructions to obtain a value different
from zero when reading the zeroReg.
2013-01-22 00:13:28 -06:00
Nilay Vaish
fc57ae6401 x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.

This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.
2013-01-22 00:10:10 -06:00
Joel Hestness
1429d21244 O3 IEW: Make incrWb and decrWb clearer
Move the increment/decrement of wbOutstanding outside of the comparison
in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc
4.4.7, which incorrectly optimizes "-- ==" as "-=".
2013-01-19 15:14:54 -06:00
Nilay Vaish
25ec278a0b x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.

This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
2013-01-12 22:09:48 -06:00
Andreas Sandberg
009970f59b cpu: Unify the serialization code for all of the CPU models
Cleanup the serialization code for the simple CPUs and the O3 CPU. The
CPU-specific code has been replaced with a (un)serializeThread that
serializes the thread state / context of a specific thread. Assuming
that the thread state class uses the CPU-specific thread state uses
the base thread state serialization code, this allows us to restore a
checkpoint with any of the CPU models.
2013-01-07 13:05:52 -05:00
Andreas Sandberg
1814a85a05 cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
52ff37caa3 cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.

The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
fca4fea769 cpu: Fix O3 LSQ debug dumping constness and formatting 2013-01-07 13:05:46 -05:00
Andreas Sandberg
8db27aa230 cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
a2077ccf02 o3 cpu: Remove unused variables 2013-01-07 13:05:45 -05:00
Andreas Sandberg
2cfe62adc4 cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
901258c22b cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure
that CPU models consistently call the parent on switchOut() and
takeOverFrom(). This has the following implications that might alter
current functionality:

 * The call to BaseCPU::switchout() in the O3 CPU is moved from
   signalDrained() (!) to switchOut().

 * A call to BaseSimpleCPU::switchOut() is introduced in the simple
   CPUs.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
4ae02295d5 cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.

This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
6daada2701 cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
e2dad8236a cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode
of operations. There is currently no architecture independent way of
accessing all registers. This patch introduces a flat register
interface to the ThreadContext class. This interface is useful, for
example, when serializing or copying thread contexts.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
7eb0fb8b6e cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory
system is in the correct mode at startup and when resuming after a
drain.  Previously, we only checked that the memory system was in the
right mode when resuming. This is inadequate since this is a
configuration error that should be detected at startup as well as when
resuming. Additionally, since the check was done using an assert, it
wasn't performed when NDEBUG was set (e.g., the fast target).
2013-01-07 13:05:41 -05:00
Andreas Sandberg
3db3f83a5e arch: Make the ISA class inherit from SimObject
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.

This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
2013-01-07 13:05:35 -05:00
Ali Saidi
69d419f313 o3: Fix issue with LLSC ordering and speculation
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
2013-01-07 13:05:33 -05:00
Ali Saidi
5146a69835 cpu: rename the misleading inSyscall to noSquashFromTC
isSyscall was originally created because during handling of a syscall in SE
mode the threadcontext had to be updated. However, in many places this is used
in FS mode (e.g. fault handlers) and the name doesn't make much sense. The
boolean actually stops gem5 from squashing speculative and non-committed state
when a write to a threadcontext happens, so re-name the variable to something
more appropriate
2013-01-07 13:05:33 -05:00
Gabe Black
e17c375ddd Decoder: Remove the thread context get/set from the decoder.
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 19:00:45 -06:00
Erik Tomusk
3dc7e4f496 TournamentBP: Fix some bugs with table sizes and counters
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.

There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.

The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 09:31:06 -06:00
Nathanael Premillieu
eb899407c5 o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 04:36:51 -06:00
Andreas Sandberg
b81a977e6a sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
eb703a4b4e cpu: O3 add a header declaring the DerivO3CPU
SWIG needs a complete declaration of all wrapped objects. This patch
adds a header file with the DerivO3CPU class and includes it in the
SWIG interface.

--HG--
rename : src/cpu/o3/cpu_builder.cc => src/cpu/o3/deriv.cc
2012-11-02 11:32:01 -05:00