Commit graph

1343 commits

Author SHA1 Message Date
Dam Sunwoo 2c1e344313 cpu: generate SimPoint basic block vector profiles
This patch is based on http://reviews.m5sim.org/r/1474/ originally written by
Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout
folder) based on start and end addresses of basic blocks.

Some comments to the original patch are addressed and hooks are added to create
and resume from checkpoints based on instruction counts dictated by external
SimPoint analysis tools.

SimPoint creation/resuming options will be implemented as a separate patch.
2013-04-22 13:20:31 -04:00
Ali Saidi c9e4678c16 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-04-22 13:20:31 -04:00
Nilay Vaish ac778b1d02 o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object.
Since the interrupt is usually handled several cycles after the commit
stage becomes aware of it, it is possible that the local copy of the
interrupt object may not be the interrupt that is actually handled.
It is possible that another interrupt occurred in the
interval between interrupt detection and interrupt handling.

This patch creates a copy of the interrupt just before the interrupt
is handled. The local copy is ignored.
2013-03-29 14:05:26 -05:00
Andreas Hansson 08c1835bef cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort
instead of the derived CpuPort. The functions of the CpuPort are now
distributed across the relevant subclasses. The port accessor
functions (getInstPort and getDataPort) now return a MasterPort
instead of a CpuPort. This simplifies creating derivative CPUs that do
not use the CpuPort.
2013-03-26 14:46:42 -04:00
Andreas Hansson 2ca42cd626 cpu: Avoid including inorder TLBUnit to avoid gcc LTO bug
This patch comments out the inclusion of the inorder TLBUnit which is
only used in the 9-stage pipeline. With the TLBUnit present, gcc >=
4.6 in combination with LTO ends up throwing away the definition of
the TLBUnit destructor, and consequently fail to link. See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808 for more details
about the bug, and http://gcc.gnu.org/ml/gcc/2012-06/msg00397.html for
the discussion thread that also touches on similar issues seen with
clang.
2013-03-20 06:41:23 -04:00
Andreas Sandberg fc6f569d94 cpu: Fix state transition bug in the traffic generator
The traffic generator used to incorrectly determine the next state in
when state 0 had a non-zero probability. Due to the way the next
transition was determined, state 0 could never be entered other than
as an initial state. This changeset updates the transitition() method
to correctly handle such cases and cases where the transition matrix
is a 1x1 matrix.
2013-03-12 18:41:29 +01:00
Ali Saidi f205d83359 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-03-04 23:33:47 -05:00
Andreas Hansson a62afd094b scons: Fix warnings issued by clang 3.2svn (XCode 4.6)
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
2013-02-19 05:56:08 -05:00
Andreas Hansson 319443d42d scons: Add warning for missing declarations
This patch enables warnings for missing declarations. To avoid issues
with SWIG-generated code, the warning is only applied to non-SWIG
code.
2013-02-19 05:56:07 -05:00
Andreas Hansson c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Andreas Hansson 5c7ebee434 x86: Move APIC clock divider to Python
This patch moves the 16x APIC clock divider to the Python code to
avoid the post-instantiation modifications to the clock. The x86 APIC
was the only object setting the clock after creation time and this
required some custom functionality and configuration. With this patch,
the clock multiplier is moved to the Python code and the objects are
instantiated with the appropriate clock.
2013-02-19 05:56:06 -05:00
Andreas Hansson 0622f30961 mem: Add predecessor to SenderState base class
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.

There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
2013-02-19 05:56:05 -05:00
Andreas Sandberg 3af59ab386 cpu: Document exec trace flags 2013-02-15 17:40:10 -05:00
Geoffrey Blake ca96e7bff1 cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without
checking for duplicates, this can cause a triggering of the assert if an
instruction get replayed and writes to the same misc regs multiple times.
This fix prevents duplicate entries and instead updates the value.
2013-02-15 17:40:10 -05:00
Geoffrey Blake 8e79c68936 cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets
into a resource constrained situation and the serializing instruction has
to be placed on the skid buffer to handle blocking.  In this situation the
instruction informs the pipeline it is serializing and logs that the next
instruction must be serialized, but since we are blocking the pipeline
defers this action to place the serializing instruction and
incoming instructions into the skid buffer. When resuming from blocking,
rename will pull the serializing instruction from the skid buffer and
the current logic will see this as the "next" instruction that has to
be serialized and because of flags set on the serializing instruction,
it passes through the pipeline stage as normal and resets rename to
non-serializing.  This causes instructions to follow the serializing inst
incorrectly and eventually leads to an error in the pipeline. To fix this
rename should check first if it has to block before checking for serializing
instructions.
2013-02-15 17:40:10 -05:00
Matt Horsnell e88e7d88b9 o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less
  than the dispatch. This conflated the decode ticks when back pressure built
  in the pipeline.
- now picks up tick on entry.

Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ
  blocking)

Allows selection by tick range (previously this caused an infinite loop)
2013-02-15 17:40:09 -05:00
Andreas Sandberg b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Andreas Sandberg 1eec115c31 cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two
places, when the CPU is initialized (unless it's switched out) and on
a drainResume(). This led to some code duplication in the CPU
models. This changeset introduces the verifyMemoryMode() method which
is called by BaseCPU::init() if the CPU isn't switched out. The
individual CPU models are responsible for calling this method when
resuming from a drain as this code is CPU model specific.
2013-02-15 17:40:08 -05:00
Andreas Sandberg 7f1263f144 cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python
object hierarchy. This has two consequences:
 * It makes CPU model discovery from the Python world somewhat
   complicated as there is no way of testing if a CPU is a checker.
 * Parameters are duplicated in the checker configuration
   specification.

This changeset makes all checker CPUs inherit from the base checker
CPU class.
2013-02-15 17:40:08 -05:00
Andreas Sandberg 7cd1fd4324 cpu: Add CPU metadata om the Python classes
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:

 * memory_mode -- Return a string describing the current memory mode
                  (invalid/atomic/timing).

 * require_caches -- Does the CPU model require caches?

 * support_take_over -- Does the CPU support CPU handover?
2013-02-15 17:40:08 -05:00
Ali Saidi 4412046041 cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pickup set from else where,
one version of gcc 4.7 complains, so explictly add it.
2013-02-15 17:40:08 -05:00
Ali Saidi 7ae06a3b3b cpu: fix case with o3 cpu blocking and unblocking decode in cycle
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a
single cycle sending both signals to fetch which causes an assert or worse.
The previous check could never work before since the status was set to Blocked
before a test for the status being Unblocking was executed.
2013-02-15 17:40:08 -05:00
Ali Saidi b84bd3028c cpu: Fix a livelock in the o3 cpu.
Check if an instruction just enabled interrupts and we've previously had an
interrupt pending that was not handled because interrupts were subsequently
disabled before the pipeline reached a place to handle the interrupt. In that
case squash now to make sure the interrupt is handled.
2013-02-15 17:40:07 -05:00
Nilay Vaish ext:(%2C%20Timothy%20Jones%20%3Ctimothy.jones%40cl.cam.ac.uk%3E) dbeabedaf0 branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.

This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.

--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
2013-01-24 12:28:51 -06:00
Andrea Pellegrini 11d5ffa108 o3 cpu: fix zero reg problem
There was an issue w/ the rename logic, which would assign a previous physical
register to the ZeroReg architectural register in x86.  This issue was giving
problems for instructions squashed in threads w/ ID different from 0,
sometimes allowing non-mispredicted instructions to obtain a value different
from zero when reading the zeroReg.
2013-01-22 00:13:28 -06:00
Nilay Vaish fc57ae6401 x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.

This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.
2013-01-22 00:10:10 -06:00
Joel Hestness 1429d21244 O3 IEW: Make incrWb and decrWb clearer
Move the increment/decrement of wbOutstanding outside of the comparison
in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc
4.4.7, which incorrectly optimizes "-- ==" as "-=".
2013-01-19 15:14:54 -06:00
Nilay Vaish 5b6f972750 ruby: remove calls to g_system_ptr->getTime()
This patch further removes calls to g_system_ptr->getTime() where ever other
clocked objects are available for providing current time.
2013-01-17 13:10:12 -06:00
Nilay Vaish f7c0ba406e base simple cpu: removes commented out code about cache ops 2013-01-12 22:11:16 -06:00
Nilay Vaish 25ec278a0b x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.

This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
2013-01-12 22:09:48 -06:00
Andreas Sandberg 009970f59b cpu: Unify the serialization code for all of the CPU models
Cleanup the serialization code for the simple CPUs and the O3 CPU. The
CPU-specific code has been replaced with a (un)serializeThread that
serializes the thread state / context of a specific thread. Assuming
that the thread state class uses the CPU-specific thread state uses
the base thread state serialization code, this allows us to restore a
checkpoint with any of the CPU models.
2013-01-07 13:05:52 -05:00
Andreas Sandberg e09e9fa279 cpu: Flush TLBs on switchOut()
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent
stale translations when doing repeated switching. Additionally, the
TLB flushing functionality is exported to the Python to make debugging
of switching/checkpointing easier.

A simulation script will typically use the TLB flushing functionality
to generate a reference trace. The following sequence can be used to
simulate a handover (this depends on how drain is implemented, but is
generally the case) between identically configured CPU models:

  m5.drain(test_sys)
  [ cpu.flushTLBs() for cpu in test_sys.cpu ]
  m5.resume(test_sys)

The generated trace should normally be identical to a trace generated
when switching between identically configured CPU models or
checkpointing and resuming.
2013-01-07 13:05:48 -05:00
Andreas Sandberg 1814a85a05 cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
2013-01-07 13:05:46 -05:00
Andreas Sandberg 9e8003148f cpu: Make sure that a drained atomic CPU isn't executing ucode
Currently, the atomic CPU can be in the middle of a microcode sequence
when it is drained. This leads to two problems:

 * When switching to a hardware virtualized CPU, we obviously can't
   execute gem5 microcode.

 * Since curMacroStaticInst is populated when executing microcode,
   repeated switching between CPUs executing microcode leads to
   incorrect execution.

After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset fixes a bug
where the multiple switches to the same atomic CPU sometimes corrupts
the target state because of dangling pointers to the currently
executing microinstruction.

Note: This changeset moves tick event descheduling from switchOut() to
drain(), which makes timing consistent between just draining a system
and draining /and/ switching between two atomic CPUs. This makes
debugging quite a lot easier (execution traces get the same timing),
but the latency of the last instruction before a drain will not be
accounted for correctly (it will always be 1 cycle).

Note 2: This changeset removes so_state variable, the locked variable,
and the tickEvent from checkpoints since none of them contain state
that needs to be preserved across checkpoints. The so_state is made
redundant because we don't use the drain state variable anymore, the
lock variable should never be set when the system is drained, and the
tick event isn't scheduled.
2013-01-07 13:05:46 -05:00
Andreas Sandberg f9bcf46371 cpu: Make sure that a drained timing CPU isn't executing ucode
Currently, the timing CPU can be in the middle of a microcode sequence
or multicycle (stayAtPC is true) instruction when it is drained. This
leads to two problems:

 * When switching to a hardware virtualized CPU, we obviously can't
   execute gem5 microcode.

 * If stayAtPC is true we might execute half of an instruction twice
   when restoring a checkpoint or switching CPUs, which leads to an
   incorrect execution.

After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset also fixes a bug
where the timing CPU sometimes switches out with while stayAtPC is
true, which corrupts the target state after a CPU switch or
checkpoint.

Note: This changeset removes the so_state variable from checkpoints
since the drain state isn't used anymore.
2013-01-07 13:05:46 -05:00
Andreas Sandberg 52ff37caa3 cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.

The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.
2013-01-07 13:05:46 -05:00
Andreas Sandberg fca4fea769 cpu: Fix O3 LSQ debug dumping constness and formatting 2013-01-07 13:05:46 -05:00
Andreas Sandberg 8db27aa230 cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.
2013-01-07 13:05:45 -05:00
Andreas Sandberg a2077ccf02 o3 cpu: Remove unused variables 2013-01-07 13:05:45 -05:00
Andreas Sandberg 2cfe62adc4 cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
2013-01-07 13:05:45 -05:00
Andreas Sandberg f7da0fddd1 cpu: Remove unused params.hh header file in inorder CPU 2013-01-07 13:05:45 -05:00
Andreas Sandberg a7e0cbeb36 cpu: Introduce sanity checks when switching between CPUs
This patch introduces the following sanity checks when switching
between CPUs:

 * Check that the set of new and old CPUs do not overlap. Having an
   overlap between the set of new CPUs and the set of old CPUs is
   currently not supported. Doing such a switch used to result in the
   following assertion error:
     BaseCPU::takeOverFrom(BaseCPU*): \
       Assertion `!new_itb_port->isConnected()' failed.

 * Check that all new CPUs are in the switched out state.

 * Check that all old CPUs are in the switched in state.
2013-01-07 13:05:44 -05:00
Andreas Sandberg 901258c22b cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure
that CPU models consistently call the parent on switchOut() and
takeOverFrom(). This has the following implications that might alter
current functionality:

 * The call to BaseCPU::switchout() in the O3 CPU is moved from
   signalDrained() (!) to switchOut().

 * A call to BaseSimpleCPU::switchOut() is introduced in the simple
   CPUs.
2013-01-07 13:05:44 -05:00
Andreas Sandberg 4ae02295d5 cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.

This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.
2013-01-07 13:05:44 -05:00
Andreas Sandberg 6daada2701 cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
2013-01-07 13:05:44 -05:00
Andreas Sandberg e2dad8236a cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode
of operations. There is currently no architecture independent way of
accessing all registers. This patch introduces a flat register
interface to the ThreadContext class. This interface is useful, for
example, when serializing or copying thread contexts.
2013-01-07 13:05:44 -05:00
Andreas Sandberg 17b47d35e1 arch: Move the ISA object to a separate section
After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.

Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.
2013-01-07 13:05:42 -05:00
Andreas Sandberg 7eb0fb8b6e cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory
system is in the correct mode at startup and when resuming after a
drain.  Previously, we only checked that the memory system was in the
right mode when resuming. This is inadequate since this is a
configuration error that should be detected at startup as well as when
resuming. Additionally, since the check was done using an assert, it
wasn't performed when NDEBUG was set (e.g., the fast target).
2013-01-07 13:05:41 -05:00
Andreas Hansson ccb6c64047 cpu: Share the send functionality between traffic generators
This patch moves the packet creating and sending to a member function
in the shared base class to avoid code duplication.
2013-01-07 13:05:37 -05:00
Andreas Hansson 1da209140c cpu: Add support for protobuf input for the trace generator
This patch adds support for reading input traces encoded using
protobuf according to what is done in the CommMonitor.

A follow-up patch adds a Python script that can be used to convert the
previously used ASCII traces to protobuf equivalents. The appropriate
regression input is updated as part of this patch.
2013-01-07 13:05:37 -05:00