Commit graph

731 commits

Author SHA1 Message Date
Andreas Sandberg
52ff37caa3 cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.

The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
fca4fea769 cpu: Fix O3 LSQ debug dumping constness and formatting 2013-01-07 13:05:46 -05:00
Andreas Sandberg
8db27aa230 cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
a2077ccf02 o3 cpu: Remove unused variables 2013-01-07 13:05:45 -05:00
Andreas Sandberg
2cfe62adc4 cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
901258c22b cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure
that CPU models consistently call the parent on switchOut() and
takeOverFrom(). This has the following implications that might alter
current functionality:

 * The call to BaseCPU::switchout() in the O3 CPU is moved from
   signalDrained() (!) to switchOut().

 * A call to BaseSimpleCPU::switchOut() is introduced in the simple
   CPUs.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
4ae02295d5 cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.

This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
6daada2701 cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
e2dad8236a cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode
of operations. There is currently no architecture independent way of
accessing all registers. This patch introduces a flat register
interface to the ThreadContext class. This interface is useful, for
example, when serializing or copying thread contexts.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
7eb0fb8b6e cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory
system is in the correct mode at startup and when resuming after a
drain.  Previously, we only checked that the memory system was in the
right mode when resuming. This is inadequate since this is a
configuration error that should be detected at startup as well as when
resuming. Additionally, since the check was done using an assert, it
wasn't performed when NDEBUG was set (e.g., the fast target).
2013-01-07 13:05:41 -05:00
Andreas Sandberg
3db3f83a5e arch: Make the ISA class inherit from SimObject
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.

This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
2013-01-07 13:05:35 -05:00
Ali Saidi
69d419f313 o3: Fix issue with LLSC ordering and speculation
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
2013-01-07 13:05:33 -05:00
Ali Saidi
5146a69835 cpu: rename the misleading inSyscall to noSquashFromTC
isSyscall was originally created because during handling of a syscall in SE
mode the threadcontext had to be updated. However, in many places this is used
in FS mode (e.g. fault handlers) and the name doesn't make much sense. The
boolean actually stops gem5 from squashing speculative and non-committed state
when a write to a threadcontext happens, so re-name the variable to something
more appropriate
2013-01-07 13:05:33 -05:00
Gabe Black
e17c375ddd Decoder: Remove the thread context get/set from the decoder.
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 19:00:45 -06:00
Erik Tomusk
3dc7e4f496 TournamentBP: Fix some bugs with table sizes and counters
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.

There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.

The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 09:31:06 -06:00
Nathanael Premillieu
eb899407c5 o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 04:36:51 -06:00
Andreas Sandberg
b81a977e6a sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
eb703a4b4e cpu: O3 add a header declaring the DerivO3CPU
SWIG needs a complete declaration of all wrapped objects. This patch
adds a header file with the DerivO3CPU class and includes it in the
SWIG interface.

--HG--
rename : src/cpu/o3/cpu_builder.cc => src/cpu/o3/deriv.cc
2012-11-02 11:32:01 -05:00
Andreas Sandberg
ebe65a394b cpu: Add header files for checker CPUs
In order to create reliable SWIG wrappers, we need to include the
declaration of the wrapped class in the SWIG file. Previously, we
didn't expose the declaration of checker CPUs. This patch adds header
files for such CPUs and include them in the SWIG wrapper.

--HG--
rename : src/cpu/dummy_checker_builder.cc => src/cpu/dummy_checker.cc
rename : src/cpu/o3/checker_builder.cc => src/cpu/o3/checker.cc
2012-11-02 11:32:01 -05:00
Andreas Sandberg
c0ab52799c sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.

This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
2012-11-02 11:32:01 -05:00
Ali Saidi
5adb4ddc12 O3: Pack the comm structures a bit better to reduce their size. 2012-09-25 11:49:40 -05:00
Djordje Kovacevic
d060a28a29 CPU: Add abandoned instructions to O3 Pipe Viewer 2012-09-25 11:49:40 -05:00
Anthony Gutierrez
c6927ed138 stats: remove duplicate instruction stats from the commit stage
these stats are duplicates of insts/opsCommitted, cause
confusion, and are poorly named.
2012-09-12 11:35:52 -04:00
Ali Saidi
03ff612054 O3: Get rid of incorrect assert in RAS. 2012-09-07 14:20:53 -05:00
Andreas Hansson
287ea1a081 Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.

An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
2012-09-07 12:34:38 -04:00
Andreas Hansson
0cacf7e817 Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
2012-08-28 14:30:33 -04:00
Andreas Hansson
d53d04473e Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.

In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.

Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.

As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
2012-08-28 14:30:31 -04:00
Andreas Hansson
c60db56741 Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).

The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.

Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
2012-08-22 11:39:59 -04:00
Andreas Hansson
a81c969529 CPU: Remove overloaded function_trace_start parameter
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.

The inorder CPU is particularly interesting as it uses a different
name for the parameter, and never make any use of it internally.
2012-08-21 05:49:43 -04:00
Andreas Hansson
016593f2e9 Clock: Make Tick unsigned and remove UTick
This patch makes the Tick unsigned and removes the UTick typedef. The
ticks should never be negative, and there was only one major issue
with removing it, caused by the o3 CPU using a -1 as an initial value.

The patch has no impact on any regressions.
2012-08-21 05:49:09 -04:00
Anthony Gutierrez
0b3897fc90 O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.

This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
2012-08-15 10:38:08 -04:00
Anthony Gutierrez
8133f2460f checker: make checker cpu id match its host's cpu id
when using the checker i ran into problems where an instruction reading the
cpu id register failed because the ids did not match, and hence, the result
of the instruction did not match. this patch ensures that the ids match so
this instruction does not fail. this problem only seemed to manifest itself
when multiple cores were in the system, either multi-core, or extra switched-
out cores present in the system.
2012-07-27 16:08:04 -04:00
Andreas Hansson
b265d9925c Port: Align port names in C++ and Python
This patch is a first step to align the port names used in the Python
world and the C++ world. Ultimately it serves to make the use of
config.json together with output from the simulation easier, including
post-processing of statistics.

Most notably, the CPU, cache, and bus is addressed in this patch, and
there might be other ports that should be updated accordingly. The
dash name separator has also been replaced with a "." which is what is
used to concatenate the names in python, and a separation is made
between the master and slave port in the bus.
2012-07-09 12:35:39 -04:00
Andreas Hansson
ff5718f042 Fix: Address a few benign memory leaks
This patch is the result of static analysis identifying a number of
memory leaks. The leaks are all benign as they are a result of not
deallocating memory in the desctructor. The fix still has value as it
removes false positives in the static analysis.
2012-07-09 12:35:30 -04:00
Nathanael Premillieu
af2b14a362 O3: Track if the RAS has been pushed or not to pop the RAS if neccessary.
Add new flag (named pushedRAS) in the PredictorHistory structure.
This flag tracks whether the RAS has been pushed or not during a prediction.
Then, in the squash function it is used to pop the RAS if necessary.
2012-06-29 11:18:29 -04:00
Ali Saidi
20d25b9da7 ISA: Back-out NoopMachInst as a StaticInstPtr change. 2012-06-05 13:52:30 -04:00
Ali Saidi
6df196b71e O3: Clean up the O3 structures and try to pack them a bit better.
DynInst is extremely large the hope is that this re-organization will put the
most used members close to each other.
2012-06-05 01:23:09 -04:00
Ali Saidi
1b370431d0 sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
2012-06-05 01:23:08 -04:00
Gabe Black
008b17d816 ISA: Turn the ExtMachInst NoopMachinst into the StaticInstPtr NoopStaticInst.
This eliminates a use of the ExtMachInst type outside of the ISAs.
2012-06-04 10:57:23 -07:00
Gabe Black
0cba96ba6a CPU: Merge the predecoder and decoder.
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.

--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
2012-05-26 13:44:46 -07:00
Gabe Black
82a228bd43 Decode: Make the Decoder class defined per ISA.
--HG--
rename : src/cpu/decode.cc => src/arch/generic/decoder.cc
rename : src/cpu/decode.hh => src/arch/generic/decoder.hh
2012-05-25 00:53:37 -07:00
Andreas Hansson
3fea59e162 MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.

For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).

The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.

With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
2012-05-01 13:40:42 -04:00
Andreas Hansson
750f33a901 MEM: Remove the Broadcast destination from the packet
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.

Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).

The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.

In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
2012-04-14 05:45:55 -04:00
Andreas Hansson
dccca0d3a9 MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.

Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.

Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.

Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.

The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.

In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.
2012-04-14 05:45:07 -04:00
Andreas Hansson
b00949d88b MEM: Enable multiple distributed generalized memories
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.

All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.

To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.

Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.

--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
2012-04-06 13:46:31 -04:00
William Wang
f9d403a7b9 MEM: Introduce the master/slave port sub-classes in C++
This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.

The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.

The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.

The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.
2012-03-30 09:40:11 -04:00
Andreas Hansson
a14013af3a CPU: Unify initMemProxies across CPUs and simulation modes
This patch unifies where initMemProxies is called, in the init()
method of each BaseCPU subclass, before TheISA::initCPU is
called. Moreover, it also ensures that initMemProxies is called in
both full-system and syscall-emulation mode, thus unifying also across
the modes. An additional check is added in the ThreadState to ensure
that initMemProxies is only called once.
2012-03-30 09:38:35 -04:00
Andrew Lukefahr
b4e5be717d O3: Fix sizing of decode to rename skid buffer. 2012-03-21 10:34:06 -05:00
Brian Grayson
565c1de4a8 O3: Fix size of skid buffer between fetch and decode when widths are different 2012-03-21 10:34:05 -05:00
Andreas Hansson
72538294fb gcc: Clean-up of non-C++0x compliant code, first steps
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
2012-03-19 06:36:09 -04:00
Brian Grayson
98185658c5 O3: Add fatal when fetchWidth > Impl::MaxWidth. 2012-03-11 10:20:54 -04:00
Geoffrey Blake
69d229ce28 O3/Ozone: Eliminate dead code counting software prefetch insts
Eliminates dead code in the O3 and Ozone CPU models that counted
software prefetch instructions separately for the ALPHA ISA only.
2012-03-09 09:59:28 -05:00
Geoffrey Blake
043709fdfa CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files.  Also merges with the SE/FS changes.
2012-03-09 09:59:27 -05:00
Andreas Hansson
32eae8094d CPU: Check that the interrupt controller is created when needed
This patch adds a creation-time check to the CPU to ensure that the
interrupt controller is created for the cases where it is needed,
i.e. if the CPU is not being switched in later and not a checker CPU.

The patch also adds the "createInterruptController" call to a number
of the regression scripts.
2012-03-02 09:21:48 -05:00
Nilay Vaish
c80af04d7d x86: Fix switching of CPUs
This patch prevents creation of interrupt controller for
cpus that will be switched in later
2012-03-01 11:37:02 -06:00
Andreas Hansson
9e3c8de30b MEM: Make port proxies use references rather than pointers
This patch is adding a clearer design intent to all objects that would
not be complete without a port proxy by making the proxies members
rathen than dynamically allocated. In essence, if NULL would not be a
valid value for the proxy, then we avoid using a pointer to make this
clear.

The same approach is used for the methods using these proxies, such as
loadSections, that now use references rather than pointers to better
reflect the fact that NULL would not be an acceptable value (in fact
the code would break and that is how this patch started out).

Overall the concept of "using a reference to express unconditional
composition where a NULL pointer is never valid" could be done on a
much broader scale throughout the code base, but for now it is only
done in the locations affected by the proxies.
2012-02-24 11:45:30 -05:00
Andreas Hansson
9f07d2ce7e CPU: Round-two unifying instr/data CPU ports across models
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.

The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
2012-02-24 11:42:00 -05:00
Mrinmoy Ghosh
9b05e96b9e BPred: Fix RAS to handle predicated call/return instructions.
Change RAS to fix issues with predicated call/return instructions.
Handled all cases in the life of a predicated call and return instruction.
2012-02-13 12:26:25 -06:00
Mrinmoy Ghosh
fd90c3676d BP: Fix several Branch Predictor issues.
1. Updates the Branch Predictor correctly to the state
   just after a mispredicted branch, if a squash occurs.
2. If a BTB does not find an entry, the branch is predicted not taken.
   The global history is modified to correctly reflect this prediction.
3. Local history is now updated at the fetch stage instead of
   execute stage.
4. In the Update stage of the branch predictor the local predictors are
   now correctly updated according to the state of local history during
   fetch stage.

This patch also improves performance by as much as 17% on some benchmarks
2012-02-13 12:26:24 -06:00
Anthony Gutierrez
542d0ceebc cpu: add separate stats for insts/ops both globally and per cpu model 2012-02-12 16:07:39 -06:00
Ali Saidi
8aaa39e93d mem: Add a master ID to each request object.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
2012-02-12 16:07:38 -06:00
Nilay Vaish
6a7a6263e1 O3 CPU: Improve handling of delayed commit flag
The delayed commit flag is used in conjunction with interrupt pending flag to
figure out whether or not fetch stage should get more instructions. This patch
clears this flag when instructions are squashed. Also, in case an interrupt is
pending, currently it is not possible to access the instruction cache. This
patch allows accessing the cache in case this flag is set.
2012-02-10 08:37:31 -06:00
Nilay Vaish
cd765c23a2 O3 CPU: Strengthen condition for handling interrupts
The condition for handling interrupts is to check whether or not the cpu's
instruction list is empty. As observed, this can lead to cases in which even
though the instruction list is empty, interrupts are handled when they should
not be. The condition is being strengthened so that interrupts get handled only
when the last committed microop did not had IsDelayedCommit set.
2012-02-10 08:37:30 -06:00
Nilay Vaish
8f7e03d4cf O3 CPU: Provide the squashing instruction
This patch adds a function to the ROB that will get the squashing instruction
from the ROB's list of instructions. This squashing instruction is used for
figuring out the macroop from which the fetch stage should fetch the microops.
Further, a check has been added that if the instructions are to be fetched
from the cache maintained by the fetch stage, then the data in the cache should
be valid and the PC of the thread being fetched from is same as the address of
the cache block.
2012-02-10 08:37:28 -06:00
Nilay Vaish
0e597e944a O3 Fetch: Check if PC is pointing to Microcode ROM 2012-02-10 08:37:26 -06:00
Gabe Black
f2b46fdb85 Faults: Turn off arch/faults.hh
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
2012-02-07 04:43:21 -08:00
Gabe Black
ea8b347dc5 Merge with head, hopefully the last time for this batch. 2012-01-31 22:40:08 -08:00
Koan-Sin Tan
7d4f187700 clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).

clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
2012-01-31 12:05:52 -05:00
Geoffrey Blake
af6aaf2581 CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU.  These changes have only been tested with the ARM ISA.  Other
ISAs potentially require modification.
2012-01-31 07:46:03 -08:00
Gabe Black
e88165a431 Merge with main repository. 2012-01-30 21:07:57 -08:00
Andreas Hansson
ef9fc01073 MEM: Clean-up of Functional/Virtual/TranslatingPort remnants
This patch cleans up forward declarations and a member-function
prototype that still referred to the old FunctionalPort, VirtualPort
and TranslatingPort. There is no change in functionality.
2012-01-30 03:44:25 -05:00
Gabe Black
39f314cc15 Yet another merge with the main repository.
--HG--
rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/config.ini => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/config.ini
rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/simout => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/simout
rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt
rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/system.pc.com_1.terminal => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/system.pc.com_1.terminal
rename : tests/long/00.gzip/ref/x86/linux/o3-timing/config.ini => tests/long/se/00.gzip/ref/x86/linux/o3-timing/config.ini
rename : tests/long/00.gzip/ref/x86/linux/o3-timing/simout => tests/long/se/00.gzip/ref/x86/linux/o3-timing/simout
rename : tests/long/00.gzip/ref/x86/linux/o3-timing/stats.txt => tests/long/se/00.gzip/ref/x86/linux/o3-timing/stats.txt
rename : tests/long/10.mcf/ref/x86/linux/o3-timing/config.ini => tests/long/se/10.mcf/ref/x86/linux/o3-timing/config.ini
rename : tests/long/10.mcf/ref/x86/linux/o3-timing/simout => tests/long/se/10.mcf/ref/x86/linux/o3-timing/simout
rename : tests/long/10.mcf/ref/x86/linux/o3-timing/stats.txt => tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt
rename : tests/long/20.parser/ref/x86/linux/o3-timing/config.ini => tests/long/se/20.parser/ref/x86/linux/o3-timing/config.ini
rename : tests/long/20.parser/ref/x86/linux/o3-timing/simout => tests/long/se/20.parser/ref/x86/linux/o3-timing/simout
rename : tests/long/20.parser/ref/x86/linux/o3-timing/stats.txt => tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt
rename : tests/long/70.twolf/ref/x86/linux/o3-timing/config.ini => tests/long/se/70.twolf/ref/x86/linux/o3-timing/config.ini
rename : tests/long/70.twolf/ref/x86/linux/o3-timing/simout => tests/long/se/70.twolf/ref/x86/linux/o3-timing/simout
rename : tests/long/70.twolf/ref/x86/linux/o3-timing/stats.txt => tests/long/se/70.twolf/ref/x86/linux/o3-timing/stats.txt
rename : tests/quick/00.hello/ref/x86/linux/o3-timing/config.ini => tests/quick/se/00.hello/ref/x86/linux/o3-timing/config.ini
rename : tests/quick/00.hello/ref/x86/linux/o3-timing/simout => tests/quick/se/00.hello/ref/x86/linux/o3-timing/simout
rename : tests/quick/00.hello/ref/x86/linux/o3-timing/stats.txt => tests/quick/se/00.hello/ref/x86/linux/o3-timing/stats.txt
2012-01-29 03:27:15 -08:00
Gabe Black
dc0e629ea1 Implement Ali's review feedback.
Try to decrease indentation, and remove some redundant FullSystem checks.
2012-01-29 02:04:34 -08:00
Nilay Vaish
5c2fc35e02 O3 CPU LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system. By default, the x86 architecture will have TSO.
2012-01-28 19:09:04 -06:00
Gabe Black
c3d41a2def Merge with the main repo.
--HG--
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
2012-01-28 07:24:01 -08:00
Gabe Black
da2a4acc26 Merge yet again with the main repository. 2012-01-16 04:27:10 -08:00
Andreas Hansson
07cf9d914b MEM: Separate queries for snooping and address ranges
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
2012-01-17 12:55:09 -06:00
Andreas Hansson
b3f930c884 CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
2012-01-17 12:55:08 -06:00
Andreas Hansson
f85286b3de MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.

The following replacements are made:
FunctionalPort      > PortProxy
TranslatingPort     > SETranslatingPortProxy
VirtualPort         > FSTranslatingPortProxy

--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
2012-01-17 12:55:08 -06:00
Nilay Vaish
9957035a42 DPRINTF: Improve some dprintf messages. 2012-01-10 10:15:02 -06:00
Ali Saidi
525d1e46dc O3: Remove some asserts that no longer seem to be valid. 2012-01-09 18:08:20 -06:00
Ali Saidi
d2c26f402c O3: Add support of function tracing with O3 CPU. 2012-01-09 18:08:20 -06:00
Gabe Black
241cc0c840 Another merge with the main repository. 2012-01-07 02:16:37 -08:00
Gabe Black
ec936364b7 Merge with the main repository again. 2012-01-07 02:15:35 -08:00
Gabe Black
36a822f08e Merge with main repository. 2012-01-07 02:10:34 -08:00
Nathan Binkert
6ef9691035 gcc: fix unused variable warnings from GCC 4.6.1
--HG--
extra : rebase_source : f9e22de341493a25ac6106c16ac35c61c128a080
2011-12-13 11:49:27 -08:00
Chander Sudanthi
61c14da751 O3: Remove hardcoded tgts_per_mshr in O3CPU.py.
There are two lines in O3CPU.py that set the dcache and icache
tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr.
This patch removes these hardcoded lines from O3CPU.py and sets the default
L1 cache mshr targets to 20.

--HG--
extra : rebase_source : 6f92d950e90496a3102967442814e97dc84db08b
2011-12-01 00:15:22 -08:00
Ali Saidi
1444103998 O3: Add stat that counts how many cycles the O3 cpu was quiesced.
--HG--
extra : rebase_source : 043b9307eef3c5b87f8e6370765641e016ed1fa7
2011-12-01 00:15:22 -08:00
Gabe Black
85424bef19 SE/FS: Get rid of includes of config/full_system.hh. 2011-11-18 02:20:22 -08:00
Gabe Black
de21bb93ea SE/FS: Get rid of FULL_SYSTEM in the CPU directory. 2011-11-18 01:33:28 -08:00
Gabe Black
1268e0df1f SE/FS: Expose the same methods on the CPUs in SE and FS modes. 2011-11-01 04:01:13 -07:00
Gabe Black
8ad2b8c559 SE/FS: Make the functions available from the TC consistent between SE and FS. 2011-10-31 02:58:22 -07:00
Gabe Black
d735abe5da GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions.
2011-10-31 01:09:44 -07:00
Gabe Black
facb40f3ff SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs. 2011-10-30 00:33:02 -07:00
Gabe Black
5b433568f0 SE/FS: Build the base process class in FS. 2011-10-30 00:32:54 -07:00
Gabe Black
464c485d0c SE/FS: Include getMemPort in FS. 2011-10-16 05:06:40 -07:00
Gabe Black
3595b0c5a1 SE/FS: Build/expose vport in SE mode. 2011-10-16 05:06:39 -07:00
Gabe Black
e8e9f97312 CPU: Make physPort and getPhysPort available in SE mode. 2011-10-16 02:59:53 -07:00
Gabe Black
4fcf8e9959 O3: Tidy up some DPRINTFs in the LSQ. 2011-09-27 00:25:26 -07:00
Gabe Black
44ed4849d4 Faults: Replace calls to genMachineCheckFault with M5PanicFault. 2011-09-27 00:24:43 -07:00
Nilay Vaish
56bddab189 LSQ: Moved a couple of lines to enable O3 + Ruby
This patch makes O3 CPU work along with the Ruby memory model. Ruby
overwrites the senderState pointer with another pointer. The pointer
is restored only when Ruby gets done with the packet. LSQ makes use of
senderState just after sendTiming() returns. But the dynamic_cast returns
a NULL pointer since Ruby's senderState pointer is from a different class.
Storing the senderState pointer before calling sendTiming() does away with
the problem.
2011-09-26 12:18:32 -05:00
Steve Reinhardt
84f0a1bd91 event: minor cleanup
Initialize flags via the Event constructor instead of calling
setFlags() in the body of the derived class's constructor.  I
forget exactly why, but this made life easier when implementing
multi-queue support.

Also rename Event::getFlags() to isFlagSet() to better match
common usage, and get rid of some unused Event methods.
2011-09-22 18:59:55 -07:00
Gabe Black
10c2e37f60 Syscall: Make the syscall function available in both SE and FS modes.
In FS mode the syscall function will panic, but the interface will be
consistent and code which calls syscall can be compiled in. This will allow,
for instance, instructions that use syscall to be built unconditionally but
then not returned by the decoder.
2011-09-19 02:46:48 -07:00
Ali Saidi
649c239cee LSQ: Only trigger a memory violation with a load/load if the value changes.
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).

This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.

The performance improvement on SPEC benchmarks can be substantial (2-10%).
2011-09-13 12:58:08 -04:00
Gabe Black
b7b545bc38 Decode: Pull instruction decoding out of the StaticInst class into its own.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.

Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.

Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
2011-09-09 02:30:01 -07:00
Ali Saidi
b6203360ef LSQ: Set store predictor to periodically clear itself as recommended in the storesets paper.
This patch improves performance by as much as 10% on some spec benchmarks.
2011-08-19 15:08:07 -05:00
Geoffrey Blake
5f425b8bd1 Fix bugs due to interaction between SEV instructions and O3 pipeline
SEV instructions were originally implemented to cause asynchronous squashes
via the generateTCSquash() function in the O3 pipeline when updating the
SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system
that would lead to a pipeline either going inactive indefinitely or not being
able to commit squashed instructions. Fixed SEV instructions to behave like
interrupts and cause synchronous sqaushes inside the pipeline, eliminating
the race conditions. Also fixed up the semantics of the WFE instruction to
behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1
or unmasked interrupts are pending.
2011-08-19 15:08:07 -05:00
Mrinmoy Ghosh
d0e0485902 LSQ: Add some better dprintfs for storeset predictor. 2011-08-19 15:08:05 -05:00
Mrinmoy Ghosh
0db95030fc LSQ: Fix a few issues with the storeset predictor.
Two issues are fixed in this patch:
1. The load and store pc passed to the predictor are passed in reverse order.
2. The flag indicating that a barrier is inflight was never cleared when
   the barrier was squashed instead of committed. This made all load insts
   dependent on a non-existent barrier in-flight.
2011-08-19 15:08:05 -05:00
Giacomo Gabrielli
676a530b77 O3: Squash the violator and younger instructions instead not all insts.
Change the way instructions are squashed on memory ordering violations
to squash the violator and younger instructions, not all instructions
that are younger than the instruction they violated (no reason to throw
away valid work).
2011-08-19 15:08:05 -05:00
Gabe Black
78a4636a13 O3: Make lsq_unit.hh include arch/isa_traits.hh directly, not transitively. 2011-08-16 02:46:57 -07:00
Gabe Black
0e6dc00497 O3: When squashing, restore the macroop that should be used for fetching. 2011-08-14 17:41:34 -07:00
Gabe Black
ec204f003c O3: Add a pointer to the macroop for a microop in the dyninst. 2011-08-14 04:08:14 -07:00
Gabe Black
e0043f8dbe O3: At the end of an instruction, force fetchAddr to something sensible.
It's possible (though until now very unlikely) for fetchAddr to get out of
sync with the actual PC of the current instruction. This change forcefull
resets fetchAddr at the end of every instruction.
2011-08-13 13:36:37 -07:00
Gabe Black
96df6bedb7 O3: Stop using the current macroop no matter why you're leaving it.
Until now, the only reason a macroop would be left was because it ended at a
microop marked as the last microop. In O3 with branch prediction, it's
possible for the branch predictor to have entries which originally came from
different instructions which happened to have the same RIP. This could
theoretically happen in many ways, but it was encountered specifically when
different programs in different address spaces ran one after the other in
X86_FS.

What would happen in that case was that the macroop would continue to be
looped over and microops fetched from it until it reached the last microop
even though the macropc had moved out from under it. If things lined up
properly, this could mean that the end bytes of an instruction actually fell
into the instruction sized block of memory after the one in the predecoder.
The fetch loop implicitly assumes that the last instruction sized chunk of
memory processed was the last one needed for the instruction it just finished
executing. It would then tell the predecoder to move to an offset within the
bytes it was given that is larger than those bytes, and that would trip an
assert in the x86 predecoder.

This change fixes this problem by making fetch stop processing the current
macroop if the address it should be fetching from changed when the PC is
updated. That happens when the last microop was reached because the instruction
handled it properly, and it also catches the case where the branch predictor
makes fetch do a macro level branch when it shouldn't.

The check of isLastMicroop is retained because otherwise, a macroop that
branches back to itself would act like a single, long macroop instead of
multiple instances of the same microop. There may be situations (which may
turn out to be purely hypothetical) where that matters.

This also fixes a relatively minor issue where the curMacroop variable would
be set to NULL immediately after seeing that a microop was the last one before
curMacroop was used to build the dyninst. The traceData structure would have a
NULL pointer to the macroop for that microop.
2011-08-09 11:30:43 -07:00
Gabe Black
3989f41261 O3: When waiting to handle an interrupt, let everything drain out.
Before this change, the commit stage would wait until the ROB and store queue
were empty before recognizing an interrupt. The fetch stage would stop
generating instructions at an appropriate point, so commit would then wait
until a valid time to interrupt the instruction stream. Instructions might be
in flight after fetch but not the in the ROB or store queue (in rename, for
instance), so this change makes commit wait until all in flight instructions
are finished.
2011-08-09 03:37:43 -07:00
Gabe Black
5c0e6e6092 O3: Get rid of the unused addToRemoveList function. 2011-08-07 15:41:10 -07:00
Gabe Black
a9b7931156 O3: Let squashed and deferred instructions issue.
Let squahsed and deferred instructions issue so they don't accumulate and clog
up the CPU.
2011-08-07 15:41:07 -07:00
Gabe Black
6230668f5e O3: Get rid of the raw ExtMachInst constructor on DynInsts.
This constructor assumes that the ExtMachInst can be decoded directly into a
StaticInst that's useful to execute. With the advent of microcoded
instructions that's no longer true.
2011-08-02 11:51:16 -07:00
Gabe Black
206c2e9a0e O3: Implement memory mapped IPRs for O3. 2011-07-31 19:21:17 -07:00
Gabe Black
a42c6ae48d O3: Fix corner case squashing into the microcode ROM.
When fetching from the microcode ROM, if the PC is set so that it isn't in the
cache block that's been fetched the CPU will get stuck. The fetch stage
notices that it's in the ROM so it doesn't try to fetch from the current PC.
It then later notices that it's outside of the current cache block so it skips
generating instructions expecting to continue once the right bytes have been
fetched. This change lets the fetch stage attempt to generate instructions,
and only checks if the bytes it's going to use are valid if it's really going
to use them.
2011-07-30 23:22:53 -07:00
Giacomo Gabrielli
69ef57fd0f O3: Create a pipeline activity viewer for the O3 CPU model.
Implemented a pipeline activity viewer as a python script (util/o3-pipeview.py)
and modified O3 code base to support an extra trace flag (O3PipeView) for
generating traces to be used as inputs by the tool.
2011-07-15 11:53:35 -05:00
Geoffrey Blake
c7e7b89058 O3: Fix up pipelining icache accesses in fetch stage to function properly
Fixed up the patch from Yasuko Watanabe that enabled pipelining of fetch accessess to
icache to work with recent changes to main repository.
Also added in ability for fetch stage to delay issuing the fault carrying
nop when a pipeline fetch causes a fault and no fetch bandwidth is available
until the next cycle.
2011-07-10 12:56:08 -05:00
Ali Saidi
60579e8d74 O3: Make sure fetch doesn't go off into the weeds during speculation. 2011-07-10 12:56:08 -05:00
Korey Sewell
c8b43641fd o3: missing newlines on some dprintfs 2011-06-10 22:15:32 -04:00
Nathan Binkert
2b1aa35e20 scons: rename TraceFlags to DebugFlags 2011-06-02 17:36:21 -07:00
Geoffrey Blake
d0b0a55515 O3: Fix offset calculation into storeQueue buffer for store->load forwarding
Calculation of offset to copy from storeQueue[idx].data structure for load to
store forwarding fixed to be difference in bytes between store and load virtual
addresses.  Previous method would induce bug where a load would index into
buffer at the wrong location.
2011-05-23 10:40:21 -05:00
Geoffrey Blake
c223b887fe O3: Fix issue w/wbOutstading being decremented multiple times on blocked cache.
If a split load fails on a blocked cache wbOutstanding can be decremented
twice if the first part of the split load succeeds and the second part fails.
Condition the decrementing on not having completed the first part of the load.
2011-05-23 10:40:19 -05:00
Geoffrey Blake
6dd996aabb O3: Fix issue with interrupts/faults occuring in the middle of a macro-op
This patch fixes two problems with the O3 cpu model. The first is an issue
with an instruction fetch causing a fault on the next address while the
current macro-op is being issued. This happens when the micro-ops exceed
the fetch bandwdith and then on the next cycle the fetch stage attempts
to issue a request to the next line while it still has micro-ops to issue
if the next line faults a fault is attached to a micro-op in the currently
executing macro-op rather than a "nop" from the next instruction block.
This leads to an instruction incorrectly faulting when on fetch when
it had no reason to fault.

A similar problem occurs with interrupts. When an interrupt occurs the
fetch stage nominally stops issuing instructions immediately. This is incorrect
in the case of a macro-op as the current location might not be interruptable.
2011-05-23 10:40:18 -05:00
Geoffrey Blake
b79650ceaa O3: Fix an issue with a load & branch instruction and mem dep squashing
Instructions that load an address and are control instructions can
execute down the wrong path if they were predicted correctly and then
instructions following them are squashed. If an instruction is a
memory and control op use the predicted address for the next PC instead
of just advancing the PC. Without this change NPC is used for the next
instruction, but predPC is used to verify that the branch was successful
so the wrong path is silently executed.
2011-05-13 17:27:00 -05:00
Ali Saidi
89e7bcca82 O3: Remove assertion for case that is actually handled in code.
If an nonspeculative instruction has a fault it might not be in the
nonSpecInsts map.
2011-05-04 20:38:27 -05:00
Ali Saidi
09a2be0c39 O3: Fix a small corner case with the lsq hazard detection logic. 2011-05-04 20:38:26 -05:00
Nathan Binkert
6e9143d36d stats: one more name violation 2011-04-20 19:07:45 -07:00
Nathan Binkert
63371c8664 stats: rename stats so they can be used as python expressions 2011-04-19 18:45:21 -07:00
Nathan Binkert
eddac53ff6 trace: reimplement the DTRACE function so it doesn't use a vector
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing.  This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
2011-04-15 10:44:32 -07:00
Nathan Binkert
bbb1392c08 includes: fix up code after sorting 2011-04-15 10:44:14 -07:00
Nathan Binkert
39a055645f includes: sort all includes 2011-04-15 10:44:06 -07:00
Ali Saidi
6b69890493 ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
This change fixes a small bug in the arm copyRegs() code where some registers
wouldn't be copied if the processor was in a mode other than MODE_USER.
Additionally, this change simplifies the way the O3 switchCpu code works by
utilizing TheISA::copyRegs() to copy the required context information
rather than the adhoc copying that goes on in the CPU model. The current code
makes assumptions about the visibility of int and float registers that aren't
true for all architectures in FS mode.
2011-04-04 11:42:28 -05:00
Ali Saidi
a679cd917a ARM: Cleanup implementation of ITSTATE and put important code in PCState.
Consolidate all code to handle ITSTATE in the PCState object rather than
touching a variety of structures/objects.
2011-04-04 11:42:28 -05:00
Ali Saidi
5962fecc1d CPU: Remove references to memory copy operations 2011-04-04 11:42:26 -05:00
Ali Saidi
7dde557fdc O3: Tighten memory order violation checking to 16 bytes.
The comment in the code suggests that the checking granularity should be 16
bytes, however in reality the shift by 8 is 256 bytes which seems much
larger than required.
2011-04-04 11:42:23 -05:00
Ali Saidi
799c3da8d0 O3: Send instruction back to fetch on squash to seed predecoder correctly. 2011-03-17 19:20:19 -05:00
Ali Saidi
30143baf7e O3: Cleanup the commitInfo comm struct.
Get rid of unused members and use base types rather than derrived values
where possible to limit amount of state.
2011-03-17 19:20:19 -05:00
Ali Saidi
a432d8e085 Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
2011-03-17 19:20:19 -05:00
Ali Saidi
2f40b3b8ae O3: Fix unaligned stores when cache blocked
Without this change the a store can be issued to the cache multiple times.
If this case occurs when the l1 cache is out of mshrs (and thus blocked)
the processor will never make forward progress because each cycle it will
send a single request using the recently freed mshr and not completing the
multipart store. This will continue forever.
2011-03-17 19:20:19 -05:00
Timothy M. Jones
a10685ad1e O3CPU: Fix iqCount and lsqCount SMT fetch policies.
Fixes two of the SMT fetch policies in O3CPU that were returning the count
of instructions in the IQ or LSQ rather than the thread ID to fetch from.
2011-02-25 13:50:29 +00:00
Ali Saidi
f9d4d9df1b O3: When a prefetch causes a fault, don't record it in the inst 2011-02-23 15:10:50 -06:00
Ali Saidi
3de8e0a0d4 O3: If there is an outstanding table walk don't let the inst queue sleep.
If there is an outstanding table walk and no other activity in the CPU
it can go to sleep and never wake up. This change makes the instruction
queue always active if the CPU is waiting for a store to translate.

If Gabe changes the way this code works then the below should be removed
as indicated by the todo.
2011-02-23 15:10:49 -06:00
Ali Saidi
7391ea6de6 ARM: Do something for ISB, DSB, DMB 2011-02-23 15:10:49 -06:00
Ali Saidi
ae3d456855 ARM: Fix bug that let two table walks occur in parallel. 2011-02-23 15:10:49 -06:00
Ali Saidi
68bd80794c O3: Fix bug when a squash occurs right before TLB miss returns.
In this case we need to throw away the TLB miss, not assume it was the
one we were waiting for.
2011-02-23 15:10:49 -06:00
Gabe Black
f036fd9748 O3: Fetch from the microcode ROM when needed. 2011-02-13 17:40:07 -08:00
Ali Saidi
7c763b34c9 O3: Fix GCC 4.2.4 complaint 2011-02-13 16:51:15 -05:00
Giacomo Gabrielli
a05032f4df O3: Fix pipeline restart when a table walk completes in the fetch stage.
When a table walk is initiated by the fetch stage, the CPU can
potentially move to the idle state and never wake up.

The fetch stage must call cpu->wakeCPU() when a translation completes
(in finishTranslation()).
2011-02-11 18:29:35 -06:00
Giacomo Gabrielli
e2507407b1 O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) relies on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept into the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.
2011-02-11 18:29:35 -06:00
Joel Hestness
b4c10bd680 mcpat: Adds McPAT performance counters
Updated patches from Rick Strong's set that modify performance counters for
McPAT
2011-02-06 22:14:17 -08:00
Gabe Black
00f24ae92c Config: Keep track of uncached and cached ports separately.
This makes sure that the address ranges requested for caches and uncached ports
don't conflict with each other, and that accesses which are always uncached
(message signaled interrupts for instance) don't waste time passing through
caches.
2011-02-03 20:23:00 -08:00
Gabe Black
869a046e41 O3: Fix a style bug in O3. 2011-02-02 23:34:14 -08:00
Gabe Black
119f5f8e94 X86: Add L1 caches for the TLB walkers.
Small L1 caches are connected to the TLB walkers when caches are used. This
allows them to participate in the coherence protocol properly.
2011-02-01 18:28:41 -08:00
Matt Horsnell
b13a79ee71 O3: Fix some variable length instruction issues with the O3 CPU and ARM ISA. 2011-01-18 16:30:05 -06:00
Matt Horsnell
c98df6f8c2 O3: Don't test misprediction on load instructions until executed. 2011-01-18 16:30:05 -06:00
Ali Saidi
1167ef19cf O3: Keep around the last committed instruction and use for squashing.
Without this change 0 is always used for the youngest sequence number if
a squash occured and the ROB was empty (E.g. an instruction is marked
serializeAfter or a fetch stall prevents other instructions from issuing).
Using 0 there is a race to rename where an instruction that committed the
same cycle as the squashing instruction can have it's renamed state undone
by the squash using sequence number 0.
2011-01-18 16:30:05 -06:00
Ali Saidi
ea058b14da O3: Don't try to scoreboard misc registers.
I'm not positive this is the correct fix, but it's working right now.
Either we need to do something like this, prevent the misc reg from being renamed at all,
or there something else going on. We need to find the root cause as to why
this is only a problem sometimes.
2011-01-18 16:30:05 -06:00
Matt Horsnell
11bef2ab38 O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf. 2011-01-18 16:30:05 -06:00
Matt Horsnell
62f2097917 O3: Fix mispredicts from non control instructions.
The squash inside the fetch unit should not attempt to remove them from the
branch predictor as non-control instructions are not pushed into the predictor.
2011-01-18 16:30:05 -06:00
Matt Horsnell
5ebf3b2808 O3: Fixes the way prefetches are handled inside the iew unit.
This patch prevents the prefetch being added to the instCommit queue twice.
2011-01-18 16:30:02 -06:00
Ali Saidi
ee9a331fe5 O3: Support timing translations for O3 CPU fetch. 2011-01-18 16:30:02 -06:00
Ali Saidi
0f9a3671b6 ARM: Add support for moving predicated false dest operands from sources. 2011-01-18 16:30:02 -06:00
Min Kyu Jeong
96375409ea O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.
When this condition occurs the cpu should restart the fetch stage to fetch from
the original execution path. Fault handling in the commit stage is cleaned up a
little bit so the control flow is simplier. Finally, if an instruction is being
used to carry a fault it isn't executed, so the fault propagates appropriately.
2011-01-18 16:30:01 -06:00
Steve Reinhardt
6f1187943c Replace curTick global variable with accessor functions.
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.
2011-01-07 21:50:29 -08:00
Steve Reinhardt
89cf3f6e85 Move sched_list.hh and timebuf.hh from src/base to src/cpu.
These files really aren't general enough to belong in src/base.
This patch doesn't reorder include lines, leaving them unsorted
in many cases, but Nate's magic script will fix that up shortly.

--HG--
rename : src/base/sched_list.hh => src/cpu/sched_list.hh
rename : src/base/timebuf.hh => src/cpu/timebuf.hh
2011-01-03 14:35:47 -08:00
Ali Saidi
42ba158479 O3: Allow a store entry to store up to 16 bytes (instead of TheISA::IntReg).
The store queue doesn't need to be ISA specific and architectures can
frequently store more than an int registers worth of data. A 128 bits seems
more common, but even 256 bits may be appropriate. Pretty much anything less
than a cache line size is buildable.
2010-12-07 16:19:57 -08:00
Ali Saidi
e681c0f7b3 O3: Support squashing all state after special instruction
For SPARC ASIs are added to the ExtMachInst. If the ASI is changed simply
marking the instruction as Serializing isn't enough beacuse that only
stops rename. This provides a mechanism to squash all the instructions
and refetch them
2010-12-07 16:19:57 -08:00
Giacomo Gabrielli
719f9a6d4f O3: Make all instructions that write a misc. register not perform the write until commit.
ARM instructions updating cumulative flags (ARM FP exceptions and saturation
flags) are not serialized.

Added aliases for ARM FP exceptions and saturation flags in FPSCR.  Removed
write accesses to the FP condition codes for most ARM VFP instructions: only
VCMP and VCMPE instructions update the FP condition codes.  Removed a potential
cause of seg. faults in the O3 model for NEON memory macro-ops (ARM).
2010-12-07 16:19:57 -08:00
Min Kyu Jeong
4bbdd6ceb2 O3: Support SWAP and predicated loads/store in ARM. 2010-12-07 16:19:57 -08:00
Gabe Black
92655b6399 O3: Fix fp destination register flattening, and index offset adjusting.
This change makes O3 flatten floating point destination registers, and also
fixes misc register flattening so that it's correctly repositioned relative to
the resized regions for integer and floating point indices.

It also fixes some overly long lines.
2010-11-18 13:11:36 -05:00
Gabe Black
8b9b85e92c O3: Make O3 support variably lengthed instructions. 2010-11-15 19:37:03 -08:00
Ali Saidi
776c075917 O3: reset architetural state by calling clear() 2010-11-15 14:04:05 -06:00
Giacomo Gabrielli
0058927190 CPU/ARM: Add SIMD op classes to CPU models and ARM ISA. 2010-11-15 14:04:04 -06:00
Min Kyu Jeong
745df74fe0 O3: prevent a squash when completeAcc() modifies misc reg through TC.
This happens on ARM instructions when they update the IT state bits.
Code and associated comment was copied from execute() and initiateAcc() methods
2010-11-15 14:04:04 -06:00
Gabe Black
6f4bd2c1da ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
2010-10-31 00:07:20 -07:00
Gabe Black
d5dbd91f3d O3: Get rid of a bunch of commented out lines. 2010-10-24 00:43:32 -07:00
Gabe Black
d4492190e6 Alpha: Fix Alpha NumMiscArchRegs constant.
Also add asserts in O3's Scoreboard class to catch bad indexes.
2010-10-04 11:58:06 -07:00
Gabe Black
ab8d7eee76 CPU: Fix O3 and possible InOrder segfaults in FS. 2010-09-20 02:46:42 -07:00
Gabe Black
8f3fbd2d13 CPU: Get rid of the now unnecessary getInst/setInst family of functions.
This code is no longer needed because of the preceeding change which adds a
StaticInstPtr parameter to the fault's invoke method, obviating the only use
for this pair of functions.
2010-09-13 21:58:34 -07:00
Gabe Black
6833ca7eed Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.
Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
2010-09-13 19:26:03 -07:00
Nathan Binkert
afafaf1dcb style: fix sorting of includes and whitespace in some files 2010-09-10 14:58:04 -07:00
Min Kyu Jeong
e1168e72ca ARM: Fixed register flattening logic (FP_Base_DepTag was set too low)
When decoding a srs instruction, invalid mode encoding returns invalid instruction.
This can happen when garbage instructions are fetched from mispredicted path
2010-08-25 19:10:43 -05:00
Gabe Black
943c171480 ISA: Get rid of old, unused utility functions cluttering up the ISAs. 2010-08-23 16:14:20 -07:00
Min Kyu Jeong
d8d6b869a2 O3: Skipping mem-order violation check for uncachable loads.
Uncachable load is not executed until it reaches the head of the ROB,
hence cannot cause one.
2010-08-23 11:18:42 -05:00
Min Kyu Jeong
e6a0be648e ARM: Improve printing of uop disassembly. 2010-08-23 11:18:42 -05:00
Min Kyu Jeong
03286e9d4e CPU: Make Exec trace to print predication result (if false) for memory instructions 2010-08-23 11:18:41 -05:00
Min Kyu Jeong
92ae620be8 ARM: mark msr/mrs instructions as SerializeBefore/After
Since miscellaneous registers bypass wakeup logic, force serialization
to resolve data dependencies through them
* * *
ARM: adding non-speculative/serialize flags for instructions change CPSR
2010-08-23 11:18:41 -05:00
Min Kyu Jeong
43c938d23e O3: Handle loads when the destination is the PC.
For loads that PC is the destination, check if the load
was mispredicted again when the value being loaded returns from memory
2010-08-23 11:18:40 -05:00
Min Kyu Jeong
5f91ec3f46 ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.
THis allows the CPU to handle predicated-false instructions accordingly.
This particular patch makes loads that are predicated-false to be sent
straight to the commit stage directly, not waiting for return of the data
that was never requested since it was predicated-false.
2010-08-23 11:18:40 -05:00
Gabe Black
aa8c6e9c95 CPU: Add readBytes and writeBytes functions to the exec contexts. 2010-08-13 06:16:02 -07:00
Timothy M. Jones
607f519800 LSQ Unit: After deleting part of a split request, set it to NULL so that it
isn't accidentally deleted again later (causing a segmentation fault).
2010-07-22 18:54:37 +01:00
Timothy M. Jones
e50a880297 O3CPU: Fix a bug where stores in the cpu where never marked as split. 2010-07-22 18:52:02 +01:00
Timothy M. Jones
9a3533ec84 O3CPU: O3's tick event gets squashed when it is switched out. When repeatedly
switching between O3 and another CPU, O3's tick event might still be scheduled
in the event queue (as squashed).  Therefore, check for a squashed tick event
as well as a non-scheduled event when taking over from another CPU and deal
with it accordingly.
2010-07-22 18:47:43 +01:00
Timothy M. Jones
96767fc721 O3ThreadContext: When taking over from a previous context, only assert that
the system pointers match in Full System mode.
2010-06-23 00:53:17 +01:00