Commit graph

6952 commits

Author SHA1 Message Date
Brandon Potter 9eda4bdc5a ruby: remove extra whitespace and correct misspelled words 2015-07-10 16:05:23 -05:00
Andreas Sandberg a74c446e7d dev, arm: Add a device model that uses the NoMali model
Add a simple device shim that interfaces with the NoMali model
library. The gem5 side of the interface supports Mali T60x/T62x/T760
GPUs. This device model pretends to be a Mali GPU, but doesn't render
anything and executes in zero time.
2015-07-07 10:03:14 +01:00
Andreas Sandberg ed38e3432c sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
2015-07-07 09:51:05 +01:00
Andreas Sandberg f16c0a4a90 sim: Decouple draining from the SimObject hierarchy
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.

This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.

While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.

A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
2015-07-07 09:51:05 +01:00
Andreas Sandberg d5f5fbb855 sim: Move mem(Writeback|Invalidate) to SimObject
The memWriteback() and memInvalidate() calls used to live in the
Serializable interface. In this series of patches, the Serializable
interface will be redesigned to make serialization independent of the
object graph and always work on the entire simulator. This means that
the Serialization interface won't be useful to perform maintenance of
the caches in a sub-graph of the entire SimObject graph. This
changeset moves these memory maintenance methods to the SimObject
interface instead.
2015-07-07 09:51:04 +01:00
Andreas Sandberg e9c3d59aae sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 1dc5e63b88 python: Remove redundant drain when changing memory modes
When the Python helper code switches CPU models, it sometimes also
needs to change the memory mode of the simulator. When this happens,
it accidentally tried to drain the simulator despite having done so
already. This changeset removes the redundant drain.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 7773cb9565 sim: Add macros to serialize objects into a section
Add the SERIALIZE_OBJ / UNSERIALIZE_OBJ macros that serialize an
object into a subsection of the current checkpoint section.
2015-07-07 09:51:04 +01:00
Andreas Sandberg b3ecfa6ae0 base: Add serialization support to Pixels and FrameBuffer
Serialize pixels as unsigned 32 bit integers by adding the required
to_number() and stream operators. This is used by the FrameBuffer,
which now implements the Serializable interface. Users of frame
buffers are expected to serialize it into its own section by calling
serializeSection().
2015-07-07 09:51:04 +01:00
Andreas Sandberg 888ec455cb sim: Fix broken event unserialization
Events expected to be unserialized using an event-specific
unserializeEvent call. This call was never actually used, which meant
the events relying on it never got unserialized (or scheduled after
unserialization).

Instead of relying on a custom call, we now use the normal
serialization code again. In order to schedule the event correctly,
the parrent object is expected to use the
EventQueue::checkpointReschedule() call. This happens automatically
for events that are serialized using the AutoSerialize mechanism.
2015-07-07 09:51:04 +01:00
Andreas Sandberg 76cd4393c0 sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

  * Add a set of APIs to serialize into a subsection of the current
    object. Previously, objects that needed this functionality would
    use ad-hoc solutions using nameOut() and section name
    generation. In the new world, an object that implements the
    interface has the methods serializeSection() and
    unserializeSection() that serialize into a named /subsection/ of
    the current object. Calling serialize() serializes an object into
    the current section.

  * Move the name() method from Serializable to SimObject as it is no
    longer needed for serialization. The fully qualified section name
    is generated by the main serialization code on the fly as objects
    serialize sub-objects.

  * Add a scoped ScopedCheckpointSection helper class. Some objects
    need to serialize data structures, that are not deriving from
    Serializable, into subsections. Previously, this was done using
    nameOut() and manual section name generation. To simplify this,
    this changeset introduces a ScopedCheckpointSection() helper
    class. When this class is instantiated, it adds a new /subsection/
    and subsequent serialization calls during the lifetime of this
    helper class happen inside this section (or a subsection in case
    of nested sections).

  * The serialize() call is now const which prevents accidental state
    manipulation during serialization. Objects that rely on modifying
    state can use the serializeOld() call instead. The default
    implementation simply calls serialize(). Note: The old-style calls
    need to be explicitly called using the
    serializeOld()/serializeSectionOld() style APIs. These are used by
    default when serializing SimObjects.

  * Both the input and output checkpoints now use their own named
    types. This hides underlying checkpoint implementation from
    objects that need checkpointing and makes it easier to change the
    underlying checkpoint storage code.
2015-07-07 09:51:03 +01:00
Andreas Sandberg 7cd5db8c6d sim: Add serialization macros for std containers 2015-07-07 09:51:03 +01:00
Andreas Sandberg 777cc71c4a mem: Cleanup CommMonitor in preparation for probe support
Make configuration parameters constant and get rid of an unnecessary
dependency on the Time class.
2015-07-06 17:08:53 +01:00
Nikos Nikoleris 67925a8334 x86: Adjust the size of the values written to the x87 misc registers
All x87 misc registers are implemented in an array of 64 bit values
but in real hardware the size of some of these registers is smaller.
Previsouly all 64 bits where incorrectly set and then later read.  To
ensure correctness we mask the value in setMiscRegNoEffect to write
only the valid bits.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-07-04 10:43:47 -05:00
Nilay Vaish 11a48faeb4 o3: correct the number of cc registers in rename map 2015-07-04 10:43:46 -05:00
Nilay Vaish d29d7c41f1 mem: packet: Add const to constructor argument 2015-07-04 10:43:46 -05:00
Nilay Vaish 16ac48e6a4 ruby: drop NetworkMessage class
This patch drops the NetworkMessage class.  The relevant data members and functions
have been moved to the Message class, which was the parent of NetworkMessage.
2015-07-04 10:43:46 -05:00
Nilay Vaish baa3eb0de3 ruby: mesi three level: name change to avoid clash
The accessor function getDestination() for Destination variable in the
coherence message clashes with the getDestination() that is part of the Message
class.  Hence the name change.
2015-07-04 10:43:46 -05:00
Nilay Vaish b4efb48a71 ruby: remove message buffer node
This structure's only purpose was to provide a comparison function for
ordering messages in the MessageBuffer.  The comparison function is now
being moved to the Message class itself.  So we no longer require this
structure.
2015-07-04 10:43:46 -05:00
Andreas Hansson 7e711c98f8 mem: Increase the default buffer sizes for the DDR4 controller
This patch increases the default read/write buffer sizes for the DDR4
controller config to values that are more suitable for the high
bandwidth and high bank count.
2015-07-03 10:14:48 -04:00
Wendy Elsasser 31f901b69d mem: Update DRAM command scheduler for bank groups
This patch updates the command arbitration so that bank group timing
as well as rank-to-rank delays will be taken into account. The
resulting arbitration no longer selects commands (prepped or not) that
cannot issue seamlessly if there are commands that can issue
back-to-back, minimizing the effect of rank-to-rank (tCS) & same bank
group (tCCD_L) delays.

The arbitration selects a new command based on the following priority.
Within each priority band, the arbitration will use FCFS to select the
appropriate command:

1) Bank is prepped and burst can issue seamlessly, without a bubble

2) Bank is not prepped, but can prep and issue seamlessly, without a
bubble

3) Bank is prepped but burst cannot issue seamlessly. In this case, a
bubble will occur on the bus

Thus, to enable more parallelism in subsequent selections, an
unprepped packet is given higher priority if the bank prep can be
hidden. If the bank prep cannot be hidden, the selection logic will
choose a prepped packet that cannot issue seamlessly if one exist.
Otherwise, the default selection will choose the packet with the
minimum bank prep delay.
2015-07-03 10:14:46 -04:00
Andreas Hansson b56167b682 mem: Avoid DRAM write queue iteration for merging and read lookup
This patch adds a simple lookup structure to avoid iterating over the
write queue to find read matches, and for the merging of write
bursts. Instead of relying on iteration we simply store a set of
currently-buffered write-burst addresses and compare against
these. For the reads we still perform the iteration if we have a
match. For the writes, we rely entirely on the set. Note that there
are corner-cases where sub-bursts would actually not be mergeable
without a read-modify-write. We ignore these cases and opt for speed.
2015-07-03 10:14:45 -04:00
Andreas Hansson db85ddca1a mem: Delay responses in the crossbar before forwarding
This patch changes how the crossbar classes deal with
responses. Instead of forwarding responses directly and burdening the
neighbouring modules in paying for the latency (through the
pkt->headerDelay), we now queue them before sending them.

The coherency protocol is not affected as requests and any snoop
requests/responses are still passed on in zero time. Thus, the
responses end up paying for any header delay accumulated when passing
through the crossbar. Any latency incurred on the request path will be
paid for on the response side, if no other module has dealt with it.

As a result of this patch, responses are returned at a later
point. This affects the number of outstanding transactions, and quite
a few regressions see an impact in blocking due to no MSHRs, increased
cache-miss latencies, etc.

Going forward we should be able to use the same concept also for snoop
responses, and any request that is not an express snoop.
2015-07-03 10:14:44 -04:00
Andreas Hansson b93c912013 mem: Remove redundant is_top_level cache parameter
This patch takes the final step in removing the is_top_level parameter
from the cache. With the recent changes to read requests and write
invalidations, the parameter is no longer needed, and consequently
removed.

This also means that asymmetric cache hierarchies are now fully
supported (and we are actually using them already with L1 caches, but
no table-walker caches, connected to a shared L2).
2015-07-03 10:14:43 -04:00
Andreas Hansson 71856cfbbc mem: Split WriteInvalidateReq into write and invalidate
WriteInvalidateReq ensures that a whole-line write does not incur the
cost of first doing a read exclusive, only to later overwrite the
data. This patch splits the existing WriteInvalidateReq into a
WriteLineReq, which is done locally, and an InvalidateReq that is sent
out throughout the memory system. The WriteLineReq re-uses the normal
WriteResp.

The change allows us to better express the difference between the
cache that is performing the write, and the ones that are merely
invalidating. As a consequence, we no longer have to rely on the
isTopLevel flag. Moreover, the actual memory in the system does not
see the intitial write, only the writeback. We were marking the
written line as dirty already, so there is really no need to also push
the write all the way to the memory.

The overall flow of the write-invalidate operation remains the same,
i.e. the operation is only carried out once the response for the
invalidate comes back. This patch adds the InvalidateResp for this
very reason.
2015-07-03 10:14:41 -04:00
Andreas Hansson 0ddde83a47 mem: Add ReadCleanReq and ReadSharedReq packets
This patch adds two new read requests packets:

ReadCleanReq - For a cache to explicitly request clean data. The
response is thus exclusive or shared, but not owned or modified. The
read-only caches (see previous patch) use this request type to ensure
they do not get dirty data.

ReadSharedReq - We add this to distinguish cache read requests from
those issued by other masters, such as devices and CPUs. Thus, devices
use ReadReq, and caches use ReadCleanReq, ReadExReq, or
ReadSharedReq. For the latter, the response can be any state, shared,
exclusive, owned or even modified.

Both ReadCleanReq and ReadSharedReq re-use the normal ReadResp. The
two transactions are aligned with the emerging cache-coherent TLM
standard and the AMBA nomenclature.

With this change, the normal ReadReq should never be used by a cache,
and is reserved for the actual (non-caching) masters in the system. We
thus have a way of identifying if a request came from a cache or
not. The introduction of ReadSharedReq thus removes the need for the
current isTopLevel hack, and also allows us to stop relying on
checking the packet size to determine if the source is a cache or
not. This is fixed in follow-on patches.
2015-07-03 10:14:40 -04:00
Andreas Hansson 893533a126 mem: Allow read-only caches and check compliance
This patch adds a parameter to the BaseCache to enable a read-only
cache, for example for the instruction cache, or table-walker cache
(not for x86). A number of checks are put in place in the code to
ensure a read-only cache does not end up with dirty data.

A follow-on patch adds suitable read requests to allow a read-only
cache to explicitly ask for clean data.
2015-07-03 10:14:39 -04:00
Ali Jafri a262908acc mem: Add clean evicts to improve snoop filter tracking
This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory heirarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.

The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory heirarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.

We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implmentation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.

We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.
2015-07-03 10:14:37 -04:00
Andreas Hansson aa5bbe81f6 mem: Convert Request static const flags to enums
This patch fixes an issue which is very wide spread in the codebase,
causing sporadic linking failures. The issue is that we declare static
const class variables in the header, without any definition (as part
of a source file). In most cases the compiler propagates the value and
we have no issues. However, especially for less optimising builds such
as debug, we get sporadic linking failures due to undefined
references.

This patch fixes the Request class, by turning the static const flags
and master IDs into C++11 typed enums.
2015-07-03 10:14:36 -04:00
Curtis Dunham e385ae0c72 base: remove fd from object loaders
All the object loaders directly examine the (already completely loaded
by object_file.cc) memory image. There is no current motivation to
keep the fd around.
2015-07-03 10:14:34 -04:00
Andreas Hansson c466d55a26 scons: Bump compiler requirement to gcc >= 4.7 and clang >= 3.1
This patch updates the compiler minimum requirement to gcc 4.7 and
clang 3.1, thus allowing:

1. Explicit virtual overrides (no need for M5_ATTR_OVERRIDE)
2. Non-static data member initializers
3. Template aliases
4. Delegating constructors

This patch also enables a transition from --std=c++0x to --std=c++11.
2015-07-03 10:14:15 -04:00
Nilay Vaish 57971248f6 ruby: slicc: remove README
No longer maintained.  Updates are only made to the wiki page.  So being
dropped.
2015-06-25 11:58:30 -05:00
Nilay Vaish 0647d99854 ruby: message: remove a data member added by mistake
I (Nilay) had mistakenly added a data member to  the Message class in revision c1694b4032a6.
The data member is being removed.
2015-06-25 11:58:29 -05:00
Jason Power 2f3c467883 Ruby: Remove assert in RubyPort retry list logic
Remove the assert when adding a port to the RubyPort retry list.
Instead of asserting, just ignore the added port, since it's
already on the list.
Without this patch, Ruby+detailed fails for even the simplest tests
2015-06-25 11:58:28 -05:00
Andreas Sandberg cc813cd5f7 base: Add a warn_if macro
Add a warn if macro that is analogous to the panic_if and fatal_if.
2015-06-21 20:52:13 +01:00
Andreas Sandberg d541038549 arm: Cleanup arch headers to remove dma_device.hh dependency
Break the dependency on dma_device.hh by forward-declaring DmaPort in
the relevant header.
2015-06-21 20:48:33 +01:00
Ali Jafri f0c3b70451 mem: Add check for express snoop in packet destructor
Snoop packets share the request pointer with the originating
packets. We need to ensure that the snoop packet destruction does not
delete the request. Snoops are used for reads, invalidations,
HardPFReqs, Writebacks and CleansEvicts. Reads, invalidations, and
HardPFReqs need a response so their snoops do not delete the
request. For Writebacks and CleanEvicts we need to check explicitly
for whethere the current packet is an express snoop, in whcih case do
not delete the request.
2015-06-09 09:21:18 -04:00
Andreas Hansson 578a7f20c6 mem: Fix snoop packet data allocation bug
This patch fixes an issue where the snoop packet did not properly
forward the data pointer in case of static data.
2015-06-09 09:21:17 -04:00
Rune Holm eb3ed11794 arm: Delete debug print in initialization of hardware thread
There seems to have been a debug print left in when the original ARMv8
support was merged in. This printout is performed every time you
initialize a hardware thread, and it prints raw pointers, so it always
causes diffs in the regression. This patch removes the debug print.
2015-06-09 09:21:16 -04:00
Rune Holm f4311d3932 arm: Fix typo in ldrsh instruction name
ldrsh was typoed as hdrsh, which is a bit annoying when printing
instructions.  This patch fixes it.
2015-06-09 09:21:15 -04:00
Andreas Sandberg 737e5da7f6 base: Reset CircleBuf size on flush()
The flush() method in CircleBuf resets the state of the circular
buffer, but fails to set size to zero. This obviously confuses code
that tries to determine the amount of data in the buffer. Set the size
to zero on flush.
2015-06-09 09:21:14 -04:00
Andreas Sandberg a9cad92011 dev, arm: Include PIO size in AmbaDmaDevice constructor
Make it possible to specify the size of the PIO space for an AMBA DMA
device. Maintain backwards compatibility and default to zero.
2015-06-09 09:21:12 -04:00
Marco Elver 6599dd87c8 ruby: Fix MESI consistency bug
Fixes missed forward eviction to CPU. With the O3CPU this can lead to load-load
reordering, as the LQ is never notified of the invalidate.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-06-07 14:02:40 -05:00
Matthias Jung 25fe4c2529 mem: Add HMC Timing Parameters
A single HMC-2500 x32 model based on:

[1] DRAMSpec: a high-level DRAM bank modelling tool developed at the University
of Kaiserslautern. This high level tool uses RC (resistance-capacitance) and CV
(capacitance-voltage) models to estimate the DRAM bank latency and power
numbers.

[2] A Logic-base Interconnect for Supporting Near Memory Computation in the
Hybrid Memory Cube (E. Azarkhish et. al) Assumed for the HMC model is a 30 nm
technology node.  The modelled HMC consists of a 4 Gbit part with 4 layers
connected with TSVs.  Each layer has 16 vaults and each vault consists of 2
banks per layer.  In order to be able to use the same controller used for 2D
DRAM generations for HMC, the following analogy is done: Channel (DDR) => Vault
(HMC) device_size (DDR) => size of a single layer in a vault ranks per channel
(DDR) => number of layers banks per rank (DDR) => banks per layer devices per
rank (DDR) => devices per layer ( 1 for HMC).  The parameters for which no
input is available are inherited from the DDR3 configuration.
2015-06-07 14:02:40 -05:00
Ruslan Bukin ext:(%2C%20Zhang%20Guoye) 736d3314bf arch: fix build under MacOSX
put O_DIRECT under ifdefs -- this fixes build for MacOSX.
Also use correct class for arm64 openFlagTable.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-06-07 14:02:40 -05:00
Christoph Pfister 4a17494708 mem: addr_mapper: restore old address if request not sent
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-05-30 13:45:17 +02:00
Curtis Dunham 31825bd988 sim, arm: add checkpoint upgrader for d02b45a5
The insertion of CONTEXTIDR_EL2 in the ARM miscellaneous registers
obsoletes old checkpoints.
2015-06-01 18:05:11 -05:00
Andreas Sandberg 7c4eb3b4d8 kvm, arm: Add support for aarch64
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:

   * The system register based generic timer can only be simulated by
     the host kernel. Workaround: Use a memory mapped timer instead to
     simulate the timer in gem5.

   * Simulating devices (e.g., the generic timer) in the host kernel
     requires that the host kernel also simulates the GIC.

   * ID registers in the host and in gem5 must match for switching
     between simulated CPUs and KVM. This is particularly important
     for ID registers describing memory system capabilities (e.g.,
     ASID size, physical address size).

   * Switching between a virtualized CPU and a simulated CPU is
     currently not supported if in-kernel device emulation is
     used. This could be worked around by adding support for switching
     to the gem5 (e.g., the KvmGic) side of the device models. A
     simpler workaround is to avoid in-kernel device models
     altogether.
2015-06-01 19:44:19 +01:00
Andreas Sandberg dbfd6effe0 kvm, arm, dev: Add an in-kernel GIC implementation
This changeset adds a GIC implementation that uses the kernel's
built-in support for simulating the interrupt controller. Since there
is currently no support for state transfer between gem5 and the
kernel, the device model does not support serialization and CPU
switching (which would require switching to a gem5-simulated GIC).
2015-06-01 19:44:17 +01:00
Andreas Sandberg 8e7c0575dc kvm: Handle inst events at the current instruction count
There are cases (particularly when attaching GDB) when instruction
events are scheduled at the current instruction tick. This used to
trigger an assertion error in kvm. This changeset adds a check for
this condition and forces KVM to do a quick entry that completes any
pending IO operations, but does not execute any new instructions,
before servicing the event. We could check if we need to enter KVM at
all, but forcing a quick entry is makes the code slightly cleaner and
does not hurt correctness (performance is hardly an issue in these
cases).
2015-06-01 19:43:41 +01:00
Andreas Sandberg 06cf5cc60b kvm, arm: Move ARM-specific files to arch/arm/kvm/
This changeset moves the ARM-specific KVM CPU implementation to
arch/arm/kvm/. This change is expected to keep the source tree
somewhat cleaner as we start adding support for ARMv8 and KVM
in-kernel interrupt controller simulation.

--HG--
rename : src/cpu/kvm/ArmKvmCPU.py => src/arch/arm/kvm/ArmKvmCPU.py
rename : src/cpu/kvm/arm_cpu.cc => src/arch/arm/kvm/arm_cpu.cc
rename : src/cpu/kvm/arm_cpu.hh => src/arch/arm/kvm/arm_cpu.hh
2015-06-01 19:43:40 +01:00
Curtis Dunham e590f0d1ef arm: implement the CONTEXTIDR_EL2 system reg. 2015-05-26 03:21:45 -04:00
Nathanael Premillieu 31fd18ab15 arm: Make address translation faster with better caching
This patch adds better caching of the sys regs for AArch64, thus
avoiding unnecessary calls to tc->readMiscReg(MISCREG_CPSR) in the
non-faulting case.
2015-05-26 03:21:42 -04:00
Andreas Hansson 53a360985b base: Allow multiple interleaved ranges
This patch changes how the address range calculates intersection such
that a system can have a number of non-overlapping interleaved ranges
without complaining. Without this patch we end up with a panic.
2015-05-26 03:21:40 -04:00
Andrew Bardsley cea1d14a93 cpu: Fix a bug in counting issued instructions in MinorCPU
The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.

Fixed by checking for a bubble in one new place.
2015-05-26 03:21:37 -04:00
Giacomo Gabrielli cc2346e8ca arm: Implement some missing syscalls (SE mode)
Adding a few syscalls that were previously considered unimplemented.
2015-05-26 03:21:35 -04:00
Andreas Hansson 0cc350d2c5 ruby: Deprecation warning for RubyMemoryControl
A step towards removing RubyMemoryControl and shift users to
DRAMCtrl. The latter is faster, more representative, very versatile,
and is integrated with power models.
2015-05-26 03:21:34 -04:00
Andreas Sandberg f3f06e1684 arm, dev: Add support for a memory mapped generic timer
There are cases when we don't want to use a system register mapped
generic timer, but can't use the SP804. For example, when using KVM on
aarch64, we want to intercept accesses to the generic timer, but can't
do so if it is using the system register interface. In such cases,
we need to use a memory-mapped generic timer.

This changeset adds a device model that implements the memory mapped
generic timer interface. The current implementation only supports a
single frame (i.e., one virtual timer and one physical timer).
2015-05-23 13:46:56 +01:00
Andreas Sandberg 6533f2000b arm: Get rid of pointless have_generic_timer param
The ArmSystem class has a parameter to indicate whether it is
configured to use the generic timer extension or not. This parameter
doesn't affect any feature flags in the current implementation and is
therefore completely unnecessary. In fact, we usually don't set it
even if a system has a generic timer. If we ever need to check if
there is a generic timer present, we should just request a pointer and
check if it is non-null instead.
2015-05-23 13:46:54 +01:00
Andreas Sandberg 2278fec1d1 dev, arm: Add virtual timers to the generic timer model
The generic timer model currently does not support virtual
counters. Virtual and physical counters both tick with the same
frequency. However, virtual timers allow a hypervisor to set an offset
that is subtracted from the counter when it is read. This enables the
hypervisor to present a time base that ticks with virtual time in the
VM (i.e., doesn't tick when the VM isn't running). Modern Linux
kernels generally assume that virtual counters exist and try to use
them by default.
2015-05-23 13:46:53 +01:00
Andreas Sandberg 65f3f097d3 dev, arm: Refactor and clean up the generic timer model
This changeset cleans up the generic timer a bit and moves most of the
register juggling from the ISA code into a separate class in the same
source file as the rest of the generic timer. It also removes the
assumption that there is always 8 or fewer CPUs in the system. Instead
of having a fixed limit, we now instantiate per-core timers as they
are requested. This is all in preparation for other patches that add
support for virtual timers and a memory mapped interface.
2015-05-23 13:46:52 +01:00
Andreas Sandberg 5435f25ec8 kvm: Fix dumping code for large registers
The register dumping code in kvm tries to print the bytes in large
registers (128 bits and larger) instead of printing them as hex. This
changeset fixes that.
2015-05-23 13:37:22 +01:00
Andreas Sandberg ed447bbff9 kvm, x86: Guard x86-specific APIs in KvmVM
Protect x86-specific APIs in KvmVM with compile-time guards to avoid
breaking ARM builds.
2015-05-23 13:37:20 +01:00
Andreas Sandberg cba3a125e1 arm: Workaround incorrect HDLCD register order in kernel
Some versions of the kernel incorrectly swap the red and blue color
select registers. This changeset adds a workaround for that by
swapping them when instantiating a PixelConverter.
2015-05-23 13:37:04 +01:00
Andreas Sandberg db5c9a5f90 base: Redesign internal frame buffer handling
Currently, frame buffer handling in gem5 is quite ad hoc. In practice,
we pass around naked pointers to raw pixel data and expect consumers
to convert frame buffers using the (broken) VideoConverter.

This changeset completely redesigns the way we handle frame buffers
internally. In summary, it fixes several color conversion bugs, adds
support for more color formats (e.g., big endian), and makes the code
base easier to follow.

In the new world, gem5 always represents pixel data using the Pixel
struct when pixels need to be passed between different classes (e.g.,
a display controller and the VNC server). Producers of entire frames
(e.g., display controllers) should use the FrameBuffer class to
represent a frame.

Frame producers are expected to create one instance of the FrameBuffer
class in their constructors and register it with its consumers
once. Consumers are expected to check the dimensions of the frame
buffer when they consume it.

Conversion between the external representation and the internal
representation is supported for all common "true color" RGB formats of
up to 32-bit color depth. The external pixel representation is
expected to be between 1 and 4 bytes in either big endian or little
endian. Color channels are assumed to be contiguous ranges of bits
within each pixel word. The external pixel value is scaled to an 8-bit
internal representation using a floating multiplication to map it to
the entire 8-bit range.
2015-05-23 13:37:03 +01:00
Andreas Sandberg 1985d28ef9 base: Clean up bitmap generation code
The bitmap generation code is hard to follow and incorrectly uses the
size of an enum member to calculate the size of a pixel. This
changeset cleans up the code and adds some documentation.
2015-05-23 13:37:01 +01:00
Joel Hestness 0479569f67 ruby: Fix RubySystem warm-up and cool-down scope
The processes of warming up and cooling down Ruby caches are simulation-wide
processes, not just RubySystem instance-specific processes. Thus, the warm-up
and cool-down variables should be globally visible to any Ruby components
participating in either process. Make these variables static members and track
the warm-up and cool-down processes as appropriate.

This patch also has two side benefits:
1) It removes references to the RubySystem g_system_ptr, which are problematic
for allowing multiple RubySystem instances in a single simulation. Warmup and
cooldown variables being static (global) reduces the need for instance-specific
dereferences through the RubySystem.
2) From the AbstractController, it removes local RubySystem pointers, which are
used inconsistently with other uses of the RubySystem: 11 other uses reference
the RubySystem with the g_system_ptr. Only sequencers have local pointers.
2015-05-19 10:56:51 -05:00
Andreas Hansson 99d3fa5945 arm: Identify table-walker requests
This patch ensures all page-table walks are flagged as such.
2015-05-15 13:40:01 -04:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seems to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Andreas Sandberg 37aab4a155 sim: Don't clear the active CPU vector in System::initState
The system class currently clears the vector of active CPUs in
initState(). CPUs are added to the list by registerThreadContext()
which is called from BaseCPU::init(). This obviously breaks when the
System object is initialized after the CPUs. This changeset removes
the offending clear() call since the list will be empty after it has
been instantiated anyway.
2015-05-15 13:39:44 -04:00
Steve Reinhardt c65fa3dceb syscall_emul: fix warn_once behavior
The current ignoreWarnOnceFunc doesn't really work as expected,
since it will only generate one warning total, for whichever
"warn-once" syscall is invoked first.  This patch fixes that
behavior by keeping a "warned" flag in the SyscallDesc object,
allowing suitably flagged syscalls to warn exactly once per
syscall.
2015-05-05 09:25:59 -07:00
Andreas Hansson f349592071 arm: Add missing FPEXC.EN check
Add a missing check to ensure that exceptions are generated properly.
2015-05-05 03:22:45 -04:00
Giacomo Gabrielli a3f23894eb arm: enable DCZVA by default in SE mode 2015-05-05 03:22:42 -04:00
Stephan Diestelhorst 2847d5f517 mem: Create a request copy for deferred snoops
Sometimes, we need to defer an express snoop in an MSHR, but the original
request might complete and deallocate the original pkt->req.  In those cases,
create a copy of the request so that someone who is inspecting the delayed
snoop can also inspect the request still.  All of this is rather hacky, but the
allocation / linking and general life-time management of Packet and Request is
rather tricky.  Deleting the copy is another tricky area, testing so far has
shown that the right copy is deleted at the right time.
2015-03-17 11:50:55 +00:00
Andreas Sandberg 706597f021 arm: Relax ordering for some uncacheable accesses
We currently assume that all uncacheable memory accesses are strictly
ordered. Instead of always enforcing strict ordering, we now only
enforce it if the required memory type is device memory or strongly
ordered memory.
2015-05-05 03:22:34 -04:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
Andreas Sandberg 1da634ace0 mem, alpha: Move Alpha-specific request flags
Move Alpha-specific memory request flags to an architecture-specific
header and map them to the architecture specific flag bit range.
2015-05-05 03:22:31 -04:00
Andreas Hansson 23b9792681 arm: Remove unnecessary boot uncachability
With the recent patches addressing how we deal with uncacheable
accesses there is no longer need for the work arounds put in place to
enforce certain sections of memory to be uncacheable during boot.
2015-05-05 03:22:30 -04:00
Andreas Hansson 36f29496a0 mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.

The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
2015-05-05 03:22:29 -04:00
Andreas Hansson 554ddc7c07 arch, cpu: Do not forward snoops to table walker
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.

There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.

Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
2015-05-05 03:22:27 -04:00
Andreas Hansson 14e5b2ea55 mem: Pass shared downstream through caches
This patch ensures that we pass on information about a packet being
shared (rather than exclusive), when forwarding a packet downstream.

Without this patch there is a risk that a downstream cache considers
the line exclusive when it really isn't.
2015-05-05 03:22:26 -04:00
Ali Jafri 3d33432136 mem: Add forward snoop check for HardPFReqs
We should always check whether the cache is supposed to be forwarding snoops
before generating snoops.
2015-05-05 03:22:25 -04:00
Andreas Hansson 0ebbf3f951 mem: Add missing stats update for uncacheable MSHRs
This patch adds a missing counter update for the uncacheable
accesses. By updating this counter we also get a meaningful average
latency for uncacheable accesses (previously inf).
2015-05-05 03:22:24 -04:00
Andreas Hansson 33e3e370f2 mem: Tidy up BaseCache parameters
This patch simply tidies up the BaseCache parameters and removes the
unused "two_queue" parameter.
2015-05-05 03:22:22 -04:00
David Guillen 5287945a8b mem: Remove templates in cache model
This patch changes the cache implementation to rely on virtual methods
rather than using the replacement policy as a template argument.

There is no impact on the simulation performance, and overall the
changes make it easier to modify (and subclass) the cache and/or
replacement policy.
2015-05-05 03:22:21 -04:00
Andreas Hansson d0d933facc cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
2015-05-05 03:22:19 -04:00
Ruslan Bukin 81f3211149 arch, base, dev, kern, sym: FreeBSD support
This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only)

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-29 22:35:23 -05:00
Rizwana Begum 52a3bc5e5c mem: Simplify page close checks for adaptive policies
Both open_adaptive and close_adaptive page polices keep the page
open if a row hit is found. If a row hit is not found, close_adaptive
page policy precharges the row, and open_adaptive policy precharges
the row only if there is a bank conflict request waiting in the queue.

This patch makes the checks for above conditions simpler.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-29 22:35:22 -05:00
Nilay Vaish 3a2731fb8c ruby: set: replace long by unsigned long
UBSan complains about negative value being shifted
2015-04-29 22:35:22 -05:00
Nilay Vaish 4333549575 cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued.  As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined).  The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value.  To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean.  If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
2015-04-29 22:35:22 -05:00
Nilay Vaish 0dbd696aae cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle
on x86.  This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
2015-04-29 22:35:22 -05:00
Nilay Vaish ee06fed656 x86: change divide-by-zero fault to divide-error
Same exception is raised whether division with zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
2015-04-29 22:35:22 -05:00
Andreas Hansson 179787f31f misc: Appease gcc 5.1 without moving GDB_REG_BYTES
This patch rolls back the move of the GDB_REG_BYTES constant, and
instead adds M5_VAR_USED.
2015-04-24 03:30:08 -04:00
Rene de Jong 483f873d01 arm, dev: Add a UFS device
This patch introduces a UFS host controller and a UFS device. More
information about the UFS standard can be found at the JEDEC site:
http://www.jedec.org/standards-documents/results/jesd220

Note that the model does not implement the complete standard, and as
such is not an actual implementation of UFS. The following SCSI
commands are implemented: inquiry, read, read capacity, report LUNs,
start/stop, test unit ready, verify, write, format unit, send
diagnostic, synchronize cache, mode select, mode sense, request sense,
unmap, write buffer and read buffer. This is sufficient for usage with
Linux and Android.

To interact with this model a kernel version 3.9 or above is
needed.
2015-04-23 13:37:50 -04:00
Rene de Jong fff28ce954 arm, dev: Add a NAND flash timing model
This adds a NAND flash timing model. This model takes the number of
planes into account and is ultimately intended to be used as a
high-level performance model for any device using flash. To access the
memory, use either readMemory or writeMemory.

To make use of the model you will need an interface model
such as UFSHostDevice, which is part of a separate patch.

At the moment the flash device is part of the ARM device tree since
the only use if the UFSHostDevice, and that in turn relies on the ARM
GIC.
2015-04-23 13:37:49 -04:00
Peter Enns 2e64590b88 dev: Add support for i2c devices
This patch adds an I2C bus and base device. I2C is used to connect a
variety of sensors, and this patch serves as a starting point to
enable a range of I2C devices.
2015-04-23 13:37:48 -04:00
Andreas Hansson c8c4f66889 misc: Appease gcc 5.1
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.

First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.

Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
2015-04-23 13:37:46 -04:00
Brandon Potter a70a83155b cpu: remove conditional check (count > 0) on o3 IQ squashes
The o3 cpu instruction queue model uses the count variable to track the number
of unissued instructions in the queue. Previously, the squash method used
this variable to avoid executing the doSquash method when there were no
unissued instructions in the pipeline.  A corner case problem exists when
only issued instructions exist in the pipeline and a squash occurs; the
doSquash code is not invoked and subsequently does not clean up state properly.
2015-04-22 07:52:03 -07:00
Brandon Potter 4991c29965 syscall_emul: implement clock_gettime system call 2015-04-22 07:51:27 -07:00
Monir Mozumder 00e3cab8fc syscall_emul: update x86 syscall table
Update table with additional definitions through Linux 3.13.
2015-04-22 07:51:27 -07:00
Brandon Potter 344a437064 syscall_emul: update getrlimit to use warn
Don't use std::cerr directly, and just return EINVAL instead of aborting.
2015-04-22 07:51:27 -07:00
Brandon Potter 9c6509f198 syscall_emul: fix warning with wrong syscall name
Also nix extra whitespace.
2015-04-22 07:51:27 -07:00
Brandon Potter 6ad29ba6df base: add new ChunkGenerator method to identify last chunk 2015-04-22 07:51:27 -07:00
Andreas Hansson cd76e34056 cpu: Remove the InOrderCPU from the tree
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.

The MinorCPU is now used to model an in-order microarchitecture, and
long term the MinorCPU will eventually be renamed InOrderCPU.
2015-04-20 12:46:35 -04:00
Malek Musleh 826f69b470 config, cpu: fix progress interval for switched CPUs
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-14 11:01:10 -05:00
Dibakar Gope 34ad1123ee cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-13 17:33:57 -05:00
Nilay Vaish e596e52498 x86: implements x87 mult/div instructions 2015-04-13 17:33:57 -05:00
Lena Olson dea7acdb3e ruby: allow restoring from checkpoint when using DRAMCtrl
Restoring from a checkpoint with ruby + the DRAMCtrl memory model was not
working, because ruby and DRAMCtrl disagreed on the current tick during warmup.
Since there is no reason to do timing requests during warmup, use functional
requests instead.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-13 17:33:57 -05:00
Nilay Vaish d6af46915c sim: Use NULL instead of None for testing filenames.
The filenames are initialized with NULL.  So the test should be
checking for them to be == NULL instead == None.
2015-04-13 17:33:57 -05:00
Nilay Vaish b26fef8466 sim: fix function for emulating dup()
The function was using the host fd to obtain the fd object from the simulated
process.
2015-04-13 17:33:57 -05:00
Curtis Dunham c3268f8820 config: Support full-system with SST's memory system
This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
2015-04-08 15:56:06 -05:00
Nikos Nikoleris 4bdbdd8413 dev: (un)serialize fix for the RTC and RTC Timer Interrupt events
Restoring from a checkpoint fails if either the RTC or the RTC Timer
Interrrupt event is disabled. The restored machine tried incorrectly
to schedule the next event with negative offset.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03 11:42:10 -05:00
Ruslan Bukin bebab7f24f sim: correct check for endianess
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03 11:42:10 -05:00
Ruslan Bukin b3314673f4 dev: Extend access width for IDE control registers
Add 32-bit access width for PrimaryTiming register and 16bit for UDMAControl
register as FreeBSD required.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03 11:42:10 -05:00
Nikos Nikoleris 305e29b98e cpu: fix system total instructions accounting
The totalInstructions counter is only incremented when the whole instruction is
commited and not on every microop. It was incorrectly reset in atomic and
timing cpus.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
2015-04-03 11:42:10 -05:00
Lena Olson 333988a73e x86: fix debug trace output for mwait
When running with the Exec flag, the mwait instruction attempted
to print out its source registers, which were never actually
initialized. This led to sporadic assertion failures when the
value stored there was invalid.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03 11:42:10 -05:00
Stephan Diestelhorst cb8856f580 mem: Support any number of master-IDs in stride prefetcher
The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs)
that it could handle. Since master IDs need to be unique per system, and
every core, cache etc. requires a separate master port, a static limit on
these does not make much sense.

Instead, this patch adds a small hash map that will map all master IDs to
the right prefetch state and dynamically allocates new state for new master
IDs.
2015-03-27 04:56:03 -04:00
Andreas Hansson 0197e580e5 mem: Allocate cache writebacks before new MSHRs
This patch changes the order of writeback allocation such that any
writebacks resulting from a tag lookup (e.g. for an uncacheable
access), are added to the writebuffer before any new MSHR entries are
allocated. This ensures that the writebacks logically precedes the new
allocations.

The patch also changes the uncacheable flush to use proper timed (or
atomic) writebacks, as opposed to functional writes.
2015-03-27 04:56:02 -04:00
Andreas Hansson 24763c2177 mem: Cleanup flow for uncacheable accesses
This patch simplifies the code dealing with uncacheable timing
accesses, aiming to align it with the existing miss handling. Similar
to what we do in atomic, a timing request now goes through
Cache::access (where the block is also flushed), and then proceeds to
ignore any existing MSHR for the block in question. This unifies the
flow for cacheable and uncacheable accesses, and for atomic and timing.
2015-03-27 04:56:01 -04:00
Andreas Hansson a7a1e6004a mem: Ignore uncacheable MSHRs when finding matches
This patch changes how we search for matching MSHRs, ignoring any MSHR
that is allocated for an uncacheable access. By doing so, this patch
fixes a corner case in the MSHRs where incorrect data ended up being
copied into a (cacheable) read packet due to a first uncacheable MSHR
target of size 4, followed by a cacheable target to the same MSHR of
size 64. The latter target was filled with nonsense data.
2015-03-27 04:56:00 -04:00
Andreas Hansson 801ce65eae mem: Remove redundant allocateUncachedReadBuffer in cache
This patch removes the no-longer-needed
allocateUncachedReadBuffer. Besides the checks it is exactly the same
as allocateMissBuffer and thus provides no value.
2015-03-27 04:55:59 -04:00
Andreas Hansson fe806a0dd7 mem: Modernise MSHR iterators to C++11
This patch updates the iterators in the MSHR and MSHR queues to use
C++11 range-based for loops. It also does a bit of additional house
keeping.
2015-03-27 04:55:57 -04:00
Andreas Hansson 7bae98459c mem: Align all MSHR entries to block boundaries
This patch aligns all MSHR queue entries to block boundaries to
simplify checks for matches. Previously there were corner cases that
could lead to existing entries not being identified as matches.

There are, rather alarmingly, a few regressions that change with this
patch.
2015-03-27 04:55:55 -04:00
Ali Jafri 15f0d9ff14 mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED
This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more
generic BLOCK_CACHED flag. Future patches implementing cache eviction
messages can use the BLOCK_CACHED flag in almost the same manner as
hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The
PREFTECH_SNOOP_FLAG is set if the prefetch target is found in the tags
or the MSHRs in any state, so we are simply replacing calls to
setPrefetchSquashed() with setBlockCached(). The case of where the
prefetch target is found in the writeback MSHRs of upper level caches
continues to be covered by the MEM_INHIBIT flag.
2015-03-27 04:55:54 -04:00
Curtis Dunham a1164154de sim: Update limit_event reuse to final version
Matching final version on reviewboard.
2015-03-26 11:16:44 -04:00
Andreas Hansson a196dbe3bf cpu: Fix InstPBTrace inheritance
This patch fixes an issue that prevented gem5 to be built with C++
config and without Python.
2015-03-26 11:16:43 -04:00
Steve Reinhardt 6677b9122a mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW
Makes x86-style locked operations even more distinct from
LLSC operations.  Using "locked" by itself should be
obviously ambiguous now.
2015-03-23 16:14:20 -07:00
Steve Reinhardt 5302305255 misc: quote args in echoed command line
Currently if there are shell special characters in a
command-line argument, you can't copy and paste the
echoed command line onto a shell prompt because the
characters aren't quoted properly.  This patch fixes
that problem.
2015-03-23 16:14:18 -07:00
Curtis Dunham 564482c782 sim: Reuse the same limit_event in simulate()
This patch accomplishes two things:
1. Makes simulate()'s GlobalSimLoopExitEvent a singleton reused
   across calls. This is slightly more efficient than recreating
   it every time.
2. Gives callers to simulate() (especially other simulators) a
   foolproof way of knowing that the simulation period ended
   successfully by hitting the limit event. They can call
   getLimitEvent() and compare it to the return
   value of simulate().

This change was motivated by an ongoing effort to integrate gem5
and SST, with SST as the master sim and gem5 as the slave sim.
2015-03-23 06:57:36 -04:00
Andreas Hansson 45286d9b64 mem: Tidy up Request
This patch does a bit of house keeping, fixing up typos, removing dead
code etc.
2015-03-23 06:57:34 -04:00
Matt Evans 1ccc3d7e5b arm: Add a GICv2m device
This patch adds a new PIO-accessible GICv2m shim. This shim has a PIO
slave port on one side, and SPI 'wires' on the other. It accepts MSIs
from the system and triggers SPIs on the GIC. It is configurable with
a number of frames, each of which has a number of SPIs and a base SPI
offset.

A Linux driver for GICv2m is available upstream.
2015-03-19 04:06:17 -04:00
Matt Evans ec80224188 arm: Remove the 'magic MSI register' in the GIC (PL390)
This patch removes the code that added this magic register. A
follow-up patch provides a GICv2m MSI shim that gives the same
functionality in a standard ARM system architecture way.
2015-03-19 04:06:16 -04:00
Wendy Elsasser 9b4d8030e6 cpu: Fix TrafficGen message format
Fix erroneous message format for fatal error.
Previously, code did not have type indicator (% instead of %d).

Also removed redundant fatal check.

Ran modified sweep.py with in range and out of range values to test.
2015-03-19 04:06:12 -04:00
Andreas Hansson 5275c9d740 mem: Use emplace front/back for deferred packets
Embrace C++11 for the deferred packets as we actually store the
objects in the data structure, and not just pointers.
2015-03-19 04:06:11 -04:00
Geoffrey Blake 1d403960af mem: Enable CommMonitor to output traces in atomic mode
The CommMonitor by default only allows memory traces to be gathered in
timing mode. This patch allows memory traces to be gathered in atomic
mode if all one needs is a functional trace of memory addresses used
and timing information is of a secondary concern.
2015-03-19 04:06:10 -04:00
Steve Reinhardt e57ab463cf mem: remove redundant test in in Cache::recvTimingResp()
For some reason we were checking mshr->hasTargets() even though
we had already called mshr->getTarget() unconditionally earlier
in the same function (which asserts if there are no targets).
Get rid of this useless check, and while we're at it get rid
of the redundant call to mshr->getTarget(), since we still have
the value saved in a local var.
2015-02-11 10:48:53 -08:00
Steve Reinhardt 89bb03a1a6 mem: add local var in Cache::recvTimingResp()
The main loop in recvTimingResp() uses target->pkt all over
the place.  Create a local tgt_pkt to help keep lines
under the line length limit.
2015-02-11 10:48:52 -08:00
Steve Reinhardt ee0b52404c mem: restructure Packet cmd initialization a bit more
Refactor the way that specific MemCmd values are generated for packets.
The new approach is a little more elegant in that we assign the right
value up front, and it's also more amenable to non-heap-allocated
Packet objects.

Also replaced the code in the Minor model that was still doing it the
ad-hoc way.

This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-02-11 10:48:50 -08:00
Steve Reinhardt ccef61d1cc mem: clean up write buffer check in Cache::handleSnoop()
The 'if (writebacks.size)' check was redundant, because
writeBuffer.findMatches() would return false if the
writebacks list was empty.

Also renamed 'mshr' to 'wb_entry' in this context since
we are pointing at a writebuffer entry and not an MSHR
(even though it's the same C++ class).
2015-03-14 06:51:07 -07:00
Nilay Vaish e5fbc67e16 cpu: o3: another assert instead of check 2015-03-09 09:39:08 -05:00
Nilay Vaish 5003ed5f7a cpu: o3: Remove unused code in iew, add assert instead. 2015-03-09 09:39:08 -05:00
Nilay Vaish 4e1d10a3cf cpu: o3: commit: mark pipeline delay variable as consts 2015-03-09 09:39:08 -05:00
Nilay Vaish 53de2512b1 cpu: o3: remove unused stat variables. 2015-03-09 09:39:08 -05:00
Nilay Vaish 54bc67f619 cpu: o3: combine if with same condition 2015-03-09 09:39:07 -05:00
Nilay Vaish 61edd5ac97 cpu: o3: remove member variable squashCounter
The variable is used in only one place and a whole new function setNextStatus()
has been defined just to compute the value of the variable.  Instead of calling
the function, the value is now computed in the loop that preceded the function
call.
2015-03-09 09:39:07 -05:00
Nilay Vaish f69a74fda6 cpu: o3: remove unused function annotateMemoryUnits() 2015-03-09 09:39:07 -05:00
Andreas Hansson fc315901ff mem: Unify all cache DPRINTF address formatting
This patch changes all the DPRINTF messages in the cache to use
'%#llx' every time a packet address is printed. The inclusion of '#'
ensures '0x' is prepended, and since the address type is a uint64_t %x
really should be %llx.
2015-03-02 04:00:56 -05:00
Andreas Hansson 88e2963951 mem: Fix cache MSHR conflict determination
This patch fixes a rather subtle issue in the sending of MSHR requests
in the cache, where the logic previously did not check for conflicts
between the MSRH queue and the write queue when requests were not
ready. The correct thing to do is to always check, since not having a
ready MSHR does not guarantee that there is no conflict.

The underlying problem seems to have slipped past due to the symmetric
timings used for the write queue and MSHR queue. However, with the
recent timing changes the bug caused regressions to fail.
2015-03-02 04:00:54 -05:00
Andreas Hansson 407737614e mem: Add byte mask to Packet::checkFunctional
This patch changes the valid-bytes start/end to a proper byte
mask. With the changes in timing introduced in previous patches there
are more packets waiting in queues, and there are regressions using
the checker CPU failing due to non-contigous read data being found in
the various cache queues.

This patch also adds some more comments explaining what is going on,
and adds the fourth and missing case to Packet::checkFunctional.
2015-03-02 04:00:52 -05:00
Stephan Diestelhorst ecef1612b8 mem: Add option to force in-order insertion in PacketQueue
By default, the packet queue is ordered by the ticks of the to-be-sent
packages. With the recent modifications of packages sinking their header time
when their resposne leaves the caches, there could be cases of MSHR targets
being allocated and ordered A, B, but their responses being sent out in the
order B,A. This led to inconsistencies in bus traffic, in particular the snoop
filter observing first a ReadExResp and later a ReadRespWithInv.  Logically,
these were ordered the other way around behind the MSHR, but due to the timing
adjustments when inserting into the PacketQueue, they were sent out in the
wrong order on the bus, confusing the snoop filter.

This patch adds a flag (off by default) such that these special cases can
request in-order insertion into the packet queue, which might offset timing
slighty. This is expected to occur rarely and not affect timing results.
2015-03-02 04:00:49 -05:00
Marco Balboni d4ef8368aa mem: Downstream components consumes new crossbar delays
This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.
2015-03-02 04:00:48 -05:00
Andreas Hansson 36dc93a5fa mem: Move crossbar default latencies to subclasses
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.

Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
2015-03-02 04:00:47 -05:00
Marco Balboni d35dd71ab4 mem: Add crossbar latencies
This patch introduces latencies in crossbar that were neglected
before. In particular, it adds three parameters in crossbar model:
front_end_latency, forward_latency, and response_latency. Along with
these parameters, three corresponding members are added:
frontEndLatency, forwardLatency, and responseLatency. The coherent
crossbar has an additional snoop_response_latency.

The latency of the request path through the xbar is set as
--> frontEndLatency + forwardLatency

In case the snoop filter is enabled, the request path latency is charged
also by look-up latency of the snoop filter.
--> frontEndLatency + SF(lookupLatency) + forwardLatency.

The latency of the response path through the xbar is set instead as
--> responseLatency.

In case of snoop response, if the response is treated as a normal response
the latency associated is again
--> responseLatency;

If instead it is forwarded as snoop response we add an additional variable
+ snoopResponseLatency
and the latency associated is
--> snoopResponseLatency;

Furthermore, this patch lets the crossbar progress on the next clock
edge after an unused retry, changing the time the crossbar considers
itself busy after sending a retry that was not acted upon.
2015-03-02 04:00:46 -05:00
Andreas Sandberg 7be9d4eb67 dev, arm: Clean up PL011 and rewrite interrupt handling
The ARM PL011 UART model didn't clear and raise interrupts
correctly. This changeset rewrites the whole interrupt handling and
makes it both simpler and fixes several cases where the correct
interrupts weren't raised or cleared. Additionally, it cleans up many
other aspects of the code.
2015-03-02 04:00:44 -05:00
Andreas Hansson d64b34bef8 arm: Share a port for the two table walker objects
This patch changes how the MMU and table walkers are created such that
a single port is used to connect the MMU and the TLBs to the memory
system. Previously two ports were needed as there are two table walker
objects (stage one and stage two), and they both had a port. Now the
port itself is moved to the Stage2MMU, and each TableWalker is simply
using the port from the parent.

By using the same port we also remove the need for having an
additional crossbar joining the two ports before the walker cache or
the L2. This simplifies the creation of the CPU cache topology in
BaseCPU.py considerably. Moreover, for naming and symmetry reasons,
the TLB walker port is connected through the stage-one table walker
thus making the naming identical to x86. Along the same line, we use
the stage-one table walker to generate the master id that is used by
all TLB-related requests.
2015-03-02 04:00:42 -05:00
Giacomo Gabrielli bd70db5521 arm: Remove unnecessary dependencies between AArch64 FP instructions 2015-03-02 04:00:41 -05:00
Rekai 3d5434022a cpu: o3 register renaming request handling improved
Now, prior to the renaming, the instruction requests the exact amount of
registers it will need, and the rename_map decides whether the instruction is
allowed to proceed or not.
2015-03-02 04:00:38 -05:00
Andreas Hansson 987de4f5cc mem: Tidy up the cache debug messages
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02 04:00:37 -05:00
Andreas Hansson f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
Ali Jafri 6ebe8d863a mem: Fix prefetchSquash + memInhibitAsserted bug
This patch resolves a bug with hardware prefetches. Before a hardware prefetch
is sent towards the memory, the system generates a snoop request to check all
caches above the prefetch generating cache for the presence of the prefetth
target. If the prefetch target is found in the tags or the MSHRs of the upper
caches, the cache sets the prefetchSquashed flag in the snoop packet. When the
snoop packet returns with the prefetchSquashed flag set, the prefetch
generating cache deallocates the MSHR reserved for the prefetch. If the
prefetch target is found in the writeback buffer of the upper cache, the cache
sets the memInhibit flag, which signals the prefetch generating cache to
expect the data from the writeback. When the snoop packet returns with the
memInhibitAsserted flag set, it marks the allocated MSHR as inService and
waits for the data from the writeback.

If the prefetch target is found in multiple upper level caches, specifically
in the tags or MSHRs of one upper level cache and the writeback buffer of
another, the snoop packet will return with both prefetchSquashed and
memInhibitAsserted set, while the current code is not written to handle such
an outcome. Current code checks for the prefetchSquashed flag first, if it
finds the flag, it deallocates the reserved MSHR. This leads to assert failure
when the data from the writeback appears at cache. In this fix, we simply
switch the order of checks. We first check for memInhibitAsserted and then for
prefetch squashed.
2015-03-02 04:00:34 -05:00
Stephan Diestelhorst de46eeade7 cpu: Add a PC-value to the traffic generator requests
Have the traffic generator add its masterID as the PC address to the
requests. That way, prefetchers (and other components) that use a PC
for request classification will see per-tester streams of requests.
This enables us to test strided prefetchers with the memchecker, too.
2015-03-02 04:00:31 -05:00
Andreas Sandberg 3b4ae7debb arm: Don't truncate 16-bit ASIDs to 8 bits
The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers
and has a couple of inverted checks that mask out the high 8 bits of
an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both
of those issues.
2015-03-02 04:00:28 -05:00
Andreas Sandberg 804b11a3ed arm: Correctly access the stack pointer in GDB
We curently use INTREG_X31 instead of INTREG_SPX when accessing the
stack pointer in GDB. gem5 normally uses INTREG_SPX to access the
stack pointer, which gets mapped to the stack pointer corresponding
(INTREG_SPn) to the current exception level. This changeset updates
the GDB interface to use SPX instead of X31 (which is always zero)
when transfering CPU state to gdb.
2015-03-02 04:00:27 -05:00
Andreas Sandberg 34dcd90b61 arm: Fix broken page table permissions checks in remote GDB
The remote GDB interface currently doesn't check if translations are
valid before reading memory. This causes a panic when GDB tries to
access unmapped memory (e.g., when getting a stack trace). There are
two reasons for this: 1) The function used to check for valid
translations (virtvalid()) doesn't work and panics on invalid
translations. 2) The method in the GDB interface used to test if a
translation is valid (RemoteGDB::acc) always returns true regardless
of the return from virtvalid().

This changeset fixes both of these issues.
2015-03-02 04:00:27 -05:00
Jason Power 670f44e05e Ruby: Update backing store option to propagate through to all RubyPorts
Previously, the user would have to manually set access_backing_store=True
on all RubyPorts (Sequencers) in the config files.
Now, instead there is one global option that each RubyPort checks on
initialization.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-02-26 09:58:26 -06:00
Andreas Hansson 8c78aa31ea cpu: TrafficGen sinks snoops without complaining
To be able to use the TrafficGen in a system with caches we need to
allow it to sink incoming snoop requests. By default the master port
panics, so silently ignore any snoops.
2015-02-16 03:34:55 -05:00
Stephan Diestelhorst 93fa8e3cd4 mem: Fix initial value problem with MemChecker
In highly loaded cases, reads might actually overlap with writes to the
initial memory state. The mem checker needs to detect such cases and
permit the read reading either from the writes (what it is doing now) or
read from the initial, unknown value.

This patch adds this logic.
2015-02-16 03:34:47 -05:00
Andreas Hansson 661dac1598 dev: Fix undefined behaviuor in i8254xGBe
This patch fixes a rather unfortunate oversight where the annotation
pointer was used even though it is null. Somehow the code still works,
but UBSan is rather unhappy. The use is now guarded, and the variable
is initialised in the constructor (as well as init()).
2015-02-16 03:34:35 -05:00
Andreas Sandberg 0a2ee77616 arm: Wire up the GIC with the platform in the base class
Move the (common) GIC initialization code that notifies the platform
code of the new GIC to the base class (BaseGic) instead of the Pl390
implementation.
2015-02-16 03:34:18 -05:00
Andreas Hansson e17328a227 mem: mmap the backing store with MAP_NORESERVE
This patch ensures we can run simulations with very large simulated
memories (at least 64 TB based on some quick runs on a Linux
workstation). In essence this allows us to efficiently deal with
sparse address maps without having to implement a redirection layer in
the backing store.

This opens up for run-time errors if we eventually exhausts the hosts
memory and swap space, but this should hopefully never happen.
2015-02-16 03:33:47 -05:00
Andreas Hansson 57758ca685 mem: Use the range cache for lookup as well as access
This patch changes the range cache used in the global physical memory
to be an iterator so that we can use it not only as part of isMemAddr,
but also access and functionalAccess. This matches use-cases where a
core is using the atomic non-caching memory mode, and repeatedly calls
isMemAddr and access.

Linux boot on aarch32, with a single atomic CPU, is now more than 30%
faster when using "--fastmem" compared to not using the direct memory
access.
2015-02-16 03:33:37 -05:00
Andreas Hansson d0e1b8a19c arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-16 03:33:28 -05:00
Andreas Sandberg 5bfa7e3d59 arm: Merge ISA files with pseudo instructions
This changeset moves the pseudo instructions used to signal unknown
instructions and unimplemented instructions to the same source files
as the decoder fault.
2015-02-16 03:32:58 -05:00
Ali Saidi 4eff4fa12e cpu: add support for outputing a protobuf formatted CPU trace
Doesn't support x86 due to static instruction representation.

--HG--
rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py
2015-02-16 03:32:38 -05:00
Marco Balboni 268d9e59c5 mem: Clarification of packet crossbar timings
This patch clarifies the packet timings annotated
when going through a crossbar.

The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.

The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.

For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
2015-02-11 10:23:47 -05:00
Marco Balboni e2828587b3 mem: Clarify usage of latency in the cache
This patch adds some much-needed clarity in the specification of the
cache timing. For now, hit_latency and response_latency are kept as
top-level parameters, but the cache itself has a number of local
variables to better map the individual timing variables to different
behaviours (and sub-components).

The introduced variables are:
- lookupLatency: latency of tag lookup, occuring on any access
- forwardLatency: latency that occurs in case of outbound miss
- fillLatency: latency to fill a cache block
We keep the existing responseLatency

The forwardLatency is used by allocateInternalBuffer() for:
- MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer);
- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
- MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR
  queue)
It is our assumption that the time for the above three buffers is the
same. Similarly, for snoop responses passing through the cache we use
forwardLatency.
2015-02-11 10:23:36 -05:00
Andreas Hansson 6563ec8634 cpu: Tidy up the MemTest and make false sharing more obvious
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.

The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.

In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
2015-02-11 10:23:28 -05:00
Andreas Sandberg 550c318490 sim: Move the BaseTLB to src/arch/generic/
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.

--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-11 10:23:27 -05:00
Andreas Sandberg 9e6f803254 base: Add compiler macros to add deprecation warnings
Gcc and clang both provide an attribute that can be used to flag a
function as deprecated at compile time. This changeset adds a gem5
compiler macro for that compiler feature. The macro can be used to
indicate that a legacy API within gem5 has been deprecated and provide
a graceful migration to the new API.
2015-02-11 10:23:24 -05:00
Andreas Hansson c9b8616c51 base: Do not dereference NULL in CompoundFlag creation
This patch fixes the CompoundFlag constructor, ensuring that it does
not dereference NULL. Doing so has undefined behaviuor, and both clang
and gcc's undefined-behaviour sanitiser was rather unhappy.
2015-02-11 10:23:23 -05:00
Andreas Sandberg 431a6d708b dev: Remove unused system pointer in the Platform base class
The Platform base class contains a pointer to an instance of the
System which is never initialized. This can lead to subtle bugs since
some architecture-specific platform implementations contain their own
system pointer which is normally used. However, if the platform is
accessed through a pointer to its base class, the dangling pointer
will be used instead.
2015-02-11 10:23:22 -05:00
Alexandru Dutu ad1b177550 cpu: Idle CPU status logic revised
This patch sets the CPU status to idle when the last active thread gets
suspended.
2015-02-06 18:01:22 -08:00
Andreas Hansson 461a80beb3 mem: Clarify express snoop behaviour
This patch adds a bit of documentation with insights around how
express snoops really work.
2015-02-03 14:26:02 -05:00
Andreas Hansson 193325ff60 mem: Clarify cache behaviour for pending dirty responses
This patch adds a bit of clarification around the assumptions made in
the cache when packets are sent out, and dirty responses are
pending. As part of the change, the marking of an MSHR as in service
is simplified slightly, and comments are added to explain what
assumptions are made.
2015-02-03 14:25:59 -05:00
Curtis Dunham f0a764edc6 base: add an accessor and operators ==,!= to address ranges 2015-02-03 14:25:58 -05:00
Andreas Hansson ccb512ecc1 base: Add XOR-based hashed address interleaving
This patch extends the current address interleaving with basic hashing
support. Instead of directly comparing a number of address bits with a
matching value, it is now possible to use two independent set of
address bits XOR'ed together. This avoids issues where strided address
patterns are heavily biased to a subset of the interleaved ranges.
2015-02-03 14:25:54 -05:00
Andreas Hansson 5ea60a95b3 config: Adjust DRAM channel interleaving defaults
This patch changes the DRAM channel interleaving default behaviour to
be more representative. The default address mapping (RoRaBaCoCh) moves
the channel bits towards the least significant bits, and uses 128 byte
as the default channel interleaving granularity.

These defaults can be overridden if desired, but should serve as a
sensible starting point for most use-cases.
2015-02-03 14:25:52 -05:00
Andreas Sandberg fe200c2487 sim: Remove test for non-NULL this in Event
The method Event::initialized() tests if this != NULL as a part of the
expression that tests if an event is initialized. The only case when
this check could be false is if the method is called on a null
pointer, which is illegal and leads to undefined behavior (such as
eating your pets) according to the C++ standard. Because of this,
modern compilers (specifically, recent versions of clang) warn about
this which we treat as an error. This changeset removes the redundant
check to fix said warning.
2015-02-03 14:25:48 -05:00
Andreas Sandberg 851b29ad20 dev: Correctly clear interrupts in VirtIO PCI
Correctly clear the PCI interrupt belonging to a VirtIO device when
the ISR register is read.
2015-02-03 14:25:47 -05:00
Curtis Dunham b89fd57663 sim: prioritize async events; prevent starvation
If a time quantum event is the only one in the queue, async
events (Ctrl-C, I/O, etc.) will never be processed.

So process them first.
2014-12-19 15:32:34 -06:00
Andreas Hansson 20111ba917 cpu: Ensure timing CPU sinks response before sending new request
This patch changes how the timing CPU deals with processing responses,
always scheduling an event, even if it is for the current tick. This
helps to avoid situations where a new request shows up before a
response is finished in the crossbar, and also is more in line with
any realistic behaviour.
2015-02-03 14:25:27 -05:00
Geoffrey Blake 3e33786db8 config: Fix typo in Float param
The Float param was not settable on the command line
due to a typo in the class definition in
python/m5/params.py.  This corrects the typo and allows
floats to be set on the command line as intended.
2015-02-03 14:25:07 -05:00
Ali Saidi 89b3616d7e arm: always set the IsFirstMicroop flag
While the IsFirstMicroop flag exists it was only occasionally used in the ARM
instructions that gem5 microOps and therefore couldn't be relied on to be correct.
2015-01-25 07:22:56 -05:00
Ali Saidi 9d8ddd92dc sim: Clean up InstRecord
Track memory size and flags as well as add some comments and consts.
2015-01-25 07:22:44 -05:00
Ali Saidi f6742ea26e cpu: Remove all notion that we know when the cpu is misspeculating.
We have no way of knowing if a CPU model is on the wrong path with
our execute-in-execute CPU models. Don't pretend that we do.
2015-01-25 07:22:26 -05:00
Ali Saidi 0bd986015b cpu: Put all CPU instruction tracers in a single file 2015-01-25 07:22:17 -05:00
Ali Saidi 6c4a23c1c6 cpu: remove legion tracer
If someone wants to debug with legion again they can restore the
code from the repository, but no need to have it hang around indefinately.
2015-01-25 07:22:05 -05:00
Curtis Dunham 10b5e5431d sim: fix reference counting of PythonEvent
When gem5 is a slave to another simulator and the Python is only used
to initialize the configuration (and not perform actual simulation), a
"debug start" (--debug-start) event will get freed during or immediately
after the initial Python frame's execution rather than remaining in the
event queue. This tricky patch fixes the GC issue causing this.
2014-12-23 11:51:40 -06:00
Andreas Hansson 10c69bb168 mem: Remove unused Packet src and dest fields
This patch takes the final step in removing the src and dest fields in
the packet. These fields were rather confusing in that they only
remember a single multiplexing component, and pushed the
responsibility to the bridge and caches to store the fields in a
senderstate, thus effectively creating a stack. With the recent
changes to the crossbar response routing the crossbar is now
responsible without relying on the packet fields. Thus, these
variables are now unused and can be removed.
2015-01-22 05:01:31 -05:00
Andreas Hansson 15c64035ed mem: Remove Packet source from ForwardResponseRecord
This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.
2015-01-22 05:01:30 -05:00
Andreas Hansson 0c2ffd2daa mem: Remove unused RequestState in the bridge
This patch removes the bridge sender state as the Crossbar now takes
care of remembering its own routing decisions.
2015-01-22 05:01:27 -05:00
Andreas Hansson 00536b0efc mem: Always use SenderState for response routing in RubyPort
This patch aligns how the response routing is done in the RubyPort,
using the SenderState for both memory and I/O accesses. Before this
patch, only the I/O used the SenderState, whereas the memory accesses
relied on the src field in the packet. With this patch we shift to
using SenderState in both cases, thus not relying on the src field any
longer.
2015-01-22 05:01:24 -05:00
Andreas Hansson 072f78471d mem: Make the XBar responsible for tracking response routing
This patch removes the need for a source and destination field in the
packet by shifting the onus of the tracking to the crossbar, much like
a real implementation. This change in behaviour also means we no
longer need a SenderState to remember the source/dest when ever we
have multiple crossbars in the system. Thus, the stack that was
created by the SenderState is not needed, and each crossbar locally
tracks the response routing.

The fields in the packet are still left behind as the RubyPort (which
also acts as a crossbar) does routing based on them. In the succeeding
patches the uses of the src and dest field will be removed. Combined,
these patches improve the simulation performance by roughly 2%.
2015-01-22 05:01:14 -05:00
Andreas Hansson ce12d4bc63 x86: Delay X86 table walk on receiving walker response
This patch fixes a minor issue in the X86 page table walker where it
ended up sending new request packets to the crossbar before the
response processing was finished (recvTimingResp is directly calling
sendTimingReq). Under certain conditions this caused the crossbar to
see illegal combinations of request/response overlap, in turn causing
problems with a slightly modified crossbar implementation.
2015-01-22 05:00:54 -05:00
Andreas Hansson f49830ce0b mem: Clean up Request initialisation
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
2015-01-22 05:00:53 -05:00
Nikos Nikoleris a35283ac65 cpu: commit probe notification on every microop or macroop
The ppCommit should notify the attached listener every time the cpu commits
a microop or non microcoded insturction. The listener can then decide
whether it will process only the last microop (eg. SimPoint probe).

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-20 14:15:27 -06:00
Andreas Hansson 6096e2f9c1 mem: Fix bug in cache request retry mechanism
This patch ensures that inhibited packets that are about to be turned
into express snoops do not update the retry flag in the cache.
2015-01-20 08:12:01 -05:00
Andreas Hansson da0c770943 cpu: Fix retry bug in MinorCPU LSQ 2015-01-20 08:11:58 -05:00
Andreas Hansson 92585d60c9 mem: Move DRAM interleaving check to init
This patch fixes a bug where the DRAM controller tried to access the
system cacheline size before the system pointer was initialised. It
also fixes a bug where the granularity is 0 (no interleaving).
2015-01-20 08:11:55 -05:00
Emilio Castillo 7bb65dd434 x86 : fxsave and fxrestore missing template code
This patch corrects the FXSAVE and FXRSTOR Macroops.  The actual code used for
saving/restore the FP registers is in the file but it was not used.

The FXSAVE and FXRSTOR instructions are used in the kernel for saving and
loading the state of the mmx,xmm and fpu registers.

This operation is triggered in FS by issuing a Device Not Available Fault.  The
cr0 register has a TS flag that is set upon each context change. Every time a
task access any FP related register (SIMD as well) if the TS flag is set to
one, the device not available fault is issued.  The kernel saves the current
state of the registers, and restore the previous state of the currently running
task.

Right now Gem5 lacks of this capability. the Device Not Available Fault is
never issued, leading to several problems when different threads share the same
CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this
behavior.

In order to test this a hack in the atomic cpu code was done to detect if a
static instruction has any FP operands and the cr0 reg TS bit is set.  This
check must be done in the ISA dependent code. But it seems to be tricky to
access the cr0 register while executing an instruction.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-10 14:30:53 -06:00
Nikos Nikoleris ec64b81a9d cpu: fix RetiredStores probe point
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-10 14:30:53 -06:00
cdirik 1693e526d0 dev: prevent intel 8254 timer counter events firing before startup
This change includes edits to Intel8254Timer to prevent counter events firing
before startup to comply with SimObject initialization call sequence.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-06 15:10:22 -07:00
Gabe Black 1c1fb2c988 test: Add a unittest for the BitUnion types. 2015-01-07 00:34:40 -08:00
Gabe Black 86dea86987 base: Fix assigning between identical bitfields.
If two bitfields are of the same type, also implying that they have the same
first and last bit positions, the existing implementation would copy the
entire bitfield. That includes the __data member which is shared among all the
bitfields, effectively overwritting the entire bitunion.

This change also adjusts the write only signed bitfield assignment operator to
be like the unsigned version, using "using" instead of implementing it again
and calling down to the underlying implementation.
2015-01-07 00:31:46 -08:00
Gabe Black cd6380605c x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.
These are for the monitor/mwait instructions, SSSE3, and XSAVE.
2015-01-06 22:15:00 -08:00
Gabe Black cb181d6f91 cpuid, x86: Revert "Enabling more features in CPUid"
That change enables CPUID bits for features that aren't implemented in gem5.
If a simulated system tries to use those features because it was told it
could, bad things can happen.
2015-01-06 22:13:56 -08:00
Andrew Lukefahr 6d32004407 minor: fixed LSQ MasterPortID
Minor was reporting the data cache access as ".inst" accesses.
This just switches the MasterPortID to dataMasterPortId.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03 17:51:48 -06:00
mike upton cb911559dc arm: Add unlinkat syscall implementation
added ARM aarch64 unlinkat syscall support, modeled on other <xxx>at syscalls.
This gets all of the cpu2006 int workloads passing in SE mode on aarch64.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03 17:51:48 -06:00
Maxime Martinasso 5a5416d575 x86: implements the simd128 ADDSUBPD instruction
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.

Tested with a simple program in assembly language which executes the
instruction.  Checked that different versions of the instruction are executed
by using the execution tracing option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu
2015-01-03 17:51:48 -06:00
Cagdas Dirik 02c376ac44 dev: prevent RTC events firing before startup
This change includes edits to MC146818 timer to prevent RTC events
firing before startup to comply with SimObject initialization call sequence.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03 17:51:48 -06:00
Joel Hestness 642b9b4fab syscall_emul: Return correct writev value
According to Linux man pages, if writev is successful, it returns the total
number of bytes written. Otherwise, it returns an error code. Instead of
returning 0, return the result from the actual call to writev in the system
call.
2014-12-27 13:48:40 -06:00
Mitch Hayenga b2342c5d9a mem: Change prefetcher to use random_mt
Prefechers has used rand() to generate random numers previously.
2014-12-23 09:31:19 -05:00
Curtis Dunham 516e6046ae mem: Hide WriteInvalidate requests from prefetchers
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
2014-12-23 09:31:19 -05:00
Mitch Hayenga bd4f901c77 mem: Fix event scheduling issue for prefetches
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready.  However, a prefetch being ready does not guarentee
that it can obtain an MSHR.  So, when all MSHRs are full,
the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated.  The event is rescheduled as soon as a MSHR becomes
available.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 4acd4a2055 mem: Fix bug relating to writebacks and prefetches
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.
2014-12-23 09:31:18 -05:00
Mitch Hayenga df82a2d003 mem: Rework the structuring of the prefetchers
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 6cb58b2bd2 mem: Add parameter to reserve MSHR entries for demand access
Adds a new parameter that reserves some number of MSHR entries for demand
accesses.  This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
2014-12-23 09:31:18 -05:00
Curtis Dunham 4d88978913 arm: Add stats to table walker
This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
  (p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
2014-12-23 09:31:18 -05:00
Andreas Hansson 59460b91f3 config: Expose the DRAM ranks as a command-line option
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).

The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
2014-12-23 09:31:18 -05:00
Andreas Hansson 2f7baf9dbe mem: Ensure DRAM controller is idle when in atomic mode
This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller forces the simulator to switch
out of the KVM mode, thus killing performance.

The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.

The switcheroo regression require a minor stats bump as a result.
2014-12-23 09:31:18 -05:00
Omar Naji 381d1da791 mem: Add rank-wise refresh to the DRAM controller
This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.

Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.

The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.
2014-12-23 09:31:18 -05:00
Omar Naji 152c02354e mem: Fix a bug in the DRAM controller arbitration
Fix a minor issue that affects multi-rank systems.
2014-12-23 09:31:18 -05:00
Kanishk Sugand 7a25b1a0e0 mem: Add stack distance statistics to the CommMonitor
This patch adds the stack distance calculator to the CommMonitor. The
stats are disabled by default.
2014-12-23 09:31:18 -05:00
Kanishk Sugand 888975b29d mem: Add a stack distance calculator
This patch adds a stand-alone stack distance calculator. The stack
distance calculator is a passive SimObject that observes the addresses
passed to it. It calculates stack distances (LRU Distances) of
incoming addresses based on the partial sum hierarchy tree algorithm
described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043.

For each transaction a hashtable look-up is performed. At every
non-unique transaction the tree is traversed from the leaf at the
returned index to the root, the old node is deleted from the tree, and
the sums (to the right) are collected and decremented. The collected
sum represets the stack distance of the found node. At every unique
transaction the stack distance is returned as
numeric_limits<uint64>::max().

In addition to the basic stack distance calculation, a feature to mark
an old node in the tree is added. This is useful if it is required to
see the reuse pattern. For example, Writebacks to the lower level
(e.g. membus from L2), can be marked instead of being removed from the
stack (isMarked flag of Node set to True). And then later if this same
address is accessed (by L1), the value of the isMarked flag would be
True. This gives some insight on how the Writeback policy of the
lower level affect the read/write accesses in an application.

Debugging is enabled by setting the verify flag to true. Debugging is
implemented using a dummy stack that behaves in a naive way, using STL
vectors. Note that this has a large impact on run time.
2014-12-23 09:31:18 -05:00
Marco Elver dd0f3943e2 mem: Add MemChecker and MemCheckerMonitor
This patch adds the MemChecker and MemCheckerMonitor classes. While
MemChecker can be integrated anywhere in the system and is independent,
the most convenient usage is through the MemCheckerMonitor -- this
however, puts limitations on where the MemChecker is able to observe
read/write transactions.
2014-12-23 09:31:17 -05:00
Andreas Sandberg 184fefbb3b arm: Raise an alignment fault if a PC has illegal alignment
We currently don't handle unaligned PCs correctly. There is one check
for unaligned PCs in the TLB when running in aarch64 mode, but this
check does not cover cases where the CPU does not do a TLB lookup when
decoding an instruction (e.g., a branch stays within the same cache
line). Additionally, the Decoder class sometimes throws an assertion
for unaligned PCs which breaks speculation.

This changeset introduces a decoder fault bit field in the ExtMachInst
structure. This field can be used to signal a decoder failure. If set,
the decoder generates an internal gem5fault instruction instead of a
normal instruction. This instruction in turns either panics (fault
type PANIC), returns an PCAlignmentFault (fault type UNALIGNED,
aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32).

The patch causes minor changes to the realview64 regressions, and a
stats bump will follow.
2014-12-23 09:31:17 -05:00
Andreas Sandberg b33812ba43 arm: Clean up and document decoder API
This changeset adds more documentation to the ArmISA::Decoder class
and restructures it slightly to make API groups more obvious.
2014-12-23 09:31:17 -05:00
Andreas Sandberg 070b4a81db arm: Add support for filtering in the PMU
This patch adds support for filtering events in the PMU. In order to
do so, it updates the ISADevice base class to forward an ISA pointer
to ISA devices. This enables such devices to access the MiscReg file
to determine the current execution level.
2014-12-23 09:31:17 -05:00
Gabe Black 70eb68beae Let other objects set up memory like regions in a KVM VM. 2014-12-09 21:53:44 -08:00
Andreas Sandberg 9b7578d8c7 arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0
The aarch64 system register decoder is currently not decoding
PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the
decoder so that they are decoded using the values in table C5-6 in ARM
DDI 0478A.c.
2014-12-08 04:49:53 -05:00
Andreas Sandberg 6a9fbd295d dev: Add response sanity checks in PioPort
Add an assert in the PioPort that checks if a response packet from a
device has the right flags set before passing it to them rest of the
memory system.
2014-12-08 04:49:52 -05:00
Andreas Sandberg 1ccc4e0e21 dev: Correctly transform packets into responses
The VirtIO devices didn't correctly set the response flags in memory
packets. This changeset adds the required Packet::makeResponse()
calls.
2014-12-08 04:49:51 -05:00
Gabe Black 4a8a0a0798 misc: Generalize GDB single stepping.
The new single stepping implementation for x86 doesn't rely on any ISA
specific properties or functionality. This change pulls out the per ISA
implementation of those functions and promotes the X86 implementation to the
base class.

One drawback of that implementation is that the CPU might stop on an
instruction twice if it's affected by both breakpoints and single stepping.
While that might be a little surprising, it's harmless and would only happen
under somewhat unlikely circumstances.
2014-12-05 22:37:03 -08:00
Gabe Black fb07d43b1a x86: Implement a remote GDB stub.
This stub should allow remote debugging of 32 bit and 64 bit targets. Single
stepping seems to work, as do breakpoints. If both breakpoints and single
stepping affect an instruction, gdb will stop at the instruction twice before
continuing. That's a little surprising, but is generally harmless.
2014-12-05 22:36:16 -08:00
Gabe Black 16c9b41616 misc: Add some utility functions for schedule inst commit events.
These can be used to simplify the implementation of single step in derived
classes.
2014-12-05 22:35:47 -08:00
Gabe Black cddf988bfd misc: Rename the GDB "Event" event class to InputEvent.
The "Event" name is the same as the base event class. That's a bit confusing,
and makes it a little awkward to add other event types.
2014-12-05 22:34:42 -08:00
Gabe Black f9f46b8fa9 sim: Ensure GDB interrupts the simulation at an instruction boundary.
Use the comInstEventQueue to ensure GDB interrupts the simulation at an
instruction boundary and not in the middle of a macroop, memory access, etc.
2014-12-05 01:51:49 -08:00
Gabe Black bacbb8ecbc cpu: Only check for PC events on instruction boundaries.
Only the instruction address is actually checked, so there's no need to check
repeatedly while we're working through the microops of a macroop and that's
not changing.
2014-12-05 01:47:35 -08:00
Gabe Black fe48c0a32b misc: Make the GDB register cache accessible in various sized chunks.
Not all ISAs have 64 bit sized registers, so it's not always very convenient
to access the GDB register cache in 64 bit sized chunks. This change makes it
accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations
were working around that limitation by bundling and unbundling 32 bit values
into 64 bit values. That code has been removed.
2014-12-05 01:44:24 -08:00
Gabe Black 22aaa5867f x86: Rework opcode parsing to support 3 byte opcodes properly.
Instead of counting the number of opcode bytes in an instruction and recording
each byte before the actual opcode, we can represent the path we took to get to
the actual opcode byte by using a type code. That has a couple of advantages.
First, we can disambiguate the properties of opcodes of the same length which
have different properties. Second, it reduces the amount of data stored in an
ExtMachInst, making them slightly easier/faster to create and process. This
also adds some flexibility as far as how different types of opcodes are
handled, which might come in handy if we decide to support VEX or XOP
instructions.

This change also adds tables to support properly decoding 3 byte opcodes.
Before we would fall off the end of some arrays, on top of the ambiguity
described above.

This change doesn't measureably affect performance on the twolf benchmark.

--HG--
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-04 15:53:54 -08:00