Commit graph

6423 commits

Author SHA1 Message Date
Andreas Hansson 4a453e8c95 mem: Allow packet queue to move next send event forward
This patch changes the packet queue such that when scheduling a send,
the queue is allowed to move the event forward.
2014-10-09 17:51:52 -04:00
Andreas Hansson 6498ccddb2 misc: Fix issues identified by static analysis
Another bunch of issues addressed.
2014-10-01 08:05:54 -04:00
Andreas Hansson b520223699 arm: Use MiscRegIndex rather than int when flattening
Some additional type checking to avoid future issues.
2014-10-01 08:05:52 -04:00
Andreas Hansson 10f82934be arm: More UBSan cleanups after additional full-system runs
Some incorrect casting to IntRegIndex, and a few uninitialized members
in the i8254xGBe device.
2014-10-01 08:05:51 -04:00
Andreas Hansson ec41000dad arm: Fixed undefined behaviours identified by gcc
This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
2014-09-27 09:08:37 -04:00
Andreas Hansson 341dbf2662 arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
2014-09-27 09:08:36 -04:00
Andreas Hansson deb2200671 scons: Address issues related to gcc 4.9.1
Fix a number few minor issues to please gcc 4.9.1. Removing the
'-fuse-linker-plugin' flag means no libraries are part of the LTO
process, but hopefully this is an acceptable loss, as the flag causes
issues on a lot of systems (only certain combinations of gcc, ld and
ar work).
2014-09-27 09:08:34 -04:00
Curtis Dunham 4836aef1e4 dev: Output invalid access size in IsaFake panic 2014-09-27 09:08:33 -04:00
Curtis Dunham b7f1d675da mem: Output precise range when XBar has conflicts 2014-09-27 09:08:32 -04:00
Curtis Dunham 725be98fe8 mem: Provide better diagnostic for unconnected port
When _masterPort is null, a message to that effect is
more helpful than a segfault.
2014-09-27 09:08:30 -04:00
Andreas Hansson de62aedabc misc: Fix a bunch of minor issues identified by static analysis
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
2014-09-27 09:08:29 -04:00
Mitch Hayenga cc6523e2d6 cpu: Remove unused deallocateContext calls
The call paths for de-scheduling a thread are halt() and suspend(), from
the thread context. There is no call to deallocateContext() in general,
though some CPUs chose to define it. This patch removes the function
from BaseCPU and the cores which do not require it.
2014-09-20 17:18:36 -04:00
Mitch Hayenga e1403fc2af alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
2014-09-20 17:18:35 -04:00
Andreas Hansson 1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00
Stephan Diestelhorst 435f4aec3d mem: Add access statistics for the snoop filter
Adds a simple access counter for requests and snoops for the snoop filter and
also classifies hits based on whether a single other holder existed or whether
multiple shares held the line.
2014-04-25 12:36:16 +01:00
Stephan Diestelhorst afa2428eca mem: Tie in the snoop filter in the coherent bus 2014-09-20 17:18:29 -04:00
Stephan Diestelhorst 7d488cc66f mem: Add a simple snoop counter per bus
This patch adds a simple counter for both total messages and a histogram for
the fan-out of snoop messages.  The fan-out describes to how many ports snoops
had to be sent per incoming request / snoop-from-below.  Without any
cleverness, this usually means to either all, or all but the requesting port.
2014-04-24 13:28:47 +01:00
Stephan Diestelhorst fe98cb6be4 misc: Add functions for doing popcount and power-of-two checking
Adds two public domain algorithms for determining number of set bits and also
whether a value is a power of two, uses the builtin that is available in GCC
and clang for popcount.
2014-04-24 17:41:26 +01:00
Stephan Diestelhorst ba98d598ae mem: Simple Snoop Filter
This is a first cut at a simple snoop filter that tracks presence of lines in
the caches "above" it. The snoop filter can be applied at any given cache
hierarchy and will then handle the caches above it appropriately; there is no
need to use this only in the last-level bus.

This design currently has some limitations: missing stats, no notion of clean
evictions (these will not update the underlying snoop filter, because they are
not sent from the evicting cache down), no notion of capacity for the snoop
filter and thus no need for invalidations caused by capacity pressure in the
snoop filter. These are planned to be added on top with future change sets.
2014-09-20 17:18:26 -04:00
Stephan Diestelhorst 16351ba8d6 energy: Tighter checking of levels for DFS systems
There are cases where users might by accident / intention specify less voltage
operating points thatn frequency points.  We consider one of these cases
special: giving only a single voltage to a voltage domain effectively renders
it as a static domain.  This patch adds additional logic in the auxiliary parts
of the functionality to handle these cases properly (simple driver asking for
N>1 operating levels, we should return the same voltage for all of them) and
adds error checking code in the voltage domain.
2014-08-12 19:00:44 +01:00
Stephan Diestelhorst 65aaf62714 energy: Add the Energy Controller in the right configs
Tie in the newly created energy controller components in the default
configurations.
2014-07-25 13:36:23 +01:00
Akash Bagdia 04e51e5e3e energy: Memory-mapped Energy Controller component
This patch provides an Energy Controller device that provides software
(driver) access to a DVFS handler. The device is currently residing in
the dev/arm tree, but there is nothing inherently ARM specific in the
behaviour. It is currently only tested and supported for ARM Linux,
hence the location.
2014-09-20 17:18:23 -04:00
Stephan Diestelhorst 4422d1322a energy: Small extentions and fixes for DVFS handler
These additions allow easier interoperability with and querying from an
additional controller which will be in a separate patch.  Also adding warnings
for changing the enabled state of the handler across checkpoint / resume and
deviating from the state in the configuration.

Contributed-by: Akash Bagdia <akash.bagdia@arm.com>
2014-06-16 14:59:44 +01:00
Wendy Elsasser bf23847072 mem: Add DDR4 bank group timing
Added the following parameter to the DRAMCtrl class:
 - bank_groups_per_rank

This defaults to 1. For the DDR4 case, the default is overridden to indicate
bank group architecture, with multiple bank groups per rank.

Added the following delays to the DRAMCtrl class:
 - tCCD_L : CAS-to-CAS, same bank group delay
 - tRRD_L : RAS-to-RAS, same bank group delay

These parameters are only applied when bank group timing is enabled.  Bank
group timing is currently enabled only for DDR4 memories.

For all other memories, these delays will default to '0 ns'

In the DRAM controller model, applied the bank group timing to the per bank
parameters actAllowedAt and colAllowedAt.
The actAllowedAt will be updated based on bank group when an ACT is issued.
The colAllowedAt will be updated based on bank group when a RD/WR burst is
issued.

At the moment no modifications are made to the scheduling.
2014-09-20 17:18:21 -04:00
Wendy Elsasser b6ecfe9183 mem: Add memory rank-to-rank delay
Add the following delay to the DRAM controller:
 - tCS : Different rank bus turnaround delay

This will be applied for
 1) read-to-read,
 2) write-to-write,
 3) write-to-read, and
 4) read-to-write
command sequences, where the new command accesses a different rank
than the previous burst.

The delay defaults to 2*tCK for each defined memory class. Note that
this does not correspond to one particular timing constraint, but is a
way of modelling all the associated constraints.

The DRAM controller has some minor changes to prioritize commands to
the same rank. This prioritization will only occur when the command
stream is not switching from a read to write or vice versa (in the
case of switching we have a gap in any case).

To prioritize commands to the same rank, the model will determine if there are
any commands queued (same type) to the same rank as the previous command.
This check will ensure that the 'same rank' command will be able to execute
without adding bubbles to the command flow, e.g. any ACT delay requirements
can be done under the hoods, allowing the burst to issue seamlessly.
2014-09-20 17:17:57 -04:00
Wendy Elsasser a384525355 cpu: Update DRAM traffic gen
Add new DRAM_ROTATE mode to traffic generator.
This mode will generate DRAM traffic that rotates across
banks per rank, command types, and ranks per channel

The looping order is illustrated below:
for (ranks per channel)
   for (command types)
      for (banks per rank)
         // Generate DRAM Command Series

This patch also adds the read percentage as an input argument to the
DRAM sweep script. If the simulated read percentage is 0 or 100, the
middle for loop does not generate additional commands.  This loop is
used only when the read percentage is set to 50, in which case the
middle loop will toggle between read and write commands.

Modified sweep.py script, which generates DRAM traffic.
Added input arguments and support for new DRAM_ROTATE mode.
The script now has input arguments for:
 1) Read percentage
 2) Number of ranks
 3) Address mapping
 4) Traffic generator mode  (DRAM or DRAM_ROTATE)

The default values are:
 100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode

For the DRAM traffic mode, added multi-rank support.
2014-09-20 17:17:55 -04:00
Andreas Sandberg 3f7a9348dd dev: Add support for 9p proxying over VirtIO
This patch adds support for 9p filesystem proxying over VirtIO. It can
currently operate by connecting to a 9p server over a socket
(VirtIO9PSocket) or by starting the diod 9p server and connecting over
pipe (VirtIO9PDiod).


*WARNING*: Checkpoints are currently not supported for systems with 9p
 proxies!
2014-09-20 17:17:54 -04:00
Andreas Sandberg 8c070c8f1b dev: Add a VirtIO block device model 2014-09-20 17:17:53 -04:00
Andreas Sandberg b8c9b04bd6 dev: Add a VirtIO console device model 2014-09-20 17:17:52 -04:00
Andreas Sandberg bf2c2183c6 dev, pci: Implement basic VirtIO support
This patch adds support for VirtIO over the PCI bus. It does so by
providing the following new SimObjects:

 * VirtIODeviceBase - Abstract base class for VirtIO devices.
 * PciVirtIO - VirtIO PCI transport interface.

A VirtIO device is hooked up to the guest system by adding a PciVirtIO
device to the PCI bus and connecting it to a VirtIO device using the
vio parameter.

New VirtIO devices should inherit from VirtIODevice base and
implementing one or more VirtQueues. The VirtQueues are usually
device-specific and all derive from the VirtQueue class. Queues must
be registered with the base class from the constructor since the
device assumes that the number of queues stay constant.
2014-09-20 17:17:51 -04:00
Andreas Sandberg 0c5139310d dev: Refactor terminal<->UART interface to make it more generic
The terminal currently assumes that the transport to the guest always
inherits from the Uart class. This assumption breaks when
implementing, for example, a VirtIO consoles. This patch removes this
assumption by adding pointer to the from the terminal to the uart and
replacing it with a more general callback interface. The Uart, or any
other class using the terminal, class implements an instance of the
callbacks class and registers it with the terminal.
2014-09-20 17:17:50 -04:00
Andreas Hansson 0fa128bbd0 base: Clean up redundant string functions and use C++11
This patch does a bit of housekeeping on the string helper functions
and relies on the C++11 standard library where possible. It also does
away with our custom string hash as an implementation is already part
of the standard library.
2014-09-20 17:17:49 -04:00
Andrew Bardsley b2c2e67468 base: Add getSectionNames to IniFile
Add an accessor to IniFile to list all the sections in the file.
2014-09-20 17:17:47 -04:00
Mitch Hayenga 4f0e3cd4d7 cpu: Add ExecFlags debug flag
Adds a debug flag to print out the flags a instruction is tagged with.
2014-09-20 17:17:45 -04:00
Mitch Hayenga 3e5bf0c922 mem: Remove the GHB prefetcher from the source tree
There are two primary issues with this code which make it deserving of deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
   It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
2014-09-20 17:17:44 -04:00
Dam Sunwoo ca3513d630 cpu: use probes infrastructure to do simpoint profiling
Instead of having code embedded in cpu model to do simpoint profiling use
the probes infrastructure to do it.
2014-09-20 17:17:43 -04:00
Andrew Bardsley 7329c0e20b config: Cleanup .json config file generation
This patch 'completes' .json config files generation by adding in the
SimObject references and String-valued parameters not currently
printed.

TickParamValues are also changed to print in the same tick-value
format as in .ini files.

This allows .json files to describe a system as fully as the .ini files
currently do.

This patch adds a new function config_value (which mirrors ini_str) to
each ParamValue and to SimObject.  This function can then be explicitly
changed to give different .json and .ini printing behaviour rather than
being written in terms of ini_str.
2014-09-20 17:17:42 -04:00
Andreas Hansson 41fc8a573e arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
2014-09-19 10:35:18 -04:00
Andreas Hansson 619c5519fe cpu: Use a deque in o3 rename instruction queue
Switch from a list to a data structure with better data layout.
2014-09-19 10:35:14 -04:00
Andreas Hansson 586a219d11 base: Ensure the CP annotation compiles again
A bit of revamping to get the CP annotate functionality to compile.
2014-09-19 10:35:12 -04:00
Andreas Hansson efd5cf323a misc: Use safe_cast when assumptions are made about return value
This patch changes two dynamic_cast to safe_cast as we assume the
return value is not NULL (without checking).
2014-09-19 10:35:11 -04:00
Andreas Hansson 32c111eda4 misc: Restore ostream flags where needed
This patch ensures we adhere to the normal ostream usage rules, and
restore the flags after modifying them.
2014-09-19 10:35:09 -04:00
Andreas Hansson addfd89dce stats: Fix flow-control bug in Vector2D printing 2014-09-19 10:35:08 -04:00
Andreas Hansson f615c4aeb0 misc: Remove assertions ensuring unsigned values >= 0 2014-09-19 10:35:07 -04:00
Andreas Hansson 377f081251 mem: Check return value of checkFunctional in SimpleMemory
Simple fix to ensure we only iterate until we are done.
2014-09-19 10:35:06 -04:00
Andreas Hansson 38646d48eb mem: Add checks to sendTimingReq in cache
A small fix to ensure the return value is not ignored.
2014-09-19 10:35:04 -04:00
Nilay Vaish 2ccdfc547d ruby: network: revert some of the changes from ad9c042dce54
The changeset ad9c042dce54 made changes to the structures under the network
directory to use a map of buffers instead of vector of buffers.
The reasoning was that not all vnets that are created are used and we
needlessly allocate more buffers than required and then iterate over them
while processing network messages.  But the move to map resulted in a slow
down which was pointed out by Andreas Hansson.  This patch moves things
back to using vector of message buffers.
2014-09-15 16:19:38 -05:00
Andrew Bardsley 1a45a8c5d3 cpu: Fix memory access in Minor not setting parent Request flags
This patch fixes cases where uncacheable/memory type flags are not set
correctly on a memory op which is split in the LSQ.  Without this
patch, request->request if freely used to check flags where the flags
should actually come from the accumulation of request fragment flags.

This patch also fixes a bug where an uncacheable access which passes
through tryToSendRequest more than once can increment
LSQ::numAccessesInMemorySystem more than once.
2014-09-12 10:22:49 -04:00
Andrew Bardsley c8b919aba2 style: Fix line continuation, especially in debug messages
This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
2014-09-12 10:22:47 -04:00
Andreas Hansson 2b4906fc64 minor: Fix typo in DPRINTF for Minor branch prediction 2014-09-12 10:22:46 -04:00
Andreas Sandberg 53a24b01ab sim: Automatically unregister probe listeners
The ProbeListener base class automatically registers itself with a
probe manager. Currently, the class does not unregister a itself when
it is destroyed, which makes removing probes listeners somewhat
cumbersome. This patch adds an automatic call to
manager->removeListener in the ProbeListener destructor, which solves
the problem.
2014-09-09 04:36:43 -04:00
Geoffrey Blake b0e4de667a config: Fix vectorparam command line parsing
Parsing vectorparams from the command was slightly broken
in that it wouldn't accept the input that the help message
provided to the user and it didn't do the conversion
on the second code path used to convert the string input
to the actual internal representation.  This patch fixes these bugs.
2014-09-09 04:36:34 -04:00
Mitch Hayenga cd1bd7572a cpu: Only iterate over possible threads on the o3 cpu
Some places in O3 always iterated over "Impl::MaxThreads" even if a CPU had
fewer threads.  This removes a few of those instances.
2014-09-09 04:36:34 -04:00
Mitch Hayenga 9a595fac74 mem: Add accessor function for vaddr
Determine if a request has an associated virtual address.
2014-09-09 04:36:33 -04:00
Andreas Sandberg 11494c4345 sim: Fix resource leak in BaseGlobalEvent
Static analysis revealed that BaseGlobalEvent::barrier was never
deallocated. This changeset solves this leak by making the barrier
allocation a part of the BaseGlobalEvent instead of storing a pointer
to a separate heap-allocated barrier.
2014-09-09 04:36:32 -04:00
Andreas Hansson da4539dc74 misc: Fix a number of unitialised variables and members
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
2014-09-09 04:36:31 -04:00
Ali Saidi 346fe73370 dev: seperate legacy io offsets from PCI offset
The PC platform has a single IO range that is used both legacy IO and PCI IO
while other platforms may use seperate regions. Provide another mechanism to
configure the legacy IO base address range and set it to the PCI IO address
range for x86.
2014-09-03 07:43:06 -04:00
Ali Saidi 1c0ae90027 arm: Support >2GB of memory for AArch64 systems 2014-09-03 07:43:05 -04:00
Ali Saidi 1e13f1b074 dev, arm: Add support for linux generic pci host driver
This change adds support for a generic pci host bus driver that
has been included in recent Linux kernel instead of the more
bespoke one we've been using to date. It also works with
aarch64 so it provides PCI support for 64-bit ARM Linux.

To make this work a new configuration option pci_io_base is added
to the RealView platform that should be set to the start of
the memory used as memory mapped IO ports (IO ports that are
memory mapped, not regular memory mapped IO). And a parameter
pci_cfg_gen_offsets which specifies if the config space
offsets should be used that the generic driver expects.

To use the pci-host-generic device you need to:
pci_io_base = 0x2f000000 (Valid for VExpress EMM)
pci_cfg_gen_offsets = True

and add the following to your device tree:

    pci {
        compatible = "pci-host-ecam-generic";
        device_type = "pci";
        #address-cells = <0x3>;
        #size-cells = <0x2>;
        #interrupt-cells = <0x1>;
        //bus-range = <0x0 0x1>;

        // CPU_PHYSICAL(2)  SIZE(2)
        // Note, some DTS blobs only support 1 size
        reg = <0x0 0x30000000 0x0 0x10000000>;

        // IO (1), no bus address (2), cpu address (2), size (2)
        // MMIO (1), at address (2), cpu address (2), size (2)
        ranges = <0x01000000 0x0 0x00000000 0x0 0x2f000000 0x0 0x10000>,
                 <0x02000000 0x0 0x40000000 0x0 0x40000000 0x0 0x10000000>;

        // With gem5 we typically use INTA/B/C/D one per device
        interrupt-map = <0x0000 0x0 0x0 0x1 0x1 0x0 0x11 0x1
                         0x0000 0x0 0x0 0x2 0x1 0x0 0x12 0x1
                         0x0000 0x0 0x0 0x3 0x1 0x0 0x13 0x1
                         0x0000 0x0 0x0 0x4 0x1 0x0 0x14 0x1>;

        // Only match INTA/B/C/D and not BDF
        interrupt-map-mask = <0x0000 0x0 0x0 0x7>;
    };
2014-09-03 07:43:04 -04:00
Geoffrey Blake 31e4e475d9 config: Add port splicing capability to PortRef class
The new configuration scripts need the ability to splice
a simobject between a pair of ports that are already connected.
The primary use case is when a CommMonitor needs to be
created after the system is configured and then spliced between
the pair of ports it will monitor.
2014-09-03 07:43:03 -04:00
Geoffrey Blake 845e199934 config: Refactor RealviewEMM to fit into new config system
This eliminates some default devices and adds in helper functions
to connect the devices defined here to associate with the proper
clock domains.
2014-09-03 07:43:01 -04:00
Andreas Hansson 83a46bfc09 base: Use STL C++11 random number generation
This patch changes the random number generator from the in-house
Mersenne twister to an implementation relying entirely on C++11 STL.

The format for the checkpointing of the twister is simplified. As the
functionality was never used this should not matter. Note that this
patch does not actually make use of the checkpointing
functionality. As the random number generator is not thread safe, it
may be sensible to create one generator per thread, system, or even
object. Until this is decided the status quo is maintained in that no
generator state is part of the checkpoint.
2014-09-03 07:42:55 -04:00
Andreas Hansson 2698e73966 base: Use the global Mersenne twister throughout
This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.

As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.

Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calles random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global ranom generator is not thread safe at this point.

Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
2014-09-03 07:42:54 -04:00
Andreas Hansson 1ff4c45bbb mem: Avoid unecessary retries when bus peer is not ready
This patch removes unecessary retries that happened when the bus layer
itself was no longer busy, but the the peer was not yet ready. Instead
of sending a retry that will inevitably not succeed, the bus now
silenty waits until the peer sends a retry.
2014-09-03 07:42:53 -04:00
Mitch Hayenga 8f95144e16 arm: Make memory ops work on 64bit/128-bit quantities
Multiple instructions assume only 32-bit load operations are available,
this patch increases load sizes to 64-bit or 128-bit for many load pair and
load multiple instructions.
2014-09-03 07:42:52 -04:00
Curtis Dunham f6f63ec0aa mem: write streaming support via WriteInvalidate promotion
Support full-block writes directly rather than requiring RMW:
 * a cache line is allocated in the cache upon receipt of a
   WriteInvalidateReq, not the WriteInvalidateResp.
 * only top-level caches allocate the line; the others just pass
   the request along and invalidate as necessary.
 * to close a timing window between the *Req and the *Resp, a new
   metadata bit tracks whether another cache has read a copy of
   the new line before the writeback to memory.
2014-06-27 12:29:00 -05:00
Andreas Hansson 3be4f4b846 mem: Fix a bug in the cache port flow control
This patch fixes a bug in the cache port where the retry flag was
reset too early, allowing new requests to arrive before the retry was
actually sent, but with the event already scheduled. This caused a
deadlock in the interactions with the O3 LSQ.

The patche fixes the underlying issue by shifting the resetting of the
flag to be done by the event that also calls sendRetry(). The patch
also tidies up the flow control in recvTimingReq and ensures that we
also check if we already have a retry outstanding.
2014-09-03 07:42:50 -04:00
Curtis Dunham 5d029463ee cpu, mem: Make software prefetches non-blocking
Previously, they were treated so much like loads that they could stall
at the head of the ROB.  Now they are always treated like L1 hits.
If they actually miss, a new request is created at the L1 and tracked
from the MSHRs there if necessary (i.e. if it didn't coalesce with
an existing outstanding load).
2014-05-13 12:20:49 -05:00
Curtis Dunham e3b19cb294 mem: Refactor assignment of Packet types
Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
2014-05-13 12:20:48 -05:00
Mitch Hayenga afbae1ec95 x86: Flag instructions that call suspend as IsQuiesce
The o3 cpu relies upon instructions that suspend a thread context being
flagged as "IsQuiesce".  If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.
2014-09-03 07:42:46 -04:00
Mitch Hayenga 659bdc1a6b cpu: Fix o3 drain bug
For X86, the o3 CPU would get stuck with the commit stage not being
drained if an interrupt arrived while drain was pending. isDrained()
makes sure that pcState.microPC() == 0, thus ensuring that we are at
an instruction boundary. However, when we take an interrupt we
execute:

    pcState.upc(romMicroPC(entry));
    pcState.nupc(romMicroPC(entry) + 1);
    tc->pcState(pcState);

As a result, the MicroPC is no longer zero. This patch ensures the drain is
delayed until no interrupts are present.  Once draining, non-synchronous
interrupts are deffered until after the switch.
2014-09-03 07:42:45 -04:00
Mitch Hayenga bb1e6cf7c4 arm: Fix v8 neon latency issue for loads/stores
Neon memory ops that operate on multiple registers currently have very poor
performance because of interleave/deinterleave micro-ops.

This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such
that they take minumum cycles to execute and are never resource constrained.

Additionaly the micro-ops over-read registers.  Although one form may need
to read up to 20 sources, not all do.  This adds in new forms so false
dependencies are not modeled.  Instructions read their minimum number of
sources.
2014-09-03 07:42:44 -04:00
Curtis Dunham 4a3f11149d arm: use condition code registers for ARM ISA
Analogous to ee049bf (for x86).  Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
2014-04-29 16:05:02 -05:00
Andrew Bardsley 035a82ee2c arm: ISA X31 destination register fix
This patch substituted the zero register for X31 used as a
destination register.  This prevents false dependencies based on
X31.
2014-09-03 07:42:43 -04:00
Dam Sunwoo 5008a20aa4 cpu: fix bimodal predictor to use correct global history reg
A small bug in the bimodal predictor caused significant degradation in
performance on some benchmarks. This was caused by using the wrong
globalHistoryReg during the update phase. This patches fixes the bug
and brings the performance to normal level.
2014-09-03 07:42:41 -04:00
Mitch Hayenga 476c6fe368 arm: Mark v7 cbz instructions as direct branches
v7 cbz/cbnz instructions were improperly marked as indirect branches.
2014-09-03 07:42:40 -04:00
Mitch Hayenga 4f13f676aa cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu.  Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width).  This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
2014-09-03 07:42:39 -04:00
Mitch Hayenga 283935a6f0 cpu: Fix o3 quiesce fetch bug
O3 is supposed to stop fetching instructions once a quiesce is encountered.
However due to a bug, it would continue fetching instructions from the current
fetch buffer.  This is because of a break statment that only broke out of the
first of 2 nested loops.  It should have broken out of both.
2014-09-03 07:42:38 -04:00
Mitch Hayenga 4f26bedc18 cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT
mode.

This is because it maintained an independent priority list of threads from the
active thread list.  This priority list could be come stale once threads were
inactive, leading to the cpu trying to fetch/commit from inactive threads.


Additionally the fetch queue is now forcibly flushed of instrctuctions
from the de-scheduled thread.

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
2014-09-03 07:42:37 -04:00
Mitch Hayenga daedc5a491 cpu: Fix incorrect speculative branch predictor behavior
When a branch mispredicted gem5 would squash all history after and including
the mispredicted branch.  However, the mispredicted branch is still speculative
and its history is required to rollback state if another, older, branch
mispredicts.  This leads to things like RAS corruption.
2014-09-03 07:42:36 -04:00
Mitch Hayenga ecd5300971 cpu: Add a fetch queue to the o3 cpu
This patch adds a fetch queue that sits between fetch and decode to the
o3 cpu.  This effectively decouples fetch from decode stalls allowing it
to be more aggressive, running futher ahead in the instruction stream.
2014-09-03 07:42:35 -04:00
Mitch Hayenga 1716749c8c cpu: Fix o3 front-end pipeline interlock behavior
The o3 pipeline interlock/stall logic is incorrect.  o3 unnecessicarily stalled
fetch and decode due to later stages in the pipeline.  In general, a stage
should usually only consider if it is stalled by the adjacent, downstream stage.
Forcing stalls due to later stages creates and results in bubbles in the
pipeline.  Additionally, o3 stalled the entire frontend (fetch, decode, rename)
on a branch mispredict while the ROB is being serially walked to update the
RAT (robSquashing). Only should have stalled at rename.
2014-09-03 07:42:34 -04:00
Mitch Hayenga 976f27487b cpu: Change writeback modeling for outstanding instructions
As highlighed on the mailing list gem5's writeback modeling can impact
performance.  This patch removes the limitation on maximum outstanding issued
instructions, however the number that can writeback in a single cycle is still
respected in instToCommit().
2014-09-03 07:42:33 -04:00
Mitch Hayenga fd722946dd arch: Properly guess OpClass from optional StaticInst flags
isa_parser.py guesses the OpClass if none were given based upon the StaticInst
flags.  The existing code does not take into account optionally set flags.
This code hoists the setting of optional flags so OpClass is properly assigned.
2014-09-03 07:42:32 -04:00
Geoffrey Blake b404ffde60 cache: Fix handling of LL/SC requests under contention
If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data.  This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus.  If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends and SCUpgrade
they will race to see who will win the SC access.  In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction.  If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock.  This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
2014-09-03 07:42:31 -04:00
Curtis Dunham 12210ada54 arm: support 16kb vm granules 2014-05-27 11:00:56 -05:00
Andreas Hansson 77c28cc395 mem: Packet queue clean up
No change in functionality, just a bit of tidying up.
2014-09-03 07:42:28 -04:00
Mitch Hayenga 71769d2d7b dev: Avoid invalid sized reads in PL390 with DPRINTF enabled
The first DPRINTF() in PL390::writeDistributor always read a uint32_t, though a
packet may have only been 1 or 2 bytes.  This caused an assertion in
packet->get().
2014-09-03 07:42:27 -04:00
Andrew Bardsley 87f6034462 sim: Fix checkpoint restore for Ticked
This patch makes restoring the 'lastStopped' value for Ticked-containing
objects (including MinorCPU) optional so that Ticked-containing objects
can be restored from non-Ticked-containing objects (such as AtomicSimpleCPU).
2014-09-03 07:42:25 -04:00
Andreas Sandberg 326662b01b arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
2014-09-03 07:42:22 -04:00
Andreas Hansson e1ac962939 arch: Cleanup unused ISA traits constants
This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
2014-09-03 07:42:21 -04:00
Mitch Hayenga 23c8540756 config: Change parsing of Addr so hex values work from scripts
When passed from a configuration script with a hexadecimal value (like
"0x80000000"), gem5 would error out. This is because it would call
"toMemorySize" which requires the argument to end with a size specifier (like
1MB, etc).

This modification makes it so raw hex values can be passed through Addr
parameters from the configuration scripts.
2014-09-03 07:42:20 -04:00
Andreas Hansson 1046b8d6e5 arm: Fix ExtMachInst hash operator underlying type
This patch fixes the hash operator used for ARM ExtMachInst, which
incorrectly was still using uint32_t. Instead of changing it to
uint64_t it is not using the underlying data type of the BitUnion.
2014-09-03 07:42:19 -04:00
Nilay Vaish 2cbe7c705b ruby: remove typedef of Index as int64
The Index type defined as typedef int64 does not really provide any help
since in most places we use primitive types instead of Index.  Also, the name
Index is very generic that it does not merit being used as a typename.
2014-09-01 16:55:50 -05:00
Nilay Vaish 4ccdf8fb81 x86: set op class of two fp instructions
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
2014-09-01 16:55:49 -05:00
Nilay Vaish b4dade6fb2 ruby: PerfectSwitch: moves code to a per vnet helper function
This patch moves code from the wakeup() function to a operateVnet().
The aim is to improve the readiblity of the code.
2014-09-01 16:55:48 -05:00
Nilay Vaish 7a0d5aafe4 ruby: message buffers: significant changes
This patch is the final patch in a series of patches.  The aim of the series
is to make ruby more configurable than it was.  More specifically, the
connections between controllers are not at all possible (unless one is ready
to make significant changes to the coherence protocol).  Moreover the buffers
themselves are magically connected to the network inside the slicc code.
These connections are not part of the configuration file.

This patch makes changes so that these connections will now be made in the
python configuration files associated with the protocols.  This requires
each state machine to expose the message buffers it uses for input and output.
So, the patch makes these buffers configurable members of the machines.

The patch drops the slicc code that usd to connect these buffers to the
network.  Now these buffers are exposed to the python configuration system
as Master and Slave ports.  In the configuration files, any master port
can be connected any slave port.  The file pyobject.cc has been modified to
take care of allocating the actual message buffer.  This is inline with how
other port connections work.
2014-09-01 16:55:47 -05:00
Nilay Vaish 00286fc5cb build opts: add MI_example to NULL ISA
A later changeset changes the file src/python/swig/pyobject.cc to include
a header file that includes a header file generated at build time depending
on the PROTOCOL in use.  Since NULL ISA was not specifying any protocol,
this resulted in compilation problems.  Hence, the changeset.
2014-09-01 16:55:46 -05:00
Nilay Vaish d07abd9b5b mem: change the namespace Message to ProtoMessage
The namespace Message conflicts with the Message data type used extensively
in Ruby.  Since Ruby is being moved to the same Master/Slave ports based
configuration style as the rest of gem5, this conflict needs to be resolved.
Hence, the namespace is being renamed to ProtoMessage.
2014-09-01 16:55:46 -05:00
Nilay Vaish cee8faaad0 ruby: slicc: change the way configurable members are specified
There are two changes this patch makes to the way configurable members of a
state machine are specified in SLICC.  The first change is that the data
member declarations will need to be separated by a semi-colon instead of a
comma.  Secondly, the default value to be assigned would now use SLICC's
assignment operator i.e. ':='.
2014-09-01 16:55:45 -05:00