Commit graph

6622 commits

Author SHA1 Message Date
Andreas Hansson 6498ccddb2 misc: Fix issues identified by static analysis
Another bunch of issues addressed.
2014-10-01 08:05:54 -04:00
Andreas Hansson b520223699 arm: Use MiscRegIndex rather than int when flattening
Some additional type checking to avoid future issues.
2014-10-01 08:05:52 -04:00
Andreas Hansson 10f82934be arm: More UBSan cleanups after additional full-system runs
Some incorrect casting to IntRegIndex, and a few uninitialized members
in the i8254xGBe device.
2014-10-01 08:05:51 -04:00
Andreas Hansson ec41000dad arm: Fixed undefined behaviours identified by gcc
This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
2014-09-27 09:08:37 -04:00
Andreas Hansson 341dbf2662 arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
2014-09-27 09:08:36 -04:00
Andreas Hansson deb2200671 scons: Address issues related to gcc 4.9.1
Fix a number few minor issues to please gcc 4.9.1. Removing the
'-fuse-linker-plugin' flag means no libraries are part of the LTO
process, but hopefully this is an acceptable loss, as the flag causes
issues on a lot of systems (only certain combinations of gcc, ld and
ar work).
2014-09-27 09:08:34 -04:00
Curtis Dunham 4836aef1e4 dev: Output invalid access size in IsaFake panic 2014-09-27 09:08:33 -04:00
Curtis Dunham b7f1d675da mem: Output precise range when XBar has conflicts 2014-09-27 09:08:32 -04:00
Curtis Dunham 725be98fe8 mem: Provide better diagnostic for unconnected port
When _masterPort is null, a message to that effect is
more helpful than a segfault.
2014-09-27 09:08:30 -04:00
Andreas Hansson de62aedabc misc: Fix a bunch of minor issues identified by static analysis
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
2014-09-27 09:08:29 -04:00
Mitch Hayenga cc6523e2d6 cpu: Remove unused deallocateContext calls
The call paths for de-scheduling a thread are halt() and suspend(), from
the thread context. There is no call to deallocateContext() in general,
though some CPUs chose to define it. This patch removes the function
from BaseCPU and the cores which do not require it.
2014-09-20 17:18:36 -04:00
Mitch Hayenga e1403fc2af alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
2014-09-20 17:18:35 -04:00
Andreas Hansson 1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00
Stephan Diestelhorst 435f4aec3d mem: Add access statistics for the snoop filter
Adds a simple access counter for requests and snoops for the snoop filter and
also classifies hits based on whether a single other holder existed or whether
multiple shares held the line.
2014-04-25 12:36:16 +01:00
Stephan Diestelhorst afa2428eca mem: Tie in the snoop filter in the coherent bus 2014-09-20 17:18:29 -04:00
Stephan Diestelhorst 7d488cc66f mem: Add a simple snoop counter per bus
This patch adds a simple counter for both total messages and a histogram for
the fan-out of snoop messages.  The fan-out describes to how many ports snoops
had to be sent per incoming request / snoop-from-below.  Without any
cleverness, this usually means to either all, or all but the requesting port.
2014-04-24 13:28:47 +01:00
Stephan Diestelhorst fe98cb6be4 misc: Add functions for doing popcount and power-of-two checking
Adds two public domain algorithms for determining number of set bits and also
whether a value is a power of two, uses the builtin that is available in GCC
and clang for popcount.
2014-04-24 17:41:26 +01:00
Stephan Diestelhorst ba98d598ae mem: Simple Snoop Filter
This is a first cut at a simple snoop filter that tracks presence of lines in
the caches "above" it. The snoop filter can be applied at any given cache
hierarchy and will then handle the caches above it appropriately; there is no
need to use this only in the last-level bus.

This design currently has some limitations: missing stats, no notion of clean
evictions (these will not update the underlying snoop filter, because they are
not sent from the evicting cache down), no notion of capacity for the snoop
filter and thus no need for invalidations caused by capacity pressure in the
snoop filter. These are planned to be added on top with future change sets.
2014-09-20 17:18:26 -04:00
Stephan Diestelhorst 16351ba8d6 energy: Tighter checking of levels for DFS systems
There are cases where users might by accident / intention specify less voltage
operating points thatn frequency points.  We consider one of these cases
special: giving only a single voltage to a voltage domain effectively renders
it as a static domain.  This patch adds additional logic in the auxiliary parts
of the functionality to handle these cases properly (simple driver asking for
N>1 operating levels, we should return the same voltage for all of them) and
adds error checking code in the voltage domain.
2014-08-12 19:00:44 +01:00
Stephan Diestelhorst 65aaf62714 energy: Add the Energy Controller in the right configs
Tie in the newly created energy controller components in the default
configurations.
2014-07-25 13:36:23 +01:00
Akash Bagdia 04e51e5e3e energy: Memory-mapped Energy Controller component
This patch provides an Energy Controller device that provides software
(driver) access to a DVFS handler. The device is currently residing in
the dev/arm tree, but there is nothing inherently ARM specific in the
behaviour. It is currently only tested and supported for ARM Linux,
hence the location.
2014-09-20 17:18:23 -04:00
Stephan Diestelhorst 4422d1322a energy: Small extentions and fixes for DVFS handler
These additions allow easier interoperability with and querying from an
additional controller which will be in a separate patch.  Also adding warnings
for changing the enabled state of the handler across checkpoint / resume and
deviating from the state in the configuration.

Contributed-by: Akash Bagdia <akash.bagdia@arm.com>
2014-06-16 14:59:44 +01:00
Wendy Elsasser bf23847072 mem: Add DDR4 bank group timing
Added the following parameter to the DRAMCtrl class:
 - bank_groups_per_rank

This defaults to 1. For the DDR4 case, the default is overridden to indicate
bank group architecture, with multiple bank groups per rank.

Added the following delays to the DRAMCtrl class:
 - tCCD_L : CAS-to-CAS, same bank group delay
 - tRRD_L : RAS-to-RAS, same bank group delay

These parameters are only applied when bank group timing is enabled.  Bank
group timing is currently enabled only for DDR4 memories.

For all other memories, these delays will default to '0 ns'

In the DRAM controller model, applied the bank group timing to the per bank
parameters actAllowedAt and colAllowedAt.
The actAllowedAt will be updated based on bank group when an ACT is issued.
The colAllowedAt will be updated based on bank group when a RD/WR burst is
issued.

At the moment no modifications are made to the scheduling.
2014-09-20 17:18:21 -04:00
Wendy Elsasser b6ecfe9183 mem: Add memory rank-to-rank delay
Add the following delay to the DRAM controller:
 - tCS : Different rank bus turnaround delay

This will be applied for
 1) read-to-read,
 2) write-to-write,
 3) write-to-read, and
 4) read-to-write
command sequences, where the new command accesses a different rank
than the previous burst.

The delay defaults to 2*tCK for each defined memory class. Note that
this does not correspond to one particular timing constraint, but is a
way of modelling all the associated constraints.

The DRAM controller has some minor changes to prioritize commands to
the same rank. This prioritization will only occur when the command
stream is not switching from a read to write or vice versa (in the
case of switching we have a gap in any case).

To prioritize commands to the same rank, the model will determine if there are
any commands queued (same type) to the same rank as the previous command.
This check will ensure that the 'same rank' command will be able to execute
without adding bubbles to the command flow, e.g. any ACT delay requirements
can be done under the hoods, allowing the burst to issue seamlessly.
2014-09-20 17:17:57 -04:00
Wendy Elsasser a384525355 cpu: Update DRAM traffic gen
Add new DRAM_ROTATE mode to traffic generator.
This mode will generate DRAM traffic that rotates across
banks per rank, command types, and ranks per channel

The looping order is illustrated below:
for (ranks per channel)
   for (command types)
      for (banks per rank)
         // Generate DRAM Command Series

This patch also adds the read percentage as an input argument to the
DRAM sweep script. If the simulated read percentage is 0 or 100, the
middle for loop does not generate additional commands.  This loop is
used only when the read percentage is set to 50, in which case the
middle loop will toggle between read and write commands.

Modified sweep.py script, which generates DRAM traffic.
Added input arguments and support for new DRAM_ROTATE mode.
The script now has input arguments for:
 1) Read percentage
 2) Number of ranks
 3) Address mapping
 4) Traffic generator mode  (DRAM or DRAM_ROTATE)

The default values are:
 100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode

For the DRAM traffic mode, added multi-rank support.
2014-09-20 17:17:55 -04:00
Andreas Sandberg 3f7a9348dd dev: Add support for 9p proxying over VirtIO
This patch adds support for 9p filesystem proxying over VirtIO. It can
currently operate by connecting to a 9p server over a socket
(VirtIO9PSocket) or by starting the diod 9p server and connecting over
pipe (VirtIO9PDiod).


*WARNING*: Checkpoints are currently not supported for systems with 9p
 proxies!
2014-09-20 17:17:54 -04:00
Andreas Sandberg 8c070c8f1b dev: Add a VirtIO block device model 2014-09-20 17:17:53 -04:00
Andreas Sandberg b8c9b04bd6 dev: Add a VirtIO console device model 2014-09-20 17:17:52 -04:00
Andreas Sandberg bf2c2183c6 dev, pci: Implement basic VirtIO support
This patch adds support for VirtIO over the PCI bus. It does so by
providing the following new SimObjects:

 * VirtIODeviceBase - Abstract base class for VirtIO devices.
 * PciVirtIO - VirtIO PCI transport interface.

A VirtIO device is hooked up to the guest system by adding a PciVirtIO
device to the PCI bus and connecting it to a VirtIO device using the
vio parameter.

New VirtIO devices should inherit from VirtIODevice base and
implementing one or more VirtQueues. The VirtQueues are usually
device-specific and all derive from the VirtQueue class. Queues must
be registered with the base class from the constructor since the
device assumes that the number of queues stay constant.
2014-09-20 17:17:51 -04:00
Andreas Sandberg 0c5139310d dev: Refactor terminal<->UART interface to make it more generic
The terminal currently assumes that the transport to the guest always
inherits from the Uart class. This assumption breaks when
implementing, for example, a VirtIO consoles. This patch removes this
assumption by adding pointer to the from the terminal to the uart and
replacing it with a more general callback interface. The Uart, or any
other class using the terminal, class implements an instance of the
callbacks class and registers it with the terminal.
2014-09-20 17:17:50 -04:00
Andreas Hansson 0fa128bbd0 base: Clean up redundant string functions and use C++11
This patch does a bit of housekeeping on the string helper functions
and relies on the C++11 standard library where possible. It also does
away with our custom string hash as an implementation is already part
of the standard library.
2014-09-20 17:17:49 -04:00
Andrew Bardsley b2c2e67468 base: Add getSectionNames to IniFile
Add an accessor to IniFile to list all the sections in the file.
2014-09-20 17:17:47 -04:00
Mitch Hayenga 4f0e3cd4d7 cpu: Add ExecFlags debug flag
Adds a debug flag to print out the flags a instruction is tagged with.
2014-09-20 17:17:45 -04:00
Mitch Hayenga 3e5bf0c922 mem: Remove the GHB prefetcher from the source tree
There are two primary issues with this code which make it deserving of deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
   It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
2014-09-20 17:17:44 -04:00
Dam Sunwoo ca3513d630 cpu: use probes infrastructure to do simpoint profiling
Instead of having code embedded in cpu model to do simpoint profiling use
the probes infrastructure to do it.
2014-09-20 17:17:43 -04:00
Andrew Bardsley 7329c0e20b config: Cleanup .json config file generation
This patch 'completes' .json config files generation by adding in the
SimObject references and String-valued parameters not currently
printed.

TickParamValues are also changed to print in the same tick-value
format as in .ini files.

This allows .json files to describe a system as fully as the .ini files
currently do.

This patch adds a new function config_value (which mirrors ini_str) to
each ParamValue and to SimObject.  This function can then be explicitly
changed to give different .json and .ini printing behaviour rather than
being written in terms of ini_str.
2014-09-20 17:17:42 -04:00
Andreas Hansson 41fc8a573e arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
2014-09-19 10:35:18 -04:00
Andreas Hansson 619c5519fe cpu: Use a deque in o3 rename instruction queue
Switch from a list to a data structure with better data layout.
2014-09-19 10:35:14 -04:00
Andreas Hansson 586a219d11 base: Ensure the CP annotation compiles again
A bit of revamping to get the CP annotate functionality to compile.
2014-09-19 10:35:12 -04:00
Andreas Hansson efd5cf323a misc: Use safe_cast when assumptions are made about return value
This patch changes two dynamic_cast to safe_cast as we assume the
return value is not NULL (without checking).
2014-09-19 10:35:11 -04:00
Andreas Hansson 32c111eda4 misc: Restore ostream flags where needed
This patch ensures we adhere to the normal ostream usage rules, and
restore the flags after modifying them.
2014-09-19 10:35:09 -04:00
Andreas Hansson addfd89dce stats: Fix flow-control bug in Vector2D printing 2014-09-19 10:35:08 -04:00
Andreas Hansson f615c4aeb0 misc: Remove assertions ensuring unsigned values >= 0 2014-09-19 10:35:07 -04:00
Andreas Hansson 377f081251 mem: Check return value of checkFunctional in SimpleMemory
Simple fix to ensure we only iterate until we are done.
2014-09-19 10:35:06 -04:00
Andreas Hansson 38646d48eb mem: Add checks to sendTimingReq in cache
A small fix to ensure the return value is not ignored.
2014-09-19 10:35:04 -04:00
Nilay Vaish 2ccdfc547d ruby: network: revert some of the changes from ad9c042dce54
The changeset ad9c042dce54 made changes to the structures under the network
directory to use a map of buffers instead of vector of buffers.
The reasoning was that not all vnets that are created are used and we
needlessly allocate more buffers than required and then iterate over them
while processing network messages.  But the move to map resulted in a slow
down which was pointed out by Andreas Hansson.  This patch moves things
back to using vector of message buffers.
2014-09-15 16:19:38 -05:00
Andrew Bardsley 1a45a8c5d3 cpu: Fix memory access in Minor not setting parent Request flags
This patch fixes cases where uncacheable/memory type flags are not set
correctly on a memory op which is split in the LSQ.  Without this
patch, request->request if freely used to check flags where the flags
should actually come from the accumulation of request fragment flags.

This patch also fixes a bug where an uncacheable access which passes
through tryToSendRequest more than once can increment
LSQ::numAccessesInMemorySystem more than once.
2014-09-12 10:22:49 -04:00
Andrew Bardsley c8b919aba2 style: Fix line continuation, especially in debug messages
This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
2014-09-12 10:22:47 -04:00
Andreas Hansson 2b4906fc64 minor: Fix typo in DPRINTF for Minor branch prediction 2014-09-12 10:22:46 -04:00
Andreas Sandberg 53a24b01ab sim: Automatically unregister probe listeners
The ProbeListener base class automatically registers itself with a
probe manager. Currently, the class does not unregister a itself when
it is destroyed, which makes removing probes listeners somewhat
cumbersome. This patch adds an automatic call to
manager->removeListener in the ProbeListener destructor, which solves
the problem.
2014-09-09 04:36:43 -04:00
Geoffrey Blake b0e4de667a config: Fix vectorparam command line parsing
Parsing vectorparams from the command was slightly broken
in that it wouldn't accept the input that the help message
provided to the user and it didn't do the conversion
on the second code path used to convert the string input
to the actual internal representation.  This patch fixes these bugs.
2014-09-09 04:36:34 -04:00
Mitch Hayenga cd1bd7572a cpu: Only iterate over possible threads on the o3 cpu
Some places in O3 always iterated over "Impl::MaxThreads" even if a CPU had
fewer threads.  This removes a few of those instances.
2014-09-09 04:36:34 -04:00
Mitch Hayenga 9a595fac74 mem: Add accessor function for vaddr
Determine if a request has an associated virtual address.
2014-09-09 04:36:33 -04:00
Andreas Sandberg 11494c4345 sim: Fix resource leak in BaseGlobalEvent
Static analysis revealed that BaseGlobalEvent::barrier was never
deallocated. This changeset solves this leak by making the barrier
allocation a part of the BaseGlobalEvent instead of storing a pointer
to a separate heap-allocated barrier.
2014-09-09 04:36:32 -04:00
Andreas Hansson da4539dc74 misc: Fix a number of unitialised variables and members
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
2014-09-09 04:36:31 -04:00
Ali Saidi 346fe73370 dev: seperate legacy io offsets from PCI offset
The PC platform has a single IO range that is used both legacy IO and PCI IO
while other platforms may use seperate regions. Provide another mechanism to
configure the legacy IO base address range and set it to the PCI IO address
range for x86.
2014-09-03 07:43:06 -04:00
Ali Saidi 1c0ae90027 arm: Support >2GB of memory for AArch64 systems 2014-09-03 07:43:05 -04:00
Ali Saidi 1e13f1b074 dev, arm: Add support for linux generic pci host driver
This change adds support for a generic pci host bus driver that
has been included in recent Linux kernel instead of the more
bespoke one we've been using to date. It also works with
aarch64 so it provides PCI support for 64-bit ARM Linux.

To make this work a new configuration option pci_io_base is added
to the RealView platform that should be set to the start of
the memory used as memory mapped IO ports (IO ports that are
memory mapped, not regular memory mapped IO). And a parameter
pci_cfg_gen_offsets which specifies if the config space
offsets should be used that the generic driver expects.

To use the pci-host-generic device you need to:
pci_io_base = 0x2f000000 (Valid for VExpress EMM)
pci_cfg_gen_offsets = True

and add the following to your device tree:

    pci {
        compatible = "pci-host-ecam-generic";
        device_type = "pci";
        #address-cells = <0x3>;
        #size-cells = <0x2>;
        #interrupt-cells = <0x1>;
        //bus-range = <0x0 0x1>;

        // CPU_PHYSICAL(2)  SIZE(2)
        // Note, some DTS blobs only support 1 size
        reg = <0x0 0x30000000 0x0 0x10000000>;

        // IO (1), no bus address (2), cpu address (2), size (2)
        // MMIO (1), at address (2), cpu address (2), size (2)
        ranges = <0x01000000 0x0 0x00000000 0x0 0x2f000000 0x0 0x10000>,
                 <0x02000000 0x0 0x40000000 0x0 0x40000000 0x0 0x10000000>;

        // With gem5 we typically use INTA/B/C/D one per device
        interrupt-map = <0x0000 0x0 0x0 0x1 0x1 0x0 0x11 0x1
                         0x0000 0x0 0x0 0x2 0x1 0x0 0x12 0x1
                         0x0000 0x0 0x0 0x3 0x1 0x0 0x13 0x1
                         0x0000 0x0 0x0 0x4 0x1 0x0 0x14 0x1>;

        // Only match INTA/B/C/D and not BDF
        interrupt-map-mask = <0x0000 0x0 0x0 0x7>;
    };
2014-09-03 07:43:04 -04:00
Geoffrey Blake 31e4e475d9 config: Add port splicing capability to PortRef class
The new configuration scripts need the ability to splice
a simobject between a pair of ports that are already connected.
The primary use case is when a CommMonitor needs to be
created after the system is configured and then spliced between
the pair of ports it will monitor.
2014-09-03 07:43:03 -04:00
Geoffrey Blake 845e199934 config: Refactor RealviewEMM to fit into new config system
This eliminates some default devices and adds in helper functions
to connect the devices defined here to associate with the proper
clock domains.
2014-09-03 07:43:01 -04:00
Andreas Hansson 83a46bfc09 base: Use STL C++11 random number generation
This patch changes the random number generator from the in-house
Mersenne twister to an implementation relying entirely on C++11 STL.

The format for the checkpointing of the twister is simplified. As the
functionality was never used this should not matter. Note that this
patch does not actually make use of the checkpointing
functionality. As the random number generator is not thread safe, it
may be sensible to create one generator per thread, system, or even
object. Until this is decided the status quo is maintained in that no
generator state is part of the checkpoint.
2014-09-03 07:42:55 -04:00
Andreas Hansson 2698e73966 base: Use the global Mersenne twister throughout
This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.

As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.

Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calles random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global ranom generator is not thread safe at this point.

Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
2014-09-03 07:42:54 -04:00
Andreas Hansson 1ff4c45bbb mem: Avoid unecessary retries when bus peer is not ready
This patch removes unecessary retries that happened when the bus layer
itself was no longer busy, but the the peer was not yet ready. Instead
of sending a retry that will inevitably not succeed, the bus now
silenty waits until the peer sends a retry.
2014-09-03 07:42:53 -04:00
Mitch Hayenga 8f95144e16 arm: Make memory ops work on 64bit/128-bit quantities
Multiple instructions assume only 32-bit load operations are available,
this patch increases load sizes to 64-bit or 128-bit for many load pair and
load multiple instructions.
2014-09-03 07:42:52 -04:00
Curtis Dunham f6f63ec0aa mem: write streaming support via WriteInvalidate promotion
Support full-block writes directly rather than requiring RMW:
 * a cache line is allocated in the cache upon receipt of a
   WriteInvalidateReq, not the WriteInvalidateResp.
 * only top-level caches allocate the line; the others just pass
   the request along and invalidate as necessary.
 * to close a timing window between the *Req and the *Resp, a new
   metadata bit tracks whether another cache has read a copy of
   the new line before the writeback to memory.
2014-06-27 12:29:00 -05:00
Andreas Hansson 3be4f4b846 mem: Fix a bug in the cache port flow control
This patch fixes a bug in the cache port where the retry flag was
reset too early, allowing new requests to arrive before the retry was
actually sent, but with the event already scheduled. This caused a
deadlock in the interactions with the O3 LSQ.

The patche fixes the underlying issue by shifting the resetting of the
flag to be done by the event that also calls sendRetry(). The patch
also tidies up the flow control in recvTimingReq and ensures that we
also check if we already have a retry outstanding.
2014-09-03 07:42:50 -04:00
Curtis Dunham 5d029463ee cpu, mem: Make software prefetches non-blocking
Previously, they were treated so much like loads that they could stall
at the head of the ROB.  Now they are always treated like L1 hits.
If they actually miss, a new request is created at the L1 and tracked
from the MSHRs there if necessary (i.e. if it didn't coalesce with
an existing outstanding load).
2014-05-13 12:20:49 -05:00
Curtis Dunham e3b19cb294 mem: Refactor assignment of Packet types
Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
2014-05-13 12:20:48 -05:00
Mitch Hayenga afbae1ec95 x86: Flag instructions that call suspend as IsQuiesce
The o3 cpu relies upon instructions that suspend a thread context being
flagged as "IsQuiesce".  If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.
2014-09-03 07:42:46 -04:00
Mitch Hayenga 659bdc1a6b cpu: Fix o3 drain bug
For X86, the o3 CPU would get stuck with the commit stage not being
drained if an interrupt arrived while drain was pending. isDrained()
makes sure that pcState.microPC() == 0, thus ensuring that we are at
an instruction boundary. However, when we take an interrupt we
execute:

    pcState.upc(romMicroPC(entry));
    pcState.nupc(romMicroPC(entry) + 1);
    tc->pcState(pcState);

As a result, the MicroPC is no longer zero. This patch ensures the drain is
delayed until no interrupts are present.  Once draining, non-synchronous
interrupts are deffered until after the switch.
2014-09-03 07:42:45 -04:00
Mitch Hayenga bb1e6cf7c4 arm: Fix v8 neon latency issue for loads/stores
Neon memory ops that operate on multiple registers currently have very poor
performance because of interleave/deinterleave micro-ops.

This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such
that they take minumum cycles to execute and are never resource constrained.

Additionaly the micro-ops over-read registers.  Although one form may need
to read up to 20 sources, not all do.  This adds in new forms so false
dependencies are not modeled.  Instructions read their minimum number of
sources.
2014-09-03 07:42:44 -04:00
Curtis Dunham 4a3f11149d arm: use condition code registers for ARM ISA
Analogous to ee049bf (for x86).  Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
2014-04-29 16:05:02 -05:00
Andrew Bardsley 035a82ee2c arm: ISA X31 destination register fix
This patch substituted the zero register for X31 used as a
destination register.  This prevents false dependencies based on
X31.
2014-09-03 07:42:43 -04:00
Dam Sunwoo 5008a20aa4 cpu: fix bimodal predictor to use correct global history reg
A small bug in the bimodal predictor caused significant degradation in
performance on some benchmarks. This was caused by using the wrong
globalHistoryReg during the update phase. This patches fixes the bug
and brings the performance to normal level.
2014-09-03 07:42:41 -04:00
Mitch Hayenga 476c6fe368 arm: Mark v7 cbz instructions as direct branches
v7 cbz/cbnz instructions were improperly marked as indirect branches.
2014-09-03 07:42:40 -04:00
Mitch Hayenga 4f13f676aa cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu.  Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width).  This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
2014-09-03 07:42:39 -04:00
Mitch Hayenga 283935a6f0 cpu: Fix o3 quiesce fetch bug
O3 is supposed to stop fetching instructions once a quiesce is encountered.
However due to a bug, it would continue fetching instructions from the current
fetch buffer.  This is because of a break statment that only broke out of the
first of 2 nested loops.  It should have broken out of both.
2014-09-03 07:42:38 -04:00
Mitch Hayenga 4f26bedc18 cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT
mode.

This is because it maintained an independent priority list of threads from the
active thread list.  This priority list could be come stale once threads were
inactive, leading to the cpu trying to fetch/commit from inactive threads.


Additionally the fetch queue is now forcibly flushed of instrctuctions
from the de-scheduled thread.

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
2014-09-03 07:42:37 -04:00
Mitch Hayenga daedc5a491 cpu: Fix incorrect speculative branch predictor behavior
When a branch mispredicted gem5 would squash all history after and including
the mispredicted branch.  However, the mispredicted branch is still speculative
and its history is required to rollback state if another, older, branch
mispredicts.  This leads to things like RAS corruption.
2014-09-03 07:42:36 -04:00
Mitch Hayenga ecd5300971 cpu: Add a fetch queue to the o3 cpu
This patch adds a fetch queue that sits between fetch and decode to the
o3 cpu.  This effectively decouples fetch from decode stalls allowing it
to be more aggressive, running futher ahead in the instruction stream.
2014-09-03 07:42:35 -04:00
Mitch Hayenga 1716749c8c cpu: Fix o3 front-end pipeline interlock behavior
The o3 pipeline interlock/stall logic is incorrect.  o3 unnecessicarily stalled
fetch and decode due to later stages in the pipeline.  In general, a stage
should usually only consider if it is stalled by the adjacent, downstream stage.
Forcing stalls due to later stages creates and results in bubbles in the
pipeline.  Additionally, o3 stalled the entire frontend (fetch, decode, rename)
on a branch mispredict while the ROB is being serially walked to update the
RAT (robSquashing). Only should have stalled at rename.
2014-09-03 07:42:34 -04:00
Mitch Hayenga 976f27487b cpu: Change writeback modeling for outstanding instructions
As highlighed on the mailing list gem5's writeback modeling can impact
performance.  This patch removes the limitation on maximum outstanding issued
instructions, however the number that can writeback in a single cycle is still
respected in instToCommit().
2014-09-03 07:42:33 -04:00
Mitch Hayenga fd722946dd arch: Properly guess OpClass from optional StaticInst flags
isa_parser.py guesses the OpClass if none were given based upon the StaticInst
flags.  The existing code does not take into account optionally set flags.
This code hoists the setting of optional flags so OpClass is properly assigned.
2014-09-03 07:42:32 -04:00
Geoffrey Blake b404ffde60 cache: Fix handling of LL/SC requests under contention
If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data.  This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus.  If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends and SCUpgrade
they will race to see who will win the SC access.  In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction.  If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock.  This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
2014-09-03 07:42:31 -04:00
Curtis Dunham 12210ada54 arm: support 16kb vm granules 2014-05-27 11:00:56 -05:00
Andreas Hansson 77c28cc395 mem: Packet queue clean up
No change in functionality, just a bit of tidying up.
2014-09-03 07:42:28 -04:00
Mitch Hayenga 71769d2d7b dev: Avoid invalid sized reads in PL390 with DPRINTF enabled
The first DPRINTF() in PL390::writeDistributor always read a uint32_t, though a
packet may have only been 1 or 2 bytes.  This caused an assertion in
packet->get().
2014-09-03 07:42:27 -04:00
Andrew Bardsley 87f6034462 sim: Fix checkpoint restore for Ticked
This patch makes restoring the 'lastStopped' value for Ticked-containing
objects (including MinorCPU) optional so that Ticked-containing objects
can be restored from non-Ticked-containing objects (such as AtomicSimpleCPU).
2014-09-03 07:42:25 -04:00
Andreas Sandberg 326662b01b arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
2014-09-03 07:42:22 -04:00
Andreas Hansson e1ac962939 arch: Cleanup unused ISA traits constants
This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
2014-09-03 07:42:21 -04:00
Mitch Hayenga 23c8540756 config: Change parsing of Addr so hex values work from scripts
When passed from a configuration script with a hexadecimal value (like
"0x80000000"), gem5 would error out. This is because it would call
"toMemorySize" which requires the argument to end with a size specifier (like
1MB, etc).

This modification makes it so raw hex values can be passed through Addr
parameters from the configuration scripts.
2014-09-03 07:42:20 -04:00
Andreas Hansson 1046b8d6e5 arm: Fix ExtMachInst hash operator underlying type
This patch fixes the hash operator used for ARM ExtMachInst, which
incorrectly was still using uint32_t. Instead of changing it to
uint64_t it is not using the underlying data type of the BitUnion.
2014-09-03 07:42:19 -04:00
Nilay Vaish 2cbe7c705b ruby: remove typedef of Index as int64
The Index type defined as typedef int64 does not really provide any help
since in most places we use primitive types instead of Index.  Also, the name
Index is very generic that it does not merit being used as a typename.
2014-09-01 16:55:50 -05:00
Nilay Vaish 4ccdf8fb81 x86: set op class of two fp instructions
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
2014-09-01 16:55:49 -05:00
Nilay Vaish b4dade6fb2 ruby: PerfectSwitch: moves code to a per vnet helper function
This patch moves code from the wakeup() function to a operateVnet().
The aim is to improve the readiblity of the code.
2014-09-01 16:55:48 -05:00
Nilay Vaish 7a0d5aafe4 ruby: message buffers: significant changes
This patch is the final patch in a series of patches.  The aim of the series
is to make ruby more configurable than it was.  More specifically, the
connections between controllers are not at all possible (unless one is ready
to make significant changes to the coherence protocol).  Moreover the buffers
themselves are magically connected to the network inside the slicc code.
These connections are not part of the configuration file.

This patch makes changes so that these connections will now be made in the
python configuration files associated with the protocols.  This requires
each state machine to expose the message buffers it uses for input and output.
So, the patch makes these buffers configurable members of the machines.

The patch drops the slicc code that usd to connect these buffers to the
network.  Now these buffers are exposed to the python configuration system
as Master and Slave ports.  In the configuration files, any master port
can be connected any slave port.  The file pyobject.cc has been modified to
take care of allocating the actual message buffer.  This is inline with how
other port connections work.
2014-09-01 16:55:47 -05:00
Nilay Vaish 00286fc5cb build opts: add MI_example to NULL ISA
A later changeset changes the file src/python/swig/pyobject.cc to include
a header file that includes a header file generated at build time depending
on the PROTOCOL in use.  Since NULL ISA was not specifying any protocol,
this resulted in compilation problems.  Hence, the changeset.
2014-09-01 16:55:46 -05:00
Nilay Vaish d07abd9b5b mem: change the namespace Message to ProtoMessage
The namespace Message conflicts with the Message data type used extensively
in Ruby.  Since Ruby is being moved to the same Master/Slave ports based
configuration style as the rest of gem5, this conflict needs to be resolved.
Hence, the namespace is being renamed to ProtoMessage.
2014-09-01 16:55:46 -05:00
Nilay Vaish cee8faaad0 ruby: slicc: change the way configurable members are specified
There are two changes this patch makes to the way configurable members of a
state machine are specified in SLICC.  The first change is that the data
member declarations will need to be separated by a semi-colon instead of a
comma.  Secondly, the default value to be assigned would now use SLICC's
assignment operator i.e. ':='.
2014-09-01 16:55:45 -05:00
Nilay Vaish b1d3873ec5 ruby: slicc: improve the grammar
This patch changes the grammar for SLICC so as to remove some of the
redundant / duplicate rules.  In particular rules for object/variable
declaration and class member declaration have been unified. Similarly, the
rules for a general function and a class method have been unified.

One more change is in the priority of two rules.  The first rule is on
declaring a function with all the params typed and named.  The second rule is
on declaring a function with all the params only typed.  Earlier the second
rule had a higher priority.  Now the first rule has a higher priority.
2014-09-01 16:55:44 -05:00
Nilay Vaish 3202ec98e7 ruby: mesi three level: slight naming changes. 2014-09-01 16:55:44 -05:00
Nilay Vaish 557200725c ruby: slicc: donot prefix machine name to variables
This changeset does away with prefixing of member variables of state machines
with the identity of the machine itself.
2014-09-01 16:55:43 -05:00
Nilay Vaish 6ceb1aadc2 ruby: remove unused toString() from AbstractController 2014-09-01 16:55:42 -05:00
Nilay Vaish 00dbadcbb0 ruby: network: move getNumNodes() to base class
All the implementations were doing the same things.
2014-09-01 16:55:42 -05:00
Nilay Vaish cc2cc58869 ruby: eliminate type Time
There is another type Time in src/base class which results in a conflict.
2014-09-01 16:55:41 -05:00
Nilay Vaish 82d136285d ruby: move files from ruby/system to ruby/structures
The directory ruby/system is crowded and unorganized. Hence, the files the
hold actual physical structures, are being moved to the directory
ruby/structures.  This includes Cache Memory, Directory Memory,
Memory Controller, Wire Buffer, TBE Table, Perfect Cache Memory, Timer Table,
Bank Array.

The directory ruby/systems has the glue code that holds these structures
together.

--HG--
rename : src/mem/ruby/system/MachineID.hh => src/mem/ruby/common/MachineID.hh
rename : src/mem/ruby/buffers/MessageBuffer.cc => src/mem/ruby/network/MessageBuffer.cc
rename : src/mem/ruby/buffers/MessageBuffer.hh => src/mem/ruby/network/MessageBuffer.hh
rename : src/mem/ruby/buffers/MessageBufferNode.cc => src/mem/ruby/network/MessageBufferNode.cc
rename : src/mem/ruby/buffers/MessageBufferNode.hh => src/mem/ruby/network/MessageBufferNode.hh
rename : src/mem/ruby/system/AbstractReplacementPolicy.hh => src/mem/ruby/structures/AbstractReplacementPolicy.hh
rename : src/mem/ruby/system/BankedArray.cc => src/mem/ruby/structures/BankedArray.cc
rename : src/mem/ruby/system/BankedArray.hh => src/mem/ruby/structures/BankedArray.hh
rename : src/mem/ruby/system/Cache.py => src/mem/ruby/structures/Cache.py
rename : src/mem/ruby/system/CacheMemory.cc => src/mem/ruby/structures/CacheMemory.cc
rename : src/mem/ruby/system/CacheMemory.hh => src/mem/ruby/structures/CacheMemory.hh
rename : src/mem/ruby/system/DirectoryMemory.cc => src/mem/ruby/structures/DirectoryMemory.cc
rename : src/mem/ruby/system/DirectoryMemory.hh => src/mem/ruby/structures/DirectoryMemory.hh
rename : src/mem/ruby/system/DirectoryMemory.py => src/mem/ruby/structures/DirectoryMemory.py
rename : src/mem/ruby/system/LRUPolicy.hh => src/mem/ruby/structures/LRUPolicy.hh
rename : src/mem/ruby/system/MemoryControl.cc => src/mem/ruby/structures/MemoryControl.cc
rename : src/mem/ruby/system/MemoryControl.hh => src/mem/ruby/structures/MemoryControl.hh
rename : src/mem/ruby/system/MemoryControl.py => src/mem/ruby/structures/MemoryControl.py
rename : src/mem/ruby/system/MemoryNode.cc => src/mem/ruby/structures/MemoryNode.cc
rename : src/mem/ruby/system/MemoryNode.hh => src/mem/ruby/structures/MemoryNode.hh
rename : src/mem/ruby/system/MemoryVector.hh => src/mem/ruby/structures/MemoryVector.hh
rename : src/mem/ruby/system/PerfectCacheMemory.hh => src/mem/ruby/structures/PerfectCacheMemory.hh
rename : src/mem/ruby/system/PersistentTable.cc => src/mem/ruby/structures/PersistentTable.cc
rename : src/mem/ruby/system/PersistentTable.hh => src/mem/ruby/structures/PersistentTable.hh
rename : src/mem/ruby/system/PseudoLRUPolicy.hh => src/mem/ruby/structures/PseudoLRUPolicy.hh
rename : src/mem/ruby/system/RubyMemoryControl.cc => src/mem/ruby/structures/RubyMemoryControl.cc
rename : src/mem/ruby/system/RubyMemoryControl.hh => src/mem/ruby/structures/RubyMemoryControl.hh
rename : src/mem/ruby/system/RubyMemoryControl.py => src/mem/ruby/structures/RubyMemoryControl.py
rename : src/mem/ruby/system/SparseMemory.cc => src/mem/ruby/structures/SparseMemory.cc
rename : src/mem/ruby/system/SparseMemory.hh => src/mem/ruby/structures/SparseMemory.hh
rename : src/mem/ruby/system/TBETable.hh => src/mem/ruby/structures/TBETable.hh
rename : src/mem/ruby/system/TimerTable.cc => src/mem/ruby/structures/TimerTable.cc
rename : src/mem/ruby/system/TimerTable.hh => src/mem/ruby/structures/TimerTable.hh
rename : src/mem/ruby/system/WireBuffer.cc => src/mem/ruby/structures/WireBuffer.cc
rename : src/mem/ruby/system/WireBuffer.hh => src/mem/ruby/structures/WireBuffer.hh
rename : src/mem/ruby/system/WireBuffer.py => src/mem/ruby/structures/WireBuffer.py
rename : src/mem/ruby/recorder/CacheRecorder.cc => src/mem/ruby/system/CacheRecorder.cc
rename : src/mem/ruby/recorder/CacheRecorder.hh => src/mem/ruby/system/CacheRecorder.hh
2014-09-01 16:55:40 -05:00
Alexandru 5efbb4442a mem: adding architectural page table support for SE mode
This patch enables the use of page tables that are stored in system memory
and respect x86 specification, in SE mode. It defines an architectural
page table for x86 as a MultiLevelPageTable class and puts a placeholder
class for other ISAs page tables, giving the possibility for future
implementation.
2014-08-28 10:11:44 -05:00
Alexandru 26ac28dec2 mem: adding a multi-level page table class
This patch defines a multi-level page table class that stores the page table in
system memory, consistent with ISA specifications. In this way, cpu models that
use the actual hardware to execute (e.g. KvmCPU), are able to traverse the page
table.
2014-04-01 12:18:12 -05:00
Andreas Hansson 9e4cd5bf1e mem: Fix DRAMSim2 cycle check when restoring from checkpoint
This patch ensures the cycle check is still valid even restoring from
a checkpoint. In this case the DRAMSim2 cycle count is relative to the
startTick rather than 0.
2014-08-26 10:14:38 -04:00
Andreas Hansson 6fa8015b7f base: Add const to intmath and be more flexible with typing
This patch ensures the functions can be used on const variables.
2014-08-26 10:14:32 -04:00
Andreas Sandberg 70176fecd1 base: Replace the internal varargs stuff with C++11 constructs
We currently use our own home-baked support for type-safe variadic
functions. This is confusing and somewhat limited (e.g., cprintf only
supports a limited number of arguments). This changeset converts all
uses of our internal varargs support to use C++11 variadic macros.
2014-08-26 10:13:45 -04:00
Andreas Sandberg f3e5fee743 base: Add compiler macros for C++11 final/override
Add the macros M5_ATTR_FINAL and M5_ATTR_OVERRIDE which are defined to
final and override respectively if supported by the compiler. This is
done to allow a smooth transition to gcc >= 4.7.
2014-08-26 10:13:33 -04:00
Mitch Hayenga 0da99b7e0c mips: Fix RLIMIT_RSS naming
MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with
RLIMIT_RSS from the host system.  Broke clang+MacOS build.
2014-08-26 10:13:31 -04:00
Andreas Sandberg 61b8d5e4e4 base: Add a static assert to check bit union ranges
If a bit field in a bit union specified as Bitfield<LSB, MSB> instead
of Bitfield<MSB, LSB> the code silently fails and the field is read as
zero. This changeset introduces a static assert that tests, at compile
time, that the bit order is correct.
2014-08-26 10:13:28 -04:00
Andreas Sandberg a3d3eb0ff7 sparc: Fixup bit ordering in the PSTATE bit union
The order of the MSB and LSB bit of the mm field in the PSTATE union
is wrong. Any access to this field will currently be ignored and reads
will always return zero. This patch fixes the ordering so it is <MSB,
LSB> instead of <LSB, MSB>.
2014-08-26 10:13:23 -04:00
Andreas Hansson 3efabb4b2f mem: Update DRAM controller comments
Update comments and add a reference for more information.
2014-08-26 10:13:03 -04:00
Andreas Hansson 56b7796e0d mem: Fix address interleaving bug in DRAM controller
This patch fixes a bug in the DRAM controller address decoding. In
cases where the DRAM burst size (e.g. 32 bytes in a rank with a single
LPDDR3 x32) was smaller than the channel interleaving size
(e.g. systems with a 64-byte cache line) one address bit effectively
got used as a channel bit when it should have been a low-order column
bit.

This patch adds a notion of "columns per stripe", and more clearly
deals with the low-order column bits and high-order column bits. The
patch also relaxes the granularity check such that it is possible to
use interleaving granularities other than the cache line size.

The patch also adds a missing M5_CLASS_VAR_USED to the tCK member as
it is only used in the debug build for now.
2014-08-26 10:12:45 -04:00
Curtis Dunham 04d1f61ae8 sim: bump checkpoint version for multiple event queues
This patch adds a fix for older checkpoints before support for
multiple event queues were added in changeset 2cce74fe359e. The change
in checkpoint version should really hav ebeen part of the
aforementioned changeset.
2014-02-05 16:17:41 -06:00
Dam Sunwoo b04d6c7c33 arm: change MISCREG_L2ERRSR to warn not fail
Some newer binaries compiled for Versatile Express TC2 contain access
to implementation specific L2MERRSR registers. This causes an infinite
loop of undefined exceptions. This patch changes the behavior to "warn
not fail" to keep the workloads going.
2014-08-13 06:57:36 -04:00
Dam Sunwoo 74a4926fe0 sim: remove kernel mapping check for baremetal workloads
Baremetal workloads are specified using the "kernel" parameter, but
don't always have the correct address mappings. This patch adds a
boolean flag to the system and bypasses the kernel addr mapping checks
when running in baremetal mode.
2014-08-13 06:57:35 -04:00
Andreas Sandberg 41d069ef6a scons: Build the branch predictor for all CPUs
The branch predictor is normally only built when a CPU that uses a
branch predictor is built. The list of CPUs is currently incomplete as
the simple CPUs support branch predictors (for warming, branch stats,
etc). In practice, all CPU models now use branch predictors, so this
changeset removes the CPU model check and replaces it with a check for
the NULL ISA.
2014-08-13 06:57:31 -04:00
Andreas Sandberg 8b8d991df0 mips: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members if
they are not used. This changeset removes such members from the
MIPS-specific classes to silence the warning.
2014-08-13 06:57:30 -04:00
Andreas Sandberg 8d04e32a83 power: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members if
they are not used. This changeset removes such members from the
POWER-specific ProcessInfo struct to silence the warning.
2014-08-13 06:57:29 -04:00
Andreas Sandberg 6b908211e6 scons: Silence clang 3.4 warnings on Ubuntu 12.04
This changeset fixes three types of warnings that occur in clang 3.4
on Ubuntu 12.04:

 * Certain versions of libstdc++ (primarily 4.8) use struct and class
   interchangeably. This triggers a warning in clang.

 * Swig has a tendency to generate code with the register class which
   was deprecated in C++11. This triggers a deprecation warning in
   clang.

 * Swig sometimes generates Python wrapper code which returns
   uninitialized values. It's unclear if this is actually a problem
   (the cases might be limited to failure paths). We'll silence these
   warnings for now since there is little we can do about the
   generated code.
2014-08-13 06:57:28 -04:00
Andreas Sandberg eb9226317d base: Remove unused M5_PRAGMA_NORETURN
The M5_PRAGMA_NORETURN macro was only used in for
__exit_message. Since the macro only holds a stub definition and all
functions with noreturn semantics use the M5_ATTR_NORETURN, this
macros is completely redundant.
2014-08-13 06:57:27 -04:00
Andreas Sandberg 25f5a6733c cpu: Don't forward declare RefCountingPtr
RefCountingPtr is sometimes forward declared to avoid having to
include refcnt.hh. This does not work since we typically return
instances of RefCountingPtr rather than references to instances. The
only reason this currently works is that we include refcnt.hh in
cprintf.hh, which "leaks" the header to most other source files. This
changeset replaces such forward declarations with an include of
refcnt.hh.
2014-08-13 06:57:26 -04:00
Mitch Hayenga f6f6ae461e mem: Properly set cache block status fields on writebacks
When a cacheline is written back to a lower-level cache,
tags->insertBlock() sets various status parameters. However these
status bits were cleared immediately after calling. This patch makes
it so that these status fields are not cleared by moving them outside
of the tags->insertBlock() call.
2014-08-13 06:57:24 -04:00
Andreas Hansson 66904b9584 cpu: Modernise the branch predictor (STL and C++11)
This patch does some minor house keeping of the branch predictor by
adopting STL containers, and shifting some iterator to use range-based
for loops.

The predictor history is also changed from a list to a deque as we
never to insertion/deletion other than at the front and back.
2014-08-13 06:57:21 -04:00
Curtis Dunham 94daae6864 arm: remove dead code fplib mul64x64 2014-03-11 09:50:02 -05:00
Geoffrey Blake dbdce42b88 config: Add SubSystem container for simobjects
This patch adds the SubSystem container for grouping
simobjects together in logical subsystems to facilitate
building a larger system from constituent parts.  The container
is simply a non-abstract empty simobject to hold the components
that will be connected as its children.  In simulation the
object does not participate, its only use is during configuration
of the system.
2014-08-10 05:39:16 -04:00
Geoffrey Blake 09b5003815 config: Add hooks to enable new config sys
This patch adds helper functions to SimObject.py, params.py and
simulate.py to enable the new configuration system.  Functions like
enumerateParams() in SimObject lets the config system auto-generate
command line options for simobjects to be modified on the command
line.

Params in params.py have __call__() added
to their definition to allow the argparse module to use them
as a type to check command input is in the proper format.
2014-08-10 05:39:13 -04:00
Andreas Hansson 47313601c1 cpu: Ensure the traffic generator suppresses non-memory packets
This patch adds a check to ensure that packets which are not going to
a memory range are suppressed in the traffic generator. Thus, if a
trace is collected in full-system, the packets destined for devices
are not played back.
2014-08-10 05:39:04 -04:00
Andreas Hansson d45ab59c29 base: Remove unused files
A bit of pruning
2014-08-10 05:38:59 -04:00
Anthony Gutierrez a628afedad mem: refactor LRU cache tags and add random replacement tags
this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first, if none are available a
replacement candidate is chosen at random.

this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:

accessBlock()
findVictim()
insertBlock()
invalidate()
2014-07-28 12:23:23 -04:00
Andrew Bardsley 0e8a90f06b cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

     Benchmark     |   Stat host_seconds (s)
    ---------------+--------v--------v--------
     (on ARM, opt) | simple | o3     | minor
                   | timing | timing | timing
    ---------------+--------+--------+--------
    10.linux-boot  |   169  |  1883  |  1075
    10.mcf         |   117  |   967  |   491
    20.parser      |   668  |  6315  |  3146
    30.eon         |   542  |  3413  |  2414
    40.perlbmk     |  2339  | 20905  | 11532
    50.vortex      |   122  |  1094  |   588
    60.bzip2       |  2045  | 18061  |  9662
    70.twolf       |   207  |  2736  |  1036
2014-07-23 16:09:04 -05:00
Steve Reinhardt 06bb6a4731 syscall emulation: fix fast build issue
Surprisingly gcc will complain about unused variables even
inside an 'if (false)' block.

I thought I had tested this previously, but apparently not.
2014-07-19 02:06:22 -07:00
Binh Pham c99b13d904 x86: make PioBus return BadAddress errors
Stop setting the use_default_range flag in PioBus in order to
have random bad addresses result in a BadAddress response and
not a gem5 fatal error.  This is necessary in Ruby as Ruby is
connected directly to PioBus, so misspeculated addresses will
be sent there directly.  For the classic memory system, this
change has no effect, as bad addresses are caught by the
memory bus before being sent to the PioBus.

This work was done while Binh was an intern at AMD Research.
2014-07-18 22:05:51 -07:00
Steve Reinhardt fe530648d5 sim: remove unused MemoryModeStrings array
The System object has a static MemoryModeStrings array
that's (1) unused and (2) redundant, since there's an
auto-generated version in the Enums namespace.  No
point in leaving it in.
2014-07-18 22:05:51 -07:00
Steve Reinhardt e3de6950a4 kern: get rid of unused linux syscall files 2014-07-18 22:05:51 -07:00
Steve Reinhardt f5aace8300 syscall emulation: fix DPRINTF arg ordering bug
When we switched getSyscallArg() from explicit arg indices to
the implicit method, some DPRINTF arguments were left as calls
to getSyscallArg(), even though C/C++ doesn't guarantee
anything about the order of invocation of these calls.  As a
result, the args could be printed out in arbitrary orders.

Interestingly, this bug has been around since 2009:
http://repo.gem5.org/gem5/rev/4842482e1bd1
2014-07-18 22:05:51 -07:00
Anthony Gutierrez 59c8c454eb base: fix operator== for comparing EthAddr objects
this operator uses memcmp() to detect if two EthAddr object have the same
address, however memcmp() will return 0 if all bytes are equal. operator==
returns the return value of memcmp() to indicate whether or not two
address are equal. this is incorrect as it will always give the opposite of
the intended behavior. this patch fixes that problem.
2014-07-09 09:28:15 -04:00
Anthony Gutierrez 3956ec0a89 base: fix some bugs in EthAddr
per the IEEE 802 spec:
1) fixed broadcast() to ensure that all bytes are equal to 0xff.
2) fixed unicast() to ensure that bit 0 of the first byte is equal to 0
3) fixed multicast() to ensure that bit 0 of the first byte is equal to 1, and
   that it is not a broadcast.

also the constructors in EthAddr are fixed so that all bytes of data are
initialized.
2014-07-02 13:19:13 -04:00
Radhika Jagtap b998a0c6ac util: Add DVFS perfLevel to checkpoint upgrade script
This patch updates the checkpoint upgrader script. It adds the _perfLevel
variable in the clock domain and voltage domain simObjects used for DVFS.
2014-07-01 11:58:22 -04:00
Stephan Diestelhorst 65cea4708e power: Add basic DVFS support for gem5
Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
                             tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
2014-06-30 13:56:06 -04:00
Andreas Hansson 1f539ce4cc mem: DRAMPower trace output
This patch adds a DRAMPower flag to enable off-line DRAM power
analysis using the DRAMPower tool. A new DRAMPower flag is added
and a follow-on patch adds a Python script to post-process the output
and order it based on time stamps.

The long-term goal is to link DRAMPower as a library and provide the
commands through function calls to the model rather than first
printing and then parsing the commands. At the moment it is also up to
the user to ensure that the same DRAM configuration is used by the
gem5 controller model and DRAMPower.
2014-06-30 13:56:03 -04:00
Andreas Hansson b4ce51eb9e mem: Add bank and rank indices as fields to the DRAM bank
This patch adds the index of the bank and rank as a field so that we can
determine the identity of a given bank (reference or pointer) for the
power tracing. We also grab the opportunity of cleaning up the
arguments used for identifying the bank when activating.
2014-06-30 13:56:02 -04:00
Andreas Hansson d59bc8ee1f mem: Extend DRAM row bits from 16 to 32 for larger densities
This patch extends the DRAM row bits to 32 to support larger density
memories. Additional checks are also added to ensure the row fits in
the 32 bits.
2014-06-30 13:56:01 -04:00
Anthony Gutierrez f34a8f0d61 cpu: implement a bi-mode branch predictor 2014-06-30 13:50:03 -04:00
Binh Pham b085db84af x86: fix table walker assertion
In a cycle, we could see a R and W requests corresponding to the same
page walk being sent to the memory. During the cycle that assertion
happens, we have 2 responses corresponding to the R and W above. We
also have a 'read' variable to keep track of the inflight Read
request, this gets reset to NULL right after we send out any R
request; and gets set to the next R in the page walk when a response
comes back.

The issue we are seeing here is when we get a response for W request,
assert(!read) fires because we got a response for R request right
before this, hence we set 'read' to NOT NULL value, pointing to the
next R request in the pagewalk!

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:39:44 -07:00
Binh Pham b72c879868 o3: make dispatch LSQ full check more selective
Dispatch should not check LSQ size/LSQ stall for non load/store
instructions.

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:26:55 -07:00
Binh Pham 0782d92286 o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:26:43 -07:00
Andreas Hansson fdb965f5c1 scons: Bump the compiler version to gcc 4.6 and clang 3.0
This patch bumps the supported version of gcc from 4.4 to 4.6, and
clang from 2.9 to 3.0. This enables, amongst other things, range-based
for loops, lambda expressions, etc. The STL implementation shipping
with 4.6 also has a full functional implementation of unique_ptr and
shared_ptr.
2014-06-10 17:44:39 -04:00
Joel Hestness 4f8ac94549 sim: More rigorous clocking comments
The language describing the clockEdge and nextCycle functions were ambiguous,
and so were prone to misinterpretation/misuse. Clear up the comments to more
rigorously describe their functionality.
2014-06-09 22:01:16 -05:00
Steve Reinhardt 0be64ffe2f style: eliminate equality tests with true and false
Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.

It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.

Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
2014-05-31 18:00:23 -07:00
Nilay Vaish e685767b58 ruby: slicc: remove unused ids DNUCA* 2014-05-23 06:07:02 -05:00
Nilay Vaish 9c9257a612 ruby: remove old protocol documentation 2014-05-23 06:07:02 -05:00
Nilay Vaish 8bf41e41c1 ruby: message buffer: drop dequeue_getDelayCycles()
The functionality of updating and returning the delay cycles would now be
performed by the dequeue() function itself.
2014-05-23 06:07:02 -05:00
Nilay Vaish 1e26b7ea29 cpu: o3: remove stat totalCommittedInsts
This patch removes the stat totalCommittedInsts.  This variable was used for
recording the total number of instructions committed across all the threads
of a core.  The instructions committed by each thread are recorded invidually.
The total would now be generated by summing these individual counts.
2014-05-23 06:07:02 -05:00
Steve Reinhardt 109908c2a6 syscall emulation: clean up & comment SyscallReturn 2014-05-12 14:23:31 -07:00
Andreas Hansson f800f268db mem: Update DDR3 and DDR4 based on datasheets
This patch makes a more firm connection between the DDR3-1600
configuration and the corresponding datasheet, and also adds a
DDR3-2133 and a DDR4-2400 configuration. At the moment there is also
an ongoing effort to align the choice of datasheets to what is
available in DRAMPower.
2014-05-09 18:58:49 -04:00
Andreas Hansson cc4ca78f99 mem: Add DRAM cycle time
This patch extends the current timing parameters with the DRAM cycle
time. This is needed as the DRAMPower tool expects timestamps in DRAM
cycles. At the moment we could get away with doing this in a
post-processing step as the DRAMPower execution is separate from the
simulation run. However, in the long run we want the tool to be called
during the simulation, and then the cycle time is needed.
2014-05-09 18:58:49 -04:00
Andreas Hansson 8c56efe747 mem: Simplify DRAM response scheduling
This patch simplifies the DRAM response scheduling based on the
assumption that they are always returned in order.
2014-05-09 18:58:48 -04:00
Andreas Hansson 8e3869411d mem: Add precharge all (PREA) to the DRAM controller
This patch adds the basic ingredients for a precharge all operation,
to be used in conjunction with DRAM power modelling.

Currently we do not try and apply any cleverness when precharging all
banks, thus even if only a single bank is open we use PREA as opposed
to PRE. At the moment we only have a single tRP (tRPpb), and do not
model the slightly longer all-bank precharge constraint (tRPab).
2014-05-09 18:58:48 -04:00
Andreas Hansson 0ba1e72e9b mem: Remove printing of DRAM params
This patch removes the redundant printing of DRAM params.
2014-05-09 18:58:48 -04:00
Andreas Hansson 6753cb705e mem: Add tRTP to the DRAM controller
This patch adds the tRTP timing constraint, governing the minimum time
between a read command and a precharge. Default values are provided
for the existing DRAM types.
2014-05-09 18:58:48 -04:00
Andreas Hansson 60799dc552 mem: Merge DRAM latency calculation and bank state update
This patch merges the two control paths used to estimate the latency
and update the bank state. As a result of this merging the computation
is now in one place only, and should be easier to follow as it is all
done in absolute (rather than relative) time.

As part of this change, the scheduling is also refined to ensure that
we look at a sensible estimate of the bank ready time in choosing the
next request. The bank latency stat is removed as it ends up being
misleading when the DRAM access code gets evaluated ahead of time (due
to the eagerness of waking the model up for scheduling the next
request).
2014-05-09 18:58:48 -04:00
Andreas Hansson b8631d9ae8 mem: Add tWR to DRAM activate and precharge constraints
This patch adds the write recovery time to the DRAM timing
constraints, and changes the current tRASDoneAt to a more generic
preAllowedAt, capturing when a precharge is allowed to take place.

The part of the DRAM access code that accounts for the precharge and
activate constraints is updated accordingly.
2014-05-09 18:58:48 -04:00
Andreas Hansson c735ef6cb0 mem: Merge DRAM page-management calculations
This patch treats the closed page policy as yet another case of
auto-precharging, and thus merges the code with that used for the
other policies.
2014-05-09 18:58:48 -04:00
Andreas Hansson 87f4c956c4 mem: Add DRAM power states to the controller
This patch adds power states to the controller. These states and the
transitions can be used together with the Micron power model. As a
more elaborate use-case, the transitions can be used to drive the
DRAMPower tool.

At the moment, the power-down modes are not used, and this patch
simply serves to capture the idle, auto refresh and active modes. The
patch adds a third state machine that interacts with the refresh state
machine.
2014-05-09 18:58:48 -04:00
Andreas Hansson babf072c1c mem: Ensure DRAM refresh respects timings
This patch adds a state machine for the refresh scheduling to
ensure that no accesses are allowed while the refresh is in progress,
and that all banks are propely precharged.

As part of this change, the precharging of banks of broken out into a
method of its own, making is similar to how activations are dealt
with. The idle accounting is also updated to ensure that the refresh
duration is not added to the time that the DRAM is in the idle state
with all banks precharged.
2014-05-09 18:58:48 -04:00
Andreas Hansson 5c2c3f598e mem: Make DRAM read/write switching less conservative
This patch changes the read/write event loop to use a single event
(nextReqEvent), along with a state variable, thus joining the two
control flows. This change makes it easier to follow the state
transitions, and control what happens when.

With the new loop we modify the overly conservative switching times
such that the write-to-read switch allows bank preparation to happen
in parallel with the bus turn around. Similarly, the read-to-write
switch uses the introduced tRTW constraint.
2014-05-09 18:58:48 -04:00
Ali Saidi dbaf43394b arm: Make sure UndefinedInstructions are properly initialized 2014-04-17 16:56:09 -05:00
Ali Saidi a00b44ebe8 arm: allow DC instructions by default so SE mode works 2014-04-17 16:55:54 -05:00
Ali Saidi c4a2f76fea sim, arm: implement more of the at variety syscalls
Needed for new AArch64 binaries
2014-04-17 16:55:05 -05:00
Andrew Bardsley f5c3f60601 cpu: Useful getters for ActivityRecorder
Add some useful getters to ActivityRecorder
2014-05-09 18:58:48 -04:00
Andrew Bardsley bf78299f04 cpu: Add flag name printing to StaticInst
This patch adds a the member function StaticInst::printFlags to allow all
of an instruction's flags to be printed without using the individual
is... member functions or resorting to exposing the 'flags' vector

It also replaces the enum definition StaticInst::Flags with a
Python-generated enumeration and adds to the enum generation mechanism
in src/python/m5/params.py to allow Enums to be placed in namespaces
other than Enums or, alternatively, in wrapper structs allowing them to
be inherited by other classes (so populating that class's name-space
with the enumeration element names).
2014-05-09 18:58:47 -04:00
Andrew Bardsley 8087d2622d cpu: Timebuf const accessors
Add const accessors for timebuf elements.
2014-05-09 18:58:47 -04:00
Andrew Bardsley f7d80348fa arm: Add branch flags onto macroops
Mark branch flags onto macroops to allow branch prediction before
microop decomposition
2014-05-09 18:58:47 -04:00
Andrew Bardsley eab00f4966 cpu: Allow setWhen on trace objects
Allow setting of 'when' in trace records.  This allows later times
than the arbitrary record creation point to be used as inst. times
2014-05-09 18:58:47 -04:00
Curtis Dunham af39ab297f arm: add preliminary ISA splits for ARM arch 2014-05-09 18:58:47 -04:00
Curtis Dunham fe27f937aa arch: teach ISA parser how to split code across files
This patch encompasses several interrelated and interdependent changes
to the ISA generation step.  The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units.  I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner.  It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type.  Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths.  In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
2014-05-09 18:58:47 -04:00
Geoffrey Blake 0c1913336a config: Avoid generating a reference to myself for Parent.any
The unproxy code for Parent.any can generate a circular reference
in certain situations with classes hierarchies like those in ClockDomain.py.
This patch solves this by marking ouself as visited to make sure the
search does not resolve to a self-reference.
2014-05-09 18:58:47 -04:00
Geoffrey Blake 85940fd537 arch, arm: Preserve TLB bootUncacheability when switching CPUs
The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal.  When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
 1) takeOverFrom executed to switch from Atomic -> O3
 2) All bootUncacheability flags are reset to true
 3) Core2 tries to execute a load covered by bootUncacheability, it
    is flagged as uncacheable
 4) Core2's load needs to replay due to a pipeline flush
 3) Core1 core does an action on CPSR
 4) The handling code for CPSR then checks all other cores
    to determine if bootUncacheability can be set to false
 5) Asynchronously set bootUncacheability on all cores to false
 6) Core2 replays load previously set as uncacheable and notices
    it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
2014-05-09 18:58:47 -04:00
Curtis Dunham 1028c03320 cpu: add more instruction mix statistics
For the o3, add instruction mix (OpClass) histogram at commit (stats
also already collected at issue). For the simple CPUs we add a
histogram of executed instructions
2014-05-09 18:58:47 -04:00
Mitch Hayenga a15b713cba mem: Squash prefetch requests from downstream caches
This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu.  It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.
2014-05-09 18:58:46 -04:00
Stephan Diestelhorst b9e6c260a0 stats: Method stats source
This source for stats binds an object and a method / function from the object
to a stats object. This allows pulling out stats from object methods without
needing to go through a global, or static shim.

Syntax is somewhat unpleasant, but the templates and method pointer type
specification were quite tricky.  Interface is very clean though; and similar
to .functor
2014-05-09 18:58:46 -04:00
Akash Bagdia 2b1a01ee6c cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems.  This allows studying multi-socket / cluster
systems with ARM CPUs.
2014-05-09 18:58:46 -04:00
Sascha Bischoff e940bac278 mem: Auto-generate CommMonitor trace file names
Splits the CommMonitor trace_file parameter into three parameters. Previously,
the trace was only enabled if the trace_file parameter was set, and would be
written to this file. This patch adds in a trace_enable and trace_compress
parameter to the CommMonitor.

No trace is generated if trace_enable is set to False. If it is set to True, the
trace is written to a file based on the name of the SimObject in the simulation
hierarchy. For example, system.cluster.il1_commmonitor.trc. This filename can be
overridden by additionally specifying a file name to the trace_file parameter
(more on this later).

The trace_compress parameter will append .gz to any filename if set to True.
This enables compression of the generated traces. If the file name already ends
in .gz, then no changes are made.

The trace_file parameter will override the name set by the trace_enable
parameter. In the case that the specified name does not end in .gz but
trace_compress is set to true, .gz is appended to the supplied file name.
2014-05-09 18:58:46 -04:00
Geoffrey Blake 29601eada7 arm: Panics in miscreg read functions can be tripped by O3 model
Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
2014-05-09 18:58:46 -04:00
Chris Emmons a3306d0d5e dev: Set HDLCD default pixel clock for 1080p @ 60Hz
This patch changes the default pixel clock to effectively generate
1080p resolution at 60 frames per second. It is dependent upon the
kernel device tree file using the specified resolution / display
string in the comments.
2014-05-09 18:58:46 -04:00
Matt Evans 73dc89e542 arm: quick hack to allow a greater number of CPUs to a guest OS
This is a quick hack to communicate a greater number of CPUs to a guest OS via
the ARM A9 SCU config register. Some OSes (Linux) just look at the bottom field
to count CPUs and with a small change can look at bits [3:0] to learn about up
to 16 CPUs.

Very much unsupported (and contains warning messages as such) but useful for
running 8 core sims without hardwiring CPU count in the guest OS.
2014-05-09 18:58:46 -04:00
Curtis Dunham 7f1603d207 arch: remove inline specifiers on all inst constrs, all ISAs
With (upcoming) separate compilation, they are useless.  Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
2014-05-09 18:58:46 -04:00
Curtis Dunham eb61f0123b arm: cleanup ARM ISA definition 2014-05-09 18:58:46 -04:00
Curtis Dunham ad019c5c58 scons: Require SWIG >= 2.0.4 and remove vector typemaps
SWIG commit fd666c1 (*) made it unnecessary for gem5 to have these
typemaps to handle Vector types.

* fd666c1440
2014-05-09 18:58:46 -04:00
Curtis Dunham ecf774bc56 arm: Correctly display disassembly of vldmia/vstmia
The MicroMemOp class generates the disassembly for both integer
and floating point instructions, but it would always print its
first operand as an integer register without considering that the
op may be a floating instruction in which case a float register
should be displayed instead.
2014-04-23 05:18:30 -04:00
Andreas Hansson 6cd82c1116 sim: Use correct unit for abort message
This patch fixes the unit in the abort printout.
2014-04-23 05:18:27 -04:00
Mitchell Hayenga bf25c53a7d cpu: Fix setTranslateLatency() bug for squashed instructions
setTranslateLatency could sometimes improperly access a deleted request
packet after an instruction was squashed.
2014-04-23 05:18:26 -04:00
Sascha Bischoff 2031c03c09 misc: Proper type check and import for PortRef
Rewriting the type checking around PortRef, which was interacting strangely
with other Python scripts.

Tested-by: stephan.diestelhorst@arm.com
2014-04-23 05:18:25 -04:00
Mitch Hayenga e4086878f6 cpu: Fix case where o3 lsq could print out uninitialized data
In the O3 LSQ, data read/written is printed out in DPRINTFs.  However,
the data field is treated as a character string with a null terminated.
However the data field is not encoded this way.  This patch removes
that possibility by removing the data part of the print.
2014-04-01 14:22:06 -05:00
Mitch Hayenga a0d30f36a6 mem: Don't print out the data of a cache block
This never actually worked since it was printing out only a word
of the cache block and not the entire thing and doubly didn't work
csprintf overrides the %#x specifier and assumes a char* array is
actually a string.
2014-04-01 14:24:36 -05:00
Mitchell Hayenga 0fad0c7f7d arm: Don't use a stack allocated mnemonic
FailUnimplemented passed a stack created mnemonic as a const char * which
causes some grief when the stack goes away.
2014-04-23 05:18:20 -04:00
Dam Sunwoo 84f8fe637c cpu: Add O3 CPU width checks
O3CPU has a compile-time maximum width set in o3/impl.hh, but checking
the configuration against this limit was not implemented anywhere
except for fetch. Configuring a wider pipe than the limit can silently
cause various issues during the simulation. This patch adds the proper
checking in the constructor of the various pipeline stages.
2014-04-23 05:18:18 -04:00
Curtis Dunham c9071ff95e base: explicitly suggest potential use of 'All' debug flags
Without this declaration, new clangs will complain about this value
being unused. It has no explicit use in the codebase, but it can be
useful to turn on all debugging flags while in a debugger to greatly
increase simulator verbosity.
2014-04-23 05:17:59 -04:00
Curtis Dunham e651188f75 arch: remove 'null update' check in isa-parser
SCons already does this for all build steps.
2014-04-23 05:17:57 -04:00
Curtis Dunham fa4a262204 stats: better error message for uninitialized statistic
As suggested by Nathan Binkert in 2008:
http://permalink.gmane.org/gmane.comp.emulators.m5.users/2676
2014-02-10 18:24:20 -06:00
Nilay Vaish 4ceeda20aa ruby: slicc: remove old documentation
Has not been maintained at all.  Since there is alternate documentation
available on gem5.org, no need to have this separately.
2014-04-19 09:00:31 -05:00
Nilay Vaish 183100b8cb ruby: slicc: slight change to rule for transitions
It had an unnecessary pairs token which is being removed.
2014-04-19 09:00:31 -05:00
Faissal Sleiman a1570f544f o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries()
should be thread-specific.

In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob
has instructions from thread 0 (isEmpty() returns false), and none from
thread 1. If we are trying to squash all of thread 1, then
readTailInst(thread 1) will be called because rob->isEmpty() returns
false. The result is end_it is not in the list and the while
statement loops indefinitely back over the cpu's instList.

In iew_impl.hh, all threads are told they have the entire remaining IQ, when
each thread actually has a certain allocation. The result is extra stalls at
the iew dispatch stage which the rename stage usually takes care of.

In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only
contains instructions from thread 0. This returns a dummyInst (which may work
since we are trying to squash all instructions, but hardly seems like the right
way to do it).

In rob_impl.hh this fix skips the rest of the function more frequently and is
more efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-19 09:00:30 -05:00
Marco Elver d9fa950396 ruby: recorder: Fix (de-)serializing with different cache block-sizes
Upon aggregating records, serialize system's cache-block size, as the
cache-block size can be different when restoring from a checkpoint. This way,
we can correctly read all records when restoring from a checkpoints, even if
the cache-block size is different.

Note, that it is only possible to restore from a checkpoint if the
desired cache-block size is smaller or equal to the cache-block size
when the checkpoint was taken; we can split one larger request into
multiple small ones, but it is not reliable to do the opposite.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-19 09:00:30 -05:00
Andreas Sandberg 02b51afb7e kvm, x86: Add initial support for multicore simulation
Simulating a SMP or multicore requires devices to be shared between
multiple KVM vCPUs. This means that locking is required when accessing
devices. This changeset adds the necessary locking to allow devices to
execute correctly. It is implemented by temporarily migrating the KVM
CPU to the VM's (and devices) event queue when handling
MMIO. Similarly, the VM migrates to the interrupt controller's event
queue when delivering an interrupt.

The support for fast-forwarding of multicore simulations added by this
changeset assumes that all devices in a system are simulated in the
same thread and each vCPU has its own thread. Special care must be
taken to ensure that devices living under the CPU in the object
hierarchy (e.g., the interrupt controller) do not inherit the parent
CPUs thread and are assigned to device thread. The KvmVM object is
assumed to live in the same thread as the other devices in the system.
2014-04-09 16:01:58 +02:00
Andreas Sandberg 221f4f232a dev: Protect PollEvent processing when running in parallel mode
The calling thread is undefined when the PollQueue services events.
This implies that PollEvents need to handle the case where they are
processed from a different thread than the thread that created the
event. This changeset adds temporary event queue migrations to the VNC
server, the ethernet tap device, and the terminal to protect them from
inter-thread calls.
2014-04-09 16:01:43 +02:00
Nilay Vaish d805e42b81 ruby: slicc: change enqueue statement
As of now, the enqueue statement can take in any number of 'pairs' as
argument.  But we only use the pair in which latency is the key.  This
latency is allowed to be either a fixed integer or a member variable of
controller in which the expression appears.  This patch drops the use of pairs
in an enqueue statement.  Instead, an expression is allowed which will be
interpreted to be the latency of the enqueue.  This expression can anything
allowed by slicc including a constant integer or a member variable.
2014-04-08 13:26:30 -05:00
Nilay Vaish e689c00b16 ruby: coherence protocols: drop the phrase IntraChip
The phrase is no longer valid since we do not distinguish between
inter and intra chip communication.
2014-04-08 13:26:29 -05:00
Andreas Sandberg 838bcd3b19 sim: Add the ability to lock and migrate between event queues
We need the ability to lock event queues to enable device accesses
across threads. The serviceOne() method now takes a service lock prior
to handling a new event. By locking an event queue, a different
thread/eq can effectively execute in the context of the locked event
queue. To simplify temporary event queue migrations, this changeset
introduces the EventQueue::ScopedMigration class that unlocks the
current event queue, locks a new event queue, and updates the current
event queue variable.

In order to prevent deadlocks, event queues need to be released when
waiting on barriers. This is implemented using the
EventQueue::ScopedRelease class. An instance of this class is, for
example, used in the BaseGlobalEvent class to release the event queue
when waiting on the synchronization barrier.

The intended use for this functionality is when devices need to be
accessed across thread boundaries. For example, when fast-forwarding,
it might be useful to run devices and CPUs in separate threads. In
such a case, the CPU locks the device queue whenever it needs to
perform IO.  This functionality is primarily intended for KVM.

Note: Migrating between event queues can lead to non-deterministic
timing. Use with extreme care!

--HG--
extra : rebase_source : 23e3a741a1fd73861d1339782dbbe1bc76285315
2014-04-03 11:22:49 +02:00
Marco Elver b884fcf412 cpu: o3: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.

Prior to this patch, the following case could be violated:

 P0         | P1          ;
 MOV [x],mail=/usr/spool/mail/nilay | MOV EAX,[y] ;
 MOV [y],mail=/usr/spool/mail/nilay | MOV EBX,[x] ;

exists (1:EAX=1 /\ 1:EBX=0) [is a violation]

The problem was found using litmus [http://diy.inria.fr].

Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-03-25 13:15:04 -05:00
Andreas Hansson a00383a40a mem: Track DRAM read/write switching and add hysteresis
This patch adds stats for tracking the number of reads/writes per bus
turn around, and also adds hysteresis to the write-to-read switching
to ensure that the queue does not oscilate around the low threshold.
2014-03-23 11:12:14 -04:00
Andreas Hansson 7c18691db1 mem: Rename SimpleDRAM to a more suitable DRAMCtrl
This patch renames the not-so-simple SimpleDRAM to a more suitable
DRAMCtrl. The name change is intended to ensure that we do not send
the wrong message (although the "simple" in SimpleDRAM was originally
intended as in cleverly simple, or elegant).

As the DRAM controller modelling work is being presented at ISPASS'14
our hope is that a broader audience will use the model in the future.

--HG--
rename : src/mem/SimpleDRAM.py => src/mem/DRAMCtrl.py
rename : src/mem/simple_dram.cc => src/mem/dram_ctrl.cc
rename : src/mem/simple_dram.hh => src/mem/dram_ctrl.hh
2014-03-23 11:12:12 -04:00
Andreas Hansson 3dd1587afc mem: Change memory defaults to be more representative
Make the default memory type DDR3-1600 x64, and use the open-adaptive
page policy. This change is aiming to ensure that users by default are
using a realistic memory system.
2014-03-23 11:12:10 -04:00
Wendy Elsasser bbbae677ed mem: Add close adaptive paging policy to DRAM controller model
This patch adds a second adaptive page policy to the DRAM controller,
closing the page unless there are already queued accesses to the open
page.
2014-03-23 11:12:08 -04:00
Andreas Hansson 03a1aed803 mem: DRAM controller tidying up
Minor tidying up and removing of redundant code, including the
printing of queue state every million accesses.
2014-03-23 11:12:06 -04:00
Andreas Hansson bc83eb2197 mem: Fix bug in DRAM bytes per activate
This patch ensures that we do not sample the bytes per activate when
the row has already been closed.
2014-03-23 11:12:05 -04:00
Andreas Hansson 116985d661 mem: Limit the accesses to a page before forcing a precharge
This patch adds a basic starvation-prevention mechanism where a DRAM
page is forced to close after a certain number of accesses. The limit
is combined with the open and open-adaptive page policy and if reached
causes an auto-precharge.
2014-03-23 11:12:03 -04:00
Andreas Hansson 6557741311 mem: Make DRAM write queue draining more aggressive
This patch changes the triggering condition for the write draining
such that we grab the opportunity to issue writes if there are no
reads waiting (as opposed to waiting for the writes to reach the high
threshold). As a result, we potentially drain some of the writes in read
idle periods (if any).

A low threshold is added to be able to control how many write bursts
are kept in the memory controller queue (acting as on-chip storage).

The high and low thresholds are updated to sensible values for a 32/64
size write buffer. Note that the thresholds should be adjusted along
with the queue sizes.

This patch also adds some basic initialisation sanity checks and moves
part of the initialisation to the constructor.
2014-03-23 11:12:01 -04:00
Neha Agarwal 364a51181e cpu: DRAM Traffic Generator
This patch enables a new 'DRAM' mode to the existing traffic
generator, catered to generate specific requests to DRAM based on
required hit length (stride size) and bank utilization. It is an add on
to the Random mode.

The basic idea is to control how many successive packets target the
same page, and how many banks are being used in parallel. This gives a
two-dimensional space that stresses different aspects of the DRAM
timing.

The configuration file needed to use this patch has to be changed as
follow: (reference to Random Mode, LPDDR3 memory type)

'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0'
-> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1'

The last 4 parameters to be added are:
<stride size (bytes), page size(bytes), number of banks available in DRAM,
    number of banks to be utilized, address mapping scheme>

The address mapping information is used to get the stride address
stream of the specified size and to know where to find the bank
bits. The configuration file has a parameter where '0'-> RoCoRaBaCh,
'1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the
generator currently assumes a single channel and a single rank. This
is to avoid overwhelming the traffic generator with information about
the memory organisation.
2014-03-23 11:11:58 -04:00
Neha Agarwal 43abaf518f mem: DDR3 config for comparing with DRAMSim2
This patch adds a new DDR3 configuration to match with the parameters
that are specified in one of the DDR3 configs used in DRAMSim2.
2014-03-23 11:11:56 -04:00
Andreas Hansson 7e7b67472a mem: More descriptive address-mapping scheme names
This patch adds the row bits to the name of the address mapping
schemes to make it more clear that all the current schemes places the
row bits as the most significant bits.
2014-03-23 11:11:53 -04:00
Stan Czerniawski 4f77bc230a misc: Fix -q (quiet) flag
Check the right flag.
2014-03-23 11:11:49 -04:00
Andreas Hansson 9ac4f781ec ruby: Move Ruby debug flags to ruby dir and remove stale options
This patch moves the Ruby-related debug flags to the ruby
sub-directory, and also removes the state SConsopts that add the
no-longer-used NO_VECTOR_BOUNDS_CHECK.
2014-03-23 11:11:48 -04:00
Andreas Hansson 9f018d2f5a mem: Include the DRAMSim2 wrapper in NULL build
This patch makes sure DRAMSim2 is included in a build of the NULL ISA.
2014-03-23 11:11:44 -04:00
Sascha Bischoff 548d47ea2c mem: CommMonitor trace warn on non-timing mode
Add a warning to the CommMonitor which will alert the user if they try
and record a trace when the system is not in timing mode.
2014-03-23 11:11:40 -04:00
Stan Czerniawski e18d0e04a2 cpu: Add basic check to TrafficGen initial state
Prevent incomplete configuration of TrafficGen class from causing
segmentation faults. If an 'INIT' line is not present in the
configuration file then the currState variable will remain
uninitialized which may result in a crash.
2014-03-23 11:11:39 -04:00
Andrew Bardsley 0c001e729a dev: Fix IsaFake's cxx_header setting
cxx_header was set incorrectly on IsaFake
2014-03-23 11:11:37 -04:00
Eric Van Hensbergen 7630168a75 arm: m5ops readfile64 args broken, offset coming through garbage
There were several sections of the m5ops code which were
essentially copy/pasted versions of the 32-bit code. The
problem is that some of these didn't account fo4 64-bit
registers leading to arguments being in the wrong registers.
This patch addresses the args for readfile64, writefile64,
and addsymbol64 -- all of which seemed to suffer from a
similar set of problems when moving to 64-bit.
2014-03-23 11:11:34 -04:00
Andreas Hansson 5093e58dc2 base: Fix error message time unit (cycle -> tick)
This patch fixes the unit used in all error messages.
2014-03-23 11:11:32 -04:00
Nilay Vaish 52a83c1d0e ruby: consumer: avoid accessing wakeup times when waking up
Each consumer object maintains a set of tick values when the object is supposed
to wakeup and do some processing.  As of now, the object accesses this set both
when scheduling a wakeup event and when the object actually wakes up.  The set
is accessed during wakeup to remove the current tick value from the set.  This
functionality is now being moved to the scheduling function where ticks are
removed at a later time.
2014-03-20 09:14:14 -05:00
Nilay Vaish 4b67ada89e ruby: garnet: convert network interfaces into clocked objects
This helps in configuring the network interfaces from the python script and
these objects no longer rely on the network object for the timing information.
2014-03-20 09:14:14 -05:00
Nilay Vaish 4f7ef51efb ruby: slicc: code refactor 2014-03-20 09:14:14 -05:00
Nilay Vaish 9b3418d163 ruby: no piobus in se mode
Piobus was recently added to se scripts for ruby so that the interrupt
controller can be connected to something (required since the interrupt
controller sends address range messages).  This patch removes the piobus
and instead, the pio port of ruby port will now ignore the range change
messages in se mode.
2014-03-20 08:03:09 -05:00
Nilay Vaish f7e7fa6d90 ruby: remove some of the unnecessary code 2014-03-17 17:40:14 -05:00
Andreas Sandberg 11ffa379ab kvm: Clean up signal handling
KVM used to use two signals, one for instruction count exits and one
for timer exits. There is really no need to distinguish between the
two since they only trigger exits from KVM. This changeset unifies and
renames the signals and adds a method, kick(), that can be used to
raise the control signal in the vCPU thread. It also removes the early
timer warning since we do not normally see if the signal was
delivered.

--HG--
extra : rebase_source : cd0e45ca90894c3d6f6aa115b9b06a1d8f0fda4d
2014-03-16 17:40:58 +01:00
Andreas Sandberg 5db547bca4 kvm: x86: Adjust PC to remove the CS segment base address
gem5 seems to store the PC as RIP+CS_BASE. This is not what KVM
expects, so we need to subtract CS_BASE prior to transferring the PC
into KVM. This changeset adds the necessary PC manipulation and
refactors thread context updates slightly to avoid reading registers
multiple times from KVM.

--HG--
extra : rebase_source : 3f0569dca06a1fcd8694925f75c8918d954ada44
2014-03-16 17:30:24 +01:00
Andreas Sandberg f791e7b313 kvm: x86: Add support for x86 INIT and STARTUP handling
This changeset adds support for INIT and STARTUP IPI handling. We
currently handle both of these interrupts in gem5 and transfer the
state to KVM. Since we do not have a BIOS loaded, we pretend that the
INIT interrupt suspends the CPU after reset.

--HG--
extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a
2014-03-16 17:28:23 +01:00
Paul Rosenfeld 32bf74cb8e alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-12 07:03:22 -05:00
Andreas Hansson 62fe81e9c1 cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by
making them const where appropriate.
2014-03-07 15:56:23 -05:00
Geoffrey Blake c4a8e5c36c arm: Handle functional TLB walks properly
The table walker code currently accounts for two types of walks,
Atomic and Timing, and treats them differently. Atomic walks keep a
single instance of WalkerState around for all walks to use in
currState. Timing mode keeps a queue of in-flight WalkerStates and
maintains currState as NULL between walks.

If a functional walk is done during Timing mode, it is treated as an
atomic walk and either creates a persistent WalkerState if in between
Timing walks, or stomps an existing currState for an in-progress
Timing walk.

This patch distinguishes functional walks as being able to exist at
any time and sets up a temporary WalkerState for its exclusive use and
then cleans up when finished, leaving any in progress Atomic or Timing
walks undisturbed.
2014-03-07 15:56:23 -05:00
Prakash Ramrakhyani e88cffb30a mem: Fix incorrect assert failure in the Cache
This patch fixes an assert condition that is not true at all
times. There are valid situations that arise in dual-core
dual-workload runs where the assert condition is false. The function
call following the assert however needs to be called only when the
condition is true (a block cannot be invalidated in the tags structure
if has not been allocated in the structure, and the tempBlock is never
allocated). Hence the 'assert' has been replaced with an 'if'.
2014-03-07 15:56:23 -05:00
Radhika Jagtap c446dc40bd mem: Edit proto Packet and enhance the python script
This patch changes the decode script to output the optional fields of
the proto message Packet, namely id and flags. The flags field is set
by the communication monitor.

The id field is useful for CPU trace experiments, e.g. linking the
fetch side to decode side. It had to be renamed because it clashes
with a built in python function id() for getting the "identity" of an
object.

This patch also takes a few common function definitions out from the
multiple scripts and adds them to a protolib python module.
2014-03-07 15:56:23 -05:00
Stephan Diestelhorst 45677ffa97 misc: Add panic_if / fatal_if / chatty_assert
This snippet can be used to replace if + {panics, fatals, asserts} constructs.
The idea is to have both the condition checking and a verbose printout in a single statement.  The interface is as follows:

panic_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar);
fatal_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar);
chatty_assert(foo == bar, "These should be equal: foo %i bar %i", foo, bar);
2014-03-07 15:56:23 -05:00
Mitch Hayenga b9a9d99b22 scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions.
2014-03-07 15:56:23 -05:00
Stephan Diestelhorst bef2086f5b arm: Fix uninitialised warning with gcc 4.8
Small fix for a warning that prevents compilation with gcc 4.8.1 due
to detecting that a variable might be uninitialised. The fix is to
assign a safe default.
2014-03-07 15:56:23 -05:00
Ali Saidi bf39a475fe mem: Wakeup sleeping CPUs without caches on LLSC
For systems without caches, the LLSC code does not get snoops for
wake-ups. We add the LLSC code in the abstract memory to do the job
for us.
2014-03-07 15:56:23 -05:00
Andreas Sandberg f4a897d8e3 sim: Schedule the global sync event at curTick() + simQuantum
The global synchronization event used to be scheduled at
simQuantum. This prevented repeated entries into gem5 from Python as
it can be scheduled in the past. This changeset ensures that the first
global synchronization happens at curTick() + simQuantum instead.
2014-03-06 15:59:53 +01:00
Andreas Sandberg be246cef62 x86: Setup correct TSL/TR segment attributes on INIT
The TSL/LDT & TR/TSS segments didn't contain valid attributes. This
caused problems when transfering the state into KVM where invalid
state is a no-go. Fixup the attributes with values from AMD's
architecture programmer's manual.
2014-03-03 14:44:57 +01:00
Andreas Sandberg e7d230ede0 kvm: x86: Always assume segments to be usable
When transferring segment registers into kvm, we need to find the
value of the unusable bit. We used to assume that this could be
inferred from the selector since segments are generally unusable if
their selector is 0. This assumption breaks in some weird corner
cases.  Instead, we just assume that segments are always usable. This
is what qemu does so it should work.
2014-03-03 14:34:33 +01:00
Andreas Sandberg 739cc0128b kvm: Initialize signal handlers from startupThread()
Signal handlers in KVM are controlled per thread and should be
initialized from the thread that is going to execute the CPU. This
changeset moves the initialization call from startup() to
startupThread().
2014-03-03 14:31:39 +01:00
Nilay Vaish 5cd9dd29bd ruby: message buffer: changes related to tracking push/pop times
The last pop operation is now tracked as a Tick instead of in Cycles.
This helps in avoiding use of the receiver's clock during the enqueue
operation.
2014-03-01 23:59:58 -06:00
Nilay Vaish 67cd04b6fe ruby: make the max_size variable of the MessageBuffer unsigned 2014-03-01 23:59:57 -06:00
Christopher Torng 919baa603d cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU
A copyRegs() function is added to MIPS utilities
to copy architectural state from the old CPU to
the new CPU during fast-forwarding. This
addition alone enables fast-forwarding for the
o3 cpu model running MIPS.

The patch also adds takeOverFrom() and
drainResume() functions to the InOrderCPU to
enable it to take over from another CPU. This
change enables fast-forwarding for the inorder
cpu model running MIPS, but not for Alpha.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-01 23:35:23 -06:00
Nilay Vaish a533f3f983 ruby: profiler: statically allocate stats variable
Couple of users observed segmentation fault when the simulator tries to
register the statistical variable m_IncompleteTimes.  It seems that there
is some problem with the initialization of these variables when allocated
in the constructor.
2014-03-01 23:35:21 -06:00
Nilay Vaish 7e27860ef4 ruby: route all packets through ruby port
Currently, the interrupt controller in x86 is connected to the io bus
directly.  Therefore the packets between the io devices and the interrupt
controller do not go through ruby.  This patch changes ruby port so that
these packets arrive at the ruby port first, which then routes them to their
destination.  Note that the patch does not make these packets go through the
ruby network.  That would happen in a subsequent patch.
2014-02-23 19:16:16 -06:00
Andreas Hansson 5755fff998 ruby: Simplify RubyPort flow control and routing
This patch simplfies the retry logic in the RubyPort, avoiding
redundant attributes, and enforcing more stringent checks on the
interactions with the normal ports. The patch also simplifies the
routing done by the RubyPort, using the port identifiers instead of a
heavy-weight sender state.

The patch also fixes a bug in the sending of responses from PIO
ports. Previously these responses bypassed the queue in the queued
port, and ignored the return value, potentially leading to response
packets being lost.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-02-23 19:16:16 -06:00
Nilay Vaish 7572ab71b5 ruby: message buffer: refactor code
Code in two of the functions was exactly the same.  This patch moves
this code to a new function which is called from the two functions
mentioned initially.
2014-02-23 19:16:15 -06:00
Nilay Vaish cde20fd476 ruby: remove few not required #includes 2014-02-23 19:16:15 -06:00
Nilay Vaish 82378f7301 ruby: slicc: remove unused COPY_HEAD functionality 2014-02-23 19:16:15 -06:00
Nilay Vaish 13ad07601b ruby: protocols: remove unused action z_stall 2014-02-23 19:16:15 -06:00
Nilay Vaish cd33f9bc42 ruby: network: move message buffers to base network class. 2014-02-21 08:02:05 -06:00
Nilay Vaish bd8f954526 ruby: network: garnet: fixed: removes net_ptr from links 2014-02-21 08:02:04 -06:00
Nilay Vaish 307f53e164 ruby: cache: remove not required variable m_cache_name 2014-02-21 08:02:02 -06:00
Nilay Vaish f8f8b7e5c2 ruby: network: garnet: fixed: removes next cycle functions
At several places, there are functions that take a cycle value as input
and performs some computation.  Along with each such function, another
function was being defined that simply added one more cycle to input and
computed the same function.  This patch removes this second copy of the
function.  Places where these functions were being called have been updated
to use the original function with argument being current cycle + 1.
2014-02-20 17:28:01 -06:00
Nilay Vaish 896654746a ruby: controller: slight code refactoring 2014-02-20 17:27:45 -06:00
Nilay Vaish 0ce8c25919 ruby: mesi three level: rename incorrectly named files
Two files had been incorrectly named with a .cache suffix.

--HG--
rename : src/mem/protocol/MESI_Three_Level-L0.cache => src/mem/protocol/MESI_Three_Level-L0cache.sm
rename : src/mem/protocol/MESI_Three_Level-L1.cache => src/mem/protocol/MESI_Three_Level-L1cache.sm
2014-02-20 17:27:17 -06:00
Nilay Vaish db5b3d37fe ruby: network: removes unused code. 2014-02-20 17:27:07 -06:00
Nilay Vaish dd5c72e5a7 ruby: slicc: slight code refactoring 2014-02-20 17:26:49 -06:00
Nilay Vaish b312a41f21 ruby: message buffer: removes some unecessary functions. 2014-02-20 17:26:41 -06:00
Andreas Sandberg 0d6009e8dc kvm: Add support for multi-system simulation
The introduction of parallel event queues added most of the support
needed to run multiple VMs (systems) within the same gem5
instance. This changeset fixes up signal delivery so that KVM's
control signals are delivered to the thread that executes the CPU's
event queue. Specifically:

  * Timers and counters are now initialized from a separate method
    (startupThread) that is scheduled as the first event in the
    thread-specific event queue. This ensures that they are
    initialized from the thread that is going to execute the CPUs
    event queue and enables signal delivery to the right thread when
    exiting from KVM.

  * The POSIX-timer-based KVM timer (used to force exits from KVM) has
    been updated to deliver signals to the thread that's executing KVM
    instead of the process (thread is undefined in that case). This
    assumes that the timer is instantiated from the thread that is
    going to execute the KVM vCPU.

  * Signal masking is now done using pthread_sigmask instead of
    sigprocmask. The behavior of the latter is undefined in threaded
    applications.

  * Since signal masks can be inherited, make sure to actively unmask
    the control signals when setting up the KVM signal mask.

There are currently no facilities to multiplex between multiple KVM
CPUs in the same event queue, we are therefore limited to
configurations where there is only one KVM CPU per event queue. In
practice, this means that multi-system configurations can be
simulated, but not multiple CPUs in a shared-memory configuration.
2014-02-20 15:43:53 +01:00
Andreas Hansson 4b81585c49 mem: Fix bug in PhysicalMemory use of mmap and munmap
This patch fixes a bug in how physical memory used to be mapped and
unmapped. Previously we unmapped and re-mapped if restoring from a
checkpoint. However, we never checked that the new mapping was
actually the same, it was just magically working as the OS seems to
fairly reliably give us the same chunk back. This patch fixes this
issue by relying entirely on the mmap call in the constructor.
2014-02-18 05:51:01 -05:00
Andreas Hansson f0ea79c41f dev: Include basic devices in NULL ISA build
This patch enbles use of the basic PIO devices as part of the NULL
build. Although it might seem counter intuitive to have a PIO device
without being able to execute a driver, this change enables us to
break a device class hierarchy into an ISA-agnostic part, and an
ISA-specific part, without requiring multiple-inheritance. The
ISA-agnostic base class is a PIO device, but does not make use of the
port.
2014-02-18 05:50:59 -05:00
Andreas Hansson 969b436243 mem: Filter cache snoops based on address ranges
This patch adds a filter to the cache to drop snoop requests that are
not for a range covered by the cache. This fixes an issue observed
when multiple caches are placed in parallel, covering different
address ranges. Without this patch, all the caches will forward the
snoop upwards, when only one should do so.
2014-02-18 05:50:58 -05:00
Andreas Hansson bf2f178f85 mem: Add a wrapped DRAMSim2 memory controller
This patch adds DRAMSim2 as a memory controller by wrapping the
external library and creating a sublass of AbstractMemory that bridges
between the semantics of gem5 and the DRAMSim2 interface.

The DRAMSim2 wrapper extracts the clock period from the config
file. There is no way of extracting this information from DRAMSim2
itself, so we simply read the same config file and get it from there.

To properly model the response queue, the wrapper keeps track of how
many transactions are in the actual controller, and how many are
stacking up waiting to be sent back as responses (in the wrapper). The
latter requires us to move away from the queued port and manage the
packets ourselves. This is due to DRAMSim2 not having any flow control
on the response path.

DRAMSim2 assumes that the transactions it is given are matching the
burst size of the choosen memory. The wrapper checks to ensure the
cache line size of the system matches the burst size of DRAMSim2 as
there are currently no provisions to split the system requests. In
theory we could allow a cache line size smaller than the burst size,
but that would lead to inefficient use of the DRAM, so for not we
fatal also in this case.
2014-02-18 05:50:53 -05:00
Andreas Hansson c9cb492e1c mem: Fix input to DPRINTF in CommMonitor
Minor fix of the debug message parameters.
2014-02-18 05:50:51 -05:00
Andreas Sandberg c52190a695 cpu: simple: Add support for using branch predictors
This changesets adds branch predictor support to the
BaseSimpleCPU. The simple CPUs normally don't need a branch predictor,
however, there are at least two cases where it can be desirable:

  1) A simple CPU can be used to warm the branch predictor of an O3
     CPU before switching to the slower O3 model.

  2) The simple CPU can be used as a quick way of evaluating/debugging
     new branch predictors since it exposes branch predictor
     statistics.

Limitations:
  * Since the simple CPU doesn't speculate, only one instruction will
    be active in the branch predictor at a time (i.e., the branch
    predictor will never see speculative branches).

  * The outcome of a branch prediction does not affect the performance
    of the simple CPU.
2014-02-09 20:49:28 +01:00
Nilay Vaish eb73a14fe2 base: calls abort() from fatal
Currently fatal() ends the simulation in a normal fashion.  This results in
the call stack getting lost when using a debugger and it is not always
possible to debug the simulation just from the information provided by the
printed error message.  Even though the error is likely due to a user's fault,
the information available should not be thrown away.  Hence, this patch to
call abort() from fatal().
2014-02-06 16:30:13 -06:00
Nilay Vaish bb0e9119e7 ruby: memory controller: use MemoryNode * 2014-02-06 16:30:12 -06:00
Andreas Sandberg e76a37985f x86: Fix x87 state transfer bug
Changeset 7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.

This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.
2014-02-05 14:08:13 +01:00
Nikos Nikoleris c6279f2d19 x86, kvm: Fix bug in the RFlags get and set functions
The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.

Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>
2014-02-02 16:37:35 +01:00
Ola Jeppsson 7f16951451 unittest: Fix build errors
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-30 12:21:58 -06:00
Mitch Hayenga 96317d466e mem: Add additional tolerance to stride prefetcher
Forces the prefetcher to mispredict twice in a row before resetting the
confidence of prefetching.  This helps cases where a load PC strides by a
constant factor, however it may operate on different arrays at times.
Avoids the cost of retraining.  Primarily helps with small iteration loops.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga 771c864bf4 mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches.  However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train).  In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher.  This patch enables this option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E) 95735e10e7 mem: prefetcher: add options, support for unaligned addresses
This patch extends the classic prefetcher to work on non-block aligned
addresses.  Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected.  For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern.  This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.

It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.

Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7%  for SPECFP (geomean).
2014-01-29 23:21:25 -06:00
Xiangyu Dong 32cc2ea8b9 cpu: fix bug when TrafficGen deschedules event
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 22:35:04 -06:00
Mitch Hayenga b77ca57f8c arm: Enable umask syscall in SE mode
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:51 -06:00
Mitch Hayenga 55a4ff5f04 base: Fix race condition in the socket listen function
gem5 makes the incorrect assumption that by binding a socket, it
effectively has allocated a port. Linux only allocates ports once you call
listen on the given socket, not when you call bind.  So even if the port was
free when bind was called, another process (gem5 instance) could race in
between the bind & listen calls and steal the port. In the current code, if
the call to bind fails due to the port being in use (EADDRINUSE), gem5 retries
for a different port.  However if listen fails, gem5 just panics. The fix is
testing the return value of listen and re-trying if it was due to EADDRINUSE.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:51 -06:00
Amin Farmahini ffbdaa7cce mem: Remove redundant findVictim() input argument
The patch
(1) removes the redundant writeback argument from findVictim()
(2) fixes the description of access() function

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:50 -06:00
Amin Farmahini 575a73f4a1 mem: Fixes a bug in simple_dram write merging
Fixes updating the value of size in the write merge function.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:49 -06:00
Nilay Vaish bdee69d0b1 x86: use lfpimm instead of limm for fptan 2014-01-27 18:50:54 -06:00
Nilay Vaish 6a543b5134 x86: implements x87 add/sub instructions 2014-01-27 18:50:53 -06:00
Nilay Vaish 5be0b846b1 x86: implements fxch instruction. 2014-01-27 18:50:52 -06:00
Nilay Vaish 4eb3b1ed0b x86: correct error in emms instruction. 2014-01-27 18:50:51 -06:00
ARM gem5 Developers 612f8f074f arm: Add support for ARMv8 (AArch64 & AArch32)
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli    (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt       (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole           (AArch64 NEON, validation)
Ali Saidi            (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang         (AArch64 Linux support)
Rene De Jong         (AArch64 Linux support, performance opt.)
Matt Horsnell        (AArch64 MP, validation)
Matt Evans           (device models, code integration, validation)
Chris Adeniyi-Jones  (AArch64 syscall-emulation)
Prakash Ramrakhyani  (validation)
Dam Sunwoo           (validation)
Chander Sudanthi     (validation)
Stephan Diestelhorst (validation)
Andreas Hansson      (code integration, performance opt.)
Eric Van Hensbergen  (performance opt.)
Gabe Black
2014-01-24 15:29:34 -06:00
Andreas Hansson cfc4a99982 arch: Make all register index flattening const
This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
2014-01-24 15:29:30 -06:00