Commit graph

11029 commits

Author SHA1 Message Date
Andreas Hansson df8df4fd0a stats: Bump stats for decoder, TLB, prefetcher and DRAM changes
Changes due to speculative execution of an unaligned PC, introduction
of TLB stats, changes and re-work of the prefetcher, and the
introduction of rank-wise refresh in the DRAM controller.
2014-12-23 09:31:20 -05:00
Mitch Hayenga b2342c5d9a mem: Change prefetcher to use random_mt
Prefechers has used rand() to generate random numers previously.
2014-12-23 09:31:19 -05:00
Curtis Dunham 516e6046ae mem: Hide WriteInvalidate requests from prefetchers
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
2014-12-23 09:31:19 -05:00
Mitch Hayenga bd4f901c77 mem: Fix event scheduling issue for prefetches
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready.  However, a prefetch being ready does not guarentee
that it can obtain an MSHR.  So, when all MSHRs are full,
the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated.  The event is rescheduled as soon as a MSHR becomes
available.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 4acd4a2055 mem: Fix bug relating to writebacks and prefetches
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.
2014-12-23 09:31:18 -05:00
Mitch Hayenga df82a2d003 mem: Rework the structuring of the prefetchers
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
2014-12-23 09:31:18 -05:00
Mitch Hayenga 6cb58b2bd2 mem: Add parameter to reserve MSHR entries for demand access
Adds a new parameter that reserves some number of MSHR entries for demand
accesses.  This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
2014-12-23 09:31:18 -05:00
Curtis Dunham 4d88978913 arm: Add stats to table walker
This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
  (p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
2014-12-23 09:31:18 -05:00
Andreas Hansson 59460b91f3 config: Expose the DRAM ranks as a command-line option
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).

The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
2014-12-23 09:31:18 -05:00
Andreas Hansson 2f7baf9dbe mem: Ensure DRAM controller is idle when in atomic mode
This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller forces the simulator to switch
out of the KVM mode, thus killing performance.

The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.

The switcheroo regression require a minor stats bump as a result.
2014-12-23 09:31:18 -05:00
Omar Naji 381d1da791 mem: Add rank-wise refresh to the DRAM controller
This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.

Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.

The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.
2014-12-23 09:31:18 -05:00
Omar Naji 152c02354e mem: Fix a bug in the DRAM controller arbitration
Fix a minor issue that affects multi-rank systems.
2014-12-23 09:31:18 -05:00
Andreas Hansson e76e8e28a3 tests: Add a regression for the stack distance calculator
Re-use the existing traffic generator regression, and enable the stack
distance calculation in the comm monitor, along with the verification
stack.

The traffic generator config is also tuned to not increase the
run-time too much (and actually have some address re-use).
2014-12-23 09:31:18 -05:00
Kanishk Sugand 7a25b1a0e0 mem: Add stack distance statistics to the CommMonitor
This patch adds the stack distance calculator to the CommMonitor. The
stats are disabled by default.
2014-12-23 09:31:18 -05:00
Kanishk Sugand 888975b29d mem: Add a stack distance calculator
This patch adds a stand-alone stack distance calculator. The stack
distance calculator is a passive SimObject that observes the addresses
passed to it. It calculates stack distances (LRU Distances) of
incoming addresses based on the partial sum hierarchy tree algorithm
described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043.

For each transaction a hashtable look-up is performed. At every
non-unique transaction the tree is traversed from the leaf at the
returned index to the root, the old node is deleted from the tree, and
the sums (to the right) are collected and decremented. The collected
sum represets the stack distance of the found node. At every unique
transaction the stack distance is returned as
numeric_limits<uint64>::max().

In addition to the basic stack distance calculation, a feature to mark
an old node in the tree is added. This is useful if it is required to
see the reuse pattern. For example, Writebacks to the lower level
(e.g. membus from L2), can be marked instead of being removed from the
stack (isMarked flag of Node set to True). And then later if this same
address is accessed (by L1), the value of the isMarked flag would be
True. This gives some insight on how the Writeback policy of the
lower level affect the read/write accesses in an application.

Debugging is enabled by setting the verify flag to true. Debugging is
implemented using a dummy stack that behaves in a naive way, using STL
vectors. Note that this has a large impact on run time.
2014-12-23 09:31:18 -05:00
Marco Elver 177682ead4 config: Add --memchecker option
This patch adds the --memchecker option, to denote that a MemChecker
should be instantiated for the system. The exact usage of the MemChecker
depends on the system configuration.

For now CacheConfig.py makes use of the option, adding MemCheckerMonitor
instances between CPUs and D-Caches.

Note, however, that currently this only provides limited checking on a
running system; other parts of the system, such as I/O devices are not
monitored, and may cause warnings to be issued by the monitor.
2014-12-23 09:31:18 -05:00
Marco Elver dd0f3943e2 mem: Add MemChecker and MemCheckerMonitor
This patch adds the MemChecker and MemCheckerMonitor classes. While
MemChecker can be integrated anywhere in the system and is independent,
the most convenient usage is through the MemCheckerMonitor -- this
however, puts limitations on where the MemChecker is able to observe
read/write transactions.
2014-12-23 09:31:17 -05:00
Andreas Sandberg 184fefbb3b arm: Raise an alignment fault if a PC has illegal alignment
We currently don't handle unaligned PCs correctly. There is one check
for unaligned PCs in the TLB when running in aarch64 mode, but this
check does not cover cases where the CPU does not do a TLB lookup when
decoding an instruction (e.g., a branch stays within the same cache
line). Additionally, the Decoder class sometimes throws an assertion
for unaligned PCs which breaks speculation.

This changeset introduces a decoder fault bit field in the ExtMachInst
structure. This field can be used to signal a decoder failure. If set,
the decoder generates an internal gem5fault instruction instead of a
normal instruction. This instruction in turns either panics (fault
type PANIC), returns an PCAlignmentFault (fault type UNALIGNED,
aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32).

The patch causes minor changes to the realview64 regressions, and a
stats bump will follow.
2014-12-23 09:31:17 -05:00
Andreas Sandberg b33812ba43 arm: Clean up and document decoder API
This changeset adds more documentation to the ArmISA::Decoder class
and restructures it slightly to make API groups more obvious.
2014-12-23 09:31:17 -05:00
Andreas Sandberg 070b4a81db arm: Add support for filtering in the PMU
This patch adds support for filtering events in the PMU. In order to
do so, it updates the ISADevice base class to forward an ISA pointer
to ISA devices. This enables such devices to access the MiscReg file
to determine the current execution level.
2014-12-23 09:31:17 -05:00
Dam Sunwoo 809134a2b1 config: Add options to take/resume from SimPoint checkpoints
More documentation at http://gem5.org/Simpoints

Steps to profile, generate, and use SimPoints with gem5:

1. To profile workload and generate SimPoint BBV file, use the
following option:

--simpoint-profile --simpoint-interval <interval length>

Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.

2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)

3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:

--take-simpoint-checkpoint=<simpoint file path>,<weight file
path>,<interval length>,<warmup length>

<simpoint file> and <weight file> is generated by SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.

4. To resume from gem5 SimPoint checkpoints, use the following option:

--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint
checkpoint path>

<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint
#0.
2014-12-23 09:31:17 -05:00
Gabe Black 7e34bae813 scons: Make the USE_KVM variable available in C++.
We need it to determine whether we should expect KVM related parameters
exist in the cirrus graphics device.
2014-12-22 16:49:24 -08:00
Nilay Vaish 8d9ce18c2b Added tag stable_2014_12_14 for changeset bdb307e8be54 2014-12-14 16:21:04 -06:00
Gabe Black 70eb68beae Let other objects set up memory like regions in a KVM VM. 2014-12-09 21:53:44 -08:00
Andreas Sandberg 9b7578d8c7 arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0
The aarch64 system register decoder is currently not decoding
PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the
decoder so that they are decoded using the values in table C5-6 in ARM
DDI 0478A.c.
2014-12-08 04:49:53 -05:00
Andreas Sandberg 6a9fbd295d dev: Add response sanity checks in PioPort
Add an assert in the PioPort that checks if a response packet from a
device has the right flags set before passing it to them rest of the
memory system.
2014-12-08 04:49:52 -05:00
Andreas Sandberg 1ccc4e0e21 dev: Correctly transform packets into responses
The VirtIO devices didn't correctly set the response flags in memory
packets. This changeset adds the required Packet::makeResponse()
calls.
2014-12-08 04:49:51 -05:00
Gabe Black 4a8a0a0798 misc: Generalize GDB single stepping.
The new single stepping implementation for x86 doesn't rely on any ISA
specific properties or functionality. This change pulls out the per ISA
implementation of those functions and promotes the X86 implementation to the
base class.

One drawback of that implementation is that the CPU might stop on an
instruction twice if it's affected by both breakpoints and single stepping.
While that might be a little surprising, it's harmless and would only happen
under somewhat unlikely circumstances.
2014-12-05 22:37:03 -08:00
Gabe Black fb07d43b1a x86: Implement a remote GDB stub.
This stub should allow remote debugging of 32 bit and 64 bit targets. Single
stepping seems to work, as do breakpoints. If both breakpoints and single
stepping affect an instruction, gdb will stop at the instruction twice before
continuing. That's a little surprising, but is generally harmless.
2014-12-05 22:36:16 -08:00
Gabe Black 16c9b41616 misc: Add some utility functions for schedule inst commit events.
These can be used to simplify the implementation of single step in derived
classes.
2014-12-05 22:35:47 -08:00
Gabe Black cddf988bfd misc: Rename the GDB "Event" event class to InputEvent.
The "Event" name is the same as the base event class. That's a bit confusing,
and makes it a little awkward to add other event types.
2014-12-05 22:34:42 -08:00
Gabe Black f9f46b8fa9 sim: Ensure GDB interrupts the simulation at an instruction boundary.
Use the comInstEventQueue to ensure GDB interrupts the simulation at an
instruction boundary and not in the middle of a macroop, memory access, etc.
2014-12-05 01:51:49 -08:00
Gabe Black bacbb8ecbc cpu: Only check for PC events on instruction boundaries.
Only the instruction address is actually checked, so there's no need to check
repeatedly while we're working through the microops of a macroop and that's
not changing.
2014-12-05 01:47:35 -08:00
Gabe Black fe48c0a32b misc: Make the GDB register cache accessible in various sized chunks.
Not all ISAs have 64 bit sized registers, so it's not always very convenient
to access the GDB register cache in 64 bit sized chunks. This change makes it
accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations
were working around that limitation by bundling and unbundling 32 bit values
into 64 bit values. That code has been removed.
2014-12-05 01:44:24 -08:00
Gabe Black 7540656fc5 config: Add two options for setting the kernel command line.
Both options accept template which will, through python string formatting,
have "mem", "disk", and "script" values substituted in from the mdesc.
Additional values can be used on a case by case basis by passing them as
keyword arguments to the fillInCmdLine function. That makes it possible to
have specialized parameters for a particular ISA, for instance.

The first option lets you specify the template directly, and the other lets
you specify a file which has the template in it.
2014-12-04 16:42:07 -08:00
Gabe Black 22aaa5867f x86: Rework opcode parsing to support 3 byte opcodes properly.
Instead of counting the number of opcode bytes in an instruction and recording
each byte before the actual opcode, we can represent the path we took to get to
the actual opcode byte by using a type code. That has a couple of advantages.
First, we can disambiguate the properties of opcodes of the same length which
have different properties. Second, it reduces the amount of data stored in an
ExtMachInst, making them slightly easier/faster to create and process. This
also adds some flexibility as far as how different types of opcodes are
handled, which might come in handy if we decide to support VEX or XOP
instructions.

This change also adds tables to support properly decoding 3 byte opcodes.
Before we would fall off the end of some arrays, on top of the ambiguity
described above.

This change doesn't measureably affect performance on the twolf benchmark.

--HG--
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-04 15:53:54 -08:00
Gabe Black 3069c28a02 arch: Allow named constants as decode case values.
The values in a "bitfield" or in an ExtMachInst structure member may not be a
literal value, it might select from an arbitrary collection of options. Instead
of using the raw value of those constants in the decoder, it's easier to tell
what's going on if they can be referred to as a symbolic constant/enum.

To support that, the ISA description language is extended slightly so that in
addition to integer literals, the case value for decode blobs can also be a
string literal. It's up to the ISA author to ensure that the string evaluates
to a legal constant value when interpretted as C++.
2014-12-04 15:52:48 -08:00
Nilay Vaish cca1608bd5 config: ruby: mi protocol: correct master slave setting for dma
In the MI protocol, the master slave connection between the dma controller
and network was being set incorrectly.  This patch corrects it.
2014-12-04 08:59:44 -06:00
Gabe Black d67cf81f5d x86: Clean up style in process.cc. 2014-12-02 22:01:51 -08:00
Gabe Black 2d9dae01fb sim: Make it possible to override the breakpoint length check.
The check which makes sure the length of the breakpoint being written is the
same as a MachInst is only correct on fixed instruction width ISAs. Instead of
incorrectly applying that check to all ISAs, this change makes that the
default check and lets ISA specific GDB classes override it.
2014-12-03 03:27:19 -08:00
Gabe Black b7dc4ba516 config: Get rid of some extra spaces around default arguments. 2014-12-03 03:11:00 -08:00
Gabe Black ecec8cde63 ide: Accept the IDLE (0xe3) ATA command.
This command is supposed to set up a timer which will put the drive into a
standby mode if it isn't sent a command within a given time out. Since most of
the timeouts are generally significantly longer than a simulation would run
anyway, and we don't have an implementation for standby mode to begin with,
we can accept the command, do nothing, and report success.
2014-12-03 03:07:35 -08:00
Gabe Black bce58726f3 dev: Support translating left and right ALT keys.
This is used primarily for VNC.
2014-12-03 03:06:03 -08:00
Andreas Hansson 6489598fb4 stats: Bump stats for fixes, mostly TLB and WriteInvalidate 2014-12-02 06:08:25 -05:00
Andreas Hansson 966c3f4bc5 scons: Ensure dictionary iteration is sorted by key
This patch adds sorting based on the SimObject name or parameter name
for all situations where we iterate over dictionaries. This should
ensure a deterministic and consistent order across the host systems
and hopefully avoid regression results differing across python
versions.
2014-12-02 06:08:22 -05:00
Curtis Dunham 5d22250845 mem: Support WriteInvalidate (again)
This patch takes a clean-slate approach to providing WriteInvalidate
(write streaming, full cache line writes without first reading)
support.

Unlike the prior attempt, which took an aggressive approach of directly
writing into the cache before handling the coherence actions, this
approach follows the existing cache flows as closely as possible.
2014-12-02 06:08:19 -05:00
Curtis Dunham 7ca27dd3cc mem: Remove WriteInvalidate support
Prepare for a different implementation following in the next patch
2014-12-02 06:08:17 -05:00
Andrew Bardsley df37cad0fd cpu: Fix retries on barrier/store in Minor's store buffer
This patch fixes a case where a store in Minor's store buffer never
leaves the store buffer as it is pre-maturely counted as having been
issued, leading to the store buffer idling.

LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses
either in memory, or still in the store buffer after being completed.

For stores which are also barriers, the store will stay in the store
buffer for a cycle after it is completed and will be cleaned up by the
barrier clearing code (to ensure that barriers are completed in-order).
To acheive this, numUnissuedAccesses is not decremented when a store-barrier
is issued to memory, but when its barrier effect is cleared.

Without this patch, the correct behaviour happens when a memory transaction
is immediately accepted, but not if it needs a retry.
2014-12-02 06:08:15 -05:00
Andrew Bardsley 98f3e7a310 cpu: Fix memoryIssueLimit checking in Minor
This patch fixes the checking of the number of memory instructions issued
per cycles in the Minor CPU.
2014-12-02 06:08:13 -05:00
Andrew Bardsley 3cd0b1f6a6 arm: Fix TLB ignoring faults when table walking
This patch fixes a case where the Minor CPU can deadlock due to the lack
of a response to TLB request because of a bug in fault handling in the ARM
table walker.

TableWalker::processWalkWrapper is the scheduler-called wrapper which
handles deferred walks which calls to TableWalker::wait cannot immediately
process.  The handling of faults generated by processWalk{AArch64,LPAE,}
calls in those two functions is is different.  processWalkWrapper ignores
fault returns from processWalk... which can lead to ::finish not being
called on a translation.

This fix provides fault handling in processWalkWrapper similar to that
found in the leaf functions which BaseTLB::Translation::finish.
2014-12-02 06:08:11 -05:00