Commit graph

17 commits

Author SHA1 Message Date
Andrew Lukefahr 6d32004407 minor: fixed LSQ MasterPortID
Minor was reporting the data cache access as ".inst" accesses.
This just switches the MasterPortID to dataMasterPortId.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03 17:51:48 -06:00
Andrew Bardsley df37cad0fd cpu: Fix retries on barrier/store in Minor's store buffer
This patch fixes a case where a store in Minor's store buffer never
leaves the store buffer as it is pre-maturely counted as having been
issued, leading to the store buffer idling.

LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses
either in memory, or still in the store buffer after being completed.

For stores which are also barriers, the store will stay in the store
buffer for a cycle after it is completed and will be cleaned up by the
barrier clearing code (to ensure that barriers are completed in-order).
To acheive this, numUnissuedAccesses is not decremented when a store-barrier
is issued to memory, but when its barrier effect is cleared.

Without this patch, the correct behaviour happens when a memory transaction
is immediately accepted, but not if it needs a retry.
2014-12-02 06:08:15 -05:00
Andrew Bardsley 98f3e7a310 cpu: Fix memoryIssueLimit checking in Minor
This patch fixes the checking of the number of memory instructions issued
per cycles in the Minor CPU.
2014-12-02 06:08:13 -05:00
Andreas Hansson 41846cb61b mem: Assume all dynamic packet data is array allocated
This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.

The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.

As the last part the patch, it renames dataDynamicArray to dataDynamic.
2014-12-02 06:07:43 -05:00
Andreas Hansson 9779ba2e37 mem: Add const getters for write packet data
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
2014-12-02 06:07:36 -05:00
Andreas Hansson 481eb6ae80 arm: Fixes based on UBSan and static analysis
Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
2014-11-14 03:53:51 -05:00
Marc Orr bf80734b2c x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
   evicted from the sleeping cpu's cache. This eviction is forwarded to the
   sleeping cpu, which then wakes up.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06 05:42:22 -06:00
Andrew Lukefahr bd32d55a2c cpu: Minor Draining Bug
Fixes a bug where Minor drains in the midst of committing a
conditional store.

While committing a conditional store, lastCommitWasEndOfMacroop is true
(from the previous instruction) as we still haven't finished the conditional
store. If a drain occurs before the cache response, Minor would check just
lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch,
which increases the streamSeqNum.  This caused the conditional store to be
squashed when the memory responded and it completed.  However, to the memory
the store succeeded, while to the instruction sequence it never occurred.

In the case of an LLSC, the instruction sequence will replay the squashed
STREX, which will fail as the cache is no longer in LLSC.  Then the
instruction sequence will loop back to a LDREX, which receives the updated
(incorrect) value.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06 05:42:21 -06:00
Andrew Bardsley 536c72333f cpu: Fix barrier push to store buffer when full bug in Minor
This patch fixes a bug where a completing load or store which is also a
barrier can push a barrier into the store buffer without first checking
that there is a free slot.

The bug was not fatal but would print a warning that the store buffer
was full when inserting.
2014-10-29 23:18:24 -05:00
Andreas Sandberg e0074324ba cpu: Probe points for basic PMU stats
This changeset adds probe points that can be used to implement PMU
counters for CPU stats. The following probes are supported:

  * BaseCPU::ppCycles / Cycles
  * BaseCPU::ppRetiredInsts / RetiredInsts
  * BaseCPU::ppRetiredLoads / RetiredLoads
  * BaseCPU::ppRetiredStores / RetiredStores
  * BaseCPU::ppRetiredBranches RetiredBranches
2014-10-16 05:49:41 -04:00
Andreas Hansson 341dbf2662 arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
2014-09-27 09:08:36 -04:00
Mitch Hayenga e1403fc2af alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
2014-09-20 17:18:35 -04:00
Andreas Hansson 41fc8a573e arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
2014-09-19 10:35:18 -04:00
Andrew Bardsley 1a45a8c5d3 cpu: Fix memory access in Minor not setting parent Request flags
This patch fixes cases where uncacheable/memory type flags are not set
correctly on a memory op which is split in the LSQ.  Without this
patch, request->request if freely used to check flags where the flags
should actually come from the accumulation of request fragment flags.

This patch also fixes a bug where an uncacheable access which passes
through tryToSendRequest more than once can increment
LSQ::numAccessesInMemorySystem more than once.
2014-09-12 10:22:49 -04:00
Andreas Hansson 2b4906fc64 minor: Fix typo in DPRINTF for Minor branch prediction 2014-09-12 10:22:46 -04:00
Andreas Sandberg 326662b01b arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
2014-09-03 07:42:22 -04:00
Andrew Bardsley 0e8a90f06b cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

     Benchmark     |   Stat host_seconds (s)
    ---------------+--------v--------v--------
     (on ARM, opt) | simple | o3     | minor
                   | timing | timing | timing
    ---------------+--------+--------+--------
    10.linux-boot  |   169  |  1883  |  1075
    10.mcf         |   117  |   967  |   491
    20.parser      |   668  |  6315  |  3146
    30.eon         |   542  |  3413  |  2414
    40.perlbmk     |  2339  | 20905  | 11532
    50.vortex      |   122  |  1094  |   588
    60.bzip2       |  2045  | 18061  |  9662
    70.twolf       |   207  |  2736  |  1036
2014-07-23 16:09:04 -05:00