Commit graph

103 commits

Author SHA1 Message Date
Nikos Nikoleris 698767e538 cpu, arch: fix the type used for the request flags
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-15 12:00:35 +01:00
Mitch Hayenga c75ff71139 mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.

This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
2016-04-07 09:30:20 -05:00
Andreas Sandberg be28d96510 Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:

e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject

Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>

--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
2016-04-06 19:43:31 +01:00
Mitch Hayenga 8615b27174 mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
2016-04-05 12:39:21 -05:00
Steve Reinhardt 1b6355c895 cpu. arch: add initiateMemRead() to ExecContext interface
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.

This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.

This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().

For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level.  For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
2016-01-17 18:27:46 -08:00
Steve Reinhardt 707275265f cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system.  Thus there's no need to provide
a data pointer; its existence is historical.

Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface.  Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).
2016-01-17 18:27:46 -08:00
Andreas Hansson 12eb034378 scons: Enable -Wextra by default
Make best use of the compiler, and enable -Wextra as well as
-Wall. There are a few issues that had to be resolved, but they are
all trivial.
2016-01-11 05:52:20 -05:00
Mitch Hayenga fafa83ed32 cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT.
2015-09-30 11:14:19 -05:00
Hongil Yoon fb0f9884e2 cpu, o3: consider split requests for LSQ checksnoop operations
This patch enables instructions in LSQ to track two physical addresses for
corresponding two split requests. Later, the information is used in
checksnoop() to search for/invalidate the corresponding LD instructions.

The current implementation has kept track of only the physical address that is
referenced by the first split request. Thus, for checksnoop(), the line
accessed by the second request has not been considered, causing potential
correctness issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-15 08:14:06 -05:00
Andreas Sandberg 53e777d683 base: Declare a type for context IDs
Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
2015-08-07 09:59:13 +01:00
Nilay Vaish aafa5c3f86 revert 5af8f40d8f2c 2015-07-28 01:58:04 -05:00
Nilay Vaish 608641e23c cpu: implements vector registers
This adds a vector register type.  The type is defined as a std::array of a
fixed number of uint64_ts.  The isa_parser.py has been modified to parse vector
register operands and generate the required code.  Different cpus have vector
register files now.
2015-07-26 10:21:20 -05:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seems to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
Rekai 3d5434022a cpu: o3 register renaming request handling improved
Now, prior to the renaming, the instruction requests the exact amount of
registers it will need, and the rename_map decides whether the instruction is
allowed to proceed or not.
2015-03-02 04:00:38 -05:00
Andreas Sandberg 550c318490 sim: Move the BaseTLB to src/arch/generic/
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.

--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-11 10:23:27 -05:00
Ali Saidi 9d8ddd92dc sim: Clean up InstRecord
Track memory size and flags as well as add some comments and consts.
2015-01-25 07:22:44 -05:00
Marc Orr bf80734b2c x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
   evicted from the sleeping cpu's cache. This eviction is forwarded to the
   sleeping cpu, which then wakes up.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06 05:42:22 -06:00
Andreas Hansson a2d246b6b8 arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
2014-10-16 05:49:51 -04:00
Andreas Hansson 341dbf2662 arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
2014-09-27 09:08:36 -04:00
Andreas Sandberg 326662b01b arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
2014-09-03 07:42:22 -04:00
Akash Bagdia 2b1a01ee6c cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems.  This allows studying multi-socket / cluster
systems with ARM CPUs.
2014-05-09 18:58:46 -04:00
Andreas Hansson 62fe81e9c1 cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by
making them const where appropriate.
2014-03-07 15:56:23 -05:00
Ali Saidi 6bed6e0352 cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Ali Saidi cf266f05a9 cpu: Fix O3 uncacheable load that is replayed but misses the TLB
This change fixes an issue in the O3 CPU where an uncachable instruction
is attempted to be executed before it reaches the head of the ROB. It is
determined to be uncacheable, and is replayed, but a PanicFault is attached
to the instruction to make sure that it is properly executed before
committing. If the TLB entry it was using is replaced in the interveaning
time, the TLB returns a delayed translation when the load is replayed at
the head of the ROB, however the LSQ code can't differntiate between the
old fault and the new one. If the translation isn't complete it can't
be faulting, so clear the fault.
2013-10-17 10:20:45 -05:00
Yasuko Eckert 2c293823aa cpu: add a condition-code register class
Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.
2013-10-15 14:22:44 -04:00
Andreas Hansson d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Ali Saidi 6df196b71e O3: Clean up the O3 structures and try to pack them a bit better.
DynInst is extremely large the hope is that this re-organization will put the
most used members close to each other.
2012-06-05 01:23:09 -04:00
Ali Saidi 1b370431d0 sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
2012-06-05 01:23:08 -04:00
Andreas Hansson 72538294fb gcc: Clean-up of non-C++0x compliant code, first steps
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
2012-03-19 06:36:09 -04:00
Geoffrey Blake 043709fdfa CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files.  Also merges with the SE/FS changes.
2012-03-09 09:59:27 -05:00
Andreas Hansson 9f07d2ce7e CPU: Round-two unifying instr/data CPU ports across models
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.

The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
2012-02-24 11:42:00 -05:00
Ali Saidi 8aaa39e93d mem: Add a master ID to each request object.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
2012-02-12 16:07:38 -06:00
Gabe Black f2b46fdb85 Faults: Turn off arch/faults.hh
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
2012-02-07 04:43:21 -08:00
Gabe Black ea8b347dc5 Merge with head, hopefully the last time for this batch. 2012-01-31 22:40:08 -08:00
Geoffrey Blake af6aaf2581 CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU.  These changes have only been tested with the ARM ISA.  Other
ISAs potentially require modification.
2012-01-31 07:46:03 -08:00
Gabe Black 85424bef19 SE/FS: Get rid of includes of config/full_system.hh. 2011-11-18 02:20:22 -08:00
Ali Saidi 649c239cee LSQ: Only trigger a memory violation with a load/load if the value changes.
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).

This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.

The performance improvement on SPEC benchmarks can be substantial (2-10%).
2011-09-13 12:58:08 -04:00
Gabe Black 49a7ed0397 StaticInst: Merge StaticInst and StaticInstBase.
Having two StaticInst classes, one nominally ISA dependent and the other ISA
dependent, has not been historically useful and makes the StaticInst class
more complicated that it needs to be. This change merges StaticInstBase into
StaticInst.
2011-09-09 02:40:11 -07:00
Gabe Black ec204f003c O3: Add a pointer to the macroop for a microop in the dyninst. 2011-08-14 04:08:14 -07:00
Gabe Black 16882b0483 Translation: Use a pointer type as the template argument.
This allows regular pointers and reference counted pointers without having to
use any shim structures or other tricks.
2011-08-07 09:21:48 -07:00
Gabe Black 6230668f5e O3: Get rid of the raw ExtMachInst constructor on DynInsts.
This constructor assumes that the ExtMachInst can be decoded directly into a
StaticInst that's useful to execute. With the advent of microcoded
instructions that's no longer true.
2011-08-02 11:51:16 -07:00
Gabe Black 3a1428365a ExecContext: Rename the readBytes/writeBytes functions to readMem and writeMem.
readBytes and writeBytes had the word "bytes" in their names because they
accessed blobs of bytes. This distinguished them from the read and write
functions which handled higher level data types. Because those functions don't
exist any more, this change renames readBytes and writeBytes to more general
names, readMem and writeMem, which reflect the fact that they are how you read
and write memory. This also makes their names more consistent with the
register reading/writing functions, although those are still read and set for
some reason.
2011-07-02 22:35:04 -07:00
Gabe Black 2e7426664a ExecContext: Get rid of the now unused read/write templated functions. 2011-07-02 22:34:58 -07:00
Ali Saidi 5962fecc1d CPU: Remove references to memory copy operations 2011-04-04 11:42:26 -05:00
Ali Saidi 7dde557fdc O3: Tighten memory order violation checking to 16 bytes.
The comment in the code suggests that the checking granularity should be 16
bytes, however in reality the shift by 8 is 256 bytes which seems much
larger than required.
2011-04-04 11:42:23 -05:00
Giacomo Gabrielli e2507407b1 O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) relies on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept into the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.
2011-02-11 18:29:35 -06:00
Ali Saidi e681c0f7b3 O3: Support squashing all state after special instruction
For SPARC ASIs are added to the ExtMachInst. If the ASI is changed simply
marking the instruction as Serializing isn't enough beacuse that only
stops rename. This provides a mechanism to squash all the instructions
and refetch them
2010-12-07 16:19:57 -08:00
Giacomo Gabrielli 719f9a6d4f O3: Make all instructions that write a misc. register not perform the write until commit.
ARM instructions updating cumulative flags (ARM FP exceptions and saturation
flags) are not serialized.

Added aliases for ARM FP exceptions and saturation flags in FPSCR.  Removed
write accesses to the FP condition codes for most ARM VFP instructions: only
VCMP and VCMPE instructions update the FP condition codes.  Removed a potential
cause of seg. faults in the O3 model for NEON memory macro-ops (ARM).
2010-12-07 16:19:57 -08:00