Commit graph

1275 commits

Author SHA1 Message Date
Steve Reinhardt 5200e04e92 arch, x86: add support for arrays as memory operands
Although the cache models support wider accesses, the ISA descriptions
assume that (for the most part) memory operands are integer types,
which makes it difficult to define instructions that do memory accesses
larger than 64 bits.

This patch adds some generic support for memory operands that are arrays
of uint64_t, and specifically a 'u2qw' operand type for x86 that is an
array of 2 uint64_ts (128 bits).  This support is unused at this point,
but will be needed shortly for cmpxchg16b.  Ideally the 128-bit SSE
memory accesses will also be rewritten to use this support.

Support for 128-bit accesses could also have been added using the gcc
__int128_t extension, which would have been less disruptive.  However,
although clang also supports __int128_t, it's still non-standard.
Also, more importantly, this approach creates a path to defining
256- and 512-byte operands as well, which will be useful for eventual
AVX support.
2016-02-06 17:21:20 -08:00
Steve Reinhardt 92b750d5ef syscall_emul: fix bug in aux vector initialization
Writing 16 bytes from an 8-byte source value is a bad idea.
This doesn't appear to have broken anything, but showed up
as spurious differences when tracediffing runs.
2016-02-06 17:21:20 -08:00
Steve Reinhardt 2d91e741e8 x86: create function to check miscreg validity
In the process of trying to get rid of an '== false' comparison,
it became apparent that a slightly more involved solution was
needed.  Split this out into its own changeset since it's not
a totally trivial local change like the others.
2016-02-06 17:21:20 -08:00
Steve Reinhardt 5592798865 style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06 17:21:19 -08:00
Steve Reinhardt dc8018a5c3 style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-02-06 17:21:18 -08:00
Steve Reinhardt 1b6355c895 cpu. arch: add initiateMemRead() to ExecContext interface
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.

This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.

This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().

For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level.  For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
2016-01-17 18:27:46 -08:00
Steve Reinhardt e595d9cccb arch: don't call *Timing functions from *Atomic versions
The readMemAtomic/writeMemAtomic helper functions were calling
readMemTiming/writeMemTiming respectively.  This is functionally
correct, since the *Timing functions are doing the same access
initiation operation as the *Atomic functions (just that the
*Atomic versions also complete the access in line).  It also
provides for some (very minimal) code reuse.  Unfortunately,
it's potentially pretty confusing, since it makes it look like
the atomic accesses are somehow being converted to timing
accesses.  It also gets in the way of specializing the timing
interface (as will be done in a future patch).
2016-01-17 18:27:46 -08:00
Steve Reinhardt fb0383bc72 arch: get rid of unused LargestRead typedef 2016-01-17 18:27:46 -08:00
Gabor Dozsa e677494260 pseudo inst,util: Add optional key to initparam pseudo instruction
The key parameter can be used to read out various config parameters from
within the simulated software.
2016-01-07 16:33:47 -06:00
Boris Shingarov d765dbf22c arm: remote GDB: rationalize structure of register offsets
Currently, the wire format of register values in g- and G-packets is
modelled using a union of uint8/16/32/64 arrays.  The offset positions
of each register are expressed as a "register count" scaled according
to the width of the register in question.  This results in counter-
intuitive and error-prone "register count arithmetic", and some
formats would even be altogether unrepresentable in such model, e.g.
a 64-bit register following a 32-bit one would have a fractional index
in the regs64 array.
Another difficulty is that the array is allocated before the actual
architecture of the workload is known (and therefore before the correct
size for the array can be calculated).

With this patch I propose a simpler mechanism for expressing the
register set structure.  In the new code, GdbRegCache is an abstract
class; its subclasses contain straightforward structs reflecting the
register representation.  The determination whether to use e.g. the
AArch32 vs. AArch64 register set (or SPARCv8 vs SPARCv9, etc.) is made
by polymorphically dispatching getregs() to the concrete subclass.
The subclass is not instantiated until it is needed for actual
g-/G-packet processing, when the mode is already known.

This patch is not meant to be merged in on its own, because it changes
the contract between src/base/remote_gdb.* and src/arch/*/remote_gdb.*,
so as it stands right now, it would break the other architectures.
In this patch only the base and the ARM code are provided for review;
once we agree on the structure, I will provide src/arch/*/remote_gdb.*
for the other architectures; those patches could then be merged in
together.

Review Request: http://reviews.gem5.org/r/3207/
Pushed by Joel Hestness <jthestness@gmail.com>
2015-12-18 15:12:07 -06:00
Swapnil Haria 08cec03f8e x86: Invalidating TLB entry on page fault
As per the x86 architecture specification, matching TLB entries need to be
invalidated on a page fault. For instance, after a page fault due to inadequate
protection bits on a TLB hit, the TLB entry needs to be invalidated. This
behavior is clearly specified in the x86 architecture manuals from both AMD and
Intel.  This invalidation is missing currently in gem5, due to which linux
kernel versions 3.8 and up cannot be simulated efficiently. This is exposed by
a linux optimisation in commit e4a1cc56e4d728eb87072c71c07581524e5160b1, which
removes a tlb flush on updating page table entries in x86.

Testing: Linux kernel versions 3.8 onwards were booting very slowly in FS mode,
due to repeated page faults (~300000 before the first print statement in a
bash file). Ensured that page fault rate drops drastically and observed
reduction in boot time from order of hours to minutes for linux kernel v3.8
and v3.11
2015-11-16 05:08:54 -06:00
Bjoern A. Zeeb f50e92d2c7 x86: cpuid: add family to warn() message
doCpuid() has to identical warn messages about unimplemented functions.  Add
the family to the log message to make them distinguishable.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-16 04:58:39 -06:00
Bjoern A. Zeeb 5c49635f20 x86: pagetable walker: fix typo in comment 2015-11-16 04:58:39 -06:00
Andreas Hansson b48ed9b6c2 x86: Add missing explicit overrides for X86 devices
Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
2015-10-23 09:51:12 -04:00
Andreas Hansson 22c04190c6 misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
2015-10-12 04:07:59 -04:00
Rekai Gonzalez Alberquilla d3d159749a isa: Add parameter to pick different decoder inside ISA
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
2015-10-09 14:50:54 -05:00
Steve Reinhardt a2c875c746 x86: implement rcpps and rcpss SSE insts
These are packed single-precision approximate reciprocal operations,
vector and scalar versions, respectively.

This code was basically developed by copying the code for
sqrtps and sqrtss.  The mrcp micro-op was simplified relative to
msqrt since there are no double-precision versions of this operation.
2015-10-06 17:26:50 -07:00
Steve Reinhardt 57b9f53afa x86: implement fild, fucomi, and fucomip x87 insts
fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.
2015-10-06 17:26:50 -07:00
Mitch Hayenga 9e07a7504c cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
2015-09-30 11:14:19 -05:00
Mitch Hayenga a5c4eb3de9 isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
2015-09-30 11:14:19 -05:00
David Hashe a2d9aae3c3 x86: x86 instruction-implementation bug fixes
Added explicit data sizes and an opcode type for correct execution.
2015-07-20 09:15:18 -05:00
David Hashe d0f6aad3c6 syscall: Add readlink to x86 with special case /proc/self/exe
This patch implements the correct behavior.
2015-07-20 09:15:18 -05:00
Nilay Vaish aafa5c3f86 revert 5af8f40d8f2c 2015-07-28 01:58:04 -05:00
Nilay Vaish 608641e23c cpu: implements vector registers
This adds a vector register type.  The type is defined as a std::array of a
fixed number of uint64_ts.  The isa_parser.py has been modified to parse vector
register operands and generate the required code.  Different cpus have vector
register files now.
2015-07-26 10:21:20 -05:00
Nilay Vaish 0ef3dcc27b x86: decode instructions with vex prefix
This patch updates the x86 decoder so that it can decode instructions with vex
prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3.
Note that none of the instructions have been implemented yet. The
implementations would be provided in due course of time.
2015-07-17 11:31:22 -05:00
Andreas Sandberg 76cd4393c0 sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

  * Add a set of APIs to serialize into a subsection of the current
    object. Previously, objects that needed this functionality would
    use ad-hoc solutions using nameOut() and section name
    generation. In the new world, an object that implements the
    interface has the methods serializeSection() and
    unserializeSection() that serialize into a named /subsection/ of
    the current object. Calling serialize() serializes an object into
    the current section.

  * Move the name() method from Serializable to SimObject as it is no
    longer needed for serialization. The fully qualified section name
    is generated by the main serialization code on the fly as objects
    serialize sub-objects.

  * Add a scoped ScopedCheckpointSection helper class. Some objects
    need to serialize data structures, that are not deriving from
    Serializable, into subsections. Previously, this was done using
    nameOut() and manual section name generation. To simplify this,
    this changeset introduces a ScopedCheckpointSection() helper
    class. When this class is instantiated, it adds a new /subsection/
    and subsequent serialization calls during the lifetime of this
    helper class happen inside this section (or a subsection in case
    of nested sections).

  * The serialize() call is now const which prevents accidental state
    manipulation during serialization. Objects that rely on modifying
    state can use the serializeOld() call instead. The default
    implementation simply calls serialize(). Note: The old-style calls
    need to be explicitly called using the
    serializeOld()/serializeSectionOld() style APIs. These are used by
    default when serializing SimObjects.

  * Both the input and output checkpoints now use their own named
    types. This hides underlying checkpoint implementation from
    objects that need checkpointing and makes it easier to change the
    underlying checkpoint storage code.
2015-07-07 09:51:03 +01:00
Nikos Nikoleris 67925a8334 x86: Adjust the size of the values written to the x87 misc registers
All x87 misc registers are implemented in an array of 64 bit values
but in real hardware the size of some of these registers is smaller.
Previsouly all 64 bits where incorrectly set and then later read.  To
ensure correctness we mask the value in setMiscRegNoEffect to write
only the valid bits.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-07-04 10:43:47 -05:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seems to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Steve Reinhardt c65fa3dceb syscall_emul: fix warn_once behavior
The current ignoreWarnOnceFunc doesn't really work as expected,
since it will only generate one warning total, for whichever
"warn-once" syscall is invoked first.  This patch fixes that
behavior by keeping a "warned" flag in the SyscallDesc object,
allowing suitably flagged syscalls to warn exactly once per
syscall.
2015-05-05 09:25:59 -07:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
Andreas Hansson 554ddc7c07 arch, cpu: Do not forward snoops to table walker
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.

There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.

Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
2015-05-05 03:22:27 -04:00
Nilay Vaish ee06fed656 x86: change divide-by-zero fault to divide-error
Same exception is raised whether division with zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
2015-04-29 22:35:22 -05:00
Andreas Hansson 179787f31f misc: Appease gcc 5.1 without moving GDB_REG_BYTES
This patch rolls back the move of the GDB_REG_BYTES constant, and
instead adds M5_VAR_USED.
2015-04-24 03:30:08 -04:00
Andreas Hansson c8c4f66889 misc: Appease gcc 5.1
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.

First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.

Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
2015-04-23 13:37:46 -04:00
Brandon Potter 4991c29965 syscall_emul: implement clock_gettime system call 2015-04-22 07:51:27 -07:00
Monir Mozumder 00e3cab8fc syscall_emul: update x86 syscall table
Update table with additional definitions through Linux 3.13.
2015-04-22 07:51:27 -07:00
Nilay Vaish e596e52498 x86: implements x87 mult/div instructions 2015-04-13 17:33:57 -05:00
Lena Olson 333988a73e x86: fix debug trace output for mwait
When running with the Exec flag, the mwait instruction attempted
to print out its source registers, which were never actually
initialized. This led to sporadic assertion failures when the
value stored there was invalid.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03 11:42:10 -05:00
Steve Reinhardt 6677b9122a mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW
Makes x86-style locked operations even more distinct from
LLSC operations.  Using "locked" by itself should be
obviously ambiguous now.
2015-03-23 16:14:20 -07:00
Andreas Hansson f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
Andreas Hansson d0e1b8a19c arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-16 03:33:28 -05:00
Marco Balboni 268d9e59c5 mem: Clarification of packet crossbar timings
This patch clarifies the packet timings annotated
when going through a crossbar.

The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.

The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.

For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
2015-02-11 10:23:47 -05:00
Andreas Sandberg 550c318490 sim: Move the BaseTLB to src/arch/generic/
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.

--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-11 10:23:27 -05:00
Ali Saidi 0bd986015b cpu: Put all CPU instruction tracers in a single file 2015-01-25 07:22:17 -05:00
Andreas Hansson 10c69bb168 mem: Remove unused Packet src and dest fields
This patch takes the final step in removing the src and dest fields in
the packet. These fields were rather confusing in that they only
remember a single multiplexing component, and pushed the
responsibility to the bridge and caches to store the fields in a
senderstate, thus effectively creating a stack. With the recent
changes to the crossbar response routing the crossbar is now
responsible without relying on the packet fields. Thus, these
variables are now unused and can be removed.
2015-01-22 05:01:31 -05:00
Andreas Hansson ce12d4bc63 x86: Delay X86 table walk on receiving walker response
This patch fixes a minor issue in the X86 page table walker where it
ended up sending new request packets to the crossbar before the
response processing was finished (recvTimingResp is directly calling
sendTimingReq). Under certain conditions this caused the crossbar to
see illegal combinations of request/response overlap, in turn causing
problems with a slightly modified crossbar implementation.
2015-01-22 05:00:54 -05:00
Emilio Castillo 7bb65dd434 x86 : fxsave and fxrestore missing template code
This patch corrects the FXSAVE and FXRSTOR Macroops.  The actual code used for
saving/restore the FP registers is in the file but it was not used.

The FXSAVE and FXRSTOR instructions are used in the kernel for saving and
loading the state of the mmx,xmm and fpu registers.

This operation is triggered in FS by issuing a Device Not Available Fault.  The
cr0 register has a TS flag that is set upon each context change. Every time a
task access any FP related register (SIMD as well) if the TS flag is set to
one, the device not available fault is issued.  The kernel saves the current
state of the registers, and restore the previous state of the currently running
task.

Right now Gem5 lacks of this capability. the Device Not Available Fault is
never issued, leading to several problems when different threads share the same
CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this
behavior.

In order to test this a hack in the atomic cpu code was done to detect if a
static instruction has any FP operands and the cr0 reg TS bit is set.  This
check must be done in the ISA dependent code. But it seems to be tricky to
access the cr0 register while executing an instruction.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-10 14:30:53 -06:00
Gabe Black cd6380605c x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.
These are for the monitor/mwait instructions, SSSE3, and XSAVE.
2015-01-06 22:15:00 -08:00
Gabe Black cb181d6f91 cpuid, x86: Revert "Enabling more features in CPUid"
That change enables CPUID bits for features that aren't implemented in gem5.
If a simulated system tries to use those features because it was told it
could, bad things can happen.
2015-01-06 22:13:56 -08:00
Maxime Martinasso 5a5416d575 x86: implements the simd128 ADDSUBPD instruction
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.

Tested with a simple program in assembly language which executes the
instruction.  Checked that different versions of the instruction are executed
by using the execution tracing option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu
2015-01-03 17:51:48 -06:00