Commit graph

279 commits

Author SHA1 Message Date
Jason Lowe-Power 153e5879c6 x86: Fix implicit stack addressing in 64-bit mode
When in 64-bit mode, if the stack is accessed implicitly by an instruction
the alternate address prefix should be ignored if present.

This patch adds an extra flag to the ldstop which signifies when the
address override should be ignored. Then, for all of the affected
instructions, this patch adds two options to the ld and st opcode to use
the current stack addressing mode for all addresses and to ignore the
AddressSizeFlagBit.  Finally, this patch updates the x86 TLB to not
truncate the address if it is in 64-bit mode and the IgnoreAddrSizeFlagBit
is set.

This fixes a problem when calling __libc_start_main with a binary that is
linked with a recent version of ld. This version of ld uses the address
override prefix (0x67) on the call instruction instead of a nop.

Note: This has not been tested in compatibility mode and only the call
instruction with the address override prefix has been tested.

See [1] page 9 (pdf page 45)

For instructions that are affected see [1] page 519 (pdf page 555).

[1] http://support.amd.com/TechDocs/24594.pdf

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2017-02-10 11:19:34 -05:00
Tony Gutierrez 0799600686 x86: fix issue with casting in Cvtf2i
UBSAN flags this operation because it detects that arg is being cast directly
to an unsigned type, argBits. this patch fixes this by first casting the
value to a signed int type, then reintrepreting the raw bits of the signed
int into argBits.
2016-11-21 15:35:56 -05:00
Alexandru Dutu 0f27d70e90 x86: revamp cmpxchg8b/cmpxchg16b implementation
The previous implementation did a pair of nested RMW operations,
which isn't compatible with the way that locked RMW operations are
implemented in the cache models.  It was convenient though in that
it didn't require any new micro-ops, and supported cmpxchg16b using
64-bit memory ops.  It also worked in AtomicSimpleCPU where
atomicity was guaranteed by the core and not by the memory system.
It did not work with timing CPU models though.

This new implementation defines new 'split' load and store micro-ops
which allow a single memory operation to use a pair of registers as
the source or destination, then uses a single ldsplit/stsplit RMW
pair to implement cmpxchg.  This patch requires support for 128-bit
memory accesses in the ISA (added via a separate patch) to support
cmpxchg16b.
2016-02-06 17:21:20 -08:00
Steve Reinhardt 5200e04e92 arch, x86: add support for arrays as memory operands
Although the cache models support wider accesses, the ISA descriptions
assume that (for the most part) memory operands are integer types,
which makes it difficult to define instructions that do memory accesses
larger than 64 bits.

This patch adds some generic support for memory operands that are arrays
of uint64_t, and specifically a 'u2qw' operand type for x86 that is an
array of 2 uint64_ts (128 bits).  This support is unused at this point,
but will be needed shortly for cmpxchg16b.  Ideally the 128-bit SSE
memory accesses will also be rewritten to use this support.

Support for 128-bit accesses could also have been added using the gcc
__int128_t extension, which would have been less disruptive.  However,
although clang also supports __int128_t, it's still non-standard.
Also, more importantly, this approach creates a path to defining
256- and 512-byte operands as well, which will be useful for eventual
AVX support.
2016-02-06 17:21:20 -08:00
Steve Reinhardt dc8018a5c3 style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-02-06 17:21:18 -08:00
Steve Reinhardt 1b6355c895 cpu. arch: add initiateMemRead() to ExecContext interface
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.

This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.

This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().

For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level.  For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
2016-01-17 18:27:46 -08:00
Steve Reinhardt a2c875c746 x86: implement rcpps and rcpss SSE insts
These are packed single-precision approximate reciprocal operations,
vector and scalar versions, respectively.

This code was basically developed by copying the code for
sqrtps and sqrtss.  The mrcp micro-op was simplified relative to
msqrt since there are no double-precision versions of this operation.
2015-10-06 17:26:50 -07:00
Steve Reinhardt 57b9f53afa x86: implement fild, fucomi, and fucomip x87 insts
fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.
2015-10-06 17:26:50 -07:00
Nilay Vaish ee06fed656 x86: change divide-by-zero fault to divide-error
Same exception is raised whether division with zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
2015-04-29 22:35:22 -05:00
Steve Reinhardt 6677b9122a mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW
Makes x86-style locked operations even more distinct from
LLSC operations.  Using "locked" by itself should be
obviously ambiguous now.
2015-03-23 16:14:20 -07:00
Andreas Hansson a2d246b6b8 arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
2014-10-16 05:49:51 -04:00
Mitch Hayenga afbae1ec95 x86: Flag instructions that call suspend as IsQuiesce
The o3 cpu relies upon instructions that suspend a thread context being
flagged as "IsQuiesce".  If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.
2014-09-03 07:42:46 -04:00
Nilay Vaish 4ccdf8fb81 x86: set op class of two fp instructions
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
2014-09-01 16:55:49 -05:00
Curtis Dunham fe27f937aa arch: teach ISA parser how to split code across files
This patch encompasses several interrelated and interdependent changes
to the ISA generation step.  The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units.  I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner.  It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type.  Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths.  In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
2014-05-09 18:58:47 -04:00
Curtis Dunham 7f1603d207 arch: remove inline specifiers on all inst constrs, all ISAs
With (upcoming) separate compilation, they are useless.  Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
2014-05-09 18:58:46 -04:00
Nilay Vaish 4eb3b1ed0b x86: correct error in emms instruction. 2014-01-27 18:50:51 -06:00
Andreas Sandberg 114b643dd0 x86: Add support for FXSAVE, FXSAVE64, FXRSTOR, and FXRSTOR64 2013-09-30 12:06:36 +02:00
Andreas Sandberg 654d1e675a x86: Add support for loading 32-bit and 80-bit floats in the x87
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.

80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
2013-09-30 12:00:20 +02:00
Andreas Sandberg c299dcedc6 x86: Fix re-entrancy problems in x87 store instructions
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
2013-09-30 11:51:25 +02:00
Andreas Sandberg d06064c386 x86: Add support for maintaining the x87 tag word
The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.

The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.
2013-06-18 16:36:08 +02:00
Andreas Sandberg a8e8c4f433 x86: Fix loading of floating point constants
This changeset actually fixes two issues:

 * The lfpimm instruction didn't work correctly when applied to a
   floating point constant (it did work for integers containing the
   bit string representation of a constant) since it used
   reinterpret_cast to convert a double to a uint64_t. This caused a
   compilation error, at least, in gcc 4.6.3.

 * The instructions loading floating point constants in the x87
   processor didn't work correctly since they just stored a truncated
   integer instead of a double in the floating point register. This
   changeset fixes the old microcode by using lfpimm instruction
   instead of the limm instructions.
2013-06-18 16:30:06 +02:00
Andreas Sandberg 5d584934ad x86: Make fprem like the fprem on a real x87
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
2013-06-18 16:10:42 +02:00
Andreas Sandberg de89e133d8 x86: Fix the flag handling code in FABS and FCHS
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
2013-06-18 16:10:21 +02:00
Nilay Vaish fba40864aa x86: add op class for int and fp microops in isa description
Currently all the integer microops are marked as IntAluOp and the floating
point microops are marked as FloatAddOp. This patch adds support for marking
different microops differently. Now IntMultOp, IntDivOp, FloatDivOp,
FloatMultOp, FloatCvtOp, FloatSqrtOp classes will be used as well. This will
help in providing different latencies for different op class.
2013-05-21 11:33:57 -05:00
Nilay Vaish 5c940fec0a x86: implement some of the x87 instructions
This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
2013-03-11 13:15:46 -05:00
Nilay Vaish 7f5463539b x86: implements emms instruction 2013-01-15 07:43:20 -06:00
Nilay Vaish 91b00d98a5 x86: implement fabs, fchs instructions 2013-01-15 07:43:19 -06:00
Nilay Vaish 23ba6fc5fb x86: implement x87 fp instruction fsincos
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
2012-12-30 12:45:45 -06:00
Nilay Vaish f47c2f6415 X86: make use of register predication
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
2012-09-11 09:33:42 -05:00
Nilay Vaish 6369df59c8 x86: Add a separate register for D flag bit
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
2012-09-11 09:25:43 -05:00
Nilay Vaish 4d4d212ae9 X86: Split Condition Code register
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
2012-05-22 11:29:53 -05:00
Andreas Hansson b6aa6d55eb clang/gcc: Fix compilation issues with clang 3.0 and gcc 4.6
This patch addresses a number of minor issues that cause problems when
compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it
avoids using the deprecated ext/hash_map and instead uses
unordered_map (and similarly so for the hash_set). To make use of the
new STL containers, g++ and clang has to be invoked with "-std=c++0x",
and this is now added for all gcc versions >= 4.6, and for clang >=
3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1
unordered_map to avoid the deprecation warning.

The addition of c++0x in turn causes a few problems, as the
compiler is more stringent and adds a number of new warnings. Below,
the most important issues are enumerated:

1) the use of namespaces is more strict, e.g. for isnan, and all
   headers opening the entire namespace std are now fixed.

2) another other issue caused by the more stringent compiler is the
   narrowing of the embedded python, which used to be a char array,
   and is now unsigned char since there were values larger than 128.

3) a particularly odd issue that arose with the new c++0x behaviour is
   found in range.hh, where the operator< causes gcc to complain about
   the template type parsing (the "<" is interpreted as the beginning
   of a template argument), and the problem seems to be related to the
   begin/end members introduced for the range-type iteration, which is
   a new feature in c++11.

As a minor update, this patch also fixes the build flags for the clang
debug target that used to be shared with gcc and incorrectly use
"-ggdb".
2012-04-14 05:43:31 -04:00
Gabe Black a7859f7e45 X86: Fix address size handling so real mode works properly.
Virtual (pre-segmentation) addresses are truncated based on address size, and
any non-64 bit linear address is truncated to 32 bits. This means that real
mode addresses aren't truncated down to 16 bits after their segment bases are
added in.
2012-03-31 12:27:33 -07:00
Gabe Black 559b43a372 X86: Use the M5PanicFault fault in execute methods instead of calling panic.
If an instruction is executed speculatively and hits a situation where it
wants to panic, it should return a fault instead. If the instruction was
misspeculated, the fault can be thrown away. If the instruction wasn't
misspeculated, the fault will be invoked and the panic will still happen.
2012-02-26 15:32:53 -08:00
Gabe Black 93fb460fad X86: Fix a bad segmentation check for the stack segment.
--HG--
extra : rebase_source : 755f4f6eae52f88ed516a1f1ac9e2565725d89c1
2011-12-01 00:17:14 -05:00
Nilay Vaish 582ea4d543 x86: Add microop for fence
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
2011-11-03 22:52:21 -05:00
Gabe Black d735abe5da GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions.
2011-10-31 01:09:44 -07:00
Gabe Black 997cbe1c09 ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.
By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.

Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.
2011-09-26 23:48:54 -07:00
Gabe Black aade13769f ISA: Use readBytes/writeBytes for all instruction level memory operations. 2011-07-02 22:34:29 -07:00
Gabe Black 2f72d6a1f4 X86: Fix store microops so they don't drop faults in timing mode.
If a fault was returned by the CPU when a store initiated it's write, the
store instruction would ignore the fault. This change fixes that.
2011-07-02 22:31:22 -07:00
Gabe Black efb9f7c2ae X86: Eliminate an unused argument for building store microops. 2011-06-21 19:28:14 -07:00
Gabe Black 2e4fb3f139 X86: Mark IO reads and writes as non-speculative. 2011-03-01 22:42:59 -08:00
Gabe Black 72d35701e9 X86: Mark prefetches as such in their instruction and request flags. 2011-03-01 22:42:18 -08:00
Gabe Black fde8b5c387 X86: Get rid of "inline" on the MicroPanic constructor in decoder.cc.
This was making certain versions of gcc omit the function from the object file
which would break the build.
2011-02-15 15:58:16 -08:00
Gabe Black bce2be525d X86: Put the result used for flags in an intermediate variable.
Using the destination register directly causes the ISA parser to treat it as a
source even if none of the original bits are used.
2011-02-13 17:45:12 -08:00
Gabe Black 4e1adf85f7 X86: Don't read in dest regs if all bits are replaced.
In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or
64 bits wide overwrite all bits of the destination register. This change
removes false dependencies in these cases where the previous value of a
register doesn't need to be read to write a new value. New versions of most
microops are created that have a "Big" suffix which simply overwrite their
destination, and the right version to use is selected during microop
allocation based on the selected data size.

This does not change the performance of the O3 CPU model significantly, I
assume because there are other false dependencies from the condition code bits
in the flags register.
2011-02-13 17:44:24 -08:00
Gabe Black 1aa9698fa0 X86: Define fault objects to carry debug messages.
These faults can panic/warn/warn_once, etc., instead of instructions doing
that themselves directly. That way, instructions can be speculatively
executed, and only if they're actually going to commit will their fault be
invoked and the panic, etc., happen.
2011-02-13 17:42:05 -08:00
Brad Beckmann afd754dc0d x86: set IsCondControl flag for the appropriate microops 2011-02-06 22:14:16 -08:00
Gabe Black cb22bead7d X86: Get rid of the stupd microop. 2011-02-02 19:57:12 -08:00
Gabe Black d3e021820e X86: Take advantage of new PCState syntax. 2010-12-08 00:27:23 -08:00