Commit graph

2941 commits

Author SHA1 Message Date
Andreas Sandberg 9b7578d8c7 arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0
The aarch64 system register decoder is currently not decoding
PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the
decoder so that they are decoded using the values in table C5-6 in ARM
DDI 0478A.c.
2014-12-08 04:49:53 -05:00
Gabe Black 4a8a0a0798 misc: Generalize GDB single stepping.
The new single stepping implementation for x86 doesn't rely on any ISA
specific properties or functionality. This change pulls out the per ISA
implementation of those functions and promotes the X86 implementation to the
base class.

One drawback of that implementation is that the CPU might stop on an
instruction twice if it's affected by both breakpoints and single stepping.
While that might be a little surprising, it's harmless and would only happen
under somewhat unlikely circumstances.
2014-12-05 22:37:03 -08:00
Gabe Black fb07d43b1a x86: Implement a remote GDB stub.
This stub should allow remote debugging of 32 bit and 64 bit targets. Single
stepping seems to work, as do breakpoints. If both breakpoints and single
stepping affect an instruction, gdb will stop at the instruction twice before
continuing. That's a little surprising, but is generally harmless.
2014-12-05 22:36:16 -08:00
Gabe Black fe48c0a32b misc: Make the GDB register cache accessible in various sized chunks.
Not all ISAs have 64 bit sized registers, so it's not always very convenient
to access the GDB register cache in 64 bit sized chunks. This change makes it
accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations
were working around that limitation by bundling and unbundling 32 bit values
into 64 bit values. That code has been removed.
2014-12-05 01:44:24 -08:00
Gabe Black 22aaa5867f x86: Rework opcode parsing to support 3 byte opcodes properly.
Instead of counting the number of opcode bytes in an instruction and recording
each byte before the actual opcode, we can represent the path we took to get to
the actual opcode byte by using a type code. That has a couple of advantages.
First, we can disambiguate the properties of opcodes of the same length which
have different properties. Second, it reduces the amount of data stored in an
ExtMachInst, making them slightly easier/faster to create and process. This
also adds some flexibility as far as how different types of opcodes are
handled, which might come in handy if we decide to support VEX or XOP
instructions.

This change also adds tables to support properly decoding 3 byte opcodes.
Before we would fall off the end of some arrays, on top of the ambiguity
described above.

This change doesn't measureably affect performance on the twolf benchmark.

--HG--
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-04 15:53:54 -08:00
Gabe Black 3069c28a02 arch: Allow named constants as decode case values.
The values in a "bitfield" or in an ExtMachInst structure member may not be a
literal value, it might select from an arbitrary collection of options. Instead
of using the raw value of those constants in the decoder, it's easier to tell
what's going on if they can be referred to as a symbolic constant/enum.

To support that, the ISA description language is extended slightly so that in
addition to integer literals, the case value for decode blobs can also be a
string literal. It's up to the ISA author to ensure that the string evaluates
to a legal constant value when interpretted as C++.
2014-12-04 15:52:48 -08:00
Gabe Black d67cf81f5d x86: Clean up style in process.cc. 2014-12-02 22:01:51 -08:00
Andrew Bardsley 3cd0b1f6a6 arm: Fix TLB ignoring faults when table walking
This patch fixes a case where the Minor CPU can deadlock due to the lack
of a response to TLB request because of a bug in fault handling in the ARM
table walker.

TableWalker::processWalkWrapper is the scheduler-called wrapper which
handles deferred walks which calls to TableWalker::wait cannot immediately
process.  The handling of faults generated by processWalk{AArch64,LPAE,}
calls in those two functions is is different.  processWalkWrapper ignores
fault returns from processWalk... which can lead to ::finish not being
called on a translation.

This fix provides fault handling in processWalkWrapper similar to that
found in the leaf functions which BaseTLB::Translation::finish.
2014-12-02 06:08:11 -05:00
Andreas Hansson 74bbe20141 cpu: Always mask the snoop address when performing lock check
Ensure the snoop address check is always using a cache-block aligned
address. This patch updates Alpha and Mips to match the other ISAs.
2014-12-02 06:08:00 -05:00
Alexandru Dutu 1f539f13c3 mem: Page Table map api modification
This patch adds uncacheable/cacheable and read-only/read-write attributes to
the map method of PageTableBase. It also modifies the constructor of TlbEntry
structs for all architectures to consider the new attributes.
2014-11-23 18:01:09 -08:00
Alexandru Dutu f743bdcb69 x86: Segment initialization to support KvmCPU in SE
This patch sets up low and high privilege code and data segments and places them
in the following order: cs low, ds low, ds, cs, in the GDT. Additionally, a
syscall and page fault handler for KvmCPU in SE mode are defined. The order of
the segment selectors in GDT is required in this manner for interrupt handling
to work properly. Segment initialization is done for all the thread
contexts.
2014-11-23 18:01:08 -08:00
Alexandru Dutu adbaa4dfde kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall
instructions and page faults. These types of exits will be encountered if
KvmCPU is run in SE mode.
2014-11-23 18:01:08 -08:00
Alexandru Dutu 335514dfdc cpuid, x86: Enabling more features in CPUid
Adding more features in the CPUid with the purpose of supporting running the
KvmCPU in SE mode.
2014-11-23 18:01:08 -08:00
Gabe Black aceeecb192 x86: Fix setting segment bases in real mode.
The data size used for actually writing the base value for the segment was the
default size, but really it should set the entire value without any possible
truncation.
2014-11-17 01:00:53 -08:00
Gabe Black f8603fa120 x86: Fix some bugs in the real mode far jmp instruction.
The far pointer should be shifted right to get the selector value, not left.
Also, when calculating the width of the offset, the wrong register was used in
one spot.
2014-11-17 00:20:01 -08:00
Gabe Black 7739c24fbe x86: APIC: Only set deliveryStatus if our IPI is going somewhere.
Otherwise the IPI which isn't sent will never arrive, and the deliveryStatus
bit will never be cleared.
2014-11-17 00:19:07 -08:00
Gabe Black 79e7ca307e x86: APIC: Fix the getRegArrayBit function.
The getRegArrayBit function extracts a bit from a series of registers which
are treated as a single large bit array. A previous change had modified the
logic which figured out which bit to extract from ">> 5" to "% 5" which seems
wrong, especially when other, similar functions were changed to use "% 32".
2014-11-17 00:17:06 -08:00
Gabe Black d228db1143 x86: Fix the CPUID Long Mode Address Size function.
The value in EAX has an 8 bit field for the linear address size and one for
the physical address size when calling that function. A recent change
implemented it but returned 0xff for both of those fields. That implies that
linear and physical addresses are 255 bits wide which is wrong. When using the
KVM CPU model this causes an error, presumably because some of those bits are
actually reserved, or the CPU or kernel realizes 255 bits is a bad value.

This change makes those values 48.
2014-11-16 23:12:42 -08:00
Andreas Hansson 481eb6ae80 arm: Fixes based on UBSan and static analysis
Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
2014-11-14 03:53:51 -05:00
Marc Orr bf80734b2c x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
   evicted from the sleeping cpu's cache. This eviction is forwarded to the
   sleeping cpu, which then wakes up.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06 05:42:22 -06:00
Ali Saidi 7a0bf814b6 automated merge 2014-10-29 23:22:26 -05:00
Ali Saidi f2db2a96d1 arm, tests: Update config files to more recent kernels and create 64-bit regressions.
This changes the default ARM system to a Versatile Express-like system that supports
2GB of memory and PCI devices and updates the default kernels/file-systems for
AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some
platforms that are no longer supported have been pruned from the configuration files.

In addition a set of 64-bit ARM regressions have been added to the regression system.
2014-10-29 23:18:27 -05:00
Ali Saidi b31d9e93e2 arm, mem: Fix drain bug and provide drain prints for more components. 2014-10-29 23:18:26 -05:00
Ali Saidi baf88e908d arm: Fix multi-system AArch64 boot w/caches.
Automatically extract cpu release address from DTB file.
Check SCTLR_EL1 to verify all caches are enabled.
2014-10-29 23:18:26 -05:00
Ali Saidi 9900629f83 arm: Mark some miscregs (timer counter) registers at unverifiable.
The checker can't verify timer registers, so it should just grab the version
from the executing CPU, otherwise it could get a larger value and diverge
execution.
2014-10-29 23:18:24 -05:00
Nilay Vaish 6523aad25c sim: revert 6709bbcf564d
The identifier SYS_getdents is not available on Mac OS X.  Therefore, its use
results in compilation failure.  It seems there is no straight forward way to
implement the system call getdents using readdir() or similar C functions.
Hence the commit 6709bbcf564d is being rolled back.
2014-10-22 15:59:57 -05:00
Andreas Hansson d6f1c6ce89 x86: Fixes to avoid LTO warnings
This patch fixes a few minor issues that caused link-time warnings
when using LTO, mainly for x86. The most important change is how the
syscall array is created. Previously gcc and clang would complain that
the declaration and definition types did not match. The organisation
is now changed to match how it is done for ARM, moving the code that
was previously in syscalls.cc into process.cc, and having a class
variable pointing to the static array.

With these changes, there are no longer any warnings using gcc 4.6.3
with LTO.
2014-10-20 18:03:56 -04:00
Michael Adler a3fe4c0662 sim: implement getdents/getdents64 in user mode
Has been tested only for alpha.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-10-20 16:44:53 -05:00
Andreas Hansson a2d246b6b8 arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
2014-10-16 05:49:51 -04:00
Andreas Hansson 2475862747 arch,x86,mem: Dynamically determine the ISA for Ruby store check
This patch makes the memory system ISA-agnostic by enabling the Ruby
Sequencer to dynamically determine if it has to do a store check. To
enable this check, the ISA is encoded as an enum, and the system
is able to provide the ISA to the Sequencer at run time.

--HG--
rename : src/arch/x86/insts/microldstop.hh => src/arch/x86/ldstflags.hh
2014-10-16 05:49:44 -04:00
Andreas Sandberg 37908d62a4 arm: Add helper methods to setup architected PMU events 2014-10-16 05:49:42 -04:00
Andreas Sandberg 9d35d48e84 arm: Add TLB PMU probes
This changeset adds probe points that can be used to implement PMU
counters for TLB stats. The following probes are supported:

* ArmISA::TLB::ppRefills / TLB Refills (TLB insertions)
2014-10-16 05:49:41 -04:00
Andreas Sandberg 3697990c27 arm: Add a model of an ARM PMUv3
This class implements a subset of the ARM PMU v3 specification as
described in the ARMv8 reference manual. It supports most of the
features of the PMU, however the following features are known to be
missing:

 * Event filtering (e.g., from different privilege levels).
 * Access controls (the PMU currently ignores the execution level).
 * The chain counter (event no. 0x1E) is unimplemented.

The PMU itself does not implement any events, it merely provides an
interface for the configuration scripts to hook up probes that drive
events. Configuration scripts should call addEventProbe() to configure
custom events or high-level methods to configure architected
events. The Python implementation of addEventProbe() automatically
delays event type registration until after instantiation.

In order to support CPU switching and some combined counters (e.g.,
memory references synthesized from loads and stores), the PMU allows
multiple probes per event type. When creating a system that switches
between CPU models that share the same PMU, PMU events for all of the
CPU models can be registered with the PMU.

Kudos to Matt Horsnell for the initial gem5 implementation of the PMU.
2014-10-16 05:49:39 -04:00
Akash Bagdia 8b7724d04c arm: Don't speculatively access most miscregisters.
Speculative exeuction can cause panics in detailed execution mode that
shouldn't happen.
2014-09-02 11:26:32 +01:00
Jiuyue Ma 9fb8b8515b x86: add LongModeAddressSize function to cpuid
LongModeAddressSize was used by kernel 2.6.28.4 for physical address
validation, if not properly implemented, PCI resource allocation may
failed because of ioremap failed:

- linux-2.6.28.4/arch/x86/mm/ioremap.c:27-30
  27 static inline int phys_addr_valid(unsigned long addr)
  28 {
  29     return addr < (1UL << boot_cpu_data.x86_phys_bits);
  30 }

- linux-2.6.28.4/arch/x86/kernel/cpu/common.c:475-482
 475 #ifdef CONFIG_X86_64
 476         if (c->extended_cpuid_level >= 0x80000008) {
 477                 u32 eax = cpuid_eax(0x80000008);
 478
 479                 c->x86_virt_bits = (eax >> 8) & 0xff;
 480                 c->x86_phys_bits = eax & 0xff;
 481         }
 482 #endif

- linux-2.6.28.4/arch/x86/mm/ioremap.c:209-214
 209 if (!phys_addr_valid(phys_addr)) {
 210     printk(KERN_WARNING "ioremap: invalid physical address %llx\n",
 211            (unsigned long long)phys_addr);
 212     WARN_ON_ONCE(1);
 213     return NULL;
 214 }

This patch return 0x0000ffff for LongModeAddressSize, which guarantee phys_addr_valid never failed.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-06-13 16:48:47 +08:00
Andreas Hansson b520223699 arm: Use MiscRegIndex rather than int when flattening
Some additional type checking to avoid future issues.
2014-10-01 08:05:52 -04:00
Andreas Hansson 10f82934be arm: More UBSan cleanups after additional full-system runs
Some incorrect casting to IntRegIndex, and a few uninitialized members
in the i8254xGBe device.
2014-10-01 08:05:51 -04:00
Andreas Hansson ec41000dad arm: Fixed undefined behaviours identified by gcc
This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
2014-09-27 09:08:37 -04:00
Andreas Hansson 341dbf2662 arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
2014-09-27 09:08:36 -04:00
Andreas Hansson deb2200671 scons: Address issues related to gcc 4.9.1
Fix a number few minor issues to please gcc 4.9.1. Removing the
'-fuse-linker-plugin' flag means no libraries are part of the LTO
process, but hopefully this is an acceptable loss, as the flag causes
issues on a lot of systems (only certain combinations of gcc, ld and
ar work).
2014-09-27 09:08:34 -04:00
Mitch Hayenga e1403fc2af alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
2014-09-20 17:18:35 -04:00
Andreas Hansson 1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00
Andreas Hansson 41fc8a573e arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
2014-09-19 10:35:18 -04:00
Andrew Bardsley c8b919aba2 style: Fix line continuation, especially in debug messages
This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
2014-09-12 10:22:47 -04:00
Mitch Hayenga 8f95144e16 arm: Make memory ops work on 64bit/128-bit quantities
Multiple instructions assume only 32-bit load operations are available,
this patch increases load sizes to 64-bit or 128-bit for many load pair and
load multiple instructions.
2014-09-03 07:42:52 -04:00
Mitch Hayenga afbae1ec95 x86: Flag instructions that call suspend as IsQuiesce
The o3 cpu relies upon instructions that suspend a thread context being
flagged as "IsQuiesce".  If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.
2014-09-03 07:42:46 -04:00
Mitch Hayenga bb1e6cf7c4 arm: Fix v8 neon latency issue for loads/stores
Neon memory ops that operate on multiple registers currently have very poor
performance because of interleave/deinterleave micro-ops.

This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such
that they take minumum cycles to execute and are never resource constrained.

Additionaly the micro-ops over-read registers.  Although one form may need
to read up to 20 sources, not all do.  This adds in new forms so false
dependencies are not modeled.  Instructions read their minimum number of
sources.
2014-09-03 07:42:44 -04:00
Curtis Dunham 4a3f11149d arm: use condition code registers for ARM ISA
Analogous to ee049bf (for x86).  Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
2014-04-29 16:05:02 -05:00
Andrew Bardsley 035a82ee2c arm: ISA X31 destination register fix
This patch substituted the zero register for X31 used as a
destination register.  This prevents false dependencies based on
X31.
2014-09-03 07:42:43 -04:00
Mitch Hayenga 476c6fe368 arm: Mark v7 cbz instructions as direct branches
v7 cbz/cbnz instructions were improperly marked as indirect branches.
2014-09-03 07:42:40 -04:00
Mitch Hayenga fd722946dd arch: Properly guess OpClass from optional StaticInst flags
isa_parser.py guesses the OpClass if none were given based upon the StaticInst
flags.  The existing code does not take into account optionally set flags.
This code hoists the setting of optional flags so OpClass is properly assigned.
2014-09-03 07:42:32 -04:00
Curtis Dunham 12210ada54 arm: support 16kb vm granules 2014-05-27 11:00:56 -05:00
Andreas Sandberg 326662b01b arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
2014-09-03 07:42:22 -04:00
Andreas Hansson e1ac962939 arch: Cleanup unused ISA traits constants
This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
2014-09-03 07:42:21 -04:00
Mitch Hayenga 23c8540756 config: Change parsing of Addr so hex values work from scripts
When passed from a configuration script with a hexadecimal value (like
"0x80000000"), gem5 would error out. This is because it would call
"toMemorySize" which requires the argument to end with a size specifier (like
1MB, etc).

This modification makes it so raw hex values can be passed through Addr
parameters from the configuration scripts.
2014-09-03 07:42:20 -04:00
Andreas Hansson 1046b8d6e5 arm: Fix ExtMachInst hash operator underlying type
This patch fixes the hash operator used for ARM ExtMachInst, which
incorrectly was still using uint32_t. Instead of changing it to
uint64_t it is not using the underlying data type of the BitUnion.
2014-09-03 07:42:19 -04:00
Nilay Vaish 4ccdf8fb81 x86: set op class of two fp instructions
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
2014-09-01 16:55:49 -05:00
Alexandru 5efbb4442a mem: adding architectural page table support for SE mode
This patch enables the use of page tables that are stored in system memory
and respect x86 specification, in SE mode. It defines an architectural
page table for x86 as a MultiLevelPageTable class and puts a placeholder
class for other ISAs page tables, giving the possibility for future
implementation.
2014-08-28 10:11:44 -05:00
Andreas Sandberg 70176fecd1 base: Replace the internal varargs stuff with C++11 constructs
We currently use our own home-baked support for type-safe variadic
functions. This is confusing and somewhat limited (e.g., cprintf only
supports a limited number of arguments). This changeset converts all
uses of our internal varargs support to use C++11 variadic macros.
2014-08-26 10:13:45 -04:00
Mitch Hayenga 0da99b7e0c mips: Fix RLIMIT_RSS naming
MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with
RLIMIT_RSS from the host system.  Broke clang+MacOS build.
2014-08-26 10:13:31 -04:00
Andreas Sandberg a3d3eb0ff7 sparc: Fixup bit ordering in the PSTATE bit union
The order of the MSB and LSB bit of the mm field in the PSTATE union
is wrong. Any access to this field will currently be ignored and reads
will always return zero. This patch fixes the ordering so it is <MSB,
LSB> instead of <LSB, MSB>.
2014-08-26 10:13:23 -04:00
Dam Sunwoo b04d6c7c33 arm: change MISCREG_L2ERRSR to warn not fail
Some newer binaries compiled for Versatile Express TC2 contain access
to implementation specific L2MERRSR registers. This causes an infinite
loop of undefined exceptions. This patch changes the behavior to "warn
not fail" to keep the workloads going.
2014-08-13 06:57:36 -04:00
Andreas Sandberg 8b8d991df0 mips: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members if
they are not used. This changeset removes such members from the
MIPS-specific classes to silence the warning.
2014-08-13 06:57:30 -04:00
Andreas Sandberg 8d04e32a83 power: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members if
they are not used. This changeset removes such members from the
POWER-specific ProcessInfo struct to silence the warning.
2014-08-13 06:57:29 -04:00
Curtis Dunham 94daae6864 arm: remove dead code fplib mul64x64 2014-03-11 09:50:02 -05:00
Stephan Diestelhorst 65cea4708e power: Add basic DVFS support for gem5
Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
                             tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
2014-06-30 13:56:06 -04:00
Binh Pham b085db84af x86: fix table walker assertion
In a cycle, we could see a R and W requests corresponding to the same
page walk being sent to the memory. During the cycle that assertion
happens, we have 2 responses corresponding to the R and W above. We
also have a 'read' variable to keep track of the inflight Read
request, this gets reset to NULL right after we send out any R
request; and gets set to the next R in the page walk when a response
comes back.

The issue we are seeing here is when we get a response for W request,
assert(!read) fires because we got a response for R request right
before this, hence we set 'read' to NOT NULL value, pointing to the
next R request in the pagewalk!

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:39:44 -07:00
Steve Reinhardt 0be64ffe2f style: eliminate equality tests with true and false
Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.

It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.

Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
2014-05-31 18:00:23 -07:00
Steve Reinhardt 109908c2a6 syscall emulation: clean up & comment SyscallReturn 2014-05-12 14:23:31 -07:00
Ali Saidi dbaf43394b arm: Make sure UndefinedInstructions are properly initialized 2014-04-17 16:56:09 -05:00
Ali Saidi a00b44ebe8 arm: allow DC instructions by default so SE mode works 2014-04-17 16:55:54 -05:00
Ali Saidi c4a2f76fea sim, arm: implement more of the at variety syscalls
Needed for new AArch64 binaries
2014-04-17 16:55:05 -05:00
Andrew Bardsley bf78299f04 cpu: Add flag name printing to StaticInst
This patch adds a the member function StaticInst::printFlags to allow all
of an instruction's flags to be printed without using the individual
is... member functions or resorting to exposing the 'flags' vector

It also replaces the enum definition StaticInst::Flags with a
Python-generated enumeration and adds to the enum generation mechanism
in src/python/m5/params.py to allow Enums to be placed in namespaces
other than Enums or, alternatively, in wrapper structs allowing them to
be inherited by other classes (so populating that class's name-space
with the enumeration element names).
2014-05-09 18:58:47 -04:00
Andrew Bardsley f7d80348fa arm: Add branch flags onto macroops
Mark branch flags onto macroops to allow branch prediction before
microop decomposition
2014-05-09 18:58:47 -04:00
Curtis Dunham af39ab297f arm: add preliminary ISA splits for ARM arch 2014-05-09 18:58:47 -04:00
Curtis Dunham fe27f937aa arch: teach ISA parser how to split code across files
This patch encompasses several interrelated and interdependent changes
to the ISA generation step.  The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units.  I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner.  It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type.  Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths.  In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
2014-05-09 18:58:47 -04:00
Geoffrey Blake 85940fd537 arch, arm: Preserve TLB bootUncacheability when switching CPUs
The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal.  When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
 1) takeOverFrom executed to switch from Atomic -> O3
 2) All bootUncacheability flags are reset to true
 3) Core2 tries to execute a load covered by bootUncacheability, it
    is flagged as uncacheable
 4) Core2's load needs to replay due to a pipeline flush
 3) Core1 core does an action on CPSR
 4) The handling code for CPSR then checks all other cores
    to determine if bootUncacheability can be set to false
 5) Asynchronously set bootUncacheability on all cores to false
 6) Core2 replays load previously set as uncacheable and notices
    it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
2014-05-09 18:58:47 -04:00
Akash Bagdia 2b1a01ee6c cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems.  This allows studying multi-socket / cluster
systems with ARM CPUs.
2014-05-09 18:58:46 -04:00
Geoffrey Blake 29601eada7 arm: Panics in miscreg read functions can be tripped by O3 model
Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
2014-05-09 18:58:46 -04:00
Curtis Dunham 7f1603d207 arch: remove inline specifiers on all inst constrs, all ISAs
With (upcoming) separate compilation, they are useless.  Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
2014-05-09 18:58:46 -04:00
Curtis Dunham eb61f0123b arm: cleanup ARM ISA definition 2014-05-09 18:58:46 -04:00
Curtis Dunham ecf774bc56 arm: Correctly display disassembly of vldmia/vstmia
The MicroMemOp class generates the disassembly for both integer
and floating point instructions, but it would always print its
first operand as an integer register without considering that the
op may be a floating instruction in which case a float register
should be displayed instead.
2014-04-23 05:18:30 -04:00
Mitchell Hayenga 0fad0c7f7d arm: Don't use a stack allocated mnemonic
FailUnimplemented passed a stack created mnemonic as a const char * which
causes some grief when the stack goes away.
2014-04-23 05:18:20 -04:00
Curtis Dunham e651188f75 arch: remove 'null update' check in isa-parser
SCons already does this for all build steps.
2014-04-23 05:17:57 -04:00
Eric Van Hensbergen 7630168a75 arm: m5ops readfile64 args broken, offset coming through garbage
There were several sections of the m5ops code which were
essentially copy/pasted versions of the 32-bit code. The
problem is that some of these didn't account fo4 64-bit
registers leading to arguments being in the wrong registers.
This patch addresses the args for readfile64, writefile64,
and addsymbol64 -- all of which seemed to suffer from a
similar set of problems when moving to 64-bit.
2014-03-23 11:11:34 -04:00
Andreas Sandberg f791e7b313 kvm: x86: Add support for x86 INIT and STARTUP handling
This changeset adds support for INIT and STARTUP IPI handling. We
currently handle both of these interrupts in gem5 and transfer the
state to KVM. Since we do not have a BIOS loaded, we pretend that the
INIT interrupt suspends the CPU after reset.

--HG--
extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a
2014-03-16 17:28:23 +01:00
Paul Rosenfeld 32bf74cb8e alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-12 07:03:22 -05:00
Geoffrey Blake c4a8e5c36c arm: Handle functional TLB walks properly
The table walker code currently accounts for two types of walks,
Atomic and Timing, and treats them differently. Atomic walks keep a
single instance of WalkerState around for all walks to use in
currState. Timing mode keeps a queue of in-flight WalkerStates and
maintains currState as NULL between walks.

If a functional walk is done during Timing mode, it is treated as an
atomic walk and either creates a persistent WalkerState if in between
Timing walks, or stomps an existing currState for an in-progress
Timing walk.

This patch distinguishes functional walks as being able to exist at
any time and sets up a temporary WalkerState for its exclusive use and
then cleans up when finished, leaving any in progress Atomic or Timing
walks undisturbed.
2014-03-07 15:56:23 -05:00
Mitch Hayenga b9a9d99b22 scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions.
2014-03-07 15:56:23 -05:00
Stephan Diestelhorst bef2086f5b arm: Fix uninitialised warning with gcc 4.8
Small fix for a warning that prevents compilation with gcc 4.8.1 due
to detecting that a variable might be uninitialised. The fix is to
assign a safe default.
2014-03-07 15:56:23 -05:00
Ali Saidi bf39a475fe mem: Wakeup sleeping CPUs without caches on LLSC
For systems without caches, the LLSC code does not get snoops for
wake-ups. We add the LLSC code in the abstract memory to do the job
for us.
2014-03-07 15:56:23 -05:00
Andreas Sandberg be246cef62 x86: Setup correct TSL/TR segment attributes on INIT
The TSL/LDT & TR/TSS segments didn't contain valid attributes. This
caused problems when transfering the state into KVM where invalid
state is a no-go. Fixup the attributes with values from AMD's
architecture programmer's manual.
2014-03-03 14:44:57 +01:00
Christopher Torng 919baa603d cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU
A copyRegs() function is added to MIPS utilities
to copy architectural state from the old CPU to
the new CPU during fast-forwarding. This
addition alone enables fast-forwarding for the
o3 cpu model running MIPS.

The patch also adds takeOverFrom() and
drainResume() functions to the InOrderCPU to
enable it to take over from another CPU. This
change enables fast-forwarding for the inorder
cpu model running MIPS, but not for Alpha.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-01 23:35:23 -06:00
Andreas Sandberg e76a37985f x86: Fix x87 state transfer bug
Changeset 7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.

This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.
2014-02-05 14:08:13 +01:00
Nikos Nikoleris c6279f2d19 x86, kvm: Fix bug in the RFlags get and set functions
The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.

Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>
2014-02-02 16:37:35 +01:00
Mitch Hayenga b77ca57f8c arm: Enable umask syscall in SE mode
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:51 -06:00
Nilay Vaish bdee69d0b1 x86: use lfpimm instead of limm for fptan 2014-01-27 18:50:54 -06:00
Nilay Vaish 6a543b5134 x86: implements x87 add/sub instructions 2014-01-27 18:50:53 -06:00
Nilay Vaish 5be0b846b1 x86: implements fxch instruction. 2014-01-27 18:50:52 -06:00
Nilay Vaish 4eb3b1ed0b x86: correct error in emms instruction. 2014-01-27 18:50:51 -06:00
ARM gem5 Developers 612f8f074f arm: Add support for ARMv8 (AArch64 & AArch32)
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli    (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt       (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole           (AArch64 NEON, validation)
Ali Saidi            (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang         (AArch64 Linux support)
Rene De Jong         (AArch64 Linux support, performance opt.)
Matt Horsnell        (AArch64 MP, validation)
Matt Evans           (device models, code integration, validation)
Chris Adeniyi-Jones  (AArch64 syscall-emulation)
Prakash Ramrakhyani  (validation)
Dam Sunwoo           (validation)
Chander Sudanthi     (validation)
Stephan Diestelhorst (validation)
Andreas Hansson      (code integration, performance opt.)
Eric Van Hensbergen  (performance opt.)
Gabe Black
2014-01-24 15:29:34 -06:00
Andreas Hansson cfc4a99982 arch: Make all register index flattening const
This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
2014-01-24 15:29:30 -06:00
Ali Saidi 7d0344704a arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id  results in accessing different
registers depending on the current mode of the processor. This patch adds
the same orthogonality to the misc register file as the others (int, float, cc).
For all the othre ISAs this is currently a null-implementation.

Additionally, a system variable is added to all the ISA objects.
2014-01-24 15:29:30 -06:00
Ali Saidi 6bed6e0352 cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
2014-01-24 15:29:30 -06:00
Ali Saidi 904872a01a mem: Remove explict cast from memhelper.
Previously we were casting the result type to the the memory type which
is incorrect for things like dual-memory operations which still return a
single result.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Andreas Hansson f2b0b551cc x86: Fix memory leak in table walker
This patch fixes a memory leak in the table walker, by ensuring that
the sender state is deleted again if the request packet cannot be
successfully sent.
2014-01-24 15:29:29 -06:00
Christopher Torng 903b442228 mips: Floating point convert bug fix
In mips architecture, floating point convert instructions use the
FloatConvertOp format defined in src/arch/mips/isa/formats/fp.isa. The type
of the operands in the ISA description file (_sw for signed word, or _sf for
signed float, etc.) is  used to create a type for the operand in C++. Then the
operand is converted using the fpConvert() function in src/arch/mips/utility.cc.

If we are converting from a word to a float, and we want to convert 0xffffffff,
we expect -1 to be passed into fpConvert(). Instead, we see MAX_INT passed in.
Then fpConvert() converts _val_ to MAX_INT in single-precision floating point,
and we get the wrong value.

To fix it, the signs of the convert operands are being changed from unsigned to
signed in the MIPS ISA description.

Then, the FloatConvertOp format is being changed to insert a int32_t into the
C++ code instead of a uint32_t.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-12-29 19:29:45 -06:00
Christian Menard d4f205ea2f x86: Implementation of Int3 and Int_Ib in long mode
This is an implementation of the x86 int3 and int immediate
instructions for long mode according to 'AMD64 Programmers Manual
Volume 3'.
2013-11-26 17:51:07 +01:00
Chander Sudanthi 3e6da89419 ARM: add support for TEEHBR access
Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier
in arch/arm/kernel/thumbee.c.  The Linux kernel code just seems
to be saving and restoring the register.  This patch adds support
for the TEEHBR cp14 register.  Note, this may be a special case
when restoring from an image that was run on a system that
supports ThumbEE.
2013-10-31 13:41:13 -05:00
Prakash Ramrakhyani 885656f2ed mem: Add privilege info to request class
This patch adds a flag in the request class that indicates if the request
was made in privileged mode.
2013-10-31 13:41:13 -05:00
Eric Van Hensbergen bfdd031c0d arm: Accomodate function name changes in newer linux kernels 2013-10-17 10:20:45 -05:00
Yasuko Eckert 1bb293d1e7 arch/x86: add support for explicit CC register file
Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.

Nilay Vaish also contributed to this patch.
2013-10-15 14:22:44 -04:00
Yasuko Eckert 2c293823aa cpu: add a condition-code register class
Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.
2013-10-15 14:22:44 -04:00
Steve Reinhardt 219c423f1f cpu: rename *_DepTag constants to *_Reg_Base
Make these names more meaningful.

Specifically, made these substitutions:

s/FP_Base_DepTag/FP_Reg_Base/g;
s/Ctrl_Base_DepTag/Misc_Reg_Base/g;
s/Max_DepTag/Max_Reg_Index/g;
2013-10-15 14:22:43 -04:00
Steve Reinhardt a830e63de7 isa: clean up register constants
Clean up and add some consistency to the *_Base_DepTag
constants as well as some related register constants:
- Get rid of NumMiscArchRegs, TotalArchRegs, and TotalDataRegs
  since they're never used and not always defined
- Set FP_Base_DepTag = NumIntRegs when possible (i.e.,
  every case except x86)
- Set Ctrl_Base_DepTag = FP_Base_DepTag + NumFloatRegs
  (this was true before, but wasn't always expressed
  that way)
- Drastically reduce the number of arbitrary constants
  appearing in these calculations
2013-10-15 14:22:43 -04:00
Steve Reinhardt 7aa423acad cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping
of unified architectural register indices to register
classes is hardcoded all over to one where there's an
enum for the register classes and a function that
encapsulates the mapping.
2013-10-15 14:22:42 -04:00
Andreas Sandberg 4f5775df64 mem: Rename the ASI_BITS flag field in Request
ASI_BITS in the Request object were originally used to store a memory
request's ASI on SPARC. This is not the case any more since other ISAs
use the ASI bits to store architecture-dependent information. This
changeset renames the ASI_BITS to ARCH_BITS which better describes
their use. Additionally, the getAsi() accessor is renamed to
getArchFlags().
2013-10-15 13:26:34 +02:00
Andreas Sandberg 5e7738467b mem: Use a flag instead of address bit 63 for generic IPRs
Using address bit 63 to identify generic IPRs caused problems on
SPARC, where IPRs are heavily used. This changeset redefines how
generic IPRs are identified. Instead of using bit 63, we now use a
separate flag (GENERIC_IPR) a memory request.
2013-10-15 13:24:35 +02:00
Nilay Vaish 87cc327abb x86: enables lstat and readlink syscalls 2013-10-07 18:05:49 -05:00
Andreas Sandberg fec2dea5c3 x86: Add support for m5ops through a memory mapped interface
In order to support m5ops in virtualized environments, we need to use
a memory mapped interface. This changeset adds support for that by
reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR
interface for m5ops. The mapping is done in the
X86ISA::TLB::finalizePhysical() which means that it just works for all
of the CPU models, including virtualized ones.
2013-09-30 12:20:53 +02:00
Andreas Sandberg d9856f33a4 arch: Add support for m5ops using mmapped IPRs
In order to support m5ops on virtualized CPUs, we need to either
intercept hypercall instructions or provide a memory mapped m5ops
interface. Since KVM does not normally pass the results of hypercalls
to userspace, which makes that method unfeasible. This changeset
introduces support for m5ops using memory mapped mmapped IPRs. This is
implemented by adding a class of "generic" IPRs which are handled by
architecture-independent code. Such IPRs always have bit 63 set and
are handled by handleGenericIprRead() and
handleGenericIprWrite(). Platform specific impementations of
handleIprRead and handleIprWrite should use
GenericISA::isGenericIprAccess to determine if an IPR address should
be handled by the generic code instead of the architecture-specific
code. Platforms that don't need their own IPR support can reuse
GenericISA::handleIprRead() and GenericISA::handleIprWrite().
2013-09-30 12:20:43 +02:00
Andreas Sandberg 114b643dd0 x86: Add support for FXSAVE, FXSAVE64, FXRSTOR, and FXRSTOR64 2013-09-30 12:06:36 +02:00
Andreas Sandberg 47bcc5c737 x86: Add support for FLDENV & FNSTENV 2013-09-30 12:04:36 +02:00
Andreas Sandberg 654d1e675a x86: Add support for loading 32-bit and 80-bit floats in the x87
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.

80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
2013-09-30 12:00:20 +02:00
Andreas Sandberg c299dcedc6 x86: Fix re-entrancy problems in x87 store instructions
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
2013-09-30 11:51:25 +02:00
Andreas Sandberg cccca70149 x86: Add support routines to load and store 80-bit floats
The x87 FPU on x86 supports extended floating point. We currently
handle all floating point on x86 as double and don't support 80-bit
loads/stores. This changeset add a utility function to load and
convert 80-bit floats to doubles (loadFloat80) and another function to
store doubles as 80-bit floats (storeFloat80). Both functions use
libfputils to do the conversion in software. The functions are
currently not used, but are required to handle floating point in KVM
and to properly support all x87 loads/stores.
2013-09-30 09:42:30 +02:00
Andreas Sandberg 3af2d8eab0 x86: Add limited support for extracting function call arguments
Add support for extracting the first 6 64-bit integer argumements to a
function call in X86ISA::getArgument().
2013-09-30 09:37:17 +02:00
Andreas Sandberg a6e723e4d6 x86: Add support routines to convert between x87 tag formats
This changeset adds the convX87XTagsToTags() and convX87TagsToXTags()
which convert between the tag formats in the FTW register and the
format used in the xsave area. The conversion from to the x87 FTW
representation is currently loses some information since it does not
reconstruct the valid/zero/special flags which are not included in the
xsave representation.
2013-09-19 17:30:26 +02:00
Andreas Sandberg e93e12a62b x86: Expose the raw hash map of MSRs
This patch allows the KVM CPU module to initialize it's MSRs by
enumerating the MSRs in the gem5 x86 implementation.
2013-09-18 11:28:28 +02:00
Andreas Sandberg 4b840b8322 x86: Add support for checking the raw state of an interrupt
In order to support hardware virtualization, we need to be able to
check if there are any interrupts pending irregardless of the
rflags.intf value. This changeset adds the checkInterruptsRaw() method
to the x86 interrupt control. It returns true if there are pending
interrupts that can be delivered as soon as the CPU is ready for
interrupt delivery.
2013-09-18 11:28:27 +02:00
Andreas Sandberg 15733e9b33 x86: Expose the interrupt vector in faults
This patch allows a hardware virtualized CPU to discover which interrupt
to deliver to the guest.
2013-09-18 11:28:24 +02:00
Andreas Hansson 19a5b68db7 arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.

The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.

--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-09-04 13:22:57 -04:00
Andreas Hansson cead68a781 alpha: Move system virtProxy to Alpha only
This patch moves the system virtual port proxy to the Alpha system
only to make the resurrection of the NOISA slightly less
painful. Alpha is the only ISA that is actually using it.
2013-09-04 13:22:55 -04:00
Andreas Hansson f7d44590cb alpha: Check interrupts before quiesce
This patch adds a check to the quiesce operation to ensure that the
CPU does not suspend itself when there are unmasked interrupts
pending. Without this patch there are corner cases when the CPU gets
an interrupt before the quiesce is executed and then never wakes up
again.
2013-08-19 03:52:29 -04:00
Nilay Vaish e038741598 x86: add tlb checkpointing
This patch adds checkpointing support to x86 tlb. It upgrades the
cpt_upgrader.py script so that previously created checkpoints can
be updated. It moves the checkpoint version to 6.
2013-08-07 14:51:17 -05:00
Andreas Hansson d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Steve Reinhardt 1f43e244bd dev: make BasicPioDevice take size in constructor
Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:57:04 -05:00
Steve Reinhardt 502ad1e675 dev: consistently end device classes in 'Device'
PciDev and IntDev stuck out as the only device classes that
ended in 'Dev' rather than 'Device'.  This patch takes care
of that inconsistency.

Note that you may need to delete pre-existing files matching
build/*/python/m5/internal/param_* as scons does not pick up
indirect dependencies on imported python modules when generating
params, and the PciDev -> PciDevice rename takes place in a
file (dev/Device.py) that gets imported quite a bit.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:56:50 -05:00
Steve Reinhardt b0b1c0205c devices: make more classes derive from BasicPioDevice
A couple of devices that have single fixed memory mapped regions
were not derived from BasicPioDevice, when that's exactly
the functionality that BasicPioDevice provides.  This patch
gets rid of a little bit of redundant code by making those
devices actually do so.

Also fixed the weird case of X86ISA::Interrupts, where
the class already did derive from BasicPioDevice but
didn't actually use all the features it could have.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:56:24 -05:00
Akash Bagdia 7d7ab73862 sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.

The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).

The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.

All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.

The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
2013-06-27 05:49:49 -04:00
Andreas Sandberg d06064c386 x86: Add support for maintaining the x87 tag word
The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.

The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.
2013-06-18 16:36:08 +02:00
Andreas Sandberg a8e8c4f433 x86: Fix loading of floating point constants
This changeset actually fixes two issues:

 * The lfpimm instruction didn't work correctly when applied to a
   floating point constant (it did work for integers containing the
   bit string representation of a constant) since it used
   reinterpret_cast to convert a double to a uint64_t. This caused a
   compilation error, at least, in gcc 4.6.3.

 * The instructions loading floating point constants in the x87
   processor didn't work correctly since they just stored a truncated
   integer instead of a double in the floating point register. This
   changeset fixes the old microcode by using lfpimm instruction
   instead of the limm instructions.
2013-06-18 16:30:06 +02:00
Andreas Sandberg c9c02efb99 x86: Initialize the MXCSR register 2013-06-18 16:28:36 +02:00
Andreas Sandberg 688fc7f71f x86: Make the boot state VMX compliant
This patch allows the default x86 state to be used when by CPUs that
use hardware virtualization.
2013-06-18 16:27:28 +02:00
Andreas Sandberg 5d584934ad x86: Make fprem like the fprem on a real x87
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
2013-06-18 16:10:42 +02:00
Andreas Sandberg 46a8cbbb7f x86: Add helper functions to access rflags
The rflags register is spread across several different registers. Most
of the flags are stored in MISCREG_RFLAGS, but some are stored in
microcode registers. When accessing RFLAGS, we need to reconstruct it
from these registers. This changeset adds two functions,
X86ISA::getRFlags() and X86ISA::setRFlags(), that take care of this
magic.
2013-06-18 16:10:22 +02:00
Andreas Sandberg de89e133d8 x86: Fix the flag handling code in FABS and FCHS
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
2013-06-18 16:10:21 +02:00
Andreas Sandberg 0b4a8b4086 x86: Fix bug when copying TSC on CPU handover
The TSC value stored in MISCREG_TSC is actually just an offset from
the current CPU cycle to the actual TSC value. Writes with
side-effects to the TSC subtract the current cycle count before
storing the new value, while reads add the current cycle count. When
switching CPUs, the current value is copied without side-effects. This
works as long as the source and the destination CPUs have the same
clock frequencies. The TSC will jump, sometimes backwards, if they
have different clock frequencies. Most OSes assume the TSC to be
monotonic and break when this happens.

This changeset makes sure that the TSC is copied with side-effects to
ensure that the offset is updated to match the new CPU.
2013-06-11 09:24:38 +02:00
Andreas Sandberg 7846f59d0d arch: Create a method to finalize physical addresses
in the TLB

Some architectures (currently only x86) require some fixing-up of
physical addresses after a normal address translation. This is usually
to remap devices such as the APIC, but could be used for other memory
mapped devices as well. When running the CPU in a using hardware
virtualization, we still need to do these address fix-ups before
inserting the request into the memory system. This patch moves this
patch allows that code to be used by such CPUs without doing full
address translations.
2013-06-03 13:55:41 +02:00