the GPUISA class is meant to encapsulate any ISA-specific behavior - special
register accesses, isa-specific WF/kernel state, etc. - in a generic enough
way so that it may be used in ISA-agnostic code.
gpu-compute: use the GPUISA object to advance the PC
the GPU model treats the PC as a pointer to individual instruction objects -
which are store in a contiguous array - and not a byte address to be fetched
from the real memory system. this is ok for HSAIL because all instructions
are considered by the model to be the same size.
in machine ISA, however, instructions may be 32b or 64b, and branches are
calculated by advancing the PC by the number of words (4 byte chunks) it
needs to advance in the real instruction stream. because of this there is
a mismatch between the PC we use to index into the instruction array, and
the actual byte address PC the ISA expects. here we move the PC advance
calculation to the ISA so that differences in the instrucion sizes may be
accounted for in generic way.
because every taken branch causes fetch to be discarded, we move the call
to the WF to avoid to have to call it from each and every branch instruction
type.
we are removing doGmReturn from the GM pipe, and adding completeAcc()
implementations for the HSAIL mem ops. the behavior in doGmReturn is
dependent on HSAIL and HSAIL mem ops, however the completion phase
of memory ops in machine ISA can be very different, even amongst individual
machine ISA mem ops. so we remove this functionality from the pipeline and
allow it to be implemented by the individual instructions.
this patch removes the GPUStaticInst enums that were defined in GPU.py.
instead, a simple set of attribute flags that can be set in the base
instruction class are used. this will help unify the attributes of HSAIL
and machine ISA instructions within the model itself.
because the static instrution now carries the attributes, a GPUDynInst
must carry a pointer to a valid GPUStaticInst so a new static kernel launch
instruction is added, which carries the attributes needed to perform a
the kernel launch.
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects. The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects. Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
this patch fixes issues with changeset 11593
use the host's pwrite() syscall for pwrite64Func(),
as opposed to pwrite64(), because pwrite64() does
not work well on all distros.
undo the enabling of fstatfs, as we will add this
in a separate pate.
Introduce and use a lookup table.
Using fetchDescriptor() rather than DMA cleanly handles nested paging.
Change-Id: I69ec762f176bd752ba1040890e731826b58d15a6
During host bootup, KVM reads/writes to CNTHCTL_EL2. Because this
miscreg has not been implemented, the simulation would end there. This
patch causes the simulation to warn about the read/write instead of fail.
Change-Id: If034bfd0818a9a5e50c5fe86609e945258c96fa3
This fixes a bug where stage 2 lookups used the AArch32
permissions rules even if we were executing in AArch64 mode.
Change-Id: Ia40758f0599667ca7ca15268bd3bf051342c24c1
This patch restricts trapping to hypervisor only if we are in the
correct exception level for the trap to happen.
Change-Id: I0a382b6a572ef835ea36d2702b8a81b633bd3df0
Faults that could potentially be routed to the hypervisor checked
whether or not they were in a secure state without checking if security
was enabled or not. This caused faults not to be routed correctly. This
patch causes secure state checking to first ask if security is enabled.
Change-Id: I179e9b181b27f552734c9bab2b18d05ac579a119
We recompute if we are doing a stage 2 walk inside of the table walker
but we have already figured it out in the tlb. Pass the information in
to the walk instead of recomputing it.
Change-Id: I39637ce99309b2ddbc30344d45ac9ebf6a203401
The functional case is already handled within the fetchDescriptor()
function. We can thus use that function for both atomic and functional
mode when we start the table walk.
Change-Id: Iacaed28cd9024d259fd37a58150efd00ff94d86e
This patch adds the option for faults to be routed to the hypervisor
using the pre-existing routeToHyp() functions that are present in each
fault type.
Change-Id: I9735512c094457636b9870456a5be5432288e004
During address translation instructions (such as AT S1E1R_Xt) the exception
level can be different than the current exception level. This patch fixes
how the TLB determines what EL to use during these instructions.
Change-Id: Ia9ce229404de9e284bc1f7479fd2c580efd55f8f
This patch adds the AArch64 instruction hvc which raises an exception
from EL1 into EL2. The host OS uses this instruction to world switch
into the guest.
Change-Id: I930ee43f4f0abd4b35a68eb2a72e44e3ea6570be
Make it so that getInterrupt *always* returns an interrupt if
checkInterrupts() returns true. This fixes/simplifies handling
of interrupts on the SMT FS CPUs (currently minor).
Don't consult the TLB test interface for PA's returned by functional
translations by the AT instruction. We implement this by chaning the
ISA code to synthesize 0-length functional reads for the TLB lookup.
The TLB then bypasses the final PA check in the tester if the size is
zero.
Change-Id: I2487b7f829cea88c37e229e9fc7a4543aced961b
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Previously when we initialized the TLB we would allocate a number of
TLB entries which would be marked as valid. As a result the TLB
contained an entry which would be considered a valid entry for the 0
page.
Change-Id: I23ace86426a171a4f6200ebeb29ad57c21647036
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Add helper functions to dump the guest kernel's dmesg buffer to a text
file in m5out. This functionality is split into two parts. First, a
dmesg dump function that can be used in other places:
void Linux::dumpDmesg(ThreadContext *, std::ostream &)
This function is used to implement two PCEvents: DmesgDumpEvent and
KernelPanic event. The only difference between the two is that the
latter produces a gem5 panic instead of a warning in addition to
dumping the kernel log.
Change-Id: I6d2af1d666ace57124089648ea906f6c787ac63c
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Gabor Dozsa <gabor.dozsa@arm.com>
There is a mismatch between DataType and SrcDataType in constructing
Atomic ST instruction. The mismatch causes atomic_store and
atomic_store_explicit function to store incorrect value in memory.
Eliminate the VSZ constant that defined the Wavefront size (in numbers of work
items); replaced it with a parameter in the GPU.py configuration script.
Changed all data structures dependent on the Wavefront size to be dynamically
sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at
initialization time.
Fixing an issue with regStats not calling the parent class method
for most SimObjects in Gem5. This causes issues if one adds new
stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
We want to extend the stats of objects hierarchically and thus it is necessary
to register the statistics of the base-class(es), as well. For now, these are
empty, but generic stats will be added there.
Patch originally provided by Akash Bagdia at ARM Ltd.
In particular, when EL0 is in AArch32 but EL1 is AArch64, AArch64
memory translation must be used. This is essential for typical
AArch64/32 interworking use cases.
The ERET instruction doesn't set PSTATE correctly in some cases
(particularly when returning to aarch32 code). Among other things,
this breaks EL0 thumb code when using a 64-bit kernel. This changeset
updates the ERET implementation to match the ARM ARM.
Change-Id: I408e7c69a23cce437859313dfe84e68744b07c98
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
The current implementation of aarch32 FP/SIMD in gem5 assumes that EL1
and higher are all 32-bit. This breaks interprocessing since an
aarch64 EL1 uses different enable/disable bits. This change updates
the permission checks to according to what is prescribed by the ARM
ARM.
Change-Id: Icdcef31b00644cfeebec00216b3993aa1de12b88
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Mitch Hayenga <mitch.hayenga@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
LPAE has been tested with Linux 4.4 and seems to work just fine. Let's
enable it by default.
Change-Id: Id88c6e3c91ae9c353279d42f2aa1f8a78485bd32
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Gabor Dozsa <gabor.dozsa@arm.com>
According to the ARM ARM (see AArch32.TranslateAddress in the
pseudocode library), the TLB should be operating in aarch64 mode if
the EL0 is aarch32 and EL1 is aarch64. This is currently not the case
in gem5, which breaks 64/32 interprocessing. Update the check to match
the reference manual.
Change-Id: I6f1444d57c0e2eb5f8880f513f33a9197b7cb2ce
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Mitch Hayenga <mitch.hayenga@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
We currently check the current state instead of the state of the
target EL when determining how we report a fault. This breaks
interprocessing since EL0 in aarch32 would report its fault status
using the aarch32 registers even if EL1 is in aarch64. Fix this to
report the fault using the format of the target EL.
Change-Id: Ic080267ac210783d1e01c722a4ddaa687dce280e
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Mitch Hayenga <mitch.hayenga@arm.com>
The TLB currently assumes that the pxn bit in an LPAE page descriptor
disables execution from unprivileged mode. However, according to the
architecture manual, this bit should disable execution from privileged
modes. Update the TLB implementation to reflect this behavior.
Change-Id: I7f1bb232d7a94a93fd601a9230223195ac952947
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
A lot of code assumes that it is possible to test what the highest EL
is and if it is 64 bit. These calls currently don't work in SE mode
since they rely on an instance of an ArmSystem.
Change-Id: I0d1f261926a66ce3dc4fa116845ffb2a081446f2
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>