Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems. This allows studying multi-socket / cluster
systems with ARM CPUs.
Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
The MicroMemOp class generates the disassembly for both integer
and floating point instructions, but it would always print its
first operand as an integer register without considering that the
op may be a floating instruction in which case a float register
should be displayed instead.
There were several sections of the m5ops code which were
essentially copy/pasted versions of the 32-bit code. The
problem is that some of these didn't account fo4 64-bit
registers leading to arguments being in the wrong registers.
This patch addresses the args for readfile64, writefile64,
and addsymbol64 -- all of which seemed to suffer from a
similar set of problems when moving to 64-bit.
This changeset adds support for INIT and STARTUP IPI handling. We
currently handle both of these interrupts in gem5 and transfer the
state to KVM. Since we do not have a BIOS loaded, we pretend that the
INIT interrupt suspends the CPU after reset.
--HG--
extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a
The table walker code currently accounts for two types of walks,
Atomic and Timing, and treats them differently. Atomic walks keep a
single instance of WalkerState around for all walks to use in
currState. Timing mode keeps a queue of in-flight WalkerStates and
maintains currState as NULL between walks.
If a functional walk is done during Timing mode, it is treated as an
atomic walk and either creates a persistent WalkerState if in between
Timing walks, or stomps an existing currState for an in-progress
Timing walk.
This patch distinguishes functional walks as being able to exist at
any time and sets up a temporary WalkerState for its exclusive use and
then cleans up when finished, leaving any in progress Atomic or Timing
walks undisturbed.
Small fix for a warning that prevents compilation with gcc 4.8.1 due
to detecting that a variable might be uninitialised. The fix is to
assign a safe default.
The TSL/LDT & TR/TSS segments didn't contain valid attributes. This
caused problems when transfering the state into KVM where invalid
state is a no-go. Fixup the attributes with values from AMD's
architecture programmer's manual.
A copyRegs() function is added to MIPS utilities
to copy architectural state from the old CPU to
the new CPU during fast-forwarding. This
addition alone enables fast-forwarding for the
o3 cpu model running MIPS.
The patch also adds takeOverFrom() and
drainResume() functions to the InOrderCPU to
enable it to take over from another CPU. This
change enables fast-forwarding for the inorder
cpu model running MIPS, but not for Alpha.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Changeset 7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.
This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.
The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.
Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.
Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.
Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
With ARMv8 support the same misc register id results in accessing different
registers depending on the current mode of the processor. This patch adds
the same orthogonality to the misc register file as the others (int, float, cc).
For all the othre ISAs this is currently a null-implementation.
Additionally, a system variable is added to all the ISA objects.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
Previously we were casting the result type to the the memory type which
is incorrect for things like dual-memory operations which still return a
single result.
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks. Cache occupancy stats are
recalculated on each stat dump.
This patch fixes a memory leak in the table walker, by ensuring that
the sender state is deleted again if the request packet cannot be
successfully sent.
In mips architecture, floating point convert instructions use the
FloatConvertOp format defined in src/arch/mips/isa/formats/fp.isa. The type
of the operands in the ISA description file (_sw for signed word, or _sf for
signed float, etc.) is used to create a type for the operand in C++. Then the
operand is converted using the fpConvert() function in src/arch/mips/utility.cc.
If we are converting from a word to a float, and we want to convert 0xffffffff,
we expect -1 to be passed into fpConvert(). Instead, we see MAX_INT passed in.
Then fpConvert() converts _val_ to MAX_INT in single-precision floating point,
and we get the wrong value.
To fix it, the signs of the convert operands are being changed from unsigned to
signed in the MIPS ISA description.
Then, the FloatConvertOp format is being changed to insert a int32_t into the
C++ code instead of a uint32_t.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier
in arch/arm/kernel/thumbee.c. The Linux kernel code just seems
to be saving and restoring the register. This patch adds support
for the TEEHBR cp14 register. Note, this may be a special case
when restoring from an image that was run on a system that
supports ThumbEE.
Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.
Nilay Vaish also contributed to this patch.
Make these names more meaningful.
Specifically, made these substitutions:
s/FP_Base_DepTag/FP_Reg_Base/g;
s/Ctrl_Base_DepTag/Misc_Reg_Base/g;
s/Max_DepTag/Max_Reg_Index/g;
Clean up and add some consistency to the *_Base_DepTag
constants as well as some related register constants:
- Get rid of NumMiscArchRegs, TotalArchRegs, and TotalDataRegs
since they're never used and not always defined
- Set FP_Base_DepTag = NumIntRegs when possible (i.e.,
every case except x86)
- Set Ctrl_Base_DepTag = FP_Base_DepTag + NumFloatRegs
(this was true before, but wasn't always expressed
that way)
- Drastically reduce the number of arbitrary constants
appearing in these calculations
Move from a poorly documented scheme where the mapping
of unified architectural register indices to register
classes is hardcoded all over to one where there's an
enum for the register classes and a function that
encapsulates the mapping.
ASI_BITS in the Request object were originally used to store a memory
request's ASI on SPARC. This is not the case any more since other ISAs
use the ASI bits to store architecture-dependent information. This
changeset renames the ASI_BITS to ARCH_BITS which better describes
their use. Additionally, the getAsi() accessor is renamed to
getArchFlags().
Using address bit 63 to identify generic IPRs caused problems on
SPARC, where IPRs are heavily used. This changeset redefines how
generic IPRs are identified. Instead of using bit 63, we now use a
separate flag (GENERIC_IPR) a memory request.
In order to support m5ops in virtualized environments, we need to use
a memory mapped interface. This changeset adds support for that by
reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR
interface for m5ops. The mapping is done in the
X86ISA::TLB::finalizePhysical() which means that it just works for all
of the CPU models, including virtualized ones.
In order to support m5ops on virtualized CPUs, we need to either
intercept hypercall instructions or provide a memory mapped m5ops
interface. Since KVM does not normally pass the results of hypercalls
to userspace, which makes that method unfeasible. This changeset
introduces support for m5ops using memory mapped mmapped IPRs. This is
implemented by adding a class of "generic" IPRs which are handled by
architecture-independent code. Such IPRs always have bit 63 set and
are handled by handleGenericIprRead() and
handleGenericIprWrite(). Platform specific impementations of
handleIprRead and handleIprWrite should use
GenericISA::isGenericIprAccess to determine if an IPR address should
be handled by the generic code instead of the architecture-specific
code. Platforms that don't need their own IPR support can reuse
GenericISA::handleIprRead() and GenericISA::handleIprWrite().
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.
80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
The x87 FPU on x86 supports extended floating point. We currently
handle all floating point on x86 as double and don't support 80-bit
loads/stores. This changeset add a utility function to load and
convert 80-bit floats to doubles (loadFloat80) and another function to
store doubles as 80-bit floats (storeFloat80). Both functions use
libfputils to do the conversion in software. The functions are
currently not used, but are required to handle floating point in KVM
and to properly support all x87 loads/stores.
This changeset adds the convX87XTagsToTags() and convX87TagsToXTags()
which convert between the tag formats in the FTW register and the
format used in the xsave area. The conversion from to the x87 FTW
representation is currently loses some information since it does not
reconstruct the valid/zero/special flags which are not included in the
xsave representation.
In order to support hardware virtualization, we need to be able to
check if there are any interrupts pending irregardless of the
rflags.intf value. This changeset adds the checkInterruptsRaw() method
to the x86 interrupt control. It returns true if there are pending
interrupts that can be delivered as soon as the CPU is ready for
interrupt delivery.
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.
The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.
--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
This patch moves the system virtual port proxy to the Alpha system
only to make the resurrection of the NOISA slightly less
painful. Alpha is the only ISA that is actually using it.
This patch adds a check to the quiesce operation to ensure that the
CPU does not suspend itself when there are unmasked interrupts
pending. Without this patch there are corner cases when the CPU gets
an interrupt before the quiesce is executed and then never wakes up
again.
This patch adds checkpointing support to x86 tlb. It upgrades the
cpt_upgrader.py script so that previously created checkpoints can
be updated. It moves the checkpoint version to 6.
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.
Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.
A follow-on patch updates the configuration scripts accordingly.
Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
PciDev and IntDev stuck out as the only device classes that
ended in 'Dev' rather than 'Device'. This patch takes care
of that inconsistency.
Note that you may need to delete pre-existing files matching
build/*/python/m5/internal/param_* as scons does not pick up
indirect dependencies on imported python modules when generating
params, and the PciDev -> PciDevice rename takes place in a
file (dev/Device.py) that gets imported quite a bit.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
A couple of devices that have single fixed memory mapped regions
were not derived from BasicPioDevice, when that's exactly
the functionality that BasicPioDevice provides. This patch
gets rid of a little bit of redundant code by making those
devices actually do so.
Also fixed the weird case of X86ISA::Interrupts, where
the class already did derive from BasicPioDevice but
didn't actually use all the features it could have.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).
The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.
The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.
The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.
This changeset actually fixes two issues:
* The lfpimm instruction didn't work correctly when applied to a
floating point constant (it did work for integers containing the
bit string representation of a constant) since it used
reinterpret_cast to convert a double to a uint64_t. This caused a
compilation error, at least, in gcc 4.6.3.
* The instructions loading floating point constants in the x87
processor didn't work correctly since they just stored a truncated
integer instead of a double in the floating point register. This
changeset fixes the old microcode by using lfpimm instruction
instead of the limm instructions.
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
The rflags register is spread across several different registers. Most
of the flags are stored in MISCREG_RFLAGS, but some are stored in
microcode registers. When accessing RFLAGS, we need to reconstruct it
from these registers. This changeset adds two functions,
X86ISA::getRFlags() and X86ISA::setRFlags(), that take care of this
magic.
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
The TSC value stored in MISCREG_TSC is actually just an offset from
the current CPU cycle to the actual TSC value. Writes with
side-effects to the TSC subtract the current cycle count before
storing the new value, while reads add the current cycle count. When
switching CPUs, the current value is copied without side-effects. This
works as long as the source and the destination CPUs have the same
clock frequencies. The TSC will jump, sometimes backwards, if they
have different clock frequencies. Most OSes assume the TSC to be
monotonic and break when this happens.
This changeset makes sure that the TSC is copied with side-effects to
ensure that the offset is updated to match the new CPU.
in the TLB
Some architectures (currently only x86) require some fixing-up of
physical addresses after a normal address translation. This is usually
to remap devices such as the APIC, but could be used for other memory
mapped devices as well. When running the CPU in a using hardware
virtualization, we still need to do these address fix-ups before
inserting the request into the memory system. This patch moves this
patch allows that code to be used by such CPUs without doing full
address translations.
This is the x86 version of the ARM changeset baa17ba80e06. In case an
instruction has been squashed by the o3 cpu, this patch allows page
table walker to avoid carrying out a pending translation that the
instruction requested for.
Currently call and return instructions are marked as IsCall and IsReturn. Thus, the
branch predictor does not use RAS for these instructions. Similarly, the number of
function calls that took place is recorded as 0. This patch marks these instructions
as they should be.
Currently all the integer microops are marked as IntAluOp and the floating
point microops are marked as FloatAddOp. This patch adds support for marking
different microops differently. Now IntMultOp, IntDivOp, FloatDivOp,
FloatMultOp, FloatCvtOp, FloatSqrtOp classes will be used as well. This will
help in providing different latencies for different op class.
The vsyscall address for gettimeofday is 0xffffffffff600000ul. The offset
therefore should be 0x0 instead of 0x410. This can be cross checked with
the file sysdeps/unix/sysv/linux/x86_64/gettimeofday.c in source of glibc.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
The 'lret' instruction reloads instruction pointer and code segment from the
stack and then pops them. But the popping part is missing from the current
implementation. This caused incorrect behavior in some code related to the
Fiasco OS. Microops are being added to rectify the behavior of the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Add the method checkRaw to ArmISA::Interrupts. This method can be used
to query the raw state (ignoring CPSR masks) of an interrupt. It is
primarily intended for hardware virtualized CPUs.
Add the options 'panic_on_panic' and 'panic_on_oops' to the
LinuxArmSystem SimObject. When these option are enabled, the simulator
panics when the guest kernel panics or oopses. Enable panic on panic
and panic on oops in ARM-based test cases.
This changeset adds support for forwarding arguments to the PC
event constructors to following methods:
addKernelFuncEvent
addFuncEvent
Additionally, this changeset adds the following helper method to the
System base class:
addFuncEventOrPanic - Hook a PCEvent to a symbol, panic on failure.
addKernelFuncEventOrPanic - Hook a PCEvent to a kernel symbol, panic
on failure.
System implementations have been updated to use the new functionality
where appropriate.
This patch adds a missing flag to the ldr_ret_uop microop instruction.
The flag is added when the instruction is used, not directly in the
constructor of the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
It is possible that operating system wants to shutdown the
lapic timer by writing timer's initial count to 0. This patch
adds a check that the timer event is only scheduled if the
count is 0.
The patch also converts few of the panics related to the keyboard
to warnings since we are any way not interested in simulating the
keyboard.
This patch fixes an issue related to the table walker recycling
packets that still have a bus delay that is not accounted for. For
now, we simply ignore the values and reset them to zero.
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
A derived function with a different signature than a base class
function will result in the base class function of the same name being
hidden. The parameter list and return type for the member function in
the derived class must match those of the member function in the base
class, otherwise the function in the derived class will hide the
function in the base class and no polymorphic behaviour will occur.
This patch addresses these warnings by ensuring a unique function name
to avoid (unintentionally) hiding any functions.
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
This patch moves the 16x APIC clock divider to the Python code to
avoid the post-instantiation modifications to the clock. The x86 APIC
was the only object setting the clock after creation time and this
required some custom functionality and configuration. With this patch,
the clock multiplier is moved to the Python code and the objects are
instantiated with the appropriate clock.
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.
There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
If multiple memory operations to the same page are miss the TLB they are
all inserted into the page table queue and before this change could result
in multiple uncessesary walks as well as duplicate enteries being inserted
into the TLB.
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to
the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
The explict tests in the follwing fp comparison operations were
incorrect as they checked for only signaling NaNs and not quite-NaNs
as well. When compiled with gcc, the comparison generates a fp exception
that causes the FE_INVALID flag to be set and we check for it, so even
though the check was incorrect, the correct exception was set. With clang
this behavior seems to not occur. The checks are updated to test for nans and
the behavior is now correct with both clang and gcc.
Clang generated executables would enter the if condition when it wasn't
supposted to, resulting in the wrong simulated behavior.
Implementing the operation this way is a bit faster anyway.
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.
This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.
Note that clflush is only being enabled. It is not implemented
in actual. A warning is printed if the cpu encounters a clflush
instruction. We need to enable this instruction in cpuid since
JRE 1.7 tests for it.
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
Used as a command in full-system scripts helps the user ensure the benchmarks have finished successfully.
For example, one can use:
/path/to/benchmark args || /sbin/m5 fail 1
and thus ensure gem5 will exit with an error if the benchmark fails.
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent
stale translations when doing repeated switching. Additionally, the
TLB flushing functionality is exported to the Python to make debugging
of switching/checkpointing easier.
A simulation script will typically use the TLB flushing functionality
to generate a reference trace. The following sequence can be used to
simulate a handover (this depends on how drain is implemented, but is
generally the case) between identically configured CPU models:
m5.drain(test_sys)
[ cpu.flushTLBs() for cpu in test_sys.cpu ]
m5.resume(test_sys)
The generated trace should normally be identical to a trace generated
when switching between identically configured CPU models or
checkpointing and resuming.
Currently, we invalidate the cached miscregs in
TLB::unserialize(). The intended use of the drainResume() method is to
invalidate cached state and prepare the system to resume after a CPU
handover or (un)serialization. This patch moves the TLB miscregs
invalidation code to the drainResume() method to avoid surprising
behavior.
Since the page table walker only checks if a drain has completed in
doL1DescriptorWrapper() and doL2DescriptorWrapper(), it sometimes
looses track of a drain request if there is a squash. This changeset
adds a completeDrain() call after squashing requests in the pending
queue, which fixes this issue.
In order to see all registers independent of the current CPU mode, the
ARM architecture model uses the magic MISCREG_CPSR_MODE register to
change the register mappings without actually updating the CPU
mode. This hack is no longer needed since the thread context now
provides a flat interface to the register file. This patch replaces
the CPSR_MODE hack with the flat register interface.
After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.
Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.
This patch adds support for the memInvalidate() drain method. TLB
flushing is requested by calling the virtual flushAll() method on the
TLB.
Note: This patch renames invalidateAll() to flushAll() on x86 and
SPARC to make the interface consistent across all supported
architectures.
At least gcc 4.4.3 seems to get confused by the use of func both as a
template parameter and a member variable in the M5VarArgsFault
class. This causes the value of the member variable func to be
unpredictable in M5VarArgsFault objects. This changeset renames the
template parameter to remove this ambiguity.
This patch makes the start and end address private in a move to
prevent direct manipulation and matching of ranges based on these
fields. This is done so that a transition to ranges with interleaving
support is possible.
As a result of hiding the start and end, a number of member functions
are needed to perform the comparisons and manipulations that
previously took place directly on the members. An accessor function is
provided for the start address, and a function is added to test if an
address is within a range. As a result of the latter the != and ==
operator is also removed in favour of the member function. A member
function that returns a string representation is also created to allow
debug printing.
In general, this patch does not add any functionality, but it does
take us closer to a situation where interleaving (and more cleverness)
can be added under the bonnet without exposing it to the user. More on
that in a later patch.
This patch makes the values of ID_ISARx, MIDR, and FPSID configurable
as ISA parameter values. Additionally, setMiscReg now ignores writes
to all of the ID registers.
Note: This moves the MIDR parameter from ArmSystem to ArmISA for
consistency.
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
The predecoder in x86 does a lot of work, most of which can be skipped if the
decoder cache is put in front of it.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Avoid reading them every instruction, and also eliminate the last use of the
thread context in the decoders.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
uopSet_uop is microop instruction that has the IsControl flags set, but the
IsCondControl or IsUncondControl flags seems not to be set, neither in
the construction nor where the microop is used. This patch adds the the
flags in the constructor of the instruction (MicroUopSetPCCPSR).
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
A flag was missing for the movret_uop microop instruction. This patch adds
that flag when the instruction is used, not directly in the constructor of
the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
This patch enables dumping statistics and Linux process information on
context switch boundaries (__switch_to() calls) that are used for
Streamline integration (a graphical statistics viewer from ARM).
This patch takes the Linux thread info support scattered across
different ISA implementations (currently in ARM, ALPHA, and MIPS), and
unifies them into a single file.
Adds a few more helper functions to read out TGID, mm, etc.
ISA-specific information (e.g., ALPHA PCBB register) is now moved to
the corresponding isa_traits.hh files.
This patch simplifies the scheduling of the next walk for the ARM
table walker. Previously it used the CPU clock, but as the table
walker inherits the clock from the CPU, it is cleaner to simply use
its own clock (which is the same).
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
This patch changes how the serialization of the system works. The base
class had a non-virtual serialize and unserialize, that was hidden by
a function with the same name for a number of subclasses (most likely
not intentional as the base class should have been virtual). A few of
the derived systems had no specialization at all (e.g. Power and x86
that simply called the System::serialize), but MIPS and Alpha adds
additional symbol table entries to the checkpoint.
Instead of overriding the virtual function, the additional entries are
now printed through a virtual function (un)serializeSymtab. The reason
for not calling System::serialize from the two related systems is that
a follow up patch will require the system to also serialize the
PhysicalMemory, and if this is done in the base class if ends up being
between the general parts and the specialized symbol table.
With this patch, the checkpoint is not modified, as the order of the
segments is unchanged.
This patch addresses a number of smaller issues identified by the code
inspection utility cppcheck. There are a number of identified leaks in
the arm/linux/system.cc (although the function only get's called once
so it is not a major problem), a few deletes in dev/x86/i8042.cc that
were not array deletes, and sprintfs where the character array had one
element less than needed. In the IIC tags there was a function
allocating an array of longs which is in fact never used.
Newer Linux kernels require DTB (device tree blobs) to specify platform
configurations. The input DTB filename can be specified through gem5 parameters
in LinuxArmSystem.
Instead of statically defining miscRegName to contain NUM_MISCREGS
elements, let the compiler determine the length of the array. This
allows us to use a static_assert to test that all registers are listed
in the name vector.
This patch takes the final plunge and transitions from the templated
Range class to the more specific AddrRange. In doing so it changes the
obvious Range<Addr> to AddrRange, and also bumps the range_map to be
AddrRangeMap.
In addition to the obvious changes, including the removal of redundant
includes, this patch also does some house keeping in preparing for the
introduction of address interleaving support in the ranges. The Range
class is also stripped of all the functionality that is never used.
--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
This patch is meant for allowing predicated reads and writes. Note that this
predication is different from the ISA provided predication. They way we
currently provide the ISA description for X86, we read/write registers that
do not need to be actually read/written. This is likely to be true for other
ISAs as well. This patch allows for read and write predicates to be associated
with operands. It allows for the register indices for source and destination
registers to be decided at the time when the microop is constructed. The
run time indicies come in to play only when the at least one of the
predicates has been provided. This patch will not affect any of the ISAs that
do not provide these predicates. Also the patch assumes that the order in
which operands appear in any function of the microop is same across all the
functions of the microops. A subsequent patch will enable predication for the
x86 ISA.
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.
The PciConfigAll now also uses a Param.Latency rather than a
Param.Tick. For backwards compatibility it still sets the pio_latency
to 1 tick. All the comments have also been updated to not state that
it is in simticks when it is not necessarily the case.
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.
The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.
The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
Alpha System was overriding loadState() function to setup some functional
event. The system tried to read/write to memory before the Ruby memory had
unserialized the state. With this patch, Alpha System overrides the
startup() function, and sets up functional events in this function. This
works because startup() is called after Ruby memory system has unserialized
the memory state.
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.
This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
New tool chains seem to be looking for kernel versions newer than what
this this was previously set to. Also take this opportunity to change
the hostname we report in uname to sim.gem5.org.
Enable different whitelists for different OS/arch combinations,
since some use the generic Linux definitions only, and others
use definitions inherited from earlier Unix flavors on those
architectures.
Also update x86 function pointers so ioctl is no longer
unimplemented on that platform.
This patch is a revised version of Vince Weaver's earlier patch.
According to the A15 TRM the value of this register is as follows (assuming 16 word = 64 byte lines)
[31:29] Format - b100 specifies v7
[28] RAZ - b0
[27:24] CWG log2(max writeback size #words) - 0x4 16 words
[23:20] ERG log2(max reservation size #words) - 0x4 16 words
[19:16] DminLine log2(smallest dcache line #words) - 0x4 16 words
[15:14] L1Ip L1 index/tagging policy - b11 specifies PIPT
[13:4] RAZ - b0000000000
[3:0] IminLine log2(smallest icache line #words) - 0x4 16 words
This patch makes getAddrRanges const throughout the code base. There
is no reason why it should not be, and making it const prevents adding
any unintentional side-effects.
This patch fixes two warnings, one related to a narrowing conversion
(int to MachInst), and one due to the cast operator for arguments and
a mismatch in const-ness (const void* and void*).
npc in PCState for ARM was being calculated before the current flags were
updated with the next flags. This causes an issue as the npc is incremented by
two or four depending on the current flags (thumb or not) and was leading to
branches that were predicted correctly being identified as mispredicted.
This patch fixes a failing compilation caused by MaxMiscDestRegs being
zero. According to gcc 4.6, the result is a comparison that is always
false due to limited range of data type.
Due to recent changes to X86 TLB, gem5 stopped compiling on
gcc version 4.4.3. This patch provides the fix for that problem. The patch
is tested on gcc 4.4.3. The change is not required for more recent
versions of gcc (like on 4.6.3).
initCPU() will be called to initialize switched out CPUs for the simple and
inorder CPU models. this patch prevents those CPUs from being initialized
because they should get their state from the active CPU when it is switched
out.
This change allows designating a system as MP capable or not as some
bootloaders/kernels care that it's set right. You can have a single
processor MP capable system, but you can't have a multi-processor
UP only system. This change also fixes the initialization of the MIDR
register.
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
The CPUID instruction was implemented so that it would only write its results
if the instruction was successful. This works fine on the simple CPU where
unwritten registers retain their old values, but on a CPU like O3 with
renaming this is broken. The instruction needs to write the old values back
into the registers explicitly if they aren't being changed.
There are some bits of some fields of the ExtMachInst which are not actually
used for anything but are included in the hash of an ExtMachInst for
simplicity and efficiency. This change makes sure the decoder's internal
working ExtMachInst is completely initialized, even these unused bits, so that
there isn't any nondeterministic behavior, no valgrind messages about
uninitialized variables, and no potential false misses/redundant entries in
the decode cache.
The GDT can be accessed by user level software running in compatibility mode
by moving segment selectors into segment registers. The GDT needs to be set up
at an address accessible in this mode.
A small change was added a while ago to keep addresses from overflowing 32
bits when larger addresses shouldn't be accessible to software. That change
truncated when not in long mode, but really it should have truncated when not
in 64 bit mode. The difference is whether compatibility mode is included, a
mode that's supposed to act like a legacy 32 bit mode.
This will allow it to be specialized by the ISAs. The existing caching scheme
is provided by the BasicDecodeCache in the GenericISA namespace and is built
from the generalized components.
--HG--
rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.
--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
This patch moves the DMA device to its own set of files, splitting it
from the IO device. There are no behavioural changes associated with
this patch.
The patch also grabs the opportunity to do some very minor tidying up,
including some white space removal and pruning some redundant
parameters.
Besides the immediate benefits of the separation-of-concerns, this
patch also makes upcoming changes more streamlined as it split the
devices that are only slaves and the DMA device that also acts as a
master.
--HG--
rename : src/dev/io_device.cc => src/dev/dma_device.cc
rename : src/dev/io_device.hh => src/dev/dma_device.hh
This patch makes the (device) DmaPort non-snooping and removes the
recvSnoop constructor parameter and instead introduces a
SnoopingDmaPort subclass for the ARM table walker.
Functionality is unchanged, as are the stats, and the patch merely
clarifies that the normal DMA ports are not snooping (although they
may issue requests that are snooped by others, as done with PCI, PCIe,
AMBA4 ACE etc).
Currently this port is declared in the ARM table walker as it is not
used anywhere else. If other ports were to have similar behaviour it
could be moved in a future patch.
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
Symbol tables masked with the loadAddrMask create redundant entries
that could conflict with kernel function events that rely on the
original addresses. This patch guards the creation of those masked
symbol tables by default, with an option to enable them when needed
(for early-stage kernel debugging, etc.)
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.
For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
It's possible for two page table walks to overlap which will go in the same
place in the TLB's trie. They would land on top of each other, so this change
adds some code which detects if an address already matches an entry and if so
throws away the new one.
The parameter is _machInst, which is very similar to the member machInst. If
machInst is used to pass the parameter to a lower level constructor, what
really happens is that machInst is set to whatever it already happened to be,
effectively leaving it uninitialized.
This change also adjusts the TlbEntry class so that it stores the number of
address bits wide a page is rather than its size in bytes. In other words,
instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
but it's a little harder going the other way.
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.
Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.
In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.
The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.
This patch addresses a number of minor issues that cause problems when
compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it
avoids using the deprecated ext/hash_map and instead uses
unordered_map (and similarly so for the hash_set). To make use of the
new STL containers, g++ and clang has to be invoked with "-std=c++0x",
and this is now added for all gcc versions >= 4.6, and for clang >=
3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1
unordered_map to avoid the deprecation warning.
The addition of c++0x in turn causes a few problems, as the
compiler is more stringent and adds a number of new warnings. Below,
the most important issues are enumerated:
1) the use of namespaces is more strict, e.g. for isnan, and all
headers opening the entire namespace std are now fixed.
2) another other issue caused by the more stringent compiler is the
narrowing of the embedded python, which used to be a char array,
and is now unsigned char since there were values larger than 128.
3) a particularly odd issue that arose with the new c++0x behaviour is
found in range.hh, where the operator< causes gcc to complain about
the template type parsing (the "<" is interpreted as the beginning
of a template argument), and the problem seems to be related to the
begin/end members introduced for the range-type iteration, which is
a new feature in c++11.
As a minor update, this patch also fixes the build flags for the clang
debug target that used to be shared with gcc and incorrectly use
"-ggdb".
This patch removes the assumption on having on single instance of
PhysicalMemory, and enables a distributed memory where the individual
memories in the system are each responsible for a single contiguous
address range.
All memories inherit from an AbstractMemory that encompasses the basic
behaviuor of a random access memory, and provides untimed access
methods. What was previously called PhysicalMemory is now
SimpleMemory, and a subclass of AbstractMemory. All future types of
memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now
distributed memory, the system has a wrapper class, called
PhysicalMemory that is aware of all the memories in the system and
their associated address ranges. This class thus acts as an
infinitely-fast bus and performs address decoding for these "shortcut"
accesses. Each memory can specify that it should not be part of the
global address map (used e.g. by the functional memories by some
testers). Moreover, each memory can be configured to be reported to
the OS configuration table, useful for populating ATAG structures, and
any potential ACPI tables.
Checkpointing support currently assumes that all memories have the
same size and organisation when creating and resuming from the
checkpoint. A future patch will enable a more flexible
re-organisation.
--HG--
rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py
rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py
rename : src/mem/physical.cc => src/mem/abstract_mem.cc
rename : src/mem/physical.hh => src/mem/abstract_mem.hh
rename : src/mem/physical.cc => src/mem/simple_mem.cc
rename : src/mem/physical.hh => src/mem/simple_mem.hh
Virtual (pre-segmentation) addresses are truncated based on address size, and
any non-64 bit linear address is truncated to 32 bits. This means that real
mode addresses aren't truncated down to 16 bits after their segment bases are
added in.
This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.
The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.
This patch changes the name of a bitfield from W to W_FIELD to avoid
clashes with W being used as a class (typename) in the templatized
range_map. It also changes L to L_FIELD to avoid future problems. The
problem manifestes itself when the CPU includes a header that in turn
includes range_map.hh. The relevant parts of the decoder are updated.
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
This patch makes the code compile with clang 2.9 and 3.0 again by
making two very minor changes. Firt, it maintains a strict typing in
the forward declaration of the BaseCPUParams. Second, it adds a
FullSystemInt flag of the type unsigned int next to the boolean
FullSystem flag. The FullSystemInt variable can be used in
decode-statements (expands to switch statements) in the instruction
decoder.
Making the CheckerCPU a runtime time option requires the code to be compatible
with ISAs other than ARM. This patch adds the appropriate function
stubs to allow compilation.
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
The change to port proxies recently moved code out of the constructor into
initState(). This is needed for code that loads data into memory, however
for code that setups symbol tables, kernel based events, etc this is the wrong
thing to do as that code is only called when a checkpoint isn't being restored
from.
New kernels attempt to read CP14 what debug architecture is available.
These changes add the debug registers and return that none is currently
available.
With the recent series of patches, the symbol table loading moved from
"construct" time to "init" time, but the kernel function event
callback registration was left behind. This patch moves it to the
proper location.
Add extra declarations to allow the compiler to pick up the right function.
Please note that these declarations have been added as part of the
clang-related changes.
This patch adds a function to X86 tlb that returns the
walker port. This port is required for correctly connecting
the walker ports for the cpu just switched in
If an instruction is executed speculatively and hits a situation where it
wants to panic, it should return a fault instead. If the instruction was
misspeculated, the fault can be thrown away. If the instruction wasn't
misspeculated, the fault will be invoked and the panic will still happen.
This patch is adding a clearer design intent to all objects that would
not be complete without a port proxy by making the proxies members
rathen than dynamically allocated. In essence, if NULL would not be a
valid value for the proxy, then we avoid using a pointer to make this
clear.
The same approach is used for the methods using these proxies, such as
loadSections, that now use references rather than pointers to better
reflect the fact that NULL would not be an acceptable value (in fact
the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional
composition where a NULL pointer is never valid" could be done on a
much broader scale throughout the code base, but for now it is only
done in the locations affected by the proxies.
This patch moves all port creation from the getPort method to be
consistently done in the MemObject's constructor. This is possible
thanks to the Swig interface passing the length of the vector ports.
Previously there was a mix of: 1) creating the ports as members (at
object construction time) and using getPort for the name resolution,
or 2) dynamically creating the ports in the getPort call. This is now
uniform. Furthermore, objects that would not be complete without a
port have these ports as members rather than having pointers to
dynamically allocated ports.
This patch also enables an elaboration-time enumeration of all the
ports in the system which can be used to determine the masterId.
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
Usage: m5 writefile <filename>
File will be created in the gem5 output folder with the identical filename.
Implementation is largely based on the existing "readfile" functionality.
Currently does not support exporting of folders.
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
This patch cleans up forward declarations and a member-function
prototype that still referred to the old FunctionalPort, VirtualPort
and TranslatingPort. There is no change in functionality.
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
A recent changeset (aae12ce9f34c) removed support for
PAL-mode breakpoints in Alpha, since it was awkward
and likely unused. This patch lets a user know if they
potentially run into this limitation.
The DPRINTF for doing protection checks appears after the checks have been
carried out. It is possible that the function returns while the checks are
being carried, in which case the printf is missed out. This patch moves the
DPRINTF before the checks.
--HG--
extra : rebase_source : 172896057e593022444d882ea93323a5d9f77a89
Adds the flag 'recvSnoops' which enables pagewalkers using DmaPorts,
to properly configure snoops.
--HG--
extra : rebase_source : 64207bef62c3268ddff2236ee4adae873812325f
Squashes the subsequent instructions in O3 pipe after the service call, so that
they see the effect of the system call when re-executed. This isn't really an issue
with FS mode, but can show up in SE mode.
--HG--
extra : rebase_source : 613a69fe1d9834261e25a8cd340aa6b47578e1fe
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
This parameter depends on a number of coincidences to work properly. First,
there must be an array assigned to system called "cpu" even though there's no
parameter called that. Second, the items in the "cpu" array have to have a
"clock" parameter which has a "frequency" member. This is true of the normal
CPUs, but isn't true of the memory tester CPUs. This happened to work before
because the memory tester CPUs were only used in SE mode where this parameter
was being excluded. Since everything is being pulled into a common binary,
this won't work any more. Since the boot_cpu_frequency parameter is only used
by Alpha's Linux System object (and Mips's through copy and paste), the
definition of that parameter is moved down to those objects specifically.
PageTable supported an allocate() call that called back
through the Process to allocate memory, but did not have
a method to map addresses without allocating new pages.
It makes more sense for Process to do the allocation, so
this method was renamed allocateMem() and moved to Process,
and uses a new map() call on PageTable.
The remaining uses of the process pointer in PageTable
were only to get the name and the PID, so by passing these
in directly in the constructor, we can make PageTable
completely independent of Process.
Not all objects need a platform pointer, and having one creates a dependence
on their being a platform object. This change removes the platform pointer to
from the base device object and moves it into subclasses that actually need
it.
In order for a system object to work in SE mode and FS mode, it has to either
always require a platform object even in SE mode, or get rid of the
requirement all together. Making SE mode carry around unnecessary/unused bits
of FS seems less than ideal, so I decided to go with the second option. The
platform pointer in the System class was used for exactly one purpose, a path
for the Alpha Linux system object to get to the real time clock and read its
frequency so that it could short cut the loops_per_jiffy calculation. There
was also a copy and pasted implementation in MIPS, but since it was only there
because it was there in Alpha I still count that as one use.
This change reverses the mechanism that communicates the RTC frequency so that
the Tsunami platform object pushes it up to the AlphaSystem object. This is
slightly less specific than it could be because really only the
AlphaLinuxSystem uses it. Because the intrFrequency function on the Platform
class was no longer necessary (and unimplemented on anything but Alpha) it was
eliminated.
After this change, a platform will need to have a system, but a system won't
have to have a platform.
These faults take varargs to their constructors which they print into a string
and pass to the M5DebugFault base class. They are basically faults wrapped
around panics, faults, warns, and warnonce-es so that they happen only at
commit.
By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.
Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.
There was a change a while ago that refactored some scons stuff which got rid
of cpu_models.py but also accidentally got rid of the ISA parser as a source
for its target files. That meant that changes which affected the parser
wouldn't cause a rebuild unless they also changed one of the description
files. This change fixes that.
Translating MSR addresses into MSR register indices took a lot of space in the
TLB source and made looking around in that file awkward. This change moves
the lookup into its own file to get it out of the way. It also changes it from
a switch statement to a hash map which should hopefully be a little more
efficient.
This change is a significant reorganization of the MIPS fault code that gets
rid of duplication, fixes some bugs, doubtlessly introduces others, and adds
names for the exception code constants.
Pass in a bool to indicate if the fault is from a store instead of having two
different classes. The classes were also misleadingly named since loads are
also processed by the DTB but should return ITB faults since they aren't
stores. The TLB may be returning the wrong fault in this case, but I haven't
looked at it closely.
Get rid of Fault classes left over from when this file was copied from Alpha,
and rename ArithmeticOverflowFault to be IntegerOverflowFault and get rid of
the old IntegerOverflowFault stub. The Integer version is what's actually in
the manual, but the Arithmetic version had the implementation.
The decoder now checks the value of FULL_SYSTEM in a switch statement to
decide whether to return a real syscall instruction or one that triggers
syscall emulation (or a panic in FS mode). The switch statement should devolve
into an if, and also should be optimized out since it's based on constant
input.
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).
This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%).
So a mips-cross-gdb can connect with gem5(MIPS_SE), and do some remote
debugging.
Testing:
Build gem5 for MIPS_SE and make gem5 wait at beginning:
modify "rgdb_wait = -1" to "rgdb_wait = 0" in src/sim/system.cc;
scons build/MIPS_SE/gem5.opt CPU_MODELS=O3CPU
----
Build GDB-7.3 mips-cross:
./configure --target=mips-linux-gnu --prefix=xxx/gdb-7.3-install/
make
make install
----
Run:
./build/MIPS_SE/gem5.opt configs/example/se.py --detailed --caches
./mips-linux-gnu-gdb xxx/gem5/tests/test-progs/hello/bin/mips/linux/hello
(gdb) target remote :7000
(gdb) info registers
(gdb) disassemble
(gdb) si
(gdb) break main
(gdb) c
(gdb) quit
Testing done.
Having two StaticInst classes, one nominally ISA dependent and the other ISA
dependent, has not been historically useful and makes the StaticInst class
more complicated that it needs to be. This change merges StaticInstBase into
StaticInst.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.
Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
Do some minor cleanup of some recently added comments, a warning, and change
other instances of stack extension to be like what's now being done for x86.
The way flag bits were being set for microops in x86 ended up implicitly
calling the bitset constructor which was truncating flags beyond the width of
an unsigned long. This change sets the bits in chunks which are always small
enough to avoid being truncated. On 64 bit machines this should reduce to be
the same as before, and on 32 bit machines it should work properly and not be
unreasonably inefficient.
When an instruction is translated in the x86 TLB, a variable called
delayedResponse is passed back and forth which tracks whether a translation
could be completed immediately, or if there's going to be callback that will
finish things up. If a read was to the internal memory space, memory mapped
registers used to implement things like MSRs, the function hadn't yet gotten
to where delayedResponse was set to false, it's default. That meant that the
value was never set, and the TLB could start waiting for a callback that would
never come. This change simply moves the assignment to above where control
can divert to translateInt().
Nothing big here, but when you have an address that is not in the page table request to be allocated, if it falls outside of the maximum stack range all you get is a page fault and you don't know why. Add a little warn() to explain it a bit. Also add some comments and alter logic a little so that you don't totally ignore the return value of checkAndAllocNextPage().
There are a set of locations is the linux kernel that are managed via
cache maintence instructions until all processors enable their MMUs & TLBs.
Writes to these locations are manually flushed from the cache to main
memory when the occur so that cores operating without their MMU enabled
and only issuing uncached accesses can receive the correct data. Unfortuantely,
gem5 doesn't support any kind of software directed maintence of the cache.
Until such time as that support exists this patch marks the specific cache blocks
that need to be coherent as non-cacheable until all CPUs enable their MMU and
thus allows gem5 to boot MP systems with caches enabled (a requirement for
booting an O3 cpu and thus an O3 CPU regression).