When doing an unsigned 64 bit division with a divisor that has its most
significant bit set, the division code would spill a bit off of the end of a
uint64_t trying to shift the dividend into position. This change adds code
that handles that case specially by purposefully letting it spill and then
going ahead assuming there was a 65th one bit.
Accessing traceData (to call setAddress() and/or setData())
after initiating a timing translation was causing crashes,
since a failed translation could delete the traceData
object before returning.
It turns out that there was never a need to access traceData
after initiating the translation, as the traced data was
always available earlier; this ordering was merely
historical. Furthermore, traceData->setAddress() and
traceData->setData() were being called both from the CPU
model and the ISA definition, often redundantly.
This patch standardizes all setAddress and setData calls
for memory instructions to be in the CPU models and not
in the ISA definition. It also moves those calls above
the translation calls to eliminate the crashes.
When each load or store is sent to the LSQ, we check whether it will cross a
cache line boundary and, if so, split it in two. This creates two TLB
translations and two memory requests. Care has to be taken if the first
packet of a split load is sent but the second blocks the cache. Similarly,
for a store, if the first packet cannot be sent, we must store the second
one somewhere to retry later.
This modifies the LSQSenderState class to record both packets in a split
load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA
to indicate whether unaligned memory accesses are allowed. This is used
throughout the changed code so that compiler can optimise away code dealing
with split requests for ISAs that don't need them.
1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, but adding an function to do optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
In Linux, the set_thread_area system call stores the address of the thread
local storage area into a field of the current thread_info structure. Later,
to access that value, the program uses the rdhwr instruction to read a
"hardware register" with index 29. The 64 bit MIPS manual, volume II, says
that index 29 is reserved for a future ABI extension and should cause a
"Reserved Instruction Exception". In Linux (and potentially other ISAs) that
exception is trapped and emulated to return the value stored by
set_thread_area as if that were actually stored by a physical register.
The tp_value address (as named in the Linux kernel) is ironically stored as a
control register so that it goes with a particular ThreadContext. Syscall
emulation will use that to emulate storing to the OS's thread info structure,
and rdhwr will emulate faulting and returning that value from software by
returning the value itself, as if it was in hardware. In other words, we fake
faking the register in SE mode. In an FS mode implementation it should
work as specified in the manual.
The MIPS ISA object expects to be constructed with a CPU pointer it uses to
look at other thread contexts and allow them to be manipulated with control
registers. Unfortunately, that differs from all the other ISA classes and
would complicate their implementation.
This change makes the event constructor use a CPU pointer pulled out of the
thread context passed to setMiscReg instead.
Some of the micro-ops weren't casting 1 to ULL before shifting,
which can cause problems. On the perl makerand input this
caused some values to be negative that shouldn't have been.
The casts are done as ULL(1) instead of 1ULL to match others
in the m5 code base.
The PC indexes in the various register sets was defined in the section for
unaliased registers which was throwing off the indexing. This moves those
where they belong. Also, to make detecting accesses to the PC easier and
because it's in the same place in all modes, the intRegForceUser function
now passes it through as index 15.
Unfortunately my implementation of the movd instruction had two bugs.
In one case, when moving a 32-bit value into an xmm register, the
lower half of the xmm register was not zero extended.
The other case is that xmm was used instead of xmmlm as the source
for a register move. My test case didn't notice this at first
as it moved xmm0 to eax, which both have the same register
number.
This double cast led to rounding errors which caused
some benchmarks to get the wrong values, most notably lucas
which failed spectacularly due to CVTTSD2SI returning an
off-by-one value. equake was also broken.
Specifically, get rid of the big switch statement so more cases can be
handled. Enumerating all the possible settings doesn't scale well. Also do
some minor style clean up.
Add constants for all the modes and registers, maps for aliasing, functions
that use the maps and range check, and use a named constant instead of a magic
number for the microcode register.
This problem is like the one fixed with movhpd a few weeks ago.
A +8 displacement is used to access memory when there should
be none.
This fix is needed for the perlbmk spec2k benchmark to run.
64-bit vsyscall is different than 32-bit.
There are only two syscalls, time and gettimeofday.
On a real system, there is complicated code that implements these
without entering the kernel. That would be complicated to implement in m5.
Instead we just place code that calls the regular syscalls (this is how
tools such as valgrind handle this case).
This is needed for the perlbmk spec2k benchmark.
These are complicated instructions and the micro-code might be suboptimal.
This has been tested with some small sample programs (attached)
The psrldq instruction is needed by various spec2k programs.
This patch implements the movd_Vo_Edp series of instructions.
It addresses various concerns by Gabe Black about which file the
instruction belonged in, as well as supporting REX prefixed
instructions properly.
This instruction is needed for some of the spec2k benchmarks, most
notably bzip2.
This patch implements the haddpd instruction.
It fixes the problem in the previous version (pointed out by Gabe Black)
where an incorrect result would happen if you issue the instruction
with the same argument twice, i.e. "haddpd %xmm0,%xmm0"
This instruction is used by many spec2k benchmarks.
This patch hooks up the truncate, ftruncate, truncate64 and ftruncate64
system calls on 32-bit and 64-bit X86.
These have been tested on both architectures.
ftruncate/ftruncate64 is needed for the f90 spec2k benchmarks.
When accessing arguments for a syscall, the position of an argument depends on
the policies of the ISA, how much space preceding arguments took up, and the
"alignment" of the index for this particular argument into the number of
possible storate locations. This change adjusts getSyscallArg to take its
index parameter by reference instead of value and to adjust it to point to the
possible location of the next argument on the stack, basically just after the
current one. This way, the rules for the new argument can be applied locally
without knowing about other arguments since those have already been taken into
account implicitly.
All system calls have also been changed to reflect the new interface. In a
number of cases this made the implementation clearer since it encourages
arguments to be collected in one place in order and then used as necessary
later, as opposed to scattering them throughout the function or using them in
place in long expressions. It also discourages using getSyscallArg over and
over to retrieve the same value when a temporary would do the job.
The movdqa instruction should enforce 16-byte alignment.
This implementation does not do that.
These instructions are needed for most of x86_64 spec2k to run.
The st_size entry was in the wrong place
(see linux-2.6.29/arch/x86/include/asm/stat.h )
Also, the packed attribute is needed when compiling on a
64-bit machine, otherwise gcc adds extra padding that
break the layout of the structure.
This adds support for the 32-bit, big endian Power ISA. This supports both
integer and floating point instructions based on the Power ISA Book I v2.06.
I've tested these on x86 and they work as expected.
In theory for 32-bit x86 we should have some sort of special
handling for the legacy 16-bit uid/gid syscalls, but in practice
modern toolchains don't use the 16-bit versions, and m5 sets the uid
and gid values to be less than 16-bits anyway.
This fix is needed for the perl spec2k benchmarks to run.
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
The manuals from both AMD and Intel say that when writing to a 32 bit
destination in 64 bit mode, the upper 32 bits of the register are filled with
zeros. They also both say that the CMOV instructions leave their destination
alone when their condition fails. Unfortunately, it seems that CMOV will zero
extend its destination register whether or not it was supposed to actually do
a move on both platforms. This seems to be the only case where this happens,
but it would be hard to say for sure.
This is my best guess as far as what these should do. Other existing microops
use implicit registers, mul1s and mul1u for instance, so this should be ok.
The microop that loads the implicit DoubleBits register would fall into one
of the microop slots for moving to/from special registers.