1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, but adding an function to do optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
In Linux, the set_thread_area system call stores the address of the thread
local storage area into a field of the current thread_info structure. Later,
to access that value, the program uses the rdhwr instruction to read a
"hardware register" with index 29. The 64 bit MIPS manual, volume II, says
that index 29 is reserved for a future ABI extension and should cause a
"Reserved Instruction Exception". In Linux (and potentially other ISAs) that
exception is trapped and emulated to return the value stored by
set_thread_area as if that were actually stored by a physical register.
The tp_value address (as named in the Linux kernel) is ironically stored as a
control register so that it goes with a particular ThreadContext. Syscall
emulation will use that to emulate storing to the OS's thread info structure,
and rdhwr will emulate faulting and returning that value from software by
returning the value itself, as if it was in hardware. In other words, we fake
faking the register in SE mode. In an FS mode implementation it should
work as specified in the manual.
The MIPS ISA object expects to be constructed with a CPU pointer it uses to
look at other thread contexts and allow them to be manipulated with control
registers. Unfortunately, that differs from all the other ISA classes and
would complicate their implementation.
This change makes the event constructor use a CPU pointer pulled out of the
thread context passed to setMiscReg instead.
Added error messages when:
- a state does not exist in a machine's list of known states.
- an event does not exist in a machine
- the actions of a certain machine have not been declared
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
and be forwarded to a special pio bus connecting to device pio
ports.
Some of the micro-ops weren't casting 1 to ULL before shifting,
which can cause problems. On the perl makerand input this
caused some values to be negative that shouldn't have been.
The casts are done as ULL(1) instead of 1ULL to match others
in the m5 code base.
The PC indexes in the various register sets was defined in the section for
unaliased registers which was throwing off the indexing. This moves those
where they belong. Also, to make detecting accesses to the PC easier and
because it's in the same place in all modes, the intRegForceUser function
now passes it through as index 15.
Unfortunately my implementation of the movd instruction had two bugs.
In one case, when moving a 32-bit value into an xmm register, the
lower half of the xmm register was not zero extended.
The other case is that xmm was used instead of xmmlm as the source
for a register move. My test case didn't notice this at first
as it moved xmm0 to eax, which both have the same register
number.
This double cast led to rounding errors which caused
some benchmarks to get the wrong values, most notably lucas
which failed spectacularly due to CVTTSD2SI returning an
off-by-one value. equake was also broken.
Specifically, get rid of the big switch statement so more cases can be
handled. Enumerating all the possible settings doesn't scale well. Also do
some minor style clean up.