This changes the RFE macroop into 3 microops:
URa = [sp]; URb = [sp+4]; // load CPSR,PC values from stack
sp = sp + offset; // optionally auto-increment
PC = URa; CPSR = URb; // write to the PC and CPSR.
Importantly:
- writing to PC is handled in the last micro-op.
- loading occurs prior to state changes.
Allow some loads that update the base register to use just two micro-ops. three
micro-ops are only used if the destination register matches the offset register
or the PC is the destination regsiter. If the PC is updated it needs to be
the last micro-op otherwise O3 will mispredict.
There were four bugs in these instructions. First, the loaded value was being
stored into a floating point register as floating point, changing the value as
it was transfered. Second, the meaning of the "up" bit had been reversed.
Third, the statically sized microop array wasn't bit enough for all possible
inputs. It's now dynamically sized and should always be big enough. Fourth,
the offset was stored as an unsigned 8 bit value. Negative offsets would look
like moderately large positive offsets.
This change moves the writeback of load multiple instructions to the beginning
of the macroop. That way, the MicroLdrRetUop that changes the mode will
necessarily happen later, ensuring the writeback happens in the original mode.
The actual value in the base register if it also shows up in the register list
is undefined, so it's fine if it gets clobbered by one of the loads. For
stores where the base register is the lowest numbered in the register list,
the original value should be written back. That means stores can't write back
at the beginning, but the mode changing problem doesn't affect them so they
can continue to write back at the end.