Suppose the saturating counters of a branch predictor contain n bits. When the
counter is between 0 and (2^(n-1) - 1), boundaries included, the branch is
predicted as not taken. When the counter is between 2^(n-1) and (2^n - 1),
boundaries included, the branch is predicted as taken.
when insts execute, they mark the time they finish to be used for subsequent isnts
they may need forwarding of data. However, the regdepmap was using the wrong
value to index into the destination operands of the instruction to be forwarded.
Thus, in some cases, we are checking to see if the 3rd destination register
for an instruction is executed at a certain time, when there is only 1 dest. register
valid. Thus, we get a bad, uninitialized time value that will stall forwarding
causing performance loss but still the correct execution.
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
make sure to only read 1 src reg. for write-hint and any other similar
'store' instruction. Reading the source reg when its not necessary
can cause the simulator to read from uninitialized values
These recordEvent() calls could cause crashes since they
access the req pointer after it's potentially been
deleted during a failed translation call. (Similar
problem to the traceData bug fixed in the previous cset.)
Moving them above the translation call (as was done
recentlyi in cset 8b2b8e5e7d35) avoids the crash
but doesn't work, since at that point we don't know if
the access is uncached or not.
It's not clear why these calls are there, and no one
seems to use them, so we'll just delete them. If they
are needed, they should be moved to somewhere that's
guaranteed to be after the translation completes but
before the request is possibly deleted, e.g., in
finishTranslation().
Accessing traceData (to call setAddress() and/or setData())
after initiating a timing translation was causing crashes,
since a failed translation could delete the traceData
object before returning.
It turns out that there was never a need to access traceData
after initiating the translation, as the traced data was
always available earlier; this ordering was merely
historical. Furthermore, traceData->setAddress() and
traceData->setData() were being called both from the CPU
model and the ISA definition, often redundantly.
This patch standardizes all setAddress and setData calls
for memory instructions to be in the CPU models and not
in the ISA definition. It also moves those calls above
the translation calls to eliminate the crashes.
When implementing timing address translations instead of atomic, I
forgot to preserve the faults that are returned from the read and
write calls. This patch reinstates them.
When each load or store is sent to the LSQ, we check whether it will cross a
cache line boundary and, if so, split it in two. This creates two TLB
translations and two memory requests. Care has to be taken if the first
packet of a split load is sent but the second blocks the cache. Similarly,
for a store, if the first packet cannot be sent, we must store the second
one somewhere to retry later.
This modifies the LSQSenderState class to record both packets in a split
load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA
to indicate whether unaligned memory accesses are allowed. This is used
throughout the changed code so that compiler can optimise away code dealing
with split requests for ISAs that don't need them.
This initiates a timing translation and passes the read or write on to the
processor before waiting for it to finish. Once the translation is finished,
the instruction's state is updated via the 'finish' function. A new
DataTranslation class is created to handle this.
The idea is taken from the implementation of timing translations in
TimingSimpleCPU by Gabe Black. This patch also separates out the timing
translations from this CPU and uses the new DataTranslation class.
- on certain retry requests you can get an assertion failure
- fix by allowing the request to literally "Retry" itself
if it wasnt successful before, and then block any requests
through cache port while waiting for the cache to be
made available for access
when threads are switching in/out the CPU, we need to keep
track of special cases like branches. Add appropriate
variables in ThreadState t track this and then use
these variables when updating pc after context switch
this will be used for when a thread comes back from a cache miss, it needs to update the PCs
because the inst might of been a branch or delayslot in which the next PC isnt always
a straight addition
allow a thread to wakeup and be activated after
it has been in suspended state and another
thread is switched out. Need to give
pipeline stages a "activateThread" function
so that can get to their suspended instruction
when the time is right.
give resources their own specific
activity to do for a "suspend" event
instead of defaulting to deactivating the thread for a
suspend thread event. This really matters
for the fetch sequence unit which wants to remove the
thread from fetching while other units want to
ignore a thread suspension. If you deactivate a thread
in a resource then you may lose some of the allotted
bandwidth that the thread is taking up...
update/add in the use of isThreadReady & isThreadSuspended
functions.Check in activateThread what list a thread is
on so it can be managed accordingly.
-Support ability to activate next ready thread after a cache miss
through the activateNextReadyContext/Thread() functions
-To support this a "readyList" of thread ids is added
-After a cache miss, thread will suspend and then call
activitynextreadythread
allow for events to schedule themselves later if desired. this is important
because of cases like where you need to activate a thread only after the previous
thread has been deactivated. The ordering there has to be enforced
add code to recognize memory stalls in resources and the pipeline as well
as squash a thread if there is a stall and we are in the switch on cache miss
model
add buffer for instructions to switch out to in a pipeline stage
can't squash the instruction and remove the pipeline so we kind of need
to 'suspend' an instruction at the stage while the memory stall resolves
for the switch on cache miss model
- loads were happening on same cycle as the address was generated which is slightly
unrealistic. Instead, force address generation to be on separate cycle from load
initiation
- also, mark the stages in a more traditional way (F-D-X-M-W)