This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
In FS mode the syscall function will panic, but the interface will be
consistent and code which calls syscall can be compiled in. This will allow,
for instance, instructions that use syscall to be built unconditionally but
then not returned by the decoder.
Having two StaticInst classes, one nominally ISA dependent and the other ISA
dependent, has not been historically useful and makes the StaticInst class
more complicated that it needs to be. This change merges StaticInstBase into
StaticInst.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.
Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
readBytes and writeBytes had the word "bytes" in their names because they
accessed blobs of bytes. This distinguished them from the read and write
functions which handled higher level data types. Because those functions don't
exist any more, this change renames readBytes and writeBytes to more general
names, readMem and writeMem, which reflect the fact that they are how you read
and write memory. This also makes their names more consistent with the
register reading/writing functions, although those are still read and set for
some reason.
The DTB expects the correct PC in the ThreadContext
but how if the memory accesses are speculative? Shouldn't
we send along the requestor's PC to the translate functions?
if a faulting instruction reaches an execution unit,
then ignore it and pass it through the pipeline.
Once we recognize the fault in the graduation unit,
dont allow a second fault to creep in on the same cycle.
Before graduating an instruction, explicitly check fault
by making the fault check it's own separate command
that can be put on an instruction schedule.
remove events in the resource pool that can be called from the CPU event, since the CPU
event is scheduled at the same time at the resource pool event.
----
Also, match the resPool event function names to the cpu event function names
----
only update BTB on a taken branch and update branch predictor w/pcstate from instruction
---
only pay attention to branch predictor updates if the the inst. is in fact a branch
formerly, this was implicit when you accessed the execution unit
or the use-def unit but it's better that this just be something
that a user can specify.
Architectures like SPARC need to read the window pointer
in order to figure out it's register dependence. However,
this may not get updated until after an instruction gets
executed, so now we lazily detect the register dependence
in the EXE stage (execution unit or use_def). This
makes sure we get the mapping after the most current change.
Add a few constants and functions that the InOrder model wants for SPARC.
* * *
sparc: add eaComp function
InOrder separates the address generation from the actual access so give
Sparc that functionality
* * *
sparc: add control flags for branches
branch predictors and other cpu model functions need to know specific information
about branches, so add the necessary flags here
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
***
(1): get rid of expandForMT function
MIPS is the only ISA that cares about having a piece of ISA state integrate
multiple threads so add constants for MIPS and relieve the other ISAs from having
to define this. Also, InOrder was the only core that was actively calling
this function
* * *
(2): get rid of corespecific type
The CoreSpecific type was used as a proxy to pass in HW specific params to
a MIPS CPU, but since MIPS FS hasnt been touched for awhile, it makes sense
to not force every other ISA to use CoreSpecific as well use a special
reset function to set it. That probably should go in a PowerOn reset fault
anyway.
Because int and not InstSeqNum was used in a couple of places, you can
overflow the int type and thus get wierd bugs when the sequence number
is negative (or some wierd value)
remove constructors that werent being used (it just gets confusing)
use initialization list for all the variables instead of relying on initVars()
function
-use a pointer to CacheReqPacket instead of PacketPtr so correct destructors
get called on packet deletion
- make sure to delete the packet if the cache blocks the sendTiming request
or for some reason we dont use the packet
- dont overwrite memory requests since in the worst case an instruction will
be replaying a request so no need to keep allocating a new request
- we dont use retryPkt so delete it
- fetch code was split out already, so just assert that this is a memory
reference inst. and that the staticInst is available
keep track of when an instruction needs the execution
behind it to be serialized. Without this, in SE Mode
instructions can execute behind a system call exit().
resources don't need to call getLatency because the latency is already a member
in the class. If there is some type of special case where different instructions
impose a different latency inside a resource then we can revisit this and
add getLatency() back in
each resource has a certain # of requests it can take per cycle. update the #s here
to be more realistic based off of the pipeline width and if the resource needs to
be accessed on multiple cycles
---
need to delete the cache request's data on clearRequest() now that we are recycling
requests
---
fetch unit needs to deallocate the fetch buffer blocks when they are replaced or
squashed.