The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
Used as a command in full-system scripts helps the user ensure the benchmarks have finished successfully.
For example, one can use:
/path/to/benchmark args || /sbin/m5 fail 1
and thus ensure gem5 will exit with an error if the benchmark fails.
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent
stale translations when doing repeated switching. Additionally, the
TLB flushing functionality is exported to the Python to make debugging
of switching/checkpointing easier.
A simulation script will typically use the TLB flushing functionality
to generate a reference trace. The following sequence can be used to
simulate a handover (this depends on how drain is implemented, but is
generally the case) between identically configured CPU models:
m5.drain(test_sys)
[ cpu.flushTLBs() for cpu in test_sys.cpu ]
m5.resume(test_sys)
The generated trace should normally be identical to a trace generated
when switching between identically configured CPU models or
checkpointing and resuming.
Currently, we invalidate the cached miscregs in
TLB::unserialize(). The intended use of the drainResume() method is to
invalidate cached state and prepare the system to resume after a CPU
handover or (un)serialization. This patch moves the TLB miscregs
invalidation code to the drainResume() method to avoid surprising
behavior.
Since the page table walker only checks if a drain has completed in
doL1DescriptorWrapper() and doL2DescriptorWrapper(), it sometimes
looses track of a drain request if there is a squash. This changeset
adds a completeDrain() call after squashing requests in the pending
queue, which fixes this issue.
In order to see all registers independent of the current CPU mode, the
ARM architecture model uses the magic MISCREG_CPSR_MODE register to
change the register mappings without actually updating the CPU
mode. This hack is no longer needed since the thread context now
provides a flat interface to the register file. This patch replaces
the CPSR_MODE hack with the flat register interface.
After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.
Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.
This patch adds support for the memInvalidate() drain method. TLB
flushing is requested by calling the virtual flushAll() method on the
TLB.
Note: This patch renames invalidateAll() to flushAll() on x86 and
SPARC to make the interface consistent across all supported
architectures.
At least gcc 4.4.3 seems to get confused by the use of func both as a
template parameter and a member variable in the M5VarArgsFault
class. This causes the value of the member variable func to be
unpredictable in M5VarArgsFault objects. This changeset renames the
template parameter to remove this ambiguity.
This patch makes the start and end address private in a move to
prevent direct manipulation and matching of ranges based on these
fields. This is done so that a transition to ranges with interleaving
support is possible.
As a result of hiding the start and end, a number of member functions
are needed to perform the comparisons and manipulations that
previously took place directly on the members. An accessor function is
provided for the start address, and a function is added to test if an
address is within a range. As a result of the latter the != and ==
operator is also removed in favour of the member function. A member
function that returns a string representation is also created to allow
debug printing.
In general, this patch does not add any functionality, but it does
take us closer to a situation where interleaving (and more cleverness)
can be added under the bonnet without exposing it to the user. More on
that in a later patch.
This patch makes the values of ID_ISARx, MIDR, and FPSID configurable
as ISA parameter values. Additionally, setMiscReg now ignores writes
to all of the ID registers.
Note: This moves the MIDR parameter from ArmSystem to ArmISA for
consistency.
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
The predecoder in x86 does a lot of work, most of which can be skipped if the
decoder cache is put in front of it.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Avoid reading them every instruction, and also eliminate the last use of the
thread context in the decoders.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
uopSet_uop is microop instruction that has the IsControl flags set, but the
IsCondControl or IsUncondControl flags seems not to be set, neither in
the construction nor where the microop is used. This patch adds the the
flags in the constructor of the instruction (MicroUopSetPCCPSR).
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
A flag was missing for the movret_uop microop instruction. This patch adds
that flag when the instruction is used, not directly in the constructor of
the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
This patch enables dumping statistics and Linux process information on
context switch boundaries (__switch_to() calls) that are used for
Streamline integration (a graphical statistics viewer from ARM).
This patch takes the Linux thread info support scattered across
different ISA implementations (currently in ARM, ALPHA, and MIPS), and
unifies them into a single file.
Adds a few more helper functions to read out TGID, mm, etc.
ISA-specific information (e.g., ALPHA PCBB register) is now moved to
the corresponding isa_traits.hh files.
This patch simplifies the scheduling of the next walk for the ARM
table walker. Previously it used the CPU clock, but as the table
walker inherits the clock from the CPU, it is cleaner to simply use
its own clock (which is the same).
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
This patch changes how the serialization of the system works. The base
class had a non-virtual serialize and unserialize, that was hidden by
a function with the same name for a number of subclasses (most likely
not intentional as the base class should have been virtual). A few of
the derived systems had no specialization at all (e.g. Power and x86
that simply called the System::serialize), but MIPS and Alpha adds
additional symbol table entries to the checkpoint.
Instead of overriding the virtual function, the additional entries are
now printed through a virtual function (un)serializeSymtab. The reason
for not calling System::serialize from the two related systems is that
a follow up patch will require the system to also serialize the
PhysicalMemory, and if this is done in the base class if ends up being
between the general parts and the specialized symbol table.
With this patch, the checkpoint is not modified, as the order of the
segments is unchanged.
This patch addresses a number of smaller issues identified by the code
inspection utility cppcheck. There are a number of identified leaks in
the arm/linux/system.cc (although the function only get's called once
so it is not a major problem), a few deletes in dev/x86/i8042.cc that
were not array deletes, and sprintfs where the character array had one
element less than needed. In the IIC tags there was a function
allocating an array of longs which is in fact never used.
Newer Linux kernels require DTB (device tree blobs) to specify platform
configurations. The input DTB filename can be specified through gem5 parameters
in LinuxArmSystem.
Instead of statically defining miscRegName to contain NUM_MISCREGS
elements, let the compiler determine the length of the array. This
allows us to use a static_assert to test that all registers are listed
in the name vector.
This patch takes the final plunge and transitions from the templated
Range class to the more specific AddrRange. In doing so it changes the
obvious Range<Addr> to AddrRange, and also bumps the range_map to be
AddrRangeMap.
In addition to the obvious changes, including the removal of redundant
includes, this patch also does some house keeping in preparing for the
introduction of address interleaving support in the ranges. The Range
class is also stripped of all the functionality that is never used.
--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
This patch is meant for allowing predicated reads and writes. Note that this
predication is different from the ISA provided predication. They way we
currently provide the ISA description for X86, we read/write registers that
do not need to be actually read/written. This is likely to be true for other
ISAs as well. This patch allows for read and write predicates to be associated
with operands. It allows for the register indices for source and destination
registers to be decided at the time when the microop is constructed. The
run time indicies come in to play only when the at least one of the
predicates has been provided. This patch will not affect any of the ISAs that
do not provide these predicates. Also the patch assumes that the order in
which operands appear in any function of the microop is same across all the
functions of the microops. A subsequent patch will enable predication for the
x86 ISA.
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.
The PciConfigAll now also uses a Param.Latency rather than a
Param.Tick. For backwards compatibility it still sets the pio_latency
to 1 tick. All the comments have also been updated to not state that
it is in simticks when it is not necessarily the case.
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.
The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.
The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
Alpha System was overriding loadState() function to setup some functional
event. The system tried to read/write to memory before the Ruby memory had
unserialized the state. With this patch, Alpha System overrides the
startup() function, and sets up functional events in this function. This
works because startup() is called after Ruby memory system has unserialized
the memory state.
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.
This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
New tool chains seem to be looking for kernel versions newer than what
this this was previously set to. Also take this opportunity to change
the hostname we report in uname to sim.gem5.org.
Enable different whitelists for different OS/arch combinations,
since some use the generic Linux definitions only, and others
use definitions inherited from earlier Unix flavors on those
architectures.
Also update x86 function pointers so ioctl is no longer
unimplemented on that platform.
This patch is a revised version of Vince Weaver's earlier patch.
According to the A15 TRM the value of this register is as follows (assuming 16 word = 64 byte lines)
[31:29] Format - b100 specifies v7
[28] RAZ - b0
[27:24] CWG log2(max writeback size #words) - 0x4 16 words
[23:20] ERG log2(max reservation size #words) - 0x4 16 words
[19:16] DminLine log2(smallest dcache line #words) - 0x4 16 words
[15:14] L1Ip L1 index/tagging policy - b11 specifies PIPT
[13:4] RAZ - b0000000000
[3:0] IminLine log2(smallest icache line #words) - 0x4 16 words
This patch makes getAddrRanges const throughout the code base. There
is no reason why it should not be, and making it const prevents adding
any unintentional side-effects.
This patch fixes two warnings, one related to a narrowing conversion
(int to MachInst), and one due to the cast operator for arguments and
a mismatch in const-ness (const void* and void*).
npc in PCState for ARM was being calculated before the current flags were
updated with the next flags. This causes an issue as the npc is incremented by
two or four depending on the current flags (thumb or not) and was leading to
branches that were predicted correctly being identified as mispredicted.
This patch fixes a failing compilation caused by MaxMiscDestRegs being
zero. According to gcc 4.6, the result is a comparison that is always
false due to limited range of data type.
Due to recent changes to X86 TLB, gem5 stopped compiling on
gcc version 4.4.3. This patch provides the fix for that problem. The patch
is tested on gcc 4.4.3. The change is not required for more recent
versions of gcc (like on 4.6.3).
initCPU() will be called to initialize switched out CPUs for the simple and
inorder CPU models. this patch prevents those CPUs from being initialized
because they should get their state from the active CPU when it is switched
out.
This change allows designating a system as MP capable or not as some
bootloaders/kernels care that it's set right. You can have a single
processor MP capable system, but you can't have a multi-processor
UP only system. This change also fixes the initialization of the MIDR
register.
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
The CPUID instruction was implemented so that it would only write its results
if the instruction was successful. This works fine on the simple CPU where
unwritten registers retain their old values, but on a CPU like O3 with
renaming this is broken. The instruction needs to write the old values back
into the registers explicitly if they aren't being changed.
There are some bits of some fields of the ExtMachInst which are not actually
used for anything but are included in the hash of an ExtMachInst for
simplicity and efficiency. This change makes sure the decoder's internal
working ExtMachInst is completely initialized, even these unused bits, so that
there isn't any nondeterministic behavior, no valgrind messages about
uninitialized variables, and no potential false misses/redundant entries in
the decode cache.
The GDT can be accessed by user level software running in compatibility mode
by moving segment selectors into segment registers. The GDT needs to be set up
at an address accessible in this mode.
A small change was added a while ago to keep addresses from overflowing 32
bits when larger addresses shouldn't be accessible to software. That change
truncated when not in long mode, but really it should have truncated when not
in 64 bit mode. The difference is whether compatibility mode is included, a
mode that's supposed to act like a legacy 32 bit mode.
This will allow it to be specialized by the ISAs. The existing caching scheme
is provided by the BasicDecodeCache in the GenericISA namespace and is built
from the generalized components.
--HG--
rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.
--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
This patch moves the DMA device to its own set of files, splitting it
from the IO device. There are no behavioural changes associated with
this patch.
The patch also grabs the opportunity to do some very minor tidying up,
including some white space removal and pruning some redundant
parameters.
Besides the immediate benefits of the separation-of-concerns, this
patch also makes upcoming changes more streamlined as it split the
devices that are only slaves and the DMA device that also acts as a
master.
--HG--
rename : src/dev/io_device.cc => src/dev/dma_device.cc
rename : src/dev/io_device.hh => src/dev/dma_device.hh
This patch makes the (device) DmaPort non-snooping and removes the
recvSnoop constructor parameter and instead introduces a
SnoopingDmaPort subclass for the ARM table walker.
Functionality is unchanged, as are the stats, and the patch merely
clarifies that the normal DMA ports are not snooping (although they
may issue requests that are snooped by others, as done with PCI, PCIe,
AMBA4 ACE etc).
Currently this port is declared in the ARM table walker as it is not
used anywhere else. If other ports were to have similar behaviour it
could be moved in a future patch.
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
Symbol tables masked with the loadAddrMask create redundant entries
that could conflict with kernel function events that rely on the
original addresses. This patch guards the creation of those masked
symbol tables by default, with an option to enable them when needed
(for early-stage kernel debugging, etc.)
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.
For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
It's possible for two page table walks to overlap which will go in the same
place in the TLB's trie. They would land on top of each other, so this change
adds some code which detects if an address already matches an entry and if so
throws away the new one.
The parameter is _machInst, which is very similar to the member machInst. If
machInst is used to pass the parameter to a lower level constructor, what
really happens is that machInst is set to whatever it already happened to be,
effectively leaving it uninitialized.
This change also adjusts the TlbEntry class so that it stores the number of
address bits wide a page is rather than its size in bytes. In other words,
instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
but it's a little harder going the other way.
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.
Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.
In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
This patch introduces port access methods that separates snoop
request/responses from normal memory request/responses. The
differentiation is made for functional, atomic and timing accesses and
builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the
different phases of the protocol (request -> [forwarded snoop request
-> snoop response]* -> response) all use the same port access
functions, even though the snoop packets flow in the opposite
direction to the normal packet. That is, a coherent master sends
normal request and receives responses, but receives snoop requests and
sends snoop responses (vice versa for the slave). These two distinct
phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a
slave through sendFunctional, and the request packet is turned into a
response before the call returns. In a system without cache coherence,
this is all that is needed from the functional interface. For the
cache-coherent scenario, a slave also sends snoop requests to coherent
masters through sendFunctionalSnoop, with responses returned within
the same packet pointer. This is currently used by the bus and caches,
and the LSQ of the O3 CPU. The send/recvFunctional and
send/recvFunctionalSnoop are moved from the Port super class to the
appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with
request being sent from master to slave through sendAtomic. In the
case of cache-coherent ports, a slave can send snoop requests to a
master through sendAtomicSnoop. Just as for the functional access
methods, the atomic send and receive member functions are moved to the
appropriate subclasses.
The timing access methods are different from the functional and atomic
in that requests and responses are separated in time and
send/recvTiming are used for both directions. Hence, a master uses
sendTiming to send a request to a slave, and a slave uses sendTiming
to send a response back to a master, at a later point in time. Snoop
requests and responses travel in the opposite direction, similar to
what happens in functional and atomic accesses. With the introduction
of this patch, it is possible to determine the direction of packets in
the bus, and no longer necessary to look for both a master and a slave
port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming
that are pure virtual functions, the recvFunctionalSnoop,
recvAtomicSnoop and recvTimingSnoop have a default implementation that
calls panic. This is to allow non-coherent master and slave ports to
not implement these functions.