In MessageBuffer the m_not_avail_count member is incremented but not used.
This causes an overflow reported by ASAN. This patch changes from an int to
Stats::Scalar, since the count is useful in debugging finite MessageBuffers.
RISC-V makes use of load-reserved and store-conditional instructions to
enable creation of lock-free concurrent data manipulation as well as
ACQUIRE and RELEASE semantics for memory ordering of LR, SC, and AMO
instructions (the latter of which do not follow LR/SC semantics). This
patch is a correction to patch 4, which added these instructions to the
implementation of RISC-V. It modifies locked_mem.hh and the
implementations of lr.w, sc.w, lr.d, and sc.d to apply the proper gem5
flags and return the proper values.
An important difference between gem5's LLSC semantics and RISC-V's LR/SC
ones, beyond the name, is that gem5 uses 0 to indicate failure and 1 to
indicate success, while RISC-V is the opposite. Strictly speaking, RISC-V
uses 0 to indicate success and nonzero to indicate failure where the
value would indicate the error, but currently only 1 is reserved as a
failure code by the ISA reference.
This is the seventh patch in the series which originally consisted of five
patches that added the RISC-V ISA to gem5. The original five patches added
all of the instructions and added support for more detailed CPU models and
the sixth patch corrected the implementations of Linux constants and
structs. There will be an eighth patch that adds some regression tests
for the instructions.
[Removed some commented-out code from locked_mem.hh.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
This is an add-on patch for the original series that implemented RISC-V
that improves the implementation of Linux emulation for SE mode. Basically
it cleans up linux/linux.hh by removing constants that haven't been
defined for the RISC-V Linux proxy kernel and rearranging the stat
struct so it aligns with RISC-V's implementation of it. It also adds
placeholders for system calls that have been given numbers in RISC-V
but haven't been given implementations yet. These system calls are
as follows:
- readlinkat
- sigprocmask
- ioctl
- clock_gettime
- getrusage
- getrlimit
- setrlimit
The first five patches implemented RISC-V with the base ISA and multiply,
floating point, and atomic extensions and added support for detailed
CPU models with memory timing.
[Fixed incompatibility with changes made from patch 1.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
Last of five patches adding RISC-V to GEM5. This patch adds support for
timing, minor, and detailed CPU models that was missing in the last four,
which basically consists of handling timing-mode memory accesses and
telling the minor and detailed models what a no-op instruction should
be (addi zero, zero, 0).
Patches 1-4 introduced RISC-V and implemented the base instruction set,
RV64I, and added the multiply, floating point, and atomic memory
extensions, RV64MAFD.
[Fixed compatibility with edit from patch 1.]
[Fixed compatibility with hg copy edit from patch 1.]
[Fixed some style errors in locked_mem.hh.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
Fourth of five patches adding RISC-V to GEM5. This patch adds the RV64A
extension, which includes atomic memory instructions. These instructions
atomically read a value from memory, modify it with a value contained in a
source register, and store the original memory value in the destination
register and modified value back into memory. Because this requires two
memory accesses and GEM5 does not support two timing memory accesses in
a single instruction, each of these instructions is split into two micro-
ops: A "load" micro-op, which reads the memory, and a "store" micro-op,
which modifies and writes it back. Each atomic memory instruction also has
two bits that acquire and release a lock on its memory location.
Additionally, there are atomic load and store instructions that only either
load or store, but not both, and can acquire or release memory locks.
Note that because the current implementation of RISC-V only supports one
core and one thread, it doesn't make sense to make use of AMO instructions.
However, they do form a standard extension of the RISC-V ISA, so they are
included mostly as a placeholder for when multithreaded execution is
implemented. As a result, any tests for their correctness in a future
patch may be abbreviated.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I;
patch 2 implemented the integer multiply extension, RV64M; and patch 3
implemented the single- and double-precision floating point extensions,
RV64FD.
Patch 5 will add support for timing, minor, and detailed CPU models that
isn't present in patches 1-4.
[Added missing file amo.isa]
[Replaced information removed from initial patch that was missed during
division into multiple patches.]
[Fixed some minor formatting issues.]
[Fixed oversight where LR and SC didn't have both AQ and RL flags.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
Third of five patches adding RISC-V to GEM5. This patch adds the RV64FD
extensions, which include single- and double-precision floating point
instructions.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I
and patch 2 implemented the integer multiply extension, RV64M.
Patch 4 will implement the atomic memory instructions, RV64A, and patch
5 will add support for timing, minor, and detailed CPU models that is
missing from the first four patches.
[Fixed exception handling in floating-point instructions to conform better
to IEEE-754 2008 standard and behavior of the Chisel-generated RISC-V
simulator.]
[Fixed style errors in decoder.isa.]
[Fixed some fuzz caused by modifying a previous patch.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
Second of five patches adding RISC-V to GEM5. This patch adds the
RV64M extension, which includes integer multiply and divide instructions.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I.
Patch 3 will implement the floating point extensions, RV64FD; patch 4 will
implement the atomic memory instructions, RV64A; and patch 5 will add
support for timing, minor, and detailed CPU models that is missing from
the first four patches.
[Added mulw instruction that was missed when dividing changes among
patches.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
First of five patches adding RISC-V to GEM5. This patch introduces the
base 64-bit ISA (RV64I) in src/arch/riscv for use with syscall emulation.
The multiply, floating point, and atomic memory instructions will be added
in additional patches, as well as support for more detailed CPU models.
The loader is also modified to be able to parse RISC-V ELF files, and a
"Hello world\!" example for RISC-V is added to test-progs.
Patch 2 will implement the multiply extension, RV64M; patch 3 will implement
the floating point (single- and double-precision) extensions, RV64FD;
patch 4 will implement the atomic memory instructions, RV64A, and patch 5
will add support for timing, minor, and detailed CPU models that is missing
from the first four patches (such as handling locked memory).
[Removed several unused parameters and imports from RiscvInterrupts.py,
RiscvISA.py, and RiscvSystem.py.]
[Fixed copyright information in RISC-V files copied from elsewhere that had
ARM licenses attached.]
[Reorganized instruction definitions in decoder.isa so that they are sorted
by opcode in preparation for the addition of ISA extensions M, A, F, D.]
[Fixed formatting of several files, removed some variables and
instructions that were missed when moving them to other patches, fixed
RISC-V Foundation copyright attribution, and fixed history of files
copied from other architectures using hg copy.]
[Fixed indentation of switch cases in isa.cc.]
[Reorganized syscall descriptions in linux/process.cc to remove large
number of repeated unimplemented system calls and added implmementations
to functions that have received them since it process.cc was first
created.]
[Fixed spacing for some copyright attributions.]
[Replaced the rest of the file copies using hg copy.]
[Fixed style check errors and corrected unaligned memory accesses.]
[Fix some minor formatting mistakes.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
If the cache access mode is parallel, i.e. "sequential_access" parameter
is set to "False", tags and data are accessed in parallel. Therefore,
the hit_latency is the maximum latency between tag_latency and
data_latency. On the other hand, if the cache access mode is
sequential, i.e. "sequential_access" parameter is set to "True",
tags and data are accessed sequentially. Therefore, the hit_latency
is the sum of tag_latency plus data_latency.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This function was used by the now-defunct InOrderCPU model. Since this
model is no longer in gem5, this function was not called from anywhere in
the code.
Changeset 11701 only serialized the useful portion of of an ethernet packets'
payload. However, the device models expect each ethernet packet to contain
a 16KB buffer, even if there is no data in it. This patch adds a 'bufLength'
field to EthPacketData so the original size of the packet buffer can always
be unserialized.
Reported-by: Gabor Dozsa <Gabor.Dozsa@arm.com>
There has been some problem when using address and undefined-behavior
sanitizers at the same time. This patch will look for the special case
where both are enabled at once and change the flags passed to the compiler
to reflect this.
the GPUExecContext context currently stores a reference to its parent WF's
GPUISA object, however there are some special instructions that do not have
an associated WF. when these objects are constructed they set their WF pointer
to null, which causes the GPUExecContext to segfault when trying to
dereference
the WF pointer to get at the WF's GPUISA object. here we change the GPUISA
reference in the GPUExecContext class to a pointer so that it may be set to
null.
UBSAN flags this operation because it detects that arg is being cast directly
to an unsigned type, argBits. this patch fixes this by first casting the
value to a signed int type, then reintrepreting the raw bits of the signed
int into argBits.
SequencerMsg is autogenerated by slicc scripts and the MessageSizeType is
initialized to the max enume value by default. The DMASequencer pushes this
message to the mandatory queue and since the MessageSizeType is unitialized,
string_to_MessageSizeType() function used by traces to print the message fails
with a panic. This patch avoids this problem by initializing MessageSizeType
of SequencerMsg to Request_Control.
fixes to appease clang++. tested on:
Ubuntu clang version 3.5.0-4ubuntu2~trusty2
(tags/RELEASE_350/final) (based on LLVM 3.5.0)
Ubuntu clang version 3.6.0-2ubuntu1~trusty1
(tags/RELEASE_360/final) (based on LLVM 3.6.0)
the fixes address the following five issues:
1) the exec continuations in gpu_static_inst.hh were marked
as protected when they should be public. here we mark
them as public
2) the Abs instruction uses std::abs() in its execute method.
because Abs is templated, it can also operate on U32 and U64,
types, which cause Abs::execute() to pass uint32_t and uint64_t
types to std::abs() respectively. this triggers a warning
because std::abs() has no effect in this case. to rememdy this
we add template specialization for the execute() method of Abs
when its template paramter is U32 or U64.
3) Some potocols that utilize the code in cprintf.hh were missing
includes to BoolVec.hh, which defines operator<< for the BoolVec
type. This would cause issues when the generated code would try
to pass a BoolVec type to a method in cprintf.hh that used
operator<< on an instance of a BoolVec.
4) Surprise, clang doesn't like it when you clobber all the bits
in a newly allocated object. I.e., this code:
tlb = new GpuTlbEntry\[size\];
std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size);
Let's use std::vector to track the TLB entries in the GpuTlb now...
5) There were a few variables used only in DPRINTFs, so we mark them
with M5_VAR_USED.
This patch adds the ability for an application to request dist-gem5 to begin/
end synchronization using an m5 op. When toggling on sync, all nodes agree
on the next sync point based on the maximum of all nodes' ticks. CPUs are
suspended until the sync point to avoid sending network messages until sync has
been enabled. Toggling off sync acts like a global execution barrier, where
all CPUs are disabled until every node reaches the toggle off point. This
avoids tricky situations such as one node hitting a toggle off followed by a
toggle on before the other nodes hit the first toggle off.
DMA sequencers and protocols can currently only issue one DMA access at
a time. This patch implements the necessary functionality to support
multiple outstanding DMA requests in Ruby.
Currently, all the network devices create a 16K buffer for the 'data' field
in EthPacketData, and use 'length' to keep track of the size of the packet
in the buffer. This patch introduces the 'simLength' parameter to
EthPacketData, which is used to hold the effective length of the packet used
for all timing calulations in the simulator. Serialization is performed using
only the useful data in the packet ('length') and not necessarily the entire
original buffer.
this patch adds an ordered response buffer to the GM pipeline
to ensure in-order data delivery. the buffer is implemented as
a stl ordered map, which sorts the request in program order by
using their sequence ID. when requests return to the GM pipeline
they are marked as done. only the oldest request may be serviced
from the ordered buffer, and only if is marked as done.
the FIFO response buffers are kept and used in OoO delivery mode
for HSAIL an operand's indices into the register files may be calculated
trivially, because the operands are always read from a register file, or are
an immediate.
for machine ISA, however, an op selector may specify special registers, or
may specify special SGPRs with an alias op selector value. the location of
some of the special registers values are dependent on the size of the RF
in some cases. here we add a way for the underlying getRegisterIndex()
method to know about the size of the RFs, so that it may find the relative
positions of the special register values.
currently the PC is incremented on an instruction granularity, and not as an
instruction's byte address. machine ISA instructions assume the PC is a byte
address, and is incremented accordingly. here we make the GPU model, and the
HSAIL instructions treat the PC as a byte address as well.
the GPUISA class is meant to encapsulate any ISA-specific behavior - special
register accesses, isa-specific WF/kernel state, etc. - in a generic enough
way so that it may be used in ISA-agnostic code.
gpu-compute: use the GPUISA object to advance the PC
the GPU model treats the PC as a pointer to individual instruction objects -
which are store in a contiguous array - and not a byte address to be fetched
from the real memory system. this is ok for HSAIL because all instructions
are considered by the model to be the same size.
in machine ISA, however, instructions may be 32b or 64b, and branches are
calculated by advancing the PC by the number of words (4 byte chunks) it
needs to advance in the real instruction stream. because of this there is
a mismatch between the PC we use to index into the instruction array, and
the actual byte address PC the ISA expects. here we move the PC advance
calculation to the ISA so that differences in the instrucion sizes may be
accounted for in generic way.
because every taken branch causes fetch to be discarded, we move the call
to the WF to avoid to have to call it from each and every branch instruction
type.
we are removing doGmReturn from the GM pipe, and adding completeAcc()
implementations for the HSAIL mem ops. the behavior in doGmReturn is
dependent on HSAIL and HSAIL mem ops, however the completion phase
of memory ops in machine ISA can be very different, even amongst individual
machine ISA mem ops. so we remove this functionality from the pipeline and
allow it to be implemented by the individual instructions.
this patch removes the GPUStaticInst enums that were defined in GPU.py.
instead, a simple set of attribute flags that can be set in the base
instruction class are used. this will help unify the attributes of HSAIL
and machine ISA instructions within the model itself.
because the static instrution now carries the attributes, a GPUDynInst
must carry a pointer to a valid GPUStaticInst so a new static kernel launch
instruction is added, which carries the attributes needed to perform a
the kernel launch.
the RequestDesc was previously implemented as a std::pair, which made
the implementation overly complex and error prone. here we encapsulate the
packet, primary, and secondary types all in a single data structure with
all members properly intialized in a ctor
Read() should not write anything when returning 0 (EOF).
This patch does not correct the same bug occuring for :
nbr_read=read(file, buf, nbytes)
When nbr_read<nbytes, nbytes bytes are copied into the virtual
RAM instead of nbr_read. If buf is smaller than nbytes, a
page fault occurs, even if buf is in fact bigger than nbr_read.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
ClockedObject was changed to require its regStats() to be called from every
child class. If you forget to do this, the error was indecipherable. This
patch makes the error more clear.
Added power-down state transitions to the DRAM controller model.
Added per rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.
The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled. ACT is implicitly covered by RD/WR
since burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).
The controller will automatically transition to ACT power down,
PRE power down, or SREF.
Transition to ACT power down state scheduled from:
1) The RespondEvent, where read data is received from the memory.
ACT power-down entry will be scheduled when one or more banks is
open, all commands for the rank have completed (no more commands
scheduled), and there are no commands in queue for the rank
Transition to PRE power down scheduled from:
1) respondEvent, when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
state was PRE power-down and there are commands in the queue.
Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh is completes when the previous
state was PRE power-down with no commands in queue
Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.
Self-refresh exit commands are scheduled from:
1) The next request event, when the queue has commands for the rank
in the readQueue or there are commands for the rank in the
writeQueue and the bus state is WRITE.
Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
The per rank statistics are periodically updated based on
state transition and refresh events.
Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, need to ensure that the low-power state
durations, power, and energy are logged correctly.
Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
Add constraint that all ranks have to be in PWR_IDLE
before signaling drain complete
This will ensure that the banks are all closed and the rank
has exited any low-power states.
On suspend, update the power stats to sync the DRAM power logic
The logic maintains the location of the signalDrainDone
method, which is still triggered from either:
1) Read response event
2) Next request event
This ensures that the drain will complete in the READ bus
state and minimizes the changes required.
Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
Add local variable to stores commands to be issued.
These commands are in order within a single bank but will be out
of order across banks & ranks.
A new procedure, flushCmdList, sorts commands across banks / ranks,
and flushes the sorted list, up to curTick() to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed. Could be called in other events like
the powerEvent as well.
By only flushing commands up to curTick(), will not get out of sync
when flushed at a periodic stats dump (done in subsequent patch).
Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
The generic timer needs a pointer to an ArmSystem to wire itself to the
system register handler. This was previously specified as an instance
of System that was later cast to ArmSystem. Make this more robust by
specifying it as an ArmSystem in the Python interface and add a check
to make sure that it is non-NULL.
Change-Id: I989455e666f4ea324df28124edbbadfd094b0d02
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
This patch adds port direction names to the links during topology
creation, which can be used for better printed names for the links
or for users to code up their own adaptive routing algorithms.
It also adds support for every router to have an independent latency
value to support heterogeneous topologies with the subsequent
garnet2.0 patch.
This patch makes the internal links within the network topology
unidirectional, thus allowing any deadlock-free routing algorithms to
be specified from the topology itself using weights.
This patch also renames Mesh.py and MeshDirCorners.py to
Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing).
It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
networktest is essentially a collection of synthetic traffic patterns
for the network. The protocol name and the tester having the same name
led to multiple python configuration files with the same name, adding
confusion. This patch renames networktest to garnet_synthetic_traffic,
and also adds more synthetic traffic patterns.
Over the past 6 years, we realized that the protocol is essentially used
to run the garnet network in a standalone manner, and feed standard synthetic
traffic patterns through it.
Instead of scheduling another event, this patch adds a warning in case gdb
is attached multiple times and the first attachement event has not been
processed yet.
This patch adds a method to the Wavefront class to compute the actual workgroup
size. This can be different from the maximum workgroup size specified when
launching the kernel through the NDRange object. Current solution is still not
optimal, as we are computing these for each wavefront and the dispatcher also
needs to have this information and can't actually call
Wavefront::computeActuallWgSz before the wavefronts are being created. A long
term solution would be to have a Workgroup class that deals with all these
details.
When loading a checkpoint, it's sometimes desirable to be able to test
whether an entry within a secion exists. This is currently done
automatically in the UNSERIALIZE_OPT_SCALAR macro, but it isn't
possible to do for arrays, containers, or enums. Instead of adding
even more macros, add a helper function (CheckpointIn::entryExists())
that tests for the presence of an entry.
Change-Id: I4b4646b03276b889fd3916efefff3bd552317dbc
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
The drain did not wait until stages were ready again. Therefore, as a
result of messages in the TimeBuffer being drain, the state after the
drain was not consistent and asserts fired in some places when the
draining happened after a stage got blocked, but before the notification
arrived to the previous stages.
Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This patch adds methods to serialize the context of a particular wavefront
to the simulated system memory. Context serialization is used when a wavefront
is preempeted (i.e. context switch).
std::stack has no iterators, therefore the reconvergence stack can't be
iterated without poping elements off. We will be using std::list instead to be
able to iterate for saving and restoring purposes.
WFContext struct is currently unused and it has been rendered not useful in
saving and restoring the context of a Wavefront. Wavefront class should be
sufficient for that purpose and the runtime can figure out the memory size
it will need to allocate for a Wavefront through an IOCTL.
This change adds a Trace CPU param to exit simulation early,
i.e. when the first (any one) trace execution is complete. With
this change the user gets a choice to configure exit as either
when the last CPU finishes (default) or first CPU finishes
replay. Configuring an early exit enables simulating and
measuring stats strictly when memory-system resources are being
stressed by all Trace CPUs.
Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
This change subtracts the time offset present in the trace from
all the event times when nodes and request are sent so that the
replay starts immediately when the simulation starts. This makes
the stats accurate when the time offset in traces is large, for
example when traces are generated in the middle of a workload
execution. It also solves the problem of unnecessary DRAM
refresh events that would keep occuring during the large time
offset before even a single request is replayed into the system.
Change-Id: Ie0898842615def867ffd5c219948386d952af7f7
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
This change adds a simple feature to scale the frequency of
the Trace CPU.
The compute delays in the input traces provide timing. This
change adds a freqency multiplier parameter to the Trace CPU
set to 1.0 by default. The compute delay is manipulated to
effectively achieve the frequency at which the nodes become
ready and thus scale the frequency of the Trace CPU.
Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
This patch enables timing accesses for KVM cpu. A new state,
RunningMMIOPending, is added to indicate that there are outstanding timing
requests generated by KVM in the system. KVM's tick() is disabled and the
simulation does not enter into KVM until all outstanding timing requests have
completed. The main motivation for this is to allow KVM CPU to perform MMIO
in Ruby, since Ruby does not support atomic accesses.
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects. The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects. Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
The quiesce family of magic ops can be simplified by the inclusion of
quiesceTick() and quiesce() functions on ThreadContext. This patch also
gets rid of the FS guards, since suspending a CPU is also a valid
operation for SE mode.
This patch introduces the DmaCallback helper class, which registers a callback
to fire after a sequence of (potentially non-contiguous) DMA transfers on a
DmaPort completes.
Connecting basic blocks would stop too early in kernels where ret was not the
last instruction. This patch allows basic blocks after the ret instruction
to be properly connected.
The receiver thread in dist_iface is allowed to directly exit the simulation.
This can cause exit to be called twice if the main thread simultaneously wants
to exit the simulation. Therefore, have the receiver thread enqueue a request
to exit on the primary event queue for the main simulation thread to handle.
Ethernet devices are currently only hooked up if running in FS mode. Much of
the Ethernet networking code is generic and can be used to build non-Ethernet
device models. Some of these device models do not require a complex driver
stack and can be built to use an EmulatedDriver in SE mode. This patch enables
etherent interfaces to properly connect regardless of whether the simulation
is in FS or SE mode.
Currently only 'start' and 'end' of AddrRange are printed in config.ini.
This causes address ranges to be overlapping when loading a c++-only
config with interleaved addresses using CxxConfigManger. This patch adds
prints for the interleave and XOR bits to config.ini such that address
ranges are properly setup with cxx config.
Add a customizable NoMali GPU model and an example Mali T760
configuration. Unlike the normal NoMali model (NoMaliGpu), the
NoMaliCustopmGpu model exposes all the important GPU ID registers to
Python. This makes it possible to implement custom GPU configurations
by without changing the underlying NoMali library.
Change-Id: I4fdba05844c3589893aa1a4c11dc376ec33d4e9e
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Only map memories into the KVM guest address space that are
marked as usable by KVM. Create BackingStoreEntry class
containing flags for is_conf_reported, in_addr_map, and
kvm_map.
This changeset reverts the changset "dev, sim: Added missing override
keywords to fix CLANG compilation (OSX)" which was incorrectly rebased.
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Previously printing an mshr would trigger an assertion if the MSHR was
not in service or if the targets list was empty. This patch changes
the print function to bypasses the accessor functions for
postInvalidate and postDowngrade and avoid the relevant assertions. It
also checks if the targets list is empty before calling print on it.
Change-Id: Ic18bee6cb088f63976112eba40e89501237cfe62
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Secure and non-secure data can coexist in the cache and therefore the
snoop filter should treat differently packets with secure and non
secure accesses. This patch uses the lower bits of the line address to
keep track of whether the packet is addressing secure memory or not.
Change-Id: I54a5e614dad566a5083582bede86c86896f2c2c1
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
This patch changes the default behaviour of the SystemXBar, adding a
snoop filter. With the recent updates to the snoop filter allocation
behaviour this change no longer causes problems for the regressions
without caches.
Change-Id: Ibe0cd437b71b2ede9002384126553679acc69cc1
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
This patch improves the snoop filter allocation decisions by not only
looking at whether a port is snooping or not, but also if the packet
actually came from a cache. The issue with only looking at isSnooping
is that the CPU ports, for example, are snooping, but not actually
caching. Previously we ended up incorrectly allocating entries in
systems without caches (such as the atomic and timing quick
regressions). Eventually these misguided allocations caused the snoop
filter to panic due to an excessive size.
On the request path we now include the fromCache check on the packet
itself, and for responses we check if we actually have a snoop-filter
entry.
Change-Id: Idd2dbc4f00c7e07d331e9a02658aee30d0350d7e
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
This patch takes yet another step in maintaining the clusivity, in
that it allows a mostly-inclusive cache to hold on to blocks even when
responding to a ReadExReq or UpgradeReq. Previously the cache simply
invalidated these blocks, but there is no strict need to do so.
The most important part of this patch is that we simply mark the block
clean when satisfying the upstream request where the cache is allowed
to keep the block. The only tricky part of the patch is in the memory
management of deferred snoops, where we need to distinguish the cases
where only the packet was copied (we expected to respond), and the
cases where we created an entirely new packet and request (we kept it
only to replay later).
The code in satisfyRequest is definitely ready for some refactoring
after this.
Change-Id: I201ddc7b2582eaa46fb8cff0c7ad09e02d64b0fc
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>