Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by
bitwise AND.
2. Somehow the compiler also gets confused and warns about
NoopMachInst being unused (removing it causes compilation errors
though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes
false positives for the array-bounds check. For now, switch to
std::array. Potentially we could disable the warning for newer gcc
versions, but switching to std::array is probably a good move in
any case.
The system class currently clears the vector of active CPUs in
initState(). CPUs are added to the list by registerThreadContext()
which is called from BaseCPU::init(). This obviously breaks when the
System object is initialized after the CPUs. This changeset removes
the offending clear() call since the list will be empty after it has
been instantiated anyway.
The current ignoreWarnOnceFunc doesn't really work as expected,
since it will only generate one warning total, for whichever
"warn-once" syscall is invoked first. This patch fixes that
behavior by keeping a "warned" flag in the SyscallDesc object,
allowing suitably flagged syscalls to warn exactly once per
syscall.
Sometimes, we need to defer an express snoop in an MSHR, but the original
request might complete and deallocate the original pkt->req. In those cases,
create a copy of the request so that someone who is inspecting the delayed
snoop can also inspect the request still. All of this is rather hacky, but the
allocation / linking and general life-time management of Packet and Request is
rather tricky. Deleting the copy is another tricky area, testing so far has
shown that the right copy is deleted at the right time.
We currently assume that all uncacheable memory accesses are strictly
ordered. Instead of always enforcing strict ordering, we now only
enforce it if the required memory type is device memory or strongly
ordered memory.
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
With the recent patches addressing how we deal with uncacheable
accesses there is no longer need for the work arounds put in place to
enforce certain sections of memory to be uncacheable during boot.
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.
There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.
Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
This patch ensures that we pass on information about a packet being
shared (rather than exclusive), when forwarding a packet downstream.
Without this patch there is a risk that a downstream cache considers
the line exclusive when it really isn't.
This patch adds a missing counter update for the uncacheable
accesses. By updating this counter we also get a meaningful average
latency for uncacheable accesses (previously inf).
This patch changes the cache implementation to rely on virtual methods
rather than using the replacement policy as a template argument.
There is no impact on the simulation performance, and overall the
changes make it easier to modify (and subclass) the cache and/or
replacement policy.
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
Both open_adaptive and close_adaptive page polices keep the page
open if a row hit is found. If a row hit is not found, close_adaptive
page policy precharges the row, and open_adaptive policy precharges
the row only if there is a bank conflict request waiting in the queue.
This patch makes the checks for above conditions simpler.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued. As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined). The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value. To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean. If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
This patch sets the default latency of the division microop to a single cycle
on x86. This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
Same exception is raised whether division with zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
This patch introduces a UFS host controller and a UFS device. More
information about the UFS standard can be found at the JEDEC site:
http://www.jedec.org/standards-documents/results/jesd220
Note that the model does not implement the complete standard, and as
such is not an actual implementation of UFS. The following SCSI
commands are implemented: inquiry, read, read capacity, report LUNs,
start/stop, test unit ready, verify, write, format unit, send
diagnostic, synchronize cache, mode select, mode sense, request sense,
unmap, write buffer and read buffer. This is sufficient for usage with
Linux and Android.
To interact with this model a kernel version 3.9 or above is
needed.
This adds a NAND flash timing model. This model takes the number of
planes into account and is ultimately intended to be used as a
high-level performance model for any device using flash. To access the
memory, use either readMemory or writeMemory.
To make use of the model you will need an interface model
such as UFSHostDevice, which is part of a separate patch.
At the moment the flash device is part of the ARM device tree since
the only use if the UFSHostDevice, and that in turn relies on the ARM
GIC.
This patch adds an I2C bus and base device. I2C is used to connect a
variety of sensors, and this patch serves as a starting point to
enable a range of I2C devices.
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.
First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.
Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
The o3 cpu instruction queue model uses the count variable to track the number
of unissued instructions in the queue. Previously, the squash method used
this variable to avoid executing the doSquash method when there were no
unissued instructions in the pipeline. A corner case problem exists when
only issued instructions exist in the pipeline and a squash occurs; the
doSquash code is not invoked and subsequently does not clean up state properly.
Very small changes to iew.predictedNotTakenIncorrect
and iew.branchMispredicts. Looks like similar updates
were committed on April 3 (changeset 235ff1c046df), but
only for the quick tests.
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.
The MinorCPU is now used to model an in-order microarchitecture, and
long term the MinorCPU will eventually be renamed InOrderCPU.
The change in 20.parser is from new x87 instructions. The change to
pc-o3-timing is not clear to me. It seems that this test might be invoking
some undefined behavior.
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>