When switching from an atomic CPU to any of the timing CPUs, a drain is
unnecessary since no events are scheduled in atomic mode. However, when
trying to switch CPUs starting with a timing CPU, there may be events
scheduled. This change ensures that all events are drained from the system
by calling m5.drain before switching CPUs.
The profileEvent pointer is tested against NULL in various places, but
it is not initialized unless running in full-system mode. In SE mode, this
can result in segmentation faults when profileEvent default intializes to
something other than NULL.
This patch addresses a few minor issues reported by the clang static
analyzer.
The analysis was run with:
scan-build -disable-checker deadcode \
-enable-checker experimental.core \
-disable-checker experimental.core.CastToStruct \
-enable-checker experimental.cpluscplus
This seperates the functionality to clear the state in a block into
blk.hh and the functionality to udpate the tag information into the
tags. This gets rid of the case where calling invalidateBlk on an
already-invalid block does something different than calling it on a
valid block, which was confusing.
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
This patch is meant for allowing predicated reads and writes. Note that this
predication is different from the ISA provided predication. They way we
currently provide the ISA description for X86, we read/write registers that
do not need to be actually read/written. This is likely to be true for other
ISAs as well. This patch allows for read and write predicates to be associated
with operands. It allows for the register indices for source and destination
registers to be decided at the time when the microop is constructed. The
run time indicies come in to play only when the at least one of the
predicates has been provided. This patch will not affect any of the ISAs that
do not provide these predicates. Also the patch assumes that the order in
which operands appear in any function of the microop is same across all the
functions of the microops. A subsequent patch will enable predication for the
x86 ISA.
If I understand correctly, this was put in place so that a debugger can be
attached when the protocol aborts. While this sounds useful, it is a problem
when the simulation is not being actively monitored. I think it is better to
remove this.
This patch simply bumps the stats to avoid having failing
regressions. Someone with more insight in the changes should verify
that these differences all make sense.
Despite gzwrite taking an unsigned for length, it returns an int for
bytes written; gzwrite fails if (int)len < 0. Because of this, call
gzwrite with len no larger than INT_MAX: write in blocks of INT_MAX if
data to be written is larger than INT_MAX.
This patch prunes the range_ops header that is no longer used. The
bridge used it to do filtering of address ranges, but this is changed
since quite some time.
Ultimately this patch aims to simplify the handling of ranges before
specialising the AddrRange to an AddrRegion that also allows striping
bits to be selected.
This patch aims to simplify the use of the Range class before
introducing a more elaborate AddrRegion to replace the AddrRange. The
SackRange is the only use of the range class besides address ranges,
and the removal of this use makes for an easier modification of the
range class.
The functionlity that is removed with this patch is not used anywhere
throughout the code base.
This patch addresses a previously highlighted issue with the default
latencies used for PIO and PCI devices. The values are merely educated
guesses and might not represent the particular system you want to
model. However, the values in this patch are definitely far more
realistic than the previous ones.
In i8254xGBe, the writeConfig method is updated to use configDelay
instead of pioDelay.
A follow-up patch will update the regression stats.
This patch allows for specifying multiple programs via command line. It also
adds an option for specifying whether to use of SMT. But SMT does not work for
the o3 cpu as of now.
Includes a small change in sim_object.cc that adds the name space to
the output stream parameter in serializeAll. Leaving out the name
space unfortunately confuses Doxygen.
Simulation objects normally register derived statistics, presumably
what regFormulas originally was meant for, in regStats(). This patch
removes regRegformulas since there is no need to have a separate
method call to register formulas.
The simple_bootloader checks for CPU0 in a manner incompatible with systems
actually using affinity levels -- just looking at MPIDR[7:0]. However, in
future we may wish to use real affinity levels and this method will be in danger
of matching several CPUs with affinity0 = 0.
Match affinity2 == affinity1 == affinity0 == 0 instead.
Implement some code we used to panic on as it actually does happen with the
e1000 driver in Linux 3.3+. We used to assume that a TSO header would never
be part of a larger payload, however it appears as though it now can be.
Some bare metal build flows seem to build binaries that we aren't necessarily
expecting. Initialize everything to 0, so we don't make any assumptions about
what is or isn't in the binary.
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.
An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
The =operator for the DataBlock class was incorrectly interpreting the class
member m_alloc. This variable stands for whether the assigned memory for the
data block needs to be freed or not by the class itself. It seems that the
=operator interpreted the variable as whether the memory is assigned to the
data block. This wrong interpretation was causing values not to propagate
to RubySystem::m_mem_vec_ptr. This caused major issues with restoring from
checkpoints when using a protocol which verified that the cache data was
consistent with the backing store (i.e. MOESI-hammer).
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
This patch tightens up the semantics around port binding and checks
that the ports that are being bound are currently not connected, and
similarly connected before unbind is called.
The patch consequently also changes the order of the unbind and bind
for the switching of CPUs to ensure that the rules are adhered
to. Previously the ports would be "over-written" without any check.
There are no changes in behaviour due to this patch, and the only
place where the unbind functionality is used is in the CPU.
This patch updates how the checker CPU handles the ports such that the
regressions will once again run without causing a panic.
A minor amount of tidying up was also done as part of this patch.
This patch disables a warning for unused values which causes problems
when compiling the swig-generated sources using recent llvm-based
compilers like llvm-gcc and clang.