This patch fixes a bug in the O3 fetch stage that was introduced when
the cache line size was moved to the system. By mistake, the
initialisation and resetting of the fetch stage was merged and put in
the constructor. The resetting is now re-added where it should be.
Using arm-linux-gnueabi-gcc 4.7.3-1ubuntu1 on Ubuntu 13.04 to compiled
the m5 binary yields the error:
m5op_arm.S: Assembler messages:
m5op_arm.S:85: Error: selected processor does not support ARM mode `bxj lr'
For each of of the SIMPLE_OPs. Apparently, this compiler doesn't like the
interworking of these code types for the default arch. Adding -march=armv7-a
makes it compile. Another alternative that I found to work is replacing the
bxj lr instruction with mov pc, lr, but I don't know how that affects the
KVM stuff and if bxj is needed.
Some of the code in StateMachine.py file is added to all the controllers and
is independent of the controller definition. This code is being moved to the
AbstractController class which is the parent class of all controllers.
This patch adds checkpointing support to x86 tlb. It upgrades the
cpt_upgrader.py script so that previously created checkpoints can
be updated. It moves the checkpoint version to 6.
This patch contains three fixes to max tick options handling in Options.py and
Simulation.py:
1) Since the global simulator frequency isn't bound until m5.instantiate()
is called, the maxtick resolution needs to happen after this call, since
changes to the global frequency will cause m5.simulate() to misinterpret the
maxtick value. Shuffling this also requires tweaking the checkpoint directory
handling to signal the checkpoint restore tick back to run(). Fixing this
completely and correctly will require storing the simulation frequency into
checkpoints, which is beyond the scope of this patch.
2) The maxtick option in Options.py was defaulted to MaxTicks, so the old code
would always skip over the maxtime part of the conditionals at the beginning
of run(). Change the maxtick default to None, and set the maxtick local
variable in run() appropriately.
3) To clarify whether max ticks settings are relative or absolute, split the
maxtick option into separate options, for relative and absolute. Ensure that
these two options and the maxtime option are handled appropriately to set the
maxtick variable in Simulation.py.
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.
Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.
A follow-on patch updates the configuration scripts accordingly.
This patch changes how we determine the Python-related compiler and
linker flags. The previous approach used the internal LINKFORSHARED
which is not intended as part of the external API
(http://bugs.python.org/issue3588) and causes failures on recent OSX
installations.
Instead of using distutils we now rely on python-config and scons
ParseConfig. For backwards compatibility we also parse out the
includes and libs although this could safely be dropped. The drawback
of this patch is that Python 2.5 is now required, but hopefully that
is an acceptable compromise as any system with gcc 4.4 most likely
will have Python >= 2.5.
Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
PciDev and IntDev stuck out as the only device classes that
ended in 'Dev' rather than 'Device'. This patch takes care
of that inconsistency.
Note that you may need to delete pre-existing files matching
build/*/python/m5/internal/param_* as scons does not pick up
indirect dependencies on imported python modules when generating
params, and the PciDev -> PciDevice rename takes place in a
file (dev/Device.py) that gets imported quite a bit.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
It was confusing having an AmbaDev namespace along with an
AmbaDevice class. The namespace stuff is now moved in to
a new base AmbaDevice class, which is a mixin for classes
AmbaPioDevice (the former AmbaDevice) and AmbaDmaDevice
to provide the readId function as an inherited member function.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
A couple of devices that have single fixed memory mapped regions
were not derived from BasicPioDevice, when that's exactly
the functionality that BasicPioDevice provides. This patch
gets rid of a little bit of redundant code by making those
devices actually do so.
Also fixed the weird case of X86ISA::Interrupts, where
the class already did derive from BasicPioDevice but
didn't actually use all the features it could have.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This code seems not to be of any use now. There is no path in the simulator
that allows for reconfiguring the network. A better approach would be to
take a checkpoint and start the simulation from the checkpoint with the new
configuration.
The configuration scripts provided for ruby assume that the available
physical memory is equally distributed amongst the directory controllers.
But there is no check to ensure this assumption has been adhered to. This
patch adds the required check.
This patch removes the sparse histogram total from the CommMonitor
stats. It also bumps the stats after the unit fixes in the atomic
cache access. Lastly, it updates the stats to match the new port
ordering. All numbers are the same, and the only thing that changes is
which master corresponds to what port index.
This patch reorganizes the cache tags to allow more flexibility to
implement new replacement policies. The base tags class is now a
clocked object so that derived classes can use a clock if they need
one. Also having deriving from SimObject allows specialized Tag
classes to be swapped in/out in .py files.
The cache set is now templatized to allow it to contain customized
cache blocks with additional informaiton. This involved moving code to
the .hh file and removing cacheset.cc.
The statistics belonging to the cache tags are now including ".tags"
in their name. Hence, the stats need an update to reflect the change
in naming.
This patch removes the multiplication operator support for Clock
parameters as this functionality is now achieved by creating derived
clock domains.
Nate, this one is for you.
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).
The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.
The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
This patch extends the existing system builders to also include a
syscall-emulation builder. This builder is deployed in all
syscall-emulation regressions that do not involve Ruby,
i.e. o3-timing, simple-timing and simple-atomic, as well as the
multi-processor regressions o3-timing-mp, simple-timing-mp and
simple-atomic-mp (the latter are only used by SPARC at this point).
The values chosen for the cache sizes match those that were used in
the existing config scripts (despite being on the large
side). Similarly, a mem_class parameter is added to the builder base
class to enable simple-atomic to use SimpleMemory and o3-timing to use
the default DDR3 configuration.
Due to the different order the ports are connected, the bus stats get
shuffled around for the multi-processor regressions. A separate patch
bumps the port indices. Besides this, all behaviour is exactly the
same.
This patch adds a 'sys_clock' command-line option and use it to assign
clocks to the system during instantiation.
As part of this change, the default clock in the System class is
removed and whenever a system is instantiated a system clock value
must be set. A default value is provided for the command-line option.
The configs and tests are updated accordingly.
This patch adds a 'cpu_clock' command-line option and uses the value
to assign clocks to components running at the CPU speed (L1 and L2
including the L2-bus). The configuration scripts are updated
accordingly.
The 'clock' option is left unchanged in this patch as it is still used
by a number of components. In follow-on patches the latter will be
disambiguated further.
This patch removes the explicit setting of the clock period for
certain instances of CoherentBus, NonCoherentBus and IOCache where the
specified clock is same as the default value of the system clock. As
all the values used are the defaults, there are no performance
changes. There are similar cases where the toL2Bus is set to use the
parent CPU clock which is already the default behaviour.
The main motivation for these simplifications is to ease the
introduction of clock domains.
This patch prunes the 00.gzip regressions with the main motivation
being that it adds little (or no) coverage and requires a substantial
amount of run time.
A complete regression run, including compilation from a clean repo, is
almost 20% faster(!).
This patch does a bit of tidying up in the bridge code, adding const
where appropriate and also removing redundant checks and adding a few
new ones.
There are no changes to the behaviour of any regressions.
This patch fixes the CommMonitor local variable names, and also
introduces a variable to capture if it expects to see a response. The
latter check considers both needsResponse and memInhibitAsserted.
This patch changes the IEW drain check to include the FU pool as there
can be instructions that are "stored" in FU completion events and thus
not covered by the existing checks. With this patch, we simply include
a check to see if all the FUs are considered non-busy in the next
tick.
Without this patch, the pc-switcheroo-full regression fails after
minor changes to the cache timing (aligning to clock edge).
This patch fixes an outstanding issue in the cache timing calculations
where an atomic access returned a time in Cycles, but the port
forwarded it on as if it was in Ticks.
A separate patch will update the regression stats.
This patch changes the regression script such that it is possible to
identify the runs that fail with an exit code, and those that finish
with stats differences. The ones that truly fail are reported as
FAILED, and those that finish with changed stats as CHANGED.
The yellow colour has been reclaimed from the skipped regressions and
is now used for the changed ones. With no obvious good option left the
skipped ones are now in cyan.
While I was editing the script I also bumped any occurence of M5 to
gem5.
This patch fixes a bug in the granularity calculation. For example, if
the high bit is 6 (counting from 0) and we have one interleaving bit,
then the granularity is now 2 ** (6 - 1 + 1) = 64.
This patch changes the updards snoop packet to avoid allocating and
later deleting it. As the code executes in 0 time and the lifetime of
the packet does not extend beyond the block there is no reason to heap
allocate it.
This patch removes the printing of the SparseHist total in the
stats.txt output file. This has been removed as a sparse histogram has
no total, and therefore this was printing out the value of a
non-local, unrelated variable.
This patch adds separate actions for requests that missed in the local cache
and messages were sent out to get the requested line. These separate actions
are required for differentiating between the hit and miss latencies in the
statistics collected.
This patch adds separate actions for requests that missed in the local cache
and messages were sent out to get the requested line. These separate actions
are required for differentiating between the hit and miss latencies in the
statistics collected.
The patch started of with removing the global variables from the profiler for
profiling the miss latency of requests made to the cache. The corrresponding
histograms have been moved to the Sequencer. These are combined together when
the histograms are printed. Separate histograms are now maintained for
tracking latency of all requests together, of hits only and of misses only.
A particular set of histograms used to use the type GenericMachineType defined
in one of the protocol files. This patch removes this type. Now, everything
that relied on this type would use MachineType instead. To do this, SLICC has
been changed so that multiple machine types can be declared by a controller
in its preamble.