Commit graph

6446 commits

Author SHA1 Message Date
Andreas Sandberg
4dbf25adc3 sim: Fix undefined behavior in the pseudo-inst interface
The order between updating and using arg_num in
PseudoInst::pseudoInst() is currently undefined. This changeset
explicitly updates arg_num after it has been used to extract an
argument.

--HG--
extra : rebase_source : 67c46dc3333d16ce56687ee8aea41ce6c6d133bb
2013-09-18 17:08:35 +02:00
Andreas Hansson
9aa939891f mem: Fix scheduling bug in SimpleMemory
This patch ensures that a dequeue event is not scheduled if the memory
controller is waiting for a retry already. Without this check it is
possible for the controller to attempt sending something whilst
already having one packet that is in retry, thus causing the bus to
have an assertion failure.
2013-09-18 08:46:33 -04:00
Andreas Hansson
fe5212f932 swig: Fix issue with circular import in 2.0.9/2.0.10
This patch fixes an issue which prevented gem5 from running when built
using swig 2.0.9 and 2.0.10. The generated event.py tried to import
m5.internal which in turn relied on importing event. This patch seems
to fix the problem, and so far has not caused any other issues.
2013-09-18 08:46:31 -04:00
Andreas Sandberg
e93e12a62b x86: Expose the raw hash map of MSRs
This patch allows the KVM CPU module to initialize it's MSRs by
enumerating the MSRs in the gem5 x86 implementation.
2013-09-18 11:28:28 +02:00
Andreas Sandberg
4b840b8322 x86: Add support for checking the raw state of an interrupt
In order to support hardware virtualization, we need to be able to
check if there are any interrupts pending irregardless of the
rflags.intf value. This changeset adds the checkInterruptsRaw() method
to the x86 interrupt control. It returns true if there are pending
interrupts that can be delivered as soon as the CPU is ready for
interrupt delivery.
2013-09-18 11:28:27 +02:00
Andreas Sandberg
15733e9b33 x86: Expose the interrupt vector in faults
This patch allows a hardware virtualized CPU to discover which interrupt
to deliver to the guest.
2013-09-18 11:28:24 +02:00
Joel Hestness
cc155ffa0d ruby: Fix Topology throttle connections
The Topology source sets up input and output buffers for each of the external
nodes of a topology by indexing on Ruby's generated controller unique IDs.
These unique IDs are found by adding the MachineType_base_number to the version
number of each controller (see any generated *_Controller.cc - init() calls
getToNetQueue and getFromNetQueue using m_version + base). However, the
Topology object used the cntrl_id - which is required to be unique across all
controllers - to index the controllers list as they are being connected to
their input and output buffers. If the cntrl_ids did not match the Ruby unique
ID, the throttles end up connected to incorrectly indexed nodes in the network,
resulting in packets traversing incorrect network paths. This patch fixes the
Topology indexing scheme by using the Ruby unique ID to match that of the
SimpleNetwork buffer vectors.
2013-09-11 15:35:18 -05:00
Joel Hestness
a1f9081bab cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's
object, but it would only initialize numThreads LSQUnits as specified by the
user. This had the effect of leaving some LSQUnits uninitialized when the
number of threads was less than MaxThreads, and when adding statistics to the
LSQUnit that must be initialized, this caused the stats initialization check to
fail. By dynamically instantiating LSQUnits, they are all initialized and this
avoids uninitialized LSQUnits from floating around during runtime.
2013-09-11 15:34:50 -05:00
Joel Hestness
c1cf55c738 ruby: Statically allocate stats in SimpleNetwork, Switch, Throttle
The previous changeset (9863:9483739f83ee) used STL vector containers to
dynamically allocate stats in the Ruby SimpleNetwork, Switch and Throttle. For
gcc versions before at least 4.6.3, this causes the standard vector allocator
to call Stats copy constructors (a no-no, since stats should be allocated in
the body of each SimObject instance). Since the size of these stats arrays is
known at compile time (NOTE: after code generation), this patch changes their
allocation to be static rather than using an STL vector.
2013-09-11 15:33:27 -05:00
Nilay Vaish
e391fd151b stats: add operator= for DataWrapVec class
gcc/g++ 4.4.7 complained about the operator= being undefined.
This changeset adds the operator.
2013-09-09 18:52:23 -05:00
Nilay Vaish
90bfbd9793 ruby: network: convert to gem5 style stats 2013-09-06 16:21:35 -05:00
Nilay Vaish
24dc914d87 ruby: profiler: removes function resourceUsage() 2013-09-06 16:21:32 -05:00
Nilay Vaish
79b5ea9d19 ruby: remove undefined message size type
This message size type does not work well with one of the statistical
variables. It also seems unnecessary.
2013-09-06 16:21:30 -05:00
Nilay Vaish
0280997fbf ruby: network: removes reset functionality 2013-09-06 16:21:30 -05:00
Nilay Vaish
e7bd70e079 ruby: network: shorten variable names 2013-09-06 16:21:29 -05:00
Nilay Vaish
47d113696d stats: adds a Formula operator for division 2013-09-06 16:21:29 -05:00
Nilay Vaish
c0a8ad0a35 ruby: converts sparse memory stats to gem5 style 2013-09-06 16:21:28 -05:00
Andreas Hansson
53cf77cf18 sim: Fix clang warning for unused variable
This patch ensures the NULL ISA can build without causing issues with
an unused variable.
2013-09-05 13:53:54 -04:00
Andreas Hansson
3b90f52b61 util: Add ini string as tooltip info in dot output
This patch adds the config ini string as a tooltip that can be
displayed in most browsers rendering the resulting svg. Certain
characters are modified for HTML output.

Tested on chrome and firefox.
2013-09-04 13:23:00 -04:00
Andreas Hansson
fad36b35c6 util: Add colours to the dot output
This patch is adding a splash of colour to the dot output to make it
easier to distinguish objects of different types. As a bonus, the
pastel-colour palette also makes the output look like a something from
the 21st century.
2013-09-04 13:22:59 -04:00
Andreas Hansson
62cf785178 util: Add class name to dot graph and output to svg
This patch adds the class name to the label, creates some more space
by increasing the rank separation, and additionally outputs the graph
as an editable SVG in addition to the PDF.
2013-09-04 13:22:58 -04:00
Andreas Hansson
19a5b68db7 arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.

The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.

--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-09-04 13:22:57 -04:00
Andreas Hansson
ea40297018 cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or
out-of-order CPU as one of the available CPU models and therefore
should not be used in the BaseCPU. This patch moves the parameter to
the relevant CPU classes.
2013-09-04 13:22:56 -04:00
Andreas Hansson
bb1d2f3957 arch: Header clean up for NOISA resurrection
This patch is a first step to getting NOISA working again. A number of
redundant includes make life more difficult than it has to be and this
patch simply removes them. There are also some redundant forward
declarations removed.
2013-09-04 13:22:55 -04:00
Andreas Hansson
cead68a781 alpha: Move system virtProxy to Alpha only
This patch moves the system virtual port proxy to the Alpha system
only to make the resurrection of the NOISA slightly less
painful. Alpha is the only ISA that is actually using it.
2013-09-04 13:22:55 -04:00
Andreas Hansson
fdf6f6c4b6 scons: Enable build on OSX
This patch changes the SConscript to build gem5 with libc++ on OSX as
the conventional libstdc++ does not have the C++11 constructs that the
current code base makes use of (e.g. std::forward).

Since this was the last use of the transitional TR1, the unordered map
and set header can now be simplified as well.
2013-09-04 13:22:54 -04:00
Andreas Hansson
c6062a3981 cpu: Fix timing CPU isDrained comment formatting
This patch fixes up the comment formatting for isDrained in the timing
CPU.
2013-08-20 11:21:27 -04:00
Andreas Hansson
c57c452143 base: Fix VectorPrint initialisation
This patch changes how the initialisation of the VectorPrint struct is
done so that gcc 4.4 is happy again.
2013-08-20 11:21:26 -04:00
Andreas Hansson
b63631536d stats: Cumulative stats update
This patch updates the stats to reflect the: 1) addition of the
internal queue in SimpleMemory, 2) moving of the memory class outside
FSConfig, 3) fixing up of the 2D vector printing format, 4) specifying
burst size and interface width for the DRAM instead of relying on
cache-line size, 5) performing merging in the DRAM controller write
buffer, and 6) fixing how idle cycles are counted in the atomic and
timing CPU models.

The main reason for bundling them up is to minimise the changeset
size.
2013-08-19 03:52:36 -04:00
Lena Olson
646c4a23ca cpu: Accurately count idle cycles for simple cpu
Added a couple missing updates to the notIdleFraction stat. Without
these, it sometimes gives a (not) idle fraction that is greater than 1
or less than 0.
2013-08-19 03:52:35 -04:00
Andreas Hansson
c26911013c config: Command line support for multi-channel memory
This patch adds support for specifying multi-channel memory
configurations on the command line, e.g. 'se/fs.py
--mem-type=ddr3_1600_x64 --mem-channels=4'. To enable this, it
enhances the functionality of MemConfig and moves the existing
makeMultiChannel class method from SimpleDRAM to the support scripts.

The se/fs.py example scripts are updated to make use of the new
feature.
2013-08-19 03:52:34 -04:00
Andreas Hansson
49d88f08b0 mem: Change AbstractMemory defaults to match the common case
This patch changes the default parameter value of conf_table_reported
to match the common case. It also simplifies the regression and config
scripts to reflect this change.
2013-08-19 03:52:33 -04:00
Sascha Bischoff
e553844efc cpu: Fix TrafficGen trace playback
This patch addresses an issue with trace playback in the TrafficGen
where the trace was reset but the header was not read from the trace
when a captured trace was played back for a second time. This resulted
in parsing errors as the expected message was not found in the trace
file.

The header check is moved to an init funtion which is called by the
constructor and when the trace is reset. This ensures that the trace
header is read each time when the trace is replayed.

This patch also addresses a small formatting issue in a panic.
2013-08-19 03:52:32 -04:00
Andreas Hansson
6279eaf1f7 mem: Use STL deque in favour of list for DRAM queues
This patch changes the data structure used for the DRAM read, write
and response queues from an STL list to deque. This optimisation is
based on the observation that the size is small (and fixed), and that
the structures are frequently iterated over in a linear fashion.
2013-08-19 03:52:32 -04:00
Andreas Hansson
ac42db8134 mem: Perform write merging in the DRAM write queue
This patch implements basic write merging in the DRAM to avoid
redundant bursts. When a new access is added to the queue it is
compared against the existing entries, and if it is either
intersecting or immediately succeeding/preceeding an existing item it
is merged.

There is currently no attempt made at avoiding iterating over the
existing items in determining whether merging is possible or not.
2013-08-19 03:52:31 -04:00
Amin Farmahini
243f135e5f mem: Replacing bytesPerCacheLine with DRAM burstLength in SimpleDRAM
This patch gets rid of bytesPerCacheLine parameter and makes the DRAM
configuration separate from cache line size. Instead of
bytesPerCacheLine, we define a parameter for the DRAM called
burst_length. The burst_length parameter shows the length of a DRAM
device burst in bits. Also, lines_per_rowbuffer is replaced with
device_rowbuffer_size to improve code portablity.

This patch adds a burst length in beats for each memory type, an
interface width for each memory type, and the memory controller model
is extended to reason about "system" packets vs "dram" packets and
assemble the responses properly. It means that system packets larger
than a full burst are split into multiple dram packets.
2013-08-19 03:52:30 -04:00
Andreas Hansson
7a61f667f0 cpu: Fix timing CPU drain check
This patch modifies the SimpleTimingCPU drain check to also consider
the fetch event. Previously, there was an assumption that there is
never a fetch event scheduled if the CPU is not executing
microcode. However, when a context is activated, a fetch even is
scheduled, and microPC() is zero.
2013-08-19 03:52:30 -04:00
Andreas Hansson
f7d44590cb alpha: Check interrupts before quiesce
This patch adds a check to the quiesce operation to ensure that the
CPU does not suspend itself when there are unmasked interrupts
pending. Without this patch there are corner cases when the CPU gets
an interrupt before the quiesce is executed and then never wakes up
again.
2013-08-19 03:52:29 -04:00
Sascha Bischoff
6211c24a96 stats: Fix issue when printing 2D vectors
This patch addresses an issue with the text-based stats output which
resulted in Vector2D stats being printed without subnames in the event
that one of the dimensions was of length 1.

This patch also fixes the total printing for the 2D vector. Previously
totals were printed without explicitly stating that a total was being
printed. This has been rectified in this patch.
2013-08-19 03:52:29 -04:00
Akash Bagdia
e7e17f92db power: Add voltage domains to the clock domains
This patch adds the notion of voltage domains, and groups clock
domains that operate under the same voltage (i.e. power supply) into
domains. Each clock domain is required to be associated with a voltage
domain, and the latter requires the voltage to be explicitly set.

A voltage domain is an independently controllable voltage supply being
provided to section of the design. Thus, if you wish to perform
dynamic voltage scaling on a CPU, its clock domain should be
associated with a separate voltage domain.

The current implementation of the voltage domain does not take into
consideration cases where there are derived voltage domains running at
ratio of native voltage domains, as with the case where there can be
on-chip buck/boost (charge pumps) voltage regulation logic.

The regression and configuration scripts are updated with a generic
voltage domain for the system, and one for the CPUs.
2013-08-19 03:52:28 -04:00
Andreas Hansson
d5593f3c75 mem: Warn instead of panic for tXAW violation
Until the performance bug is fixed, avoid killing simulations.
2013-08-19 03:52:26 -04:00
Andreas Hansson
7bc3eaec7a mem: Allow disabling of tXAW through a 0 activation limit
This patch fixes an issue where an activation limit of 0 was not
allowed. With this patch, setting the limit to 0 simply disables the
tXAW constraint.
2013-08-19 03:52:26 -04:00
Andreas Hansson
2a675aecb9 mem: Add an internal packet queue in SimpleMemory
This patch adds a packet queue in SimpleMemory to avoid using the
packet queue in the port (and thus have no involvement in the flow
control). The port queue was bound to 100 packets, and as the
SimpleMemory is modelling both a controller and an actual RAM, it
potentially has a large number of packets in flight. There is
currently no limit on the number of packets in the memory controller,
but this could easily be added in a follow-on patch.

As a result of the added internal storage, the functional access and
draining is updated. Some minor cleaning up and renaming has also been
done.

The memtest regression changes as a result of this patch and the stats
will be updated.
2013-08-19 03:52:25 -04:00
Andreas Hansson
9b2effd9e2 cpu: Fix a bug in the O3 CPU introduced by the cache line patch
This patch fixes a bug in the O3 fetch stage that was introduced when
the cache line size was moved to the system. By mistake, the
initialisation and resetting of the fetch stage was merged and put in
the constructor. The resetting is now re-added where it should be.
2013-08-19 03:52:24 -04:00
Nilay Vaish
95381f8a99 ruby: slicc: remove double trigger, continueProcessing
These constructs are not in use and are not being maintained by any one.
In addition, it is not known if doubleTrigger works correctly with Ruby now.
2013-08-07 14:51:18 -05:00
Nilay Vaish
f1b17bf157 ruby: slicc: move some code to AbstractController
Some of the code in StateMachine.py file is added to all the controllers and
is independent of the controller definition. This code is being moved to the
AbstractController class which is the parent class of all controllers.
2013-08-07 14:51:18 -05:00
Nilay Vaish
e038741598 x86: add tlb checkpointing
This patch adds checkpointing support to x86 tlb. It upgrades the
cpt_upgrader.py script so that previously created checkpoints can
be updated. It moves the checkpoint version to 6.
2013-08-07 14:51:17 -05:00
Andreas Sandberg
b5bb2a25aa cpu: Remove unused getBranchPred() method from BaseCPU
Remove unused virtual getBranchPred() method from BaseCPU as it is not
implemented by any of the CPU models. It used to always return NULL.
2013-07-19 11:52:07 +02:00
Andreas Hansson
d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Xiangyu Dong
4e8ecd7c6f mem: Add cache class destructor to avoid memory leaks
Make valgrind a little bit happier
2013-07-18 08:29:47 -04:00
Andreas Hansson
204df3b928 sim: Make MaxTick in Python match the one in C++
This patch aligns the MaxTick in Python with the one in C++. Thus,
both reflect the maximum value that an unsigned 64-bit integer can
have.
2013-07-18 08:29:08 -04:00
Deyuan Guo
fb29dcf378 loader: Load weak symbols for function tracing 2013-07-15 18:08:57 -04:00
Umesh Bhaskar
5ba9e7afe2 debug : Fixes the issue wherein Debug symbols were not getting dumped into trace files for SE mode 2013-07-15 11:08:34 -04:00
Steve Reinhardt
1f43e244bd dev: make BasicPioDevice take size in constructor
Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:57:04 -05:00
Steve Reinhardt
502ad1e675 dev: consistently end device classes in 'Device'
PciDev and IntDev stuck out as the only device classes that
ended in 'Dev' rather than 'Device'.  This patch takes care
of that inconsistency.

Note that you may need to delete pre-existing files matching
build/*/python/m5/internal/param_* as scons does not pick up
indirect dependencies on imported python modules when generating
params, and the PciDev -> PciDevice rename takes place in a
file (dev/Device.py) that gets imported quite a bit.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:56:50 -05:00
Steve Reinhardt
2737650a69 dev/arm: get rid of AmbaDev namespace
It was confusing having an AmbaDev namespace along with an
AmbaDevice class.  The namespace stuff is now moved in to
a new base AmbaDevice class, which is a mixin for classes
AmbaPioDevice (the former AmbaDevice) and AmbaDmaDevice
to provide the readId function as an inherited member function.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:56:39 -05:00
Steve Reinhardt
b0b1c0205c devices: make more classes derive from BasicPioDevice
A couple of devices that have single fixed memory mapped regions
were not derived from BasicPioDevice, when that's exactly
the functionality that BasicPioDevice provides.  This patch
gets rid of a little bit of redundant code by making those
devices actually do so.

Also fixed the weird case of X86ISA::Interrupts, where
the class already did derive from BasicPioDevice but
didn't actually use all the features it could have.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 21:56:24 -05:00
Brad Beckmann
8e54c93222 ruby: removed the very old double trigger hack
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-07-11 13:56:05 -05:00
Nilay Vaish
1be0098c0b ruby: append transition comment only when in opt/debug 2013-06-28 21:42:27 -05:00
Nilay Vaish
b3980cdb9a ruby: network: remove reconfiguration code
This code seems not to be of any use now. There is no path in the simulator
that allows for reconfiguring the network. A better approach would be to
take a checkpoint and start the simulation from the checkpoint with the new
configuration.
2013-06-28 21:36:37 -05:00
Prakash Ramrakhyani
ac515d7a9b mem: Reorganize cache tags and make them a SimObject
This patch reorganizes the cache tags to allow more flexibility to
implement new replacement policies. The base tags class is now a
clocked object so that derived classes can use a clock if they need
one. Also having deriving from SimObject allows specialized Tag
classes to be swapped in/out in .py files.

The cache set is now templatized to allow it to contain customized
cache blocks with additional informaiton. This involved moving code to
the .hh file and removing cacheset.cc.

The statistics belonging to the cache tags are now including ".tags"
in their name. Hence, the stats need an update to reflect the change
in naming.
2013-06-27 05:49:50 -04:00
Andreas Hansson
0d68d36b9d mem: Remove the cache builder
This patch removes the redundant cache builder class.
2013-06-27 05:49:50 -04:00
Andreas Hansson
a0e551869c config: Remove Clock parameter multiplication
This patch removes the multiplication operator support for Clock
parameters as this functionality is now achieved by creating derived
clock domains.

Nate, this one is for you.
2013-06-27 05:49:50 -04:00
Akash Bagdia
7d7ab73862 sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the
ClockedObjects. As such, all clock information is moved to the clock
domain, and the ClockedObjects are grouped into domains.

The clock domains are either source domains, with a specific clock
period, or derived domains that have a parent domain and a divider
(potentially chained). For piece of logic that runs at a derived clock
(a ratio of the clock its parent is running at) the necessary derived
clock domain is created from its corresponding parent clock
domain. For now, the derived clock domain only supports a divider,
thus ensuring a lower speed compared to its parent. Multiplier
functionality implies a PLL logic that has not been modelled yet
(create a separate clock instead).

The clock domains should be used as a mechanism to provide a
controllable clock source that affects clock for every clocked object
lying beneath it. The clock of the domain can (in a future patch) be
controlled by a handler responsible for dynamic frequency scaling of
the respective clock domains.

All the config scripts have been retro-fitted with clock domains. For
the System a default SrcClockDomain is created. For CPUs that run at a
different speed than the system, there is a seperate clock domain
created. This domain incorporates the CPU and the associated
caches. As before, Ruby runs under its own clock domain.

The clock period of all domains are pre-computed, such that no virtual
functions or multiplications are needed when calling
clockPeriod. Instead, the clock period is pre-computed when any
changes occur. For this to be possible, each clock domain tracks its
children.
2013-06-27 05:49:49 -04:00
Akash Bagdia
076d04a653 config: Add a system clock command-line option
This patch adds a 'sys_clock' command-line option and use it to assign
clocks to the system during instantiation.

As part of this change, the default clock in the System class is
removed and whenever a system is instantiated a system clock value
must be set. A default value is provided for the command-line option.

The configs and tests are updated accordingly.
2013-06-27 05:49:49 -04:00
Akash Bagdia
7eccb1b779 config: Remove redundant explicit setting of default clocks
This patch removes the explicit setting of the clock period for
certain instances of CoherentBus, NonCoherentBus and IOCache where the
specified clock is same as the default value of the system clock. As
all the values used are the defaults, there are no performance
changes. There are similar cases where the toL2Bus is set to use the
parent CPU clock which is already the default behaviour.

The main motivation for these simplifications is to ease the
introduction of clock domains.
2013-06-27 05:49:49 -04:00
Andreas Hansson
3b92748937 mem: Tidy up the bridge with const and additional checks
This patch does a bit of tidying up in the bridge code, adding const
where appropriate and also removing redundant checks and adding a few
new ones.

There are no changes to the behaviour of any regressions.
2013-06-27 05:49:49 -04:00
Andreas Hansson
f25ea3fd56 mem: Fix CommMonitor style and response check
This patch fixes the CommMonitor local variable names, and also
introduces a variable to capture if it expects to see a response. The
latter check considers both needsResponse and memInhibitAsserted.
2013-06-27 05:49:49 -04:00
Andreas Hansson
33a8d777ad mem: Align cache timing to clock edges
This patch changes the cache timing calculations such that the results
are aligned to clock edges.

Plenty stats change as a results of this patch.
2013-06-27 05:49:49 -04:00
Andreas Hansson
10650fc525 cpu: Consider instructions waiting for FU completion in draining
This patch changes the IEW drain check to include the FU pool as there
can be instructions that are "stored" in FU completion events and thus
not covered by the existing checks. With this patch, we simply include
a check to see if all the FUs are considered non-busy in the next
tick.

Without this patch, the pc-switcheroo-full regression fails after
minor changes to the cache timing (aligning to clock edge).
2013-06-27 05:49:49 -04:00
Andreas Hansson
368f50a0a1 mem: Cycles converted to Ticks in atomic cache accesses
This patch fixes an outstanding issue in the cache timing calculations
where an atomic access returned a time in Cycles, but the port
forwarded it on as if it was in Ticks.

A separate patch will update the regression stats.
2013-06-27 05:49:49 -04:00
Andreas Hansson
997a6a4add base: Fix address range granularity calculation
This patch fixes a bug in the granularity calculation. For example, if
the high bit is 6 (counting from 0) and we have one interleaving bit,
then the granularity is now 2 ** (6 - 1 + 1) = 64.
2013-06-27 05:49:49 -04:00
Andreas Hansson
f330b3c28d mem: Remove a redundant heap allocation for a snoop packet
This patch changes the updards snoop packet to avoid allocating and
later deleting it. As the code executes in 0 time and the lifetime of
the packet does not extend beyond the block there is no reason to heap
allocate it.
2013-06-27 05:49:49 -04:00
Andreas Hansson
9a1169f3d7 mem: Remove CoherentBus snoop port unused private member
This patch removes an unused member to avoid getting compiler warnings
when using clang.
2013-06-27 05:49:49 -04:00
Sascha Bischoff
3d19bccb93 stats: Remove printing of SparseHist total
This patch removes the printing of the SparseHist total in the
stats.txt output file. This has been removed as a sparse histogram has
no total, and therefore this was printing out the value of a
non-local, unrelated variable.
2013-06-27 05:49:49 -04:00
Nilay Vaish
d8ed1d1a2c ruby: moesi cmp directory: separate actions for external hits
This patch adds separate actions for requests that missed in the local cache
and messages were sent out to get the requested line. These separate actions
are required for differentiating between the hit and miss latencies in the
statistics collected.
2013-06-25 00:32:04 -05:00
Nilay Vaish
128ab50c47 ruby: mesi cmp directory: separate actions for external hits
This patch adds separate actions for requests that missed in the local cache
and messages were sent out to get the requested line. These separate actions
are required for differentiating between the hit and miss latencies in the
statistics collected.
2013-06-25 00:32:03 -05:00
Nilay Vaish
beb6e57c6f ruby: profiler: lots of inter-related changes
The patch started of with removing the global variables from the profiler for
profiling the miss latency of requests made to the cache. The corrresponding
histograms have been moved to the Sequencer. These are combined together when
the histograms are printed. Separate histograms are now maintained for
tracking latency of all requests together, of hits only and of misses only.

A particular set of histograms used to use the type GenericMachineType defined
in one of the protocol files. This patch removes this type. Now, everything
that relied on this type would use MachineType instead. To do this, SLICC has
been changed so that multiple machine types can be declared by a controller
in its preamble.
2013-06-25 00:32:03 -05:00
Nilay Vaish
b3db882dee ruby: remove the three files related to profiling
This patch removes the following three files: RubySlicc_Profiler.sm,
RubySlicc_Profiler_interface.cc and RubySlicc_Profiler_interface.hh.
Only one function prototyped in the file RubySlicc_Profiler.sm. Rest of the
code appearing in any of these files is not in use. Therefore, these files
are being removed.

That one single function, profileMsgDelay(), is being moved to the protocol
files where it is in use. If we need any of these deleted functions, I think
the right way to make them visible is to have the AbstractController class in
a .sm and let the controller state machine inherit from this class. The
AbstractController class can then have the prototypes of these profiling
functions in its definition.
2013-06-24 08:59:08 -05:00
Joel Hestness ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
71c6c43110 ruby: MessageBuffer: Remove unused m_size variable
The m_size variable attempted to track m_prio_heap.size(), but it did so
incorrectly due to the functions reanalyzeMessages and reanalyzeAllMessages().
Since this variable is intended to track m_prio_heap.size(), we can simply
replace instances where m_size is referenced with m_prio_heap.size(), which
has the added bonus of removing the need for m_size.

Note: This patch also removes an extraneous DPRINTF format string designator
from reanalyzeAllMessages()

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-24 06:57:06 -05:00
Lena Olson
94280c7e51 ruby: fix typo in MOESI_CMP_token protocol 2013-06-20 16:20:38 -05:00
Lena Olson
ed234ddec6 ruby: Fix prefetching for MESI_CMP_Directory
Transitions from present on PF_Ifetch were missing, causing a crash when
prefetching is enabled.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18 16:59:22 -05:00
Lena Olson
eb1279ff49 ruby: fix slicc compiler to complain about duplicate symbols
Previously, .sm files were allowed to use the same name for a type and a
variable. This is unnecessarily confusing and has some bad side effects, like
not being able to declare later variables in the same scope with the same type.
This causes the compiler to complain and die on things like Address Address.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18 16:58:52 -05:00
Lena Olson
7c39d5df7e ruby: restrict Address to being a type and not a variable name
Change all occurrances of Address as a variable name to instead use Addr.
Address is an allowed name in slicc even when Address is also being used as a
type, leading to declarations of "Address Address". While this works, it
prevents adding another field of type Address because the compiler then thinks
Address is a variable name, not type.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-06-18 16:58:33 -05:00
Andreas Sandberg
d06064c386 x86: Add support for maintaining the x87 tag word
The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.

The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.
2013-06-18 16:36:08 +02:00
Andreas Sandberg
a8e8c4f433 x86: Fix loading of floating point constants
This changeset actually fixes two issues:

 * The lfpimm instruction didn't work correctly when applied to a
   floating point constant (it did work for integers containing the
   bit string representation of a constant) since it used
   reinterpret_cast to convert a double to a uint64_t. This caused a
   compilation error, at least, in gcc 4.6.3.

 * The instructions loading floating point constants in the x87
   processor didn't work correctly since they just stored a truncated
   integer instead of a double in the floating point register. This
   changeset fixes the old microcode by using lfpimm instruction
   instead of the limm instructions.
2013-06-18 16:30:06 +02:00
Andreas Sandberg
c9c02efb99 x86: Initialize the MXCSR register 2013-06-18 16:28:36 +02:00
Andreas Sandberg
688fc7f71f x86: Make the boot state VMX compliant
This patch allows the default x86 state to be used when by CPUs that
use hardware virtualization.
2013-06-18 16:27:28 +02:00
Andreas Sandberg
5d584934ad x86: Make fprem like the fprem on a real x87
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
2013-06-18 16:10:42 +02:00
Andreas Sandberg
6151c0f7f4 kvm: Use the address finalization code in the TLB
Reuse the address finalization code in the TLB instead of replicating
it when handling MMIO. This patch also adds support for injecting
memory mapped IPR requests into the memory system.
2013-06-18 16:10:22 +02:00
Andreas Sandberg
46a8cbbb7f x86: Add helper functions to access rflags
The rflags register is spread across several different registers. Most
of the flags are stored in MISCREG_RFLAGS, but some are stored in
microcode registers. When accessing RFLAGS, we need to reconstruct it
from these registers. This changeset adds two functions,
X86ISA::getRFlags() and X86ISA::setRFlags(), that take care of this
magic.
2013-06-18 16:10:22 +02:00
Andreas Sandberg
de89e133d8 x86: Fix the flag handling code in FABS and FCHS
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
2013-06-18 16:10:21 +02:00
Andreas Sandberg
64270b19c3 kvm: Add more VM stats
This changeset adds the following stats to KVM:
 * numVMHalfEntries: Number of entries into KVM to finalize pending
   IO operations without executing guest instructions. These typically
   happen as a result of a drain where the guest must finalize some
   operations before the guest state is consistent.
 * numExitSignal: Number of VM exits that have been triggered by a
   signal. These usually happen as a result of the timer that limits
   the time spent in KVM.
2013-06-11 09:43:05 +02:00
Andreas Sandberg
c97a99110b kvm: Separate host frequency from simulated CPU frequency
We used to use the KVM CPU's clock to specify the host frequency. This
was not ideal for several reasons. One of them being that the clock
parameter of a CPU determines the frequency of some of the components
connected to the CPU. This changeset adds a separate hostFreq
parameter that should be used to specify the host frequency until we
add code to autodetect it. The hostFactor should still be used to
specify the conversion factor between the host performance and that of
the simulated system.
2013-06-11 09:24:55 +02:00
Andreas Sandberg
4f002930bc kvm: Don't handle IO and execute in the same tick
We currently execute instructions in the guest and then handle any IO
request right after we break out of the virtualized environment. This
has the effect of executing IO requests in the exact same tick as the
first instruction in the sequence that was just run. There seem to be
cases where this simplification upsets some timing-sensitive devices.

This changeset splits execute and IO (and other services) across
multiple ticks. This is implemented by adding a separate
RunningService state to the CPU state machine. When a VM requires
service, it enters into this state and pending IO is then serviced in
the future instead of immediately. The delay between getting the
request and servicing it depends on the number of cycles executed in
the guest, which allows other components to catch up with the CPU.
2013-06-11 09:24:51 +02:00
Andreas Sandberg
df059f45a0 kvm: Maintain a local instruction counter and update totalNumInsts
Update the system's totalNumInst counter when exiting from KVM and
maintain an internal absolute instruction count instead of relying on
the one from perf.
2013-06-11 09:24:40 +02:00
Andreas Sandberg
0b4a8b4086 x86: Fix bug when copying TSC on CPU handover
The TSC value stored in MISCREG_TSC is actually just an offset from
the current CPU cycle to the actual TSC value. Writes with
side-effects to the TSC subtract the current cycle count before
storing the new value, while reads add the current cycle count. When
switching CPUs, the current value is copied without side-effects. This
works as long as the source and the destination CPUs have the same
clock frequencies. The TSC will jump, sometimes backwards, if they
have different clock frequencies. Most OSes assume the TSC to be
monotonic and break when this happens.

This changeset makes sure that the TSC is copied with side-effects to
ensure that the offset is updated to match the new CPU.
2013-06-11 09:24:38 +02:00
Andreas Sandberg
2442aae54f sim: Revert [34e3295b0e39] (sim: Fix early termination in mult...)
HG changset 34e3295b0e39 introduced a check in the main simulation
loop that discards exit events that happen at the same tick as another
exit event. This was supposed to fix a problem where a simulation
script got confused by multiple exit events. This obviously breaks the
simulator since it can hide important simulation events, such as a
simulation failure, that happen at the same time as a non-fatal
simulation event.
2013-06-11 09:24:10 +02:00
Andreas Sandberg
0793d0727b cpu: Add support for scheduling multiple inst/load stop events
Currently, the only way to get a CPU to stop after a fixed number of
instructions/loads is to set a property on the CPU that causes a
SimLoopExitEvent to be scheduled when the CPU is constructed. This is
clearly not ideal in cases where the simulation script wants the CPU
to stop at multiple instruction counts (e.g., SimPoint generation).

This changeset adds the methods scheduleInstStop() and
scheduleLoadStop() to the BaseCPU. These methods are exported to
Python and are designed to be used from the simulation script. By
using these methods instead of the old properties, a simulation script
can schedule a stop at any point during simulation or schedule
multiple stops. The number of instructions specified when scheduling a
stop is relative to the current point of execution.
2013-06-11 09:18:25 +02:00
Nilay Vaish
d32ee94231 ruby: remove several unused variables in Profiler
This patch removes per processor cycle count, histogram for filter stats,
histogram for multicasts, histogram for prefetch wait, some function
prototypes that do not have definitions.
2013-06-09 07:30:00 -05:00
Nilay Vaish
27b321f2f7 ruby: remove periodic event from Profiler
The Profiler class does not need an event for dumping statistics
periodically. This is because there is a method for dumping statistics
for all the sim objects periodically. Since Ruby is a sim object, its
statistics are also included.
2013-06-09 07:29:59 -05:00
Nilay Vaish
f59a7af50a ruby: stats: use gem5's stats for cache and memory controllers
This moves event and transition count statistics for cache controllers to
gem5's statistics. It does the same for the statistics associated with the
memory controller in ruby.

All the cache/directory/dma controllers individually collect the event and
transition counts. A callback function, collateStats(), has been added that
is invoked on the controller version 0 of each controller class. This
function adds all the individual controller statistics to a vector
variables. All the code for registering the statistical variables and
collating them is generated by SLICC. The patch removes the files
*_Profiler.{cc,hh} and *_ProfileDumper.{cc,hh} which were earlier used for
collecting and dumping statistics respectively.
2013-06-09 07:29:59 -05:00
Nilay Vaish
38736ce7c3 ruby: remove undefined functions in Address class 2013-06-09 07:29:58 -05:00
Nilay Vaish
f2b5b4c8cc stats: allow printing vectors on a single line
This patch adds a new flag to specify if the data values for a given vector
should be printed in one line in the stats.txt file. The default behavior
will be to print the data in multiple lines. It makes changes to print
functions to enforce this behavior.
2013-06-09 07:29:57 -05:00
Andreas Sandberg
a3685b0181 dev: Clarify why updates are delayed when the MC14818 is activated 2013-06-04 10:08:21 +02:00
Andreas Sandberg
7846f59d0d arch: Create a method to finalize physical addresses
in the TLB

Some architectures (currently only x86) require some fixing-up of
physical addresses after a normal address translation. This is usually
to remap devices such as the APIC, but could be used for other memory
mapped devices as well. When running the CPU in a using hardware
virtualization, we still need to do these address fix-ups before
inserting the request into the memory system. This patch moves this
patch allows that code to be used by such CPUs without doing full
address translations.
2013-06-03 13:55:41 +02:00
Andreas Sandberg
63dae28703 base: Make the Python module loader PEP302 compliant
The custom Python loader didn't comply with PEP302 for two reasons:

 * Previously, we would overwrite old modules on name
   conflicts. PEP302 explicitly states that: "If there is an existing
   module object named 'fullname' in sys.modules, the loader must use
   that existing module".

 * The "__package__" attribute wasn't set. PEP302: "The __package__
   attribute must be set."

This changeset addresses both of these issues.
2013-06-03 13:51:03 +02:00
Andreas Sandberg
c2ec232920 kvm: Allow architectures to override the cycle accounting mechanism
Some architectures have special registers in the guest that can be
used to do cycle accounting. This is generally preferrable since the
prevents the guest from seeing a non-monotonic clock. This changeset
adds a virtual method, getHostCycles(), that the architecture-specific
code can override to implement this functionallity. The default
implementation uses the hwCycles counter.
2013-06-03 13:39:11 +02:00
Andreas Sandberg
15f81b6ed9 kvm: Add handling of EAGAIN when creating timers
timer_create can apparently return -1 and set errno to EAGAIN if the
kernel suffered a temporary failure when allocating a timer. This
happens from time to time, so we need to handle it.
2013-06-03 13:38:59 +02:00
Andreas Sandberg
743f80712e sim: Add debug output when executing pseudo-instructions 2013-06-03 13:21:21 +02:00
Andreas Sandberg
2b65fce5d9 kvm: Add a call to thread->startup() in startup()
It is now required to initialize the thread context by calling
startup() on it. Failing to do so currently causes decoder in
x86-based CPUs to get very confused when restoring from checkpoints.
2013-06-03 12:36:56 +02:00
Andreas Sandberg
5e60f87aa3 dev: Add support for disabling ticking and the divider in MC146818
Some Linux versions disable updates (regB.set = 1) to prevent the chip
from updating its internal state while the OS is updating it. Support
for this was already there, this patch merely disables the check in
writeReg that prevented it from being enabled. The patch also includes
support for disabling the divider, which is used to control when clock
updates should start after setting the internal RTC state.

These changes are required to boot most vanilla Linux distributions
that update the RTC settings at boot.
2013-06-03 12:28:52 +02:00
Andreas Sandberg
14b8a17f28 dev: Clean up MC146818 register (A & B) handling
Rewrite reg A & B handling to use the bitunion stuff instead of bit
masking. Add better error messages when the kernel tries to enable
unsupported stuff.
2013-06-03 12:28:41 +02:00
Andreas Hansson
3bc4ecdcb4 mem: More descriptive DRAM config names
This patch changes the class names of the variuos DRAM configurations
to better reflect what memory they are based on. The speed and
interface width is now part of the name, and also the alias that is
used to select them on the command line.

Some minor changes are done to the actual parameters, to better
reflect the named configurations. As a result of these changes the
regressions change slightly and the stats will be bumped in a separate
patch.
2013-05-30 12:54:14 -04:00
Andreas Hansson
83d99aebb1 mem: Add bytes per activate DRAM controller stat
This patch adds a histogram to track how many bytes are accessed in an
open row before it is closed. This metric is useful in characterising
a workload and the efficiency of the DRAM scheduler. For example, a
DDR3-1600 device requires 44 cycles (tRC) before it can activate
another row in the same bank. For a x32 interface (8 bytes per cycle)
that means 8 x 44 = 352 bytes must be transferred to hide the
preparation time.
2013-05-30 12:54:13 -04:00
Andreas Hansson
d82bffd297 mem: Add static latency to the DRAM controller
This patch adds a frontend and backend static latency to the DRAM
controller by delaying the responses. Two parameters expressing the
frontend and backend contributions in absolute time are added to the
controller, and the appropriate latency is added to the responses when
adding them to the (infinite) queued port for sending.

For writes and reads that hit in the write buffer, only the frontend
latency is added. For reads that are serviced by the DRAM, the static
latency is the sum of the pipeline latencies of the entire frontend,
backend and PHY. The default values are chosen based on having roughly
10 pipeline stages in total at 500 MHz.

In the future, it would be sensible to make the controller use its
clock and convert these latencies (and a few of the DRAM timings) to
cycles.
2013-05-30 12:54:12 -04:00
Andreas Hansson
7da851d1a8 mem: Spring cleaning of MSHR and MSHRQueue
This patch does some minor tidying up of the MSHR and MSHRQueue. The
clean up started as part of some ad-hoc tracing and debugging, but
seems worthwhile enough to go in as a separate patch.

The highlights of the changes are reduced scoping (private) members
where possible, avoiding redundant new/delete, and constructor
initialisation to please static code analyzers.
2013-05-30 12:54:11 -04:00
Andreas Hansson
42191522cc mem: Fix MSHR print format
This patch fixes an incorrect print format string by adding an
additional string element.
2013-05-30 12:54:09 -04:00
Andreas Hansson
4d7d8393ed cpu: Prune the stale TraceCPU
This patch prunes the TraceCPU as the code is stale and the
functionality that it provided can now be achieved with the TrafficGen
using its trace playback mode.

The TraceCPU was able to play back pre-recorded memory traces of a few
different formats, and to achieve this level of flexibility with the
TrafficGen, use the util/encode_packet_trace (with suitable
modifications) to create a protobuf trace off-line.
2013-05-30 12:54:09 -04:00
Sascha Bischoff
6f4be9bd4c cpu: Check that minimum TrafficGen period is less than max period
Add a check which ensures that the minumum period for the LINEAR and
RANDOM traffic generator states is less than or equal to the maximum
period. If the minimum period is greater than the maximum period a
fatal is triggered.
2013-05-30 12:54:08 -04:00
Sascha Bischoff
04ccc79134 cpu: Fix bug when reading in TrafficGen state transitions
This patch fixes a bug with the traffic generator which occured when
reading in the state transitions from the configuration
file. Previously, the size of the vector which stored the transitions
was used to get the size of the transitions matrix, rather than using
the number of states. Therefore, if there were more transitions than
states, i.e. some transitions has a probability of less than 1, then
the traffic generator would fatal when trying to check the
transitions.

This issue has been addressed by using the number of input states,
rather then the number of transitions.
2013-05-30 12:54:07 -04:00
Andreas Hansson
fc09bc8678 cpu: Add request elasticity to the traffic generator
This patch adds an optional request elasticity to the traffic
generator, effectievly compensating for it in the case of the linear
and random generators, and adding it in the case of the trace
generator. The accounting is left with the top-level traffic
generator, and the individual generators do the necessary math as part
of determining the next packet tick.

Note that in the linear and random generators we have to compensate
for the blocked time to not be elastic, i.e. without this patch the
aforementioned generators will slow down in the case of back-pressure.
2013-05-30 12:54:06 -04:00
Andreas Hansson
4931414ca7 cpu: Block traffic generator when requests have to retry
This patch changes the queued port for a conventional master port and
stalls the traffic generator when requests are not immediately
accepted. This is a first step to allowing elasticity in the injection
of requests.

The patch also adds stats for the sent packets and retries, and
slightly changes how the nextPacketTick and getNextPacket
interact. The advancing of the trace is now moved to getNextPacket and
nextPacketTick is only responsible for answering the question when the
next packet should be sent.
2013-05-30 12:54:05 -04:00
Andreas Hansson
c9c35da934 cpu: Move traffic generator sending out of generator states
This patch moves the responsibility for sending packets out of the
generator states and leaves it with the top-level traffic
generator. The main aim of this patch is to enable a transition to
non-queued ports, i.e. with send/retry flow control, and to do so it
is much more convenient to not wrap the port interactions and instead
leave it all local to the traffic generator.

The generator states now only govern when they are ready to send
something new, and the generation of the packets to send. They thus
have no knowledge of the port that is used.
2013-05-30 12:54:04 -04:00
Andreas Hansson
ba11a02cf2 cpu: Fold together the StateGraph and the TrafficGen
This patch simplifies the object hierarchy of the traffic generator by
getting rid of the StateGraph class and folding this functionality
into the traffic generator itself.

The main goal of this patch is to facilitate upcoming changes by
reducing the number of affected layers.
2013-05-30 12:54:03 -04:00
Andreas Hansson
7e13c4d046 mem: Make returning snoop responses occupy response layer
This patch introduces a mirrored internal snoop port to facilitate
easy addition of flow control for the snoop responses that are turned
into normal responses on their return. To perform this, the slave
ports of the coherent bus are wrapped in internal master ports that
are passed as the source ports to the response layer in question.

As a result of this patch, there is more contention for the response
resources, and as such system performance will decrease slightly.

A consequence of the mirrored internal port is that the port the bus
tells to retry (the internal one) and the port actually retrying (the
mirrored) one are not the same. Thus, the existing check in tryTiming
is not longer correct. In fact, the test is redundant as the layer is
only in the retry state while calling sendRetry on the waiting port,
and if the latter does not immediately call the bus then the retry
state is left. Consequently the check is removed.
2013-05-30 12:54:02 -04:00
Andreas Hansson
2308f812ef mem: Make the buses multi layered
This patch makes the buses multi layered, and effectively creates a
crossbar structure with distributed contention ports at the
destination ports. Before this patch, a bus could have a single
request, response and snoop response in flight at any time, and with
these changes there can be as many requests as connected slaves (bus
master ports), and as many responses as connected masters (bus slave
ports).

Together with address interleaving, this patch enables us to create
high-throughput memory interconnects, e.g. 50+ GByte/s.
2013-05-30 12:54:01 -04:00
Andreas Hansson
e82996d9da mem: Separate the two snoop response cases in the bus
This patch makes the flow control and state updates of the coherent
bus more clear by separating the two cases, i.e. forward as a snoop
response, or turn it into a normal response.

With this change it is also more clear what resources are being
occupied, and that we effectively bypass the busy check for the second
case. As a result of the change in resource usage some stats change.
2013-05-30 12:54:00 -04:00
Andreas Hansson
cb62d39835 mem: Tidy up a few variables in the bus
This patch does some minor housekeeping on the bus code, removing
redundant code, and moving the extraction of the destination id to the
top of the functions using it.
2013-05-30 12:53:59 -04:00
Uri Wiener
91f7b065a9 mem: Add basic stats to the buses
This patch adds a basic set of stats which are hard to impossible to
implement using only communication monitors, and are needed for
insight such as bus utilization, transactions through the bus etc.

Stats added include throughput and transaction distribution, and also
a two-dimensional vector capturing how many packets and how much data
is exchanged between the masters and slaves connected to the bus.
2013-05-30 12:53:58 -04:00
Andreas Hansson
e1e73c5f39 mem: Use unordered set in bus request tracking
This patch changes the set used to track outstanding requests to an
unordered set (part of C++11 STL). There is no need to maintain the
order, and hopefully there might even be a small performance benefit.
2013-05-30 12:53:57 -04:00
Andreas Hansson
82397921a5 mem: Check for waiting state in bus draining
This patch fixes a bug in the bus where the bus transitions from busy
to idle and still has a port that is waiting for a retry from a peer.
2013-05-30 12:53:57 -04:00
Andreas Hansson
bf6291460d mem: Add a LPDDR3-1600 configuration
This patch adds a typical (leaning towards fast) LPDDR3 configuration
based on publically available data. As expected, it looks very similar
to the LPDDR2-S4 configuration, only with a slightly lower burst time.
2013-05-30 12:53:56 -04:00
Andreas Hansson
ce1ad84abd mem: Adapt the LPDDR2 to match a single x32 channel
This patch adapts the existing LPDDR2 configuration to make use of the
multi-channel functionality. Thus, to get a x64 interface two
controllers should be instantiated using the makeMultiChannel method.

The page size and ranks are also adapted to better suit with a typical
LPDDR2 part.
2013-05-30 12:53:55 -04:00
Andreas Hansson
88aa7755f4 mem: Avoid explicitly zeroing the memory backing store
This patch removes the explicit memset as it is redundant and causes
the simulator to touch the entire space, forcing the host system to
allocate the pages.

Anonymous pages are mapped on the first access, and the page-fault
handler is responsible for zeroing them. Thus, the pages are still
zeroed, but we avoid touching the entire allocated space which enables
us to use much larger memory sizes as long as not all the memory is
actually used.
2013-05-30 12:53:54 -04:00
Andreas Hansson
4c7a283e55 base: Avoid size limitation on protobuf coded streams
This patch changes how the streams are created to avoid the size
limitation on the coded streams. As we only read/write a single
message at a time, there is never any message larger than a few
bytes. However, the coded stream eventually complains that its
internal counter reaches 64+ MByte if the total file size exceeds this
value.

Based on suggestions in the protobuf discussion forums, the coded
stream is now created for every message that is read/written. The
result is that the internal byte count never goes about tens of bytes,
and we can read/write any size file that the underlying file I/O can
handle.
2013-05-30 12:53:53 -04:00
Andreas Hansson
d1a43d83da cpu: Make hash struct instead of class to please clang
This patch changes the type of the hash function for BasicBlockRanges
to match the original definition of the templatized type. Without
this, clang raises a warning and combined with the "-Werror" flag this
causes compilation to fail.
2013-05-30 12:53:52 -04:00
Malek Musleh
64af621cc6 ruby: slicc: fix error msg in TypeFieldMemberAST.py 2013-05-21 11:57:14 -05:00
Gedare Bloom
22b60c57e6 x86: Squash outstanding walks when instructions are squashed.
This is the x86 version of the ARM changeset baa17ba80e06. In case an
instruction has been squashed by the o3 cpu, this patch allows page
table walker to avoid carrying out a pending translation that the
instruction requested for.
2013-05-21 11:40:11 -05:00
Nilay Vaish
30fe807316 x86: mark instructions for being function call/return
Currently call and return instructions are marked as IsCall and IsReturn. Thus, the
branch predictor does not use RAS for these instructions. Similarly, the number of
function calls that took place is recorded as 0. This patch marks these instructions
as they should be.
2013-05-21 11:34:41 -05:00
Nilay Vaish
fba40864aa x86: add op class for int and fp microops in isa description
Currently all the integer microops are marked as IntAluOp and the floating
point microops are marked as FloatAddOp. This patch adds support for marking
different microops differently. Now IntMultOp, IntDivOp, FloatDivOp,
FloatMultOp, FloatCvtOp, FloatSqrtOp classes will be used as well. This will
help in providing different latencies for different op class.
2013-05-21 11:33:57 -05:00
Nilay Vaish
4ef466cc8a ruby: moesi hammer: cosmetic changes
Updates copyright years, removes space at the end of lines, shortens
variable names.
2013-05-21 11:32:45 -05:00
Nilay Vaish
09d5bc7e6f ruby: mesi cmp directory: cosmetic changes
Updates copyright years, removes space at the end of lines, shortens
variable names.
2013-05-21 11:32:38 -05:00
Nilay Vaish
bd3d1955da ruby: moesi cmp token: cosmetic changes
Updates copyright years, removes space at the end of lines, shortens
variable names.
2013-05-21 11:32:24 -05:00
Nilay Vaish
e7ce518168 ruby: moesi cmp directory: cosmetic changes
Updates copyright years, removes space at the end of lines, shortens
variable names.
2013-05-21 11:32:15 -05:00
Nilay Vaish ext:(%2C%20Malek%20Musleh%20%3Cmalek.musleh%40gmail.com%3E)
59a7abff29 ruby: add stats to .sm files, remove cache profiler
This patch changes the way cache statistics are collected in ruby.

As of now, there is separate entity called CacheProfiler which holds
statistical variables for caches. The CacheMemory class defines different
functions for accessing the CacheProfiler. These functions are then invoked
in the .sm files. I find this approach opaque and prone to error. Secondly,
we probably should not be paying the cost of a function call for recording
statistics.

Instead, this patch allows for accessing statistical variables in the
.sm files. The collection would become transparent. Secondly, it would happen
in place, so no function calls. The patch also removes the CacheProfiler class.

--HG--
rename : src/mem/slicc/ast/InfixOperatorExprAST.py => src/mem/slicc/ast/OperatorExprAST.py
2013-05-21 11:31:31 -05:00
Anthony Gutierrez
d3c33d91b6 cpu: remove local/globalHistoryBits params from branch pred
having separate params for the local/globalHistoryBits and the
local/globalPredictorSize can lead to inconsistencies if they
are not carefully set. this patch dervies the number of bits
necessary to index into the local/global predictors based on
their size.

the value of the localHistoryTableSize for the ARM O3 CPU has been
increased to 1024 from 64, which is more accurate for an A15 based
on some correlation against A15 hardware.
2013-05-14 18:39:47 -04:00
Andreas Sandberg
4e52789c6d kvm: Add support for disabling coalesced MMIO
Add the option useCoalescedMMIO to the BaseKvmCPU. The default
behavior is to disable coalesced MMIO since this hasn't been heavily
tested.
2013-05-14 16:02:45 +02:00
Andreas Sandberg
3ba93822cc kvm: Dump state before panic in KVM exit handlers 2013-05-14 15:59:43 +02:00
Andreas Sandberg
98483ba858 kvm: Fix the memory interface used by KVM
The CpuPort class was removed before the KVM patches were committed,
which means that the KVM interface currently doesn't compile. This
changeset adds the BaseKvmCPU::KVMCpuPort class which derives from
MasterPort. This class is used on the data and instruction ports
instead of the old CpuPort.
2013-05-14 15:56:04 +02:00
Andreas Sandberg
1ae30c68c1 arm: Add support for the m5fail pseudo-op 2013-05-14 15:06:50 +02:00
Andreas Sandberg
e316e4e5fe kvm: Add a stat counting number of instructions executed
This changeset adds a 'numInsts' stat to the KVM-based CPU. It also
cleans up the variable names in kvmRun to make the distinction between
host cycles and estimated simulated cycles clearer. As a bonus
feature, it also fixes a warning (unreferenced variable) when
compiling in fast mode.
2013-05-02 12:03:43 +02:00
Andreas Sandberg
fa249461ca kvm: Add checkpoint debug print
Add a debug print (when the Checkpoint debug flag is set) on serialize
and unserialize. Additionally, dump the KVM state before
serializing. The KVM state isn't dumped after unserializing since the
state is loaded lazily on the next KVM entry.
2013-05-02 12:02:19 +02:00
Andreas Sandberg
41156c8196 kvm: Make MMIO requests uncacheable
Device accesses are normally uncacheable. This change probably doesn't
make any difference since we normally disable caching when KVM is
active. However, there might be devices that check this, so we'd
better enable this flag to be safe.
2013-05-02 12:01:50 +02:00
Andreas Sandberg
12d7498ad5 sim: Add support for m5fail in pseudoInst() 2013-05-02 11:54:08 +02:00
Michael Levenhagen
223f89a162 x86: corrects vsyscall address for gettimeofday
The vsyscall address for gettimeofday is 0xffffffffff600000ul. The offset
therefore should be 0x0 instead of 0x410. This can be cross checked with
the file sysdeps/unix/sysv/linux/x86_64/gettimeofday.c in source of glibc.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-04-23 15:21:32 -05:00
Michael Levenhagen
794d00257a x86: enable gettimeofday and getppid system calls
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-04-23 15:21:30 -05:00
Mitch Hayenga
b222ba2fd3 sim: Fix two bugs relating to software caching of PageTable entries.
The existing implementation can read uninitialized data or stale information
from the cached PageTable entries.

1) Add a valid bit for the cache entries.  Simply using zero for the virtual
address to signify invalid entries is not sufficient.  Speculative, wrong-path
accesses frequently access page zero.  The current implementation would return
a uninitialized TLB entry when address zero was accessed and the PageTable
cache entry was invalid.

2) When unmapping/mapping/remaping a page, invalidate the corresponding
PageTable cache entry if one already exists.
2013-04-23 09:47:52 -04:00
Andreas Hansson
3e35fa5dcc cpu: Fix TraceGen flag initalisation
This patch ensures the flags are always initialised.
2013-04-23 05:07:10 -04:00
Nilay Vaish
95eebf9e5e ruby: mesi coherence protocol: remove unused state M_MB 2013-04-23 00:03:07 -05:00
Christian Menard
25a6b1866e x86: increment the stack pointer in lret inst
The 'lret' instruction reloads instruction pointer and code segment from the
stack and then pops them. But the popping part is missing from the current
implementation. This caused incorrect behavior in some code related to the
Fiasco OS. Microops are being added to rectify the behavior of the instruction.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-04-23 00:03:04 -05:00
Nilay Vaish
aa86800e7a ruby: patch checkpoint restore with garnet
Due to recent changes to clocking system in Ruby and the way Ruby restores
state from a checkpoint, garnet was failing to run from a checkpointed state.
The problem is that Ruby resets the time to zero while warming up the caches.
If any component records a local copy of the time (read calls curCycle())
before the simulation has started, then that component will not operate until
that time is reached. In the context of this particular patch, the Garnet
Network class calls curCycle() at multiple places. Any non-operational
component can block in requests in the memory system, which the system
interprets as a deadlock. This patch makes changes so that Garnet can
successfully run from checkpointed state.

It adds a globally visible time at which the actual execution started. This
time is initialized in RubySystem::startup() function. This variable is only
meant for components with in Ruby. This replaces the private variable that
was maintained within Garnet since it is not possible to figure out the
correct time when the value of this variable can be set.

The patch also does away with all cases where curCycle() is called with in
some Ruby component before the system has actually started executing. This
is required due to the quirky manner in which ruby restores from a checkpoint.
2013-04-23 00:03:02 -05:00
Andreas Hansson
e23e3bea8b mem: Address mapping with fine-grained channel interleaving
This patch adds an address mapping scheme where the channel
interleaving takes place on a cache line granularity. It is similar to
the existing RaBaChCo that interleaves on a DRAM page, but should give
higher performance when there is less locality in the address
stream.
2013-04-22 13:20:34 -04:00
Andreas Hansson
e61799aa7c mem: More descriptive enum names for address mapping
This patch changes the slightly ambigious names used for the address
mapping scheme to be more descriptive, and actually spell out what
they do. With this patch we also open up for adding more flavours of
open- and close-type mappings, i.e. interleaving across channels with
the open map.
2013-04-22 13:20:33 -04:00
Andreas Hansson
99b3a12a75 cpu: Use request flags in trace playback
This patch changes the TraceGen such that it uses the optional request
flags from the protobuf trace if they are present.
2013-04-22 13:20:33 -04:00
Andreas Hansson
fe97f0e2b1 cpu: Make the generators usable outside the TrafficGen module
This patch enables the use of the generator behaviours outside the
TrafficGen module. This is useful e.g. to allow packet replay modes
for other devices in the system without having to replace them with a
TrafficGen in the configuration files.

This change also enables more specific behaviours to be composed as
specific modules, e.g. BaseBandModem can use a number of generators
and have application-specific parameters based around a specific set
of generators.
2013-04-22 13:20:33 -04:00
Andreas Hansson
a35d3ff167 mem: Add a WideIO DRAM configuration
This patch adds a WideIO 200 MHz configuration that can be used as a
baseline to compare with DDRx and LPDDRx. Note that it is a single
channel and that it should be replicated 4 times. It is based on
publically available information and attempts to capture an envisioned
8 Gbit single-die part (i.e. without TSVs).
2013-04-22 13:20:33 -04:00
Uri Wiener
a8fbfefb5e mem: Adding verbose debug output in the memory system
This patch provides useful printouts throughut the memory system. This
includes pretty-printed cache tags and function call messages
(call-stack like).
2013-04-22 13:20:33 -04:00
Andreas Hansson
9929e884b6 mem: Replace check with panic where inhibited should not happen
This patch changes the SimpleTimingPort and RubyPort to panic on
inhibited requests as this should never happen in either of the
cases. The SimpleTimingPort is only used for the I/O devices PIO port
and the DMA devices config port and should thus never see an inhibited
request. Similarly, the SimpleTimingPort is also used for the
MessagePort in x86, and there should also not be any cases where the
port sees an inhibited request.
2013-04-22 13:20:33 -04:00
Andreas Sandberg
33ab8f735d kvm: Add support for pseudo-ops on ARM
This changeset adds support for m5 pseudo-ops when running in
kvm-mode. Unfortunately, we can't trap the normal gem5 co-processor
entry in KVM (it doesn't seem to be possible to trap accesses to
non-existing co-processors). We therefore use BZJ instructions to
cause a trap from virtualized mode into gem5. The BZJ instruction is
becomes a normal branch to the gem5 fallback code when running in
simulated mode, which means that this patch does not need to change
the ARM ISA-specific code.

Note: This requires a patched host kernel.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
1c529a4196 sim: Add a helper function to execute pseudo instructions
All architectures execute m5 pseudo instructions by setting up
arguments according to the ABI and executing a magic instruction that
contains an operation number. Handling of such instructions is
currently spread across the different ISA implementations. This
changeset introduces the PseudoInst::pseudoInst function which handles
most of this in an architecture independent way. This is function is
mainly intended to be used from KVM, but can also be used from the
simulated CPUs.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
32ecd72b6e kvm: Add support for state dumping on ARM 2013-04-22 13:20:32 -04:00
Andreas Sandberg
f156020158 kvm: Add basic support for ARM
Architecture specific limitations:
 * LPAE is currently not supported by gem5. We therefore panic if LPAE
   is enabled when returning to gem5.
 * The co-processor based interface to the architected timer is
   unsupported. We can't support this due to limitations in the KVM
   API on ARM.
 * M5 ops are currently not supported. This requires either a kernel
   hack or a memory mapped device that handles the guest<->m5
   interface.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
6d2941d990 arm: Add a method to query interrupt state ignoring CPSR masks
Add the method checkRaw to ArmISA::Interrupts. This method can be used
to query the raw state (ignoring CPSR masks) of an interrupt. It is
primarily intended for hardware virtualized CPUs.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
f8f66fa3df kvm: Add experimental support for a perf-based execution timer
Add support for using the CPU cycle counter instead of a normal POSIX
timer to generate timed exits to gem5. This should, in theory, provide
better resolution when requesting timer signals.

The perf-based timer requires a fairly recent kernel since it requires
a working PERF_EVENT_IOC_PERIOD ioctl. This ioctl has existed in the
kernel for a long time, but it used to be completely broken due to an
inverted match when the kernel copied things from user
space. Additionally, the ioctl does not change the sample period
correctly on all kernel versions which implement it. It is currently
only known to work reliably on kernel version 3.7 and above on ARM.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
2607efded8 kvm: Avoid synchronizing the TC on every KVM exit
Reduce the number of KVM->TC synchronizations by overloading the
getContext() method and only request an update when the TC is
requested as opposed to every time KVM returns to gem5.
2013-04-22 13:20:32 -04:00
Andreas Sandberg
f485ad1908 kvm: Basic support for hardware virtualized CPUs
This changeset introduces the architecture independent parts required
to support KVM-accelerated CPUs. It introduces two new simulation
objects:

KvmVM -- The KVM VM is a component shared between all CPUs in a shared
         memory domain. It is typically instantiated as a child of the
         system object in the simulation hierarchy. It provides access
         to KVM VM specific interfaces.

BaseKvmCPU -- Abstract base class for all KVM-based CPUs. Architecture
	      dependent CPU implementations inherit from this class
	      and implement the following methods:

                * updateKvmState() -- Update the
                  architecture-dependent KVM state from the gem5
                  thread context associated with the CPU.

                * updateThreadContext() -- Update the thread context
                  from the architecture-dependent KVM state.

                * dump() -- Dump the KVM state using (optional).

	      In order to deliver interrupts to the guest, CPU
	      implementations typically override the tick() method and
	      check for, and deliver, interrupts prior to entering
	      KVM.

Hardware-virutalized CPU currently have the following limitations:
 * SE mode is not supported.
 * PC events are not supported.
 * Timing statistics are currently very limited. The current approach
   simply scales the host cycles with a user-configurable factor.
 * The simulated system must not contain any caches.
 * Since cycle counts are approximate, there is no way to request an
   exact number of cycles (or instructions) to be executed by the CPU.
 * Hardware virtualized CPUs and gem5 CPUs must not execute at the
   same time in the same simulator instance.
 * Only single-CPU systems can be simulated.
 * Remote GDB connections to the guest system are not supported.

Additionally, m5ops requires an architecture specific interface and
might not be supported.
2013-04-22 13:20:32 -04:00
Timothy M. Jones
005616518c cpu: Let python scripts obtain the number of instructions executed 2013-04-22 13:20:31 -04:00
Andreas Sandberg
5f2361f3af arm: Enable support for triggering a sim panic on kernel panics
Add the options 'panic_on_panic' and 'panic_on_oops' to the
LinuxArmSystem SimObject. When these option are enabled, the simulator
panics when the guest kernel panics or oopses. Enable panic on panic
and panic on oops in ARM-based test cases.
2013-04-22 13:20:31 -04:00
Dam Sunwoo
e8381142b0 sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was
already aligned with the clock edge. This behavior is not only confusing (not
quite what the function name implies), but also caused problems in the
drainResume() function. When exiting/re-entering the sim loop (e.g., to take
checkpoints), the CPUs will drain and resume. Due to the previous behavior of
nextCycle(), the CPU tick events were being rescheduled in the same ticks that
were already processed before draining. This caused divergence from runs that
did not exit/re-entered the sim loop. (Initially a cycle difference, but a
significant impact later on.)

This patch separates out the two behaviors (nextCycle() and clockEdge()),
uses nextCycle() in drainResume, and uses clockEdge() everywhere else.
Nothing (other than name) should change except for the drainResume timing.
2013-04-22 13:20:31 -04:00
Dam Sunwoo
2c1e344313 cpu: generate SimPoint basic block vector profiles
This patch is based on http://reviews.m5sim.org/r/1474/ originally written by
Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout
folder) based on start and end addresses of basic blocks.

Some comments to the original patch are addressed and hooks are added to create
and resume from checkpoints based on instruction counts dictated by external
SimPoint analysis tools.

SimPoint creation/resuming options will be implemented as a separate patch.
2013-04-22 13:20:31 -04:00
Chris Emmons
121b15a54d ARM: Add support for HDLCD controller for TC2 and newer Versatile Express tiles.
Newer core tiles / daughterboards for the Versatile Express platform have an
HDLCD controller that supports HD-quality output.  This patch adds an
implementation of the controller.
2013-04-22 13:20:31 -04:00
Andreas Sandberg
aa08069b3f sim: Add helper functions that add PCEvents with custom arguments
This changeset adds support for forwarding arguments to the PC
event constructors to following methods:

addKernelFuncEvent
addFuncEvent

Additionally, this changeset adds the following helper method to the
System base class:

addFuncEventOrPanic - Hook a PCEvent to a symbol, panic on failure.

addKernelFuncEventOrPanic - Hook a PCEvent to a kernel symbol, panic
                            on failure.


System implementations have been updated to use the new functionality
where appropriate.
2013-04-22 13:20:31 -04:00
Ali Saidi
c9e4678c16 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-04-22 13:20:31 -04:00
Nilay Vaish
3d858e5627 Merged c22628fa2564 and 2285b98847d7 2013-04-17 16:09:37 -05:00
Deyuan Guo ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
b54e118628 base: load weak symbols from object file
Without loading weak symbols into gem5, some function names and the given PC
cannot correspond correctly, because the binding attributes of unction names
in an ELF file are not only STB_GLOBAL or STB_LOCAL, but also STB_WEAK. This
patch adds a function for loading weak symbols.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-04-17 16:07:19 -05:00
Nathanael Premillieu
3ff091bdf4 arm: set ldr_ret_uop as conditional or unconditional control
This patch adds a missing flag to the ldr_ret_uop microop instruction.
The flag is added when the instruction is used, not directly in the
constructor of the instruction.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
2013-04-17 16:07:10 -05:00
Nilay Vaish
03c60f005e ruby: moesi cmp directory: add copyright notice 2013-04-17 16:06:58 -05:00
Andreas Hansson
234c9a36a2 dev: Fix a bug in the use of seekp/seekg
This patch fixes two instances of incorrect use of the seekp/seekg
stream member functions. These two functions return a stream reference
(*this), and should not be compared to an integer value.
2013-04-17 08:17:03 -04:00
Joel Hestness
1583056de8 Ruby: Fix RubyPort evict packet memory leak
When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.
2013-04-09 16:25:30 -05:00
Joel Hestness
46d4b71aa2 Ruby: Delete packet requests during warmup
When warming up caches in Ruby, the CacheRecorder sends fetch requests into
Ruby Sequencers with packet types that require responses. Since responses are
never generated for these CacheRecorder requests, the requests are not deleted
in the packet destructor called from the Ruby hit callback. Free the request.
2013-04-09 16:25:29 -05:00
Joel Hestness
e98c3c227d Ruby: Add field to slicc machine for generic type
This allows you to have (i.e.) an L2 cache that is not named "L2Cache"
but is still a GenericMachineType_L2Cache. This is particularly
helpful if the protocol has multiple L2 controllers.
2013-04-09 16:25:29 -05:00
Joel Hestness
b936619ab4 Ruby: Order profilers based on version
When Ruby stats are printed for events and transitions, they include stats
for all of the controllers of the same type, but they are not necessarily
printed in order of the controller ID "version", because of the way the
profilers were added to the profiler vector. This patch fixes the push order
problem so that the stats are printed in ascending order 0->(# controllers),
so statistics parsers may correctly assume the controller to which the stats
belong.
2013-04-09 16:25:29 -05:00
Jason Power
88d34665d0 Ruby: More descriptive message buffer connection fatal
When connecting message buffers between Ruby controllers, it is
easy to mistakenly connect multiple controllers to the same message
buffer. This patch prints a more descriptive fatal message than the
previous assert statement in order to facilitate easier debugging.
2013-04-09 16:15:06 -05:00
Jason Power
19cc9fc6bd Ruby: Fix typo in Slicc if-statement AST error
The error in the SLICC code was hidden by the python error in SLICC parser
before this patch
2013-04-09 16:12:42 -05:00
Joel Hestness
3b02210713 Ruby System, Cache Recorder: Use delete [] for trace vars
The cache trace variables are array allocated uint8_t* in the RubySystem and
the Ruby CacheRecorder, but the code used delete to free the memory, resulting
in Valgrind memory errors. Change these deletes to delete [] to get rid of the
errors.
2013-04-07 20:31:15 -05:00
Nilay Vaish
ac778b1d02 o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object.
Since the interrupt is usually handled several cycles after the commit
stage becomes aware of it, it is possible that the local copy of the
interrupt object may not be the interrupt that is actually handled.
It is possible that another interrupt occurred in the
interval between interrupt detection and interrupt handling.

This patch creates a copy of the interrupt just before the interrupt
is handled. The local copy is ignored.
2013-03-29 14:05:26 -05:00
Nilay Vaish
d2fd3b2ec2 x86: changes to apic, keyboard
It is possible that operating system wants to shutdown the
lapic timer by writing timer's initial count to 0. This patch
adds a check that the timer event is only scheduled if the
count is 0.

The patch also converts few of the panics related to the keyboard
to warnings since we are any way not interested in simulating the
keyboard.
2013-03-28 09:34:23 -05:00
Mitch Hayenga
4920f0d7e5 mem: Fix cache latency bug
Fixes a latency calculation bug for accesses during a cache line fill.

Under a cache miss, before the line is filled, accesses to the cache are
associated with a MSHR and marked as targets.  Once the line fill completes,
MSHR target packets pay an additional latency of
"responseLatency + busSerializationLatency".  However, the "whenReady"
field of the cache line is only set to an additional delay of
"busSerializationLatency".  This lacks the responseLatency component of
the fill.  It is possible for accesses that occur on the cycle of
(or briefly after) the line fill to respond without properly paying the
responseLatency.  This also creates the situation where two accesses to the
same address may be serviced in an order opposite of how they were received
by the cache.  For stores to the same address, this means that although the
cache performs the stores in the order they were received, acknowledgements
may be sent in a different order.

Adding the responseLatency component to the whenReady field preserves the
penalty that should be paid and prevents these ordering issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-03-27 18:36:09 -05:00
Steve Reinhardt
f0b745d556 scons: don't die on warnings in swig-generated code
There's not much to do about it other than disable the offending
warning anyway, so it's not worth terminating the build over.

Also suppress uninitialized variable warnings on gcc (happens
at least with gcc 4.4 and swig 1.3.40).
2013-03-27 10:03:02 -07:00
Rene de Jong
87089175cc mem: Cancel cache retry event when blocking port
This patch solves the corner case scenario where the sendRetryEvent could be
scheduled twice, when an io device stresses the IOcache in the system. This
should not be possible in the cache system.
2013-03-26 14:46:51 -04:00
Andreas Hansson
93a8423dea mem: Separate waiting for the bus and waiting for a peer
This patch splits the retryList into a list of ports that are waiting
for the bus itself to become available, and a map that tracks the
ports where forwarding failed due to a peer not accepting the
packet. Thus, when a retry reaches the bus, it can be sent to the
appropriate port that initiated that transaction.

As a consequence of this patch, only ports that are really ready to go
will get a retry, thus reducing the amount of redundant failed
attempts. This patch also makes it easier to reason about the order of
servicing requests as the ports waiting for the bus are now clearly
FIFO and much easier to change if desired.
2013-03-26 14:46:47 -04:00
Andreas Hansson
362f6f1a16 mem: Introduce a variable for the retrying port
This patch introduces a variable to keep track of the retrying port
instead of relying on it being the front of the retryList.

Besides the improvement in readability, this patch is a step towards
separating out the two cases where a port is waiting for the bus to be
free, and where the forwarding did not succeed and the bus is waiting
for a retry to pass on to the original initiator of the transaction.

The changes made are currently such that the regressions are not
affected. This is ensured by always prioritizing the currently
retrying port and putting it back at the front of the retry list.
2013-03-26 14:46:46 -04:00
Andreas Hansson
2123176684 mem: Add a generic id field to the packet trace
This patch adds an optional generic 64-bit identifier field to the
packet trace. This can be used to store the sequential number of the
instruction that gave rise to the packet, thread id, master id,
"sub"-master within a larger module etc. As the field is optional it
has a marginal cost if not used.
2013-03-26 14:46:45 -04:00
Andreas Hansson
7a57b1bce0 mem: Add optional request flags to the packet trace
This patch adds an optional flags field to the packet trace to encode
the request flags that contain information about whether the request
is (un)cacheable, instruction fetch, preftech etc.
2013-03-26 14:46:44 -04:00
Andreas Hansson
08c1835bef cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort
instead of the derived CpuPort. The functions of the CpuPort are now
distributed across the relevant subclasses. The port accessor
functions (getInstPort and getDataPort) now return a MasterPort
instead of a CpuPort. This simplifies creating derivative CPUs that do
not use the CpuPort.
2013-03-26 14:46:42 -04:00
Nilay Vaish
b2c8c50f17 ruby: slicc: set sender, receiver clock objs for optional queue 2013-03-22 17:21:23 -05:00
Nilay Vaish
e85b556d70 ruby: message buffer: correct previous errors
A recent set of patches added support for multiple clock domains to ruby.
I had made some errors while writing those patches. The sender was using
the receiver side clock while enqueuing a message in the buffer. Those
errors became visible while creating (or restoring from) checkpoints. The
errors also become visible when a multi eventq scenario occurs.
2013-03-22 17:21:22 -05:00
Nilay Vaish
47c8cb72fc ruby: message buffer: remove _ptr from some variables
The names were getting too long.
2013-03-22 15:53:27 -05:00
Nilay Vaish
6465cf5824 ruby: message buffer node: used Tick in place of Cycles
The message buffer node used to keep time in terms of Cycles. Since the
sender and the receiver can have different clock periods, storing node
time in cycles requires some conversion. Instead store the time directly
in Ticks.
2013-03-22 15:53:26 -05:00
Nilay Vaish
39e9445468 ruby: consumer: avoid using receiver side clock
A set of patches was recently committed to allow multiple clock domains
in ruby. In those patches, I had inadvertently made an incorrect use of
the clocks. Suppose object A needs to schedule an event on object B. It
was possible that A accesses B's clock to schedule the event. This is not
possible in actual system. Hence, changes are being to the Consumer class
so as to avoid such happenings. Note that in a multi eventq simulation,
this can possibly lead to an incorrect simulation.

There are two functions in the Consumer class that are used for scheduling
events. The first function takes in the relative delay over the current time
as the argument and adds the current time to it for scheduling the event.
The second function takes in the absolute time (in ticks) for scheduling the
event. The first function is now being moved to protected section of the
class so that only objects of the derived classes can use it. All other
objects will have to specify absolute time while scheduling an event
for some consumer.
2013-03-22 15:53:26 -05:00
Nilay Vaish
28005a7626 ruby: remove unsued profile functions 2013-03-22 15:53:25 -05:00
Nilay Vaish
89bb826079 ruby: keep histogram of outstanding requests in seq
The histogram for tracking outstanding counts per cycle is maintained
in the profiler. For a parallel implementation of the memory system, we
need that this histogram is maintained locally. Hence it will now be
kept in the sequencer itself. The resulting histograms will be merged
when the stats are printed.
2013-03-22 15:53:25 -05:00
Nilay Vaish
870d545788 slicc: remove check if the L1Cache has a sequencer 2013-03-22 15:53:24 -05:00
Nilay Vaish
8573a69d8f ruby: move stall and wakeup functions to AbstractController
These functions are currently implemented in one of the files related to Slicc.
Since these are purely C++ functions, they are better suited to be in the base
class.
2013-03-22 15:53:24 -05:00
Nilay Vaish
eccc86e809 ruby: connect two controllers using only message buffers
This patch modifies ruby so that two controllers can be connected to each
other with only message buffers in between. Before this patch, all the
controllers had to be connected to the network  for them to communicate
with each other. With this patch, one can have protocols where a controller
is not connected to the network, but communicates with another controller
through a message buffer.
2013-03-22 15:53:23 -05:00
Nilay Vaish
5aa43e130a ruby: convert Topology to regular class
The Topology class in Ruby does not need to inherit from SimObject class.
This patch turns it into a regular class. The topology object is now created
in the constructor of the Network class. All the parameters for the topology
class have been moved to the network class.
2013-03-22 15:53:23 -05:00
Nilay Vaish
2d50127642 ruby: network: move routers from topology to network 2013-03-22 15:53:22 -05:00
Andreas Hansson
2ca42cd626 cpu: Avoid including inorder TLBUnit to avoid gcc LTO bug
This patch comments out the inclusion of the inorder TLBUnit which is
only used in the 9-stage pipeline. With the TLBUnit present, gcc >=
4.6 in combination with LTO ends up throwing away the definition of
the TLBUnit destructor, and consequently fail to link. See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808 for more details
about the bug, and http://gcc.gnu.org/ml/gcc/2012-06/msg00397.html for
the discussion thread that also touches on similar issues seen with
clang.
2013-03-20 06:41:23 -04:00
Andreas Hansson
c01c5e971b mem: Fix missing delete of packet in DRAM access
This patch fixes a memory leak caused by not deleting packets that
require no response.
2013-03-18 05:22:45 -04:00
Nilay Vaish
dc37b03439 ruby: set: corrects csprintf() call introduced by 7d95b650c9b6 2013-03-15 16:28:08 -05:00
Andreas Sandberg
fc6f569d94 cpu: Fix state transition bug in the traffic generator
The traffic generator used to incorrectly determine the next state in
when state 0 had a non-zero probability. Due to the way the next
transition was determined, state 0 could never be entered other than
as an initial state. This changeset updates the transitition() method
to correctly handle such cases and cases where the transition matrix
is a 1x1 matrix.
2013-03-12 18:41:29 +01:00
Nilay Vaish
5c940fec0a x86: implement some of the x87 instructions
This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
2013-03-11 13:15:46 -05:00
Andreas Hansson
82f600e02d base: Fix address range granularity calculations
This patch fixes a bug in the address range granularity
calculations. Previously it incorrectly used the high bit to establish
the size of the regions created, when it should really be looking at
the low bit.
2013-03-07 05:55:03 -05:00
Andreas Hansson
92e973b310 ruby: Fix gcc 4.8 maybe-uninitialized compilation error
This patch fixes the one-and-only gcc 4.8 compilation error, being a
warning about "maybe uninitialized" in Orion.
2013-03-07 05:55:02 -05:00
Andreas Hansson
c4645c0d68 x86: Make the table walker reset the packet delay
This patch fixes an issue related to the table walker recycling
packets that still have a bus delay that is not accounted for. For
now, we simply ignore the values and reset them to zero.
2013-03-07 05:55:01 -05:00
Nilay Vaish
c061819890 ruby: remove the functional copy of memory in se mode
This patch removes the functional copy of the memory that was maintained in
the se mode. Now ruby itself will provide the data.
2013-03-06 21:53:57 -06:00
Nilay Vaish
e8802fa127 ruby: garnet: fixed: implement functional access 2013-03-06 21:53:16 -06:00
Ali Saidi
f205d83359 cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code
that was checking for the pipeline being blocked wasn't checking for a pending
translation, only for a icache access.
2013-03-04 23:33:47 -05:00
Ali Saidi
f4fd12d49e ARM: fix some cases where instructions that write to fp reg 15 are accidently branches. 2013-03-04 23:33:47 -05:00
Blake Hechtman ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
af8eb67fb4 ruby: fixes functional writes to RubyRequest
The functional write code was assuming that all writes are block sized,
which may not be true for Ruby Requests. This bug can lead to a buffer
overflow.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-03-02 23:12:55 -06:00
Nilay Vaish
a4e8512afa sim: remove duplicate check on stack size 2013-03-02 18:04:51 -06:00
Andreas Hansson
e5bcb30756 mem: Add check if SimpleDRAM nextReqEvent is scheduled
This check covers a case where a retry is called from the SimpleDRAM
causing a new request to appear before the DRAM itself schedules a
nextReqEvent. By adding this check, the event is not scheduled twice.
2013-03-01 13:20:33 -05:00
Andreas Hansson
da5356ccce mem: Add a method to build multi-channel DRAM configurations
This patch adds a class method that allows easy creation of
channel-interleaved multi-channel DRAM configurations. It is enabled
by a class method to allow customisation of the class independent of
the channel configuration. For example, the user can create a MyDDR
subclass of e.g. SimpleDDR3, and then create a four-channel
configuration of the subclass by calling MyDDR.makeMultiChannel(4,
mem_start, mem_size).
2013-03-01 13:20:32 -05:00
Andreas Hansson
0facc8e1ac mem: SimpleDRAM variable naming and whitespace fixes
This patch fixes a number of small cosmetic issues in the SimpleDRAM
module. The most important change is to move the accounting of
received packets to after the check is made if the packet should be
retried or not. Thus, packets are only counted if they are actually
accepted.
2013-03-01 13:20:24 -05:00
Andreas Hansson
3ba131f4d5 mem: Add support for multi-channel DRAM configurations
This patch adds support for multi-channel instances of the DRAM
controller model by stripping away the channel bits in the address
decoding. The patch relies on the availiability of address
interleaving and, at this time, it is up to the user to configure the
interleaving appropriately. At the moment it is assumed that the
channel interleaving bits are immediately following the column bits
(smallest sensible interleaving). Convenience methods for building
multi-channel configurations will be added later.
2013-03-01 13:20:22 -05:00
Andreas Hansson
1a58362e25 mem: Merge interleaved ranges when creating backing store
This patch adds merging of interleaved ranges before creating the
backing stores. The backing stores are always a contigous chunk of the
address space, and with this patch it is possible to have interleaved
memories in the system.
2013-03-01 13:20:21 -05:00
Andreas Hansson
cafd38f36c mem: Merge ranges in bus before passing them on
This patch adds basic merging of address ranges to the bus, such that
interleaved ranges are merged together before being passed on by the
bus. As such, the bus aggregates the address ranges of the connected
slave ports and then passes on the merged ranges through its master
ports. The bus thus hides the complexity of the interleaved ranges and
only exposes contigous ranges to the surrounding system.

As part of this patch, the bus ranges are also cached for any future
queries.
2013-03-01 13:20:19 -05:00
Dibakar Gope ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
c636a09e83 ruby: mesi coherence protocol: invalidate lock
The MESI CMP directory coherence protocol, while transitioning from SM to IM,
did not invalidate the lock that it might have taken on a cache line. This
patch adds an action for doing so.

The problem was found by Dibakar, but I was not happy with his proposed
solution. So I implemented a different solution.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-02-28 10:04:26 -06:00
Nilay Vaish
fea27bc49b slicc: remove unused variable message_buffer_names 2013-02-19 22:58:51 -06:00
Nilay Vaish
e95e78ff2f ruby: remove unused variable m_print_config in class Topology 2013-02-19 22:58:50 -06:00
Andreas Hansson
da950caed2 mem: Fix sender state bug and delay popping
This patch fixes a newly introduced bug where the sender state was
popped before checking that it should be. Amazingly all regressions
pass, but Linux fails to boot on the detailed CPU with caches enabled.
2013-02-19 12:57:47 -05:00
Andreas Hansson
a62afd094b scons: Fix warnings issued by clang 3.2svn (XCode 4.6)
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
2013-02-19 05:56:08 -05:00
Andreas Hansson
08a5fd328b scons: Unify the flags shared by gcc and clang
This patch restructures and unifies the flags used by gcc and clang as
they are largely the same. The common parts are now dealt with in a
shared block of code, and the few bits and pieces that are
specifically affecting either gcc or clang are done separately.
2013-02-19 05:56:07 -05:00
Andreas Hansson
5eddb63877 scons: Add warning delete with non-virtual destructor
This patch enables a warning for deleting derived classes that do not
have a virtual destructor. The patch merely adds additional checks,
and there are currently no cases that had to be fixed.
2013-02-19 05:56:07 -05:00
Andreas Hansson
319443d42d scons: Add warning for missing declarations
This patch enables warnings for missing declarations. To avoid issues
with SWIG-generated code, the warning is only applied to non-SWIG
code.
2013-02-19 05:56:07 -05:00
Andreas Hansson
b44e0ce52b scons: Add warning for overloaded virtual functions
Fix the ISA startup warnings
2013-02-19 05:56:07 -05:00
Andreas Hansson
0acd2a96e5 scons: Add warning for overloaded virtual functions
A derived function with a different signature than a base class
function will result in the base class function of the same name being
hidden. The parameter list and return type for the member function in
the derived class must match those of the member function in the base
class, otherwise the function in the derived class will hide the
function in the base class and no polymorphic behaviour will occur.

This patch addresses these warnings by ensuring a unique function name
to avoid (unintentionally) hiding any functions.
2013-02-19 05:56:06 -05:00
Andreas Hansson
d670fa60a1 scons: Add warning for missing field initializers
This patch adds a warning for missing field initializers for both gcc
and clang, and addresses the warnings that were generated.
2013-02-19 05:56:06 -05:00
Andreas Hansson
c10098f28b scons: Fix up numerous warnings about name shadowing
This patch address the most important name shadowing warnings (as
produced when using gcc/clang with -Wshadow). There are many
locations where constructor parameters and function parameters shadow
local variables, but these are left unchanged.
2013-02-19 05:56:06 -05:00
Andreas Hansson
860155a5fc mem: Enforce strict use of busFirst- and busLastWordTime
This patch adds a check to ensure that the delay incurred by
the bus is not simply disregarded, but accounted for by someone. At
this point, all the modules do is to zero it out, and no additional
time is spent. This highlights where the bus timing is simply dropped
instead of being paid for.

As a follow up, the locations identified in this patch should add this
additional time to the packets in one way or another. For now it
simply acts as a sanity check and highlights where the delay is simply
ignored.

Since no time is added, all regressions remain the same.
2013-02-19 05:56:06 -05:00
Andreas Hansson
40d0e6c899 mem: Change accessor function names to match the port interface
This patch changes the names of the cache accessor functions to be in
line with those used by the ports. This is done to avoid confusion and
get closer to a one-to-one correspondence between the interface of the
memory object (the cache in this case) and the port itself.

The member function timingAccess has been split into a snoop/non-snoop
part to avoid branching on the isResponse() of the packet.
2013-02-19 05:56:06 -05:00
Andreas Hansson
b3fc8839c4 mem: Make packet bus-related time accounting relative
This patch changes the bus-related time accounting done in the packet
to be relative. Besides making it easier to align the cache timing to
cache clock cycles, it also makes it possible to create a Last-Level
Cache (LLC) directly to a memory controller without a bus inbetween.

The bus is unique in that it does not ever make the packets wait to
reflect the time spent forwarding them. Instead, the cache is
currently responsible for making the packets wait. Thus, the bus
annotates the packets with the time needed for the first word to
appear, and also the last word. The cache then delays the packets in
its queues before passing them on. It is worth noting that every
object attached to a bus (devices, memories, bridges, etc) should be
doing this if we opt for keeping this way of accounting for the bus
timing.
2013-02-19 05:56:06 -05:00
Andreas Hansson
362160c8ae mem: Add deferred packet class to prefetcher
This patch removes the time field from the packet as it was only used
by the preftecher. Similar to the packet queue, the prefetcher now
wraps the packet in a deferred packet, which also has a tick
representing the absolute time when the packet should be sent.
2013-02-19 05:56:06 -05:00
Andreas Hansson
7cd49b24d2 sim: Make clock private and access using clockPeriod()
This patch makes the clock member private to the ClockedObject and
forces all children to access it using clockPeriod(). This makes it
impossible to inadvertently change the clock, and also makes it easier
to transition to a situation where the clock is derived from e.g. a
clock domain, or through a multiplier.
2013-02-19 05:56:06 -05:00
Andreas Hansson
5c7ebee434 x86: Move APIC clock divider to Python
This patch moves the 16x APIC clock divider to the Python code to
avoid the post-instantiation modifications to the clock. The x86 APIC
was the only object setting the clock after creation time and this
required some custom functionality and configuration. With this patch,
the clock multiplier is moved to the Python code and the objects are
instantiated with the appropriate clock.
2013-02-19 05:56:06 -05:00
Sascha Bischoff
86a4d09269 mem: Fix SenderState related cache deadlock
This patch fixes a potential deadlock in the caches. This deadlock
could occur when more than one cache is used in a system, and
pkt->senderState is modified in between the two caches. This happened
as the caches relied on the senderState remaining unchanged, and used
it for instantaneous upstream communication with other caches.

This issue has been addressed by iterating over the linked list of
senderStates until we are either able to cast to a MSHR* or
senderState is NULL. If the cast is successful, we know that the
packet has previously passed through another cache, and therefore
update the downstreamPending flag accordingly. Otherwise, we do
nothing.
2013-02-19 05:56:06 -05:00
Andreas Hansson
0622f30961 mem: Add predecessor to SenderState base class
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.

There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
2013-02-19 05:56:05 -05:00
Andreas Hansson
f69d431ede base: Fix a bug in the address interleaving
This patch fixes a minor (but important) typo in the matching of an
address to an interleaved range.
2013-02-19 05:56:05 -05:00
Andreas Hansson
9947923c60 mem: Ensure trace captures packet fields before forwarding
This patch fixes a bug in the CommMonitor caused by the packet being
modified before it is captured in the trace. By recording the fields
before passing the packet on, and then putting these values in the
trace we ensure that even if the packet is modified the trace captures
what the CommMonitor saw.
2013-02-19 05:56:05 -05:00
Anthony Gutierrez
f7107fb795 loader: add a flattened device tree blob (dtb) object
this adds a dtb_object so the loader can load in the dtb
file for linux/android ARM kernels.
2013-02-15 18:48:59 -05:00
Mrinmoy Ghosh
8cef39fb67 arm: fix a page table walker issue where a page could be translated multiple times
If multiple memory operations to the same page are miss the TLB they are
all inserted into the page table queue and before this change could result
in multiple uncessesary walks as well as duplicate enteries being inserted
into the TLB.
2013-02-15 17:40:10 -05:00
Andreas Sandberg
3af59ab386 cpu: Document exec trace flags 2013-02-15 17:40:10 -05:00
Andreas Sandberg
08467a88a6 dev: Use the correct return type for disk offsets
Replace the use of off_t in the various DiskImage related classes with
std::streampos. off_t is a signed 32 bit integer on most 32-bit
systems, whereas std::streampos is normally a 64 bit integer on most
modern systems. Furthermore, std::streampos is the type used by
tellg() and seekg() in the standard library, so it should have been
used in the first place. This patch makes it possible to use disk
images larger than 2 GiB on 32 bit systems with a modern C++ standard
library.
2013-02-15 17:40:10 -05:00
Geoffrey Blake
ca96e7bff1 cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without
checking for duplicates, this can cause a triggering of the assert if an
instruction get replayed and writes to the same misc regs multiple times.
This fix prevents duplicate entries and instead updates the value.
2013-02-15 17:40:10 -05:00
Geoffrey Blake
8e79c68936 cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets
into a resource constrained situation and the serializing instruction has
to be placed on the skid buffer to handle blocking.  In this situation the
instruction informs the pipeline it is serializing and logs that the next
instruction must be serialized, but since we are blocking the pipeline
defers this action to place the serializing instruction and
incoming instructions into the skid buffer. When resuming from blocking,
rename will pull the serializing instruction from the skid buffer and
the current logic will see this as the "next" instruction that has to
be serialized and because of flags set on the serializing instruction,
it passes through the pipeline stage as normal and resets rename to
non-serializing.  This causes instructions to follow the serializing inst
incorrectly and eventually leads to an error in the pipeline. To fix this
rename should check first if it has to block before checking for serializing
instructions.
2013-02-15 17:40:10 -05:00
Chris Emmons
27630e9cad ARM: Postpones creation of framebuffer output file until it is actually used.
This delay prevents a potential conflict with the HDLCD if both are in the same
system even if only one is enabled.
2013-02-15 17:40:10 -05:00
Andreas Hansson
f6550b3d20 mem: Tighten up cache constness and scoping
This patch merely adopts a more strict use of const for the cache
member functions and variables, and also moves a large portion of the
member functions from public to protected.
2013-02-15 17:40:10 -05:00
Sascha Bischoff
2f3b322280 base: Add warn() and inform() to m5.utils for use from python
This patch adds two fuctions to m5.util, warn and inform, which mirror those
found in the C++ side of gem5. These are added in addition to the already
existing m5.util.panic and m5.util.fatal which already mirror the C++
functionality. This ensures that warning and information messages generated
by python are in the same format as those generated by C++.

Occurrences of
    print "Warning: %s..." % name
have been replaced with
    warn("%s...", name)
2013-02-15 17:40:10 -05:00
Matt Horsnell
e88e7d88b9 o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less
  than the dispatch. This conflated the decode ticks when back pressure built
  in the pipeline.
- now picks up tick on entry.

Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ
  blocking)

Allows selection by tick range (previously this caused an infinite loop)
2013-02-15 17:40:09 -05:00
Andreas Sandberg
6459908069 arm: Don't export private GIC methods 2012-10-25 14:08:29 +01:00
Andreas Sandberg
81be8b9d15 arm: Create a GIC base class and make the PL390 derive from it
This patch moves the GIC interface to a separate base class and makes
all interrupt devices use that base class instead of a pointer to the
PL390 implementation. This allows us to have multiple GIC
implementations. Future implementations will allow in-kernel GIC
implementations when using hardware virtualization.

--HG--
rename : src/dev/arm/gic.cc => src/dev/arm/gic_pl390.cc
rename : src/dev/arm/gic.hh => src/dev/arm/gic_pl390.hh
2012-10-25 14:05:24 +01:00
Andreas Sandberg
b904bd5437 sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.

To make memory mode tests cleaner, the following methods are added to
the System class:

 * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
 * isTimingMode() -- True if the memory mode is 'timing'.
 * bypassCaches() -- True if caches should be bypassed.

The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
2013-02-15 17:40:09 -05:00
Andreas Sandberg
1eec115c31 cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two
places, when the CPU is initialized (unless it's switched out) and on
a drainResume(). This led to some code duplication in the CPU
models. This changeset introduces the verifyMemoryMode() method which
is called by BaseCPU::init() if the CPU isn't switched out. The
individual CPU models are responsible for calling this method when
resuming from a drain as this code is CPU model specific.
2013-02-15 17:40:08 -05:00
Andreas Sandberg
e5dca84c3f config: Move CPU handover logic to m5.switchCpus()
CPU switching consists of the following steps:
 1. Drain the system
 2. Switch out old CPUs (cpu.switchOut())
 3. Change the system timing mode to the mode the new CPUs require
 4. Flush caches if switching to hardware virtualization
 5. Inform new CPUs of the handover (cpu.takeOverFrom())
 6. Resume the system

m5.switchCpus() previously only did step 2 & 5. Since information
about the new processors' memory system requirements is now exposed,
do all of the steps above.

This patch adds automatic memory system switching and flush (if
needed) to switchCpus(). Additionally, it adds optional draining to
switchCpus(). This has the following implications:

* changeToTiming and changeToAtomic are no longer needed, so they have
  been removed.

* changeMemoryMode is only used internally, so it is has been renamed
  to be private.

* switchCpus requires a reference to the system containing the CPUs as
  its first parameter.

WARNING: This changeset breaks compatibility with existing
configuration scripts since it changes the signature of
m5.switchCpus().
2013-02-15 17:40:08 -05:00
Andreas Sandberg
7f1263f144 cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python
object hierarchy. This has two consequences:
 * It makes CPU model discovery from the Python world somewhat
   complicated as there is no way of testing if a CPU is a checker.
 * Parameters are duplicated in the checker configuration
   specification.

This changeset makes all checker CPUs inherit from the base checker
CPU class.
2013-02-15 17:40:08 -05:00
Andreas Sandberg
7cd1fd4324 cpu: Add CPU metadata om the Python classes
The configuration scripts currently hard-code the requirements of each
CPU. This is clearly not optimal as it makes writing new configuration
scripts painful and adding new CPU models requires existing scripts to
be updated. This patch adds the following class methods to the base
CPU and all relevant CPUs:

 * memory_mode -- Return a string describing the current memory mode
                  (invalid/atomic/timing).

 * require_caches -- Does the CPU model require caches?

 * support_take_over -- Does the CPU support CPU handover?
2013-02-15 17:40:08 -05:00
Ali Saidi
db5c478e70 arm: fix some fp comparisons that worked by accident.
The explict tests in the follwing fp comparison operations were
incorrect as they checked for only signaling NaNs and not quite-NaNs
as well. When compiled with gcc, the comparison generates a fp exception
that causes the FE_INVALID flag to be set and we check for it, so even
though the check was incorrect, the correct exception was set. With clang
this behavior seems to not occur. The checks are updated to test for nans and
the behavior is now correct with both clang and gcc.
2013-02-15 17:40:08 -05:00
Ali Saidi
4412046041 cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pickup set from else where,
one version of gcc 4.7 complains, so explictly add it.
2013-02-15 17:40:08 -05:00
Ali Saidi
68495a0748 ARM: Fix an issue with clang generating wrong code.
Clang generated executables would enter the if condition when it wasn't
supposted to, resulting in the wrong simulated behavior.
Implementing the operation this way is a bit faster anyway.
2013-02-15 17:40:08 -05:00
Ali Saidi
7ae06a3b3b cpu: fix case with o3 cpu blocking and unblocking decode in cycle
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a
single cycle sending both signals to fetch which causes an assert or worse.
The previous check could never work before since the status was set to Blocked
before a test for the status being Unblocking was executed.
2013-02-15 17:40:08 -05:00
Ali Saidi
b84bd3028c cpu: Fix a livelock in the o3 cpu.
Check if an instruction just enabled interrupts and we've previously had an
interrupt pending that was not handled because interrupts were subsequently
disabled before the pipeline reached a place to handle the interrupt. In that
case squash now to make sure the interrupt is handled.
2013-02-15 17:40:07 -05:00
Andreas Sandberg
d4eca0591d base: Add support for newer versions of IPython
IPython is used for the interactive gem5 shell if it exists. IPython
made API changes in version 0.11. This patch adds support for IPython
version 0.11 and above.

--HG--
extra : rebase_source : 5388d0919adb58d97f49a1a637db48cba61283a3
2013-02-10 13:23:58 +01:00
Andreas Hansson
7c6bc52bf5 Ruby: Fix compilation errors on gcc 4.7 and clang 3.2
This patch fixes a few (recently added) errors that prevented gem5 from
compiling on more recent versions of gcc and clang.
2013-02-14 12:24:51 -05:00
Nilay Vaish
71c27e6370 ruby: MI protocol: add a missing transition
The transition for state MII and event Store was found missing during testing.
The transition is being added. The controller will not stall the Store request
in state MII
2013-02-10 21:43:18 -06:00
Nilay Vaish
cb7782f78d ruby: enable multiple clock domains
This patch allows ruby to have multiple clock domains. As I understand
with this patch, controllers can have different frequencies. The entire
network needs to run at a single frequency.

The idea is that with in an object, time is treated in terms of cycles.
But the messages that are passed from one entity to another should contain
the time in Ticks. As of now, this is only true for the message buffers,
but not for the links in the network. As I understand the code, all the
entities in different networks (simple, garnet-fixed, garnet-flexible) should
be clocked at the same frequency.

Another problem is that the directory controller has to operate at the same
frequency as the ruby system. This is because the memory controller does
not make use of the Message Buffer, and instead implements a buffer of its
own. So, it has no idea of the frequency at which the directory controller
is operating and uses ruby system's frequency for scheduling events.
2013-02-10 21:43:17 -06:00
Nilay Vaish
253e8edf13 ruby: replace Time with Cycles (final patch in the series)
This patch is as of now the final patch in the series of patches that replace
Time with Cycles.This patch further replaces Time with Cycles in Sequencer,
Profiler, different protocols and related entities.

Though Time has not been completely removed, the places where it is in use
seem benign as of now.
2013-02-10 21:43:10 -06:00
Nilay Vaish
f6e3ab7bd4 ruby: replace Time with Cycles in garnet fixed and flexible 2013-02-10 21:43:09 -06:00
Nilay Vaish
9d6d6c6718 ruby: replace Time with Tick in replacement policy classes 2013-02-10 21:43:08 -06:00
Nilay Vaish
221d39284e ruby: convert block size, memory size to unsigned 2013-02-10 21:43:07 -06:00
Nilay Vaish
5e33045a2a ruby: replace Time with Cycles in MessageBuffer 2013-02-10 21:26:26 -06:00
Nilay Vaish
b742081cc1 ruby: replace Time with Cycles in Memory Controller 2013-02-10 21:26:25 -06:00
Nilay Vaish
89f86dbd28 ruby: Replace Time with Cycles in SequencerMessage 2013-02-10 21:26:25 -06:00
Nilay Vaish
7862478eef ruby: replace Time with Cycles in Message class
Concomitant changes are being committed as well, including the io operator<<
for the Cycles class.
2013-02-10 21:26:24 -06:00
Nilay Vaish
d3aebe1f91 ruby: replaces Time with Cycles in many places
The patch started of with replacing Time with Cycles in the Consumer class.
But to get ruby to compile, the rest of the changes had to be carried out.
Subsequent patches will further this process, till we completely replace
Time with Cycles.
2013-02-10 21:26:24 -06:00
Nilay Vaish
affd77ea77 base: add some mathematical operators to Cycles class 2013-02-10 21:26:23 -06:00
Nilay Vaish
bc1daae7fd ruby: modifies histogram add() function
This patch modifies the Histogram class' add() function so that it can add
linear histograms as well. The function assumes that the left end point of
the ranges of the two histograms are the same. It also assumes that when
the ranges of the two histogram are changed to accomodate an element not in
the range, the factor used in changing the range is same for both the
histograms.

This function is then used in removing one of the calls to the global
profiler*. The histograms for recording the delays incurred in processing
different requests are now maintained by the controllers. The profiler
adds these histograms when it needs to print the stats.
2013-02-10 21:26:22 -06:00
Nilay Vaish
a49b1df3f0 ruby: record fully busy cycle with in the controller
This patch does several things. First, the counter for fully busy cycles for a
controller is now kept with in the controller, instead of being part of the profiler.
Second, the topology class no longer keeps an array of controllers which was only
used for printing stats. Instead, ruby system will now ask each controller to print
the stats. Thirdly, the statistical variable for recording how many different types
were created is being moved in to the controller from the profiler. Note that for
printing, the profiler will collate results from different controllers.
2013-02-10 21:26:22 -06:00
Andreas Sandberg
10f1f8c6a4 base: Fix broken IPython argument handling
Prior to this changeset, we used to clear sys.argv before entering the
IPython shell. This caused some versions of IPython to crash because
they assume argv[0] to exist. The correct way of overriding the
arguments passed to IPython is to set the argv keyword argument when
initializing the shell.
2013-02-10 13:23:56 +01:00
Nilay Vaish
87ea04ab2f sim: remove unused struct priority_compare 2013-01-31 21:26:29 -06:00
Nilay Vaish
6aed4d4f93 ruby: correct computation of number of bits required for address
The number of bits required for an address was set to floorLog2(memory size).
This is correct under the assumption that the memory size is a power of 2,
which is not always true. Hence, floorLog2 is being replaced with ceilLog2.
2013-01-31 09:44:20 -06:00
Andreas Hansson
a4288dabf9 mem: Add comments for the DRAM address decoding
This patch adds more verbose comments to explain the two different
address mapping schemes of the DRAM controller.
2013-01-31 07:49:18 -05:00
Andreas Hansson
c4898b15bc mem: Add DDR3 and LPDDR2 DRAM controller configurations
This patch moves the default DRAM parameters from the SimpleDRAM class
to two different subclasses, one for DDR3 and one for LPDDR2. More can
be added as we go forward.

The regressions that previously used the SimpleDRAM are now using
SimpleDDR3 as this is the most similar configuration.
2013-01-31 07:49:14 -05:00
Ani Udipi
eaa37e611f mem: Add tTAW and tFAW to the SimpleDRAM model
This patch adds two additional scheduling constraints to the DRAM
controller model, to constrain the activation rate. The two metrics
are determine the size of the activation window in terms of the number
of activates and the minimum time required for that number of
activates. This maps to current DDRx, LPDDRx and WIOx standards that
have either tFAW (4 activate window) or tTAW (2 activate window)
scheduling constraints.
2013-01-31 07:49:14 -05:00
Andreas Hansson
b7153e2a64 mem: Separate out the different cases for DRAM bus busy time
This patch changes how the data bus busy time is calculated such that
it is delayed to the actual scheduling time of the request as opposed
to being done as soon as possible.

This patch changes a bunch of statistics, and the stats update is
bundled together with the introruction of tFAW/tTAW and the named DRAM
configurations like DDR3 and LPDDR2.
2013-01-31 07:49:13 -05:00
Anthony Gutierrez
af0f8b31db cache: remove drainManager because it's not used
the cache drainManager is set but never cleared, this is because
the cache itself does not need to be drained and thus never
triggers a signalDrainDone(). because the drainManager variable
is not used properly and does not appear to be necessary it has
been removed with this patch.
2013-01-28 20:19:42 -05:00
Nilay Vaish
a8eb5b18e0 ruby: remove get_time()
This patch replaces get_time() in *.sm files with curCycle() which
is now possible since controllers are clocked objects.
2013-01-28 06:14:18 -06:00
Nilay Vaish
31659e83fb ruby: remove call to curCycle in panic()
The panic() function already prints the current tick value. This call to
curCycle() is as such redundant. Since we are trying to move towards multiple
clock domains, this call will print misleading time.
2013-01-28 06:11:42 -06:00
Nilay Vaish ext:(%2C%20Timothy%20Jones%20%3Ctimothy.jones%40cl.cam.ac.uk%3E)
dbeabedaf0 branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.

This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.

--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
2013-01-24 12:28:51 -06:00
Andrea Pellegrini
11d5ffa108 o3 cpu: fix zero reg problem
There was an issue w/ the rename logic, which would assign a previous physical
register to the ZeroReg architectural register in x86.  This issue was giving
problems for instructions squashed in threads w/ ID different from 0,
sometimes allowing non-mispredicted instructions to obtain a value different
from zero when reading the zeroReg.
2013-01-22 00:13:28 -06:00
Nilay Vaish
fc57ae6401 x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.

This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.
2013-01-22 00:10:10 -06:00
Joel Hestness
1429d21244 O3 IEW: Make incrWb and decrWb clearer
Move the increment/decrement of wbOutstanding outside of the comparison
in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc
4.4.7, which incorrectly optimizes "-- ==" as "-=".
2013-01-19 15:14:54 -06:00
Nilay Vaish
5b6f972750 ruby: remove calls to g_system_ptr->getTime()
This patch further removes calls to g_system_ptr->getTime() where ever other
clocked objects are available for providing current time.
2013-01-17 13:10:12 -06:00
Nilay Vaish
f2bcf4f01c x86 cpuid: enable clflush
Note that clflush is only being enabled. It is not implemented
in actual. A warning is printed if the cpu encounters a clflush
instruction. We need to enable this instruction in cpuid since
JRE 1.7 tests for it.
2013-01-15 07:43:21 -06:00
Nilay Vaish
ac9bb51405 x86: implements fsin, fcos instructions 2013-01-15 07:43:21 -06:00
Nilay Vaish
7f5463539b x86: implements emms instruction 2013-01-15 07:43:20 -06:00
Nilay Vaish
91b00d98a5 x86: implement fabs, fchs instructions 2013-01-15 07:43:19 -06:00
Malek Musleh
1abf950f3c ruby sequencer: converts cycles to ticks in deadlock panic()
This patch converts the panic() print outs in the Sequencer::wakeup()
call from ruby cycles to Ticks(). This makes it easier to debug deadlocks
with the ProtocolTrace flag so the issue time indicated in the panic message
can be quickly searched for.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-14 10:05:12 -06:00
Nilay Vaish
2012983718 Ruby: remove reference to g_system_ptr from class Message
This patch was initiated so as to remove reference to g_system_ptr,
the pointer to Ruby System that is used for getting the current time.
That simple change actual requires changing a lot many things in slicc and
garnet. All these changes are related to how time is handled.

In most of the places, g_system_ptr has been replaced by another clock
object. The changes have been done under the assumption that all the
components in the memory system are on the same clock frequency, but the
actual clocks might be distributed.
2013-01-14 10:05:10 -06:00
Nilay Vaish
cf232de461 Ruby: use ClockedObject in Consumer class
Many Ruby structures inherit from the Consumer, which is used for scheduling
events. The Consumer used to relay on an Event Manager for scheduling events
and on g_system_ptr for time. With this patch, the Consumer will now use a
ClockedObject to schedule events and to query for current time. This resulted
in several structures being converted from SimObjects to ClockedObjects. Also,
the MessageBuffer class now requires a pointer to a ClockedObject so as to
query for time.
2013-01-14 10:04:21 -06:00
Andreas Hansson
cbbc4c7f6b scons: Address clang 3.2 compilation error
This patch fixes a compilation error encountered using clang 3.2 on OSX.
2013-01-14 10:23:56 -05:00
Nilay Vaish
f7c0ba406e base simple cpu: removes commented out code about cache ops 2013-01-12 22:11:16 -06:00
Nilay Vaish
25ec278a0b x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.

This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.
2013-01-12 22:09:48 -06:00
Lluís Vilanova
807168a1de util: add m5_fail op.
Used as a command in full-system scripts helps the user ensure the benchmarks have finished successfully.

For example, one can use:

    /path/to/benchmark args || /sbin/m5 fail 1

and thus ensure gem5 will exit with an error if the benchmark fails.
2013-01-08 08:54:12 -05:00
Tao Zhang
858d99b7cc sim: Fix early termination in multi-core simulation under SE mode.
When "-I" (maximum instruction number) and "-F" (fastforward instruction
number) are applied together, gem5 immediately exits after the cpu switching.
The reason is that multiple exit events may be generated in the same cycle by
Atomic CPU and inserted to mainEventQueue. However, mainEventQueue can only
serve one exit event in one cycle. Therefore, the rest exit events are left in
mainEventQueue without being descheduled or deleted, which causes gem5 exits
immediately after the system resumes by cpu switching.
2013-01-08 08:54:11 -05:00
Mitch Hayenga
4a752b1655 arm: add access syscall for ARM SE mode
This patch adds the "access" syscall for ARM SE as required by some spec2006
benchmarks.
2013-01-08 08:54:07 -05:00
Mitch Hayenga
c7dbd5e768 mem: Make LL/SC locks fine grained
The current implementation in gem5 just keeps a list of locks per cacheline.
Due to this, a store to a non-overlapping portion of the cacheline can cause an
LL/SC pair to fail.  This patch simply adds an address range to the lock
structure, so that the lock is only invalidated if the store overlaps the lock
range.
2013-01-08 08:54:07 -05:00
Mitch Hayenga
dc4a0aa2fa mem: Fix use-after-free bug
Running with valgrind I noticed a use after free originating from
simple_mem.cc.  It looks like this is a known issue and this additional call
site was missed in an earlier patch.
2013-01-08 08:54:06 -05:00
Andreas Sandberg
8480615d8d dev: Fix infinite recursion in DMA devices
The DMA device sometimes calls the process() method on a completion
event directly instead of scheduling it on the current tick. This
breaks some devices that assume that the completion handler won't be
called until the current event handler has returned. Specifically, it
causes infinite recursion in the IdeDisk component because it does not
advance its chunk generator until after a dmaRead()/dmaWrite() has
returned. This changeset removes this mico-optimization and schedules
the event in the current tick instead. This way the semantics event
handling stay the same even when the delay is 0.
2013-01-07 16:56:39 -05:00
Sascha Bischoff
8a767885d6 stats: Fix swig wrapping for Tick in stats
Tick was not correctly wrapped for the stats system, and therefore it was not
possible to configure the stats dumping from the python scripts without
defining Ticks as long long. This patch fixes the wrapping of Tick by copying
the typemap of uint64_t to Tick.
2013-01-07 16:56:36 -05:00
Andreas Sandberg
009970f59b cpu: Unify the serialization code for all of the CPU models
Cleanup the serialization code for the simple CPUs and the O3 CPU. The
CPU-specific code has been replaced with a (un)serializeThread that
serializes the thread state / context of a specific thread. Assuming
that the thread state class uses the CPU-specific thread state uses
the base thread state serialization code, this allows us to restore a
checkpoint with any of the CPU models.
2013-01-07 13:05:52 -05:00
Andreas Sandberg
e09e9fa279 cpu: Flush TLBs on switchOut()
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent
stale translations when doing repeated switching. Additionally, the
TLB flushing functionality is exported to the Python to make debugging
of switching/checkpointing easier.

A simulation script will typically use the TLB flushing functionality
to generate a reference trace. The following sequence can be used to
simulate a handover (this depends on how drain is implemented, but is
generally the case) between identically configured CPU models:

  m5.drain(test_sys)
  [ cpu.flushTLBs() for cpu in test_sys.cpu ]
  m5.resume(test_sys)

The generated trace should normally be identical to a trace generated
when switching between identically configured CPU models or
checkpointing and resuming.
2013-01-07 13:05:48 -05:00
Andreas Sandberg
964aa49d15 mem: Fix guest corruption when caches handle uncacheable accesses
When the classic gem5 cache sees an uncacheable memory access, it used
to ignore it or silently drop the cache line in case of a
write. Normally, there shouldn't be any data in the cache belonging to
an uncacheable address range. However, since some architecture models
don't implement cache maintenance instructions, there might be some
dirty data in the cache that is discarded when this happens. The
reason it has mostly worked before is because such cache lines were
most likely evicted by normal memory activity before a TLB flush was
requested by the OS.

Previously, the cache model would invalidate cache lines when they
were accessed by an uncacheable write. This changeset alters this
behavior so all uncacheable memory accesses cause a cache flush with
an associated writeback if necessary. This is implemented by reusing
the cache flushing machinery used when draining the cache, which
implies that writebacks are performed using functional accesses.
2013-01-07 13:05:47 -05:00
Andreas Sandberg
1814a85a05 cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
9e8003148f cpu: Make sure that a drained atomic CPU isn't executing ucode
Currently, the atomic CPU can be in the middle of a microcode sequence
when it is drained. This leads to two problems:

 * When switching to a hardware virtualized CPU, we obviously can't
   execute gem5 microcode.

 * Since curMacroStaticInst is populated when executing microcode,
   repeated switching between CPUs executing microcode leads to
   incorrect execution.

After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset fixes a bug
where the multiple switches to the same atomic CPU sometimes corrupts
the target state because of dangling pointers to the currently
executing microinstruction.

Note: This changeset moves tick event descheduling from switchOut() to
drain(), which makes timing consistent between just draining a system
and draining /and/ switching between two atomic CPUs. This makes
debugging quite a lot easier (execution traces get the same timing),
but the latency of the last instruction before a drain will not be
accounted for correctly (it will always be 1 cycle).

Note 2: This changeset removes so_state variable, the locked variable,
and the tickEvent from checkpoints since none of them contain state
that needs to be preserved across checkpoints. The so_state is made
redundant because we don't use the drain state variable anymore, the
lock variable should never be set when the system is drained, and the
tick event isn't scheduled.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
f9bcf46371 cpu: Make sure that a drained timing CPU isn't executing ucode
Currently, the timing CPU can be in the middle of a microcode sequence
or multicycle (stayAtPC is true) instruction when it is drained. This
leads to two problems:

 * When switching to a hardware virtualized CPU, we obviously can't
   execute gem5 microcode.

 * If stayAtPC is true we might execute half of an instruction twice
   when restoring a checkpoint or switching CPUs, which leads to an
   incorrect execution.

After applying this patch, the CPU will be on a proper instruction
boundary, which means that it is safe to switch to any CPU model
(including hardware virtualized ones). This changeset also fixes a bug
where the timing CPU sometimes switches out with while stayAtPC is
true, which corrupts the target state after a CPU switch or
checkpoint.

Note: This changeset removes the so_state variable from checkpoints
since the drain state isn't used anymore.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
52ff37caa3 cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.

The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
fca4fea769 cpu: Fix O3 LSQ debug dumping constness and formatting 2013-01-07 13:05:46 -05:00
Andreas Sandberg
fb52ea9220 arm: Invalidate cached TLB configuration in drainResume
Currently, we invalidate the cached miscregs in
TLB::unserialize(). The intended use of the drainResume() method is to
invalidate cached state and prepare the system to resume after a CPU
handover or (un)serialization. This patch moves the TLB miscregs
invalidation code to the drainResume() method to avoid surprising
behavior.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
0d59549cd9 arm: Fix draining of the pagetable walker when squashing
Since the page table walker only checks if a drain has completed in
doL1DescriptorWrapper() and doL2DescriptorWrapper(), it sometimes
looses track of a drain request if there is a squash. This changeset
adds a completeDrain() call after squashing requests in the pending
queue, which fixes this issue.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
8db27aa230 cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
a2077ccf02 o3 cpu: Remove unused variables 2013-01-07 13:05:45 -05:00
Andreas Sandberg
1c3a1888d8 sim: Remove unused variables 2013-01-07 13:05:45 -05:00
Andreas Sandberg
2cfe62adc4 cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
2013-01-07 13:05:45 -05:00
Andreas Sandberg
f7da0fddd1 cpu: Remove unused params.hh header file in inorder CPU 2013-01-07 13:05:45 -05:00
Andreas Sandberg
38925ff621 arm: Remove the register mapping hack used when copying TCs
In order to see all registers independent of the current CPU mode, the
ARM architecture model uses the magic MISCREG_CPSR_MODE register to
change the register mappings without actually updating the CPU
mode. This hack is no longer needed since the thread context now
provides a flat interface to the register file. This patch replaces
the CPSR_MODE hack with the flat register interface.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
a7e0cbeb36 cpu: Introduce sanity checks when switching between CPUs
This patch introduces the following sanity checks when switching
between CPUs:

 * Check that the set of new and old CPUs do not overlap. Having an
   overlap between the set of new CPUs and the set of old CPUs is
   currently not supported. Doing such a switch used to result in the
   following assertion error:
     BaseCPU::takeOverFrom(BaseCPU*): \
       Assertion `!new_itb_port->isConnected()' failed.

 * Check that all new CPUs are in the switched out state.

 * Check that all old CPUs are in the switched in state.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
901258c22b cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure
that CPU models consistently call the parent on switchOut() and
takeOverFrom(). This has the following implications that might alter
current functionality:

 * The call to BaseCPU::switchout() in the O3 CPU is moved from
   signalDrained() (!) to switchOut().

 * A call to BaseSimpleCPU::switchOut() is introduced in the simple
   CPUs.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
4ae02295d5 cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.

This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
6daada2701 cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
e2dad8236a cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode
of operations. There is currently no architecture independent way of
accessing all registers. This patch introduces a flat register
interface to the ThreadContext class. This interface is useful, for
example, when serializing or copying thread contexts.
2013-01-07 13:05:44 -05:00
Andreas Sandberg
17b47d35e1 arch: Move the ISA object to a separate section
After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.

Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.
2013-01-07 13:05:42 -05:00
Andreas Sandberg
7eb0fb8b6e cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory
system is in the correct mode at startup and when resuming after a
drain.  Previously, we only checked that the memory system was in the
right mode when resuming. This is inadequate since this is a
configuration error that should be detected at startup as well as when
resuming. Additionally, since the check was done using an assert, it
wasn't performed when NDEBUG was set (e.g., the fast target).
2013-01-07 13:05:41 -05:00
Andreas Sandberg
94561dd526 arch: Add support for invalidating TLBs when draining
This patch adds support for the memInvalidate() drain method.  TLB
flushing is requested by calling the virtual flushAll() method on the
TLB.

Note: This patch renames invalidateAll() to flushAll() on x86 and
SPARC to make the interface consistent across all supported
architectures.
2013-01-07 13:05:40 -05:00
Andreas Sandberg
d44f2f611f mem: Remove the IIC replacement policy
The IIC replacement policy seems to be unused and has probably
gathered too much bit rot to be useful. This patch removes the IIC and
its associated cache parameters.
2013-01-07 13:05:39 -05:00
Andreas Hansson
9364d35b8b dev: Do not serialize timer parameters
This patch removes the intNum and clock from the serialized scalars as
these are set by the Python parameters and should not be part of the
checkpoint.
2013-01-07 13:05:39 -05:00
Andreas Hansson
406891c62a scons: Enforce gcc >= 4.4 or clang >= 2.9 and c++0x support
This patch checks that the compiler in use is either gcc >= 4.4 or
clang >= 2.9. and enables building with --std=c++0x in all cases. As a
consequence, we can tidy up the hashmap and always have static_assert
available. If anyone wants to use alternative compilers, icc for
example supports c++0x to a similar level and could be added if
needed.

This patch opens up for a more elaborate use of c++0x features that
are present in gcc 4.4 and clang 2.9, e.g. auto typed variables,
variadic templates, rvalues and move semantics, and strongly typed
enums. There will be no going back on this one...
2013-01-07 13:05:39 -05:00
Andreas Hansson
221302335b scons: Remove stale compiler options
This patch simply prunes the SUNCC and ICC compiler options as they
are both sufficiently stale that they would have to be re-written from
scratch anyhow. The patch serves to clean things up before shifting to
a build environment that enforces basic c++11 compliance as done in
the following patch.
2013-01-07 13:05:39 -05:00
Andreas Hansson
921490a060 sim: Fatal if a clocked object is set to have a clock of 0
This patch adds a check to the clocked object constructor to ensure it
is not configured to have a clock period of 0.
2013-01-07 13:05:39 -05:00
Andreas Hansson
490dc30d96 dev: Make the ethernet devices use a non-zero clock
This patch changes the NS gige controller to have a non-clock, and
sets the default to 500 MHz. The blocks that could prevoiusly be
by-passed with a zero clock are now always present, and the user is
left with the option of setting a very high clock frequency to achieve
a similar performance.
2013-01-07 13:05:39 -05:00
Chander Sudanthi
694a81e994 ARM: pl111/LCD framebuffer checkpointing fix
Fixed check pointing of the framebuffer.  Previously, the pixel size was not
considered in determining the size of the buffer to checkpoint.  This patch
checkpoints the entire framebuffer instead of the first quarter.
2013-01-07 13:05:39 -05:00
Andreas Sandberg
c3551e82f7 arch: Fix broken M5VarArgsFault initialization
At least gcc 4.4.3 seems to get confused by the use of func both as a
template parameter and a member variable in the M5VarArgsFault
class. This causes the value of the member variable func to be
unpredictable in M5VarArgsFault objects. This changeset renames the
template parameter to remove this ambiguity.
2013-01-07 13:05:38 -05:00
Andreas Hansson
18b147acef mem: Merge ranges that are part of the conf table
This patch adds basic merging of address ranges when determining which
address ranges should be reported in the configuration table. By
performing this merging it is possible to distribute an address range
across many memory channels (controllers). This is essential to enable
address interleaving.
2013-01-07 13:05:38 -05:00
Andreas Hansson
b8c2fa6ba9 base: Add support for merging of interleaved address ranges
This patch adds support for merging a vector of interleaved address
ranges into a contigous range. The functionality will be used in the
interconnect and the PhysicalMemory to transform interleaved memory
ranges to contigous ranges before passing them on.

The actual use of the merging is appearing in future patches.
2013-01-07 13:05:38 -05:00
Andreas Hansson
01c5598373 mem: Add interleaving bits to the address ranges
This patch adds support for interleaving bits for the address
ranges. What was previously just a start and end address, now has an
additional three fields, for the high bit, and number of bits to use
for interleaving, and a match value to compare against. If the number
of interleaving bits is set to zero it is effectively disabled.

A number of convenience functions are added to the range to enquire
about the interleaving, its granularity and the number of stripes it
is part of.
2013-01-07 13:05:38 -05:00
Andreas Hansson
e6c57786a4 config: Traverse lists when visiting children in all proxy
This patch makes the all proxy traverse any potential list that is
encountered in the object hierarchy instead of only looking at
children that are SimObjects. An example of where this is useful is
when creating a multi-channel memory system as a list of controllers,
whilst ensuring that the memories are still visible in the system.
2013-01-07 13:05:38 -05:00
Andreas Hansson
e0d93fde99 base: Simplify the AddrRangeMap by removing unused code
This patch cleans up the AddrRangeMap in preparation for the addition
of interleaving by removing unused code. The non-const editions of
find are never used, and hence the duplication is not needed.
2013-01-07 13:05:38 -05:00
Andreas Hansson
e65de3f5ca config: Do not use hardcoded physmem in fs script
This patch generalises the address range resolution for the I/O cache
and I/O bridge such that they do not assume a single memory. The patch
involves adding a parameter to the system which is then defined based
on the memories that are to be visible from the I/O subsystem, whether
behind a cache or a bridge.

The change is needed to allow interleaved memory controllers in the
system.
2013-01-07 13:05:38 -05:00
Andreas Hansson
15a979c6be mem: Tidy up bus addr range debug messages
This patch tidies up a number of the bus DPRINTFs related to range
manipulation. In particular, it shifts the message about range changes
to the start of the member function, and also adds information about
when all ranges are received.
2013-01-07 13:05:38 -05:00
Andreas Hansson
caf6786ad5 mem: Skip address mapper range checks to allow more flexibility
This patch makes the address mapper less stringent about checking the
before and after ranges, i.e. the original and remapped ranges. The
checks were not really necessary, and there are situations when the
previous checks were too strict.
2013-01-07 13:05:38 -05:00
Andreas Hansson
71da1d2157 base: Encapsulate the underlying fields in AddrRange
This patch makes the start and end address private in a move to
prevent direct manipulation and matching of ranges based on these
fields. This is done so that a transition to ranges with interleaving
support is possible.

As a result of hiding the start and end, a number of member functions
are needed to perform the comparisons and manipulations that
previously took place directly on the members. An accessor function is
provided for the start address, and a function is added to test if an
address is within a range. As a result of the latter the != and ==
operator is also removed in favour of the member function. A member
function that returns a string representation is also created to allow
debug printing.

In general, this patch does not add any functionality, but it does
take us closer to a situation where interleaving (and more cleverness)
can be added under the bonnet without exposing it to the user. More on
that in a later patch.
2013-01-07 13:05:38 -05:00
Andreas Hansson
cfdaf53104 mem: Remove the joining of neighbouring ranges
This patch temporarily removes the joining of ranges when creating the
backing store, to reserve this functionality for the interleaved
ranges that are about to be introduced.

When creating the mmaps for the backing store, there is no point in
creating larger contigous chunks that what is necessary. The larger
chunks will only make life more difficult for the host.

Merging will be re-added later, but then only for interleaved ranges.
2013-01-07 13:05:38 -05:00
Andreas Hansson
ccb6c64047 cpu: Share the send functionality between traffic generators
This patch moves the packet creating and sending to a member function
in the shared base class to avoid code duplication.
2013-01-07 13:05:37 -05:00
Andreas Hansson
1da209140c cpu: Add support for protobuf input for the trace generator
This patch adds support for reading input traces encoded using
protobuf according to what is done in the CommMonitor.

A follow-up patch adds a Python script that can be used to convert the
previously used ASCII traces to protobuf equivalents. The appropriate
regression input is updated as part of this patch.
2013-01-07 13:05:37 -05:00
Andreas Hansson
35bdee72cb cpu: Encapsulate traffic generator input in a stream
This patch encapsulates the traffic generator input in a stream class
such that the parsing is not visible to the trace generator. The
change takes us one step closer to using protobuf-based input traces
for the trace replay.

The functionality of the current input stream is identical to what it
was, and the ASCII format remains the same for now.
2013-01-07 13:05:37 -05:00
Andreas Hansson
4afa6c4c3e base: Add wrapped protobuf input stream
This patch adds support for inputting protobuf messages through a
ProtoInputStream which hides the internal streams used by the
library. The stream is created based on the name of an input file and
optionally includes decompression using gzip.

The input stream will start by getting a magic number from the file,
and also verify that it matches with the expected value. Once opened,
messages can be read incrementally from the stream, returning
true/false until an error occurs or the end of the file is reached.
2013-01-07 13:05:37 -05:00
Andreas Hansson
f456c7983d mem: Add tracing support in the communication monitor
This patch adds packet tracing to the communication monitor using a
protobuf as the mechanism for creating the trace.

If no file is specified, then the tracing is disabled. If a file is
specified, then for every packet that is successfully sent, a protobuf
message is serialized to the file.
2013-01-07 13:05:37 -05:00
Andreas Hansson
11ab30fa5a base: Add wrapped protobuf output streams
This patch adds support for outputting protobuf messages through a
ProtoOutputStream which hides the internal streams used by the
library. The stream is created based on the name of an output file and
optionally includes compression using gzip.

The output stream will start by putting a magic number in the file,
and then for every message that is serialized prepend the size such
that the stream can be written and read incrementally. At this point
this merely serves as a proof of concept.
2013-01-07 13:05:37 -05:00
Andreas Hansson
41f228c2ea scons: Add support for google protobuf building
This patch enables the use of protobuf input files in the build
process, thus allowing .proto files to be added to input. Each .proto
file is compiled using the protoc tool and the newly created C++
source is added to the list of sources.

The first location where the protobufs will be used is in the
capturing and replay of memory traces, involving the communication
monitor and the trace-generator state of the traffic generator. This
will follow in the next patch.

This patch does add a dependency on the availability of the BSD
licensed protobuf library (and headers), and the protobuf compiler,
protoc. These dependencies are checked in the SConstruct, similar to
e.g. swig. The user can override the use of protoc from the PATH by
specifying the PROTOC environment variable.

Although the dependency on libprotobuf and protoc might seem like a
big step, they add significant value to the project going
forward. Execution traces and other types of traces could easily be
added and parsers for C++ and Python are automatically generated. We
could also envision using protobufs for the checkpoints, description
of the traffic-generator behaviour etc. The sky is the limit. We could
also use the GzipOutputStream from the protobuf library instead of the
current GPL gzstream.

Currently, only the C++ source and header is generated. Going forward
we might want to add the Python output to support simple command-line
tools for displaying and editing the traces.
2013-01-07 13:05:37 -05:00
Andreas Sandberg
63f1d0516d arm: Fix DMA event handling bug in the PL111 model
The PL111 model currently maintains a list of pre-allocated
DmaDoneEvents to prevent unnecessary heap allocations. This list
effectively works like a stack where the top element is the latest
scheduled event. When an event triggers, the top pointer is moved down
the stack. This obviously breaks since events usually retire from the
bottom (events don't necessarily have to retire in order), which
triggers the following assertion:

gem5.debug: build/ARM/dev/arm/pl111.cc:460: void Pl111::fillFifo(): \
  Assertion `!dmaDoneEvent[dmaPendingNum-1].scheduled()' failed.

This changeset adds a vector listing the currently unused events. This
vector acts like a stack where the an element is popped off the stack
when a new event is needed an pushed on the stack when they trigger.
2013-01-07 13:05:37 -05:00
Andreas Hansson
fffdc6a450 dev: Fix the Pl111 timings by separating pixel and DMA clock
This patch fixes the Pl111 timings by creating a separate clock for
the pixel timings. The device clock is used for all interactions with
the memory system, just like the AHB clock on the actual module.

The result without this patch is that the module only is allowed to
send one request every tick of the 24MHz clock which causes a huge
backlog.
2013-01-07 13:05:36 -05:00
Andreas Hansson
f22d3bb9c3 cpu: Fix the traffic gen read percentage
This patch fixes the computation that determines whether to perform a
read or a write such that the two corner cases (0 and 100) are both
more efficient and handled correctly.
2013-01-07 13:05:35 -05:00
Andreas Hansson
852a7bcf92 mem: Add sanity check to packet queue size
This patch adds a basic check to ensure that the packet queue does not
grow absurdly large. The queue should only be used to store packets
that were delayed due to blocking from the neighbouring port, and not
for actual storage. Thus, a limit of 100 has been chosen for now
(which is already quite substantial).
2013-01-07 13:05:35 -05:00
Andreas Hansson
ce5fc494e3 ruby: Fix missing cxx_header in Switch
This patch addresses a warning related to the swig interface
generation for the Switch class. The cxx_header is now specified
correctly, and the header in question has got a few includes added to
make it all compile.
2013-01-07 13:05:35 -05:00
Chris Emmons
b7827a5aaa config: Replace second keyboard with a mouse.
The platform has two KMI devices that are both setup to be keyboards.  This
patch changes the second keyboard to a mouse.  This patch will allow keyboard
input as usual and additionally provide mouse support.
2013-01-07 13:05:35 -05:00
Andreas Hansson
174269978a mem: Fix a bug in the memory serialization file naming
This patch fixes a bug that caused multiple systems to overwrite each
other physical memory. The system name is now included in the filename
such that this is avoided.
2013-01-07 13:05:35 -05:00
Andreas Sandberg
0d1ad50326 arm: Make ID registers ISA parameters
This patch makes the values of ID_ISARx, MIDR, and FPSID configurable
as ISA parameter values. Additionally, setMiscReg now ignores writes
to all of the ID registers.

Note: This moves the MIDR parameter from ArmSystem to ArmISA for
consistency.
2013-01-07 13:05:35 -05:00
Andreas Sandberg
3db3f83a5e arch: Make the ISA class inherit from SimObject
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.

This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
2013-01-07 13:05:35 -05:00
Ali Saidi
69d419f313 o3: Fix issue with LLSC ordering and speculation
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked
address. Previously we relied on the cache to handle the locking for us, however
some users on the gem5 mailing list reported a case where the cpu speculatively
executes a ll operation after a pending sc operation in the pipeline and that
makes the cache monitor valid. This should handle that case by invaliding the
local monitor.
2013-01-07 13:05:33 -05:00
Ali Saidi
5146a69835 cpu: rename the misleading inSyscall to noSquashFromTC
isSyscall was originally created because during handling of a syscall in SE
mode the threadcontext had to be updated. However, in many places this is used
in FS mode (e.g. fault handlers) and the name doesn't make much sense. The
boolean actually stops gem5 from squashing speculative and non-committed state
when a write to a threadcontext happens, so re-name the variable to something
more appropriate
2013-01-07 13:05:33 -05:00
Ali Saidi
9a645d6e9b cache: add note about where conflicts are handled 2013-01-07 13:05:32 -05:00
Gabe Black
e17c375ddd Decoder: Remove the thread context get/set from the decoder.
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 19:00:45 -06:00
Gabe Black
d1965af220 X86: Move address based decode caching in front of the predecoder.
The predecoder in x86 does a lot of work, most of which can be skipped if the
decoder cache is put in front of it.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 19:00:44 -06:00
Gabe Black
63b10907ef SPARC: Keep a copy of the current ASI in the decoder.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 18:09:45 -06:00
Gabe Black
a83e74b37a ARM: Keep a copy of the fpscr len and stride fields in the decoder.
Avoid reading them every instruction, and also eliminate the last use of the
thread context in the decoders.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-01-04 18:09:35 -06:00
Nilay Vaish
e9fa54de58 x86: implement x87 fp instruction fnstsw
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
2012-12-30 12:45:50 -06:00
Nilay Vaish
23ba6fc5fb x86: implement x87 fp instruction fsincos
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
2012-12-30 12:45:45 -06:00
Nathanael Premillieu
3026a116ba arm: set uopSet_uop as conditional or unconditional control
uopSet_uop is microop instruction that has the IsControl flags set, but the
IsCondControl or IsUncondControl flags seems not to be set, neither in
the construction nor where the microop is used. This patch adds the the
flags in the constructor of the instruction (MicroUopSetPCCPSR).

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-12 09:50:33 -06:00
Nathanael Premillieu
84fc57bfe6 arm: set movret_uop as conditional or unconditional control
A flag was missing for the movret_uop microop instruction. This patch adds
that flag when the instruction is used, not directly in the constructor of
the instruction.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-12 09:50:16 -06:00
Nilay Vaish
f3d0be210f ruby: add support for prefetching to MESI protocol 2012-12-11 10:05:56 -06:00
Nilay Vaish
c120273708 ruby: modify the directed tester to read/write streams
The directed tester supports only generating only read or only write accesses. The
patch modifies the tester to support streams that have both read and write accesses.
2012-12-11 10:05:55 -06:00
Nilay Vaish
9b72a0f627 ruby: change slicc to allow for constructor args
The patch adds support to slicc for recognizing arguments that should be
passed to the constructor of a class. I did not like the fact that an explicit
check was being carried on the type 'TBETable' to figure out the arguments to
be passed to the constructor.
The patch also moves some of the member variables that are declared for all
the controllers to the base class AbstractController.
2012-12-11 10:05:55 -06:00
Nilay Vaish
93e283abb3 ruby: add a prefetcher
This patch adds a prefetcher for the ruby memory system. The prefetcher
is based on a prefetcher implemented by others (well, I don't know
who wrote the original). The prefetcher does stride-based prefetching,
both unit and non-unit. It obseves the misses in the cache and trains on
these. After the training period is over, the prefetcher starts issuing
prefetch requests to the controller.
2012-12-11 10:05:54 -06:00
Nilay Vaish
d502384795 ruby: add functions for computing next stride/page address 2012-12-11 10:05:53 -06:00
Erik Tomusk
3dc7e4f496 TournamentBP: Fix some bugs with table sizes and counters
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.

There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.

The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 09:31:06 -06:00
Malek Musleh
150e9b8c68 inorder cpu: add missing DPRINTF argument
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 05:25:40 -06:00
Nathanael Premillieu
eb899407c5 o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-12-06 04:36:51 -06:00
Nilay Vaish
2d6470936c sim: have a curTick per eventq
This patch adds a _curTick variable to an eventq. This variable is updated
whenever an event is serviced in function serviceOne(), or all events upto
a particular time are processed in function serviceEvents(). This change
helps when there are eventqs that do not make use of curTick for scheduling
events.
2012-11-16 10:27:47 -06:00
Nilay Vaish
90c45c29fe ruby: support functional accesses in garnet flexible network 2012-11-10 17:18:01 -06:00
Nilay Vaish
1492ab066d ruby: bug in functionalRead, revert recent changes
Recent changes to functionalRead() in the memory system was not correct.
The change allowed for returning data from the first message found in
the buffers of the memory system. This is not correct since it is possible
that a timing message has data from an older state of the block.

The changes are being reverted.
2012-11-10 17:18:00 -06:00
Andreas Hansson
c4b36901d0 mem: Fix DRAM draining to ensure write queue is empty
This patch fixes the draining of the SimpleDRAM controller model. The
controller performs buffering of writes and normally there is no need
to ever empty the write buffer (if you have a fast on-chip memory,
then use it). The patch adds checks to ensure the write buffer is
drained when the controller is asked to do so.
2012-11-08 04:25:06 -05:00
Hamid Reza Khaleghzadeh ext:(%2C%20Lluc%20Alvarez%20%3Clluc.alvarez%40bsc.es%3E%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E)
8cd475d58e ruby: reset and dump stats along with reset of the system
This patch adds support to ruby so that the statistics maintained by ruby
are reset/dumped when the statistics for the rest of the system are
reset/dumped. For resetting the statistics, ruby now provides the
resetStats() function that a sim object can provide. As a consequence, the
clearStats() function has been removed from RubySystem. For dumping stats,
Ruby now adds a callback event to the dumpStatsQueue. The exit callback that
ruby used to add earlier is being removed.

Created by: Hamid Reza Khaleghzadeh.
Improved by: Lluc Alvarez, Nilay Vaish
Committed by: Nilay Vaish
2012-11-02 12:18:25 -05:00
Ali Saidi
ce5766c409 mem: fix use after free issue in memories until 4-phase work complete. 2012-11-02 11:50:16 -05:00
Andreas Sandberg
ddd6af414c mem: Add support for writing back and flushing caches
This patch adds support for the following optional drain methods in
the classical memory system's cache model:

memWriteback() - Write back all dirty cache lines to memory using
functional accesses.

memInvalidate() - Invalidate all cache lines. Dirty cache lines
are lost unless a writeback is requested.

Since memWriteback() is called when checkpointing systems, this patch
adds support for checkpointing systems with caches. The serialization
code now checks whether there are any dirty lines in the cache. If
there are dirty lines in the cache, the checkpoint is flagged as bad
and a warning is printed.
2012-11-02 11:32:02 -05:00
Andreas Sandberg
050f24c796 sim: Add drain methods to request additional cleanup operations
This patch adds the following two methods to the Drainable base class:

memWriteback() - Write back all dirty cache lines to memory using
functional accesses.

memInvalidate() - Invalidate memory system buffers. Dirty data
won't be written back.

Specifying calling memWriteback() after draining will allow us to
checkpoint systems with caches. memInvalidate() can be used to drop
memory system buffers in preparation for switching to an accelerated
CPU model that bypasses the gem5 memory system (e.g., hardware
virtualized CPUs).

Note: This patch only adds the methods to Drainable, the code for
flushing the TLB and the cache is committed separately.
2012-11-02 11:32:02 -05:00
Andreas Sandberg
aae6134b54 sim: Add SWIG interface for Serializable
This changeset adds a SWIG interface for the Serializable class, which
fixes a warning when compiling the SWIG interface for the event
queue. Currently, the only method exported is the name() method.
2012-11-02 11:32:02 -05:00
Andreas Sandberg
dc01535c7e python: Rename doDrain()->drain() and make it do the right thing
There is no point in exporting the old drain() method in
Simulate.py. It should only be used internally by doDrain(). This
patch moves the old drain() method into doDrain() and renames
doDrain() to drain().
2012-11-02 11:32:02 -05:00
Andreas Sandberg
196397fea4 sim: Reuse the code to change memory mode.
changeToAtomic and changeToTiming both do essentially the same thing,
they check the type of their input argument, drain the system, and
switch to the desired memory mode. This patch moves all of that code
to a separate method (changeMemoryMode) and calls that from both
changeToAtomic and changeToTiming.
2012-11-02 11:32:02 -05:00
Andreas Sandberg
b81a977e6a sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
eb703a4b4e cpu: O3 add a header declaring the DerivO3CPU
SWIG needs a complete declaration of all wrapped objects. This patch
adds a header file with the DerivO3CPU class and includes it in the
SWIG interface.

--HG--
rename : src/cpu/o3/cpu_builder.cc => src/cpu/o3/deriv.cc
2012-11-02 11:32:01 -05:00
Andreas Sandberg
ebe65a394b cpu: Add header files for checker CPUs
In order to create reliable SWIG wrappers, we need to include the
declaration of the wrapped class in the SWIG file. Previously, we
didn't expose the declaration of checker CPUs. This patch adds header
files for such CPUs and include them in the SWIG wrapper.

--HG--
rename : src/cpu/dummy_checker_builder.cc => src/cpu/dummy_checker.cc
rename : src/cpu/o3/checker_builder.cc => src/cpu/o3/checker.cc
2012-11-02 11:32:01 -05:00
Andreas Sandberg
df02047d5a dev: Fix ethernet device inheritance structure
The Python wrappers and the C++ should have the same object
structure. If this is not the case, bad things will happen when the
SWIG wrappers cast between an object and any of its base classes. This
was not the case for NSGigE and Sinic devices. This patch makes NSGigE
and Sinic inherit from the new EtherDevBase class, which in turn
inherits from EtherDevice. As a bonus, this removes some duplicated
statistics from the Sinic device.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
c0ab52799c sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.

This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
044a652587 pci: Make Python wrapper cast to the right type
The PCI base class is PciDev and not PciDevice, which is used by the
Python world. Make sure this is reflected in the wrapper code.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
249e318212 mips: Remove unused Python file
Remove BISystem.py, BareIronMipsSystem is already implemented in
MipsSystem.py.
2012-11-02 11:32:01 -05:00
Andreas Sandberg
49a799ce77 dev: Add missing inline declarations 2012-11-02 11:32:01 -05:00
Andreas Sandberg
00b3a57d88 base: Add missing header file to addr_range.hh. 2012-11-02 11:32:01 -05:00
Dam Sunwoo
81406018b0 ARM: dump stats and process info on context switches
This patch enables dumping statistics and Linux process information on
context switch boundaries (__switch_to() calls) that are used for
Streamline integration (a graphical statistics viewer from ARM).
2012-11-02 11:32:01 -05:00
Chander Sudanthi
322daba74c base: Fix a few incorrectly handled print format cases
This patch ensures cases like %0.6u, %06f, and %.6u are processed correctly.
The case like %06f is ambiguous and was made to match printf.  Also, this patch
removes the goto statement in cprintf.cc in favor of a function call.
2012-11-02 11:32:00 -05:00
Chander Sudanthi
55787cc0d0 base: split out the VncServer into a VncInput and Server classes
This patch adds a VncInput base class which VncServer inherits from.
Another class can implement the same interface and be used instead
of the VncServer, for example a class that replays Vnc traffic.

--HG--
rename : src/base/vnc/VncServer.py => src/base/vnc/Vnc.py
rename : src/base/vnc/vncserver.cc => src/base/vnc/vncinput.cc
rename : src/base/vnc/vncserver.hh => src/base/vnc/vncinput.hh
2012-11-02 11:32:00 -05:00
Dam Sunwoo
ac161c1d72 ISA: generic Linux thread info support
This patch takes the Linux thread info support scattered across
different ISA implementations (currently in ARM, ALPHA, and MIPS), and
unifies them into a single file.

Adds a few more helper functions to read out TGID, mm, etc.

ISA-specific information (e.g., ALPHA PCBB register) is now moved to
the corresponding isa_traits.hh files.
2012-11-02 11:32:00 -05:00
Ali Saidi
d0678d1c31 sim: Fix as issue where exit events on instr queues are used after freed. 2012-11-02 11:32:00 -05:00
Mrinmoy Ghosh
4440332bdd o3: Fix a couple of issues with the local predictor.
Fix some issues with the local predictor and the way it's indexed.
2012-11-02 11:32:00 -05:00
Andreas Sandberg
7e25052fee Partly revert [4f54b0f229b5] and move draining to m5.changeToTiming
Changeset 4f54b0f229b5 removed the call to doDrain in changeToTiming
based on the assumption that the system does not need draining when
running in atomic mode. This is a false assumption since at least the
System class requires the system to be drained before it allows
switching of memory modes. This patch reverts that part of the
changeset.
2012-11-02 11:32:00 -05:00
Andreas Hansson
3d98119717 mem: Fix typo in port comments
This patch merely fixes a few typos in the port comments.
2012-10-31 09:28:23 -04:00
Andreas Hansson
6f6adbf0f6 dev: Make default clock more reasonable for system and devices
This patch changes the default system clock from 1THz to 1GHz. This
clock is used by all modules that do not override the default (parent
clock), and primarily affects the IO subsystem. Every DMA device uses
its clock to schedule the next transfer, and the change will thus
cause this inter-transfer delay to be longer.

The default clock of the bus is removed, as the clock inherited from
the system provides exactly the same value.

A follow-on patch will bump the stats.
2012-10-25 13:14:44 -04:00
Andreas Hansson
1fdc4e850e arm: Use table walker clock that is inherited from CPU
This patch simplifies the scheduling of the next walk for the ARM
table walker. Previously it used the CPU clock, but as the table
walker inherits the clock from the CPU, it is cleaner to simply use
its own clock (which is the same).
2012-10-25 04:32:42 -04:00
Andreas Hansson
69e82539fd dev: Remove zero-time loop in DMA timing send
This patch removes the zero-time loop used to send items from the DMA
port transmit list. Instead of having a loop, the DMA port now uses an
event to schedule sending of a single packet.

Ultimately this patch serves to ease the transition to a blocking
4-phase handshake.

A follow-on patch will update the regression statistics.
2012-10-23 04:49:33 -04:00
Nilay Vaish
52d8693677 ruby: functional access updates to network test protocol
I had forgotten to change the network test protocol while making changes to
ruby for supporting functional accesses. This patch updates the protocol so
that it can compile correctly.
2012-10-18 18:35:42 -05:00
Nilay Vaish
5ffc165939 ruby: improved support for functional accesses
This patch adds support to different entities in the ruby memory system
for more reliable functional read/write accesses. Only the simple network
has been augmented as of now. Later on Garnet will also support functional
accesses.
The patch adds functional access code to all the different types of messages
that protocols can send around. These messages are functionally accessed
by going through the buffers maintained by the network entities.
The patch also rectifies some of the bugs found in coherence protocols while
testing the patch.

With this patch applied, functional writes always succeed. But functional
reads can still fail.
2012-10-15 17:51:57 -05:00
Nilay Vaish
07ce90f7aa memtest: move check on outstanding requests
The Memtest tester allows for only one request to be outstanding for a
particular physical address. The check has been written separately for
reads and writes. This patch moves the check earlier than its current
position so that it need not be written separately for reads and writes.
2012-10-15 17:27:17 -05:00
Nilay Vaish
61434a9943 ruby: register multiple memory controllers
Currently the Ruby System maintains pointer to only one of the memory
controllers. But there can be multiple controllers in the system. This
patch adds a vector of memory controllers.
2012-10-15 17:27:17 -05:00
Nilay Vaish
c14e6cfc4e ruby: remove AbstractMemOrCache
The only place where this abstract class is in use is the memory controller,
which it self is an abstract class. Does not seem useful at all.
2012-10-15 17:27:16 -05:00
Nilay Vaish
3e607f146f ruby: allow function definition in slicc structs
This patch adds support for function definitions to appear in slicc structs.
This is required for supporting functional accesses for different types of
messages. Subsequent patches will use this to development.
2012-10-15 17:27:16 -05:00
Nilay Vaish
c7b0901b97 ruby banked array: do away with event scheduling
It seems unecessary that the BankedArray class needs to schedule an event
to figure out when the access ends. Instead only the time for the end of access
needs to be tracked.
2012-10-15 17:27:15 -05:00
Nilay Vaish
6a65fafa52 ruby: reset timing after cache warm up
Ruby system was recently converted to a clocked object. Such objects maintain
state related to the time that has passed so far. During the cache warmup, Ruby
system changes its own time and the global time. Later on, the global time is
restored. So Ruby system also needs to reset its own time.
2012-10-15 17:27:15 -05:00
Andreas Hansson
b6bd4f34b4 Mem: Fix incorrect logic in bus blocksize check
This patch fixes the logic in the blocksize check such that the
warning is printed if the size is not 16, 32, 64 or 128.
2012-10-15 12:51:21 -04:00
Andreas Hansson
2a740aa096 Port: Add protocol-agnostic ports in the port hierarchy
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.

The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
2012-10-15 08:12:35 -04:00
Andreas Hansson
9baa35ba80 Mem: Separate the host and guest views of memory backing store
This patch moves all the memory backing store operations from the
independent memory controllers to the global physical memory. The main
reason for this patch is to allow address striping in a future set of
patches, but at this point it already provides some useful
functionality in that it is now possible to change the number of
memory controllers and their address mapping in combination with
checkpointing. Thus, the host and guest view of the memory backing
store are now completely separate.

With this patch, the individual memory controllers are far simpler as
all responsibility for serializing/unserializing is moved to the
physical memory. Currently, the functionality is more or less moved
from AbstractMemory to PhysicalMemory without any major
changes. However, in a future patch the physical memory will also
resolve any ranges that are interleaved and properly assign the
backing store to the memory controllers, and keep the host memory as a
single contigous chunk per address range.

Functionality for future extensions which involve CPU virtualization
also enable the host to get pointers to the backing store.
2012-10-15 08:12:32 -04:00
Andreas Hansson
d7ad8dc608 Checkpoint: Make system serialize call children
This patch changes how the serialization of the system works. The base
class had a non-virtual serialize and unserialize, that was hidden by
a function with the same name for a number of subclasses (most likely
not intentional as the base class should have been virtual). A few of
the derived systems had no specialization at all (e.g. Power and x86
that simply called the System::serialize), but MIPS and Alpha adds
additional symbol table entries to the checkpoint.

Instead of overriding the virtual function, the additional entries are
now printed through a virtual function (un)serializeSymtab. The reason
for not calling System::serialize from the two related systems is that
a follow up patch will require the system to also serialize the
PhysicalMemory, and if this is done in the base class if ends up being
between the general parts and the specialized symbol table.

With this patch, the checkpoint is not modified, as the order of the
segments is unchanged.
2012-10-15 08:12:29 -04:00
Andreas Hansson
0c58106b6e Mem: Use deque instead of list for bus retries
This patch changes the data structure used to keep track of ports that
should be told to retry. As the bus is doing this in an FCFS way,
there is no point having a list. A deque is a better match (and is at
least in theory a better choice from a performance point of view).
2012-10-15 08:12:25 -04:00
Andreas Hansson
93a159875a Fix: Address a few minor issues identified by cppcheck
This patch addresses a number of smaller issues identified by the code
inspection utility cppcheck. There are a number of identified leaks in
the arm/linux/system.cc (although the function only get's called once
so it is not a major problem), a few deletes in dev/x86/i8042.cc that
were not array deletes, and sprintfs where the character array had one
element less than needed. In the IIC tags there was a function
allocating an array of longs which is in fact never used.
2012-10-15 08:12:23 -04:00
Andreas Hansson
88554790c3 Mem: Use cycles to express cache-related latencies
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.

The stat blocked_cycles that actually counter in ticks is now updated
to count in cycles.

As the timing is now rounded to the clock edges of the cache, there
are some regressions that change. Plenty of them have very minor
changes, whereas some regressions with a short run-time are perturbed
quite significantly. A follow-on patch updates all the statistics for
the regressions.
2012-10-15 08:10:54 -04:00
Andreas Hansson
1c321b8847 Regression: Use CPU clock and 32-byte width for L1-L2 bus
This patch changes the CoherentBus between the L1s and L2 to use the
CPU clock and also four times the width compared to the default
bus. The parameters are not intending to fit every single scenario,
but rather serve as a better startingpoint than what we previously
had.

Note that the scripts that do not use the addTwoLevelCacheHiearchy are
not affected by this change.

A separate patch will update the stats.
2012-10-15 08:08:08 -04:00
Andreas Hansson
930db9257d Clock: Inherit the clock from parent by default
This patch changes the default 1 Tick clock period to a proxy that
resolves the parents clock. As a result of this, the caches and
L1-to-L2 bus, for example, will automatically use the clock period of
the CPU unless explicitly overridden.

To ensure backwards compatibility, the System class overrides the
proxy and specifies a 1 Tick clock. We could change this to something
more reasonable in a follow-on patch, perhaps 1 GHz or something
similar.

With this patch applied, all clocked objects should have a reasonable
clock period set, and could start specifying delays in Cycles instead
of absolute time.
2012-10-15 08:07:07 -04:00
Andreas Hansson
8cc503f1dd Param: Fix proxy traversal to support chained proxies
This patch modifies how proxies are traversed and unproxied to allow
chained proxies. The issue that is solved manifested itself when a
proxy during its evaluation ended up being hitting another proxy, and
the second one got evaluated using the object that was originally used
for the first proxy.

For a more tangible example, see the following patch on making the
default clock being inherited from the parent. In this patch, the CPU
clock is a proxy Parent.clock, which is overridden in the system to be
an actual value. This all works fine, but the AlphaLinuxSystem has a
boot_cpu_frequency parameter that is Self.cpu[0].clock.frequency. When
the latter is evaluated, it all happens relative to the current object
of the proxy, i.e. the system. Thus the cpu.clock is evaluated as
Parent.clock, but using the system rather than the cpu as the object
to enquire.
2012-10-15 08:07:06 -04:00
Andreas Hansson
36d199b9a9 Mem: Use range operations in bus in preparation for striping
This patch transitions the bus to use the AddrRange operations instead
of directly accessing the start and end. The change facilitates the
move to a more elaborate AddrRange class that also supports address
striping in the bus by specifying interleaving bits in the ranges.

Two new functions are added to the AddrRange to determine if two
ranges intersect, and if one is a subset of another. The bus
propagation of address ranges is also tweaked such that an update is
only propagated if the bus received information from all the
downstream slave modules. This avoids the iteration and need for the
cycle-breaking scheme that was previously used.
2012-10-15 08:07:04 -04:00
Andreas Hansson
43ca8415e8 Mem: Determine bus block size during initialisation
This patch moves the block size computation from findBlockSize to
initialisation time, once all the neighbouring ports are connected.

There is no need to dynamically update the block size, and the caching
of the value effectively avoided that anyhow. This is very similar to
what was already in place, just with a slightly leaner implementation.
2012-10-11 06:38:43 -04:00
Andreas Hansson
5dba9225f7 Doxygen: Update the version of the Doxyfile
This patch bumps the Doxyfile to match more recent versions of
Doxygen. The sections that are deprecated have been removed, and the
new ones added. The project name has also been updated.
2012-10-11 06:38:42 -04:00
Nilay Vaish
88ba1c452b ruby: makes some members non-static
This patch makes some of the members (profiler, network, memory vector)
of ruby system non-static.
2012-10-02 14:35:45 -05:00
Nilay Vaish
4488379244 ruby: changes to simple network
This patch makes the Switch structure inherit from BasicRouter, as is
done in two other networks.
2012-10-02 14:35:45 -05:00
Nilay Vaish
b370f6a7b2 ruby: rename template_hack to template
I don't like using the word hack. Hence, the patch.
2012-10-02 14:35:44 -05:00
Nilay Vaish
d58f84c481 ruby: remove unused code in protocols 2012-10-02 14:35:44 -05:00
Nilay Vaish
73eafe4849 ruby: remove some unused things in slicc
This patch removes the parts of slicc that were required for multi-chip
protocols. Going ahead, it seems multi-chip protocols would be implemented
by playing with the network itself.
2012-10-02 14:35:43 -05:00
Nilay Vaish
3c9d3b16d8 ruby: move functional access to ruby system
This patch moves the code for functional accesses to ruby system. This is
because the subsequent patches add support for making functional accesses
to the messages in the interconnect. Making those accesses from the ruby port
would be cumbersome.
2012-10-02 14:35:42 -05:00
Nilay Vaish
95664da097 MI coherence protocol: add copyright notice 2012-09-30 13:20:53 -05:00
Djordje Kovacevic
80a26a3e39 MEM: Put memory system document into doxygen 2012-09-25 11:49:41 -05:00
Mrinmoy Ghosh
6fc0094337 Cache: add a response latency to the caches
In the current caches the hit latency is paid twice on a miss. This patch lets
a configurable response latency be set of the cache for the backward path.
2012-09-25 11:49:41 -05:00
Sascha Bischoff
74ab69c7ea Statistics: Add a function to configure periodic stats dumping
This patch adds a function, periodicStatDump(long long period), which will dump
and reset the statistics every period. This function is designed to be called
from the python configuration scripts. This allows the periodic stats dumping to
be configured more easilly at run time.

The period is currently specified as a long long as there are issues passing
Tick into the C++ from the python as they have conflicting definitions. If the
period is less than curTick, the first occurance occurs at curTick. If the
period is set to 0, then the event is descheduled and the stats are not
periodically dumped.

Due to issues when resumung from a checkpoint, the StatDump event must be moved
forward such that it occues AFTER the current tick. As the function is called
from the python, the event is scheduled before the system resumes from the
checkpoint. Therefore, the event is moved using the updateEvents() function.
This is called from simulate.py once the system has resumed from the checkpoint.

NOTE: It should be noted that this is a fairly temporary patch which re-adds the
capability to extract temporal information  from the communication monitors. It
should not be used at the same time as anything that relies on dumping the
statistics based on in simulation events i.e. a context switch.
2012-09-25 11:49:41 -05:00
Dam Sunwoo
acbb7a2eed ARM: added support for flattened device tree blobs
Newer Linux kernels require DTB (device tree blobs) to specify platform
configurations. The input DTB filename can be specified through gem5 parameters
in LinuxArmSystem.
2012-09-25 11:49:41 -05:00
Ali Saidi
5adb4ddc12 O3: Pack the comm structures a bit better to reduce their size. 2012-09-25 11:49:40 -05:00
Ali Saidi
396600de10 mem: Add a gasket that allows memory ranges to be re-mapped.
For example if DRAM is at two locations and mirrored this patch allows the
mirroring to occur.
2012-09-25 11:49:40 -05:00
Ali Saidi
0c99d21ad7 ARM: Squash outstanding walks when instructions are squashed. 2012-09-25 11:49:40 -05:00
Andreas Sandberg
6f603e0807 arm: Use a static_assert to test that miscRegName[] is complete
Instead of statically defining miscRegName to contain NUM_MISCREGS
elements, let the compiler determine the length of the array. This
allows us to use a static_assert to test that all registers are listed
in the name vector.
2012-09-25 11:49:40 -05:00
Andreas Sandberg
4544f3def4 base: Check for static_assert support and provide fallback
C++11 has support for static_asserts to provide compile-time assertion
checking. This is very useful when testing, for example, structure
sizes to make sure that the compiler got the right alignment or vector
sizes.
2012-09-25 11:49:40 -05:00
Andreas Sandberg
6598241f2c sim: Move CPU-specific methods from SimObject to the BaseCPU class 2012-09-25 11:49:40 -05:00
Andreas Sandberg
5f32eceeda sim: Remove SimObject::setMemoryMode
Remove SimObject::setMemoryMode from the main SimObject class since it
is only valid for the System class. In addition to removing the method
from the C++ sources, this patch also removes getMemoryMode and
changeTiming from SimObject.py and updates the simulation code to call
the (get|set)MemoryMode method on the System object instead.
2012-09-25 11:49:40 -05:00
Djordje Kovacevic
d060a28a29 CPU: Add abandoned instructions to O3 Pipe Viewer 2012-09-25 11:49:40 -05:00
Nathanael Premillieu
bfffbb6797 ARM: Inst writing to cntrlReg registers not set as control inst
Deletion of the fact that instructions that writes to registers of type
"cntrlReg" are not set as control instruction (flag IsControl not set).
2012-09-25 11:49:40 -05:00
Ali Saidi
04ca96427c ARM: Predict target of more instructions that modify PC. 2012-09-25 11:49:40 -05:00
Andreas Sandberg
1b29352dd5 build: Add missing dependencies when building param SWIG interfaces
This patch adds an explicit dependency between param_%s.i and the
Python source file defining the object. Previously, the build system
didn't rebuild SWIG interfaces correctly when an object's Python
sources were updated.
2012-09-25 11:49:40 -05:00
Joel Hestness
4095af5fd6 RubyPort and Sequencer: Fix draining
Fix the drain functionality of the RubyPort to only call drain on child ports
during a system-wide drain process, instead of calling each time that a
ruby_hit_callback is executed.

This fixes the issue of the RubyPort ports being reawakened during the drain
simulation, possibly with work they didn't previously have to complete. If
they have new work, they may call process on the drain event that they had
not registered work for, causing an assertion failure when completing the
drain event.

Also, in RubyPort, set the drainEvent to NULL when there are no events
to be drained. If not set to NULL, the drain loop can result in stale
drainEvents used.
2012-09-23 13:57:08 -05:00
Andreas Hansson
3b6a143ec5 DRAM: Introduce SimpleDRAM to capture a high-level controller
This patch introduces a high-level model of a DRAM controller, with a
basic read/write buffer structure, a selectable and customisable
arbiter, a few address mapping options, and the basic DRAM timing
constraints. The parameters make it possible to turn this model into
any desired DDRx/LPDDRx/WideIOx memory controller.

The intention is not to be cycle accurate or capture every aspect of a
DDR DRAM interface, but rather to enable exploring of the high-level
knobs with a good simulation speed. Thus, contrary to e.g. DRAMSim
this module emphasizes simulation speed with a good-enough accuracy.

This module is merely a starting point, and there are plenty additions
and improvements to come. A notable addition is the support for
address-striping in the bus to enable a multi-channel DRAM
controller. Also note that there are still a few "todo's" in the code
base that will be addressed as we go along.

A follow-up patch will add basic performance regressions that use the
traffic generator to exercise a few well-defined corner cases.
2012-09-21 11:48:13 -04:00
Andreas Hansson
d75b1b5a73 TrafficGen: Add a basic traffic generator
This patch adds a traffic generator to the code base. The generator is
aimed to be used as a black box model to create appropriate use-cases
and benchmarks for the memory system, and in particular the
interconnect and the memory controller.

The traffic generator is a master module, where the actual behaviour
is captured in a state-transition graph where each state generates
some sort of traffic. By constructing a graph it is possible to create
very elaborate scenarios from basic generators. Currencly the set of
generators include idling, linear address sweeps, random address
sequences and playback of traces (recording will be done by the
Communication Monitor in a follow-up patch). At the moment the graph
and the states are described in an ad-hoc line-based format, and in
the future this should be aligned with our used of e.g. the Google
protobufs. Similarly for the traces, the format is currently a
simplistic ad-hoc line-based format that merely serves as a starting
point.

In addition to being used as a black-box model for system components,
the traffic generator is also useful for creating test cases and
regressions for the interconnect and memory system. In future patches
we will use the traffic generator to create DRAM test cases for the
controller model.

The patch following this one adds a basic regressions which also
contains an example configuration script and trace file for playback.
2012-09-21 11:48:08 -04:00
Andreas Hansson
4aee3aa073 Mem: Tidy up bus member variables types
This patch merely tidies up the types used for the bus member
variables. It also makes the constant ones const.
2012-09-21 10:11:24 -04:00
Lluc Alvarez
c8de765468 SE: Ignore FUTEX_PRIVATE_FLAG of sys_futex
This patch ignores the FUTEX_PRIVATE_FLAG of the sys_futex system call
in SE mode.

With this patch, when sys_futex with the options FUTEX_WAIT_PRIVATE or
FUTEX_WAKE_PRIVATE is emulated, the FUTEX_PRIVATE_FLAG is ignored and
so their behaviours are the regular FUTEX_WAIT and FUTEX_WAKE.

Emulating FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE as if they were
non-private is safe from a functional point of view. The
FUTEX_PRIVATE_FLAG does not change the semantics of the futex, it's
just a mechanism to improve performance under certain circunstances
that can be ignored in SE mode.
2012-09-21 04:51:18 -04:00
Anthony Gutierrez
9cd0c5ecc8 bus: removed outdated warn regarding 64 B block sizes
this warn is outdated as 64 B blocks are very common, and even
the default size for some CPU types. E.g., arm_detailed.
2012-09-20 17:25:52 -04:00
Andreas Hansson
a731f8f9dd Mem: Remove the file parameter from AbstractMemory
This patch removes the unused file parameter from the
AbstractMemory. The patch serves to make it easier to transition to a
separation of the actual contigious host memory backing store, and the
gem5 memory controllers.

Without the file parameter it becomes easier to hide the creation of
the mmap in the PhysicalMemory, as there are no longer any reasons to
expose the actual contigious ranges to the user.

To the best of my knowledge there is no use of the parameter, so the
change should not affect anyone.
2012-09-19 06:15:46 -04:00
Andreas Hansson
ffb6aec603 AddrRange: Transition from Range<T> to AddrRange
This patch takes the final plunge and transitions from the templated
Range class to the more specific AddrRange. In doing so it changes the
obvious Range<Addr> to AddrRange, and also bumps the range_map to be
AddrRangeMap.

In addition to the obvious changes, including the removal of redundant
includes, this patch also does some house keeping in preparing for the
introduction of address interleaving support in the ranges. The Range
class is also stripped of all the functionality that is never used.

--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
2012-09-19 06:15:44 -04:00
Andreas Hansson
c34df76272 AddrRange: Simplify Range by removing stream input/output
This patch simplifies the Range class in preparation for the
introduction of a more specific AddrRange class that allows
interleaving/striping.

The only place where the parsing was used was in the unit test.
2012-09-19 06:15:43 -04:00
Andreas Hansson
12c291f9d7 AddrRange: Remove unused range_multimap
This patch simply removes the unused range_multimap in preparation for
a more specific AddrRangeMap that also allows interleaving in addition
to pure ranges.
2012-09-19 06:15:42 -04:00
Andreas Hansson
fccbf8bb45 AddrRange: Simplify AddrRange params Python hierarchy
This patch simplifies the Range object hierarchy in preparation for an
address range class that also allows striping (e.g. selecting a few
bits as matching in addition to the range).

To extend the AddrRange class to an AddrRegion, the first step is to
simplify the hierarchy such that we can make it as lean as possible
before adding the new functionality. The only class using Range and
MetaRange is AddrRange, and the three classes are now collapsed into
one.
2012-09-19 06:15:41 -04:00
Nilay Vaish
33c904e0a5 ruby: eliminate typedef integer_t 2012-09-18 22:49:12 -05:00
Nilay Vaish
86b1c0fd54 ruby: avoid using g_system_ptr for event scheduling
This patch removes the use of g_system_ptr for event scheduling. Each consumer
object now needs to specify upfront an EventManager object it would use for
scheduling events. This makes the ruby memory system more amenable for a
multi-threaded simulation.
2012-09-18 22:46:34 -05:00
Andreas Hansson
7c55464aac Mem: Add a maximum bandwidth to SimpleMemory
This patch makes a minor addition to the SimpleMemory by enforcing a
maximum data rate. The bandwidth is configurable, and a reasonable
value (12.8GB/s) has been choosen as the default.

The changes do add some complexity to the SimpleMemory, but they
should definitely be justifiable as this enables a far more realistic
setup using even this simple memory controller.

The rate regulation is done for reads and writes combined to reflect
the bidirectional data busses used by most (if not all) relevant
memories. Moreover, the regulation is done per packet as opposed to
long term, as it is the short term data rate (data bus width times
frequency) that is the limiting factor.

A follow-up patch bumps the stats for the regressions.
2012-09-18 10:30:02 -04:00
Andreas Hansson
d1f3a3b91a gcc: Enable Link-Time Optimization for gcc >= 4.6
This patch adds Link-Time Optimization when building the fast target
using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No
check is performed to guarantee that the linker supports LTO and use
of the linker plugin, so the user has to ensure that binutils GNU ld
>= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is
available, the latter should not be a problem. Currently the LTO
option is only useful for gcc >= 4.6, due to the limited support on
clang and earlier versions of gcc. The intention is to also add
support for clang once the LTO integration matures.

The same number of jobs is used for the parallel phase of LTO as the
jobs specified on the scons command line, using the -flto=n flag that
was introduced with gcc 4.6. The gold linker also supports concurrent
and incremental linking, but this is not used at this point.

The compilation and linking time is increased by almost 50% on
average, although ARM seems to be particularly demanding with an
increase of almost 100%. Also beware when using this as gcc uses a
tremendous amount of memory and temp space in the process. You have
been warned.

After some careful consideration, and plenty discussions, the flag is
only added to the fast target, and the warning that was issued in an
earlier version of this patch is now removed. Similarly, the flag used
to enable LTO, now the default is to use it, and the flag has been
modified to disable LTO. The rationale behind this decision is that
opt is used for development, whereas fast is only used for long runs,
e.g. regressions or more elaborate experiments where the additional
compile and link time is amortized by a much larger run time.

When it comes to the return on investment, the regression seems to be
roughly 15% faster with LTO. For a bit more detail, I ran twolf on
ARM.fast, with three repeated runs, and they all finish within 42
minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds)
with LTO, i.e. LTO gives an impressive >25% speed-up for this case.

Without LTO (ARM.fast twolf)

real	42m37.632s
user	42m34.448s
sys	0m0.390s

real	41m51.793s
user	41m50.384s
sys	0m0.131s

real	41m45.491s
user	41m39.791s
sys	0m0.139s

With LTO (ARM.fast twolf)

real	30m33.588s
user	30m5.701s
sys	0m0.141s

real	31m27.791s
user	31m24.674s
sys	0m0.111s

real	31m25.500s
user	31m16.731s
sys	0m0.106s
2012-09-14 12:13:22 -04:00
Andreas Hansson
a57eda0843 scons: Add a target for google-perftools profiling
This patch adds a new target called 'perf' that facilitates profiling
using google perftools rather than gprof. The perftools CPU profiler
offers plenty useful information in addition to gprof, and the latter
is kept mostly to offer profiling also on non-Linux hosts.
2012-09-14 12:13:21 -04:00
Andreas Hansson
224ea5fba6 scons: Restructure ccflags and ldflags
This patch restructures the ccflags such that the common parts are
defined in a single location, also capturing all the target types in a
single place.

The patch also adds a corresponding ldflags in preparation for
google-perf profiling support and the addition of Link-Time
Optimization.
2012-09-14 12:13:20 -04:00
Andreas Hansson
806a1144ce scons: Use c++0x with gcc >= 4.4 instead of 4.6
This patch shifts the version of gcc for which we enable c++0x from
4.6 to 4.4 The more long term plan is to see what the c++0x features
can bring and what level of support would be enabled simply by bumping
the required version of gcc from 4.3 to 4.4.

A few minor things had to be fixed in the code base, most notably the
choice of a hashmap implementation. In the Ruby Sequencer there were
also a few minor issues that gcc 4.4 was not too happy about.
2012-09-14 12:13:18 -04:00
Joel Hestness
234fa4cf7e Standard Switch: Drain the system before switching CPUs
When switching from an atomic CPU to any of the timing CPUs, a drain is
unnecessary since no events are scheduled in atomic mode. However, when
trying to switch CPUs starting with a timing CPU, there may be events
scheduled. This change ensures that all events are drained from the system
by calling m5.drain before switching CPUs.
2012-09-12 21:41:37 -05:00
Joel Hestness
16dcb723c1 Base CPU: Initialize profileEvent to NULL
The profileEvent pointer is tested against NULL in various places, but
it is not initialized unless running in full-system mode. In SE mode, this
can result in segmentation faults when profileEvent default intializes to
something other than NULL.
2012-09-12 21:40:28 -05:00
Jason Power
aa8bcd15ec Ruby: Modify Scons so that we can put .sm files in extras
Also allows for header files which are required in slicc generated
code to be in a directory other than src/mem/ruby/slicc_interface.
2012-09-12 14:52:04 -05:00
Anthony Gutierrez
c6927ed138 stats: remove duplicate instruction stats from the commit stage
these stats are duplicates of insts/opsCommitted, cause
confusion, and are poorly named.
2012-09-12 11:35:52 -04:00
Andreas Hansson
292d8252a4 clang: Fix issues identified by the clang static analyzer
This patch addresses a few minor issues reported by the clang static
analyzer.

The analysis was run with:

scan-build -disable-checker deadcode \
           -enable-checker experimental.core \
           -disable-checker experimental.core.CastToStruct \
           -enable-checker experimental.cpluscplus
2012-09-11 14:15:47 -04:00
Lena Olson
584eba3ab6 Cache: Split invalidateBlk up to seperate block vs. tags
This seperates the functionality to clear the state in a block into
blk.hh and the functionality to udpate the tag information into the
tags.  This gets rid of the case where calling invalidateBlk on an
already-invalid block does something different than calling it on a
valid block, which was confusing.
2012-09-11 14:14:49 -04:00
Nilay Vaish
f47c2f6415 X86: make use of register predication
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
2012-09-11 09:33:42 -05:00
Nilay Vaish
6369df59c8 x86: Add a separate register for D flag bit
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
2012-09-11 09:25:43 -05:00
Nilay Vaish
3700e5448a ISA Parser: Allow predication of source and destination registers
This patch is meant for allowing predicated reads and writes. Note that this
predication is different from the ISA provided predication. They way we
currently provide the ISA description for X86, we read/write registers that
do not need to be actually read/written. This is likely to be true for other
ISAs as well. This patch allows for read and write predicates to be associated
with operands. It allows for the register indices for source and destination
registers to be decided at the time when the microop is constructed. The
run time indicies come in to play only when the at least one of the
predicates has been provided. This patch will not affect any of the ISAs that
do not provide these predicates. Also the patch assumes that the order in
which operands appear in any function of the microop is same across all the
functions of the microops. A subsequent patch will enable predication for the
x86 ISA.
2012-06-03 10:59:04 -05:00
Nilay Vaish
637c6c7e32 Ruby: Use uint32_t instead of uint32 everywhere 2012-09-11 09:24:45 -05:00
Nilay Vaish
f00347a20f Ruby: Use uint8_t instead of uint8 everywhere 2012-09-11 09:23:56 -05:00
Nilay Vaish
c5bf1390aa Ruby System: Convert to Clocked Object
This patch moves Ruby System from being a SimObject to recently introduced
ClockedObject.
2012-09-10 12:21:01 -05:00
Nilay Vaish
4e6f048ef0 Ruby Slicc: remove the call to cin.get() function
If I understand correctly, this was put in place so that a debugger can be
attached when the protocol aborts. While this sounds useful, it is a problem
when the simulation is not being actively monitored. I think it is better to
remove this.
2012-09-10 12:20:34 -05:00
Marco Elver
9e0edbcea8 Mem: Allow serializing of more than INT_MAX bytes
Despite gzwrite taking an unsigned for length, it returns an int for
bytes written; gzwrite fails if (int)len < 0.  Because of this, call
gzwrite with len no larger than INT_MAX: write in blocks of INT_MAX if
data to be written is larger than INT_MAX.
2012-09-10 11:57:43 -04:00
Palle Lyckegaard
21d4d50ba1 NetBSD: Build on NetBSD
Minor patch against so building on NetBSD is possible.
2012-09-10 11:57:42 -04:00
Andreas Hansson
3215ed9754 AddrRange: Remove the unused range_ops header
This patch prunes the range_ops header that is no longer used. The
bridge used it to do filtering of address ranges, but this is changed
since quite some time.

Ultimately this patch aims to simplify the handling of ranges before
specialising the AddrRange to an AddrRegion that also allows striping
bits to be selected.
2012-09-10 11:57:40 -04:00
Andreas Hansson
1f9c3bcb46 Inet: Remove the SackRange and its use
This patch aims to simplify the use of the Range class before
introducing a more elaborate AddrRegion to replace the AddrRange. The
SackRange is the only use of the range class besides address ranges,
and the removal of this use makes for an easier modification of the
range class.

The functionlity that is removed with this patch is not used anywhere
throughout the code base.
2012-09-10 11:57:39 -04:00
Andreas Hansson
cf5935445f Device: Bump PIO and PCI latencies to more reasonable values
This patch addresses a previously highlighted issue with the default
latencies used for PIO and PCI devices. The values are merely educated
guesses and might not represent the particular system you want to
model. However, the values in this patch are definitely far more
realistic than the previous ones.

In i8254xGBe, the writeConfig method is updated to use configDelay
instead of pioDelay.

A follow-up patch will update the regression stats.
2012-09-10 11:57:36 -04:00
Andreas Sandberg
d4a6d9846a sim: Update the SimObject documentation
Includes a small change in sim_object.cc that adds the name space to
the output stream parameter in serializeAll. Leaving out the name
space unfortunately confuses Doxygen.
2012-09-07 14:20:53 -05:00
Andreas Sandberg
2f397f314b sim: Remove the unused SimObject::regFormulas method
Simulation objects normally register derived statistics, presumably
what regFormulas originally was meant for, in regStats(). This patch
removes regRegformulas since there is no need to have a separate
method call to register formulas.
2012-09-07 14:20:53 -05:00
Ali Saidi
03ff612054 O3: Get rid of incorrect assert in RAS. 2012-09-07 14:20:53 -05:00
Ali Saidi
2059c01673 dev: Fix bifield definition in timer_cpulocal.hh
Bitfield definition in the local timer model for ARM had the bitfield
range numbers reversed which could lead to buggy behavior.
2012-09-07 14:20:53 -05:00
Ali Saidi
5217d5a451 Igbe: Newer kernels seem to allow TSO headers and packet data to be in one desc
Implement some code we used to panic on as it actually does happen with the
e1000 driver in Linux 3.3+. We used to assume that a TSO header would never
be part of a larger payload, however it appears as though it now can be.
2012-09-07 14:20:53 -05:00
Krishnendra Nathella
3f5ee1cf8c sim: add validation to make sure there is memory where we're loading the kernel 2012-09-07 14:20:53 -05:00
Ali Saidi
3742b19b36 loader: initialize all memory in the ObjectFile objects.
Some bare metal build flows seem to build binaries that we aren't necessarily
expecting. Initialize everything to 0, so we don't make any assumptions about
what is or isn't in the binary.
2012-09-07 14:20:52 -05:00
Ali Saidi
8fc0cef611 ARM: Fix one of the timers used in the VExpress EMM platform. 2012-09-07 14:20:52 -05:00
Andreas Hansson
287ea1a081 Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.

An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
2012-09-07 12:34:38 -04:00
Joel Hestness
6924e10978 Ruby Memory Controller: Fix clocking 2012-09-05 20:51:41 -05:00
Jason Power
494f6a858e Ruby: Correct DataBlock =operator
The =operator for the DataBlock class was incorrectly interpreting the class
member m_alloc. This variable stands for whether the assigned memory for the
data block needs to be freed or not by the class itself. It seems that the
=operator interpreted the variable as whether the memory is assigned to the
data block. This wrong interpretation was causing values not to propagate
to RubySystem::m_mem_vec_ptr. This caused major issues with restoring from
checkpoints when using a protocol which verified that the cache data was
consistent with the backing store (i.e. MOESI-hammer).
2012-08-28 17:57:51 -05:00
Andreas Hansson
0cacf7e817 Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
2012-08-28 14:30:33 -04:00
Andreas Hansson
d53d04473e Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.

In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.

Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.

As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
2012-08-28 14:30:31 -04:00
Andreas Hansson
d14e5857c7 Port: Stricter port bind/unbind semantics
This patch tightens up the semantics around port binding and checks
that the ports that are being bound are currently not connected, and
similarly connected before unbind is called.

The patch consequently also changes the order of the unbind and bind
for the switching of CPUs to ensure that the rules are adhered
to. Previously the ports would be "over-written" without any check.

There are no changes in behaviour due to this patch, and the only
place where the unbind functionality is used is in the CPU.
2012-08-28 14:30:27 -04:00
Andreas Hansson
105ad88d35 Checker: Fix checker CPU ports
This patch updates how the checker CPU handles the ports such that the
regressions will once again run without causing a panic.

A minor amount of tidying up was also done as part of this patch.
2012-08-28 14:30:24 -04:00
Andreas Hansson
d090f4d930 swig: Disable unused value warning with llvm 3.1 compilers
This patch disables a warning for unused values which causes problems
when compiling the swig-generated sources using recent llvm-based
compilers like llvm-gcc and clang.
2012-08-28 14:30:22 -04:00
Anthony Gutierrez
5b1614de02 sim: fix overflow check in simulate because Tick is now unsigned 2012-08-27 20:53:20 -04:00
Nilay Vaish
85c7352462 Ruby: remove README.debugging and Decommissioning_note
These files were relevant when Ruby was part of GEMS. They are not required
any longer.
2012-08-27 14:57:46 -05:00
Nilay Vaish
0737837109 System: Remove redundant call to startupCPU 2012-08-27 01:14:46 -05:00
Nilay Vaish
9190940511 Ruby: Remove RubyEventQueue
This patch removes RubyEventQueue. Consumer objects now rely on RubySystem
or themselves for scheduling events.
2012-08-27 01:00:55 -05:00
Nilay Vaish
7122b83d8f Ruby Memory Vector: Allow more than 4GB of memory
The memory size variable was a 32-bit int. This meant that the size of the
memory was limited to 4GB. This patch changes the type of the variable to
64-bit to support larger memory sizes. Thanks to Raghuraman Balasubramanian
for bringing this to notice.
2012-08-27 01:00:54 -05:00
Nilay Vaish
b422994fea MESI Protocol: Correct the virtual network in profile functions
The virtual network in a couple of places was incorrectly mentioned
as 3 in place of 1. This is being corrected.
2012-08-25 15:49:06 -05:00
Nilay Vaish
01f1430833 MESI Coherence Protocol: Add copyright notice 2012-08-25 13:16:45 -05:00
Andreas Hansson
2c1052cd4d DMA: Refactor the DMA device and align timing and atomic
This patch does a bunch of house-keeping updates on the DMA, including
indentation, and formatting, but most importantly breaks out the
response handling such that it can be shared between the atomic and
timing modes. It also removes a potential bug caused by the atomic
handling of responses only deleting the allocated request (pkt->req)
once the DMA action completes instead of doing so for every packet.

Before this patch, the handling of responses was near identical for
atomic and timing, but the code was simply duplicated. With this
patch, the handleResp method deals with the responses in both cases.

There are further updates to make after removing the NACKs, but that
will be part of a separate follow-up patch. This patch does not change
the behaviour of any regression.
2012-08-22 11:40:01 -04:00
Andreas Hansson
c60db56741 Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).

The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.

Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
2012-08-22 11:39:59 -04:00
Andreas Hansson
a6074016e2 Bridge: Remove NACKs in the bridge and unify with packet queue
This patch removes the NACKing in the bridge, as the split
request/response busses now ensure that protocol deadlocks do not
occur, i.e. the message-dependency chain is broken by always allowing
responses to make progress without being stalled by requests. The
NACKs had limited support in the system with most components ignoring
their use (with a suitable call to panic), and as the NACKs are no
longer needed to avoid protocol deadlocks, the cleanest way is to
simply remove them.

The bridge is the starting point as this is the only place where the
NACKs are created. A follow-up patch will remove the code that deals
with NACKs in the endpoints, e.g. the X86 table walker and DMA
port. Ultimately the type of packet can be complete removed (until
someone sees a need for modelling more complex protocols, which can
now be done in parts of the system since the port and interface is
split).

As a consequence of the NACK removal, the bridge now has to send a
retry to a master if the request or response queue was full on the
first attempt. This change also makes the bridge ports very similar to
QueuedPorts, and a later patch will change the bridge to use these. A
first step in this direction is taken by aligning the name of the
member functions, as done by this patch.

A bit of tidying up has also been done as part of the simplifications.

Surprisingly, this patch has no impact on any of the
regressions. Hence, there was never any NACKs issued. In a follow-up
patch I would suggest changing the size of the bridge buffers set in
FSConfig.py to also test the situation where the bridge fills up.
2012-08-22 11:39:58 -04:00
Andreas Hansson
e317d8b9ff Port: Extend the QueuedPort interface and use where appropriate
This patch extends the queued port interfaces with methods for
scheduling the transmission of a timing request/response. The methods
are named similar to the corresponding sendTiming(Snoop)Req/Resp,
replacing the "send" with "sched". As the queues are currently
unbounded, the methods always succeed and hence do not return a value.

This functionality was previously provided in the subclasses by
calling PacketQueue::schedSendTiming with the appropriate
parameters. With this change, there is no need to introduce these
extra methods in the subclasses, and the use of the queued interface
is more uniform and explicit.
2012-08-22 11:39:56 -04:00
Andreas Hansson
70e99e0b91 Device: Remove overloaded pio_latency parameter
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.

The PciConfigAll now also uses a Param.Latency rather than a
Param.Tick. For backwards compatibility it still sets the pio_latency
to 1 tick. All the comments have also been updated to not state that
it is in simticks when it is not necessarily the case.
2012-08-21 05:50:03 -04:00
Andreas Hansson
a81c969529 CPU: Remove overloaded function_trace_start parameter
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.

The inorder CPU is particularly interesting as it uses a different
name for the parameter, and never make any use of it internally.
2012-08-21 05:49:43 -04:00
Andreas Hansson
5803309574 PacketQueue: Allow queuing in the same tick as desired send tick
This patch allows packets to be enqueued in the same tick as they are
intended to be sent. This does not imply they actually are sent that
tick, although that is possible.

This change is useful for module that use the queued ports primarly to
avoid handling the flow control involved in sending and retrying
packets.
2012-08-21 05:49:24 -04:00
Andreas Hansson
4be1ae3cf8 EventManager: Remove test for NULL pointer in constructor
This patch tidies up the EventManager constructor and prunes a corner
case where the EventManager would initialise its eventq pointer to
NULL. This would cause segmentation faults on actual use and should
never happen.
2012-08-21 05:49:18 -04:00
Andreas Hansson
016593f2e9 Clock: Make Tick unsigned and remove UTick
This patch makes the Tick unsigned and removes the UTick typedef. The
ticks should never be negative, and there was only one major issue
with removing it, caused by the o3 CPU using a -1 as an initial value.

The patch has no impact on any regressions.
2012-08-21 05:49:09 -04:00
Andreas Hansson
452217817f Clock: Move the clock and related functions to ClockedObject
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.

The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.

The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
2012-08-21 05:49:01 -04:00