Commit graph

6395 commits

Author SHA1 Message Date
Andrew Bardsley f7d80348fa arm: Add branch flags onto macroops
Mark branch flags onto macroops to allow branch prediction before
microop decomposition
2014-05-09 18:58:47 -04:00
Andrew Bardsley eab00f4966 cpu: Allow setWhen on trace objects
Allow setting of 'when' in trace records.  This allows later times
than the arbitrary record creation point to be used as inst. times
2014-05-09 18:58:47 -04:00
Curtis Dunham af39ab297f arm: add preliminary ISA splits for ARM arch 2014-05-09 18:58:47 -04:00
Curtis Dunham fe27f937aa arch: teach ISA parser how to split code across files
This patch encompasses several interrelated and interdependent changes
to the ISA generation step.  The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units.  I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner.  It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type.  Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths.  In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
2014-05-09 18:58:47 -04:00
Geoffrey Blake 0c1913336a config: Avoid generating a reference to myself for Parent.any
The unproxy code for Parent.any can generate a circular reference
in certain situations with classes hierarchies like those in ClockDomain.py.
This patch solves this by marking ouself as visited to make sure the
search does not resolve to a self-reference.
2014-05-09 18:58:47 -04:00
Geoffrey Blake 85940fd537 arch, arm: Preserve TLB bootUncacheability when switching CPUs
The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal.  When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
 1) takeOverFrom executed to switch from Atomic -> O3
 2) All bootUncacheability flags are reset to true
 3) Core2 tries to execute a load covered by bootUncacheability, it
    is flagged as uncacheable
 4) Core2's load needs to replay due to a pipeline flush
 3) Core1 core does an action on CPSR
 4) The handling code for CPSR then checks all other cores
    to determine if bootUncacheability can be set to false
 5) Asynchronously set bootUncacheability on all cores to false
 6) Core2 replays load previously set as uncacheable and notices
    it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
2014-05-09 18:58:47 -04:00
Curtis Dunham 1028c03320 cpu: add more instruction mix statistics
For the o3, add instruction mix (OpClass) histogram at commit (stats
also already collected at issue). For the simple CPUs we add a
histogram of executed instructions
2014-05-09 18:58:47 -04:00
Mitch Hayenga a15b713cba mem: Squash prefetch requests from downstream caches
This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu.  It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.
2014-05-09 18:58:46 -04:00
Stephan Diestelhorst b9e6c260a0 stats: Method stats source
This source for stats binds an object and a method / function from the object
to a stats object. This allows pulling out stats from object methods without
needing to go through a global, or static shim.

Syntax is somewhat unpleasant, but the templates and method pointer type
specification were quite tricky.  Interface is very clean though; and similar
to .functor
2014-05-09 18:58:46 -04:00
Akash Bagdia 2b1a01ee6c cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems.  This allows studying multi-socket / cluster
systems with ARM CPUs.
2014-05-09 18:58:46 -04:00
Sascha Bischoff e940bac278 mem: Auto-generate CommMonitor trace file names
Splits the CommMonitor trace_file parameter into three parameters. Previously,
the trace was only enabled if the trace_file parameter was set, and would be
written to this file. This patch adds in a trace_enable and trace_compress
parameter to the CommMonitor.

No trace is generated if trace_enable is set to False. If it is set to True, the
trace is written to a file based on the name of the SimObject in the simulation
hierarchy. For example, system.cluster.il1_commmonitor.trc. This filename can be
overridden by additionally specifying a file name to the trace_file parameter
(more on this later).

The trace_compress parameter will append .gz to any filename if set to True.
This enables compression of the generated traces. If the file name already ends
in .gz, then no changes are made.

The trace_file parameter will override the name set by the trace_enable
parameter. In the case that the specified name does not end in .gz but
trace_compress is set to true, .gz is appended to the supplied file name.
2014-05-09 18:58:46 -04:00
Geoffrey Blake 29601eada7 arm: Panics in miscreg read functions can be tripped by O3 model
Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
2014-05-09 18:58:46 -04:00
Chris Emmons a3306d0d5e dev: Set HDLCD default pixel clock for 1080p @ 60Hz
This patch changes the default pixel clock to effectively generate
1080p resolution at 60 frames per second. It is dependent upon the
kernel device tree file using the specified resolution / display
string in the comments.
2014-05-09 18:58:46 -04:00
Matt Evans 73dc89e542 arm: quick hack to allow a greater number of CPUs to a guest OS
This is a quick hack to communicate a greater number of CPUs to a guest OS via
the ARM A9 SCU config register. Some OSes (Linux) just look at the bottom field
to count CPUs and with a small change can look at bits [3:0] to learn about up
to 16 CPUs.

Very much unsupported (and contains warning messages as such) but useful for
running 8 core sims without hardwiring CPU count in the guest OS.
2014-05-09 18:58:46 -04:00
Curtis Dunham 7f1603d207 arch: remove inline specifiers on all inst constrs, all ISAs
With (upcoming) separate compilation, they are useless.  Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
2014-05-09 18:58:46 -04:00
Curtis Dunham eb61f0123b arm: cleanup ARM ISA definition 2014-05-09 18:58:46 -04:00
Curtis Dunham ad019c5c58 scons: Require SWIG >= 2.0.4 and remove vector typemaps
SWIG commit fd666c1 (*) made it unnecessary for gem5 to have these
typemaps to handle Vector types.

* fd666c1440
2014-05-09 18:58:46 -04:00
Curtis Dunham ecf774bc56 arm: Correctly display disassembly of vldmia/vstmia
The MicroMemOp class generates the disassembly for both integer
and floating point instructions, but it would always print its
first operand as an integer register without considering that the
op may be a floating instruction in which case a float register
should be displayed instead.
2014-04-23 05:18:30 -04:00
Andreas Hansson 6cd82c1116 sim: Use correct unit for abort message
This patch fixes the unit in the abort printout.
2014-04-23 05:18:27 -04:00
Mitchell Hayenga bf25c53a7d cpu: Fix setTranslateLatency() bug for squashed instructions
setTranslateLatency could sometimes improperly access a deleted request
packet after an instruction was squashed.
2014-04-23 05:18:26 -04:00
Sascha Bischoff 2031c03c09 misc: Proper type check and import for PortRef
Rewriting the type checking around PortRef, which was interacting strangely
with other Python scripts.

Tested-by: stephan.diestelhorst@arm.com
2014-04-23 05:18:25 -04:00
Mitch Hayenga e4086878f6 cpu: Fix case where o3 lsq could print out uninitialized data
In the O3 LSQ, data read/written is printed out in DPRINTFs.  However,
the data field is treated as a character string with a null terminated.
However the data field is not encoded this way.  This patch removes
that possibility by removing the data part of the print.
2014-04-01 14:22:06 -05:00
Mitch Hayenga a0d30f36a6 mem: Don't print out the data of a cache block
This never actually worked since it was printing out only a word
of the cache block and not the entire thing and doubly didn't work
csprintf overrides the %#x specifier and assumes a char* array is
actually a string.
2014-04-01 14:24:36 -05:00
Mitchell Hayenga 0fad0c7f7d arm: Don't use a stack allocated mnemonic
FailUnimplemented passed a stack created mnemonic as a const char * which
causes some grief when the stack goes away.
2014-04-23 05:18:20 -04:00
Dam Sunwoo 84f8fe637c cpu: Add O3 CPU width checks
O3CPU has a compile-time maximum width set in o3/impl.hh, but checking
the configuration against this limit was not implemented anywhere
except for fetch. Configuring a wider pipe than the limit can silently
cause various issues during the simulation. This patch adds the proper
checking in the constructor of the various pipeline stages.
2014-04-23 05:18:18 -04:00
Curtis Dunham c9071ff95e base: explicitly suggest potential use of 'All' debug flags
Without this declaration, new clangs will complain about this value
being unused. It has no explicit use in the codebase, but it can be
useful to turn on all debugging flags while in a debugger to greatly
increase simulator verbosity.
2014-04-23 05:17:59 -04:00
Curtis Dunham e651188f75 arch: remove 'null update' check in isa-parser
SCons already does this for all build steps.
2014-04-23 05:17:57 -04:00
Curtis Dunham fa4a262204 stats: better error message for uninitialized statistic
As suggested by Nathan Binkert in 2008:
http://permalink.gmane.org/gmane.comp.emulators.m5.users/2676
2014-02-10 18:24:20 -06:00
Nilay Vaish 4ceeda20aa ruby: slicc: remove old documentation
Has not been maintained at all.  Since there is alternate documentation
available on gem5.org, no need to have this separately.
2014-04-19 09:00:31 -05:00
Nilay Vaish 183100b8cb ruby: slicc: slight change to rule for transitions
It had an unnecessary pairs token which is being removed.
2014-04-19 09:00:31 -05:00
Faissal Sleiman a1570f544f o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries()
should be thread-specific.

In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob
has instructions from thread 0 (isEmpty() returns false), and none from
thread 1. If we are trying to squash all of thread 1, then
readTailInst(thread 1) will be called because rob->isEmpty() returns
false. The result is end_it is not in the list and the while
statement loops indefinitely back over the cpu's instList.

In iew_impl.hh, all threads are told they have the entire remaining IQ, when
each thread actually has a certain allocation. The result is extra stalls at
the iew dispatch stage which the rename stage usually takes care of.

In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only
contains instructions from thread 0. This returns a dummyInst (which may work
since we are trying to squash all instructions, but hardly seems like the right
way to do it).

In rob_impl.hh this fix skips the rest of the function more frequently and is
more efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-19 09:00:30 -05:00
Marco Elver d9fa950396 ruby: recorder: Fix (de-)serializing with different cache block-sizes
Upon aggregating records, serialize system's cache-block size, as the
cache-block size can be different when restoring from a checkpoint. This way,
we can correctly read all records when restoring from a checkpoints, even if
the cache-block size is different.

Note, that it is only possible to restore from a checkpoint if the
desired cache-block size is smaller or equal to the cache-block size
when the checkpoint was taken; we can split one larger request into
multiple small ones, but it is not reliable to do the opposite.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-19 09:00:30 -05:00
Andreas Sandberg 02b51afb7e kvm, x86: Add initial support for multicore simulation
Simulating a SMP or multicore requires devices to be shared between
multiple KVM vCPUs. This means that locking is required when accessing
devices. This changeset adds the necessary locking to allow devices to
execute correctly. It is implemented by temporarily migrating the KVM
CPU to the VM's (and devices) event queue when handling
MMIO. Similarly, the VM migrates to the interrupt controller's event
queue when delivering an interrupt.

The support for fast-forwarding of multicore simulations added by this
changeset assumes that all devices in a system are simulated in the
same thread and each vCPU has its own thread. Special care must be
taken to ensure that devices living under the CPU in the object
hierarchy (e.g., the interrupt controller) do not inherit the parent
CPUs thread and are assigned to device thread. The KvmVM object is
assumed to live in the same thread as the other devices in the system.
2014-04-09 16:01:58 +02:00
Andreas Sandberg 221f4f232a dev: Protect PollEvent processing when running in parallel mode
The calling thread is undefined when the PollQueue services events.
This implies that PollEvents need to handle the case where they are
processed from a different thread than the thread that created the
event. This changeset adds temporary event queue migrations to the VNC
server, the ethernet tap device, and the terminal to protect them from
inter-thread calls.
2014-04-09 16:01:43 +02:00
Nilay Vaish d805e42b81 ruby: slicc: change enqueue statement
As of now, the enqueue statement can take in any number of 'pairs' as
argument.  But we only use the pair in which latency is the key.  This
latency is allowed to be either a fixed integer or a member variable of
controller in which the expression appears.  This patch drops the use of pairs
in an enqueue statement.  Instead, an expression is allowed which will be
interpreted to be the latency of the enqueue.  This expression can anything
allowed by slicc including a constant integer or a member variable.
2014-04-08 13:26:30 -05:00
Nilay Vaish e689c00b16 ruby: coherence protocols: drop the phrase IntraChip
The phrase is no longer valid since we do not distinguish between
inter and intra chip communication.
2014-04-08 13:26:29 -05:00
Andreas Sandberg 838bcd3b19 sim: Add the ability to lock and migrate between event queues
We need the ability to lock event queues to enable device accesses
across threads. The serviceOne() method now takes a service lock prior
to handling a new event. By locking an event queue, a different
thread/eq can effectively execute in the context of the locked event
queue. To simplify temporary event queue migrations, this changeset
introduces the EventQueue::ScopedMigration class that unlocks the
current event queue, locks a new event queue, and updates the current
event queue variable.

In order to prevent deadlocks, event queues need to be released when
waiting on barriers. This is implemented using the
EventQueue::ScopedRelease class. An instance of this class is, for
example, used in the BaseGlobalEvent class to release the event queue
when waiting on the synchronization barrier.

The intended use for this functionality is when devices need to be
accessed across thread boundaries. For example, when fast-forwarding,
it might be useful to run devices and CPUs in separate threads. In
such a case, the CPU locks the device queue whenever it needs to
perform IO.  This functionality is primarily intended for KVM.

Note: Migrating between event queues can lead to non-deterministic
timing. Use with extreme care!

--HG--
extra : rebase_source : 23e3a741a1fd73861d1339782dbbe1bc76285315
2014-04-03 11:22:49 +02:00
Marco Elver b884fcf412 cpu: o3: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.

Prior to this patch, the following case could be violated:

 P0         | P1          ;
 MOV [x],mail=/usr/spool/mail/nilay | MOV EAX,[y] ;
 MOV [y],mail=/usr/spool/mail/nilay | MOV EBX,[x] ;

exists (1:EAX=1 /\ 1:EBX=0) [is a violation]

The problem was found using litmus [http://diy.inria.fr].

Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-03-25 13:15:04 -05:00
Andreas Hansson a00383a40a mem: Track DRAM read/write switching and add hysteresis
This patch adds stats for tracking the number of reads/writes per bus
turn around, and also adds hysteresis to the write-to-read switching
to ensure that the queue does not oscilate around the low threshold.
2014-03-23 11:12:14 -04:00
Andreas Hansson 7c18691db1 mem: Rename SimpleDRAM to a more suitable DRAMCtrl
This patch renames the not-so-simple SimpleDRAM to a more suitable
DRAMCtrl. The name change is intended to ensure that we do not send
the wrong message (although the "simple" in SimpleDRAM was originally
intended as in cleverly simple, or elegant).

As the DRAM controller modelling work is being presented at ISPASS'14
our hope is that a broader audience will use the model in the future.

--HG--
rename : src/mem/SimpleDRAM.py => src/mem/DRAMCtrl.py
rename : src/mem/simple_dram.cc => src/mem/dram_ctrl.cc
rename : src/mem/simple_dram.hh => src/mem/dram_ctrl.hh
2014-03-23 11:12:12 -04:00
Andreas Hansson 3dd1587afc mem: Change memory defaults to be more representative
Make the default memory type DDR3-1600 x64, and use the open-adaptive
page policy. This change is aiming to ensure that users by default are
using a realistic memory system.
2014-03-23 11:12:10 -04:00
Wendy Elsasser bbbae677ed mem: Add close adaptive paging policy to DRAM controller model
This patch adds a second adaptive page policy to the DRAM controller,
closing the page unless there are already queued accesses to the open
page.
2014-03-23 11:12:08 -04:00
Andreas Hansson 03a1aed803 mem: DRAM controller tidying up
Minor tidying up and removing of redundant code, including the
printing of queue state every million accesses.
2014-03-23 11:12:06 -04:00
Andreas Hansson bc83eb2197 mem: Fix bug in DRAM bytes per activate
This patch ensures that we do not sample the bytes per activate when
the row has already been closed.
2014-03-23 11:12:05 -04:00
Andreas Hansson 116985d661 mem: Limit the accesses to a page before forcing a precharge
This patch adds a basic starvation-prevention mechanism where a DRAM
page is forced to close after a certain number of accesses. The limit
is combined with the open and open-adaptive page policy and if reached
causes an auto-precharge.
2014-03-23 11:12:03 -04:00
Andreas Hansson 6557741311 mem: Make DRAM write queue draining more aggressive
This patch changes the triggering condition for the write draining
such that we grab the opportunity to issue writes if there are no
reads waiting (as opposed to waiting for the writes to reach the high
threshold). As a result, we potentially drain some of the writes in read
idle periods (if any).

A low threshold is added to be able to control how many write bursts
are kept in the memory controller queue (acting as on-chip storage).

The high and low thresholds are updated to sensible values for a 32/64
size write buffer. Note that the thresholds should be adjusted along
with the queue sizes.

This patch also adds some basic initialisation sanity checks and moves
part of the initialisation to the constructor.
2014-03-23 11:12:01 -04:00
Neha Agarwal 364a51181e cpu: DRAM Traffic Generator
This patch enables a new 'DRAM' mode to the existing traffic
generator, catered to generate specific requests to DRAM based on
required hit length (stride size) and bank utilization. It is an add on
to the Random mode.

The basic idea is to control how many successive packets target the
same page, and how many banks are being used in parallel. This gives a
two-dimensional space that stresses different aspects of the DRAM
timing.

The configuration file needed to use this patch has to be changed as
follow: (reference to Random Mode, LPDDR3 memory type)

'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0'
-> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1'

The last 4 parameters to be added are:
<stride size (bytes), page size(bytes), number of banks available in DRAM,
    number of banks to be utilized, address mapping scheme>

The address mapping information is used to get the stride address
stream of the specified size and to know where to find the bank
bits. The configuration file has a parameter where '0'-> RoCoRaBaCh,
'1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the
generator currently assumes a single channel and a single rank. This
is to avoid overwhelming the traffic generator with information about
the memory organisation.
2014-03-23 11:11:58 -04:00
Neha Agarwal 43abaf518f mem: DDR3 config for comparing with DRAMSim2
This patch adds a new DDR3 configuration to match with the parameters
that are specified in one of the DDR3 configs used in DRAMSim2.
2014-03-23 11:11:56 -04:00
Andreas Hansson 7e7b67472a mem: More descriptive address-mapping scheme names
This patch adds the row bits to the name of the address mapping
schemes to make it more clear that all the current schemes places the
row bits as the most significant bits.
2014-03-23 11:11:53 -04:00
Stan Czerniawski 4f77bc230a misc: Fix -q (quiet) flag
Check the right flag.
2014-03-23 11:11:49 -04:00
Andreas Hansson 9ac4f781ec ruby: Move Ruby debug flags to ruby dir and remove stale options
This patch moves the Ruby-related debug flags to the ruby
sub-directory, and also removes the state SConsopts that add the
no-longer-used NO_VECTOR_BOUNDS_CHECK.
2014-03-23 11:11:48 -04:00
Andreas Hansson 9f018d2f5a mem: Include the DRAMSim2 wrapper in NULL build
This patch makes sure DRAMSim2 is included in a build of the NULL ISA.
2014-03-23 11:11:44 -04:00
Sascha Bischoff 548d47ea2c mem: CommMonitor trace warn on non-timing mode
Add a warning to the CommMonitor which will alert the user if they try
and record a trace when the system is not in timing mode.
2014-03-23 11:11:40 -04:00
Stan Czerniawski e18d0e04a2 cpu: Add basic check to TrafficGen initial state
Prevent incomplete configuration of TrafficGen class from causing
segmentation faults. If an 'INIT' line is not present in the
configuration file then the currState variable will remain
uninitialized which may result in a crash.
2014-03-23 11:11:39 -04:00
Andrew Bardsley 0c001e729a dev: Fix IsaFake's cxx_header setting
cxx_header was set incorrectly on IsaFake
2014-03-23 11:11:37 -04:00
Eric Van Hensbergen 7630168a75 arm: m5ops readfile64 args broken, offset coming through garbage
There were several sections of the m5ops code which were
essentially copy/pasted versions of the 32-bit code. The
problem is that some of these didn't account fo4 64-bit
registers leading to arguments being in the wrong registers.
This patch addresses the args for readfile64, writefile64,
and addsymbol64 -- all of which seemed to suffer from a
similar set of problems when moving to 64-bit.
2014-03-23 11:11:34 -04:00
Andreas Hansson 5093e58dc2 base: Fix error message time unit (cycle -> tick)
This patch fixes the unit used in all error messages.
2014-03-23 11:11:32 -04:00
Nilay Vaish 52a83c1d0e ruby: consumer: avoid accessing wakeup times when waking up
Each consumer object maintains a set of tick values when the object is supposed
to wakeup and do some processing.  As of now, the object accesses this set both
when scheduling a wakeup event and when the object actually wakes up.  The set
is accessed during wakeup to remove the current tick value from the set.  This
functionality is now being moved to the scheduling function where ticks are
removed at a later time.
2014-03-20 09:14:14 -05:00
Nilay Vaish 4b67ada89e ruby: garnet: convert network interfaces into clocked objects
This helps in configuring the network interfaces from the python script and
these objects no longer rely on the network object for the timing information.
2014-03-20 09:14:14 -05:00
Nilay Vaish 4f7ef51efb ruby: slicc: code refactor 2014-03-20 09:14:14 -05:00
Nilay Vaish 9b3418d163 ruby: no piobus in se mode
Piobus was recently added to se scripts for ruby so that the interrupt
controller can be connected to something (required since the interrupt
controller sends address range messages).  This patch removes the piobus
and instead, the pio port of ruby port will now ignore the range change
messages in se mode.
2014-03-20 08:03:09 -05:00
Nilay Vaish f7e7fa6d90 ruby: remove some of the unnecessary code 2014-03-17 17:40:14 -05:00
Andreas Sandberg 11ffa379ab kvm: Clean up signal handling
KVM used to use two signals, one for instruction count exits and one
for timer exits. There is really no need to distinguish between the
two since they only trigger exits from KVM. This changeset unifies and
renames the signals and adds a method, kick(), that can be used to
raise the control signal in the vCPU thread. It also removes the early
timer warning since we do not normally see if the signal was
delivered.

--HG--
extra : rebase_source : cd0e45ca90894c3d6f6aa115b9b06a1d8f0fda4d
2014-03-16 17:40:58 +01:00
Andreas Sandberg 5db547bca4 kvm: x86: Adjust PC to remove the CS segment base address
gem5 seems to store the PC as RIP+CS_BASE. This is not what KVM
expects, so we need to subtract CS_BASE prior to transferring the PC
into KVM. This changeset adds the necessary PC manipulation and
refactors thread context updates slightly to avoid reading registers
multiple times from KVM.

--HG--
extra : rebase_source : 3f0569dca06a1fcd8694925f75c8918d954ada44
2014-03-16 17:30:24 +01:00
Andreas Sandberg f791e7b313 kvm: x86: Add support for x86 INIT and STARTUP handling
This changeset adds support for INIT and STARTUP IPI handling. We
currently handle both of these interrupts in gem5 and transfer the
state to KVM. Since we do not have a BIOS loaded, we pretend that the
INIT interrupt suspends the CPU after reset.

--HG--
extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a
2014-03-16 17:28:23 +01:00
Paul Rosenfeld 32bf74cb8e alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-12 07:03:22 -05:00
Andreas Hansson 62fe81e9c1 cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by
making them const where appropriate.
2014-03-07 15:56:23 -05:00
Geoffrey Blake c4a8e5c36c arm: Handle functional TLB walks properly
The table walker code currently accounts for two types of walks,
Atomic and Timing, and treats them differently. Atomic walks keep a
single instance of WalkerState around for all walks to use in
currState. Timing mode keeps a queue of in-flight WalkerStates and
maintains currState as NULL between walks.

If a functional walk is done during Timing mode, it is treated as an
atomic walk and either creates a persistent WalkerState if in between
Timing walks, or stomps an existing currState for an in-progress
Timing walk.

This patch distinguishes functional walks as being able to exist at
any time and sets up a temporary WalkerState for its exclusive use and
then cleans up when finished, leaving any in progress Atomic or Timing
walks undisturbed.
2014-03-07 15:56:23 -05:00
Prakash Ramrakhyani e88cffb30a mem: Fix incorrect assert failure in the Cache
This patch fixes an assert condition that is not true at all
times. There are valid situations that arise in dual-core
dual-workload runs where the assert condition is false. The function
call following the assert however needs to be called only when the
condition is true (a block cannot be invalidated in the tags structure
if has not been allocated in the structure, and the tempBlock is never
allocated). Hence the 'assert' has been replaced with an 'if'.
2014-03-07 15:56:23 -05:00
Radhika Jagtap c446dc40bd mem: Edit proto Packet and enhance the python script
This patch changes the decode script to output the optional fields of
the proto message Packet, namely id and flags. The flags field is set
by the communication monitor.

The id field is useful for CPU trace experiments, e.g. linking the
fetch side to decode side. It had to be renamed because it clashes
with a built in python function id() for getting the "identity" of an
object.

This patch also takes a few common function definitions out from the
multiple scripts and adds them to a protolib python module.
2014-03-07 15:56:23 -05:00
Stephan Diestelhorst 45677ffa97 misc: Add panic_if / fatal_if / chatty_assert
This snippet can be used to replace if + {panics, fatals, asserts} constructs.
The idea is to have both the condition checking and a verbose printout in a single statement.  The interface is as follows:

panic_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar);
fatal_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar);
chatty_assert(foo == bar, "These should be equal: foo %i bar %i", foo, bar);
2014-03-07 15:56:23 -05:00
Mitch Hayenga b9a9d99b22 scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions.
2014-03-07 15:56:23 -05:00
Stephan Diestelhorst bef2086f5b arm: Fix uninitialised warning with gcc 4.8
Small fix for a warning that prevents compilation with gcc 4.8.1 due
to detecting that a variable might be uninitialised. The fix is to
assign a safe default.
2014-03-07 15:56:23 -05:00
Ali Saidi bf39a475fe mem: Wakeup sleeping CPUs without caches on LLSC
For systems without caches, the LLSC code does not get snoops for
wake-ups. We add the LLSC code in the abstract memory to do the job
for us.
2014-03-07 15:56:23 -05:00
Andreas Sandberg f4a897d8e3 sim: Schedule the global sync event at curTick() + simQuantum
The global synchronization event used to be scheduled at
simQuantum. This prevented repeated entries into gem5 from Python as
it can be scheduled in the past. This changeset ensures that the first
global synchronization happens at curTick() + simQuantum instead.
2014-03-06 15:59:53 +01:00
Andreas Sandberg be246cef62 x86: Setup correct TSL/TR segment attributes on INIT
The TSL/LDT & TR/TSS segments didn't contain valid attributes. This
caused problems when transfering the state into KVM where invalid
state is a no-go. Fixup the attributes with values from AMD's
architecture programmer's manual.
2014-03-03 14:44:57 +01:00
Andreas Sandberg e7d230ede0 kvm: x86: Always assume segments to be usable
When transferring segment registers into kvm, we need to find the
value of the unusable bit. We used to assume that this could be
inferred from the selector since segments are generally unusable if
their selector is 0. This assumption breaks in some weird corner
cases.  Instead, we just assume that segments are always usable. This
is what qemu does so it should work.
2014-03-03 14:34:33 +01:00
Andreas Sandberg 739cc0128b kvm: Initialize signal handlers from startupThread()
Signal handlers in KVM are controlled per thread and should be
initialized from the thread that is going to execute the CPU. This
changeset moves the initialization call from startup() to
startupThread().
2014-03-03 14:31:39 +01:00
Nilay Vaish 5cd9dd29bd ruby: message buffer: changes related to tracking push/pop times
The last pop operation is now tracked as a Tick instead of in Cycles.
This helps in avoiding use of the receiver's clock during the enqueue
operation.
2014-03-01 23:59:58 -06:00
Nilay Vaish 67cd04b6fe ruby: make the max_size variable of the MessageBuffer unsigned 2014-03-01 23:59:57 -06:00
Christopher Torng 919baa603d cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU
A copyRegs() function is added to MIPS utilities
to copy architectural state from the old CPU to
the new CPU during fast-forwarding. This
addition alone enables fast-forwarding for the
o3 cpu model running MIPS.

The patch also adds takeOverFrom() and
drainResume() functions to the InOrderCPU to
enable it to take over from another CPU. This
change enables fast-forwarding for the inorder
cpu model running MIPS, but not for Alpha.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-01 23:35:23 -06:00
Nilay Vaish a533f3f983 ruby: profiler: statically allocate stats variable
Couple of users observed segmentation fault when the simulator tries to
register the statistical variable m_IncompleteTimes.  It seems that there
is some problem with the initialization of these variables when allocated
in the constructor.
2014-03-01 23:35:21 -06:00
Nilay Vaish 7e27860ef4 ruby: route all packets through ruby port
Currently, the interrupt controller in x86 is connected to the io bus
directly.  Therefore the packets between the io devices and the interrupt
controller do not go through ruby.  This patch changes ruby port so that
these packets arrive at the ruby port first, which then routes them to their
destination.  Note that the patch does not make these packets go through the
ruby network.  That would happen in a subsequent patch.
2014-02-23 19:16:16 -06:00
Andreas Hansson 5755fff998 ruby: Simplify RubyPort flow control and routing
This patch simplfies the retry logic in the RubyPort, avoiding
redundant attributes, and enforcing more stringent checks on the
interactions with the normal ports. The patch also simplifies the
routing done by the RubyPort, using the port identifiers instead of a
heavy-weight sender state.

The patch also fixes a bug in the sending of responses from PIO
ports. Previously these responses bypassed the queue in the queued
port, and ignored the return value, potentially leading to response
packets being lost.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-02-23 19:16:16 -06:00
Nilay Vaish 7572ab71b5 ruby: message buffer: refactor code
Code in two of the functions was exactly the same.  This patch moves
this code to a new function which is called from the two functions
mentioned initially.
2014-02-23 19:16:15 -06:00
Nilay Vaish cde20fd476 ruby: remove few not required #includes 2014-02-23 19:16:15 -06:00
Nilay Vaish 82378f7301 ruby: slicc: remove unused COPY_HEAD functionality 2014-02-23 19:16:15 -06:00
Nilay Vaish 13ad07601b ruby: protocols: remove unused action z_stall 2014-02-23 19:16:15 -06:00
Nilay Vaish cd33f9bc42 ruby: network: move message buffers to base network class. 2014-02-21 08:02:05 -06:00
Nilay Vaish bd8f954526 ruby: network: garnet: fixed: removes net_ptr from links 2014-02-21 08:02:04 -06:00
Nilay Vaish 307f53e164 ruby: cache: remove not required variable m_cache_name 2014-02-21 08:02:02 -06:00
Nilay Vaish f8f8b7e5c2 ruby: network: garnet: fixed: removes next cycle functions
At several places, there are functions that take a cycle value as input
and performs some computation.  Along with each such function, another
function was being defined that simply added one more cycle to input and
computed the same function.  This patch removes this second copy of the
function.  Places where these functions were being called have been updated
to use the original function with argument being current cycle + 1.
2014-02-20 17:28:01 -06:00
Nilay Vaish 896654746a ruby: controller: slight code refactoring 2014-02-20 17:27:45 -06:00
Nilay Vaish 0ce8c25919 ruby: mesi three level: rename incorrectly named files
Two files had been incorrectly named with a .cache suffix.

--HG--
rename : src/mem/protocol/MESI_Three_Level-L0.cache => src/mem/protocol/MESI_Three_Level-L0cache.sm
rename : src/mem/protocol/MESI_Three_Level-L1.cache => src/mem/protocol/MESI_Three_Level-L1cache.sm
2014-02-20 17:27:17 -06:00
Nilay Vaish db5b3d37fe ruby: network: removes unused code. 2014-02-20 17:27:07 -06:00
Nilay Vaish dd5c72e5a7 ruby: slicc: slight code refactoring 2014-02-20 17:26:49 -06:00
Nilay Vaish b312a41f21 ruby: message buffer: removes some unecessary functions. 2014-02-20 17:26:41 -06:00
Andreas Sandberg 0d6009e8dc kvm: Add support for multi-system simulation
The introduction of parallel event queues added most of the support
needed to run multiple VMs (systems) within the same gem5
instance. This changeset fixes up signal delivery so that KVM's
control signals are delivered to the thread that executes the CPU's
event queue. Specifically:

  * Timers and counters are now initialized from a separate method
    (startupThread) that is scheduled as the first event in the
    thread-specific event queue. This ensures that they are
    initialized from the thread that is going to execute the CPUs
    event queue and enables signal delivery to the right thread when
    exiting from KVM.

  * The POSIX-timer-based KVM timer (used to force exits from KVM) has
    been updated to deliver signals to the thread that's executing KVM
    instead of the process (thread is undefined in that case). This
    assumes that the timer is instantiated from the thread that is
    going to execute the KVM vCPU.

  * Signal masking is now done using pthread_sigmask instead of
    sigprocmask. The behavior of the latter is undefined in threaded
    applications.

  * Since signal masks can be inherited, make sure to actively unmask
    the control signals when setting up the KVM signal mask.

There are currently no facilities to multiplex between multiple KVM
CPUs in the same event queue, we are therefore limited to
configurations where there is only one KVM CPU per event queue. In
practice, this means that multi-system configurations can be
simulated, but not multiple CPUs in a shared-memory configuration.
2014-02-20 15:43:53 +01:00
Andreas Hansson 4b81585c49 mem: Fix bug in PhysicalMemory use of mmap and munmap
This patch fixes a bug in how physical memory used to be mapped and
unmapped. Previously we unmapped and re-mapped if restoring from a
checkpoint. However, we never checked that the new mapping was
actually the same, it was just magically working as the OS seems to
fairly reliably give us the same chunk back. This patch fixes this
issue by relying entirely on the mmap call in the constructor.
2014-02-18 05:51:01 -05:00
Andreas Hansson f0ea79c41f dev: Include basic devices in NULL ISA build
This patch enbles use of the basic PIO devices as part of the NULL
build. Although it might seem counter intuitive to have a PIO device
without being able to execute a driver, this change enables us to
break a device class hierarchy into an ISA-agnostic part, and an
ISA-specific part, without requiring multiple-inheritance. The
ISA-agnostic base class is a PIO device, but does not make use of the
port.
2014-02-18 05:50:59 -05:00
Andreas Hansson 969b436243 mem: Filter cache snoops based on address ranges
This patch adds a filter to the cache to drop snoop requests that are
not for a range covered by the cache. This fixes an issue observed
when multiple caches are placed in parallel, covering different
address ranges. Without this patch, all the caches will forward the
snoop upwards, when only one should do so.
2014-02-18 05:50:58 -05:00
Andreas Hansson bf2f178f85 mem: Add a wrapped DRAMSim2 memory controller
This patch adds DRAMSim2 as a memory controller by wrapping the
external library and creating a sublass of AbstractMemory that bridges
between the semantics of gem5 and the DRAMSim2 interface.

The DRAMSim2 wrapper extracts the clock period from the config
file. There is no way of extracting this information from DRAMSim2
itself, so we simply read the same config file and get it from there.

To properly model the response queue, the wrapper keeps track of how
many transactions are in the actual controller, and how many are
stacking up waiting to be sent back as responses (in the wrapper). The
latter requires us to move away from the queued port and manage the
packets ourselves. This is due to DRAMSim2 not having any flow control
on the response path.

DRAMSim2 assumes that the transactions it is given are matching the
burst size of the choosen memory. The wrapper checks to ensure the
cache line size of the system matches the burst size of DRAMSim2 as
there are currently no provisions to split the system requests. In
theory we could allow a cache line size smaller than the burst size,
but that would lead to inefficient use of the DRAM, so for not we
fatal also in this case.
2014-02-18 05:50:53 -05:00
Andreas Hansson c9cb492e1c mem: Fix input to DPRINTF in CommMonitor
Minor fix of the debug message parameters.
2014-02-18 05:50:51 -05:00
Andreas Sandberg c52190a695 cpu: simple: Add support for using branch predictors
This changesets adds branch predictor support to the
BaseSimpleCPU. The simple CPUs normally don't need a branch predictor,
however, there are at least two cases where it can be desirable:

  1) A simple CPU can be used to warm the branch predictor of an O3
     CPU before switching to the slower O3 model.

  2) The simple CPU can be used as a quick way of evaluating/debugging
     new branch predictors since it exposes branch predictor
     statistics.

Limitations:
  * Since the simple CPU doesn't speculate, only one instruction will
    be active in the branch predictor at a time (i.e., the branch
    predictor will never see speculative branches).

  * The outcome of a branch prediction does not affect the performance
    of the simple CPU.
2014-02-09 20:49:28 +01:00
Nilay Vaish eb73a14fe2 base: calls abort() from fatal
Currently fatal() ends the simulation in a normal fashion.  This results in
the call stack getting lost when using a debugger and it is not always
possible to debug the simulation just from the information provided by the
printed error message.  Even though the error is likely due to a user's fault,
the information available should not be thrown away.  Hence, this patch to
call abort() from fatal().
2014-02-06 16:30:13 -06:00
Nilay Vaish bb0e9119e7 ruby: memory controller: use MemoryNode * 2014-02-06 16:30:12 -06:00
Andreas Sandberg e76a37985f x86: Fix x87 state transfer bug
Changeset 7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.

This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.
2014-02-05 14:08:13 +01:00
Nikos Nikoleris c6279f2d19 x86, kvm: Fix bug in the RFlags get and set functions
The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.

Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>
2014-02-02 16:37:35 +01:00
Ola Jeppsson 7f16951451 unittest: Fix build errors
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-30 12:21:58 -06:00
Mitch Hayenga 96317d466e mem: Add additional tolerance to stride prefetcher
Forces the prefetcher to mispredict twice in a row before resetting the
confidence of prefetching.  This helps cases where a load PC strides by a
constant factor, however it may operate on different arrays at times.
Avoids the cost of retraining.  Primarily helps with small iteration loops.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga 771c864bf4 mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches.  However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train).  In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher.  This patch enables this option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 23:21:26 -06:00
Mitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E) 95735e10e7 mem: prefetcher: add options, support for unaligned addresses
This patch extends the classic prefetcher to work on non-block aligned
addresses.  Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected.  For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern.  This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.

It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.

Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7%  for SPECFP (geomean).
2014-01-29 23:21:25 -06:00
Xiangyu Dong 32cc2ea8b9 cpu: fix bug when TrafficGen deschedules event
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-29 22:35:04 -06:00
Mitch Hayenga b77ca57f8c arm: Enable umask syscall in SE mode
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:51 -06:00
Mitch Hayenga 55a4ff5f04 base: Fix race condition in the socket listen function
gem5 makes the incorrect assumption that by binding a socket, it
effectively has allocated a port. Linux only allocates ports once you call
listen on the given socket, not when you call bind.  So even if the port was
free when bind was called, another process (gem5 instance) could race in
between the bind & listen calls and steal the port. In the current code, if
the call to bind fails due to the port being in use (EADDRINUSE), gem5 retries
for a different port.  However if listen fails, gem5 just panics. The fix is
testing the return value of listen and re-trying if it was due to EADDRINUSE.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:51 -06:00
Amin Farmahini ffbdaa7cce mem: Remove redundant findVictim() input argument
The patch
(1) removes the redundant writeback argument from findVictim()
(2) fixes the description of access() function

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:50 -06:00
Amin Farmahini 575a73f4a1 mem: Fixes a bug in simple_dram write merging
Fixes updating the value of size in the write merge function.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-28 18:00:49 -06:00
Nilay Vaish bdee69d0b1 x86: use lfpimm instead of limm for fptan 2014-01-27 18:50:54 -06:00
Nilay Vaish 6a543b5134 x86: implements x87 add/sub instructions 2014-01-27 18:50:53 -06:00
Nilay Vaish 5be0b846b1 x86: implements fxch instruction. 2014-01-27 18:50:52 -06:00
Nilay Vaish 4eb3b1ed0b x86: correct error in emms instruction. 2014-01-27 18:50:51 -06:00
ARM gem5 Developers 612f8f074f arm: Add support for ARMv8 (AArch64 & AArch32)
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli    (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt       (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole           (AArch64 NEON, validation)
Ali Saidi            (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang         (AArch64 Linux support)
Rene De Jong         (AArch64 Linux support, performance opt.)
Matt Horsnell        (AArch64 MP, validation)
Matt Evans           (device models, code integration, validation)
Chris Adeniyi-Jones  (AArch64 syscall-emulation)
Prakash Ramrakhyani  (validation)
Dam Sunwoo           (validation)
Chander Sudanthi     (validation)
Stephan Diestelhorst (validation)
Andreas Hansson      (code integration, performance opt.)
Eric Van Hensbergen  (performance opt.)
Gabe Black
2014-01-24 15:29:34 -06:00
Andreas Hansson cfc4a99982 arch: Make all register index flattening const
This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
2014-01-24 15:29:30 -06:00
Geoffrey Blake 9633282fc8 checker: CheckerCPU handling of MiscRegs was incorrect
The CheckerCPU model in pre-v8 code was not checking the
updates to miscellaneous registers due to some methods
for setting misc regs were not instrumented.  The v8 patches
exposed this by calling the instrumented misc reg update
methods and then invoking the checker before the main CPU had
updated its misc regs, leading to false positives about
register mismatches. This patch fixes the non-instrumented
misc reg update methods and places calls to the checker in
the proper places in the O3 model.
2014-01-24 15:29:30 -06:00
Ali Saidi 7d0344704a arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id  results in accessing different
registers depending on the current mode of the processor. This patch adds
the same orthogonality to the misc register file as the others (int, float, cc).
For all the othre ISAs this is currently a null-implementation.

Additionally, a system variable is added to all the ISA objects.
2014-01-24 15:29:30 -06:00
Giacomo Gabrielli 3436de0c2a cpu: Add support for Memory+Barrier instruction types in O3 cpu. 2014-01-24 15:29:30 -06:00
Ali Saidi 90b1775a8f cpu: Add support for instructions that zero cache lines. 2014-01-24 15:29:30 -06:00
Ali Saidi 6bed6e0352 cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
2014-01-24 15:29:30 -06:00
Giacomo Gabrielli d3444c6603 mem: Add flag to request if it was generated by a page table walk 2014-01-24 15:29:30 -06:00
Giacomo Gabrielli aefe9cc624 mem: Add support for a security bit in the memory system
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
2014-01-24 15:29:30 -06:00
Chris Adeniyi-Jones 7f835a59f1 sim: Add openat/fstatat syscalls and fix mremap
This patch adds support for the openat and fstatat syscalls and
broadens the support for mremap to make it work on OS X.
2014-01-24 15:29:30 -06:00
Ali Saidi 904872a01a mem: Remove explict cast from memhelper.
Previously we were casting the result type to the the memory type which
is incorrect for things like dual-memory operations which still return a
single result.
2014-01-24 15:29:30 -06:00
Timothy M. Jones 427ceb57a9 Cache: Collect very basic stats on tag and data accesses
Adds very basic statistics on the number of tag and data accesses within the
cache, which is important for power modelling.  For the tags, simply count
the associativity of the cache each time.  For the data, this depends on
whether tags and data are accessed sequentially, which is given by a new
parameter.  In the parallel case, all data blocks are accessed each time, but
with sequential accesses, a single data block is accessed only on a hit.
2014-01-24 15:29:30 -06:00
Dam Sunwoo 85e8779de7 mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks.  Cache occupancy stats are
recalculated on each stat dump.
2014-01-24 15:29:30 -06:00
Matt Horsnell 739c6df94e base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
  the regProbeListeners is called on each SimObject.  this hooks up the probe
  point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
  about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
  type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
  1:1, 1:N, N:M relationship. They become useful when a number of modules
  listen to the same probe points. The idea being that you can add a small
  number of probes into the source code and develop a larger number of useful
  analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
  module (outputting assembler), you could re-use this to gather instruction
  distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
  functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
  relatively minor impact. Profiling has suggested even with a large number of
  probes (60) the impact of them (when not active) is very minimal (<1%).
2014-01-24 15:29:30 -06:00
Andreas Hansson 4de69821e6 sim: Expose the current voltage for each object as a stat 2014-01-24 15:29:30 -06:00
Andreas Hansson 1d85e914a6 sim: Expose the current clock period as a stat
This patch adds observability to the clock period of the clock domains
by including it as a stat.

As a result of adding this, the regressions will be updated in a
separate patch.
2014-01-24 15:29:30 -06:00
Matt Horsnell ca89eba79e mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
2014-01-24 15:29:30 -06:00
Andreas Hansson daa781d2db config: Make the Clock a Tick parameter like Latency/Frequency
This patch makes the Clock a TickParamValue just like
Latency/Frequency. There is no longer any need to distinguish it
(originally needed to support multiplication).
2014-01-24 15:29:29 -06:00
Andreas Hansson f2b0b551cc x86: Fix memory leak in table walker
This patch fixes a memory leak in the table walker, by ensuring that
the sender state is deleted again if the request packet cannot be
successfully sent.
2014-01-24 15:29:29 -06:00
Andreas Hansson 7db542c0dd cpu: Relax check on squashed non-speculative instructions
This patch relaxes the check performed when squashing non-speculative
instructions, as it caused problems with loads that were marked ready,
and then stalled on a blocked cache. The assertion is now allowing
memory references to be non-faulting.
2014-01-24 15:29:29 -06:00
Dam Sunwoo f1cd6b1ba8 cpu: remove faulty simpoint basic block inst count assertion
This patch removes an assertion in the simpoint profiling code that
asserts that a previously-seen basic block has the exact same number
of instructions executed as before. This can be false if the basic
block generates aborts or takes interrupts at different locations
within the basic block. The basic block profiling are not affected
significantly as these events are rare in general.
2014-01-24 15:29:29 -06:00
Nilay Vaish 37433d91a3 ruby: remove unused label no_vector 2014-01-17 11:02:15 -06:00
Nilay Vaish 407f37e15f ruby: move all statistics to stats.txt, eliminate ruby.stats 2014-01-10 16:19:47 -06:00
Nilay Vaish cfe912a512 stats: add function for adding two histograms
This patch adds a function to the HistStor class for adding two histograms.
This functionality is required for Ruby.  It also adds support for printing
histograms in a single line.
2014-01-10 16:19:40 -06:00
Nilay Vaish 0387281e2a ruby: fix bug introduced to revision 8523754f8885 2014-01-09 10:45:50 -06:00
Nilay Vaish 8559081648 ruby: slicc: remove variable 'addr' used in calls to doTransition
This variable causes trouble if a variable of same name is declared in a
protocol file. Hence it is being eliminated.
2014-01-08 04:26:25 -06:00
Nilay Vaish 4070b00875 ruby: add a three level MESI protocol.
The first two levels (L0, L1) are private to the core, the third level (L2)is
possibly shared. The protocol supports clustered designs.  For example, one
can have two sets of two cores. Each core has an L0 and L1 cache. There are
two L2 controllers where each set accesses only one of the L2 controllers.
2014-01-04 00:03:34 -06:00
Nilay Vaish bb6d7d402b ruby: rename MESI_CMP_directory to MESI_Two_Level
This is because the next patch introduces a three level hierarchy.

--HG--
rename : build_opts/ALPHA_MESI_CMP_directory => build_opts/ALPHA_MESI_Two_Level
rename : build_opts/X86_MESI_CMP_directory => build_opts/X86_MESI_Two_Level
rename : configs/ruby/MESI_CMP_directory.py => configs/ruby/MESI_Two_Level.py
rename : src/mem/protocol/MESI_CMP_directory-L1cache.sm => src/mem/protocol/MESI_Two_Level-L1cache.sm
rename : src/mem/protocol/MESI_CMP_directory-L2cache.sm => src/mem/protocol/MESI_Two_Level-L2cache.sm
rename : src/mem/protocol/MESI_CMP_directory-dir.sm => src/mem/protocol/MESI_Two_Level-dir.sm
rename : src/mem/protocol/MESI_CMP_directory-dma.sm => src/mem/protocol/MESI_Two_Level-dma.sm
rename : src/mem/protocol/MESI_CMP_directory-msg.sm => src/mem/protocol/MESI_Two_Level-msg.sm
rename : src/mem/protocol/MESI_CMP_directory.slicc => src/mem/protocol/MESI_Two_Level.slicc
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/config.ini => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simerr => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simout => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simout
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/system.pc.com_1.terminal => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/system.pc.com_1.terminal
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simout
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simout
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simerr => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simout => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simout
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simerr => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simout => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simout
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/stats.txt
2014-01-04 00:03:33 -06:00
Nilay Vaish 5b1804e3bd ruby: add support for clusters
A cluster over here means a set of controllers that can be accessed only by a
certain set of cores.  For example,  consider a two level hierarchy. Assume
there are 4 L1 controllers (private) and 2 L2 controllers.  We can have two
different hierarchies here:

a. the address space is partitioned between the two L2 controllers.  Each L1
controller accesses both the L2 controllers.  In this case, each L1 controller
is a cluster initself.

b. both the L2 controllers can cache any address.  An L1 controller has access
to only one of the L2 controllers.  In this case, each L2 controller
along with the L1 controllers that access it, form a cluster.

This patch allows for each controller to have a cluster ID, which is 0 by
default.  By setting the cluster ID properly,  one can instantiate hierarchies
with clusters.  Note that the coherence protocol might have to be changed as
well.
2014-01-04 00:03:31 -06:00
Nilay Vaish 9853ef6651 ruby: some small changes 2014-01-04 00:03:30 -06:00
Steve Reinhardt d8c9b5431b python: provide better error message for wrapped C++ methods
If you successfully export a C++ SimObject method, but try to
invoke it from Python before the C++ object is created, you
get a confusing error that says the attribute does not exist,
making you question whether you successfully exported the
method at all.  In reality, your only problem is that you're
calling the method too soon.  This patch enhances the error
message to give you a better clue.
2014-01-03 17:08:43 -08:00
Steve Reinhardt ba9ec669bc python: don't die on assignment to cloned object
Updating the SimObject topology of a cloned hierarchy is a little
dangerous, in that cloning is a "deep copy" and the clone does not
inherit SimObject updates the same way it would inherit scalar
variable assignments.

However, because of various SimObject-valued proxy parameters,
like 'memories', 'clk_domain', and 'system', it turns out that
there are a number of implicit topology changes that happen at
instantiation, which means that these changes are impossible to
avoid.  So in order to make cloning systems useful, this error
has to go.  Changing it to a warning produces a lot of noise,
so it seems best just to delete it.
2014-01-03 17:08:42 -08:00
Christopher Torng b4b03a60b1 sim: Add support for dynamic frequency scaling
This patch provides support for DFS by having ClockedObjects register
themselves with their clock domain at construction time in a member list.
Using this list, a clock domain can update each member's tick to the
curTick() before modifying the clock period.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-12-29 19:29:45 -06:00
Christopher Torng 903b442228 mips: Floating point convert bug fix
In mips architecture, floating point convert instructions use the
FloatConvertOp format defined in src/arch/mips/isa/formats/fp.isa. The type
of the operands in the ISA description file (_sw for signed word, or _sf for
signed float, etc.) is  used to create a type for the operand in C++. Then the
operand is converted using the fpConvert() function in src/arch/mips/utility.cc.

If we are converting from a word to a float, and we want to convert 0xffffffff,
we expect -1 to be passed into fpConvert(). Instead, we see MAX_INT passed in.
Then fpConvert() converts _val_ to MAX_INT in single-precision floating point,
and we get the wrong value.

To fix it, the signs of the convert operands are being changed from unsigned to
signed in the MIPS ISA description.

Then, the FloatConvertOp format is being changed to insert a int32_t into the
C++ code instead of a uint32_t.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-12-29 19:29:45 -06:00
Nilay Vaish d71311b1cf ruby: fix bugs in mesi cmp directory protocol
This patch fixes couple of bugs in the L2 controller of the mesi cmp
directory protocol.

1. The state MT_I was transitioning to NP on receiving a clean writeback
from the L1 controller.  This patch makes it inform the directory controller
about the writeback.

2. The L2 controller was sending the dirty bit to the L1 controller and the
L2 controller used writeback from the L1 controller to update the dirty bit
unconditionally.  Now, the L1 controller always assumes that the incoming
data is clean.  The L2 controller updates the dirty bit only when the L1
controller writes to the block.

3. Certain unused functions and events are being removed.
2013-12-26 15:18:55 -06:00
Nilay Vaish fc53f9ffcc ruby: slicc: replace max_in_port_rank with number of inports
This patch replaces max_in_port_rank with the number of inports.  The use of
max_in_port_rank was causing spurious re-builds and incorrect initialization
of variables in ruby related regression tests.  This was due to the variable
value being used across threads while compiling when it was not meant to be.

Since the number of inports is state machine specific value, this problem
should get solved.
2013-12-20 20:34:04 -06:00
Nilay Vaish 30b259a31e ruby: declare variables to be unsigned in Address.hh 2013-12-20 20:34:03 -06:00
Nilay Vaish f5b52a265a ruby: mesi: remove owner and sharer fields from directory tags
The directory controller should not have the sharer field since there is
only one level 2 cache. Anyway the field was not in use.  The owner field
was being used to track the l2 cache version (in case of distributed l2) that
has the cache block under consideration.  The information is not required
since the version of the level 2 cache can be obtained from a subset of the
address bits.
2013-12-20 20:34:03 -06:00
Nilay Vaish 50d250f514 sim: reset stats after startup
Currently statistics are reset after the initial / checkpoint state
has been loaded. But ruby does some checkpoint processing in its
startup() function. So the stats need to be reset after the startup()
function has been called. This patch moves the class to stats.reset()
to achieve this change in functionality.
2013-12-03 10:51:40 -06:00
Nilay Vaish 5800e83223 cpu: call BaseCPU startup() function in o3 cpu 2013-12-03 10:36:04 -06:00
Andreas Sandberg c033ead992 base: Fix race in PollQueue and remove SIGALRM workaround
There is a race between enabling asynchronous IO for a file descriptor
and IO events happening on that descriptor. A SIGIO won't normally be
delivered if an event is pending when asynchronous IO is
enabled. Instead, the signal will be raised the next time there is an
event on the FD. This changeset simulates a SIGIO by setting the
async_io flag when setting up asynchronous IO for an FD. This causes
the main event loop to poll all file descriptors to check for pending
IO. As a consequence of this, the old SIGALRM hack should no longer be
needed and is therefore removed.
2013-11-29 14:36:10 +01:00
Andreas Sandberg 9c57d5b5a6 base: Clean up signal handling
The PollEvent class dynamically installs a SIGIO and SIGALRM handler
when a file handler is registered. Most signal handlers currently get
registered in the initSignals() function. This changeset moves the
SIGIO/SIGALRM handlers to initSignals() to live with the other signal
handlers. The original code installs SIGIO and SIGALRM with the
SA_RESTART option to prevent syscalls from returning EINTR. This
changeset consistently uses this flag for all signal handlers to
ensure that other signals that trigger asynchronous behavior (e.g.,
statistics dumping) do not cause undesirable EINTR returns.
2013-11-29 14:35:36 +01:00
Nilay Vaish 9fb93e5cd2 sim: correct ticksToCycles() function. 2013-11-26 17:05:22 -06:00
Andreas Sandberg 4b8be6a90b kvm: Set the perf exclude_host attribute if available
The performance counting framework in Linux 3.2 and onwards supports
an attribute to exclude events generated by the host when running
KVM. Setting this attribute allows us to get more reliable
measurements of the guest machine. For example, on a highly loaded
system, the instruction counts from the guest can be severely
distorted by the host kernel (e.g., by page fault handlers).

This changeset introduces a check for the attribute and enables it in
the KVM CPU if present.
2013-10-15 10:09:23 +02:00
Christian Menard d4f205ea2f x86: Implementation of Int3 and Int_Ib in long mode
This is an implementation of the x86 int3 and int immediate
instructions for long mode according to 'AMD64 Programmers Manual
Volume 3'.
2013-11-26 17:51:07 +01:00
Andreas Sandberg e5d63d0535 kvm: Remove the unused hostFreq member from BaseKvmCPU 2013-11-26 17:40:58 +01:00
Steve Reinhardt ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E%2C%20Ali%20Saidi%20%3CAli.Saidi%40ARM.com%3E) de366a16f1 sim: simulate with multiple threads and event queues
This patch adds support for simulating with multiple threads, each of
which operates on an event queue.  Each sim object specifies which eventq
is would like to be on.  A custom barrier implementation is being added
using which eventqs synchronize.

The patch was tested in two different configurations:
1. ruby_network_test.py: in this simulation L1 cache controllers receive
   requests from the cpu. The requests are replied to immediately without
   any communication taking place with any other level.
2. twosys-tsunami-simple-atomic: this configuration simulates a client-server
   system which are connected by an ethernet link.

We still lack the ability to communicate using message buffers or ports. But
other things like simulation start and end, synchronizing after every quantum
are working.

Committed by: Nilay Vaish
2013-11-25 11:21:00 -06:00
Anthony Gutierrez 8a53da22c2 cpu: allow the fetch buffer to be smaller than a cache line
the current implementation of the fetch buffer in the o3 cpu
is only allowed to be the size of a cache line. some
architectures, e.g., ARM, have fetch buffers smaller than a cache
line, see slide 22 at:
http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf

this patch allows the fetch buffer to be set to values smaller
than a cache line.
2013-11-15 13:21:15 -05:00
Andreas Hansson f028da7af7 cpu: Fix Checker register index use
This patch fixes an issue in the checker CPU register indexing. The
code will not even compile using LTO as deep inlining causes the used
index to be outside the array bounds.
2013-11-15 03:47:10 -05:00
Steve Reinhardt a2c21d47a8 tests: suppress output on switcheroo tests
The output from the switcheroo tests is voluminous and
(because it includes timestamps) highly sensitive to
minor changes, leading to extremely large updates to the
reference outputs.  This patch addresses this problem
by suppressing output from the tests.  An internal
parameter can be set to enable the output.  Wiring that
up to a command-line flag (perhaps even the rudimantary
-v/-q options in m5/main.py) is left for future work.
2013-11-14 15:03:42 -08:00
Anthony Gutierrez 99d6c3b7e0 sim: fix event priority name for debug-start option 2013-11-12 11:46:48 -05:00
Andreas Hansson 460cc77d6d mem: Fixes for DRAM stats accounting
This patch fixes a number of stats accounting issues in the DRAM
controller. Most importantly, it separates the system interface and
DRAM interface so that it is clearer what the actual DRAM bandwidth
(and consequently utilisation) is.
2013-11-01 11:56:31 -04:00
Andreas Hansson ce93982cc6 mem: Fix the LPDDR3 page size
This patch corrects the LPDDR3 page size, which was set too low.
2013-11-01 11:56:30 -04:00
Neha Agarwal 5c486908d7 mem: Adding stats for DRAM power calculation
This patch adds stats which are used for offline power calculation
from the 'Micron Power Calculator' spreadsheet.
2013-11-01 11:56:28 -04:00
Neha Agarwal 77fce1ce0e mem: Unify request selection for read and write queues
This patch unifies the request selection across read and write queues
for FR-FCFS scheduling policy. It also fixes the request selection
code to prioritize the row hits present in the request queues over the
selection based on earliest bank availability.
2013-11-01 11:56:27 -04:00
Andreas Hansson bb572663cf mem: Add a simple adaptive version of the open-page policy
This patch adds a basic adaptive version of the open-page policy that
guides the decision to keep open or close by looking at the contents
of the controller queues. If no row hits are found, and bank conflicts
are present, then the row is closed by means of an auto
precharge. This is a well-known technique that should improve
performance in most use-cases.
2013-11-01 11:56:26 -04:00
Neha Agarwal da6fd72f62 mem: Just-in-time write scheduling in DRAM controller
This patch removes the untimed while loop in the write scheduling
mechanism and now schedule commands taking into account the minimum
timing constraint. It also introduces an optimization to track write
queue size and switch from writes to reads if the number of write
requests fall below write low threshold.
2013-11-01 11:56:25 -04:00
Andreas Hansson ee6b41a1e4 mem: Add tRRD as a timing parameter for the DRAM controller
This patch adds the tRRD parameter to the DRAM controller. With the
recent addition of the actAllowedAt member for each bank, this
addition is trivial.
2013-11-01 11:56:24 -04:00
Andreas Hansson 491d3a77cf mem: Less conservative tRAS in DRAM configurations
This patch changes the default values of the tRAS timing parameter to
be less conservative, and closer in line with existing parts.
2013-11-01 11:56:23 -04:00
Ani Udipi 8bc855fa15 mem: Make tXAW enforcement less conservative and per rank
This patch changes the tXAW constraint so that it is enforced per rank
rather than globally for all ranks in the channel. It also avoids
using the bank freeAt to enforce the activation limit, as doing so
also precludes performing any column or row command to the
DRAM. Instead the patch introduces a new variable actAllowedAt for the
banks and use this to track when a potential activation can occur.
2013-11-01 11:56:22 -04:00
Neha Agarwal 7645c8e611 mem: Fix for 100% write threshold in DRAM controller
This patch fixes the controller when a write threshold of 100% is
used.  Earlier for 100% write threshold no data is written to memory
as writes never get triggered since this corner case is not
considered.
2013-11-01 11:56:21 -04:00
Andreas Hansson 10e8978ec0 mem: Pick the next DRAM request based on bank availability
This patch changes the FCFS bit of FR-FCFS such that requests that
target the earliest available bank are picked first (as suggested in
the original work on FR-FCFS by Rixner et al). To accommodate this we
add functionality to identify a bank through a one-dimensional
identifier (bank id). The member names of the DRAMPacket are also
update to match the style guide.
2013-11-01 11:56:20 -04:00
Ani Udipi ea76f97576 mem: Use the same timing calculation for DRAM read and write
This patch simplifies the DRAM model by re-using the function that
computes the busy and access time for both reads and writes.
2013-11-01 11:56:19 -04:00
Ani Udipi 655bf86828 mem: Fix DRAM bank occupancy for streaming access
This patch fixes an issue that allowed more than 100% bus utilisation
in certain cases.
2013-11-01 11:56:18 -04:00
Ani Udipi be62a142cf mem: Schedule time for DRAM event taking tRAS into account
This patch changes the time the controller is woken up to take the
next scheduling decisions. tRAS is now handled in estimateLatency and
doDRAMAccess and we do not need to worry about it at scheduling
time. The earliest we need to wake up is to do a pre-charge, row
access and column access before the bus becomes free for use.
2013-11-01 11:56:17 -04:00
Ani Udipi d4cf009b95 mem: Add tRAS parameter to the DRAM controller model
This patch adds an explicit tRAS parameter to the DRAM controller
model. Previously tRAS was, rather conservatively, assumed to be tRCD
+ tCL + tRP. The default values for tRAS are chosen to match the
previous behaviour and will be updated later.
2013-11-01 11:56:16 -04:00
Andreas Hansson c9a8b7b147 sim: Clarify the difference between tracing and debugging
This patch changes the name the command-line options related to debug
output to all start with "debug" rather than being a mix of that and
"trace". It also makes it clear that the breakpoint time is specified
in ticks and not in cycles.
2013-11-01 11:56:13 -04:00
Chander Sudanthi 3e6da89419 ARM: add support for TEEHBR access
Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier
in arch/arm/kernel/thumbee.c.  The Linux kernel code just seems
to be saving and restoring the register.  This patch adds support
for the TEEHBR cp14 register.  Note, this may be a special case
when restoring from an image that was run on a system that
supports ThumbEE.
2013-10-31 13:41:13 -05:00
Matt Evans d17529b046 dev: Add 'OSC' oscillator sys control reg support to VersatileExpress
The VE motherboard provides a set of system control registers through which
various motherboard and coretile registers are accessed.  Voltage regulators and
oscillator (DLL/PLL) config are examples. These registers must be impleted to
boot Linux 3.9+ kernels.
2013-10-31 13:41:13 -05:00
Geoffrey Blake c32fbb7c00 dev: Add support for MSI-X and Capability Lists for ARM and PCI devices
This patch adds the registers and fields to the PCI device to support
Capability lists and to support MSI-X in the GIC.
2013-10-31 13:41:13 -05:00
Geoffrey Blake be4aa2b6ba dev: Fix race conditions in IDE device on newer kernels
Newer linux kernels and distros exercise more functionality in the IDE device
than previously, exposing 2 races. The first race is the handling of aborted
DMA commands would immediately report the device is ready back to the kernel
and cause already in flight commands to assert the simulator when they returned
and discovered an inconsitent device state.  The second race was due to the
Status register not being handled correctly, the interrupt status bit would get
stuck at 1 and the driver eventually views this as a bad state and logs the
condition to the terminal.  This patch fixes these two conditions by making the
device handle aborted commands gracefully and properly handles clearing the
interrupt status bit in the Status register.
2013-10-31 13:41:13 -05:00
Geoffrey Blake fb0496498d base: Add support for ipv6 into inet.hh/inet.cc 2013-10-31 13:41:13 -05:00
Faissal Sleiman 397dc784fd cpu: Construct ROB with cpu params struct instead of each variable
Most other structures/stages get passed the cpu params struct.
2013-10-31 13:41:13 -05:00
Geoffrey Blake 15938e0492 config: Fix handling of parents for simobject vectors
SimObjectVector objects did not provide the same interface to
the _parent attribute through get_parent() like a normal
SimObject.  It also handled assigning a _parent incorrectly
if objects in a SimObjectVector were changed post-creation,
leading to errors later when the simulator tried to execute.
This patch fixes these two omissions.
2013-10-31 13:41:13 -05:00
Dam Sunwoo 6b4543184e sim: added option to serialize SimLoopExitEvent
SimLoopExitEvents weren't serialized by default. Some benchmarks
utilize a delayed m5 exit pseudo op call to terminate the simulation
and this event was lost when resuming from a checkpoint generated
after the pseudo op call. This patch adds the capability to serialize
the SimLoopExitEvents and enable serialization for m5_exit and m5_fail
pseudo ops by default. Does not affect other generic
SimLoopExitEvents.
2013-10-31 13:41:13 -05:00
Stephan Diestelhorst 19c2a606fa mem: Add "const" attribute to Packet getters
Add a "const" keywords to the getters in the Packet class so these can be
invoked on const Packet objects.
2013-10-31 13:41:13 -05:00
Prakash Ramrakhyani 885656f2ed mem: Add privilege info to request class
This patch adds a flag in the request class that indicates if the request
was made in privileged mode.
2013-10-31 13:41:13 -05:00
Ali Saidi 79f81e2641 cpu: Fix O3 issuse with load+barrier instructions.
Fix a problem in the O3 CPU for instructions that are both
memory loads and memory barriers (e.g. load acquire) and
to uncacheable memory. This combination can confuse the
commit stage into commitng an instruction that hasn't
executed and got it's value yet. At the same time refactor
the code slightly to remove duplication between two of
the cases.
2013-10-31 13:41:13 -05:00
Lluc Alvarez 2b9b245fb3 ruby: set SenderMachine in messages of MOESI_CMP_directory
This patch adds missing initializations of the SenderMachine field of
out_msg's when thery are created in the L2 cache controller of the
MOESI_CMP_directory coherence protocol. When an out_msg is created and this
field is left uninitialized, it is set to the default value MachineType_NUM.
This causes a panic in the MachineType_to_string function when gem5 is
executed with the Ruby debug flag on and it tries to print the message.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-10-30 10:35:06 -05:00
Emilio Castillo 80fa6a0edc ruby: Fixed a deadlock when restoring a checkpoint with garnet
This patch fixes a problem where in Garnet, the enqueue time in the
VCallocator and the SWallocator which is of type Cycles was being stored
inside a variable with int type.

This lead to a known problem restoring checkpoints with garnet & the fixed
pipeline enabled. That value was really big and didn't fit in the variable
overflowing it, therefore some conditions on the VC allocation stage & the
SW allocation stage were not met and the packets didn't advance through the
network, leading to a deadlock panic right after the checkpoint was restored.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-10-30 10:35:05 -05:00
Stephan Diestelhorst 4e9d91016a mem: De-virtualise interfaces in the CoherentBus
The CoherentBus eventually got virtual methods for its interface. The
"virtuality" of the CoherentBus, however, comes already from the virtual
interface of the bus' ports. There is no need to add another layer of virtual
functions, here.
2013-10-17 10:20:45 -05:00
Matt Horsnell 6decd70bfb cpu: add consistent guarding to *_impl.hh files. 2013-10-17 10:20:45 -05:00
Sascha Bischoff 52f90890a3 mem: Add PortID to QueuedMasterPort constructor
This patch adds the PortID to the QueuedMasterPort. This allows a PortID to be
specified as it previously was set to the detault value of -1.
2013-10-17 10:20:45 -05:00
Matt Evans 94d17a547c arm: Add a 'clear PPI' method to gic_pl390
The underlying assumption that all PPIs must be edge-triggered is
strained when the architected timers and VGIC interfaces make
level-behaviour observable. For example, a virtual timer interrupt
'goes away' when the hypervisor is entered and the vtimer is disabled;
this requires a PPI to be de-activated.

The new method simply clears the interrupt pending state.
2013-10-17 10:20:45 -05:00
Geoffrey Blake 2b9138135e config: Fix ommission of number base in ethernet address param
The ethernet address param tries to convert a hexadecimal
string using int() in python, which defaults to base 10,
need to specify base 16 in this case.
2013-10-17 10:20:45 -05:00
Geoffrey Blake 3d582c767a config: Fix for port references generated multiple times
SimObjects are expected to only generate one port reference per
port belonging to them.  There is a subtle bug with using "not"
here as a VectorPort is seen as not having a reference if it is
either None or empty as per Python docs sec 9.9 for Standard operators.
Intended behavior is to only check if we have not created the reference.
2013-10-17 10:20:45 -05:00
Dam Sunwoo ad614bf24d dev: Add option to disable framebuffer .bmp dump in run folder
There is an option to enable/disable all framebuffer dumps, but the
last frame always gets dumped in the run folder with no other way to
disable it. These files can add up very quickly running many experiments.

This patch adds an option to disable them. The default behavior
remains unchanged.
2013-10-17 10:20:45 -05:00
Faissal Sleiman 1746eb4a11 cpu: Removing an unused variable in rename 2013-10-17 10:20:45 -05:00
Faissal Sleiman 9195f1fbfd cpu: Change IEW DPRINTF to use IEW debug flag
IEW DPRINTF uses Decode debug flag, which appears to be a copying error. This
patch changes this to the IEW Debug flag.
2013-10-17 10:20:45 -05:00
Faissal Sleiman e516531bd0 cpu: Put in assertions to check for maximum supported LQ/SQ size
LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to
 256 entries (including the sentinel entry). Sending packets to memory with a
higher index than 255 truncates the index, such that the response matches the
wrong entry. For instance, this can result in a deadlock if a store completion
does not clear the head entry.
2013-10-17 10:20:45 -05:00
Eric Van Hensbergen bfdd031c0d arm: Accomodate function name changes in newer linux kernels 2013-10-17 10:20:45 -05:00
Ali Saidi 2f7b012ced arm: Fix a GIC mask register bug
This resulted in a kernel printk that said,
"GIC CPU mask not found - kernel will fail to boot."
2013-10-17 10:20:45 -05:00
Ali Saidi cf266f05a9 cpu: Fix O3 uncacheable load that is replayed but misses the TLB
This change fixes an issue in the O3 CPU where an uncachable instruction
is attempted to be executed before it reaches the head of the ROB. It is
determined to be uncacheable, and is replayed, but a PanicFault is attached
to the instruction to make sure that it is properly executed before
committing. If the TLB entry it was using is replaced in the interveaning
time, the TLB returns a delayed translation when the load is replayed at
the head of the ROB, however the LSQ code can't differntiate between the
old fault and the new one. If the translation isn't complete it can't
be faulting, so clear the fault.
2013-10-17 10:20:45 -05:00
Ali Saidi 60ce2b34fe mem: Make MemoryAccess flag more verbose
This patch extends the MemoryAccess debug flag to report who sent the
requests and the cacheability.
2013-10-17 10:20:45 -05:00
Andreas Hansson 8a8e5cdc7e build: Place proto output in the same directory, also for EXTRAS
This patch changes the ProtoBuf builder such that the generated source
and header is placed in the build directory of the proto file. This
was previously not the case for the directories included as EXTRAS. To
make this work, we also ensure that the build directory for the EXTRAS
are added to the include path (which does not seem to automatically be
the case).
2013-10-17 10:20:45 -05:00
Ali Saidi 88b811b4ef dev: Allow additional UART interrupts to be set
This patch allows setting a few additional interrupts for status
changes that should never occur.
2013-10-17 10:20:45 -05:00
Andreas Sandberg cc42e87b85 kvm: Fix latency calculation of IPR accesses
When handling IPR accesses in doMMIOAccess, the KVM CPU used
clockEdge() to convert between cycles and ticks. This is incorrect
since doMMIOAccess is supposed to return a latency in ticks rather
than when the access is done. This changeset fixes this issue by
returning clockPeriod() * ipr_delay instead.
2013-10-16 18:12:15 +02:00
Steve Reinhardt b10ff075b1 ruby: eliminate non-determinism from ruby.stats output
Get rid of non-deterministic "stats" in ruby.stats output
such as time & date of run, elapsed & CPU time used,
and memory usage.  These values cause spurious
miscomparisons when looking at output diffs (though
they don't affect regressions, since the regressions
pass/fail status currently ignores ruby.stats entirely).

Most of this information is already captured in other
places (time & date in stdout, elapsed time & mem usage
in stats.txt), where the regression script is smart
enough to filter it out.  It seems easier to get rid of
the redundant output rather than teaching the
regression tester to ignore the same information in
two different places.
2013-10-15 18:22:49 -04:00
Yasuko Eckert 1bb293d1e7 arch/x86: add support for explicit CC register file
Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.

Nilay Vaish also contributed to this patch.
2013-10-15 14:22:44 -04:00
Yasuko Eckert 2c293823aa cpu: add a condition-code register class
Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.
2013-10-15 14:22:44 -04:00
Steve Reinhardt 5526221847 cpu/o3: clean up rename map and free list
Restructured rename map and free list to clean up some
extraneous code and separate out common code that can
be reused across different register classes (int and fp
at this point).  Both components now consist of a set
of Simple* objects that are stand-alone rename map &
free list for each class, plus a Unified* object that
presents a unified interface across all register
classes and then redirects accesses to the appropriate
Simple* object as needed.

Moved free list initialization to PhysRegFile to better
isolate knowledge of physical register index mappings
to that class (and remove the need to pass a number
of parameters to the free list constructor).

Causes a small change to these stats:
  cpu.rename.int_rename_lookups
  cpu.rename.fp_rename_lookups
because they are now categorized on a per-operand basis
rather than a per-instruction basis.
That is, an instruction with mixed fp/int/misc operand
types will have each operand categorized independently,
where previously the lookup was categorized based on
the instruction type.
2013-10-15 14:22:44 -04:00
Steve Reinhardt 219c423f1f cpu: rename *_DepTag constants to *_Reg_Base
Make these names more meaningful.

Specifically, made these substitutions:

s/FP_Base_DepTag/FP_Reg_Base/g;
s/Ctrl_Base_DepTag/Misc_Reg_Base/g;
s/Max_DepTag/Max_Reg_Index/g;
2013-10-15 14:22:43 -04:00
Steve Reinhardt a830e63de7 isa: clean up register constants
Clean up and add some consistency to the *_Base_DepTag
constants as well as some related register constants:
- Get rid of NumMiscArchRegs, TotalArchRegs, and TotalDataRegs
  since they're never used and not always defined
- Set FP_Base_DepTag = NumIntRegs when possible (i.e.,
  every case except x86)
- Set Ctrl_Base_DepTag = FP_Base_DepTag + NumFloatRegs
  (this was true before, but wasn't always expressed
  that way)
- Drastically reduce the number of arbitrary constants
  appearing in these calculations
2013-10-15 14:22:43 -04:00
Steve Reinhardt 9bd017b8ae cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor
parameters) thet it didn't really use, and the array
initialization was needlessly verbose.

Also just hardwired the getReg() method to aleays
return true for misc regs, rather than having an array
of bits that we always kept marked as ready.
2013-10-15 14:22:43 -04:00
Steve Reinhardt c009d0eb2a cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or
have a pointer back to the CPU.  Also made some methods
for checking the physical register type (int vs. float)
based on the phys reg index, which will come in handy later.
2013-10-15 14:22:43 -04:00
Steve Reinhardt 06d246ab4a cpu/inorder: merge register class enums
The previous patch introduced a RegClass enum to clean
up register classification.  The inorder model already
had an equivalent enum (RegType) that was used internally.
This patch replaces RegType with RegClass to get rid
of the now-redundant code.
2013-10-15 14:22:43 -04:00
Steve Reinhardt 7aa423acad cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping
of unified architectural register indices to register
classes is hardcoded all over to one where there's an
enum for the register classes and a function that
encapsulates the mapping.
2013-10-15 14:22:42 -04:00
Andreas Sandberg 4f5775df64 mem: Rename the ASI_BITS flag field in Request
ASI_BITS in the Request object were originally used to store a memory
request's ASI on SPARC. This is not the case any more since other ISAs
use the ASI bits to store architecture-dependent information. This
changeset renames the ASI_BITS to ARCH_BITS which better describes
their use. Additionally, the getAsi() accessor is renamed to
getArchFlags().
2013-10-15 13:26:34 +02:00
Andreas Sandberg 5e7738467b mem: Use a flag instead of address bit 63 for generic IPRs
Using address bit 63 to identify generic IPRs caused problems on
SPARC, where IPRs are heavily used. This changeset redefines how
generic IPRs are identified. Instead of using bit 63, we now use a
separate flag (GENERIC_IPR) a memory request.
2013-10-15 13:24:35 +02:00
Nilay Vaish 87cc327abb x86: enables lstat and readlink syscalls 2013-10-07 18:05:49 -05:00
Andreas Sandberg c0f367e514 base: Fix a potential race in PollQueue::setupAsyncIO
There is a potential race between enabling asynchronous IO and
selecting the target for the SIGIO signal. This changeset move the
F_SETOWN call to before the F_SETFL call that enables SIGIO
delivery. This ensures that signals are always sent to the correct
process.
2013-10-07 16:03:15 +02:00
Andreas Sandberg 0dd6f87e63 kvm: Service events in the instruction event queues
This changset adds calls to the service the instruction event queues
that accidentally went missing from commit [0063c7dd18ec]. The
original commit only included the code needed to schedule instruction
stops from KVM and missed the functionality to actually service the
events.
2013-10-03 11:00:18 +02:00
Andreas Sandberg fec2dea5c3 x86: Add support for m5ops through a memory mapped interface
In order to support m5ops in virtualized environments, we need to use
a memory mapped interface. This changeset adds support for that by
reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR
interface for m5ops. The mapping is done in the
X86ISA::TLB::finalizePhysical() which means that it just works for all
of the CPU models, including virtualized ones.
2013-09-30 12:20:53 +02:00
Andreas Sandberg d9856f33a4 arch: Add support for m5ops using mmapped IPRs
In order to support m5ops on virtualized CPUs, we need to either
intercept hypercall instructions or provide a memory mapped m5ops
interface. Since KVM does not normally pass the results of hypercalls
to userspace, which makes that method unfeasible. This changeset
introduces support for m5ops using memory mapped mmapped IPRs. This is
implemented by adding a class of "generic" IPRs which are handled by
architecture-independent code. Such IPRs always have bit 63 set and
are handled by handleGenericIprRead() and
handleGenericIprWrite(). Platform specific impementations of
handleIprRead and handleIprWrite should use
GenericISA::isGenericIprAccess to determine if an IPR address should
be handled by the generic code instead of the architecture-specific
code. Platforms that don't need their own IPR support can reuse
GenericISA::handleIprRead() and GenericISA::handleIprWrite().
2013-09-30 12:20:43 +02:00
Andreas Sandberg 114b643dd0 x86: Add support for FXSAVE, FXSAVE64, FXRSTOR, and FXRSTOR64 2013-09-30 12:06:36 +02:00
Andreas Sandberg 47bcc5c737 x86: Add support for FLDENV & FNSTENV 2013-09-30 12:04:36 +02:00
Andreas Sandberg 654d1e675a x86: Add support for loading 32-bit and 80-bit floats in the x87
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.

80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
2013-09-30 12:00:20 +02:00
Andreas Sandberg c299dcedc6 x86: Fix re-entrancy problems in x87 store instructions
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
2013-09-30 11:51:25 +02:00
Andreas Sandberg 469f2e31cf kvm: Add support for thread-specific instruction events
Instruction events are currently ignored when executing in KVM. This
changeset adds support for triggering KVM exits based on instruction
counts using hardware performance counters. Depending on the
underlying performance counter implementation, there might be some
inaccuracies due to instructions being counted in the host kernel when
entering/exiting KVM.

Due to limitations/bugs in Linux's performance counter interface, we
can't reliably change the period of an overflow counter. We work
around this issue by detaching and reattaching the counter if we need
to reconfigure it.
2013-09-30 09:53:52 +02:00
Andreas Sandberg 86bade714e kvm: FPU synchronization support on x86
This changeset adds support for synchronizing the FPU and SIMD state
of a virtual x86 CPU with gem5. It supports both the XSave API and the
KVM_(GET|SET)_FPU kernel API. The XSave interface can be disabled
using the useXSave parameter (in case of kernel
issues). Unfortunately, KVM_(GET|SET)_FPU interface seems to be buggy
in some kernels (specifically, the MXCSR register isn't always
synchronized), which means that it might not be possible to
synchronize MXCSR on old kernels without the XSave interface.

This changeset depends on the __float80 type in gcc and might not
build using llvm.
2013-09-30 09:43:43 +02:00
Andreas Sandberg cccca70149 x86: Add support routines to load and store 80-bit floats
The x87 FPU on x86 supports extended floating point. We currently
handle all floating point on x86 as double and don't support 80-bit
loads/stores. This changeset add a utility function to load and
convert 80-bit floats to doubles (loadFloat80) and another function to
store doubles as 80-bit floats (storeFloat80). Both functions use
libfputils to do the conversion in software. The functions are
currently not used, but are required to handle floating point in KVM
and to properly support all x87 loads/stores.
2013-09-30 09:42:30 +02:00
Andreas Sandberg 3af2d8eab0 x86: Add limited support for extracting function call arguments
Add support for extracting the first 6 64-bit integer argumements to a
function call in X86ISA::getArgument().
2013-09-30 09:37:17 +02:00
Andreas Sandberg 30841926a3 kvm: x86: Fix segment registers to make them VMX compatible
There are cases when the segment registers in gem5 are not compatible
with VMX. This changeset works around all known such issues. Specifically:

* The accessed bits in CS, SS, DD, ES, FS, GS are forced to 1.
* The busy bit in TR is forced to 1.
* The protection level of SS is forced to the same protection level as
  CS. The difference /seems/ to be caused by a bug in gem5's x86
  implementation.
2013-09-30 09:36:54 +02:00
Andreas Sandberg e5c319db43 kvm: Add x86 segment register verification to help debugging 2013-09-25 12:35:21 +02:00
Andreas Sandberg 599b59b387 kvm: Initial x86 support
This changeset adds support for KVM on x86. Full support is split
across a number of commits since some features are relatively
complex. This changeset includes support for:

 * Integer state synchronization (including segment regs)
 * CPUID (gem5's CPUID values are inserted into KVM)
 * x86 legacy IO (remapped and handled by gem5's memory system)
 * Memory mapped IO
 * PCI
 * MSRs
 * State dumping

Most of the functionality is fairly straight forward. There are some
quirks to support PCI enumerations since this is done in the TLB(!) in
the simulated CPUs. We currently replicate some of that code.

Unlike the ARM implementation, the x86 implementation of the virtual
CPU does not use the cycles hardware counter. KVM on x86 simulates the
time stamp counter (TSC) in the kernel. If we just measure host cycles
using perfevent, we might end up measuring a slightly different number
of cycles. If we don't get the cycle accounting right, we might end up
rewinding the TSC, with all kinds of chaos as a result.

An additional feature of the KVM CPU on x86 is extended state
dumping. This enables Python scripts controlling the simulator to
request dumping of a subset of the processor state. The following
methods are currenlty supported:

 * dumpFpuRegs
 * dumpIntRegs
 * dumpSpecRegs
 * dumpDebugRegs
 * dumpXCRs
 * dumpXSave
 * dumpVCpuEvents
 * dumpMSRs

Known limitations:
  * M5 ops are currently not supported.
  * FPU synchronization is not supported (only affects CPU switching).

Both of the limitations will be addressed in separate commits.
2013-09-25 12:24:26 +02:00
Andreas Sandberg cd9cd85ce9 kvm: Correctly handle the return value from handleIpr(Read|Write)
The KVM base class incorrectly assumed that handleIprRead and
handleIprWrite both return ticks. This is not the case, instead they
return cycles. This changeset converts the returned cycles to ticks
when handling IPR accesses.
2013-09-19 17:55:04 +02:00
Andreas Sandberg 211c10b46d kvm: Fix a case where the run timers weren't armed properly
There is a possibility that the timespec used to arm a timer becomes
zero if the number of ticks used when arming a timer is close to the
resolution of the timer. Due to the semantics of POSIX timers, this
actually disarms the timer. This changeset fixes this issue by
eliminating the rounding error (we always round away from zero
now). It also reuses the minimum number of cycles, which were
previously only used for cycle-based timers, to calculate a more
useful resolution.
2013-09-19 17:55:03 +02:00
Andreas Sandberg a6e723e4d6 x86: Add support routines to convert between x87 tag formats
This changeset adds the convX87XTagsToTags() and convX87TagsToXTags()
which convert between the tag formats in the FTW register and the
format used in the xsave area. The conversion from to the x87 FTW
representation is currently loses some information since it does not
reconstruct the valid/zero/special flags which are not included in the
xsave representation.
2013-09-19 17:30:26 +02:00
Andreas Sandberg 4dbf25adc3 sim: Fix undefined behavior in the pseudo-inst interface
The order between updating and using arg_num in
PseudoInst::pseudoInst() is currently undefined. This changeset
explicitly updates arg_num after it has been used to extract an
argument.

--HG--
extra : rebase_source : 67c46dc3333d16ce56687ee8aea41ce6c6d133bb
2013-09-18 17:08:35 +02:00