Commit graph

7539 commits

Author SHA1 Message Date
Nikos Nikoleris 227bdde922 proto: Fix warnings for protoc v3
protoc v3 introduces a new syntax for proto files and warns when the
syntax is not explicitly stated.

protoc relies on the fact that undefined preprocessor symbols are
explanded to 0 but since we use -Wundef they end up generating
warnings.

Change-Id: If07abeb54e932469c8f2c4d38634a97fdae40f77
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-27 15:07:20 -06:00
Alec Roelke e4c57275d3 riscv: Fix crash when syscall argument reg index is too high
By default, doSyscall gets the values of six registers to be used for
system call arguments.  RISC-V, by convention, only has four.  Because
RISC-V's implementation of these indices is as arrays of integers rather
than as base indices plus offsets, trying to get the fifth argument
register's value will cause a crash.  This patch fixes that by returning 0
for any index higher than 3.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-27 15:05:01 -06:00
Rahul Thakur e9889c46ed mem: Refactor CommMonitor stats, add basic atomic mode stats
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-27 14:58:16 -06:00
Rahul Thakur 32d05d5fb6 mem: Add memory footprint probe
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-27 14:58:15 -06:00
Andreas Sandberg 2974dc7a37 python: Move native wrappers to the _m5 namespace
Swig wrappers for native objects currently share the _m5.internal name
space with Python code. This is undesirable if we ever want to switch
from Swig to some other framework for native binding (e.g., PyBind11
or Boost::Python). This changeset moves all of such wrappers to the
_m5 namespace, which is now reserved for native code.

Change-Id: I2d2bc12dbc05b57b7c5a75f072e08124413d77f3
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-27 12:40:01 +00:00
Brandon Potter e387521527 syscall_emul: [patch 4/22] remove redundant M5_pid field from process 2016-11-09 14:27:40 -06:00
Brandon Potter a928a438b8 style: [patch 3/22] reduce include dependencies in some headers
Used cppclean to help identify useless includes and removed them. This
involved erroneously included headers, but also cases where forward
declarations could have been used rather than a full include.
2016-11-09 14:27:40 -06:00
Brandon Potter 93d8e6b898 syscall_emul: #ifdef new system calls to allow builds on OSX and BSD 2017-01-20 14:12:58 -05:00
Tony Gutierrez 1961a942f3 ruby: guard usage of GPUCoalescer code in Profiler
the GPUCoalescer code is used in the ruby profiler regardless of
whether or not the coalescer code has been compiled, which can
lead to link/run time errors. here we add #ifdefs to guard the
usage of GPUCoalescer code. eventually we should refactor this
code to use probe points.
2017-01-19 11:59:34 -05:00
Matthew Poremba 42044645b9 ruby: Check MessageBuffer space in garnet NetworkInterface
Garnet's NetworkInterface does not consider the size of MessageBuffers when
ejecting a Message from the network. Add a size check for the MessageBuffer
and only enqueue if space is available. If space is not available, the
message if placed in a queue and the credit is held. A callback from the
MessageBuffer is implemented to wake the NetworkInterface. If there are
messages in the stalled queue, they are processed first, in a FIFO manner
and if succesfully ejected, the credit is finally sent back upstream. The
maximum size of the stall queue is equal to the number of valid VNETs
with MessageBuffers attached.
2017-01-19 11:59:10 -05:00
Matthew Poremba a4b546c3a1 ruby: Add occupancy stats to MessageBuffers
This patch is an updated version of /r/3297.

"The most important statistic for measuring memory hierarchy performance is
throughput, which is affected by independent variables, buffer sizing and
communication latency. It is difficult/impossible to debug performance issues
through series buffers without knowing which are the bottlenecks. For finite
buffers, this patch adds statistics for the average number of messages in the
buffer, the occupancy of the buffer slots, and number of message stalls."
2017-01-19 11:58:59 -05:00
Matthew Poremba 501f170924 ruby: Check all VNETs for injection in garnet NetworkInterface
The NetworkInterface wakeup currently iterates over all VNETs and breaks the
loop if a VNET is unable to allocate a VC. This can cause a deadlock if a
lower numbered VNET is unable to allocate a VC while a higher numbered VNET
has idle VCs. This seems like a bug as Garnet 1.0 uses a while loop over an
if-statement, suggesting the break was intended for this while loop. This
patch removes the break statement, which allows up to one message to be
dequeued from a VNET and injected into the network.
2017-01-19 11:58:49 -05:00
Brandon Potter 1ced08c850 syscall_emul: [patch 2/22] move SyscallDesc into its own .hh and .cc
The class was crammed into syscall_emul.hh which has tons of forward
declarations and template definitions. To clean it up a bit, moved the
class into separate files and commented the class with doxygen style
comments. Also, provided some encapsulation by adding some accessors and
a mutator.

The syscallreturn.hh file was renamed syscall_return.hh to make it consistent
with other similarly named files in the src/sim directory.

The DPRINTF_SYSCALL macro was moved into its own header file with the
include the Base and Verbose flags as well.

--HG--
rename : src/sim/syscallreturn.hh => src/sim/syscall_return.hh
2016-11-09 14:27:40 -06:00
Brandon Potter 7a8dda49a4 style: [patch 1/22] use /r/3648/ to reorganize includes 2016-11-09 14:27:37 -06:00
Andreas Sandberg 1738a7d260 sim: Remove declaration of unused CountedDrainEvent
The CountedDrainEvent event was used to keep track of objects that
required additional simulation to drain. It was removed as a part of
the great drain rewrite, but the declaration remained.

Change-Id: I767a3213669040d3f27e2afafa2e4a5bb997e325
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
2017-01-03 17:31:39 +00:00
Andreas Sandberg c8b1e8f1cf python: Don't use Swig to cast stats
Call the stat visitor from the stat itself rather than casting stats
in Python. This reduces the number of ways visitors are called.

Change-Id: Ic4d0b7b32e3ab9897b9a34cd22d353f4da62d738
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Joe Gross <joseph.gross@amd.com>
2017-01-03 12:03:45 +00:00
Andreas Sandberg abe7ef95cb sim: Remove redundant export_method_cxx_predecls
The headers declared in export_method_cxx_predecls are redundant since a
SimObject's main header is automatically included.

Change-Id: Ied9e84630b36960e54efe91d16f8c66fba7e0da0
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-by: Joe Gross <joseph.gross@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
2017-01-03 12:03:06 +00:00
Joel Hestness 6a49dee3f3 sim: Fix SE mode checkpoint restore file handling
When restoring from a checkpoint, the simulation used to use file handles from
the checkpoint. This disallows multiple separate restore simulations from using
separate input and output files and directories, and plays havoc when the
checkpointed file locations may have changed. Add handling to allow the command
line specified files to be used as input/output for the restored simulation
(Note: this is the similar functionality to FS mode for output and error).
2016-12-23 08:43:18 -06:00
Arthur Perais c9d933efb0 cpu: implement an L-TAGE branch predictor
This patch implements an L-TAGE predictor, based on André Seznec's code
available from CBP-2
(http://hpca23.cse.tamu.edu/taco/camino/cbp2/cbp-src/realistic-seznec.h).

Signed-off-by Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:25:13 -06:00
Arthur Perais 497cc2d373 cpu: disallow speculative update of branch predictor tables (o3)
The Minor and o3 cpu models share the branch prediction
code. Minor relies on the BPredUnit::squash() function
to update the branch predictor tables on a branch mispre-
diction. This is fine because Minor executes in-order, so
the update is on the correct path. However, this causes the
branch predictor to be updated on out-of-order branch
mispredictions when using the o3 model, which should not
be the case.

This patch guards against speculative update of the branch
prediction tables. On a branch misprediction, BPredUnit::squash()
calls BpredUnit::update(..., squashed = true). The underlying
branch predictor tests against the value of squashed. If it is
true, it restores any speculatively updated internal state
it might have (e.g., global/local branch history), then returns.
If false, it updates its prediction tables. Previously, exist-
ing predictors did not test against the "squashed" parameter.

To accomodate for this change, the Minor model must now call
BPredUnit::squash() then BPredUnit::update(..., squashed = false)
on branch mispredictions. Before, calling BpredUnit::squash()
performed the prediction tables update.

The effect is a slight MPKI improvement when using the o3
model. A further patch should perform the same modifications
for the indirect target predictor and BTB (less critical).

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:07:16 -06:00
Arthur Perais 34065f8d5f cpu: correct comments in tournament branch predictor
The tournament predictor is presented as doing speculative
update of the global history and non-speculative update
of the local history used to generate the branch prediction.
However, the code does speculative update of both histories.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:06:13 -06:00
Arthur Perais 1664625db8 cpu: Resolve targets of predicted 'taken' decode for O3
The target of taken conditional direct branches does not
need to be resolved in IEW: the target can be computed at
decode, usually using the decoded instruction word and the PC.

The higher-than-necessary penalty is taken only on conditional
branches that are predicted taken but miss in the BTB. Thus,
this is mostly inconsequential on IPC if the BTB is big/associative
enough (fewer capacity/conflict misses). Nonetheless, what gem5
simulates is not representative of how conditional branch targets
can be handled.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:05:24 -06:00
Arthur Perais e5fb6752d6 cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3
cachePorts currently constrains the number of store packets written to the
D-Cache each cycle), but loads currently affect this variable. This leads
to unexpected congestion (e.g., setting cachePorts to a realistic 1 will
in fact allow a store to WB only if no loads have accessed the D-Cache
this cycle). In the absence of arbitration, this patch decouples how many
loads can be done per cycle from how many stores can be done per cycle.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21 15:04:06 -06:00
Joel Hestness 3a656da1a6 ruby: Make MessageBuffers actually finite sized
When Ruby controllers stall messages in MessageBuffers, the buffer moves those
messages off the priority heap and into a per-address stall map. When buffers
are finite-sized, the test areNSlotsAvailable() only checks the size of the
priority heap, but ignores the stall map, so the map is allowed to grow
unbounded if the controller stalls numerous messages. This patch fixes the
problem by tracking the stall map size and testing the total number of messages
in the buffer appropriately.
2016-12-20 11:38:24 -06:00
Tony Gutierrez 3eb979a8ce ruby: fix typo in DMASequencer::ackCallback() 2016-12-20 11:53:36 -05:00
Tony Gutierrez 02cb6b19a7 ruby: fix issue with unused var in DMASequencer
the iterator declared in DMASequencer::ackCallback() is only used in an
assert, this causes clang to fail when building fast. here we move
the find call on the request table directly into the assert.
2016-12-20 11:47:30 -05:00
Curtis Dunham f04d81163c arm: provide correct timer availability in ID_PFR1 register
Change-Id: Id4cd839c12b70616017a5830e3f9bbb59b0f97ba
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:28 -06:00
Curtis Dunham ae2e0ca3d0 arm: compute ID_AA64PFR{0,1}_EL1 registers
Compute the proper values of the aforementioned registers from
the system configuration rather than configuring the values themselves.

Change-Id: If9774b6610a29568b80ae4866107b9a6a5b5be0f
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:28 -06:00
Curtis Dunham a73937b60c arm: compute ID_PFR{0,1} registers
Compute the proper values of the aforementioned registers from
the system configuration rather than configuring the values themselves.

Change-Id: Ie7685b5d8b5f2dd9d6380b4af74f16d596b2bfd1
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:27 -06:00
Curtis Dunham 282cf5807d arm: miscreg refactoring
Change-Id: I4e9e8f264a4a4239dd135a6c7a1c8da213b6d345
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:27 -06:00
Curtis Dunham 9cf6bc444b arm: audit SCTLR
Change-Id: I814f1431a5f754f75721c9ac51171f860a714d24
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:27 -06:00
Curtis Dunham 7ddb55a5f2 arm: remove SCTLR.FI
Removed from ARMARM.

Change-Id: Ie8f28e4fa6e1b46dfd9c8c4b379e5b42fe25421d
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:27 -06:00
Curtis Dunham 19d90956eb arm: update AArch{64,32} register mappings
Change-Id: Idaaaeb3f7b1a0bdbf18d8e2d46686c78bb411317
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-19 11:03:27 -06:00
Andreas Sandberg bbd3703fbb mem: Make the BaseXBar public to not confuse Python wrappers
The Python wrappers generally assume that destructors are public. Make
the BaseXBar destructor public to avoid confusing the Python wrapper.

Change-Id: If958802409c0be74e875dd6e279742abfdb3ede1
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
2016-12-19 16:25:40 +00:00
Andreas Sandberg 8702208f3f python: Export periodicStatDump
Some configuration scripts need periodic stat dumps. One of the ways
this can be achieved is by using the pariodicStatDump helper
function. This function was previously only exported in the internal
name space. Export it as a normal function in m5.stat instead.

Change-Id: Ic88bf1fd33042a62ab436d5944d8ed778264ac98
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
2016-12-19 16:25:39 +00:00
Andreas Sandberg 73627fa007 dev: Include DmaDevice in NULL builds
Builds for the NULL ISA include Device.py, which contains the Python
declaration of DmaDevice, but don't include the actual C++
implementation. Add dma_device.cc to the NULL build to the Python and
C++ worlds consistent again.

Change-Id: I47a57181a1f4d5a7276467678bf16fbc7f161681
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
2016-12-19 16:25:38 +00:00
Andreas Sandberg d113153b52 python: Fix incorrect header in the DmaDevice wrapper
The header declared in the DmaDevice wrapper doesn't actually contain
the DmaDevice class. This can potentially lead to incorrect type cases
in Swig.

Change-Id: If2266d4180d1d6fd13359a81067068854c5e96fe
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
2016-12-19 16:25:38 +00:00
Andreas Sandberg ac8e73565a sim: Remove redundant buildEnv import
Change-Id: Id6bdbc0c988aa92b96e292cabc913e6b974f14bb
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
2016-12-19 16:25:37 +00:00
Jieming Yin b9c7b8190c ruby: Detect garnet network-level deadlock.
This patch detects garnet network deadlock by monitoring
network interfaces. If a network interface continuously
fails to allocate virtual channels for a message, a
possible deadlock is detected.
2016-12-15 16:59:17 -05:00
Brandon Potter cc1f5a4d16 base: remove header file to prevent a macro name collision 2016-11-09 14:27:37 -06:00
Brandon Potter cc84eb813c syscall_emul: implement fallocate 2016-12-15 13:16:25 -05:00
Brandon Potter 68e9c0e73b syscall_emul: add support for x86 statfs system calls 2016-12-15 13:16:03 -05:00
Brandon Potter 4ff1b165d0 syscall_emul: extend sysinfo system call to include mem_unit 2016-12-15 13:14:41 -05:00
Gabor Dozsa ecf68fac40 dev: Fix race conditions at terminating dist-gem5 simulations
Two problems may arise when a distributed gem5 simulation terminates:
(i) simulation thread(s) may get stuck in an incomplete synchronisation
event which prohibits processing  the simulation exit event; and (ii) a
stale receiver thread may try to access objects that have already been
deleted while exiting gem5. This patch terminates receive threads properly
and aborts the processing of any incomplete synchronisation event.

Change-Id: I72337aa12c7926cece00309640d478b61e55a429
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-06 17:33:06 +00:00
Andreas Hansson c642d6fc37 ruby: Remove RubyMemoryControl and associated files
This patch removes the deprecated RubyMemoryControl. The DRAMCtrl
module should be used instead.
2016-12-05 16:49:07 -05:00
Nikos Nikoleris 0054f1ad53 mem: Respond to InvalidateReq when the block is (pending) dirty
Previously when an InvalidateReq snooped a cache with a dirty block or
a pending modified MSHR, it would invalidate the block or set the
postInv flag. The cache would not send an InvalidateResp. though,
causing memory order violations. This patches changes this behavior,
making the cache with the dirty block or pending modified MSHR the
ordering point.

Change-Id: Ib4c31012f4f6693ffb137cd77258b160fbc239ca
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05 16:48:29 -05:00
Nikos Nikoleris 9916e4276c mem: Invalidate a blk when servicing the 1st invalidating target
Previously an MSHR with one or more invalidating targets would first
service all targets in the MSHR TargetList and then invalidate the
block. As a result any service snooping targets would lookup in the
cache and incorrectly find the block. This patch forces the
invalidation to happen when the first invalidating target is
encountered.

Change-Id: I9df15de24e1d351cd96f5a2c424d9a03d81c2cce
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05 16:48:28 -05:00
Nikos Nikoleris 77dfeb8c09 mem: Allow non invalidating snoops on an InvalidateReq MSHR
This patch changes an assertion that previously assumed that a non
invalidating snoop request should never be serviced by an
InvalidateReq MSHR. The MSHR serves as the ordering point for the
snooping packet. When the InvalidateResp reaches the cache the
snooping packet snoops the caches above to find the requested
block. One or more of the caches above will have the block since
earlier it has seen a WriteLineReq.

Change-Id: I0c147c8b5d5019e18bd34adf9af0fccfe431ae07
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05 16:48:27 -05:00
Nikos Nikoleris 5ebb8ec46b mem: Don't use hasSharers in the snoopFilter for memory responses
When the snoopFilter receives a response, it updates its state using
the hasSharers flag (indicates whether there are more than one copies
of the block in the caches above). The hasSharers flag of the packet
was previously populated when the request was traversing and snooping
the caches looking for the block.
1) When the response is coming from the memory-side port, its order
with respect to other responses is not necessarily preserved (e.g., a
request that arrived second to the xbar can get its response first). As
a result the snoopFilter might process responses out of order updating
its residency information using the non valid hasSharers flag which was
populated much earlier.
2) When the response is from an on-chip, the MSHRs preserve a well
defined order and the hasSharers flag should contain valid
information.

This patch changes the snoopFilter by avoiding the hasSharers flag
when the response is from the memory-side port.

Change-Id: Ib2d22a5b7bf3eccac64445127d2ea20ee74bb25b
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:26 -05:00
Nikos Nikoleris 78a97b1847 mem: Always use InvalidateReq to service WriteLineReq misses
Previously, a WriteLineReq that missed in a cache would send out an
InvalidateReq if the block lookup failed or an UpgradeReq if the
block lookup succeeded but the block had sharers. This changes ensures
that a WriteLineReq always sends an InvalidateReq to invalidate all
copies of the block and satisfy the WriteLineReq.

Change-Id: I207ff5b267663abf02bc0b08aeadde69ad81be61
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05 16:48:25 -05:00
Nikos Nikoleris 3172501a59 mem: Assert that the responderHadWritable is set only once
Change-Id: Ie3beeef25331f84a0a5bcc17f7a791f4a829695b
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:24 -05:00
Andreas Hansson 50812a20f1 mem: Ensure InvalidateReq is considered isForward by MSHRs
This patch fixes an issue where an MSHR would incorrectly be perceived
to provide data to targets arriving after an InvalidateReq. To address
this the InvalidateReq is now treated as isForward, much like an
UpgradeReq that did not hit in the cache.

Change-Id: Ia878444d949539b5c33fd19f3e12b0b8a872275e
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:23 -05:00
Nikos Nikoleris e16967941b mem: Make packet debug printing more uniform
Previously DPRINTFs printing information about a packet would use ad hoc
formats. This patch changes all DPRINTFs to use the print function
defined by the packet class, making the packet printing format more
uniform and easier to change.

Change-Id: Idd436a9758d4bf70c86a574d524648b2a2580970
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:21 -05:00
Nikos Nikoleris 61860f2419 cpu: Change traffic generators to use different values for writes
Previously all traffic generators would use the same value for write
requests. With this change traffic generators use their master id as
the payload of write requests making them more useful for the
memchecker.

Change-Id: Id1a6b8f02853789b108ef6003f4c32ab929bb123
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:20 -05:00
Nikos Nikoleris 0bd9dfb8de mem: Service only the 1st FromCPU MSHR target on ReadRespWithInv
A response to a ReadReq can either be a ReadResp or a
ReadRespWithInvalidate. As we add targets to an MSHR for a ReadReq we
assume that the response will be a ReadResp. When the response is
invalidating (ReadRespWithInvalidate) servicing more than one targets
can potentially violate the memory ordering. This change fixes the way
we handle a ReadRespWithInvalidate. When a cache receives a
ReadRespWithInvalidate we service only the first FromCPU target and
all the FromSnoop targets from the MSHR target list. The rest of the
FromCPU targets are deferred and serviced by a new request.

Change-Id: I75c30c268851987ee5f8644acb46f440b4eeeec2
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:19 -05:00
Nikos Nikoleris d28c2906f4 mem: Keep track of allocOnFill in the TargetList
Previously the information of whether a response was allocating or not
was a property of the MSHR. This change makes this flag a property of
the TargetList. Differernt TargetLists, e.g. the targets and the
deferred targets lists might have different values. Additionally, the
information about whether each of the target expects an allocating
response is stored inside the TargetList container. This allows for
repopulating the flag in case some of the targets are removed.

Change-Id: If3ec2516992f42a6d9da907009ffe3ab8d0d2021
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:18 -05:00
Nikos Nikoleris f7a5de3bec mem: Add support for repopulating the flags of an MSHR TargetList
This patch adds support for repopulating the flags of an MSHR
TargetList. The added functionality makes it possible to remove
targets from a TargetList without leaving it in an inconsistent state.

Change-Id: I3f7a8e97bfd3e2e49bebad056d11bbfb087aad91
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05 16:48:17 -05:00
Brandon Potter 3d0a537862 hsail: disable asserts to allow immediate operands i.e. 0 with loads 2016-12-02 18:01:58 -05:00
Brandon Potter 900fd15622 hsail: add stub type and stub out several instructions 2016-12-02 18:01:57 -05:00
Brandon Potter 86b375f2f3 hsail: add popcount type and generate popcount instructions 2016-12-02 18:01:55 -05:00
Brandon Potter 3bb3db6194 hsail: add a wavesize case statement to register operand code 2016-12-02 18:01:52 -05:00
Brandon Potter 69c2d86d68 hsail: generate mov instructions for more arith_types and bit_types 2016-12-02 18:01:49 -05:00
Brandon Potter 35ba103009 hsail: remove the panic guarding function directives
HSA functions calls are still not supported properly with HSAIL, but
the recent AMP runtime modifications rely on being able to parse the
BRIG/HSAIL files that are extracted from the application binaries.
We need to parse the function call HSAIL definitions, but we do not
actually need to make the function calls.

The reason that this happens is that HCC appends a set of routines
to every HSAIL binary that it creates. These extra, unnecessary
routines exist in the HCC source as a file; this file is cat'd onto
everything that the compiler outputs before being assembled into the
application's binary. HCC does this because it might call these helper
functions. However, it doesn't actually appear to do so in the AMP
codes so we just parse these functions with the HSAIL parser and
then ignore them.
2016-12-02 18:01:42 -05:00
Tony Gutierrez 38708f369b hsail: fix unsigned offset bug in address calculation
it's possible for the offset provided to an HSAIL mem inst to be a negative
value, however the variable we use to hold the offset is an unsigned type.
this can lead to excessively large offset values when the offset is negative,
which will almost certainly cause the access to go out of bounds.
2016-12-02 11:40:52 -05:00
Matthew Poremba 80607a2a1d ruby: Fix overflow reported by ASAN in MessageBuffer.
In MessageBuffer the m_not_avail_count member is incremented but not used.
This causes an overflow reported by ASAN. This patch changes from an int to
Stats::Scalar, since the count is useful in debugging finite MessageBuffers.
2016-12-02 11:40:40 -05:00
Alec Roelke ee0c261e10 riscv: [Patch 7/5] Corrected LRSC semantics
RISC-V makes use of load-reserved and store-conditional instructions to
enable creation of lock-free concurrent data manipulation as well as
ACQUIRE and RELEASE semantics for memory ordering of LR, SC, and AMO
instructions (the latter of which do not follow LR/SC semantics). This
patch is a correction to patch 4, which added these instructions to the
implementation of RISC-V. It modifies locked_mem.hh and the
implementations of lr.w, sc.w, lr.d, and sc.d to apply the proper gem5
flags and return the proper values.

An important difference between gem5's LLSC semantics and RISC-V's LR/SC
ones, beyond the name, is that gem5 uses 0 to indicate failure and 1 to
indicate success, while RISC-V is the opposite. Strictly speaking, RISC-V
uses 0 to indicate success and nonzero to indicate failure where the
value would indicate the error, but currently only 1 is reserved as a
failure code by the ISA reference.

This is the seventh patch in the series which originally consisted of five
patches that added the RISC-V ISA to gem5. The original five patches added
all of the instructions and added support for more detailed CPU models and
the sixth patch corrected the implementations of Linux constants and
structs. There will be an eighth patch that adds some regression tests
for the instructions.

[Removed some commented-out code from locked_mem.hh.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke 84020a8aed riscv: [Patch 6/5] Improve Linux emulation for RISC-V
This is an add-on patch for the original series that implemented RISC-V
that improves the implementation of Linux emulation for SE mode. Basically
it cleans up linux/linux.hh by removing constants that haven't been
defined for the RISC-V Linux proxy kernel and rearranging the stat
struct so it aligns with RISC-V's implementation of it. It also adds
placeholders for system calls that have been given numbers in RISC-V
but haven't been given implementations yet. These system calls are
as follows:
- readlinkat
- sigprocmask
- ioctl
- clock_gettime
- getrusage
- getrlimit
- setrlimit

The first five patches implemented RISC-V with the base ISA and multiply,
floating point, and atomic extensions and added support for detailed
CPU models with memory timing.

[Fixed incompatibility with changes made from patch 1.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke 126c0360e2 riscv: [Patch 5/5] Added missing support for timing CPU models
Last of five patches adding RISC-V to GEM5. This patch adds support for
timing, minor, and detailed CPU models that was missing in the last four,
which basically consists of handling timing-mode memory accesses and
telling the minor and detailed models what a no-op instruction should
be (addi zero, zero, 0).

Patches 1-4 introduced RISC-V and implemented the base instruction set,
RV64I, and added the multiply, floating point, and atomic memory
extensions, RV64MAFD.

[Fixed compatibility with edit from patch 1.]
[Fixed compatibility with hg copy edit from patch 1.]
[Fixed some style errors in locked_mem.hh.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke 535e6c5fa4 riscv: [Patch 4/5] Added RISC-V atomic memory extension RV64A
Fourth of five patches adding RISC-V to GEM5. This patch adds the RV64A
extension, which includes atomic memory instructions. These instructions
atomically read a value from memory, modify it with a value contained in a
source register, and store the original memory value in the destination
register and modified value back into memory. Because this requires two
memory accesses and GEM5 does not support two timing memory accesses in
a single instruction, each of these instructions is split into two micro-
ops: A "load" micro-op, which reads the memory, and a "store" micro-op,
which modifies and writes it back.  Each atomic memory instruction also has
two bits that acquire and release a lock on its memory location.
Additionally, there are atomic load and store instructions that only either
load or store, but not both, and can acquire or release memory locks.

Note that because the current implementation of RISC-V only supports one
core and one thread, it doesn't make sense to make use of AMO instructions.
However, they do form a standard extension of the RISC-V ISA, so they are
included mostly as a placeholder for when multithreaded execution is
implemented.  As a result, any tests for their correctness in a future
patch may be abbreviated.

Patch 1 introduced RISC-V and implemented the base instruction set, RV64I;
patch 2 implemented the integer multiply extension, RV64M; and patch 3
implemented the single- and double-precision floating point extensions,
RV64FD.

Patch 5 will add support for timing, minor, and detailed CPU models that
isn't present in patches 1-4.

[Added missing file amo.isa]
[Replaced information removed from initial patch that was missed during
division into multiple patches.]
[Fixed some minor formatting issues.]
[Fixed oversight where LR and SC didn't have both AQ and RL flags.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke 1229b3b623 riscv: [Patch 3/5] Added RISCV floating point extensions RV64FD
Third of five patches adding RISC-V to GEM5. This patch adds the RV64FD
extensions, which include single- and double-precision floating point
instructions.

Patch 1 introduced RISC-V and implemented the base instruction set, RV64I
and patch 2 implemented the integer multiply extension, RV64M.

Patch 4 will implement the atomic memory instructions, RV64A, and patch
5 will add support for timing, minor, and detailed CPU models that is
missing from the first four patches.

[Fixed exception handling in floating-point instructions to conform better
to IEEE-754 2008 standard and behavior of the Chisel-generated RISC-V
simulator.]
[Fixed style errors in decoder.isa.]
[Fixed some fuzz caused by modifying a previous patch.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke 070da98493 riscv: [Patch 2/5] Added RISC-V multiply extension RV64M
Second of five patches adding RISC-V to GEM5.  This patch adds the
RV64M extension, which includes integer multiply and divide instructions.

Patch 1 introduced RISC-V and implemented the base instruction set, RV64I.

Patch 3 will implement the floating point extensions, RV64FD; patch 4 will
implement the atomic memory instructions, RV64A; and patch 5 will add
support for timing, minor, and detailed CPU models that is missing from
the first four patches.

[Added mulw instruction that was missed when dividing changes among
patches.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Alec Roelke e76bfc8764 arch: [Patch 1/5] Added RISC-V base instruction set RV64I
First of five patches adding RISC-V to GEM5. This patch introduces the
base 64-bit ISA (RV64I) in src/arch/riscv for use with syscall emulation.
The multiply, floating point, and atomic memory instructions will be added
in additional patches, as well as support for more detailed CPU models.
The loader is also modified to be able to parse RISC-V ELF files, and a
"Hello world\!" example for RISC-V is added to test-progs.

Patch 2 will implement the multiply extension, RV64M; patch 3 will implement
the floating point (single- and double-precision) extensions, RV64FD;
patch 4 will implement the atomic memory instructions, RV64A, and patch 5
will add support for timing, minor, and detailed CPU models that is missing
from the first four patches (such as handling locked memory).

[Removed several unused parameters and imports from RiscvInterrupts.py,
RiscvISA.py, and RiscvSystem.py.]
[Fixed copyright information in RISC-V files copied from elsewhere that had
ARM licenses attached.]
[Reorganized instruction definitions in decoder.isa so that they are sorted
by opcode in preparation for the addition of ISA extensions M, A, F, D.]
[Fixed formatting of several files, removed some variables and
instructions that were missed when moving them to other patches, fixed
RISC-V Foundation copyright attribution, and fixed history of files
copied from other architectures using hg copy.]
[Fixed indentation of switch cases in isa.cc.]
[Reorganized syscall descriptions in linux/process.cc to remove large
number of repeated unimplemented system calls and added implmementations
to functions that have received them since it process.cc was first
created.]
[Fixed spacing for some copyright attributions.]
[Replaced the rest of the file copies using hg copy.]
[Fixed style check errors and corrected unaligned memory accesses.]
[Fix some minor formatting mistakes.]
Signed-off by: Alec Roelke

Signed-off by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:28 -05:00
Sophiane Senni ce2722cdd9 mem: Split the hit_latency into tag_latency and data_latency
If the cache access mode is parallel, i.e. "sequential_access" parameter
is set to "False", tags and data are accessed in parallel. Therefore,
the hit_latency is the maximum latency between tag_latency and
data_latency. On the other hand, if the cache access mode is
sequential, i.e. "sequential_access" parameter is set to "True",
tags and data are accessed sequentially. Therefore, the hit_latency
is the sum of tag_latency plus data_latency.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-11-30 17:10:27 -05:00
Jason Lowe-Power 047caf24ba cpu: Remove branch predictor function predictInOrder
This function was used by the now-defunct InOrderCPU model. Since this
model is no longer in gem5, this function was not called from anywhere in
the code.
2016-11-30 17:10:27 -05:00
Michael LeBeane cd4b26b6ae dev: Fix buffer length when unserializing an eth pkt
Changeset 11701 only serialized the useful portion of of an ethernet packets'
payload. However, the device models expect each ethernet packet to contain
a 16KB buffer, even if there is no data in it. This patch adds a 'bufLength'
field to EthPacketData so the original size of the packet buffer can always
be unserialized.

Reported-by: Gabor Dozsa <Gabor.Dozsa@arm.com>
2016-11-29 13:04:45 -05:00
Joe Gross 4b7bc5b1e1 scons: fix sanitizer flags with multiple sanitizers
There has been some problem when using address and undefined-behavior
sanitizers at the same time. This patch will look for the special case
where both are enabled at once and change the flags passed to the compiler
to reflect this.
2016-11-28 12:44:54 -05:00
Jieming Yin b0856ab3b1 ruby: Fix potential bugs in garnet2.0
1. Delete unused variable from struct LinkEntry
2. Correct GarnetExtLink and GarnetIntLink inheritance
2016-11-21 15:41:30 -05:00
Tony Gutierrez 14deacf86e gpu-compute: fix segfault when constructing GPUExecContext
the GPUExecContext context currently stores a reference to its parent WF's
GPUISA object, however there are some special instructions that do not have
an associated WF. when these objects are constructed they set their WF pointer
to null, which causes the GPUExecContext to segfault when trying to
dereference
the WF pointer to get at the WF's GPUISA object. here we change the GPUISA
reference in the GPUExecContext class to a pointer so that it may be set to
null.
2016-11-21 15:40:03 -05:00
Tony Gutierrez a0d4019abd gpu-compute: init valid field of GpuTlbEntry in default ctor
valid field for GpuTlbEntry is not set in the default ctor, which can
lead to strange behavior, and is also flagged by UBSAN.
2016-11-21 15:38:30 -05:00
Tony Gutierrez f82418acef ruby: add default ctor for MachineID type
not all uses of MachineID initialize its fields, so here we add a default
ctor.
2016-11-21 15:37:07 -05:00
Tony Gutierrez 0799600686 x86: fix issue with casting in Cvtf2i
UBSAN flags this operation because it detects that arg is being cast directly
to an unsigned type, argBits. this patch fixes this by first casting the
value to a signed int type, then reintrepreting the raw bits of the signed
int into argBits.
2016-11-21 15:35:56 -05:00
Sooraj Puthoor 29d38e7576 ruby: init MessageSizeType of SequencerMsg to Request_Control
SequencerMsg is autogenerated by slicc scripts and the MessageSizeType is
initialized to the max enume value by default. The DMASequencer pushes this
message to the mandatory queue and since the MessageSizeType is unitialized,
string_to_MessageSizeType() function used by traces to print the message fails
with a panic. This patch avoids this problem by initializing MessageSizeType
of SequencerMsg to Request_Control.
2016-11-19 12:39:04 -05:00
Tony Gutierrez ae55cba281 x86: fix loading/storing of Float80 types 2016-11-19 12:35:14 -05:00
Andreas Hansson 6ed567d600 alpha: Remove ALPHA tru64 support and associated tests
No one appears to be using it, and it is causing build issues
and increases the development and maintenance effort.
2016-11-17 04:54:14 -05:00
Tony Gutierrez 74249f80df hsail,gpu-compute: fixes to appease clang++
fixes to appease clang++. tested on:

Ubuntu clang version 3.5.0-4ubuntu2~trusty2
(tags/RELEASE_350/final) (based on LLVM 3.5.0)

Ubuntu clang version 3.6.0-2ubuntu1~trusty1
(tags/RELEASE_360/final) (based on LLVM 3.6.0)

the fixes address the following five issues:

1) the exec continuations in gpu_static_inst.hh were marked
   as protected when they should be public. here we mark
   them as public

2) the Abs instruction uses std::abs() in its execute method.
   because Abs is templated, it can also operate on U32 and U64,
   types, which cause Abs::execute() to pass uint32_t and uint64_t
   types to std::abs() respectively. this triggers a warning
   because std::abs() has no effect in this case. to rememdy this
   we add template specialization for the execute() method of Abs
   when its template paramter is U32 or U64.

3) Some potocols that utilize the code in cprintf.hh were missing
   includes to BoolVec.hh, which defines operator<< for the BoolVec
   type. This would cause issues when the generated code would try
   to pass a BoolVec type to a method in cprintf.hh that used
   operator<< on an instance of a BoolVec.

4) Surprise, clang doesn't like it when you clobber all the bits
   in a newly allocated object. I.e., this code:

   tlb = new GpuTlbEntry\[size\];
   std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size);

   Let's use std::vector to track the TLB entries in the GpuTlb now...

5) There were a few variables used only in DPRINTFs, so we mark them
   with M5_VAR_USED.
2016-10-26 22:48:45 -04:00
Michael LeBeane dc16c1ceb8 dev: Add m5 op to toggle synchronization for dist-gem5.
This patch adds the ability for an application to request dist-gem5 to begin/
end synchronization using an m5 op. When toggling on sync, all nodes agree
on the next sync point based on the maximum of all nodes' ticks. CPUs are
suspended until the sync point to avoid sending network messages until sync has
been enabled. Toggling off sync acts like a global execution barrier, where
all CPUs are disabled until every node reaches the toggle off point. This
avoids tricky situations such as one node hitting a toggle off followed by a
toggle on before the other nodes hit the first toggle off.
2016-10-26 22:48:40 -04:00
Michael LeBeane 48e43c9ad1 ruby: Allow multiple outstanding DMA requests
DMA sequencers and protocols can currently only issue one DMA access at
a time. This patch implements the necessary functionality to support
multiple outstanding DMA requests in Ruby.
2016-10-26 22:48:37 -04:00
mlebeane 96905971f2 dev: Add 'simLength' parameter in EthPacketData
Currently, all the network devices create a 16K buffer for the 'data' field
in EthPacketData, and use 'length' to keep track of the size of the packet
in the buffer.  This patch introduces the 'simLength' parameter to
EthPacketData, which is used to hold the effective length of the packet used
for all timing calulations in the simulator.  Serialization is performed using
only the useful data in the packet ('length') and not necessarily the entire
original buffer.
2016-10-26 22:48:33 -04:00
Tony Gutierrez de72e36619 gpu-compute: support in-order data delivery in GM pipe
this patch adds an ordered response buffer to the GM pipeline
to ensure in-order data delivery. the buffer is implemented as
a stl ordered map, which sorts the request in program order by
using their sequence ID. when requests return to the GM pipeline
they are marked as done. only the oldest request may be serviced
from the ordered buffer, and only if is marked as done.

the FIFO response buffers are kept and used in OoO delivery mode
2016-10-26 22:48:28 -04:00
Tony Gutierrez b63eb1302b gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex()
for HSAIL an operand's indices into the register files may be calculated
trivially, because the operands are always read from a register file, or are
an immediate.

for machine ISA, however, an op selector may specify special registers, or
may specify special SGPRs with an alias op selector value. the location of
some of the special registers values are dependent on the size of the RF
in some cases. here we add a way for the underlying getRegisterIndex()
method to know about the size of the RFs, so that it may find the relative
positions of the special register values.
2016-10-26 22:47:49 -04:00
Tony Gutierrez aa7364276f gpu-compute: use System cache line size in the GPU 2016-10-26 22:47:47 -04:00
Tony Gutierrez 844fb845a5 gpu-compute, hsail: make the PC a byte address, not an instruction index
currently the PC is incremented on an instruction granularity, and not as an
instruction's byte address. machine ISA instructions assume the PC is a byte
address, and is incremented accordingly. here we make the GPU model, and the
HSAIL instructions treat the PC as a byte address as well.
2016-10-26 22:47:43 -04:00
Tony Gutierrez d327cdba07 gpu-compute: add gpu_isa.hh to switch hdrs, add GPUISA to WF
the GPUISA class is meant to encapsulate any ISA-specific behavior - special
register accesses, isa-specific WF/kernel state, etc. - in a generic enough
way so that it may be used in ISA-agnostic code.

gpu-compute: use the GPUISA object to advance the PC

the GPU model treats the PC as a pointer to individual instruction objects -
which are store in a contiguous array - and not a byte address to be fetched
from the real memory system. this is ok for HSAIL because all instructions
are considered by the model to be the same size.

in machine ISA, however, instructions may be 32b or 64b, and branches are
calculated by advancing the PC by the number of words (4 byte chunks) it
needs to advance in the real instruction stream. because of this there is
a mismatch between the PC we use to index into the instruction array, and
the actual byte address PC the ISA expects. here we move the PC advance
calculation to the ISA so that differences in the instrucion sizes may be
accounted for in generic way.
2016-10-26 22:47:38 -04:00
Tony Gutierrez 98d8a7051d gpu-compute: add instruction mix stats for the gpu 2016-10-26 22:47:30 -04:00
Tony Gutierrez c7a79c9a42 gpu-compute, hsail: call discardFetch() from the WF
because every taken branch causes fetch to be discarded, we move the call
to the WF to avoid to have to call it from each and every branch instruction
type.
2016-10-26 22:47:27 -04:00
Tony Gutierrez 00a6346c91 hsail, gpu-compute: remove doGm/SmReturn add completeAcc
we are removing doGmReturn from the GM pipe, and adding completeAcc()
implementations for the HSAIL mem ops. the behavior in doGmReturn is
dependent on HSAIL and HSAIL mem ops, however the completion phase
of memory ops in machine ISA can be very different, even amongst individual
machine ISA mem ops. so we remove this functionality from the pipeline and
allow it to be implemented by the individual instructions.
2016-10-26 22:47:19 -04:00
Tony Gutierrez 7ac38849ab gpu-compute: remove inst enums and use bit flag for attributes
this patch removes the GPUStaticInst enums that were defined in GPU.py.
instead, a simple set of attribute flags that can be set in the base
instruction class are used. this will help unify the attributes of HSAIL
and machine ISA instructions within the model itself.

because the static instrution now carries the attributes, a GPUDynInst
must carry a pointer to a valid GPUStaticInst so a new static kernel launch
instruction is added, which carries the attributes needed to perform a
the kernel launch.
2016-10-26 22:47:11 -04:00
Tony Gutierrez e1ad8035a3 gpu-compute: move disassemle() implementation to GPUStaticInst 2016-10-26 22:47:05 -04:00
Tony Gutierrez 0a6cdff176 gpu-compute, arch: add some methods to the base inst classes for ISA support 2016-10-26 22:47:01 -04:00
Tony Gutierrez c7d4afd878 ruby: make a RequestDesc class instead of std::pair
the RequestDesc was previously implemented as a std::pair, which made
the implementation overly complex and error prone. here we encapsulate the
packet, primary, and secondary types all in a single data structure with
all members properly intialized in a ctor
2016-10-26 22:46:58 -04:00
Bjoern A. Zeeb 28c84d2886 arm, dev: pl011 console interactivity
Improve PL011 console interactivity

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15 15:11:04 -05:00
Nicolas Derumigny 976ef444b8 syscall: read() should not write anything if reading EOF.
Read() should not write anything when returning 0 (EOF).
This patch does not correct the same bug occuring for :

nbr_read=read(file, buf, nbytes)

When nbr_read<nbytes, nbytes bytes are copied into the virtual
RAM instead of nbr_read. If buf is smaller than nbytes, a
page fault occurs, even if buf is in fact bigger than nbr_read.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15 15:06:24 -05:00
Fernando Endo 6c72c35519 cpu, arm: Distinguish Float* and SimdFloat*, create FloatMem* opClass
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15 14:58:45 -05:00
Jason Lowe-Power 824c87634d stats: Add more information to uninitialized error
ClockedObject was changed to require its regStats() to be called from every
child class. If you forget to do this, the error was indecipherable. This
patch makes the error more clear.
2016-10-14 09:02:03 -05:00
Omar Naji 78dd152a0d mem: add DRAM powerdown current
Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser 1dc16aff24 mem: Add DRAM low-power functionality
Added power-down state transitions to the DRAM controller model.

Added per rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.

The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled.  ACT is implicitly covered by RD/WR
since burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).

The controller will automatically transition to ACT power down,
PRE power down, or SREF.

Transition to ACT power down state scheduled from:
1) The RespondEvent, where read data is received from the memory.
   ACT power-down entry will be scheduled when one or more banks is
   open, all commands for the rank have completed (no more commands
   scheduled), and there are no commands in queue for the rank

Transition to PRE power down scheduled from:
1) respondEvent, when all banks are closed, all commands have
   completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
   completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
   state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
   state was PRE power-down and there are commands in the queue.

Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh is completes when the previous
   state was PRE power-down with no commands in queue

Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.

Self-refresh exit commands are scheduled from:
1) The next request event, when the queue has commands for the rank
   in the readQueue or there are commands for the rank in the
   writeQueue and the bus state is WRITE.

Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser 7b269f2c95 mem: Add callback to compute stats prior to dump event
The per rank statistics are periodically updated based on
state transition and refresh events.

Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, need to ensure that the low-power state
durations, power, and energy are logged correctly.

Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser 0dd0d4ee7a mem: Modify drain to ensure banks and power are idled
Add constraint that all ranks have to be in PWR_IDLE
before signaling drain complete

This will ensure that the banks are all closed and the rank
has exited any low-power states.

On suspend, update the power stats to sync the DRAM power logic

The logic maintains the location of the signalDrainDone
method, which is still triggered from either:
1) Read response event
2) Next request event

This ensures that the drain will complete in the READ bus
state and minimizes the changes required.

Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser 27665af26d mem: Sort memory commands and update DRAMPower
Add local variable to stores commands to be issued.
These commands are in order within a single bank but will be out
of order across banks & ranks.

A new procedure, flushCmdList, sorts commands across banks / ranks,
and flushes the sorted list, up to curTick() to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed.  Could be called in other events like
the powerEvent as well.

By only flushing commands up to curTick(), will not get out of sync
when flushed at a periodic stats dump (done in subsequent patch).

Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:10 +01:00
Omar Naji 61b2b493d4 mem: update DDR3 die revision
Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:10 +01:00
Omar Naji d19dc35b06 mem: add DRAM powerdown timing 2016-10-13 19:22:10 +01:00
Omar Naji 20e6bb0140 mem: make DDR4 x16 2016-10-13 19:22:10 +01:00
Mitch Hayenga bd0c2d5b0b isa,arm: Add missing AArch32 FP instructions
This commit adds missing non-predicated, scalar floating point
instructions.  Specifically VRINT* floating point integer rounding
instructions and VSEL* floating point conditional selects.

Change-Id: I23cbd1389f151389ac8beb28a7d18d5f93d000e7
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
2016-10-13 19:22:10 +01:00
Andreas Sandberg 8c5df4be2e dev, arm: Make GenericTimer param handling more robust
The generic timer needs a pointer to an ArmSystem to wire itself to the
system register handler. This was previously specified as an instance
of System that was later cast to ArmSystem. Make this more robust by
specifying it as an ArmSystem in the Python interface and add a check
to make sure that it is non-NULL.

Change-Id: I989455e666f4ea324df28124edbbadfd094b0d02
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-10-07 14:14:44 +01:00
Tushar Krishna 22e6f65d72 ruby: Add M5_VAR_USED before variables used only inside assert in garnet2.0.
This removes errors when building gem5.fast
2016-10-06 21:06:00 -04:00
Tushar Krishna dbe8892b76 ruby: garnet2.0
Revamped version of garnet with more optimized single-cycle routers,
more configurability, and cleaner code.
2016-10-06 14:35:22 -04:00
Tushar Krishna b512f4bf71 ruby: remove the original garnet code.
Only garnet2.0 will be supported henceforth.
2016-10-06 14:35:21 -04:00
Tushar Krishna 0962d76827 config: add port directions and per-router delay in topology.
This patch adds port direction names to the links during topology
creation, which can be used for better printed names for the links
or for users to code up their own adaptive routing algorithms.
It also adds support for every router to have an independent latency
value to support heterogeneous topologies with the subsequent
garnet2.0 patch.
2016-10-06 14:35:20 -04:00
Tushar Krishna 003c08fa90 config: make internal links in network topology unidirectional.
This patch makes the internal links within the network topology
unidirectional, thus allowing any deadlock-free routing algorithms to
be specified from the topology itself using weights.
This patch also renames Mesh.py and MeshDirCorners.py to
Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing).
It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
2016-10-06 14:35:18 -04:00
Tushar Krishna 0f68b50ff1 ruby: rename networktest to garnet_synthetic_traffic.
networktest is essentially a collection of synthetic traffic patterns
for the network. The protocol name and the tester having the same name
led to multiple python configuration files with the same name, adding
confusion. This patch renames networktest to garnet_synthetic_traffic,
and also adds more synthetic traffic patterns.
2016-10-06 14:35:16 -04:00
Tushar Krishna aca869bf2d ruby: rename ALPHA_Network_test protocol to Garnet_standalone.
Over the past 6 years, we realized that the protocol is essentially used
to run the garnet network in a standalone manner, and feed standard synthetic
traffic patterns through it.
2016-10-06 14:35:14 -04:00
Alexandru Dutu 3f0118876f kvm: Adding details to kvm page fault in x86
Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.
2016-10-04 13:06:05 -04:00
Alexandru Dutu 526b1b7ec8 misc: Adds a warning in case gdb is attached multiple times
Instead of scheduling another event, this patch adds a warning in case gdb
is attached multiple times and the first attachement event has not been
processed yet.
2016-10-04 13:04:19 -04:00
Alexandru Dutu c8cf71f1a0 gpu-compute: Added method to compute the actual workgroup size
This patch adds a method to the Wavefront class to compute the actual workgroup
size. This can be different from the maximum workgroup size specified when
launching the kernel through the NDRange object. Current solution is still not
optimal, as we are computing these for each wavefront and the dispatcher also
needs to have this information and can't actually call
Wavefront::computeActuallWgSz before the wavefronts are being created. A long
term solution would be to have a Workgroup class that deals with all these
details.
2016-10-04 13:03:52 -04:00
Andreas Sandberg 18135ce6ab sim: Add a checkpoint function to test for entries
When loading a checkpoint, it's sometimes desirable to be able to test
whether an entry within a secion exists. This is currently done
automatically in the UNSERIALIZE_OPT_SCALAR macro, but it isn't
possible to do for arrays, containers, or enums. Instead of adding
even more macros, add a helper function (CheckpointIn::entryExists())
that tests for the presence of an entry.

Change-Id: I4b4646b03276b889fd3916efefff3bd552317dbc
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-10-04 11:22:16 +01:00
Brad Beckmann ee78758857 ruby: correct size for partial memory writes
Fixed AbstractController::queueMemoryWritePartial to specify the
correct size for partial memory writes.
2016-09-29 01:06:52 -04:00
Brad Beckmann f0971354c4 mem: minor dprintf fix to abstract mem
print number of bytes written as a decimal number, not hex
2016-09-29 01:06:33 -04:00
Curtis Dunham 109cc2caa6 arm: disable GIC extensions
Change-Id: If19b9c593b48ded1ea848f2d3710d4369ec8a221
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-22 14:46:37 +01:00
Rekai Gonzalez-Alberquilla ad296b068c cpu: Fix the O3 CPU Drain
The drain did not wait until stages were ready again. Therefore, as a
result of messages in the TimeBuffer being drain, the state after the
drain was not consistent and asserts fired in some places when the
draining happened after a stage got blocked, but before the notification
arrived to the previous stages.

Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-22 10:49:10 +01:00
Tony Gutierrez 84f9747688 gpu-compute: fix typo in GPUDispatcher 2016-09-16 14:47:19 -04:00
Alexandru Dutu 68127ca3da hsail: Fix disassembly of load instruction with 3 destination operands 2016-09-16 12:36:20 -04:00
Alexandru Dutu bd65ec0744 gpu-compute: Adding context serialization methods to Wavefront
This patch adds methods to serialize the context of a particular wavefront
to the simulated system memory. Context serialization is used when a wavefront
is preempeted (i.e. context switch).
2016-09-16 12:32:36 -04:00
Alexandru Dutu e9b14d5111 gpu-compute: Refactoring Wavefront::dynWaveId 2016-09-16 12:31:46 -04:00
Alexandru Dutu 498d0e63e5 gpu-compute: Adding vector register file debug messages
This patch introduces DPRINTFs for reading and writing to and from the vector
register file.
2016-09-16 12:30:05 -04:00
Alexandru Dutu 7918376450 gpu-compute: Changing reconvergenceStack type
std::stack has no iterators, therefore the reconvergence stack can't be
iterated without poping elements off. We will be using std::list instead to be
able to iterate for saving and restoring purposes.
2016-09-16 12:29:01 -04:00
Alexandru Dutu d5c8c5d3db gpu-compute: Adding ioctl for HW context size
Adding runtime support for determining the memory required by a SIMD engine
when executing a particular wavefront.
2016-09-16 12:27:56 -04:00
Alexandru Dutu 589e13a23b gpu-compute: Wavefront refactoring
Renaming members of the Wavefront class in accordance with the style guide.
2016-09-16 12:26:52 -04:00
Alexandru Dutu e9fe1b838b gpu-compute: Remove WFContext
WFContext struct is currently unused and it has been rendered not useful in
saving and restoring the context of a Wavefront. Wavefront class should be
sufficient for that purpose and the runtime can figure out the memory size
it will need to allocate for a Wavefront through an IOCTL.
2016-09-16 12:26:03 -04:00
Curtis Dunham 18461d1522 base: eliminate ipython warning
Change-Id: I3e282baeb969b6bb9534813a2f433d68246c0669
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-15 18:21:38 +01:00
Ricardo Alves e5c1488cb6 arm: Add m5_fail support for aarch64
Change-Id: Id2acbc09772be310a0eb9e33295afab07e08a4fa
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-15 18:21:24 +01:00
Radhika Jagtap 1fe5f63137 cpu: Support exit when any one Trace CPU completes replay
This change adds a Trace CPU param to exit simulation early,
i.e. when the first (any one) trace execution is complete. With
this change the user gets a choice to configure exit as either
when the last CPU finishes (default) or first CPU finishes
replay. Configuring an early exit enables simulating and
measuring stats strictly when memory-system resources are being
stressed by all Trace CPUs.

Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:20 +01:00
Radhika Jagtap d067327fc0 cpu: Adjust for trace offset and fix stats
This change subtracts the time offset present in the trace from
all the event times when nodes and request are sent so that the
replay starts immediately when the simulation starts. This makes
the stats accurate when the time offset in traces is large, for
example when traces are generated in the middle of a workload
execution. It also solves the problem of unnecessary DRAM
refresh events that would keep occuring during the large time
offset before even a single request is replayed into the system.

Change-Id: Ie0898842615def867ffd5c219948386d952af7f7
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:16 +01:00
Radhika Jagtap d7724d5f54 cpu: Add frequency scaling to the Trace CPU
This change adds a simple feature to scale the frequency of
the Trace CPU.

The compute delays in the input traces provide timing. This
change adds a freqency multiplier parameter to the Trace CPU
set to 1.0 by default. The compute delay is manipulated to
effectively achieve the  frequency at which the nodes become
ready and thus scale the frequency of the Trace CPU.

Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:09 +01:00
Michael LeBeane 443da2c030 kvm: Support timing accesses for KVM cpu
This patch enables timing accesses for KVM cpu.  A new state,
RunningMMIOPending, is added to indicate that there are outstanding timing
requests generated by KVM in the system.  KVM's tick() is disabled and the
simulation does not enter into KVM until all outstanding timing requests have
completed.  The main motivation for this is to allow KVM CPU to perform MMIO
in Ruby, since Ruby does not support atomic accesses.
2016-09-13 23:20:03 -04:00
Michael LeBeane 2c43a21687 x86: Force strict ordering for memory mapped m5ops
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects.  The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects.  Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
2016-09-13 23:18:34 -04:00
Michael LeBeane 458d4a3c7b sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of
quiesceTick() and quiesce() functions on ThreadContext.  This patch also
gets rid of the FS guards, since suspending a CPU is also a valid
operation for SE mode.
2016-09-13 23:17:42 -04:00
Michael LeBeane 6e4c51fa99 dev: Add a DmaCallback class to DmaDevice
This patch introduces the DmaCallback helper class, which registers a callback
to fire after a sequence of (potentially non-contiguous) DMA transfers on a
DmaPort completes.
2016-09-13 23:14:24 -04:00
Michael LeBeane f17a5faf44 sim, syscall_emul: Add mmap to EmulatedDriver
Add support for calling mmap on an EmulatedDriver file descriptor.
2016-09-13 23:12:46 -04:00
Michael LeBeane 6a668d0c0c gpu-compute: Fix bug with return in cfg
Connecting basic blocks would stop too early in kernels where ret was not the
last instruction.  This patch allows basic blocks after the ret instruction
to be properly connected.
2016-09-13 23:11:20 -04:00
Michael LeBeane febab25957 dev: Exit correctly in dist-gem5
The receiver thread in dist_iface is allowed to directly exit the simulation.
This can cause exit to be called twice if the main thread simultaneously wants
to exit the simulation.  Therefore, have the receiver thread enqueue a request
to exit on the primary event queue for the main simulation thread to handle.
2016-09-13 23:08:34 -04:00