10990 commits

Author SHA1 Message Date
Andreas Sandberg db5c9a5f90 base: Redesign internal frame buffer handling
Currently, frame buffer handling in gem5 is quite ad hoc. In practice,
we pass around naked pointers to raw pixel data and expect consumers
to convert frame buffers using the (broken) VideoConverter.

This changeset completely redesigns the way we handle frame buffers
internally. In summary, it fixes several color conversion bugs, adds
support for more color formats (e.g., big endian), and makes the code
base easier to follow.

In the new world, gem5 always represents pixel data using the Pixel
struct when pixels need to be passed between different classes (e.g.,
a display controller and the VNC server). Producers of entire frames
(e.g., display controllers) should use the FrameBuffer class to
represent a frame.

Frame producers are expected to create one instance of the FrameBuffer
class in their constructors and register it with its consumers
once. Consumers are expected to check the dimensions of the frame
buffer when they consume it.

Conversion between the external representation and the internal
representation is supported for all common "true color" RGB formats of
up to 32-bit color depth. The external pixel representation is
expected to be between 1 and 4 bytes in either big endian or little
endian. Color channels are assumed to be contiguous ranges of bits
within each pixel word. The external pixel value is scaled to an 8-bit
internal representation using a floating-point multiplication that maps
it onto the entire 8-bit range.
2015-05-23 13:37:03 +01:00
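
The channel scaling described in db5c9a5f90 can be sketched roughly as
follows. This is a minimal illustration, not the code from the changeset:
the Channel and Rgb8 types and the fromRgb565 helper are hypothetical
names, and the sketch assumes the pixel word has already been assembled
in host byte order (the endianness handling is left out).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one color channel inside a pixel word.
struct Channel {
    unsigned offset; // bit position of the channel's least significant bit
    unsigned width;  // number of contiguous bits (1 to 8 in this sketch)

    // Extract the channel from a pixel word and scale it to 0..255 with
    // a floating-point multiplication, rounding to the nearest integer.
    uint8_t toPixel(uint32_t word) const {
        assert(width >= 1 && width <= 8 && offset + width <= 32);
        const uint32_t mask = (1u << width) - 1;
        const uint32_t raw = (word >> offset) & mask;
        const float factor = 255.0f / static_cast<float>(mask);
        return static_cast<uint8_t>(static_cast<float>(raw) * factor + 0.5f);
    }
};

// Hypothetical 8-bit-per-channel internal pixel.
struct Rgb8 {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
};

// Convert one RGB565 word (5 bits red, 6 bits green, 5 bits blue).
inline Rgb8 fromRgb565(uint16_t word) {
    const Channel red{11, 5};
    const Channel green{5, 6};
    const Channel blue{0, 5};
    return {red.toPixel(word), green.toPixel(word), blue.toPixel(word)};
}
```

Scaling by 255 / (2^width - 1) rather than shifting left means that a
fully saturated 5-bit channel maps to 255 instead of 248.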
Andreas Sandberg 1985d28ef9 base: Clean up bitmap generation code
The bitmap generation code is hard to follow and incorrectly uses the
size of an enum member to calculate the size of a pixel. This
changeset cleans up the code and adds some documentation.
2015-05-23 13:37:01 +01:00
Joel Hestness 0479569f67 ruby: Fix RubySystem warm-up and cool-down scope
The processes of warming up and cooling down Ruby caches are simulation-wide
processes, not just RubySystem instance-specific processes. Thus, the warm-up
and cool-down variables should be globally visible to any Ruby components
participating in either process. Make these variables static members and track
the warm-up and cool-down processes as appropriate.

This patch also has two side benefits:
1) It removes references to the RubySystem g_system_ptr, which are problematic
for allowing multiple RubySystem instances in a single simulation. Warmup and
cooldown variables being static (global) reduces the need for instance-specific
dereferences through the RubySystem.
2) From the AbstractController, it removes local RubySystem pointers, which are
used inconsistently with other uses of the RubySystem: 11 other uses reference
the RubySystem with the g_system_ptr. Only sequencers have local pointers.
2015-05-19 10:56:51 -05:00
Andreas Hansson 99d3fa5945 arm: Identify table-walker requests
This patch ensures all page-table walks are flagged as such.
2015-05-15 13:40:01 -04:00
Andreas Hansson bd583d00f9 misc: Appease gcc 5.1
Three minor issues are resolved:

1. Apparently gcc 5.1 does not like negation of booleans followed by
   bitwise AND.

2. Somehow the compiler also gets confused and warns about
   NoopMachInst being unused (removing it causes compilation errors
   though). Most likely a compiler bug.

3. There seem to be a number of instances where loop unrolling causes
   false positives for the array-bounds check. For now, switch to
   std::array. Potentially we could disable the warning for newer gcc
   versions, but switching to std::array is probably a good move in
   any case.
2015-05-15 13:39:53 -04:00
Andreas Sandberg 37aab4a155 sim: Don't clear the active CPU vector in System::initState
The system class currently clears the vector of active CPUs in
initState(). CPUs are added to the list by registerThreadContext()
which is called from BaseCPU::init(). This obviously breaks when the
System object is initialized after the CPUs. This changeset removes
the offending clear() call since the list will be empty after it has
been instantiated anyway.
2015-05-15 13:39:44 -04:00
Andreas Hansson a45c9508ea config: Use null memory for DRAM sweep script
Do not waste time when we do not care about the data.
2015-05-15 13:38:46 -04:00
Wendy Elsasser 20978ee697 config: Add new MemConfig options to DRAM sweep script
Update script to match current MemConfig options with
external_memory_system option set to 0.
2015-05-15 13:38:45 -04:00
Steve Reinhardt c65fa3dceb syscall_emul: fix warn_once behavior
The current ignoreWarnOnceFunc doesn't really work as expected,
since it will only generate one warning total, for whichever
"warn-once" syscall is invoked first.  This patch fixes that
behavior by keeping a "warned" flag in the SyscallDesc object,
allowing suitably flagged syscalls to warn exactly once per
syscall.
2015-05-05 09:25:59 -07:00
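
The per-syscall flag described in c65fa3dceb could look roughly like the
sketch below. SyscallDescSketch and warnOnce are hypothetical names used
only for illustration; the real SyscallDesc class has a different
interface.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a syscall descriptor.
struct SyscallDescSketch {
    std::string name;
    bool warned; // one flag per descriptor instead of one global flag

    explicit SyscallDescSketch(std::string n)
        : name(std::move(n)), warned(false) {
        assert(!name.empty());
    }

    // Emit a warning the first time this particular syscall is seen.
    // Returns true if a warning was printed by this call.
    bool warnOnce() {
        if (warned)
            return false;
        std::cerr << "warn: ignoring syscall " << name << "(...)\n";
        warned = true;
        return true;
    }
};
```

Because the flag lives in the descriptor, two different ignored syscalls
each produce their own single warning.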
Andreas Hansson 40e180ecbe stats, arm: Update stats for missing FPEXC.EN check
Only one regression is affected.
2015-05-05 03:22:48 -04:00
Andreas Hansson f349592071 arm: Add missing FPEXC.EN check
Add a missing check to ensure that exceptions are generated properly.
2015-05-05 03:22:45 -04:00
Giacomo Gabrielli a3f23894eb arm: enable DCZVA by default in SE mode 2015-05-05 03:22:42 -04:00
Andreas Hansson 80cd107e51 stats: Update stats to reflect cache changes 2015-05-05 03:22:39 -04:00
Stephan Diestelhorst 2847d5f517 mem: Create a request copy for deferred snoops
Sometimes, we need to defer an express snoop in an MSHR, but the original
request might complete and deallocate the original pkt->req.  In those cases,
create a copy of the request so that someone who is inspecting the delayed
snoop can still inspect the request.  All of this is rather hacky, but the
allocation / linking and general life-time management of Packet and Request is
rather tricky.  Deleting the copy is another tricky area; testing so far has
shown that the right copy is deleted at the right time.
2015-03-17 11:50:55 +00:00
Andreas Sandberg 706597f021 arm: Relax ordering for some uncacheable accesses
We currently assume that all uncacheable memory accesses are strictly
ordered. Instead of always enforcing strict ordering, we now only
enforce it if the required memory type is device memory or strongly
ordered memory.
2015-05-05 03:22:34 -04:00
Andreas Sandberg 48281375ee mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

    * Speculation: CPU models are not allowed to issue speculative
      loads.

    * Write combining: CPU models and caches are not allowed to merge
      writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
2015-05-05 03:22:33 -04:00
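
The split between the two flags in 48281375ee can be illustrated with the
sketch below. The bit positions and helper functions are assumptions made
for this example; only the flag names UNCACHEABLE and STRICT_ORDER come
from the commit message.

```cpp
#include <cassert>
#include <cstdint>

using Flags = uint32_t;

// Hypothetical bit assignments, chosen only for this sketch.
constexpr Flags UNCACHEABLE = 1u << 0;  // memory system must not cache
constexpr Flags STRICT_ORDER = 1u << 1; // CPU must not reorder/speculate

// The commit message expects STRICT_ORDER to come with UNCACHEABLE.
inline bool isConsistent(Flags f) {
    return !(f & STRICT_ORDER) || (f & UNCACHEABLE);
}

// A CPU model may issue a speculative load only without STRICT_ORDER.
inline bool cpuMaySpeculate(Flags f) {
    assert(isConsistent(f));
    return !(f & STRICT_ORDER);
}

// Writes to the same cache line may be merged only without STRICT_ORDER.
inline bool mayCombineWrites(Flags f) {
    assert(isConsistent(f));
    return !(f & STRICT_ORDER);
}

// The memory system may cache the data only without UNCACHEABLE.
inline bool memoryMayCache(Flags f) {
    return !(f & UNCACHEABLE);
}

// Flags for an access that must be neither cached nor reordered.
inline Flags strictlyOrderedFlags() {
    return UNCACHEABLE | STRICT_ORDER;
}
```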
Andreas Sandberg 1da634ace0 mem, alpha: Move Alpha-specific request flags
Move Alpha-specific memory request flags to an architecture-specific
header and map them to the architecture specific flag bit range.
2015-05-05 03:22:31 -04:00
Andreas Hansson 23b9792681 arm: Remove unnecessary boot uncachability
With the recent patches addressing how we deal with uncacheable
accesses, there is no longer a need for the workarounds put in place to
force certain sections of memory to be uncacheable during boot.
2015-05-05 03:22:30 -04:00
Andreas Hansson 36f29496a0 mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.

The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
2015-05-05 03:22:29 -04:00
Andreas Hansson 554ddc7c07 arch, cpu: Do not forward snoops to table walker
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.

There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better off not
snooping more than we have to.

Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
2015-05-05 03:22:27 -04:00
Andreas Hansson 14e5b2ea55 mem: Pass shared downstream through caches
This patch ensures that we pass on information about a packet being
shared (rather than exclusive), when forwarding a packet downstream.

Without this patch there is a risk that a downstream cache considers
the line exclusive when it really isn't.
2015-05-05 03:22:26 -04:00
Ali Jafri 3d33432136 mem: Add forward snoop check for HardPFReqs
We should always check whether the cache is supposed to be forwarding snoops
before generating snoops.
2015-05-05 03:22:25 -04:00
Andreas Hansson 0ebbf3f951 mem: Add missing stats update for uncacheable MSHRs
This patch adds a missing counter update for the uncacheable
accesses. By updating this counter we also get a meaningful average
latency for uncacheable accesses (previously inf).
2015-05-05 03:22:24 -04:00
Andreas Hansson 33e3e370f2 mem: Tidy up BaseCache parameters
This patch simply tidies up the BaseCache parameters and removes the
unused "two_queue" parameter.
2015-05-05 03:22:22 -04:00
David Guillen 5287945a8b mem: Remove templates in cache model
This patch changes the cache implementation to rely on virtual methods
rather than using the replacement policy as a template argument.

There is no impact on the simulation performance, and overall the
changes make it easier to modify (and subclass) the cache and/or
replacement policy.
2015-05-05 03:22:21 -04:00
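
The move from a template argument to virtual methods in 5287945a8b
follows a common pattern, sketched below. All class names here
(ReplacementPolicy, LruPolicy, FifoPolicy, CacheSet, Block) are
hypothetical and do not mirror the actual gem5 class hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-block bookkeeping.
struct Block {
    unsigned lastUse;    // time of the most recent access
    unsigned insertedAt; // time the block was filled
};

// The policy is an abstract interface rather than a template parameter.
class ReplacementPolicy {
  public:
    virtual ~ReplacementPolicy() = default;
    virtual std::size_t findVictim(const std::vector<Block> &set) const = 0;
};

// Evict the block that was used least recently.
class LruPolicy : public ReplacementPolicy {
  public:
    std::size_t findVictim(const std::vector<Block> &set) const override {
        assert(!set.empty());
        std::size_t victim = 0;
        for (std::size_t i = 1; i < set.size(); ++i)
            if (set[i].lastUse < set[victim].lastUse)
                victim = i;
        return victim;
    }
};

// Evict the block that was inserted first.
class FifoPolicy : public ReplacementPolicy {
  public:
    std::size_t findVictim(const std::vector<Block> &set) const override {
        assert(!set.empty());
        std::size_t victim = 0;
        for (std::size_t i = 1; i < set.size(); ++i)
            if (set[i].insertedAt < set[victim].insertedAt)
                victim = i;
        return victim;
    }
};

// One cache class works with any policy chosen at run time.
class CacheSet {
  public:
    CacheSet(std::vector<Block> blocks,
             std::unique_ptr<ReplacementPolicy> policy)
        : blocks(std::move(blocks)), policy(std::move(policy)) {}

    std::size_t victim() const { return policy->findVictim(blocks); }

  private:
    std::vector<Block> blocks;
    std::unique_ptr<ReplacementPolicy> policy;
};
```

With this layout a new policy is just another subclass, and the cache
code is compiled once instead of once per policy.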
Andreas Hansson d0d933facc cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
2015-05-05 03:22:19 -04:00
Andreas Hansson eb1a9977bf stats: Bring regression stats in line with actual behaviour 2015-05-05 03:22:17 -04:00
Nilay Vaish f71fa17157 stats: arm: updates 2015-04-30 14:17:43 -05:00
Nilay Vaish 42fe2df354 stats: x86: updates due to change in div latency 2015-04-29 22:35:23 -05:00
Ruslan Bukin 81f3211149 arch, base, dev, kern, sym: FreeBSD support
This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only)

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-29 22:35:23 -05:00
Rizwana Begum 52a3bc5e5c mem: Simplify page close checks for adaptive policies
Both open_adaptive and close_adaptive page policies keep the page
open if a row hit is found. If a row hit is not found, close_adaptive
page policy precharges the row, and open_adaptive policy precharges
the row only if there is a bank conflict request waiting in the queue.

This patch makes the checks for the above conditions simpler.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-29 22:35:22 -05:00
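
The decision described in 52a3bc5e5c reduces to a small predicate. The
sketch below only restates the rule from the commit message; the
PagePolicy enum, the shouldPrecharge function and its parameters are
hypothetical names, not the DRAM controller's actual code.

```cpp
#include <cassert>

enum class PagePolicy { OpenAdaptive, CloseAdaptive };

// Decide whether to precharge (close) the row after an access.
//  rowHitQueued:       a queued request targets the currently open row
//  bankConflictQueued: a queued request targets another row in this bank
inline bool shouldPrecharge(PagePolicy policy, bool rowHitQueued,
                            bool bankConflictQueued) {
    if (rowHitQueued)
        return false; // both adaptive policies keep the page open
    if (policy == PagePolicy::CloseAdaptive)
        return true;  // no row hit pending: close the page
    return bankConflictQueued; // open_adaptive closes only on a conflict
}

// Small usage example covering the cases listed in the commit message.
inline void pagePolicyExamples() {
    assert(!shouldPrecharge(PagePolicy::OpenAdaptive, true, true));
    assert(!shouldPrecharge(PagePolicy::CloseAdaptive, true, false));
    assert(shouldPrecharge(PagePolicy::CloseAdaptive, false, false));
    assert(shouldPrecharge(PagePolicy::OpenAdaptive, false, true));
    assert(!shouldPrecharge(PagePolicy::OpenAdaptive, false, false));
}
```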
Nilay Vaish 3a2731fb8c ruby: set: replace long by unsigned long
UBSan complains about a negative value being shifted.
2015-04-29 22:35:22 -05:00
Nilay Vaish 4333549575 cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued.  As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined).  The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value.  To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean.  If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
2015-04-29 22:35:22 -05:00
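
The effect of replacing issueLat with a boolean in 4333549575 can be
sketched as below. OpDescSketch and issueInterval are hypothetical names;
the sketch only encodes the two cases named in the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified op class description.
struct OpDescSketch {
    unsigned opLat; // execution latency in cycles
    bool pipelined; // true: fully pipelined, false: not pipelined at all

    // Cycles until another op of the same class can be issued.
    unsigned issueInterval() const {
        assert(opLat >= 1);
        return pipelined ? 1u : opLat;
    }
};
```

A boolean makes the two supported cases explicit, whereas a Cycles value
suggests that intermediate issue intervals are also modeled.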
Nilay Vaish 0dbd696aae cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle
on x86.  This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
2015-04-29 22:35:22 -05:00
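
To see why a single-cycle latency per microop is reasonable in
0dbd696aae, consider a division built from steps that each produce one
quotient bit. The sketch below is a generic restoring division for
unsigned 32-bit operands, not gem5's x86 microcode; DivState, divStep and
divide are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct DivState {
    uint32_t quotient;
    uint32_t remainder;
};

// One step, analogous to one div microop: computes quotient bit `bit`.
inline void divStep(DivState &s, uint32_t dividend, uint32_t divisor,
                    unsigned bit) {
    // Shift the next dividend bit into the running remainder.
    uint64_t rem = (static_cast<uint64_t>(s.remainder) << 1) |
                   ((dividend >> bit) & 1u);
    if (rem >= divisor) {
        rem -= divisor;
        s.quotient |= (1u << bit);
    }
    s.remainder = static_cast<uint32_t>(rem);
}

// A full division is a loop of 32 single-bit steps, most significant
// bit first.
inline DivState divide(uint32_t dividend, uint32_t divisor) {
    assert(divisor != 0);
    DivState s{0, 0};
    for (unsigned bit = 32; bit-- > 0;)
        divStep(s, dividend, divisor, bit);
    return s;
}
```

In such a scheme the total latency of a division comes from the number
of loop iterations rather than from the latency of an individual step.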
Nilay Vaish ee06fed656 x86: change divide-by-zero fault to divide-error
The same exception is raised whether a division by zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
2015-04-29 22:35:22 -05:00
Andreas Hansson 179787f31f misc: Appease gcc 5.1 without moving GDB_REG_BYTES
This patch rolls back the move of the GDB_REG_BYTES constant, and
instead adds M5_VAR_USED.
2015-04-24 03:30:08 -04:00
bpotter 936768c8f4 config: enable setting SE-mode environment variables from file 2015-04-23 13:40:18 -07:00
Rene de Jong 483f873d01 arm, dev: Add a UFS device
This patch introduces a UFS host controller and a UFS device. More
information about the UFS standard can be found at the JEDEC site:
http://www.jedec.org/standards-documents/results/jesd220

Note that the model does not implement the complete standard, and as
such is not an actual implementation of UFS. The following SCSI
commands are implemented: inquiry, read, read capacity, report LUNs,
start/stop, test unit ready, verify, write, format unit, send
diagnostic, synchronize cache, mode select, mode sense, request sense,
unmap, write buffer and read buffer. This is sufficient for usage with
Linux and Android.

To interact with this model a kernel version 3.9 or above is
needed.
2015-04-23 13:37:50 -04:00
Rene de Jong fff28ce954 arm, dev: Add a NAND flash timing model
This adds a NAND flash timing model. This model takes the number of
planes into account and is ultimately intended to be used as a
high-level performance model for any device using flash. To access the
memory, use either readMemory or writeMemory.

To make use of the model you will need an interface model
such as UFSHostDevice, which is part of a separate patch.

At the moment the flash device is part of the ARM device tree since
the only user is the UFSHostDevice, and that in turn relies on the ARM
GIC.
2015-04-23 13:37:49 -04:00
Peter Enns 2e64590b88 dev: Add support for i2c devices
This patch adds an I2C bus and base device. I2C is used to connect a
variety of sensors, and this patch serves as a starting point to
enable a range of I2C devices.
2015-04-23 13:37:48 -04:00
Andreas Hansson c8c4f66889 misc: Appease gcc 5.1
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.

First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.

Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
2015-04-23 13:37:46 -04:00
Steve Reinhardt 0cf36d9409 stats: update for previous changeset
Very small differences in IQ-specific O3 stats.
2015-04-22 20:22:29 -07:00
Brandon Potter a70a83155b cpu: remove conditional check (count > 0) on o3 IQ squashes
The o3 cpu instruction queue model uses the count variable to track the number
of unissued instructions in the queue. Previously, the squash method used
this variable to avoid executing the doSquash method when there were no
unissued instructions in the pipeline.  A corner case problem exists when
only issued instructions exist in the pipeline and a squash occurs; the
doSquash code is not invoked and subsequently does not clean up state properly.
2015-04-22 07:52:03 -07:00
Brandon Potter 4991c29965 syscall_emul: implement clock_gettime system call 2015-04-22 07:51:27 -07:00
Monir Mozumder 00e3cab8fc syscall_emul: update x86 syscall table
Update table with additional definitions through Linux 3.13.
2015-04-22 07:51:27 -07:00
Brandon Potter 344a437064 syscall_emul: update getrlimit to use warn
Don't use std::cerr directly, and just return EINVAL instead of aborting.
2015-04-22 07:51:27 -07:00
Brandon Potter 9c6509f198 syscall_emul: fix warning with wrong syscall name
Also nix extra whitespace.
2015-04-22 07:51:27 -07:00
Brandon Potter 6ad29ba6df base: add new ChunkGenerator method to identify last chunk 2015-04-22 07:51:27 -07:00
Steve Reinhardt 93c4527128 stats: update a few stats from long O3 runs
Very small changes to iew.predictedNotTakenIncorrect
and iew.branchMispredicts.  Looks like similar updates
were committed on April 3 (changeset 235ff1c046df), but
only for the quick tests.
2015-04-20 15:09:43 -07:00
Andreas Hansson cd76e34056 cpu: Remove the InOrderCPU from the tree
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.

The MinorCPU is now used to model an in-order microarchitecture, and
in the long term the MinorCPU will be renamed InOrderCPU.
2015-04-20 12:46:35 -04:00