Commit graph

344 commits

Author SHA1 Message Date
Tony Gutierrez
1a7d3f9fcb gpu-compute: AMD's baseline GPU model 2016-01-19 14:28:22 -05:00
Andreas Sandberg
745f8229f6 dev, arm: Add a platform with support for both aarch32 and aarch64
Add a platform with support for both aarch32 and aarch64. This
platform implements a subset of the devices in a real Versatile
Express and extends it with some gem5-specific functionality. It is in
many ways similar to the old VExpress_EMM64 platform, but supports the
following new features:

  * Automatic PCI interrupt assignment
  * PCI interrupts allocated in a contiguous range.
  * Automatic boot loader selection (32-bit / 64-bit)
  * Cleaner memory map where gem5-specific devices live in CS5 which
    isn't used by current Versatile Express platforms.
  * No fake devices. Devices that were previously faked will be
    removed from the device tree instead.
  * Support for 510 GiB contiguous memory
2016-01-15 11:30:13 +00:00
Andreas Hansson
c965ca96cc configs: Fix inheritance of HMCSystem and cleanup spacing
Minor fix to ensure the HMCSystem can actually be instantiated
(SimObject cannot be created). Also address some spacing issues.
2016-01-11 05:52:17 -05:00
Gabor Dozsa
64ca31976f config: Updates for distributed gem5 simulations 2016-01-07 16:33:47 -06:00
Radhika Jagtap
9bd5051b60 config: Enable elastic trace capture and replay in se/fs
This patch adds changes to the configuration scripts to support elastic
tracing and replay.

The patch adds a command line option to enable elastic tracing in SE mode
and FS mode. When enabled the Elastic Trace cpu probe is attached to O3CPU
and a few O3 CPU parameters are tuned. The Elastic Trace probe writes out
both instruction fetch and data dependency traces. The patch also enables
configuring the TraceCPU to replay traces using the SE and FS script.

The replay run is designed to resume from checkpoint using atomic cpu to
restore state keeping it consistent with FS run flow. It then switches to
TraceCPU to replay the input traces.
2015-12-07 16:42:16 -06:00
Andreas Sandberg
78275c9d2f dev: Rewrite PCI host functionality
The gem5's current PCI host functionality is very ad hoc. The current
implementations require PCI devices to be hooked up to the
configuration space via a separate configuration port. Devices query
the platform to get their config-space address range. Un-mapped parts
of the config space are intercepted using the XBar's default port
mechanism and a magic catch-all device (PciConfigAll).

This changeset redesigns the PCI host functionality to improve code
reuse and make config-space and interrupt mapping more
transparent. Existing platform code has been updated to use the new
PCI host and configured to stay backwards compatible (i.e., no
guest-side visible changes). The current implementation does not
expose any new functionality, but it can easily be extended with
features such as automatic interrupt mapping.

PCI devices now register themselves with a PCI host controller. The
host controller interface is defined in the abstract base class
PciHost. Registration is done by PciHost::registerDevice() which takes
the device, its bus position (bus/dev/func tuple), and its interrupt
pin (INTA-INTC) as a parameter. The registration interface returns a
PciHost::DeviceInterface that the PCI device can use to query memory
mappings and signal interrupts.

The host device manages the entire PCI configuration space. Accesses
to devices decoded into the devices bus position and then forwarded to
the correct device.

Basic PCI host functionality is implemented in the GenericPciHost base
class. Most platforms can use this class as a basic PCI controller. It
provides the following functionality:

  * Configurable configuration space decoding. The number of bits
    dedicated to a device is a prameter, making it possible to support
    both CAM, ECAM, and legacy mappings.

  * Basic interrupt mapping using the interruptLine value from a
    device's configuration space. This behavior is the same as in the
    old implementation. More advanced controllers can override the
    interrupt mapping method to dynamically assign host interrupts to
    PCI devices.

  * Simple (base + addr) remapping from the PCI bus's address space to
    physical addresses for PIO, memory, and DMA.
2015-12-05 00:11:24 +00:00
Andreas Sandberg
6a05179e13 arm, config: Automatically discover available platforms
Add support for automatically discover available platforms. The
Python-side uses functionality similar to what we use when
auto-detecting available CPU models. The machine IDs have been updated
to match the platform configurations. If there isn't a matching
machine ID, the configuration scripts default to -1 which Linux uses
for device tree only platforms.
2015-12-04 00:19:05 +00:00
Andreas Hansson
7433d77fcf mem: Add an option to perform clean writebacks from caches
This patch adds the necessary commands and cache functionality to
allow clean writebacks. This functionality is crucial, especially when
having exclusive (victim) caches. For example, if read-only L1
instruction caches are not sending clean writebacks, there will never
be any spills from the L1 to the L2. At the moment the cache model
defaults to not sending clean writebacks, and this should possibly be
re-evaluated.

The implementation of clean writebacks relies on a new packet command
WritebackClean, which acts much like a Writeback (renamed
WritebackDirty), and also much like a CleanEvict. On eviction of a
clean block the cache either sends a clean evict, or a clean
writeback, and if any copies are still cached upstream the clean
evict/writeback is dropped. Similarly, if a clean evict/writeback
reaches a cache where there are outstanding MSHRs for the block, the
packet is dropped. In the typical case though, the clean writeback
allocates a block in the downstream cache, and marks it writable if
the evicted block was writable.

The patch changes the O3_ARM_v7a L1 cache configuration and the
default L1 caches in config/common/Caches.py
2015-11-06 03:26:43 -05:00
Andreas Hansson
654266f39c mem: Add cache clusivity
This patch adds a parameter to control the cache clusivity, that is if
the cache is mostly inclusive or exclusive. At the moment there is no
intention to support strict policies, and thus the options are: 1)
mostly inclusive, or 2) mostly exclusive.

The choice of policy guides the behaviuor on a cache fill, and a new
helper function, allocOnFill, is created to encapsulate the decision
making process. For the timing mode, the decision is annotated on the
MSHR on sending out the downstream packet, and in atomic we directly
pass the decision to handleFill. We (ab)use the tempBlock in cases
where we are not allocating on fill, leaving the rest of the cache
unaffected. Simple and effective.

This patch also makes it more explicit that multiple caches are
allowed to consider a block writable (this is the case
also before this patch). That is, for a mostly inclusive cache,
multiple caches upstream may also consider the block exclusive. The
caches considering the block writable/exclusive all appear along the
same path to memory, and from a coherency protocol point of view it
works due to the fact that we always snoop upwards in zero time before
querying any downstream cache.

Note that this patch does not introduce clean writebacks. Thus, for
clean lines we are essentially removing a cache level if it is made
mostly exclusive. For example, lines from the read-only L1 instruction
cache or table-walker cache are always clean, and simply get dropped
rather than being passed to the L2. If the L2 is mostly exclusive and
does not allocate on fill it will thus never hold the line. A follow
on patch adds the clean writebacks.

The patch changes the L2 of the O3_ARM_v7a CPU configuration to be
mostly exclusive (and stats are affected accordingly).
2015-11-06 03:26:41 -05:00
Nilay Vaish
6433a10749 configs: fix bug introduced due to 276ad9121192
I had made a typo in changeset 276ad9121192.  This changeset fixes it
2015-11-04 12:36:28 -06:00
Erfan Azarkhish
100cbc9cf6 mem: hmc: top level design
This patch enables modeling a complete Hybrid Memory Cube (HMC) device. It
highly reuses the existing components in gem5's general memory system with some
small modifications. This changeset requires additional patches to model a
complete HMC device.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03 12:17:56 -06:00
Palle Lyckegaard
2cb491379b sparc: add missing parameter to makeSparcSystem()
makeSparcSystem() in configs/common/FSConfig.py is missing the cmdLine
parameter Without the parameter the simulation fails to start.  With the
parameter the simulation starts properly.
2015-11-03 12:17:55 -06:00
Jason Lowe-Power
f065f9941b config: Add configs scripts used in Learning gem5
Added a new directory in configs (learning_gem5) to hold the scripts that are
used in the book. See http://lowepower.com/jason/learning_gem5/ for a working
copy. For now, only the scripts in Part 1: Getting started with gem5
have been added. A separate patch adds tests for these scripts.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-16 09:35:36 -05:00
Andreas Hansson
ddfa96cf45 mem: Add explicit Cache subclass and make BaseCache abstract
Open up for other subclasses to BaseCache and transition to using the
explicit Cache subclass.

--HG--
rename : src/mem/cache/BaseCache.py => src/mem/cache/Cache.py
2015-08-21 07:03:23 -04:00
Matthias Jung
8723b08dbf misc: Coupling gem5 with SystemC TLM2.0
Transaction Level Modeling (TLM2.0) is widely used in industry for creating
virtual platforms (IEEE 1666 SystemC). This patch contains a standard compliant
implementation of an external gem5 port, that enables the usage of gem5 as a
TLM initiator component in SystemC based virtual platforms. Both TLM coding
paradigms loosely timed (b_transport) and aproximately timed (nb_transport) are
supported.

Compared to the original patch a TLM memory manager was added. Furthermore, the
transaction object was removed and for each TLM payload a PacketPointer that
points to the original gem5 packet is added as an TLM extension.  For event
handling single events are now created.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-08-03 23:08:40 -05:00
Andreas Hansson
b93c912013 mem: Remove redundant is_top_level cache parameter
This patch takes the final step in removing the is_top_level parameter
from the cache. With the recent changes to read requests and write
invalidations, the parameter is no longer needed, and consequently
removed.

This also means that asymmetric cache hierarchies are now fully
supported (and we are actually using them already with L1 caches, but
no table-walker caches, connected to a shared L2).
2015-07-03 10:14:43 -04:00
Andreas Hansson
893533a126 mem: Allow read-only caches and check compliance
This patch adds a parameter to the BaseCache to enable a read-only
cache, for example for the instruction cache, or table-walker cache
(not for x86). A number of checks are put in place in the code to
ensure a read-only cache does not end up with dirty data.

A follow-on patch adds suitable read requests to allow a read-only
cache to explicitly ask for clean data.
2015-07-03 10:14:39 -04:00
Andreas Sandberg
7c4eb3b4d8 kvm, arm: Add support for aarch64
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:

   * The system register based generic timer can only be simulated by
     the host kernel. Workaround: Use a memory mapped timer instead to
     simulate the timer in gem5.

   * Simulating devices (e.g., the generic timer) in the host kernel
     requires that the host kernel also simulates the GIC.

   * ID registers in the host and in gem5 must match for switching
     between simulated CPUs and KVM. This is particularly important
     for ID registers describing memory system capabilities (e.g.,
     ASID size, physical address size).

   * Switching between a virtualized CPU and a simulated CPU is
     currently not supported if in-kernel device emulation is
     used. This could be worked around by adding support for switching
     to the gem5 (e.g., the KvmGic) side of the device models. A
     simpler workaround is to avoid in-kernel device models
     altogether.
2015-06-01 19:44:19 +01:00
Andreas Hansson
554ddc7c07 arch, cpu: Do not forward snoops to table walker
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.

There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.

Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
2015-05-05 03:22:27 -04:00
Nilay Vaish
4333549575 cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued.  As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined).  The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value.  To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean.  If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
2015-04-29 22:35:22 -05:00
bpotter
936768c8f4 config: enable setting SE-mode environment variables from file 2015-04-23 13:40:18 -07:00
Andreas Hansson
076ea249ae config: Remove memory aliases and rely on class name
Instead of maintaining two lists, rely entirely on the class
name. There is really no point in causing unecessary confusion.
2015-04-20 12:46:29 -04:00
Malek Musleh
826f69b470 config, cpu: fix progress interval for switched CPUs
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-14 11:01:10 -05:00
Dibakar Gope
34ad1123ee cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-13 17:33:57 -05:00
Curtis Dunham
c3268f8820 config: Support full-system with SST's memory system
This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
2015-04-08 15:56:06 -05:00
Andreas Hansson
aeffde5ed5 arm, configs: Do not forward snoops from I cache
This fix simply tells the I cache to not forward snoops to the fetch
unit (since there is really no reason to do so).
2015-03-27 04:56:10 -04:00
Steve Reinhardt
c55749d998 config: expand '~' and '~user' in paths 2015-03-23 16:14:19 -07:00
Curtis Dunham
bcea57afc3 config: Add ability to exit simulation after initialization
When using gem5 as a slave simulator, it will not advance the
clock on its own and depends on the master simulator calling
simulate(). This new option lets us use the Python scripts
to do all the configuration while stopping short of actually
simulating anything.
2015-03-23 06:57:38 -04:00
Chris Emmons
142ab40c4b config: Specify OS type and release on command line
This patch enables users to speficy --os-type on the command
line. This option is used to take specific actions for an OS type,
such as changing the kernel command line. This patch is part of the
Android KitKat enablement.
2015-03-19 04:06:14 -04:00
Rizwana Begum
0c8e025c3b config: Fix for 'android' lookup in disk name
This patch modifies FSConfig.py to look for 'android' only in disk
image name. Before this patch, 'android' was searched in full
disk path.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-03-09 09:39:08 -05:00
Andreas Hansson
36dc93a5fa mem: Move crossbar default latencies to subclasses
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.

Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
2015-03-02 04:00:47 -05:00
Curtis Dunham
07ce60bdfa config: add --root-device machine parameter
In case /dev/sda1 is not actually the boot partition for an image,
we can override it on the command line or in a benchmark definition.
2015-01-16 14:12:03 -06:00
Steve Reinhardt
774922895b config: rename 'file' var
Rename uses of 'file' as a local variable to avoid conflict
with the built-in type of the same name.
2015-02-05 16:45:12 -08:00
Steve Reinhardt
634d923751 config: make M5_PATH a real search path
Although you can put a list of colon-separated directory names
in M5_PATH, the current code just takes the first one that
exists and assumes all files must live there.  This change
makes the code search the specified list of directories
for each individual binary or disk image that's requested.

The main motivation is that the x86/Alpha binaries and the
ARM binaries are in separate downloads, and thus naturally
end up in separate directories.  With this change, you can
have M5_PATH point to those two directories, then run any
FS regression test without changing M5_PATH.  Currently,
you either have to merge the two download directories
or change M5_PATH (or do something else I haven't figured out).
2015-02-05 16:45:06 -08:00
Andreas Hansson
28a7cea2b3 config: Add XOR hashing to the DRAM channel interleaving
This patch uses the recently added XOR hashing capabilities for the
DRAM channel interleaving. This avoids channel biasing due to strided
access patterns.
2015-02-03 14:25:55 -05:00
Andreas Hansson
5ea60a95b3 config: Adjust DRAM channel interleaving defaults
This patch changes the DRAM channel interleaving default behaviour to
be more representative. The default address mapping (RoRaBaCoCh) moves
the channel bits towards the least significant bits, and uses 128 byte
as the default channel interleaving granularity.

These defaults can be overridden if desired, but should serve as a
sensible starting point for most use-cases.
2015-02-03 14:25:52 -05:00
Malek Musleh
ca131a4196 config: arm: fix os_flags
Fix the makeArmSystem routine to reflect recent changes that support kernel
commandline option when running android. Without this fix, trying to run
android encounters a 'reference before assignment' error.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-30 15:49:34 -06:00
Andreas Hansson
3cb9c361e2 scons: Do not build the InOrderCPU
One step closer to shifting focus to the MinorCPU.
2015-01-20 08:12:45 -05:00
Andreas Hansson
59460b91f3 config: Expose the DRAM ranks as a command-line option
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).

The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
2014-12-23 09:31:18 -05:00
Marco Elver
177682ead4 config: Add --memchecker option
This patch adds the --memchecker option, to denote that a MemChecker
should be instantiated for the system. The exact usage of the MemChecker
depends on the system configuration.

For now CacheConfig.py makes use of the option, adding MemCheckerMonitor
instances between CPUs and D-Caches.

Note, however, that currently this only provides limited checking on a
running system; other parts of the system, such as I/O devices are not
monitored, and may cause warnings to be issued by the monitor.
2014-12-23 09:31:18 -05:00
Dam Sunwoo
809134a2b1 config: Add options to take/resume from SimPoint checkpoints
More documentation at http://gem5.org/Simpoints

Steps to profile, generate, and use SimPoints with gem5:

1. To profile workload and generate SimPoint BBV file, use the
following option:

--simpoint-profile --simpoint-interval <interval length>

Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.

2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)

3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:

--take-simpoint-checkpoint=<simpoint file path>,<weight file
path>,<interval length>,<warmup length>

<simpoint file> and <weight file> is generated by SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.

4. To resume from gem5 SimPoint checkpoints, use the following option:

--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint
checkpoint path>

<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint
#0.
2014-12-23 09:31:17 -05:00
Gabe Black
7540656fc5 config: Add two options for setting the kernel command line.
Both options accept template which will, through python string formatting,
have "mem", "disk", and "script" values substituted in from the mdesc.
Additional values can be used on a case by case basis by passing them as
keyword arguments to the fillInCmdLine function. That makes it possible to
have specialized parameters for a particular ISA, for instance.

The first option lets you specify the template directly, and the other lets
you specify a file which has the template in it.
2014-12-04 16:42:07 -08:00
Gabe Black
b7dc4ba516 config: Get rid of some extra spaces around default arguments. 2014-12-03 03:11:00 -08:00
Nilay Vaish
3022d463fb ruby: interface with classic memory controller
This patch is the final in the series.  The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system.  This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs.  Classic's memory controller is more up to date and
supports multiple different types of DRAM.  This also brings classic and
ruby ever more close.  The patch also changes ruby's memory controller to
expose the same interface.
2014-11-06 05:42:21 -06:00
Ali Saidi
f2db2a96d1 arm, tests: Update config files to more recent kernels and create 64-bit regressions.
This changes the default ARM system to a Versatile Express-like system that supports
2GB of memory and PCI devices and updates the default kernels/file-systems for
AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some
platforms that are no longer supported have been pruned from the configuration files.

In addition a set of 64-bit ARM regressions have been added to the regression system.
2014-10-29 23:18:27 -05:00
Ali Saidi
3a5c975fd7 arm: fix bare-metal memory setup.
The bare-metal configuration option still configured memory with the old scheme
that no-longer works. This change unifies the code so there aren't any differences.
2014-10-29 23:18:26 -05:00
Nilay Vaish
b80e574d01 config: separate function for instantiating a memory controller
This patch moves code for instantiating a single memory controller from
the function config_mem() to a separate function.  This is being done
so that memory controllers can be instantiated without assuming that
they will be attached to the system in a particular fashion.
2014-10-11 15:02:23 -05:00
Jiuyue Ma
e1a5522a89 config, x86: Ensure that PCI devs get bridged to the memory bus
This patch force IO device to be mapped to 0xC0000000-0xFFFF0000 by
reserve anything between the end of memory and 3GB if memory is less
than 3GB. It also statically bridge these address range to the IO bus,
which guaranty access to pci address space will pass though bridge to
iobus.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-07-17 12:05:41 +08:00
Jiuyue Ma
7d03bf4d6b config, x86: swap bus_id of ISA/PCI in X86 IntelMPTable
This patch assign bus_id=0 to PCI bus and bus_id=1 to ISA bus for
X86 platform. Because PCI device get config space address using
Pc::calcPciConfigAddr() which requires "assert(bus==0)".
This fixes PCI interrupt routing and discovery on Linux.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-07-17 11:00:12 +08:00
Andreas Hansson
1f6d5f8f84 mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.

--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-20 17:18:32 -04:00