Commit graph

10433 commits

Author SHA1 Message Date
Dam Sunwoo
74a4926fe0 sim: remove kernel mapping check for baremetal workloads
Baremetal workloads are specified using the "kernel" parameter, but
don't always have the correct address mappings. This patch adds a
boolean flag to the system and bypasses the kernel addr mapping checks
when running in baremetal mode.
2014-08-13 06:57:35 -04:00
Andreas Sandberg
41d069ef6a scons: Build the branch predictor for all CPUs
The branch predictor is normally only built when a CPU that uses a
branch predictor is built. The list of CPUs is currently incomplete as
the simple CPUs support branch predictors (for warming, branch stats,
etc). In practice, all CPU models now use branch predictors, so this
changeset removes the CPU model check and replaces it with a check for
the NULL ISA.
2014-08-13 06:57:31 -04:00
Andreas Sandberg
8b8d991df0 mips: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members.
This changeset removes such members from the MIPS-specific classes
to silence the warning.
2014-08-13 06:57:30 -04:00
Andreas Sandberg
8d04e32a83 power: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members.
This changeset removes such members from the POWER-specific
ProcessInfo struct to silence the warning.
2014-08-13 06:57:29 -04:00
Andreas Sandberg
6b908211e6 scons: Silence clang 3.4 warnings on Ubuntu 12.04
This changeset fixes three types of warnings that occur in clang 3.4
on Ubuntu 12.04:

 * Certain versions of libstdc++ (primarily 4.8) use struct and class
   interchangeably. This triggers a warning in clang.

 * Swig has a tendency to generate code with the register class which
   was deprecated in C++11. This triggers a deprecation warning in
   clang.

 * Swig sometimes generates Python wrapper code which returns
   uninitialized values. It's unclear if this is actually a problem
   (the cases might be limited to failure paths). We'll silence these
   warnings for now since there is little we can do about the
   generated code.
2014-08-13 06:57:28 -04:00
Andreas Sandberg
eb9226317d base: Remove unused M5_PRAGMA_NORETURN
The M5_PRAGMA_NORETURN macro was only used for
__exit_message. Since the macro only holds a stub definition and all
functions with noreturn semantics use the M5_ATTR_NORETURN, this
macro is completely redundant.
2014-08-13 06:57:27 -04:00
Andreas Sandberg
25f5a6733c cpu: Don't forward declare RefCountingPtr
RefCountingPtr is sometimes forward declared to avoid having to
include refcnt.hh. This does not work since we typically return
instances of RefCountingPtr rather than references to instances. The
only reason this currently works is that we include refcnt.hh in
cprintf.hh, which "leaks" the header to most other source files. This
changeset replaces such forward declarations with an include of
refcnt.hh.
2014-08-13 06:57:26 -04:00
Andreas Sandberg
43f1e41c02 util: Fix state leakage in the SortIncludes style verifier
There are cases where the state of a SortIncludes object gets messed
up and leaks between invocations/files. This typically happens when a
file ends with an include block (dump_block() gets called at the end
of __call__). In this case, the state of the class is not reset
between files. This bug manifests itself as ghost includes that leak
between files when applying the style hooks.

This changeset adds a reset at the beginning of the __call__ method
which ensures that the class is always in a clean state when
processing a new file.
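
The failure mode and the fix can be sketched as follows. This is a
simplified stand-in, not the actual SortIncludes class: the name
IncludeCollector, its block field, and the sorting rule are assumptions
made for illustration only.

```python
class IncludeCollector:
    """Hypothetical stand-in for a stateful per-file style verifier."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Drop any include block left over from a previously processed file.
        self.block = []

    def __call__(self, lines):
        # Resetting first guarantees a clean state for every new file.
        self.reset()
        out = []
        for line in lines:
            if line.startswith("#include"):
                self.block.append(line)
            else:
                out.extend(sorted(self.block))
                self.block = []
                out.append(line)
        # A file may end inside an include block. This final flush does not
        # clear self.block, which is how state could leak without the reset.
        out.extend(sorted(self.block))
        return out


collector = IncludeCollector()
first = collector(['#include "b.hh"', '#include "a.hh"'])
second = collector(["int x;"])
```

Without the reset() call at the top of __call__, the second invocation
in this sketch would emit the two includes from the first file.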
2014-08-13 06:57:25 -04:00
Mitch Hayenga
f6f6ae461e mem: Properly set cache block status fields on writebacks
When a cacheline is written back to a lower-level cache,
tags->insertBlock() sets various status parameters. However these
status bits were cleared immediately after calling. This patch makes
it so that these status fields are not cleared by moving them outside
of the tags->insertBlock() call.
2014-08-13 06:57:24 -04:00
Andreas Hansson
66904b9584 cpu: Modernise the branch predictor (STL and C++11)
This patch does some minor housekeeping of the branch predictor by
adopting STL containers, and shifting some iterators to use range-based
for loops.

The predictor history is also changed from a list to a deque as we
never do insertion/deletion other than at the front and back.
2014-08-13 06:57:21 -04:00
Curtis Dunham
94daae6864 arm: remove dead code fplib mul64x64 2014-03-11 09:50:02 -05:00
Mitch Hayenga
0270cf13ac ext: clang fix for flexible array members
Changes how flexible array members are defined so clang does not error
out during compilation.
2014-08-13 06:57:19 -04:00
Radhika Jagtap
860f00228b config: Fix cache latency param in mem test
This patch fixes the cache latency in mem test which is split into two params,
hit and response latency as per BaseCache.
2014-08-10 05:39:40 -04:00
Radhika Jagtap
2ee47fc8d1 util: Move packet trace file read to protolib
This patch moves the code for opening an input protobuf packet trace into
a function defined in the protobuf library. This is because the code is
commonly used in decode scripts and is independent of the src protobuf
message.
2014-08-10 05:39:20 -04:00
Geoffrey Blake
dbdce42b88 config: Add SubSystem container for simobjects
This patch adds the SubSystem container for grouping
simobjects together in logical subsystems to facilitate
building a larger system from constituent parts.  The container
is simply a non-abstract empty simobject to hold the components
that will be connected as its children.  In simulation the
object does not participate, its only use is during configuration
of the system.
2014-08-10 05:39:16 -04:00
Geoffrey Blake
09b5003815 config: Add hooks to enable new config sys
This patch adds helper functions to SimObject.py, params.py and
simulate.py to enable the new configuration system.  Functions like
enumerateParams() in SimObject let the config system auto-generate
command line options for simobjects to be modified on the command
line.

Params in params.py have __call__() added
to their definition to allow the argparse module to use them
as a type to check command input is in the proper format.
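
The mechanism relies on argparse accepting any callable as the type of
an argument. A minimal sketch, assuming a hypothetical LatencyParam
class and a made-up --hit-latency option (neither is taken from
params.py):

```python
import argparse


class LatencyParam:
    """Hypothetical parameter type; not gem5's actual params.py class."""

    def __call__(self, text):
        # argparse calls the `type` object with the raw command-line string
        # and turns a ValueError into a usage error for the user.
        if not text.endswith("ns"):
            raise ValueError("expected a value such as '10ns'")
        return float(text[:-2])


parser = argparse.ArgumentParser()
parser.add_argument("--hit-latency", type=LatencyParam())
args = parser.parse_args(["--hit-latency", "2ns"])
```

Here args.hit_latency holds the converted value, and a malformed string
is rejected by argparse before the simulation is configured.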
2014-08-10 05:39:13 -04:00
Andreas Hansson
47313601c1 cpu: Ensure the traffic generator suppresses non-memory packets
This patch adds a check to ensure that packets which are not going to
a memory range are suppressed in the traffic generator. Thus, if a
trace is collected in full-system, the packets destined for devices
are not played back.
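
The check can be pictured as an address-range filter. The sketch below
is illustrative only: the dictionary-based trace entries, the helper
names, and the half-open range convention are assumptions, not the
traffic generator's real data structures.

```python
def is_memory_address(addr, mem_ranges):
    """True if addr falls inside one of the half-open [start, end) ranges."""
    return any(start <= addr < end for start, end in mem_ranges)


def suppress_non_memory(trace, mem_ranges):
    """Keep only the trace entries whose address targets memory."""
    return [pkt for pkt in trace if is_memory_address(pkt["addr"], mem_ranges)]


# Hypothetical layout: 2 GiB of memory at address 0, devices above it.
mem_ranges = [(0x0000_0000, 0x8000_0000)]
trace = [{"addr": 0x1000}, {"addr": 0x9000_0000}]
played = suppress_non_memory(trace, mem_ranges)
```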
2014-08-10 05:39:04 -04:00
Andreas Hansson
d45ab59c29 base: Remove unused files
A bit of pruning
2014-08-10 05:38:59 -04:00
Andreas Hansson
b90bdcf8d0 scons: Warn for incompatible gcc and binutils
It seems gcc >4.8 does not get along well with binutils <= 2.22, and
to help users this patch adds a warning with an indication for how to
fix the issue. It might even be worth adding an Exit(-1) and stop the
build.
2014-08-10 05:38:56 -04:00
Anthony Gutierrez
a628afedad mem: refactor LRU cache tags and add random replacement tags
this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first; if none are available, a
replacement candidate is chosen at random.

this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:

accessBlock()
findVictim()
insertBlock()
invalidate()
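
The victim selection described above (invalid blocks first, otherwise a
random candidate) can be sketched in a few lines. This is a Python
illustration of the policy, not the C++ code from the patch; Block and
find_victim are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    tag: int = 0
    valid: bool = False


def find_victim(cache_set, rng=random):
    """Pick a block to evict from one set (illustrative only)."""
    # Invalid blocks hold no data, so evict one of them first.
    for blk in cache_set:
        if not blk.valid:
            return blk
    # Every block is valid: choose a replacement candidate at random.
    return rng.choice(cache_set)


ways = [Block(1, True), Block(2, False), Block(3, True)]
victim = find_victim(ways)
```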
2014-07-28 12:23:23 -04:00
Anthony Gutierrez
0ac4624595 arm: make the PseudoLRU tags the default for the O3_ARM_v7aL2
the Cortex-A15 has a random replacement policy for its L2 cache. see the
Cortex-A15 Technical Reference Manual 1.7 About the L2 memory system. this
patch makes the PseudoLRU tags the default for the ARM O3 CPU's L2 cache.
2014-07-28 12:22:00 -04:00
Andreas Hansson
cbf417c713 stats: Bump stats for the regressions using the minor CPU
Updating the stats to match the current behaviour.
2014-07-28 01:48:21 -04:00
Andrew Bardsley
5d0b25ba3f cpu: Minor CPU add regression tests for ARM and ALPHA
This patch adds regression tests results and test harnesses
for the Minor CPU on ARM and ALPHA.
2014-07-23 16:09:05 -05:00
Andrew Bardsley
0e8a90f06b cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

     Benchmark     |   Stat host_seconds (s)
    ---------------+--------v--------v--------
     (on ARM, opt) | simple | o3     | minor
                   | timing | timing | timing
    ---------------+--------+--------+--------
    10.linux-boot  |   169  |  1883  |  1075
    10.mcf         |   117  |   967  |   491
    20.parser      |   668  |  6315  |  3146
    30.eon         |   542  |  3413  |  2414
    40.perlbmk     |  2339  | 20905  | 11532
    50.vortex      |   122  |  1094  |   588
    60.bzip2       |  2045  | 18061  |  9662
    70.twolf       |   207  |  2736  |  1036
2014-07-23 16:09:04 -05:00
Steve Reinhardt
040fa23d01 stats: update for syscall DPRINTF change
Only printing one rather than two args for the ignored syscall
warning means the count of register accesses has changed on
a few runs.  Oddly only Alpha Tru64 seems to have any ignored
syscalls in the regression tests.
2014-07-19 19:04:58 -07:00
Steve Reinhardt
06bb6a4731 syscall emulation: fix fast build issue
Surprisingly gcc will complain about unused variables even
inside an 'if (false)' block.

I thought I had tested this previously, but apparently not.
2014-07-19 02:06:22 -07:00
Binh Pham
c99b13d904 x86: make PioBus return BadAddress errors
Stop setting the use_default_range flag in PioBus in order to
have random bad addresses result in a BadAddress response and
not a gem5 fatal error.  This is necessary in Ruby as Ruby is
connected directly to PioBus, so misspeculated addresses will
be sent there directly.  For the classic memory system, this
change has no effect, as bad addresses are caught by the
memory bus before being sent to the PioBus.

This work was done while Binh was an intern at AMD Research.
2014-07-18 22:05:51 -07:00
Steve Reinhardt
fe530648d5 sim: remove unused MemoryModeStrings array
The System object has a static MemoryModeStrings array
that's (1) unused and (2) redundant, since there's an
auto-generated version in the Enums namespace.  No
point in leaving it in.
2014-07-18 22:05:51 -07:00
Steve Reinhardt
e3de6950a4 kern: get rid of unused linux syscall files 2014-07-18 22:05:51 -07:00
Steve Reinhardt
f5aace8300 syscall emulation: fix DPRINTF arg ordering bug
When we switched getSyscallArg() from explicit arg indices to
the implicit method, some DPRINTF arguments were left as calls
to getSyscallArg(), even though C/C++ doesn't guarantee
anything about the order of invocation of these calls.  As a
result, the args could be printed out in arbitrary orders.

Interestingly, this bug has been around since 2009:
http://repo.gem5.org/gem5/rev/4842482e1bd1
2014-07-18 22:05:51 -07:00
Anthony Gutierrez
59c8c454eb base: fix operator== for comparing EthAddr objects
this operator uses memcmp() to detect if two EthAddr objects have the same
address, however memcmp() will return 0 if all bytes are equal. operator==
returns the return value of memcmp() to indicate whether or not two
addresses are equal. this is incorrect as it will always give the opposite of
the intended behavior. this patch fixes that problem.
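
The inversion is easy to reproduce with a small model of the C logic.
The sketch below is Python rather than the C++ from the patch, and the
memcmp, eq_buggy and eq_fixed helpers are stand-ins written for this
illustration.

```python
def memcmp(a, b):
    """Mimic C memcmp(): negative, zero, or positive; zero means equal."""
    return (a > b) - (a < b)


def eq_buggy(a, b):
    # Treats the memcmp() result as a boolean, so it is true when the
    # addresses DIFFER and false when they match.
    return bool(memcmp(a, b))


def eq_fixed(a, b):
    # Equality holds exactly when memcmp() reports no difference.
    return memcmp(a, b) == 0


mac_a = bytes.fromhex("001122334455")
mac_b = bytes.fromhex("001122334456")
```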
2014-07-09 09:28:15 -04:00
Anthony Gutierrez
3956ec0a89 base: fix some bugs in EthAddr
per the IEEE 802 spec:
1) fixed broadcast() to ensure that all bytes are equal to 0xff.
2) fixed unicast() to ensure that bit 0 of the first byte is equal to 0
3) fixed multicast() to ensure that bit 0 of the first byte is equal to 1, and
   that it is not a broadcast.

also the constructors in EthAddr are fixed so that all bytes of data are
initialized.
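
The three predicates listed above can be sketched as follows. This is a
Python illustration rather than the EthAddr C++ code, the function names
are hypothetical, and it assumes that "bit 0" means the least
significant bit of the first byte.

```python
def is_broadcast(addr):
    """All six bytes are 0xff."""
    return all(b == 0xFF for b in addr)


def is_unicast(addr):
    """Bit 0 of the first byte is 0."""
    return (addr[0] & 0x01) == 0


def is_multicast(addr):
    """Bit 0 of the first byte is 1, and the address is not broadcast."""
    return (addr[0] & 0x01) == 1 and not is_broadcast(addr)


broadcast = bytes.fromhex("ffffffffffff")
multicast = bytes.fromhex("01005e000001")
unicast = bytes.fromhex("001122334455")
```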
2014-07-02 13:19:13 -04:00
Radhika Jagtap
b998a0c6ac util: Add DVFS perfLevel to checkpoint upgrade script
This patch updates the checkpoint upgrader script. It adds the _perfLevel
variable in the clock domain and voltage domain simObjects used for DVFS.
2014-07-01 11:58:22 -04:00
Stephan Diestelhorst
65cea4708e power: Add basic DVFS support for gem5
Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
                             tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
2014-06-30 13:56:06 -04:00
Andreas Hansson
641e602830 mem: DRAMPower trace formatting script
This patch adds a first version of a script that processes the debug
output and generates a command trace for DRAMPower. This is work in
progress and is intended as a snapshot of ongoing work at this point.

The longer term plan is to link in DRAMPower as a library and have one
instance of the model per rank, and instantiate it based on a struct
passed from gem5. Each command will then be a call to the model and no
parsing of traces will be necessary.
2014-06-30 13:56:04 -04:00
Andreas Hansson
1f539ce4cc mem: DRAMPower trace output
This patch adds a DRAMPower flag to enable off-line DRAM power
analysis using the DRAMPower tool. A new DRAMPower flag is added
and a follow-on patch adds a Python script to post-process the output
and order it based on time stamps.

The long-term goal is to link DRAMPower as a library and provide the
commands through function calls to the model rather than first
printing and then parsing the commands. At the moment it is also up to
the user to ensure that the same DRAM configuration is used by the
gem5 controller model and DRAMPower.
2014-06-30 13:56:03 -04:00
Andreas Hansson
b4ce51eb9e mem: Add bank and rank indices as fields to the DRAM bank
This patch adds the index of the bank and rank as a field so that we can
determine the identity of a given bank (reference or pointer) for the
power tracing. We also grab the opportunity of cleaning up the
arguments used for identifying the bank when activating.
2014-06-30 13:56:02 -04:00
Andreas Hansson
d59bc8ee1f mem: Extend DRAM row bits from 16 to 32 for larger densities
This patch extends the DRAM row bits to 32 to support larger density
memories. Additional checks are also added to ensure the row fits in
the 32 bits.
2014-06-30 13:56:01 -04:00
Anthony Gutierrez
f34a8f0d61 cpu: implement a bi-mode branch predictor 2014-06-30 13:50:03 -04:00
Anthony Gutierrez
db267da822 arm: make the bi-mode predictor the default for O3_ARM_v7a_BP
the branch predictor used in the Cortex-A15 is a bi-mode style predictor,
see:

http://arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf
and
http://nvidia.com/docs/IO/116757/NVIDIA_Quad_a15_whitepaper_FINALv2.pdf

this patch makes the bi-mode predictor the default for the ARM O3 CPU.
2014-06-30 13:50:01 -04:00
Steve Reinhardt
5b08e211ab stats: update for O3 changes
Mostly small differences in total ticks, but O3 stall causes
shifted significantly.

30.eon does speed up by ~6% on Alpha and ARM, and 50.vortex
by 4.5% on ARM.  At the other extreme, X86 70.twolf is 0.8%
slower.
2014-06-22 14:33:09 -07:00
Binh Pham
b085db84af x86: fix table walker assertion
In a cycle, we could see R and W requests corresponding to the same
page walk being sent to the memory. During the cycle that assertion
happens, we have 2 responses corresponding to the R and W above. We
also have a 'read' variable to keep track of the inflight Read
request, this gets reset to NULL right after we send out any R
request; and gets set to the next R in the page walk when a response
comes back.

The issue we are seeing here is when we get a response for W request,
assert(!read) fires because we got a response for R request right
before this, hence we set 'read' to a non-NULL value, pointing to the
next R request in the pagewalk!

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:39:44 -07:00
Binh Pham
b72c879868 o3: make dispatch LSQ full check more selective
Dispatch should not check LSQ size/LSQ stall for non load/store
instructions.

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:26:55 -07:00
Binh Pham
0782d92286 o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.
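
The idea of checking the two queues separately can be sketched as
below. This is a Python illustration of the condition only; can_rename
and its parameters are hypothetical and do not mirror the O3 rename
stage's real interface.

```python
def can_rename(is_load, is_store, free_lq_entries, free_sq_entries):
    """Decide whether an instruction may proceed (illustrative only)."""
    # A load only needs room in the Load Queue.
    if is_load and free_lq_entries == 0:
        return False
    # A store only needs room in the Store Queue.
    if is_store and free_sq_entries == 0:
        return False
    # Other instructions are not affected by either queue being full.
    return True
```

With a combined check, a load would stall whenever the Store Queue was
full; here can_rename(True, False, 4, 0) lets it through.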

This work was done while Binh was an intern at AMD Research.
2014-06-21 10:26:43 -07:00
Andreas Hansson
fdb965f5c1 scons: Bump the compiler version to gcc 4.6 and clang 3.0
This patch bumps the supported version of gcc from 4.4 to 4.6, and
clang from 2.9 to 3.0. This enables, amongst other things, range-based
for loops, lambda expressions, etc. The STL implementation shipping
with 4.6 also has a fully functional implementation of unique_ptr and
shared_ptr.
2014-06-10 17:44:39 -04:00
Joel Hestness
4a98b0cd59 Util: Do not style check symlinks
The style checker used to traverse symlinks if they pointed to files, which can
result in style checker failure if the pointed-to file doesn't exist. This
style check is actually unnecessary, since symlinks either point to other files
that are already style checked, or files outside gem5, which shouldn't be
checked. Skip symlinks.
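
A minimal sketch of the skip, assuming a hypothetical files_to_check
helper (the real style checker is structured differently). It uses
os.path.islink, which is true even for a link whose target is missing.

```python
import os
import tempfile


def files_to_check(paths):
    """Return the paths that should be style checked, skipping symlinks."""
    return [p for p in paths if not os.path.islink(p)]


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real.cc")
    dangling = os.path.join(tmp, "dangling.cc")
    open(real, "w").close()
    # A symlink whose target does not exist.
    os.symlink(os.path.join(tmp, "missing.cc"), dangling)
    checked = files_to_check([real, dangling])
    only_real = checked == [real]
```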
2014-06-09 22:01:18 -05:00
Joel Hestness
4f8ac94549 sim: More rigorous clocking comments
The language describing the clockEdge and nextCycle functions was ambiguous,
and so was prone to misinterpretation/misuse. Clear up the comments to more
rigorously describe their functionality.
2014-06-09 22:01:16 -05:00
Yasuko Eckert
fbe3688de3 ext: Add a McPAT regression tester
Add a regression tester to McPAT. Joel Hestness wrote these tests and Yasuko
Eckert modified them to reflect the new McPAT interface and other changes
the previous patch made.
2014-06-04 07:48:20 -07:00
Yasuko Eckert
0deef376d9 ext: McPAT interface changes and fixes
This patch includes software engineering changes and some generic bug fixes
Joel Hestness and Yasuko Eckert made to McPAT 0.8. There are still known
issues/concerns we did not have a chance to address in this patch.

High-level changes in this patch include:
 1) Making XML parsing modular and hierarchical:
   - Shift parsing responsibility into the components
   - Read XML in a (mostly) context-free recursive manner so that McPAT input
     files can contain arbitrary component hierarchies
 2) Making power, energy, and area calculations a hierarchical and recursive
    process
   - Components track their subcomponents and recursively call compute
     functions in stages
   - Make C++ object hierarchy reflect inheritance of classes of components
     with similar structures
   - Simplify computeArea() and computeEnergy() functions to eliminate
     successive calls to calculate separate TDP vs. runtime energy
   - Remove Processor component (now unnecessary) and introduce a more abstract
     System component
 3) Standardizing McPAT output across all components
   - Use a single, common data structure for storing and printing McPAT output
   - Recursively call print functions through component hierarchy
 4) For caches, allow splitting data array and tag array reads and writes for
    better accuracy
 5) Improving the usability of CACTI by printing more helpful warning and error
    messages
 6) Minor: Impose more rigorous code style for clarity (more work still to be
    done)
Overall, these changes greatly reduce the amount of replicated code, and they
improve McPAT runtime and decrease memory footprint.
2014-06-03 13:32:59 -07:00
Yasuko Eckert
1104199115 ext: change McPAT to not force compile in 32-bit mode. 2014-06-03 13:32:53 -07:00