This patch is adding a clearer design intent to all objects that would
not be complete without a port proxy by making the proxies members
rathen than dynamically allocated. In essence, if NULL would not be a
valid value for the proxy, then we avoid using a pointer to make this
clear.
The same approach is used for the methods using these proxies, such as
loadSections, that now use references rather than pointers to better
reflect the fact that NULL would not be an acceptable value (in fact
the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional
composition where a NULL pointer is never valid" could be done on a
much broader scale throughout the code base, but for now it is only
done in the locations affected by the proxies.
This patch moves all port creation from the getPort method to be
consistently done in the MemObject's constructor. This is possible
thanks to the Swig interface passing the length of the vector ports.
Previously there was a mix of: 1) creating the ports as members (at
object construction time) and using getPort for the name resolution,
or 2) dynamically creating the ports in the getPort call. This is now
uniform. Furthermore, objects that would not be complete without a
port have these ports as members rather than having pointers to
dynamically allocated ports.
This patch also enables an elaboration-time enumeration of all the
ports in the system which can be used to determine the masterId.
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
Usage: m5 writefile <filename>
File will be created in the gem5 output folder with the identical filename.
Implementation is largely based on the existing "readfile" functionality.
Currently does not support exporting of folders.
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
This patch cleans up forward declarations and a member-function
prototype that still referred to the old FunctionalPort, VirtualPort
and TranslatingPort. There is no change in functionality.
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
A recent changeset (aae12ce9f34c) removed support for
PAL-mode breakpoints in Alpha, since it was awkward
and likely unused. This patch lets a user know if they
potentially run into this limitation.
The DPRINTF for doing protection checks appears after the checks have been
carried out. It is possible that the function returns while the checks are
being carried, in which case the printf is missed out. This patch moves the
DPRINTF before the checks.
--HG--
extra : rebase_source : 172896057e593022444d882ea93323a5d9f77a89
Adds the flag 'recvSnoops' which enables pagewalkers using DmaPorts,
to properly configure snoops.
--HG--
extra : rebase_source : 64207bef62c3268ddff2236ee4adae873812325f
Squashes the subsequent instructions in O3 pipe after the service call, so that
they see the effect of the system call when re-executed. This isn't really an issue
with FS mode, but can show up in SE mode.
--HG--
extra : rebase_source : 613a69fe1d9834261e25a8cd340aa6b47578e1fe
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
This parameter depends on a number of coincidences to work properly. First,
there must be an array assigned to system called "cpu" even though there's no
parameter called that. Second, the items in the "cpu" array have to have a
"clock" parameter which has a "frequency" member. This is true of the normal
CPUs, but isn't true of the memory tester CPUs. This happened to work before
because the memory tester CPUs were only used in SE mode where this parameter
was being excluded. Since everything is being pulled into a common binary,
this won't work any more. Since the boot_cpu_frequency parameter is only used
by Alpha's Linux System object (and Mips's through copy and paste), the
definition of that parameter is moved down to those objects specifically.
PageTable supported an allocate() call that called back
through the Process to allocate memory, but did not have
a method to map addresses without allocating new pages.
It makes more sense for Process to do the allocation, so
this method was renamed allocateMem() and moved to Process,
and uses a new map() call on PageTable.
The remaining uses of the process pointer in PageTable
were only to get the name and the PID, so by passing these
in directly in the constructor, we can make PageTable
completely independent of Process.
Not all objects need a platform pointer, and having one creates a dependence
on their being a platform object. This change removes the platform pointer to
from the base device object and moves it into subclasses that actually need
it.
In order for a system object to work in SE mode and FS mode, it has to either
always require a platform object even in SE mode, or get rid of the
requirement all together. Making SE mode carry around unnecessary/unused bits
of FS seems less than ideal, so I decided to go with the second option. The
platform pointer in the System class was used for exactly one purpose, a path
for the Alpha Linux system object to get to the real time clock and read its
frequency so that it could short cut the loops_per_jiffy calculation. There
was also a copy and pasted implementation in MIPS, but since it was only there
because it was there in Alpha I still count that as one use.
This change reverses the mechanism that communicates the RTC frequency so that
the Tsunami platform object pushes it up to the AlphaSystem object. This is
slightly less specific than it could be because really only the
AlphaLinuxSystem uses it. Because the intrFrequency function on the Platform
class was no longer necessary (and unimplemented on anything but Alpha) it was
eliminated.
After this change, a platform will need to have a system, but a system won't
have to have a platform.
These faults take varargs to their constructors which they print into a string
and pass to the M5DebugFault base class. They are basically faults wrapped
around panics, faults, warns, and warnonce-es so that they happen only at
commit.
By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.
Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.
There was a change a while ago that refactored some scons stuff which got rid
of cpu_models.py but also accidentally got rid of the ISA parser as a source
for its target files. That meant that changes which affected the parser
wouldn't cause a rebuild unless they also changed one of the description
files. This change fixes that.
Translating MSR addresses into MSR register indices took a lot of space in the
TLB source and made looking around in that file awkward. This change moves
the lookup into its own file to get it out of the way. It also changes it from
a switch statement to a hash map which should hopefully be a little more
efficient.
This change is a significant reorganization of the MIPS fault code that gets
rid of duplication, fixes some bugs, doubtlessly introduces others, and adds
names for the exception code constants.
Pass in a bool to indicate if the fault is from a store instead of having two
different classes. The classes were also misleadingly named since loads are
also processed by the DTB but should return ITB faults since they aren't
stores. The TLB may be returning the wrong fault in this case, but I haven't
looked at it closely.
Get rid of Fault classes left over from when this file was copied from Alpha,
and rename ArithmeticOverflowFault to be IntegerOverflowFault and get rid of
the old IntegerOverflowFault stub. The Integer version is what's actually in
the manual, but the Arithmetic version had the implementation.
The decoder now checks the value of FULL_SYSTEM in a switch statement to
decide whether to return a real syscall instruction or one that triggers
syscall emulation (or a panic in FS mode). The switch statement should devolve
into an if, and also should be optimized out since it's based on constant
input.
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).
This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%).
So a mips-cross-gdb can connect with gem5(MIPS_SE), and do some remote
debugging.
Testing:
Build gem5 for MIPS_SE and make gem5 wait at beginning:
modify "rgdb_wait = -1" to "rgdb_wait = 0" in src/sim/system.cc;
scons build/MIPS_SE/gem5.opt CPU_MODELS=O3CPU
----
Build GDB-7.3 mips-cross:
./configure --target=mips-linux-gnu --prefix=xxx/gdb-7.3-install/
make
make install
----
Run:
./build/MIPS_SE/gem5.opt configs/example/se.py --detailed --caches
./mips-linux-gnu-gdb xxx/gem5/tests/test-progs/hello/bin/mips/linux/hello
(gdb) target remote :7000
(gdb) info registers
(gdb) disassemble
(gdb) si
(gdb) break main
(gdb) c
(gdb) quit
Testing done.
Having two StaticInst classes, one nominally ISA dependent and the other ISA
dependent, has not been historically useful and makes the StaticInst class
more complicated that it needs to be. This change merges StaticInstBase into
StaticInst.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.
Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
Do some minor cleanup of some recently added comments, a warning, and change
other instances of stack extension to be like what's now being done for x86.
The way flag bits were being set for microops in x86 ended up implicitly
calling the bitset constructor which was truncating flags beyond the width of
an unsigned long. This change sets the bits in chunks which are always small
enough to avoid being truncated. On 64 bit machines this should reduce to be
the same as before, and on 32 bit machines it should work properly and not be
unreasonably inefficient.
When an instruction is translated in the x86 TLB, a variable called
delayedResponse is passed back and forth which tracks whether a translation
could be completed immediately, or if there's going to be callback that will
finish things up. If a read was to the internal memory space, memory mapped
registers used to implement things like MSRs, the function hadn't yet gotten
to where delayedResponse was set to false, it's default. That meant that the
value was never set, and the TLB could start waiting for a callback that would
never come. This change simply moves the assignment to above where control
can divert to translateInt().
Nothing big here, but when you have an address that is not in the page table request to be allocated, if it falls outside of the maximum stack range all you get is a page fault and you don't know why. Add a little warn() to explain it a bit. Also add some comments and alter logic a little so that you don't totally ignore the return value of checkAndAllocNextPage().
There are a set of locations is the linux kernel that are managed via
cache maintence instructions until all processors enable their MMUs & TLBs.
Writes to these locations are manually flushed from the cache to main
memory when the occur so that cores operating without their MMU enabled
and only issuing uncached accesses can receive the correct data. Unfortuantely,
gem5 doesn't support any kind of software directed maintence of the cache.
Until such time as that support exists this patch marks the specific cache blocks
that need to be coherent as non-cacheable until all CPUs enable their MMU and
thus allows gem5 to boot MP systems with caches enabled (a requirement for
booting an O3 cpu and thus an O3 CPU regression).
SEV instructions were originally implemented to cause asynchronous squashes
via the generateTCSquash() function in the O3 pipeline when updating the
SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system
that would lead to a pipeline either going inactive indefinitely or not being
able to commit squashed instructions. Fixed SEV instructions to behave like
interrupts and cause synchronous sqaushes inside the pipeline, eliminating
the race conditions. Also fixed up the semantics of the WFE instruction to
behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1
or unmasked interrupts are pending.
Control register operands are set up so that writing to them is serialize
after, serialize before, and non-speculative. These are probably overboard,
but they should usually be safe. Unfortunately there are times when even these
aren't enough. If an instruction modifies state that affects fetch, later
serialized instructions which come after it might have already gone through
fetch and decode by the time it commits. These instructions may have been
translated incorrectly or interpretted incorrectly and need to be destroyed.
This change modifies instructions which will or may have this behavior so that
they use the IsSquashAfter flag when necessary.
SWP and SWPB now throw an undefined instruction exception if
SCTLR.SW == 0. This also required the MIDR to be changed
slightly so programs can correctly determine that gem5 supports
the ARM v7 behavior of SWP/SWPB (in ARM v6, SWP/SWPB were
deprecated, but not disabled at CPU startup).
Adds MISCREG_ID_MMFR2 and removes break on access to MISCREG_CLIDR. Both
registers now return values that are consistent with current ARM
implementations.
This patch implements the copyRegs() function for the x86 architecture.
The patch assumes that no side effects other than TLB invalidation need
to be considered while copying the registers. This may not hold true in
future.
change hwrei back to being a non-control instruction so O3-FS mode will work
add squash in inorder that will catch a hwrei (or any other genric instruction)
that isnt a control inst but changes the PC. Additional testing still needs to be done
for inorder-FS mode but this change will free O3 development back up in the interim
This makes it possible to use the grammar multiple times and use the multiple
instances concurrently. This makes implementing an include statement as part
of a grammar possible.
This change simplifies the code surrounding operand type handling and makes it
depend only on the ctype that goes with each operand type. Future changes will
allow defining operand types by their ctypes directly, convert the ISAs over
to that style of definition, and then remove support for the old style. These
changes are to make it easier to use non-builtin types like classes or
structures as the type for operands.
readBytes and writeBytes had the word "bytes" in their names because they
accessed blobs of bytes. This distinguished them from the read and write
functions which handled higher level data types. Because those functions don't
exist any more, this change renames readBytes and writeBytes to more general
names, readMem and writeMem, which reflect the fact that they are how you read
and write memory. This also makes their names more consistent with the
register reading/writing functions, although those are still read and set for
some reason.
The DTB expects the correct PC in the ThreadContext
but how if the memory accesses are speculative? Shouldn't
we send along the requestor's PC to the translate functions?
this always changes the PC and is basically an impromptu branch instruction. why
not speculate on this instead of always be forced to mispredict/squash after the
hwrei gets resolved?
The InOrder model needs this marked as "isControl" so it knows to update the PC
after the ALU executes it. If this isnt marked as control, then it's going to
force the model to check the PC of every instruction at commit (what O3 does?),
and that would be a wasteful check for a very high percentage of instructions.
Instead of clearing the entire TLB on initialization and flush, the code was
clearing only one element. This patch corrects the memsets in the init and
flush routines.
this flag is only used for early branch resolution in the O3 model (of pc-relative branches)
but this isnt cleanly working even when the branch target code is added for sparc. For now,
we'll ignore this optimization and add a todo in the SPARC ISA for future developers
Add a few constants and functions that the InOrder model wants for SPARC.
* * *
sparc: add eaComp function
InOrder separates the address generation from the actual access so give
Sparc that functionality
* * *
sparc: add control flags for branches
branch predictors and other cpu model functions need to know specific information
about branches, so add the necessary flags here
The regular expressions matching filenames in the ##include directives and the
internally generated ##newfile directives where only looking for filenames
composed of alpha numeric characters, periods, and dashes. In Unix/Linux, the
rules for what characters can be in a filename are much looser than that. This
change replaces those expressions with ones that look for anything other than
a quote character. Technically quote characters are allowed as well so we
should allow escaping them somehow, but the additional complexity probably
isn't worth it.
We were getting a spurious warning in the regressions that turned
out to be due to having the wrong value for TGT_MAP_ANONYMOUS for
Power Linux, but in the process of tracking it down I ended up
doing some cleanup of the mmap handling in general.
A significant contributor to the need for adoptOrphanParams()
is the practice of appending to SimObjectVectors which have
already been assigned as children. This practice sidesteps the
assignment operation for those appended SimObjects, which is
where parent/child relationships are typically established.
This patch reworks the config scripts that use append() on
SimObjectVectors, which all happen to be in the x86 system
configuration. At some point in the future, I hope to make
SimObjectVectors immutable (by deriving from tuple rather than
list), at which time this patch will be necessary for correct
operation. For now, it just avoids some of the warning
messages that get printed in adoptOrphanParams().
This patch fixes two problems with the O3 cpu model. The first is an issue
with an instruction fetch causing a fault on the next address while the
current macro-op is being issued. This happens when the micro-ops exceed
the fetch bandwdith and then on the next cycle the fetch stage attempts
to issue a request to the next line while it still has micro-ops to issue
if the next line faults a fault is attached to a micro-op in the currently
executing macro-op rather than a "nop" from the next instruction block.
This leads to an instruction incorrectly faulting when on fetch when
it had no reason to fault.
A similar problem occurs with interrupts. When an interrupt occurs the
fetch stage nominally stops issuing instructions immediately. This is incorrect
in the case of a macro-op as the current location might not be interruptable.
This change further eliminates cases where condition codes were being read
just so they could be written without change because the instruction in
question was supposed to preserve them. This is done by creating the condition
code code based on the input rather than just doing a simple substitution.
If one of the condition codes isn't being used in the execution we should only
read it if the instruction might be dependent on it. With the preeceding changes
there are several more cases where we should dynamically pick instead of assuming
as we did before.
Break up the condition code bits into NZ, C, V registers. These are individually
written and this removes some incorrect dependencies between instructions.
Move the saturating bit (which is also saturating) from the renamed register
that holds the flags to the CPSR miscreg and adds a allows setting it in a
similar way to the FP saturating registers. This removes a dependency in
instructions that don't write, but need to preserve the Q bit.
This change splits out the condcodes from being one monolithic register
into three blocks that are updated independently. This allows CPUs
to not have to do RMW operations on the flags registers for instructions
that don't write all flags.
Debug flags are ExecUser, ExecKernel, and ExecAsid. ExecUser and
ExecKernel are set by default when Exec is specified. Use minus
sign with ExecUser or ExecKernel to remove user or kernel tracing
respectively.
Add registers and components to better support the VersatileEB board.
Made the MIDR and SYS_ID register parameters to ArmSystem and RealviewCtrl
respectively.
This change makes the decoder figure out if an instruction that only supports
memory is using a register encoding and decodes directly to "Unknown" which will
behave appropriately. This prevents other parts of the instruction creation
process from seeing the mismatch and asserting.
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
This change fixes a small bug in the arm copyRegs() code where some registers
wouldn't be copied if the processor was in a mode other than MODE_USER.
Additionally, this change simplifies the way the O3 switchCpu code works by
utilizing TheISA::copyRegs() to copy the required context information
rather than the adhoc copying that goes on in the CPU model. The current code
makes assumptions about the visibility of int and float registers that aren't
true for all architectures in FS mode.
***
(1): get rid of expandForMT function
MIPS is the only ISA that cares about having a piece of ISA state integrate
multiple threads so add constants for MIPS and relieve the other ISAs from having
to define this. Also, InOrder was the only core that was actively calling
this function
* * *
(2): get rid of corespecific type
The CoreSpecific type was used as a proxy to pass in HW specific params to
a MIPS CPU, but since MIPS FS hasnt been touched for awhile, it makes sense
to not force every other ISA to use CoreSpecific as well use a special
reset function to set it. That probably should go in a PowerOn reset fault
anyway.
The ISAR registers describe which features the processor supports.
Transcribe the values listed in section B5.2.5 of the ARM ARM
into the registers as read-only values