This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
This patch makes the code compile with clang 2.9 and 3.0 again by
making two very minor changes. Firt, it maintains a strict typing in
the forward declaration of the BaseCPUParams. Second, it adds a
FullSystemInt flag of the type unsigned int next to the boolean
FullSystem flag. The FullSystemInt variable can be used in
decode-statements (expands to switch statements) in the instruction
decoder.
Making the CheckerCPU a runtime time option requires the code to be compatible
with ISAs other than ARM. This patch adds the appropriate function
stubs to allow compilation.
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
The change to port proxies recently moved code out of the constructor into
initState(). This is needed for code that loads data into memory, however
for code that setups symbol tables, kernel based events, etc this is the wrong
thing to do as that code is only called when a checkpoint isn't being restored
from.
1. --implicit-cache behavior is default.
2. makeEnv in src/SConscript is conditionally called.
3. decider set to MD5-timestamp
4. NO_HTML build option changed to SLICC_HTML (defaults to False)
This patch adds a creation-time check to the CPU to ensure that the
interrupt controller is created for the cases where it is needed,
i.e. if the CPU is not being switched in later and not a checker CPU.
The patch also adds the "createInterruptController" call to a number
of the regression scripts.
This patch renames the sendTiming member function in the RubyPort to
avoid inadvertently hiding Port::sendTiming (discovered through some
rather painful debugging). The RubyPort does, in fact, rely on the
functionality of the queued port and the implementation merely
schedules a send the next cycle. The new name for the member function
is sendNextCycle to better reflect this behaviour.
In the unlikely event that we ever shift to using C++11 the member
functions in Port should have a "final" identifier to prevent any
overriding in derived classes.
This change implements a PL031 real time clock.
--HG--
rename : src/dev/arm/timer_sp804.cc => src/dev/arm/rtc_pl031.cc
rename : src/dev/arm/timer_sp804.hh => src/dev/arm/rtc_pl031.hh
New kernels attempt to read CP14 what debug architecture is available.
These changes add the debug registers and return that none is currently
available.
The block is never inserted because it's the one extra block in the cache, but
it can be invalidated twice in a row. In that case the block doesn't have a
new master id (beacuse it was never inserted), however it is valid and
the accounting goes wrong at that point.
With the recent series of patches, the symbol table loading moved from
"construct" time to "init" time, but the kernel function event
callback registration was left behind. This patch moves it to the
proper location.
Add extra declarations to allow the compiler to pick up the right function.
Please note that these declarations have been added as part of the
clang-related changes.
This patch adds a function to X86 tlb that returns the
walker port. This port is required for correctly connecting
the walker ports for the cpu just switched in
This is a trivial patch that merely makes all the member functions of
the port proxies const. There is no good reason why they should not
be, and this change only serves to make it explicit that they are not
modified through their use.
This patch fixes a compilation error that occurs with gcc >= 4.6.1,
caused by swig not including cstddef and not using the std:: namespace
prefix for ptrdiff_t. There is an old patch,
http://reviews.m5sim.org/r/913/ that no longer applies cleanly and
this might be re-iterating the same issue.
We work around the problem by always enforcing the inclusion of
cstddef in all swig interface declarations, and also by explicitly
using std::ptrdiff_t.
If an instruction is executed speculatively and hits a situation where it
wants to panic, it should return a fault instead. If the instruction was
misspeculated, the fault can be thrown away. If the instruction wasn't
misspeculated, the fault will be invoked and the panic will still happen.
This patch splits the two cache ports into a master (memory-side) and
slave (cpu-side) subclass of port with slightly different
functionality. For example, it is only the CPU-side port that blocks
incoming requests, and only the memory-side port that schedules send
events outside of what the transmit list dictates.
This patch simplifies the two classes by relying further on
SimpleTimingPort and also generalises the latter to better accommodate
the changes (introducing trySendTiming and scheduleSend). The
memory-side cache port overrides sendDeferredPacket to be able to not
only send responses from the transmit list, but also send requests
based on the MSHRs.
A follow on patch further simplifies the SimpleTimingPort and the
cache ports.
This patch simplifies the mport in preparation for a split into a
master and slave role for the message ports. In particular,
sendMessageAtomic was only used in a single location and similarly so
sendMessageTiming. The affected interrupt device is updated
accordingly.
This patch simplfies the master ports used by RubyDirectedTester and
RubyTester by avoiding the use of SimpleTimingPort. Neither tester
made any use of the functionality offered by SimpleTimingPort besides
a trivial implementation of recvFunctional (only snoops) and
recvRangeChange (not relevant since there is only one master).
The patch does not change or add any functionality, it merely makes
the introduction of a master/slave port easier (in a future patch).
This patch moves the readBlob/writeBlob/memsetBlob from the Port class
to the PortProxy class, thus making a clear separation of the basic
port functionality (recv/send functional/atomic/timing), and the
higher-level functional accessors available on the port proxies.
There are only a few places in the code base where the blob functions
were used on ports, and they are all for peeking into the memory
system without making a normal memory access (in the memtest, and the
malta and tsunami pchip). The memtest also exemplifies how easy it is
to create a non-translating proxy if desired. The malta and tsunami
pchip used a slave port to perform a functional read, and this is now
changed to rely on the physProxy of the system (to which they already
have a pointer).
This patch is adding a clearer design intent to all objects that would
not be complete without a port proxy by making the proxies members
rathen than dynamically allocated. In essence, if NULL would not be a
valid value for the proxy, then we avoid using a pointer to make this
clear.
The same approach is used for the methods using these proxies, such as
loadSections, that now use references rather than pointers to better
reflect the fact that NULL would not be an acceptable value (in fact
the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional
composition where a NULL pointer is never valid" could be done on a
much broader scale throughout the code base, but for now it is only
done in the locations affected by the proxies.
This patch moves all port creation from the getPort method to be
consistently done in the MemObject's constructor. This is possible
thanks to the Swig interface passing the length of the vector ports.
Previously there was a mix of: 1) creating the ports as members (at
object construction time) and using getPort for the name resolution,
or 2) dynamically creating the ports in the getPort call. This is now
uniform. Furthermore, objects that would not be complete without a
port have these ports as members rather than having pointers to
dynamically allocated ports.
This patch also enables an elaboration-time enumeration of all the
ports in the system which can be used to determine the masterId.
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.
The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
This patch adds a check in the findPort method to ensure that an
invalid port id is never returned. Previously this could happen if no
default port was set, and no address matched the request, in which
case -1 was returned causing a SEGFAULT when using the id to index in
the port array. To clean things up further a symbolic name is added
for the invalid port id.
Without this patch, undefined params cause a cryptic KeyError
in multidict inside get_config_as_dict(). This patch lets
undefined params through get_config_as_dict() so they can
once again generate meaningful error messages later on in
the configuration process.
This patch cleans up a number of remaining uses of bus.port which
is now split into bus.master and bus.slave. The only non-trivial change
is the memtest where the level building now has to be aware of the role
of the ports used in the previous level.
1. Updates the Branch Predictor correctly to the state
just after a mispredicted branch, if a squash occurs.
2. If a BTB does not find an entry, the branch is predicted not taken.
The global history is modified to correctly reflect this prediction.
3. Local history is now updated at the fetch stage instead of
execute stage.
4. In the Update stage of the branch predictor the local predictors are
now correctly updated according to the state of local history during
fetch stage.
This patch also improves performance by as much as 17% on some benchmarks
The copy-engine ports were previously created implicitly and bound
based on the dma port peer rather than relying on the normal Python
binding (connectPorts) being called explicitly. This patch makes the
copy engine port similar to all other ports in that they are visibly
in the Python class and bound using the normal explicit calls through
Python.
This patch adds basic information about the ports in the parameter
classes to be passed from the Python world to the corresponding C++
object. Currently, the only information passed is the number of
connected peers, which for a Port is either 0 or 1, and for a
VectorPort reflects the size of the VectorPort. The default port of
the bus had to be renamed to avoid using the name "default" as a field
in the parameter class. It is possible to extend the Swig'ed
information further and add e.g. a pair with a description and size.
This patch classifies all ports in Python as either Master or Slave
and enforces a binding of master to slave. Conceptually, a master (such
as a CPU or DMA port) issues requests, and receives responses, and
conversely, a slave (such as a memory or a PIO device) receives
requests and sends back responses. Currently there is no
differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual
role port of the bus into a master and slave port and updating all the
system assembly scripts to use the appropriate port. Similarly, the
interrupt devices have to have their int_port split into a master and
slave port. The intdev and its children have minimal changes to
facilitate the extra port.
Note that this patch does not enforce any port typing in the C++
world, it merely ensures that the Python objects have a notion of the
port roles and are connected in an appropriate manner. This check is
carried when two ports are connected, e.g. bus.master =
memory.port. The following patches will make use of the
classifications and specialise the C++ ports into masters and slaves.
This patch fixes the cache stats to use the new request ids.
Cache stats also display the requestor names in the vector subnames.
Most cache stats now include "nozero" and "nonan" flags to reduce the
amount of excessive cache stat dump. Also, simplified
incMissCount()/incHitCount() functions.
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
This patch removes the calls to isTagPresent() from Sequencer.cc. These
calls are made just for setting the cache block to have been most recently
used. The calls have been folded in to the function setMRU().
This patch adds support for stalling the requests queued up at different
controllers for the MESI CMP directory protocol. Earlier the controllers
would recycle the requests using some fixed latency. This results in
younger requests getting serviced first at times, and can result in
starvation. Instead all the requests that need a particular block to be
in a stable state are moved to a separate queue, where they wait till
that block returns to a stable state and then they are processed.
The delayed commit flag is used in conjunction with interrupt pending flag to
figure out whether or not fetch stage should get more instructions. This patch
clears this flag when instructions are squashed. Also, in case an interrupt is
pending, currently it is not possible to access the instruction cache. This
patch allows accessing the cache in case this flag is set.
The condition for handling interrupts is to check whether or not the cpu's
instruction list is empty. As observed, this can lead to cases in which even
though the instruction list is empty, interrupts are handled when they should
not be. The condition is being strengthened so that interrupts get handled only
when the last committed microop did not had IsDelayedCommit set.
This patch adds a function to the ROB that will get the squashing instruction
from the ROB's list of instructions. This squashing instruction is used for
figuring out the macroop from which the fetch stage should fetch the microops.
Further, a check has been added that if the instructions are to be fetched
from the cache maintained by the fetch stage, then the data in the cache should
be valid and the PC of the thread being fetched from is same as the address of
the cache block.
This pointer was only being stored in code that came from SE mode. The system
pointer is always meaningful and available, so it should always be stored.
This patch removes the onRetryList field from the BusPort class and
entirely relies on the retryList which holds all ports that are
waiting to retry. The onRetryList field and the retryList were
previously used with overloaded functionalities and only one is really
needed (there were also checks to assert they held the same
information). After this patch the bus ports will be split into master
and slave ports and this simplifies that transition.
Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.
The code that checks whether pages allocated by allocPhysPages only checks
that the first page fits into physical memory, not that all of them do. This
change makes the code check the last page which should work properly. This
function used to only allocate one page at a time, so the first page and last
page used to be the same thing.
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
This patch is a very straight-forward simplification, removing the
unecessary otherPort pointer from the cache port. The pointer was only
used to forward range changes, and the address range is fixed for the
cache. Removing the pointer simplifies the transition to master/slave
ports.
This patch is a trivial simplification, removing the cpu pointer from
SimpleThread and relying on the baseCpu pointer in ThreadState. The
patch does not add or change any functionality, it merely cleans up
the code.
Usage: m5 writefile <filename>
File will be created in the gem5 output folder with the identical filename.
Implementation is largely based on the existing "readfile" functionality.
Currently does not support exporting of folders.
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
This patch makes the physMemPort of the RubyPort a PioPort rather than
an M5Port. This reflects the fact that the M5Port and PioPort have
different roles. The M5Port is really a coherent slave that is
connected to the CPUs and other coherent masters of the system,
e.g. DMA ports. The PioPort, on the other hand, is a master port that
is connected to the memory and other slaves, for example the pio
devices.
This simplifies future changes into master/slave ports and is
consistent with the port roles throughout the system.
This patch cleans up forward declarations and a member-function
prototype that still referred to the old FunctionalPort, VirtualPort
and TranslatingPort. There is no change in functionality.
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system. By default, the x86 architecture will have TSO.
This patch adds a missing curly brace when clearing and setting the
appropriate bits in the ns_gige.cc code.
This commit is not based on any runtime bug experienced, but rather
inspection of the code.
CopyStringOut() improperly indexed setting the null
character, would result in zeroing a random byte
of memory after(out of bounds) the character array.
This patch implements the functionality for forwarding invalidations and
replacements from the L1 cache of the Ruby memory system to the O3 CPU. The
implementation adds a list of ports to RubyPort. Whenever a replacement or an
invalidation is performed, the L1 cache forwards this to all the ports, which
is the LSQ in case of the O3 CPU.
This command will be sent from the memory system (Ruby) to the LSQ of
an O3 CPU so that the LSQ, if it needs to, invalidates the address in
the request packet.
This patch removes the idiosyncratic nature of the default bus port
and makes it yet another port in the list of interfaces. Rather than
having a specific pointer to the default port we merely track the
identifier of this port. This change makes future port diversification
easier and overall cleans up the bus code.
In preparation for the introduction of Master and Slave ports, this
patch removes the default port parameter in the Python port and thus
forces the argument list of the Port to contain only the
description. The drawback at this point is that the config port and
dma port of PCI and DMA devices have to be connected explicitly. This
is key for future diversification as the pio and config port are
slaves, but the dma port is a master.
This patch makes the bus bridge uni-directional and specialises the
bus ports to be a master port and a slave port. This greatly
simplifies the assumptions on both sides as either port only has to
deal with requests or responses. The following patches introduce the
notion of master and slave ports, and would not be possible without
this split of responsibilities.
In making the bridge unidirectional, the address range mechanism of
the bridge is also changed. For the cases where communication is
taking place both ways, an additional bridge is needed. This causes
issues with the existing mechanism, as the busses cannot determine
when to stop iterating the address updates from the two bridges. To
avoid this issue, and also greatly simplify the specification, the
bridge now has a fixed set of address ranges, specified at creation
time.
The functional ports are no longer used and this patch cleans up the
legacy that is still present in buses, memories, CPUs etc. Note that
this does not refer to the class FunctionalPort (already removed), but
rather ports with the name (and use) functional.
This patch simplifies the address-range determination mechanism and
also unifies the naming across ports and devices. It further splits
the queries for determining if a port is snooping and what address
ranges it responds to (aiming towards a separation of
cache-maintenance ports and pure memory-mapped ports). Default
behaviours are such that most ports do not have to define isSnooping,
and master ports need not implement getAddrRanges.
This patch removes the default port and instead relies on the peer
being set to NULL initially. The binding check (i.e. is a port
connected or not) will eventually be moved to the init function of the
modules.
This patch removes the inheritance of EventManager from the ports and
moves all responsibility for event queues to the owner. Eventually the
event manager should be the interface block, which could either be the
structural owner or a subblock like a LSQ in the O3 CPU for example.
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
This patch changes the access permission for the WB_E_W state from
Busy to Read_Write to avoid having issues in follow-on patches with
functional accesses going through Ruby. This change was made after
consultation with all involved parties and is more of a work-around
than a fix.
The system port is used as a globally reachable access point to the
memory subsystem. The benefit of using an actual port is that the
usual infrastructure is used to resolve any access and thus makes the
overall system able to handle distributed memories in any
configuration, and also makes the accesses agnostic to the address
map. This patch only introduces the port and does not actually use it
for anything.
This patch changes the functionalAccess member function in the cache
model such that it is aware of what port the access came from, i.e. if
it came from the CPU side or from the memory side. By adding this
information, it is possible to respect the 'forwardSnoops' flag for
snooping requests coming from the memory side and not forward
them. This fixes an outstanding issue with the IO bus getting accesses
that have no valid destination port and also cleans up future changes
to the bus model.
A recent changeset (aae12ce9f34c) removed support for
PAL-mode breakpoints in Alpha, since it was awkward
and likely unused. This patch lets a user know if they
potentially run into this limitation.
The definition for the class CacheMsg was removed long back. Some declaration
had still survived, which was recently removed. Since the PerfectCacheMemory
class relied on this particular declaration, its absence let to compilation
breaking down. Hence this patch.
This patch resurrects ruby's cache warmup capability. It essentially
makes use of all the infrastructure that was added to the controllers,
memories and the cache recorder.
This patch adds function to the Sparse Memory so that the blocks can be
recorded in a cache trace. The blocks are added to the cache recorder
which can later write them into a file.
This patch adds functions to the memory vector class that can be used for
collating memory pages to raw trace and for populating pages from a raw
trace.
The SparseMemEntry structure includes just one void* pointer. It seems
unnecessary that we have a structure for this. The patch removes the
structure and makes use of a typedef on void* instead.
This adds the derived class FunctionalPacket to fix a long standing
deficiency in the Packet class where it was unable to handle finding data to
partially satisfy a functional access. Made this a derived class as
functional accesses are used only in certain contexts and to not add any
additional overhead to the existing Packet class.
This patch adds a mechanism to collect run time samples for specific portions
of a benchmark, using work_begin and work_end pseudo instructions.It also enhances
the histogram stat to report geometric mean.
The previous version didn't work correctly with max integer values (2^31-1 for
32-bit, 2^63-1 for 64bit version), causing "shift" to become -1. For smaller
numbers, it wouldn't have caused functional errors, but would have resulted in
more than necessary loops in the while loop. Special-cased cases when (max + 1
== 0) to prevent the ceilLog2 functions from failing.
To make gem5 compile and run with swig 2.0.4 a few minor fixes are
necessary, the fail label issues by swig must not be treated as an
error by gcc (tested with gcc 4.2.1), and the vector wrappers must
have SWIGPY_SLICE_ARG defined which happens in pycontainer.swg,
included through std_container.i. By adding the aforementioned include
to the vector wrappers everything seems to work.
Adaptations to make gem5 compile and run on OSX 10.7.2, with a stock
gcc 4.2.1 and the remaining dependencies from macports, i.e. python
2.7,.2 swig 2.0.4, mercurial 2.0. The changes include an adaptation of
the SConstruct to handle non-library linker flags, and Darwin-specific
code to find the memory usage of gem5. A number of Ruby files relied
on ambigious uint (without the 32 suffix) which caused compilation
errors.
This constant is currently in System.hh, but is only used in Set.hh. It
is being moved to Set.hh to remove this artificial dependence of Set.hh
on System.hh.
--HG--
extra : rebase_source : 683c43a5eeaec4f5f523b3ea32953a07f65cfee7
This patch adds a function for replacing the event at the head of the queue
with another event. This helps in running a different set of events. Events
already scheduled can processed by replacing the original head event back.
This function has been specifically added to support cache warmup and
cooldown required for creating and restoring checkpoints.
--HG--
extra : rebase_source : ed6e2905720b6bfdefd020fab76235ccf33d28d1
This patch removes calls to uu_ProfileMiss from transitions where the request
is satisfied by the L2 cache controller.
--HG--
extra : rebase_source : e59fe7c6cd5795c0019cf178dd3b062d73cc2ff5
The DPRINTF for doing protection checks appears after the checks have been
carried out. It is possible that the function returns while the checks are
being carried, in which case the printf is missed out. This patch moves the
DPRINTF before the checks.
--HG--
extra : rebase_source : 172896057e593022444d882ea93323a5d9f77a89
This patch adds and removes included files from some of the files so as to
organize remove some false dependencies and include some files directly
instead of transitively.
--HG--
extra : rebase_source : 09b482ee9ae00b3a204ace0c63550bc3ca220134
SLICC uses pointers for cache and TBE entries but not for directory entries.
This patch changes the protocols, SLICC and Ruby memory system so that even
directory entries are referenced using pointers.
--HG--
extra : rebase_source : abeb4ac78033d003153751f216fd1948251fcfad
When a change in the frame buffer from the VNC server is detected, the new
frame is stored out to the m5out/frames_*/ directory. Specifiy the flag
"--frame-capture" when running configs/example/fs.py to enable this behavior.
--HG--
extra : rebase_source : d4e08e83f4fa6ff79f3dc9c433fc1f0487e057fc
There are two lines in O3CPU.py that set the dcache and icache
tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr.
This patch removes these hardcoded lines from O3CPU.py and sets the default
L1 cache mshr targets to 20.
--HG--
extra : rebase_source : 6f92d950e90496a3102967442814e97dc84db08b
Adds the flag 'recvSnoops' which enables pagewalkers using DmaPorts,
to properly configure snoops.
--HG--
extra : rebase_source : 64207bef62c3268ddff2236ee4adae873812325f
Squashes the subsequent instructions in O3 pipe after the service call, so that
they see the effect of the system call when re-executed. This isn't really an issue
with FS mode, but can show up in SE mode.
--HG--
extra : rebase_source : 613a69fe1d9834261e25a8cd340aa6b47578e1fe
There was a bug in the mm_disk implementation where a copy paste error
resulted in the d32 variable not being initialised (as it incorrectly
was used instead of d16), and gcc 4.5 complaining.
--HG--
extra : rebase_source : 9515e87b188b9eac189da8034cb13c3bf7d9e20b
This patch changes the implementation of Ruby's recvTiming() function so
that it pushes a packet in to the Sequencer instead of a RubyRequest. This
requires changes in the Sequencer's makeRequest() and issueRequest()
functions, as they also need to operate on a Packet instead of RubyRequest.
This patch adds a fault model, which provides the probability of a number of
architectural faults in the interconnection network (e.g., data corruption,
misrouting). These probabilities can be used to realistically inject faults
in GARNET and faithfully evaluate the effectiveness of novel resilient NoC
architectures.
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
This patch removes some of the unused typedefs. It also moves
some of the typedefs from Global.hh to TypeDefines.hh. The patch
also eliminates the file NodeID.hh.
This parameter depends on a number of coincidences to work properly. First,
there must be an array assigned to system called "cpu" even though there's no
parameter called that. Second, the items in the "cpu" array have to have a
"clock" parameter which has a "frequency" member. This is true of the normal
CPUs, but isn't true of the memory tester CPUs. This happened to work before
because the memory tester CPUs were only used in SE mode where this parameter
was being excluded. Since everything is being pulled into a common binary,
this won't work any more. Since the boot_cpu_frequency parameter is only used
by Alpha's Linux System object (and Mips's through copy and paste), the
definition of that parameter is moved down to those objects specifically.
In RubySlicc_ComponentMapping.hh, certain '#define's have been used for
mapping MachineType to GenericMachineType. These '#define's are being
eliminated and the code will now be generated by SLICC instead. Also
are being eliminated some of the unused functions from
RubySlicc_ComponentMapping.sm.
PageTable supported an allocate() call that called back
through the Process to allocate memory, but did not have
a method to map addresses without allocating new pages.
It makes more sense for Process to do the allocation, so
this method was renamed allocateMem() and moved to Process,
and uses a new map() call on PageTable.
The remaining uses of the process pointer in PageTable
were only to get the name and the PID, so by passing these
in directly in the constructor, we can make PageTable
completely independent of Process.
Replace the (broken as of previous changeset) swig_objdecl() method
that allowed/forced you to substitute a whole new C++ struct
definition for SWIG to wrap with a set of export_method* hooks
that let you just declare a set of C++ methods (or other declarations)
that get inserted in the auto-generated struct.
Restore the System get/setMemoryMode methods, and use this mechanism
to specialize SimObject as well, eliminating teh need for sim_object.i.
Needed bits of sim_object.i are moved to the new pyobject.i.
Also sucked a little SimObject specialization into cxx_param_decl()
allowing us to get rid of src/sim/sim_object_params.hh. Now the
generation and wrapping of the base SimObject param struct is more
in line with how derived objects are handled.
--HG--
rename : src/python/swig/sim_object.i => src/python/swig/pyobject.i
- Move the random bits of SWIG code generation out of src/SConscript
file and into methods on the objects being wrapped.
- Cleaned up some variable naming and added some comments to make
the process a little clearer.
- Did a little generated file/module renaming:
- vptype_Foo now Foo_vector
- init_Foo is now Foo_init
This makes it easier to see all the Foo-related files in a
sorted directory listing.
- Made cxx_predecls and swig_predecls normal SimObject classmethods.
- Got rid of swig_objdecls hook, even though this breaks the System
objects get/setMemoryMode method exports. Will be fixing this in
a future changeset.
Not all objects need a platform pointer, and having one creates a dependence
on their being a platform object. This change removes the platform pointer to
from the base device object and moves it into subclasses that actually need
it.
In order for a system object to work in SE mode and FS mode, it has to either
always require a platform object even in SE mode, or get rid of the
requirement all together. Making SE mode carry around unnecessary/unused bits
of FS seems less than ideal, so I decided to go with the second option. The
platform pointer in the System class was used for exactly one purpose, a path
for the Alpha Linux system object to get to the real time clock and read its
frequency so that it could short cut the loops_per_jiffy calculation. There
was also a copy and pasted implementation in MIPS, but since it was only there
because it was there in Alpha I still count that as one use.
This change reverses the mechanism that communicates the RTC frequency so that
the Tsunami platform object pushes it up to the AlphaSystem object. This is
slightly less specific than it could be because really only the
AlphaLinuxSystem uses it. Because the intrFrequency function on the Platform
class was no longer necessary (and unimplemented on anything but Alpha) it was
eliminated.
After this change, a platform will need to have a system, but a system won't
have to have a platform.
These faults take varargs to their constructors which they print into a string
and pass to the M5DebugFault base class. They are basically faults wrapped
around panics, faults, warns, and warnonce-es so that they happen only at
commit.
All of the classes will now be available in both modes, and only
GenericPageTableFault will continue to check the mode for conditional
compilation. It uses a process object to handle the fault in SE mode, and
for now those aren't available in FS mode.
By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.
Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.
This patch makes O3 CPU work along with the Ruby memory model. Ruby
overwrites the senderState pointer with another pointer. The pointer
is restored only when Ruby gets done with the packet. LSQ makes use of
senderState just after sendTiming() returns. But the dynamic_cast returns
a NULL pointer since Ruby's senderState pointer is from a different class.
Storing the senderState pointer before calling sendTiming() does away with
the problem.
This constant will have the same value as FULL_SYSTEM but will not be usable
by the preprocessor. It can be substituted into places where FULL_SYSTEM is
used in a C++ context and will make it easier to find which parts of the
simulator still use FULL_SYSTEM with the preprocessor using grep.
There was a change a while ago that refactored some scons stuff which got rid
of cpu_models.py but also accidentally got rid of the ISA parser as a source
for its target files. That meant that changes which affected the parser
wouldn't cause a rebuild unless they also changed one of the description
files. This change fixes that.
Translating MSR addresses into MSR register indices took a lot of space in the
TLB source and made looking around in that file awkward. This change moves
the lookup into its own file to get it out of the way. It also changes it from
a switch statement to a hash map which should hopefully be a little more
efficient.
Initialize flags via the Event constructor instead of calling
setFlags() in the body of the derived class's constructor. I
forget exactly why, but this made life easier when implementing
multi-queue support.
Also rename Event::getFlags() to isFlagSet() to better match
common usage, and get rid of some unused Event methods.
Use exitSimLoop() function instead of explicitly scheduling
on mainEventQueue (which won't work once we go to multiple
event queues). Also introduced a local params variable to
shorten a lot of expressions.
Print IpAddress params in dot notation for readability.
Properly compare IpAddress objects (by value and not object identity).
Also fix up derived param classes (IpNetmask and IpWithPort)
similarly.
This change is a significant reorganization of the MIPS fault code that gets
rid of duplication, fixes some bugs, doubtlessly introduces others, and adds
names for the exception code constants.
Pass in a bool to indicate if the fault is from a store instead of having two
different classes. The classes were also misleadingly named since loads are
also processed by the DTB but should return ITB faults since they aren't
stores. The TLB may be returning the wrong fault in this case, but I haven't
looked at it closely.
Get rid of Fault classes left over from when this file was copied from Alpha,
and rename ArithmeticOverflowFault to be IntegerOverflowFault and get rid of
the old IntegerOverflowFault stub. The Integer version is what's actually in
the manual, but the Arithmetic version had the implementation.
It was technically possible but clumsy to determine what endianness a guest
was configured with using the state in byteswap.hh. This change makes that
information available more directly.
Also get rid of unused (and mildly redundant) ByteOrderDiffers constant.
The decoder now checks the value of FULL_SYSTEM in a switch statement to
decide whether to return a real syscall instruction or one that triggers
syscall emulation (or a panic in FS mode). The switch statement should devolve
into an if, and also should be optimized out since it's based on constant
input.
In FS mode the syscall function will panic, but the interface will be
consistent and code which calls syscall can be compiled in. This will allow,
for instance, instructions that use syscall to be built unconditionally but
then not returned by the decoder.
Some DPRINTFs were printing uninitalized values because the DPRINTFs were
always being printed even when the features they were printing weren't
being used. This change moves the DPRINTFs into the appropriate if blocks
and initializes the state variables correctly.
There also is a case where the offset into the packet could be calculated
incorrectly during a DMA that is fixed.
Check that we're not currently writing back an address the prefetcher is trying
to prefetch before issuing it. We previously checked the mshrQueue and the cache
itself, but forgot to check the writeBuffer. This fixes a memory corrucption
issue with an L2 prefetcher.
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).
This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%).
So a mips-cross-gdb can connect with gem5(MIPS_SE), and do some remote
debugging.
Testing:
Build gem5 for MIPS_SE and make gem5 wait at beginning:
modify "rgdb_wait = -1" to "rgdb_wait = 0" in src/sim/system.cc;
scons build/MIPS_SE/gem5.opt CPU_MODELS=O3CPU
----
Build GDB-7.3 mips-cross:
./configure --target=mips-linux-gnu --prefix=xxx/gdb-7.3-install/
make
make install
----
Run:
./build/MIPS_SE/gem5.opt configs/example/se.py --detailed --caches
./mips-linux-gnu-gdb xxx/gem5/tests/test-progs/hello/bin/mips/linux/hello
(gdb) target remote :7000
(gdb) info registers
(gdb) disassemble
(gdb) si
(gdb) break main
(gdb) c
(gdb) quit
Testing done.
Having two StaticInst classes, one nominally ISA dependent and the other ISA
dependent, has not been historically useful and makes the StaticInst class
more complicated that it needs to be. This change merges StaticInstBase into
StaticInst.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.
Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
Do some minor cleanup of some recently added comments, a warning, and change
other instances of stack extension to be like what's now being done for x86.
The way flag bits were being set for microops in x86 ended up implicitly
calling the bitset constructor which was truncating flags beyond the width of
an unsigned long. This change sets the bits in chunks which are always small
enough to avoid being truncated. On 64 bit machines this should reduce to be
the same as before, and on 32 bit machines it should work properly and not be
unreasonably inefficient.
When an instruction is translated in the x86 TLB, a variable called
delayedResponse is passed back and forth which tracks whether a translation
could be completed immediately, or if there's going to be callback that will
finish things up. If a read was to the internal memory space, memory mapped
registers used to implement things like MSRs, the function hadn't yet gotten
to where delayedResponse was set to false, it's default. That meant that the
value was never set, and the TLB could start waiting for a callback that would
never come. This change simply moves the assignment to above where control
can divert to translateInt().
Nothing big here, but when you have an address that is not in the page table request to be allocated, if it falls outside of the maximum stack range all you get is a page fault and you don't know why. Add a little warn() to explain it a bit. Also add some comments and alter logic a little so that you don't totally ignore the return value of checkAndAllocNextPage().
Even though the code is safe, compiler flags a warning here, which are treated as errors for fast/opt. I know it's redundant but it has no side effects and fixes the compile.
In the current implementation of Functional Accesses, it's very hard to
implement broadcast or snooping protocols where the memory has no idea if it
has exclusive access to a cache block or not. Without this knowledge, making
sure the RW vs. RO permissions are right are next to impossible. So we add a
new state called Backing_Store to enable the conveyance that this is the backup
storage for a block, so that it can be written if it is the only possibly RW
block in the system, or written even if there is another RW block in the
system, without causing problems.
Also, a small change to actually set the m_name field for each Controller so
that debugging can be easier. Now you can access a controller's name just by
controller->getName().