In SLICC, in order to define a type a data type for which it should not
generate any code, the keyword external_type is used. For those data types for
which code should be generated, the keyword structure is used. This patch
eliminates the use of keyword external_type for defining structures. structure
key word can now have an optional attribute external, which would be used for
figuring out whether or not to generate the code for this structure. Also, now
structures can have functions as well data members in them.
In order to add stall and wait facility for protocols, a keyword
wake_up_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
In order to add stall and wait facility for protocols, a keyword
wake_up_all_dependents was introduced. This patch removes the keyword,
instead this functionality is now implemented as function call.
Thanks to swig this was interfering with the standard Python
random module. The only function in that module was seed(),
which erroneously called srand48(). Moved the function to
m5.internal.core, renamed it seedRandom(), and made it call
random_mt.init() instead.
FastAlloc's reuse policies can mask allocation bugs, so
we typically want it disabled when debugging. Set
FORCE_FAST_ALLOC to enable even when debugging, and set
NO_FAST_ALLOC to disable even in non-debug builds.
The ISAR registers describe which features the processor supports.
Transcribe the values listed in section B5.2.5 of the ARM ARM
into the registers as read-only values
This change speeds up booting, especially in MP cases, by not executing
udelay() on the core but instead skipping ahead tha amount of time that is being
delayed.
This patch prevents not executed conditional instructions marked as
IsQuiesce from stalling the pipeline indefinitely. If the instruction
is not executed the quiesceSkip psuedoinst is called which schedules a
wakes up call to the fetch stage.
This changes the RFE macroop into 3 microops:
URa = [sp]; URb = [sp+4]; // load CPSR,PC values from stack
sp = sp + offset; // optionally auto-increment
PC = URa; CPSR = URb; // write to the PC and CPSR.
Importantly:
- writing to PC is handled in the last micro-op.
- loading occurs prior to state changes.
This change fixes the problem for all the cases we actively use. If you want to try
more creative I/O device attachments (E.g. sharing an L2), this won't work. You
would need another level of caching between the I/O device and the cache
(which you actually need anyway with our current code to make sure writes
propagate). This is required so that you can mark the cache in between as
top level and it won't try to send ownership of a block to the I/O device.
Asserts have been added that should catch any issues.
Without this change the a store can be issued to the cache multiple times.
If this case occurs when the l1 cache is out of mshrs (and thus blocked)
the processor will never make forward progress because each cycle it will
send a single request using the recently freed mshr and not completing the
multipart store. This will continue forever.
This causes a lot of rebuilds that could have otherwise possibly been
avoided, and, more annoyingly, a lot of unnecessary rerunning of the
regressions. The benefits of having the revision in the output haven't
materialized, so this change removes it.
None of the code in the ruby tester directory is compiled or referred to
outside of that directory. This change eliminates it. If it's needed in the
future, it can be revived from the history. In the mean time, this removes
clutter and the only use of the GEMS_ROOT scons variable.
The internet says this instruction was created by accident when an Intel CPU
failed to decode x87 instructions properly. It's been documented on a few rare
occasions and has generally worked to ensure backwards compatability. One
source claims that the gcc toolchain is basically the only thing that emits
it, and that emulators/binary translators like qemu and bochs implement it.
We won't actually implement it here since we're hardly implementing any other
x87 instructions either. If we were to implement it, it would behave the same
as ffree but then also pop the register stack.
http://www.pagetable.com/?p=16
There may not be a formally correct spelling for the past tense of mmap, but
mmapped is the spelling Google doesn't try to autocorrect. This makes sense
because it mirrors the past tense of map->mapped and not the past tense of
cape->caped.
--HG--
rename : src/arch/alpha/mmaped_ipr.hh => src/arch/alpha/mmapped_ipr.hh
rename : src/arch/arm/mmaped_ipr.hh => src/arch/arm/mmapped_ipr.hh
rename : src/arch/mips/mmaped_ipr.hh => src/arch/mips/mmapped_ipr.hh
rename : src/arch/power/mmaped_ipr.hh => src/arch/power/mmapped_ipr.hh
rename : src/arch/sparc/mmaped_ipr.hh => src/arch/sparc/mmapped_ipr.hh
rename : src/arch/x86/mmaped_ipr.hh => src/arch/x86/mmapped_ipr.hh
At a couple of places in PerfectSwitch.cc and MessageBuffer.cc, DPRINTF()
has not been provided with correct number of arguments. The patch fixes these
bugs.
This patch removes the store buffer from Ruby. It is not in use currently.
Since libruby is being and store buffer makes calls to libruby, it is not
possible to maintain it until substantial changes are made.
This patch changes Address.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depend on
RubySystem have been moved to Address.cc file.
This patch changes DataBlock.hh so that it is not dependent on RubySystem.
This dependence seems unecessary. All those functions that depende on
RubySystem have been moved to DataBlock.cc file.
This patch integrates permissions with cache and memory states, and then
automates the setting of permissions within the generated code. No longer
does one need to manually set the permissions within the setState funciton.
This patch will faciliate easier functional access support by always correctly
setting permissions for both cache and memory states.
--HG--
rename : src/mem/slicc/ast/EnumDeclAST.py => src/mem/slicc/ast/StateDeclAST.py
rename : src/mem/slicc/ast/TypeFieldEnumAST.py => src/mem/slicc/ast/TypeFieldStateAST.py
Because int and not InstSeqNum was used in a couple of places, you can
overflow the int type and thus get wierd bugs when the sequence number
is negative (or some wierd value)
remove constructors that werent being used (it just gets confusing)
use initialization list for all the variables instead of relying on initVars()
function
-use a pointer to CacheReqPacket instead of PacketPtr so correct destructors
get called on packet deletion
- make sure to delete the packet if the cache blocks the sendTiming request
or for some reason we dont use the packet
- dont overwrite memory requests since in the worst case an instruction will
be replaying a request so no need to keep allocating a new request
- we dont use retryPkt so delete it
- fetch code was split out already, so just assert that this is a memory
reference inst. and that the staticInst is available
If there is an outstanding table walk and no other activity in the CPU
it can go to sleep and never wake up. This change makes the instruction
queue always active if the CPU is waiting for a store to translate.
If Gabe changes the way this code works then the below should be removed
as indicated by the todo.
We only support EABI binaries, so there is no reason to support OABI syscalls.
The loader detects OABI calls and fatal() so there is no reason to even check
here.
The ARM performance counters are not currently supported by the model.
This patch interprets a 'reset performance counters' command to mean 'reset
the simulator statistics' instead.
"executing" isnt a very descriptive debug message and in going through the
output you get multiple messages that say "executing" but nothing to help
you parse through the code/execution.
So instead, at least print out the name of the action that is taking
place in these functions.
Overall, continue to progress Ruby debug messages to more of the normal M5
debug message style
- add a name() to the Ruby Throttle & PerfectSwitch objects so that the debug output
isn't littered w/"global:" everywhere.
- clean up messages that print over multiple lines when possible
- clean up duplicate prints in the message buffer
In certain actions of the L1 cache controller, while creating an outgoing
message, the machine type was not being set. This results in a
segmentation fault when trace is collected. Joseph Pusudesris provided
his patch for fixing this issue.
keep track of when an instruction needs the execution
behind it to be serialized. Without this, in SE Mode
instructions can execute behind a system call exit().
resources don't need to call getLatency because the latency is already a member
in the class. If there is some type of special case where different instructions
impose a different latency inside a resource then we can revisit this and
add getLatency() back in
each resource has a certain # of requests it can take per cycle. update the #s here
to be more realistic based off of the pipeline width and if the resource needs to
be accessed on multiple cycles
---
need to delete the cache request's data on clearRequest() now that we are recycling
requests
---
fetch unit needs to deallocate the fetch buffer blocks when they are replaced or
squashed.
formerly, to free up bandwidth in a resource, we could just change the pointer in that resource
but at the same time the pipeline stages had visibility to see what happened to a resource request.
Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw
away the request too early or the pipeline stage gets bad information. Instead, mark when a request
is done with the resource all together and then let the pipeline stage call back to the resource
that it's time to free up the bandwidth for more instructions
*** inteface notes ***
- When an instruction completes and is done in a resource for that cycle, call done()
- When an instruction fails and is done with a resource for that cycle, call done(false)
- When an instruction completes, but isnt finished with a resource, call completed()
- When an instruction fails, but isnt finished with a resource, call completed(false)
* * *
inorder: tlbmiss wakeup bug fix
take away all instances of reqMap in the code and make all references use the built-in
request vectors inside of each resource. The request map was dynamically allocating
a request per instruction. The request vector just allocates N number of requests
during instantiation and then the surrounding code is fixed up to reuse those N requests
***
setRequest() and clearRequest() are the new accessors needed to define a new
request in a resource
we are going to be getting away from creating new resource requests for every
instruction so no more need to keep track of a reqRemoveList and clean it up
every tick
first change in an optimization that will stop InOrder from allocating new memory for every instruction's
request to a resource. This gets expensive since every instruction needs to access ~10 requests before
graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's
bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline
since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions
contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model
and increased simulation performance
Currently the wakeup function for the PerfectSwitch contains three loops -
loop on number of virtual networks
loop on number of incoming links
loop till all messages for this (link, network) have been routed
With an 8 processor mesh network and Hammer protocol, about 11-12% of the
was observed to have been spent in this function, which is the highest
amongst all the functions. It was found that the innermost loop is executed
about 45 times per invocation of the wakeup function, when each invocation
of the wakeup function processes just about one message.
The patch tries to do away with the redundant executions of the innermost
loop. Counters have been added for each virtual network that record the
number of messages that need to be routed for that virtual network. The
inner loops are only executed when the number of messages for that particular
virtual network > 0. This does away with almost 80% of the executions of the
innermost loop. The function now consumes about 5-6% of the total execution
time.
In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or
64 bits wide overwrite all bits of the destination register. This change
removes false dependencies in these cases where the previous value of a
register doesn't need to be read to write a new value. New versions of most
microops are created that have a "Big" suffix which simply overwrite their
destination, and the right version to use is selected during microop
allocation based on the selected data size.
This does not change the performance of the O3 CPU model significantly, I
assume because there are other false dependencies from the condition code bits
in the flags register.
These faults can panic/warn/warn_once, etc., instead of instructions doing
that themselves directly. That way, instructions can be speculatively
executed, and only if they're actually going to commit will their fault be
invoked and the panic, etc., happen.
When redirecting fetch to handle branches, the npc of the current pc state
needs to be left alone. This change makes the pc state record whether or not
the npc already reflects a real value by making it keep track of the current
instruction size, or if no size has been set.