Commit graph

4665 commits

Author SHA1 Message Date
Brad Beckmann
12a05c23b7 ruby: automate permission setting
This patch integrates permissions with cache and memory states, and then
automates the setting of permissions within the generated code.  No longer
does one need to manually set the permissions within the setState function.
This patch will facilitate easier functional access support by always correctly
setting permissions for both cache and memory states.

--HG--
rename : src/mem/slicc/ast/EnumDeclAST.py => src/mem/slicc/ast/StateDeclAST.py
rename : src/mem/slicc/ast/TypeFieldEnumAST.py => src/mem/slicc/ast/TypeFieldStateAST.py
2011-02-23 16:41:59 -08:00
Brad Beckmann
7842e95519 MOESI_hammer: cache probe address clean up 2011-02-23 16:41:58 -08:00
Brad Beckmann
3bc33eeaea ruby: cleaned up access permission enum 2011-02-23 16:41:58 -08:00
Brad Beckmann
c09a33e5d5 ruby: removed unsupported protocol files 2011-02-23 16:41:26 -08:00
Korey Sewell
0a74246fb9 inorder: InstSeqNum bug
Because int and not InstSeqNum was used in a couple of places, you can
overflow the int type and thus get weird bugs when the sequence number
is negative (or some weird value)
2011-02-23 16:35:18 -05:00
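The overflow described in 0a74246fb9 can be illustrated with a minimal sketch. This is Python, not M5 code: the helper `as_int32` is hypothetical and emulates a 32-bit signed int, on the assumption that `int` is 32 bits wide and that InstSeqNum is a wider type on the affected platform.

```python
def as_int32(value):
    """Emulate storing value in a 32-bit two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


seq_num = 2**31 - 1             # largest value a 32-bit signed int can hold
next_seq = seq_num + 1          # the real sequence number keeps growing
truncated = as_int32(next_seq)  # what a plain int would hold instead
```

Once the counter passes 2**31 - 1, the truncated copy turns negative while the wider value stays correct, which matches the symptom the message describes.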
Korey Sewell
3e1ad73d08 inorder: dyn inst initialization
remove constructors that weren't being used (it just gets confusing)
use initialization list for all the variables instead of relying on initVars()
function
2011-02-23 16:35:04 -05:00
Korey Sewell
e0a021005d inorder: cache packet handling
- use a pointer to CacheReqPacket instead of PacketPtr so correct destructors
get called on packet deletion
- make sure to delete the packet if the cache blocks the sendTiming request
or for some reason we don't use the packet
- don't overwrite memory requests since in the worst case an instruction will
be replaying a request so no need to keep allocating a new request
- we don't use retryPkt so delete it
- fetch code was split out already, so just assert that this is a memory
reference inst. and that the staticInst is available
2011-02-23 16:30:45 -05:00
Ali Saidi
057598843a Mem: Print out memory when access > 8 bytes 2011-02-23 15:10:50 -06:00
Ali Saidi
2eb19dac65 ARM: Set ITSTATE correctly after FlushPipe 2011-02-23 15:10:50 -06:00
Ali Saidi
916c7f162d ARM: This panic can be hit during misspeculation so it can't exist. 2011-02-23 15:10:50 -06:00
Ali Saidi
1201c5a134 ARM: Bad interworking warn way too noisy when running real code w/misspeculation. 2011-02-23 15:10:50 -06:00
Ali Saidi
f9d4d9df1b O3: When a prefetch causes a fault, don't record it in the inst 2011-02-23 15:10:50 -06:00
Giacomo Gabrielli
7ee2de31c4 ARM: NEON instruction templates modified to set the predicate flag to false when needed. 2011-02-23 15:10:50 -06:00
Ali Saidi
3de8e0a0d4 O3: If there is an outstanding table walk don't let the inst queue sleep.
If there is an outstanding table walk and no other activity in the CPU
it can go to sleep and never wake up. This change makes the instruction
queue always active if the CPU is waiting for a store to translate.

If Gabe changes the way this code works then the below should be removed
as indicated by the todo.
2011-02-23 15:10:49 -06:00
Ali Saidi
326191adc9 ARM: Squash state on FPSCR stride or len write. 2011-02-23 15:10:49 -06:00
Matt Horsnell
bb319a589e ARM: Mark store conditionals as such. 2011-02-23 15:10:49 -06:00
Ali Saidi
7391ea6de6 ARM: Do something for ISB, DSB, DMB 2011-02-23 15:10:49 -06:00
Ali Saidi
ae3d456855 ARM: Fix bug that let two table walks occur in parallel. 2011-02-23 15:10:49 -06:00
Ali Saidi
f05f35df99 Includes: Don't include isa_traits.hh and use the TheISA namespace unless really needed. 2011-02-23 15:10:49 -06:00
Ali Saidi
805ad4ba41 ARM: Make Noop actually decode to a noop and set its instflags. 2011-02-23 15:10:49 -06:00
Ali Saidi
68bd80794c O3: Fix bug when a squash occurs right before TLB miss returns.
In this case we need to throw away the TLB miss, not assume it was the
one we were waiting for.
2011-02-23 15:10:49 -06:00
Ali Saidi
e572cf93ee ARM: Delete OABI syscall handling.
We only support EABI binaries, so there is no reason to support OABI syscalls.
The loader detects OABI calls and fatal() so there is no reason to even check
here.
2011-02-23 15:10:48 -06:00
Ali Saidi
511c637ab0 CLCD: Fix some serialization bugs with the clcd controller. 2011-02-23 15:10:48 -06:00
Ali Saidi
e2a6275c03 ARM: Add support for read of 100MHz clock in system controller. 2011-02-23 15:10:48 -06:00
Ali Saidi
2157b9976b ARM: Reset simulation statistics when perf counters are reset.
The ARM performance counters are not currently supported by the model.
This patch interprets a 'reset performance counters' command to mean 'reset
the simulator statistics' instead.
2011-02-23 15:10:48 -06:00
Ali Saidi
d63020717c ARM: Adds dummy support for a L2 latency miscreg. 2011-02-23 15:10:48 -06:00
Korey Sewell
78c37b8048 ruby: extend dprintfs for RubyGenerated TraceFlag
"executing" isn't a very descriptive debug message and in going through the
output you get multiple messages that say "executing" but nothing to help
you parse through the code/execution.

So instead, at least print out the name of the action that is taking
place in these functions.
2011-02-23 00:58:42 -05:00
Korey Sewell
67cc52a605 ruby: cleaning up RubyQueue and RubyNetwork dprintfs
Overall, continue to progress Ruby debug messages to more of the normal M5
debug message style
- add a name() to the Ruby Throttle & PerfectSwitch objects so that the debug output
isn't littered w/"global:" everywhere.
- clean up messages that print over multiple lines when possible
- clean up duplicate prints in the message buffer
2011-02-23 00:58:40 -05:00
Brad Beckmann
63a25a56cc m5: merged in hammer fix 2011-02-22 11:16:40 -08:00
Nilay Vaish
77eed184f5 Ruby: Machine Type missing in MOESI CMP directory protocol
In certain actions of the L1 cache controller, while creating an outgoing
message, the machine type was not being set. This results in a
segmentation fault when trace is collected. Joseph Pusudesris provided
his patch for fixing this issue.
2011-02-19 17:32:43 -06:00
Nilay Vaish
293ccb7037 Ruby: clean MOESI CMP directory protocol
The L1 cache controller file contains references to foo and goo queues, which
are not in use at all. These have been removed.
2011-02-19 17:32:00 -06:00
Korey Sewell
66bb732c04 m5: merge inorder/release-notes/make_release changes 2011-02-18 14:35:15 -05:00
Korey Sewell
bc16bbc158 inorder: add names and slot #s to res. dprints 2011-02-18 14:31:31 -05:00
Korey Sewell
64d31e75b9 inorder: ignore nops in execution unit 2011-02-18 14:30:38 -05:00
Korey Sewell
0fe19836c7 inorder: update graduation unit
make sure instructions are able to commit before writing back to the RF
do not commit more than 1 non-speculative instruction per cycle
2011-02-18 14:30:05 -05:00
Korey Sewell
89335118a5 inorder: recognize isSerializeAfter flag
keep track of when an instruction needs the execution
behind it to be serialized. Without this, in SE Mode
instructions can execute behind a system call exit().
2011-02-18 14:29:48 -05:00
Korey Sewell
bbffd9419d inorder: update default thread size(=1)
a lot of structures get allocated based off that MaxThreads parameter so this is an
effort to not abuse it
2011-02-18 14:29:44 -05:00
Korey Sewell
a278df0b95 inorder: don't overuse getLatency()
resources don't need to call getLatency because the latency is already a member
in the class. If there is some type of special case where different instructions
impose a different latency inside a resource then we can revisit this and
add getLatency() back in
2011-02-18 14:29:40 -05:00
Korey Sewell
37df925953 inorder: update max. resource bandwidths
each resource has a certain # of requests it can take per cycle. update the #s here
to be more realistic based off of the pipeline width and if the resource needs to
be accessed on multiple cycles
2011-02-18 14:29:31 -05:00
Korey Sewell
91c48b1c3b inorder: cleanup in destructors
cleanup hanging pointers and other cruft in the destructors
2011-02-18 14:29:26 -05:00
Korey Sewell
8b4b4a1ba5 inorder: fix cache/fetch unit memory leaks
---
need to delete the cache request's data on clearRequest() now that we are recycling
requests
---
fetch unit needs to deallocate the fetch buffer blocks when they are replaced or
squashed.
2011-02-18 14:29:17 -05:00
Korey Sewell
72b5233112 inorder: remove events for zero-cycle resources
if a resource has a zero cycle latency (e.g. RegFile write), then don't allocate an event
for it to use
2011-02-18 14:29:02 -05:00
Korey Sewell
d5961b2b20 inorder: update pipeline interface for handling finished resource reqs
formerly, to free up bandwidth in a resource, we could just change the pointer in that resource
but at the same time the pipeline stages had visibility to see what happened to a resource request.
Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw
away the request too early or the pipeline stage gets bad information. Instead, mark when a request
is done with the resource altogether and then let the pipeline stage call back to the resource
that it's time to free up the bandwidth for more instructions
*** interface notes ***
- When an instruction completes and is done in a resource for that cycle, call done()
- When an instruction fails and is done with a resource for that cycle, call done(false)
- When an instruction completes, but isn't finished with a resource, call completed()
- When an instruction fails, but isn't finished with a resource, call completed(false)
* * *
inorder: tlbmiss wakeup bug fix
2011-02-18 14:28:37 -05:00
Korey Sewell
d64226750e inorder: remove request map, use request vector
take away all instances of reqMap in the code and make all references use the built-in
request vectors inside of each resource. The request map was dynamically allocating
a request per instruction. The request vector just allocates N number of requests
during instantiation and then the surrounding code is fixed up to reuse those N requests
***
setRequest() and clearRequest() are the new accessors needed to define a new
request in a resource
2011-02-18 14:28:30 -05:00
Korey Sewell
c883729025 inorder: add valid bit for resource requests
this will allow us to reuse resource requests within a resource instead
of always dynamically allocating
2011-02-18 14:28:22 -05:00
Korey Sewell
ff48afcf4f inorder: remove reqRemoveList
we are going to be getting away from creating new resource requests for every
instruction so no more need to keep track of a reqRemoveList and clean it up
every tick
2011-02-18 14:28:10 -05:00
Korey Sewell
991d0185c6 inorder: initialize res. req. vectors based on resource bandwidth
first change in an optimization that will stop InOrder from allocating new memory for every instruction's
request to a resource. This gets expensive since every instruction needs to access ~10 requests before
graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's
bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline
since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions
contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model
and increased simulation performance
2011-02-18 14:27:52 -05:00
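The preallocation plan described in 991d0185c6 can be sketched as a fixed pool of request objects that are marked valid and reused. This is an illustrative Python model, not the InOrder implementation: the class names and the `set_request` / `clear_request` methods are hypothetical, loosely named after the setRequest() and clearRequest() accessors and the valid bit mentioned in the later commits above.

```python
class ResourceRequest:
    """One reusable request slot (hypothetical model)."""

    def __init__(self):
        self.valid = False
        self.inst = None


class Resource:
    """Allocates 'bandwidth' request slots once and recycles them."""

    def __init__(self, bandwidth):
        self.reqs = [ResourceRequest() for _ in range(bandwidth)]

    def set_request(self, inst):
        # Claim the first free slot instead of allocating a new object.
        for req in self.reqs:
            if not req.valid:
                req.valid = True
                req.inst = inst
                return req
        return None  # bandwidth exhausted for this cycle

    def clear_request(self, req):
        # Release the slot so a later instruction can reuse it.
        req.valid = False
        req.inst = None
```

With a bandwidth of 3, a fourth request in the same cycle gets no slot, and a cleared slot is handed to the next instruction rather than a freshly allocated object.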
Gabe Black
fde8b5c387 X86: Get rid of "inline" on the MicroPanic constructor in decoder.cc.
This was making certain versions of gcc omit the function from the object file
which would break the build.
2011-02-15 15:58:16 -08:00
Gabe Black
989138970e Info: Clean up some info files.
Get rid of RELEASE_NOTES since we no longer do releases, update some of the
information in README, and update the date in LICENSE.
2011-02-14 21:36:37 -08:00
Nilay Vaish
343e94a257 Ruby: Improve Change PerfectSwitch's wakeup function
Currently the wakeup function for the PerfectSwitch contains three loops -

loop on number of virtual networks
  loop on number of incoming links
    loop till all messages for this (link, network) have been routed

With an 8 processor mesh network and Hammer protocol, about 11-12% of the
total execution time was observed to have been spent in this function, which is the highest
amongst all the functions. It was found that the innermost loop is executed
about 45 times per invocation of the wakeup function, when each invocation
of the wakeup function processes just about one message.

The patch tries to do away with the redundant executions of the innermost
loop. Counters have been added for each virtual network that record the
number of messages that need to be routed for that virtual network. The
inner loops are only executed when the number of messages for that particular
virtual network > 0. This does away with almost 80% of the executions of the
innermost loop. The function now consumes about 5-6% of the total execution
time.
2011-02-14 16:14:54 -06:00
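The counter scheme described in 343e94a257 can be sketched roughly as follows. This is a simplified Python model, not the Ruby PerfectSwitch code: the class, its fields, and the `inner_loop_runs` instrumentation are all hypothetical and exist only to show how a per-virtual-network counter lets wakeup skip empty networks.

```python
class PerfectSwitchModel:
    """Toy model: one message queue per (virtual network, incoming link)."""

    def __init__(self, num_vnets, num_links):
        self.queues = [[[] for _ in range(num_links)]
                       for _ in range(num_vnets)]
        self.pending = [0] * num_vnets  # messages waiting per virtual network
        self.inner_loop_runs = 0        # counts visits to the innermost level

    def enqueue(self, vnet, link, msg):
        self.queues[vnet][link].append(msg)
        self.pending[vnet] += 1

    def wakeup(self):
        routed = []
        for vnet, links in enumerate(self.queues):
            if self.pending[vnet] == 0:
                continue  # nothing to route on this virtual network
            for queue in links:
                self.inner_loop_runs += 1
                while queue:
                    routed.append(queue.pop(0))
                    self.pending[vnet] -= 1
        return routed
```

In this model, a single pending message on a 4-network, 8-link switch costs 8 visits to the innermost level instead of 32.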
Gabe Black
77b4a37067 X86: Detect branches taking into account instruction size.
The size of the current instruction determines what the npc should be if
there's no branching.
2011-02-13 17:45:47 -08:00
Gabe Black
bce2be525d X86: Put the result used for flags in an intermediate variable.
Using the destination register directly causes the ISA parser to treat it as a
source even if none of the original bits are used.
2011-02-13 17:45:12 -08:00
Gabe Black
4e1adf85f7 X86: Don't read in dest regs if all bits are replaced.
In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or
64 bits wide overwrite all bits of the destination register. This change
removes false dependencies in these cases where the previous value of a
register doesn't need to be read to write a new value. New versions of most
microops are created that have a "Big" suffix which simply overwrite their
destination, and the right version to use is selected during microop
allocation based on the selected data size.

This does not change the performance of the O3 CPU model significantly, I
assume because there are other false dependencies from the condition code bits
in the flags register.
2011-02-13 17:44:24 -08:00
Gabe Black
399e095510 X86: On a bad micropc, return a microop that returns a fault that panics.
This way a bad micropc will have to get all the way to commit before killing
the simulation. This accounts for misspeculated branches.
2011-02-13 17:42:56 -08:00
Gabe Black
1aa9698fa0 X86: Define fault objects to carry debug messages.
These faults can panic/warn/warn_once, etc., instead of instructions doing
that themselves directly. That way, instructions can be speculatively
executed, and only if they're actually going to commit will their fault be
invoked and the panic, etc., happen.
2011-02-13 17:42:05 -08:00
Gabe Black
5ee94f4a3d X86: Only reset npc to reflect instruction length once.
When redirecting fetch to handle branches, the npc of the current pc state
needs to be left alone. This change makes the pc state record whether or not
the npc already reflects a real value by making it keep track of the current
instruction size, or if no size has been set.
2011-02-13 17:41:10 -08:00
Gabe Black
f036fd9748 O3: Fetch from the microcode ROM when needed. 2011-02-13 17:40:07 -08:00
Ali Saidi
7c763b34c9 O3: Fix GCC 4.2.4 complaint 2011-02-13 16:51:15 -05:00
Nilay Vaish
0cede15d6c Ruby: Reorder Cache Lookup in Protocol Files
The patch changes the order in which L1 dcache and icache are looked up when
a request comes in. Earlier, if a request came in for instruction fetch, the
dcache was looked up before the icache, to correctly handle self-modifying
code. But, in the common case, dcache is going to report a miss and the
subsequent icache lookup is going to report a hit. Given the invariant -
caches under the same controller keep track of disjoint sets of cache blocks,
we can move the icache lookup before the dcache lookup. In case of a hit in
the icache, using our invariant, we know that the dcache would have reported
a miss. In case of a miss in the icache, we know that icache would have
missed even if the dcache was looked up before looking up the icache.
Effectively, we are doing the same thing as before, though in the common case,
we expect reduction in the number of lookups. This was empirically confirmed
for MOESI hammer. The ratio of lookups to access requests is now about 1.1 to 1.
2011-02-12 11:41:20 -06:00
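The reordering argument in 0cede15d6c can be sketched with a small model. This is illustrative Python, not SLICC protocol code: the function name is hypothetical and each cache is modeled as a plain set of block addresses, relying on the stated invariant that the two caches hold disjoint sets of blocks.

```python
def lookup_ifetch(addr, icache, dcache):
    """Return (where the block was found, number of lookups performed)."""
    # Invariant from the commit message: a block is never in both caches.
    assert not (addr in icache and addr in dcache)
    if addr in icache:
        return "icache", 1   # common case: a single lookup
    if addr in dcache:
        return "dcache", 2   # block currently held by the dcache
    return "miss", 2
```

An instruction fetch that hits in the icache now needs one lookup; the old order (dcache first) would have needed two for the same outcome.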
Korey Sewell
470aa289da inorder: clean up the old way of inst. scheduling
remove remnants of old way of instruction scheduling which dynamically allocated
a new resource schedule for every instruction
2011-02-12 10:14:48 -05:00
Korey Sewell
e26aee514d inorder: utilize cached skeds in pipeline
allow the pipeline and resources to use the cached instruction schedule and resource
sked iterator
2011-02-12 10:14:45 -05:00
Korey Sewell
516b611462 inorder: define iterator for resource schedules
resource skeds are divided into two parts: front end (all insts) and back end (inst. specific)
each of those are implemented as separate lists, so this iterator wraps around
the traditional list iterator so that an instruction can walk its schedule but seamlessly
transfer from front end to back end when necessary
2011-02-12 10:14:43 -05:00
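The wrapping iterator described in 516b611462 can be sketched as a single iterator over two separate lists. This is a Python illustration under stated assumptions, not the InOrder class: `ScheduleIterator` and the stage names in the usage note are hypothetical.

```python
from itertools import chain


class ScheduleIterator:
    """Walk the front-end list, then continue into the back-end list."""

    def __init__(self, front_end, back_end):
        # chain() exhausts the first list before moving on to the second.
        self._it = chain(front_end, back_end)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)
```

For example, `list(ScheduleIterator(["fetch", "decode"], ["execute", "writeback"]))` yields the four entries in order, without the caller knowing where the front end stops.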
Korey Sewell
ec9b2ec251 inorder: stage scheduler for front/back end schedule creation
add a stage scheduler class to replace InstStage in pipeline_traits.cc
use that class to define a default front-end, resource schedule that all
instructions will follow. This will also replace the back end schedule in
pipeline_traits.cc. The reason for adding this is so that we can cache
instruction schedules in the future instead of calling the same function
over/over again as well as constantly dynamically allocating memory on
every instruction to try to figure out its schedule
2011-02-12 10:14:40 -05:00
Korey Sewell
6713dbfe08 inorder: cache instruction schedules
first step in an optimization to not dynamically allocate an instruction schedule
for every instruction but rather use cached schedules
2011-02-12 10:14:36 -05:00
Korey Sewell
af67631790 inorder: comments for resource sked class 2011-02-12 10:14:34 -05:00
Korey Sewell
800e93f358 inorder: remove unused file
inst_buffer file isn't used, so remove it
2011-02-12 10:14:32 -05:00
Korey Sewell
e65c15e931 inorder: remove unused isa ops
pass/fail ops were used for testing but aren't part of isa
2011-02-12 10:14:26 -05:00
Ali Saidi
d4df9e763c VNC/ARM: Use VNC server and add support to boot into X11 2011-02-11 18:29:36 -06:00
Ali Saidi
d33c1d9592 VNC: Add VNC server to M5 2011-02-11 18:29:35 -06:00
Ali Saidi
ded4d319f2 Serialization: Allow serialization of stl lists 2011-02-11 18:29:35 -06:00
Giacomo Gabrielli
a05032f4df O3: Fix pipeline restart when a table walk completes in the fetch stage.
When a table walk is initiated by the fetch stage, the CPU can
potentially move to the idle state and never wake up.

The fetch stage must call cpu->wakeCPU() when a translation completes
(in finishTranslation()).
2011-02-11 18:29:35 -06:00
Giacomo Gabrielli
74eff1b71b O3: Fix a few bugs in the TableWalker object.
Uncacheable requests were set as such only in atomic mode.
currState->delayed is checked in place of currState->timing for resetting
currState in atomic mode.
2011-02-11 18:29:35 -06:00
Ali Saidi
1411cb0b0f SimpleCPU: Fix a case where a DTLB fault redirects fetch and an I-side walk occurs.
This change fixes an issue where a DTLB fault occurs and redirects fetch to
handle the fault and the ITLB requires a walk which delays translation. In this
case the status of the cpu isn't updated appropriately, and an additional
instruction fetch occurs. Eventually this hits an assert as multiple instruction
fetches are occurring in the system and when the second one returns the
processor is in the wrong state.

Some asserts below are removed because the condition was always true (typo)
and because, after initiateAcc(), the processor could be in any valid state
when a d-side fault occurs.
2011-02-11 18:29:35 -06:00
Giacomo Gabrielli
e2507407b1 O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) rely on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept in the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.
2011-02-11 18:29:35 -06:00
Ali Saidi
453dbc772d ARM: Fix timer calculations.
The timer calculations were a bit off so time would run faster than
it otherwise should
2011-02-11 18:29:35 -06:00
Ali Saidi
59bf0e7eb4 Timesync: Make sure timesync event is setup after curTick is unserialized
Setup initial timesync event in initState or loadState so that curTick has
been updated to the new value, otherwise the event is scheduled in the past.
2011-02-11 18:29:35 -06:00
Brad Beckmann
fbebe9a642 MOESI_hammer: fixed wakeup for SS->S transition 2011-02-10 13:28:23 -08:00
Brad Beckmann
06dfee5cea ruby: removed duplicate make response call 2011-02-09 16:02:09 -08:00
Nilay Vaish
488280e48b MESI CMP: Unset TBE pointer in L2 cache controller
The TBE pointer in the MESI CMP implementation was not being set to NULL
when the TBE is deallocated. This resulted in a segmentation fault when testing
the protocol when the ProtocolTrace was switched on.
2011-02-08 07:47:02 -06:00
Tim Harris
44e5e7e053 X86: Obey the wp bit of CR0.
If cr0.wp ("write protect" bit) is clear then do not generate page faults when
writing to write-protected pages in kernel mode.
2011-02-07 15:18:52 -08:00
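The write-protect rule referenced in 44e5e7e053 can be sketched as a small predicate. This is a simplified Python model, not the M5 TLB code: the function and parameter names are hypothetical, and it deliberately ignores the present bit, user/supervisor page permissions, and everything else that can also raise a page fault.

```python
def write_causes_page_fault(page_writable, user_mode, cr0_wp):
    """Decide whether a write to a present page faults (simplified)."""
    if page_writable:
        return False
    if user_mode:
        return True    # user writes to read-only pages always fault
    return cr0_wp      # kernel writes fault only when cr0.wp is set
```

The case the commit addresses is the last line: with cr0.wp clear, a kernel-mode write to a write-protected page returns False here, i.e. no page fault.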
Tim Harris
6da83b8a1b X86: Use all 64 bits of the lstar register in the SYSCALL_64 macroop.
During SYSCALL_64, use dataSize=8 when handling new rip (ref
http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8 IA32_LSTAR is a 64-bit
address)
2011-02-07 15:16:27 -08:00
Tim Harris
2ea1aa8a4f X86: Fix JMP_FAR_I to unpack a far pointer correctly.
JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like
it should, and also putting the components in the wrong registers for use by
other microcode.
2011-02-07 15:12:59 -08:00
Tim Harris
5810ab121c X86: Read the LDT/GDT at CPL0 when executing an iret.
During iret access LDT/GDT at CPL0 rather than after transition to user mode
(if I'm reading the Intel 64 and IA-32 architecture spec correctly, the contents of
the descriptor table are read before the CPL is updated).
2011-02-07 15:05:28 -08:00
Nilay Vaish
10b4b364d9 Orion: Replace printf() with fatal()
The code for Orion 2.0 makes use of printf() at several places where there is
an error in configuration of the model. These have been replaced with fatal().
2011-02-07 12:42:23 -06:00
Korey Sewell
1b4e788407 ruby: add stdio header in SRAM.hh
missing header file caused RUBY_FS to not compile
2011-02-07 12:19:46 -05:00
Gabe Black
0c4b816d84 X86: Fix compiling vtophys.cc 2011-02-07 01:21:21 -08:00
Brad Beckmann
f5aa75fdc5 ruby: support to stallAndWait the mandatory queue
By stalling and waiting the mandatory queue instead of recycling it, one can
ensure that no incoming messages are starved when the mandatory queue puts
significant pressure on the L1 cache controller (i.e. the ruby memtester).

--HG--
rename : src/mem/slicc/ast/WakeUpDependentsStatementAST.py => src/mem/slicc/ast/WakeUpAllDependentsStatementAST.py
2011-02-06 22:14:19 -08:00
Brad Beckmann
194a137498 ruby: minor fix to deadlock panic message 2011-02-06 22:14:19 -08:00
Joel Hestness
ebe563e531 garnet: Split network power in ruby.stats
Split out dynamic and static power numbers for printing to ruby.stats
2011-02-06 22:14:19 -08:00
Brad Beckmann
5c2f4937b3 MOESI_hammer: fixed dir bug counting received acks 2011-02-06 22:14:19 -08:00
Brad Beckmann
7edab47448 ruby: numa bit fix for sparse memory 2011-02-06 22:14:19 -08:00
Tushar Krishna
4fa690e8ff MOESI_CMP_token: removed unused message fields 2011-02-06 22:14:19 -08:00
Brad Beckmann
273e3d4924 mem: Added support for Null data packet
The packet now identifies whether static or dynamic data has been allocated and
is used by Ruby to determine whether to copy the data pointer into the ruby
request.  Subsequently, Ruby can be told not to update phys memory when
receiving packets.
2011-02-06 22:14:19 -08:00
Brad Beckmann
dfa8cbeb06 m5: added work completed monitoring support 2011-02-06 22:14:19 -08:00
Brad Beckmann
c41fc138e7 dev: fixed bugs to extend interrupt capability beyond 15 cores 2011-02-06 22:14:18 -08:00
Joel Hestness
3a2d2223e1 x86: Timing support for pagetable walker
Move page table walker state to its own object type, and make the
walker instantiate state for each outstanding walk. By storing the
states in a queue, the walker is able to handle multiple outstanding
timing requests. Note that functional walks use separate state
elements.
2011-02-06 22:14:18 -08:00
Joel Hestness
52b6119228 TimingSimpleCPU: split data sender state fix
In sendSplitData, keep a pointer to the senderState that may be updated after
the call to handle*Packet. This way, if the receiver updates the packet
senderState, it can still be accessed in sendSplitData.
2011-02-06 22:14:18 -08:00
Brad Beckmann
2da54d1285 ruby: Fix RubyPort to properly handle retries 2011-02-06 22:14:18 -08:00
Joel Hestness
dedb4fbf05 Ruby: Fix to return cache block size to CPU for split data transfers 2011-02-06 22:14:18 -08:00
Joel Hestness
82844618fd Ruby: Add support for locked memory accesses in X86_FS 2011-02-06 22:14:18 -08:00
Joel Hestness
16c1edebd0 Ruby: Update the Ruby request type names for LL/SC 2011-02-06 22:14:18 -08:00
Brad Beckmann
9782ca5def ruby: Assert for x86 misaligned access
This patch ensures only aligned accesses are passed to ruby and includes a fix
to the DPRINTF address print.
2011-02-06 22:14:18 -08:00
Brad Beckmann
1b54344aeb MOESI_hammer: Added full-bit directory support 2011-02-06 22:14:18 -08:00
Joel Hestness
62e05ed78a x86: Add checkpointing capability to devices
Add checkpointing capability to the Intel 8254 timer, CMOS, I8042,
PS2 Keyboard and Mouse, I82094AA, I8237, I8254, I8259, and speaker
devices
2011-02-06 22:14:18 -08:00
Joel Hestness
911ccef6c0 x86: Add checkpointing capability to arch components
Add checkpointing capability to the x86 interrupt device and the TLBs
2011-02-06 22:14:17 -08:00
Joel Hestness
38140b5519 x86: implements vtophys
Calls walker to look up virt. to phys. page mapping
2011-02-06 22:14:17 -08:00
Joel Hestness
eea78f968b IntDev: packet latency fix
The x86 local apic now includes a separate latency parameter for interrupts.
2011-02-06 22:14:17 -08:00
Joel Hestness
d9f0a8288e MessagePort: implement the virtual recvTiming function to avoid double pkt delete
Double packet delete problem is due to an interrupt device deleting a packet that the SimpleTimingPort also deletes. Since MessagePort descends from SimpleTimingPort, simply reimplement the failing code from SimpleTimingPort: recvTiming.
2011-02-06 22:14:17 -08:00
Joel Hestness
02b05bf9be MOESI_hammer: trigger queue fix. 2011-02-06 22:14:17 -08:00
Joel Hestness
b4c10bd680 mcpat: Adds McPAT performance counters
Updated patches from Rick Strong's set that modify performance counters for
McPAT
2011-02-06 22:14:17 -08:00
Tushar Krishna
a679e732ce garnet: added orion2.0 for network power calculation 2011-02-06 22:14:17 -08:00
Tushar Krishna
59163f824c garnet: separate data and ctrl VCs
Separate data VCs and ctrl VCs in garnet, as ctrl VCs have 1 buffer per VC,
while data VCs have > 1 buffers per VC. This is for correct power estimations.
2011-02-06 22:14:16 -08:00
Brad Beckmann
afd754dc0d x86: set IsCondControl flag for the appropriate microops 2011-02-06 22:14:16 -08:00
Gabe Black
aa62c217c5 Fault: Forgot to refresh to grab these header guard updates. 2011-02-03 22:07:34 -08:00
Korey Sewell
e396a34b01 inorder: fault handling
Maintain all information about an instruction's fault in the DynInst object rather
than any cpu-request object. Also, if there is a fault during the execution stage
then just save the fault inside the instruction and trap once the instruction
tries to graduate
2011-02-04 00:09:20 -05:00
Korey Sewell
e57613588b inorder: pcstate and delay slots bug
not taken delay slots were not being advanced correctly to pc+8, so for those ISAs
we 'advance()' the pcstate one more time for the desired effect
2011-02-04 00:09:19 -05:00
Korey Sewell
68d962f8af inorder: add a fetch buffer to fetch unit
Give fetch unit its own parameterizable fetch buffer to read from. Very inefficient
(architecturally and in simulation) to continually fetch at the granularity of the
wordsize. As expected, the number of fetch memory requests drops dramatically
2011-02-04 00:08:22 -05:00
Korey Sewell
56ce8acd41 inorder: overload find-req fn
no need to have separate function name findSplitRequest, just overload the function
2011-02-04 00:08:21 -05:00
Korey Sewell
ab3d37d398 inorder: implement separate fetch unit
instead of having one cache-unit class be responsible for both data and code
accesses, separate code that is just for fetch into its own derived class off the
original base class. This makes the code easier to manage as well as handle
future cases of special fetch handling
2011-02-04 00:08:20 -05:00
Korey Sewell
f80508de65 inorder: cache port blocking
set the request to false when the cache port blocks so we don't deadlock.
also, comment out the outstanding address list sanity check for now.
2011-02-04 00:08:19 -05:00
Korey Sewell
0c6a679359 inorder: stage width as a python parameter
allow the user to specify how many instructions a pipeline stage can process
on any given cycle (stageWidth, i.e. bandwidth) by setting the parameter through
the python interface rather than compile the code after changing the *.cc file.
(we always had the parameter there, but still used the static 'ThePipeline::StageWidth'
instead)
-
Since StageWidth is now dynamically defined, change the interstage communication
structure to use a vector and get rid of array and array handling index (toNextStageIndex)
since we can just make calls to the list for the same information
2011-02-04 00:08:18 -05:00
Korey Sewell
8ac717ef4c inorder: multi-issue branch resolution
Only execute (resolve) one branch per cycle because handling more than one is
a little more complicated
2011-02-04 00:08:17 -05:00
Korey Sewell
be17617990 inorder: pipe. stage inst. buffering
use skidbuffer as only location for instructions between stages. before,
we had the insts queue from the prior stage and the skidbuffer for the
current stage, but that gets confusing and this consolidation helps
when handling squash cases
2011-02-04 00:08:16 -05:00
Korey Sewell
050944dd73 inorder: change skidBuffer to list instead of queue
manage insertion and deletion like a queue but will need
access to internal elements for future changes
Currently, skidbuffer manages any instruction that was
in a stage but could not complete processing, however
we will want to manage all blocked instructions (from prev stage
and from cur. stage) in just one buffer.
2011-02-04 00:08:15 -05:00
Korey Sewell
7f937e11e2 inorder: activity tracking bug
Previous code was marking CPU activity on almost every cycle due to a bug in
tracking the status of pipeline stages. This prevents the CPU from sleeping
on long latency stalls and increases simulation time
2011-02-04 00:08:13 -05:00
Gabe Black
091a3e6cc0 Fault: Rename sim/fault.hh to fault_fwd.hh to distinguish it from faults.hh.
--HG--
rename : src/sim/fault.hh => src/sim/fault_fwd.hh
2011-02-03 21:47:58 -08:00
Gabe Black
00f24ae92c Config: Keep track of uncached and cached ports separately.
This makes sure that the address ranges requested for caches and uncached ports
don't conflict with each other, and that accesses which are always uncached
(message signaled interrupts for instance) don't waste time passing through
caches.
2011-02-03 20:23:00 -08:00
Gabe Black
869a046e41 O3: Fix a style bug in O3. 2011-02-02 23:34:14 -08:00
Gabe Black
cb22bead7d X86: Get rid of the stupd microop. 2011-02-02 19:57:12 -08:00
Gabe Black
eabbdbee63 X86: Replace the stupd microop with a store/update sequence. 2011-02-02 19:56:38 -08:00
Gabe Black
75d34c14fc Time: Add serialization functions to the Time class. 2011-02-02 18:05:03 -08:00
Gabe Black
119f5f8e94 X86: Add L1 caches for the TLB walkers.
Small L1 caches are connected to the TLB walkers when caches are used. This
allows them to participate in the coherence protocol properly.
2011-02-01 18:28:41 -08:00
Gabe Black
4b4cd0303e Fault: Move the definition of NoFault from faults.hh to fault.hh.
Moving the definition of NoFault into fault.hh doesn't bring any new
dependencies with it, and allows some files to include just fault.hh which has
less baggage. NoFault will still be available to everything that includes
faults.hh because it includes fault.hh.
2011-01-31 13:13:00 -08:00
Nathan Binkert
048b1e5843 refcnt: Change things around so that we handle constness correctly.
To use a non const pointer:
typedef RefCountingPtr<Foo> FooPtr;

To use a const pointer:
typedef RefCountingPtr<const Foo> ConstFooPtr;
2011-01-22 21:48:06 -08:00
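A minimal sketch of why the two typedefs in the entry above behave differently. The `RefCounted` and `RefCountingPtr` classes below are simplified, hypothetical stand-ins written for this example, not the simulator's actual implementation; the key assumption is that the reference count is `mutable`, so it can be updated through a pointer to const.

```cpp
#include <utility>

// Hypothetical base class: the counter is mutable so that incref/decref
// can be called on const objects.
class RefCounted {
  public:
    RefCounted() : count(0) {}
    RefCounted(const RefCounted &) : count(0) {}
    RefCounted &operator=(const RefCounted &) { return *this; }
    virtual ~RefCounted() {}

    void incref() const { ++count; }
    void decref() const { if (--count == 0) delete this; }
    int refs() const { return count; }

  private:
    mutable int count;
};

// Hypothetical smart pointer: T may be "Foo" or "const Foo".
template <class T>
class RefCountingPtr {
  public:
    RefCountingPtr() : data(nullptr) {}
    RefCountingPtr(T *p) : data(p) { if (data) data->incref(); }
    RefCountingPtr(const RefCountingPtr &o) : data(o.data) {
        if (data) data->incref();
    }
    ~RefCountingPtr() { if (data) data->decref(); }

    // Copy-and-swap assignment: the by-value parameter holds the old
    // pointer after the swap and releases it when it goes out of scope.
    RefCountingPtr &operator=(RefCountingPtr o) {
        std::swap(data, o.data);
        return *this;
    }

    T *operator->() const { return data; }
    T &operator*() const { return *data; }
    T *get() const { return data; }

  private:
    T *data;
};

struct Foo : public RefCounted {
    int value = 0;
    int read() const { return value; }
    void write(int v) { value = v; }
};

typedef RefCountingPtr<Foo> FooPtr;             // may call write()
typedef RefCountingPtr<const Foo> ConstFooPtr;  // read() only; write() will not compile
```

With these definitions, `ConstFooPtr c(p.get()); c->read();` compiles, while `c->write(1);` is rejected by the compiler because `operator->` returns a `const Foo *`.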
Steve Reinhardt
5c99ae60b8 checkpointing: fix bug from curTick accessor conversion.
Regex replacement of curTick with curTick() accidentally
changed checkpoint key string for serialization but not
for unserialization.
2011-01-20 22:13:33 -08:00
Gabe Black
ddeaf1252f TimeSync: Use the new setTick and getTick functions. 2011-01-19 16:22:23 -08:00
Gabe Black
23bab6783b Time: Add setTick and getTick functions to the Time class. 2011-01-19 16:22:15 -08:00
Gabe Black
a368fba7d4 Time: Add a mechanism to prevent M5 from running faster than real time.
M5 skips over any simulated time where it doesn't have any work to do. When
the simulation is active, the time skipped is short and the work done at any
point in time is relatively substantial. If the time between events is long
and/or the work to do at each event is small, it's possible for simulated time
to pass faster than real time. When running a benchmark that can be good
because it means the simulation will finish sooner in real time. When
interacting with the real world through, for instance, a serial terminal or
bridge to a real network, this can be a problem. Human or network response time
could be greatly exaggerated from the perspective of the simulation and make
simulated events happen "too soon" from an external perspective.

This change adds the capability to force the simulation to run no faster than
real time. It does so by scheduling a periodic event that checks to see if
its simulated period is shorter than its real period. If it is, it stalls the
simulation until they're equal. This is called time syncing.

A future change could add pseudo instructions which turn time syncing on and
off from within the simulation. That would allow time syncing to be used for
the interactive parts of a session but then turned off when running a
benchmark using the m5 utility program inside a script. Time syncing would
probably not happen anyway while running a benchmark because there would be
plenty of work for M5 to do, but the event overhead could be avoided.
2011-01-19 11:48:00 -08:00
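The periodic check described in the entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `stallNeeded` and `TimeSyncer` are hypothetical names, and the sketch reads the description as "stall whenever less wall-clock time has elapsed than the simulated period the event covers".

```cpp
#include <chrono>
#include <thread>

using Nanos = std::chrono::nanoseconds;
using Clock = std::chrono::steady_clock;

// How long to stall so that real time catches up with simulated time.
// Returns zero when the simulation is already at or behind real time.
Nanos stallNeeded(Nanos simulatedPeriod, Nanos realPeriod)
{
    if (realPeriod < simulatedPeriod)
        return simulatedPeriod - realPeriod;
    return Nanos::zero();
}

// Hypothetical periodic event: sync() is assumed to be called once per
// simulated period of length 'period'.
class TimeSyncer {
  public:
    explicit TimeSyncer(Nanos p) : period(p), last(Clock::now()) {}

    // Stalls the calling thread if the simulation ran ahead of real time
    // and returns the stall that was requested.
    Nanos sync() {
        Nanos real = std::chrono::duration_cast<Nanos>(Clock::now() - last);
        Nanos stall = stallNeeded(period, real);
        if (stall > Nanos::zero())
            std::this_thread::sleep_for(stall);
        last = Clock::now();
        return stall;
    }

  private:
    Nanos period;
    Clock::time_point last;
};
```

When the simulator has plenty of work per period, the real period exceeds the simulated one, `stallNeeded` returns zero, and the only cost is the event itself, which matches the remark above about benchmarks.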
Matt Horsnell
77853b9f52 O3: Fix itstate prediction and recovery.
Any change of control flow now resets the itstate to 0 mask and 0 condition,
except where the control flow alteration writes into the CPSR register. These
cases, for example a return from an interrupt, require the predecoder to recover
the itstate.

As there is a window of opportunity between the return from an interrupt
changing the control flow at the head of the pipe and the commit of the update
to the CPSR, the predecoder needs to be able to grab the ITstate early. This
is now handled by setting the forcedItState inside a PCstate for the control
flow altering instruction.

That instruction will have the correct mask/cond, but will not have a valid
itstate until advancePC is called (note this happens to advance the execution).
When the new PCstate is copy constructed it gets the itstate cond/mask, and
upon advancing the PC the itstate becomes valid.

Subsequent advancing invalidates the state and zeroes the cond/mask. This is
handled in isolation for the ARM ISA and should have no impact on other ISAs.

Refer to arch/arm/types.hh and arch/arm/predecoder.cc for the details.
2011-01-18 16:30:05 -06:00
Matt Horsnell
b13a79ee71 O3: Fix some variable length instruction issues with the O3 CPU and ARM ISA. 2011-01-18 16:30:05 -06:00
Matt Horsnell
c98df6f8c2 O3: Don't test misprediction on load instructions until executed. 2011-01-18 16:30:05 -06:00
Ali Saidi
1167ef19cf O3: Keep around the last committed instruction and use for squashing.
Without this change, 0 is always used for the youngest sequence number if
a squash occurred and the ROB was empty (e.g. an instruction is marked
serializeAfter or a fetch stall prevents other instructions from issuing).
Using 0, there is a race to rename where an instruction that committed in the
same cycle as the squashing instruction can have its renamed state undone
by the squash using sequence number 0.
2011-01-18 16:30:05 -06:00
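The idea in the entry above, falling back to the last committed sequence number instead of 0 when the ROB is empty, can be sketched as follows. The `Rob` class and its members are hypothetical and greatly simplified; they are assumptions for illustration, not the O3 CPU's data structures.

```cpp
#include <cstdint>
#include <deque>

typedef std::uint64_t InstSeqNum;

// Hypothetical reorder buffer that only tracks sequence numbers.
class Rob {
  public:
    Rob() : lastCommitted(0) {}

    void insert(InstSeqNum seq) { entries.push_back(seq); }

    // Commit the oldest entry and remember its sequence number.
    // The caller must ensure the ROB is not empty.
    void commitHead() {
        lastCommitted = entries.front();
        entries.pop_front();
    }

    // Sequence number to squash from: the youngest in-flight instruction,
    // or the last committed one when nothing is in flight. Returning 0 here
    // would tell later stages to undo everything younger than 0, which
    // includes the instruction that has just committed.
    InstSeqNum youngestSeqNum() const {
        return entries.empty() ? lastCommitted : entries.back();
    }

    bool empty() const { return entries.empty(); }

  private:
    std::deque<InstSeqNum> entries;
    InstSeqNum lastCommitted;
};
```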
Ali Saidi
ea058b14da O3: Don't try to scoreboard misc registers.
I'm not positive this is the correct fix, but it's working right now.
Either we need to do something like this, prevent the misc reg from being renamed at all,
or there is something else going on. We need to find the root cause as to why
this is only a problem sometimes.
2011-01-18 16:30:05 -06:00
Matt Horsnell
adbd84ab9f ARM: The ARM decoder should not panic when decoding undefined holes in arch.
This can abort simulations when the fetch unit runs ahead and speculatively
decodes instructions that are off the execution path.
2011-01-18 16:30:05 -06:00
Matt Horsnell
11bef2ab38 O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf. 2011-01-18 16:30:05 -06:00
Matt Horsnell
62f2097917 O3: Fix mispredicts from non control instructions.
The squash inside the fetch unit should not attempt to remove them from the
branch predictor as non-control instructions are not pushed into the predictor.
2011-01-18 16:30:05 -06:00
Matt Horsnell
5ebf3b2808 O3: Fixes the way prefetches are handled inside the iew unit.
This patch prevents the prefetch being added to the instCommit queue twice.
2011-01-18 16:30:02 -06:00
Ali Saidi
ee9a331fe5 O3: Support timing translations for O3 CPU fetch. 2011-01-18 16:30:02 -06:00
Ali Saidi
0f9a3671b6 ARM: Add support for moving predicated false dest operands from sources. 2011-01-18 16:30:02 -06:00
Min Kyu Jeong
96375409ea O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.
When this condition occurs the cpu should restart the fetch stage to fetch from
the original execution path. Fault handling in the commit stage is cleaned up a
little bit so the control flow is simpler. Finally, if an instruction is being
used to carry a fault it isn't executed, so the fault propagates appropriately.
2011-01-18 16:30:01 -06:00