When each load or store is sent to the LSQ, we check whether it will cross a
cache line boundary and, if so, split it in two. This creates two TLB
translations and two memory requests. Care has to be taken if the first
packet of a split load is sent but the second blocks the cache. Similarly,
for a store, if the first packet cannot be sent, we must store the second
one somewhere to retry later.
This modifies the LSQSenderState class to record both packets in a split
load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA
to indicate whether unaligned memory accesses are allowed. This is used
throughout the changed code so that compiler can optimise away code dealing
with split requests for ISAs that don't need them.
When enabled, faulting instructions appear in the trace twice
(once when they fault and again when they're re-executed).
This flag is set by the Exec compound flag for backwards compatibility.
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py
Changes so that InOrder can work for a non-delay-slot ISA like Alpha. Typically, changes have to do with handling misspeculated branches at different points in pipeline
For some reason o3 FS init() only called initCPU if the thread state
was Suspended, which was no longer the case. There's no apparent
reason to check, so I whacked the test completely rather than
changing the check to Halted.
The inorder init() was also updated to be symmetric, though the
previous code was just a fancy no-op.
This situation can arise now on the first fetch cycle after
the last active thread is halted. It seems easy enough to
deal with when it happens rather than trying to avoid it.
This provides a common initial status for all threads independent
of CPU model (unlike the prior situation where CPUs initialized
threads to inconsistent states).
This mostly matters for SE mode; in FS mode, ISA-specific startupCPU()
methods generally handle boot-time initialization of thread contexts
(since the right thing to do is ISA-dependent).
Basically merge it in with Halted.
Also had to get rid of a few other functions that
called ThreadContext::deallocate(), including:
- InOrderCPU's setThreadRescheduleCondition.
- ThreadContext::exit(). This function was there to avoid terminating
simulation when one thread out of a multi-thread workload exits, but we
need to find a better (non-cpu-centric) way.
Make interrupts use the new wakeup method, and pull all of the interrupt
stuff into the cpu base class so that only the wakeup code needs to be updated.
I tried to make wakeup, wakeCPU, and the various other mechanisms for waking
and sleeping a little more sane, but I couldn't understand why the statistics
were changing the way they were. Maybe we'll try again some day.
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.
across the subclasses. generally make it so that member data is _cpuId and
accessor functions are cpuId(). The ID val comes from the python (default -1 if
none provided), and if it is -1, the index of cpuList will be given. this has
passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard
switch.
the instruction after the hwrei to be fetched before the ITB/DTB_CM register is updated in a call pal
call sys and thus the translation fails because the user is attempting to access a super page address.
Minimally, it seems as though some sort of fetch stall or refetch after a hwrei is required. I think
this works currently because the hwrei uses the exec context interface, and the o3 stalls when that occurs.
Additionally, these changes don't update the LOCK register and probably break ll/sc. Both o3 changes were
removed since a great deal of manual patching would be required to only remove the hwrei change.
Even though we're not incorrect about operator precedence, let's add
some parens in some particularly confusing places to placate GCC 4.3
so that we don't have to turn the warning off. Agreed that this is a
bit of a pain for those users who get the order of operations correct,
but it is likely to prevent bugs in certain cases.
Fix the logic in the LSQ that determines if there are any stores to
write back. In the commit stage, check for thread specific writebacks
instead of just any writeback.
python type of a latency. In addition, the multiple definitions of profile in the different cpu models caused
problems for intialization of the interval value. If a child class's profile value was defined, the parent
BaseCPU::ProfileEvent interval field would be initialized with a garbage value. The fix was to remove the
multiple redifitions of profile in the child CPU classes.
A whole bunch of stuff has been converted to use the new params stuff, but
the CPU wasn't one of them. While we're at it, make some things a bit
more stylish. Most of the work was done by Gabe, I just cleaned stuff up
a bit more at the end.
This should help if somebody gets to the bug
fix before me (or someone else)...
--HG--
extra : convert_revision : 0ae64c58ef4f7b02996f31e9e9e6bfad344719e2
from the right point (#32 usually) instead of restarting at 0 and double-freeing.
Commented out assert line in free_list.hh that will check for when double-free condition
goes bad.
--HG--
extra : convert_revision : 08d5f9b6a874736e487d101e85c22aaa67bf59ae
SimObjects not yet updated:
- Process and subclasses
- BaseCPU and subclasses
The SimObject(const std::string &name) constructor was removed. Subclasses
that still rely on that behavior must call the parent initializer as
: SimObject(makeParams(name))
--HG--
extra : convert_revision : d6faddde76e7c3361ebdbd0a7b372a40941c12ed
It should be cleared prior to the call to recvRetry.
Add extra DPRINTF statement for clearer debugging output.
--HG--
extra : convert_revision : e2332754743f42d60e159ac89f6fb0fd8b7f57f8
Code was assuming that all argument registers followed in order from ArgumentReg0. There is now an ArgumentReg array which is indexed to find the right index. There is a constant, NumArgumentRegs, which can be used to protect against using an invalid ArgumentReg.
--HG--
extra : convert_revision : f448a3ca4d6adc3fc3323562870f70eec05a8a1f
creation and initialization now happens in python. Parameter objects
are generated and initialized by python. The .ini file is now solely for
debugging purposes and is not used in construction of the objects in any
way.
--HG--
extra : convert_revision : 7e722873e417cb3d696f2e34c35ff488b7bff4ed
into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-micro
src/cpu/o3/fetch_impl.hh:
hand merge
--HG--
extra : convert_revision : 3f71f3ac2035eec8b6f7bceb6906edb4dd09c045
supposed to and make sure parameters have the right type.
Also make sure that any object that should be an intermediate
type has the right options set.
--HG--
extra : convert_revision : d56910628d9a067699827adbc0a26ab629d11e93
into doughnut.hpl.hp.com:/home/gblack/newmem-o3-micro
src/cpu/base_dyn_inst_impl.hh:
src/cpu/o3/fetch_impl.hh:
Hand merge
--HG--
extra : convert_revision : 0c0692033ac30133672d8dfe1f1a27e9d9e95a3d
src/arch/x86/predecoder.cc:
Seperate the pc-pc and the pc of the incoming bytes, and get rid of the "moreBytes" which just takes a MachInst. Also make the "opSize" field describe the number of bytes and not the log of the number of bytes.
--HG--
extra : convert_revision : 3a5ec7053ec69c5cba738a475d8b7fd9e6e6ccc0
src/cpu/o3/alpha/cpu_impl.hh:
Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions.
--HG--
extra : convert_revision : 74f4b1f5fb6f95a56081f367cce7ff44acb5688a
The removed ones were unnecessary. The commented out ones could be useful in the future, should this problem get fixed. See flyspray task #243.
src/cpu/o3/commit_impl.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/rob_impl.hh:
Remove/comment out DPRINTFs that were causing a segfault.
--HG--
extra : convert_revision : b5aeda1c6300dfde5e0a3e9b8c4c5f6fa00b9862
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
Update code so that the O3 CPU can handle not initially having anything hooked up to its ports.
--HG--
extra : convert_revision : 04bcef44e754735d821509ebd69b0ef9c8ef8e2c
into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2
src/cpu/base_dyn_inst.hh:
Hand merge. Line is no longer needed because it's handled in the ISA.
--HG--
extra : convert_revision : 0be4067aa38759a5631c6940f0167d48fde2b680
1. Move interrupt handling to a separate function to clean up main commit() function a bit. Also gate the function call off properly based on whether or not there are outstanding interrupts, and the system is not in PAL mode.
2. Better handling of updating instruction's status bits. Instructions are not marked "atCommit" until other stages view it (pushed off to IEW/IQ), and they have been properly handled (faults).
3. Don't consider the ROB "empty" for the purpose of other stages until the ROB is empty, all stores have written back, and there was no store commits this cycle. The last is necessary in case a store committed, in which case it would look like all stores have written back but in actuality have not.
src/cpu/o3/commit.hh:
Slightly modify how interrupts are handled. Also include some extra bools to keep track of state properly.
src/cpu/o3/commit_impl.hh:
Slightly modify how interrupts are handled. Also include some extra bools to keep track of state.
General correctness updates, most specifically for when commit broadcasts to other stages that the ROB is empty.
--HG--
extra : convert_revision : 682ec6ccf4ee6ed0c8a030ceaba1c90a3619d102
src/cpu/o3/iew_impl.hh:
Allow for slightly more flexible handling of non-speculative instructions. They can be other classes now, such as loads or stores.
Also be sure to clear the state associated with squashes that are not used. i.e. if a squash due to a memory ordering violation happens on the same cycle as an older branch squashing, clear the state associated with the memory ordering violation.
Lastly don't consider uncached loads to officially be "at commit" until IEW receives the signal back from commit about the load.
src/cpu/o3/inst_queue_impl.hh:
Don't consider non-speculative instructions to be "at commit" until the IQ has received a signal from commit about the instruction. This prevents non-speculative instructions from being issued too early.
src/cpu/o3/mem_dep_unit_impl.hh:
Clear instruction's ability to issue if it's replayed.
--HG--
extra : convert_revision : d69dae878a30821222885485f4dee87170d56eb3
1. Requests are handled more properly now. They assume the memory system takes control of the request upon sending out an access.
2. load-load ordering is maintained.
src/cpu/base_dyn_inst.hh:
Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.
Also include some functions to allow certain status bits to be cleared.
src/cpu/base_dyn_inst_impl.hh:
Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.
src/cpu/o3/fetch_impl.hh:
General correctness fixes. retryPkt is not necessarily always set, so handle it properly. Also consider the cache unblocked only when recvRetry is called.
src/cpu/o3/lsq_unit.hh:
Handle requests a little more correctly. Now that the requests aren't pointed to by the DynInst, be sure to delete the request if it's not being used by the memory system.
Also be sure to not store-load forward from an uncacheable store.
src/cpu/o3/lsq_unit_impl.hh:
Check to make sure load-load ordering was maintained.
Also handle requests a little more correctly.
--HG--
extra : convert_revision : e86bead2886d02443cf77bf7a7a1492845e1690f
1. Set CPU ID in all modes for the O3 CPU.
2. Use nextCycle() function to prevent phase drift in O3 CPU.
3. Remove assertion in rename map that is no longer true.
src/cpu/o3/alpha/cpu_builder.cc:
Allow for CPU id in all modes, not just full system. Also include a parameter that was left out by accident.
src/cpu/o3/alpha/cpu_impl.hh:
Set the CPU ID properly.
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
Use nextCycle() function so that the CPU does not get out of phase when starting up from quiesces.
src/cpu/o3/rename_map.cc:
Remove assertion that is no longer true.
tests/configs/o3-timing.py:
Set CPU's id to 0.
--HG--
extra : convert_revision : 2b69c19adfce2adcc2d1939e89d702bd6674d5d5
into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-x86
src/arch/mips/utility.hh:
src/arch/x86/SConscript:
Hand merge
--HG--
extra : convert_revision : 0ba457aab52bf6ffc9191fd1fe1006ca7704b5b0
Removed the getOpcode function from StaticInst which only made sense for Alpha.
Started implementing the x86 predecoder.
--HG--
extra : convert_revision : a13ea257c8943ef25e9bc573024a99abacf4a70d
automatic. The point is that now a subdirectory can be added
to the build process just by creating a SConscript file in it.
The process has two passes. On the first pass, all subdirs
of the root of the tree are searched for SConsopts files.
These files contain any command line options that ought to be
added for a particular subdirectory. On the second pass,
all subdirs of the src directory are searched for SConscript
files. These files describe how to build any given subdirectory.
I have added a Source() function. Any file (relative to the
directory in which the SConscript resides) passed to that
function is added to the build. Clean up everything to take
advantage of Source().
function is added to the list of files to be built.
--HG--
extra : convert_revision : 103f6b490d2eb224436688c89cdc015211c4fd30
1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts().
2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc:
Move common code of switching over peers to base CPU.
src/cpu/base.hh:
Move common code of switching over peers to BaseCPU.
src/cpu/o3/cpu.cc:
Add in function that updates thread context's ports.
Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code.
src/cpu/o3/cpu.hh:
Include function to update thread context's memory ports.
src/cpu/o3/lsq.hh:
Add function to dcache port that will update the memory ports upon getting a new peer.
Also include a function that will tell the CPU to update those memory ports.
src/cpu/o3/lsq_impl.hh:
Add function that will update the memory ports upon getting a new peer.
src/cpu/simple/atomic.cc:
src/cpu/simple/timing.cc:
Add function that will update thread context's memory ports upon getting a new peer.
Also use the new BaseCPU's take over from function.
src/cpu/simple/atomic.hh:
Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer.
src/cpu/simple/timing.hh:
Add function that will update thread context's memory ports upon getting a new peer.
src/mem/port.hh:
Make setPeer virtual so that other classes can override it.
--HG--
extra : convert_revision : 2050f1241dd2e83875d281cfc5ad5c6c8705fdaf
directly configured by python. Move stuff from root.(cc|hh) to
core.(cc|hh) since it really belogs there now.
In the process, simplify how ticks are used in the python code.
--HG--
extra : convert_revision : cf82ee1ea20f9343924f30bacc2a38d4edee8df3
Created MemCmd class to wrap enum and provide handy methods to
check attributes, convert to string/int, etc.
--HG--
extra : convert_revision : 57f147ad893443e3a2040c6d5b4cdb1a8033930b
into zower.eecs.umich.edu:/eecshome/m5/newmem
src/arch/sparc/isa/formats/mem/util.isa:
src/arch/sparc/isa_traits.hh:
src/arch/sparc/system.cc:
Hand Merge
--HG--
extra : convert_revision : d5e0c97caebb616493e2f642e915969d7028109c
src/cpu/o3/commit_impl.hh:
Oops, changed the logic a little bit. Fix it up to how it used to be.
--HG--
extra : convert_revision : df7f69b0997207b611374c3c92880f3a405e88be
Also don't call (*activeThreads).end() over and over. Just
call activeThreads->end() once and save the result.
Make sure we always check that there are elements in the list
before we grab the first one.
--HG--
extra : convert_revision : d769d8ed52da99532d57a9bbc93e92ddf22b7e58
InstObjParam interface.
src/arch/alpha/isa/branch.isa:
src/arch/alpha/isa/fp.isa:
src/arch/alpha/isa/int.isa:
src/arch/alpha/isa/main.isa:
src/arch/alpha/isa/mem.isa:
src/arch/alpha/isa/pal.isa:
src/arch/mips/isa/formats/mem.isa:
src/arch/mips/isa/formats/util.isa:
Get rid of CodeBlock calls to adapt to new InstObjParam interface.
src/arch/isa_parser.py:
Check template code for operands (in addition to snippets).
src/cpu/o3/alpha/dyn_inst.hh:
Add (read|write)MiscRegOperand calls to Alpha DynInst.
--HG--
extra : convert_revision : 332caf1bee19b014cb62c1ed9e793e793334c8ee
src/arch/isa_parser.py:
Rearranged things so that classes with more than one execute function treat operands properly.
1. Eliminated the CodeBlock class
2. Created a SubOperandList
3. Redefined how InstObjParams is constructed
To define an InstObjParam, you can either pass in a single code literal which will be named "code", or you can pass in a dictionary of code snippets which will be substituted into the Templates. In order to get this to work, there is a new restriction that each template has only one function in it. These changes should only affect memory instructions which have regular and split execute functions.
Also changed the MiscRegs so that they use the instrunctions srcReg and destReg arrays.
src/arch/sparc/isa/formats/basic.isa:
src/arch/sparc/isa/formats/branch.isa:
src/arch/sparc/isa/formats/integerop.isa:
src/arch/sparc/isa/formats/mem/basicmem.isa:
src/arch/sparc/isa/formats/mem/blockmem.isa:
src/arch/sparc/isa/formats/mem/util.isa:
src/arch/sparc/isa/formats/nop.isa:
src/arch/sparc/isa/formats/priv.isa:
src/arch/sparc/isa/formats/trap.isa:
Rearranged to work with new InstObjParam scheme.
src/cpu/o3/sparc/dyn_inst.hh:
Added functions to access the miscregs using the indexes from instructions srcReg and destReg arrays. Also changed the names of the other accessors so that they have the suffix "Operand" if they use those arrays.
src/cpu/simple/base.hh:
Added functions to access the miscregs using the indexes from instructions srcReg and destReg arrays.
--HG--
extra : convert_revision : c91e1073138b72bcf4113a721e0ed40ec600cf2e
RangeSize as a function takes a start address, and a SIZE, and will make the range (start, start+size-1) for you.
src/cpu/memtest/memtest.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/lsq.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/simple/atomic.hh:
src/cpu/simple/timing.hh:
Fix RangeSize arguments
src/dev/alpha/tsunami_cchip.cc:
src/dev/alpha/tsunami_io.cc:
src/dev/alpha/tsunami_pchip.cc:
src/dev/baddev.cc:
pioSize indicates SIZE, not a mask
--HG--
extra : convert_revision : d385521fcfe58f8dffc8622260937e668a47a948