These enter and leave thumbEE mode. Currently thumbEE mode behaves exactly the
same as Thumb mode, but at least this will make it -look- like we're entering and
leaving it. The actual behavioral changes will be implemented in future
changes.
This register will always report 0 caches as implemented. It's not clear how
to find out how many there really are when dealing with an arbitrary
hierarchy.
This register controls access to the coprocessors. This doesn't actually
implement it; it allows writes that don't turn anything off. In other words,
it allows the simulated program to ask for what it already has.
This register is supposed to "Clean and invalidate data or unified cache line
by set/way." Since there isn't a good way to do that, we'll just ignore these
and warn about it.
This change moves the writeback of load multiple instructions to the beginning
of the macroop. That way, the MicroLdrRetUop that changes the mode will
necessarily happen later, ensuring the writeback happens in the original mode.
If the base register also appears in the register list, its final value is
undefined, so it's fine if it gets clobbered by one of the loads. For
stores where the base register is the lowest numbered in the register list,
the original value should be written back. That means stores can't write back
at the beginning, but the mode changing problem doesn't affect them so they
can continue to write back at the end.
Instead of panicking immediately when these instructions are executed, an
UndefinedInstruction fault is returned. In FS mode (not currently
implemented), this is the fault that should, to my knowledge, be triggered in
these situations and should be handled using the normal architected
mechanisms. In SE mode, the fault causes a panic when it's invoked that gives
the same information as the instruction did. When/if speculative execution of
ARM is supported, this will keep a mispeculated, unrecognized, and/or
unimplemented instruction from causing a panic. Only once the
instruction is going to be committed will the fault be invoked, triggering the
panic.
Shifting a signed value to the right when the MSB is one is technically
implementation-defined behavior, even though in my experience it's done the
"right thing" and sign extended the value. This replaces the arithmetic right
shift code in ARM that relies on that coincidence with code that instead uses
bit math.
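A minimal sketch of the bit-math approach, assuming 32-bit values and a shift
amount below 32 (the function name is illustrative, not the ISA description's
actual code):

    #include <cstdint>

    // Arithmetic right shift built from a well-defined logical shift plus
    // explicit sign extension, instead of ">>" on a negative signed value.
    static inline uint32_t
    arithShiftRight32(uint32_t value, unsigned shamt)
    {
        uint32_t shifted = value >> shamt;   // logical shift, well defined
        if (value & (1U << 31)) {
            // Replicate the sign bit into the vacated high-order positions.
            shifted |= ~(0xFFFFFFFFU >> shamt);
        }
        return shifted;
    }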
This allows the templates to all be available at the same time before any of
the formats, etc. This breaks an artificial circular dependence.
--HG--
rename : src/arch/arm/isa/formats/pred.isa => src/arch/arm/isa/templates/pred.isa
This isn't technically correct since the .w should only be added if there are
32 and 16 bit encodings, but having it always is better than never having
it.
Further cleanup should probably be done to make this class be non-Ruby
specific and put it in src/base.
In several of the cases where this class is used, std::bitset could probably
be used instead.
Suppose the saturating counters of a branch predictor contain n bits. When the
counter is between 0 and (2^(n-1) - 1), boundaries included, the branch is
predicted as not taken. When the counter is between 2^(n-1) and (2^n - 1),
boundaries included, the branch is predicted as taken.
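As a quick illustration of the rule above (a hypothetical helper, not the
branch predictor's actual code):

    #include <cstdint>

    // Predict taken iff the counter is in the upper half of its n-bit range,
    // i.e. counter >= 2^(n-1).
    bool
    predictTaken(uint32_t counter, unsigned n)
    {
        return counter >= (1u << (n - 1));
    }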
Time from base/time.hh has a name clash with Time from Ruby's
TypeDefines.hh. Eventually Ruby's Time should go away, so instead of
fixing this properly just try to avoid the clash.
When doing an unsigned 64 bit division with a divisor that has its most
significant bit set, the division code would spill a bit off of the end of a
uint64_t trying to shift the dividend into position. This change adds code
that handles that case specially by purposefully letting it spill and then
going ahead assuming there was a 65th one bit.
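A standalone sketch of the idea, assuming a simple shift-and-subtract divider
rather than the actual microcode: the bit shifted out of the top of the 64-bit
remainder is tracked explicitly as the "65th" bit.

    #include <cstdint>

    // Restoring division that tracks the bit spilled out of the remainder.
    // Only divisors with the MSB set can make the remainder exceed 64 bits.
    uint64_t
    udiv64(uint64_t dividend, uint64_t divisor)   // assumes divisor != 0
    {
        uint64_t quotient = 0, remainder = 0;
        for (int i = 63; i >= 0; --i) {
            bool spill = remainder >> 63;      // would be lost by the shift
            remainder = (remainder << 1) | ((dividend >> i) & 1);
            quotient <<= 1;
            // If a bit spilled, the true remainder is >= 2^64 > divisor.
            if (spill || remainder >= divisor) {
                remainder -= divisor;          // wraps correctly mod 2^64
                quotient |= 1;
            }
        }
        return quotient;
    }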
This causes builds to happen in sorted order rather than in
declaration order. Declaration order gets annoying when you make a global
change and then notice that the files being compiled are jumping
around the directory hierarchy.
When insts execute, they mark the time they finish so that it can be used by
subsequent insts that may need forwarding of data. However, the regdepmap was
using the wrong value to index into the destination operands of the
instruction to be forwarded. Thus, in some cases, we were checking whether the
3rd destination register of an instruction was executed at a certain time when
only 1 dest. register was valid. Thus, we got a bad, uninitialized time value
that stalls forwarding, causing a performance loss but still correct
execution.
- Make the initialized flag always available, not just in debug mode.
- Make the Initialized flag actually use several bits so it is very
unlikely that something that's uninitialized accidentally looks
initialized (see the sketch after this list).
- Add an initialized() function that tells you if the current event is
indeed initialized.
- Clear the flags on delete so it can't be accidentally thought of as
initialized.
- Fix getFlags assert statement. "How did this ever work?"
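A minimal sketch of the multi-bit pattern (the constants and field width are
made up for illustration, not gem5's actual values):

    #include <cstdint>

    class Event
    {
        // Several bits are reserved for the flag so that a random bit
        // pattern is very unlikely to look initialized by accident.
        static const uint16_t InitMask = 0xffc0;
        static const uint16_t Initialized = 0x7bc0;
        uint16_t flags;

      public:
        Event() : flags(Initialized) {}
        ~Event() { flags = 0; }  // a deleted event never looks initialized
        bool initialized() const { return (flags & InitMask) == Initialized; }
    };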
Symbolic names should still be used, but this makes it easier to do
things like:
Event::Priority MyObject_Pri = Event::Default_Pri + 1
Remember that higher numbers are lower priority (should we fix this?)
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
make sure to only read 1 src reg. for write-hint and any other similar
'store' instruction. Reading the source reg when it's not necessary
can cause the simulator to read from uninitialized values.
These recordEvent() calls could cause crashes since they
access the req pointer after it's potentially been
deleted during a failed translation call. (Similar
problem to the traceData bug fixed in the previous cset.)
Moving them above the translation call (as was done
recently in cset 8b2b8e5e7d35) avoids the crash
but doesn't work, since at that point we don't know if
the access is uncached or not.
It's not clear why these calls are there, and no one
seems to use them, so we'll just delete them. If they
are needed, they should be moved to somewhere that's
guaranteed to be after the translation completes but
before the request is possibly deleted, e.g., in
finishTranslation().
Accessing traceData (to call setAddress() and/or setData())
after initiating a timing translation was causing crashes,
since a failed translation could delete the traceData
object before returning.
It turns out that there was never a need to access traceData
after initiating the translation, as the traced data was
always available earlier; this ordering was merely
historical. Furthermore, traceData->setAddress() and
traceData->setData() were being called both from the CPU
model and the ISA definition, often redundantly.
This patch standardizes all setAddress and setData calls
for memory instructions to be in the CPU models and not
in the ISA definition. It also moves those calls above
the translation calls to eliminate the crashes.
Previously, the set size was set to 4. This was mostly due to the fact that a
crazy graduate student used to create networks with 256 l2 cache banks. Now it
is far more likely that users will create systems with fewer than 64 of any
particular controller type. Therefore Ruby should be optimized for a set size
of 1.
On the config end, if a shared L2 is created for the system, it is
parameterized to have n sharers as defined by option.num_cpus. In addition to
making the cache sharing aware so that discriminating tag policies can make use
of context_ids to make decisions, I added an occupancy AverageStat and an occ %
stat to each cache so that you could know which contexts are occupying how much
cache on average, both in terms of blocks and percentage. Note that since
devices have context_id -1, having an array of occ stats that correspond to
each context_id will break here, so in FS mode I add an extra bucket for device
blocks. This bucket is explicitly not added in SE mode in order to not only
avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas
break when a bucket is 0).
Also, make Formulas work on AverageVector. First, Stat::Average (and thus
Stats::AverageVector) was broken when coming out of a checkpoint and on
resets; this fixes that. Formulas also didn't work with AverageVector, so
support for that was added.
When implementing timing address translations instead of atomic, I
forgot to preserve the faults that are returned from the read and
write calls. This patch reinstates them.
When each load or store is sent to the LSQ, we check whether it will cross a
cache line boundary and, if so, split it in two. This creates two TLB
translations and two memory requests. Care has to be taken if the first
packet of a split load is sent but the second blocks the cache. Similarly,
for a store, if the first packet cannot be sent, we must store the second
one somewhere to retry later.
This modifies the LSQSenderState class to record both packets in a split
load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA
to indicate whether unaligned memory accesses are allowed. This is used
throughout the changed code so that the compiler can optimise away code dealing
with split requests for ISAs that don't need them.
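For reference, the line-crossing test amounts to something like the following
(a hedged sketch with made-up names, not the LSQ's actual code):

    #include <cstdint>

    // True if an access of 'size' bytes starting at 'addr' touches two
    // cache lines and therefore needs to be split into two requests.
    bool
    crossesCacheLine(uint64_t addr, unsigned size, unsigned lineBytes)
    {
        return (addr / lineBytes) != ((addr + size - 1) / lineBytes);
    }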
This initiates a timing translation and passes the read or write on to the
processor before waiting for it to finish. Once the translation is finished,
the instruction's state is updated via the 'finish' function. A new
DataTranslation class is created to handle this.
The idea is taken from the implementation of timing translations in
TimingSimpleCPU by Gabe Black. This patch also separates out the timing
translations from this CPU and uses the new DataTranslation class.
- on certain retry requests you can get an assertion failure
- fix by allowing the request to literally "Retry" itself
if it wasn't successful before, and then block any requests
through the cache port while waiting for the cache to be
made available for access
when threads are switching in/out of the CPU, we need to keep
track of special cases like branches. Add appropriate
variables in ThreadState to track this and then use
these variables when updating the PC after a context switch.
This will be used when a thread comes back from a cache miss and needs to
update its PCs, because the inst might have been a branch or delay slot, in
which case the next PC isn't always a straight addition.
allow a thread to wake up and be activated after
it has been in a suspended state and another
thread is switched out. Need to give
pipeline stages an "activateThread" function
so that they can get to their suspended instruction
when the time is right.
give resources their own specific
activity to do for a "suspend" event
instead of defaulting to deactivating the thread for a
suspend thread event. This really matters
for the fetch sequence unit, which wants to remove the
thread from fetching, while other units want to
ignore a thread suspension. If you deactivate a thread
in a resource then you may lose some of the allotted
bandwidth that the thread is taking up...
update/add in the use of the isThreadReady & isThreadSuspended
functions. Check in activateThread what list a thread is
on so it can be managed accordingly.
-Support the ability to activate the next ready thread after a cache miss
through the activateNextReadyContext/Thread() functions
-To support this, a "readyList" of thread ids is added
-After a cache miss, the thread will suspend and then call
activateNextReadyThread
allow for events to schedule themselves later if desired. This is important
for cases where you need to activate a thread only after the previous
thread has been deactivated; that ordering has to be enforced.
add code to recognize memory stalls in resources and the pipeline, as well as
to squash a thread if there is a stall and we are in the switch-on-cache-miss
model
add a buffer in a pipeline stage for instructions to switch out to. We can't
squash the instruction and remove it from the pipeline, so we kind of need
to 'suspend' an instruction at the stage while the memory stall resolves
for the switch-on-cache-miss model
- loads were happening on the same cycle as the address was generated, which
is slightly unrealistic. Instead, force address generation to be on a separate
cycle from load
initiation
- also, mark the stages in a more traditional way (F-D-X-M-W)
This patch includes the necessary regression updates to test the new ruby
configuration system. The patch includes support for multiple ruby protocols
and adds the ruby random tester. The patch removes atomic mode test for
ruby since ruby does not support atomic mode accesses. These tests can be
added back in when ruby supports atomic mode for real.
--HG--
rename : tests/quick/50.memtest/test.py => tests/quick/60.rubytest/test.py
Removed the dummy power function implementations so that Orion can implement
them correctly. Since Orion lacks a modular design, this patch simply enables
scons to compile it. There are no python configuration changes in this patch.
Renamed the MESI directory file to be consistent with all other protocols.
--HG--
rename : src/mem/protocol/MESI_CMP_directory-mem.sm => src/mem/protocol/MESI_CMP_directory-dir.sm
Cleaned up the ruby profilers by moving the memory controller profiling code
out of the main profiler object and into a separate object similar to the
current CacheProfiler. Both the CacheProfiler and MemCntrlProfiler are
specific to a particular Ruby object, CacheMemory and MemoryControl
respectively. Therefore, these profilers should not be SimObjects created
by the python configuration system, but instead private objects. This
simplifies the creation of these profilers.
Reorganized ruby python configuration so that protocol and ruby memory system
configuration code can be shared by multiple front-end configuration files
(i.e. memory tester, full system, and hopefully the regression tester). This
code works for the memory tester, but fs mode has not been tested.
Modified ruby's tracing support to no longer rely on the RubySystem map
to convert a sequencer string name to a sequencer pointer. As a
temporary solution, the code uses the sim_object find function.
Eventually, we should develop a better fix.
This patch includes a rather substantial change to the memory controller
profiler in order to work with the new configuration system. Most
notably, the mem_cntrl_profiler no longer uses a string map, but instead
a vector. Eventually this support should be removed from the main
profiler and go into a separate object. Each memory controller should have
a pointer to that new mem_cntrl profile object.
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuration system.
The necessary companion conversion of the Ruby objects generated by SLICC
into M5 SimObjects happens in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
Though OutPort's message type is not used to generate code, this fix checks
that the programmer's intent is correct. Eventually, we may want to
remove the message type from the OutPort declaration statement.
1) Move alpha-specific code out of page_table.cc:serialize().
2) Begin serializing M5_pid and unserializing it, adding a function to do an optional paramIn so that old checkpoints don't need to be fixed up.
3) Fix up alpha startup code so that the unserialized M5_pid value is properly written to DTB_IPR_ASN.
4) Fix the memory unserialize that I forgot somehow in the last changeset.
5) Add in an agg_se.py to handle aggregated checkpoints. --bench foo-bar plus positional arguments foo bar are the only changes in usage from se.py.
Note this aggregation stuff has only been tested for Alpha and nothing else, though it should take a very minimal amount of work to get it to work with another ISA.
This patch changes the way that Ruby handles atomic RMW instructions. This
implementation, unlike the prior one, is protocol independent. It works by
locking an address from the sequencer immediately after the read portion of an
RMW completes. When that address is locked, the coherence controller will only
satisfy requests coming from one port (e.g., the mandatory queue) and will
ignore all others. After the write portion completes, the line is unlocked.
This should also work with multi-line atomics, as long as the blocks are
always acquired in the same order.
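A standalone sketch of that locking discipline (the container and names are
illustrative, not Ruby's actual sequencer or controller code):

    #include <cstdint>
    #include <set>

    struct RmwLineLocks
    {
        std::set<uint64_t> locked;  // lines locked by in-flight RMWs

        void lockOnRmwRead(uint64_t lineAddr) { locked.insert(lineAddr); }
        void unlockOnRmwWrite(uint64_t lineAddr) { locked.erase(lineAddr); }

        // While a line is locked, only requests arriving on the RMW's own
        // port (e.g., the mandatory queue) are serviced for that line.
        bool
        mayService(uint64_t lineAddr, bool fromMandatoryQueue) const
        {
            return fromMandatoryQueue || locked.count(lineAddr) == 0;
        }
    };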
In Linux, the set_thread_area system call stores the address of the thread
local storage area into a field of the current thread_info structure. Later,
to access that value, the program uses the rdhwr instruction to read a
"hardware register" with index 29. The 64 bit MIPS manual, volume II, says
that index 29 is reserved for a future ABI extension and should cause a
"Reserved Instruction Exception". In Linux (and potentially other ISAs) that
exception is trapped and emulated to return the value stored by
set_thread_area as if that were actually stored by a physical register.
The tp_value address (as named in the Linux kernel) is ironically stored as a
control register so that it goes with a particular ThreadContext. Syscall
emulation will use that to emulate storing to the OS's thread info structure,
and rdhwr will emulate faulting and returning that value from software by
returning the value itself, as if it was in hardware. In other words, we fake
faking the register in SE mode. In an FS mode implementation it should
work as specified in the manual.
The MIPS ISA object expects to be constructed with a CPU pointer it uses to
look at other thread contexts and allow them to be manipulated with control
registers. Unfortunately, that differs from all the other ISA classes and
would complicate their implementation.
This change makes the event constructor use a CPU pointer pulled out of the
thread context passed to setMiscReg instead.
Added error messages when:
- a state does not exist in a machine's list of known states.
- an event does not exist in a machine
- the actions of a certain machine have not been declared
Connects M5 cpu and dma ports directly to ruby sequencers and dma
sequencers. Rubymem also includes a pio port so that pio requests
can be forwarded to a special pio bus connecting to device pio
ports.
Some of the micro-ops weren't casting 1 to ULL before shifting,
which can cause problems. On the perl makerand input this
caused some values to be negative that shouldn't have been.
The casts are done as ULL(1) instead of 1ULL to match others
in the m5 code base.
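The failure mode is the usual one for 64-bit masks built from int literals;
roughly (the ULL() macro below is a stand-in defined locally so the example is
self-contained):

    #include <cstdint>

    #define ULL(N) ((uint64_t)N##ULL)   // stand-in for m5's ULL() macro

    uint64_t
    bitMask(int bit)
    {
        // "1 << bit" shifts a plain int: for bit == 31 the result is
        // negative and sign-extends to 64 bits, and for bit >= 32 it is
        // undefined. Widening the 1 first avoids both problems.
        return ULL(1) << bit;
    }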
The PC indexes in the various register sets were defined in the section for
unaliased registers, which was throwing off the indexing. This moves those
where they belong. Also, to make detecting accesses to the PC easier and
because it's in the same place in all modes, the intRegForceUser function
now passes it through as index 15.
Unfortunately my implementation of the movd instruction had two bugs.
In one case, when moving a 32-bit value into an xmm register, the
lower half of the xmm register was not zero extended.
The other case is that xmm was used instead of xmmlm as the source
for a register move. My test case didn't notice this at first
as it moved xmm0 to eax, which both have the same register
number.
This double cast led to rounding errors which caused
some benchmarks to get the wrong values, most notably lucas
which failed spectacularly due to CVTTSD2SI returning an
off-by-one value. equake was also broken.
Specifically, get rid of the big switch statement so more cases can be
handled. Enumerating all the possible settings doesn't scale well. Also do
some minor style clean up.
Add constants for all the modes and registers, maps for aliasing, functions
that use the maps and range check, and use a named constant instead of a magic
number for the microcode register.
This problem is like the one fixed with movhpd a few weeks ago.
A +8 displacement is used to access memory when there should
be none.
This fix is needed for the perlbmk spec2k benchmark to run.
Right now .cc and .hh files are handled separately, but then
they're just munged together at the end by scons, so it
doesn't buy us anything. Might as well munge from the start
since we'll eventually be adding generated Python files
to the list too.
64-bit vsyscall is different from 32-bit.
There are only two syscalls, time and gettimeofday.
On a real system, there is complicated code that implements these
without entering the kernel. That would be difficult to replicate in m5.
Instead we just place code that calls the regular syscalls (this is how
tools such as valgrind handle this case).
This is needed for the perlbmk spec2k benchmark.
These are complicated instructions and the micro-code might be suboptimal.
This has been tested with some small sample programs (attached).
The psrldq instruction is needed by various spec2k programs.
This patch implements the movd_Vo_Edp series of instructions.
It addresses various concerns by Gabe Black about which file the
instruction belonged in, as well as supporting REX prefixed
instructions properly.
This instruction is needed for some of the spec2k benchmarks, most
notably bzip2.
This patch implements the haddpd instruction.
It fixes the problem in the previous version (pointed out by Gabe Black)
where an incorrect result would happen if you issue the instruction
with the same argument twice, i.e. "haddpd %xmm0,%xmm0"
This instruction is used by many spec2k benchmarks.
This patch hooks up the truncate, ftruncate, truncate64 and ftruncate64
system calls on 32-bit and 64-bit X86.
These have been tested on both architectures.
ftruncate/ftruncate64 is needed for the f90 spec2k benchmarks.
When accessing arguments for a syscall, the position of an argument depends on
the policies of the ISA, how much space preceding arguments took up, and the
"alignment" of the index for this particular argument into the number of
possible storage locations. This change adjusts getSyscallArg to take its
index parameter by reference instead of value and to adjust it to point to the
possible location of the next argument on the stack, basically just after the
current one. This way, the rules for the new argument can be applied locally
without knowing about other arguments since those have already been taken into
account implicitly.
All system calls have also been changed to reflect the new interface. In a
number of cases this made the implementation clearer since it encourages
arguments to be collected in one place in order and then used as necessary
later, as opposed to scattering them throughout the function or using them in
place in long expressions. It also discourages using getSyscallArg over and
over to retrieve the same value when a temporary would do the job.
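A standalone sketch of the by-reference convention described above (a plain
array stands in for the ISA's argument registers; the real getSyscallArg reads
them through a ThreadContext):

    #include <cstdint>

    static uint64_t argRegs[6];  // stand-in for architected argument registers

    // Each call consumes one slot and advances the caller's index so the
    // next call starts just past this argument; an ISA that needs two slots
    // for a wide argument would advance the index by two instead.
    uint64_t
    getSyscallArg(int &index)
    {
        return argRegs[index++];
    }

    // Callers then collect the arguments once, in order:
    //   int index = 0;
    //   uint64_t fd    = getSyscallArg(index);
    //   uint64_t flags = getSyscallArg(index);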
This mostly was a matter of changing the license owner to Princeton
which is as it should have been. The code was originally licensed
under the GPL but was relicensed as BSD by Li-Shiuan Peh on July 27,
2009. This relicensing was in an explicit e-mail to Nathan Binkert,
Brad Beckmann, Mark Hill, David Wood, and Steve Reinhardt.
The movdqa instruction should enforce 16-byte alignment.
This implementation does not do that.
These instructions are needed for most of x86_64 spec2k to run.
The st_size entry was in the wrong place
(see linux-2.6.29/arch/x86/include/asm/stat.h).
Also, the packed attribute is needed when compiling on a
64-bit machine, otherwise gcc adds extra padding that
breaks the layout of the structure.
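To illustrate the padding issue (the fields below are hypothetical, not the
real stat64 layout):

    #include <cstdint>

    // Without the packed attribute, a 64-bit host compiler inserts 4 bytes
    // of padding before 'size' and at the end of the struct, shifting every
    // later field relative to the guest's expected layout.
    struct guest_stat_like {
        uint32_t mode;
        uint64_t size;
        uint32_t blksize;
    } __attribute__((packed));

    static_assert(sizeof(guest_stat_like) == 16, "no host-inserted padding");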
This adds support for the 32-bit, big endian Power ISA. This supports both
integer and floating point instructions based on the Power ISA Book I v2.06.
Glibc often assumes that memory it receives from the kernel after a brk
system call will contain only zeros. This is important during a calloc,
because it won't clear the new memory itself. In the simulator, if the
new page exists, it will be cleared using this patch, to mimic the kernel's
functionality.
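A minimal sketch of that behavior, assuming a hypothetical page-lookup helper
(not gem5's actual brk implementation):

    #include <cstdint>
    #include <cstring>

    // Zero every already-mapped page at or above the old break and below the
    // new one, mimicking the kernel's zeroed anonymous pages.
    void
    zeroNewBrkPages(uint8_t *(*lookupPage)(uint64_t vaddr),
                    uint64_t oldBrk, uint64_t newBrk, uint64_t pageSize)
    {
        uint64_t va = (oldBrk + pageSize - 1) & ~(pageSize - 1);  // round up
        for (; va < newBrk; va += pageSize) {
            if (uint8_t *page = lookupPage(va))
                std::memset(page, 0, pageSize);
        }
    }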
I've tested these on x86 and they work as expected.
In theory for 32-bit x86 we should have some sort of special
handling for the legacy 16-bit uid/gid syscalls, but in practice
modern toolchains don't use the 16-bit versions, and m5 sets the uid
and gid values to be less than 16-bits anyway.
This fix is needed for the perl spec2k benchmarks to run.
When enabled, faulting instructions appear in the trace twice
(once when they fault and again when they're re-executed).
This flag is set by the Exec compound flag for backwards compatibility.
This prevents redundant prefetches from being issued, solving the
occasional 'needsExclusive && !blk->isWritable()' assertion failure
in cache_impl.hh that several people have run into.
Eliminates "prefetch_cache_check_push" flag, neither setting of
which really solved the problem.
This is simply a translation of the C++ slicc into python with very minimal
reorganization of the code. The output can be verified as nearly identical
by doing a "diff -wBur".
Slicc can easily be run manually by using util/slicc
Get rid of misc.py and just stick misc things in __init__.py
Move utility functions out of SCons files and into m5.util
Move utility type stuff from m5/__init__.py to m5/util/__init__.py
Remove buildEnv from m5 and allow access only from m5.defines
Rename AddToPath to addToPath while we're moving it to m5.util
Rename read_command to readCommand while we're moving it
Rename compare_versions to compareVersions while we're moving it.
--HG--
rename : src/python/m5/convert.py => src/python/m5/util/convert.py
rename : src/python/m5/smartdict.py => src/python/m5/util/smartdict.py