Commit graph

2617 commits

Author SHA1 Message Date
Andreas Hansson 0cacf7e817 Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
2012-08-28 14:30:33 -04:00
Andreas Hansson d53d04473e Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.

In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.

Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.

As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
2012-08-28 14:30:31 -04:00
Andreas Hansson c60db56741 Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).

The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.

Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
2012-08-22 11:39:59 -04:00
Andreas Hansson 70e99e0b91 Device: Remove overloaded pio_latency parameter
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.

The PciConfigAll now also uses a Param.Latency rather than a
Param.Tick. For backwards compatibility it still sets the pio_latency
to 1 tick. All the comments have also been updated to not state that
it is in simticks when it is not necessarily the case.
2012-08-21 05:50:03 -04:00
Andreas Hansson 452217817f Clock: Move the clock and related functions to ClockedObject
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.

The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.

The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
2012-08-21 05:49:01 -04:00
Nilay Vaish 649e377937 Alpha System: override startup(), instead of loadState()
Alpha System was overriding loadState() function to setup some functional
event. The system tried to read/write to memory before the Ruby memory had
unserialized the state. With this patch, Alpha System overrides the
startup() function, and sets up functional events in this function. This
works because startup() is called after Ruby memory system has unserialized
the memory state.
2012-08-16 23:45:21 -05:00
Anthony Gutierrez 0b3897fc90 O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.

This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
2012-08-15 10:38:08 -04:00
Ali Saidi dd1b346584 sysemul: bump all linux versions of for syscal emulation to 3.0.
New tool chains seem to be looking for kernel versions newer than what
this this was previously set to. Also take this opportunity to change
the hostname we report in uname to sim.gem5.org.
2012-08-15 10:38:04 -04:00
Marc Orr 7cef6b9bef syscall emulation: Enabled getrlimit and getrusage for x86.
Added/moved rlimit constants to base linux header file.

This patch is a revised version of Vince Weaver's earlier patch.
2012-08-06 19:52:56 -07:00
Marc Orr d55115936e syscall emulation: Clean up ioctl handling, and implement for x86.
Enable different whitelists for different OS/arch combinations,
since some use the generic Linux definitions only, and others
use definitions inherited from earlier Unix flavors on those
architectures.

Also update x86 function pointers so ioctl is no longer
unimplemented on that platform.

This patch is a revised version of Vince Weaver's earlier patch.
2012-08-06 16:52:40 -07:00
Anthony Gutierrez 2eb6b403c9 ARM: fix value of MISCREG_CTR returned by readMiscReg()
According to the A15 TRM the value of this register is as follows (assuming 16 word = 64 byte lines)
[31:29] Format - b100 specifies v7
[28] RAZ - b0
[27:24] CWG log2(max writeback size #words) - 0x4 16 words
[23:20] ERG log2(max reservation size #words) - 0x4 16 words
[19:16] DminLine log2(smallest dcache line #words) - 0x4 16 words
[15:14] L1Ip L1 index/tagging policy - b11 specifies PIPT
[13:4] RAZ - b0000000000
[3:0] IminLine log2(smallest icache line #words) - 0x4 16 words
2012-07-27 16:08:04 -04:00
Nilay Vaish 11a551ae3a X86 CPUID: Return false if unknown processor family 2012-07-22 20:31:23 -05:00
Brad Beckmann 8c18f6da9e x86: added page size in bytes tlb entry function 2012-07-11 12:21:04 -07:00
Marc Orr 387f843d51 syscall emulation: Add the futex system call. 2012-07-10 22:51:54 -07:00
Brad Beckmann 52540b1b78 x86: logSize and lruSeq are now optional ckpt params 2012-07-10 22:51:54 -07:00
Andreas Hansson 46d9adb68c Port: Make getAddrRanges const
This patch makes getAddrRanges const throughout the code base. There
is no reason why it should not be, and making it const prevents adding
any unintentional side-effects.
2012-07-09 12:35:34 -04:00
Andreas Hansson 92eaac0711 gcc: Fix warnings for gcc 4.7 and clang 3.1
This patch fixes two warnings, one related to a narrowing conversion
(int to MachInst), and one due to the cast operator for arguments and
a mismatch in const-ness (const void* and void*).
2012-07-02 08:21:53 -04:00
Ali Saidi 71daeb0b2b ARM: Fix identification of one RAS pop instruction.
The check should be with the op2 field, not with the op1 field.
2012-06-29 11:18:29 -04:00
Ali Saidi 7e3496c78c ARM: Update version of linux we claim to be to 3.0.0.
Static binaries generated with new versions of libc complain that the kernel
is too old otherwise.
2012-06-29 11:18:29 -04:00
Ali Saidi aed8050824 ARM: Fix issue with predicted next pc being wrong because of advance() ordering.
npc in PCState for ARM was being calculated before the current flags were
updated with the next flags. This causes an issue as the npc is incremented by
two or four depending on the current flags (thumb or not) and was leading to
branches that were predicted correctly being identified as mispredicted.
2012-06-29 11:18:28 -04:00
Anthony Gutierrez 9764cde7f2 ARM: implement the ProcessInfo methods 2012-06-11 11:07:41 -04:00
Andreas Hansson a118c01716 Power: Fix MaxMiscDestRegs which was set to zero
This patch fixes a failing compilation caused by MaxMiscDestRegs being
zero. According to gcc 4.6, the result is a comparison that is always
false due to limited range of data type.
2012-06-08 12:44:17 -04:00
Nilay Vaish d6609793d4 X86 TLB: Add a missing = sign 2012-06-07 17:03:45 -05:00
Jayneel Gandhi 7183c3fd56 X86 TLB: Fix for gcc 4.4.3
Due to recent changes to X86 TLB, gem5 stopped compiling on
gcc version 4.4.3. This patch provides the fix for that problem. The patch
is tested on gcc 4.4.3. The change is not required for more recent
versions of gcc (like on 4.6.3).
2012-06-07 08:11:00 -05:00
Anthony Gutierrez d6da3ff317 cpu: Don't init simple and inorder CPUs if they are defered.
initCPU() will be called to initialize switched out CPUs for the simple and
inorder CPU models. this patch prevents those CPUs from being initialized
because they should get their state from the active CPU when it is switched
out.
2012-06-05 14:20:13 -04:00
Ali Saidi 20d25b9da7 ISA: Back-out NoopMachInst as a StaticInstPtr change. 2012-06-05 13:52:30 -04:00
Chander Sudanthi 15228694d0 ARM: removed extra white space
Extra white space fixes in miscregs.hh
2012-06-05 01:23:10 -04:00
Chander Sudanthi 8a2ca2fd24 ARM: Fix MPIDR and MIDR register implementation.
This change allows designating a system as MP capable or not as some
bootloaders/kernels care that it's set right. You can have a single
processor MP capable system, but you can't have a multi-processor
UP only system. This change also fixes the initialization of the MIDR
register.
2012-06-05 01:23:10 -04:00
Ali Saidi 6df196b71e O3: Clean up the O3 structures and try to pack them a bit better.
DynInst is extremely large the hope is that this re-organization will put the
most used members close to each other.
2012-06-05 01:23:09 -04:00
Ali Saidi 1b370431d0 sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
2012-06-05 01:23:08 -04:00
Ali Saidi 0b0c5621ee ARM: Fix compilation on ARM after Gabe's change. 2012-06-05 01:23:08 -04:00
Gabe Black 008b17d816 ISA: Turn the ExtMachInst NoopMachinst into the StaticInstPtr NoopStaticInst.
This eliminates a use of the ExtMachInst type outside of the ISAs.
2012-06-04 10:57:23 -07:00
Gabe Black 35fa5074aa X86: Ensure that the CPUID instruction always writes its outputs.
The CPUID instruction was implemented so that it would only write its results
if the instruction was successful. This works fine on the simple CPU where
unwritten registers retain their old values, but on a CPU like O3 with
renaming this is broken. The instruction needs to write the old values back
into the registers explicitly if they aren't being changed.
2012-06-04 10:43:09 -07:00
Gabe Black 7b73c36f5d X86: Ensure that the decoder's internal ExtMachInst is completely initialized.
There are some bits of some fields of the ExtMachInst which are not actually
used for anything but are included in the hash of an ExtMachInst for
simplicity and efficiency. This change makes sure the decoder's internal
working ExtMachInst is completely initialized, even these unused bits, so that
there isn't any nondeterministic behavior, no valgrind messages about
uninitialized variables, and no potential false misses/redundant entries in
the decode cache.
2012-06-04 10:43:08 -07:00
Gabe Black d9988ded3c X86: Use the HandyM5Reg to avoid a register read and some logic in the TLB. 2012-05-28 21:56:23 -07:00
Gabe Black 40084e0c3e X86: Move the GDT down to where it can be accessed in 32 bit mode.
The GDT can be accessed by user level software running in compatibility mode
by moving segment selectors into segment registers. The GDT needs to be set up
at an address accessible in this mode.
2012-05-27 19:01:08 -07:00
Gabe Black 1d96135087 X86: Truncate addresses to 32 bits except in 64 bit mode, not long mode.
A small change was added a while ago to keep addresses from overflowing 32
bits when larger addresses shouldn't be accessible to software. That change
truncated when not in long mode, but really it should have truncated when not
in 64 bit mode. The difference is whether compatibility mode is included, a
mode that's supposed to act like a legacy 32 bit mode.
2012-05-27 19:01:04 -07:00
Gabe Black 19df4e94ee ISA,CPU: Generalize and split out the components of the decode cache.
This will allow it to be specialized by the ISAs. The existing caching scheme
is provided by the BasicDecodeCache in the GenericISA namespace and is built
from the generalized components.

--HG--
rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc
2012-05-26 13:45:12 -07:00
Gabe Black 0cba96ba6a CPU: Merge the predecoder and decoder.
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.

--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
2012-05-26 13:44:46 -07:00
Gabe Black eae1e97fb0 ISA: Make the decode function part of the ISA's decoder. 2012-05-25 00:55:24 -07:00
Gabe Black 82a228bd43 Decode: Make the Decoder class defined per ISA.
--HG--
rename : src/cpu/decode.cc => src/arch/generic/decoder.cc
rename : src/cpu/decode.hh => src/arch/generic/decoder.hh
2012-05-25 00:53:37 -07:00
Andreas Hansson d4847fe6ea DMA: Split the DMA device and IO device into seperate files
This patch moves the DMA device to its own set of files, splitting it
from the IO device. There are no behavioural changes associated with
this patch.

The patch also grabs the opportunity to do some very minor tidying up,
including some white space removal and pruning some redundant
parameters.

Besides the immediate benefits of the separation-of-concerns, this
patch also makes upcoming changes more streamlined as it split the
devices that are only slaves and the DMA device that also acts as a
master.

--HG--
rename : src/dev/io_device.cc => src/dev/dma_device.cc
rename : src/dev/io_device.hh => src/dev/dma_device.hh
2012-05-23 09:15:45 -04:00
Andreas Hansson 5b36cf623c MEM: Add a snooping DMA port subclass for table walker
This patch makes the (device) DmaPort non-snooping and removes the
recvSnoop constructor parameter and instead introduces a
SnoopingDmaPort subclass for the ARM table walker.

Functionality is unchanged, as are the stats, and the patch merely
clarifies that the normal DMA ports are not snooping (although they
may issue requests that are snooped by others, as done with PCI, PCIe,
AMBA4 ACE etc).

Currently this port is declared in the ARM table walker as it is not
used anywhere else. If other ports were to have similar behaviour it
could be moved in a future patch.
2012-05-23 09:14:12 -04:00
Nilay Vaish 4d4d212ae9 X86: Split Condition Code register
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
2012-05-22 11:29:53 -05:00
Marc Orr 16a559c9c6 x86 ISA: Implement the sse3 haddps instruction.
Shuffle the 32 bit values into position, and then add in parallel.
2012-05-19 04:32:25 -07:00
Dam Sunwoo f2f7fa1a1c ARM: guard masked symbol tables by default
Symbol tables masked with the loadAddrMask create redundant entries
that could conflict with kernel function events that rely on the
original addresses.  This patch guards the creation of those masked
symbol tables by default, with an option to enable them when needed
(for early-stage kernel debugging, etc.)
2012-05-10 18:04:27 -05:00
Ali Saidi 8cee4dacc8 gem5: Fix a number of incorrect case statements 2012-05-10 18:04:26 -05:00
Andreas Hansson 3fea59e162 MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.

For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).

The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.

With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
2012-05-01 13:40:42 -04:00
Gabe Black 2c85cf41a2 X86: Fix the IMUL_R_P_I macroop.
The disp displacement was left off the load microop so the wrong value was
used.
2012-04-29 02:26:34 -07:00
Vince Weaver 03a91b0533 X86: Fix up the open system call's flags. 2012-04-29 00:31:03 -07:00