sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Andreas Sandberg	32ecd72b6e	kvm: Add support for state dumping on ARM	2013-04-22 13:20:32 -04:00
Andreas Sandberg	f156020158	kvm: Add basic support for ARM Architecture specific limitations: * LPAE is currently not supported by gem5. We therefore panic if LPAE is enabled when returning to gem5. * The co-processor based interface to the architected timer is unsupported. We can't support this due to limitations in the KVM API on ARM. * M5 ops are currently not supported. This requires either a kernel hack or a memory mapped device that handles the guest<->m5 interface.	2013-04-22 13:20:32 -04:00
Andreas Sandberg	f8f66fa3df	kvm: Add experimental support for a perf-based execution timer Add support for using the CPU cycle counter instead of a normal POSIX timer to generate timed exits to gem5. This should, in theory, provide better resolution when requesting timer signals. The perf-based timer requires a fairly recent kernel since it requires a working PERF_EVENT_IOC_PERIOD ioctl. This ioctl has existed in the kernel for a long time, but it used to be completely broken due to an inverted match when the kernel copied things from user space. Additionally, the ioctl does not change the sample period correctly on all kernel versions which implement it. It is currently only known to work reliably on kernel version 3.7 and above on ARM.	2013-04-22 13:20:32 -04:00
Andreas Sandberg	2607efded8	kvm: Avoid synchronizing the TC on every KVM exit Reduce the number of KVM->TC synchronizations by overloading the getContext() method and only request an update when the TC is requested as opposed to every time KVM returns to gem5.	2013-04-22 13:20:32 -04:00
Andreas Sandberg	f485ad1908	kvm: Basic support for hardware virtualized CPUs This changeset introduces the architecture independent parts required to support KVM-accelerated CPUs. It introduces two new simulation objects: KvmVM -- The KVM VM is a component shared between all CPUs in a shared memory domain. It is typically instantiated as a child of the system object in the simulation hierarchy. It provides access to KVM VM specific interfaces. BaseKvmCPU -- Abstract base class for all KVM-based CPUs. Architecture dependent CPU implementations inherit from this class and implement the following methods: * updateKvmState() -- Update the architecture-dependent KVM state from the gem5 thread context associated with the CPU. * updateThreadContext() -- Update the thread context from the architecture-dependent KVM state. * dump() -- Dump the KVM state (optional). In order to deliver interrupts to the guest, CPU implementations typically override the tick() method and check for, and deliver, interrupts prior to entering KVM. Hardware-virtualized CPUs currently have the following limitations: * SE mode is not supported. * PC events are not supported. * Timing statistics are currently very limited. The current approach simply scales the host cycles with a user-configurable factor. * The simulated system must not contain any caches. * Since cycle counts are approximate, there is no way to request an exact number of cycles (or instructions) to be executed by the CPU. * Hardware virtualized CPUs and gem5 CPUs must not execute at the same time in the same simulator instance. * Only single-CPU systems can be simulated. * Remote GDB connections to the guest system are not supported. Additionally, m5ops requires an architecture specific interface and might not be supported.	2013-04-22 13:20:32 -04:00
Timothy M. Jones	005616518c	cpu: Let python scripts obtain the number of instructions executed	2013-04-22 13:20:31 -04:00
Andreas Sandberg	5f2361f3af	arm: Enable support for triggering a sim panic on kernel panics Add the options 'panic_on_panic' and 'panic_on_oops' to the LinuxArmSystem SimObject. When these options are enabled, the simulator panics when the guest kernel panics or oopses. Enable panic on panic and panic on oops in ARM-based test cases.	2013-04-22 13:20:31 -04:00
Dam Sunwoo	e8381142b0	sim: separate nextCycle() and clockEdge() in clockedObjects Previously, nextCycle() could return the current cycle if the current tick was already aligned with the clock edge. This behavior is not only confusing (not quite what the function name implies), but also caused problems in the drainResume() function. When exiting/re-entering the sim loop (e.g., to take checkpoints), the CPUs will drain and resume. Due to the previous behavior of nextCycle(), the CPU tick events were being rescheduled in the same ticks that were already processed before draining. This caused divergence from runs that did not exit/re-enter the sim loop. (Initially a cycle difference, but a significant impact later on.) This patch separates out the two behaviors (nextCycle() and clockEdge()), uses nextCycle() in drainResume, and uses clockEdge() everywhere else. Nothing (other than name) should change except for the drainResume timing.	2013-04-22 13:20:31 -04:00
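The distinction drawn in the e8381142b0 message can be illustrated with a few lines of Python. This is only a sketch: `clock_edge` and `next_cycle` are hypothetical stand-ins, not gem5's C++ methods, and placing `next_cycle` exactly one period after the edge is an assumption made for illustration.

```python
def clock_edge(cur_tick: int, period: int) -> int:
    """First clock edge at or after cur_tick (may be cur_tick itself)."""
    # Ceiling division rounds cur_tick up to a multiple of the period.
    return -(-cur_tick // period) * period


def next_cycle(cur_tick: int, period: int) -> int:
    """An edge strictly in the future (assumed: one period past clock_edge)."""
    return clock_edge(cur_tick, period) + period
```

With an aligned tick, `clock_edge` returns the current tick, which is the behavior that rescheduled events into already-processed ticks on resume; `next_cycle` never does.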
Dam Sunwoo	2c1e344313	cpu: generate SimPoint basic block vector profiles This patch is based on http://reviews.m5sim.org/r/1474/ originally written by Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout folder) based on start and end addresses of basic blocks. Some comments to the original patch are addressed and hooks are added to create and resume from checkpoints based on instruction counts dictated by external SimPoint analysis tools. SimPoint creation/resuming options will be implemented as a separate patch.	2013-04-22 13:20:31 -04:00
Ali Saidi	c9e4678c16	cpu: fix a switching issue with the o3 cpu. This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for a icache access.	2013-04-22 13:20:31 -04:00
Nilay Vaish	ac778b1d02	o3cpu: commit: changes interrupt handling Currently the commit stage keeps a local copy of the interrupt object. Since the interrupt is usually handled several cycles after the commit stage becomes aware of it, it is possible that the local copy of the interrupt object may not be the interrupt that is actually handled. It is possible that another interrupt occurred in the interval between interrupt detection and interrupt handling. This patch creates a copy of the interrupt just before the interrupt is handled. The local copy is ignored.	2013-03-29 14:05:26 -05:00
Andreas Hansson	08c1835bef	cpu: Remove CpuPort and use MasterPort in the CPU classes This patch changes the port in the CPU classes to use MasterPort instead of the derived CpuPort. The functions of the CpuPort are now distributed across the relevant subclasses. The port accessor functions (getInstPort and getDataPort) now return a MasterPort instead of a CpuPort. This simplifies creating derivative CPUs that do not use the CpuPort.	2013-03-26 14:46:42 -04:00
Andreas Hansson	2ca42cd626	cpu: Avoid including inorder TLBUnit to avoid gcc LTO bug This patch comments out the inclusion of the inorder TLBUnit which is only used in the 9-stage pipeline. With the TLBUnit present, gcc >= 4.6 in combination with LTO ends up throwing away the definition of the TLBUnit destructor, and consequently fails to link. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808 for more details about the bug, and http://gcc.gnu.org/ml/gcc/2012-06/msg00397.html for the discussion thread that also touches on similar issues seen with clang.	2013-03-20 06:41:23 -04:00
Andreas Sandberg	fc6f569d94	cpu: Fix state transition bug in the traffic generator The traffic generator used to incorrectly determine the next state when state 0 had a non-zero probability. Due to the way the next transition was determined, state 0 could never be entered other than as an initial state. This changeset updates the transition() method to correctly handle such cases and cases where the transition matrix is a 1x1 matrix.	2013-03-12 18:41:29 +01:00
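A minimal Python sketch of picking the next state from a row of a transition matrix, written so that state 0 and a 1x1 matrix are both handled. It is not the gem5 implementation; `next_state` and its `draw` parameter are hypothetical names used for illustration.

```python
import random


def next_state(matrix, current, draw=None):
    """Pick the next state from row `current` of a row-stochastic matrix."""
    row = matrix[current]
    if draw is None:
        draw = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    # Walk the cumulative distribution starting at state 0, so state 0
    # is selected whenever the draw falls below its probability.
    for state, prob in enumerate(row):
        cumulative += prob
        if draw < cumulative:
            return state
    # Floating-point rounding fallback: last state with non-zero probability.
    for state in range(len(row) - 1, -1, -1):
        if row[state] > 0:
            return state
    raise ValueError("row has no non-zero transition probability")
```

Passing `draw` explicitly makes the choice deterministic, which is convenient when checking corner cases.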
Ali Saidi	f205d83359	cpu: fix a switching issue with the o3 cpu. This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for a icache access.	2013-03-04 23:33:47 -05:00
Andreas Hansson	a62afd094b	scons: Fix warnings issued by clang 3.2svn (XCode 4.6) This patch fixes the warnings that clang3.2svn emits due to the "-Wall" flag. There is one case of an uninitialised value in the ARM neon ISA description, and then a whole range of unused private fields that are pruned.	2013-02-19 05:56:08 -05:00
Andreas Hansson	319443d42d	scons: Add warning for missing declarations This patch enables warnings for missing declarations. To avoid issues with SWIG-generated code, the warning is only applied to non-SWIG code.	2013-02-19 05:56:07 -05:00
Andreas Hansson	c10098f28b	scons: Fix up numerous warnings about name shadowing This patch addresses the most important name shadowing warnings (as produced when using gcc/clang with -Wshadow). There are many locations where constructor parameters and function parameters shadow local variables, but these are left unchanged.	2013-02-19 05:56:06 -05:00
Andreas Hansson	5c7ebee434	x86: Move APIC clock divider to Python This patch moves the 16x APIC clock divider to the Python code to avoid the post-instantiation modifications to the clock. The x86 APIC was the only object setting the clock after creation time and this required some custom functionality and configuration. With this patch, the clock multiplier is moved to the Python code and the objects are instantiated with the appropriate clock.	2013-02-19 05:56:06 -05:00
Andreas Hansson	0622f30961	mem: Add predecessor to SenderState base class This patch adds a predecessor field to the SenderState base class to make the process of linking them up more uniform, and enable a traversal of the stack without knowing the specific type of the subclasses. There are a number of simplifications done as part of changing the SenderState, particularly in the RubyTest.	2013-02-19 05:56:05 -05:00
Andreas Sandberg	3af59ab386	cpu: Document exec trace flags	2013-02-15 17:40:10 -05:00
Geoffrey Blake	ca96e7bff1	cpu: Avoid duplicate entries in tracking structures for writes to misc regs setMiscReg currently makes a new entry for each write to a misc reg without checking for duplicates. This can trigger an assert if an instruction gets replayed and writes to the same misc regs multiple times. This fix prevents duplicate entries and instead updates the value.	2013-02-15 17:40:10 -05:00
Geoffrey Blake	8e79c68936	cpu: Fix rename mis-handling serializing instructions when resource constrained The rename can mis-handle serializing instructions (i.e. strex) if it gets into a resource constrained situation and the serializing instruction has to be placed on the skid buffer to handle blocking. In this situation the instruction informs the pipeline it is serializing and logs that the next instruction must be serialized, but since we are blocking the pipeline defers this action to place the serializing instruction and incoming instructions into the skid buffer. When resuming from blocking, rename will pull the serializing instruction from the skid buffer and the current logic will see this as the "next" instruction that has to be serialized and because of flags set on the serializing instruction, it passes through the pipeline stage as normal and resets rename to non-serializing. This causes instructions to follow the serializing inst incorrectly and eventually leads to an error in the pipeline. To fix this rename should check first if it has to block before checking for serializing instructions.	2013-02-15 17:40:10 -05:00
Matt Horsnell	e88e7d88b9	o3: fix tick used for renaming and issue with range selection Fixes the tick used from rename: - previously this gathered the tick on leaving rename which was always 1 less than the dispatch. This conflated the decode ticks when back pressure built in the pipeline. - now picks up tick on entry. Added --store_completions flag: - will additionally display the store completion tail in the viewer. - this highlights periods when large numbers of stores are outstanding (>16 LSQ blocking) Allows selection by tick range (previously this caused an infinite loop)	2013-02-15 17:40:09 -05:00
Andreas Sandberg	b904bd5437	sim: Add a system-global option to bypass caches Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virtualized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches. To make memory mode tests cleaner, the following methods are added to the System class: * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'. * isTimingMode() -- True if the memory mode is 'timing'. * bypassCaches() -- True if caches should be bypassed. The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.	2013-02-15 17:40:09 -05:00
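The three predicates listed in b904bd5437 can be sketched in Python as below. This is an illustration, not gem5's C++ System class: the `MemoryMode` enum and the function names are hypothetical, and the sketch assumes that 'direct' in the list refers to the new 'atomic_noncaching' mode.

```python
from enum import Enum


class MemoryMode(Enum):
    # Hypothetical enum; values follow the mode names used in the messages.
    INVALID = "invalid"
    ATOMIC = "atomic"
    TIMING = "timing"
    ATOMIC_NONCACHING = "atomic_noncaching"


def is_atomic_mode(mode: MemoryMode) -> bool:
    """True for both atomic variants (assumed reading of 'atomic' or 'direct')."""
    return mode in (MemoryMode.ATOMIC, MemoryMode.ATOMIC_NONCACHING)


def is_timing_mode(mode: MemoryMode) -> bool:
    return mode is MemoryMode.TIMING


def bypass_caches(mode: MemoryMode) -> bool:
    """Only the non-caching atomic mode bypasses caches in this sketch."""
    return mode is MemoryMode.ATOMIC_NONCACHING
```

Callers then test a property of the mode rather than comparing against a specific mode value, which is what lets a new mode be added without touching every CPU model.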
Andreas Sandberg	1eec115c31	cpu: Refactor memory system checks CPUs need to test that the memory system is in the right mode in two places, when the CPU is initialized (unless it's switched out) and on a drainResume(). This led to some code duplication in the CPU models. This changeset introduces the verifyMemoryMode() method which is called by BaseCPU::init() if the CPU isn't switched out. The individual CPU models are responsible for calling this method when resuming from a drain as this code is CPU model specific.	2013-02-15 17:40:08 -05:00
Andreas Sandberg	7f1263f144	cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy Checker CPUs currently don't inherit from the CheckerCPU in the Python object hierarchy. This has two consequences: * It makes CPU model discovery from the Python world somewhat complicated as there is no way of testing if a CPU is a checker. * Parameters are duplicated in the checker configuration specification. This changeset makes all checker CPUs inherit from the base checker CPU class.	2013-02-15 17:40:08 -05:00
Andreas Sandberg	7cd1fd4324	cpu: Add CPU metadata to the Python classes The configuration scripts currently hard-code the requirements of each CPU. This is clearly not optimal as it makes writing new configuration scripts painful and adding new CPU models requires existing scripts to be updated. This patch adds the following class methods to the base CPU and all relevant CPUs: * memory_mode -- Return a string describing the current memory mode (invalid/atomic/timing). * require_caches -- Does the CPU model require caches? * support_take_over -- Does the CPU support CPU handover?	2013-02-15 17:40:08 -05:00
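A rough Python sketch of how class-level metadata of this kind lets a script query CPU requirements instead of hard-coding them. The classes `SketchBaseCPU`, `SketchAtomicCPU`, and `SketchTimingCPU`, the helper `pick_memory_mode`, and all return values are hypothetical; only the three method names come from the message.

```python
class SketchBaseCPU:
    """Hypothetical base class; defaults are illustrative only."""

    @classmethod
    def memory_mode(cls) -> str:
        return "invalid"

    @classmethod
    def require_caches(cls) -> bool:
        return False

    @classmethod
    def support_take_over(cls) -> bool:
        return False


class SketchAtomicCPU(SketchBaseCPU):
    @classmethod
    def memory_mode(cls) -> str:
        return "atomic"

    @classmethod
    def support_take_over(cls) -> bool:
        return True


class SketchTimingCPU(SketchBaseCPU):
    @classmethod
    def memory_mode(cls) -> str:
        return "timing"


def pick_memory_mode(cpu_classes) -> str:
    """Return the single memory mode shared by all given CPU classes."""
    modes = {cls.memory_mode() for cls in cpu_classes}
    if len(modes) != 1:
        raise ValueError(f"CPU classes disagree on memory mode: {sorted(modes)}")
    return modes.pop()
```

Because the methods are class methods, a script can inspect a CPU model before any instance exists.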
Ali Saidi	4412046041	cpu: include set in o3/commit_impl. While the majority of compilers seemed to pick up set from elsewhere, one version of gcc 4.7 complains, so explicitly add it.	2013-02-15 17:40:08 -05:00
Ali Saidi	7ae06a3b3b	cpu: fix case with o3 cpu blocking and unblocking decode in cycle Fix a case in the O3 CPU where the decode stage blocks and unblocks in a single cycle sending both signals to fetch which causes an assert or worse. The previous check could never work before since the status was set to Blocked before a test for the status being Unblocking was executed.	2013-02-15 17:40:08 -05:00
Ali Saidi	b84bd3028c	cpu: Fix a livelock in the o3 cpu. Check if an instruction just enabled interrupts and we've previously had an interrupt pending that was not handled because interrupts were subsequently disabled before the pipeline reached a place to handle the interrupt. In that case squash now to make sure the interrupt is handled.	2013-02-15 17:40:07 -05:00
Nilay Vaish, Timothy Jones <timothy.jones@cl.cam.ac.uk>	dbeabedaf0	branch predictor: move out of o3 and inorder cpus This patch moves the branch predictor files in the o3 and inorder directories to src/cpu/pred. This allows sharing the branch predictor across different cpu models. This patch was originally posted by Timothy Jones in July 2010 but never made it to the repository. --HG-- rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh	2013-01-24 12:28:51 -06:00
Andrea Pellegrini	11d5ffa108	o3 cpu: fix zero reg problem There was an issue w/ the rename logic, which would assign a previous physical register to the ZeroReg architectural register in x86. This issue was giving problems for instructions squashed in threads w/ ID different from 0, sometimes allowing non-mispredicted instructions to obtain a value different from zero when reading the zeroReg.	2013-01-22 00:13:28 -06:00
Nilay Vaish	fc57ae6401	x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch The changes made by the changeset 270c9a75e91f do not work well with switching of cpus. The problem is that decoder for the old thread context holds state that is not taken over by the new decoder. This patch adds a takeOverFrom() function to Decoder class in each ISA. Except for x86, functions in other ISAs are blank. For x86, the function copies state from the old decoder to the new decoder.	2013-01-22 00:10:10 -06:00
Joel Hestness	1429d21244	O3 IEW: Make incrWb and decrWb clearer Move the increment/decrement of wbOutstanding outside of the comparison in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc 4.4.7, which incorrectly optimizes "-- ==" as "-=".	2013-01-19 15:14:54 -06:00
Nilay Vaish	5b6f972750	ruby: remove calls to g_system_ptr->getTime() This patch further removes calls to g_system_ptr->getTime() wherever other clocked objects are available for providing current time.	2013-01-17 13:10:12 -06:00
Nilay Vaish	f7c0ba406e	base simple cpu: removes commented out code about cache ops	2013-01-12 22:11:16 -06:00
Nilay Vaish	25ec278a0b	x86: Changes to decoder, corrects 9376 The changes made by the changeset 9376 were not quite correct. The patch made changes to the code which resulted in decoder not getting initialized correctly when the state was restored from a checkpoint. This patch adds a startup function to each ISA object. For x86, this function sets the required state in the decoder. For other ISAs, the function is empty right now.	2013-01-12 22:09:48 -06:00
Andreas Sandberg	009970f59b	cpu: Unify the serialization code for all of the CPU models Clean up the serialization code for the simple CPUs and the O3 CPU. The CPU-specific code has been replaced with a (un)serializeThread that serializes the thread state / context of a specific thread. Assuming that the CPU-specific thread state uses the base thread state serialization code, this allows us to restore a checkpoint with any of the CPU models.	2013-01-07 13:05:52 -05:00
Andreas Sandberg	e09e9fa279	cpu: Flush TLBs on switchOut() This changeset inserts a TLB flush in BaseCPU::switchOut to prevent stale translations when doing repeated switching. Additionally, the TLB flushing functionality is exported to the Python to make debugging of switching/checkpointing easier. A simulation script will typically use the TLB flushing functionality to generate a reference trace. The following sequence can be used to simulate a handover (this depends on how drain is implemented, but is generally the case) between identically configured CPU models: m5.drain(test_sys) [ cpu.flushTLBs() for cpu in test_sys.cpu ] m5.resume(test_sys) The generated trace should normally be identical to a trace generated when switching between identically configured CPU models or checkpointing and resuming.	2013-01-07 13:05:48 -05:00
Andreas Sandberg	1814a85a05	cpu: Rewrite O3 draining to avoid stopping in microcode Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust. Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty.	2013-01-07 13:05:46 -05:00
Andreas Sandberg	9e8003148f	cpu: Make sure that a drained atomic CPU isn't executing ucode Currently, the atomic CPU can be in the middle of a microcode sequence when it is drained. This leads to two problems: * When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode. * Since curMacroStaticInst is populated when executing microcode, repeated switching between CPUs executing microcode leads to incorrect execution. After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset fixes a bug where multiple switches to the same atomic CPU sometimes corrupt the target state because of dangling pointers to the currently executing microinstruction. Note: This changeset moves tick event descheduling from switchOut() to drain(), which makes timing consistent between just draining a system and draining /and/ switching between two atomic CPUs. This makes debugging quite a lot easier (execution traces get the same timing), but the latency of the last instruction before a drain will not be accounted for correctly (it will always be 1 cycle). Note 2: This changeset removes the so_state variable, the locked variable, and the tickEvent from checkpoints since none of them contain state that needs to be preserved across checkpoints. The so_state is made redundant because we don't use the drain state variable anymore, the lock variable should never be set when the system is drained, and the tick event isn't scheduled.	2013-01-07 13:05:46 -05:00
Andreas Sandberg	f9bcf46371	cpu: Make sure that a drained timing CPU isn't executing ucode Currently, the timing CPU can be in the middle of a microcode sequence or multicycle (stayAtPC is true) instruction when it is drained. This leads to two problems: * When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode. * If stayAtPC is true we might execute half of an instruction twice when restoring a checkpoint or switching CPUs, which leads to an incorrect execution. After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset also fixes a bug where the timing CPU sometimes switches out while stayAtPC is true, which corrupts the target state after a CPU switch or checkpoint. Note: This changeset removes the so_state variable from checkpoints since the drain state isn't used anymore.	2013-01-07 13:05:46 -05:00
Andreas Sandberg	52ff37caa3	cpu: Fix broken thread context handover The thread context handover code used to break when multiple handovers were performed during the same quiesce period. Previously, the thread contexts would assign the TC pointer in the old quiesce event to the new TC. This obviously broke in cases where multiple switches were performed within the same quiesce period, in which case the TC pointer in the quiesce event would point to an old CPU. The new implementation deschedules pending quiesce events in the old TC and schedules a new quiesce event in the new TC. The code has been refactored to remove most of the code duplication.	2013-01-07 13:05:46 -05:00
Andreas Sandberg	fca4fea769	cpu: Fix O3 LSQ debug dumping constness and formatting	2013-01-07 13:05:46 -05:00
Andreas Sandberg	8db27aa230	cpu: Fix broken squashAfter implementation in O3 CPU Commit can currently both commit and squash in the same cycle. This confuses other stages since the signals coming from the commit stage can only signal either a squash or a commit in a cycle. This changeset changes the behavior of squashAfter so that it commits all instructions, including the instruction that requested the squash, in the first cycle and then starts to squash in the next cycle.	2013-01-07 13:05:45 -05:00
Andreas Sandberg	a2077ccf02	o3 cpu: Remove unused variables	2013-01-07 13:05:45 -05:00
Andreas Sandberg	2cfe62adc4	cpu: Rename defer_registration->switched_out The defer_registration parameter is used to prevent a CPU from initializing at startup, leaving it in the "switched out" mode. The name of this parameter (and the help string) is confusing. This patch renames it to switched_out, which should be more descriptive.	2013-01-07 13:05:45 -05:00
Andreas Sandberg	f7da0fddd1	cpu: Remove unused params.hh header file in inorder CPU	2013-01-07 13:05:45 -05:00
Andreas Sandberg	a7e0cbeb36	cpu: Introduce sanity checks when switching between CPUs This patch introduces the following sanity checks when switching between CPUs: * Check that the set of new and old CPUs do not overlap. Having an overlap between the set of new CPUs and the set of old CPUs is currently not supported. Doing such a switch used to result in the following assertion error: BaseCPU::takeOverFrom(BaseCPU): \ Assertion `!new_itb_port->isConnected()' failed. * Check that all new CPUs are in the switched out state. * Check that all old CPUs are in the switched in state.	2013-01-07 13:05:44 -05:00
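The three checks described in a7e0cbeb36 can be sketched in Python as follows. This is an illustration under assumptions, not the gem5 code: `SketchCPU`, its `switched_out` attribute, and `check_cpu_switch` are hypothetical names.

```python
class SketchCPU:
    """Hypothetical stand-in for a CPU object with a switched-out flag."""

    def __init__(self, name: str, switched_out: bool):
        self.name = name
        self.switched_out = switched_out


def check_cpu_switch(cpu_pairs):
    """Validate a list of (old_cpu, new_cpu) pairs before switching."""
    old_cpus = [old for old, _ in cpu_pairs]
    new_cpus = [new for _, new in cpu_pairs]

    # 1. The sets of old and new CPUs must not overlap.
    for new in new_cpus:
        if any(new is old for old in old_cpus):
            raise ValueError(f"{new.name} is both an old and a new CPU")

    # 2. Every new CPU must currently be switched out.
    for new in new_cpus:
        if not new.switched_out:
            raise ValueError(f"new CPU {new.name} is not switched out")

    # 3. Every old CPU must currently be switched in.
    for old in old_cpus:
        if old.switched_out:
            raise ValueError(f"old CPU {old.name} is already switched out")
```

Raising a descriptive error up front replaces the opaque assertion failure quoted in the message.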
Andreas Sandberg	901258c22b	cpu: Correctly call parent on switchOut() and takeOverFrom() This patch cleans up the CPU switching functionality by making sure that CPU models consistently call the parent on switchOut() and takeOverFrom(). This has the following implications that might alter current functionality: * The call to BaseCPU::switchout() in the O3 CPU is moved from signalDrained() (!) to switchOut(). * A call to BaseSimpleCPU::switchOut() is introduced in the simple CPUs.	2013-01-07 13:05:44 -05:00
Andreas Sandberg	4ae02295d5	cpu: Unify SimpleCPU and O3 CPU serialization code The O3 CPU used to copy its thread context to a SimpleThread in order to do serialization. This was a bit of a hack involving two static SimpleThread instances and a magic constructor that was only used by the O3 CPU. This patch moves the ThreadContext serialization code into two global procedures that, in addition to the normal serialization parameters, take a ThreadContext reference as a parameter. This allows us to reuse the serialization code in all ThreadContext implementations.	2013-01-07 13:05:44 -05:00
Andreas Sandberg	6daada2701	cpu: Initialize the O3 pipeline from startup() The entire O3 pipeline used to be initialized from init(), which is called before initState() or unserialize(). This causes the pipeline to be initialized from an incorrect thread context. This doesn't currently lead to correctness problems as instructions fetched from the incorrect start PC will be squashed a few cycles after initialization. This patch will affect the regressions since the O3 CPU now issues its first instruction fetch to the correct PC instead of 0x0.	2013-01-07 13:05:44 -05:00
Andreas Sandberg	e2dad8236a	cpu: Implement a flat register interface in thread contexts Some architectures map registers differently depending on their mode of operations. There is currently no architecture independent way of accessing all registers. This patch introduces a flat register interface to the ThreadContext class. This interface is useful, for example, when serializing or copying thread contexts.	2013-01-07 13:05:44 -05:00
Andreas Sandberg	17b47d35e1	arch: Move the ISA object to a separate section After making the ISA an independent SimObject, it is serialized automatically by the Python world. Previously, this just resulted in an empty ISA section. This patch moves the contents of the ISA to that section and removes the explicit ISA serialization from the thread contexts, which makes it behave like a normal SimObject during serialization. Note: This patch breaks checkpoint backwards compatibility! Use the cpt_upgrader.py utility to upgrade old checkpoints to the new format.	2013-01-07 13:05:42 -05:00
Andreas Sandberg	7eb0fb8b6e	cpu: Check that the memory system is in the correct mode This patch adds checks to all CPU models to make sure that the memory system is in the correct mode at startup and when resuming after a drain. Previously, we only checked that the memory system was in the right mode when resuming. This is inadequate since this is a configuration error that should be detected at startup as well as when resuming. Additionally, since the check was done using an assert, it wasn't performed when NDEBUG was set (e.g., the fast target).	2013-01-07 13:05:41 -05:00
Andreas Hansson	ccb6c64047	cpu: Share the send functionality between traffic generators This patch moves the packet creating and sending to a member function in the shared base class to avoid code duplication.	2013-01-07 13:05:37 -05:00
Andreas Hansson	1da209140c	cpu: Add support for protobuf input for the trace generator This patch adds support for reading input traces encoded using protobuf according to what is done in the CommMonitor. A follow-up patch adds a Python script that can be used to convert the previously used ASCII traces to protobuf equivalents. The appropriate regression input is updated as part of this patch.	2013-01-07 13:05:37 -05:00
Andreas Hansson	35bdee72cb	cpu: Encapsulate traffic generator input in a stream This patch encapsulates the traffic generator input in a stream class such that the parsing is not visible to the trace generator. The change takes us one step closer to using protobuf-based input traces for the trace replay. The functionality of the current input stream is identical to what it was, and the ASCII format remains the same for now.	2013-01-07 13:05:37 -05:00
Andreas Hansson	f22d3bb9c3	cpu: Fix the traffic gen read percentage This patch fixes the computation that determines whether to perform a read or a write such that the two corner cases (0 and 100) are both more efficient and handled correctly.	2013-01-07 13:05:35 -05:00
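A small Python sketch of a read/write decision that treats the two corner cases explicitly. It is not the traffic generator's code; `is_read` and its parameters are hypothetical names chosen for illustration.

```python
import random


def is_read(read_percent: int, rng: random.Random) -> bool:
    """Decide whether the next request is a read, given a percentage 0..100."""
    if not 0 <= read_percent <= 100:
        raise ValueError("read_percent must be between 0 and 100")
    # Corner cases: no random draw is needed, and the answer is exact.
    if read_percent == 0:
        return False
    if read_percent == 100:
        return True
    # randrange(100) yields 0..99, so exactly read_percent of the
    # possible values select a read.
    return rng.randrange(100) < read_percent
```

Skipping the draw in the corner cases covers both points the message makes: it is cheaper, and it cannot produce the wrong request type.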
Andreas Sandberg	3db3f83a5e	arch: Make the ISA class inherit from SimObject The ISA class stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers. This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU.	2013-01-07 13:05:35 -05:00
Ali Saidi	69d419f313	o3: Fix issue with LLSC ordering and speculation This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked address. Previously we relied on the cache to handle the locking for us, however some users on the gem5 mailing list reported a case where the cpu speculatively executes a ll operation after a pending sc operation in the pipeline and that makes the cache monitor valid. This should handle that case by invalidating the local monitor.	2013-01-07 13:05:33 -05:00
Ali Saidi	5146a69835	cpu: rename the misleading inSyscall to noSquashFromTC inSyscall was originally created because during handling of a syscall in SE mode the threadcontext had to be updated. However, in many places this is used in FS mode (e.g. fault handlers) and the name doesn't make much sense. The boolean actually stops gem5 from squashing speculative and non-committed state when a write to a threadcontext happens, so rename the variable to something more appropriate.	2013-01-07 13:05:33 -05:00
Gabe Black	e17c375ddd	Decoder: Remove the thread context get/set from the decoder. This interface is no longer used, and getting rid of it simplifies the decoders and code that sets up the decoders. The thread context had been used to read architectural state which was used to contextualize the instruction memory as it came in. That was changed so that the state is now sent to the decoders to keep locally if/when it changes. That's significantly more efficient. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2013-01-04 19:00:45 -06:00
Nilay Vaish	c120273708	ruby: modify the directed tester to read/write streams The directed tester supports generating only read or only write accesses. The patch modifies the tester to support streams that have both read and write accesses.	2012-12-11 10:05:55 -06:00
Erik Tomusk	3dc7e4f496	TournamentBP: Fix some bugs with table sizes and counters globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled. globalHistoryBits controls how much history is kept, global and choice predictor sizes control how much of that history is used when accessing predictor tables. This way, global and choice predictors can actually be different sizes, and it is no longer possible to walk off the predictor arrays and cause a seg fault. There are now individual thresholds for choice, global, and local saturating counters, so that taken/not taken decisions are correct even when the predictors' counters' sizes are different. The interface for localPredictorSize has been removed from TournamentBP because the value can be calculated from localHistoryBits. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2012-12-06 09:31:06 -06:00
Malek Musleh	150e9b8c68	inorder cpu: add missing DPRINTF argument Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2012-12-06 05:25:40 -06:00
Nathanael Premillieu	eb899407c5	o3 cpu: remove some unused buggy functions in the lsq Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2012-12-06 04:36:51 -06:00
Andreas Sandberg	b81a977e6a	sim: Move the draining interface into a separate base class This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager.	2012-11-02 11:32:01 -05:00
Andreas Sandberg	eb703a4b4e	cpu: O3 add a header declaring the DerivO3CPU SWIG needs a complete declaration of all wrapped objects. This patch adds a header file with the DerivO3CPU class and includes it in the SWIG interface. --HG-- rename : src/cpu/o3/cpu_builder.cc => src/cpu/o3/deriv.cc	2012-11-02 11:32:01 -05:00
Andreas Sandberg	ebe65a394b	cpu: Add header files for checker CPUs In order to create reliable SWIG wrappers, we need to include the declaration of the wrapped class in the SWIG file. Previously, we didn't expose the declaration of checker CPUs. This patch adds header files for such CPUs and include them in the SWIG wrapper. --HG-- rename : src/cpu/dummy_checker_builder.cc => src/cpu/dummy_checker.cc rename : src/cpu/o3/checker_builder.cc => src/cpu/o3/checker.cc	2012-11-02 11:32:01 -05:00
Andreas Sandberg	c0ab52799c	sim: Include object header files in SWIG interfaces When casting objects in the generated SWIG interfaces, SWIG uses classical C-style casts ( (Foo *)bar; ). In some cases, this can degenerate into the equivalent of a reinterpret_cast (mainly if only a forward declaration of the type is available). This usually works for most compilers, but it is known to break if multiple inheritance is used anywhere in the object hierarchy. This patch introduces the cxx_header attribute to Python SimObject definitions, which should be used to specify a header to include in the SWIG interface. The header should include the declaration of the wrapped object. We currently don't enforce the use of the header attribute, but a warning will be generated for objects that do not use it.	2012-11-02 11:32:01 -05:00
Dam Sunwoo	81406018b0	ARM: dump stats and process info on context switches This patch enables dumping statistics and Linux process information on context switch boundaries (__switch_to() calls) that are used for Streamline integration (a graphical statistics viewer from ARM).	2012-11-02 11:32:01 -05:00
Mrinmoy Ghosh	4440332bdd	o3: Fix a couple of issues with the local predictor. Fix some issues with the local predictor and the way it's indexed.	2012-11-02 11:32:00 -05:00
Nilay Vaish	07ce90f7aa	memtest: move check on outstanding requests The Memtest tester allows for only one request to be outstanding for a particular physical address. The check has been written separately for reads and writes. This patch moves the check earlier than its current position so that it need not be written separately for reads and writes.	2012-10-15 17:27:17 -05:00
Andreas Hansson	2a740aa096	Port: Add protocol-agnostic ports in the port hierarchy This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocol-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations. The functions used in the binding of ports, i.e. getMaster/SlavePort now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default.	2012-10-15 08:12:35 -04:00
Andreas Hansson	93a159875a	Fix: Address a few minor issues identified by cppcheck This patch addresses a number of smaller issues identified by the code inspection utility cppcheck. There are a number of identified leaks in the arm/linux/system.cc (although the function only gets called once so it is not a major problem), a few deletes in dev/x86/i8042.cc that were not array deletes, and sprintfs where the character array had one element less than needed. In the IIC tags there was a function allocating an array of longs which is in fact never used.	2012-10-15 08:12:23 -04:00
Andreas Hansson	1c321b8847	Regression: Use CPU clock and 32-byte width for L1-L2 bus This patch changes the CoherentBus between the L1s and L2 to use the CPU clock and also four times the width compared to the default bus. The parameters are not intended to fit every single scenario, but rather serve as a better starting point than what we previously had. Note that the scripts that do not use the addTwoLevelCacheHiearchy are not affected by this change. A separate patch will update the stats.	2012-10-15 08:08:08 -04:00
Ali Saidi	5adb4ddc12	O3: Pack the comm structures a bit better to reduce their size.	2012-09-25 11:49:40 -05:00
Ali Saidi	0c99d21ad7	ARM: Squash outstanding walks when instructions are squashed.	2012-09-25 11:49:40 -05:00
Andreas Sandberg	6598241f2c	sim: Move CPU-specific methods from SimObject to the BaseCPU class	2012-09-25 11:49:40 -05:00
Djordje Kovacevic	d060a28a29	CPU: Add abandoned instructions to O3 Pipe Viewer	2012-09-25 11:49:40 -05:00
Andreas Hansson	d75b1b5a73	TrafficGen: Add a basic traffic generator This patch adds a traffic generator to the code base. The generator is aimed to be used as a black box model to create appropriate use-cases and benchmarks for the memory system, and in particular the interconnect and the memory controller. The traffic generator is a master module, where the actual behaviour is captured in a state-transition graph where each state generates some sort of traffic. By constructing a graph it is possible to create very elaborate scenarios from basic generators. Currently the set of generators includes idling, linear address sweeps, random address sequences and playback of traces (recording will be done by the Communication Monitor in a follow-up patch). At the moment the graph and the states are described in an ad-hoc line-based format, and in the future this should be aligned with our use of e.g. the Google protobufs. Similarly for the traces, the format is currently a simplistic ad-hoc line-based format that merely serves as a starting point. In addition to being used as a black-box model for system components, the traffic generator is also useful for creating test cases and regressions for the interconnect and memory system. In future patches we will use the traffic generator to create DRAM test cases for the controller model. The patch following this one adds a basic regression which also contains an example configuration script and trace file for playback.	2012-09-21 11:48:08 -04:00
Andreas Hansson	ffb6aec603	AddrRange: Transition from Range<T> to AddrRange This patch takes the final plunge and transitions from the templated Range class to the more specific AddrRange. In doing so it changes the obvious Range<Addr> to AddrRange, and also bumps the range_map to be AddrRangeMap. In addition to the obvious changes, including the removal of redundant includes, this patch also does some house keeping in preparing for the introduction of address interleaving support in the ranges. The Range class is also stripped of all the functionality that is never used. --HG-- rename : src/base/range.hh => src/base/addr_range.hh rename : src/base/range_map.hh => src/base/addr_range_map.hh	2012-09-19 06:15:44 -04:00
Joel Hestness	16dcb723c1	Base CPU: Initialize profileEvent to NULL The profileEvent pointer is tested against NULL in various places, but it is not initialized unless running in full-system mode. In SE mode, this can result in segmentation faults when profileEvent default-initializes to something other than NULL.	2012-09-12 21:40:28 -05:00
Anthony Gutierrez	c6927ed138	stats: remove duplicate instruction stats from the commit stage these stats are duplicates of insts/opsCommitted, cause confusion, and are poorly named.	2012-09-12 11:35:52 -04:00
Nilay Vaish	f00347a20f	Ruby: Use uint8_t instead of uint8 everywhere	2012-09-11 09:23:56 -05:00
Ali Saidi	03ff612054	O3: Get rid of incorrect assert in RAS.	2012-09-07 14:20:53 -05:00
Andreas Hansson	287ea1a081	Param: Transition to Cycles for relevant parameters This patch is a first step to using Cycles as a parameter type. The main affected modules are the CPUs and the Ruby caches. There are definitely plenty more places that are affected, but this patch serves as a starting point to making the transition. An important part of this patch is to actually enable parameters to be specified as Param.Cycles which involves some changes to params.py.	2012-09-07 12:34:38 -04:00
Andreas Hansson	0cacf7e817	Clock: Add a Cycles wrapper class and use where applicable This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time. Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though. The changes made as part of this patch mostly follow directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles. In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words. An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes.	2012-08-28 14:30:33 -04:00
Andreas Hansson	d53d04473e	Clock: Rework clocks to avoid tick-to-cycle transformations This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future. In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch. Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch. As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well.	2012-08-28 14:30:31 -04:00
Andreas Hansson	d14e5857c7	Port: Stricter port bind/unbind semantics This patch tightens up the semantics around port binding and checks that the ports that are being bound are currently not connected, and similarly connected before unbind is called. The patch consequently also changes the order of the unbind and bind for the switching of CPUs to ensure that the rules are adhered to. Previously the ports would be "over-written" without any check. There are no changes in behaviour due to this patch, and the only place where the unbind functionality is used is in the CPU.	2012-08-28 14:30:27 -04:00
Andreas Hansson	105ad88d35	Checker: Fix checker CPU ports This patch updates how the checker CPU handles the ports such that the regressions will once again run without causing a panic. A minor amount of tidying up was also done as part of this patch.	2012-08-28 14:30:24 -04:00
Nilay Vaish	9190940511	Ruby: Remove RubyEventQueue This patch removes RubyEventQueue. Consumer objects now rely on RubySystem or themselves for scheduling events.	2012-08-27 01:00:55 -05:00
Andreas Hansson	c60db56741	Packet: Remove NACKs from packet and its use in endpoints This patch removes the NACK from the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that). The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occurred in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe. Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up.	2012-08-22 11:39:59 -04:00
Andreas Hansson	a81c969529	CPU: Remove overloaded function_trace_start parameter This patch removes the overloading of the parameter, which seems both redundant, and possibly incorrect. The inorder CPU is particularly interesting as it uses a different name for the parameter, and never makes any use of it internally.	2012-08-21 05:49:43 -04:00
Andreas Hansson	016593f2e9	Clock: Make Tick unsigned and remove UTick This patch makes the Tick unsigned and removes the UTick typedef. The ticks should never be negative, and there was only one major issue with removing it, caused by the o3 CPU using a -1 as an initial value. The patch has no impact on any regressions.	2012-08-21 05:49:09 -04:00
Andreas Hansson	452217817f	Clock: Move the clock and related functions to ClockedObject This patch moves the clock of the CPU, bus, and numerous devices to the new class ClockedObject, that sits in between the SimObject and MemObject in the class hierarchy. Although there are currently a fair amount of MemObjects that do not make use of the clock, they potentially should do so, e.g. the caches should at some point have the same clock as the CPU, potentially with a 1:n ratio. This patch does not introduce any new clock objects or object hierarchies (clusters, clock domains etc), but is still a step in the direction of having a more structured approach to clock domains. The most contentious part of this patch is the serialisation of clocks that some of the modules (but not all) did previously. This serialisation should not be needed as the clock is set through the parameters even when restoring from the checkpoint. In other words, the state is "stored" in the Python code that creates the modules. The nextCycle methods are also simplified and the clock phase parameter of the CPU is removed (this could be part of a clock object once they are introduced).	2012-08-21 05:49:01 -04:00
Anthony Gutierrez	0b3897fc90	O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements. This is an incremental fix as there are still a few bugs/mem leaks with the switchout code. Particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation.	2012-08-15 10:38:08 -04:00
Steve Reinhardt	73ef8bd168	process: add progName() virtual function This replaces a (potentially uninitialized) string field with a virtual function so that we can have a safe interface without requiring changes to the eio code.	2012-08-06 16:55:34 -07:00
Anthony Gutierrez	8133f2460f	checker: make checker cpu id match its host's cpu id when using the checker I ran into problems where an instruction reading the cpu id register failed because the ids did not match, and hence, the result of the instruction did not match. this patch ensures that the ids match so this instruction does not fail. this problem only seemed to manifest itself when multiple cores were in the system, either multi-core, or extra switched-out cores present in the system.	2012-07-27 16:08:04 -04:00
Brad Beckmann	6f9bd33b73	ruby: remove the cpu assumptions for the random tester	2012-07-10 22:51:54 -07:00
Brad Beckmann	4a52a6ea2d	cpu: added assertions to ensure the correct proxies are used	2012-07-10 22:51:53 -07:00
Andreas Hansson	b265d9925c	Port: Align port names in C++ and Python This patch is a first step to align the port names used in the Python world and the C++ world. Ultimately it serves to make the use of config.json together with output from the simulation easier, including post-processing of statistics. Most notably, the CPU, cache, and bus are addressed in this patch, and there might be other ports that should be updated accordingly. The dash name separator has also been replaced with a "." which is what is used to concatenate the names in python, and a separation is made between the master and slave port in the bus.	2012-07-09 12:35:39 -04:00
Andreas Hansson	17f9270dad	Port: Move retry from port base class to Master/SlavePort This patch is the last part of moving all protocol-related functionality out of the Port base class. All the send/recv functions are already moved, and the retry (which still governs all the timing transport functions) is the only part that remained in the base class. The only point where this currently causes a bit of inconvenience is in the bus where the retry list is global and holds Port pointers (not Master/SlavePort). This is about to change with the split into a request/response bus and will soon be removed anyway. The patch has no impact on any regressions.	2012-07-09 12:35:31 -04:00
Andreas Hansson	ff5718f042	Fix: Address a few benign memory leaks This patch is the result of static analysis identifying a number of memory leaks. The leaks are all benign as they are a result of not deallocating memory in the destructor. The fix still has value as it removes false positives in the static analysis.	2012-07-09 12:35:30 -04:00
Nathanael Premillieu	af2b14a362	O3: Track if the RAS has been pushed or not to pop the RAS if necessary. Add new flag (named pushedRAS) in the PredictorHistory structure. This flag tracks whether the RAS has been pushed or not during a prediction. Then, in the squash function it is used to pop the RAS if necessary.	2012-06-29 11:18:29 -04:00
Andreas Hansson	754a9570f2	Timing CPU: Remove a redundant port pointer This patch is trivial and merely prunes a pointer that was never set or used.	2012-06-08 12:45:24 -04:00
Anthony Gutierrez	d6da3ff317	cpu: Don't init simple and inorder CPUs if they are deferred. initCPU() will be called to initialize switched out CPUs for the simple and inorder CPU models. this patch prevents those CPUs from being initialized because they should get their state from the active CPU when it is switched out.	2012-06-05 14:20:13 -04:00
Ali Saidi	20d25b9da7	ISA: Back-out NoopMachInst as a StaticInstPtr change.	2012-06-05 13:52:30 -04:00
Ali Saidi	6df196b71e	O3: Clean up the O3 structures and try to pack them a bit better. DynInst is extremely large; the hope is that this re-organization will put the most used members close to each other.	2012-06-05 01:23:09 -04:00
Ali Saidi	1b370431d0	sim: Remove FastAlloc While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe. After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc when running twolf for ARM.	2012-06-05 01:23:08 -04:00
Gabe Black	008b17d816	ISA: Turn the ExtMachInst NoopMachinst into the StaticInstPtr NoopStaticInst. This eliminates a use of the ExtMachInst type outside of the ISAs.	2012-06-04 10:57:23 -07:00
Andreas Hansson	0d32940711	Bus: Split the bus into a non-coherent and coherent bus This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses. A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses. A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect. The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses. A bit of minor tidying up has also been done. --HG-- rename : src/mem/bus.cc => src/mem/coherent_bus.cc rename : src/mem/bus.hh => src/mem/coherent_bus.hh rename : src/mem/bus.cc => src/mem/noncoherent_bus.cc rename : src/mem/bus.hh => src/mem/noncoherent_bus.hh	2012-05-31 13:30:04 -04:00
Andreas Hansson	cad802761a	Packet: Unify the use of PortID in packet and port This patch removes the Packet::NodeID typedef and unifies it with the Port::PortId. The src and dest fields in the packet are used to hold a port id (e.g. in the bus), and thus the two should actually be the same. The typedef PortID is now global (in base/types.hh) and aligned with the ThreadID in terms of capitalisation and naming of the InvalidPortID constant. Before this patch, two flags were used for valid destination and source, rather than relying on a named value (InvalidPortID), and this is now redundant, as the src and dest field themselves are sufficient to tell whether the current value is a valid port identifier or not. Consequently, the VALID_SRC and VALID_DST are removed. As part of the cleaning up, a number of int parameters and local variables are updated to use PortID. Note that Ruby still has its own NodeID typedef. Furthermore, the MemObject getMaster/SlavePort still has an int idx parameter with a default value of -1 which should eventually change to PortID idx = InvalidPortID.	2012-05-30 05:29:42 -04:00
Gabe Black	19df4e94ee	ISA,CPU: Generalize and split out the components of the decode cache. This will allow it to be specialized by the ISAs. The existing caching scheme is provided by the BasicDecodeCache in the GenericISA namespace and is built from the generalized components. --HG-- rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc	2012-05-26 13:45:12 -07:00
Gabe Black	0cba96ba6a	CPU: Merge the predecoder and decoder. These classes are always used together, and merging them will give the ISAs more flexibility in how they cache things and manage the process. --HG-- rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc	2012-05-26 13:44:46 -07:00
Gabe Black	eae1e97fb0	ISA: Make the decode function part of the ISA's decoder.	2012-05-25 00:55:24 -07:00
Gabe Black	276f3e9535	CPU: Simplify the implementation of the decode cache. Also reorganize it to make it more amenable to being rearranged later.	2012-05-25 00:54:39 -07:00
Gabe Black	82a228bd43	Decode: Make the Decoder class defined per ISA. --HG-- rename : src/cpu/decode.cc => src/arch/generic/decoder.cc rename : src/cpu/decode.hh => src/arch/generic/decoder.hh	2012-05-25 00:53:37 -07:00
Ali Saidi	4f66bcdd2e	gem5: fix some iterator use and erase bugs	2012-05-10 18:04:27 -05:00
Ali Saidi	5ecaf30219	gem5: fix a number of use after free issues	2012-05-10 18:04:27 -05:00
Andreas Hansson	3fea59e162	MEM: Separate requests and responses for timing accesses This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses. For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the appropriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions). The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly. With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not allow snoop requests to be rejected or stalled. 
All these assumptions are now explicitly part of the port interface itself.	2012-05-01 13:40:42 -04:00
Andreas Hansson	4c92708b48	MEM: Add the PortId type and a corresponding id field to Port This patch introduces the PortId type, moves the definition of INVALID_PORT_ID to the Port class, and also gives every port an id to reflect the fact that each element in a vector port has an identifier/index. Previously the bus and Ruby testers (and potentially other users of the vector ports) added the id field in their port subclasses, and now this functionality is always present as it is moved to the base class.	2012-04-25 10:41:23 -04:00
Gabe Black	a5187f9d96	CPU: Tidy up some formatting and a DPRINTF in the simple CPU base class. Put the { on the same line as the if and put a space between the if and the open paren. Also, use the # format modifier which puts a 0x in front of hex values automatically. If the ExtMachInst type isn't integral and actually prints something more complicated, the # falls away harmlessly and we aren't left with a phantom 0x followed by a bunch of unrelated text.	2012-04-15 12:35:49 -07:00
Andreas Hansson	14edc6013d	Ruby: Use MasterPort base-class pointers where possible This patch simplifies future patches by changing the pointer type used in a number of the Ruby testers to use MasterPort instead of using a derived CpuPort class. There is no reason for using the more specialised pointers, and there is no longer a need to do any casting. With the latest changes to the tester, organising ports as readers and writers, things got a bit more complicated, and the "type" now had to be removed to be able to fall back to using MasterPort rather than CpuPort.	2012-04-14 05:46:59 -04:00
Andreas Hansson	750f33a901	MEM: Remove the Broadcast destination from the packet This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple master (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field. Modules that connect multiplexing components, such as caches and bridges store any existing source and destination field in the sender state as a stack (just as before). The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class. In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing.	2012-04-14 05:45:55 -04:00
Andreas Hansson	dccca0d3a9	MEM: Separate snoops and normal memory requests/responses This patch introduces port access methods that separate snoop requests/responses from normal memory requests/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports. Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all used the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal requests and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below. Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass. Atomic accesses follow the same flow as functional accesses, with requests being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses. 
The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id. In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions.	2012-04-14 05:45:07 -04:00
Andreas Hansson	b6aa6d55eb	clang/gcc: Fix compilation issues with clang 3.0 and gcc 4.6 This patch addresses a number of minor issues that cause problems when compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it avoids using the deprecated ext/hash_map and instead uses unordered_map (and similarly so for the hash_set). To make use of the new STL containers, g++ and clang have to be invoked with "-std=c++0x", and this is now added for all gcc versions >= 4.6, and for clang >= 3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1 unordered_map to avoid the deprecation warning. The addition of c++0x in turn causes a few problems, as the compiler is more stringent and adds a number of new warnings. Below, the most important issues are enumerated: 1) the use of namespaces is more strict, e.g. for isnan, and all headers opening the entire namespace std are now fixed. 2) another issue caused by the more stringent compiler is the narrowing of the embedded python, which used to be a char array, and is now unsigned char since there were values larger than 128. 3) a particularly odd issue that arose with the new c++0x behaviour is found in range.hh, where the operator< causes gcc to complain about the template type parsing (the "<" is interpreted as the beginning of a template argument), and the problem seems to be related to the begin/end members introduced for the range-type iteration, which is a new feature in c++11. As a minor update, this patch also fixes the build flags for the clang debug target that used to be shared with gcc and incorrectly used "-ggdb".	2012-04-14 05:43:31 -04:00
Brad Beckmann	3fd425124c	rubytest: remove spurious printf	2012-04-06 17:51:47 -07:00
Brad Beckmann	0a9f4b950f	rubytest: separated read and write ports. This patch allows the ruby tester to support protocols where the i-cache and d-cache are managed by separate controllers.	2012-04-06 13:47:06 -07:00
Andreas Hansson	b00949d88b	MEM: Enable multiple distributed generalized memories This patch removes the assumption of having a single instance of PhysicalMemory, and enables a distributed memory where the individual memories in the system are each responsible for a single contiguous address range. All memories inherit from an AbstractMemory that encompasses the basic behaviour of a random access memory, and provides untimed access methods. What was previously called PhysicalMemory is now SimpleMemory, and a subclass of AbstractMemory. All future types of memory controllers should inherit from AbstractMemory. To enable e.g. the atomic CPU and RubyPort to access the now distributed memory, the system has a wrapper class, called PhysicalMemory that is aware of all the memories in the system and their associated address ranges. This class thus acts as an infinitely-fast bus and performs address decoding for these "shortcut" accesses. Each memory can specify that it should not be part of the global address map (used e.g. by the functional memories by some testers). Moreover, each memory can be configured to be reported to the OS configuration table, useful for populating ATAG structures, and any potential ACPI tables. Checkpointing support currently assumes that all memories have the same size and organisation when creating and resuming from the checkpoint. A future patch will enable a more flexible re-organisation. --HG-- rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py rename : src/mem/physical.cc => src/mem/abstract_mem.cc rename : src/mem/physical.hh => src/mem/abstract_mem.hh rename : src/mem/physical.cc => src/mem/simple_mem.cc rename : src/mem/physical.hh => src/mem/simple_mem.hh	2012-04-06 13:46:31 -04:00
Tushar Krishna	dbe1608fd5	NetworkTest: remove unnecessary memory allocation	2012-04-05 17:51:26 -04:00
Andreas Hansson	a8e6adb0b1	Atomic: Remove the physmem_port and access memory directly This patch removes the physmem_port from the Atomic CPU and instead uses the system pointer to access the physmem when using the fastmem option. The system already keeps track of the physmem and the valid memory address ranges, and with this patch we merely make use of that existing functionality. As a result of this change, the overloaded getMasterPort in the Atomic CPU can be removed, thus unifying the CPUs.	2012-04-03 03:50:14 -04:00
William Wang	f9d403a7b9	MEM: Introduce the master/slave port sub-classes in C++ This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects. The patch enables us to classify behaviours into the two bins and add assumptions and enforce compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specialisations are to come in later patches. The getPort function is now getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal. The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.	2012-03-30 09:40:11 -04:00
Andreas Hansson	a14013af3a	CPU: Unify initMemProxies across CPUs and simulation modes This patch unifies where initMemProxies is called, in the init() method of each BaseCPU subclass, before TheISA::initCPU is called. Moreover, it also ensures that initMemProxies is called in both full-system and syscall-emulation mode, thus unifying also across the modes. An additional check is added in the ThreadState to ensure that initMemProxies is only called once.	2012-03-30 09:38:35 -04:00
Andreas Hansson	fb395b56dd	Scons: Remove Werror=False in SConscript files This patch removes the overriding of "-Werror" in a handful of cases. The code compiles with gcc 4.6.3 and clang 3.0 without any warnings, and thus without any errors. There are no functional changes introduced by this patch. In the future, rather than bypassing "-Werror", address the warnings.	2012-03-22 06:34:50 -04:00
Andrew Lukefahr	b4e5be717d	O3: Fix sizing of decode to rename skid buffer.	2012-03-21 10:34:06 -05:00
Brian Grayson	565c1de4a8	O3: Fix size of skid buffer between fetch and decode when widths are different	2012-03-21 10:34:05 -05:00
Andreas Hansson	72538294fb	gcc: Clean-up of non-C++0x compliant code, first steps This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeeded by a comma, namespaces closed by a curly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts).	2012-03-19 06:36:09 -04:00
Andreas Hansson	adb8621031	clang: Fix recently introduced clang compilation errors This patch makes the code compile with clang 2.9 and 3.0 again by making two very minor changes. First, it maintains a strict typing in the forward declaration of the BaseCPUParams. Second, it adds a FullSystemInt flag of the type unsigned int next to the boolean FullSystem flag. The FullSystemInt variable can be used in decode-statements (expands to switch statements) in the instruction decoder.	2012-03-19 06:35:04 -04:00
Brian Grayson	98185658c5	O3: Add fatal when fetchWidth > Impl::MaxWidth.	2012-03-11 10:20:54 -04:00
Geoffrey Blake	69d229ce28	O3/Ozone: Eliminate dead code counting software prefetch insts Eliminates dead code in the O3 and Ozone CPU models that counted software prefetch instructions separately for the ALPHA ISA only.	2012-03-09 09:59:28 -05:00
Geoffrey Blake	98cf57fb89	CheckerCPU: Add function stubs to non-ARM ISA source to compile with CheckerCPU Making the CheckerCPU a runtime option requires the code to be compatible with ISAs other than ARM. This patch adds the appropriate function stubs to allow compilation.	2012-03-09 09:59:28 -05:00
Geoffrey Blake	043709fdfa	CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.	2012-03-09 09:59:27 -05:00
Steve Reinhardt	fd2d5ae2af	DynInst: get rid of dead MyHash code. Not sure what this was ever used for, but it doesn't seem used anymore.	2012-03-02 09:17:42 -08:00
Andreas Hansson	32eae8094d	CPU: Check that the interrupt controller is created when needed This patch adds a creation-time check to the CPU to ensure that the interrupt controller is created for the cases where it is needed, i.e. if the CPU is not being switched in later and not a checker CPU. The patch also adds the "createInterruptController" call to a number of the regression scripts.	2012-03-02 09:21:48 -05:00
Nilay Vaish	c80af04d7d	x86: Fix switching of CPUs This patch prevents creation of interrupt controller for cpus that will be switched in later	2012-03-01 11:37:02 -06:00
Andreas Hansson	86c2aad482	Ruby: Simplify tester ports by not using SimpleTimingPort This patch simplifies the master ports used by RubyDirectedTester and RubyTester by avoiding the use of SimpleTimingPort. Neither tester made any use of the functionality offered by SimpleTimingPort besides a trivial implementation of recvFunctional (only snoops) and recvRangeChange (not relevant since there is only one master). The patch does not change or add any functionality, it merely makes the introduction of a master/slave port easier (in a future patch).	2012-02-24 11:48:48 -05:00
Andreas Hansson	485d103255	MEM: Move all read/write blob functions from Port to PortProxy This patch moves the readBlob/writeBlob/memsetBlob from the Port class to the PortProxy class, thus making a clear separation of the basic port functionality (recv/send functional/atomic/timing), and the higher-level functional accessors available on the port proxies. There are only a few places in the code base where the blob functions were used on ports, and they are all for peeking into the memory system without making a normal memory access (in the memtest, and the malta and tsunami pchip). The memtest also exemplifies how easy it is to create a non-translating proxy if desired. The malta and tsunami pchip used a slave port to perform a functional read, and this is now changed to rely on the physProxy of the system (to which they already have a pointer).	2012-02-24 11:46:39 -05:00
Andreas Hansson	9e3c8de30b	MEM: Make port proxies use references rather than pointers This patch is adding a clearer design intent to all objects that would not be complete without a port proxy by making the proxies members rather than dynamically allocated. In essence, if NULL would not be a valid value for the proxy, then we avoid using a pointer to make this clear. The same approach is used for the methods using these proxies, such as loadSections, that now use references rather than pointers to better reflect the fact that NULL would not be an acceptable value (in fact the code would break and that is how this patch started out). Overall the concept of "using a reference to express unconditional composition where a NULL pointer is never valid" could be done on a much broader scale throughout the code base, but for now it is only done in the locations affected by the proxies.	2012-02-24 11:45:30 -05:00
Andreas Hansson	1031b824b9	MEM: Move port creation to the memory object(s) construction This patch moves all port creation from the getPort method to be consistently done in the MemObject's constructor. This is possible thanks to the Swig interface passing the length of the vector ports. Previously there was a mix of: 1) creating the ports as members (at object construction time) and using getPort for the name resolution, or 2) dynamically creating the ports in the getPort call. This is now uniform. Furthermore, objects that would not be complete without a port have these ports as members rather than having pointers to dynamically allocated ports. This patch also enables an elaboration-time enumeration of all the ports in the system which can be used to determine the masterId.	2012-02-24 11:43:53 -05:00
Andreas Hansson	9f07d2ce7e	CPU: Round-two unifying instr/data CPU ports across models This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps simplify the code, make assumptions more explicit, and further ease future patches related to the CPU ports. The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports.	2012-02-24 11:42:00 -05:00
Mrinmoy Ghosh	9b05e96b9e	BPred: Fix RAS to handle predicated call/return instructions. Change RAS to fix issues with predicated call/return instructions. Handled all cases in the life of a predicated call and return instruction.	2012-02-13 12:26:25 -06:00
Mrinmoy Ghosh	fd90c3676d	BP: Fix several Branch Predictor issues. 1. Updates the Branch Predictor correctly to the state just after a mispredicted branch, if a squash occurs. 2. If a BTB does not find an entry, the branch is predicted not taken. The global history is modified to correctly reflect this prediction. 3. Local history is now updated at the fetch stage instead of execute stage. 4. In the Update stage of the branch predictor the local predictors are now correctly updated according to the state of local history during fetch stage. This patch also improves performance by as much as 17% on some benchmarks	2012-02-13 12:26:24 -06:00
Andreas Hansson	5a9a743cfc	MEM: Introduce the master/slave port roles in the Python classes This patch classifies all ports in Python as either Master or Slave and enforces a binding of master to slave. Conceptually, a master (such as a CPU or DMA port) issues requests, and receives responses, and conversely, a slave (such as a memory or a PIO device) receives requests and sends back responses. Currently there is no differentiation between coherent and non-coherent masters and slaves. The classification as master/slave also involves splitting the dual role port of the bus into a master and slave port and updating all the system assembly scripts to use the appropriate port. Similarly, the interrupt devices have to have their int_port split into a master and slave port. The intdev and its children have minimal changes to facilitate the extra port. Note that this patch does not enforce any port typing in the C++ world, it merely ensures that the Python objects have a notion of the port roles and are connected in an appropriate manner. This check is carried out when two ports are connected, e.g. bus.master = memory.port. The following patches will make use of the classifications and specialise the C++ ports into masters and slaves.	2012-02-13 06:43:09 -05:00
Anthony Gutierrez	542d0ceebc	cpu: add separate stats for insts/ops both globally and per cpu model	2012-02-12 16:07:39 -06:00
Ali Saidi	8aaa39e93d	mem: Add a master ID to each request object. This change adds a master id to each request object which can be used to identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python.	2012-02-12 16:07:38 -06:00
Nilay Vaish	6a7a6263e1	O3 CPU: Improve handling of delayed commit flag The delayed commit flag is used in conjunction with interrupt pending flag to figure out whether or not fetch stage should get more instructions. This patch clears this flag when instructions are squashed. Also, in case an interrupt is pending, currently it is not possible to access the instruction cache. This patch allows accessing the cache in case this flag is set.	2012-02-10 08:37:31 -06:00
Nilay Vaish	cd765c23a2	O3 CPU: Strengthen condition for handling interrupts The condition for handling interrupts is to check whether or not the cpu's instruction list is empty. As observed, this can lead to cases in which even though the instruction list is empty, interrupts are handled when they should not be. The condition is being strengthened so that interrupts get handled only when the last committed microop did not have IsDelayedCommit set.	2012-02-10 08:37:30 -06:00
Nilay Vaish	8f7e03d4cf	O3 CPU: Provide the squashing instruction This patch adds a function to the ROB that will get the squashing instruction from the ROB's list of instructions. This squashing instruction is used for figuring out the macroop from which the fetch stage should fetch the microops. Further, a check has been added that if the instructions are to be fetched from the cache maintained by the fetch stage, then the data in the cache should be valid and the PC of the thread being fetched from is the same as the address of the cache block.	2012-02-10 08:37:28 -06:00
Nilay Vaish	0e597e944a	O3 Fetch: Check if PC is pointing to Microcode ROM	2012-02-10 08:37:26 -06:00
Gabe Black	e80ebc308f	SE/FS: Record the system pointer all the time for the simple CPU. This pointer was only being stored in code that came from SE mode. The system pointer is always meaningful and available, so it should always be stored.	2012-02-10 02:05:31 -08:00
Gabe Black	a6246bb047	Checker: Access workload element 0 only if there is an element 0.	2012-02-07 04:44:01 -08:00
Gabe Black	f2b46fdb85	Faults: Turn off arch/faults.hh Because there are no longer architecture independent but specialized functions in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA no longer needs to be able to include them through the switching header file arch/faults.hh. By removing that header file (arch/faults.hh), the potential interface between ISA code and non ISA code is narrowed.	2012-02-07 04:43:21 -08:00
Gabe Black	ea8b347dc5	Merge with head, hopefully the last time for this batch.	2012-01-31 22:40:08 -08:00
Koan-Sin Tan	7d4f187700	clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places.	2012-01-31 12:05:52 -05:00
Andreas Hansson	4fdecae443	Thread: Use inherited baseCpu rather than cpu in SimpleThread This patch is a trivial simplification, removing the cpu pointer from SimpleThread and relying on the baseCpu pointer in ThreadState. The patch does not add or change any functionality, it merely cleans up the code.	2012-01-31 11:50:07 -05:00
Geoffrey Blake	af6aaf2581	CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5 Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.	2012-01-31 07:46:03 -08:00
Gabe Black	e88165a431	Merge with main repository.	2012-01-30 21:07:57 -08:00
Andreas Hansson	ef9fc01073	MEM: Clean-up of Functional/Virtual/TranslatingPort remnants This patch cleans up forward declarations and a member-function prototype that still referred to the old FunctionalPort, VirtualPort and TranslatingPort. There is no change in functionality.	2012-01-30 03:44:25 -05:00
Gabe Black	39f314cc15	Yet another merge with the main repository. --HG-- rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/config.ini => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/config.ini rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/simout => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/simout rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt rename : tests/long/10.linux-boot/ref/x86/linux/pc-o3-timing/system.pc.com_1.terminal => tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/system.pc.com_1.terminal rename : tests/long/00.gzip/ref/x86/linux/o3-timing/config.ini => tests/long/se/00.gzip/ref/x86/linux/o3-timing/config.ini rename : tests/long/00.gzip/ref/x86/linux/o3-timing/simout => tests/long/se/00.gzip/ref/x86/linux/o3-timing/simout rename : tests/long/00.gzip/ref/x86/linux/o3-timing/stats.txt => tests/long/se/00.gzip/ref/x86/linux/o3-timing/stats.txt rename : tests/long/10.mcf/ref/x86/linux/o3-timing/config.ini => tests/long/se/10.mcf/ref/x86/linux/o3-timing/config.ini rename : tests/long/10.mcf/ref/x86/linux/o3-timing/simout => tests/long/se/10.mcf/ref/x86/linux/o3-timing/simout rename : tests/long/10.mcf/ref/x86/linux/o3-timing/stats.txt => tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt rename : tests/long/20.parser/ref/x86/linux/o3-timing/config.ini => tests/long/se/20.parser/ref/x86/linux/o3-timing/config.ini rename : tests/long/20.parser/ref/x86/linux/o3-timing/simout => tests/long/se/20.parser/ref/x86/linux/o3-timing/simout rename : tests/long/20.parser/ref/x86/linux/o3-timing/stats.txt => tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt rename : tests/long/70.twolf/ref/x86/linux/o3-timing/config.ini => tests/long/se/70.twolf/ref/x86/linux/o3-timing/config.ini rename : tests/long/70.twolf/ref/x86/linux/o3-timing/simout => tests/long/se/70.twolf/ref/x86/linux/o3-timing/simout rename : 
tests/long/70.twolf/ref/x86/linux/o3-timing/stats.txt => tests/long/se/70.twolf/ref/x86/linux/o3-timing/stats.txt rename : tests/quick/00.hello/ref/x86/linux/o3-timing/config.ini => tests/quick/se/00.hello/ref/x86/linux/o3-timing/config.ini rename : tests/quick/00.hello/ref/x86/linux/o3-timing/simout => tests/quick/se/00.hello/ref/x86/linux/o3-timing/simout rename : tests/quick/00.hello/ref/x86/linux/o3-timing/stats.txt => tests/quick/se/00.hello/ref/x86/linux/o3-timing/stats.txt	2012-01-29 03:27:15 -08:00
Gabe Black	dc0e629ea1	Implement Ali's review feedback. Try to decrease indentation, and remove some redundant FullSystem checks.	2012-01-29 02:04:34 -08:00
Nilay Vaish	5c2fc35e02	O3 CPU LSQ: Implement TSO This patch makes O3's LSQ maintain total order between stores. Essentially only the store at the head of the store buffer is allowed to be in flight. Only after that store completes, the next store is issued to the memory system. By default, the x86 architecture will have TSO.	2012-01-28 19:09:04 -06:00
Gabe Black	c3d41a2def	Merge with the main repo. --HG-- rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh	2012-01-28 07:24:01 -08:00
Gabe Black	da2a4acc26	Merge yet again with the main repository.	2012-01-16 04:27:10 -08:00
Andreas Hansson	07cf9d914b	MEM: Separate queries for snooping and address ranges This patch simplifies the address-range determination mechanism and also unifies the naming across ports and devices. It further splits the queries for determining if a port is snooping and what address ranges it responds to (aiming towards a separation of cache-maintenance ports and pure memory-mapped ports). Default behaviours are such that most ports do not have to define isSnooping, and master ports need not implement getAddrRanges.	2012-01-17 12:55:09 -06:00
Andreas Hansson	de34e49d15	MEM: Simplify ports by removing EventManager This patch removes the inheritance of EventManager from the ports and moves all responsibility for event queues to the owner. Eventually the event manager should be the interface block, which could either be the structural owner or a subblock like a LSQ in the O3 CPU for example.	2012-01-17 12:55:09 -06:00
Andreas Hansson	b3f930c884	CPU: Moving towards a more general port across CPU models This patch performs minimal changes to move the instruction and data ports from specialised subclasses to the base CPU (to the largest degree possible). Ultimately it serves to make the CPU(s) have a well-defined interface to the memory sub-system.	2012-01-17 12:55:08 -06:00
Andreas Hansson	f85286b3de	MEM: Add port proxies instead of non-structural ports Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy. The following replacements are made: FunctionalPort > PortProxy TranslatingPort > SETranslatingPortProxy VirtualPort > FSTranslatingPortProxy --HG-- rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh	2012-01-17 12:55:08 -06:00
Maximilien Breughe	a7394ad680	inorder: MDU deadlock fix	2012-01-12 10:15:00 -05:00
Nilay Vaish	9957035a42	DPRINTF: Improve some dprintf messages.	2012-01-10 10:15:02 -06:00
Anders Handler	b587d511c3	CPU: Remove Alpha-specific PC alignment check.	2012-01-09 20:05:07 -05:00
Ali Saidi	525d1e46dc	O3: Remove some asserts that no longer seem to be valid.	2012-01-09 18:08:20 -06:00
Ali Saidi	d2c26f402c	O3: Add support of function tracing with O3 CPU.	2012-01-09 18:08:20 -06:00
Andreas Hansson	c2dbfc1d6c	MAC: Make gem5 compile and run on MacOSX 10.7.2 Adaptations to make gem5 compile and run on OSX 10.7.2, with a stock gcc 4.2.1 and the remaining dependencies from macports, i.e. python 2.7.2, swig 2.0.4, mercurial 2.0. The changes include an adaptation of the SConstruct to handle non-library linker flags, and Darwin-specific code to find the memory usage of gem5. A number of Ruby files relied on ambiguous uint (without the 32 suffix) which caused compilation errors.	2012-01-09 18:08:20 -06:00
Gabe Black	241cc0c840	Another merge with the main repository.	2012-01-07 02:16:37 -08:00
Gabe Black	ec936364b7	Merge with the main repository again.	2012-01-07 02:15:35 -08:00
Gabe Black	36a822f08e	Merge with main repository.	2012-01-07 02:10:34 -08:00
Nathan Binkert	6ef9691035	gcc: fix unused variable warnings from GCC 4.6.1 --HG-- extra : rebase_source : f9e22de341493a25ac6106c16ac35c61c128a080	2011-12-13 11:49:27 -08:00
Chris Emmons	5bde1d359f	Output: Add hierarchical output support and cleanup existing codebase. --HG-- extra : rebase_source : 3301137733cdf5fdb471d56ef7990e7a3a865442	2011-12-01 00:15:25 -08:00
Chander Sudanthi	61c14da751	O3: Remove hardcoded tgts_per_mshr in O3CPU.py. There are two lines in O3CPU.py that set the dcache and icache tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr. This patch removes these hardcoded lines from O3CPU.py and sets the default L1 cache mshr targets to 20. --HG-- extra : rebase_source : 6f92d950e90496a3102967442814e97dc84db08b	2011-12-01 00:15:22 -08:00
Ali Saidi	946f7f0f55	ARM: Add support for having a TLB cache. --HG-- extra : rebase_source : 7a5780ab74d7c294682738c7ccb3ce8d56c6fd63	2011-12-01 00:15:22 -08:00
Ali Saidi	1444103998	O3: Add stat that counts how many cycles the O3 cpu was quiesced. --HG-- extra : rebase_source : 043b9307eef3c5b87f8e6370765641e016ed1fa7	2011-12-01 00:15:22 -08:00
Gabe Black	85424bef19	SE/FS: Get rid of includes of config/full_system.hh.	2011-11-18 02:20:22 -08:00
Gabe Black	de21bb93ea	SE/FS: Get rid of FULL_SYSTEM in the CPU directory.	2011-11-18 01:33:28 -08:00
Nilay Vaish	a547cf34b9	Ruby: Remove some unused typedefs This patch removes some of the unused typedefs. It also moves some of the typedefs from Global.hh to TypeDefines.hh. The patch also eliminates the file NodeID.hh.	2011-11-03 22:46:45 -05:00
Gabe Black	8b4a3f4070	SE/FS: Get rid of FULL_SYSTEM in sim.	2011-11-02 02:11:14 -07:00
Gabe Black	b6da5e2086	SE/FS: Get rid of uses of FULL_SYSTEM in Alpha.	2011-11-01 04:01:14 -07:00
Gabe Black	1268e0df1f	SE/FS: Expose the same methods on the CPUs in SE and FS modes.	2011-11-01 04:01:13 -07:00
Gabe Black	8ad2b8c559	SE/FS: Make the functions available from the TC consistent between SE and FS.	2011-10-31 02:58:22 -07:00
Gabe Black	d735abe5da	GCC: Get everything working with gcc 4.6.1. And by "everything" I mean all the quick regressions.	2011-10-31 01:09:44 -07:00
Gabe Black	facb40f3ff	SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.	2011-10-30 00:33:02 -07:00
Gabe Black	5b433568f0	SE/FS: Build the base process class in FS.	2011-10-30 00:32:54 -07:00
Gabe Black	464c485d0c	SE/FS: Include getMemPort in FS.	2011-10-16 05:06:40 -07:00
Gabe Black	3595b0c5a1	SE/FS: Build/expose vport in SE mode.	2011-10-16 05:06:39 -07:00
Gabe Black	b2af015b97	ARM: Turn on the page table walker on ARM in SE mode.	2011-10-16 05:06:38 -07:00
Gabe Black	e8e9f97312	CPU: Make physPort and getPhysPort available in SE mode.	2011-10-16 02:59:53 -07:00
Gabe Black	8adc6781bf	X86: Turn on the page table walker in SE mode.	2011-10-13 02:22:23 -07:00
Gabe Black	f338d60930	SE/FS: Build the Interrupt objects in SE mode.	2011-10-09 00:15:50 -07:00
Gabe Black	51f7a66660	SE/FS: Build the devices in SE mode.	2011-09-30 00:28:33 -07:00
Gabe Black	4fcf8e9959	O3: Tidy up some DPRINTFs in the LSQ.	2011-09-27 00:25:26 -07:00
Gabe Black	44ed4849d4	Faults: Replace calls to genMachineCheckFault with M5PanicFault.	2011-09-27 00:24:43 -07:00
Nilay Vaish	56bddab189	LSQ: Moved a couple of lines to enable O3 + Ruby This patch makes O3 CPU work along with the Ruby memory model. Ruby overwrites the senderState pointer with another pointer. The pointer is restored only when Ruby gets done with the packet. LSQ makes use of senderState just after sendTiming() returns. But the dynamic_cast returns a NULL pointer since Ruby's senderState pointer is from a different class. Storing the senderState pointer before calling sendTiming() does away with the problem.	2011-09-26 12:18:32 -05:00
Steve Reinhardt	84f0a1bd91	event: minor cleanup Initialize flags via the Event constructor instead of calling setFlags() in the body of the derived class's constructor. I forget exactly why, but this made life easier when implementing multi-queue support. Also rename Event::getFlags() to isFlagSet() to better match common usage, and get rid of some unused Event methods.	2011-09-22 18:59:55 -07:00
Gabe Black	10c2e37f60	Syscall: Make the syscall function available in both SE and FS modes. In FS mode the syscall function will panic, but the interface will be consistent and code which calls syscall can be compiled in. This will allow, for instance, instructions that use syscall to be built unconditionally but then not returned by the decoder.	2011-09-19 02:46:48 -07:00
Ali Saidi	649c239cee	LSQ: Only trigger a memory violation with a load/load if the value changes. Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently because a translation will miss and start a table walk and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block depending on the dependency checking). This patch has been tested with a couple of self-checking hand crafted programs to stress ordering between two cores. The performance improvement on SPEC benchmarks can be substantial (2-10%).	2011-09-13 12:58:08 -04:00
Gabe Black	49a7ed0397	StaticInst: Merge StaticInst and StaticInstBase. Having two StaticInst classes, one nominally ISA independent and the other ISA dependent, has not been historically useful and makes the StaticInst class more complicated than it needs to be. This change merges StaticInstBase into StaticInst.	2011-09-09 02:40:11 -07:00
Gabe Black	b7b545bc38	Decode: Pull instruction decoding out of the StaticInst class into its own. This change pulls the instruction decoding machinery (including caches) out of the StaticInst class and puts it into its own class. This has a few intrinsic benefits. First, the StaticInst code, which has gotten to be quite large, gets simpler. Second, the code that handles decode caching is now separated out into its own component and can be looked at in isolation, making it easier to understand. I took the opportunity to restructure the code a bit which will hopefully also help. Beyond that, this change also lays some ground work for each ISA to have its own, potentially stateful decode object. We'd be able to include less contextualizing information in the ExtMachInst objects since that context would be applied at the decoder. Also, the decoder could "know" ahead of time that all the instructions it's going to see are going to be, for instance, 64 bit mode, and it will have one less thing to check when it decodes them. Because the decode caching mechanism has been separated out, it's now possible to have multiple caches which correspond to different types of decoding context. Having one cache for each element of the cross product of different configurations may become prohibitive, so it may be desirable to clear out the cache when relatively static state changes and not to have one for each setting. Because the decode function is no longer universally accessible as a static member of the StaticInst class, a new function was added to the ThreadContexts that returns the applicable decode object.	2011-09-09 02:30:01 -07:00
Ali Saidi	b6203360ef	LSQ: Set store predictor to periodically clear itself as recommended in the storesets paper. This patch improves performance by as much as 10% on some spec benchmarks.	2011-08-19 15:08:07 -05:00
Geoffrey Blake	5f425b8bd1	Fix bugs due to interaction between SEV instructions and O3 pipeline SEV instructions were originally implemented to cause asynchronous squashes via the generateTCSquash() function in the O3 pipeline when updating the SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system that would lead to a pipeline either going inactive indefinitely or not being able to commit squashed instructions. Fixed SEV instructions to behave like interrupts and cause synchronous squashes inside the pipeline, eliminating the race conditions. Also fixed up the semantics of the WFE instruction to behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1 or unmasked interrupts are pending.	2011-08-19 15:08:07 -05:00
Mrinmoy Ghosh	d0e0485902	LSQ: Add some better dprintfs for storeset predictor.	2011-08-19 15:08:05 -05:00
Mrinmoy Ghosh	0db95030fc	LSQ: Fix a few issues with the storeset predictor. Two issues are fixed in this patch: 1. The load and store pc passed to the predictor are passed in reverse order. 2. The flag indicating that a barrier is inflight was never cleared when the barrier was squashed instead of committed. This made all load insts dependent on a non-existent barrier in-flight.	2011-08-19 15:08:05 -05:00
Giacomo Gabrielli	676a530b77	O3: Squash the violator and younger instructions instead of all insts. Change the way instructions are squashed on memory ordering violations to squash the violator and younger instructions, not all instructions that are younger than the instruction they violated (no reason to throw away valid work).	2011-08-19 15:08:05 -05:00
Gabe Black	f2c89a01d1	InOrder: Make cache_unit.hh include hashmap.hh explicitly, not transitively.	2011-08-16 02:47:15 -07:00
Gabe Black	78a4636a13	O3: Make lsq_unit.hh include arch/isa_traits.hh directly, not transitively.	2011-08-16 02:46:57 -07:00
Gabe Black	0e6dc00497	O3: When squashing, restore the macroop that should be used for fetching.	2011-08-14 17:41:34 -07:00
Gabe Black	ec204f003c	O3: Add a pointer to the macroop for a microop in the dyninst.	2011-08-14 04:08:14 -07:00
Gabe Black	e0043f8dbe	O3: At the end of an instruction, force fetchAddr to something sensible. It's possible (though until now very unlikely) for fetchAddr to get out of sync with the actual PC of the current instruction. This change forcefully resets fetchAddr at the end of every instruction.	2011-08-13 13:36:37 -07:00
Gabe Black	96df6bedb7	O3: Stop using the current macroop no matter why you're leaving it. Until now, the only reason a macroop would be left was because it ended at a microop marked as the last microop. In O3 with branch prediction, it's possible for the branch predictor to have entries which originally came from different instructions which happened to have the same RIP. This could theoretically happen in many ways, but it was encountered specifically when different programs in different address spaces ran one after the other in X86_FS. What would happen in that case was that the macroop would continue to be looped over and microops fetched from it until it reached the last microop even though the macropc had moved out from under it. If things lined up properly, this could mean that the end bytes of an instruction actually fell into the instruction sized block of memory after the one in the predecoder. The fetch loop implicitly assumes that the last instruction sized chunk of memory processed was the last one needed for the instruction it just finished executing. It would then tell the predecoder to move to an offset within the bytes it was given that is larger than those bytes, and that would trip an assert in the x86 predecoder. This change fixes this problem by making fetch stop processing the current macroop if the address it should be fetching from changed when the PC is updated. That happens when the last microop was reached because the instruction handled it properly, and it also catches the case where the branch predictor makes fetch do a macro level branch when it shouldn't. The check of isLastMicroop is retained because otherwise, a macroop that branches back to itself would act like a single, long macroop instead of multiple instances of the same microop. There may be situations (which may turn out to be purely hypothetical) where that matters. 
This also fixes a relatively minor issue where the curMacroop variable would be set to NULL immediately after seeing that a microop was the last one before curMacroop was used to build the dyninst. The traceData structure would have a NULL pointer to the macroop for that microop.	2011-08-09 11:30:43 -07:00
Gabe Black	3989f41261	O3: When waiting to handle an interrupt, let everything drain out. Before this change, the commit stage would wait until the ROB and store queue were empty before recognizing an interrupt. The fetch stage would stop generating instructions at an appropriate point, so commit would then wait until a valid time to interrupt the instruction stream. Instructions might be in flight after fetch but not in the ROB or store queue (in rename, for instance), so this change makes commit wait until all in flight instructions are finished.	2011-08-09 03:37:43 -07:00
Nilay Vaish	821dfc1289	BuildEnv: Eliminate RUBY as build environment variable This patch replaces RUBY with PROTOCOL in all the SConscript files as the environment variable that decides whether or not certain components of the simulator are compiled.	2011-08-08 10:50:13 -05:00
Gabe Black	5c0e6e6092	O3: Get rid of the unused addToRemoveList function.	2011-08-07 15:41:10 -07:00
Gabe Black	a9b7931156	O3: Let squashed and deferred instructions issue. Let squashed and deferred instructions issue so they don't accumulate and clog up the CPU.	2011-08-07 15:41:07 -07:00
Ali Saidi	4d83b8a799	O3: Fix uninitialized variable in the tournament branch predictor.	2011-08-07 09:21:49 -07:00
Gabe Black	16882b0483	Translation: Use a pointer type as the template argument. This allows regular pointers and reference counted pointers without having to use any shim structures or other tricks.	2011-08-07 09:21:48 -07:00
Gabe Black	6230668f5e	O3: Get rid of the raw ExtMachInst constructor on DynInsts. This constructor assumes that the ExtMachInst can be decoded directly into a StaticInst that's useful to execute. With the advent of microcoded instructions that's no longer true.	2011-08-02 11:51:16 -07:00
Gabe Black	206c2e9a0e	O3: Implement memory mapped IPRs for O3.	2011-07-31 19:21:17 -07:00
Gabe Black	a42c6ae48d	O3: Fix corner case squashing into the microcode ROM. When fetching from the microcode ROM, if the PC is set so that it isn't in the cache block that's been fetched the CPU will get stuck. The fetch stage notices that it's in the ROM so it doesn't try to fetch from the current PC. It then later notices that it's outside of the current cache block so it skips generating instructions expecting to continue once the right bytes have been fetched. This change lets the fetch stage attempt to generate instructions, and only checks if the bytes it's going to use are valid if it's really going to use them.	2011-07-30 23:22:53 -07:00
Giacomo Gabrielli	69ef57fd0f	O3: Create a pipeline activity viewer for the O3 CPU model. Implemented a pipeline activity viewer as a python script (util/o3-pipeview.py) and modified O3 code base to support an extra trace flag (O3PipeView) for generating traces to be used as inputs by the tool.	2011-07-15 11:53:35 -05:00
Mrinmoy Ghosh	3396fd9e84	Branch predictor: Fixes the tournament branch predictor. Branch predictor could not predict a branch in a nested loop because: 1. The global history was not updated after a mispredict squash. 2. The global history was updated in the fetch stage. The choice predictors that were updated used the changed global history. This is incorrect, as it incorporates the state of global history after the branch is encountered. Fixed update to choice predictor using the global history state before the branch happened. 3. The global predictor table was also updated using the global history state before the branch happened as above. Additionally, parameters to initialize ctr and history size were reversed.	2011-07-10 12:56:08 -05:00
Geoffrey Blake	c7e7b89058	O3: Fix up pipelining icache accesses in fetch stage to function properly Fixed up the patch from Yasuko Watanabe that enabled pipelining of fetch accesses to icache to work with recent changes to main repository. Also added in ability for fetch stage to delay issuing the fault carrying nop when a pipeline fetch causes a fault and no fetch bandwidth is available until the next cycle.	2011-07-10 12:56:08 -05:00
Ali Saidi	60579e8d74	O3: Make sure fetch doesn't go off into the weeds during speculation.	2011-07-10 12:56:08 -05:00
Gabe Black	3a1428365a	ExecContext: Rename the readBytes/writeBytes functions to readMem and writeMem. readBytes and writeBytes had the word "bytes" in their names because they accessed blobs of bytes. This distinguished them from the read and write functions which handled higher level data types. Because those functions don't exist any more, this change renames readBytes and writeBytes to more general names, readMem and writeMem, which reflect the fact that they are how you read and write memory. This also makes their names more consistent with the register reading/writing functions, although those are still read and set for some reason.	2011-07-02 22:35:04 -07:00
Gabe Black	2e7426664a	ExecContext: Get rid of the now unused read/write templated functions.	2011-07-02 22:34:58 -07:00
Brad Beckmann, Nilay Vaish <nilay@cs.wisc.edu>	c86f849d5a	Ruby: Add support for functional accesses This patch provides functional access support in Ruby. Currently only the M5Port of RubyPort supports functional accesses. Support for functional accesses through the PioPort will be added as a separate patch.	2011-06-30 19:49:26 -05:00
Gabe Black	affad29932	InOrder: Fix a compile error.	2011-06-20 02:29:14 -07:00
Korey Sewell	477e7039b3	inorder: clear reg. dep entry after removing from list this will safeguard future code from trying to remove from the list twice. That code wouldn't break but would waste time.	2011-06-19 21:43:42 -04:00
Korey Sewell	b963b339b9	inorder: se: squash after syscalls	2011-06-19 21:43:42 -04:00
Korey Sewell	eedd04e894	inorder: cleanup dprintfs in cache unit	2011-06-19 21:43:42 -04:00

1551 commits