sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Brandon Potter	6ad29ba6df	base: add new ChunkGenerator method to identify last chunk	2015-04-22 07:51:27 -07:00
Andreas Hansson	cd76e34056	cpu: Remove the InOrderCPU from the tree This patch takes the final step in removing the InOrderCPU from the tree. Rest in peace. The MinorCPU is now used to model an in-order microarchitecture, and long term the MinorCPU will eventually be renamed InOrderCPU.	2015-04-20 12:46:35 -04:00
Malek Musleh	826f69b470	config, cpu: fix progress interval for switched CPUs This patch ensures that the CPU progress Event is triggered for the new set of switched_cpus that get scheduled (e.g. during fast-forwarding). It also avoids printing the interval state if the cpu is currently switched out. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-14 11:01:10 -05:00
Dibakar Gope	34ad1123ee	cpu: re-organizes the branch predictor structure. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-13 17:33:57 -05:00
Nilay Vaish	e596e52498	x86: implements x87 mult/div instructions	2015-04-13 17:33:57 -05:00
Lena Olson	dea7acdb3e	ruby: allow restoring from checkpoint when using DRAMCtrl Restoring from a checkpoint with ruby + the DRAMCtrl memory model was not working, because ruby and DRAMCtrl disagreed on the current tick during warmup. Since there is no reason to do timing requests during warmup, use functional requests instead. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-13 17:33:57 -05:00
Nilay Vaish	d6af46915c	sim: Use NULL instead of None for testing filenames. The filenames are initialized with NULL. So the test should be checking for them to be == NULL instead of == None.	2015-04-13 17:33:57 -05:00
Nilay Vaish	b26fef8466	sim: fix function for emulating dup() The function was using the host fd to obtain the fd object from the simulated process.	2015-04-13 17:33:57 -05:00
Curtis Dunham	c3268f8820	config: Support full-system with SST's memory system This patch adds an example configuration in ext/sst/tests/ that allows an SST/gem5 instance to simulate a 4-core AArch64 system with SST's memHierarchy components providing all the caches and memories.	2015-04-08 15:56:06 -05:00
Nikos Nikoleris	4bdbdd8413	dev: (un)serialize fix for the RTC and RTC Timer Interrupt events Restoring from a checkpoint fails if either the RTC or the RTC Timer Interrupt event is disabled. The restored machine tried incorrectly to schedule the next event with negative offset. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Ruslan Bukin	bebab7f24f	sim: correct check for endianness Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Ruslan Bukin	b3314673f4	dev: Extend access width for IDE control registers Add 32-bit access width for PrimaryTiming register and 16bit for UDMAControl register as FreeBSD required. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Nikos Nikoleris	305e29b98e	cpu: fix system total instructions accounting The totalInstructions counter is only incremented when the whole instruction is committed and not on every microop. It was incorrectly reset in atomic and timing cpus. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Lena Olson	333988a73e	x86: fix debug trace output for mwait When running with the Exec flag, the mwait instruction attempted to print out its source registers, which were never actually initialized. This led to sporadic assertion failures when the value stored there was invalid. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Stephan Diestelhorst	cb8856f580	mem: Support any number of master-IDs in stride prefetcher The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs) that it could handle. Since master IDs need to be unique per system, and every core, cache etc. requires a separate master port, a static limit on these does not make much sense. Instead, this patch adds a small hash map that will map all master IDs to the right prefetch state and dynamically allocates new state for new master IDs.	2015-03-27 04:56:03 -04:00
Andreas Hansson	0197e580e5	mem: Allocate cache writebacks before new MSHRs This patch changes the order of writeback allocation such that any writebacks resulting from a tag lookup (e.g. for an uncacheable access), are added to the writebuffer before any new MSHR entries are allocated. This ensures that the writebacks logically precede the new allocations. The patch also changes the uncacheable flush to use proper timed (or atomic) writebacks, as opposed to functional writes.	2015-03-27 04:56:02 -04:00
Andreas Hansson	24763c2177	mem: Cleanup flow for uncacheable accesses This patch simplifies the code dealing with uncacheable timing accesses, aiming to align it with the existing miss handling. Similar to what we do in atomic, a timing request now goes through Cache::access (where the block is also flushed), and then proceeds to ignore any existing MSHR for the block in question. This unifies the flow for cacheable and uncacheable accesses, and for atomic and timing.	2015-03-27 04:56:01 -04:00
Andreas Hansson	a7a1e6004a	mem: Ignore uncacheable MSHRs when finding matches This patch changes how we search for matching MSHRs, ignoring any MSHR that is allocated for an uncacheable access. By doing so, this patch fixes a corner case in the MSHRs where incorrect data ended up being copied into a (cacheable) read packet due to a first uncacheable MSHR target of size 4, followed by a cacheable target to the same MSHR of size 64. The latter target was filled with nonsense data.	2015-03-27 04:56:00 -04:00
Andreas Hansson	801ce65eae	mem: Remove redundant allocateUncachedReadBuffer in cache This patch removes the no-longer-needed allocateUncachedReadBuffer. Besides the checks it is exactly the same as allocateMissBuffer and thus provides no value.	2015-03-27 04:55:59 -04:00
Andreas Hansson	fe806a0dd7	mem: Modernise MSHR iterators to C++11 This patch updates the iterators in the MSHR and MSHR queues to use C++11 range-based for loops. It also does a bit of additional house keeping.	2015-03-27 04:55:57 -04:00
Andreas Hansson	7bae98459c	mem: Align all MSHR entries to block boundaries This patch aligns all MSHR queue entries to block boundaries to simplify checks for matches. Previously there were corner cases that could lead to existing entries not being identified as matches. There are, rather alarmingly, a few regressions that change with this patch.	2015-03-27 04:55:55 -04:00
Ali Jafri	15f0d9ff14	mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more generic BLOCK_CACHED flag. Future patches implementing cache eviction messages can use the BLOCK_CACHED flag in almost the same manner as hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The PREFETCH_SNOOP_SQUASH flag is set if the prefetch target is found in the tags or the MSHRs in any state, so we are simply replacing calls to setPrefetchSquashed() with setBlockCached(). The case of where the prefetch target is found in the writeback MSHRs of upper level caches continues to be covered by the MEM_INHIBIT flag.	2015-03-27 04:55:54 -04:00
Curtis Dunham	a1164154de	sim: Update limit_event reuse to final version Matching final version on reviewboard.	2015-03-26 11:16:44 -04:00
Andreas Hansson	a196dbe3bf	cpu: Fix InstPBTrace inheritance This patch fixes an issue that prevented gem5 from being built with C++ config and without Python.	2015-03-26 11:16:43 -04:00
Steve Reinhardt	6677b9122a	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.	2015-03-23 16:14:20 -07:00
Steve Reinhardt	5302305255	misc: quote args in echoed command line Currently if there are shell special characters in a command-line argument, you can't copy and paste the echoed command line onto a shell prompt because the characters aren't quoted properly. This patch fixes that problem.	2015-03-23 16:14:18 -07:00
Curtis Dunham	564482c782	sim: Reuse the same limit_event in simulate() This patch accomplishes two things: 1. Makes simulate()'s GlobalSimLoopExitEvent a singleton reused across calls. This is slightly more efficient than recreating it every time. 2. Gives callers to simulate() (especially other simulators) a foolproof way of knowing that the simulation period ended successfully by hitting the limit event. They can call getLimitEvent() and compare it to the return value of simulate(). This change was motivated by an ongoing effort to integrate gem5 and SST, with SST as the master sim and gem5 as the slave sim.	2015-03-23 06:57:36 -04:00
Andreas Hansson	45286d9b64	mem: Tidy up Request This patch does a bit of house keeping, fixing up typos, removing dead code etc.	2015-03-23 06:57:34 -04:00
Matt Evans	1ccc3d7e5b	arm: Add a GICv2m device This patch adds a new PIO-accessible GICv2m shim. This shim has a PIO slave port on one side, and SPI 'wires' on the other. It accepts MSIs from the system and triggers SPIs on the GIC. It is configurable with a number of frames, each of which has a number of SPIs and a base SPI offset. A Linux driver for GICv2m is available upstream.	2015-03-19 04:06:17 -04:00
Matt Evans	ec80224188	arm: Remove the 'magic MSI register' in the GIC (PL390) This patch removes the code that added this magic register. A follow-up patch provides a GICv2m MSI shim that gives the same functionality in a standard ARM system architecture way.	2015-03-19 04:06:16 -04:00
Wendy Elsasser	9b4d8030e6	cpu: Fix TrafficGen message format Fix erroneous message format for fatal error. Previously, code did not have type indicator (% instead of %d). Also removed redundant fatal check. Ran modified sweep.py with in range and out of range values to test.	2015-03-19 04:06:12 -04:00
Andreas Hansson	5275c9d740	mem: Use emplace front/back for deferred packets Embrace C++11 for the deferred packets as we actually store the objects in the data structure, and not just pointers.	2015-03-19 04:06:11 -04:00
Geoffrey Blake	1d403960af	mem: Enable CommMonitor to output traces in atomic mode The CommMonitor by default only allows memory traces to be gathered in timing mode. This patch allows memory traces to be gathered in atomic mode if all one needs is a functional trace of memory addresses used and timing information is of a secondary concern.	2015-03-19 04:06:10 -04:00
Steve Reinhardt	e57ab463cf	mem: remove redundant test in Cache::recvTimingResp() For some reason we were checking mshr->hasTargets() even though we had already called mshr->getTarget() unconditionally earlier in the same function (which asserts if there are no targets). Get rid of this useless check, and while we're at it get rid of the redundant call to mshr->getTarget(), since we still have the value saved in a local var.	2015-02-11 10:48:53 -08:00
Steve Reinhardt	89bb03a1a6	mem: add local var in Cache::recvTimingResp() The main loop in recvTimingResp() uses target->pkt all over the place. Create a local tgt_pkt to help keep lines under the line length limit.	2015-02-11 10:48:52 -08:00
Steve Reinhardt	ee0b52404c	mem: restructure Packet cmd initialization a bit more Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.	2015-02-11 10:48:50 -08:00
Steve Reinhardt	ccef61d1cc	mem: clean up write buffer check in Cache::handleSnoop() The 'if (writebacks.size)' check was redundant, because writeBuffer.findMatches() would return false if the writebacks list was empty. Also renamed 'mshr' to 'wb_entry' in this context since we are pointing at a writebuffer entry and not an MSHR (even though it's the same C++ class).	2015-03-14 06:51:07 -07:00
Nilay Vaish	e5fbc67e16	cpu: o3: another assert instead of check	2015-03-09 09:39:08 -05:00
Nilay Vaish	5003ed5f7a	cpu: o3: Remove unused code in iew, add assert instead.	2015-03-09 09:39:08 -05:00
Nilay Vaish	4e1d10a3cf	cpu: o3: commit: mark pipeline delay variable as consts	2015-03-09 09:39:08 -05:00
Nilay Vaish	53de2512b1	cpu: o3: remove unused stat variables.	2015-03-09 09:39:08 -05:00
Nilay Vaish	54bc67f619	cpu: o3: combine if with same condition	2015-03-09 09:39:07 -05:00
Nilay Vaish	61edd5ac97	cpu: o3: remove member variable squashCounter The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call.	2015-03-09 09:39:07 -05:00
Nilay Vaish	f69a74fda6	cpu: o3: remove unused function annotateMemoryUnits()	2015-03-09 09:39:07 -05:00
Andreas Hansson	fc315901ff	mem: Unify all cache DPRINTF address formatting This patch changes all the DPRINTF messages in the cache to use '%#llx' every time a packet address is printed. The inclusion of '#' ensures '0x' is prepended, and since the address type is a uint64_t %x really should be %llx.	2015-03-02 04:00:56 -05:00
Andreas Hansson	88e2963951	mem: Fix cache MSHR conflict determination This patch fixes a rather subtle issue in the sending of MSHR requests in the cache, where the logic previously did not check for conflicts between the MSHR queue and the write queue when requests were not ready. The correct thing to do is to always check, since not having a ready MSHR does not guarantee that there is no conflict. The underlying problem seems to have slipped past due to the symmetric timings used for the write queue and MSHR queue. However, with the recent timing changes the bug caused regressions to fail.	2015-03-02 04:00:54 -05:00
Andreas Hansson	407737614e	mem: Add byte mask to Packet::checkFunctional This patch changes the valid-bytes start/end to a proper byte mask. With the changes in timing introduced in previous patches there are more packets waiting in queues, and there are regressions using the checker CPU failing due to non-contiguous read data being found in the various cache queues. This patch also adds some more comments explaining what is going on, and adds the fourth and missing case to Packet::checkFunctional.	2015-03-02 04:00:52 -05:00
Stephan Diestelhorst	ecef1612b8	mem: Add option to force in-order insertion in PacketQueue By default, the packet queue is ordered by the ticks of the to-be-sent packets. With the recent modifications of packets sinking their header time when their response leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B, A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slightly. This is expected to occur rarely and not affect timing results.	2015-03-02 04:00:49 -05:00
Marco Balboni	d4ef8368aa	mem: Downstream components consumes new crossbar delays This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.	2015-03-02 04:00:48 -05:00
Andreas Hansson	36dc93a5fa	mem: Move crossbar default latencies to subclasses This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden.	2015-03-02 04:00:47 -05:00
Marco Balboni	d35dd71ab4	mem: Add crossbar latencies This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency. The latency of the request path through the xbar is set as --> frontEndLatency + forwardLatency In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter. --> frontEndLatency + SF(lookupLatency) + forwardLatency. The latency of the response path through the xbar is set instead as --> responseLatency. In case of snoop response, if the response is treated as a normal response the latency associated is again --> responseLatency; If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is --> snoopResponseLatency; Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.	2015-03-02 04:00:46 -05:00
Andreas Sandberg	7be9d4eb67	dev, arm: Clean up PL011 and rewrite interrupt handling The ARM PL011 UART model didn't clear and raise interrupts correctly. This changeset rewrites the whole interrupt handling and makes it both simpler and fixes several cases where the correct interrupts weren't raised or cleared. Additionally, it cleans up many other aspects of the code.	2015-03-02 04:00:44 -05:00
Andreas Hansson	d64b34bef8	arm: Share a port for the two table walker objects This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent. By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests.	2015-03-02 04:00:42 -05:00
Giacomo Gabrielli	bd70db5521	arm: Remove unnecessary dependencies between AArch64 FP instructions	2015-03-02 04:00:41 -05:00
Rekai	3d5434022a	cpu: o3 register renaming request handling improved Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not.	2015-03-02 04:00:38 -05:00
Andreas Hansson	987de4f5cc	mem: Tidy up the cache debug messages Avoid redundant inclusion of the name in the DPRINTF string.	2015-03-02 04:00:37 -05:00
Andreas Hansson	f26a289295	mem: Split port retry for all different packet classes This patch fixes a long-standing issue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fix the previously seen deadlocks.	2015-03-02 04:00:35 -05:00
Ali Jafri	6ebe8d863a	mem: Fix prefetchSquash + memInhibitAsserted bug This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetch target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, it marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, while the current code is not written to handle such an outcome. The current code checks for the prefetchSquashed flag first; if it finds the flag, it deallocates the reserved MSHR. This leads to an assert failure when the data from the writeback appears at the cache. In this fix, we simply switch the order of checks. We first check for memInhibitAsserted and then for prefetchSquashed.	2015-03-02 04:00:34 -05:00
Stephan Diestelhorst	de46eeade7	cpu: Add a PC-value to the traffic generator requests Have the traffic generator add its masterID as the PC address to the requests. That way, prefetchers (and other components) that use a PC for request classification will see per-tester streams of requests. This enables us to test strided prefetchers with the memchecker, too.	2015-03-02 04:00:31 -05:00
Andreas Sandberg	3b4ae7debb	arm: Don't truncate 16-bit ASIDs to 8 bits The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers and has a couple of inverted checks that mask out the high 8 bits of an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both of those issues.	2015-03-02 04:00:28 -05:00
Andreas Sandberg	804b11a3ed	arm: Correctly access the stack pointer in GDB We currently use INTREG_X31 instead of INTREG_SPX when accessing the stack pointer in GDB. gem5 normally uses INTREG_SPX to access the stack pointer, which gets mapped to the stack pointer corresponding (INTREG_SPn) to the current exception level. This changeset updates the GDB interface to use SPX instead of X31 (which is always zero) when transferring CPU state to gdb.	2015-03-02 04:00:27 -05:00
Andreas Sandberg	34dcd90b61	arm: Fix broken page table permissions checks in remote GDB The remote GDB interface currently doesn't check if translations are valid before reading memory. This causes a panic when GDB tries to access unmapped memory (e.g., when getting a stack trace). There are two reasons for this: 1) The function used to check for valid translations (virtvalid()) doesn't work and panics on invalid translations. 2) The method in the GDB interface used to test if a translation is valid (RemoteGDB::acc) always returns true regardless of the return from virtvalid(). This changeset fixes both of these issues.	2015-03-02 04:00:27 -05:00
Jason Power	670f44e05e	Ruby: Update backing store option to propagate through to all RubyPorts Previously, the user would have to manually set access_backing_store=True on all RubyPorts (Sequencers) in the config files. Now, instead there is one global option that each RubyPort checks on initialization. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-02-26 09:58:26 -06:00
Andreas Hansson	8c78aa31ea	cpu: TrafficGen sinks snoops without complaining To be able to use the TrafficGen in a system with caches we need to allow it to sink incoming snoop requests. By default the master port panics, so silently ignore any snoops.	2015-02-16 03:34:55 -05:00
Stephan Diestelhorst	93fa8e3cd4	mem: Fix initial value problem with MemChecker In highly loaded cases, reads might actually overlap with writes to the initial memory state. The mem checker needs to detect such cases and permit the read reading either from the writes (what it is doing now) or read from the initial, unknown value. This patch adds this logic.	2015-02-16 03:34:47 -05:00
Andreas Hansson	661dac1598	dev: Fix undefined behaviour in i8254xGBe This patch fixes a rather unfortunate oversight where the annotation pointer was used even though it is null. Somehow the code still works, but UBSan is rather unhappy. The use is now guarded, and the variable is initialised in the constructor (as well as init()).	2015-02-16 03:34:35 -05:00
Andreas Sandberg	0a2ee77616	arm: Wire up the GIC with the platform in the base class Move the (common) GIC initialization code that notifies the platform code of the new GIC to the base class (BaseGic) instead of the Pl390 implementation.	2015-02-16 03:34:18 -05:00
Andreas Hansson	e17328a227	mem: mmap the backing store with MAP_NORESERVE This patch ensures we can run simulations with very large simulated memories (at least 64 TB based on some quick runs on a Linux workstation). In essence this allows us to efficiently deal with sparse address maps without having to implement a redirection layer in the backing store. This opens up for run-time errors if we eventually exhaust the host's memory and swap space, but this should hopefully never happen.	2015-02-16 03:33:47 -05:00
Andreas Hansson	57758ca685	mem: Use the range cache for lookup as well as access This patch changes the range cache used in the global physical memory to be an iterator so that we can use it not only as part of isMemAddr, but also access and functionalAccess. This matches use-cases where a core is using the atomic non-caching memory mode, and repeatedly calls isMemAddr and access. Linux boot on aarch32, with a single atomic CPU, is now more than 30% faster when using "--fastmem" compared to not using the direct memory access.	2015-02-16 03:33:37 -05:00
Andreas Hansson	d0e1b8a19c	arch: Make readMiscRegNoEffect const throughout Finally took the plunge and made this apply to all ISAs, not just ARM.	2015-02-16 03:33:28 -05:00
Andreas Sandberg	5bfa7e3d59	arm: Merge ISA files with pseudo instructions This changeset moves the pseudo instructions used to signal unknown instructions and unimplemented instructions to the same source files as the decoder fault.	2015-02-16 03:32:58 -05:00
Ali Saidi	4eff4fa12e	cpu: add support for outputting a protobuf formatted CPU trace Doesn't support x86 due to static instruction representation. --HG-- rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py	2015-02-16 03:32:38 -05:00
Marco Balboni	268d9e59c5	mem: Clarification of packet crossbar timings This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to process the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.	2015-02-11 10:23:47 -05:00
Marco Balboni	e2828587b3	mem: Clarify usage of latency in the cache This patch adds some much-needed clarity in the specification of the cache timing. For now, hit_latency and response_latency are kept as top-level parameters, but the cache itself has a number of local variables to better map the individual timing variables to different behaviours (and sub-components). The introduced variables are: - lookupLatency: latency of tag lookup, occurring on any access - forwardLatency: latency that occurs in case of outbound miss - fillLatency: latency to fill a cache block We keep the existing responseLatency. The forwardLatency is used by allocateInternalBuffer() for: - MSHR allocateWriteBuffer (uncached write forwarded to WriteBuffer); - MSHR allocateMissBuffer (cacheable miss in MSHR queue); - MSHR allocateUncachedReadBuffer (uncached read allocated in MSHR queue) It is our assumption that the time for the above three buffers is the same. Similarly, for snoop responses passing through the cache we use forwardLatency.	2015-02-11 10:23:36 -05:00
Andreas Hansson	6563ec8634	cpu: Tidy up the MemTest and make false sharing more obvious The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed. The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work. In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.	2015-02-11 10:23:28 -05:00
Andreas Sandberg	550c318490	sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh	2015-02-11 10:23:27 -05:00
Andreas Sandberg	9e6f803254	base: Add compiler macros to add deprecation warnings Gcc and clang both provide an attribute that can be used to flag a function as deprecated at compile time. This changeset adds a gem5 compiler macro for that compiler feature. The macro can be used to indicate that a legacy API within gem5 has been deprecated and provide a graceful migration to the new API.	2015-02-11 10:23:24 -05:00
Andreas Hansson	c9b8616c51	base: Do not dereference NULL in CompoundFlag creation This patch fixes the CompoundFlag constructor, ensuring that it does not dereference NULL. Doing so has undefined behaviour, and both clang and gcc's undefined-behaviour sanitisers were rather unhappy.	2015-02-11 10:23:23 -05:00
Andreas Sandberg	431a6d708b	dev: Remove unused system pointer in the Platform base class The Platform base class contains a pointer to an instance of the System which is never initialized. This can lead to subtle bugs since some architecture-specific platform implementations contain their own system pointer which is normally used. However, if the platform is accessed through a pointer to its base class, the dangling pointer will be used instead.	2015-02-11 10:23:22 -05:00
Alexandru Dutu	ad1b177550	cpu: Idle CPU status logic revised This patch sets the CPU status to idle when the last active thread gets suspended.	2015-02-06 18:01:22 -08:00
Andreas Hansson	461a80beb3	mem: Clarify express snoop behaviour This patch adds a bit of documentation with insights around how express snoops really work.	2015-02-03 14:26:02 -05:00
Andreas Hansson	193325ff60	mem: Clarify cache behaviour for pending dirty responses This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made.	2015-02-03 14:25:59 -05:00
Curtis Dunham	f0a764edc6	base: add an accessor and operators ==,!= to address ranges	2015-02-03 14:25:58 -05:00
Andreas Hansson	ccb512ecc1	base: Add XOR-based hashed address interleaving This patch extends the current address interleaving with basic hashing support. Instead of directly comparing a number of address bits with a matching value, it is now possible to use two independent sets of address bits XOR'ed together. This avoids issues where strided address patterns are heavily biased to a subset of the interleaved ranges.	2015-02-03 14:25:54 -05:00
Andreas Hansson	5ea60a95b3	config: Adjust DRAM channel interleaving defaults This patch changes the DRAM channel interleaving default behaviour to be more representative. The default address mapping (RoRaBaCoCh) moves the channel bits towards the least significant bits, and uses 128 byte as the default channel interleaving granularity. These defaults can be overridden if desired, but should serve as a sensible starting point for most use-cases.	2015-02-03 14:25:52 -05:00
Andreas Sandberg	fe200c2487	sim: Remove test for non-NULL this in Event The method Event::initialized() tests if this != NULL as a part of the expression that tests if an event is initialized. The only case when this check could be false is if the method is called on a null pointer, which is illegal and leads to undefined behavior (such as eating your pets) according to the C++ standard. Because of this, modern compilers (specifically, recent versions of clang) warn about this which we treat as an error. This changeset removes the redundant check to fix said warning.	2015-02-03 14:25:48 -05:00
Andreas Sandberg	851b29ad20	dev: Correctly clear interrupts in VirtIO PCI Correctly clear the PCI interrupt belonging to a VirtIO device when the ISR register is read.	2015-02-03 14:25:47 -05:00
Curtis Dunham	b89fd57663	sim: prioritize async events; prevent starvation If a time quantum event is the only one in the queue, async events (Ctrl-C, I/O, etc.) will never be processed. So process them first.	2014-12-19 15:32:34 -06:00
Andreas Hansson	20111ba917	cpu: Ensure timing CPU sinks response before sending new request This patch changes how the timing CPU deals with processing responses, always scheduling an event, even if it is for the current tick. This helps to avoid situations where a new request shows up before a response is finished in the crossbar, and also is more in line with any realistic behaviour.	2015-02-03 14:25:27 -05:00
Geoffrey Blake	3e33786db8	config: Fix typo in Float param The Float param was not settable on the command line due to a typo in the class definition in python/m5/params.py. This corrects the typo and allows floats to be set on the command line as intended.	2015-02-03 14:25:07 -05:00
Ali Saidi	89b3616d7e	arm: always set the IsFirstMicroop flag While the IsFirstMicroop flag exists it was only occasionally used in the ARM instructions that gem5 microOps and therefore couldn't be relied on to be correct.	2015-01-25 07:22:56 -05:00
Ali Saidi	9d8ddd92dc	sim: Clean up InstRecord Track memory size and flags as well as add some comments and consts.	2015-01-25 07:22:44 -05:00
Ali Saidi	f6742ea26e	cpu: Remove all notion that we know when the cpu is misspeculating. We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do.	2015-01-25 07:22:26 -05:00
Ali Saidi	0bd986015b	cpu: Put all CPU instruction tracers in a single file	2015-01-25 07:22:17 -05:00
Ali Saidi	6c4a23c1c6	cpu: remove legion tracer If someone wants to debug with legion again they can restore the code from the repository, but no need to have it hang around indefinitely.	2015-01-25 07:22:05 -05:00
Curtis Dunham	10b5e5431d	sim: fix reference counting of PythonEvent When gem5 is a slave to another simulator and the Python is only used to initialize the configuration (and not perform actual simulation), a "debug start" (--debug-start) event will get freed during or immediately after the initial Python frame's execution rather than remaining in the event queue. This tricky patch fixes the GC issue causing this.	2014-12-23 11:51:40 -06:00
Andreas Hansson	10c69bb168	mem: Remove unused Packet src and dest fields This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remember a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing the crossbar is now responsible without relying on the packet fields. Thus, these variables are now unused and can be removed.	2015-01-22 05:01:31 -05:00
Andreas Hansson	15c64035ed	mem: Remove Packet source from ForwardResponseRecord This patch removes the source field from the ForwardResponseRecord, but keeps the class as it is part of how the cache identifies responses to hardware prefetches that are snooped upwards.	2015-01-22 05:01:30 -05:00
Andreas Hansson	0c2ffd2daa	mem: Remove unused RequestState in the bridge This patch removes the bridge sender state as the Crossbar now takes care of remembering its own routing decisions.	2015-01-22 05:01:27 -05:00
Andreas Hansson	00536b0efc	mem: Always use SenderState for response routing in RubyPort This patch aligns how the response routing is done in the RubyPort, using the SenderState for both memory and I/O accesses. Before this patch, only the I/O used the SenderState, whereas the memory accesses relied on the src field in the packet. With this patch we shift to using SenderState in both cases, thus not relying on the src field any longer.	2015-01-22 05:01:24 -05:00
Andreas Hansson	072f78471d	mem: Make the XBar responsible for tracking response routing This patch removes the need for a source and destination field in the packet by shifting the onus of the tracking to the crossbar, much like a real implementation. This change in behaviour also means we no longer need a SenderState to remember the source/dest whenever we have multiple crossbars in the system. Thus, the stack that was created by the SenderState is not needed, and each crossbar locally tracks the response routing. The fields in the packet are still left behind as the RubyPort (which also acts as a crossbar) does routing based on them. In the succeeding patches the uses of the src and dest field will be removed. Combined, these patches improve the simulation performance by roughly 2%.	2015-01-22 05:01:14 -05:00
Andreas Hansson	ce12d4bc63	x86: Delay X86 table walk on receiving walker response This patch fixes a minor issue in the X86 page table walker where it ended up sending new request packets to the crossbar before the response processing was finished (recvTimingResp is directly calling sendTimingReq). Under certain conditions this caused the crossbar to see illegal combinations of request/response overlap, in turn causing problems with a slightly modified crossbar implementation.	2015-01-22 05:00:54 -05:00
Andreas Hansson	f49830ce0b	mem: Clean up Request initialisation This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we called setVirt instead use the appropriate constructor.	2015-01-22 05:00:53 -05:00
Nikos Nikoleris	a35283ac65	cpu: commit probe notification on every microop or macroop The ppCommit should notify the attached listener every time the cpu commits a microop or non microcoded instruction. The listener can then decide whether it will process only the last microop (e.g. SimPoint probe). Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-20 14:15:27 -06:00
Andreas Hansson	6096e2f9c1	mem: Fix bug in cache request retry mechanism This patch ensures that inhibited packets that are about to be turned into express snoops do not update the retry flag in the cache.	2015-01-20 08:12:01 -05:00
Andreas Hansson	da0c770943	cpu: Fix retry bug in MinorCPU LSQ	2015-01-20 08:11:58 -05:00
Andreas Hansson	92585d60c9	mem: Move DRAM interleaving check to init This patch fixes a bug where the DRAM controller tried to access the system cacheline size before the system pointer was initialised. It also fixes a bug where the granularity is 0 (no interleaving).	2015-01-20 08:11:55 -05:00
Emilio Castillo	7bb65dd434	x86: fxsave and fxrestore missing template code This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for saving/restoring the FP registers is in the file but it was not used. The FXSAVE and FXRSTOR instructions are used in the kernel for saving and loading the state of the mmx, xmm and fpu registers. This operation is triggered in FS by issuing a Device Not Available Fault. The cr0 register has a TS flag that is set upon each context change. Every time a task accesses any FP related register (SIMD as well), if the TS flag is set to one, the Device Not Available Fault is issued. The kernel saves the current state of the registers, and restores the previous state of the currently running task. Right now gem5 lacks this capability. The Device Not Available Fault is never issued, leading to several problems when different threads share the same CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this behavior. In order to test this a hack in the atomic cpu code was done to detect if a static instruction has any FP operands and the cr0 reg TS bit is set. This check must be done in the ISA dependent code. But it seems to be tricky to access the cr0 register while executing an instruction. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-10 14:30:53 -06:00
Nikos Nikoleris	ec64b81a9d	cpu: fix RetiredStores probe point Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-10 14:30:53 -06:00
cdirik	1693e526d0	dev: prevent intel 8254 timer counter events firing before startup This change includes edits to Intel8254Timer to prevent counter events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-06 15:10:22 -07:00
Gabe Black	1c1fb2c988	test: Add a unittest for the BitUnion types.	2015-01-07 00:34:40 -08:00
Gabe Black	86dea86987	base: Fix assigning between identical bitfields. If two bitfields are of the same type, also implying that they have the same first and last bit positions, the existing implementation would copy the entire bitfield. That includes the __data member which is shared among all the bitfields, effectively overwriting the entire bitunion. This change also adjusts the write only signed bitfield assignment operator to be like the unsigned version, using "using" instead of implementing it again and calling down to the underlying implementation.	2015-01-07 00:31:46 -08:00
Gabe Black	cd6380605c	x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield. These are for the monitor/mwait instructions, SSSE3, and XSAVE.	2015-01-06 22:15:00 -08:00
Gabe Black	cb181d6f91	cpuid, x86: Revert "Enabling more features in CPUid" That change enables CPUID bits for features that aren't implemented in gem5. If a simulated system tries to use those features because it was told it could, bad things can happen.	2015-01-06 22:13:56 -08:00
Andrew Lukefahr	6d32004407	minor: fixed LSQ MasterPortID Minor was reporting the data cache access as ".inst" accesses. This just switches the MasterPortID to dataMasterPortId. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
mike upton	cb911559dc	arm: Add unlinkat syscall implementation added ARM aarch64 unlinkat syscall support, modeled on other <xxx>at syscalls. This gets all of the cpu2006 int workloads passing in SE mode on aarch64. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
Maxime Martinasso	5a5416d575	x86: implements the simd128 ADDSUBPD instruction This patch implements the simd128 ADDSUBPD instruction for the x86 architecture. Tested with a simple program in assembly language which executes the instruction. Checked that different versions of the instruction are executed by using the execution tracing option. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
Cagdas Dirik	02c376ac44	dev: prevent RTC events firing before startup This change includes edits to MC146818 timer to prevent RTC events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
Joel Hestness	642b9b4fab	syscall_emul: Return correct writev value According to Linux man pages, if writev is successful, it returns the total number of bytes written. Otherwise, it returns an error code. Instead of returning 0, return the result from the actual call to writev in the system call.	2014-12-27 13:48:40 -06:00
Mitch Hayenga	b2342c5d9a	mem: Change prefetcher to use random_mt Prefetchers previously used rand() to generate random numbers.	2014-12-23 09:31:19 -05:00
Curtis Dunham	516e6046ae	mem: Hide WriteInvalidate requests from prefetchers Without this tweak, a prefetcher will happily prefetch data that will promptly be invalidated and overwritten by a WriteInvalidate.	2014-12-23 09:31:19 -05:00
Mitch Hayenga	bd4f901c77	mem: Fix event scheduling issue for prefetches The cache's MemSidePacketQueue schedules a sendEvent based upon nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever a future prefetch is ready. However, a prefetch being ready does not guarantee that it can obtain an MSHR. So, when all MSHRs are full, the simulation ends up unnecessarily scheduling a sendEvent every picosecond until an MSHR is finally freed and the prefetch can happen. This patch fixes this by not signaling the prefetch ready time if the prefetch could not be generated. The event is rescheduled as soon as an MSHR becomes available.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	4acd4a2055	mem: Fix bug relating to writebacks and prefetches Previously the code commented about an unhandled case where it might be possible for a writeback to arrive after a prefetch was generated but before it was sent to the memory system. I hit that case. Luckily the prefetchSquash() logic already in the code handles dropping prefetch requests in certain circumstances.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	df82a2d003	mem: Rework the structuring of the prefetchers Re-organizes the prefetcher class structure. Previously the BasePrefetcher forced multiple assumptions on the prefetchers that inherited from it. This patch makes the BasePrefetcher class truly representative of base functionality. For example, the base class no longer enforces FIFO order. Instead, prefetchers with FIFO requests (like the existing stride and tagged prefetchers) now inherit from a new QueuedPrefetcher base class. Finally, the stride-based prefetcher now assumes a customizable lookup table (sets/ways) rather than the previous fully associative structure.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	6cb58b2bd2	mem: Add parameter to reserve MSHR entries for demand access Adds a new parameter that reserves some number of MSHR entries for demand accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand requests from the CPU to stall.	2014-12-23 09:31:18 -05:00
Curtis Dunham	4d88978913	arm: Add stats to table walker This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion	2014-12-23 09:31:18 -05:00
Andreas Hansson	59460b91f3	config: Expose the DRAM ranks as a command-line option This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels). The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding).	2014-12-23 09:31:18 -05:00
Andreas Hansson	2f7baf9dbe	mem: Ensure DRAM controller is idle when in atomic mode This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller force the simulator to switch out of the KVM mode, thus killing performance. The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions. The switcheroo regressions require a minor stats bump as a result.	2014-12-23 09:31:18 -05:00
Omar Naji	381d1da791	mem: Add rank-wise refresh to the DRAM controller This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank.	2014-12-23 09:31:18 -05:00
Omar Naji	152c02354e	mem: Fix a bug in the DRAM controller arbitration Fix a minor issue that affects multi-rank systems.	2014-12-23 09:31:18 -05:00
Kanishk Sugand	7a25b1a0e0	mem: Add stack distance statistics to the CommMonitor This patch adds the stack distance calculator to the CommMonitor. The stats are disabled by default.	2014-12-23 09:31:18 -05:00
Kanishk Sugand	888975b29d	mem: Add a stack distance calculator This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU Distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Almasi et al. http://doi.acm.org/10.1145/773039.773043. For each transaction a hashtable look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represents the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64>::max(). In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, Writebacks to the lower level (e.g. membus from L2), can be marked instead of being removed from the stack (isMarked flag of Node set to True). And then later if this same address is accessed (by L1), the value of the isMarked flag would be True. This gives some insight on how the Writeback policy of the lower level affects the read/write accesses in an application. Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time.	2014-12-23 09:31:18 -05:00
Marco Elver	dd0f3943e2	mem: Add MemChecker and MemCheckerMonitor This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	184fefbb3b	arm: Raise an alignment fault if a PC has illegal alignment We currently don't handle unaligned PCs correctly. There is one check for unaligned PCs in the TLB when running in aarch64 mode, but this check does not cover cases where the CPU does not do a TLB lookup when decoding an instruction (e.g., a branch stays within the same cache line). Additionally, the Decoder class sometimes throws an assertion for unaligned PCs which breaks speculation. This changeset introduces a decoder fault bit field in the ExtMachInst structure. This field can be used to signal a decoder failure. If set, the decoder generates an internal gem5fault instruction instead of a normal instruction. This instruction in turn either panics (fault type PANIC), returns a PCAlignmentFault (fault type UNALIGNED, aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32). The patch causes minor changes to the realview64 regressions, and a stats bump will follow.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	b33812ba43	arm: Clean up and document decoder API This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	070b4a81db	arm: Add support for filtering in the PMU This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.	2014-12-23 09:31:17 -05:00
Gabe Black	70eb68beae	Let other objects set up memory like regions in a KVM VM.	2014-12-09 21:53:44 -08:00
Andreas Sandberg	9b7578d8c7	arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0 The aarch64 system register decoder is currently not decoding PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the decoder so that they are decoded using the values in table C5-6 in ARM DDI 0478A.c.	2014-12-08 04:49:53 -05:00
Andreas Sandberg	6a9fbd295d	dev: Add response sanity checks in PioPort Add an assert in the PioPort that checks if a response packet from a device has the right flags set before passing it to the rest of the memory system.	2014-12-08 04:49:52 -05:00
Andreas Sandberg	1ccc4e0e21	dev: Correctly transform packets into responses The VirtIO devices didn't correctly set the response flags in memory packets. This changeset adds the required Packet::makeResponse() calls.	2014-12-08 04:49:51 -05:00
Gabe Black	4a8a0a0798	misc: Generalize GDB single stepping. The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.	2014-12-05 22:37:03 -08:00
Gabe Black	fb07d43b1a	x86: Implement a remote GDB stub. This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.	2014-12-05 22:36:16 -08:00
Gabe Black	16c9b41616	misc: Add some utility functions for schedule inst commit events. These can be used to simplify the implementation of single step in derived classes.	2014-12-05 22:35:47 -08:00
Gabe Black	cddf988bfd	misc: Rename the GDB "Event" event class to InputEvent. The "Event" name is the same as the base event class. That's a bit confusing, and makes it a little awkward to add other event types.	2014-12-05 22:34:42 -08:00
Gabe Black	f9f46b8fa9	sim: Ensure GDB interrupts the simulation at an instruction boundary. Use the comInstEventQueue to ensure GDB interrupts the simulation at an instruction boundary and not in the middle of a macroop, memory access, etc.	2014-12-05 01:51:49 -08:00
Gabe Black	bacbb8ecbc	cpu: Only check for PC events on instruction boundaries. Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing.	2014-12-05 01:47:35 -08:00
Gabe Black	fe48c0a32b	misc: Make the GDB register cache accessible in various sized chunks. Not all ISAs have 64 bit sized registers, so it's not always very convenient to access the GDB register cache in 64 bit sized chunks. This change makes it accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations were working around that limitation by bundling and unbundling 32 bit values into 64 bit values. That code has been removed.	2014-12-05 01:44:24 -08:00
Gabe Black	22aaa5867f	x86: Rework opcode parsing to support 3 byte opcodes properly. Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measurably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa	2014-12-04 15:53:54 -08:00
Gabe Black	3069c28a02	arch: Allow named constants as decode case values. The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value; they might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpreted as C++.	2014-12-04 15:52:48 -08:00
Gabe Black	d67cf81f5d	x86: Clean up style in process.cc.	2014-12-02 22:01:51 -08:00
Gabe Black	2d9dae01fb	sim: Make it possible to override the breakpoint length check. The check which makes sure the length of the breakpoint being written is the same as a MachInst is only correct on fixed instruction width ISAs. Instead of incorrectly applying that check to all ISAs, this change makes that the default check and lets ISA specific GDB classes override it.	2014-12-03 03:27:19 -08:00
Gabe Black	ecec8cde63	ide: Accept the IDLE (0xe3) ATA command. This command is supposed to set up a timer which will put the drive into a standby mode if it isn't sent a command within a given time out. Since most of the timeouts are generally significantly longer than a simulation would run anyway, and we don't have an implementation for standby mode to begin with, we can accept the command, do nothing, and report success.	2014-12-03 03:07:35 -08:00
Gabe Black	bce58726f3	dev: Support translating left and right ALT keys. This is used primarily for VNC.	2014-12-03 03:06:03 -08:00
Andreas Hansson	966c3f4bc5	scons: Ensure dictionary iteration is sorted by key This patch adds sorting based on the SimObject name or parameter name for all situations where we iterate over dictionaries. This should ensure a deterministic and consistent order across the host systems and hopefully avoid regression results differing across python versions.	2014-12-02 06:08:22 -05:00
Curtis Dunham	5d22250845	mem: Support WriteInvalidate (again) This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.	2014-12-02 06:08:19 -05:00
Curtis Dunham	7ca27dd3cc	mem: Remove WriteInvalidate support Prepare for a different implementation following in the next patch	2014-12-02 06:08:17 -05:00
Andrew Bardsley	df37cad0fd	cpu: Fix retries on barrier/store in Minor's store buffer This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is prematurely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To achieve this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.	2014-12-02 06:08:15 -05:00
Andrew Bardsley	98f3e7a310	cpu: Fix memoryIssueLimit checking in Minor This patch fixes the checking of the number of memory instructions issued per cycle in the Minor CPU.	2014-12-02 06:08:13 -05:00
Andrew Bardsley	3cd0b1f6a6	arm: Fix TLB ignoring faults when table walking This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which call BaseTLB::Translation::finish.	2014-12-02 06:08:11 -05:00
Marco Elver	9649395f85	cpu, o3: Ignored invalidate causing same-address load reordering In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.	2014-12-02 06:08:03 -05:00
Andreas Hansson	74bbe20141	cpu: Always mask the snoop address when performing lock check Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.	2014-12-02 06:08:00 -05:00
Stephan Diestelhorst	810349a8a7	cpu: Move packet deallocation to recvTimingResp in the O3 CPU Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.	2014-12-02 06:07:58 -05:00
Andreas Hansson	5c84157c29	mem: Relax packet src/dest check and shift onus to crossbar This patch allows objects to get the src/dest of a packet even if it is not set to a valid port id. This simplifies (ab)using the bridge as a buffer and latency adapter in situations where the neighbouring MemObjects are not crossbars. The checks that were done in the packet are now shifted to the crossbar where the fields are used to index into the port arrays. Thus, the carrier of the information is not burdened with checking, and the crossbar can check not only that the destination is set, but also that the port index is within limits.	2014-12-02 06:07:56 -05:00
Andreas Hansson	ea5ccc7041	mem: Clean up packet data allocation This patch attempts to make the rules for data allocation in the packet explicit, understandable, and easy to verify. The constructor that copies a packet is extended with an additional flag "alloc_data" to enable the call site to explicitly say whether the newly created packet is short-lived (a zero-time snoop), or has an unknown life-time and therefore should allocate its own data (or copy a static pointer in the case of static data). The tricky case is the static data. In essence this is a copy-avoidance scheme where the original source of the request (DMA, CPU etc) does not ask the memory system to return data as part of the packet, but instead provides a pointer, and then the memory system carries this pointer around, and copies the appropriate data to the location itself. Thus any derived packet actually never copies any data. As the original source does not copy any data from the response packet when arriving back at the source, we must maintain the copy of the original pointer to not break the system. We might want to revisit this one day and pay the price for a few extra memcpy invocations. All in all this patch should make it easier to grok what is going on in the memory system and how data is actually copied (or not).	2014-12-02 06:07:54 -05:00
Andreas Hansson	f012166bb6	mem: Cleanup Packet::checkFunctional and hasData usage This patch cleans up the use of hasData and checkFunctional in the packet. The hasData function is unfortunately suggesting that it checks if the packet has a valid data pointer, when it does in fact only check if the specific packet type is specified to have a data payload. The confusion led to a bug in checkFunctional. The latter function is also tidied up to avoid name overloading.	2014-12-02 06:07:52 -05:00
Andreas Hansson	a2ee51f631	mem: Make the requests carried by packets const This adds a basic level of sanity checking to the packet by ensuring that a request is not modified once the packet is created. The only issue that had to be worked around is the relaying of software-prefetches in the cache. The specific situation is now solved by first copying the request, and then creating a new packet accordingly.	2014-12-02 06:07:50 -05:00
Andreas Hansson	fa60d5cf27	mem: Make Request getters const This patch tidies up the Request class, making all getters const. The odd one out is incAccessDepth which is called by the memory system as packets carry the request around. This is also const to enable the packet to hold on to a const Request.	2014-12-02 06:07:48 -05:00
Andreas Hansson	3d6ec81e66	mem: Add checks and explanation for assertMemInhibit usage	2014-12-02 06:07:46 -05:00
Andreas Hansson	41846cb61b	mem: Assume all dynamic packet data is array allocated This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic were in the Ruby testers. The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks. As the last part of the patch, it renames dataDynamicArray to dataDynamic.	2014-12-02 06:07:43 -05:00
Andreas Hansson	5df96cb690	mem: Remove redundant Packet::allocate calls This patch cleans up the packet memory allocation confusion. The data is always allocated at the requesting side, when a packet is created (or copied), and there is never a need for any device to allocate any space if it is merely responding to a packet. This behaviour is in line with how SystemC and TLM work as well, thus increasing interoperability, and matching established conventions. The redundant calls to Packet::allocate are removed, and the checks in the function are tightened up to make sure data is only ever allocated once. There are still some oddities in the packet copy constructor where we copy the data pointer if it is static (without ownership), and allocate new space if the data is dynamic (with ownership). The latter is being worked on further in a follow-on patch.	2014-12-02 06:07:41 -05:00
Andreas Hansson	0706a25203	mem: Use const pointers for port proxy write functions This patch changes the various write functions in the port proxies to use const pointers for all sources (similar to how memcpy works). The one unfortunate aspect is the need for a const_cast in the packet, to avoid having to juggle a const and a non-const data pointer. This design decision can always be re-evaluated at a later stage.	2014-12-02 06:07:38 -05:00
Andreas Hansson	9779ba2e37	mem: Add const getters for write packet data This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used. The patch also removes the unused isReadWrite function.	2014-12-02 06:07:36 -05:00
Andreas Hansson	25bfc24999	mem: Remove null-check bypassing in Packet::getPtr This patch removes the parameter that enables bypassing the null check in the Packet::getPtr method. A number of call sites assume the value to be non-null. The one odd case is the RubyTester, which issues zero-sized prefetches(!), and despite being reads they had no valid data pointer. This is now fixed, but the size oddity remains (unless anyone objects or has any good suggestions). Finally, in the Ruby Sequencer, appropriate checks are made for flush packets as they have no valid data pointer.	2014-12-02 06:07:34 -05:00
Omar Naji	0e63d2cd62	mem: Add a GDDR5 DRAM config This patch adds a first cut GDDR5 config to accommodate the users combining gem5 and GPUSim. The config is based on a SK Hynix datasheet, and the Nvidia GTX580 specification. Someone from the GPUSim user-camp should tweak the default page-policy and static frontend and backend latencies.	2014-12-02 06:07:32 -05:00
Andreas Hansson	d66b14ca61	misc: Another round of static analysis fixups Mostly addressing uninitialised members.	2014-11-24 09:03:38 -05:00
Alexandru Dutu	1f539f13c3	mem: Page Table map api modification This patch adds uncacheable/cacheable and read-only/read-write attributes to the map method of PageTableBase. It also modifies the constructor of TlbEntry structs for all architectures to consider the new attributes.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	c11bcb8119	mem: Multi Level Page Table bug fix The multi level page table was giving false positives for already mapped translations. This patch fixes the bogus behavior.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	e4859fae5b	mem: Page Table long lines Trimmed down all the lines greater than 78 characters.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	f743bdcb69	x86: Segment initialization to support KvmCPU in SE This patch sets up low and high privilege code and data segments and places them in the following order: cs low, ds low, ds, cs, in the GDT. Additionally, a syscall and page fault handler for KvmCPU in SE mode are defined. The order of the segment selectors in GDT is required in this manner for interrupt handling to work properly. Segment initialization is done for all the thread contexts.	2014-11-23 18:01:08 -08:00
Alexandru Dutu	adbaa4dfde	kvm, x86: Adding support for SE mode execution This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.	2014-11-23 18:01:08 -08:00
Alexandru Dutu	335514dfdc	cpuid, x86: Enabling more features in CPUid Adding more features in the CPUid with the purpose of supporting running the KvmCPU in SE mode.	2014-11-23 18:01:08 -08:00
Gabe Black	8bbfb1b39d	x86: pc: Put a stub IO device at port 0xed which the kernel can use for delays. There was already a stub device at 0x80, the port traditionally used for an IO delay. 0x80 is also the port used for POST codes sent by firmware, and that may have prompted adding this port as a second option.	2014-11-21 17:22:02 -08:00
Gabe Black	b5fd6050a2	dev: Use fixed size member variables to describe fixed size PL111 registers.	2014-11-18 02:38:23 -08:00
Gabe Black	a08cfd797b	vnc: Add a conversion function for bgr888.	2014-11-17 01:45:42 -08:00
Gabe Black	aceeecb192	x86: Fix setting segment bases in real mode. The data size used for actually writing the base value for the segment was the default size, but really it should set the entire value without any possible truncation.	2014-11-17 01:00:53 -08:00
Gabe Black	f8603fa120	x86: Fix some bugs in the real mode far jmp instruction. The far pointer should be shifted right to get the selector value, not left. Also, when calculating the width of the offset, the wrong register was used in one spot.	2014-11-17 00:20:01 -08:00
Gabe Black	7739c24fbe	x86: APIC: Only set deliveryStatus if our IPI is going somewhere. Otherwise the IPI which isn't sent will never arrive, and the deliveryStatus bit will never be cleared.	2014-11-17 00:19:07 -08:00
Gabe Black	79e7ca307e	x86: APIC: Fix the getRegArrayBit function. The getRegArrayBit function extracts a bit from a series of registers which are treated as a single large bit array. A previous change had modified the logic which figured out which bit to extract from ">> 5" to "% 5" which seems wrong, especially when other, similar functions were changed to use "% 32".	2014-11-17 00:17:06 -08:00
Gabe Black	d228db1143	x86: Fix the CPUID Long Mode Address Size function. The value in EAX has an 8 bit field for the linear address size and one for the physical address size when calling that function. A recent change implemented it but returned 0xff for both of those fields. That implies that linear and physical addresses are 255 bits wide which is wrong. When using the KVM CPU model this causes an error, presumably because some of those bits are actually reserved, or the CPU or kernel realizes 255 bits is a bad value. This change makes those values 48.	2014-11-16 23:12:42 -08:00
Andreas Hansson	481eb6ae80	arm: Fixes based on UBSan and static analysis Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base. Most of the fixes simply ensure proper initialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code.	2014-11-14 03:53:51 -05:00
Andreas Hansson	9ffe0e7ba6	mem: Clarify unit of DRAM controller buffer size	2014-11-14 03:53:48 -05:00
Mitch Hayenga	9d6d8e02aa	mem: Delete unused variable in Garnet NetworkLink With recent changes OSX clang compilation fails due to an unused variable.	2014-11-12 09:05:23 -05:00
Ali Saidi	b6f32253dd	arm: Fix timing wakeup with LLSC	2014-11-12 09:05:22 -05:00
Andreas Hansson	7d05895120	sim: Sort SimObject descendants and ports This patch fixes a number of occurrences where the sorting order of the objects was implementation defined.	2014-11-12 09:05:21 -05:00
Andreas Hansson	cc336ecb5e	base: Revert 9277177eccff and use getenv/setenv for UTC time This patch reverts changeset 9277177eccff which does not do what it was intended to do. In essence, we go back to implementing mkutctime much like the non-standard timegm extension.	2014-11-12 09:05:20 -05:00
Marc Orr	bf80734b2c	x86 isa: This patch attempts an implementation at mwait. Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-11-06 05:42:22 -06:00
Andrew Lukefahr	bd32d55a2c	cpu: Minor Draining Bug Fixes a bug where Minor drains in the midst of committing a conditional store. While committing a conditional store, lastCommitWasEndOfMacroop is true (from the previous instruction) as we still haven't finished the conditional store. If a drain occurs before the cache response, Minor would check just lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch, which increases the streamSeqNum. This caused the conditional store to be squashed when the memory responded and it completed. However, to the memory the store succeeded, while to the instruction sequence it never occurred. In the case of an LLSC, the instruction sequence will replay the squashed STREX, which will fail as the cache is no longer in LLSC. Then the instruction sequence will loop back to a LDREX, which receives the updated (incorrect) value. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-11-06 05:42:21 -06:00
Nilay Vaish	0811f21f67	ruby: provide a backing store Ruby's functional accesses are not guaranteed to succeed as of now. While this is not a problem for the protocols that are currently in the mainline repo, it seems that coherence protocols for gpus rely on a backing store to supply the correct data. The aim of this patch is to make this backing store configurable i.e. it comes into play only when a particular option: --access-backing-store is invoked. The backing store has been there since M5 and GEMS were integrated. The only difference is that earlier the system used to maintain the backing store and ruby's copy was write-only. Sometime last year, we moved to data being supplied by ruby in SE mode simulations. And now we have patches on the reviewboard, which remove ruby's copy of memory altogether and rely completely on the system's memory to supply data. This patch adds back a SimpleMemory member to RubySystem. This member is used only if the option: access-backing-store is set to true. By default, the memory would not be accessed.	2014-11-06 05:42:21 -06:00
Nilay Vaish	3022d463fb	ruby: interface with classic memory controller This patch is the final in the series. The whole series and this patch in particular were written with the aim of interfacing ruby's directory controller with the memory controller in the classic memory system. This is being done since ruby's memory controller has not been kept up to date with the changes going on in DRAMs. Classic's memory controller is more up to date and supports multiple different types of DRAM. This also brings classic and ruby ever closer. The patch also changes ruby's memory controller to expose the same interface.	2014-11-06 05:42:21 -06:00
Nilay Vaish	68ddfab8a4	ruby: remove the function functionalReadBuffers() This function was added when I had incorrectly arrived at the conclusion that such a function can improve the chances of a functional read succeeding. As was later realized, this is not possible in the current setup. While the code using this function was dropped long back, this function was not. Hence the patch.	2014-11-06 05:42:20 -06:00

6800 commits