sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Victor Garcia	8427d05daa	kvm, arm: Fix compilation errors due to API changes The checkpoint changes, along with the SMT patches have changed a number of APIs. Adapt the ArmKvmCPU accordingly.	2015-10-29 08:48:23 -04:00
Andreas Hansson	d8b7a652e1	mem: Clarify cache MSHR handling on fill This patch addresses the upgrading of deferred targets in the MSHR, and makes it clearer by explicitly calling out what is happening (deferred targets are promoted if we get exclusivity without asking for it).	2015-10-29 08:48:20 -04:00
Boris Shingarov	58cb57bacc	power: Implement Remote GDB	2015-10-25 16:01:52 -07:00
Andreas Hansson	b48ed9b6c2	x86: Add missing explicit overrides for X86 devices Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.	2015-10-23 09:51:12 -04:00
Andreas Hansson	fa32ad4941	arm: Add missing explicit overrides for ARM devices Make clang >= 3.5 happy when compiling build/ARM/gem5.opt on OSX.	2015-10-23 09:51:11 -04:00
Andreas Hansson	2a1f49fae6	mem: Pass snoop retries through the CommMonitor Allow the monitor to be placed after a snooping port, and do not fail on snoop retries, but instead pass them on to the slave port.	2015-10-14 13:32:28 -04:00
Nilay Vaish	4453537ead	ruby: profiler: provide the number of vnets through ruby system The aim is to ultimately do away with the static function Network::getNumberOfVirtualNetworks().	2015-10-14 00:29:43 -05:00
Nilay Vaish	f1b6d1913c	ruby: remove unused functionalRead() function. Not required since functional reads cannot rely on messages that are inflight.	2015-10-14 00:29:39 -05:00
Nilay Vaish	7defb594b3	ruby: garnet: flexible: refactor flit	2015-10-14 00:29:38 -05:00
Andreas Hansson	2ac04c11ac	misc: Add explicit overrides and fix other clang >= 3.5 issues This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables).	2015-10-12 04:08:01 -04:00
Andreas Hansson	22c04190c6	misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions.	2015-10-12 04:07:59 -04:00
Joel Hestness	1f2e7c1aaa	sim: Don't quiesce UDelayEvents with 0 latency ARM uses UDelayEvents to emulate kernel __udelay functions and speed up simulation. UDelayEvents call Pseudoinst::quiesceNs to quiesce the system for a specified delay. Changeset 10341:0b4d10f53c2d introduced the requirement that any quiesce process that is started must also be completed by scheduling an EndQuiesceEvent. This change causes the CPU to hang if an IsQuiesce instruction is executed, but the corresponding EndQuiesceEvent is not scheduled. Changeset 11058:d0934b57735a introduces a fix for uses of PseudoInst::quiesce that would conditionally execute the EndQuiesceEvent. ARM UDelayEvents specify quiesce period of 0 ns (src/arch/arm/linux/system.cc), so changeset 11058 causes these events to now execute full quiesce processes, greatly increasing the total instructions executed in kernel delay loops and slowing simulation. This patch updates the UDelayEvent to conditionally execute PseudoInst::quiesceNs (a quiesce operation) only if the specified delay is >0 ns. The result is ARM delay loops no longer execute instructions for quiesce handling, and regression time returns to normal.	2015-10-10 16:45:38 -05:00
Rekai Gonzalez Alberquilla	d3d159749a	isa: Add parameter to pick different decoder inside ISA The decoder is responsible for splitting instructions in micro operations (uops). Given that different micro architectures may split operations differently, this patch allows to specify which micro architecture each isa implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done making the decodification calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time.	2015-10-09 14:50:54 -05:00
Dylan Johnson	7624fc1fb4	sim: Add relative break scheduling Add schedRelBreak() function, executable within a debugger, that sets a breakpoint by relative rather than absolute tick.	2015-10-09 14:27:09 -05:00
Steve Reinhardt	90c279e4b1	arch: clean up isa_parser error handling Although some decent error messages were getting generated inside isa_parser.py, they weren't always getting printed because of the screwy way we were handling exceptions. (Basically an inner exception would get hidden by an outer exception, and the more informative inner error message would not get printed.) Also line numbers were messed up, since they were taken from the lexer, which is typically a token (or more) ahead of the grammar rule that's being matched. Using the 'lineno' attribute that PLY associates with the grammar production is more accurate. The new LineTracker class extends lineno to track filenames as well as line numbers.	2015-10-06 17:26:50 -07:00
Steve Reinhardt	2511490c9c	sim: add ExecMacro to Exec* compound debug flags Really should have been there in the first place, IMO. Makes debugging x86 execution a lot easier.	2015-10-06 17:26:50 -07:00
Steve Reinhardt	4b7c1fe610	sim: print pid in output header This information is useful if you have a bunch of simulations running and want to know which one to kill, but you've neglected to take advantage of the previous patch and embed the pid in your output path.	2015-10-06 17:26:50 -07:00
Steve Reinhardt	a2c875c746	x86: implement rcpps and rcpss SSE insts These are packed single-precision approximate reciprocal operations, vector and scalar versions, respectively. This code was basically developed by copying the code for sqrtps and sqrtss. The mrcp micro-op was simplified relative to msqrt since there are no double-precision versions of this operation.	2015-10-06 17:26:50 -07:00
Steve Reinhardt	57b9f53afa	x86: implement fild, fucomi, and fucomip x87 insts fild loads an integer value into the x87 top of stack register. fucomi/fucomip compare two x87 register values (the latter also doing a stack pop). These instructions are used by some versions of GNU libstdc++.	2015-10-06 17:26:50 -07:00
Dylan Johnson	71b1c6ce76	sim: Add ability to break at specific kernel function Adds a GDB callable function that sets a breakpoint at the beginning of a kernel function.	2015-09-02 13:34:19 -05:00
Curtis Dunham	02881a7bf3	base: remove Trace::enabled flag The DTRACE() macro tests both Trace::enabled and the specific flag. This change uses the same administrative interface for enabling/disabling tracing, but masks the SimpleFlags settings directly. This eliminates a load for every DTRACE() test, e.g. DPRINTF.	2015-09-30 15:21:55 -05:00
Mitch Hayenga	ccf4f6c3d7	arm: Change TLB Software Caching In ARM, certain variables are only updated when a necessary change is detected. Having 2 SMT threads share a TLB resulted in these not being updated as required. This patch adds a thread context identifer to assist in the invalidation of these variables.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	9e07a7504c	cpu,isa,mem: Add per-thread wakeup logic Changes wakeup functionality so that only specific threads on SMT capable cpus are woken.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	a5c4eb3de9	isa,cpu: Add support for FS SMT Interrupts Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	e255fa053f	arm: SMT MPIDR Setting Changes assignment of the MPIDR for multi-threaded systems only.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	fafa83ed32	cpu: Add per-thread monitors Adds per-thread address monitors to support FullSystem SMT.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	582a0148b4	config,cpu: Add SMT support to Atomic and Timing CPUs Adds SMT support to the "simple" CPU models so that they can be used with other SMT-supported CPUs. Example usage: this enables the TimingSimpleCPU to be used to warmup caches before swapping to detailed mode with the in-order or out-of-order based CPU models.	2015-09-30 11:14:19 -05:00
Mitch Hayenga	52d521e433	cpu: Change thread assignments for heterogenous SMT Trying to run an SE system with varying threads per core (SMT cores + Non-SMT cores) caused failures due to the CPU id assignment logic. The comment about thread assignment (worrying about core 0 not having tid 0) seems not to be valid given that our configuration scripts initialize them in order. This removes that constraint so a heterogenously threaded sytem can work.	2015-09-30 11:14:19 -05:00
Joel Hestness	c05d268cfa	ruby: Fix CacheMemory allocate leak If a cache entry permission was previously set to NotPresent, but the entry was not deleted, a following cache allocation can cause the entry to be leaked by setting the entry pointer to a newly allocated entry. To eliminate this possibility, check if the new entry is different from the old one, and if so, delete the old one.	2015-09-29 09:28:26 -05:00
Joel Hestness	0ecaab4ea8	arch, x86: Delete packet in IntDevice::recvResponse IntDevice::recvResponse is called from two places in current mainline: (1) the short circuit path of X86ISA::IntDevice::IntMasterPort::sendMessage for atomic mode, and (2) the full request->response path to and from the x86 interrupts device (finally called from MessageMasterPort::recvTimingResp). In the former case, the packet was deleted correctly, but in the latter case, the packet and request leak. To fix the leak, move request and packet deletion into IntDevice inherited class implementations of recvResponse.	2015-09-29 09:28:26 -05:00
Joel Hestness	b80024ee7d	ruby: RubyPort delete snoop requests In RubyPort::ruby_eviction_callback, prior changes fixed a memory leak caused by instantiating separate packets for each port that the eviction was forwarded to. That change, however, left the instantiated request to also leak. Allocate it on the stack to avoid the leak.	2015-09-29 09:28:25 -05:00
Joel Hestness	7b70fa02ae	ruby: Fix memory leak in AbstractController Recent changes to memory access queuing allocate requests for packets sent to memory controllers, but did not free the requests. Delete them to avoid leaks.	2015-09-29 09:28:25 -05:00
Joel Hestness	501705eaf0	ruby: RubyMemoryControl delete requests Changes to the RubyMemoryControl removed the dequeue function, which deleted MemoryNode instances. This results in leaked MemoryNode instances. Correctly delete these instances.	2015-09-29 09:25:29 -05:00
Joel Hestness	395b31f518	syscall_emul: Bandage readlink /proc/self/exe The recent changeset to readlink() to handle reading the /proc/self/exe link introduces a number of problems. This patch fixes two: 1) Because readlink() called on /proc/self/exe now uses LiveProcess::progName() to find the binary path, it will only get the zeroth parameter of the simulated system command line. However, if a config script also specifies the process' executable, the executable parameter is used to create the LiveProcess rather than the zeroth command line parameter. Thus, the zeroth command line parameter is not necessarily the correct path to the binary executing in the simulated system. To fix this, add a LiveProcess data member, 'executable', which is correctly set during instantiation and returned from progName(). 2) If a config script allows a user to pass a relative path as the zeroth simulated system command line parameter or process executable, readlink() will incorrecly return a relative path when called on '/proc/self/exe'. /proc/self/exe is always set to a full path, so running benchmarks can fail if a relative path is returned. To fix this, clean up the handling of LiveProcess::progName() within readlink() to get the full binary path. NOTE: This patch still leaves the potential problem that host full path to the binary bleeds into the simulated system, potentially causing the appearance of non-deterministic simulated system execution.	2015-09-29 09:25:20 -05:00
Andreas Hansson	9a0129dcbf	mem: Add PacketInfo to be used for packet probe points This patch fixes a use-after-delete issue in the packet probe points by adding a PacketInfo struct to retain the key fields before passing the packet onwards. We want to probe the packet after it is successfully sent, but by that time the fields may be modified, and the packet may even be deleted. Amazingly enough the issue has gone undetected for months, and only recently popped up in our regressions.	2015-09-25 13:25:34 -04:00
Andreas Hansson	a9a7002a3b	mem: Add check for block status on WriteLineReq fill More checks to help with understanding of functionality.	2015-09-25 07:26:58 -04:00
Andreas Hansson	012dd52dc2	mem: Fix WriteLineReq fill behaviour This patch fixes issues in the interactions between deferred snoops and WriteLineReq. More specifically, the patch addresses an issue where deferred snoops caused assertion failures when being serviced on the arrival of an InvalidateResp. The response packet was perceived to be invalidating, when actually it is not for the cache that sent out the original invalidation request.	2015-09-25 07:26:58 -04:00
Andreas Hansson	5570aa9e9a	mem: Comment clean-up for the snoop filter Merely fixing up some style issues and adding more comments.	2015-09-25 07:26:57 -04:00
Andreas Hansson	7d4e89d4e0	mem: Avoid adding and then removing empty snoop-filter items This patch tidies up how we access the snoop filter for snoops, and avoids adding items only to later remove them.	2015-09-25 07:26:57 -04:00
Andreas Hansson	ca163a80e2	mem: Only track snooping ports in the snoop filter This patch changes the tracking of ports in the snoop filter to use local dense port IDs so that we can have 64 snooping ports (rather than crossbar slave ports). This is achieved by adding a simple remapping vector that translates the actal port IDs into the local slave IDs used in the SnoopMask. Ultimately this patch allows us to scale to much larger systems without introducing a hierarchy of crossbars.	2015-09-25 07:26:57 -04:00
Ali Jafri	3aa87251d7	mem: Add snoop filters to L2 crossbars, and check size This patch adds a snoop filter to the L2XBar. For now we refrain from globally adding a snoop filter to the SystemXBar, since the latter is also used in systems without caches. In scenarios without caches the snoop filter will not see any writeback/clean evicts from the CPU ports, despite the fact that they are snooping. To avoid inadvertent use of the snoop filter in these cases we leave it out for now. A size check is added to the snoop filter, merely to ensure it does not grow beyond the total capacity of the caches above it. The size has to be set manually, and a value of 8 MByte is choosen as suitably high default.	2015-09-25 07:26:57 -04:00
Andreas Hansson	0c5a98f9d1	mem: Store snoop filter lookup result to avoid second lookup This patch introduces a private member storing the iterator from the lookupRequest call, such that it can be re-used when the request eventually finishes. The method previously called updateRequest is renamed finishRequest to make it more clear that the two functions must be called together.	2015-09-25 07:26:57 -04:00
Ali Jafri	ceec2bb02c	mem: Add snoops for CleanEvicts and Writebacks in atomic mode This patch mirrors the logic in timing mode which sends up snoops to check for cached copies before sending CleanEvicts and Writebacks down the memory hierarchy. In case there is a copy in a cache above, discard CleanEvicts and set the BLOCK_CACHED flag in Writebacks so that writebacks do not reset the cache residency bit in the snoop filter below.	2015-09-25 07:26:57 -04:00
Ali Jafri	6ac356f93b	mem: Add CleanEvict and Writeback support to snoop filters This patch adds the functionality to properly track CleanEvicts and Writebacks in the snoop filter. Previously there were no CleanEvicts, and Writebacks did not send up snoops to ensure there were no copies in caches above. Hence a writeback could never erase an entry from the snoop filter. When a CleanEvict message reaches a snoop filter, it confirms that the BLOCK_CACHED flag is not set and resets the bits corresponding to the CleanEvict address and port it arrived on. If none of the other peer caches have (or have requested) the block, the snoop filter forwards the CleanEvict to lower levels of memory. In case of a Writeback message, the snoop filter checks if the BLOCK_CACHED flag is not set and only then resets the bits corresponding to the Writeback address. If any of the other peer caches have (or has requested) the same block, the snoop filter sets the BLOCK_CACHED flag in the Writeback before forwarding it to lower levels of memory heirarachy.	2015-09-25 07:26:57 -04:00
Ali Jafri	79d3dbcea8	mem: Add check for snooping ports in the snoop filter This patch prevents the snoop filter from creating items for requests originating from non-snooping ports. The allocation decision is thus based both on the cacheability of the line, and the snooping status of the source port. Ultimately we should check if the source of the packet is caching, since also the CPU ports are snooping (but not allocating). Thus, at the moment we rely on the snoop filter being used together with caches. The patch also transitions to use the Packet::getBlockAddr in determining the line address.	2015-09-25 07:26:57 -04:00
Andreas Hansson	462c288a75	mem: Make the coherent crossbar account for timing snoops This patch introduces the concept of a snoop latency. Given the requirement to snoop and forward packets in zero time (due to the coherency mechanism), the latency is accounted for later. On a snoop, we establish the latency, and later add it to the header delay of the packet. To allow multiple caches to contribute to the snoop latency, we use a separate variable in the packet, and then take the maximum before adding it to the header delay.	2015-09-25 07:13:54 -04:00
Andreas Hansson	3bd78a141e	mem: Do not include snoop-filter latency in crossbar occupancy This patch ensures that the snoop-filter latency only contributes to the packet latency, and not to the crossbar throughput/occupancy. In essence we treat the snoop-filter lookup as pipelined.	2015-09-25 06:45:52 -04:00
Nilay Vaish	4647e4e961	ruby: simple network: refactor code Drops an unused variable and marks three variables as const.	2015-09-24 08:41:24 -05:00
Nilay Vaish	b3a3b0b6cf	ruby: garnet: refactor code in network links	2015-09-23 11:23:11 -05:00
Nilay Vaish	6bd7aa1f20	ruby: bloom filters: refactor code	2015-09-23 11:23:10 -05:00
Nilay Vaish	c2376918a5	ruby: abstract controller: mark some variables as const	2015-09-23 11:23:10 -05:00
Wendy Elsasser	61c38524ce	mem: Add initial HBM configurations Created the following HBM configurations: 1) HBM gen1 (x128/CH), 2Gb die, 4H stack, 1Gbps, 8 channels 2) HBM gen2 (x64/PC), 8Gb die, 4H stack, 1Gbps, 16 pseudo-channels The configuration values are based on: - The HBM gen1 public JEDEC spec - Publically released data from MemCon presentations - Timing extrapolated from existing LPDDR configurations Will adjust once specs become available.	2015-09-22 13:17:53 -05:00
Nilay Vaish	8975053864	ruby: garnet: mark some variables as const	2015-09-18 13:27:48 -05:00
Nilay Vaish	96c999fe88	ruby: print addresses in hex Changeset 4872dbdea907 replaced Address by Addr, but did not make changes to print statements. So the addresses which were being printed in hex earlier along with their line address, were now being printed in decimals. This patch adds a function printAddress(Addr) that can be used to print the address in hex along with the lines address. This function has been put to use in some of the places. At other places, change has been made to print just the address in hex.	2015-09-18 13:27:47 -05:00
Nilay Vaish	216529bf18	ruby: slicc: derive DataMember class from Var instead of PairContainer The DataMember class in Type.py was being derived from PairContainer. A separate Var object was also created for the DataMember. This meant some duplication of across the members of these two classes (Var and DataMember). This patch changes DataMember from Var instead. There is no obvious reason to derive from PairContainer which can only hold pairs, something that Var class already supports. The only thing that DataMember has over Var is init_code, which is being retained. This change would later on help in having pointers in DataMembers.	2015-09-18 13:27:47 -05:00
Tony Gutierrez	b3eb0d1423	ruby: update WireBuffer API to match that of MessageBuffer this patch updates the WireBuffer API to mirror the changes in revision 11111	2015-09-17 14:00:33 -04:00
Lena Olson	3225379cc0	ruby: Add missing block deallocations in MOESI_hammer Some blocks in MOESI hammer were not getting deallocated when they were set to an idle state (e.g. by invalidate or other_getx/s messages). While functionally correct, this caused some bad effects on performance, such as blocks in I in the L1s getting sent to the L2 upon eviction, in turn evicting valid blocks. Also, if a valid block was in LRU, that block could be evicted rather than a block in I. This patch adds in the missing deallocations. Committed by: Nilay Vaish<nilay@cs.wisc.edu>	2015-09-16 20:18:40 -05:00
Joe Gross	950e431d87	ruby: fix message buffer init order The recent changes to make MessageBuffers SimObjects required them to be initialized in a particular order, which could break some protocols. Fix this by calling initNetQueues on the external nodes of each external link in the constructor of Network. This patch also refactors the duplicated code for checking network allocation and setting net queues (which are called by initNetQueues) from the simple and garnet networks to be in Network.	2015-09-16 13:10:42 -04:00
Nilay Vaish	cd9e445813	ruby: message buffer, timer table: significant changes This patch changes MessageBuffer and TimerTable, two structures used for buffering messages by components in ruby. These structures would no longer maintain pointers to clock objects. Functions in these structures have been changed to take as input current time in Tick. Similarly, these structures will not operate on Cycle valued latencies for different operations. The corresponding functions would need to be provided with these latencies by components invoking the relevant functions. These latencies should also be in Ticks. I felt the need for these changes while trying to speed up ruby. The ultimate aim is to eliminate Consumer class and replace it with an EventManager object in the MessageBuffer and TimerTable classes. This object would be used for scheduling events. The event itself would contain information on the object and function to be invoked. In hindsight, it seems I should have done this while I was moving away from use of a single global clock in the memory system. That change led to introduction of clock objects that replaced the global clock object. It never crossed my mind that having clock object pointers is not a good design. And now I really don't like the fact that we have separate consumer, receiver and sender pointers in message buffers.	2015-09-16 11:59:56 -05:00
Nilay Vaish	78a1245b41	ruby: remove unused function removeRequest()	2015-09-16 11:59:55 -05:00
Nilay Vaish	4b19e06644	ruby: sequencer: remove commented out function printProgress()	2015-09-16 11:59:55 -05:00
David Hashe	b6b972da99	ruby: rename System.{hh,cc} to RubySystem.{hh,cc} The eventual aim of this change is to pass RubySystem pointers through to objects generated from the SLICC protocol code. Because some of these objects need to dereference their RubySystem pointers, they need access to the System.hh header file. In src/mem/ruby/SConscript, the MakeInclude function creates single-line header files in the build directory that do nothing except include the corresponding header file from the source tree. However, SLICC also generates a list of header files from its symbol table, and writes it to mem/protocol/Types.hh in the build directory. This code assumes that the header file name is the same as the class name. The end result of this is the many of the generated slicc files try to include RubySystem.hh, when the file they really need is System.hh. The path of least resistence is just to rename System.hh to RubySystem.hh. --HG-- rename : src/mem/ruby/system/System.cc => src/mem/ruby/system/RubySystem.cc rename : src/mem/ruby/system/System.hh => src/mem/ruby/system/RubySystem.hh	2015-09-16 12:03:03 -04:00
Anthony Gutierrez	3edadb0bd3	slicc: export uint64_t instead of uint64	2015-09-16 12:01:39 -04:00
Palle Lyckegaard	3de9def6c1	sparc: writing to tick_cmpr should not cause a panic This register is writable according to UA2005 Tried to boot NetBSD which starts the kernel by writing to the tick_cmpr register. Without the patch gem5 crashes with a panic. With the patch NetBSD starts to boot normally (although sun4v support in NetBSD is not complete yet) Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:07 -05:00
Dongxue Zhang	58ec70444d	dev: IDE Disk: Handle bad IDE image size Handle bad IDE disk image size 0. When image size is 0, gem5 will cause an exception with log "Floating point exception (core dumped)". Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:07 -05:00
Andrew Lukefahr	543efd5ca6	cpu: pred: Local Predictor Reset in Tournament Predictor When a branch gets squashed, it's speculative branch predictor state should get rolled back in squash(). However, only the globalHistory state was being rolled back. This patch adds (at least some) support for rolling back the local predictor state also. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:07 -05:00
Hongil Yoon	fb0f9884e2	cpu, o3: consider split requests for LSQ checksnoop operations This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions. The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:06 -05:00
Nilay Vaish	6bee1d9124	ruby: topology: refactor code.	2015-09-14 10:14:50 -05:00
Nilay Vaish	4e898be762	ruby: slicc: remove member buffer_expr from Var class This was added by changeset 51f40b101a56. Instead, buffer_expr would now be associated with the InPort class.	2015-09-14 10:04:55 -05:00
Nilay Vaish	78bf2dfeac	merged with 62e1504b9c64	2015-09-12 16:23:47 -05:00
Nilay Vaish	8b199b775e	ruby: perfect switch: refactor code Refactored the code in operateVnet(), moved partly to a new function operateMessageBuffer(). This is required since a later patch moves to having a wakeup event per MessageBuffer instead of one event for the entire Switch.	2015-09-12 16:16:17 -05:00
Nilay Vaish	25cd13dbf1	ruby: simple network: store Switch* in PerfectSwitch and Throttle There are two reasons for doing so: a. provide a source of clock to PerfectSwitch. A follow on patch removes sender and receiver pointers from MessageBuffer means that the object owning the buffer should have some way of providing timing info. b. schedule events. A follow on patch removes the consumer class. So the PerfectSwitch needs some EventManager object to schedule events on its own.	2015-09-12 16:16:03 -05:00
Andreas Sandberg	a151786741	dev: Add an underrun statistic to the HDLCD controller Add a stat that counts buffer underruns in the HDLCD controller. The stat counts at most one underrun per frame since the controller aborts the current frame if it underruns.	2015-09-11 15:56:09 +01:00
Andreas Sandberg	f7055e9215	dev, arm: Rewrite the HDLCD controller Rewrite the HDLCD controller to use the new DMA engine and pixel pump. This fixes several bugs in the current implementation: * Broken/missing interrupt support (VSync, underrun, DMA end) * Fragile resolution changes (changing resolutions used to cause assertion errors). * Support for resolutions with a width that isn't divisible by 32. * The pixel clock can now be set dynamically. This breaks checkpoint compatibility. Checkpoints can be upgraded with the checkpoint conversion script. However, upgraded checkpoints won't contain the state of the current frame. That means that HDLCD controllers restoring from a converted checkpoint immediately start drawing a new frame (i.e, expect timing differences).	2015-09-11 15:55:46 +01:00
Nilay Vaish	f611d4f22e	ruby: slicc: remove nextLineHack from Type.py	2015-09-08 19:32:04 -05:00
Nilay Vaish	740984b30b	ruby: call setMRU from L1 controllers, not from sequencer Currently the sequencer calls the function setMRU that updates the replacement policy structures with the first level caches. While functionally this is correct, the problem is that this requires calling findTagInSet() which is an expensive function. This patch removes the calls to setMRU from the sequencer. All controllers should now update the replacement policy on their own. The set and the way index for a given cache entry can be found within the AbstractCacheEntry structure. Use these indicies to update the replacement policy structures.	2015-09-05 09:35:39 -05:00
Nilay Vaish	8f29298bc7	ruby: adds set and way indices to AbstractCacheEntry	2015-09-05 09:35:31 -05:00
Nilay Vaish	abcc67010e	ruby: set: reimplement using std::bitset The current Set data structure is slow and therefore is being reimplemented using std::bitset. A maximum limit of 64 is being set on the number of controllers of each type. This means that for simulating a system with more controllers of a given type, one would need to change the value of the variable NUMBER_BITS_PER_SET	2015-09-05 09:34:25 -05:00
Nilay Vaish	7962a81148	ruby: declare all protocol message buffers as parameters MessageBuffer is a SimObject now. There were protocols that still declared some of the message buffers are variables of the controller, but not as input parameters. Special handling was required for these variables in the SLICC compiler. This patch changes this. Now all message buffers are declared as input parameters.	2015-09-05 09:34:24 -05:00
Andreas Hansson	419d437385	mem: Avoid setting markPending if not needed In cases where a newly added target does not have any upstream MSHR to mark as downstreamPending, remember that nothing is marked. This allows us to avoid attempting to find the MSHR as part of the clearing of downstreamPending.	2015-09-04 13:14:03 -04:00
Andreas Hansson	2c50a83ba2	mem: Tidy up CacheSet Minor tweaks and house keeping.	2015-09-04 13:14:01 -04:00
Andreas Hansson	76088fb9ca	mem: Tidy up the snoop state-transition logic Remove broken and unused option to pass dirty data on non-exclusive snoops. Also beef up the comments a bit.	2015-09-04 13:13:58 -04:00
Andreas Hansson	8e74d5484f	sim: Fix time unit in abort message	2015-09-04 13:13:55 -04:00
Curtis Dunham	87b9da2df4	sim: tag-based checkpoint versioning This commit addresses gem5 checkpoints' linear versioning bottleneck. Since development is distributed across many private trees, there exists a sort of 'race' for checkpoint version numbers: internally a checkpoint version may be used but then resynchronizing with the external tree causes a conflict on that version. This change replaces the linear version number with a set of unique strings called tags. Now the only conflicts that can arise are of tag names, where collisions are much easier to avoid. The checkpoint upgrader (util/cpt_upgrader.py) upgrades the version representation, as one would expect. Each tag version implements its upgrader code in a python file in the util/cpt_upgraders directory rather than adding a function to the upgrader script itself. The version tags are stored in the 'Globals' section rather than 'root' (as the version was previously) because 'Globals' gets unserialized first and can provide a warning before any other unserialization errors can occur.	2015-09-02 15:23:30 -05:00
Curtis Dunham	62e0344aef	sim: support checkpointing std::set<std::string>'s This is in support of tag-based checkpoint versioning; the version tags are stored in string sets. This commit adds such support.	2015-09-02 15:19:44 -05:00
Curtis Dunham	1ad5b77229	sim: make warning for absent optional parameters optional This is in support of tag-based checkpoint versioning. It should be possible to examine an optional parameter in a checkpoint during unserialization and not have it throw a warning.	2015-09-02 15:19:43 -05:00
Nilay Vaish	fe47f0a72f	ruby: remove random seed We no longer use the C library based random number generator: random(). Instead we use the C++ library provided rng. So setting the random seed for the RubySystem class has no effect. Hence the variable and the corresponding option are being dropped.	2015-09-01 15:50:33 -05:00
Nilay Vaish	5d555df359	ruby: directory memory: drop unused variable.	2015-09-01 15:50:32 -05:00
Andreas Sandberg	05852e698a	sim: Remove broken AutoSerialize support from the event queue Event auto-serialization no longer in use and has been broken ever since the introduction of PDES support almost two years ago. Additionally, serializing the individual event queues is undesirable since it exposes the thread structure of the simulator. What this means in practice is that the number of threads in the simulator must be the same when taking a checkpoint and when loading the checkpoint. This changeset removes support for the AutoSerialize event flag and the associated serialization code.	2015-09-01 15:28:45 +01:00
Andreas Sandberg	53001e6e09	dev: Remove auto-serialization dependency in EtherLink EtherLink currently uses a fire-and-forget link delay event that delays sending of packets by a fixed number of ticks. In order to serialize this event, it relies on the event queue's auto serialization support. However, support for event auto serialization has been broken for more than two years, which means that checkpoints of multi-system setups are likely to drop in-flight packets. This changeset the replaces rewrites this part of the EtherLink to use a packet queue instead. The queue contains a (tick, packet) tuple. The tick indicates when the packet will be ready. Instead of relying on event autoserialization, we now explicitly serialize the packet queue in the EhterLink::Link class. Note that this changeset changes the way in-flight packages are serialized. Old checkpoints will still load, but in-flight packets will be dropped (just as before). There has been no attempt to upgrade checkpoints since this would actually change the behavior of existing checkpoints.	2015-09-01 15:28:44 +01:00
Andreas Sandberg	0572dc3c6e	sim: Remove autoserialize support for exit events This changeset removes the support for the autoserialize parameter in GlobalSimLoopExitEvent (including exitSimLoop()) and LocalSimLoopExitEvent. Auto-serialization of the LocalSimLoopExitEvent was never used, so this is not expected to affect anything. However, it was sometimes used for GlobalSimLoopExitEvent. Unfortunately, serialization of global events has never been supported, so checkpoints with such events will currently cause simulation panics. The serialize parameter to exitSimLoop() has been left in-place to maintain API compatibility (removing it would affect m5ops). Instead of just dropping it, we now print a warning if the parameter is set and the exit event is scheduled in the future (i.e., not at the current tick).	2015-09-01 13:41:45 +01:00
Andreas Sandberg	1fa7a4394c	sim: Remove unused SerializeBuilder interface	2015-09-01 13:40:28 +01:00
Andreas Sandberg	4411c97ee1	sim: Replace fromInt/fromSimObject with decltype	2015-09-01 13:40:25 +01:00
Andreas Sandberg	db465fd788	sim: Move SimObject resolver to sim_object.hh The object resolver isn't serialization specific and shouldn't live in serialize.hh. Move it to sim_object.hh since it queries to the SimObject hierarchy.	2015-09-01 13:40:05 +01:00
Nilay Vaish	a60a93eb05	ruby: specify number of vnets for each protocol The default value for number of virtual networks is being removed. Each protocol should now specify the value it needs.	2015-08-30 12:24:18 -05:00
Nilay Vaish	bf8ae288fa	ruby: network: drop member m_in_use This member indicates whether or not a particular virtual network is in use. Instead of having a default big value for the number of virtual networks and then checking whether a virtual network is in use, the next patch removes the default value and the protocol configuration file would now specify the number of virtual networks it requires. Additionally, the patch also refactors some of the code used for computing the virtual channel next in the round robin order.	2015-08-30 12:24:18 -05:00
Nilay Vaish	7175db4a3f	ruby: garnet: mark few functions const in BaseGarnetNetwork.hh	2015-08-30 12:24:18 -05:00
Nilay Vaish	426e38af8b	ruby: slicc: avoid duplicate code for function argument check Both FuncCallExprAST and MethodCallExprAST had code for checking the arguments with which a function is being called. The patch does away with this duplication. Now the code for checking function call arguments resides in the Func class.	2015-08-30 10:52:58 -05:00
Nilay Vaish	4727fc26f8	ruby: eliminate type uint64 and int64 These types are being replaced with uint64_t and int64_t.	2015-08-29 10:19:23 -05:00
Andreas Sandberg	e9d6bf5e35	ruby: Use the const serialize interface in RubySystem The new serialization code (kudos to Tim Jones) moves all of the state mangling in RubySystem to memWriteback. This makes it possible to use the new const serialization interface. This changeset moves the cache recorder cleanup from the checkpoint() method to drainResume() to make checkpointing truly constant and updates the checkpointing code to use the new interface.	2015-08-28 10:58:44 +01:00
Nilay Vaish	fc3d34a488	ruby: handle llsc accesses through CacheEntry, not CacheMemory The sequencer takes care of llsc accesses by calling upon functions from the CacheMemory. This is unnecessary once the required CacheEntry object is available. Thus some of the calls to findTagInSet() are avoided.	2015-08-27 12:51:40 -05:00
Emilio Castillo	88b1fd82a6	cpu: quiesce pseudoinsts: Always do full quiesce The O3CPU blocks the Fetch when it sees a quiesce instruction (IsQuiesce flag). When the inst. is executed, a quiesce event is created to reactivate the context and unblock the Fetch. If the quiesceNs or quiesceCycles are called with a value of 0, the QuiesceEvent will not be created and the Fetch stage will remain blocked. Committed by Joel Hestness <jthestness@gmail.com>	2015-08-26 14:20:30 -05:00
Andreas Hansson	ce4f6a9020	mem: Revert requirement on packet addr/size always valid This patch reverts part of (842f56345a42), as apparently there are use-cases outside the main repository relying on the late setting of the physical address.	2015-08-24 05:03:45 -04:00
Andreas Hansson	daaae3744d	mem: Reflect that packet address and size are always valid This patch simplifies the packet, and removes the possibility of creating a packet without a valid address and/or size. Under no circumstances are these fields set at a later point, and thus they really have to be provided at construction time. The patch also fixes a case there the MinorCPU creates a packet without a valid address and size, only to later delete it.	2015-08-21 07:03:27 -04:00
Andreas Hansson	6eb434c8a2	arm, mem: Remove unused CLEAR_LL request flag Cleaning up dead code. The CLREX stores zero directly to MISCREG_LOCKFLAG and so the request flag is no longer needed. The corresponding functionality in the cache tags is also removed.	2015-08-21 07:03:25 -04:00
Andreas Hansson	bda79817c8	mem: Remove unused cache squash functionality Tidying up.	2015-08-21 07:03:24 -04:00
Andreas Hansson	ddfa96cf45	mem: Add explicit Cache subclass and make BaseCache abstract Open up for other subclasses to BaseCache and transition to using the explicit Cache subclass. --HG-- rename : src/mem/cache/BaseCache.py => src/mem/cache/Cache.py	2015-08-21 07:03:23 -04:00
Andreas Hansson	d71a0d790d	ruby: Move Rubys cache class from Cache.py to RubyCache.py This patch serves to avoid name clashes with the classic cache. For some reason having two 'SimObject' files with the same name creates problems. --HG-- rename : src/mem/ruby/structures/Cache.py => src/mem/ruby/structures/RubyCache.py	2015-08-21 07:03:21 -04:00
Andreas Hansson	1bf389a2bf	mem: Move cache_impl.hh to cache.cc There is no longer any need to keep the implementation in a header.	2015-08-21 07:03:20 -04:00
Andreas Hansson	ae06e9a5c6	cpu: Move invldPid constant from Request to BaseCPU A more natural home for this constant.	2015-08-21 07:03:14 -04:00
Nilay Vaish	2f44dada68	ruby: reverts to changeset: bf82f1f7b040	2015-08-19 10:02:01 -05:00
Nilay Vaish	2d9f3f8582	ruby: add accessor functions to SLICC def of MachineID	2015-08-14 19:28:44 -05:00
Nilay Vaish	62dcbe3d95	ruby: simple network: refactor code Drops an unused variable and marks three variables as const.	2015-08-14 19:28:44 -05:00
Nilay Vaish	d0cf41300b	ruby: profiler: provide the number of vnets through ruby system The aim is to ultimately do away with the static function Network::getNumberOfVirtualNetworks().	2015-08-14 19:28:44 -05:00
Nilay Vaish	e63c120d0d	ruby: directory memory: drop unused variable.	2015-08-14 19:28:44 -05:00
Nilay Vaish	8114c7ff2c	ruby: slicc: remove a stray line in StateMachine.py	2015-08-14 19:28:44 -05:00
Nilay Vaish	85506f1c21	ruby: garnet: flexible: refactor flit	2015-08-14 19:28:44 -05:00
Nilay Vaish	ae87d68551	ruby: DataBlock: adds a comment	2015-08-14 19:28:44 -05:00
Nilay Vaish	d660b3145b	ruby: remove random seed We no longer use the C library based random number generator: random(). Instead we use the C++ library provided rng. So setting the random seed for the RubySystem class has no effect. Hence the variable and the corresponding option are being dropped.	2015-08-14 19:28:44 -05:00
Nilay Vaish	ca368765a1	ruby: SubBlock: refactor code	2015-08-14 19:28:44 -05:00
Nilay Vaish	514f18cdda	ruby: cache recorder: move check on block size to RubySystem.	2015-08-14 19:28:44 -05:00
Nilay Vaish	3a726752c1	ruby: abstract controller: mark some variables as const	2015-08-14 19:28:44 -05:00
Nilay Vaish	3230a0b89f	ruby: simple network: store Switch* in PerfectSwitch and Throttle	2015-08-14 19:28:44 -05:00
Nilay Vaish	cb133b5f2c	ruby: remove unused functionalRead() function.	2015-08-14 19:28:44 -05:00
Nilay Vaish	5f1d1ce5d4	ruby: perfect switch: refactor code Refactored the code in operateVnet(), moved partly to a new function operateMessageBuffer().	2015-08-14 19:28:44 -05:00
Nilay Vaish	a706b6259a	ruby: cache memory: drop {try,test}CacheAccess functions	2015-08-14 19:28:43 -05:00
Nilay Vaish	5060e572ca	ruby: call setMRU from L1 controllers, not from sequencer Currently the sequencer calls the function setMRU that updates the replacement policy structures with the first level caches. While functionally this is correct, the problem is that this requires calling findTagInSet() which is an expensive function. This patch removes the calls to setMRU from the sequencer. All controllers should now update the replacement policy on their own. The set and the way index for a given cache entry can be found within the AbstractCacheEntry structure. Use these indicies to update the replacement policy structures.	2015-08-14 19:28:43 -05:00
Nilay Vaish	b815221718	ruby: adds set and way indices to AbstractCacheEntry	2015-08-14 19:28:43 -05:00
Nilay Vaish	a6f3f38f2c	ruby: eliminate type uint64 and int64 These types are being replaced with uint64_t and int64_t.	2015-08-14 19:28:43 -05:00
Nilay Vaish	9648c05db1	ruby: slicc: use default argument value Before this patch, while one could declare / define a function with default argument values, but the actual function call would require one to specify all the arguments. This patch changes the check for function arguments. Now a function call needs to specify arguments that are at least as much as those with default values and at most the total number of arguments taken as input by the function.	2015-08-14 19:28:43 -05:00
Nilay Vaish	7fc725fdb5	ruby: slicc: avoid duplicate code for function argument check Both FuncCallExprAST and MethodCallExprAST had code for checking the arguments with which a function is being called. The patch does away with this duplication. Now the code for checking function call arguments resides in the Func class.	2015-08-14 19:28:43 -05:00
Nilay Vaish	f391cee5e1	ruby: drop the [] notation for lookup function. This is in preparation for adding a second arugment to the lookup function for the CacheMemory class. The change to .sm files was made using the following sed command: sed -i 's/\[\([0-9A-Za-z._()]\)\]/.lookup(\1)/' src/mem/protocol/*.sm	2015-08-14 19:28:43 -05:00
Nilay Vaish	1a3e8a3370	ruby: handle llsc accesses through CacheEntry, not CacheMemory The sequencer takes care of llsc accesses by calling upon functions from the CacheMemory. This is unnecessary once the required CacheEntry object is available. Thus some of the calls to findTagInSet() are avoided.	2015-08-14 19:28:42 -05:00
Nilay Vaish	91a84c5b3c	ruby: replace Address by Addr This patch eliminates the type Address defined by the ruby memory system. This memory system would now use the type Addr that is in use by the rest of the system.	2015-08-14 12:04:51 -05:00
Nilay Vaish	9ea5d9cad9	ruby: rename variables Addr to addr Avoid clash between type Addr and variable name Addr.	2015-08-14 12:04:47 -05:00
Joel Hestness	905c0b347c	ruby: Protocol changes for SimObject MessageBuffers	2015-08-14 00:19:45 -05:00
Joel Hestness	581bae9ecb	ruby: Expose MessageBuffers as SimObjects Expose MessageBuffers from SLICC controllers as SimObjects that can be manipulated in Python. This patch has numerous benefits: 1) First and foremost, it exposes MessageBuffers as SimObjects that can be manipulated in Python code. This allows parameters to be set and checked in Python code to avoid obfuscating parameters within protocol files. Further, now as SimObjects, MessageBuffer parameters are printed to config output files as a way to track parameters across simulations (e.g. buffer sizes) 2) Cleans up special-case code for responseFromMemory buffers, and aligns their instantiation and use with mandatoryQueue buffers. These two special buffers are the only MessageBuffers that are exposed to components outside of SLICC controllers, and they're both slave ends of these buffers. They should be exposed outside of SLICC in the same way, and this patch does it. 3) Distinguishes buffer-specific parameters from buffer-to-network parameters. Specifically, buffer size, randomization, ordering, recycle latency, and ports are all specific to a MessageBuffer, while the virtual network ID and type are intrinsics of how the buffer is connected to network ports. The former are specified in the Python object, while the latter are specified in the controller *.sm files. Unlike buffer-specific parameters, which may need to change depending on the simulated system structure, buffer-to-network parameters can be specified statically for most or all different simulated systems.	2015-08-14 00:19:44 -05:00
Joel Hestness	bf06911b3f	ruby: Change PerfectCacheMemory::lookup to return pointer CacheMemory and DirectoryMemory lookup functions return pointers to entries stored in the memory. Bring PerfectCacheMemory in line with this convention, and clean up SLICC code generation that was in place solely to handle references like that which was returned by PerfectCacheMemory::lookup.	2015-08-14 00:19:39 -05:00
Joel Hestness	9567c839fe	ruby: Remove the RubyCache/CacheMemory latency The RubyCache (CacheMemory) latency parameter is only used for top-level caches instantiated for Ruby coherence protocols. However, the top-level cache hit latency is assessed by the Sequencer as accesses flow through to the cache hierarchy. Further, protocol state machines should be enforcing these cache hit latencies, but RubyCaches do not expose their latency to any existng state machines through the SLICC/C++ interface. Thus, the RubyCache latency parameter is superfluous for all caches. This is confusing for users. As a step toward pushing L0/L1 cache hit latency into the top-level cache controllers, move their latencies out of the RubyCache declarations and over to their Sequencers. Eventually, these Sequencer parameters should be exposed as parameters to the top-level cache controllers, which should assess the latency. NOTE: Assessing these latencies in the cache controllers will require modifying each to eliminate instantaneous Ruby hit callbacks in transitions that finish accesses, which is likely a large undertaking.	2015-08-14 00:19:37 -05:00
Nilay Vaish	c58bee829f	sim: clocked object: function for converting cycles to ticks.	2015-08-11 11:39:23 -05:00
Nilay Vaish	759fe30d9f	ruby: drop some redundant includes	2015-08-11 11:39:23 -05:00
Nilay Vaish	380a2ca918	ruby: slicc: allow mathematical operations on Ticks	2015-08-11 11:39:23 -05:00
Andreas Sandberg	35d8e5b52b	sim: Flag EventQueue::getCurTick() as const	2015-08-07 17:43:21 +01:00
Andreas Sandberg	bbb3abc167	mem: Cleanup packet accessor methods The Packet::get() and Packet::set() methods both have very strange semantics. Currently, they automatically convert between the guest system's endianness and the host system's endianness. This behavior is usually undesired and unexpected. This patch introduces three new method pairs to access data: * getLE() / setLE() - Get data stored as little endian. * getBE() / setBE() - Get data stored as big endian. * get(ByteOrder) / set(v, ByteOrder) - Configurable endianness For example, a little endian device that is receiving a write request will use teh getLE() method to get the data from the packet. The old interface will be deprecated once all existing devices have been ported to the new interface.	2015-08-07 09:59:28 +01:00
Andreas Sandberg	ce8939a97e	dev: Implement a simple display timing generator Timing generator for a pixel-based display. The timing generator is intended for display processors driving a standard rasterized display. The simplest possible display processor needs to derive from this class and override the nextPixel() method to feed the display with pixel data. Pixels are ordered relative to the top left corner of the display. Scan lines appear in the following order: * Vertical Sync (starting at line 0) * Vertical back porch * Visible lines * Vertical front porch Pixel order within a scan line: * Horizontal Sync * Horizontal Back Porch * Visible pixels * Horizontal Front Porch All events in the timing generator are automatically suspended on a drain() request and restarted on drainResume(). This is conceptually equivalent to clock gating when the pixel clock while the system is draining. By gating the pixel clock, we prevent display controllers from disturbing a memory system that is about to drain.	2015-08-07 09:59:26 +01:00
Andreas Sandberg	598edaae05	arm: Add support for programmable oscillators Add support for oscillators that can be programmed using the RealView / Versatile Express configuration interface. These oscillators are typically used for things like the pixel clock in the display controller. The default configurations support the oscillators from a Versatile Express motherboard (V2M-P1) with a CoreTile Express A15x2.	2015-08-07 09:59:25 +01:00
Andreas Sandberg	cd098a7e84	dev: Add a simple DMA engine that can be used by devices Add a simple DMA engine that sits behind a FIFO. This engine can be used by devices that need to read large amounts of data (e.g., display controllers). Most aspects of the controller, such as FIFO size, maximum number of in-flight accesses, and maximum request sizes can be configured. The DMA copies blocks of data into its FIFO. Transfers are initiated with a call to startFill() command that takes a start address and a size. Advanced users can create a derived class that overrides the onEndOfBlock() callback that is triggered when the last request to a block has been issued. At this point, the DMA engine is ready to start fetching a new block of data, potentially from a different address range. The DMA engine stops issuing new requests while it is draining. Care must be taken to ensure that devices that are fed by a DMA engine are suspended while the system is draining to avoid buffer underruns.	2015-08-07 09:59:23 +01:00
Andreas Sandberg	f7ff27afe8	sim: Split ClockedObject to make it usable to non-SimObjects Split ClockedObject into two classes: Clocked that provides the basic clock functionality, and ClockedObject that inherits from Clocked and SimObject to provide the functionality of the old ClockedObject.	2015-08-07 09:59:22 +01:00
Andreas Sandberg	9b2426ecfc	base: Rewrite the CircleBuf to fix bugs and add serialization The CircleBuf class has at least one bug causing it to overwrite the wrong elements when wrapping. The current code has a lot of unused functionality and duplicated code. This changeset replaces the old implementation with a new version that supports serialization and arbitrary types in the buffer (not just char).	2015-08-07 09:59:19 +01:00
Andreas Sandberg	39d8034475	dev, x86: Fix serialization bug in the i8042 device The i8042 device drops the contents of a PS2 device's buffer when serializing, which results in corrupted PS2 state when continuing simulation after a checkpoint. This changeset fixes this bug and transitions the i8042 model to use the new serialization API that requires the serialize() method to be const.	2015-08-07 09:59:15 +01:00
Andreas Sandberg	af6b51925c	dev: Make serialization in Sinic constant This changeset transitions the Sinic device to the new serialization framework that requires the serialization method to be constant.	2015-08-07 09:59:14 +01:00
Andreas Sandberg	53e777d683	base: Declare a type for context IDs Context IDs used to be declared as ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs.	2015-08-07 09:59:13 +01:00
Andreas Sandberg	3e26756f1d	base: Use constexpr in Cycles Declare the constructor and all of the operators that don't change the state of a Cycles instance as constexpr. This makes it possible to use Cycles as a static constant and allows the compiler to evaulate simple expressions at compile time. An unfortunate side-effect of this is that we cannot use assertions since C++11 doesn't support them in constexpr functions. As a workaround, we throw an invalid_argument exception when the assert would have triggered. A nice side-effect of this is that the compiler will evaluate the "assertion" at compile time when an expression involving Cycles can be statically evaluated.	2015-08-07 09:59:12 +01:00
Andreas Hansson	83a668ad25	mem: Remove extraneous acquire/release flags and attributes This patch removes the extraneous flags and attributes from the request and packet, and simply leaves the new commands. The change introduced when adding acquire/release breaks all compatibility with existing traces, and there is really no need for any new flags and attributes. The commands should be sufficient. This patch fixes packet tracing (urgent), and also removes the unnecessary complexity.	2015-08-07 04:55:38 -04:00
Andreas Sandberg	07815a3338	sim: Fixup comments and constness in draining infrastructure Fix comments that got outdated by the draining rewrite. Also fixup constness for methods in the querying drain state in the DrainManager.	2015-08-05 10:27:11 +01:00
Andreas Sandberg	0194e6eb2d	mem: Fixup incorrect include guards --HG-- extra : rebase_source : 9dba84eaf9c734a114ecd0940e1d505303644064	2015-08-05 10:12:12 +01:00
Andreas Sandberg	7c904d9d3f	sim: Initialize Drainable::_drainState to the system's state It is sometimes desirable to be able to instantiate Drainable objects when the simulator isn't in the Running state. Currently, we always initialize Drainable objects to the Running state. However, this confuses many of the sanity checks in the base class since objects aren't expected to be in the Running state if the system is in the Draining or Drained state. Instead of always initializing the state variable in Drainable to DrainState::Running, initialize it to the state the DrainManager is in. Note: This means an object can be created in the Draining/Drained state without first calling drain().	2015-08-04 10:31:37 +01:00
Andreas Sandberg	a3f49f60c7	mem: Move trace functionality from the CommMonitor to a probe This changeset moves the access trace functionality from the CommMonitor into a separate probe. The probe can be hooked up to any component that exports probe points of the type ProbePoints::Packet. This patch moves the dependency on Google's Protocol Buffers library from the CommMonitor to the MemTraceProbe, which means that the CommMonitor (including stack distance profiling) no long depends on it.	2015-08-04 10:29:13 +01:00
Andreas Sandberg	022e69e6de	mem: Redesign the stack distance calculator as a probe This changeset removes the stack distance calculator hooks from the CommMonitor class and implements a stack distance calculator as a memory system probe instead. The probe can be hooked up to any component that exports probe points of the type ProbePoints::Packet.	2015-08-04 10:29:13 +01:00
Andreas Sandberg	feded87fc9	mem: Add probe support to the CommMonitor This changeset adds a standardized probe point type to monitor packets in the memory system and adds two probe points to the CommMonitor class. These probe points enable monitoring of successfully delivered requests and successfully delivered responses. Memory system probe listeners should use the BaseMemProbe base class to provide a unified configuration interface and reuse listener registration code. Unlike the ProbeListenerObject class, the BaseMemProbe allows objects to be wired to multiple ProbeManager instances as long as they use the same probe point name.	2015-08-04 10:29:13 +01:00
Timothy Jones	c375870abd	sim: function for testing for auto deletion Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-08-03 23:08:40 -05:00
Timothy Jones	96091f358b	uby: Fix checkpointing and restore There are 2 problems with the existing checkpoint and restore code in ruby. The first is that when the event queue is altered by ruby during serialization, some events that are currently scheduled cannot be found (e.g. the event to stop simulation that always lives on the queue), causing a panic. The second is that ruby is sometimes serialized after the memory system, meaning that the dirty data in its cache is flushed back to memory too late and so isn't included in the checkpoint. These are fixed by implementing memory writeback in ruby, using the same technique of hijacking the event queue, but first descheduling all events that are currently on it. They are saved, along with their scheduled time, so that the event queue can be faithfully reconstructed after writeback has finished. Events with the AutoDelete flag set will delete themselves when they are descheduled, causing an error when attempting to schedule them again. This is fixed by simply not recording them when taking them off the queue. Writeback is still implemented using flushing, so the cache recorder object, that is created to generate the trace and manage flushing, is kept around and used during serialization to write the trace to disk. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-08-03 23:08:40 -05:00
Nilay Vaish	676ae57827	ruby: mesi three level: multiple corrections to the protocol 1. Eliminate state NP in L0 and L1 Caches: The two states 'NP' and 'I' both mean that the cache block is not present in the cache. 'I' also means that the cache entry has been allocated. This causes problems when we do not correctly initialize the cache entry when it is re-used. Hence, this patch eliminates the state NP altogether. Everytime a new block comes into the cache, a cache entry is allocated. Everytime a block leaves, the corresponding entry is deallocated. 2. Separate transient state for instruction fetches: purely for accouting purposes. 3. Drop state IS_I in L1 Cache and the message type STALE_DATA: when invalidation is received for a block in IS, the block used to be moved to IS_I. This meant that the data that would arrive in future would be used but not stored since the controller lost the permissions after gaining them. This state is being dropped and now invalidation messages would not processed till the data has arrived. This also means that STALE_DATA type is not longer required.	2015-08-03 22:44:29 -05:00
Nilay Vaish	9bf3b8828a	ruby: mesi two,three level: copy data only when dirty The level 2 controller has a bug. In one particular action, the data block was copied from a message irrespective whether the block is dirty or not. In cases when L1 sends no data, the data value copied was incorrect.	2015-08-03 22:44:28 -05:00
Brad Beckmann	03f2b8c23d	ruby: removed invalid assert in message comparitor It is perfectly valid to compare the same message and the greater than operator should work correctly.	2015-08-01 12:59:47 -04:00
Brad Beckmann	6b52f828cc	ruby: improved stall and wait debugging Added dprintfs and asserts for identifying stall and wait bugs.	2015-07-20 09:15:18 -05:00
Brad Beckmann	848861a17d	slicc: fix error in conflicing symbol declaration	2015-07-20 09:15:18 -05:00
Brad Beckmann	8a54adc2a5	slicc: enable overloading in functions not in classes For many years the slicc symbol table has supported overloaded functions in external classes. This patch extends that support to functions that are not part of classes (a.k.a. no parent). For example, this support allows slicc to understand that mapAddressToRange is overloaded and the NodeID is an optional parameter.	2015-07-20 09:15:18 -05:00
David Hashe	0d00cbc97b	ruby: change router pipeline stages to 2 This patch changes the router pipeline stages from 4 to 2. The canonical 4-stage router is conservative while a lower-latency router with look ahead routing and speculative allocation is well acknowledged.	2015-07-20 09:15:18 -05:00
David Hashe	8b32dad4d8	ruby: change advance_stage for flit_d Sets m_stage.second to the second parameter of the function. Then, for every place where advance_stage is called, adds a cycle to the argument being passed.	2015-07-20 09:15:18 -05:00
Brad Beckmann	f9fa242f42	slicc: improved stalling support in protocols Adds features to allow protocols to reschedule controllers when conditionally stalling within inport logic or actions. Also insures that resource and protocol stalls are re-evaluated the next cycle.	2015-07-20 09:15:18 -05:00
David Hashe	c4ffd4989c	ruby: expose access permission to replacement policies This patch adds support that allows the replacement policy to identify each cache block's access permission. This information can be useful when making replacement decisions.	2015-07-20 09:15:18 -05:00
David Hashe	967cfa939a	ruby: adds size and empty apis to the msg buffer stallmap	2015-07-20 09:15:18 -05:00
David Hashe	21aa5734a0	ruby: fix deadlock bug in banked array resource checks The Ruby banked array resource checks (initiated from SLICC) did a check and allocate at the same time. If a transition needs more than one resource, then it might check/allocate resource #1, then fail to get resource #2. Another transition might then try to get the same resources, but in reverse order. Deadlock. This patch separates resource checking and resource reservation into two steps to avoid deadlock.	2015-07-20 09:15:18 -05:00
David Hashe	63a9f10de8	ruby: Fix for stallAndWait bug It was previously possible for a stalled message to be reordered after an incomming message. This patch ensures that any stalled message stays in its original request order.	2015-07-20 09:15:18 -05:00
David Hashe	6511ab4654	mem: add request types for acquire and release Add support for acquire and release requests. These synchronization operations are commonly supported by several modern instruction sets.	2015-07-20 09:15:18 -05:00
David Hashe	7e9562013b	ruby: allocate a block in CacheMemory without updating LRU state	2015-07-20 09:15:18 -05:00
David Hashe	7e00772bda	ruby: speed up function used for cache walks This patch adds a few helpful functions that allow .sm files to directly invalidate all cache blocks using a trigger queue rather than rely on each individual cache block to be invalidated via requests from the mandatory queue.	2015-07-20 09:15:18 -05:00
David Hashe	3454a4a36e	slicc: support for arbitrary DPRINTF flags (not just RubySlicc) This patch allows DPRINTFs to be used in SLICC state machines similar to how they are used by the rest of gem5. Previously all DPRINTFs in the .sm files had to use the RubySlicc flag.	2015-07-20 09:15:18 -05:00
David Hashe	9324239922	slicc: support for local variable declarations in action blocks	2015-07-20 09:15:18 -05:00
David Hashe	1850ed410f	ruby: initialize replacement policies with their own simobjs this is in preparation for other replacement policies that take additional parameters.	2015-07-20 09:15:18 -05:00
David Hashe	74ca89f8b7	ruby: give access to cache tag/data latencies from SLICC This patch exposes the tag and data array latencies to the SLICC state machines so that it can be used to determine the correct enqueue latency for response messages.	2015-07-20 09:15:18 -05:00
David Hashe	536e3664e4	slicc: support for multiple cache entry types in the same state machine To have multiple Entry types (e.g., a cache Entry type and a directory Entry type), just declare one of them as a secondary type by using the pair 'main="false"', e.g.: structure(DirEntry, desc="...", interface="AbstractCacheEntry", main="false") { ...and the primary type would be declared: structure(Entry, desc="...", interface="AbstractCacheEntry") {	2015-07-20 09:15:18 -05:00
David Hashe	910638f338	slicc: Fix bug in enqueue and peek statements. These were not generating the correct c names for types declared within a machine scope.	2015-07-20 09:15:18 -05:00
David Hashe	3d8c8a85fa	slicc: fix missing inline function in LocalVariableAST	2015-07-20 09:15:18 -05:00
David Hashe	93fff6f636	slicc: improve support for prefix operations This patch fixes the type handling when prefix operations are used. Previously prefix operators would assume a void return type, which made it impossible to combine prefix operations with other expressions. This patch allows SLICC programmers to use prefix operations more naturally.	2015-07-20 09:15:18 -05:00
David Hashe	ee0d414fa8	slicc: support for transitions with a wildcard next state This patches adds support for transitions of the form: transition(START, EVENTS, ) { ACTIONS } This allows a machine to collapse states that differ only in the next state transition to collapse into one, and can help shorten/simplfy some protocols significantly. When is encountered as an end state of a transition, the next state is determined by calling the machine-specific getNextState function. The next state is determined before any actions of the transition execute, and therefore the next state calculation cannot depend on any of the transition actions.	2015-07-20 09:15:18 -05:00
David Hashe	6a288d9de3	slicc: support for multiple message types on the same buffer This patch allows SLICC protocols to use more than one message type with a message buffer. For example, you can declare two in ports as such: in_port(ResponseQueue_in, ResponseMsg, responseFromDir, rank=3) { ... } in_port(tgtResponseQueue_in, TgtResponseMsg, responseFromDir, rank=2) { ... }	2015-07-20 09:15:18 -05:00
Brad Beckmann	b609b032aa	slicc: fatal->panic on invalid transitions	2015-08-01 12:37:52 -04:00
David Hashe	3444d5f359	mem: Hit callback delay fix This patch was created by Bihn Pham during his internship at AMD. There is no need to delay hit callback response messages by a cycle because the response latency is already incurred in the Ruby protocol. This ensures correct timing of memory instructions.	2015-07-20 09:15:18 -05:00
David Hashe	8f71e667b3	cpu: Fixed a bug on where to fetch the next instruction from Figure out if the next instruction to fetch comes from the micro-op ROM or not. Otherwise, wrong instructions may be fetched.	2015-07-20 09:15:18 -05:00
David Hashe	a2d9aae3c3	x86: x86 instruction-implementation bug fixes Added explicit data sizes and an opcode type for correct execution.	2015-07-20 09:15:18 -05:00
Brad Beckmann	0c78abb302	ruby: re-added the addressToInt slicc interface function This helper function is very useful converting address offsets to integers that can be used for protocol specific destination mapping.	2015-07-20 09:15:18 -05:00
David Hashe	d0f6aad3c6	syscall: Add readlink to x86 with special case /proc/self/exe This patch implements the correct behavior.	2015-07-20 09:15:18 -05:00
Brad Beckmann	4710eba588	ruby: add useful dprints to sequencer Added two data block dprints that are useful when tracking down data check failures in the ruby random tester.	2015-07-20 09:15:18 -05:00
David Hashe	a254786a19	slicc: isinstance bugfix This fix prevents spurious errors when searching for a symbol that may be located in one of multiple symbol tables.	2015-07-20 09:15:18 -05:00
Andreas Sandberg	f789d729b5	cpu: Update debug message from Fetch1 isDrained() in Minor Fix a spurious %s and include the state of the Fetch1 stage in the debug printout.	2015-07-31 17:04:59 +01:00
Andreas Sandberg	f73b05431a	cpu: Fix Minor drain issues when switched out The Minor CPU currently doesn't drain properly when it is switched out. This happens because Fetch 1 expects to be in the FetchHalted state when it is drained. However, because the CPU is switched out, it is stuck in the FetchWaitingForPC state. Fix this by ignoring drain requests and returning DrainState::Drained from MinorCPU::drain() if the CPU is switched out. This is always safe since a switched out CPU, by definition, doesn't have any instructions in flight.	2015-07-31 17:04:59 +01:00
Andreas Sandberg	ff8195235e	cpu: Only activate thread 0 in Minor if the CPU is active Minor currently activates thread 0 in startup() to work around an issue where activateContext() is called from LiveProcess before the process entry point is known. When activateContext() is called, Minor creates a branch instruction to the process's entry point. The first time it is called, the branch points to an undefined location (0). The call in startup() updates the branch to point to the actual entry point. When instantiating a switched out Minor CPU, it still tries to activate thread 0. This is clearly incorrect since a switched out CPU can't have any active threads. This changeset adds a check to ensure that the thread is active before reactivating it.	2015-07-30 10:15:50 +01:00
Andreas Sandberg	473a0dcc63	cpu: Fix drain issues in the Minor CPU The drain refactor patches introduced a couple of bugs in the way Minor handles draining. This patch fixes an incorrect assert and a case of infinite recursion when the CPU signals drain done.	2015-07-30 10:15:50 +01:00
Andreas Hansson	6fac40ceb0	mem: Add missing clean eviction on uncacheable access This patch adds a missing clean eviction, occuring when an uncacheable access flushes and invalidates an existing block.	2015-07-30 03:42:25 -04:00
Andreas Hansson	540e59fd70	mem: Remove unused RequestCause in cache This patch removes the RequestCause, and also simplifies how we schedule the sending of packets through the memory-side port. The deassertion of bus requests is removed as it is not used.	2015-07-30 03:41:43 -04:00
David Guillen-Fandos	0c89c15b23	mem: Make caches way aware This patch makes cache sets aware of the way number. This enables some nice features such as the ablity to restrict way allocation. The implemented mechanism allows to set a maximum way number to be allocated 'k' which must fulfill 0 < k <= N (where N is the number of ways). In the future more sophisticated mechasims can be implemented.	2015-07-30 03:41:42 -04:00
Andreas Hansson	5a18e181ff	mem: Transition away from isSupplyExclusive for writebacks This patch changes how writebacks communicate whether the line is passed as modified or owned. Previously we relied on the isSupplyExclusive mechanism, which was originally designed to avoid unecessary snoops. For normal cache requests we use the sharedAsserted mechanism to determine if a block should be marked writeable or not, and with this patch we transition the writebacks to also use this mechanism. Conceptually this is cleaner and more consistent.	2015-07-30 03:41:40 -04:00
Andreas Hansson	5902e29e84	mem: Tidy up CacheBlk class This patch modernises and tidies up the CacheBlk, removing dead code.	2015-07-30 03:41:39 -04:00
Andreas Hansson	41b39b22cd	mem: Tidy up packet Some minor fixes and removal of dead code. Changing the flags to be enums rather than static const (to avoid any linking issues caused by the latter). Also adding a getBlockAddr member which hopefully can slowly finds its way into caches, snoop filters etc.	2015-07-30 03:41:38 -04:00
Andreas Hansson	85a44e24dc	cpu: Fix issue identified by UBSan	2015-07-30 03:41:22 -04:00
Nilay Vaish	aafa5c3f86	revert 5af8f40d8f2c	2015-07-28 01:58:04 -05:00
Nilay Vaish	608641e23c	cpu: implements vector registers This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.	2015-07-26 10:21:20 -05:00
Nilay Vaish	6e354e82d9	cpu: o3: slight correction to identation in rename_impl.hh	2015-07-26 10:20:07 -05:00
Brandon Potter	4f7c969e27	style: change Process function calls to use camelCase The Process class methods were using an improper style and this subsequently bled into the system call code. The following regular expressions should be helpful if someone transitions private system call patches on top of these changesets: s/alloc_fd/allocFD/ s/sim_fd(/simFD(/ s/sim_fd_obj/getFDEntry/ s/fix_file_offsets/fixFileOffsets/ s/find_file_offsets/findFileOffsets/	2015-07-24 12:25:23 -07:00
Brandon Potter	d5a7f09eb1	syscall_emul: standardized file descriptor name and add return checks. The patch clarifies whether file descriptors are host file descriptors or target file descriptors in the system call code. (Host file descriptors are file descriptors which have been allocated through real system calls where target file descriptors are allocated from an array in the Process class.)	2015-07-24 12:25:23 -07:00
Brandon Potter	b90711ea53	base: refactor process class (specifically FdMap and friends) This patch extends the previous patch's alterations around fd_map. It cleans up some of the uglier code in the process file and replaces it with a more concise C++11 version. As part of the changes, the FdMap class is pulled out of the Process class and receives its own file.	2015-07-24 12:25:22 -07:00
Brandon Potter	ef08046af4	syscall_emul: file descriptor interface changes This patch gets rid of unused Process::dup_fd method and does minor refactoring in the process class files. The file descriptor max has been changed to be the number of file descriptors since this clarifies the loop boundary condition and cleans up the code a bit. The fd_map field has been altered to be dynamically allocated as opposed to being an array; the intention here is to build on this is subsequent patches to allow processes to share their file descriptors with the clone system call.	2015-07-24 12:25:22 -07:00
Brandon Potter	582793468d	ruby: dma sequencer: removes redundant code	2015-07-24 12:25:22 -07:00
Nilay Vaish	1b71a20391	ruby: network: NetworkLink inherits from Consumer now.	2015-07-22 11:20:07 -05:00
Nilay Vaish	0ef3dcc27b	x86: decode instructions with vex prefix This patch updates the x86 decoder so that it can decode instructions with vex prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3. Note that none of the instructions have been implemented yet. The implementations would be provided in due course of time.	2015-07-17 11:31:22 -05:00
Gabor Dozsa	fc5bf6713f	dev: add support for multi gem5 runs Multi gem5 is an extension to gem5 to enable parallel simulation of a distributed system (e.g. simulation of a pool of machines connected by Ethernet links). A multi gem5 run consists of seperate gem5 processes running in parallel (potentially on different hosts/slots on a cluster). Each gem5 process executes the simulation of a component of the simulated distributed system (e.g. a multi-core board with an Ethernet NIC). The patch implements the "distributed" Ethernet link device (dev/src/multi_etherlink.[hh.cc]). This device will send/receive (simulated) Ethernet packets to/from peer gem5 processes. The interface to talk to the peer gem5 processes is defined in dev/src/multi_iface.hh and in tcp_iface.hh. There is also a central message server process (util/multi/tcp_server.[hh,cc]) which acts like an Ethernet switch and transfers messages among the gem5 peers. A multi gem5 simulations can be kicked off by the util/multi/gem5-multi.sh wrapper script. Checkpoints are supported by multi-gem5. The checkpoint must be initiated by a single gem5 process. E.g., the gem5 process with rank 0 can take a checkpoint from the bootscript just before it invokes 'mpirun' to launch an MPI test. The message server process will notify all the other peer gem5 processes and make them take a checkpoint, too (after completing a global synchronisation to ensure that there are no inflight messages among gem5).	2015-07-15 19:53:50 -05:00
Andreas Hansson	5410660919	mem: Fix (ab)use of emplace to avoid temporary object creation	2015-07-13 08:46:28 -04:00
Andreas Hansson	d870c399d3	mem: Updated DRAMSim2 wrapper to new drain API Somehow this one slipped through without being updated.	2015-07-13 08:46:16 -04:00
Brandon Potter	bfe7ee96ad	ruby: replace global g_abs_controls with per-RubySystem var This is another step in the process of removing global variables from Ruby to enable multiple RubySystem instances in a single simulation. The list of abstract controllers is per-RubySystem and should be represented that way, rather than as a global. Since this is the last remaining Ruby global variable, the src/mem/ruby/Common/Global.* files are also removed.	2015-07-10 16:05:24 -05:00
Brandon Potter	f9a370f172	ruby: replace global g_system_ptr with per-object pointers This is another step in the process of removing global variables from Ruby to enable multiple RubySystem instances in a single simulation. With possibly multiple RubySystem objects, we can no longer use a global variable to find "the" RubySystem object. Instead, each Ruby component has to carry a pointer to the RubySystem object to which it belongs.	2015-07-10 16:05:23 -05:00
Brandon Potter	c38f5098b1	ruby: replace g_ruby_start with per-RubySystem m_start_cycle This patch begins the process of removing global variables from the Ruby source with the goal of eventually allowing users to create multiple Ruby instances in a single simulation. Currently, users cannot do so because several global variables and static members are referenced by the RubySystem object in a way that assumes that there will only ever be a single RubySystem. These need to be replaced with per-RubySystem equivalents. This specific patch replaces the global var g_ruby_start, which is used to calculate throughput statistics for Throttles in simple networks and links in Garnet networks, with a RubySystem instance var m_start_cycle.	2015-07-10 16:05:23 -05:00
Brandon Potter	9eda4bdc5a	ruby: remove extra whitespace and correct misspelled words	2015-07-10 16:05:23 -05:00
Andreas Sandberg	a74c446e7d	dev, arm: Add a device model that uses the NoMali model Add a simple device shim that interfaces with the NoMali model library. The gem5 side of the interface supports Mali T60x/T62x/T760 GPUs. This device model pretends to be a Mali GPU, but doesn't render anything and executes in zero time.	2015-07-07 10:03:14 +01:00
Andreas Sandberg	ed38e3432c	sim: Refactor and simplify the drain API The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects.	2015-07-07 09:51:05 +01:00
Andreas Sandberg	f16c0a4a90	sim: Decouple draining from the SimObject hierarchy Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this. This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object. While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining. A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains.	2015-07-07 09:51:05 +01:00
Andreas Sandberg	d5f5fbb855	sim: Move mem(Writeback\|Invalidate) to SimObject The memWriteback() and memInvalidate() calls used to live in the Serializable interface. In this series of patches, the Serializable interface will be redesigned to make serialization independent of the object graph and always work on the entire simulator. This means that the Serialization interface won't be useful to perform maintenance of the caches in a sub-graph of the entire SimObject graph. This changeset moves these memory maintenance methods to the SimObject interface instead.	2015-07-07 09:51:04 +01:00
Andreas Sandberg	e9c3d59aae	sim: Make the drain state a global typed enum The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario.	2015-07-07 09:51:04 +01:00
Andreas Sandberg	1dc5e63b88	python: Remove redundant drain when changing memory modes When the Python helper code switches CPU models, it sometimes also needs to change the memory mode of the simulator. When this happens, it accidentally tried to drain the simulator despite having done so already. This changeset removes the redundant drain.	2015-07-07 09:51:04 +01:00
Andreas Sandberg	7773cb9565	sim: Add macros to serialize objects into a section Add the SERIALIZE_OBJ / UNSERIALIZE_OBJ macros that serialize an object into a subsection of the current checkpoint section.	2015-07-07 09:51:04 +01:00
Andreas Sandberg	b3ecfa6ae0	base: Add serialization support to Pixels and FrameBuffer Serialize pixels as unsigned 32 bit integers by adding the required to_number() and stream operators. This is used by the FrameBuffer, which now implements the Serializable interface. Users of frame buffers are expected to serialize it into its own section by calling serializeSection().	2015-07-07 09:51:04 +01:00
Andreas Sandberg	888ec455cb	sim: Fix broken event unserialization Events expected to be unserialized using an event-specific unserializeEvent call. This call was never actually used, which meant the events relying on it never got unserialized (or scheduled after unserialization). Instead of relying on a custom call, we now use the normal serialization code again. In order to schedule the event correctly, the parrent object is expected to use the EventQueue::checkpointReschedule() call. This happens automatically for events that are serialized using the AutoSerialize mechanism.	2015-07-07 09:51:04 +01:00
Andreas Sandberg	76cd4393c0	sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code.	2015-07-07 09:51:03 +01:00
Andreas Sandberg	7cd5db8c6d	sim: Add serialization macros for std containers	2015-07-07 09:51:03 +01:00
Andreas Sandberg	777cc71c4a	mem: Cleanup CommMonitor in preparation for probe support Make configuration parameters constant and get rid of an unnecessary dependency on the Time class.	2015-07-06 17:08:53 +01:00
Nikos Nikoleris	67925a8334	x86: Adjust the size of the values written to the x87 misc registers All x87 misc registers are implemented in an array of 64 bit values but in real hardware the size of some of these registers is smaller. Previsouly all 64 bits where incorrectly set and then later read. To ensure correctness we mask the value in setMiscRegNoEffect to write only the valid bits. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-07-04 10:43:47 -05:00
Nilay Vaish	11a48faeb4	o3: correct the number of cc registers in rename map	2015-07-04 10:43:46 -05:00
Nilay Vaish	d29d7c41f1	mem: packet: Add const to constructor argument	2015-07-04 10:43:46 -05:00
Nilay Vaish	16ac48e6a4	ruby: drop NetworkMessage class This patch drops the NetworkMessage class. The relevant data members and functions have been moved to the Message class, which was the parent of NetworkMessage.	2015-07-04 10:43:46 -05:00
Nilay Vaish	baa3eb0de3	ruby: mesi three level: name change to avoid clash The accessor function getDestination() for Destination variable in the coherence message clashes with the getDestination() that is part of the Message class. Hence the name change.	2015-07-04 10:43:46 -05:00
Nilay Vaish	b4efb48a71	ruby: remove message buffer node This structure's only purpose was to provide a comparison function for ordering messages in the MessageBuffer. The comparison function is now being moved to the Message class itself. So we no longer require this structure.	2015-07-04 10:43:46 -05:00
Andreas Hansson	7e711c98f8	mem: Increase the default buffer sizes for the DDR4 controller This patch increases the default read/write buffer sizes for the DDR4 controller config to values that are more suitable for the high bandwidth and high bank count.	2015-07-03 10:14:48 -04:00
Wendy Elsasser	31f901b69d	mem: Update DRAM command scheduler for bank groups This patch updates the command arbitration so that bank group timing as well as rank-to-rank delays will be taken into account. The resulting arbitration no longer selects commands (prepped or not) that cannot issue seamlessly if there are commands that can issue back-to-back, minimizing the effect of rank-to-rank (tCS) & same bank group (tCCD_L) delays. The arbitration selects a new command based on the following priority. Within each priority band, the arbitration will use FCFS to select the appropriate command: 1) Bank is prepped and burst can issue seamlessly, without a bubble 2) Bank is not prepped, but can prep and issue seamlessly, without a bubble 3) Bank is prepped but burst cannot issue seamlessly. In this case, a bubble will occur on the bus Thus, to enable more parallelism in subsequent selections, an unprepped packet is given higher priority if the bank prep can be hidden. If the bank prep cannot be hidden, the selection logic will choose a prepped packet that cannot issue seamlessly if one exist. Otherwise, the default selection will choose the packet with the minimum bank prep delay.	2015-07-03 10:14:46 -04:00
Andreas Hansson	b56167b682	mem: Avoid DRAM write queue iteration for merging and read lookup This patch adds a simple lookup structure to avoid iterating over the write queue to find read matches, and for the merging of write bursts. Instead of relying on iteration we simply store a set of currently-buffered write-burst addresses and compare against these. For the reads we still perform the iteration if we have a match. For the writes, we rely entirely on the set. Note that there are corner-cases where sub-bursts would actually not be mergeable without a read-modify-write. We ignore these cases and opt for speed.	2015-07-03 10:14:45 -04:00
Andreas Hansson	db85ddca1a	mem: Delay responses in the crossbar before forwarding This patch changes how the crossbar classes deal with responses. Instead of forwarding responses directly and burdening the neighbouring modules in paying for the latency (through the pkt->headerDelay), we now queue them before sending them. The coherency protocol is not affected as requests and any snoop requests/responses are still passed on in zero time. Thus, the responses end up paying for any header delay accumulated when passing through the crossbar. Any latency incurred on the request path will be paid for on the response side, if no other module has dealt with it. As a result of this patch, responses are returned at a later point. This affects the number of outstanding transactions, and quite a few regressions see an impact in blocking due to no MSHRs, increased cache-miss latencies, etc. Going forward we should be able to use the same concept also for snoop responses, and any request that is not an express snoop.	2015-07-03 10:14:44 -04:00
Andreas Hansson	b93c912013	mem: Remove redundant is_top_level cache parameter This patch takes the final step in removing the is_top_level parameter from the cache. With the recent changes to read requests and write invalidations, the parameter is no longer needed, and consequently removed. This also means that asymmetric cache hierarchies are now fully supported (and we are actually using them already with L1 caches, but no table-walker caches, connected to a shared L2).	2015-07-03 10:14:43 -04:00
Andreas Hansson	71856cfbbc	mem: Split WriteInvalidateReq into write and invalidate WriteInvalidateReq ensures that a whole-line write does not incur the cost of first doing a read exclusive, only to later overwrite the data. This patch splits the existing WriteInvalidateReq into a WriteLineReq, which is done locally, and an InvalidateReq that is sent out throughout the memory system. The WriteLineReq re-uses the normal WriteResp. The change allows us to better express the difference between the cache that is performing the write, and the ones that are merely invalidating. As a consequence, we no longer have to rely on the isTopLevel flag. Moreover, the actual memory in the system does not see the intitial write, only the writeback. We were marking the written line as dirty already, so there is really no need to also push the write all the way to the memory. The overall flow of the write-invalidate operation remains the same, i.e. the operation is only carried out once the response for the invalidate comes back. This patch adds the InvalidateResp for this very reason.	2015-07-03 10:14:41 -04:00
Andreas Hansson	0ddde83a47	mem: Add ReadCleanReq and ReadSharedReq packets This patch adds two new read requests packets: ReadCleanReq - For a cache to explicitly request clean data. The response is thus exclusive or shared, but not owned or modified. The read-only caches (see previous patch) use this request type to ensure they do not get dirty data. ReadSharedReq - We add this to distinguish cache read requests from those issued by other masters, such as devices and CPUs. Thus, devices use ReadReq, and caches use ReadCleanReq, ReadExReq, or ReadSharedReq. For the latter, the response can be any state, shared, exclusive, owned or even modified. Both ReadCleanReq and ReadSharedReq re-use the normal ReadResp. The two transactions are aligned with the emerging cache-coherent TLM standard and the AMBA nomenclature. With this change, the normal ReadReq should never be used by a cache, and is reserved for the actual (non-caching) masters in the system. We thus have a way of identifying if a request came from a cache or not. The introduction of ReadSharedReq thus removes the need for the current isTopLevel hack, and also allows us to stop relying on checking the packet size to determine if the source is a cache or not. This is fixed in follow-on patches.	2015-07-03 10:14:40 -04:00
Andreas Hansson	893533a126	mem: Allow read-only caches and check compliance This patch adds a parameter to the BaseCache to enable a read-only cache, for example for the instruction cache, or table-walker cache (not for x86). A number of checks are put in place in the code to ensure a read-only cache does not end up with dirty data. A follow-on patch adds suitable read requests to allow a read-only cache to explicitly ask for clean data.	2015-07-03 10:14:39 -04:00
Ali Jafri	a262908acc	mem: Add clean evicts to improve snoop filter tracking This patch adds eviction notices to the caches, to provide accurate tracking of cache blocks in snoop filters. We add the CleanEvict message to the memory heirarchy and use both CleanEvicts and Writebacks with BLOCK_CACHED flags to propagate notice of clean and dirty evictions respectively, down the memory hierarchy. Note that the BLOCK_CACHED flag indicates whether there exist any copies of the evicted block in the caches above the evicting cache. The purpose of the CleanEvict message is to notify snoop filters of silent evictions in the relevant caches. The CleanEvict message behaves much like a Writeback. CleanEvict is a write and a request but unlike a Writeback, CleanEvict does not have data and does not need exclusive access to the block. The cache generates the CleanEvict message on a fill resulting in eviction of a clean block. Before travelling downwards CleanEvict requests generate zero-time snoop requests to check if the same block is cached in upper levels of the memory heirarchy. If the block exists, the cache discards the CleanEvict message. The snoops check the tags, writeback queue and the MSHRs of upper level caches in a manner similar to snoops generated from HardPFReqs. Currently CleanEvicts keep travelling towards main memory unless they encounter the block corresponding to their address or reach main memory (since we have no well defined point of serialisation). Main memory simply discards CleanEvict messages. We have modified the behavior of Writebacks, such that they generate snoops to check for the presence of blocks in upper level caches. It is possible in our current implmentation for a lower level cache to be writing back a block while a shared copy of the same block exists in the upper level cache. If the snoops find the same block in upper level caches, we set the BLOCK_CACHED flag in the Writeback message. We have also added logic to account for interaction of other message types with CleanEvicts waiting in the writeback queue. A simple example is of a response arriving at a cache removing any CleanEvicts to the same address from the cache's writeback queue.	2015-07-03 10:14:37 -04:00
Andreas Hansson	aa5bbe81f6	mem: Convert Request static const flags to enums This patch fixes an issue which is very wide spread in the codebase, causing sporadic linking failures. The issue is that we declare static const class variables in the header, without any definition (as part of a source file). In most cases the compiler propagates the value and we have no issues. However, especially for less optimising builds such as debug, we get sporadic linking failures due to undefined references. This patch fixes the Request class, by turning the static const flags and master IDs into C++11 typed enums.	2015-07-03 10:14:36 -04:00
Curtis Dunham	e385ae0c72	base: remove fd from object loaders All the object loaders directly examine the (already completely loaded by object_file.cc) memory image. There is no current motivation to keep the fd around.	2015-07-03 10:14:34 -04:00
Andreas Hansson	c466d55a26	scons: Bump compiler requirement to gcc >= 4.7 and clang >= 3.1 This patch updates the compiler minimum requirement to gcc 4.7 and clang 3.1, thus allowing: 1. Explicit virtual overrides (no need for M5_ATTR_OVERRIDE) 2. Non-static data member initializers 3. Template aliases 4. Delegating constructors This patch also enables a transition from --std=c++0x to --std=c++11.	2015-07-03 10:14:15 -04:00
Nilay Vaish	57971248f6	ruby: slicc: remove README No longer maintained. Updates are only made to the wiki page. So being dropped.	2015-06-25 11:58:30 -05:00
Nilay Vaish	0647d99854	ruby: message: remove a data member added by mistake I (Nilay) had mistakenly added a data member to the Message class in revision c1694b4032a6. The data member is being removed.	2015-06-25 11:58:29 -05:00
Jason Power	2f3c467883	Ruby: Remove assert in RubyPort retry list logic Remove the assert when adding a port to the RubyPort retry list. Instead of asserting, just ignore the added port, since it's already on the list. Without this patch, Ruby+detailed fails for even the simplest tests	2015-06-25 11:58:28 -05:00
Andreas Sandberg	cc813cd5f7	base: Add a warn_if macro Add a warn if macro that is analogous to the panic_if and fatal_if.	2015-06-21 20:52:13 +01:00
Andreas Sandberg	d541038549	arm: Cleanup arch headers to remove dma_device.hh dependency Break the dependency on dma_device.hh by forward-declaring DmaPort in the relevant header.	2015-06-21 20:48:33 +01:00
Ali Jafri	f0c3b70451	mem: Add check for express snoop in packet destructor Snoop packets share the request pointer with the originating packets. We need to ensure that the snoop packet destruction does not delete the request. Snoops are used for reads, invalidations, HardPFReqs, Writebacks and CleansEvicts. Reads, invalidations, and HardPFReqs need a response so their snoops do not delete the request. For Writebacks and CleanEvicts we need to check explicitly for whethere the current packet is an express snoop, in whcih case do not delete the request.	2015-06-09 09:21:18 -04:00
Andreas Hansson	578a7f20c6	mem: Fix snoop packet data allocation bug This patch fixes an issue where the snoop packet did not properly forward the data pointer in case of static data.	2015-06-09 09:21:17 -04:00
Rune Holm	eb3ed11794	arm: Delete debug print in initialization of hardware thread There seems to have been a debug print left in when the original ARMv8 support was merged in. This printout is performed every time you initialize a hardware thread, and it prints raw pointers, so it always causes diffs in the regression. This patch removes the debug print.	2015-06-09 09:21:16 -04:00
Rune Holm	f4311d3932	arm: Fix typo in ldrsh instruction name ldrsh was typoed as hdrsh, which is a bit annoying when printing instructions. This patch fixes it.	2015-06-09 09:21:15 -04:00
Andreas Sandberg	737e5da7f6	base: Reset CircleBuf size on flush() The flush() method in CircleBuf resets the state of the circular buffer, but fails to set size to zero. This obviously confuses code that tries to determine the amount of data in the buffer. Set the size to zero on flush.	2015-06-09 09:21:14 -04:00
Andreas Sandberg	a9cad92011	dev, arm: Include PIO size in AmbaDmaDevice constructor Make it possible to specify the size of the PIO space for an AMBA DMA device. Maintain backwards compatibility and default to zero.	2015-06-09 09:21:12 -04:00
Marco Elver	6599dd87c8	ruby: Fix MESI consistency bug Fixes missed forward eviction to CPU. With the O3CPU this can lead to load-load reordering, as the LQ is never notified of the invalidate. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-06-07 14:02:40 -05:00
Matthias Jung	25fe4c2529	mem: Add HMC Timing Parameters A single HMC-2500 x32 model based on: [1] DRAMSpec: a high-level DRAM bank modelling tool developed at the University of Kaiserslautern. This high level tool uses RC (resistance-capacitance) and CV (capacitance-voltage) models to estimate the DRAM bank latency and power numbers. [2] A Logic-base Interconnect for Supporting Near Memory Computation in the Hybrid Memory Cube (E. Azarkhish et. al) Assumed for the HMC model is a 30 nm technology node. The modelled HMC consists of a 4 Gbit part with 4 layers connected with TSVs. Each layer has 16 vaults and each vault consists of 2 banks per layer. In order to be able to use the same controller used for 2D DRAM generations for HMC, the following analogy is done: Channel (DDR) => Vault (HMC) device_size (DDR) => size of a single layer in a vault ranks per channel (DDR) => number of layers banks per rank (DDR) => banks per layer devices per rank (DDR) => devices per layer ( 1 for HMC). The parameters for which no input is available are inherited from the DDR3 configuration.	2015-06-07 14:02:40 -05:00
Ruslan Bukin ext:(%2C%20Zhang%20Guoye)	736d3314bf	arch: fix build under MacOSX put O_DIRECT under ifdefs -- this fixes build for MacOSX. Also use correct class for arm64 openFlagTable. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-06-07 14:02:40 -05:00
Christoph Pfister	4a17494708	mem: addr_mapper: restore old address if request not sent Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-05-30 13:45:17 +02:00
Curtis Dunham	31825bd988	sim, arm: add checkpoint upgrader for d02b45a5 The insertion of CONTEXTIDR_EL2 in the ARM miscellaneous registers obsoletes old checkpoints.	2015-06-01 18:05:11 -05:00
Andreas Sandberg	7c4eb3b4d8	kvm, arm: Add support for aarch64 This changeset adds support for aarch64 in kvm. The CPU module supports both checkpointing and online CPU model switching as long as no devices are simulated by the host kernel. It currently has the following limitations: * The system register based generic timer can only be simulated by the host kernel. Workaround: Use a memory mapped timer instead to simulate the timer in gem5. * Simulating devices (e.g., the generic timer) in the host kernel requires that the host kernel also simulates the GIC. * ID registers in the host and in gem5 must match for switching between simulated CPUs and KVM. This is particularly important for ID registers describing memory system capabilities (e.g., ASID size, physical address size). * Switching between a virtualized CPU and a simulated CPU is currently not supported if in-kernel device emulation is used. This could be worked around by adding support for switching to the gem5 (e.g., the KvmGic) side of the device models. A simpler workaround is to avoid in-kernel device models altogether.	2015-06-01 19:44:19 +01:00
Andreas Sandberg	dbfd6effe0	kvm, arm, dev: Add an in-kernel GIC implementation This changeset adds a GIC implementation that uses the kernel's built-in support for simulating the interrupt controller. Since there is currently no support for state transfer between gem5 and the kernel, the device model does not support serialization and CPU switching (which would require switching to a gem5-simulated GIC).	2015-06-01 19:44:17 +01:00
Andreas Sandberg	8e7c0575dc	kvm: Handle inst events at the current instruction count There are cases (particularly when attaching GDB) when instruction events are scheduled at the current instruction tick. This used to trigger an assertion error in kvm. This changeset adds a check for this condition and forces KVM to do a quick entry that completes any pending IO operations, but does not execute any new instructions, before servicing the event. We could check if we need to enter KVM at all, but forcing a quick entry is makes the code slightly cleaner and does not hurt correctness (performance is hardly an issue in these cases).	2015-06-01 19:43:41 +01:00
Andreas Sandberg	06cf5cc60b	kvm, arm: Move ARM-specific files to arch/arm/kvm/ This changeset moves the ARM-specific KVM CPU implementation to arch/arm/kvm/. This change is expected to keep the source tree somewhat cleaner as we start adding support for ARMv8 and KVM in-kernel interrupt controller simulation. --HG-- rename : src/cpu/kvm/ArmKvmCPU.py => src/arch/arm/kvm/ArmKvmCPU.py rename : src/cpu/kvm/arm_cpu.cc => src/arch/arm/kvm/arm_cpu.cc rename : src/cpu/kvm/arm_cpu.hh => src/arch/arm/kvm/arm_cpu.hh	2015-06-01 19:43:40 +01:00
Curtis Dunham	e590f0d1ef	arm: implement the CONTEXTIDR_EL2 system reg.	2015-05-26 03:21:45 -04:00
Nathanael Premillieu	31fd18ab15	arm: Make address translation faster with better caching This patch adds better caching of the sys regs for AArch64, thus avoiding unnecessary calls to tc->readMiscReg(MISCREG_CPSR) in the non-faulting case.	2015-05-26 03:21:42 -04:00
Andreas Hansson	53a360985b	base: Allow multiple interleaved ranges This patch changes how the address range calculates intersection such that a system can have a number of non-overlapping interleaved ranges without complaining. Without this patch we end up with a panic.	2015-05-26 03:21:40 -04:00
Andrew Bardsley	cea1d14a93	cpu: Fix a bug in counting issued instructions in MinorCPU The MinorCPU would count bubbles in Execute::issue as part of the num_insts_issued and so sometimes reach the instruction issue limit incorrectly. Fixed by checking for a bubble in one new place.	2015-05-26 03:21:37 -04:00
Giacomo Gabrielli	cc2346e8ca	arm: Implement some missing syscalls (SE mode) Adding a few syscalls that were previously considered unimplemented.	2015-05-26 03:21:35 -04:00
Andreas Hansson	0cc350d2c5	ruby: Deprecation warning for RubyMemoryControl A step towards removing RubyMemoryControl and shift users to DRAMCtrl. The latter is faster, more representative, very versatile, and is integrated with power models.	2015-05-26 03:21:34 -04:00
Andreas Sandberg	f3f06e1684	arm, dev: Add support for a memory mapped generic timer There are cases when we don't want to use a system register mapped generic timer, but can't use the SP804. For example, when using KVM on aarch64, we want to intercept accesses to the generic timer, but can't do so if it is using the system register interface. In such cases, we need to use a memory-mapped generic timer. This changeset adds a device model that implements the memory mapped generic timer interface. The current implementation only supports a single frame (i.e., one virtual timer and one physical timer).	2015-05-23 13:46:56 +01:00
Andreas Sandberg	6533f2000b	arm: Get rid of pointless have_generic_timer param The ArmSystem class has a parameter to indicate whether it is configured to use the generic timer extension or not. This parameter doesn't affect any feature flags in the current implementation and is therefore completely unnecessary. In fact, we usually don't set it even if a system has a generic timer. If we ever need to check if there is a generic timer present, we should just request a pointer and check if it is non-null instead.	2015-05-23 13:46:54 +01:00
Andreas Sandberg	2278fec1d1	dev, arm: Add virtual timers to the generic timer model The generic timer model currently does not support virtual counters. Virtual and physical counters both tick with the same frequency. However, virtual timers allow a hypervisor to set an offset that is subtracted from the counter when it is read. This enables the hypervisor to present a time base that ticks with virtual time in the VM (i.e., doesn't tick when the VM isn't running). Modern Linux kernels generally assume that virtual counters exist and try to use them by default.	2015-05-23 13:46:53 +01:00
Andreas Sandberg	65f3f097d3	dev, arm: Refactor and clean up the generic timer model This changeset cleans up the generic timer a bit and moves most of the register juggling from the ISA code into a separate class in the same source file as the rest of the generic timer. It also removes the assumption that there is always 8 or fewer CPUs in the system. Instead of having a fixed limit, we now instantiate per-core timers as they are requested. This is all in preparation for other patches that add support for virtual timers and a memory mapped interface.	2015-05-23 13:46:52 +01:00
Andreas Sandberg	5435f25ec8	kvm: Fix dumping code for large registers The register dumping code in kvm tries to print the bytes in large registers (128 bits and larger) instead of printing them as hex. This changeset fixes that.	2015-05-23 13:37:22 +01:00
Andreas Sandberg	ed447bbff9	kvm, x86: Guard x86-specific APIs in KvmVM Protect x86-specific APIs in KvmVM with compile-time guards to avoid breaking ARM builds.	2015-05-23 13:37:20 +01:00
Andreas Sandberg	cba3a125e1	arm: Workaround incorrect HDLCD register order in kernel Some versions of the kernel incorrectly swap the red and blue color select registers. This changeset adds a workaround for that by swapping them when instantiating a PixelConverter.	2015-05-23 13:37:04 +01:00
Andreas Sandberg	db5c9a5f90	base: Redesign internal frame buffer handling Currently, frame buffer handling in gem5 is quite ad hoc. In practice, we pass around naked pointers to raw pixel data and expect consumers to convert frame buffers using the (broken) VideoConverter. This changeset completely redesigns the way we handle frame buffers internally. In summary, it fixes several color conversion bugs, adds support for more color formats (e.g., big endian), and makes the code base easier to follow. In the new world, gem5 always represents pixel data using the Pixel struct when pixels need to be passed between different classes (e.g., a display controller and the VNC server). Producers of entire frames (e.g., display controllers) should use the FrameBuffer class to represent a frame. Frame producers are expected to create one instance of the FrameBuffer class in their constructors and register it with its consumers once. Consumers are expected to check the dimensions of the frame buffer when they consume it. Conversion between the external representation and the internal representation is supported for all common "true color" RGB formats of up to 32-bit color depth. The external pixel representation is expected to be between 1 and 4 bytes in either big endian or little endian. Color channels are assumed to be contiguous ranges of bits within each pixel word. The external pixel value is scaled to an 8-bit internal representation using a floating multiplication to map it to the entire 8-bit range.	2015-05-23 13:37:03 +01:00
Andreas Sandberg	1985d28ef9	base: Clean up bitmap generation code The bitmap generation code is hard to follow and incorrectly uses the size of an enum member to calculate the size of a pixel. This changeset cleans up the code and adds some documentation.	2015-05-23 13:37:01 +01:00
Joel Hestness	0479569f67	ruby: Fix RubySystem warm-up and cool-down scope The processes of warming up and cooling down Ruby caches are simulation-wide processes, not just RubySystem instance-specific processes. Thus, the warm-up and cool-down variables should be globally visible to any Ruby components participating in either process. Make these variables static members and track the warm-up and cool-down processes as appropriate. This patch also has two side benefits: 1) It removes references to the RubySystem g_system_ptr, which are problematic for allowing multiple RubySystem instances in a single simulation. Warmup and cooldown variables being static (global) reduces the need for instance-specific dereferences through the RubySystem. 2) From the AbstractController, it removes local RubySystem pointers, which are used inconsistently with other uses of the RubySystem: 11 other uses reference the RubySystem with the g_system_ptr. Only sequencers have local pointers.	2015-05-19 10:56:51 -05:00
Andreas Hansson	99d3fa5945	arm: Identify table-walker requests This patch ensures all page-table walks are flagged as such.	2015-05-15 13:40:01 -04:00
Andreas Hansson	bd583d00f9	misc: Appease gcc 5.1 Three minor issues are resolved: 1. Apparently gcc 5.1 does not like negation of booleans followed by bitwise AND. 2. Somehow the compiler also gets confused and warns about NoopMachInst being unused (removing it causes compilation errors though). Most likely a compiler bug. 3. There seems to be a number of instances where loop unrolling causes false positives for the array-bounds check. For now, switch to std::array. Potentially we could disable the warning for newer gcc versions, but switching to std::array is probably a good move in any case.	2015-05-15 13:39:53 -04:00
Andreas Sandberg	37aab4a155	sim: Don't clear the active CPU vector in System::initState The system class currently clears the vector of active CPUs in initState(). CPUs are added to the list by registerThreadContext() which is called from BaseCPU::init(). This obviously breaks when the System object is initialized after the CPUs. This changeset removes the offending clear() call since the list will be empty after it has been instantiated anyway.	2015-05-15 13:39:44 -04:00
Steve Reinhardt	c65fa3dceb	syscall_emul: fix warn_once behavior The current ignoreWarnOnceFunc doesn't really work as expected, since it will only generate one warning total, for whichever "warn-once" syscall is invoked first. This patch fixes that behavior by keeping a "warned" flag in the SyscallDesc object, allowing suitably flagged syscalls to warn exactly once per syscall.	2015-05-05 09:25:59 -07:00
Andreas Hansson	f349592071	arm: Add missing FPEXC.EN check Add a missing check to ensure that exceptions are generated properly.	2015-05-05 03:22:45 -04:00
Giacomo Gabrielli	a3f23894eb	arm: enable DCZVA by default in SE mode	2015-05-05 03:22:42 -04:00
Stephan Diestelhorst	2847d5f517	mem: Create a request copy for deferred snoops Sometimes, we need to defer an express snoop in an MSHR, but the original request might complete and deallocate the original pkt->req. In those cases, create a copy of the request so that someone who is inspecting the delayed snoop can also inspect the request still. All of this is rather hacky, but the allocation / linking and general life-time management of Packet and Request is rather tricky. Deleting the copy is another tricky area, testing so far has shown that the right copy is deleted at the right time.	2015-03-17 11:50:55 +00:00
Andreas Sandberg	706597f021	arm: Relax ordering for some uncacheable accesses We currently assume that all uncacheable memory accesses are strictly ordered. Instead of always enforcing strict ordering, we now only enforce it if the required memory type is device memory or strongly ordered memory.	2015-05-05 03:22:34 -04:00
Andreas Sandberg	48281375ee	mem, cpu: Add a separate flag for strictly ordered memory The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.	2015-05-05 03:22:33 -04:00
Andreas Sandberg	1da634ace0	mem, alpha: Move Alpha-specific request flags Move Alpha-specific memory request flags to an architecture-specific header and map them to the architecture specific flag bit range.	2015-05-05 03:22:31 -04:00
Andreas Hansson	23b9792681	arm: Remove unnecessary boot uncachability With the recent patches addressing how we deal with uncacheable accesses there is no longer need for the work arounds put in place to enforce certain sections of memory to be uncacheable during boot.	2015-05-05 03:22:30 -04:00
Andreas Hansson	36f29496a0	mem: Snoop into caches on uncacheable accesses This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.	2015-05-05 03:22:29 -04:00
Andreas Hansson	554ddc7c07	arch, cpu: Do not forward snoops to table walker This patch simplifies the overall CPU by changing the TLB caches such that they do not forward snoops to the table walker port(s). Note that only ARM and X86 are affected. There is no reason for the ports to snoop as they do not actually take any action, and from a performance point of view we are better of not snooping more than we have to. Should it at a later point be required to snoop for a particular TLB design it is easy enough to add it back.	2015-05-05 03:22:27 -04:00
Andreas Hansson	14e5b2ea55	mem: Pass shared downstream through caches This patch ensures that we pass on information about a packet being shared (rather than exclusive), when forwarding a packet downstream. Without this patch there is a risk that a downstream cache considers the line exclusive when it really isn't.	2015-05-05 03:22:26 -04:00
Ali Jafri	3d33432136	mem: Add forward snoop check for HardPFReqs We should always check whether the cache is supposed to be forwarding snoops before generating snoops.	2015-05-05 03:22:25 -04:00
Andreas Hansson	0ebbf3f951	mem: Add missing stats update for uncacheable MSHRs This patch adds a missing counter update for the uncacheable accesses. By updating this counter we also get a meaningful average latency for uncacheable accesses (previously inf).	2015-05-05 03:22:24 -04:00
Andreas Hansson	33e3e370f2	mem: Tidy up BaseCache parameters This patch simply tidies up the BaseCache parameters and removes the unused "two_queue" parameter.	2015-05-05 03:22:22 -04:00
David Guillen	5287945a8b	mem: Remove templates in cache model This patch changes the cache implementation to rely on virtual methods rather than using the replacement policy as a template argument. There is no impact on the simulation performance, and overall the changes make it easier to modify (and subclass) the cache and/or replacement policy.	2015-05-05 03:22:21 -04:00
Andreas Hansson	d0d933facc	cpu: Work around gcc 4.9 issues with Num_OpClasses This patch fixes a recent issue with gcc 4.9 (and possibly more) being convinced that indices outside the array bounds are used when initialising the FUPool members.	2015-05-05 03:22:19 -04:00
Ruslan Bukin	81f3211149	arch, base, dev, kern, sym: FreeBSD support This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only) Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-29 22:35:23 -05:00
Rizwana Begum	52a3bc5e5c	mem: Simplify page close checks for adaptive policies Both open_adaptive and close_adaptive page polices keep the page open if a row hit is found. If a row hit is not found, close_adaptive page policy precharges the row, and open_adaptive policy precharges the row only if there is a bank conflict request waiting in the queue. This patch makes the checks for above conditions simpler. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-29 22:35:22 -05:00
Nilay Vaish	3a2731fb8c	ruby: set: replace long by unsigned long UBSan complains about negative value being shifted	2015-04-29 22:35:22 -05:00
Nilay Vaish	4333549575	cpu: o3: replace issueLatency with bool pipelined Currently, each op class has a parameter issueLat that denotes the cycles after which another op of the same class can be issued. As of now, this latency can either be one cycle (fully pipelined) or same as execution latency of the op (not at all pipelined). The fact that issueLat is a parameter of type Cycles makes one believe that it can be set to any value. To avoid the confusion, the parameter is being renamed as 'pipelined' with type boolean. If set to true, the op would execute in a fully pipelined fashion. Otherwise, it would execute in an unpipelined fashion.	2015-04-29 22:35:22 -05:00
Nilay Vaish	0dbd696aae	cpu: o3: single cycle default div microop latency on x86 This patch sets the default latency of the division microop to a single cycle on x86. This is because the division instructions DIV and IDIV have been implemented as loops of div microops, where each microop computes a single bit of the quotient.	2015-04-29 22:35:22 -05:00
Nilay Vaish	ee06fed656	x86: change divide-by-zero fault to divide-error Same exception is raised whether division with zero is performed or the quotient is greater than the maximum value that the provided space can hold. Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.	2015-04-29 22:35:22 -05:00
Andreas Hansson	179787f31f	misc: Appease gcc 5.1 without moving GDB_REG_BYTES This patch rolls back the move of the GDB_REG_BYTES constant, and instead adds M5_VAR_USED.	2015-04-24 03:30:08 -04:00
Rene de Jong	483f873d01	arm, dev: Add a UFS device This patch introduces a UFS host controller and a UFS device. More information about the UFS standard can be found at the JEDEC site: http://www.jedec.org/standards-documents/results/jesd220 Note that the model does not implement the complete standard, and as such is not an actual implementation of UFS. The following SCSI commands are implemented: inquiry, read, read capacity, report LUNs, start/stop, test unit ready, verify, write, format unit, send diagnostic, synchronize cache, mode select, mode sense, request sense, unmap, write buffer and read buffer. This is sufficient for usage with Linux and Android. To interact with this model a kernel version 3.9 or above is needed.	2015-04-23 13:37:50 -04:00
Rene de Jong	fff28ce954	arm, dev: Add a NAND flash timing model This adds a NAND flash timing model. This model takes the number of planes into account and is ultimately intended to be used as a high-level performance model for any device using flash. To access the memory, use either readMemory or writeMemory. To make use of the model you will need an interface model such as UFSHostDevice, which is part of a separate patch. At the moment the flash device is part of the ARM device tree since the only use if the UFSHostDevice, and that in turn relies on the ARM GIC.	2015-04-23 13:37:49 -04:00
Peter Enns	2e64590b88	dev: Add support for i2c devices This patch adds an I2C bus and base device. I2C is used to connect a variety of sensors, and this patch serves as a starting point to enable a range of I2C devices.	2015-04-23 13:37:48 -04:00
Andreas Hansson	c8c4f66889	misc: Appease gcc 5.1 This patch fixes a few small issues to ensure gem5 compiles when using gcc 5.1. First, the GDB_REG_BYTES in the RemoteGDB header are, rather surprisingly, flagged as unused for both ARM and X86. Removing them, however, causes compilation errors as they are actually used in the source file. Moving the constant into the class definition fixes the issue. Possibly a gcc bug. Second, we have an unused EthPktData constructor using auto_ptr, and the latter is deprecated. Since the code is never used it is simply removed.	2015-04-23 13:37:46 -04:00
Brandon Potter	a70a83155b	cpu: remove conditional check (count > 0) on o3 IQ squashes The o3 cpu instruction queue model uses the count variable to track the number of unissued instructions in the queue. Previously, the squash method used this variable to avoid executing the doSquash method when there were no unissued instructions in the pipeline. A corner case problem exists when only issued instructions exist in the pipeline and a squash occurs; the doSquash code is not invoked and subsequently does not clean up state properly.	2015-04-22 07:52:03 -07:00
Brandon Potter	4991c29965	syscall_emul: implement clock_gettime system call	2015-04-22 07:51:27 -07:00
Monir Mozumder	00e3cab8fc	syscall_emul: update x86 syscall table Update table with additional definitions through Linux 3.13.	2015-04-22 07:51:27 -07:00
Brandon Potter	344a437064	syscall_emul: update getrlimit to use warn Don't use std::cerr directly, and just return EINVAL instead of aborting.	2015-04-22 07:51:27 -07:00
Brandon Potter	9c6509f198	syscall_emul: fix warning with wrong syscall name Also nix extra whitespace.	2015-04-22 07:51:27 -07:00
Brandon Potter	6ad29ba6df	base: add new ChunkGenerator method to identify last chunk	2015-04-22 07:51:27 -07:00
Andreas Hansson	cd76e34056	cpu: Remove the InOrderCPU from the tree This patch takes the final step in removing the InOrderCPU from the tree. Rest in peace. The MinorCPU is now used to model an in-order microarchitecture, and long term the MinorCPU will eventually be renamed InOrderCPU.	2015-04-20 12:46:35 -04:00
Malek Musleh	826f69b470	config, cpu: fix progress interval for switched CPUs This patch ensures that the CPU progress Event is triggered for the new set of switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids printing the interval state if the cpu is currently switched out. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-14 11:01:10 -05:00
Dibakar Gope	34ad1123ee	cpu: re-organizes the branch predictor structure. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-13 17:33:57 -05:00
Nilay Vaish	e596e52498	x86: implements x87 mult/div instructions	2015-04-13 17:33:57 -05:00
Lena Olson	dea7acdb3e	ruby: allow restoring from checkpoint when using DRAMCtrl Restoring from a checkpoint with ruby + the DRAMCtrl memory model was not working, because ruby and DRAMCtrl disagreed on the current tick during warmup. Since there is no reason to do timing requests during warmup, use functional requests instead. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-13 17:33:57 -05:00
Nilay Vaish	d6af46915c	sim: Use NULL instead of None for testing filenames. The filenames are initialized with NULL. So the test should be checking for them to be == NULL instead == None.	2015-04-13 17:33:57 -05:00
Nilay Vaish	b26fef8466	sim: fix function for emulating dup() The function was using the host fd to obtain the fd object from the simulated process.	2015-04-13 17:33:57 -05:00
Curtis Dunham	c3268f8820	config: Support full-system with SST's memory system This patch adds an example configuration in ext/sst/tests/ that allows an SST/gem5 instance to simulate a 4-core AArch64 system with SST's memHierarchy components providing all the caches and memories.	2015-04-08 15:56:06 -05:00
Nikos Nikoleris	4bdbdd8413	dev: (un)serialize fix for the RTC and RTC Timer Interrupt events Restoring from a checkpoint fails if either the RTC or the RTC Timer Interrrupt event is disabled. The restored machine tried incorrectly to schedule the next event with negative offset. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Ruslan Bukin	bebab7f24f	sim: correct check for endianess Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Ruslan Bukin	b3314673f4	dev: Extend access width for IDE control registers Add 32-bit access width for PrimaryTiming register and 16bit for UDMAControl register as FreeBSD required. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Nikos Nikoleris	305e29b98e	cpu: fix system total instructions accounting The totalInstructions counter is only incremented when the whole instruction is commited and not on every microop. It was incorrectly reset in atomic and timing cpus. Committed by: Nilay Vaish <nilay@cs.wisc.edu>"	2015-04-03 11:42:10 -05:00
Lena Olson	333988a73e	x86: fix debug trace output for mwait When running with the Exec flag, the mwait instruction attempted to print out its source registers, which were never actually initialized. This led to sporadic assertion failures when the value stored there was invalid. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-04-03 11:42:10 -05:00
Stephan Diestelhorst	cb8856f580	mem: Support any number of master-IDs in stride prefetcher The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs) that it could handle. Since master IDs need to be unique per system, and every core, cache etc. requires a separate master port, a static limit on these does not make much sense. Instead, this patch adds a small hash map that will map all master IDs to the right prefetch state and dynamically allocates new state for new master IDs.	2015-03-27 04:56:03 -04:00
Andreas Hansson	0197e580e5	mem: Allocate cache writebacks before new MSHRs This patch changes the order of writeback allocation such that any writebacks resulting from a tag lookup (e.g. for an uncacheable access), are added to the writebuffer before any new MSHR entries are allocated. This ensures that the writebacks logically precedes the new allocations. The patch also changes the uncacheable flush to use proper timed (or atomic) writebacks, as opposed to functional writes.	2015-03-27 04:56:02 -04:00
Andreas Hansson	24763c2177	mem: Cleanup flow for uncacheable accesses This patch simplifies the code dealing with uncacheable timing accesses, aiming to align it with the existing miss handling. Similar to what we do in atomic, a timing request now goes through Cache::access (where the block is also flushed), and then proceeds to ignore any existing MSHR for the block in question. This unifies the flow for cacheable and uncacheable accesses, and for atomic and timing.	2015-03-27 04:56:01 -04:00
Andreas Hansson	a7a1e6004a	mem: Ignore uncacheable MSHRs when finding matches This patch changes how we search for matching MSHRs, ignoring any MSHR that is allocated for an uncacheable access. By doing so, this patch fixes a corner case in the MSHRs where incorrect data ended up being copied into a (cacheable) read packet due to a first uncacheable MSHR target of size 4, followed by a cacheable target to the same MSHR of size 64. The latter target was filled with nonsense data.	2015-03-27 04:56:00 -04:00
Andreas Hansson	801ce65eae	mem: Remove redundant allocateUncachedReadBuffer in cache This patch removes the no-longer-needed allocateUncachedReadBuffer. Besides the checks it is exactly the same as allocateMissBuffer and thus provides no value.	2015-03-27 04:55:59 -04:00
Andreas Hansson	fe806a0dd7	mem: Modernise MSHR iterators to C++11 This patch updates the iterators in the MSHR and MSHR queues to use C++11 range-based for loops. It also does a bit of additional house keeping.	2015-03-27 04:55:57 -04:00
Andreas Hansson	7bae98459c	mem: Align all MSHR entries to block boundaries This patch aligns all MSHR queue entries to block boundaries to simplify checks for matches. Previously there were corner cases that could lead to existing entries not being identified as matches. There are, rather alarmingly, a few regressions that change with this patch.	2015-03-27 04:55:55 -04:00
Ali Jafri	15f0d9ff14	mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more generic BLOCK_CACHED flag. Future patches implementing cache eviction messages can use the BLOCK_CACHED flag in almost the same manner as hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The PREFTECH_SNOOP_FLAG is set if the prefetch target is found in the tags or the MSHRs in any state, so we are simply replacing calls to setPrefetchSquashed() with setBlockCached(). The case of where the prefetch target is found in the writeback MSHRs of upper level caches continues to be covered by the MEM_INHIBIT flag.	2015-03-27 04:55:54 -04:00
Curtis Dunham	a1164154de	sim: Update limit_event reuse to final version Matching final version on reviewboard.	2015-03-26 11:16:44 -04:00
Andreas Hansson	a196dbe3bf	cpu: Fix InstPBTrace inheritance This patch fixes an issue that prevented gem5 to be built with C++ config and without Python.	2015-03-26 11:16:43 -04:00
Steve Reinhardt	6677b9122a	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.	2015-03-23 16:14:20 -07:00
Steve Reinhardt	5302305255	misc: quote args in echoed command line Currently if there are shell special characters in a command-line argument, you can't copy and paste the echoed command line onto a shell prompt because the characters aren't quoted properly. This patch fixes that problem.	2015-03-23 16:14:18 -07:00
Curtis Dunham	564482c782	sim: Reuse the same limit_event in simulate() This patch accomplishes two things: 1. Makes simulate()'s GlobalSimLoopExitEvent a singleton reused across calls. This is slightly more efficient than recreating it every time. 2. Gives callers to simulate() (especially other simulators) a foolproof way of knowing that the simulation period ended successfully by hitting the limit event. They can call getLimitEvent() and compare it to the return value of simulate(). This change was motivated by an ongoing effort to integrate gem5 and SST, with SST as the master sim and gem5 as the slave sim.	2015-03-23 06:57:36 -04:00
Andreas Hansson	45286d9b64	mem: Tidy up Request This patch does a bit of house keeping, fixing up typos, removing dead code etc.	2015-03-23 06:57:34 -04:00
Matt Evans	1ccc3d7e5b	arm: Add a GICv2m device This patch adds a new PIO-accessible GICv2m shim. This shim has a PIO slave port on one side, and SPI 'wires' on the other. It accepts MSIs from the system and triggers SPIs on the GIC. It is configurable with a number of frames, each of which has a number of SPIs and a base SPI offset. A Linux driver for GICv2m is available upstream.	2015-03-19 04:06:17 -04:00
Matt Evans	ec80224188	arm: Remove the 'magic MSI register' in the GIC (PL390) This patch removes the code that added this magic register. A follow-up patch provides a GICv2m MSI shim that gives the same functionality in a standard ARM system architecture way.	2015-03-19 04:06:16 -04:00
Wendy Elsasser	9b4d8030e6	cpu: Fix TrafficGen message format Fix erroneous message format for fatal error. Previously, code did not have type indicator (% instead of %d). Also removed redundant fatal check. Ran modified sweep.py with in range and out of range values to test.	2015-03-19 04:06:12 -04:00
Andreas Hansson	5275c9d740	mem: Use emplace front/back for deferred packets Embrace C++11 for the deferred packets as we actually store the objects in the data structure, and not just pointers.	2015-03-19 04:06:11 -04:00
Geoffrey Blake	1d403960af	mem: Enable CommMonitor to output traces in atomic mode The CommMonitor by default only allows memory traces to be gathered in timing mode. This patch allows memory traces to be gathered in atomic mode if all one needs is a functional trace of memory addresses used and timing information is of a secondary concern.	2015-03-19 04:06:10 -04:00
Steve Reinhardt	e57ab463cf	mem: remove redundant test in in Cache::recvTimingResp() For some reason we were checking mshr->hasTargets() even though we had already called mshr->getTarget() unconditionally earlier in the same function (which asserts if there are no targets). Get rid of this useless check, and while we're at it get rid of the redundant call to mshr->getTarget(), since we still have the value saved in a local var.	2015-02-11 10:48:53 -08:00
Steve Reinhardt	89bb03a1a6	mem: add local var in Cache::recvTimingResp() The main loop in recvTimingResp() uses target->pkt all over the place. Create a local tgt_pkt to help keep lines under the line length limit.	2015-02-11 10:48:52 -08:00
Steve Reinhardt	ee0b52404c	mem: restructure Packet cmd initialization a bit more Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.	2015-02-11 10:48:50 -08:00
Steve Reinhardt	ccef61d1cc	mem: clean up write buffer check in Cache::handleSnoop() The 'if (writebacks.size)' check was redundant, because writeBuffer.findMatches() would return false if the writebacks list was empty. Also renamed 'mshr' to 'wb_entry' in this context since we are pointing at a writebuffer entry and not an MSHR (even though it's the same C++ class).	2015-03-14 06:51:07 -07:00
Nilay Vaish	e5fbc67e16	cpu: o3: another assert instead of check	2015-03-09 09:39:08 -05:00
Nilay Vaish	5003ed5f7a	cpu: o3: Remove unused code in iew, add assert instead.	2015-03-09 09:39:08 -05:00
Nilay Vaish	4e1d10a3cf	cpu: o3: commit: mark pipeline delay variable as consts	2015-03-09 09:39:08 -05:00
Nilay Vaish	53de2512b1	cpu: o3: remove unused stat variables.	2015-03-09 09:39:08 -05:00
Nilay Vaish	54bc67f619	cpu: o3: combine if with same condition	2015-03-09 09:39:07 -05:00
Nilay Vaish	61edd5ac97	cpu: o3: remove member variable squashCounter The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call.	2015-03-09 09:39:07 -05:00
Nilay Vaish	f69a74fda6	cpu: o3: remove unused function annotateMemoryUnits()	2015-03-09 09:39:07 -05:00
Andreas Hansson	fc315901ff	mem: Unify all cache DPRINTF address formatting This patch changes all the DPRINTF messages in the cache to use '%#llx' every time a packet address is printed. The inclusion of '#' ensures '0x' is prepended, and since the address type is a uint64_t %x really should be %llx.	2015-03-02 04:00:56 -05:00
Andreas Hansson	88e2963951	mem: Fix cache MSHR conflict determination This patch fixes a rather subtle issue in the sending of MSHR requests in the cache, where the logic previously did not check for conflicts between the MSRH queue and the write queue when requests were not ready. The correct thing to do is to always check, since not having a ready MSHR does not guarantee that there is no conflict. The underlying problem seems to have slipped past due to the symmetric timings used for the write queue and MSHR queue. However, with the recent timing changes the bug caused regressions to fail.	2015-03-02 04:00:54 -05:00
Andreas Hansson	407737614e	mem: Add byte mask to Packet::checkFunctional This patch changes the valid-bytes start/end to a proper byte mask. With the changes in timing introduced in previous patches there are more packets waiting in queues, and there are regressions using the checker CPU failing due to non-contigous read data being found in the various cache queues. This patch also adds some more comments explaining what is going on, and adds the fourth and missing case to Packet::checkFunctional.	2015-03-02 04:00:52 -05:00
Stephan Diestelhorst	ecef1612b8	mem: Add option to force in-order insertion in PacketQueue By default, the packet queue is ordered by the ticks of the to-be-sent packages. With the recent modifications of packages sinking their header time when their resposne leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B,A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slighty. This is expected to occur rarely and not affect timing results.	2015-03-02 04:00:49 -05:00
Marco Balboni	d4ef8368aa	mem: Downstream components consumes new crossbar delays This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.	2015-03-02 04:00:48 -05:00
Andreas Hansson	36dc93a5fa	mem: Move crossbar default latencies to subclasses This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden.	2015-03-02 04:00:47 -05:00
Marco Balboni	d35dd71ab4	mem: Add crossbar latencies This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency. The latency of the request path through the xbar is set as --> frontEndLatency + forwardLatency In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter. --> frontEndLatency + SF(lookupLatency) + forwardLatency. The latency of the response path through the xbar is set instead as --> responseLatency. In case of snoop response, if the response is treated as a normal response the latency associated is again --> responseLatency; If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is --> snoopResponseLatency; Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.	2015-03-02 04:00:46 -05:00
Andreas Sandberg	7be9d4eb67	dev, arm: Clean up PL011 and rewrite interrupt handling The ARM PL011 UART model didn't clear and raise interrupts correctly. This changeset rewrites the whole interrupt handling and makes it both simpler and fixes several cases where the correct interrupts weren't raised or cleared. Additionally, it cleans up many other aspects of the code.	2015-03-02 04:00:44 -05:00
Andreas Hansson	d64b34bef8	arm: Share a port for the two table walker objects This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent. By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests.	2015-03-02 04:00:42 -05:00
Giacomo Gabrielli	bd70db5521	arm: Remove unnecessary dependencies between AArch64 FP instructions	2015-03-02 04:00:41 -05:00
Rekai	3d5434022a	cpu: o3 register renaming request handling improved Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not.	2015-03-02 04:00:38 -05:00
Andreas Hansson	987de4f5cc	mem: Tidy up the cache debug messages Avoid redundant inclusion of the name in the DPRINTF string.	2015-03-02 04:00:37 -05:00
Andreas Hansson	f26a289295	mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.	2015-03-02 04:00:35 -05:00
Ali Jafri	6ebe8d863a	mem: Fix prefetchSquash + memInhibitAsserted bug This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetth target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, it marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, while the current code is not written to handle such an outcome. Current code checks for the prefetchSquashed flag first, if it finds the flag, it deallocates the reserved MSHR. This leads to assert failure when the data from the writeback appears at cache. In this fix, we simply switch the order of checks. We first check for memInhibitAsserted and then for prefetch squashed.	2015-03-02 04:00:34 -05:00
Stephan Diestelhorst	de46eeade7	cpu: Add a PC-value to the traffic generator requests Have the traffic generator add its masterID as the PC address to the requests. That way, prefetchers (and other components) that use a PC for request classification will see per-tester streams of requests. This enables us to test strided prefetchers with the memchecker, too.	2015-03-02 04:00:31 -05:00
Andreas Sandberg	3b4ae7debb	arm: Don't truncate 16-bit ASIDs to 8 bits The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers and has a couple of inverted checks that mask out the high 8 bits of an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both of those issues.	2015-03-02 04:00:28 -05:00
Andreas Sandberg	804b11a3ed	arm: Correctly access the stack pointer in GDB We curently use INTREG_X31 instead of INTREG_SPX when accessing the stack pointer in GDB. gem5 normally uses INTREG_SPX to access the stack pointer, which gets mapped to the stack pointer corresponding (INTREG_SPn) to the current exception level. This changeset updates the GDB interface to use SPX instead of X31 (which is always zero) when transfering CPU state to gdb.	2015-03-02 04:00:27 -05:00
Andreas Sandberg	34dcd90b61	arm: Fix broken page table permissions checks in remote GDB The remote GDB interface currently doesn't check if translations are valid before reading memory. This causes a panic when GDB tries to access unmapped memory (e.g., when getting a stack trace). There are two reasons for this: 1) The function used to check for valid translations (virtvalid()) doesn't work and panics on invalid translations. 2) The method in the GDB interface used to test if a translation is valid (RemoteGDB::acc) always returns true regardless of the return from virtvalid(). This changeset fixes both of these issues.	2015-03-02 04:00:27 -05:00
Jason Power	670f44e05e	Ruby: Update backing store option to propagate through to all RubyPorts Previously, the user would have to manually set access_backing_store=True on all RubyPorts (Sequencers) in the config files. Now, instead there is one global option that each RubyPort checks on initialization. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-02-26 09:58:26 -06:00
Andreas Hansson	8c78aa31ea	cpu: TrafficGen sinks snoops without complaining To be able to use the TrafficGen in a system with caches we need to allow it to sink incoming snoop requests. By default the master port panics, so silently ignore any snoops.	2015-02-16 03:34:55 -05:00
Stephan Diestelhorst	93fa8e3cd4	mem: Fix initial value problem with MemChecker In highly loaded cases, reads might actually overlap with writes to the initial memory state. The mem checker needs to detect such cases and permit the read reading either from the writes (what it is doing now) or read from the initial, unknown value. This patch adds this logic.	2015-02-16 03:34:47 -05:00
Andreas Hansson	661dac1598	dev: Fix undefined behaviuor in i8254xGBe This patch fixes a rather unfortunate oversight where the annotation pointer was used even though it is null. Somehow the code still works, but UBSan is rather unhappy. The use is now guarded, and the variable is initialised in the constructor (as well as init()).	2015-02-16 03:34:35 -05:00
Andreas Sandberg	0a2ee77616	arm: Wire up the GIC with the platform in the base class Move the (common) GIC initialization code that notifies the platform code of the new GIC to the base class (BaseGic) instead of the Pl390 implementation.	2015-02-16 03:34:18 -05:00
Andreas Hansson	e17328a227	mem: mmap the backing store with MAP_NORESERVE This patch ensures we can run simulations with very large simulated memories (at least 64 TB based on some quick runs on a Linux workstation). In essence this allows us to efficiently deal with sparse address maps without having to implement a redirection layer in the backing store. This opens up for run-time errors if we eventually exhausts the hosts memory and swap space, but this should hopefully never happen.	2015-02-16 03:33:47 -05:00
Andreas Hansson	57758ca685	mem: Use the range cache for lookup as well as access This patch changes the range cache used in the global physical memory to be an iterator so that we can use it not only as part of isMemAddr, but also access and functionalAccess. This matches use-cases where a core is using the atomic non-caching memory mode, and repeatedly calls isMemAddr and access. Linux boot on aarch32, with a single atomic CPU, is now more than 30% faster when using "--fastmem" compared to not using the direct memory access.	2015-02-16 03:33:37 -05:00
Andreas Hansson	d0e1b8a19c	arch: Make readMiscRegNoEffect const throughout Finally took the plunge and made this apply to all ISAs, not just ARM.	2015-02-16 03:33:28 -05:00
Andreas Sandberg	5bfa7e3d59	arm: Merge ISA files with pseudo instructions This changeset moves the pseudo instructions used to signal unknown instructions and unimplemented instructions to the same source files as the decoder fault.	2015-02-16 03:32:58 -05:00
Ali Saidi	4eff4fa12e	cpu: add support for outputing a protobuf formatted CPU trace Doesn't support x86 due to static instruction representation. --HG-- rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py	2015-02-16 03:32:38 -05:00
Marco Balboni	268d9e59c5	mem: Clarification of packet crossbar timings This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.	2015-02-11 10:23:47 -05:00
Marco Balboni	e2828587b3	mem: Clarify usage of latency in the cache This patch adds some much-needed clarity in the specification of the cache timing. For now, hit_latency and response_latency are kept as top-level parameters, but the cache itself has a number of local variables to better map the individual timing variables to different behaviours (and sub-components). The introduced variables are: - lookupLatency: latency of tag lookup, occuring on any access - forwardLatency: latency that occurs in case of outbound miss - fillLatency: latency to fill a cache block We keep the existing responseLatency The forwardLatency is used by allocateInternalBuffer() for: - MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer); - MSHR allocateMissBuffer (cacheable miss in MSHR queue); - MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR queue) It is our assumption that the time for the above three buffers is the same. Similarly, for snoop responses passing through the cache we use forwardLatency.	2015-02-11 10:23:36 -05:00
Andreas Hansson	6563ec8634	cpu: Tidy up the MemTest and make false sharing more obvious The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed. The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work. In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.	2015-02-11 10:23:28 -05:00
Andreas Sandberg	550c318490	sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh	2015-02-11 10:23:27 -05:00
Andreas Sandberg	9e6f803254	base: Add compiler macros to add deprecation warnings Gcc and clang both provide an attribute that can be used to flag a function as deprecated at compile time. This changeset adds a gem5 compiler macro for that compiler feature. The macro can be used to indicate that a legacy API within gem5 has been deprecated and provide a graceful migration to the new API.	2015-02-11 10:23:24 -05:00
Andreas Hansson	c9b8616c51	base: Do not dereference NULL in CompoundFlag creation This patch fixes the CompoundFlag constructor, ensuring that it does not dereference NULL. Doing so has undefined behaviuor, and both clang and gcc's undefined-behaviour sanitiser was rather unhappy.	2015-02-11 10:23:23 -05:00
Andreas Sandberg	431a6d708b	dev: Remove unused system pointer in the Platform base class The Platform base class contains a pointer to an instance of the System which is never initialized. This can lead to subtle bugs since some architecture-specific platform implementations contain their own system pointer which is normally used. However, if the platform is accessed through a pointer to its base class, the dangling pointer will be used instead.	2015-02-11 10:23:22 -05:00
Alexandru Dutu	ad1b177550	cpu: Idle CPU status logic revised This patch sets the CPU status to idle when the last active thread gets suspended.	2015-02-06 18:01:22 -08:00
Andreas Hansson	461a80beb3	mem: Clarify express snoop behaviour This patch adds a bit of documentation with insights around how express snoops really work.	2015-02-03 14:26:02 -05:00
Andreas Hansson	193325ff60	mem: Clarify cache behaviour for pending dirty responses This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made.	2015-02-03 14:25:59 -05:00
Curtis Dunham	f0a764edc6	base: add an accessor and operators ==,!= to address ranges	2015-02-03 14:25:58 -05:00
Andreas Hansson	ccb512ecc1	base: Add XOR-based hashed address interleaving This patch extends the current address interleaving with basic hashing support. Instead of directly comparing a number of address bits with a matching value, it is now possible to use two independent set of address bits XOR'ed together. This avoids issues where strided address patterns are heavily biased to a subset of the interleaved ranges.	2015-02-03 14:25:54 -05:00
Andreas Hansson	5ea60a95b3	config: Adjust DRAM channel interleaving defaults This patch changes the DRAM channel interleaving default behaviour to be more representative. The default address mapping (RoRaBaCoCh) moves the channel bits towards the least significant bits, and uses 128 byte as the default channel interleaving granularity. These defaults can be overridden if desired, but should serve as a sensible starting point for most use-cases.	2015-02-03 14:25:52 -05:00
Andreas Sandberg	fe200c2487	sim: Remove test for non-NULL this in Event The method Event::initialized() tests if this != NULL as a part of the expression that tests if an event is initialized. The only case when this check could be false is if the method is called on a null pointer, which is illegal and leads to undefined behavior (such as eating your pets) according to the C++ standard. Because of this, modern compilers (specifically, recent versions of clang) warn about this which we treat as an error. This changeset removes the redundant check to fix said warning.	2015-02-03 14:25:48 -05:00
Andreas Sandberg	851b29ad20	dev: Correctly clear interrupts in VirtIO PCI Correctly clear the PCI interrupt belonging to a VirtIO device when the ISR register is read.	2015-02-03 14:25:47 -05:00
Curtis Dunham	b89fd57663	sim: prioritize async events; prevent starvation If a time quantum event is the only one in the queue, async events (Ctrl-C, I/O, etc.) will never be processed. So process them first.	2014-12-19 15:32:34 -06:00
Andreas Hansson	20111ba917	cpu: Ensure timing CPU sinks response before sending new request This patch changes how the timing CPU deals with processing responses, always scheduling an event, even if it is for the current tick. This helps to avoid situations where a new request shows up before a response is finished in the crossbar, and also is more in line with any realistic behaviour.	2015-02-03 14:25:27 -05:00
Geoffrey Blake	3e33786db8	config: Fix typo in Float param The Float param was not settable on the command line due to a typo in the class definition in python/m5/params.py. This corrects the typo and allows floats to be set on the command line as intended.	2015-02-03 14:25:07 -05:00
Ali Saidi	89b3616d7e	arm: always set the IsFirstMicroop flag While the IsFirstMicroop flag exists it was only occasionally used in the ARM instructions that gem5 microOps and therefore couldn't be relied on to be correct.	2015-01-25 07:22:56 -05:00
Ali Saidi	9d8ddd92dc	sim: Clean up InstRecord Track memory size and flags as well as add some comments and consts.	2015-01-25 07:22:44 -05:00
Ali Saidi	f6742ea26e	cpu: Remove all notion that we know when the cpu is misspeculating. We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do.	2015-01-25 07:22:26 -05:00
Ali Saidi	0bd986015b	cpu: Put all CPU instruction tracers in a single file	2015-01-25 07:22:17 -05:00
Ali Saidi	6c4a23c1c6	cpu: remove legion tracer If someone wants to debug with legion again they can restore the code from the repository, but no need to have it hang around indefinately.	2015-01-25 07:22:05 -05:00
Curtis Dunham	10b5e5431d	sim: fix reference counting of PythonEvent When gem5 is a slave to another simulator and the Python is only used to initialize the configuration (and not perform actual simulation), a "debug start" (--debug-start) event will get freed during or immediately after the initial Python frame's execution rather than remaining in the event queue. This tricky patch fixes the GC issue causing this.	2014-12-23 11:51:40 -06:00
Andreas Hansson	10c69bb168	mem: Remove unused Packet src and dest fields This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remember a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing the crossbar is now responsible without relying on the packet fields. Thus, these variables are now unused and can be removed.	2015-01-22 05:01:31 -05:00
Andreas Hansson	15c64035ed	mem: Remove Packet source from ForwardResponseRecord This patch removes the source field from the ForwardResponseRecord, but keeps the class as it is part of how the cache identifies responses to hardware prefetches that are snooped upwards.	2015-01-22 05:01:30 -05:00
Andreas Hansson	0c2ffd2daa	mem: Remove unused RequestState in the bridge This patch removes the bridge sender state as the Crossbar now takes care of remembering its own routing decisions.	2015-01-22 05:01:27 -05:00
Andreas Hansson	00536b0efc	mem: Always use SenderState for response routing in RubyPort This patch aligns how the response routing is done in the RubyPort, using the SenderState for both memory and I/O accesses. Before this patch, only the I/O used the SenderState, whereas the memory accesses relied on the src field in the packet. With this patch we shift to using SenderState in both cases, thus not relying on the src field any longer.	2015-01-22 05:01:24 -05:00
Andreas Hansson	072f78471d	mem: Make the XBar responsible for tracking response routing This patch removes the need for a source and destination field in the packet by shifting the onus of the tracking to the crossbar, much like a real implementation. This change in behaviour also means we no longer need a SenderState to remember the source/dest when ever we have multiple crossbars in the system. Thus, the stack that was created by the SenderState is not needed, and each crossbar locally tracks the response routing. The fields in the packet are still left behind as the RubyPort (which also acts as a crossbar) does routing based on them. In the succeeding patches the uses of the src and dest field will be removed. Combined, these patches improve the simulation performance by roughly 2%.	2015-01-22 05:01:14 -05:00
Andreas Hansson	ce12d4bc63	x86: Delay X86 table walk on receiving walker response This patch fixes a minor issue in the X86 page table walker where it ended up sending new request packets to the crossbar before the response processing was finished (recvTimingResp is directly calling sendTimingReq). Under certain conditions this caused the crossbar to see illegal combinations of request/response overlap, in turn causing problems with a slightly modified crossbar implementation.	2015-01-22 05:00:54 -05:00
Andreas Hansson	f49830ce0b	mem: Clean up Request initialisation This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we callede setVirt instead uses the appropriate constructor.	2015-01-22 05:00:53 -05:00
Nikos Nikoleris	a35283ac65	cpu: commit probe notification on every microop or macroop The ppCommit should notify the attached listener every time the cpu commits a microop or non microcoded insturction. The listener can then decide whether it will process only the last microop (eg. SimPoint probe). Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-20 14:15:27 -06:00
Andreas Hansson	6096e2f9c1	mem: Fix bug in cache request retry mechanism This patch ensures that inhibited packets that are about to be turned into express snoops do not update the retry flag in the cache.	2015-01-20 08:12:01 -05:00
Andreas Hansson	da0c770943	cpu: Fix retry bug in MinorCPU LSQ	2015-01-20 08:11:58 -05:00
Andreas Hansson	92585d60c9	mem: Move DRAM interleaving check to init This patch fixes a bug where the DRAM controller tried to access the system cacheline size before the system pointer was initialised. It also fixes a bug where the granularity is 0 (no interleaving).	2015-01-20 08:11:55 -05:00
Emilio Castillo	7bb65dd434	x86 : fxsave and fxrestore missing template code This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for saving/restore the FP registers is in the file but it was not used. The FXSAVE and FXRSTOR instructions are used in the kernel for saving and loading the state of the mmx,xmm and fpu registers. This operation is triggered in FS by issuing a Device Not Available Fault. The cr0 register has a TS flag that is set upon each context change. Every time a task access any FP related register (SIMD as well) if the TS flag is set to one, the device not available fault is issued. The kernel saves the current state of the registers, and restore the previous state of the currently running task. Right now Gem5 lacks of this capability. the Device Not Available Fault is never issued, leading to several problems when different threads share the same CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this behavior. In order to test this a hack in the atomic cpu code was done to detect if a static instruction has any FP operands and the cr0 reg TS bit is set. This check must be done in the ISA dependent code. But it seems to be tricky to access the cr0 register while executing an instruction. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-10 14:30:53 -06:00
Nikos Nikoleris	ec64b81a9d	cpu: fix RetiredStores probe point Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-10 14:30:53 -06:00
cdirik	1693e526d0	dev: prevent intel 8254 timer counter events firing before startup This change includes edits to Intel8254Timer to prevent counter events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-06 15:10:22 -07:00
Gabe Black	1c1fb2c988	test: Add a unittest for the BitUnion types.	2015-01-07 00:34:40 -08:00
Gabe Black	86dea86987	base: Fix assigning between identical bitfields. If two bitfields are of the same type, also implying that they have the same first and last bit positions, the existing implementation would copy the entire bitfield. That includes the __data member which is shared among all the bitfields, effectively overwritting the entire bitunion. This change also adjusts the write only signed bitfield assignment operator to be like the unsigned version, using "using" instead of implementing it again and calling down to the underlying implementation.	2015-01-07 00:31:46 -08:00
Gabe Black	cd6380605c	x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield. These are for the monitor/mwait instructions, SSSE3, and XSAVE.	2015-01-06 22:15:00 -08:00
Gabe Black	cb181d6f91	cpuid, x86: Revert "Enabling more features in CPUid" That change enables CPUID bits for features that aren't implemented in gem5. If a simulated system tries to use those features because it was told it could, bad things can happen.	2015-01-06 22:13:56 -08:00
Andrew Lukefahr	6d32004407	minor: fixed LSQ MasterPortID Minor was reporting the data cache access as ".inst" accesses. This just switches the MasterPortID to dataMasterPortId. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
mike upton	cb911559dc	arm: Add unlinkat syscall implementation added ARM aarch64 unlinkat syscall support, modeled on other <xxx>at syscalls. This gets all of the cpu2006 int workloads passing in SE mode on aarch64. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
Maxime Martinasso	5a5416d575	x86: implements the simd128 ADDSUBPD instruction This patch implements the simd128 ADDSUBPD instruction for the x86 architecture. Tested with a simple program in assembly language which executes the instruction. Checked that different versions of the instruction are executed by using the execution tracing option. Committed by: Nilay Vaish <nilay@cs.wisc.edu	2015-01-03 17:51:48 -06:00
Cagdas Dirik	02c376ac44	dev: prevent RTC events firing before startup This change includes edits to MC146818 timer to prevent RTC events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-01-03 17:51:48 -06:00
Joel Hestness	642b9b4fab	syscall_emul: Return correct writev value According to Linux man pages, if writev is successful, it returns the total number of bytes written. Otherwise, it returns an error code. Instead of returning 0, return the result from the actual call to writev in the system call.	2014-12-27 13:48:40 -06:00
Mitch Hayenga	b2342c5d9a	mem: Change prefetcher to use random_mt Prefechers has used rand() to generate random numers previously.	2014-12-23 09:31:19 -05:00
Curtis Dunham	516e6046ae	mem: Hide WriteInvalidate requests from prefetchers Without this tweak, a prefetcher will happily prefetch data that will promptly be invalidated and overwritten by a WriteInvalidate.	2014-12-23 09:31:19 -05:00
Mitch Hayenga	bd4f901c77	mem: Fix event scheduling issue for prefetches The cache's MemSidePacketQueue schedules a sendEvent based upon nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever a future prefetch is ready. However, a prefetch being ready does not guarentee that it can obtain an MSHR. So, when all MSHRs are full, the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond until an MSHR is finally freed and the prefetch can happen. This patch fixes this by not signaling the prefetch ready time if the prefetch could not be generated. The event is rescheduled as soon as a MSHR becomes available.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	4acd4a2055	mem: Fix bug relating to writebacks and prefetches Previously the code commented about an unhandled case where it might be possible for a writeback to arrive after a prefetch was generated but before it was sent to the memory system. I hit that case. Luckily the prefetchSquash() logic already in the code handles dropping prefetch request in certian circumstances.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	df82a2d003	mem: Rework the structuring of the prefetchers Re-organizes the prefetcher class structure. Previously the BasePrefetcher forced multiple assumptions on the prefetchers that inherited from it. This patch makes the BasePrefetcher class truly representative of base functionality. For example, the base class no longer enforces FIFO order. Instead, prefetchers with FIFO requests (like the existing stride and tagged prefetchers) now inherit from a new QueuedPrefetcher base class. Finally, the stride-based prefetcher now assumes a custimizable lookup table (sets/ways) rather than the previous fully associative structure.	2014-12-23 09:31:18 -05:00
Mitch Hayenga	6cb58b2bd2	mem: Add parameter to reserve MSHR entries for demand access Adds a new parameter that reserves some number of MSHR entries for demand accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand requests from the CPU to stall.	2014-12-23 09:31:18 -05:00
Curtis Dunham	4d88978913	arm: Add stats to table walker This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion	2014-12-23 09:31:18 -05:00
Andreas Hansson	59460b91f3	config: Expose the DRAM ranks as a command-line option This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels). The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding).	2014-12-23 09:31:18 -05:00
Andreas Hansson	2f7baf9dbe	mem: Ensure DRAM controller is idle when in atomic mode This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller forces the simulator to switch out of the KVM mode, thus killing performance. The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions. The switcheroo regression require a minor stats bump as a result.	2014-12-23 09:31:18 -05:00
Omar Naji	381d1da791	mem: Add rank-wise refresh to the DRAM controller This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank.	2014-12-23 09:31:18 -05:00
Omar Naji	152c02354e	mem: Fix a bug in the DRAM controller arbitration Fix a minor issue that affects multi-rank systems.	2014-12-23 09:31:18 -05:00
Kanishk Sugand	7a25b1a0e0	mem: Add stack distance statistics to the CommMonitor This patch adds the stack distance calculator to the CommMonitor. The stats are disabled by default.	2014-12-23 09:31:18 -05:00
Kanishk Sugand	888975b29d	mem: Add a stack distance calculator This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU Distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043. For each transaction a hashtable look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represets the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64>::max(). In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, Writebacks to the lower level (e.g. membus from L2), can be marked instead of being removed from the stack (isMarked flag of Node set to True). And then later if this same address is accessed (by L1), the value of the isMarked flag would be True. This gives some insight on how the Writeback policy of the lower level affect the read/write accesses in an application. Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time.	2014-12-23 09:31:18 -05:00
Marco Elver	dd0f3943e2	mem: Add MemChecker and MemCheckerMonitor This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	184fefbb3b	arm: Raise an alignment fault if a PC has illegal alignment We currently don't handle unaligned PCs correctly. There is one check for unaligned PCs in the TLB when running in aarch64 mode, but this check does not cover cases where the CPU does not do a TLB lookup when decoding an instruction (e.g., a branch stays within the same cache line). Additionally, the Decoder class sometimes throws an assertion for unaligned PCs which breaks speculation. This changeset introduces a decoder fault bit field in the ExtMachInst structure. This field can be used to signal a decoder failure. If set, the decoder generates an internal gem5fault instruction instead of a normal instruction. This instruction in turns either panics (fault type PANIC), returns an PCAlignmentFault (fault type UNALIGNED, aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32). The patch causes minor changes to the realview64 regressions, and a stats bump will follow.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	b33812ba43	arm: Clean up and document decoder API This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	070b4a81db	arm: Add support for filtering in the PMU This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.	2014-12-23 09:31:17 -05:00
Gabe Black	70eb68beae	Let other objects set up memory like regions in a KVM VM.	2014-12-09 21:53:44 -08:00
Andreas Sandberg	9b7578d8c7	arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0 The aarch64 system register decoder is currently not decoding PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the decoder so that they are decoded using the values in table C5-6 in ARM DDI 0478A.c.	2014-12-08 04:49:53 -05:00
Andreas Sandberg	6a9fbd295d	dev: Add response sanity checks in PioPort Add an assert in the PioPort that checks if a response packet from a device has the right flags set before passing it to them rest of the memory system.	2014-12-08 04:49:52 -05:00
Andreas Sandberg	1ccc4e0e21	dev: Correctly transform packets into responses The VirtIO devices didn't correctly set the response flags in memory packets. This changeset adds the required Packet::makeResponse() calls.	2014-12-08 04:49:51 -05:00
Gabe Black	4a8a0a0798	misc: Generalize GDB single stepping. The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.	2014-12-05 22:37:03 -08:00
Gabe Black	fb07d43b1a	x86: Implement a remote GDB stub. This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.	2014-12-05 22:36:16 -08:00
Gabe Black	16c9b41616	misc: Add some utility functions for schedule inst commit events. These can be used to simplify the implementation of single step in derived classes.	2014-12-05 22:35:47 -08:00
Gabe Black	cddf988bfd	misc: Rename the GDB "Event" event class to InputEvent. The "Event" name is the same as the base event class. That's a bit confusing, and makes it a little awkward to add other event types.	2014-12-05 22:34:42 -08:00
Gabe Black	f9f46b8fa9	sim: Ensure GDB interrupts the simulation at an instruction boundary. Use the comInstEventQueue to ensure GDB interrupts the simulation at an instruction boundary and not in the middle of a macroop, memory access, etc.	2014-12-05 01:51:49 -08:00
Gabe Black	bacbb8ecbc	cpu: Only check for PC events on instruction boundaries. Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing.	2014-12-05 01:47:35 -08:00
Gabe Black	fe48c0a32b	misc: Make the GDB register cache accessible in various sized chunks. Not all ISAs have 64 bit sized registers, so it's not always very convenient to access the GDB register cache in 64 bit sized chunks. This change makes it accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations were working around that limitation by bundling and unbundling 32 bit values into 64 bit values. That code has been removed.	2014-12-05 01:44:24 -08:00
Gabe Black	22aaa5867f	x86: Rework opcode parsing to support 3 byte opcodes properly. Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measureably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa	2014-12-04 15:53:54 -08:00
Gabe Black	3069c28a02	arch: Allow named constants as decode case values. The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value, it might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpretted as C++.	2014-12-04 15:52:48 -08:00
Gabe Black	d67cf81f5d	x86: Clean up style in process.cc.	2014-12-02 22:01:51 -08:00
Gabe Black	2d9dae01fb	sim: Make it possible to override the breakpoint length check. The check which makes sure the length of the breakpoint being written is the same as a MachInst is only correct on fixed instruction width ISAs. Instead of incorrectly applying that check to all ISAs, this change makes that the default check and lets ISA specific GDB classes override it.	2014-12-03 03:27:19 -08:00
Gabe Black	ecec8cde63	ide: Accept the IDLE (0xe3) ATA command. This command is supposed to set up a timer which will put the drive into a standby mode if it isn't sent a command within a given time out. Since most of the timeouts are generally significantly longer than a simulation would run anyway, and we don't have an implementation for standby mode to begin with, we can accept the command, do nothing, and report success.	2014-12-03 03:07:35 -08:00
Gabe Black	bce58726f3	dev: Support translating left and right ALT keys. This is used primarily for VNC.	2014-12-03 03:06:03 -08:00
Andreas Hansson	966c3f4bc5	scons: Ensure dictionary iteration is sorted by key This patch adds sorting based on the SimObject name or parameter name for all situations where we iterate over dictionaries. This should ensure a deterministic and consistent order across the host systems and hopefully avoid regression results differing across python versions.	2014-12-02 06:08:22 -05:00
Curtis Dunham	5d22250845	mem: Support WriteInvalidate (again) This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.	2014-12-02 06:08:19 -05:00
Curtis Dunham	7ca27dd3cc	mem: Remove WriteInvalidate support Prepare for a different implementation following in the next patch	2014-12-02 06:08:17 -05:00
Andrew Bardsley	df37cad0fd	cpu: Fix retries on barrier/store in Minor's store buffer This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is pre-maturely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To acheive this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.	2014-12-02 06:08:15 -05:00
Andrew Bardsley	98f3e7a310	cpu: Fix memoryIssueLimit checking in Minor This patch fixes the checking of the number of memory instructions issued per cycles in the Minor CPU.	2014-12-02 06:08:13 -05:00
Andrew Bardsley	3cd0b1f6a6	arm: Fix TLB ignoring faults when table walking This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which BaseTLB::Translation::finish.	2014-12-02 06:08:11 -05:00
Marco Elver	9649395f85	cpu, o3: Ignored invalidate causing same-address load reordering In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later loads fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.	2014-12-02 06:08:03 -05:00
Andreas Hansson	74bbe20141	cpu: Always mask the snoop address when performing lock check Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.	2014-12-02 06:08:00 -05:00
Stephan Diestelhorst	810349a8a7	cpu: Move packet deallocation to recvTimingResp in the O3 CPU Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.	2014-12-02 06:07:58 -05:00
Andreas Hansson	5c84157c29	mem: Relax packet src/dest check and shift onus to crossbar This patch allows objects to get the src/dest of a packet even if it is not set to a valid port id. This simplifies (ab)using the bridge as a buffer and latency adapter in situations where the neighbouring MemObjects are not crossbars. The checks that were done in the packet are now shifted to the crossbar where the fields are used to index into the port arrays. Thus, the carrier of the information is not burdened with checking, and the crossbar can check not only that the destination is set, but also that the port index is within limits.	2014-12-02 06:07:56 -05:00
Andreas Hansson	ea5ccc7041	mem: Clean up packet data allocation This patch attempts to make the rules for data allocation in the packet explicit, understandable, and easy to verify. The constructor that copies a packet is extended with an additional flag "alloc_data" to enable the call site to explicitly say whether the newly created packet is short-lived (a zero-time snoop), or has an unknown life-time and therefore should allocate its own data (or copy a static pointer in the case of static data). The tricky case is the static data. In essence this is a copy-avoidance scheme where the original source of the request (DMA, CPU etc) does not ask the memory system to return data as part of the packet, but instead provides a pointer, and then the memory system carries this pointer around, and copies the appropriate data to the location itself. Thus any derived packet actually never copies any data. As the original source does not copy any data from the response packet when arriving back at the source, we must maintain the copy of the original pointer to not break the system. We might want to revisit this one day and pay the price for a few extra memcpy invocations. All in all this patch should make it easier to grok what is going on in the memory system and how data is actually copied (or not).	2014-12-02 06:07:54 -05:00
Andreas Hansson	f012166bb6	mem: Cleanup Packet::checkFunctional and hasData usage This patch cleans up the use of hasData and checkFunctional in the packet. The hasData function is unfortunately suggesting that it checks if the packet has a valid data pointer, when it does in fact only check if the specific packet type is specified to have a data payload. The confusion led to a bug in checkFunctional. The latter function is also tidied up to avoid name overloading.	2014-12-02 06:07:52 -05:00
Andreas Hansson	a2ee51f631	mem: Make the requests carried by packets const This adds a basic level of sanity checking to the packet by ensuring that a request is not modified once the packet is created. The only issue that had to be worked around is the relaying of software-prefetches in the cache. The specific situation is now solved by first copying the request, and then creating a new packet accordingly.	2014-12-02 06:07:50 -05:00
Andreas Hansson	fa60d5cf27	mem: Make Request getters const This patch tidies up the Request class, making all getters const. The odd one out is incAccessDepth which is called by the memory system as packets carry the request around. This is also const to enable the packet to hold on to a const Request.	2014-12-02 06:07:48 -05:00
Andreas Hansson	3d6ec81e66	mem: Add checks and explanation for assertMemInhibit usage	2014-12-02 06:07:46 -05:00
Andreas Hansson	41846cb61b	mem: Assume all dynamic packet data is array allocated This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic was in the Ruby testers. The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks. As the last part the patch, it renames dataDynamicArray to dataDynamic.	2014-12-02 06:07:43 -05:00
Andreas Hansson	5df96cb690	mem: Remove redundant Packet::allocate calls This patch cleans up the packet memory allocation confusion. The data is always allocated at the requesting side, when a packet is created (or copied), and there is never a need for any device to allocate any space if it is merely responding to a paket. This behaviour is in line with how SystemC and TLM works as well, thus increasing interoperability, and matching established conventions. The redundant calls to Packet::allocate are removed, and the checks in the function are tightened up to make sure data is only ever allocated once. There are still some oddities in the packet copy constructor where we copy the data pointer if it is static (without ownership), and allocate new space if the data is dynamic (with ownership). The latter is being worked on further in a follow-on patch.	2014-12-02 06:07:41 -05:00
Andreas Hansson	0706a25203	mem: Use const pointers for port proxy write functions This patch changes the various write functions in the port proxies to use const pointers for all sources (similar to how memcpy works). The one unfortunate aspect is the need for a const_cast in the packet, to avoid having to juggle a const and a non-const data pointer. This design decision can always be re-evaluated at a later stage.	2014-12-02 06:07:38 -05:00
Andreas Hansson	9779ba2e37	mem: Add const getters for write packet data This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used. The patch also removes the unused isReadWrite function.	2014-12-02 06:07:36 -05:00
Andreas Hansson	25bfc24999	mem: Remove null-check bypassing in Packet::getPtr This patch removes the parameter that enables bypassing the null check in the Packet::getPtr method. A number of call sites assume the value to be non-null. The one odd case is the RubyTester, which issues zero-sized prefetches(!), and despite being reads they had no valid data pointer. This is now fixed, but the size oddity remains (unless anyone object or has any good suggestions). Finally, in the Ruby Sequencer, appropriate checks are made for flush packets as they have no valid data pointer.	2014-12-02 06:07:34 -05:00
Omar Naji	0e63d2cd62	mem: Add a GDDR5 DRAM config This patch adds a first cut GDDR5 config to accommodate the users combining gem5 and GPUSim. The config is based on a SK Hynix datasheet, and the Nvidia GTX580 specification. Someone from the GPUSim user-camp should tweak the default page-policy and static frontend and backend latencies.	2014-12-02 06:07:32 -05:00
Andreas Hansson	d66b14ca61	misc: Another round of static analysis fixups Mostly addressing uninitialised members.	2014-11-24 09:03:38 -05:00

... 8 9 10 11 12 ...

7425 commits