sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Gabe Black	3069c28a02	arch: Allow named constants as decode case values. The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value, it might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpreted as C++.	2014-12-04 15:52:48 -08:00
Gabe Black	d67cf81f5d	x86: Clean up style in process.cc.	2014-12-02 22:01:51 -08:00
Gabe Black	2d9dae01fb	sim: Make it possible to override the breakpoint length check. The check which makes sure the length of the breakpoint being written is the same as a MachInst is only correct on fixed instruction width ISAs. Instead of incorrectly applying that check to all ISAs, this change makes that the default check and lets ISA specific GDB classes override it.	2014-12-03 03:27:19 -08:00
Gabe Black	ecec8cde63	ide: Accept the IDLE (0xe3) ATA command. This command is supposed to set up a timer which will put the drive into a standby mode if it isn't sent a command within a given time out. Since most of the timeouts are generally significantly longer than a simulation would run anyway, and we don't have an implementation for standby mode to begin with, we can accept the command, do nothing, and report success.	2014-12-03 03:07:35 -08:00
Gabe Black	bce58726f3	dev: Support translating left and right ALT keys. This is used primarily for VNC.	2014-12-03 03:06:03 -08:00
Andreas Hansson	966c3f4bc5	scons: Ensure dictionary iteration is sorted by key This patch adds sorting based on the SimObject name or parameter name for all situations where we iterate over dictionaries. This should ensure a deterministic and consistent order across the host systems and hopefully avoid regression results differing across python versions.	2014-12-02 06:08:22 -05:00
Curtis Dunham	5d22250845	mem: Support WriteInvalidate (again) This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.	2014-12-02 06:08:19 -05:00
Curtis Dunham	7ca27dd3cc	mem: Remove WriteInvalidate support Prepare for a different implementation following in the next patch	2014-12-02 06:08:17 -05:00
Andrew Bardsley	df37cad0fd	cpu: Fix retries on barrier/store in Minor's store buffer This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is prematurely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To achieve this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.	2014-12-02 06:08:15 -05:00
Andrew Bardsley	98f3e7a310	cpu: Fix memoryIssueLimit checking in Minor This patch fixes the checking of the number of memory instructions issued per cycle in the Minor CPU.	2014-12-02 06:08:13 -05:00
Andrew Bardsley	3cd0b1f6a6	arm: Fix TLB ignoring faults when table walking This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which call BaseTLB::Translation::finish.	2014-12-02 06:08:11 -05:00
Marco Elver	9649395f85	cpu, o3: Ignored invalidate causing same-address load reordering In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.	2014-12-02 06:08:03 -05:00
Andreas Hansson	74bbe20141	cpu: Always mask the snoop address when performing lock check Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.	2014-12-02 06:08:00 -05:00
Stephan Diestelhorst	810349a8a7	cpu: Move packet deallocation to recvTimingResp in the O3 CPU Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.	2014-12-02 06:07:58 -05:00
Andreas Hansson	5c84157c29	mem: Relax packet src/dest check and shift onus to crossbar This patch allows objects to get the src/dest of a packet even if it is not set to a valid port id. This simplifies (ab)using the bridge as a buffer and latency adapter in situations where the neighbouring MemObjects are not crossbars. The checks that were done in the packet are now shifted to the crossbar where the fields are used to index into the port arrays. Thus, the carrier of the information is not burdened with checking, and the crossbar can check not only that the destination is set, but also that the port index is within limits.	2014-12-02 06:07:56 -05:00
Andreas Hansson	ea5ccc7041	mem: Clean up packet data allocation This patch attempts to make the rules for data allocation in the packet explicit, understandable, and easy to verify. The constructor that copies a packet is extended with an additional flag "alloc_data" to enable the call site to explicitly say whether the newly created packet is short-lived (a zero-time snoop), or has an unknown life-time and therefore should allocate its own data (or copy a static pointer in the case of static data). The tricky case is the static data. In essence this is a copy-avoidance scheme where the original source of the request (DMA, CPU etc) does not ask the memory system to return data as part of the packet, but instead provides a pointer, and then the memory system carries this pointer around, and copies the appropriate data to the location itself. Thus any derived packet actually never copies any data. As the original source does not copy any data from the response packet when arriving back at the source, we must maintain the copy of the original pointer to not break the system. We might want to revisit this one day and pay the price for a few extra memcpy invocations. All in all this patch should make it easier to grok what is going on in the memory system and how data is actually copied (or not).	2014-12-02 06:07:54 -05:00
Andreas Hansson	f012166bb6	mem: Cleanup Packet::checkFunctional and hasData usage This patch cleans up the use of hasData and checkFunctional in the packet. The hasData function is unfortunately suggesting that it checks if the packet has a valid data pointer, when it does in fact only check if the specific packet type is specified to have a data payload. The confusion led to a bug in checkFunctional. The latter function is also tidied up to avoid name overloading.	2014-12-02 06:07:52 -05:00
Andreas Hansson	a2ee51f631	mem: Make the requests carried by packets const This adds a basic level of sanity checking to the packet by ensuring that a request is not modified once the packet is created. The only issue that had to be worked around is the relaying of software-prefetches in the cache. The specific situation is now solved by first copying the request, and then creating a new packet accordingly.	2014-12-02 06:07:50 -05:00
Andreas Hansson	fa60d5cf27	mem: Make Request getters const This patch tidies up the Request class, making all getters const. The odd one out is incAccessDepth which is called by the memory system as packets carry the request around. This is also const to enable the packet to hold on to a const Request.	2014-12-02 06:07:48 -05:00
Andreas Hansson	3d6ec81e66	mem: Add checks and explanation for assertMemInhibit usage	2014-12-02 06:07:46 -05:00
Andreas Hansson	41846cb61b	mem: Assume all dynamic packet data is array allocated This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic were in the Ruby testers. The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks. As the last part of the patch, it renames dataDynamicArray to dataDynamic.	2014-12-02 06:07:43 -05:00
Andreas Hansson	5df96cb690	mem: Remove redundant Packet::allocate calls This patch cleans up the packet memory allocation confusion. The data is always allocated at the requesting side, when a packet is created (or copied), and there is never a need for any device to allocate any space if it is merely responding to a packet. This behaviour is in line with how SystemC and TLM work as well, thus increasing interoperability, and matching established conventions. The redundant calls to Packet::allocate are removed, and the checks in the function are tightened up to make sure data is only ever allocated once. There are still some oddities in the packet copy constructor where we copy the data pointer if it is static (without ownership), and allocate new space if the data is dynamic (with ownership). The latter is being worked on further in a follow-on patch.	2014-12-02 06:07:41 -05:00
Andreas Hansson	0706a25203	mem: Use const pointers for port proxy write functions This patch changes the various write functions in the port proxies to use const pointers for all sources (similar to how memcpy works). The one unfortunate aspect is the need for a const_cast in the packet, to avoid having to juggle a const and a non-const data pointer. This design decision can always be re-evaluated at a later stage.	2014-12-02 06:07:38 -05:00
Andreas Hansson	9779ba2e37	mem: Add const getters for write packet data This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used. The patch also removes the unused isReadWrite function.	2014-12-02 06:07:36 -05:00
Andreas Hansson	25bfc24999	mem: Remove null-check bypassing in Packet::getPtr This patch removes the parameter that enables bypassing the null check in the Packet::getPtr method. A number of call sites assume the value to be non-null. The one odd case is the RubyTester, which issues zero-sized prefetches(!), and despite being reads they had no valid data pointer. This is now fixed, but the size oddity remains (unless anyone objects or has any good suggestions). Finally, in the Ruby Sequencer, appropriate checks are made for flush packets as they have no valid data pointer.	2014-12-02 06:07:34 -05:00
Omar Naji	0e63d2cd62	mem: Add a GDDR5 DRAM config This patch adds a first cut GDDR5 config to accommodate the users combining gem5 and GPUSim. The config is based on a SK Hynix datasheet, and the Nvidia GTX580 specification. Someone from the GPUSim user-camp should tweak the default page-policy and static frontend and backend latencies.	2014-12-02 06:07:32 -05:00
Andreas Hansson	d66b14ca61	misc: Another round of static analysis fixups Mostly addressing uninitialised members.	2014-11-24 09:03:38 -05:00
Alexandru Dutu	1f539f13c3	mem: Page Table map api modification This patch adds uncacheable/cacheable and read-only/read-write attributes to the map method of PageTableBase. It also modifies the constructor of TlbEntry structs for all architectures to consider the new attributes.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	c11bcb8119	mem: Multi Level Page Table bug fix The multi level page table was giving false positives for already mapped translations. This patch fixes the bogus behavior.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	e4859fae5b	mem: Page Table long lines Trimmed down all the lines greater than 78 characters.	2014-11-23 18:01:09 -08:00
Alexandru Dutu	f743bdcb69	x86: Segment initialization to support KvmCPU in SE This patch sets up low and high privilege code and data segments and places them in the following order: cs low, ds low, ds, cs, in the GDT. Additionally, a syscall and page fault handler for KvmCPU in SE mode are defined. The order of the segment selectors in GDT is required in this manner for interrupt handling to work properly. Segment initialization is done for all the thread contexts.	2014-11-23 18:01:08 -08:00
Alexandru Dutu	adbaa4dfde	kvm, x86: Adding support for SE mode execution This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.	2014-11-23 18:01:08 -08:00
Alexandru Dutu	335514dfdc	cpuid, x86: Enabling more features in CPUid Adding more features in the CPUid with the purpose of supporting running the KvmCPU in SE mode.	2014-11-23 18:01:08 -08:00
Gabe Black	8bbfb1b39d	x86: pc: Put a stub IO device at port 0xed which the kernel can use for delays. There was already a stub device at 0x80, the port traditionally used for an IO delay. 0x80 is also the port used for POST codes sent by firmware, and that may have prompted adding this port as a second option.	2014-11-21 17:22:02 -08:00
Gabe Black	b5fd6050a2	dev: Use fixed size member variables to describe fixed size PL111 registers.	2014-11-18 02:38:23 -08:00
Gabe Black	a08cfd797b	vnc: Add a conversion function for bgr888.	2014-11-17 01:45:42 -08:00
Gabe Black	aceeecb192	x86: Fix setting segment bases in real mode. The data size used for actually writing the base value for the segment was the default size, but really it should set the entire value without any possible truncation.	2014-11-17 01:00:53 -08:00
Gabe Black	f8603fa120	x86: Fix some bugs in the real mode far jmp instruction. The far pointer should be shifted right to get the selector value, not left. Also, when calculating the width of the offset, the wrong register was used in one spot.	2014-11-17 00:20:01 -08:00
Gabe Black	7739c24fbe	x86: APIC: Only set deliveryStatus if our IPI is going somewhere. Otherwise the IPI which isn't sent will never arrive, and the deliveryStatus bit will never be cleared.	2014-11-17 00:19:07 -08:00
Gabe Black	79e7ca307e	x86: APIC: Fix the getRegArrayBit function. The getRegArrayBit function extracts a bit from a series of registers which are treated as a single large bit array. A previous change had modified the logic which figured out which bit to extract from ">> 5" to "% 5" which seems wrong, especially when other, similar functions were changed to use "% 32".	2014-11-17 00:17:06 -08:00
Gabe Black	d228db1143	x86: Fix the CPUID Long Mode Address Size function. The value in EAX has an 8 bit field for the linear address size and one for the physical address size when calling that function. A recent change implemented it but returned 0xff for both of those fields. That implies that linear and physical addresses are 255 bits wide which is wrong. When using the KVM CPU model this causes an error, presumably because some of those bits are actually reserved, or the CPU or kernel realizes 255 bits is a bad value. This change makes those values 48.	2014-11-16 23:12:42 -08:00
Andreas Hansson	481eb6ae80	arm: Fixes based on UBSan and static analysis Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base. Most of the fixes are simply ensuring proper initialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code.	2014-11-14 03:53:51 -05:00
Andreas Hansson	9ffe0e7ba6	mem: Clarify unit of DRAM controller buffer size	2014-11-14 03:53:48 -05:00
Mitch Hayenga	9d6d8e02aa	mem: Delete unused variable in Garnet NetworkLink With recent changes OSX clang compilation fails due to an unused variable.	2014-11-12 09:05:23 -05:00
Ali Saidi	b6f32253dd	arm: Fix timing wakeup with LLSC	2014-11-12 09:05:22 -05:00
Andreas Hansson	7d05895120	sim: Sort SimObject descendants and ports This patch fixes a number of occurrences where the sorting order of the objects was implementation defined.	2014-11-12 09:05:21 -05:00
Andreas Hansson	cc336ecb5e	base: Revert 9277177eccff and use getenv/setenv for UTC time This patch reverts changeset 9277177eccff which does not do what it was intended to do. In essence, we go back to implementing mkutctime much like the non-standard timegm extension.	2014-11-12 09:05:20 -05:00
Marc Orr	bf80734b2c	x86 isa: This patch attempts an implementation at mwait. Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-11-06 05:42:22 -06:00
Andrew Lukefahr	bd32d55a2c	cpu: Minor Draining Bug Fixes a bug where Minor drains in the midst of committing a conditional store. While committing a conditional store, lastCommitWasEndOfMacroop is true (from the previous instruction) as we still haven't finished the conditional store. If a drain occurs before the cache response, Minor would check just lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch, which increases the streamSeqNum. This caused the conditional store to be squashed when the memory responded and it completed. However, to the memory the store succeeded, while to the instruction sequence it never occurred. In the case of an LLSC, the instruction sequence will replay the squashed STREX, which will fail as the cache is no longer in LLSC. Then the instruction sequence will loop back to a LDREX, which receives the updated (incorrect) value. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-11-06 05:42:21 -06:00
Nilay Vaish	0811f21f67	ruby: provide a backing store Ruby's functional accesses are not guaranteed to succeed as of now. While this is not a problem for the protocols that are currently in the mainline repo, it seems that coherence protocols for gpus rely on a backing store to supply the correct data. The aim of this patch is to make this backing store configurable i.e. it comes into play only when a particular option: --access-backing-store is invoked. The backing store has been there since M5 and GEMS were integrated. The only difference is that earlier the system used to maintain the backing store and ruby's copy was write-only. Sometime last year, we moved to data being supplied by ruby in SE mode simulations. And now we have patches on the reviewboard, which remove ruby's copy of memory altogether and rely completely on the system's memory to supply data. This patch adds back a SimpleMemory member to RubySystem. This member is used only if the option: access-backing-store is set to true. By default, the memory would not be accessed.	2014-11-06 05:42:21 -06:00
Nilay Vaish	3022d463fb	ruby: interface with classic memory controller This patch is the final in the series. The whole series and this patch in particular were written with the aim of interfacing ruby's directory controller with the memory controller in the classic memory system. This is being done since ruby's memory controller has not been kept up to date with the changes going on in DRAMs. Classic's memory controller is more up to date and supports multiple different types of DRAM. This also brings classic and ruby ever more close. The patch also changes ruby's memory controller to expose the same interface.	2014-11-06 05:42:21 -06:00
Nilay Vaish	68ddfab8a4	ruby: remove the function functionalReadBuffers() This function was added when I had incorrectly arrived at the conclusion that such a function can improve the chances of a functional read succeeding. As was later realized, this is not possible in the current setup. While the code using this function was dropped long back, this function was not. Hence the patch.	2014-11-06 05:42:20 -06:00
Nilay Vaish	d25b722e4a	ruby: coherence protocols: remove data block from directory entry This patch removes the data block present in the directory entry structure of each protocol in gem5's mainline. Firstly, this is required for moving towards common set of memory controllers for classic and ruby memory systems. Secondly, the data block was being misused in several places. It was being used for having free access to the physical memory instead of calling on the memory controller. From now on, the directory controller will not have a direct visibility into the physical memory. The Memory Vector object now resides in the Memory Controller class. This also means that some significant changes are being made to the functional accesses in ruby.	2014-11-06 05:42:20 -06:00
Nilay Vaish	0baaed60ab	ruby: slicc: allow adding a bool to an int, like C++.	2014-11-06 05:42:20 -06:00
Nilay Vaish	85c29973a3	ruby: remove sparse memory. In my opinion, it creates needless complications in rest of the code. Also, this structure hinders the move towards common set of code for physical memory controllers.	2014-11-06 05:42:20 -06:00
Nilay Vaish	95a0b18431	ruby: single physical memory in fs mode Both ruby and the system used to maintain memory copies. With the changes carried for programmed io accesses, only one single memory is required for fs simulations. This patch sets the copy of memory that used to reside with the system to null, so that no space is allocated, but address checks can still be carried out. All the memory accesses now source and sink values to the memory maintained by ruby.	2014-11-06 05:41:44 -06:00
Nilay Vaish	8ccfd9defa	ruby: dma sequencer: remove RubyPort as parent class As of now DMASequencer inherits from the RubyPort class. But the code in RubyPort class is heavily tailored for the CPU Sequencer. There are parts of the code that are not required at all for the DMA sequencer. Moreover, the next patch uses the dma sequencer for carrying out memory accesses for all the io devices. Hence, it is better to have a leaner dma sequencer.	2014-11-06 00:55:09 -06:00
Ali Saidi	7a0bf814b6	automated merge	2014-10-29 23:22:26 -05:00
Ali Saidi	f2db2a96d1	arm, tests: Update config files to more recent kernels and create 64-bit regressions. This changes the default ARM system to a Versatile Express-like system that supports 2GB of memory and PCI devices and updates the default kernels/file-systems for AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some platforms that are no longer supported have been pruned from the configuration files. In addition a set of 64-bit ARM regressions have been added to the regression system.	2014-10-29 23:18:27 -05:00
Mitch Hayenga	5bfa521c46	cpu: Add writeback modeling for drain functionality It is possible for the O3 CPU to consider itself drained and later have a squashed instruction perform a writeback. This patch re-adds tracking of in-flight instructions to prevent falsely signaling a drained event.	2014-10-29 23:18:27 -05:00
Mitch Hayenga	6847bbf7ce	cpu: Add drain check functionality to IEW IEW did not check the instQueue and memDepUnit to ensure they were drained. This caused issues when drainSanityCheck() did check those structures after asserting IEW was drained.	2014-10-29 23:18:26 -05:00
Ali Saidi	b31d9e93e2	arm, mem: Fix drain bug and provide drain prints for more components.	2014-10-29 23:18:26 -05:00
Ali Saidi	baf88e908d	arm: Fix multi-system AArch64 boot w/caches. Automatically extract cpu release address from DTB file. Check SCTLR_EL1 to verify all caches are enabled.	2014-10-29 23:18:26 -05:00
Ali Saidi	9900629f83	arm: Mark some miscregs (timer counter) registers as unverifiable. The checker can't verify timer registers, so it should just grab the version from the executing CPU, otherwise it could get a larger value and diverge execution.	2014-10-29 23:18:24 -05:00
Ali Saidi	e3ee27c7b4	cpu: Add support to checker for CACHE_BLOCK_ZERO commands. The checker didn't know how to properly validate these new commands.	2014-10-29 23:18:24 -05:00
Andrew Bardsley	536c72333f	cpu: Fix barrier push to store buffer when full bug in Minor This patch fixes a bug where a completing load or store which is also a barrier can push a barrier into the store buffer without first checking that there is a free slot. The bug was not fatal but would print a warning that the store buffer was full when inserting.	2014-10-29 23:18:24 -05:00
Curtis Dunham	4024fab7fc	mem: don't inhibit WriteInv's or defer snoops on their MSHRs WriteInvalidate semantics depend on the unconditional writeback or they won't complete. Also, there's no point in deferring snoops on their MSHRs, as they don't get new data at the end of their life cycle the way other transactions do. Add comment in the cache about a minor inefficiency re: WriteInvalidate.	2014-10-21 17:04:41 -05:00
Curtis Dunham	46f9f11a55	mem: have WriteInvalidate obsolete MSHRs Since WriteInvalidate directly writes into the cache, it can create tricky timing interleavings with reads and writes to the same cache line that haven't yet completed. This patch ensures that these requests, when completed, don't overwrite the newer data from the WriteInvalidate.	2014-10-29 23:18:24 -05:00
Steve Reinhardt	6ab4eddb9f	syscall_emul: add retry flag to SyscallReturn This hook allows blocking emulated system calls to indicate that they would block, but return control to the simulator so that the simulation does not hang. The actual retry functionality requires additional support, to be provided in a future changeset.	2014-09-02 16:07:50 -05:00
Steve Reinhardt	9ac7f14fc0	syscall_emul: minor style fix to LiveProcess constructor	2014-10-22 15:53:34 -07:00
Steve Reinhardt	df7f0892ed	syscall_emul: devirtualize BaseBufferArg methods Not clear why they were marked virtual to begin with, but that doesn't appear to be necessary.	2014-10-22 15:53:34 -07:00
Steve Reinhardt	44af2c6a69	syscall_emul: Put BufferArg classes in a separate header. Move the BufferArg classes that support syscall buffer args (i.e., pointers into simulated user space) out of syscall_emul.hh and into a new header syscall_emul_buf.hh so they are accessible to emulated driver implementations. Take the opportunity to add some comments as well.	2014-10-22 15:53:34 -07:00
Steve Reinhardt	44ec1d2124	syscall_emul: add EmulatedDriver object Fake SE-mode device drivers can now be added by deriving from this abstract object.	2014-10-22 15:53:34 -07:00
Nilay Vaish	6523aad25c	sim: revert 6709bbcf564d The identifier SYS_getdents is not available on Mac OS X. Therefore, its use results in compilation failure. It seems there is no straightforward way to implement the system call getdents using readdir() or similar C functions. Hence the commit 6709bbcf564d is being rolled back.	2014-10-22 15:59:57 -05:00
Andreas Hansson	d6f1c6ce89	x86: Fixes to avoid LTO warnings This patch fixes a few minor issues that caused link-time warnings when using LTO, mainly for x86. The most important change is how the syscall array is created. Previously gcc and clang would complain that the declaration and definition types did not match. The organisation is now changed to match how it is done for ARM, moving the code that was previously in syscalls.cc into process.cc, and having a class variable pointing to the static array. With these changes, there are no longer any warnings using gcc 4.6.3 with LTO.	2014-10-20 18:03:56 -04:00
Andreas Hansson	6290f98194	misc: Use gmtime for conversion to UTC to avoid getenv/setenv This patch changes how we turn time into UTC. Previously we manipulated the TZ environment variable, but this has issues as the strings that are manipulated could be tainted (see e.g. CERT ENV34-C). Now we simply rely on the built-in gmtime function and avoid touching getenv/setenv altogether.	2014-10-20 18:03:55 -04:00
Omar Naji	a4a8568bd2	mem: Fix DRAM activationLimit bug Ensure that we do the proper event scheduling also when the activation limit is disabled.	2014-10-20 18:03:55 -04:00
Andreas Hansson	77f8f5d94c	base: Fix for stats node on gcc < 4.6.3 This patch adds an explicit function to get the underlying node as gcc 4.6.1 and 4.6.2 have issues otherwise.	2014-10-20 18:03:54 -04:00
Omar Naji	29dd2887f4	mem: Add DRAM device size and check against config This patch adds the size of the DRAM device to the DRAM config. It also compares the actual DRAM size (calculated using information from the config) to the size defined in the system. If these two values do not match gem5 will print a warning. In order to do correct DRAM research the size of the memory defined in the system should match the size of the DRAM in the config. The timing and current parameters found in the DRAM configs are defined for a DRAM device with a specific size and would differ for another device with a different size.	2014-10-20 18:03:52 -04:00
Nilay Vaish	922a9d8ed2	cpu: o3: corrects base FP and CC register index in removeThread()	2014-10-20 16:47:55 -05:00
Tom Jablin	c6731e331a	sim: invalid alignment checks in mmap and mremap Presently, the alignment checks in the mmap and mremap implementations in syscall_emul.hh are wrong. The checks are implemented as: if ((start % TheISA::PageBytes) != 0 || (length % TheISA::PageBytes) != 0) { warn("mmap failing: arguments not page-aligned: " "start 0x%x length 0x%x", start, length); return -EINVAL; } This checks both the start and the length arguments of the mmap syscall for page-alignment. However, the POSIX specification says: The off argument is constrained to be aligned and sized according to the value returned by sysconf() when passed _SC_PAGESIZE or _SC_PAGE_SIZE. When MAP_FIXED is specified, the application shall ensure that the argument addr also meets these constraints. The implementation performs mapping operations over whole pages. Thus, while the argument len need not meet a size or alignment constraint, the implementation shall include, in any mapping operation, any partial page specified by the range [pa,pa+len). So the length parameter should not be checked for page-alignment. By contrast, the current implementation fails to check the offset argument, which must be page aligned. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-20 16:45:25 -05:00
Michael Adler	7254d5742a	sim: mmap: correct behavior for fixed address Change mmap fixed address request to return an error if the mapping is impossible due to conflict instead of what I believe used to be silent corruption. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-20 16:45:08 -05:00
Michael Adler	a3fe4c0662	sim: implement getdents/getdents64 in user mode Has been tested only for alpha. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-20 16:44:53 -05:00
Severin Wischmann, Ioannis Ilkos <ioannis.ilkos09@imperial.ac.uk>	e72736aaf0	x86: syscall: implementation of exit_group On exit_group syscall, we used to exit the simulator. But now we will only halt the execution of threads that belong to the group. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-20 16:43:48 -05:00
Andreas Hansson	6d4866383f	mem: Modernise PhysicalMemory with C++11 features Bring the PhysicalMemory up-to-date by making use of range-based for loops and vector initialisation where possible.	2014-10-16 05:50:01 -04:00
Andreas Hansson	edc77fc03c	misc: Move AddrRangeList from port.hh to addr_range.hh The new location seems like a better fit. The iterator typedefs are removed in favour of using C++11 auto.	2014-10-16 05:49:59 -04:00
Geoffrey Blake	2d2006ddb3	dev: refactor pci config space for sysfs scanning Sysfs on ubuntu scrapes the entire PCI config space when it discovers a device using 4 byte accesses. This was not supported by our devices, in particular the NIC that implemented the extended PCI config space. This change allows the extended PCI config space to be accessed by sysfs properly.	2014-10-16 05:49:57 -04:00
Andrew Bardsley	d6732895a5	mem: Add ExternalMaster and ExternalSlave ports This patch adds two MemoryObject's: ExternalMaster and ExternalSlave. Each object has a single port which can be bound to an externally- provided bridge to a port of another simulation system at initialisation.	2014-10-16 05:49:56 -04:00
Andreas Hansson	e2a13386e5	sim: EventQueue wakeup on events scheduled outside the event loop This patch adds a 'wakeup' member function to EventQueue which should be called on an event queue whenever an event is scheduled on the event queue from outside code within the call tree of the gem5 event loop. This clearly isn't necessary for normal gem5 EventQueue operation but becomes the minimum necessary interface to allow hosting gem5's event loop onto other schedulers where there may be calls into gem5 from external code which schedules events onto an EventQueue between the current time and the time of the next scheduled event. The use case I have in mind is a SystemC hosting where the event loop is: while (more events) { wait(time_to_next_event or wakeup) setCurTick service events at this time } where the 'wait' needs to be woken up if time_to_next_event becomes shorter due to a scheduled event from SystemC arriving in a gem5 object. Requiring 'wakeup' to be called is a more efficient interface than requiring all gem5 event scheduling actions to affect the host scheduler. This interface could be located elsewhere, say on another global object, or by being passed by the host scheduler to objects which will schedule such events, but it seems cleanest to put it on EventQueue as it is actually a signal to the queue. EventQueue::wakeup is called for async_event events on event queue 0 as it's only important that some queue be triggered for such events.	2014-10-16 05:49:53 -04:00
Andrew Bardsley	960935a5bd	base: Reimplement the DPRINTF mechanism in a Logger class This patch adds a Logger class encapsulating dprintf. This allows variants of DPRINTF logging to be constructed and substituted in place of the default behaviour. The Logger provides a logMessage(when, name, format, ...) member function like Trace::dprintf and a getOstream member function to use a raw ostream for logging. A class OstreamLogger is provided which generates the customary debugging output with Trace::OstreamLogger::logMessage being the old Trace::dprintf.	2014-10-16 05:49:53 -04:00
Andreas Hansson	a2d246b6b8	arch: Use shared_ptr for all Faults This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".	2014-10-16 05:49:51 -04:00
Andreas Hansson	a769963d16	o3: Use shared_ptr for MemDepEntry This patch transitions the o3 MemDepEntry from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared".	2014-10-16 05:49:49 -04:00
Andreas Hansson	db3739682d	mem: Use shared_ptr for Ruby Message classes This patch transitions the Ruby Message and its derived classes from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared". The cloning of derived messages is slightly changed as they previously relied on overriding the base-class through covariant return types.	2014-10-16 05:49:49 -04:00
Andreas Hansson	acdfcad30d	base: Use shared_ptr for stat Node This patch transitions the stat Node and its derived classes from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared".	2014-10-16 05:49:48 -04:00
Andreas Hansson	8b789ae451	base: Transition CP annotate to use shared_ptr	2014-10-16 05:49:47 -04:00
Andreas Hansson	ad3f75dc81	dev: Use shared_ptr for EthPacketData This patch transitions the EthPacketData from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared". The bool casting operator for the shared_ptr is explicit, and we must therefore either cast it, compare it to NULL (p != nullptr), double negate it (!!p) or do a (p ? true : false).	2014-10-16 05:49:46 -04:00
Andreas Hansson	4e67ab6663	dev: Use shared_ptr for Arguments::Data This patch takes a first few steps in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly introducing the use of make_shared. Note that the class could use unique_ptr rather than shared_ptr, were it not for the postfix increment and decrement operators.	2014-10-16 05:49:45 -04:00
Andreas Hansson	2475862747	arch,x86,mem: Dynamically determine the ISA for Ruby store check This patch makes the memory system ISA-agnostic by enabling the Ruby Sequencer to dynamically determine if it has to do a store check. To enable this check, the ISA is encoded as an enum, and the system is able to provide the ISA to the Sequencer at run time. --HG-- rename : src/arch/x86/insts/microldstop.hh => src/arch/x86/ldstflags.hh	2014-10-16 05:49:44 -04:00
Andreas Hansson	df973abef3	mem: Dynamically determine page bytes in memory components This patch takes a step towards an ISA-agnostic memory system by enabling the components to establish the page size after instantiation. The swap operation in the memory is now also allowing any granularity to avoid depending on the IntReg of the ISA.	2014-10-16 05:49:43 -04:00
Andreas Sandberg	37908d62a4	arm: Add helper methods to setup architected PMU events	2014-10-16 05:49:42 -04:00
Andreas Sandberg	e0074324ba	cpu: Probe points for basic PMU stats This changeset adds probe points that can be used to implement PMU counters for CPU stats. The following probes are supported: * BaseCPU::ppCycles / Cycles * BaseCPU::ppRetiredInsts / RetiredInsts * BaseCPU::ppRetiredLoads / RetiredLoads * BaseCPU::ppRetiredStores / RetiredStores * BaseCPU::ppRetiredBranches / RetiredBranches	2014-10-16 05:49:41 -04:00
Andreas Sandberg	9d35d48e84	arm: Add TLB PMU probes This changeset adds probe points that can be used to implement PMU counters for TLB stats. The following probes are supported: * ArmISA::TLB::ppRefills / TLB Refills (TLB insertions)	2014-10-16 05:49:41 -04:00
Andreas Sandberg	76b0ff9ecd	cpu: Add branch predictor PMU probe points This changeset adds probe points that can be used to implement PMU counters for branch predictor stats. The following probes are supported: * BPredUnit::ppBranches / Branches * BPredUnit::ppMisses / Misses	2014-10-16 05:49:40 -04:00
Andreas Sandberg	3697990c27	arm: Add a model of an ARM PMUv3 This class implements a subset of the ARM PMU v3 specification as described in the ARMv8 reference manual. It supports most of the features of the PMU, however the following features are known to be missing: * Event filtering (e.g., from different privilege levels). * Access controls (the PMU currently ignores the execution level). * The chain counter (event no. 0x1E) is unimplemented. The PMU itself does not implement any events, it merely provides an interface for the configuration scripts to hook up probes that drive events. Configuration scripts should call addEventProbe() to configure custom events or high-level methods to configure architected events. The Python implementation of addEventProbe() automatically delays event type registration until after instantiation. In order to support CPU switching and some combined counters (e.g., memory references synthesized from loads and stores), the PMU allows multiple probes per event type. When creating a system that switches between CPU models that share the same PMU, PMU events for all of the CPU models can be registered with the PMU. Kudos to Matt Horsnell for the initial gem5 implementation of the PMU.	2014-10-16 05:49:39 -04:00
Andreas Sandberg	132ea6319a	sim: Add typedefs for PMU probe points In order to make PMU probe points usable across different PMU implementations, we want a common probe interface. This patch adds the namespace ProbePoints that contains typedefs for probe points that are shared between multiple SimObjects. It also adds typedefs for the PMU probe interface.	2014-10-16 05:49:38 -04:00
Andreas Sandberg	804ed4b418	sim: Add support for serializing BitUnionXX BitUnion instances can normally not be used with the SERIALIZE_SCALAR and UNSERIALIZE_SCALAR macros due to the way they are converted between their storage type and their actual type. This changeset adds a set of param(In|Out) functions specifically for gem5 bit unions to work around the issue.	2014-10-16 05:49:37 -04:00
Andreas Hansson	66df7b7fd4	config: Add the ability to read a config file using C++ and Python This patch adds the ability to load in config.ini files generated from gem5 into another instance of gem5 built without Python configuration support. The intended use case is for configuring gem5 when it is a library embedded in another simulation system. A parallel config file reader is also provided purely in Python to demonstrate the approach taken and to provide similar functionality for as-yet-unknown use models. The Python configuration file reader can read both .ini and .json files. C++ configuration file reading: A command line option has been added for scons to enable C++ configuration file reading: --with-cxx-config There is an example in util/cxx_config that shows C++ configuration in action. util/cxx_config/README explains how to build the example. Configuration is achieved by the object CxxConfigManager. It handles reading object descriptions from a CxxConfigFileBase object which wraps a config file reader. The wrapper class CxxIniFile is provided which wraps an IniFile for reading .ini files. Reading .json files from C++ would be possible with a similar wrapper and a JSON parser. After reading object descriptions, CxxConfigManager creates SimObjectParam-derived objects from the classes in the (generated with this patch) directory build/ARCH/cxx_config. CxxConfigManager can then build SimObjects from those SimObjectParams (in an order dictated by the SimObject-value parameters on other objects) and bind ports of the produced SimObjects. A minimal set of instantiate-replacing member functions are provided by CxxConfigManager and a few of the member functions of SimObject (such as drain) are extended onto CxxConfigManager. Python configuration file reading (configs/example/read_config.py): A Python version of the reader is also supplied with a similar interface to CxxConfigFileBase (In Python: ConfigFile) to config file readers. 
The Python config file reading will handle both .ini and .json files. The object construction strategy is slightly different in Python from the C++ reader as you need to avoid objects prematurely becoming the children of other objects when setting parameters. Port binding also needs to be strictly in the same port-index order as the original instantiation.	2014-10-16 05:49:37 -04:00
Andreas Hansson	b14f521e5f	scons: Add Undefined Behavior Sanitizer (UBSan) option This patch adds the Undefined Behavior Sanitizer (UBSan) for clang and gcc >= 4.9. Due to the performance impact, the usage is guarded by a command-line option.	2014-10-16 05:49:36 -04:00
Akash Bagdia	8b7724d04c	arm: Don't speculatively access most miscregisters. Speculative execution can cause panics in detailed execution mode that shouldn't happen.	2014-09-02 11:26:32 +01:00
Curtis Dunham	f7c6a2cbed	scons: Generate a single debug flag C++ file Reduces target count/compiler invocations by ~180.	2014-08-12 17:35:28 -05:00
Curtis Dunham	f780e85dc3	scons: create dummy target to have SWIG generate C++ classes scons build/<arch>/swig	2014-10-16 05:49:33 -04:00
Andrew Bardsley	d8502ee46d	config: Add a --without-python option to build process Add the ability to build libgem5 without embedded Python or the ability to configure with Python. This is a prelude to a patch to allow config.ini files to be loaded into libgem5 using only C++ which would make embedding gem5 within other simulation systems easier. This adds a few registration interfaces to things which cross between Python and C++. Namely: stats dumping and SimObject resolving	2014-10-16 05:49:32 -04:00
Andrew Lukefahr	8e07b36d2b	cpu: Fix o3 SMT IQCount bug Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-11 16:16:02 -05:00
Nilay Vaish	a098fad174	ruby: network: garnet: add statistics for different activities This patch adds some statistics to garnet that record the activity of certain structures in the on-chip network. These statistics, in a later patch, will be used for computing the energy consumed by the on-chip network.	2014-10-11 15:02:23 -05:00
Nilay Vaish	25bb18f12b	ruby: network: garnet: remove functions for computing power	2014-10-11 15:02:23 -05:00
Nilay Vaish	9321a41c62	ruby: drop Orion network power model Orion is being dropped from ruby. It would be replaced with DSENT which has better models. Note that the power / energy numbers reported after this patch has been applied are not for use.	2014-10-11 15:02:23 -05:00
Nilay Vaish	b6d804a1e6	ruby: mesi: slight renaming	2014-10-11 15:02:23 -05:00
Nilay Vaish	e7f918d8cd	ruby: structures: correct #ifndef macros in header files	2014-10-11 15:02:22 -05:00
Jiuyue Ma	9fb8b8515b	x86: add LongModeAddressSize function to cpuid LongModeAddressSize is used by kernel 2.6.28.4 for physical address validation; if it is not properly implemented, PCI resource allocation may fail because ioremap fails: - linux-2.6.28.4/arch/x86/mm/ioremap.c:27-30 27 static inline int phys_addr_valid(unsigned long addr) 28 { 29 return addr < (1UL << boot_cpu_data.x86_phys_bits); 30 } - linux-2.6.28.4/arch/x86/kernel/cpu/common.c:475-482 475 #ifdef CONFIG_X86_64 476 if (c->extended_cpuid_level >= 0x80000008) { 477 u32 eax = cpuid_eax(0x80000008); 478 479 c->x86_virt_bits = (eax >> 8) & 0xff; 480 c->x86_phys_bits = eax & 0xff; 481 } 482 #endif - linux-2.6.28.4/arch/x86/mm/ioremap.c:209-214 209 if (!phys_addr_valid(phys_addr)) { 210 printk(KERN_WARNING "ioremap: invalid physical address %llx\n", 211 (unsigned long long)phys_addr); 212 WARN_ON_ONCE(1); 213 return NULL; 214 } This patch returns 0x0000ffff for LongModeAddressSize, which guarantees that phys_addr_valid never fails. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-06-13 16:48:47 +08:00
Andrew Lukefahr	f94fd44991	sim: draining bug for fast-forwarding multiple cores fix draining bug where multiple cores hit max_insts_any_thread simultaneously Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-10-11 15:02:22 -05:00
Nilay Vaish	2816521f0d	base: addr range: slight change to validity check The validity check is being changed from < to <= since the end of the range is considered to be a part of it.	2014-10-11 15:02:22 -05:00
Nilay Vaish	a9bfea5a35	base: misc: Add missing header file.	2014-10-11 15:02:22 -05:00
Omar Naji	cd8023a1ee	mem: DRAMPower integration for on-line DRAM power stats This patch takes the final step in integrating DRAMPower and adds the appropriate calls in the DRAM controller to provide the command trace and extract the power and energy stats. The debug printouts are still left in place, but will eventually be removed. At the moment the DRAM power calculation is always on when using the DRAM controller model. The run-time impact of this addition is around 1.5% when looking at the total host seconds of the regressions. We deem this a sensible trade-off to avoid the complication of adding an enable/disable mechanism.	2014-07-29 17:22:44 +01:00
Omar Naji	afc6ce6228	mem: Add DRAMPower wrapping class This patch adds a class to wrap DRAMPower Library in gem5. This class initiates an object of class MemorySpecification of the DRAMPower Library, passes the parameters from DRAMCtrl.py to this object and creates an object of drampower library using the memory specification.	2014-07-29 17:29:36 +01:00
Omar Naji	00b37ffe50	mem: Add missing timing and current parameters to DRAM configs This patch adds missing timing and current parameters to the existing DRAM configs. These missing timing and current parameters are required by DRAMPower for the DRAM power calculations. The missing values are datasheet values of the specified DRAMs, and the appropriate references are added for the various configs.	2014-07-25 10:05:59 +01:00
Omar Naji	f9fce9ba07	mem: Remove DRAMSim2 DDR3 configuration This patch prunes the DDR3 config that was initially created to match the default config of DRAMSim2. The config is not complete as it is, and to avoid having to maintain it, the easiest way forward is to simply prune it. Going forward we are adding power number etc to the other configurations.	2014-10-09 17:52:04 -04:00
Andreas Hansson	c81517c293	config: Add Current as a parameter type This patch adds the Python parameter type Current, which is used for the DRAM power modelling (to start with). With this addition we avoid implicit unit assumptions.	2014-10-09 17:52:00 -04:00
Mitch Hayenga	06f4b521aa	cpu: Remove Ozone CPU from the source tree The Ozone CPU is now very much out of date and completely non-functional, with no one actively working on restoring it. It is a source of confusion for new users who attempt to use it before realizing its current state. RIP	2014-10-09 17:51:58 -04:00
Andreas Hansson	f4a538f862	mem: Add packet sanity checks to cache and MSHRs This patch adds a number of asserts to the cache, checking basic assumptions about packets being requests or responses.	2014-10-09 17:51:56 -04:00
Andreas Hansson	4a453e8c95	mem: Allow packet queue to move next send event forward This patch changes the packet queue such that when scheduling a send, the queue is allowed to move the event forward.	2014-10-09 17:51:52 -04:00
Andreas Hansson	6498ccddb2	misc: Fix issues identified by static analysis Another bunch of issues addressed.	2014-10-01 08:05:54 -04:00
Andreas Hansson	b520223699	arm: Use MiscRegIndex rather than int when flattening Some additional type checking to avoid future issues.	2014-10-01 08:05:52 -04:00
Andreas Hansson	10f82934be	arm: More UBSan cleanups after additional full-system runs Some incorrect casting to IntRegIndex, and a few uninitialized members in the i8254xGBe device.	2014-10-01 08:05:51 -04:00
Andreas Hansson	ec41000dad	arm: Fixed undefined behaviours identified by gcc This patch fixes the runtime errors highlighted by the undefined behaviour sanitizer. In the end there were two issues. First, when rotating an immediate, we ended up shifting an uint32_t by 32 in some cases. This case is fixed by checking for a rotation by 0 positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex as an IntRegIndex. This issue is resolved by introducing a MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types. With these fixes there are no runtime errors identified for the full ARM regressions.	2014-09-27 09:08:37 -04:00
Andreas Hansson	341dbf2662	arch: Use const StaticInstPtr references where possible This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer.	2014-09-27 09:08:36 -04:00
Andreas Hansson	deb2200671	scons: Address issues related to gcc 4.9.1 Fix a few minor issues to please gcc 4.9.1. Removing the '-fuse-linker-plugin' flag means no libraries are part of the LTO process, but hopefully this is an acceptable loss, as the flag causes issues on a lot of systems (only certain combinations of gcc, ld and ar work).	2014-09-27 09:08:34 -04:00
Curtis Dunham	4836aef1e4	dev: Output invalid access size in IsaFake panic	2014-09-27 09:08:33 -04:00
Curtis Dunham	b7f1d675da	mem: Output precise range when XBar has conflicts	2014-09-27 09:08:32 -04:00
Curtis Dunham	725be98fe8	mem: Provide better diagnostic for unconnected port When _masterPort is null, a message to that effect is more helpful than a segfault.	2014-09-27 09:08:30 -04:00
Andreas Hansson	de62aedabc	misc: Fix a bunch of minor issues identified by static analysis Add some missing initialisation, and fix a handful of benign resource leaks (including some false positives).	2014-09-27 09:08:29 -04:00
Mitch Hayenga	cc6523e2d6	cpu: Remove unused deallocateContext calls The call paths for de-scheduling a thread are halt() and suspend(), from the thread context. There is no call to deallocateContext() in general, though some CPUs chose to define it. This patch removes the function from BaseCPU and the cores which do not require it.	2014-09-20 17:18:36 -04:00
Mitch Hayenga	e1403fc2af	alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemingly arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed.	2014-09-20 17:18:35 -04:00
Andreas Hansson	1f6d5f8f84	mem: Rename Bus to XBar to better reflect its behaviour This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus. As part of this renaming, the code has also been cleaned up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code are due to the delay variables in the packet. --HG-- rename : src/mem/Bus.py => src/mem/XBar.py rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh rename : src/mem/bus.cc => src/mem/xbar.cc rename : src/mem/bus.hh => src/mem/xbar.hh	2014-09-20 17:18:32 -04:00
Stephan Diestelhorst	435f4aec3d	mem: Add access statistics for the snoop filter Adds a simple access counter for requests and snoops for the snoop filter and also classifies hits based on whether a single other holder existed or whether multiple shares held the line.	2014-04-25 12:36:16 +01:00
Stephan Diestelhorst	afa2428eca	mem: Tie in the snoop filter in the coherent bus	2014-09-20 17:18:29 -04:00
Stephan Diestelhorst	7d488cc66f	mem: Add a simple snoop counter per bus This patch adds a simple counter for both total messages and a histogram for the fan-out of snoop messages. The fan-out describes to how many ports snoops had to be sent per incoming request / snoop-from-below. Without any cleverness, this usually means to either all, or all but the requesting port.	2014-04-24 13:28:47 +01:00
Stephan Diestelhorst	fe98cb6be4	misc: Add functions for doing popcount and power-of-two checking Adds two public domain algorithms for determining number of set bits and also whether a value is a power of two, uses the builtin that is available in GCC and clang for popcount.	2014-04-24 17:41:26 +01:00
Stephan Diestelhorst	ba98d598ae	mem: Simple Snoop Filter This is a first cut at a simple snoop filter that tracks presence of lines in the caches "above" it. The snoop filter can be applied at any given cache hierarchy and will then handle the caches above it appropriately; there is no need to use this only in the last-level bus. This design currently has some limitations: missing stats, no notion of clean evictions (these will not update the underlying snoop filter, because they are not sent from the evicting cache down), no notion of capacity for the snoop filter and thus no need for invalidations caused by capacity pressure in the snoop filter. These are planned to be added on top with future change sets.	2014-09-20 17:18:26 -04:00
Stephan Diestelhorst	16351ba8d6	energy: Tighter checking of levels for DFS systems There are cases where users might by accident / intention specify fewer voltage operating points than frequency points. We consider one of these cases special: giving only a single voltage to a voltage domain effectively renders it as a static domain. This patch adds additional logic in the auxiliary parts of the functionality to handle these cases properly (simple driver asking for N>1 operating levels, we should return the same voltage for all of them) and adds error checking code in the voltage domain.	2014-08-12 19:00:44 +01:00
Stephan Diestelhorst	65aaf62714	energy: Add the Energy Controller in the right configs Tie in the newly created energy controller components in the default configurations.	2014-07-25 13:36:23 +01:00
Akash Bagdia	04e51e5e3e	energy: Memory-mapped Energy Controller component This patch provides an Energy Controller device that provides software (driver) access to a DVFS handler. The device is currently residing in the dev/arm tree, but there is nothing inherently ARM specific in the behaviour. It is currently only tested and supported for ARM Linux, hence the location.	2014-09-20 17:18:23 -04:00
Stephan Diestelhorst	4422d1322a	energy: Small extensions and fixes for DVFS handler These additions allow easier interoperability with and querying from an additional controller which will be in a separate patch. Also adding warnings for changing the enabled state of the handler across checkpoint / resume and deviating from the state in the configuration. Contributed-by: Akash Bagdia <akash.bagdia@arm.com>	2014-06-16 14:59:44 +01:00
Wendy Elsasser	bf23847072	mem: Add DDR4 bank group timing Added the following parameter to the DRAMCtrl class: - bank_groups_per_rank This defaults to 1. For the DDR4 case, the default is overridden to indicate bank group architecture, with multiple bank groups per rank. Added the following delays to the DRAMCtrl class: - tCCD_L : CAS-to-CAS, same bank group delay - tRRD_L : RAS-to-RAS, same bank group delay These parameters are only applied when bank group timing is enabled. Bank group timing is currently enabled only for DDR4 memories. For all other memories, these delays will default to '0 ns' In the DRAM controller model, applied the bank group timing to the per bank parameters actAllowedAt and colAllowedAt. The actAllowedAt will be updated based on bank group when an ACT is issued. The colAllowedAt will be updated based on bank group when a RD/WR burst is issued. At the moment no modifications are made to the scheduling.	2014-09-20 17:18:21 -04:00
Wendy Elsasser	b6ecfe9183	mem: Add memory rank-to-rank delay Add the following delay to the DRAM controller: - tCS : Different rank bus turnaround delay This will be applied for 1) read-to-read, 2) write-to-write, 3) write-to-read, and 4) read-to-write command sequences, where the new command accesses a different rank than the previous burst. The delay defaults to 2*tCK for each defined memory class. Note that this does not correspond to one particular timing constraint, but is a way of modelling all the associated constraints. The DRAM controller has some minor changes to prioritize commands to the same rank. This prioritization will only occur when the command stream is not switching from a read to write or vice versa (in the case of switching we have a gap in any case). To prioritize commands to the same rank, the model will determine if there are any commands queued (same type) to the same rank as the previous command. This check will ensure that the 'same rank' command will be able to execute without adding bubbles to the command flow, e.g. any ACT delay requirements can be done under the hood, allowing the burst to issue seamlessly.	2014-09-20 17:17:57 -04:00
Wendy Elsasser	a384525355	cpu: Update DRAM traffic gen Add new DRAM_ROTATE mode to traffic generator. This mode will generate DRAM traffic that rotates across banks per rank, command types, and ranks per channel. The looping order is illustrated below: for (ranks per channel) for (command types) for (banks per rank) // Generate DRAM Command Series This patch also adds the read percentage as an input argument to the DRAM sweep script. If the simulated read percentage is 0 or 100, the middle for loop does not generate additional commands. This loop is used only when the read percentage is set to 50, in which case the middle loop will toggle between read and write commands. Modified sweep.py script, which generates DRAM traffic. Added input arguments and support for new DRAM_ROTATE mode. The script now has input arguments for: 1) Read percentage 2) Number of ranks 3) Address mapping 4) Traffic generator mode (DRAM or DRAM_ROTATE) The default values are: 100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode. For the DRAM traffic mode, added multi-rank support.	2014-09-20 17:17:55 -04:00
Andreas Sandberg	3f7a9348dd	dev: Add support for 9p proxying over VirtIO This patch adds support for 9p filesystem proxying over VirtIO. It can currently operate by connecting to a 9p server over a socket (VirtIO9PSocket) or by starting the diod 9p server and connecting over pipe (VirtIO9PDiod). WARNING: Checkpoints are currently not supported for systems with 9p proxies!	2014-09-20 17:17:54 -04:00
Andreas Sandberg	8c070c8f1b	dev: Add a VirtIO block device model	2014-09-20 17:17:53 -04:00
Andreas Sandberg	b8c9b04bd6	dev: Add a VirtIO console device model	2014-09-20 17:17:52 -04:00
Andreas Sandberg	bf2c2183c6	dev, pci: Implement basic VirtIO support This patch adds support for VirtIO over the PCI bus. It does so by providing the following new SimObjects: * VirtIODeviceBase - Abstract base class for VirtIO devices. * PciVirtIO - VirtIO PCI transport interface. A VirtIO device is hooked up to the guest system by adding a PciVirtIO device to the PCI bus and connecting it to a VirtIO device using the vio parameter. New VirtIO devices should inherit from VirtIODeviceBase and implement one or more VirtQueues. The VirtQueues are usually device-specific and all derive from the VirtQueue class. Queues must be registered with the base class from the constructor since the device assumes that the number of queues stays constant.	2014-09-20 17:17:51 -04:00
Andreas Sandberg	0c5139310d	dev: Refactor terminal<->UART interface to make it more generic The terminal currently assumes that the transport to the guest always inherits from the Uart class. This assumption breaks when implementing, for example, a VirtIO console. This patch removes this assumption by removing the pointer from the terminal to the UART and replacing it with a more general callback interface. The Uart class, or any other class using the terminal, implements an instance of the callbacks class and registers it with the terminal.	2014-09-20 17:17:50 -04:00
Andreas Hansson	0fa128bbd0	base: Clean up redundant string functions and use C++11 This patch does a bit of housekeeping on the string helper functions and relies on the C++11 standard library where possible. It also does away with our custom string hash as an implementation is already part of the standard library.	2014-09-20 17:17:49 -04:00
Andrew Bardsley	b2c2e67468	base: Add getSectionNames to IniFile Add an accessor to IniFile to list all the sections in the file.	2014-09-20 17:17:47 -04:00
Mitch Hayenga	4f0e3cd4d7	cpu: Add ExecFlags debug flag Adds a debug flag to print out the flags an instruction is tagged with.	2014-09-20 17:17:45 -04:00
Mitch Hayenga	3e5bf0c922	mem: Remove the GHB prefetcher from the source tree There are two primary issues with this code which make it deserving of deletion. 1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher 2) This prefetcher isn't even structured like a GHB prefetcher. It's basically a worse version of the stride prefetcher. It primarily serves to confuse new gem5 users and most functionality is already present in the stride prefetcher.	2014-09-20 17:17:44 -04:00
Dam Sunwoo	ca3513d630	cpu: use probes infrastructure to do simpoint profiling Instead of having code embedded in cpu model to do simpoint profiling use the probes infrastructure to do it.	2014-09-20 17:17:43 -04:00
Andrew Bardsley	7329c0e20b	config: Cleanup .json config file generation This patch 'completes' .json config files generation by adding in the SimObject references and String-valued parameters not currently printed. TickParamValues are also changed to print in the same tick-value format as in .ini files. This allows .json files to describe a system as fully as the .ini files currently do. This patch adds a new function config_value (which mirrors ini_str) to each ParamValue and to SimObject. This function can then be explicitly changed to give different .json and .ini printing behaviour rather than being written in terms of ini_str.	2014-09-20 17:17:42 -04:00
Andreas Hansson	41fc8a573e	arch: Pass faults by const reference where possible This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unnecessary copies being created, contributing to the increment/decrement of the reference counters.	2014-09-19 10:35:18 -04:00
Andreas Hansson	619c5519fe	cpu: Use a deque in o3 rename instruction queue Switch from a list to a data structure with better data layout.	2014-09-19 10:35:14 -04:00
Andreas Hansson	586a219d11	base: Ensure the CP annotation compiles again A bit of revamping to get the CP annotate functionality to compile.	2014-09-19 10:35:12 -04:00
Andreas Hansson	efd5cf323a	misc: Use safe_cast when assumptions are made about return value This patch changes two dynamic_cast to safe_cast as we assume the return value is not NULL (without checking).	2014-09-19 10:35:11 -04:00
Andreas Hansson	32c111eda4	misc: Restore ostream flags where needed This patch ensures we adhere to the normal ostream usage rules, and restore the flags after modifying them.	2014-09-19 10:35:09 -04:00
Andreas Hansson	addfd89dce	stats: Fix flow-control bug in Vector2D printing	2014-09-19 10:35:08 -04:00
Andreas Hansson	f615c4aeb0	misc: Remove assertions ensuring unsigned values >= 0	2014-09-19 10:35:07 -04:00
Andreas Hansson	377f081251	mem: Check return value of checkFunctional in SimpleMemory Simple fix to ensure we only iterate until we are done.	2014-09-19 10:35:06 -04:00
Andreas Hansson	38646d48eb	mem: Add checks to sendTimingReq in cache A small fix to ensure the return value is not ignored.	2014-09-19 10:35:04 -04:00
Nilay Vaish	2ccdfc547d	ruby: network: revert some of the changes from ad9c042dce54 The changeset ad9c042dce54 made changes to the structures under the network directory to use a map of buffers instead of vector of buffers. The reasoning was that not all vnets that are created are used and we needlessly allocate more buffers than required and then iterate over them while processing network messages. But the move to map resulted in a slow down which was pointed out by Andreas Hansson. This patch moves things back to using vector of message buffers.	2014-09-15 16:19:38 -05:00
Andrew Bardsley	1a45a8c5d3	cpu: Fix memory access in Minor not setting parent Request flags This patch fixes cases where uncacheable/memory type flags are not set correctly on a memory op which is split in the LSQ. Without this patch, request->request is freely used to check flags where the flags should actually come from the accumulation of request fragment flags. This patch also fixes a bug where an uncacheable access which passes through tryToSendRequest more than once can increment LSQ::numAccessesInMemorySystem more than once.	2014-09-12 10:22:49 -04:00
Andrew Bardsley	c8b919aba2	style: Fix line continuation, especially in debug messages This patch closes a number of space gaps in debug messages caused by the incorrect use of line continuation within strings. (There's also one consistency change to a similar, but correct, use of line continuation)	2014-09-12 10:22:47 -04:00
Andreas Hansson	2b4906fc64	minor: Fix typo in DPRINTF for Minor branch prediction	2014-09-12 10:22:46 -04:00
Andreas Sandberg	53a24b01ab	sim: Automatically unregister probe listeners The ProbeListener base class automatically registers itself with a probe manager. Currently, the class does not unregister itself when it is destroyed, which makes removing probe listeners somewhat cumbersome. This patch adds an automatic call to manager->removeListener in the ProbeListener destructor, which solves the problem.	2014-09-09 04:36:43 -04:00
Geoffrey Blake	b0e4de667a	config: Fix vectorparam command line parsing Parsing vectorparams from the command line was slightly broken in that it wouldn't accept the input that the help message provided to the user and it didn't do the conversion on the second code path used to convert the string input to the actual internal representation. This patch fixes these bugs.	2014-09-09 04:36:34 -04:00
Mitch Hayenga	cd1bd7572a	cpu: Only iterate over possible threads on the o3 cpu Some places in O3 always iterated over "Impl::MaxThreads" even if a CPU had fewer threads. This removes a few of those instances.	2014-09-09 04:36:34 -04:00
Mitch Hayenga	9a595fac74	mem: Add accessor function for vaddr Determine if a request has an associated virtual address.	2014-09-09 04:36:33 -04:00
Andreas Sandberg	11494c4345	sim: Fix resource leak in BaseGlobalEvent Static analysis revealed that BaseGlobalEvent::barrier was never deallocated. This changeset solves this leak by making the barrier allocation a part of the BaseGlobalEvent instead of storing a pointer to a separate heap-allocated barrier.	2014-09-09 04:36:32 -04:00
Andreas Hansson	da4539dc74	misc: Fix a number of uninitialised variables and members Static analysis unearthed a bunch of uninitialised variables and members, and this patch addresses the problem. In all cases these omissions seem benign in the end, but at least fixing them means fewer false positives next time round.	2014-09-09 04:36:31 -04:00
Ali Saidi	346fe73370	dev: separate legacy io offsets from PCI offset The PC platform has a single IO range that is used for both legacy IO and PCI IO while other platforms may use separate regions. Provide another mechanism to configure the legacy IO base address range and set it to the PCI IO address range for x86.	2014-09-03 07:43:06 -04:00
Ali Saidi	1c0ae90027	arm: Support >2GB of memory for AArch64 systems	2014-09-03 07:43:05 -04:00
Ali Saidi	1e13f1b074	dev, arm: Add support for linux generic pci host driver This change adds support for a generic pci host bus driver that has been included in recent Linux kernel instead of the more bespoke one we've been using to date. It also works with aarch64 so it provides PCI support for 64-bit ARM Linux. To make this work a new configuration option pci_io_base is added to the RealView platform that should be set to the start of the memory used as memory mapped IO ports (IO ports that are memory mapped, not regular memory mapped IO). And a parameter pci_cfg_gen_offsets which specifies if the config space offsets should be used that the generic driver expects. To use the pci-host-generic device you need to: pci_io_base = 0x2f000000 (Valid for VExpress EMM) pci_cfg_gen_offsets = True and add the following to your device tree: pci { compatible = "pci-host-ecam-generic"; device_type = "pci"; #address-cells = <0x3>; #size-cells = <0x2>; #interrupt-cells = <0x1>; //bus-range = <0x0 0x1>; // CPU_PHYSICAL(2) SIZE(2) // Note, some DTS blobs only support 1 size reg = <0x0 0x30000000 0x0 0x10000000>; // IO (1), no bus address (2), cpu address (2), size (2) // MMIO (1), at address (2), cpu address (2), size (2) ranges = <0x01000000 0x0 0x00000000 0x0 0x2f000000 0x0 0x10000>, <0x02000000 0x0 0x40000000 0x0 0x40000000 0x0 0x10000000>; // With gem5 we typically use INTA/B/C/D one per device interrupt-map = <0x0000 0x0 0x0 0x1 0x1 0x0 0x11 0x1 0x0000 0x0 0x0 0x2 0x1 0x0 0x12 0x1 0x0000 0x0 0x0 0x3 0x1 0x0 0x13 0x1 0x0000 0x0 0x0 0x4 0x1 0x0 0x14 0x1>; // Only match INTA/B/C/D and not BDF interrupt-map-mask = <0x0000 0x0 0x0 0x7>; };	2014-09-03 07:43:04 -04:00
Geoffrey Blake	31e4e475d9	config: Add port splicing capability to PortRef class The new configuration scripts need the ability to splice a simobject between a pair of ports that are already connected. The primary use case is when a CommMonitor needs to be created after the system is configured and then spliced between the pair of ports it will monitor.	2014-09-03 07:43:03 -04:00
Geoffrey Blake	845e199934	config: Refactor RealviewEMM to fit into new config system This eliminates some default devices and adds in helper functions to connect the devices defined here to associate with the proper clock domains.	2014-09-03 07:43:01 -04:00
Andreas Hansson	83a46bfc09	base: Use STL C++11 random number generation This patch changes the random number generator from the in-house Mersenne twister to an implementation relying entirely on C++11 STL. The format for the checkpointing of the twister is simplified. As the functionality was never used this should not matter. Note that this patch does not actually make use of the checkpointing functionality. As the random number generator is not thread safe, it may be sensible to create one generator per thread, system, or even object. Until this is decided the status quo is maintained in that no generator state is part of the checkpoint.	2014-09-03 07:42:55 -04:00
Andreas Hansson	2698e73966	base: Use the global Mersenne twister throughout This patch tidies up random number generation to ensure that it is done consistently throughout the code base. In essence this involves a clean-up of Ruby, and some code simplifications in the traffic generator. As part of this patch a bunch of skewed distributions (off-by-one etc) have been fixed. Note that a single global random number generator is used, and that the object instantiation order will impact the behaviour (the sequence of numbers will be unaffected, but if module A calls random before module B then they would obviously see a different outcome). The dependency on the instantiation order is true in any case due to the execution-model of gem5, so we leave it as is. Also note that the global random generator is not thread safe at this point. Regressions using the memtest, TrafficGen or any Ruby tester are affected and will be updated accordingly.	2014-09-03 07:42:54 -04:00
Andreas Hansson	1ff4c45bbb	mem: Avoid unnecessary retries when bus peer is not ready This patch removes unnecessary retries that happened when the bus layer itself was no longer busy, but the peer was not yet ready. Instead of sending a retry that will inevitably not succeed, the bus now silently waits until the peer sends a retry.	2014-09-03 07:42:53 -04:00
Mitch Hayenga	8f95144e16	arm: Make memory ops work on 64bit/128-bit quantities Multiple instructions assume only 32-bit load operations are available, this patch increases load sizes to 64-bit or 128-bit for many load pair and load multiple instructions.	2014-09-03 07:42:52 -04:00
Curtis Dunham	f6f63ec0aa	mem: write streaming support via WriteInvalidate promotion Support full-block writes directly rather than requiring RMW: * a cache line is allocated in the cache upon receipt of a WriteInvalidateReq, not the WriteInvalidateResp. * only top-level caches allocate the line; the others just pass the request along and invalidate as necessary. * to close a timing window between the Req and the Resp, a new metadata bit tracks whether another cache has read a copy of the new line before the writeback to memory.	2014-06-27 12:29:00 -05:00
Andreas Hansson	3be4f4b846	mem: Fix a bug in the cache port flow control This patch fixes a bug in the cache port where the retry flag was reset too early, allowing new requests to arrive before the retry was actually sent, but with the event already scheduled. This caused a deadlock in the interactions with the O3 LSQ. The patch fixes the underlying issue by shifting the resetting of the flag to be done by the event that also calls sendRetry(). The patch also tidies up the flow control in recvTimingReq and ensures that we also check if we already have a retry outstanding.	2014-09-03 07:42:50 -04:00
Curtis Dunham	5d029463ee	cpu, mem: Make software prefetches non-blocking Previously, they were treated so much like loads that they could stall at the head of the ROB. Now they are always treated like L1 hits. If they actually miss, a new request is created at the L1 and tracked from the MSHRs there if necessary (i.e. if it didn't coalesce with an existing outstanding load).	2014-05-13 12:20:49 -05:00
Curtis Dunham	e3b19cb294	mem: Refactor assignment of Packet types Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function.	2014-05-13 12:20:48 -05:00
Mitch Hayenga	afbae1ec95	x86: Flag instructions that call suspend as IsQuiesce The o3 cpu relies upon instructions that suspend a thread context being flagged as "IsQuiesce". If they are not, unpredictable behavior can occur. This patch fixes that for the x86 ISA.	2014-09-03 07:42:46 -04:00
Mitch Hayenga	659bdc1a6b	cpu: Fix o3 drain bug For X86, the o3 CPU would get stuck with the commit stage not being drained if an interrupt arrived while drain was pending. isDrained() makes sure that pcState.microPC() == 0, thus ensuring that we are at an instruction boundary. However, when we take an interrupt we execute: pcState.upc(romMicroPC(entry)); pcState.nupc(romMicroPC(entry) + 1); tc->pcState(pcState); As a result, the MicroPC is no longer zero. This patch ensures the drain is delayed until no interrupts are present. Once draining, non-synchronous interrupts are deferred until after the switch.	2014-09-03 07:42:45 -04:00
Mitch Hayenga	bb1e6cf7c4	arm: Fix v8 neon latency issue for loads/stores Neon memory ops that operate on multiple registers currently have very poor performance because of interleave/deinterleave micro-ops. This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such that they take minimum cycles to execute and are never resource constrained. Additionally the micro-ops over-read registers. Although one form may need to read up to 20 sources, not all do. This adds in new forms so false dependencies are not modeled. Instructions read their minimum number of sources.	2014-09-03 07:42:44 -04:00
Curtis Dunham	4a3f11149d	arm: use condition code registers for ARM ISA Analogous to ee049bf (for x86). Requires a bump of the checkpoint version and corresponding upgrader code to move the condition code register values to the new register file.	2014-04-29 16:05:02 -05:00
Andrew Bardsley	035a82ee2c	arm: ISA X31 destination register fix This patch substituted the zero register for X31 used as a destination register. This prevents false dependencies based on X31.	2014-09-03 07:42:43 -04:00
Dam Sunwoo	5008a20aa4	cpu: fix bimodal predictor to use correct global history reg A small bug in the bimodal predictor caused significant degradation in performance on some benchmarks. This was caused by using the wrong globalHistoryReg during the update phase. This patch fixes the bug and brings the performance to normal level.	2014-09-03 07:42:41 -04:00
Mitch Hayenga	476c6fe368	arm: Mark v7 cbz instructions as direct branches v7 cbz/cbnz instructions were improperly marked as indirect branches.	2014-09-03 07:42:40 -04:00
Mitch Hayenga	4f13f676aa	cpu: Fix cache blocked load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior.	2014-09-03 07:42:39 -04:00
Mitch Hayenga	283935a6f0	cpu: Fix o3 quiesce fetch bug O3 is supposed to stop fetching instructions once a quiesce is encountered. However due to a bug, it would continue fetching instructions from the current fetch buffer. This is because of a break statement that only broke out of the first of 2 nested loops. It should have broken out of both.	2014-09-03 07:42:38 -04:00
Mitch Hayenga	4f26bedc18	cpu: Fix SMT scheduling issue with the O3 cpu The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode. This is because it maintained an independent priority list of threads from the active thread list. This priority list could become stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads. Additionally the fetch queue is now forcibly flushed of instructions from the de-scheduled thread. Relevant output: 24557000: system.cpu: [tid:1]: Calling deactivate thread. 24557000: system.cpu: [tid:1]: Removing from active threads list 24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU. 24557500: system.cpu.fetch: Running stage. 24557500: system.cpu.fetch: Attempting to fetch from [tid:1]	2014-09-03 07:42:37 -04:00
Mitch Hayenga	daedc5a491	cpu: Fix incorrect speculative branch predictor behavior When a branch mispredicted gem5 would squash all history after and including the mispredicted branch. However, the mispredicted branch is still speculative and its history is required to rollback state if another, older, branch mispredicts. This leads to things like RAS corruption.	2014-09-03 07:42:36 -04:00
Mitch Hayenga	ecd5300971	cpu: Add a fetch queue to the o3 cpu This patch adds a fetch queue that sits between fetch and decode to the o3 cpu. This effectively decouples fetch from decode stalls allowing it to be more aggressive, running further ahead in the instruction stream.	2014-09-03 07:42:35 -04:00
Mitch Hayenga	1716749c8c	cpu: Fix o3 front-end pipeline interlock behavior The o3 pipeline interlock/stall logic is incorrect. o3 unnecessarily stalled fetch and decode due to later stages in the pipeline. In general, a stage should usually only consider if it is stalled by the adjacent, downstream stage. Forcing stalls due to later stages results in bubbles in the pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename) on a branch mispredict while the ROB is being serially walked to update the RAT (robSquashing). It should only have stalled at rename.	2014-09-03 07:42:34 -04:00
Mitch Hayenga	976f27487b	cpu: Change writeback modeling for outstanding instructions As highlighted on the mailing list, gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit().	2014-09-03 07:42:33 -04:00
Mitch Hayenga	fd722946dd	arch: Properly guess OpClass from optional StaticInst flags isa_parser.py guesses the OpClass if none were given based upon the StaticInst flags. The existing code does not take into account optionally set flags. This code hoists the setting of optional flags so OpClass is properly assigned.	2014-09-03 07:42:32 -04:00
Geoffrey Blake	b404ffde60	cache: Fix handling of LL/SC requests under contention If a set of LL/SC requests contend on the same cache block we can get into a situation where CPUs will deadlock if they expect a failed SC to supply them data. This case happens where 3 or more cores are contending for a cache block using LL/SC and the system is configured where 2 cores are connected to a local bus and the third is connected to a remote bus. If a core on the local bus sends an SCUpgrade and the core on the remote bus sends an SCUpgrade they will race to see who will win the SC access. In the meantime if the other core appends a read to one of the SCUpgrades it will expect to be supplied data by that SCUpgrade transaction. If it happens that the SCUpgrade that was picked to supply the data is failed, it will drop the appended request for data and never respond, leaving the requesting core to deadlock. This patch makes all SC's behave as normal stores to prevent this case but still makes sure to check whether it can perform the update.	2014-09-03 07:42:31 -04:00
Curtis Dunham	12210ada54	arm: support 16kb vm granules	2014-05-27 11:00:56 -05:00
Andreas Hansson	77c28cc395	mem: Packet queue clean up No change in functionality, just a bit of tidying up.	2014-09-03 07:42:28 -04:00
Mitch Hayenga	71769d2d7b	dev: Avoid invalid sized reads in PL390 with DPRINTF enabled The first DPRINTF() in PL390::writeDistributor always read a uint32_t, though a packet may have only been 1 or 2 bytes. This caused an assertion in packet->get().	2014-09-03 07:42:27 -04:00
Andrew Bardsley	87f6034462	sim: Fix checkpoint restore for Ticked This patch makes restoring the 'lastStopped' value for Ticked-containing objects (including MinorCPU) optional so that Ticked-containing objects can be restored from non-Ticked-containing objects (such as AtomicSimpleCPU).	2014-09-03 07:42:25 -04:00
Andreas Sandberg	326662b01b	arch, cpu: Factor out the ExecContext into a proper base class We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models. The main argument for using one set of ISA code per CPU model has always been performance as this avoids indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%.	2014-09-03 07:42:22 -04:00
Andreas Hansson	e1ac962939	arch: Cleanup unused ISA traits constants This patch prunes unused values, and also unifies how the values are defined (not using an enum for ALPHA), aligning the use of int vs Addr etc. The patch also removes the duplication of PageBytes/PageShift and VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical values and the latter has been removed.	2014-09-03 07:42:21 -04:00
Mitch Hayenga	23c8540756	config: Change parsing of Addr so hex values work from scripts When passed from a configuration script with a hexadecimal value (like "0x80000000"), gem5 would error out. This is because it would call "toMemorySize" which requires the argument to end with a size specifier (like 1MB, etc). This modification makes it so raw hex values can be passed through Addr parameters from the configuration scripts.	2014-09-03 07:42:20 -04:00
Andreas Hansson	1046b8d6e5	arm: Fix ExtMachInst hash operator underlying type This patch fixes the hash operator used for ARM ExtMachInst, which incorrectly was still using uint32_t. Instead of changing it to uint64_t it is now using the underlying data type of the BitUnion.	2014-09-03 07:42:19 -04:00
Nilay Vaish	2cbe7c705b	ruby: remove typedef of Index as int64 The Index type defined as typedef int64 does not really provide any help since in most places we use primitive types instead of Index. Also, the name Index is very generic that it does not merit being used as a typename.	2014-09-01 16:55:50 -05:00
Nilay Vaish	4ccdf8fb81	x86: set op class of two fp instructions This patch sets op class of two fp instructions: movfp and pop x87 stack as IntAluOp since these instructions do not make use of the fp alu.	2014-09-01 16:55:49 -05:00
Nilay Vaish	b4dade6fb2	ruby: PerfectSwitch: moves code to a per vnet helper function This patch moves code from the wakeup() function to a new operateVnet() function. The aim is to improve the readability of the code.	2014-09-01 16:55:48 -05:00
Nilay Vaish	7a0d5aafe4	ruby: message buffers: significant changes This patch is the final patch in a series of patches. The aim of the series is to make ruby more configurable than it was. More specifically, the connections between controllers are not at all possible (unless one is ready to make significant changes to the coherence protocol). Moreover the buffers themselves are magically connected to the network inside the slicc code. These connections are not part of the configuration file. This patch makes changes so that these connections will now be made in the python configuration files associated with the protocols. This requires each state machine to expose the message buffers it uses for input and output. So, the patch makes these buffers configurable members of the machines. The patch drops the slicc code that used to connect these buffers to the network. Now these buffers are exposed to the python configuration system as Master and Slave ports. In the configuration files, any master port can be connected to any slave port. The file pyobject.cc has been modified to take care of allocating the actual message buffer. This is in line with how other port connections work.	2014-09-01 16:55:47 -05:00
Nilay Vaish	00286fc5cb	build opts: add MI_example to NULL ISA A later changeset changes the file src/python/swig/pyobject.cc to include a header file that includes a header file generated at build time depending on the PROTOCOL in use. Since NULL ISA was not specifying any protocol, this resulted in compilation problems. Hence, the changeset.	2014-09-01 16:55:46 -05:00
Nilay Vaish	d07abd9b5b	mem: change the namespace Message to ProtoMessage The namespace Message conflicts with the Message data type used extensively in Ruby. Since Ruby is being moved to the same Master/Slave ports based configuration style as the rest of gem5, this conflict needs to be resolved. Hence, the namespace is being renamed to ProtoMessage.	2014-09-01 16:55:46 -05:00
Nilay Vaish	cee8faaad0	ruby: slicc: change the way configurable members are specified There are two changes this patch makes to the way configurable members of a state machine are specified in SLICC. The first change is that the data member declarations will need to be separated by a semi-colon instead of a comma. Secondly, the default value to be assigned would now use SLICC's assignment operator i.e. ':='.	2014-09-01 16:55:45 -05:00
Nilay Vaish	b1d3873ec5	ruby: slicc: improve the grammar This patch changes the grammar for SLICC so as to remove some of the redundant / duplicate rules. In particular rules for object/variable declaration and class member declaration have been unified. Similarly, the rules for a general function and a class method have been unified. One more change is in the priority of two rules. The first rule is on declaring a function with all the params typed and named. The second rule is on declaring a function with all the params only typed. Earlier the second rule had a higher priority. Now the first rule has a higher priority.	2014-09-01 16:55:44 -05:00
Nilay Vaish	3202ec98e7	ruby: mesi three level: slight naming changes.	2014-09-01 16:55:44 -05:00
Nilay Vaish	557200725c	ruby: slicc: do not prefix machine name to variables This changeset does away with prefixing of member variables of state machines with the identity of the machine itself.	2014-09-01 16:55:43 -05:00
Nilay Vaish	6ceb1aadc2	ruby: remove unused toString() from AbstractController	2014-09-01 16:55:42 -05:00
Nilay Vaish	00dbadcbb0	ruby: network: move getNumNodes() to base class All the implementations were doing the same things.	2014-09-01 16:55:42 -05:00
Nilay Vaish	cc2cc58869	ruby: eliminate type Time There is another type Time in src/base class which results in a conflict.	2014-09-01 16:55:41 -05:00
Nilay Vaish	82d136285d	ruby: move files from ruby/system to ruby/structures The directory ruby/system is crowded and unorganized. Hence, the files the hold actual physical structures, are being moved to the directory ruby/structures. This includes Cache Memory, Directory Memory, Memory Controller, Wire Buffer, TBE Table, Perfect Cache Memory, Timer Table, Bank Array. The directory ruby/systems has the glue code that holds these structures together. --HG-- rename : src/mem/ruby/system/MachineID.hh => src/mem/ruby/common/MachineID.hh rename : src/mem/ruby/buffers/MessageBuffer.cc => src/mem/ruby/network/MessageBuffer.cc rename : src/mem/ruby/buffers/MessageBuffer.hh => src/mem/ruby/network/MessageBuffer.hh rename : src/mem/ruby/buffers/MessageBufferNode.cc => src/mem/ruby/network/MessageBufferNode.cc rename : src/mem/ruby/buffers/MessageBufferNode.hh => src/mem/ruby/network/MessageBufferNode.hh rename : src/mem/ruby/system/AbstractReplacementPolicy.hh => src/mem/ruby/structures/AbstractReplacementPolicy.hh rename : src/mem/ruby/system/BankedArray.cc => src/mem/ruby/structures/BankedArray.cc rename : src/mem/ruby/system/BankedArray.hh => src/mem/ruby/structures/BankedArray.hh rename : src/mem/ruby/system/Cache.py => src/mem/ruby/structures/Cache.py rename : src/mem/ruby/system/CacheMemory.cc => src/mem/ruby/structures/CacheMemory.cc rename : src/mem/ruby/system/CacheMemory.hh => src/mem/ruby/structures/CacheMemory.hh rename : src/mem/ruby/system/DirectoryMemory.cc => src/mem/ruby/structures/DirectoryMemory.cc rename : src/mem/ruby/system/DirectoryMemory.hh => src/mem/ruby/structures/DirectoryMemory.hh rename : src/mem/ruby/system/DirectoryMemory.py => src/mem/ruby/structures/DirectoryMemory.py rename : src/mem/ruby/system/LRUPolicy.hh => src/mem/ruby/structures/LRUPolicy.hh rename : src/mem/ruby/system/MemoryControl.cc => src/mem/ruby/structures/MemoryControl.cc rename : src/mem/ruby/system/MemoryControl.hh => src/mem/ruby/structures/MemoryControl.hh rename : 
src/mem/ruby/system/MemoryControl.py => src/mem/ruby/structures/MemoryControl.py rename : src/mem/ruby/system/MemoryNode.cc => src/mem/ruby/structures/MemoryNode.cc rename : src/mem/ruby/system/MemoryNode.hh => src/mem/ruby/structures/MemoryNode.hh rename : src/mem/ruby/system/MemoryVector.hh => src/mem/ruby/structures/MemoryVector.hh rename : src/mem/ruby/system/PerfectCacheMemory.hh => src/mem/ruby/structures/PerfectCacheMemory.hh rename : src/mem/ruby/system/PersistentTable.cc => src/mem/ruby/structures/PersistentTable.cc rename : src/mem/ruby/system/PersistentTable.hh => src/mem/ruby/structures/PersistentTable.hh rename : src/mem/ruby/system/PseudoLRUPolicy.hh => src/mem/ruby/structures/PseudoLRUPolicy.hh rename : src/mem/ruby/system/RubyMemoryControl.cc => src/mem/ruby/structures/RubyMemoryControl.cc rename : src/mem/ruby/system/RubyMemoryControl.hh => src/mem/ruby/structures/RubyMemoryControl.hh rename : src/mem/ruby/system/RubyMemoryControl.py => src/mem/ruby/structures/RubyMemoryControl.py rename : src/mem/ruby/system/SparseMemory.cc => src/mem/ruby/structures/SparseMemory.cc rename : src/mem/ruby/system/SparseMemory.hh => src/mem/ruby/structures/SparseMemory.hh rename : src/mem/ruby/system/TBETable.hh => src/mem/ruby/structures/TBETable.hh rename : src/mem/ruby/system/TimerTable.cc => src/mem/ruby/structures/TimerTable.cc rename : src/mem/ruby/system/TimerTable.hh => src/mem/ruby/structures/TimerTable.hh rename : src/mem/ruby/system/WireBuffer.cc => src/mem/ruby/structures/WireBuffer.cc rename : src/mem/ruby/system/WireBuffer.hh => src/mem/ruby/structures/WireBuffer.hh rename : src/mem/ruby/system/WireBuffer.py => src/mem/ruby/structures/WireBuffer.py rename : src/mem/ruby/recorder/CacheRecorder.cc => src/mem/ruby/system/CacheRecorder.cc rename : src/mem/ruby/recorder/CacheRecorder.hh => src/mem/ruby/system/CacheRecorder.hh	2014-09-01 16:55:40 -05:00
Alexandru	5efbb4442a	mem: adding architectural page table support for SE mode This patch enables the use of page tables that are stored in system memory and respect the x86 specification, in SE mode. It defines an architectural page table for x86 as a MultiLevelPageTable class and puts a placeholder class in place for other ISAs' page tables, leaving room for future implementations.	2014-08-28 10:11:44 -05:00
Alexandru	26ac28dec2	mem: adding a multi-level page table class This patch defines a multi-level page table class that stores the page table in system memory, consistent with ISA specifications. In this way, cpu models that use the actual hardware to execute (e.g. KvmCPU), are able to traverse the page table.	2014-04-01 12:18:12 -05:00
Andreas Hansson	9e4cd5bf1e	mem: Fix DRAMSim2 cycle check when restoring from checkpoint This patch ensures the cycle check is still valid even when restoring from a checkpoint. In this case the DRAMSim2 cycle count is relative to the startTick rather than 0.	2014-08-26 10:14:38 -04:00
Andreas Hansson	6fa8015b7f	base: Add const to intmath and be more flexible with typing This patch ensures the functions can be used on const variables.	2014-08-26 10:14:32 -04:00
Andreas Sandberg	70176fecd1	base: Replace the internal varargs stuff with C++11 constructs We currently use our own home-baked support for type-safe variadic functions. This is confusing and somewhat limited (e.g., cprintf only supports a limited number of arguments). This changeset converts all uses of our internal varargs support to use C++11 variadic macros.	2014-08-26 10:13:45 -04:00
Andreas Sandberg	f3e5fee743	base: Add compiler macros for C++11 final/override Add the macros M5_ATTR_FINAL and M5_ATTR_OVERRIDE which are defined to final and override respectively if supported by the compiler. This is done to allow a smooth transition to gcc >= 4.7.	2014-08-26 10:13:33 -04:00
Mitch Hayenga	0da99b7e0c	mips: Fix RLIMIT_RSS naming MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with RLIMIT_RSS from the host system. Broke clang+MacOS build.	2014-08-26 10:13:31 -04:00
Andreas Sandberg	61b8d5e4e4	base: Add a static assert to check bit union ranges If a bit field in a bit union specified as Bitfield<LSB, MSB> instead of Bitfield<MSB, LSB> the code silently fails and the field is read as zero. This changeset introduces a static assert that tests, at compile time, that the bit order is correct.	2014-08-26 10:13:28 -04:00
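The failure mode described in 61b8d5e4e4 can be sketched in Python. This is an illustration only: `extract_bits` is a hypothetical helper, not gem5's `Bitfield` template, and a runtime check stands in for the compile-time static assert.

```python
def extract_bits(value, msb, lsb):
    """Return bits msb..lsb (inclusive) of value; requires msb >= lsb."""
    # Stand-in for the static assert: reject swapped bounds up front
    # instead of silently computing a meaningless field.
    if msb < lsb:
        raise ValueError("bit range must be given as (msb, lsb)")
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)
```

With the check in place, a range written as (LSB, MSB) fails loudly rather than reading back as a wrong value.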
Andreas Sandberg	a3d3eb0ff7	sparc: Fixup bit ordering in the PSTATE bit union The order of the MSB and LSB bit of the mm field in the PSTATE union is wrong. Any access to this field will currently be ignored and reads will always return zero. This patch fixes the ordering so it is <MSB, LSB> instead of <LSB, MSB>.	2014-08-26 10:13:23 -04:00
Andreas Hansson	3efabb4b2f	mem: Update DRAM controller comments Update comments and add a reference for more information.	2014-08-26 10:13:03 -04:00
Andreas Hansson	56b7796e0d	mem: Fix address interleaving bug in DRAM controller This patch fixes a bug in the DRAM controller address decoding. In cases where the DRAM burst size (e.g. 32 bytes in a rank with a single LPDDR3 x32) was smaller than the channel interleaving size (e.g. systems with a 64-byte cache line) one address bit effectively got used as a channel bit when it should have been a low-order column bit. This patch adds a notion of "columns per stripe", and more clearly deals with the low-order column bits and high-order column bits. The patch also relaxes the granularity check such that it is possible to use interleaving granularities other than the cache line size. The patch also adds a missing M5_CLASS_VAR_USED to the tCK member as it is only used in the debug build for now.	2014-08-26 10:12:45 -04:00
Curtis Dunham	04d1f61ae8	sim: bump checkpoint version for multiple event queues This patch adds a fix for older checkpoints created before support for multiple event queues was added in changeset 2cce74fe359e. The change in checkpoint version should really have been part of the aforementioned changeset.	2014-02-05 16:17:41 -06:00
Dam Sunwoo	b04d6c7c33	arm: change MISCREG_L2ERRSR to warn not fail Some newer binaries compiled for Versatile Express TC2 contain access to implementation specific L2MERRSR registers. This causes an infinite loop of undefined exceptions. This patch changes the behavior to "warn not fail" to keep the workloads going.	2014-08-13 06:57:36 -04:00
Dam Sunwoo	74a4926fe0	sim: remove kernel mapping check for baremetal workloads Baremetal workloads are specified using the "kernel" parameter, but don't always have the correct address mappings. This patch adds a boolean flag to the system and bypasses the kernel addr mapping checks when running in baremetal mode.	2014-08-13 06:57:35 -04:00
Andreas Sandberg	41d069ef6a	scons: Build the branch predictor for all CPUs The branch predictor is normally only built when a CPU that uses a branch predictor is built. The list of CPUs is currently incomplete as the simple CPUs support branch predictors (for warming, branch stats, etc). In practice, all CPU models now use branch predictors, so this changeset removes the CPU model check and replaces it with a check for the NULL ISA.	2014-08-13 06:57:31 -04:00
Andreas Sandberg	8b8d991df0	mips: Remove unused private members to fix compile-time warning Certain versions of clang complain about unused private members if they are not used. This changeset removes such members from the MIPS-specific classes to silence the warning.	2014-08-13 06:57:30 -04:00
Andreas Sandberg	8d04e32a83	power: Remove unused private members to fix compile-time warning Certain versions of clang complain about unused private members if they are not used. This changeset removes such members from the POWER-specific ProcessInfo struct to silence the warning.	2014-08-13 06:57:29 -04:00
Andreas Sandberg	6b908211e6	scons: Silence clang 3.4 warnings on Ubuntu 12.04 This changeset fixes three types of warnings that occur in clang 3.4 on Ubuntu 12.04: * Certain versions of libstdc++ (primarily 4.8) use struct and class interchangeably. This triggers a warning in clang. * Swig has a tendency to generate code with the register class which was deprecated in C++11. This triggers a deprecation warning in clang. * Swig sometimes generates Python wrapper code which returns uninitialized values. It's unclear if this is actually a problem (the cases might be limited to failure paths). We'll silence these warnings for now since there is little we can do about the generated code.	2014-08-13 06:57:28 -04:00
Andreas Sandberg	eb9226317d	base: Remove unused M5_PRAGMA_NORETURN The M5_PRAGMA_NORETURN macro was only used for __exit_message. Since the macro only holds a stub definition and all functions with noreturn semantics use M5_ATTR_NORETURN, this macro is completely redundant.	2014-08-13 06:57:27 -04:00
Andreas Sandberg	25f5a6733c	cpu: Don't forward declare RefCountingPtr RefCountingPtr is sometimes forward declared to avoid having to include refcnt.hh. This does not work since we typically return instances of RefCountingPtr rather than references to instances. The only reason this currently works is that we include refcnt.hh in cprintf.hh, which "leaks" the header to most other source files. This changeset replaces such forward declarations with an include of refcnt.hh.	2014-08-13 06:57:26 -04:00
Mitch Hayenga	f6f6ae461e	mem: Properly set cache block status fields on writebacks When a cacheline is written back to a lower-level cache, tags->insertBlock() sets various status parameters. However these status bits were cleared immediately after calling. This patch makes it so that these status fields are not cleared by moving them outside of the tags->insertBlock() call.	2014-08-13 06:57:24 -04:00
Andreas Hansson	66904b9584	cpu: Modernise the branch predictor (STL and C++11) This patch does some minor housekeeping of the branch predictor by adopting STL containers, and shifting some iterators to use range-based for loops. The predictor history is also changed from a list to a deque as we never do insertion/deletion other than at the front and back.	2014-08-13 06:57:21 -04:00
Curtis Dunham	94daae6864	arm: remove dead code fplib mul64x64	2014-03-11 09:50:02 -05:00
Geoffrey Blake	dbdce42b88	config: Add SubSystem container for simobjects This patch adds the SubSystem container for grouping simobjects together in logical subsystems to facilitate building a larger system from constituent parts. The container is simply a non-abstract empty simobject to hold the components that will be connected as its children. In simulation the object does not participate, its only use is during configuration of the system.	2014-08-10 05:39:16 -04:00
Geoffrey Blake	09b5003815	config: Add hooks to enable new config sys This patch adds helper functions to SimObject.py, params.py and simulate.py to enable the new configuration system. Functions like enumerateParams() in SimObject lets the config system auto-generate command line options for simobjects to be modified on the command line. Params in params.py have __call__() added to their definition to allow the argparse module to use them as a type to check command input is in the proper format.	2014-08-10 05:39:13 -04:00
Andreas Hansson	47313601c1	cpu: Ensure the traffic generator suppresses non-memory packets This patch adds a check to ensure that packets which are not going to a memory range are suppressed in the traffic generator. Thus, if a trace is collected in full-system, the packets destined for devices are not played back.	2014-08-10 05:39:04 -04:00
Andreas Hansson	d45ab59c29	base: Remove unused files A bit of pruning	2014-08-10 05:38:59 -04:00
Anthony Gutierrez	a628afedad	mem: refactor LRU cache tags and add random replacement tags this patch implements a new tags class that uses a random replacement policy. these tags prefer to evict invalid blocks first, if none are available a replacement candidate is chosen at random. this patch factors out the common code in the LRU class and creates a new abstract class: the BaseSetAssoc class. any set associative tag class must implement the functionality related to the actual replacement policy in the following methods: accessBlock() findVictim() insertBlock() invalidate()	2014-07-28 12:23:23 -04:00
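The victim selection described in a628afedad (prefer an invalid block, otherwise pick at random) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `find_victim` and the `valid_bits` list are hypothetical names and do not reproduce the gem5 tag classes or their `findVictim()` signature.

```python
import random

def find_victim(valid_bits, rng=random):
    """Pick the way to evict from one set.

    valid_bits[i] is True when way i currently holds a valid block.
    """
    # Prefer an invalid way, so no useful data is thrown away.
    for way, valid in enumerate(valid_bits):
        if not valid:
            return way
    # Every way is valid: fall back to a uniformly random choice.
    return rng.randrange(len(valid_bits))
```

Passing the random source as a parameter is a choice made for this sketch so that a seeded `random.Random` can be supplied for reproducible runs.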
Andrew Bardsley	0e8a90f06b	cpu: `Minor' in-order CPU model This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute). The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot). Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py. Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features. Minor is faster than the o3 model. Sample results, stat host_seconds (s), on ARM, opt, listed as simple timing / o3 timing / minor timing: 10.linux-boot 169 / 1883 / 1075; 10.mcf 117 / 967 / 491; 20.parser 668 / 6315 / 3146; 30.eon 542 / 3413 / 2414; 40.perlbmk 2339 / 20905 / 11532; 50.vortex 122 / 1094 / 588; 60.bzip2 2045 / 18061 / 9662; 70.twolf 207 / 2736 / 1036	2014-07-23 16:09:04 -05:00
Steve Reinhardt	06bb6a4731	syscall emulation: fix fast build issue Surprisingly gcc will complain about unused variables even inside an 'if (false)' block. I thought I had tested this previously, but apparently not.	2014-07-19 02:06:22 -07:00
Binh Pham	c99b13d904	x86: make PioBus return BadAddress errors Stop setting the use_default_range flag in PioBus in order to have random bad addresses result in a BadAddress response and not a gem5 fatal error. This is necessary in Ruby as Ruby is connected directly to PioBus, so misspeculated addresses will be sent there directly. For the classic memory system, this change has no effect, as bad addresses are caught by the memory bus before being sent to the PioBus. This work was done while Binh was an intern at AMD Research.	2014-07-18 22:05:51 -07:00
Steve Reinhardt	fe530648d5	sim: remove unused MemoryModeStrings array The System object has a static MemoryModeStrings array that's (1) unused and (2) redundant, since there's an auto-generated version in the Enums namespace. No point in leaving it in.	2014-07-18 22:05:51 -07:00
Steve Reinhardt	e3de6950a4	kern: get rid of unused linux syscall files	2014-07-18 22:05:51 -07:00
Steve Reinhardt	f5aace8300	syscall emulation: fix DPRINTF arg ordering bug When we switched getSyscallArg() from explicit arg indices to the implicit method, some DPRINTF arguments were left as calls to getSyscallArg(), even though C/C++ doesn't guarantee anything about the order of invocation of these calls. As a result, the args could be printed out in arbitrary orders. Interestingly, this bug has been around since 2009: http://repo.gem5.org/gem5/rev/4842482e1bd1	2014-07-18 22:05:51 -07:00
Anthony Gutierrez	59c8c454eb	base: fix operator== for comparing EthAddr objects this operator uses memcmp() to detect if two EthAddr objects have the same address, however memcmp() will return 0 if all bytes are equal. operator== returns the return value of memcmp() to indicate whether or not two addresses are equal. this is incorrect as it will always give the opposite of the intended behavior. this patch fixes that problem.	2014-07-09 09:28:15 -04:00
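The inverted comparison described in 59c8c454eb can be modelled in Python. Everything here is a hypothetical stand-in: `memcmp` imitates the C function's return convention (0 when equal, non-zero otherwise), and `eq_fixed` shows one plausible correction, not necessarily the exact change made in the patch.

```python
def memcmp(a, b):
    """Stand-in for C memcmp(): 0 if equal, else -1 or 1."""
    return (a > b) - (a < b)

def eq_buggy(a, b):
    # Uses the memcmp() result itself as the truth value, so equal
    # addresses (result 0) compare as "not equal".
    return bool(memcmp(a, b))

def eq_fixed(a, b):
    # Equality holds exactly when memcmp() reports no difference.
    return memcmp(a, b) == 0
```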
Anthony Gutierrez	3956ec0a89	base: fix some bugs in EthAddr per the IEEE 802 spec: 1) fixed broadcast() to ensure that all bytes are equal to 0xff. 2) fixed unicast() to ensure that bit 0 of the first byte is equal to 0 3) fixed multicast() to ensure that bit 0 of the first byte is equal to 1, and that it is not a broadcast. also the constructors in EthAddr are fixed so that all bytes of data are initialized.	2014-07-02 13:19:13 -04:00
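The three address checks listed in 3956ec0a89 can be sketched in Python. The function names below are hypothetical and operate on a 6-byte `bytes` value; they mirror the rules stated in the commit message rather than the gem5 `EthAddr` methods themselves.

```python
def is_broadcast(addr):
    # Broadcast: every byte of the address is 0xff.
    return all(byte == 0xFF for byte in addr)

def is_unicast(addr):
    # Unicast: bit 0 of the first byte is 0.
    return (addr[0] & 0x01) == 0

def is_multicast(addr):
    # Multicast: bit 0 of the first byte is 1, and not a broadcast.
    return (addr[0] & 0x01) == 1 and not is_broadcast(addr)
```

Under these rules each address falls into exactly one of the three classes.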
Radhika Jagtap	b998a0c6ac	util: Add DVFS perfLevel to checkpoint upgrade script This patch updates the checkpoint upgrader script. It adds the _perfLevel variable in the clock domain and voltage domain simObjects used for DVFS.	2014-07-01 11:58:22 -04:00
Stephan Diestelhorst	65cea4708e	power: Add basic DVFS support for gem5 Adds DVFS capabilities to gem5, by allowing users to specify lists for frequencies and voltages in SrcClockDomains and VoltageDomains respectively. A separate component, DVFSHandler, provides a small interface to change operating points of the associated domains. Clock domains will be linked to voltage domains and thus allow separate clock, but shared voltage lines. Currently all the valid performance-level updates are performed with a fixed transition latency as specified for the domain. Config file example: ... vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V']) tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz'] tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz'] tsys.cluster1.clk_domain.domain_id = 0 tsys.cluster2.clk_domain.domain_id = 1 tsys.cluster1.clk_domain.voltage_domain = vd tsys.cluster2.clk_domain.voltage_domain = vd tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain, tsys.cluster2.clk_domain] tsys.dvfs_handler.enable = True	2014-06-30 13:56:06 -04:00
Andreas Hansson	1f539ce4cc	mem: DRAMPower trace output This patch adds a DRAMPower flag to enable off-line DRAM power analysis using the DRAMPower tool. A new DRAMPower flag is added and a follow-on patch adds a Python script to post-process the output and order it based on time stamps. The long-term goal is to link DRAMPower as a library and provide the commands through function calls to the model rather than first printing and then parsing the commands. At the moment it is also up to the user to ensure that the same DRAM configuration is used by the gem5 controller model and DRAMPower.	2014-06-30 13:56:03 -04:00
Andreas Hansson	b4ce51eb9e	mem: Add bank and rank indices as fields to the DRAM bank This patch adds the index of the bank and rank as a field so that we can determine the identity of a given bank (reference or pointer) for the power tracing. We also grab the opportunity of cleaning up the arguments used for identifying the bank when activating.	2014-06-30 13:56:02 -04:00
Andreas Hansson	d59bc8ee1f	mem: Extend DRAM row bits from 16 to 32 for larger densities This patch extends the DRAM row bits to 32 to support larger density memories. Additional checks are also added to ensure the row fits in the 32 bits.	2014-06-30 13:56:01 -04:00
Anthony Gutierrez	f34a8f0d61	cpu: implement a bi-mode branch predictor	2014-06-30 13:50:03 -04:00
Binh Pham	b085db84af	x86: fix table walker assertion In a cycle, we could see R and W requests corresponding to the same page walk being sent to the memory. During the cycle in which the assertion fires, we have 2 responses corresponding to the R and W above. We also have a 'read' variable to keep track of the inflight Read request; this gets reset to NULL right after we send out any R request, and gets set to the next R in the page walk when a response comes back. The issue we are seeing here is that when we get a response for the W request, assert(!read) fires because we got a response for the R request right before this, and hence set 'read' to a non-NULL value pointing to the next R request in the page walk. This work was done while Binh was an intern at AMD Research.	2014-06-21 10:39:44 -07:00
Binh Pham	b72c879868	o3: make dispatch LSQ full check more selective Dispatch should not check LSQ size/LSQ stall for non load/store instructions. This work was done while Binh was an intern at AMD Research.	2014-06-21 10:26:55 -07:00
Binh Pham	0782d92286	o3: split load & store queue full cases in rename Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa. This work was done while Binh was an intern at AMD Research.	2014-06-21 10:26:43 -07:00
Andreas Hansson	fdb965f5c1	scons: Bump the compiler version to gcc 4.6 and clang 3.0 This patch bumps the supported version of gcc from 4.4 to 4.6, and clang from 2.9 to 3.0. This enables, amongst other things, range-based for loops, lambda expressions, etc. The STL implementation shipping with 4.6 also has a full functional implementation of unique_ptr and shared_ptr.	2014-06-10 17:44:39 -04:00
Joel Hestness	4f8ac94549	sim: More rigorous clocking comments The language describing the clockEdge and nextCycle functions was ambiguous, and so was prone to misinterpretation/misuse. Clear up the comments to more rigorously describe their functionality.	2014-06-09 22:01:16 -05:00
Steve Reinhardt	0be64ffe2f	style: eliminate equality tests with true and false Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code.	2014-05-31 18:00:23 -07:00
Nilay Vaish	e685767b58	ruby: slicc: remove unused ids DNUCA*	2014-05-23 06:07:02 -05:00
Nilay Vaish	9c9257a612	ruby: remove old protocol documentation	2014-05-23 06:07:02 -05:00
Nilay Vaish	8bf41e41c1	ruby: message buffer: drop dequeue_getDelayCycles() The functionality of updating and returning the delay cycles would now be performed by the dequeue() function itself.	2014-05-23 06:07:02 -05:00
Nilay Vaish	1e26b7ea29	cpu: o3: remove stat totalCommittedInsts This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded individually. The total would now be generated by summing these individual counts.	2014-05-23 06:07:02 -05:00
Steve Reinhardt	109908c2a6	syscall emulation: clean up & comment SyscallReturn	2014-05-12 14:23:31 -07:00
Andreas Hansson	f800f268db	mem: Update DDR3 and DDR4 based on datasheets This patch makes a more firm connection between the DDR3-1600 configuration and the corresponding datasheet, and also adds a DDR3-2133 and a DDR4-2400 configuration. At the moment there is also an ongoing effort to align the choice of datasheets to what is available in DRAMPower.	2014-05-09 18:58:49 -04:00
Andreas Hansson	cc4ca78f99	mem: Add DRAM cycle time This patch extends the current timing parameters with the DRAM cycle time. This is needed as the DRAMPower tool expects timestamps in DRAM cycles. At the moment we could get away with doing this in a post-processing step as the DRAMPower execution is separate from the simulation run. However, in the long run we want the tool to be called during the simulation, and then the cycle time is needed.	2014-05-09 18:58:49 -04:00
Andreas Hansson	8c56efe747	mem: Simplify DRAM response scheduling This patch simplifies the DRAM response scheduling based on the assumption that they are always returned in order.	2014-05-09 18:58:48 -04:00
Andreas Hansson	8e3869411d	mem: Add precharge all (PREA) to the DRAM controller This patch adds the basic ingredients for a precharge all operation, to be used in conjunction with DRAM power modelling. Currently we do not try and apply any cleverness when precharging all banks, thus even if only a single bank is open we use PREA as opposed to PRE. At the moment we only have a single tRP (tRPpb), and do not model the slightly longer all-bank precharge constraint (tRPab).	2014-05-09 18:58:48 -04:00
Andreas Hansson	0ba1e72e9b	mem: Remove printing of DRAM params This patch removes the redundant printing of DRAM params.	2014-05-09 18:58:48 -04:00
Andreas Hansson	6753cb705e	mem: Add tRTP to the DRAM controller This patch adds the tRTP timing constraint, governing the minimum time between a read command and a precharge. Default values are provided for the existing DRAM types.	2014-05-09 18:58:48 -04:00
Andreas Hansson	60799dc552	mem: Merge DRAM latency calculation and bank state update This patch merges the two control paths used to estimate the latency and update the bank state. As a result of this merging the computation is now in one place only, and should be easier to follow as it is all done in absolute (rather than relative) time. As part of this change, the scheduling is also refined to ensure that we look at a sensible estimate of the bank ready time in choosing the next request. The bank latency stat is removed as it ends up being misleading when the DRAM access code gets evaluated ahead of time (due to the eagerness of waking the model up for scheduling the next request).	2014-05-09 18:58:48 -04:00
Andreas Hansson	b8631d9ae8	mem: Add tWR to DRAM activate and precharge constraints This patch adds the write recovery time to the DRAM timing constraints, and changes the current tRASDoneAt to a more generic preAllowedAt, capturing when a precharge is allowed to take place. The part of the DRAM access code that accounts for the precharge and activate constraints is updated accordingly.	2014-05-09 18:58:48 -04:00
Andreas Hansson	c735ef6cb0	mem: Merge DRAM page-management calculations This patch treats the closed page policy as yet another case of auto-precharging, and thus merges the code with that used for the other policies.	2014-05-09 18:58:48 -04:00
Andreas Hansson	87f4c956c4	mem: Add DRAM power states to the controller This patch adds power states to the controller. These states and the transitions can be used together with the Micron power model. As a more elaborate use-case, the transitions can be used to drive the DRAMPower tool. At the moment, the power-down modes are not used, and this patch simply serves to capture the idle, auto refresh and active modes. The patch adds a third state machine that interacts with the refresh state machine.	2014-05-09 18:58:48 -04:00
Andreas Hansson	babf072c1c	mem: Ensure DRAM refresh respects timings This patch adds a state machine for the refresh scheduling to ensure that no accesses are allowed while the refresh is in progress, and that all banks are properly precharged. As part of this change, the precharging of banks is broken out into a method of its own, making it similar to how activations are dealt with. The idle accounting is also updated to ensure that the refresh duration is not added to the time that the DRAM is in the idle state with all banks precharged.	2014-05-09 18:58:48 -04:00
Andreas Hansson	5c2c3f598e	mem: Make DRAM read/write switching less conservative This patch changes the read/write event loop to use a single event (nextReqEvent), along with a state variable, thus joining the two control flows. This change makes it easier to follow the state transitions, and control what happens when. With the new loop we modify the overly conservative switching times such that the write-to-read switch allows bank preparation to happen in parallel with the bus turn around. Similarly, the read-to-write switch uses the introduced tRTW constraint.	2014-05-09 18:58:48 -04:00
Ali Saidi	dbaf43394b	arm: Make sure UndefinedInstructions are properly initialized	2014-04-17 16:56:09 -05:00
Ali Saidi	a00b44ebe8	arm: allow DC instructions by default so SE mode works	2014-04-17 16:55:54 -05:00
Ali Saidi	c4a2f76fea	sim, arm: implement more of the at variety syscalls Needed for new AArch64 binaries	2014-04-17 16:55:05 -05:00
Andrew Bardsley	f5c3f60601	cpu: Useful getters for ActivityRecorder Add some useful getters to ActivityRecorder	2014-05-09 18:58:48 -04:00
Andrew Bardsley	bf78299f04	cpu: Add flag name printing to StaticInst This patch adds the member function StaticInst::printFlags to allow all of an instruction's flags to be printed without using the individual is... member functions or resorting to exposing the 'flags' vector. It also replaces the enum definition StaticInst::Flags with a Python-generated enumeration and adds to the enum generation mechanism in src/python/m5/params.py to allow Enums to be placed in namespaces other than Enums or, alternatively, in wrapper structs allowing them to be inherited by other classes (so populating that class's name-space with the enumeration element names).	2014-05-09 18:58:47 -04:00
Andrew Bardsley	8087d2622d	cpu: Timebuf const accessors Add const accessors for timebuf elements.	2014-05-09 18:58:47 -04:00
Andrew Bardsley	f7d80348fa	arm: Add branch flags onto macroops Mark branch flags onto macroops to allow branch prediction before microop decomposition	2014-05-09 18:58:47 -04:00
Andrew Bardsley	eab00f4966	cpu: Allow setWhen on trace objects Allow setting of 'when' in trace records. This allows later times than the arbitrary record creation point to be used as inst. times	2014-05-09 18:58:47 -04:00
Curtis Dunham	af39ab297f	arm: add preliminary ISA splits for ARM arch	2014-05-09 18:58:47 -04:00
Curtis Dunham	fe27f937aa	arch: teach ISA parser how to split code across files This patch encompasses several interrelated and interdependent changes to the ISA generation step. The end goal is to reduce the size of the generated compilation units for instruction execution and decoding so that batch compilation can proceed with all CPUs active without exhausting physical memory. The ISA parser (src/arch/isa_parser.py) has been improved so that it can accept 'split [output_type];' directives at the top level of the grammar and 'split(output_type)' python calls within 'exec {{ ... }}' blocks. This has the effect of "splitting" the files into smaller compilation units. I use air-quotes around "splitting" because the files themselves are not split, but preprocessing directives are inserted to have the same effect. Architecturally, the ISA parser has had some changes in how it works. In general, it emits code sooner. It doesn't generate per-CPU files, and instead defers to the C preprocessor to create the duplicate copies for each CPU type. Likewise there are more files emitted and the C preprocessor does more substitution that used to be done by the ISA parser. Finally, the build system (SCons) needs to be able to cope with a dynamic list of source files coming out of the ISA parser. The changes to the SCons{cript,truct} files support this. In broad strokes, the targets requested on the command line are hidden from SCons until all the build dependencies are determined, otherwise it would try, realize it can't reach the goal, and terminate in failure. Since build steps (i.e. running the ISA parser) must be taken to determine the file list, several new build stages have been inserted at the very start of the build. First, the build dependencies from the ISA parser will be emitted to arch/$ISA/generated/inc.d, which is then read by a new SCons builder to finalize the dependencies. (Once inc.d exists, the ISA parser will not need to be run to complete this step.) 
Once the dependencies are known, the 'Environments' are made by the makeEnv() function. This function used to be called before the build began but now happens during the build. It is easy to see that this step is quite slow; this is a known issue and it's important to realize that it was already slow, but there was no obvious cause to attribute it to since nothing was displayed to the terminal. Since new steps that used to be performed serially are now in a potentially-parallel build phase, the pathname handling in the SCons scripts has been tightened up to deal with chdir() race conditions. In general, pathnames are computed earlier and more likely to be stored, passed around, and processed as absolute paths rather than relative paths. In the end, some of these issues had to be fixed by inserting serializing dependencies in the build. Minor note: For the null ISA, we just provide a dummy inc.d so SCons is never compelled to try to generate it. While it seems slightly wrong to have anything in src/arch/*/generated (i.e. a non-generated 'generated' file), it's by far the simplest solution.	2014-05-09 18:58:47 -04:00
Geoffrey Blake	0c1913336a	config: Avoid generating a reference to myself for Parent.any The unproxy code for Parent.any can generate a circular reference in certain situations with class hierarchies like those in ClockDomain.py. This patch solves this by marking ourself as visited to make sure the search does not resolve to a self-reference.	2014-05-09 18:58:47 -04:00
Geoffrey Blake	85940fd537	arch, arm: Preserve TLB bootUncacheability when switching CPUs The ARM TLBs have a bootUncacheability flag used to make some loads and stores become uncacheable when booting in FS mode. Later the flag is cleared to let those loads and stores operate as normal. When doing a takeOverFrom(), this flag's state is not preserved and is momentarily reset until the CPSR is touched. On single core runs this is a non-issue. On multi-core runs this can lead to crashes on the O3 CPU model from the following series of events: 1) takeOverFrom executed to switch from Atomic -> O3 2) All bootUncacheability flags are reset to true 3) Core2 tries to execute a load covered by bootUncacheability, it is flagged as uncacheable 4) Core2's load needs to replay due to a pipeline flush 5) Core1 does an action on CPSR 6) The handling code for CPSR then checks all other cores to determine if bootUncacheability can be set to false 7) Asynchronously set bootUncacheability on all cores to false 8) Core2 replays load previously set as uncacheable and notices it is now flagged as cacheable, leads to a panic. This patch implements takeOverFrom() functionality for the ARM TLBs to preserve flag values when switching from atomic -> detailed.	2014-05-09 18:58:47 -04:00
Curtis Dunham	1028c03320	cpu: add more instruction mix statistics For the o3, add instruction mix (OpClass) histogram at commit (stats also already collected at issue). For the simple CPUs we add a histogram of executed instructions	2014-05-09 18:58:47 -04:00
Mitch Hayenga	a15b713cba	mem: Squash prefetch requests from downstream caches This patch squashes prefetch requests from downstream caches, so that they do not steal cachelines away from caches closer to the cpu. It was originally coded by Mitch Hayenga and modified by Aasheesh Kolli.	2014-05-09 18:58:46 -04:00
Stephan Diestelhorst	b9e6c260a0	stats: Method stats source This source for stats binds an object and a method / function from the object to a stats object. This allows pulling out stats from object methods without needing to go through a global, or static shim. Syntax is somewhat unpleasant, but the templates and method pointer type specification were quite tricky. Interface is very clean though; and similar to .functor	2014-05-09 18:58:46 -04:00
Akash Bagdia	2b1a01ee6c	cpu, arm: Allow the specification of a socket field Allow the specification of a socket ID for every core that is reflected in the MPIDR field in ARM systems. This allows studying multi-socket / cluster systems with ARM CPUs.	2014-05-09 18:58:46 -04:00
Sascha Bischoff	e940bac278	mem: Auto-generate CommMonitor trace file names Splits the CommMonitor trace_file parameter into three parameters. Previously, the trace was only enabled if the trace_file parameter was set, and would be written to this file. This patch adds in a trace_enable and trace_compress parameter to the CommMonitor. No trace is generated if trace_enable is set to False. If it is set to True, the trace is written to a file based on the name of the SimObject in the simulation hierarchy. For example, system.cluster.il1_commmonitor.trc. This filename can be overridden by additionally specifying a file name to the trace_file parameter (more on this later). The trace_compress parameter will append .gz to any filename if set to True. This enables compression of the generated traces. If the file name already ends in .gz, then no changes are made. The trace_file parameter will override the name set by the trace_enable parameter. In the case that the specified name does not end in .gz but trace_compress is set to true, .gz is appended to the supplied file name.	2014-05-09 18:58:46 -04:00
Geoffrey Blake	29601eada7	arm: Panics in miscreg read functions can be tripped by O3 model Unimplemented miscregs for the generic timer were guarded by panics in arm/isa.cc which can be tripped by the O3 model if it speculatively executes a wrong path containing a mrs instruction with a bad miscreg index. These registers were flagged as implemented and accessible. This patch changes the miscreg info bit vector to flag them as unimplemented and inaccessible. In this case, an UndefinedInst fault will be generated if the register access is not trapped by a hypervisor.	2014-05-09 18:58:46 -04:00
Chris Emmons	a3306d0d5e	dev: Set HDLCD default pixel clock for 1080p @ 60Hz This patch changes the default pixel clock to effectively generate 1080p resolution at 60 frames per second. It is dependent upon the kernel device tree file using the specified resolution / display string in the comments.	2014-05-09 18:58:46 -04:00
Matt Evans	73dc89e542	arm: quick hack to allow a greater number of CPUs to a guest OS This is a quick hack to communicate a greater number of CPUs to a guest OS via the ARM A9 SCU config register. Some OSes (Linux) just look at the bottom field to count CPUs and with a small change can look at bits [3:0] to learn about up to 16 CPUs. Very much unsupported (and contains warning messages as such) but useful for running 8 core sims without hardwiring CPU count in the guest OS.	2014-05-09 18:58:46 -04:00
Curtis Dunham	7f1603d207	arch: remove inline specifiers on all inst constrs, all ISAs With (upcoming) separate compilation, they are useless. Only link-time optimization could re-inline them, but ideally feedback-directed optimization would choose to do so only for profitable (i.e. common) instructions.	2014-05-09 18:58:46 -04:00
Curtis Dunham	eb61f0123b	arm: cleanup ARM ISA definition	2014-05-09 18:58:46 -04:00
Curtis Dunham	ad019c5c58	scons: Require SWIG >= 2.0.4 and remove vector typemaps SWIG commit fd666c1 (`fd666c1440`) made it unnecessary for gem5 to have these typemaps to handle Vector types.	2014-05-09 18:58:46 -04:00
Curtis Dunham	ecf774bc56	arm: Correctly display disassembly of vldmia/vstmia The MicroMemOp class generates the disassembly for both integer and floating point instructions, but it would always print its first operand as an integer register without considering that the op may be a floating instruction in which case a float register should be displayed instead.	2014-04-23 05:18:30 -04:00
Andreas Hansson	6cd82c1116	sim: Use correct unit for abort message This patch fixes the unit in the abort printout.	2014-04-23 05:18:27 -04:00
Mitchell Hayenga	bf25c53a7d	cpu: Fix setTranslateLatency() bug for squashed instructions setTranslateLatency could sometimes improperly access a deleted request packet after an instruction was squashed.	2014-04-23 05:18:26 -04:00
Sascha Bischoff	2031c03c09	misc: Proper type check and import for PortRef Rewriting the type checking around PortRef, which was interacting strangely with other Python scripts. Tested-by: stephan.diestelhorst@arm.com	2014-04-23 05:18:25 -04:00
Mitch Hayenga	e4086878f6	cpu: Fix case where o3 lsq could print out uninitialized data In the O3 LSQ, data read/written is printed out in DPRINTFs. However, the data field is treated as a null-terminated character string, and the data field is not encoded this way. This patch removes that possibility by removing the data part of the print.	2014-04-01 14:22:06 -05:00
Mitch Hayenga	a0d30f36a6	mem: Don't print out the data of a cache block This never actually worked since it was printing out only a word of the cache block and not the entire thing, and doubly didn't work because csprintf overrides the %#x specifier and assumes a char* array is actually a string.	2014-04-01 14:24:36 -05:00
Mitchell Hayenga	0fad0c7f7d	arm: Don't use a stack allocated mnemonic FailUnimplemented passed a stack created mnemonic as a const char * which causes some grief when the stack goes away.	2014-04-23 05:18:20 -04:00
Dam Sunwoo	84f8fe637c	cpu: Add O3 CPU width checks O3CPU has a compile-time maximum width set in o3/impl.hh, but checking the configuration against this limit was not implemented anywhere except for fetch. Configuring a wider pipe than the limit can silently cause various issues during the simulation. This patch adds the proper checking in the constructor of the various pipeline stages.	2014-04-23 05:18:18 -04:00
Curtis Dunham	c9071ff95e	base: explicitly suggest potential use of 'All' debug flags Without this declaration, new clangs will complain about this value being unused. It has no explicit use in the codebase, but it can be useful to turn on all debugging flags while in a debugger to greatly increase simulator verbosity.	2014-04-23 05:17:59 -04:00
Curtis Dunham	e651188f75	arch: remove 'null update' check in isa-parser SCons already does this for all build steps.	2014-04-23 05:17:57 -04:00
Curtis Dunham	fa4a262204	stats: better error message for uninitialized statistic As suggested by Nathan Binkert in 2008: http://permalink.gmane.org/gmane.comp.emulators.m5.users/2676	2014-02-10 18:24:20 -06:00
Nilay Vaish	4ceeda20aa	ruby: slicc: remove old documentation Has not been maintained at all. Since there is alternate documentation available on gem5.org, no need to have this separately.	2014-04-19 09:00:31 -05:00
Nilay Vaish	183100b8cb	ruby: slicc: slight change to rule for transitions It had an unnecessary pairs token which is being removed.	2014-04-19 09:00:31 -05:00
Faissal Sleiman	a1570f544f	o3: Fix occupancy checks for SMT A number of calls to isEmpty() and numFreeEntries() should be thread-specific. In cpu.cc, the fact that tid is /commented/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList. In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of. In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it). In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-04-19 09:00:30 -05:00
Marco Elver	d9fa950396	ruby: recorder: Fix (de-)serializing with different cache block-sizes Upon aggregating records, serialize system's cache-block size, as the cache-block size can be different when restoring from a checkpoint. This way, we can correctly read all records when restoring from a checkpoint, even if the cache-block size is different. Note that it is only possible to restore from a checkpoint if the desired cache-block size is smaller or equal to the cache-block size when the checkpoint was taken; we can split one larger request into multiple small ones, but it is not reliable to do the opposite. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-04-19 09:00:30 -05:00
Andreas Sandberg	02b51afb7e	kvm, x86: Add initial support for multicore simulation Simulating a SMP or multicore requires devices to be shared between multiple KVM vCPUs. This means that locking is required when accessing devices. This changeset adds the necessary locking to allow devices to execute correctly. It is implemented by temporarily migrating the KVM CPU to the VM's (and devices') event queue when handling MMIO. Similarly, the VM migrates to the interrupt controller's event queue when delivering an interrupt. The support for fast-forwarding of multicore simulations added by this changeset assumes that all devices in a system are simulated in the same thread and each vCPU has its own thread. Special care must be taken to ensure that devices living under the CPU in the object hierarchy (e.g., the interrupt controller) do not inherit the parent CPU's thread and are assigned to the device thread. The KvmVM object is assumed to live in the same thread as the other devices in the system.	2014-04-09 16:01:58 +02:00
Andreas Sandberg	221f4f232a	dev: Protect PollEvent processing when running in parallel mode The calling thread is undefined when the PollQueue services events. This implies that PollEvents need to handle the case where they are processed from a different thread than the thread that created the event. This changeset adds temporary event queue migrations to the VNC server, the ethernet tap device, and the terminal to protect them from inter-thread calls.	2014-04-09 16:01:43 +02:00
Nilay Vaish	d805e42b81	ruby: slicc: change enqueue statement As of now, the enqueue statement can take in any number of 'pairs' as argument. But we only use the pair in which latency is the key. This latency is allowed to be either a fixed integer or a member variable of controller in which the expression appears. This patch drops the use of pairs in an enqueue statement. Instead, an expression is allowed which will be interpreted to be the latency of the enqueue. This expression can be anything allowed by slicc including a constant integer or a member variable.	2014-04-08 13:26:30 -05:00
Nilay Vaish	e689c00b16	ruby: coherence protocols: drop the phrase IntraChip The phrase is no longer valid since we do not distinguish between inter and intra chip communication.	2014-04-08 13:26:29 -05:00
Andreas Sandberg	838bcd3b19	sim: Add the ability to lock and migrate between event queues We need the ability to lock event queues to enable device accesses across threads. The serviceOne() method now takes a service lock prior to handling a new event. By locking an event queue, a different thread/eq can effectively execute in the context of the locked event queue. To simplify temporary event queue migrations, this changeset introduces the EventQueue::ScopedMigration class that unlocks the current event queue, locks a new event queue, and updates the current event queue variable. In order to prevent deadlocks, event queues need to be released when waiting on barriers. This is implemented using the EventQueue::ScopedRelease class. An instance of this class is, for example, used in the BaseGlobalEvent class to release the event queue when waiting on the synchronization barrier. The intended use for this functionality is when devices need to be accessed across thread boundaries. For example, when fast-forwarding, it might be useful to run devices and CPUs in separate threads. In such a case, the CPU locks the device queue whenever it needs to perform IO. This functionality is primarily intended for KVM. Note: Migrating between event queues can lead to non-deterministic timing. Use with extreme care! --HG-- extra : rebase_source : 23e3a741a1fd73861d1339782dbbe1bc76285315	2014-04-03 11:22:49 +02:00
Marco Elver	b884fcf412	cpu: o3: lsq: Fix TSO implementation This patch fixes violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO. Prior to this patch, the following case could be violated: P0 | P1 ; MOV [x],$1 | MOV EAX,[y] ; MOV [y],$1 | MOV EBX,[x] ; exists (1:EAX=1 /\ 1:EBX=0) [is a violation] The problem was found using litmus [http://diy.inria.fr]. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-03-25 13:15:04 -05:00
Andreas Hansson	a00383a40a	mem: Track DRAM read/write switching and add hysteresis This patch adds stats for tracking the number of reads/writes per bus turn around, and also adds hysteresis to the write-to-read switching to ensure that the queue does not oscillate around the low threshold.	2014-03-23 11:12:14 -04:00
Andreas Hansson	7c18691db1	mem: Rename SimpleDRAM to a more suitable DRAMCtrl This patch renames the not-so-simple SimpleDRAM to a more suitable DRAMCtrl. The name change is intended to ensure that we do not send the wrong message (although the "simple" in SimpleDRAM was originally intended as in cleverly simple, or elegant). As the DRAM controller modelling work is being presented at ISPASS'14 our hope is that a broader audience will use the model in the future. --HG-- rename : src/mem/SimpleDRAM.py => src/mem/DRAMCtrl.py rename : src/mem/simple_dram.cc => src/mem/dram_ctrl.cc rename : src/mem/simple_dram.hh => src/mem/dram_ctrl.hh	2014-03-23 11:12:12 -04:00
Andreas Hansson	3dd1587afc	mem: Change memory defaults to be more representative Make the default memory type DDR3-1600 x64, and use the open-adaptive page policy. This change is aiming to ensure that users by default are using a realistic memory system.	2014-03-23 11:12:10 -04:00
Wendy Elsasser	bbbae677ed	mem: Add close adaptive paging policy to DRAM controller model This patch adds a second adaptive page policy to the DRAM controller, closing the page unless there are already queued accesses to the open page.	2014-03-23 11:12:08 -04:00
Andreas Hansson	03a1aed803	mem: DRAM controller tidying up Minor tidying up and removing of redundant code, including the printing of queue state every million accesses.	2014-03-23 11:12:06 -04:00
Andreas Hansson	bc83eb2197	mem: Fix bug in DRAM bytes per activate This patch ensures that we do not sample the bytes per activate when the row has already been closed.	2014-03-23 11:12:05 -04:00
Andreas Hansson	116985d661	mem: Limit the accesses to a page before forcing a precharge This patch adds a basic starvation-prevention mechanism where a DRAM page is forced to close after a certain number of accesses. The limit is combined with the open and open-adaptive page policy and if reached causes an auto-precharge.	2014-03-23 11:12:03 -04:00
Andreas Hansson	6557741311	mem: Make DRAM write queue draining more aggressive This patch changes the triggering condition for the write draining such that we grab the opportunity to issue writes if there are no reads waiting (as opposed to waiting for the writes to reach the high threshold). As a result, we potentially drain some of the writes in read idle periods (if any). A low threshold is added to be able to control how many write bursts are kept in the memory controller queue (acting as on-chip storage). The high and low thresholds are updated to sensible values for a 32/64 size write buffer. Note that the thresholds should be adjusted along with the queue sizes. This patch also adds some basic initialisation sanity checks and moves part of the initialisation to the constructor.	2014-03-23 11:12:01 -04:00
Neha Agarwal	364a51181e	cpu: DRAM Traffic Generator This patch enables a new 'DRAM' mode to the existing traffic generator, catered to generate specific requests to DRAM based on required hit length (stride size) and bank utilization. It is an add on to the Random mode. The basic idea is to control how many successive packets target the same page, and how many banks are being used in parallel. This gives a two-dimensional space that stresses different aspects of the DRAM timing. The configuration file needed to use this patch has to be changed as follows: (reference to Random Mode, LPDDR3 memory type) 'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0' -> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1' The last 5 parameters to be added are: <stride size (bytes), page size (bytes), number of banks available in DRAM, number of banks to be utilized, address mapping scheme> The address mapping information is used to get the stride address stream of the specified size and to know where to find the bank bits. The configuration file has a parameter where '0'-> RoCoRaBaCh, '1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the generator currently assumes a single channel and a single rank. This is to avoid overwhelming the traffic generator with information about the memory organisation.	2014-03-23 11:11:58 -04:00
Neha Agarwal	43abaf518f	mem: DDR3 config for comparing with DRAMSim2 This patch adds a new DDR3 configuration to match with the parameters that are specified in one of the DDR3 configs used in DRAMSim2.	2014-03-23 11:11:56 -04:00
Andreas Hansson	7e7b67472a	mem: More descriptive address-mapping scheme names This patch adds the row bits to the name of the address mapping schemes to make it more clear that all the current schemes place the row bits as the most significant bits.	2014-03-23 11:11:53 -04:00
Stan Czerniawski	4f77bc230a	misc: Fix -q (quiet) flag Check the right flag.	2014-03-23 11:11:49 -04:00
Andreas Hansson	9ac4f781ec	ruby: Move Ruby debug flags to ruby dir and remove stale options This patch moves the Ruby-related debug flags to the ruby sub-directory, and also removes the stale SConsopts that add the no-longer-used NO_VECTOR_BOUNDS_CHECK.	2014-03-23 11:11:48 -04:00
Andreas Hansson	9f018d2f5a	mem: Include the DRAMSim2 wrapper in NULL build This patch makes sure DRAMSim2 is included in a build of the NULL ISA.	2014-03-23 11:11:44 -04:00
Sascha Bischoff	548d47ea2c	mem: CommMonitor trace warn on non-timing mode Add a warning to the CommMonitor which will alert the user if they try and record a trace when the system is not in timing mode.	2014-03-23 11:11:40 -04:00
Stan Czerniawski	e18d0e04a2	cpu: Add basic check to TrafficGen initial state Prevent incomplete configuration of TrafficGen class from causing segmentation faults. If an 'INIT' line is not present in the configuration file then the currState variable will remain uninitialized which may result in a crash.	2014-03-23 11:11:39 -04:00
Andrew Bardsley	0c001e729a	dev: Fix IsaFake's cxx_header setting cxx_header was set incorrectly on IsaFake	2014-03-23 11:11:37 -04:00
Eric Van Hensbergen	7630168a75	arm: m5ops readfile64 args broken, offset coming through garbage There were several sections of the m5ops code which were essentially copy/pasted versions of the 32-bit code. The problem is that some of these didn't account for 64-bit registers, leading to arguments being in the wrong registers. This patch addresses the args for readfile64, writefile64, and addsymbol64 -- all of which seemed to suffer from a similar set of problems when moving to 64-bit.	2014-03-23 11:11:34 -04:00
Andreas Hansson	5093e58dc2	base: Fix error message time unit (cycle -> tick) This patch fixes the unit used in all error messages.	2014-03-23 11:11:32 -04:00
Nilay Vaish	52a83c1d0e	ruby: consumer: avoid accessing wakeup times when waking up Each consumer object maintains a set of tick values when the object is supposed to wakeup and do some processing. As of now, the object accesses this set both when scheduling a wakeup event and when the object actually wakes up. The set is accessed during wakeup to remove the current tick value from the set. This functionality is now being moved to the scheduling function where ticks are removed at a later time.	2014-03-20 09:14:14 -05:00
Nilay Vaish	4b67ada89e	ruby: garnet: convert network interfaces into clocked objects This helps in configuring the network interfaces from the python script and these objects no longer rely on the network object for the timing information.	2014-03-20 09:14:14 -05:00
Nilay Vaish	4f7ef51efb	ruby: slicc: code refactor	2014-03-20 09:14:14 -05:00
Nilay Vaish	9b3418d163	ruby: no piobus in se mode Piobus was recently added to se scripts for ruby so that the interrupt controller can be connected to something (required since the interrupt controller sends address range messages). This patch removes the piobus and instead, the pio port of ruby port will now ignore the range change messages in se mode.	2014-03-20 08:03:09 -05:00
Nilay Vaish	f7e7fa6d90	ruby: remove some of the unnecessary code	2014-03-17 17:40:14 -05:00
Andreas Sandberg	11ffa379ab	kvm: Clean up signal handling KVM used to use two signals, one for instruction count exits and one for timer exits. There is really no need to distinguish between the two since they only trigger exits from KVM. This changeset unifies and renames the signals and adds a method, kick(), that can be used to raise the control signal in the vCPU thread. It also removes the early timer warning since we do not normally see if the signal was delivered. --HG-- extra : rebase_source : cd0e45ca90894c3d6f6aa115b9b06a1d8f0fda4d	2014-03-16 17:40:58 +01:00
Andreas Sandberg	5db547bca4	kvm: x86: Adjust PC to remove the CS segment base address gem5 seems to store the PC as RIP+CS_BASE. This is not what KVM expects, so we need to subtract CS_BASE prior to transferring the PC into KVM. This changeset adds the necessary PC manipulation and refactors thread context updates slightly to avoid reading registers multiple times from KVM. --HG-- extra : rebase_source : 3f0569dca06a1fcd8694925f75c8918d954ada44	2014-03-16 17:30:24 +01:00
Andreas Sandberg	f791e7b313	kvm: x86: Add support for x86 INIT and STARTUP handling This changeset adds support for INIT and STARTUP IPI handling. We currently handle both of these interrupts in gem5 and transfer the state to KVM. Since we do not have a BIOS loaded, we pretend that the INIT interrupt suspends the CPU after reset. --HG-- extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a	2014-03-16 17:28:23 +01:00
Paul Rosenfeld	32bf74cb8e	alpha: Small removal of dead comments/code from alpha ISA Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-03-12 07:03:22 -05:00
Andreas Hansson	62fe81e9c1	cpu: Make CPU and ThreadContext getters const This patch merely tidies up the CPU and ThreadContext getters by making them const where appropriate.	2014-03-07 15:56:23 -05:00
Geoffrey Blake	c4a8e5c36c	arm: Handle functional TLB walks properly The table walker code currently accounts for two types of walks, Atomic and Timing, and treats them differently. Atomic walks keep a single instance of WalkerState around for all walks to use in currState. Timing mode keeps a queue of in-flight WalkerStates and maintains currState as NULL between walks. If a functional walk is done during Timing mode, it is treated as an atomic walk and either creates a persistent WalkerState if in between Timing walks, or stomps an existing currState for an in-progress Timing walk. This patch distinguishes functional walks as being able to exist at any time and sets up a temporary WalkerState for its exclusive use and then cleans up when finished, leaving any in progress Atomic or Timing walks undisturbed.	2014-03-07 15:56:23 -05:00
Prakash Ramrakhyani	e88cffb30a	mem: Fix incorrect assert failure in the Cache This patch fixes an assert condition that is not true at all times. There are valid situations that arise in dual-core dual-workload runs where the assert condition is false. The function call following the assert however needs to be called only when the condition is true (a block cannot be invalidated in the tags structure if it has not been allocated in the structure, and the tempBlock is never allocated). Hence the 'assert' has been replaced with an 'if'.	2014-03-07 15:56:23 -05:00
Radhika Jagtap	c446dc40bd	mem: Edit proto Packet and enhance the python script This patch changes the decode script to output the optional fields of the proto message Packet, namely id and flags. The flags field is set by the communication monitor. The id field is useful for CPU trace experiments, e.g. linking the fetch side to decode side. It had to be renamed because it clashes with a built in python function id() for getting the "identity" of an object. This patch also takes a few common function definitions out from the multiple scripts and adds them to a protolib python module.	2014-03-07 15:56:23 -05:00
Stephan Diestelhorst	45677ffa97	misc: Add panic_if / fatal_if / chatty_assert This snippet can be used to replace if + {panics, fatals, asserts} constructs. The idea is to have both the condition checking and a verbose printout in a single statement. The interface is as follows: panic_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar); fatal_if(foo != bar, "These should be equal: foo %i bar %i", foo, bar); chatty_assert(foo == bar, "These should be equal: foo %i bar %i", foo, bar);	2014-03-07 15:56:23 -05:00
Mitch Hayenga	b9a9d99b22	scons: Fixes uninitialized warnings issued by clang Small fixes to appease recent clang versions.	2014-03-07 15:56:23 -05:00
Stephan Diestelhorst	bef2086f5b	arm: Fix uninitialised warning with gcc 4.8 Small fix for a warning that prevents compilation with gcc 4.8.1 due to detecting that a variable might be uninitialised. The fix is to assign a safe default.	2014-03-07 15:56:23 -05:00
Ali Saidi	bf39a475fe	mem: Wakeup sleeping CPUs without caches on LLSC For systems without caches, the LLSC code does not get snoops for wake-ups. We add the LLSC code in the abstract memory to do the job for us.	2014-03-07 15:56:23 -05:00
Andreas Sandberg	f4a897d8e3	sim: Schedule the global sync event at curTick() + simQuantum The global synchronization event used to be scheduled at simQuantum. This prevented repeated entries into gem5 from Python as it can be scheduled in the past. This changeset ensures that the first global synchronization happens at curTick() + simQuantum instead.	2014-03-06 15:59:53 +01:00
Andreas Sandberg	be246cef62	x86: Setup correct TSL/TR segment attributes on INIT The TSL/LDT & TR/TSS segments didn't contain valid attributes. This caused problems when transferring the state into KVM where invalid state is a no-go. Fixup the attributes with values from AMD's architecture programmer's manual.	2014-03-03 14:44:57 +01:00
Andreas Sandberg	e7d230ede0	kvm: x86: Always assume segments to be usable When transferring segment registers into kvm, we need to find the value of the unusable bit. We used to assume that this could be inferred from the selector since segments are generally unusable if their selector is 0. This assumption breaks in some weird corner cases. Instead, we just assume that segments are always usable. This is what qemu does so it should work.	2014-03-03 14:34:33 +01:00
Andreas Sandberg	739cc0128b	kvm: Initialize signal handlers from startupThread() Signal handlers in KVM are controlled per thread and should be initialized from the thread that is going to execute the CPU. This changeset moves the initialization call from startup() to startupThread().	2014-03-03 14:31:39 +01:00
Nilay Vaish	5cd9dd29bd	ruby: message buffer: changes related to tracking push/pop times The last pop operation is now tracked as a Tick instead of in Cycles. This helps in avoiding use of the receiver's clock during the enqueue operation.	2014-03-01 23:59:58 -06:00
Nilay Vaish	67cd04b6fe	ruby: make the max_size variable of the MessageBuffer unsigned	2014-03-01 23:59:57 -06:00
Christopher Torng	919baa603d	cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU A copyRegs() function is added to MIPS utilities to copy architectural state from the old CPU to the new CPU during fast-forwarding. This addition alone enables fast-forwarding for the o3 cpu model running MIPS. The patch also adds takeOverFrom() and drainResume() functions to the InOrderCPU to enable it to take over from another CPU. This change enables fast-forwarding for the inorder cpu model running MIPS, but not for Alpha. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-03-01 23:35:23 -06:00
Nilay Vaish	a533f3f983	ruby: profiler: statically allocate stats variable A couple of users observed a segmentation fault when the simulator tries to register the statistical variable m_IncompleteTimes. It seems that there is some problem with the initialization of these variables when allocated in the constructor.	2014-03-01 23:35:21 -06:00
Nilay Vaish	7e27860ef4	ruby: route all packets through ruby port Currently, the interrupt controller in x86 is connected to the io bus directly. Therefore the packets between the io devices and the interrupt controller do not go through ruby. This patch changes ruby port so that these packets arrive at the ruby port first, which then routes them to their destination. Note that the patch does not make these packets go through the ruby network. That would happen in a subsequent patch.	2014-02-23 19:16:16 -06:00
Andreas Hansson	5755fff998	ruby: Simplify RubyPort flow control and routing This patch simplifies the retry logic in the RubyPort, avoiding redundant attributes, and enforcing more stringent checks on the interactions with the normal ports. The patch also simplifies the routing done by the RubyPort, using the port identifiers instead of a heavy-weight sender state. The patch also fixes a bug in the sending of responses from PIO ports. Previously these responses bypassed the queue in the queued port, and ignored the return value, potentially leading to response packets being lost. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-02-23 19:16:16 -06:00
Nilay Vaish	7572ab71b5	ruby: message buffer: refactor code Code in two of the functions was exactly the same. This patch moves this code to a new function which is called from the two functions mentioned initially.	2014-02-23 19:16:15 -06:00
Nilay Vaish	cde20fd476	ruby: remove few not required #includes	2014-02-23 19:16:15 -06:00
Nilay Vaish	82378f7301	ruby: slicc: remove unused COPY_HEAD functionality	2014-02-23 19:16:15 -06:00
Nilay Vaish	13ad07601b	ruby: protocols: remove unused action z_stall	2014-02-23 19:16:15 -06:00
Nilay Vaish	cd33f9bc42	ruby: network: move message buffers to base network class.	2014-02-21 08:02:05 -06:00
Nilay Vaish	bd8f954526	ruby: network: garnet: fixed: removes net_ptr from links	2014-02-21 08:02:04 -06:00
Nilay Vaish	307f53e164	ruby: cache: remove not required variable m_cache_name	2014-02-21 08:02:02 -06:00
Nilay Vaish	f8f8b7e5c2	ruby: network: garnet: fixed: removes next cycle functions At several places, there are functions that take a cycle value as input and perform some computation. Along with each such function, another function was being defined that simply added one more cycle to the input and computed the same function. This patch removes this second copy of the function. Places where these functions were being called have been updated to use the original function with the argument being current cycle + 1.	2014-02-20 17:28:01 -06:00
Nilay Vaish	896654746a	ruby: controller: slight code refactoring	2014-02-20 17:27:45 -06:00
Nilay Vaish	0ce8c25919	ruby: mesi three level: rename incorrectly named files Two files had been incorrectly named with a .cache suffix. --HG-- rename : src/mem/protocol/MESI_Three_Level-L0.cache => src/mem/protocol/MESI_Three_Level-L0cache.sm rename : src/mem/protocol/MESI_Three_Level-L1.cache => src/mem/protocol/MESI_Three_Level-L1cache.sm	2014-02-20 17:27:17 -06:00
Nilay Vaish	db5b3d37fe	ruby: network: removes unused code.	2014-02-20 17:27:07 -06:00
Nilay Vaish	dd5c72e5a7	ruby: slicc: slight code refactoring	2014-02-20 17:26:49 -06:00
Nilay Vaish	b312a41f21	ruby: message buffer: removes some unnecessary functions.	2014-02-20 17:26:41 -06:00
Andreas Sandberg	0d6009e8dc	kvm: Add support for multi-system simulation The introduction of parallel event queues added most of the support needed to run multiple VMs (systems) within the same gem5 instance. This changeset fixes up signal delivery so that KVM's control signals are delivered to the thread that executes the CPU's event queue. Specifically: * Timers and counters are now initialized from a separate method (startupThread) that is scheduled as the first event in the thread-specific event queue. This ensures that they are initialized from the thread that is going to execute the CPU's event queue and enables signal delivery to the right thread when exiting from KVM. * The POSIX-timer-based KVM timer (used to force exits from KVM) has been updated to deliver signals to the thread that's executing KVM instead of the process (thread is undefined in that case). This assumes that the timer is instantiated from the thread that is going to execute the KVM vCPU. * Signal masking is now done using pthread_sigmask instead of sigprocmask. The behavior of the latter is undefined in threaded applications. * Since signal masks can be inherited, make sure to actively unmask the control signals when setting up the KVM signal mask. There are currently no facilities to multiplex between multiple KVM CPUs in the same event queue, so we are limited to configurations where there is only one KVM CPU per event queue. In practice, this means that multi-system configurations can be simulated, but not multiple CPUs in a shared-memory configuration.	2014-02-20 15:43:53 +01:00
Andreas Hansson	4b81585c49	mem: Fix bug in PhysicalMemory use of mmap and munmap This patch fixes a bug in how physical memory used to be mapped and unmapped. Previously we unmapped and re-mapped if restoring from a checkpoint. However, we never checked that the new mapping was actually the same, it was just magically working as the OS seems to fairly reliably give us the same chunk back. This patch fixes this issue by relying entirely on the mmap call in the constructor.	2014-02-18 05:51:01 -05:00
Andreas Hansson	f0ea79c41f	dev: Include basic devices in NULL ISA build This patch enables use of the basic PIO devices as part of the NULL build. Although it might seem counterintuitive to have a PIO device without being able to execute a driver, this change enables us to break a device class hierarchy into an ISA-agnostic part, and an ISA-specific part, without requiring multiple-inheritance. The ISA-agnostic base class is a PIO device, but does not make use of the port.	2014-02-18 05:50:59 -05:00
Andreas Hansson	969b436243	mem: Filter cache snoops based on address ranges This patch adds a filter to the cache to drop snoop requests that are not for a range covered by the cache. This fixes an issue observed when multiple caches are placed in parallel, covering different address ranges. Without this patch, all the caches will forward the snoop upwards, when only one should do so.	2014-02-18 05:50:58 -05:00
Andreas Hansson	bf2f178f85	mem: Add a wrapped DRAMSim2 memory controller This patch adds DRAMSim2 as a memory controller by wrapping the external library and creating a subclass of AbstractMemory that bridges between the semantics of gem5 and the DRAMSim2 interface. The DRAMSim2 wrapper extracts the clock period from the config file. There is no way of extracting this information from DRAMSim2 itself, so we simply read the same config file and get it from there. To properly model the response queue, the wrapper keeps track of how many transactions are in the actual controller, and how many are stacking up waiting to be sent back as responses (in the wrapper). The latter requires us to move away from the queued port and manage the packets ourselves. This is due to DRAMSim2 not having any flow control on the response path. DRAMSim2 assumes that the transactions it is given match the burst size of the chosen memory. The wrapper checks to ensure the cache line size of the system matches the burst size of DRAMSim2 as there are currently no provisions to split the system requests. In theory we could allow a cache line size smaller than the burst size, but that would lead to inefficient use of the DRAM, so for now we fatal also in this case.	2014-02-18 05:50:53 -05:00
Andreas Hansson	c9cb492e1c	mem: Fix input to DPRINTF in CommMonitor Minor fix of the debug message parameters.	2014-02-18 05:50:51 -05:00
Andreas Sandberg	c52190a695	cpu: simple: Add support for using branch predictors This changeset adds branch predictor support to the BaseSimpleCPU. The simple CPUs normally don't need a branch predictor, however, there are at least two cases where it can be desirable: 1) A simple CPU can be used to warm the branch predictor of an O3 CPU before switching to the slower O3 model. 2) The simple CPU can be used as a quick way of evaluating/debugging new branch predictors since it exposes branch predictor statistics. Limitations: * Since the simple CPU doesn't speculate, only one instruction will be active in the branch predictor at a time (i.e., the branch predictor will never see speculative branches). * The outcome of a branch prediction does not affect the performance of the simple CPU.	2014-02-09 20:49:28 +01:00
Nilay Vaish	eb73a14fe2	base: calls abort() from fatal Currently fatal() ends the simulation in a normal fashion. This results in the call stack getting lost when using a debugger and it is not always possible to debug the simulation just from the information provided by the printed error message. Even though the error is likely due to a user's fault, the information available should not be thrown away. Hence, this patch to call abort() from fatal().	2014-02-06 16:30:13 -06:00
Nilay Vaish	bb0e9119e7	ruby: memory controller: use MemoryNode *	2014-02-06 16:30:12 -06:00
Andreas Sandberg	e76a37985f	x86: Fix x87 state transfer bug Changeset 7274310be1bb (isa: clean up register constants) increased the value of NumFloatRegs, which triggered a bug in X86ISA::copyRegs(). This bug is caused by the x87 stack being copied twice since register indexes past NUM_FLOATREGS are mapped into the x87 stack relative to the top of the stack, which is undefined when the copy takes place. This changeset updates the copyRegs() function to access registers using the non-flattening interface, which guarantees that undesirable register folding does not happen.	2014-02-05 14:08:13 +01:00
Nikos Nikoleris	c6279f2d19	x86, kvm: Fix bug in the RFlags get and set functions The getRFlags and setRFlags utility functions were not updated correctly when condition registers were separated into their own register class. This led to incorrect state transfer in calls from kvm into the simulator (e.g., m5 readfile ended up in an infinite loop) and when switching CPUs. This patch makes these utility functions use getCCReg and setCCReg instead of getIntReg and setIntReg which read and write the integer registers. Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>	2014-02-02 16:37:35 +01:00
Ola Jeppsson	7f16951451	unittest: Fix build errors Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-30 12:21:58 -06:00
Mitch Hayenga	96317d466e	mem: Add additional tolerance to stride prefetcher Forces the prefetcher to mispredict twice in a row before resetting the confidence of prefetching. This helps cases where a load PC strides by a constant factor, however it may operate on different arrays at times. Avoids the cost of retraining. Primarily helps with small iteration loops. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 23:21:26 -06:00
Mitch Hayenga	771c864bf4	mem: Allowed tagged instruction prefetching in stride prefetcher For systems with a tightly coupled L2, a stride-based prefetcher may observe access requests from both instruction and data L1 caches. However, the PC address of an instruction miss gives no relevant training information to the stride-based prefetcher (there is no stride to train). In these cases, it's better if the L2 stride prefetcher simply reverted back to a simple N-block ahead prefetcher. This patch enables this option. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 23:21:26 -06:00
Mitch Hayenga, Amin Farmahini <aminfar@gmail.com>	95735e10e7	mem: prefetcher: add options, support for unaligned addresses This patch extends the classic prefetcher to work on non-block aligned addresses. Because the existing prefetchers in gem5 mask off the lower address bits of cache accesses, many predictable strides fail to be detected. For example, if a load were to stride by 48 bytes, with 64 byte cachelines, the current stride based prefetcher would see an access pattern of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This patch fixes this, by training the prefetcher on access and not masking off the lower address bits. It also adds the following configuration options: 1) Training/prefetching only on cache misses, 2) Training/prefetching only on data accesses, 3) Optionally tagging prefetches with a PC address. #3 allows prefetchers to train off of prefetch requests in systems with multiple cache levels and PC-based prefetchers present at multiple levels. It also effectively allows a pipelining of prefetch requests (like in POWER4) across multiple levels of cache hierarchy. Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).	2014-01-29 23:21:25 -06:00
Xiangyu Dong	32cc2ea8b9	cpu: fix bug when TrafficGen deschedules event Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 22:35:04 -06:00
Mitch Hayenga	b77ca57f8c	arm: Enable umask syscall in SE mode Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:51 -06:00
Mitch Hayenga	55a4ff5f04	base: Fix race condition in the socket listen function gem5 makes the incorrect assumption that by binding a socket, it effectively has allocated a port. Linux only allocates ports once you call listen on the given socket, not when you call bind. So even if the port was free when bind was called, another process (gem5 instance) could race in between the bind & listen calls and steal the port. In the current code, if the call to bind fails due to the port being in use (EADDRINUSE), gem5 retries for a different port. However if listen fails, gem5 just panics. The fix is testing the return value of listen and re-trying if it was due to EADDRINUSE. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:51 -06:00
Amin Farmahini	ffbdaa7cce	mem: Remove redundant findVictim() input argument The patch (1) removes the redundant writeback argument from findVictim() (2) fixes the description of access() function Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:50 -06:00
Amin Farmahini	575a73f4a1	mem: Fixes a bug in simple_dram write merging Fixes updating the value of size in the write merge function. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:49 -06:00
Nilay Vaish	bdee69d0b1	x86: use lfpimm instead of limm for fptan	2014-01-27 18:50:54 -06:00
Nilay Vaish	6a543b5134	x86: implements x87 add/sub instructions	2014-01-27 18:50:53 -06:00
Nilay Vaish	5be0b846b1	x86: implements fxch instruction.	2014-01-27 18:50:52 -06:00
Nilay Vaish	4eb3b1ed0b	x86: correct error in emms instruction.	2014-01-27 18:50:51 -06:00
ARM gem5 Developers	612f8f074f	arm: Add support for ARMv8 (AArch64 & AArch32) Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64 kernel you are restricted to AArch64 user-mode binaries. This will be addressed in a later patch. Note: Virtualization is only supported in AArch32 mode. This will also be fixed in a later patch. Contributors: Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation) Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation) Mbou Eyole (AArch64 NEON, validation) Ali Saidi (AArch64 Linux support, code integration, validation) Edmund Grimley-Evans (AArch64 FP) William Wang (AArch64 Linux support) Rene De Jong (AArch64 Linux support, performance opt.) Matt Horsnell (AArch64 MP, validation) Matt Evans (device models, code integration, validation) Chris Adeniyi-Jones (AArch64 syscall-emulation) Prakash Ramrakhyani (validation) Dam Sunwoo (validation) Chander Sudanthi (validation) Stephan Diestelhorst (validation) Andreas Hansson (code integration, performance opt.) Eric Van Hensbergen (performance opt.) Gabe Black	2014-01-24 15:29:34 -06:00
Andreas Hansson	cfc4a99982	arch: Make all register index flattening const This patch makes all the register index flattening methods const for all the ISAs. As part of this, readMiscRegNoEffect for ARM is also made const.	2014-01-24 15:29:30 -06:00
Geoffrey Blake	9633282fc8	checker: CheckerCPU handling of MiscRegs was incorrect The CheckerCPU model in pre-v8 code was not checking the updates to miscellaneous registers because some methods for setting misc regs were not instrumented. The v8 patches exposed this by calling the instrumented misc reg update methods and then invoking the checker before the main CPU had updated its misc regs, leading to false positives about register mismatches. This patch fixes the non-instrumented misc reg update methods and places calls to the checker in the proper places in the O3 model.	2014-01-24 15:29:30 -06:00
Ali Saidi	7d0344704a	arch, cpu: Add support for flattening misc register indexes. With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the other ISAs this is currently a null-implementation. Additionally, a system variable is added to all the ISA objects.	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	3436de0c2a	cpu: Add support for Memory+Barrier instruction types in O3 cpu.	2014-01-24 15:29:30 -06:00
Ali Saidi	90b1775a8f	cpu: Add support for instructions that zero cache lines.	2014-01-24 15:29:30 -06:00
Ali Saidi	6bed6e0352	cpu: Add CPU support for generating wake up events when LLSC addresses are snooped. This patch adds support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	d3444c6603	mem: Add flag to request if it was generated by a page table walk	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	aefe9cc624	mem: Add support for a security bit in the memory system This patch adds the basic building blocks required to support e.g. ARM TrustZone by discerning secure and non-secure memory accesses.	2014-01-24 15:29:30 -06:00
Chris Adeniyi-Jones	7f835a59f1	sim: Add openat/fstatat syscalls and fix mremap This patch adds support for the openat and fstatat syscalls and broadens the support for mremap to make it work on OS X.	2014-01-24 15:29:30 -06:00
Ali Saidi	904872a01a	mem: Remove explicit cast from memhelper. Previously we were casting the result type to the memory type which is incorrect for things like dual-memory operations which still return a single result.	2014-01-24 15:29:30 -06:00
Timothy M. Jones	427ceb57a9	Cache: Collect very basic stats on tag and data accesses Adds very basic statistics on the number of tag and data accesses within the cache, which is important for power modelling. For the tags, simply count the associativity of the cache each time. For the data, this depends on whether tags and data are accessed sequentially, which is given by a new parameter. In the parallel case, all data blocks are accessed each time, but with sequential accesses, a single data block is accessed only on a hit.	2014-01-24 15:29:30 -06:00
Dam Sunwoo	85e8779de7	mem: per-thread cache occupancy and per-block ages This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.	2014-01-24 15:29:30 -06:00
Matt Horsnell	739c6df94e	base: add support for probe points and common probes The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model. What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out your analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py What is happening under the hood: * every SimObject has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and then regProbeListeners are called on each SimObject. This hooks up the probe point notify calls with the listeners. FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful. What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction) What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event. What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener. What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes. Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats. Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting. What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%).	2014-01-24 15:29:30 -06:00
Andreas Hansson	4de69821e6	sim: Expose the current voltage for each object as a stat	2014-01-24 15:29:30 -06:00
Andreas Hansson	1d85e914a6	sim: Expose the current clock period as a stat This patch adds observability to the clock period of the clock domains by including it as a stat. As a result of adding this, the regressions will be updated in a separate patch.	2014-01-24 15:29:30 -06:00
Matt Horsnell	ca89eba79e	mem: track per-request latencies and access depths in the cache hierarchy Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.	2014-01-24 15:29:30 -06:00
Andreas Hansson	daa781d2db	config: Make the Clock a Tick parameter like Latency/Frequency This patch makes the Clock a TickParamValue just like Latency/Frequency. There is no longer any need to distinguish it (originally needed to support multiplication).	2014-01-24 15:29:29 -06:00
Andreas Hansson	f2b0b551cc	x86: Fix memory leak in table walker This patch fixes a memory leak in the table walker, by ensuring that the sender state is deleted again if the request packet cannot be successfully sent.	2014-01-24 15:29:29 -06:00
Andreas Hansson	7db542c0dd	cpu: Relax check on squashed non-speculative instructions This patch relaxes the check performed when squashing non-speculative instructions, as it caused problems with loads that were marked ready, and then stalled on a blocked cache. The assertion is now allowing memory references to be non-faulting.	2014-01-24 15:29:29 -06:00
Dam Sunwoo	f1cd6b1ba8	cpu: remove faulty simpoint basic block inst count assertion This patch removes an assertion in the simpoint profiling code that asserts that a previously-seen basic block has the exact same number of instructions executed as before. This can be false if the basic block generates aborts or takes interrupts at different locations within the basic block. The basic block profiling is not affected significantly as these events are rare in general.	2014-01-24 15:29:29 -06:00
Nilay Vaish	37433d91a3	ruby: remove unused label no_vector	2014-01-17 11:02:15 -06:00
Nilay Vaish	407f37e15f	ruby: move all statistics to stats.txt, eliminate ruby.stats	2014-01-10 16:19:47 -06:00
Nilay Vaish	cfe912a512	stats: add function for adding two histograms This patch adds a function to the HistStor class for adding two histograms. This functionality is required for Ruby. It also adds support for printing histograms in a single line.	2014-01-10 16:19:40 -06:00
Nilay Vaish	0387281e2a	ruby: fix bug introduced to revision 8523754f8885	2014-01-09 10:45:50 -06:00
Nilay Vaish	8559081648	ruby: slicc: remove variable 'addr' used in calls to doTransition This variable causes trouble if a variable of same name is declared in a protocol file. Hence it is being eliminated.	2014-01-08 04:26:25 -06:00
Nilay Vaish	4070b00875	ruby: add a three level MESI protocol. The first two levels (L0, L1) are private to the core, the third level (L2) is possibly shared. The protocol supports clustered designs. For example, one can have two sets of two cores. Each core has an L0 and L1 cache. There are two L2 controllers where each set accesses only one of the L2 controllers.	2014-01-04 00:03:34 -06:00
Nilay Vaish	bb6d7d402b	ruby: rename MESI_CMP_directory to MESI_Two_Level This is because the next patch introduces a three level hierarchy. --HG-- rename : build_opts/ALPHA_MESI_CMP_directory => build_opts/ALPHA_MESI_Two_Level rename : build_opts/X86_MESI_CMP_directory => build_opts/X86_MESI_Two_Level rename : configs/ruby/MESI_CMP_directory.py => configs/ruby/MESI_Two_Level.py rename : src/mem/protocol/MESI_CMP_directory-L1cache.sm => src/mem/protocol/MESI_Two_Level-L1cache.sm rename : src/mem/protocol/MESI_CMP_directory-L2cache.sm => src/mem/protocol/MESI_Two_Level-L2cache.sm rename : src/mem/protocol/MESI_CMP_directory-dir.sm => src/mem/protocol/MESI_Two_Level-dir.sm rename : src/mem/protocol/MESI_CMP_directory-dma.sm => src/mem/protocol/MESI_Two_Level-dma.sm rename : src/mem/protocol/MESI_CMP_directory-msg.sm => src/mem/protocol/MESI_Two_Level-msg.sm rename : src/mem/protocol/MESI_CMP_directory.slicc => src/mem/protocol/MESI_Two_Level.slicc rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/config.ini => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/config.ini rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/ruby.stats rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simerr => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simerr rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simout => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simout rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/stats.txt rename : 
tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/system.pc.com_1.terminal => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/system.pc.com_1.terminal rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/config.ini rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/ruby.stats rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simerr rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simout rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/stats.txt rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/config.ini rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/ruby.stats rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simerr rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simout rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/stats.txt => 
tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/stats.txt rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/config.ini rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/ruby.stats rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simerr => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simerr rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simout => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simout rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/stats.txt rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/config.ini rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/ruby.stats rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simerr => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simerr rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simout => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simout rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/stats.txt	2014-01-04 00:03:33 -06:00
Nilay Vaish	5b1804e3bd	ruby: add support for clusters A cluster here means a set of controllers that can be accessed only by a certain set of cores. For example, consider a two level hierarchy. Assume there are 4 L1 controllers (private) and 2 L2 controllers. We can have two different hierarchies here: a. the address space is partitioned between the two L2 controllers. Each L1 controller accesses both the L2 controllers. In this case, each L1 controller is a cluster in itself. b. both the L2 controllers can cache any address. An L1 controller has access to only one of the L2 controllers. In this case, each L2 controller, along with the L1 controllers that access it, forms a cluster. This patch allows for each controller to have a cluster ID, which is 0 by default. By setting the cluster ID properly, one can instantiate hierarchies with clusters. Note that the coherence protocol might have to be changed as well.	2014-01-04 00:03:31 -06:00
Nilay Vaish	9853ef6651	ruby: some small changes	2014-01-04 00:03:30 -06:00
Steve Reinhardt	d8c9b5431b	python: provide better error message for wrapped C++ methods If you successfully export a C++ SimObject method, but try to invoke it from Python before the C++ object is created, you get a confusing error that says the attribute does not exist, making you question whether you successfully exported the method at all. In reality, your only problem is that you're calling the method too soon. This patch enhances the error message to give you a better clue.	2014-01-03 17:08:43 -08:00
Steve Reinhardt	ba9ec669bc	python: don't die on assignment to cloned object Updating the SimObject topology of a cloned hierarchy is a little dangerous, in that cloning is a "deep copy" and the clone does not inherit SimObject updates the same way it would inherit scalar variable assignments. However, because of various SimObject-valued proxy parameters, like 'memories', 'clk_domain', and 'system', it turns out that there are a number of implicit topology changes that happen at instantiation, which means that these changes are impossible to avoid. So in order to make cloning systems useful, this error has to go. Changing it to a warning produces a lot of noise, so it seems best just to delete it.	2014-01-03 17:08:42 -08:00
Christopher Torng	b4b03a60b1	sim: Add support for dynamic frequency scaling This patch provides support for DFS by having ClockedObjects register themselves with their clock domain at construction time in a member list. Using this list, a clock domain can update each member's tick to the curTick() before modifying the clock period. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2013-12-29 19:29:45 -06:00
Christopher Torng	903b442228	mips: Floating point convert bug fix In mips architecture, floating point convert instructions use the FloatConvertOp format defined in src/arch/mips/isa/formats/fp.isa. The type of the operands in the ISA description file (_sw for signed word, or _sf for signed float, etc.) is used to create a type for the operand in C++. Then the operand is converted using the fpConvert() function in src/arch/mips/utility.cc. If we are converting from a word to a float, and we want to convert 0xffffffff, we expect -1 to be passed into fpConvert(). Instead, we see MAX_INT passed in. Then fpConvert() converts _val_ to MAX_INT in single-precision floating point, and we get the wrong value. To fix it, the signs of the convert operands are being changed from unsigned to signed in the MIPS ISA description. Then, the FloatConvertOp format is being changed to insert a int32_t into the C++ code instead of a uint32_t. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2013-12-29 19:29:45 -06:00
Nilay Vaish	d71311b1cf	ruby: fix bugs in mesi cmp directory protocol This patch fixes a couple of bugs in the L2 controller of the mesi cmp directory protocol. 1. The state MT_I was transitioning to NP on receiving a clean writeback from the L1 controller. This patch makes it inform the directory controller about the writeback. 2. The L2 controller was sending the dirty bit to the L1 controller and the L2 controller used writeback from the L1 controller to update the dirty bit unconditionally. Now, the L1 controller always assumes that the incoming data is clean. The L2 controller updates the dirty bit only when the L1 controller writes to the block. 3. Certain unused functions and events are being removed.	2013-12-26 15:18:55 -06:00
Nilay Vaish	fc53f9ffcc	ruby: slicc: replace max_in_port_rank with number of inports This patch replaces max_in_port_rank with the number of inports. The use of max_in_port_rank was causing spurious re-builds and incorrect initialization of variables in ruby related regression tests. This was due to the variable value being used across threads while compiling when it was not meant to be. Since the number of inports is state machine specific value, this problem should get solved.	2013-12-20 20:34:04 -06:00
Nilay Vaish	30b259a31e	ruby: declare variables to be unsigned in Address.hh	2013-12-20 20:34:03 -06:00
Nilay Vaish	f5b52a265a	ruby: mesi: remove owner and sharer fields from directory tags The directory controller should not have the sharer field since there is only one level 2 cache. Anyway the field was not in use. The owner field was being used to track the l2 cache version (in case of distributed l2) that has the cache block under consideration. The information is not required since the version of the level 2 cache can be obtained from a subset of the address bits.	2013-12-20 20:34:03 -06:00
Nilay Vaish	50d250f514	sim: reset stats after startup Currently statistics are reset after the initial / checkpoint state has been loaded. But ruby does some checkpoint processing in its startup() function. So the stats need to be reset after the startup() function has been called. This patch moves the call to stats.reset() to achieve this change in functionality.	2013-12-03 10:51:40 -06:00
Nilay Vaish	5800e83223	cpu: call BaseCPU startup() function in o3 cpu	2013-12-03 10:36:04 -06:00
Andreas Sandberg	c033ead992	base: Fix race in PollQueue and remove SIGALRM workaround There is a race between enabling asynchronous IO for a file descriptor and IO events happening on that descriptor. A SIGIO won't normally be delivered if an event is pending when asynchronous IO is enabled. Instead, the signal will be raised the next time there is an event on the FD. This changeset simulates a SIGIO by setting the async_io flag when setting up asynchronous IO for an FD. This causes the main event loop to poll all file descriptors to check for pending IO. As a consequence of this, the old SIGALRM hack should no longer be needed and is therefore removed.	2013-11-29 14:36:10 +01:00
Andreas Sandberg	9c57d5b5a6	base: Clean up signal handling The PollEvent class dynamically installs a SIGIO and SIGALRM handler when a file handler is registered. Most signal handlers currently get registered in the initSignals() function. This changeset moves the SIGIO/SIGALRM handlers to initSignals() to live with the other signal handlers. The original code installs SIGIO and SIGALRM with the SA_RESTART option to prevent syscalls from returning EINTR. This changeset consistently uses this flag for all signal handlers to ensure that other signals that trigger asynchronous behavior (e.g., statistics dumping) do not cause undesirable EINTR returns.	2013-11-29 14:35:36 +01:00
Nilay Vaish	9fb93e5cd2	sim: correct ticksToCycles() function.	2013-11-26 17:05:22 -06:00
Andreas Sandberg	4b8be6a90b	kvm: Set the perf exclude_host attribute if available The performance counting framework in Linux 3.2 and onwards supports an attribute to exclude events generated by the host when running KVM. Setting this attribute allows us to get more reliable measurements of the guest machine. For example, on a highly loaded system, the instruction counts from the guest can be severely distorted by the host kernel (e.g., by page fault handlers). This changeset introduces a check for the attribute and enables it in the KVM CPU if present.	2013-10-15 10:09:23 +02:00
Christian Menard	d4f205ea2f	x86: Implementation of Int3 and Int_Ib in long mode This is an implementation of the x86 int3 and int immediate instructions for long mode according to 'AMD64 Programmers Manual Volume 3'.	2013-11-26 17:51:07 +01:00
Andreas Sandberg	e5d63d0535	kvm: Remove the unused hostFreq member from BaseKvmCPU	2013-11-26 17:40:58 +01:00
Steve Reinhardt, Nilay Vaish <nilay@cs.wisc.edu>, Ali Saidi <Ali.Saidi@ARM.com>	de366a16f1	sim: simulate with multiple threads and event queues This patch adds support for simulating with multiple threads, each of which operates on an event queue. Each sim object specifies which eventq it would like to be on. A custom barrier implementation is being added, which the eventqs use to synchronize. The patch was tested in two different configurations: 1. ruby_network_test.py: in this simulation L1 cache controllers receive requests from the cpu. The requests are replied to immediately without any communication taking place with any other level. 2. twosys-tsunami-simple-atomic: this configuration simulates a client-server system connected by an ethernet link. We still lack the ability to communicate using message buffers or ports. But other things like simulation start and end, synchronizing after every quantum are working. Committed by: Nilay Vaish	2013-11-25 11:21:00 -06:00
Anthony Gutierrez	8a53da22c2	cpu: allow the fetch buffer to be smaller than a cache line the current implementation of the fetch buffer in the o3 cpu is only allowed to be the size of a cache line. some architectures, e.g., ARM, have fetch buffers smaller than a cache line, see slide 22 at: http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf this patch allows the fetch buffer to be set to values smaller than a cache line.	2013-11-15 13:21:15 -05:00
Andreas Hansson	f028da7af7	cpu: Fix Checker register index use This patch fixes an issue in the checker CPU register indexing. The code will not even compile using LTO as deep inlining causes the used index to be outside the array bounds.	2013-11-15 03:47:10 -05:00
Steve Reinhardt	a2c21d47a8	tests: suppress output on switcheroo tests The output from the switcheroo tests is voluminous and (because it includes timestamps) highly sensitive to minor changes, leading to extremely large updates to the reference outputs. This patch addresses this problem by suppressing output from the tests. An internal parameter can be set to enable the output. Wiring that up to a command-line flag (perhaps even the rudimentary -v/-q options in m5/main.py) is left for future work.	2013-11-14 15:03:42 -08:00
Anthony Gutierrez	99d6c3b7e0	sim: fix event priority name for debug-start option	2013-11-12 11:46:48 -05:00
Andreas Hansson	460cc77d6d	mem: Fixes for DRAM stats accounting This patch fixes a number of stats accounting issues in the DRAM controller. Most importantly, it separates the system interface and DRAM interface so that it is clearer what the actual DRAM bandwidth (and consequently utilisation) is.	2013-11-01 11:56:31 -04:00
Andreas Hansson	ce93982cc6	mem: Fix the LPDDR3 page size This patch corrects the LPDDR3 page size, which was set too low.	2013-11-01 11:56:30 -04:00
Neha Agarwal	5c486908d7	mem: Adding stats for DRAM power calculation This patch adds stats which are used for offline power calculation from the 'Micron Power Calculator' spreadsheet.	2013-11-01 11:56:28 -04:00
Neha Agarwal	77fce1ce0e	mem: Unify request selection for read and write queues This patch unifies the request selection across read and write queues for FR-FCFS scheduling policy. It also fixes the request selection code to prioritize the row hits present in the request queues over the selection based on earliest bank availability.	2013-11-01 11:56:27 -04:00
Andreas Hansson	bb572663cf	mem: Add a simple adaptive version of the open-page policy This patch adds a basic adaptive version of the open-page policy that guides the decision to keep open or close by looking at the contents of the controller queues. If no row hits are found, and bank conflicts are present, then the row is closed by means of an auto precharge. This is a well-known technique that should improve performance in most use-cases.	2013-11-01 11:56:26 -04:00
Neha Agarwal	da6fd72f62	mem: Just-in-time write scheduling in DRAM controller This patch removes the untimed while loop in the write scheduling mechanism and now schedules commands taking into account the minimum timing constraint. It also introduces an optimization to track write queue size and switch from writes to reads if the number of write requests falls below the write low threshold.	2013-11-01 11:56:25 -04:00
Andreas Hansson	ee6b41a1e4	mem: Add tRRD as a timing parameter for the DRAM controller This patch adds the tRRD parameter to the DRAM controller. With the recent addition of the actAllowedAt member for each bank, this addition is trivial.	2013-11-01 11:56:24 -04:00
Andreas Hansson	491d3a77cf	mem: Less conservative tRAS in DRAM configurations This patch changes the default values of the tRAS timing parameter to be less conservative, and closer in line with existing parts.	2013-11-01 11:56:23 -04:00
Ani Udipi	8bc855fa15	mem: Make tXAW enforcement less conservative and per rank This patch changes the tXAW constraint so that it is enforced per rank rather than globally for all ranks in the channel. It also avoids using the bank freeAt to enforce the activation limit, as doing so also precludes performing any column or row command to the DRAM. Instead the patch introduces a new variable actAllowedAt for the banks and uses this to track when a potential activation can occur.	2013-11-01 11:56:22 -04:00
Neha Agarwal	7645c8e611	mem: Fix for 100% write threshold in DRAM controller This patch fixes the controller when a write threshold of 100% is used. Earlier for 100% write threshold no data is written to memory as writes never get triggered since this corner case is not considered.	2013-11-01 11:56:21 -04:00
Andreas Hansson	10e8978ec0	mem: Pick the next DRAM request based on bank availability This patch changes the FCFS bit of FR-FCFS such that requests that target the earliest available bank are picked first (as suggested in the original work on FR-FCFS by Rixner et al). To accommodate this we add functionality to identify a bank through a one-dimensional identifier (bank id). The member names of the DRAMPacket are also updated to match the style guide.	2013-11-01 11:56:20 -04:00
Ani Udipi	ea76f97576	mem: Use the same timing calculation for DRAM read and write This patch simplifies the DRAM model by re-using the function that computes the busy and access time for both reads and writes.	2013-11-01 11:56:19 -04:00
Ani Udipi	655bf86828	mem: Fix DRAM bank occupancy for streaming access This patch fixes an issue that allowed more than 100% bus utilisation in certain cases.	2013-11-01 11:56:18 -04:00
Ani Udipi	be62a142cf	mem: Schedule time for DRAM event taking tRAS into account This patch changes the time the controller is woken up to take the next scheduling decisions. tRAS is now handled in estimateLatency and doDRAMAccess and we do not need to worry about it at scheduling time. The earliest we need to wake up is to do a pre-charge, row access and column access before the bus becomes free for use.	2013-11-01 11:56:17 -04:00
Ani Udipi	d4cf009b95	mem: Add tRAS parameter to the DRAM controller model This patch adds an explicit tRAS parameter to the DRAM controller model. Previously tRAS was, rather conservatively, assumed to be tRCD + tCL + tRP. The default values for tRAS are chosen to match the previous behaviour and will be updated later.	2013-11-01 11:56:16 -04:00
Andreas Hansson	c9a8b7b147	sim: Clarify the difference between tracing and debugging This patch changes the names of the command-line options related to debug output to all start with "debug" rather than being a mix of that and "trace". It also makes it clear that the breakpoint time is specified in ticks and not in cycles.	2013-11-01 11:56:13 -04:00
Chander Sudanthi	3e6da89419	ARM: add support for TEEHBR access Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier in arch/arm/kernel/thumbee.c. The Linux kernel code just seems to be saving and restoring the register. This patch adds support for the TEEHBR cp14 register. Note, this may be a special case when restoring from an image that was run on a system that supports ThumbEE.	2013-10-31 13:41:13 -05:00
Matt Evans	d17529b046	dev: Add 'OSC' oscillator sys control reg support to VersatileExpress The VE motherboard provides a set of system control registers through which various motherboard and coretile registers are accessed. Voltage regulators and oscillator (DLL/PLL) config are examples. These registers must be implemented to boot Linux 3.9+ kernels.	2013-10-31 13:41:13 -05:00
Geoffrey Blake	c32fbb7c00	dev: Add support for MSI-X and Capability Lists for ARM and PCI devices This patch adds the registers and fields to the PCI device to support Capability lists and to support MSI-X in the GIC.	2013-10-31 13:41:13 -05:00
Geoffrey Blake	be4aa2b6ba	dev: Fix race conditions in IDE device on newer kernels Newer linux kernels and distros exercise more functionality in the IDE device than previously, exposing 2 races. The first race was that the handling of aborted DMA commands would immediately report the device as ready back to the kernel, causing already in-flight commands to trip a simulator assertion when they returned and discovered an inconsistent device state. The second race was due to the Status register not being handled correctly: the interrupt status bit would get stuck at 1, and the driver would eventually view this as a bad state and log the condition to the terminal. This patch fixes these two conditions by making the device handle aborted commands gracefully and properly clear the interrupt status bit in the Status register.	2013-10-31 13:41:13 -05:00
Geoffrey Blake	fb0496498d	base: Add support for ipv6 into inet.hh/inet.cc	2013-10-31 13:41:13 -05:00
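
The mips floating point convert fix (903b442228) in the list above turns on how one 32-bit pattern is interpreted before it is converted to a float. The following is a minimal Python sketch of that effect, not gem5 code: `as_signed_word` and `to_single` are hypothetical helpers, and the `struct` module stands in for the C++ integer-to-float conversion.

```python
import struct


def as_signed_word(raw):
    """Reinterpret the low 32 bits of raw as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]


def to_single(value):
    """Round value to the nearest single-precision float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


raw = 0xFFFFFFFF

# Treating the word as unsigned converts a very large positive number.
unsigned_result = to_single(raw)

# Treating the same bits as signed converts -1, which is what a
# word-to-float conversion of 0xffffffff is expected to produce.
signed_result = to_single(as_signed_word(raw))

print(unsigned_result, signed_result)
```

The only difference between the two results is the integer type used to read the operand, which matches the commit's description of switching the operand from unsigned to signed.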

6952 commits
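
The adaptive open-page commit (bb572663cf) states its rule in one sentence: close the row with an auto precharge when the controller queues hold no row hits but do hold bank conflicts. A rough sketch of that decision follows, under stated assumptions: `Request`, `should_auto_precharge`, and the flat queue are illustrative names, not the gem5 DRAM controller's data structures.

```python
from collections import namedtuple

# Hypothetical queued request: the bank it targets and the row it needs.
Request = namedtuple("Request", ["bank", "row"])


def should_auto_precharge(queue, bank, open_row):
    """Decide whether to close open_row in bank after the current access.

    The row is closed only when no queued request hits the open row
    and at least one queued request needs a different row in that bank.
    """
    same_bank = [req for req in queue if req.bank == bank]
    row_hit = any(req.row == open_row for req in same_bank)
    bank_conflict = any(req.row != open_row for req in same_bank)
    return bank_conflict and not row_hit


queue = [Request(bank=0, row=5), Request(bank=1, row=7)]
print(should_auto_precharge(queue, bank=0, open_row=3))
```

With an empty queue, or with requests only to other banks, the sketch keeps the row open, which is the plain open-page behaviour the commit describes as the starting point.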