sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Mitch Hayenga	6cb58b2bd2	mem: Add parameter to reserve MSHR entries for demand access Adds a new parameter that reserves some number of MSHR entries for demand accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand requests from the CPU to stall.	2014-12-23 09:31:18 -05:00
Curtis Dunham	4d88978913	arm: Add stats to table walker This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion	2014-12-23 09:31:18 -05:00
Andreas Hansson	59460b91f3	config: Expose the DRAM ranks as a command-line option This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels). The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding).	2014-12-23 09:31:18 -05:00
Andreas Hansson	2f7baf9dbe	mem: Ensure DRAM controller is idle when in atomic mode This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller forces the simulator to switch out of the KVM mode, thus killing performance. The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions. The switcheroo regression require a minor stats bump as a result.	2014-12-23 09:31:18 -05:00
Omar Naji	381d1da791	mem: Add rank-wise refresh to the DRAM controller This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank.	2014-12-23 09:31:18 -05:00
Omar Naji	152c02354e	mem: Fix a bug in the DRAM controller arbitration Fix a minor issue that affects multi-rank systems.	2014-12-23 09:31:18 -05:00
Andreas Hansson	e76e8e28a3	tests: Add a regression for the stack distance calculator Re-use the existing traffic generator regression, and enable the stack distance calculation in the comm monitor, along with the verification stack. The traffic generator config is also tuned to not increase the run-time too much (and actually have some address re-use).	2014-12-23 09:31:18 -05:00
Kanishk Sugand	7a25b1a0e0	mem: Add stack distance statistics to the CommMonitor This patch adds the stack distance calculator to the CommMonitor. The stats are disabled by default.	2014-12-23 09:31:18 -05:00
Kanishk Sugand	888975b29d	mem: Add a stack distance calculator This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU Distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043. For each transaction a hashtable look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represets the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64>::max(). In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, Writebacks to the lower level (e.g. membus from L2), can be marked instead of being removed from the stack (isMarked flag of Node set to True). And then later if this same address is accessed (by L1), the value of the isMarked flag would be True. This gives some insight on how the Writeback policy of the lower level affect the read/write accesses in an application. Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time.	2014-12-23 09:31:18 -05:00
Marco Elver	177682ead4	config: Add --memchecker option This patch adds the --memchecker option, to denote that a MemChecker should be instantiated for the system. The exact usage of the MemChecker depends on the system configuration. For now CacheConfig.py makes use of the option, adding MemCheckerMonitor instances between CPUs and D-Caches. Note, however, that currently this only provides limited checking on a running system; other parts of the system, such as I/O devices are not monitored, and may cause warnings to be issued by the monitor.	2014-12-23 09:31:18 -05:00
Marco Elver	dd0f3943e2	mem: Add MemChecker and MemCheckerMonitor This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	184fefbb3b	arm: Raise an alignment fault if a PC has illegal alignment We currently don't handle unaligned PCs correctly. There is one check for unaligned PCs in the TLB when running in aarch64 mode, but this check does not cover cases where the CPU does not do a TLB lookup when decoding an instruction (e.g., a branch stays within the same cache line). Additionally, the Decoder class sometimes throws an assertion for unaligned PCs which breaks speculation. This changeset introduces a decoder fault bit field in the ExtMachInst structure. This field can be used to signal a decoder failure. If set, the decoder generates an internal gem5fault instruction instead of a normal instruction. This instruction in turns either panics (fault type PANIC), returns an PCAlignmentFault (fault type UNALIGNED, aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32). The patch causes minor changes to the realview64 regressions, and a stats bump will follow.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	b33812ba43	arm: Clean up and document decoder API This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.	2014-12-23 09:31:17 -05:00
Andreas Sandberg	070b4a81db	arm: Add support for filtering in the PMU This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.	2014-12-23 09:31:17 -05:00
Dam Sunwoo	809134a2b1	config: Add options to take/resume from SimPoint checkpoints More documentation at http://gem5.org/Simpoints Steps to profile, generate, and use SimPoints with gem5: 1. To profile workload and generate SimPoint BBV file, use the following option: --simpoint-profile --simpoint-interval <interval length> Requires single Atomic CPU and fastmem. <interval length> is in number of instructions. 2. Generate SimPoint analysis using SimPoint 3.2 from UCSD. (SimPoint 3.2 not included with this flow.) 3. To take gem5 checkpoints based on SimPoint analysis, use the following option: --take-simpoint-checkpoint=<simpoint file path>,<weight file path>,<interval length>,<warmup length> <simpoint file> and <weight file> is generated by SimPoint analysis tool from UCSD. SimPoint 3.2 format expected. <interval length> and <warmup length> are in number of instructions. 4. To resume from gem5 SimPoint checkpoints, use the following option: --restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint checkpoint path> <N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint #0.	2014-12-23 09:31:17 -05:00
Gabe Black	7e34bae813	scons: Make the USE_KVM variable available in C++. We need it to determine whether we should expect KVM related parameters exist in the cirrus graphics device.	2014-12-22 16:49:24 -08:00
Nilay Vaish	8d9ce18c2b	Added tag stable_2014_12_14 for changeset bdb307e8be54	2014-12-14 16:21:04 -06:00
Gabe Black	70eb68beae	Let other objects set up memory like regions in a KVM VM.	2014-12-09 21:53:44 -08:00
Andreas Sandberg	9b7578d8c7	arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0 The aarch64 system register decoder is currently not decoding PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the decoder so that they are decoded using the values in table C5-6 in ARM DDI 0478A.c.	2014-12-08 04:49:53 -05:00
Andreas Sandberg	6a9fbd295d	dev: Add response sanity checks in PioPort Add an assert in the PioPort that checks if a response packet from a device has the right flags set before passing it to them rest of the memory system.	2014-12-08 04:49:52 -05:00
Andreas Sandberg	1ccc4e0e21	dev: Correctly transform packets into responses The VirtIO devices didn't correctly set the response flags in memory packets. This changeset adds the required Packet::makeResponse() calls.	2014-12-08 04:49:51 -05:00
Gabe Black	4a8a0a0798	misc: Generalize GDB single stepping. The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.	2014-12-05 22:37:03 -08:00
Gabe Black	fb07d43b1a	x86: Implement a remote GDB stub. This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.	2014-12-05 22:36:16 -08:00
Gabe Black	16c9b41616	misc: Add some utility functions for schedule inst commit events. These can be used to simplify the implementation of single step in derived classes.	2014-12-05 22:35:47 -08:00
Gabe Black	cddf988bfd	misc: Rename the GDB "Event" event class to InputEvent. The "Event" name is the same as the base event class. That's a bit confusing, and makes it a little awkward to add other event types.	2014-12-05 22:34:42 -08:00
Gabe Black	f9f46b8fa9	sim: Ensure GDB interrupts the simulation at an instruction boundary. Use the comInstEventQueue to ensure GDB interrupts the simulation at an instruction boundary and not in the middle of a macroop, memory access, etc.	2014-12-05 01:51:49 -08:00
Gabe Black	bacbb8ecbc	cpu: Only check for PC events on instruction boundaries. Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing.	2014-12-05 01:47:35 -08:00
Gabe Black	fe48c0a32b	misc: Make the GDB register cache accessible in various sized chunks. Not all ISAs have 64 bit sized registers, so it's not always very convenient to access the GDB register cache in 64 bit sized chunks. This change makes it accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations were working around that limitation by bundling and unbundling 32 bit values into 64 bit values. That code has been removed.	2014-12-05 01:44:24 -08:00
Gabe Black	7540656fc5	config: Add two options for setting the kernel command line. Both options accept template which will, through python string formatting, have "mem", "disk", and "script" values substituted in from the mdesc. Additional values can be used on a case by case basis by passing them as keyword arguments to the fillInCmdLine function. That makes it possible to have specialized parameters for a particular ISA, for instance. The first option lets you specify the template directly, and the other lets you specify a file which has the template in it.	2014-12-04 16:42:07 -08:00
Gabe Black	22aaa5867f	x86: Rework opcode parsing to support 3 byte opcodes properly. Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measureably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa	2014-12-04 15:53:54 -08:00
Gabe Black	3069c28a02	arch: Allow named constants as decode case values. The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value, it might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpretted as C++.	2014-12-04 15:52:48 -08:00
Nilay Vaish	cca1608bd5	config: ruby: mi protocol: correct master slave setting for dma In the MI protocol, the master slave connection between the dma controller and network was being set incorrectly. This patch corrects it.	2014-12-04 08:59:44 -06:00
Gabe Black	d67cf81f5d	x86: Clean up style in process.cc.	2014-12-02 22:01:51 -08:00
Gabe Black	2d9dae01fb	sim: Make it possible to override the breakpoint length check. The check which makes sure the length of the breakpoint being written is the same as a MachInst is only correct on fixed instruction width ISAs. Instead of incorrectly applying that check to all ISAs, this change makes that the default check and lets ISA specific GDB classes override it.	2014-12-03 03:27:19 -08:00
Gabe Black	b7dc4ba516	config: Get rid of some extra spaces around default arguments.	2014-12-03 03:11:00 -08:00
Gabe Black	ecec8cde63	ide: Accept the IDLE (0xe3) ATA command. This command is supposed to set up a timer which will put the drive into a standby mode if it isn't sent a command within a given time out. Since most of the timeouts are generally significantly longer than a simulation would run anyway, and we don't have an implementation for standby mode to begin with, we can accept the command, do nothing, and report success.	2014-12-03 03:07:35 -08:00
Gabe Black	bce58726f3	dev: Support translating left and right ALT keys. This is used primarily for VNC.	2014-12-03 03:06:03 -08:00
Andreas Hansson	6489598fb4	stats: Bump stats for fixes, mostly TLB and WriteInvalidate	2014-12-02 06:08:25 -05:00
Andreas Hansson	966c3f4bc5	scons: Ensure dictionary iteration is sorted by key This patch adds sorting based on the SimObject name or parameter name for all situations where we iterate over dictionaries. This should ensure a deterministic and consistent order across the host systems and hopefully avoid regression results differing across python versions.	2014-12-02 06:08:22 -05:00
Curtis Dunham	5d22250845	mem: Support WriteInvalidate (again) This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.	2014-12-02 06:08:19 -05:00
Curtis Dunham	7ca27dd3cc	mem: Remove WriteInvalidate support Prepare for a different implementation following in the next patch	2014-12-02 06:08:17 -05:00
Andrew Bardsley	df37cad0fd	cpu: Fix retries on barrier/store in Minor's store buffer This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is pre-maturely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To acheive this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.	2014-12-02 06:08:15 -05:00
Andrew Bardsley	98f3e7a310	cpu: Fix memoryIssueLimit checking in Minor This patch fixes the checking of the number of memory instructions issued per cycles in the Minor CPU.	2014-12-02 06:08:13 -05:00
Andrew Bardsley	3cd0b1f6a6	arm: Fix TLB ignoring faults when table walking This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which BaseTLB::Translation::finish.	2014-12-02 06:08:11 -05:00
Andrew Bardsley	e5e5b80690	config: Fix to SystemC example's event handling This patch fixes checkpoint restore in the SystemC hosting example by handling early PollEvent events correctly before any EventQueue events are posted. The SystemC event queue handler (SCEventQueue) reports an error if the event loop is entered with no Events posted. It is possible for this to happen after instantiate due to PollEvent events. This patch separates out `external' events into a different handler in sc_module.cc to prevent the error from occurring. This fix also improves the event handling of asynchronous events by: 1) Making asynchronous events 'catch up' gem5 time to SystemC time to avoid the appearance that events have been lost while servicing an asynchronous event that schedules an event loop exit event 2) Add an in_simulate data member to Module to allow the event loop to check whether events should be processed or deferred until the next time Module::simulate is entered 3) Cancel pending events around the entry/exit of the event loop in Module::simulate 4) Moving the state initialisation of the example entirely into run to correct a problem with early events in checkpoint restore. It is still possible to schedule asynchronous events (and talk PollQueue actions) while simulate is not running. This behaviour may stil cause some problems.	2014-12-02 06:08:09 -05:00
Andrew Bardsley	05bba75cdc	config: SystemC Gem5Control top level additions This patch cleans up a few style issues and adds a few capabilities to the SystemC top level 'Gem5Control/Gem5System' mechanism. These include: 1) A space to store/retrieve a version string for a model 2) A mechanism for registering functions to be called at the end of elaboration to perform simulation setup tasks in SystemC 3) Adding setGDBRemotePort to the Gem5Control 4) Changing the sc_set_time_resolution behaviour to instead check that the SystemC time resolution is already acceptable	2014-12-02 06:08:06 -05:00
Andreas Hansson	726f626e87	stats: Bump stats for o3 LSQ changes	2014-12-02 06:08:05 -05:00
Marco Elver	9649395f85	cpu, o3: Ignored invalidate causing same-address load reordering In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later loads fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.	2014-12-02 06:08:03 -05:00
Andreas Hansson	74bbe20141	cpu: Always mask the snoop address when performing lock check Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.	2014-12-02 06:08:00 -05:00
Stephan Diestelhorst	810349a8a7	cpu: Move packet deallocation to recvTimingResp in the O3 CPU Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.	2014-12-02 06:07:58 -05:00

... 7 8 9 10 11 ...

11023 commits