sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Andreas Hansson	bf2f178f85	mem: Add a wrapped DRAMSim2 memory controller This patch adds DRAMSim2 as a memory controller by wrapping the external library and creating a sublass of AbstractMemory that bridges between the semantics of gem5 and the DRAMSim2 interface. The DRAMSim2 wrapper extracts the clock period from the config file. There is no way of extracting this information from DRAMSim2 itself, so we simply read the same config file and get it from there. To properly model the response queue, the wrapper keeps track of how many transactions are in the actual controller, and how many are stacking up waiting to be sent back as responses (in the wrapper). The latter requires us to move away from the queued port and manage the packets ourselves. This is due to DRAMSim2 not having any flow control on the response path. DRAMSim2 assumes that the transactions it is given are matching the burst size of the choosen memory. The wrapper checks to ensure the cache line size of the system matches the burst size of DRAMSim2 as there are currently no provisions to split the system requests. In theory we could allow a cache line size smaller than the burst size, but that would lead to inefficient use of the DRAM, so for not we fatal also in this case.	2014-02-18 05:50:53 -05:00
Andreas Hansson	e83fdc532b	util: Enhance the error messages for packet encode/decode This patch adds a more verbose error message when the Python protobuf module cannot be loaded.	2014-02-18 05:50:52 -05:00
Andreas Hansson	c9cb492e1c	mem: Fix input to DPRINTF in CommMonitor Minor fix of the debug message parameters.	2014-02-18 05:50:51 -05:00
Nilay Vaish	5abbb84f02	stats: updates due to branch predictor warming	2014-02-16 11:40:34 -06:00
Nilay Vaish	0a44e16948	Added tag stable_2014_02_15 to the changeset 459491344fcf	2014-02-15 12:44:09 -06:00
Andreas Sandberg	c52190a695	cpu: simple: Add support for using branch predictors This changesets adds branch predictor support to the BaseSimpleCPU. The simple CPUs normally don't need a branch predictor, however, there are at least two cases where it can be desirable: 1) A simple CPU can be used to warm the branch predictor of an O3 CPU before switching to the slower O3 model. 2) The simple CPU can be used as a quick way of evaluating/debugging new branch predictors since it exposes branch predictor statistics. Limitations: * Since the simple CPU doesn't speculate, only one instruction will be active in the branch predictor at a time (i.e., the branch predictor will never see speculative branches). * The outcome of a branch prediction does not affect the performance of the simple CPU.	2014-02-09 20:49:28 +01:00
Nilay Vaish	eb73a14fe2	base: calls abort() from fatal Currently fatal() ends the simulation in a normal fashion. This results in the call stack getting lost when using a debugger and it is not always possible to debug the simulation just from the information provided by the printed error message. Even though the error is likely due to a user's fault, the information available should not be thrown away. Hence, this patch to call abort() from fatal().	2014-02-06 16:30:13 -06:00
Nilay Vaish	bb0e9119e7	ruby: memory controller: use MemoryNode *	2014-02-06 16:30:12 -06:00
Andreas Sandberg	e76a37985f	x86: Fix x87 state transfer bug Changeset 7274310be1bb (isa: clean up register constants) increased the value of NumFloatRegs, which triggered a bug in X86ISA::copyRegs(). This bug is caused by the x87 stack being copied twice since register indexes past NUM_FLOATREGS are mapped into the x87 stack relative to the top of the stack, which is undefined when the copy takes place. This changeset updates the copyRegs() function to use access registers using the non-flattening interface, which guarantees that undesirable register folding does not happen.	2014-02-05 14:08:13 +01:00
Nikos Nikoleris	c6279f2d19	x86, kvm: Fix bug in the RFlags get and set functions The getRFlags and setRFlags utility functions were not updated correctly when condition registers were separated into their own register class. This lead to incorrect state transfer in calls from kvm into the simulator (e.g., m5 readfile ended up in an infinite loop) and when switching CPUs. This patch makes these utility functions use getCCReg and setCCReg instead of getIntReg and setIntReg which read and write the integer registers. Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>	2014-02-02 16:37:35 +01:00
Nilay Vaish	3526676165	config: correct bug in x86 drive sys instantiation	2014-01-31 15:35:45 -06:00
Ola Jeppsson	7f16951451	unittest: Fix build errors Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-30 12:21:58 -06:00
Mitch Hayenga	96317d466e	mem: Add additional tolerance to stride prefetcher Forces the prefetcher to mispredict twice in a row before resetting the confidence of prefetching. This helps cases where a load PC strides by a constant factor, however it may operate on different arrays at times. Avoids the cost of retraining. Primarily helps with small iteration loops. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 23:21:26 -06:00
Mitch Hayenga	771c864bf4	mem: Allowed tagged instruction prefetching in stride prefetcher For systems with a tightly coupled L2, a stride-based prefetcher may observe access requests from both instruction and data L1 caches. However, the PC address of an instruction miss gives no relevant training information to the stride based prefetcher(there is no stride to train). In theses cases, its better if the L2 stride prefetcher simply reverted back to a simple N-block ahead prefetcher. This patch enables this option. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 23:21:26 -06:00
Mitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E)	95735e10e7	mem: prefetcher: add options, support for unaligned addresses This patch extends the classic prefetcher to work on non-block aligned addresses. Because the existing prefetchers in gem5 mask off the lower address bits of cache accesses, many predictable strides fail to be detected. For example, if a load were to stride by 48 bytes, with 64 byte cachelines, the current stride based prefetcher would see an access pattern of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This patch fixes this, by training the prefetcher on access and not masking off the lower address bits. It also adds the following configuration options: 1) Training/prefetching only on cache misses, 2) Training/prefetching only on data acceses, 3) Optionally tagging prefetches with a PC address. #3 allows prefetchers to train off of prefetch requests in systems with multiple cache levels and PC-based prefetchers present at multiple levels. It also effectively allows a pipelining of prefetch requests (like in POWER4) across multiple levels of cache hierarchy. Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).	2014-01-29 23:21:25 -06:00
Xiangyu Dong	32cc2ea8b9	cpu: fix bug when TrafficGen deschedules event Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-29 22:35:04 -06:00
Mitch Hayenga	b77ca57f8c	arm: Enable umask syscall in SE mode Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:51 -06:00
Mitch Hayenga	55a4ff5f04	base: Fix race condition in the socket listen function gem5 makes the incorrect assumption that by binding a socket, it effectively has allocated a port. Linux only allocates ports once you call listen on the given socket, not when you call bind. So even if the port was free when bind was called, another process (gem5 instance) could race in between the bind & listen calls and steal the port. In the current code, if the call to bind fails due to the port being in use (EADDRINUSE), gem5 retries for a different port. However if listen fails, gem5 just panics. The fix is testing the return value of listen and re-trying if it was due to EADDRINUSE. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:51 -06:00
Amin Farmahini	ffbdaa7cce	mem: Remove redundant findVictim() input argument The patch (1) removes the redundant writeback argument from findVictim() (2) fixes the description of access() function Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:50 -06:00
Amin Farmahini	575a73f4a1	mem: Fixes a bug in simple_dram write merging Fixes updating the value of size in the write merge function. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-01-28 18:00:49 -06:00
Nilay Vaish	7792dedfdd	x86: add a warning about the number of memory controllers When memory size > 3GB, print a warning that twice the number of memory controllers would be created.	2014-01-28 07:15:53 -06:00
Nilay Vaish	bdee69d0b1	x86: use lfpimm instead of limm for fptan	2014-01-27 18:50:54 -06:00
Nilay Vaish	6a543b5134	x86: implements x87 add/sub instructions	2014-01-27 18:50:53 -06:00
Nilay Vaish	5be0b846b1	x86: implements fxch instruction.	2014-01-27 18:50:52 -06:00
Nilay Vaish	4eb3b1ed0b	x86: correct error in emms instruction.	2014-01-27 18:50:51 -06:00
Nilay Vaish	95b782f600	config: allow more than 3GB of memory for x86 simulations This patch edits the configuration files so that x86 simulations can have more than 3GB of memory. It also corrects a bug in the MemConfig.py script.	2014-01-27 18:50:51 -06:00
Nilay Vaish	fa0ff1c902	stats: update sparc fs stats	2014-01-27 13:30:37 -06:00
Steve Reinhardt	85016c2d45	stats: update eio stats for recent changes	2014-01-27 00:38:58 -05:00
Ali Saidi	cfb805cc71	stats: update stats for ARMv8 changes	2014-01-24 15:29:34 -06:00
ARM gem5 Developers	612f8f074f	arm: Add support for ARMv8 (AArch64 & AArch32) Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64 kernel you are restricted to AArch64 user-mode binaries. This will be addressed in a later patch. Note: Virtualization is only supported in AArch32 mode. This will also be fixed in a later patch. Contributors: Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation) Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation) Mbou Eyole (AArch64 NEON, validation) Ali Saidi (AArch64 Linux support, code integration, validation) Edmund Grimley-Evans (AArch64 FP) William Wang (AArch64 Linux support) Rene De Jong (AArch64 Linux support, performance opt.) Matt Horsnell (AArch64 MP, validation) Matt Evans (device models, code integration, validation) Chris Adeniyi-Jones (AArch64 syscall-emulation) Prakash Ramrakhyani (validation) Dam Sunwoo (validation) Chander Sudanthi (validation) Stephan Diestelhorst (validation) Andreas Hansson (code integration, performance opt.) Eric Van Hensbergen (performance opt.) Gabe Black	2014-01-24 15:29:34 -06:00
Ali Saidi	f3585c841e	stats: update stats for cache occupancy and clock domain changes	2014-01-24 15:29:33 -06:00
Andreas Hansson	cfc4a99982	arch: Make all register index flattening const This patch makes all the register index flattening methods const for all the ISAs. As part of this, readMiscRegNoEffect for ARM is also made const.	2014-01-24 15:29:30 -06:00
Geoffrey Blake	9633282fc8	checker: CheckerCPU handling of MiscRegs was incorrect The CheckerCPU model in pre-v8 code was not checking the updates to miscellaneous registers due to some methods for setting misc regs were not instrumented. The v8 patches exposed this by calling the instrumented misc reg update methods and then invoking the checker before the main CPU had updated its misc regs, leading to false positives about register mismatches. This patch fixes the non-instrumented misc reg update methods and places calls to the checker in the proper places in the O3 model.	2014-01-24 15:29:30 -06:00
Ali Saidi	7d0344704a	arch, cpu: Add support for flattening misc register indexes. With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the othre ISAs this is currently a null-implementation. Additionally, a system variable is added to all the ISA objects.	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	3436de0c2a	cpu: Add support for Memory+Barrier instruction types in O3 cpu.	2014-01-24 15:29:30 -06:00
Ali Saidi	90b1775a8f	cpu: Add support for instructions that zero cache lines.	2014-01-24 15:29:30 -06:00
Ali Saidi	6bed6e0352	cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped. This patch add support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	d3444c6603	mem: Add flag to request if it was generated by a page table walk	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	aefe9cc624	mem: Add support for a security bit in the memory system This patch adds the basic building blocks required to support e.g. ARM TrustZone by discerning secure and non-secure memory accesses.	2014-01-24 15:29:30 -06:00
Chris Adeniyi-Jones	7f835a59f1	sim: Add openat/fstatat syscalls and fix mremap This patch adds support for the openat and fstatat syscalls and broadens the support for mremap to make it work on OS X.	2014-01-24 15:29:30 -06:00
Ali Saidi	904872a01a	mem: Remove explict cast from memhelper. Previously we were casting the result type to the the memory type which is incorrect for things like dual-memory operations which still return a single result.	2014-01-24 15:29:30 -06:00
Timothy M. Jones	427ceb57a9	Cache: Collect very basic stats on tag and data accesses Adds very basic statistics on the number of tag and data accesses within the cache, which is important for power modelling. For the tags, simply count the associativity of the cache each time. For the data, this depends on whether tags and data are accessed sequentially, which is given by a new parameter. In the parallel case, all data blocks are accessed each time, but with sequential accesses, a single data block is accessed only on a hit.	2014-01-24 15:29:30 -06:00
Dam Sunwoo	85e8779de7	mem: per-thread cache occupancy and per-block ages This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.	2014-01-24 15:29:30 -06:00
Matt Horsnell	739c6df94e	base: add support for probe points and common probes The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model. What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out your analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh\|cc] and src/cpu/SimpleTrace.py What is happening under the hood: * every SimObject maintains has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and the regProbeListeners is called on each SimObject. this hooks up the probe point notify calls with the listeners. FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful. What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction) What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event. What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener. What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes. Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats. Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting. What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%).	2014-01-24 15:29:30 -06:00
Andreas Hansson	4de69821e6	sim: Expose the current voltage for each object as a stat	2014-01-24 15:29:30 -06:00
Andreas Hansson	1d85e914a6	sim: Expose the current clock period as a stat This patch adds observability to the clock period of the clock domains by including it as a stat. As a result of adding this, the regressions will be updated in a separate patch.	2014-01-24 15:29:30 -06:00
Matt Horsnell	ca89eba79e	mem: track per-request latencies and access depths in the cache hierarchy Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.	2014-01-24 15:29:30 -06:00
Andreas Hansson	daa781d2db	config: Make the Clock a Tick parameter like Latency/Frequency This patch makes the Clock a TickParamValue just like Latency/Frequency. There is no longer any need to distinguish it (originally needed to support multiplication).	2014-01-24 15:29:29 -06:00
Andreas Hansson	f2b0b551cc	x86: Fix memory leak in table walker This patch fixes a memory leak in the table walker, by ensuring that the sender state is deleted again if the request packet cannot be successfully sent.	2014-01-24 15:29:29 -06:00
Andreas Hansson	7db542c0dd	cpu: Relax check on squashed non-speculative instructions This patch relaxes the check performed when squashing non-speculative instructions, as it caused problems with loads that were marked ready, and then stalled on a blocked cache. The assertion is now allowing memory references to be non-faulting.	2014-01-24 15:29:29 -06:00

1 2 3 4 5 ...

10067 commits