sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Krishnendra Nathella	cabd4768c7	cpu: Fix LLSC atomic CPU wakeup Writes to locked memory addresses (LLSC) did not wake up the locking CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop was checking if the incoming packet was an invalidation (isInvalidate) and only then handled a locked snoop. But, writes are seen instead of invalidates when running without caches (fast-forward configurations). As as simple fix, now handleLockedSnoop is also called even if the incoming snoop packet are from writes.	2015-07-19 15:03:30 -05:00
Pau Cabre	abfb997800	cpu: fix unitialized variable which may cause assertion failure The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to NULL (I got the assertion failure several times without the fix). Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-12-04 17:54:03 -06:00
Hongil Yoon	fb0f9884e2	cpu, o3: consider split requests for LSQ checksnoop operations This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions. The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:06 -05:00
Andreas Sandberg	48281375ee	mem, cpu: Add a separate flag for strictly ordered memory The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.	2015-05-05 03:22:33 -04:00
Marco Elver	9649395f85	cpu, o3: Ignored invalidate causing same-address load reordering In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later loads fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.	2014-12-02 06:08:03 -05:00
Stephan Diestelhorst	810349a8a7	cpu: Move packet deallocation to recvTimingResp in the O3 CPU Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.	2014-12-02 06:07:58 -05:00
Andreas Hansson	a2d246b6b8	arch: Use shared_ptr for all Faults This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".	2014-10-16 05:49:51 -04:00
Andreas Hansson	0fa128bbd0	base: Clean up redundant string functions and use C++11 This patch does a bit of housekeeping on the string helper functions and relies on the C++11 standard library where possible. It also does away with our custom string hash as an implementation is already part of the standard library.	2014-09-20 17:17:49 -04:00
Curtis Dunham	e3b19cb294	mem: Refactor assignment of Packet types Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function.	2014-05-13 12:20:48 -05:00
Mitch Hayenga	4f13f676aa	cpu: Fix cache blocked load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior.	2014-09-03 07:42:39 -04:00
Mitch Hayenga	976f27487b	cpu: Change writeback modeling for outstanding instructions As highlighed on the mailing list gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit().	2014-09-03 07:42:33 -04:00
Binh Pham	0782d92286	o3: split load & store queue full cases in rename Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa. This work was done while Binh was an intern at AMD Research.	2014-06-21 10:26:43 -07:00
Steve Reinhardt	0be64ffe2f	style: eliminate equality tests with true and false Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code.	2014-05-31 18:00:23 -07:00
Mitch Hayenga	e4086878f6	cpu: Fix case where o3 lsq could print out uninitialized data In the O3 LSQ, data read/written is printed out in DPRINTFs. However, the data field is treated as a character string with a null terminated. However the data field is not encoded this way. This patch removes that possibility by removing the data part of the print.	2014-04-01 14:22:06 -05:00
Marco Elver	b884fcf412	cpu: o3: lsq: Fix TSO implementation This patch fixes violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO. Prior to this patch, the following case could be violated: P0 \| P1 ; MOV [x],mail=/usr/spool/mail/nilay \| MOV EAX,[y] ; MOV [y],mail=/usr/spool/mail/nilay \| MOV EBX,[x] ; exists (1:EAX=1 /\ 1:EBX=0) [is a violation] The problem was found using litmus [http://diy.inria.fr]. Committed by: Nilay Vaish <nilay@cs.wisc.edu	2014-03-25 13:15:04 -05:00
Ali Saidi	90b1775a8f	cpu: Add support for instructions that zero cache lines.	2014-01-24 15:29:30 -06:00
Ali Saidi	6bed6e0352	cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped. This patch add support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.	2014-01-24 15:29:30 -06:00
Matt Horsnell	739c6df94e	base: add support for probe points and common probes The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model. What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out your analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh\|cc] and src/cpu/SimpleTrace.py What is happening under the hood: * every SimObject maintains has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and the regProbeListeners is called on each SimObject. this hooks up the probe point notify calls with the listeners. FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful. What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction) What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event. What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener. What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes. Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats. Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting. What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%).	2014-01-24 15:29:30 -06:00
Matt Horsnell	ca89eba79e	mem: track per-request latencies and access depths in the cache hierarchy Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.	2014-01-24 15:29:30 -06:00
Matt Horsnell	6decd70bfb	cpu: add consistent guarding to *_impl.hh files.	2013-10-17 10:20:45 -05:00
Faissal Sleiman	e516531bd0	cpu: Put in assertions to check for maximum supported LQ/SQ size LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to 256 entries (including the sentinel entry). Sending packets to memory with a higher index than 255 truncates the index, such that the response matches the wrong entry. For instance, this can result in a deadlock if a store completion does not clear the head entry.	2013-10-17 10:20:45 -05:00
Andreas Hansson	d4273cc9a6	mem: Set the cache line size on a system level This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.	2013-07-18 08:31:16 -04:00
Matt Horsnell	e88e7d88b9	o3: fix tick used for renaming and issue with range selection Fixes the tick used from rename: - previously this gathered the tick on leaving rename which was always 1 less than the dispatch. This conflated the decode ticks when back pressure built in the pipeline. - now picks up tick on entry. Added --store_completions flag: - will additionally display the store completion tail in the viewer. - this highlights periods when large numbers of stores are outstanding (>16 LSQ blocking) Allows selection by tick range (previously this caused an infinite loop)	2013-02-15 17:40:09 -05:00
Andreas Sandberg	1814a85a05	cpu: Rewrite O3 draining to avoid stopping in microcode Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust. Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after that the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty.	2013-01-07 13:05:46 -05:00
Andreas Sandberg	fca4fea769	cpu: Fix O3 LSQ debug dumping constness and formatting	2013-01-07 13:05:46 -05:00
Ali Saidi	69d419f313	o3: Fix issue with LLSC ordering and speculation This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked address. Previously we relied on the cache to handle the locking for us, however some users on the gem5 mailing list reported a case where the cpu speculatively executes a ll operation after a pending sc operation in the pipeline and that makes the cache monitor valid. This should handle that case by invaliding the local monitor.	2013-01-07 13:05:33 -05:00
Nathanael Premillieu	eb899407c5	o3 cpu: remove some unused buggy functions in the lsq Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2012-12-06 04:36:51 -06:00
Andreas Hansson	c60db56741	Packet: Remove NACKs from packet and its use in endpoints This patch removes the NACK frrom the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that). The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occured in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe. Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up.	2012-08-22 11:39:59 -04:00
Ali Saidi	6df196b71e	O3: Clean up the O3 structures and try to pack them a bit better. DynInst is extremely large the hope is that this re-organization will put the most used members close to each other.	2012-06-05 01:23:09 -04:00
Andreas Hansson	3fea59e162	MEM: Separate requests and responses for timing accesses This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses. For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the apprioriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions). The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly. With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not alow snoop requests to be rejected or stalled. All these assumptions are now excplicitly part of the port interface itself.	2012-05-01 13:40:42 -04:00
Andreas Hansson	750f33a901	MEM: Remove the Broadcast destination from the packet This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple master (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field. Modules that connect multiplexing components, such as caches and bridges store any existing source and destination field in the sender state as a stack (just as before). The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class. In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing.	2012-04-14 05:45:55 -04:00
William Wang	f9d403a7b9	MEM: Introduce the master/slave port sub-classes in C++ This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects. The patch enables us to classify behaviours into the two bins and add assumptions and enfore compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specilisations are to come in later patches. The getPort function is not getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal. The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.	2012-03-30 09:40:11 -04:00
Geoffrey Blake	043709fdfa	CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.	2012-03-09 09:59:27 -05:00
Geoffrey Blake	af6aaf2581	CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5 Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.	2012-01-31 07:46:03 -08:00
Nilay Vaish	5c2fc35e02	O3 CPU LSQ: Implement TSO This patch makes O3's LSQ maintain total order between stores. Essentially only the store at the head of the store buffer is allowed to be in flight. Only after that store completes, the next store is issued to the memory system. By default, the x86 architecture will have TSO.	2012-01-28 19:09:04 -06:00
Gabe Black	4fcf8e9959	O3: Tidy up some DPRINTFs in the LSQ.	2011-09-27 00:25:26 -07:00
Gabe Black	44ed4849d4	Faults: Replace calls to genMachineCheckFault with M5PanicFault.	2011-09-27 00:24:43 -07:00
Nilay Vaish	56bddab189	LSQ: Moved a couple of lines to enable O3 + Ruby This patch makes O3 CPU work along with the Ruby memory model. Ruby overwrites the senderState pointer with another pointer. The pointer is restored only when Ruby gets done with the packet. LSQ makes use of senderState just after sendTiming() returns. But the dynamic_cast returns a NULL pointer since Ruby's senderState pointer is from a different class. Storing the senderState pointer before calling sendTiming() does away with the problem.	2011-09-26 12:18:32 -05:00
Steve Reinhardt	84f0a1bd91	event: minor cleanup Initialize flags via the Event constructor instead of calling setFlags() in the body of the derived class's constructor. I forget exactly why, but this made life easier when implementing multi-queue support. Also rename Event::getFlags() to isFlagSet() to better match common usage, and get rid of some unused Event methods.	2011-09-22 18:59:55 -07:00
Ali Saidi	649c239cee	LSQ: Only trigger a memory violation with a load/load if the value changes. Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently beacuse a translation will miss and start a table walk and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block depending on the dendency checking). This patch has been tested with a couple of self-checking hand crafted programs to stress ordering between two cores. The performance improvement on SPEC benchmarks can be substantial (2-10%).	2011-09-13 12:58:08 -04:00
Gabe Black	206c2e9a0e	O3: Implement memory mapped IPRs for O3.	2011-07-31 19:21:17 -07:00
Ali Saidi	09a2be0c39	O3: Fix a small corner case with the lsq hazard detection logic.	2011-05-04 20:38:26 -05:00
Nathan Binkert	6e9143d36d	stats: one more name violation	2011-04-20 19:07:45 -07:00
Nathan Binkert	eddac53ff6	trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help	2011-04-15 10:44:32 -07:00
Nathan Binkert	39a055645f	includes: sort all includes	2011-04-15 10:44:06 -07:00
Ali Saidi	7dde557fdc	O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required.	2011-04-04 11:42:23 -05:00
Ali Saidi	2f40b3b8ae	O3: Fix unaligned stores when cache blocked Without this change the a store can be issued to the cache multiple times. If this case occurs when the l1 cache is out of mshrs (and thus blocked) the processor will never make forward progress because each cycle it will send a single request using the recently freed mshr and not completing the multipart store. This will continue forever.	2011-03-17 19:20:19 -05:00
Giacomo Gabrielli	e2507407b1	O3: Enhance data address translation by supporting hardware page table walkers. Some ISAs (like ARM) relies on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished. Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept into the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution.	2011-02-11 18:29:35 -06:00
Matt Horsnell	11bef2ab38	O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf.	2011-01-18 16:30:05 -06:00
Ali Saidi	0f9a3671b6	ARM: Add support for moving predicated false dest operands from sources.	2011-01-18 16:30:02 -06:00

1 2 3

115 commits