sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Gabor Dozsa	ecf68fac40	dev: Fix race conditions at terminating dist-gem5 simulations Two problems may arise when a distributed gem5 simulation terminates: (i) simulation thread(s) may get stuck in an incomplete synchronisation event which prohibits processing the simulation exit event; and (ii) a stale receiver thread may try to access objects that have already been deleted while exiting gem5. This patch terminates receive threads properly and aborts the processing of any incomplete synchronisation event. Change-Id: I72337aa12c7926cece00309640d478b61e55a429 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2016-12-06 17:33:06 +00:00
Andreas Hansson	c642d6fc37	ruby: Remove RubyMemoryControl and associated files This patch removes the deprecated RubyMemoryControl. The DRAMCtrl module should be used instead.	2016-12-05 16:49:07 -05:00
Nikos Nikoleris	0054f1ad53	mem: Respond to InvalidateReq when the block is (pending) dirty Previously when an InvalidateReq snooped a cache with a dirty block or a pending modified MSHR, it would invalidate the block or set the postInv flag. The cache would not send an InvalidateResp. though, causing memory order violations. This patches changes this behavior, making the cache with the dirty block or pending modified MSHR the ordering point. Change-Id: Ib4c31012f4f6693ffb137cd77258b160fbc239ca Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>	2016-12-05 16:48:29 -05:00
Nikos Nikoleris	9916e4276c	mem: Invalidate a blk when servicing the 1st invalidating target Previously an MSHR with one or more invalidating targets would first service all targets in the MSHR TargetList and then invalidate the block. As a result any service snooping targets would lookup in the cache and incorrectly find the block. This patch forces the invalidation to happen when the first invalidating target is encountered. Change-Id: I9df15de24e1d351cd96f5a2c424d9a03d81c2cce Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>	2016-12-05 16:48:28 -05:00
Nikos Nikoleris	77dfeb8c09	mem: Allow non invalidating snoops on an InvalidateReq MSHR This patch changes an assertion that previously assumed that a non invalidating snoop request should never be serviced by an InvalidateReq MSHR. The MSHR serves as the ordering point for the snooping packet. When the InvalidateResp reaches the cache the snooping packet snoops the caches above to find the requested block. One or more of the caches above will have the block since earlier it has seen a WriteLineReq. Change-Id: I0c147c8b5d5019e18bd34adf9af0fccfe431ae07 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>	2016-12-05 16:48:27 -05:00
Nikos Nikoleris	5ebb8ec46b	mem: Don't use hasSharers in the snoopFilter for memory responses When the snoopFilter receives a response, it updates its state using the hasSharers flag (indicates whether there are more than one copies of the block in the caches above). The hasSharers flag of the packet was previously populated when the request was traversing and snooping the caches looking for the block. 1) When the response is coming from the memory-side port, its order with respect to other responses is not necessarily preserved (e.g., a request that arrived second to the xbar can get its response first). As a result the snoopFilter might process responses out of order updating its residency information using the non valid hasSharers flag which was populated much earlier. 2) When the response is from an on-chip, the MSHRs preserve a well defined order and the hasSharers flag should contain valid information. This patch changes the snoopFilter by avoiding the hasSharers flag when the response is from the memory-side port. Change-Id: Ib2d22a5b7bf3eccac64445127d2ea20ee74bb25b Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:26 -05:00
Nikos Nikoleris	78a97b1847	mem: Always use InvalidateReq to service WriteLineReq misses Previously, a WriteLineReq that missed in a cache would send out an InvalidateReq if the block lookup failed or an UpgradeReq if the block lookup succeeded but the block had sharers. This changes ensures that a WriteLineReq always sends an InvalidateReq to invalidate all copies of the block and satisfy the WriteLineReq. Change-Id: I207ff5b267663abf02bc0b08aeadde69ad81be61 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>	2016-12-05 16:48:25 -05:00
Nikos Nikoleris	3172501a59	mem: Assert that the responderHadWritable is set only once Change-Id: Ie3beeef25331f84a0a5bcc17f7a791f4a829695b Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:24 -05:00
Andreas Hansson	50812a20f1	mem: Ensure InvalidateReq is considered isForward by MSHRs This patch fixes an issue where an MSHR would incorrectly be perceived to provide data to targets arriving after an InvalidateReq. To address this the InvalidateReq is now treated as isForward, much like an UpgradeReq that did not hit in the cache. Change-Id: Ia878444d949539b5c33fd19f3e12b0b8a872275e Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:23 -05:00
Nikos Nikoleris	e16967941b	mem: Make packet debug printing more uniform Previously DPRINTFs printing information about a packet would use ad hoc formats. This patch changes all DPRINTFs to use the print function defined by the packet class, making the packet printing format more uniform and easier to change. Change-Id: Idd436a9758d4bf70c86a574d524648b2a2580970 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:21 -05:00
Nikos Nikoleris	61860f2419	cpu: Change traffic generators to use different values for writes Previously all traffic generators would use the same value for write requests. With this change traffic generators use their master id as the payload of write requests making them more useful for the memchecker. Change-Id: Id1a6b8f02853789b108ef6003f4c32ab929bb123 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:20 -05:00
Nikos Nikoleris	0bd9dfb8de	mem: Service only the 1st FromCPU MSHR target on ReadRespWithInv A response to a ReadReq can either be a ReadResp or a ReadRespWithInvalidate. As we add targets to an MSHR for a ReadReq we assume that the response will be a ReadResp. When the response is invalidating (ReadRespWithInvalidate) servicing more than one targets can potentially violate the memory ordering. This change fixes the way we handle a ReadRespWithInvalidate. When a cache receives a ReadRespWithInvalidate we service only the first FromCPU target and all the FromSnoop targets from the MSHR target list. The rest of the FromCPU targets are deferred and serviced by a new request. Change-Id: I75c30c268851987ee5f8644acb46f440b4eeeec2 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:19 -05:00
Nikos Nikoleris	d28c2906f4	mem: Keep track of allocOnFill in the TargetList Previously the information of whether a response was allocating or not was a property of the MSHR. This change makes this flag a property of the TargetList. Differernt TargetLists, e.g. the targets and the deferred targets lists might have different values. Additionally, the information about whether each of the target expects an allocating response is stored inside the TargetList container. This allows for repopulating the flag in case some of the targets are removed. Change-Id: If3ec2516992f42a6d9da907009ffe3ab8d0d2021 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:18 -05:00
Nikos Nikoleris	f7a5de3bec	mem: Add support for repopulating the flags of an MSHR TargetList This patch adds support for repopulating the flags of an MSHR TargetList. The added functionality makes it possible to remove targets from a TargetList without leaving it in an inconsistent state. Change-Id: I3f7a8e97bfd3e2e49bebad056d11bbfb087aad91 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>	2016-12-05 16:48:17 -05:00
Brandon Potter	3d0a537862	hsail: disable asserts to allow immediate operands i.e. 0 with loads	2016-12-02 18:01:58 -05:00
Brandon Potter	900fd15622	hsail: add stub type and stub out several instructions	2016-12-02 18:01:57 -05:00
Brandon Potter	86b375f2f3	hsail: add popcount type and generate popcount instructions	2016-12-02 18:01:55 -05:00
Brandon Potter	3bb3db6194	hsail: add a wavesize case statement to register operand code	2016-12-02 18:01:52 -05:00
Brandon Potter	69c2d86d68	hsail: generate mov instructions for more arith_types and bit_types	2016-12-02 18:01:49 -05:00
Brandon Potter	35ba103009	hsail: remove the panic guarding function directives HSA functions calls are still not supported properly with HSAIL, but the recent AMP runtime modifications rely on being able to parse the BRIG/HSAIL files that are extracted from the application binaries. We need to parse the function call HSAIL definitions, but we do not actually need to make the function calls. The reason that this happens is that HCC appends a set of routines to every HSAIL binary that it creates. These extra, unnecessary routines exist in the HCC source as a file; this file is cat'd onto everything that the compiler outputs before being assembled into the application's binary. HCC does this because it might call these helper functions. However, it doesn't actually appear to do so in the AMP codes so we just parse these functions with the HSAIL parser and then ignore them.	2016-12-02 18:01:42 -05:00
Tony Gutierrez	38708f369b	hsail: fix unsigned offset bug in address calculation it's possible for the offset provided to an HSAIL mem inst to be a negative value, however the variable we use to hold the offset is an unsigned type. this can lead to excessively large offset values when the offset is negative, which will almost certainly cause the access to go out of bounds.	2016-12-02 11:40:52 -05:00
Matthew Poremba	80607a2a1d	ruby: Fix overflow reported by ASAN in MessageBuffer. In MessageBuffer the m_not_avail_count member is incremented but not used. This causes an overflow reported by ASAN. This patch changes from an int to Stats::Scalar, since the count is useful in debugging finite MessageBuffers.	2016-12-02 11:40:40 -05:00
Alec Roelke	ee0c261e10	riscv: [Patch 7/5] Corrected LRSC semantics RISC-V makes use of load-reserved and store-conditional instructions to enable creation of lock-free concurrent data manipulation as well as ACQUIRE and RELEASE semantics for memory ordering of LR, SC, and AMO instructions (the latter of which do not follow LR/SC semantics). This patch is a correction to patch 4, which added these instructions to the implementation of RISC-V. It modifies locked_mem.hh and the implementations of lr.w, sc.w, lr.d, and sc.d to apply the proper gem5 flags and return the proper values. An important difference between gem5's LLSC semantics and RISC-V's LR/SC ones, beyond the name, is that gem5 uses 0 to indicate failure and 1 to indicate success, while RISC-V is the opposite. Strictly speaking, RISC-V uses 0 to indicate success and nonzero to indicate failure where the value would indicate the error, but currently only 1 is reserved as a failure code by the ISA reference. This is the seventh patch in the series which originally consisted of five patches that added the RISC-V ISA to gem5. The original five patches added all of the instructions and added support for more detailed CPU models and the sixth patch corrected the implementations of Linux constants and structs. There will be an eighth patch that adds some regression tests for the instructions. [Removed some commented-out code from locked_mem.hh.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	84020a8aed	riscv: [Patch 6/5] Improve Linux emulation for RISC-V This is an add-on patch for the original series that implemented RISC-V that improves the implementation of Linux emulation for SE mode. Basically it cleans up linux/linux.hh by removing constants that haven't been defined for the RISC-V Linux proxy kernel and rearranging the stat struct so it aligns with RISC-V's implementation of it. It also adds placeholders for system calls that have been given numbers in RISC-V but haven't been given implementations yet. These system calls are as follows: - readlinkat - sigprocmask - ioctl - clock_gettime - getrusage - getrlimit - setrlimit The first five patches implemented RISC-V with the base ISA and multiply, floating point, and atomic extensions and added support for detailed CPU models with memory timing. [Fixed incompatibility with changes made from patch 1.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	126c0360e2	riscv: [Patch 5/5] Added missing support for timing CPU models Last of five patches adding RISC-V to GEM5. This patch adds support for timing, minor, and detailed CPU models that was missing in the last four, which basically consists of handling timing-mode memory accesses and telling the minor and detailed models what a no-op instruction should be (addi zero, zero, 0). Patches 1-4 introduced RISC-V and implemented the base instruction set, RV64I, and added the multiply, floating point, and atomic memory extensions, RV64MAFD. [Fixed compatibility with edit from patch 1.] [Fixed compatibility with hg copy edit from patch 1.] [Fixed some style errors in locked_mem.hh.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	535e6c5fa4	riscv: [Patch 4/5] Added RISC-V atomic memory extension RV64A Fourth of five patches adding RISC-V to GEM5. This patch adds the RV64A extension, which includes atomic memory instructions. These instructions atomically read a value from memory, modify it with a value contained in a source register, and store the original memory value in the destination register and modified value back into memory. Because this requires two memory accesses and GEM5 does not support two timing memory accesses in a single instruction, each of these instructions is split into two micro- ops: A "load" micro-op, which reads the memory, and a "store" micro-op, which modifies and writes it back. Each atomic memory instruction also has two bits that acquire and release a lock on its memory location. Additionally, there are atomic load and store instructions that only either load or store, but not both, and can acquire or release memory locks. Note that because the current implementation of RISC-V only supports one core and one thread, it doesn't make sense to make use of AMO instructions. However, they do form a standard extension of the RISC-V ISA, so they are included mostly as a placeholder for when multithreaded execution is implemented. As a result, any tests for their correctness in a future patch may be abbreviated. Patch 1 introduced RISC-V and implemented the base instruction set, RV64I; patch 2 implemented the integer multiply extension, RV64M; and patch 3 implemented the single- and double-precision floating point extensions, RV64FD. Patch 5 will add support for timing, minor, and detailed CPU models that isn't present in patches 1-4. [Added missing file amo.isa] [Replaced information removed from initial patch that was missed during division into multiple patches.] [Fixed some minor formatting issues.] [Fixed oversight where LR and SC didn't have both AQ and RL flags.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	1229b3b623	riscv: [Patch 3/5] Added RISCV floating point extensions RV64FD Third of five patches adding RISC-V to GEM5. This patch adds the RV64FD extensions, which include single- and double-precision floating point instructions. Patch 1 introduced RISC-V and implemented the base instruction set, RV64I and patch 2 implemented the integer multiply extension, RV64M. Patch 4 will implement the atomic memory instructions, RV64A, and patch 5 will add support for timing, minor, and detailed CPU models that is missing from the first four patches. [Fixed exception handling in floating-point instructions to conform better to IEEE-754 2008 standard and behavior of the Chisel-generated RISC-V simulator.] [Fixed style errors in decoder.isa.] [Fixed some fuzz caused by modifying a previous patch.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	070da98493	riscv: [Patch 2/5] Added RISC-V multiply extension RV64M Second of five patches adding RISC-V to GEM5. This patch adds the RV64M extension, which includes integer multiply and divide instructions. Patch 1 introduced RISC-V and implemented the base instruction set, RV64I. Patch 3 will implement the floating point extensions, RV64FD; patch 4 will implement the atomic memory instructions, RV64A; and patch 5 will add support for timing, minor, and detailed CPU models that is missing from the first four patches. [Added mulw instruction that was missed when dividing changes among patches.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Alec Roelke	e76bfc8764	arch: [Patch 1/5] Added RISC-V base instruction set RV64I First of five patches adding RISC-V to GEM5. This patch introduces the base 64-bit ISA (RV64I) in src/arch/riscv for use with syscall emulation. The multiply, floating point, and atomic memory instructions will be added in additional patches, as well as support for more detailed CPU models. The loader is also modified to be able to parse RISC-V ELF files, and a "Hello world\!" example for RISC-V is added to test-progs. Patch 2 will implement the multiply extension, RV64M; patch 3 will implement the floating point (single- and double-precision) extensions, RV64FD; patch 4 will implement the atomic memory instructions, RV64A, and patch 5 will add support for timing, minor, and detailed CPU models that is missing from the first four patches (such as handling locked memory). [Removed several unused parameters and imports from RiscvInterrupts.py, RiscvISA.py, and RiscvSystem.py.] [Fixed copyright information in RISC-V files copied from elsewhere that had ARM licenses attached.] [Reorganized instruction definitions in decoder.isa so that they are sorted by opcode in preparation for the addition of ISA extensions M, A, F, D.] [Fixed formatting of several files, removed some variables and instructions that were missed when moving them to other patches, fixed RISC-V Foundation copyright attribution, and fixed history of files copied from other architectures using hg copy.] [Fixed indentation of switch cases in isa.cc.] [Reorganized syscall descriptions in linux/process.cc to remove large number of repeated unimplemented system calls and added implmementations to functions that have received them since it process.cc was first created.] [Fixed spacing for some copyright attributions.] [Replaced the rest of the file copies using hg copy.] [Fixed style check errors and corrected unaligned memory accesses.] [Fix some minor formatting mistakes.] Signed-off by: Alec Roelke Signed-off by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:28 -05:00
Sophiane Senni	ce2722cdd9	mem: Split the hit_latency into tag_latency and data_latency If the cache access mode is parallel, i.e. "sequential_access" parameter is set to "False", tags and data are accessed in parallel. Therefore, the hit_latency is the maximum latency between tag_latency and data_latency. On the other hand, if the cache access mode is sequential, i.e. "sequential_access" parameter is set to "True", tags and data are accessed sequentially. Therefore, the hit_latency is the sum of tag_latency plus data_latency. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2016-11-30 17:10:27 -05:00
Jason Lowe-Power	047caf24ba	cpu: Remove branch predictor function predictInOrder This function was used by the now-defunct InOrderCPU model. Since this model is no longer in gem5, this function was not called from anywhere in the code.	2016-11-30 17:10:27 -05:00
Michael LeBeane	cd4b26b6ae	dev: Fix buffer length when unserializing an eth pkt Changeset 11701 only serialized the useful portion of of an ethernet packets' payload. However, the device models expect each ethernet packet to contain a 16KB buffer, even if there is no data in it. This patch adds a 'bufLength' field to EthPacketData so the original size of the packet buffer can always be unserialized. Reported-by: Gabor Dozsa <Gabor.Dozsa@arm.com>	2016-11-29 13:04:45 -05:00
Joe Gross	4b7bc5b1e1	scons: fix sanitizer flags with multiple sanitizers There has been some problem when using address and undefined-behavior sanitizers at the same time. This patch will look for the special case where both are enabled at once and change the flags passed to the compiler to reflect this.	2016-11-28 12:44:54 -05:00
Jieming Yin	b0856ab3b1	ruby: Fix potential bugs in garnet2.0 1. Delete unused variable from struct LinkEntry 2. Correct GarnetExtLink and GarnetIntLink inheritance	2016-11-21 15:41:30 -05:00
Tony Gutierrez	14deacf86e	gpu-compute: fix segfault when constructing GPUExecContext the GPUExecContext context currently stores a reference to its parent WF's GPUISA object, however there are some special instructions that do not have an associated WF. when these objects are constructed they set their WF pointer to null, which causes the GPUExecContext to segfault when trying to dereference the WF pointer to get at the WF's GPUISA object. here we change the GPUISA reference in the GPUExecContext class to a pointer so that it may be set to null.	2016-11-21 15:40:03 -05:00
Tony Gutierrez	a0d4019abd	gpu-compute: init valid field of GpuTlbEntry in default ctor valid field for GpuTlbEntry is not set in the default ctor, which can lead to strange behavior, and is also flagged by UBSAN.	2016-11-21 15:38:30 -05:00
Tony Gutierrez	f82418acef	ruby: add default ctor for MachineID type not all uses of MachineID initialize its fields, so here we add a default ctor.	2016-11-21 15:37:07 -05:00
Tony Gutierrez	0799600686	x86: fix issue with casting in Cvtf2i UBSAN flags this operation because it detects that arg is being cast directly to an unsigned type, argBits. this patch fixes this by first casting the value to a signed int type, then reintrepreting the raw bits of the signed int into argBits.	2016-11-21 15:35:56 -05:00
Sooraj Puthoor	29d38e7576	ruby: init MessageSizeType of SequencerMsg to Request_Control SequencerMsg is autogenerated by slicc scripts and the MessageSizeType is initialized to the max enume value by default. The DMASequencer pushes this message to the mandatory queue and since the MessageSizeType is unitialized, string_to_MessageSizeType() function used by traces to print the message fails with a panic. This patch avoids this problem by initializing MessageSizeType of SequencerMsg to Request_Control.	2016-11-19 12:39:04 -05:00
Tony Gutierrez	ae55cba281	x86: fix loading/storing of Float80 types	2016-11-19 12:35:14 -05:00
Andreas Hansson	6ed567d600	alpha: Remove ALPHA tru64 support and associated tests No one appears to be using it, and it is causing build issues and increases the development and maintenance effort.	2016-11-17 04:54:14 -05:00
Tony Gutierrez	74249f80df	hsail,gpu-compute: fixes to appease clang++ fixes to appease clang++. tested on: Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final) (based on LLVM 3.5.0) Ubuntu clang version 3.6.0-2ubuntu1~trusty1 (tags/RELEASE_360/final) (based on LLVM 3.6.0) the fixes address the following five issues: 1) the exec continuations in gpu_static_inst.hh were marked as protected when they should be public. here we mark them as public 2) the Abs instruction uses std::abs() in its execute method. because Abs is templated, it can also operate on U32 and U64, types, which cause Abs::execute() to pass uint32_t and uint64_t types to std::abs() respectively. this triggers a warning because std::abs() has no effect in this case. to rememdy this we add template specialization for the execute() method of Abs when its template paramter is U32 or U64. 3) Some potocols that utilize the code in cprintf.hh were missing includes to BoolVec.hh, which defines operator<< for the BoolVec type. This would cause issues when the generated code would try to pass a BoolVec type to a method in cprintf.hh that used operator<< on an instance of a BoolVec. 4) Surprise, clang doesn't like it when you clobber all the bits in a newly allocated object. I.e., this code: tlb = new GpuTlbEntry\[size\]; std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size); Let's use std::vector to track the TLB entries in the GpuTlb now... 5) There were a few variables used only in DPRINTFs, so we mark them with M5_VAR_USED.	2016-10-26 22:48:45 -04:00
Michael LeBeane	dc16c1ceb8	dev: Add m5 op to toggle synchronization for dist-gem5. This patch adds the ability for an application to request dist-gem5 to begin/ end synchronization using an m5 op. When toggling on sync, all nodes agree on the next sync point based on the maximum of all nodes' ticks. CPUs are suspended until the sync point to avoid sending network messages until sync has been enabled. Toggling off sync acts like a global execution barrier, where all CPUs are disabled until every node reaches the toggle off point. This avoids tricky situations such as one node hitting a toggle off followed by a toggle on before the other nodes hit the first toggle off.	2016-10-26 22:48:40 -04:00
Michael LeBeane	48e43c9ad1	ruby: Allow multiple outstanding DMA requests DMA sequencers and protocols can currently only issue one DMA access at a time. This patch implements the necessary functionality to support multiple outstanding DMA requests in Ruby.	2016-10-26 22:48:37 -04:00
mlebeane	96905971f2	dev: Add 'simLength' parameter in EthPacketData Currently, all the network devices create a 16K buffer for the 'data' field in EthPacketData, and use 'length' to keep track of the size of the packet in the buffer. This patch introduces the 'simLength' parameter to EthPacketData, which is used to hold the effective length of the packet used for all timing calulations in the simulator. Serialization is performed using only the useful data in the packet ('length') and not necessarily the entire original buffer.	2016-10-26 22:48:33 -04:00
Tony Gutierrez	de72e36619	gpu-compute: support in-order data delivery in GM pipe this patch adds an ordered response buffer to the GM pipeline to ensure in-order data delivery. the buffer is implemented as a stl ordered map, which sorts the request in program order by using their sequence ID. when requests return to the GM pipeline they are marked as done. only the oldest request may be serviced from the ordered buffer, and only if is marked as done. the FIFO response buffers are kept and used in OoO delivery mode	2016-10-26 22:48:28 -04:00
Tony Gutierrez	b63eb1302b	gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex() for HSAIL an operand's indices into the register files may be calculated trivially, because the operands are always read from a register file, or are an immediate. for machine ISA, however, an op selector may specify special registers, or may specify special SGPRs with an alias op selector value. the location of some of the special registers values are dependent on the size of the RF in some cases. here we add a way for the underlying getRegisterIndex() method to know about the size of the RFs, so that it may find the relative positions of the special register values.	2016-10-26 22:47:49 -04:00
Tony Gutierrez	aa7364276f	gpu-compute: use System cache line size in the GPU	2016-10-26 22:47:47 -04:00
Tony Gutierrez	844fb845a5	gpu-compute, hsail: make the PC a byte address, not an instruction index currently the PC is incremented on an instruction granularity, and not as an instruction's byte address. machine ISA instructions assume the PC is a byte address, and is incremented accordingly. here we make the GPU model, and the HSAIL instructions treat the PC as a byte address as well.	2016-10-26 22:47:43 -04:00
Tony Gutierrez	d327cdba07	gpu-compute: add gpu_isa.hh to switch hdrs, add GPUISA to WF the GPUISA class is meant to encapsulate any ISA-specific behavior - special register accesses, isa-specific WF/kernel state, etc. - in a generic enough way so that it may be used in ISA-agnostic code. gpu-compute: use the GPUISA object to advance the PC the GPU model treats the PC as a pointer to individual instruction objects - which are store in a contiguous array - and not a byte address to be fetched from the real memory system. this is ok for HSAIL because all instructions are considered by the model to be the same size. in machine ISA, however, instructions may be 32b or 64b, and branches are calculated by advancing the PC by the number of words (4 byte chunks) it needs to advance in the real instruction stream. because of this there is a mismatch between the PC we use to index into the instruction array, and the actual byte address PC the ISA expects. here we move the PC advance calculation to the ISA so that differences in the instrucion sizes may be accounted for in generic way.	2016-10-26 22:47:38 -04:00
Tony Gutierrez	98d8a7051d	gpu-compute: add instruction mix stats for the gpu	2016-10-26 22:47:30 -04:00
Tony Gutierrez	c7a79c9a42	gpu-compute, hsail: call discardFetch() from the WF because every taken branch causes fetch to be discarded, we move the call to the WF to avoid to have to call it from each and every branch instruction type.	2016-10-26 22:47:27 -04:00
Tony Gutierrez	00a6346c91	hsail, gpu-compute: remove doGm/SmReturn add completeAcc we are removing doGmReturn from the GM pipe, and adding completeAcc() implementations for the HSAIL mem ops. the behavior in doGmReturn is dependent on HSAIL and HSAIL mem ops, however the completion phase of memory ops in machine ISA can be very different, even amongst individual machine ISA mem ops. so we remove this functionality from the pipeline and allow it to be implemented by the individual instructions.	2016-10-26 22:47:19 -04:00
Tony Gutierrez	7ac38849ab	gpu-compute: remove inst enums and use bit flag for attributes this patch removes the GPUStaticInst enums that were defined in GPU.py. instead, a simple set of attribute flags that can be set in the base instruction class are used. this will help unify the attributes of HSAIL and machine ISA instructions within the model itself. because the static instrution now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst so a new static kernel launch instruction is added, which carries the attributes needed to perform a the kernel launch.	2016-10-26 22:47:11 -04:00
Tony Gutierrez	e1ad8035a3	gpu-compute: move disassemle() implementation to GPUStaticInst	2016-10-26 22:47:05 -04:00
Tony Gutierrez	0a6cdff176	gpu-compute, arch: add some methods to the base inst classes for ISA support	2016-10-26 22:47:01 -04:00
Tony Gutierrez	c7d4afd878	ruby: make a RequestDesc class instead of std::pair the RequestDesc was previously implemented as a std::pair, which made the implementation overly complex and error prone. here we encapsulate the packet, primary, and secondary types all in a single data structure with all members properly intialized in a ctor	2016-10-26 22:46:58 -04:00
Bjoern A. Zeeb	28c84d2886	arm, dev: pl011 console interactivity Improve PL011 console interactivity Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2016-10-15 15:11:04 -05:00
Nicolas Derumigny	976ef444b8	syscall: read() should not write anything if reading EOF. Read() should not write anything when returning 0 (EOF). This patch does not correct the same bug occuring for : nbr_read=read(file, buf, nbytes) When nbr_read<nbytes, nbytes bytes are copied into the virtual RAM instead of nbr_read. If buf is smaller than nbytes, a page fault occurs, even if buf is in fact bigger than nbr_read. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2016-10-15 15:06:24 -05:00
Fernando Endo	6c72c35519	cpu, arm: Distinguish Float* and SimdFloat, create FloatMem opClass Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which distinguishes writes to the INT and FP register banks. Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72, where the "latency" of FMADD is 3 if the next instruction is a FMADD and has only the augend to destination dependency, otherwise it's 7 cycles. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2016-10-15 14:58:45 -05:00
Jason Lowe-Power	824c87634d	stats: Add more information to uninitialized error ClockedObject was changed to require its regStats() to be called from every child class. If you forget to do this, the error was indecipherable. This patch makes the error more clear.	2016-10-14 09:02:03 -05:00
Omar Naji	78dd152a0d	mem: add DRAM powerdown current Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:11 +01:00
Wendy Elsasser	1dc16aff24	mem: Add DRAM low-power functionality Added power-down state transitions to the DRAM controller model. Added per rank parameter, outstandingEvents, which tracks the number of outstanding command events and is used to determine when the controller should transition to a low power state. The controller will only transition when there are no outstanding events scheduled and the number of command entries for the given rank is 0. The outstandingEvents parameter is incremented for every RD/WR burst, PRE, and REF event scheduled. ACT is implicitly covered by RD/WR since burst will always issue and complete after a required ACT. The parameter is decremented when the event is serviced (completed). The controller will automatically transition to ACT power down, PRE power down, or SREF. Transition to ACT power down state scheduled from: 1) The RespondEvent, where read data is received from the memory. ACT power-down entry will be scheduled when one or more banks is open, all commands for the rank have completed (no more commands scheduled), and there are no commands in queue for the rank Transition to PRE power down scheduled from: 1) respondEvent, when all banks are closed, all commands have completed, and there are no commands in queue for the rank 2) prechargeEvent when all banks are closed, all commands have completed, and there are no commands in queue for the rank 3) refreshEvent, after the refresh is complete when the previous state was ACT power-down 4) refreshEvent, after the refresh is complete when the previous state was PRE power-down and there are commands in the queue. Transition to SREF will be scheduled from: 1) refreshEvent, after the refresh is completes when the previous state was PRE power-down with no commands in queue Power-down exit commands are scheduled from: 1) The refreshEvent, prior to issuing a refresh 2) doDRAMAccess, to wake-up the rank for RD/WR command issue. Self-refresh exit commands are scheduled from: 1) The next request event, when the queue has commands for the rank in the readQueue or there are commands for the rank in the writeQueue and the bus state is WRITE. Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:11 +01:00
Wendy Elsasser	7b269f2c95	mem: Add callback to compute stats prior to dump event The per rank statistics are periodically updated based on state transition and refresh events. Add a method to update these when a dump event occurs to ensure they reflect accurate values. Specifically, need to ensure that the low-power state durations, power, and energy are logged correctly. Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:11 +01:00
Wendy Elsasser	0dd0d4ee7a	mem: Modify drain to ensure banks and power are idled Add constraint that all ranks have to be in PWR_IDLE before signaling drain complete This will ensure that the banks are all closed and the rank has exited any low-power states. On suspend, update the power stats to sync the DRAM power logic The logic maintains the location of the signalDrainDone method, which is still triggered from either: 1) Read response event 2) Next request event This ensures that the drain will complete in the READ bus state and minimizes the changes required. Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:11 +01:00
Wendy Elsasser	27665af26d	mem: Sort memory commands and update DRAMPower Add local variable to stores commands to be issued. These commands are in order within a single bank but will be out of order across banks & ranks. A new procedure, flushCmdList, sorts commands across banks / ranks, and flushes the sorted list, up to curTick() to DRAMPower. This is currently called in refresh, once all previous commands are guaranteed to have completed. Could be called in other events like the powerEvent as well. By only flushing commands up to curTick(), will not get out of sync when flushed at a periodic stats dump (done in subsequent patch). Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:10 +01:00
Omar Naji	61b2b493d4	mem: update DDR3 die revision Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>	2016-10-13 19:22:10 +01:00
Omar Naji	d19dc35b06	mem: add DRAM powerdown timing	2016-10-13 19:22:10 +01:00
Omar Naji	20e6bb0140	mem: make DDR4 x16	2016-10-13 19:22:10 +01:00
Mitch Hayenga	bd0c2d5b0b	isa,arm: Add missing AArch32 FP instructions This commit adds missing non-predicated, scalar floating point instructions. Specifically VRINT* floating point integer rounding instructions and VSEL* floating point conditional selects. Change-Id: I23cbd1389f151389ac8beb28a7d18d5f93d000e7 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>	2016-10-13 19:22:10 +01:00
Andreas Sandberg	8c5df4be2e	dev, arm: Make GenericTimer param handling more robust The generic timer needs a pointer to an ArmSystem to wire itself to the system register handler. This was previously specified as an instance of System that was later cast to ArmSystem. Make this more robust by specifying it as an ArmSystem in the Python interface and add a check to make sure that it is non-NULL. Change-Id: I989455e666f4ea324df28124edbbadfd094b0d02 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-10-07 14:14:44 +01:00
Tushar Krishna	22e6f65d72	ruby: Add M5_VAR_USED before variables used only inside assert in garnet2.0. This removes errors when building gem5.fast	2016-10-06 21:06:00 -04:00
Tushar Krishna	dbe8892b76	ruby: garnet2.0 Revamped version of garnet with more optimized single-cycle routers, more configurability, and cleaner code.	2016-10-06 14:35:22 -04:00
Tushar Krishna	b512f4bf71	ruby: remove the original garnet code. Only garnet2.0 will be supported henceforth.	2016-10-06 14:35:21 -04:00
Tushar Krishna	0962d76827	config: add port directions and per-router delay in topology. This patch adds port direction names to the links during topology creation, which can be used for better printed names for the links or for users to code up their own adaptive routing algorithms. It also adds support for every router to have an independent latency value to support heterogeneous topologies with the subsequent garnet2.0 patch.	2016-10-06 14:35:20 -04:00
Tushar Krishna	003c08fa90	config: make internal links in network topology unidirectional. This patch makes the internal links within the network topology unidirectional, thus allowing any deadlock-free routing algorithms to be specified from the topology itself using weights. This patch also renames Mesh.py and MeshDirCorners.py to Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing). It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.	2016-10-06 14:35:18 -04:00
Tushar Krishna	0f68b50ff1	ruby: rename networktest to garnet_synthetic_traffic. networktest is essentially a collection of synthetic traffic patterns for the network. The protocol name and the tester having the same name led to multiple python configuration files with the same name, adding confusion. This patch renames networktest to garnet_synthetic_traffic, and also adds more synthetic traffic patterns.	2016-10-06 14:35:16 -04:00
Tushar Krishna	aca869bf2d	ruby: rename ALPHA_Network_test protocol to Garnet_standalone. Over the past 6 years, we realized that the protocol is essentially used to run the garnet network in a standalone manner, and feed standard synthetic traffic patterns through it.	2016-10-06 14:35:14 -04:00
Alexandru Dutu	3f0118876f	kvm: Adding details to kvm page fault in x86 Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.	2016-10-04 13:06:05 -04:00
Alexandru Dutu	526b1b7ec8	misc: Adds a warning in case gdb is attached multiple times Instead of scheduling another event, this patch adds a warning in case gdb is attached multiple times and the first attachement event has not been processed yet.	2016-10-04 13:04:19 -04:00
Alexandru Dutu	c8cf71f1a0	gpu-compute: Added method to compute the actual workgroup size This patch adds a method to the Wavefront class to compute the actual workgroup size. This can be different from the maximum workgroup size specified when launching the kernel through the NDRange object. Current solution is still not optimal, as we are computing these for each wavefront and the dispatcher also needs to have this information and can't actually call Wavefront::computeActuallWgSz before the wavefronts are being created. A long term solution would be to have a Workgroup class that deals with all these details.	2016-10-04 13:03:52 -04:00
Andreas Sandberg	18135ce6ab	sim: Add a checkpoint function to test for entries When loading a checkpoint, it's sometimes desirable to be able to test whether an entry within a secion exists. This is currently done automatically in the UNSERIALIZE_OPT_SCALAR macro, but it isn't possible to do for arrays, containers, or enums. Instead of adding even more macros, add a helper function (CheckpointIn::entryExists()) that tests for the presence of an entry. Change-Id: I4b4646b03276b889fd3916efefff3bd552317dbc Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-10-04 11:22:16 +01:00
Brad Beckmann	ee78758857	ruby: correct size for partial memory writes Fixed AbstractController::queueMemoryWritePartial to specify the correct size for partial memory writes.	2016-09-29 01:06:52 -04:00
Brad Beckmann	f0971354c4	mem: minor dprintf fix to abstract mem print number of bytes written as a decimal number, not hex	2016-09-29 01:06:33 -04:00
Curtis Dunham	109cc2caa6	arm: disable GIC extensions Change-Id: If19b9c593b48ded1ea848f2d3710d4369ec8a221 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2016-09-22 14:46:37 +01:00
Rekai Gonzalez-Alberquilla	ad296b068c	cpu: Fix the O3 CPU Drain The drain did not wait until stages were ready again. Therefore, as a result of messages in the TimeBuffer being drain, the state after the drain was not consistent and asserts fired in some places when the draining happened after a stage got blocked, but before the notification arrived to the previous stages. Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2016-09-22 10:49:10 +01:00
Tony Gutierrez	84f9747688	gpu-compute: fix typo in GPUDispatcher	2016-09-16 14:47:19 -04:00
Alexandru Dutu	68127ca3da	hsail: Fix disassembly of load instruction with 3 destination operands	2016-09-16 12:36:20 -04:00
Alexandru Dutu	bd65ec0744	gpu-compute: Adding context serialization methods to Wavefront This patch adds methods to serialize the context of a particular wavefront to the simulated system memory. Context serialization is used when a wavefront is preempeted (i.e. context switch).	2016-09-16 12:32:36 -04:00
Alexandru Dutu	e9b14d5111	gpu-compute: Refactoring Wavefront::dynWaveId	2016-09-16 12:31:46 -04:00
Alexandru Dutu	498d0e63e5	gpu-compute: Adding vector register file debug messages This patch introduces DPRINTFs for reading and writing to and from the vector register file.	2016-09-16 12:30:05 -04:00
Alexandru Dutu	7918376450	gpu-compute: Changing reconvergenceStack type std::stack has no iterators, therefore the reconvergence stack can't be iterated without poping elements off. We will be using std::list instead to be able to iterate for saving and restoring purposes.	2016-09-16 12:29:01 -04:00
Alexandru Dutu	d5c8c5d3db	gpu-compute: Adding ioctl for HW context size Adding runtime support for determining the memory required by a SIMD engine when executing a particular wavefront.	2016-09-16 12:27:56 -04:00
Alexandru Dutu	589e13a23b	gpu-compute: Wavefront refactoring Renaming members of the Wavefront class in accordance with the style guide.	2016-09-16 12:26:52 -04:00
Alexandru Dutu	e9fe1b838b	gpu-compute: Remove WFContext WFContext struct is currently unused and it has been rendered not useful in saving and restoring the context of a Wavefront. Wavefront class should be sufficient for that purpose and the runtime can figure out the memory size it will need to allocate for a Wavefront through an IOCTL.	2016-09-16 12:26:03 -04:00
Curtis Dunham	18461d1522	base: eliminate ipython warning Change-Id: I3e282baeb969b6bb9534813a2f433d68246c0669 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2016-09-15 18:21:38 +01:00
Ricardo Alves	e5c1488cb6	arm: Add m5_fail support for aarch64 Change-Id: Id2acbc09772be310a0eb9e33295afab07e08a4fa Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2016-09-15 18:21:24 +01:00
Radhika Jagtap	1fe5f63137	cpu: Support exit when any one Trace CPU completes replay This change adds a Trace CPU param to exit simulation early, i.e. when the first (any one) trace execution is complete. With this change the user gets a choice to configure exit as either when the last CPU finishes (default) or first CPU finishes replay. Configuring an early exit enables simulating and measuring stats strictly when memory-system resources are being stressed by all Trace CPUs. Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-09-15 18:01:20 +01:00
Radhika Jagtap	d067327fc0	cpu: Adjust for trace offset and fix stats This change subtracts the time offset present in the trace from all the event times when nodes and request are sent so that the replay starts immediately when the simulation starts. This makes the stats accurate when the time offset in traces is large, for example when traces are generated in the middle of a workload execution. It also solves the problem of unnecessary DRAM refresh events that would keep occuring during the large time offset before even a single request is replayed into the system. Change-Id: Ie0898842615def867ffd5c219948386d952af7f7 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-09-15 18:01:16 +01:00
Radhika Jagtap	d7724d5f54	cpu: Add frequency scaling to the Trace CPU This change adds a simple feature to scale the frequency of the Trace CPU. The compute delays in the input traces provide timing. This change adds a freqency multiplier parameter to the Trace CPU set to 1.0 by default. The compute delay is manipulated to effectively achieve the frequency at which the nodes become ready and thus scale the frequency of the Trace CPU. Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>	2016-09-15 18:01:09 +01:00

1 2 3 4 5 ...

7446 commits