sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Andreas Hansson	12eb034378	scons: Enable -Wextra by default Make best use of the compiler, and enable -Wextra as well as -Wall. There are a few issues that had to be resolved, but they are all trivial.	2016-01-11 05:52:20 -05:00
Mitch Hayenga	fafa83ed32	cpu: Add per-thread monitors Adds per-thread address monitors to support FullSystem SMT.	2015-09-30 11:14:19 -05:00
Hongil Yoon	fb0f9884e2	cpu, o3: consider split requests for LSQ checksnoop operations This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions. The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2015-09-15 08:14:06 -05:00
Andreas Sandberg	53e777d683	base: Declare a type for context IDs Context IDs used to be declared as ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs.	2015-08-07 09:59:13 +01:00
Nilay Vaish	aafa5c3f86	revert 5af8f40d8f2c	2015-07-28 01:58:04 -05:00
Nilay Vaish	608641e23c	cpu: implements vector registers This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.	2015-07-26 10:21:20 -05:00
Andreas Hansson	bd583d00f9	misc: Appease gcc 5.1 Three minor issues are resolved: 1. Apparently gcc 5.1 does not like negation of booleans followed by bitwise AND. 2. Somehow the compiler also gets confused and warns about NoopMachInst being unused (removing it causes compilation errors though). Most likely a compiler bug. 3. There seems to be a number of instances where loop unrolling causes false positives for the array-bounds check. For now, switch to std::array. Potentially we could disable the warning for newer gcc versions, but switching to std::array is probably a good move in any case.	2015-05-15 13:39:53 -04:00
Andreas Sandberg	48281375ee	mem, cpu: Add a separate flag for strictly ordered memory The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.	2015-05-05 03:22:33 -04:00
Rekai	3d5434022a	cpu: o3 register renaming request handling improved Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not.	2015-03-02 04:00:38 -05:00
Andreas Sandberg	550c318490	sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh	2015-02-11 10:23:27 -05:00
Ali Saidi	9d8ddd92dc	sim: Clean up InstRecord Track memory size and flags as well as add some comments and consts.	2015-01-25 07:22:44 -05:00
Marc Orr	bf80734b2c	x86 isa: This patch attempts an implementation at mwait. Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up. Committed by: Nilay Vaish <nilay@cs.wisc.edu>	2014-11-06 05:42:22 -06:00
Andreas Hansson	a2d246b6b8	arch: Use shared_ptr for all Faults This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".	2014-10-16 05:49:51 -04:00
Andreas Hansson	341dbf2662	arch: Use const StaticInstPtr references where possible This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer.	2014-09-27 09:08:36 -04:00
Andreas Sandberg	326662b01b	arch, cpu: Factor out the ExecContext into a proper base class We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models. The main argument for using one set of ISA code per CPU model has always been performance as this avoid indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%.	2014-09-03 07:42:22 -04:00
Akash Bagdia	2b1a01ee6c	cpu, arm: Allow the specification of a socket field Allow the specification of a socket ID for every core that is reflected in the MPIDR field in ARM systems. This allows studying multi-socket / cluster systems with ARM CPUs.	2014-05-09 18:58:46 -04:00
Andreas Hansson	62fe81e9c1	cpu: Make CPU and ThreadContext getters const This patch merely tidies up the CPU and ThreadContext getters by making them const where appropriate.	2014-03-07 15:56:23 -05:00
Ali Saidi	6bed6e0352	cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped. This patch add support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.	2014-01-24 15:29:30 -06:00
Dam Sunwoo	85e8779de7	mem: per-thread cache occupancy and per-block ages This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.	2014-01-24 15:29:30 -06:00
Ali Saidi	cf266f05a9	cpu: Fix O3 uncacheable load that is replayed but misses the TLB This change fixes an issue in the O3 CPU where an uncachable instruction is attempted to be executed before it reaches the head of the ROB. It is determined to be uncacheable, and is replayed, but a PanicFault is attached to the instruction to make sure that it is properly executed before committing. If the TLB entry it was using is replaced in the interveaning time, the TLB returns a delayed translation when the load is replayed at the head of the ROB, however the LSQ code can't differntiate between the old fault and the new one. If the translation isn't complete it can't be faulting, so clear the fault.	2013-10-17 10:20:45 -05:00
Yasuko Eckert	2c293823aa	cpu: add a condition-code register class Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though.	2013-10-15 14:22:44 -04:00
Andreas Hansson	d4273cc9a6	mem: Set the cache line size on a system level This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.	2013-07-18 08:31:16 -04:00
Ali Saidi	6df196b71e	O3: Clean up the O3 structures and try to pack them a bit better. DynInst is extremely large the hope is that this re-organization will put the most used members close to each other.	2012-06-05 01:23:09 -04:00
Ali Saidi	1b370431d0	sim: Remove FastAlloc While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe. After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc when running twolf for ARM.	2012-06-05 01:23:08 -04:00
Andreas Hansson	72538294fb	gcc: Clean-up of non-C++0x compliant code, first steps This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeded by a comma, namespaces closed by a curcly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts).	2012-03-19 06:36:09 -04:00
Geoffrey Blake	043709fdfa	CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.	2012-03-09 09:59:27 -05:00
Andreas Hansson	9f07d2ce7e	CPU: Round-two unifying instr/data CPU ports across models This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps in simplifying the code, make assumptions more explicit, andfurther ease future patches related to the CPU ports. The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports.	2012-02-24 11:42:00 -05:00
Ali Saidi	8aaa39e93d	mem: Add a master ID to each request object. This change adds a master id to each request object which can be used identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python.	2012-02-12 16:07:38 -06:00
Gabe Black	f2b46fdb85	Faults: Turn off arch/faults.hh Because there are no longer architecture independent but specialized functions in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA no longer needs to be able to include them through the switching header file arch/faults.hh. By removing that header file (arch/faults.hh), the potential interface between ISA code and non ISA code is narrowed.	2012-02-07 04:43:21 -08:00
Gabe Black	ea8b347dc5	Merge with head, hopefully the last time for this batch.	2012-01-31 22:40:08 -08:00
Geoffrey Blake	af6aaf2581	CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5 Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.	2012-01-31 07:46:03 -08:00
Gabe Black	85424bef19	SE/FS: Get rid of includes of config/full_system.hh.	2011-11-18 02:20:22 -08:00
Ali Saidi	649c239cee	LSQ: Only trigger a memory violation with a load/load if the value changes. Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently beacuse a translation will miss and start a table walk and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block depending on the dendency checking). This patch has been tested with a couple of self-checking hand crafted programs to stress ordering between two cores. The performance improvement on SPEC benchmarks can be substantial (2-10%).	2011-09-13 12:58:08 -04:00
Gabe Black	49a7ed0397	StaticInst: Merge StaticInst and StaticInstBase. Having two StaticInst classes, one nominally ISA dependent and the other ISA dependent, has not been historically useful and makes the StaticInst class more complicated that it needs to be. This change merges StaticInstBase into StaticInst.	2011-09-09 02:40:11 -07:00
Gabe Black	ec204f003c	O3: Add a pointer to the macroop for a microop in the dyninst.	2011-08-14 04:08:14 -07:00
Gabe Black	16882b0483	Translation: Use a pointer type as the template argument. This allows regular pointers and reference counted pointers without having to use any shim structures or other tricks.	2011-08-07 09:21:48 -07:00
Gabe Black	6230668f5e	O3: Get rid of the raw ExtMachInst constructor on DynInsts. This constructor assumes that the ExtMachInst can be decoded directly into a StaticInst that's useful to execute. With the advent of microcoded instructions that's no longer true.	2011-08-02 11:51:16 -07:00
Gabe Black	3a1428365a	ExecContext: Rename the readBytes/writeBytes functions to readMem and writeMem. readBytes and writeBytes had the word "bytes" in their names because they accessed blobs of bytes. This distinguished them from the read and write functions which handled higher level data types. Because those functions don't exist any more, this change renames readBytes and writeBytes to more general names, readMem and writeMem, which reflect the fact that they are how you read and write memory. This also makes their names more consistent with the register reading/writing functions, although those are still read and set for some reason.	2011-07-02 22:35:04 -07:00
Gabe Black	2e7426664a	ExecContext: Get rid of the now unused read/write templated functions.	2011-07-02 22:34:58 -07:00
Ali Saidi	5962fecc1d	CPU: Remove references to memory copy operations	2011-04-04 11:42:26 -05:00
Ali Saidi	7dde557fdc	O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required.	2011-04-04 11:42:23 -05:00
Giacomo Gabrielli	e2507407b1	O3: Enhance data address translation by supporting hardware page table walkers. Some ISAs (like ARM) relies on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished. Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept into the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution.	2011-02-11 18:29:35 -06:00
Ali Saidi	e681c0f7b3	O3: Support squashing all state after special instruction For SPARC ASIs are added to the ExtMachInst. If the ASI is changed simply marking the instruction as Serializing isn't enough beacuse that only stops rename. This provides a mechanism to squash all the instructions and refetch them	2010-12-07 16:19:57 -08:00
Giacomo Gabrielli	719f9a6d4f	O3: Make all instructions that write a misc. register not perform the write until commit. ARM instructions updating cumulative flags (ARM FP exceptions and saturation flags) are not serialized. Added aliases for ARM FP exceptions and saturation flags in FPSCR. Removed write accesses to the FP condition codes for most ARM VFP instructions: only VCMP and VCMPE instructions update the FP condition codes. Removed a potential cause of seg. faults in the O3 model for NEON memory macro-ops (ARM).	2010-12-07 16:19:57 -08:00
Ali Saidi	cdacbe734a	ARM/Alpha/Cpu: Change prefetchs to be more like normal loads. This change modifies the way prefetches work. They are now like normal loads that don't writeback a register. Previously prefetches were supposed to call prefetch() on the exection context, so they executed with execute() methods instead of initiateAcc() completeAcc(). The prefetch() methods for all the CPUs are blank, meaning that they get executed, but don't actually do anything. On Alpha dead cache copy code was removed and prefetches are now normal ops. They count as executed operations, but still don't do anything and IsMemRef is not longer set on them. On ARM IsDataPrefetch or IsInstructionPreftech is now set on all prefetch instructions. The timing simple CPU doesn't try to do anything special for prefetches now and they execute with the normal memory code path.	2010-11-08 13:58:22 -06:00
Gabe Black	6f4bd2c1da	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.	2010-10-31 00:07:20 -07:00
Gabe Black	6833ca7eed	Faults: Pass the StaticInst involved, if any, to a Fault's invoke method. Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense.	2010-09-13 19:26:03 -07:00
Min Kyu Jeong	03286e9d4e	CPU: Make Exec trace to print predication result (if false) for memory instructions	2010-08-23 11:18:41 -05:00
Min Kyu Jeong	5f91ec3f46	ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate. THis allows the CPU to handle predicated-false instructions accordingly. This particular patch makes loads that are predicated-false to be sent straight to the commit stage directly, not waiting for return of the data that was never requested since it was predicated-false.	2010-08-23 11:18:40 -05:00
Ali Saidi	1d1837ee98	CPU: Set a default value when readBytes faults. This was being done in read(), but if readBytes was called directly it wouldn't happen. Also, instead of setting the memory blob being read to -1 which would (I believe) require using memset with -1 as a parameter, this now uses bzero. It's hoped that it's more specialized behavior will make it slightly faster.	2010-08-23 11:18:39 -05:00

1 2

97 commits