sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Nilay Vaish	343e94a257	Ruby: Improve Change PerfectSwitch's wakeup function Currently the wakeup function for the PerfectSwitch contains three loops: a loop over the virtual networks, a loop over the incoming links, and a loop that runs until all messages for this (link, network) have been routed. With an 8 processor mesh network and Hammer protocol, about 11-12% of the total execution time was observed to have been spent in this function, which is the highest amongst all the functions. It was found that the innermost loop is executed about 45 times per invocation of the wakeup function, when each invocation of the wakeup function processes just about one message. The patch tries to do away with the redundant executions of the innermost loop. Counters have been added for each virtual network that record the number of messages that need to be routed for that virtual network. The inner loops are only executed when the number of messages for that particular virtual network > 0. This does away with almost 80% of the executions of the innermost loop. The function now consumes about 5-6% of the total execution time.	2011-02-14 16:14:54 -06:00
Gabe Black	5ec5794456	X86: Update stats for the improved branch detection/prediction.	2011-02-13 17:46:04 -08:00
Gabe Black	77b4a37067	X86: Detect branches taking into account instruction size. The size of the current instruction determines what the npc should be if there's no branching.	2011-02-13 17:45:47 -08:00
Gabe Black	44306e8114	X86: Update stats now that the dest reg isn't read unnecessarily to set flags.	2011-02-13 17:45:30 -08:00
Gabe Black	bce2be525d	X86: Put the result used for flags in an intermediate variable. Using the destination register directly causes the ISA parser to treat it as a source even if none of the original bits are used.	2011-02-13 17:45:12 -08:00
Gabe Black	b046f3feb6	X86: Update stats for the reduced register reads.	2011-02-13 17:44:32 -08:00
Gabe Black	4e1adf85f7	X86: Don't read in dest regs if all bits are replaced. In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or 64 bits wide overwrite all bits of the destination register. This change removes false dependencies in these cases where the previous value of a register doesn't need to be read to write a new value. New versions of most microops are created that have a "Big" suffix which simply overwrite their destination, and the right version to use is selected during microop allocation based on the selected data size. This does not change the performance of the O3 CPU model significantly, I assume because there are other false dependencies from the condition code bits in the flags register.	2011-02-13 17:44:24 -08:00
Gabe Black	399e095510	X86: On a bad microopc, return a microop that returns a fault that panics. This way a bad micropc will have to get all the way to commit before killing the simulation. This accounts for misspeculated branches.	2011-02-13 17:42:56 -08:00
Gabe Black	1aa9698fa0	X86: Define fault objects to carry debug messages. These faults can panic/warn/warn_once, etc., instead of instructions doing that themselves directly. That way, instructions can be speculatively executed, and only if they're actually going to commit will their fault be invoked and the panic, etc., happen.	2011-02-13 17:42:05 -08:00
Gabe Black	5ee94f4a3d	X86: Only reset npc to reflect instruction length once. When redirecting fetch to handle branches, the npc of the current pc state needs to be left alone. This change makes the pc state record whether or not the npc already reflects a real value by making it keep track of the current instruction size, or if no size has been set.	2011-02-13 17:41:10 -08:00
Gabe Black	f036fd9748	O3: Fetch from the microcode ROM when needed.	2011-02-13 17:40:07 -08:00
Ali Saidi	7c763b34c9	O3: Fix GCC 4.2.4 complaint	2011-02-13 16:51:15 -05:00
Nilay Vaish	0cede15d6c	Ruby: Reorder Cache Lookup in Protocol Files The patch changes the order in which L1 dcache and icache are looked up when a request comes in. Earlier, if a request came in for instruction fetch, the dcache was looked up before the icache, to correctly handle self-modifying code. But, in the common case, dcache is going to report a miss and the subsequent icache lookup is going to report a hit. Given the invariant - caches under the same controller keep track of disjoint sets of cache blocks, we can move the icache lookup before the dcache lookup. In case of a hit in the icache, using our invariant, we know that the dcache would have reported a miss. In case of a miss in the icache, we know that icache would have missed even if the dcache was looked up before looking up the icache. Effectively, we are doing the same thing as before, though in the common case, we expect reduction in the number of lookups. This was empirically confirmed for MOESI hammer. The ratio of lookups to access requests is now about 1.1 to 1.	2011-02-12 11:41:20 -06:00
Korey Sewell	2971b8401a	inorder:regress: host-inst-rate improved ~58% there are still only a few inorder benchmarks but for the lengthier benchmarks (twolf and vortex) the latest changes to how instruction scheduling (how instructions figure out what they want to do on each pipeline stage in the inorder model) were able to improve performance by a nice amount... The latest results for the inorder model process about 100k insts/second (note: 58% is over the last time run on 64-bit pool machines at UM)	2011-02-12 10:14:52 -05:00
Korey Sewell	470aa289da	inorder: clean up the old way of inst. scheduling remove remnants of old way of instruction scheduling which dynamically allocated a new resource schedule for every instruction	2011-02-12 10:14:48 -05:00
Korey Sewell	e26aee514d	inorder: utilize cached skeds in pipeline allow the pipeline and resources to use the cached instruction schedule and resource sked iterator	2011-02-12 10:14:45 -05:00
Korey Sewell	516b611462	inorder: define iterator for resource schedules resource skeds are divided into two parts: front end (all insts) and back end (inst. specific) each of those are implemented as separate lists, so this iterator wraps around the traditional list iterator so that an instruction can walk its schedule but seamlessly transfer from front end to back end when necessary	2011-02-12 10:14:43 -05:00
Korey Sewell	ec9b2ec251	inorder: stage scheduler for front/back end schedule creation add a stage scheduler class to replace InstStage in pipeline_traits.cc use that class to define a default front-end, resource schedule that all instructions will follow. This will also replace the back end schedule in pipeline_traits.cc. The reason for adding this is so that we can cache instruction schedules in the future instead of calling the same function over/over again as well as constantly dynamically allocating memory on every instruction to try to figure out its schedule	2011-02-12 10:14:40 -05:00
Korey Sewell	6713dbfe08	inorder: cache instruction schedules first step in an optimization to not dynamically allocate an instruction schedule for every instruction but rather use cached schedules	2011-02-12 10:14:36 -05:00
Korey Sewell	af67631790	inorder: comments for resource sked class	2011-02-12 10:14:34 -05:00
Korey Sewell	800e93f358	inorder: remove unused file inst_buffer file isn't used , so remove it	2011-02-12 10:14:32 -05:00
Korey Sewell	e65c15e931	inorder: remove unused isa ops pass/fail ops were used for testing but aren't part of isa	2011-02-12 10:14:26 -05:00
Ali Saidi	2055df8322	Stats: Update the statistics for vnc patch.	2011-02-11 18:29:36 -06:00
Ali Saidi	d4df9e763c	VNC/ARM: Use VNC server and add support to boot into X11	2011-02-11 18:29:36 -06:00
Ali Saidi	d33c1d9592	VNC: Add VNC server to M5	2011-02-11 18:29:35 -06:00
Ali Saidi	ded4d319f2	Serialization: Allow serialization of stl lists	2011-02-11 18:29:35 -06:00
Giacomo Gabrielli	a05032f4df	O3: Fix pipeline restart when a table walk completes in the fetch stage. When a table walk is initiated by the fetch stage, the CPU can potentially move to the idle state and never wake up. The fetch stage must call cpu->wakeCPU() when a translation completes (in finishTranslation()).	2011-02-11 18:29:35 -06:00
Giacomo Gabrielli	74eff1b71b	O3: Fix a few bugs in the TableWalker object. Uncacheable requests were set as such only in atomic mode. currState->delayed is checked in place of currState->timing for resetting currState in atomic mode.	2011-02-11 18:29:35 -06:00
Ali Saidi	1411cb0b0f	SimpleCPU: Fix a case where a DTLB fault redirects fetch and an I-side walk occurs. This change fixes an issue where a DTLB fault occurs and redirects fetch to handle the fault and the ITLB requires a walk which delays translation. In this case the status of the cpu isn't updated appropriately, and an additional instruction fetch occurs. Eventually this hits an assert as multiple instruction fetches are occurring in the system and when the second one returns the processor is in the wrong state. Some asserts below are removed because it was always true (typo) and the state after the initiateAcc() the processor could be in any valid state when a d-side fault occurs.	2011-02-11 18:29:35 -06:00
Giacomo Gabrielli	e2507407b1	O3: Enhance data address translation by supporting hardware page table walkers. Some ISAs (like ARM) rely on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished. Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept in the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution.	2011-02-11 18:29:35 -06:00
Ali Saidi	453dbc772d	ARM: Fix timer calculations. The timer calculations were a bit off so time would run faster than it otherwise should	2011-02-11 18:29:35 -06:00
Ali Saidi	59bf0e7eb4	Timesync: Make sure timesync event is setup after curTick is unserialized Setup initial timesync event in initState or loadState so that curTick has been updated to the new value, otherwise the event is scheduled in the past.	2011-02-11 18:29:35 -06:00
Brad Beckmann	8ea71c3907	merged with Ali X11 patch	2011-02-10 13:31:52 -08:00
Brad Beckmann	fbebe9a642	MOESI_hammer: fixed wakeup for SS->S transition	2011-02-10 13:28:23 -08:00
Ali Saidi	b7457fc11e	Ext: Add X11 keysym header files to ext directory.	2011-02-09 22:27:37 -06:00
Brad Beckmann	06dfee5cea	ruby: removed duplicate make response call	2011-02-09 16:02:09 -08:00
Brad Beckmann	4eab18fd06	regress: protocol regression tester updates	2011-02-08 18:07:54 -08:00
Brad Beckmann	ea9d4c3a97	memtest: due to contention increase, increased deadlock threshold	2011-02-08 15:53:33 -08:00
Brad Beckmann	6ebd7c390b	config: fixed minor bug connecting dma devices to ruby	2011-02-08 15:52:44 -08:00
Nilay Vaish	488280e48b	MESI CMP: Unset TBE pointer in L2 cache controller The TBE pointer in the MESI CMP implementation was not being set to NULL when the TBE is deallocated. This resulted in segmentation fault on testing the protocol when the ProtocolTrace was switched on.	2011-02-08 07:47:02 -06:00
Gabe Black	0851580aad	Stats: Re update stats.	2011-02-07 19:23:13 -08:00
Gabe Black	1b64bfa933	Stats: Back out broken update.	2011-02-07 19:23:11 -08:00
Tim Harris	44e5e7e053	X86: Obey the wp bit of CR0. If cr0.wp ("write protect" bit) is clear then do not generate page faults when writing to write-protected pages in kernel mode.	2011-02-07 15:18:52 -08:00
Tim Harris	6da83b8a1b	X86: Use all 64 bits of the lstar register in the SYSCALL_64 macroop. During SYSCALL_64, use dataSize=8 when handling new rip (ref http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8 IA32_LSTAR is a 64-bit address)	2011-02-07 15:16:27 -08:00
Tim Harris	2ea1aa8a4f	X86: Fix JMP_FAR_I to unpack a far pointer correctly. JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like it should, and also putting the components in the wrong registers for use by other microcode.	2011-02-07 15:12:59 -08:00
Tim Harris	5810ab121c	X86: Read the LDT/GDT at CPL0 when executing an iret. During iret access LDT/GDT at CPL0 rather than after transition to user mode (if I'm reading the Intel IA-64 architecture spec correctly, the contents of the descriptor table are read before the CPL is updated).	2011-02-07 15:05:28 -08:00
Nilay Vaish	10b4b364d9	Orion: Replace printf() with fatal() The code for Orion 2.0 makes use of printf() at several places where there is an error in configuration of the model. These have been replaced with fatal().	2011-02-07 12:42:23 -06:00
Korey Sewell	1b4e788407	ruby: add stdio header in SRAM.hh missing header file caused RUBY_FS to not compile	2011-02-07 12:19:46 -05:00
Gabe Black	2107258d24	X86: Add stats for the new x86 fs regressions.	2011-02-07 01:23:16 -08:00
Gabe Black	dd53743797	X86: Add scripts to support X86 FS configurations in the regressions.	2011-02-07 01:23:02 -08:00
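
The per-virtual-network counters described in commit 343e94a257 can be illustrated with a minimal sketch. This is not the gem5 PerfectSwitch code: `SwitchSketch`, `Msg`, `enqueue`, `pending`, and `linkVisits` are hypothetical names chosen for illustration, and routing is reduced to popping a message from its queue. The only point shown is that the link loop is skipped for any virtual network whose pending-message counter is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a network message; not a gem5 type.
struct Msg { int id; };

// Illustrative switch with one message queue per (virtual network, link).
class SwitchSketch {
  public:
    // Counts how many times the per-link loop body runs (for illustration).
    std::size_t linkVisits = 0;

    SwitchSketch(std::size_t numVnets, std::size_t numLinks)
        : queues(numVnets, std::vector<std::deque<Msg>>(numLinks)),
          pending(numVnets, 0) {}

    void enqueue(std::size_t vnet, std::size_t link, Msg m) {
        queues[vnet][link].push_back(m);
        ++pending[vnet];  // one more message to route on this vnet
    }

    // Routes every queued message and returns how many were routed.
    std::size_t wakeup() {
        std::size_t routed = 0;
        for (std::size_t vnet = 0; vnet < queues.size(); ++vnet) {
            // Skip the inner loops when nothing is pending on this vnet.
            if (pending[vnet] == 0)
                continue;
            for (auto &q : queues[vnet]) {
                ++linkVisits;
                while (!q.empty()) {
                    q.pop_front();  // actual routing is elided in this sketch
                    assert(pending[vnet] > 0);
                    --pending[vnet];
                    ++routed;
                }
            }
        }
        return routed;
    }

  private:
    std::vector<std::vector<std::deque<Msg>>> queues;
    std::vector<std::size_t> pending;  // messages awaiting routing, per vnet
};
```

With 4 virtual networks, 3 links, and a single message queued on one virtual network, a call to `wakeup()` visits only that network's 3 links rather than all 12 (vnet, link) pairs, and a second call visits none.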


9128 commits