sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Nilay Vaish	59befdb628	Added tag stable_2013_06_16 for changeset 07352f119e48	2013-06-16 08:27:42 -05:00
Nilay Vaish	be981772b9	config: Do not instantiate membus when using ruby This patch moves the instantiation of system.membus in se.py to the area of code where classic memory system has been dealt with. Ruby does not require this bus and hence it should not be instantiated.	2013-06-13 07:24:25 -05:00
Andreas Sandberg	64270b19c3	kvm: Add more VM stats This changeset adds the following stats to KVM: * numVMHalfEntries: Number of entries into KVM to finalize pending IO operations without executing guest instructions. These typically happen as a result of a drain where the guest must finalize some operations before the guest state is consistent. * numExitSignal: Number of VM exits that have been triggered by a signal. These usually happen as a result of the timer that limits the time spent in KVM.	2013-06-11 09:43:05 +02:00
Andreas Sandberg	c97a99110b	kvm: Separate host frequency from simulated CPU frequency We used to use the KVM CPU's clock to specify the host frequency. This was not ideal for several reasons. One of them being that the clock parameter of a CPU determines the frequency of some of the components connected to the CPU. This changeset adds a separate hostFreq parameter that should be used to specify the host frequency until we add code to autodetect it. The hostFactor should still be used to specify the conversion factor between the host performance and that of the simulated system.	2013-06-11 09:24:55 +02:00
Andreas Sandberg	4f002930bc	kvm: Don't handle IO and execute in the same tick We currently execute instructions in the guest and then handle any IO request right after we break out of the virtualized environment. This has the effect of executing IO requests in the exact same tick as the first instruction in the sequence that was just run. There seem to be cases where this simplification upsets some timing-sensitive devices. This changeset splits execute and IO (and other services) across multiple ticks. This is implemented by adding a separate RunningService state to the CPU state machine. When a VM requires service, it enters into this state and pending IO is then serviced in the future instead of immediately. The delay between getting the request and servicing it depends on the number of cycles executed in the guest, which allows other components to catch up with the CPU.	2013-06-11 09:24:51 +02:00
Andreas Sandberg	df059f45a0	kvm: Maintain a local instruction counter and update totalNumInsts Update the system's totalNumInsts counter when exiting from KVM and maintain an internal absolute instruction count instead of relying on the one from perf.	2013-06-11 09:24:40 +02:00
Andreas Sandberg	0b4a8b4086	x86: Fix bug when copying TSC on CPU handover The TSC value stored in MISCREG_TSC is actually just an offset from the current CPU cycle to the actual TSC value. Writes with side-effects to the TSC subtract the current cycle count before storing the new value, while reads add the current cycle count. When switching CPUs, the current value is copied without side-effects. This works as long as the source and the destination CPUs have the same clock frequencies. The TSC will jump, sometimes backwards, if they have different clock frequencies. Most OSes assume the TSC to be monotonic and break when this happens. This changeset makes sure that the TSC is copied with side-effects to ensure that the offset is updated to match the new CPU.	2013-06-11 09:24:38 +02:00
Andreas Sandberg	2442aae54f	sim: Revert [34e3295b0e39] (sim: Fix early termination in mult...) HG changeset 34e3295b0e39 introduced a check in the main simulation loop that discards exit events that happen at the same tick as another exit event. This was supposed to fix a problem where a simulation script got confused by multiple exit events. This obviously breaks the simulator since it can hide important simulation events, such as a simulation failure, that happen at the same time as a non-fatal simulation event.	2013-06-11 09:24:10 +02:00
Andreas Sandberg	0793d0727b	cpu: Add support for scheduling multiple inst/load stop events Currently, the only way to get a CPU to stop after a fixed number of instructions/loads is to set a property on the CPU that causes a SimLoopExitEvent to be scheduled when the CPU is constructed. This is clearly not ideal in cases where the simulation script wants the CPU to stop at multiple instruction counts (e.g., SimPoint generation). This changeset adds the methods scheduleInstStop() and scheduleLoadStop() to the BaseCPU. These methods are exported to Python and are designed to be used from the simulation script. By using these methods instead of the old properties, a simulation script can schedule a stop at any point during simulation or schedule multiple stops. The number of instructions specified when scheduling a stop is relative to the current point of execution.	2013-06-11 09:18:25 +02:00
Nilay Vaish	247e4e9ab4	stats: updates due to changes to ruby Ruby's controller statistics have been mostly moved to stats.txt now. Plus stats.txt for solaris/t1000-simple-atomic and arm/20.parser are also being updated.	2013-06-10 06:46:20 -05:00
Nilay Vaish	d32ee94231	ruby: remove several unused variables in Profiler This patch removes per processor cycle count, histogram for filter stats, histogram for multicasts, histogram for prefetch wait, some function prototypes that do not have definitions.	2013-06-09 07:30:00 -05:00
Nilay Vaish	27b321f2f7	ruby: remove periodic event from Profiler The Profiler class does not need an event for dumping statistics periodically. This is because there is a method for dumping statistics for all the sim objects periodically. Since Ruby is a sim object, its statistics are also included.	2013-06-09 07:29:59 -05:00
Nilay Vaish	f59a7af50a	ruby: stats: use gem5's stats for cache and memory controllers This moves event and transition count statistics for cache controllers to gem5's statistics. It does the same for the statistics associated with the memory controller in ruby. All the cache/directory/dma controllers individually collect the event and transition counts. A callback function, collateStats(), has been added that is invoked on the controller version 0 of each controller class. This function adds all the individual controller statistics to vector variables. All the code for registering the statistical variables and collating them is generated by SLICC. The patch removes the files _Profiler.{cc,hh} and _ProfileDumper.{cc,hh} which were earlier used for collecting and dumping statistics respectively.	2013-06-09 07:29:59 -05:00
Nilay Vaish	38736ce7c3	ruby: remove undefined functions in Address class	2013-06-09 07:29:58 -05:00
Nilay Vaish	f2b5b4c8cc	stats: allow printing vectors on a single line This patch adds a new flag to specify if the data values for a given vector should be printed in one line in the stats.txt file. The default behavior will be to print the data in multiple lines. It makes changes to print functions to enforce this behavior.	2013-06-09 07:29:57 -05:00
Nilay Vaish	b5d315518c	config: add atomic cpu to X86_MESI_CMP_directory build options There is a problem with the way cpu options are currently listed. Since Ruby uses the options variable, this variable has to be created in the config file that runs the fs test for the x86 and mesi cmp directory combination. While creating the variable, an error occurs due to the way the list of cpu types is now created. Hence, we need to compile all the cpu models.	2013-06-08 21:57:02 -05:00
Steve Reinhardt	bd39adfa98	Updating EIO regression reference outputs for new stats.	2013-06-08 10:28:33 -04:00
Ali Saidi	2b582ad9bb	scons: amend swig warning error to version 2.0.10 as well	2013-06-04 15:17:04 -05:00
Andreas Sandberg	a3685b0181	dev: Clarify why updates are delayed when the MC146818 is activated	2013-06-04 10:08:21 +02:00
Andreas Sandberg	7846f59d0d	arch: Create a method to finalize physical addresses in the TLB Some architectures (currently only x86) require some fixing-up of physical addresses after a normal address translation. This is usually to remap devices such as the APIC, but could be used for other memory mapped devices as well. When running the CPU using hardware virtualization, we still need to do these address fix-ups before inserting the request into the memory system. This patch moves the fix-up code into a separate method, which allows it to be used by such CPUs without doing full address translations.	2013-06-03 13:55:41 +02:00
Andreas Sandberg	63dae28703	base: Make the Python module loader PEP302 compliant The custom Python loader didn't comply with PEP302 for two reasons: * Previously, we would overwrite old modules on name conflicts. PEP302 explicitly states that: "If there is an existing module object named 'fullname' in sys.modules, the loader must use that existing module". * The "__package__" attribute wasn't set. PEP302: "The __package__ attribute must be set." This changeset addresses both of these issues.	2013-06-03 13:51:03 +02:00
Andreas Sandberg	d989a3ad50	config: Add missing CPUs to --restore-with-cpu The --restore-with-cpu option didn't use CpuConfig.cpu_names() to determine which CPU names are valid, instead it used a static list of known CPU names. This changeset makes the option parsing code use the CPU list from the CpuConfig module instead.	2013-06-03 13:40:05 +02:00
Andreas Sandberg	c2ec232920	kvm: Allow architectures to override the cycle accounting mechanism Some architectures have special registers in the guest that can be used to do cycle accounting. This is generally preferable since it prevents the guest from seeing a non-monotonic clock. This changeset adds a virtual method, getHostCycles(), that the architecture-specific code can override to implement this functionality. The default implementation uses the hwCycles counter.	2013-06-03 13:39:11 +02:00
Andreas Sandberg	15f81b6ed9	kvm: Add handling of EAGAIN when creating timers timer_create can apparently return -1 and set errno to EAGAIN if the kernel suffered a temporary failure when allocating a timer. This happens from time to time, so we need to handle it.	2013-06-03 13:38:59 +02:00
Andreas Sandberg	743f80712e	sim: Add debug output when executing pseudo-instructions	2013-06-03 13:21:21 +02:00
Andreas Sandberg	2b65fce5d9	kvm: Add a call to thread->startup() in startup() It is now required to initialize the thread context by calling startup() on it. Failing to do so currently causes the decoder in x86-based CPUs to get very confused when restoring from checkpoints.	2013-06-03 12:36:56 +02:00
Andreas Sandberg	5e60f87aa3	dev: Add support for disabling ticking and the divider in MC146818 Some Linux versions disable updates (regB.set = 1) to prevent the chip from updating its internal state while the OS is updating it. Support for this was already there, this patch merely disables the check in writeReg that prevented it from being enabled. The patch also includes support for disabling the divider, which is used to control when clock updates should start after setting the internal RTC state. These changes are required to boot most vanilla Linux distributions that update the RTC settings at boot.	2013-06-03 12:28:52 +02:00
Andreas Sandberg	14b8a17f28	dev: Clean up MC146818 register (A & B) handling Rewrite reg A & B handling to use the bitunion stuff instead of bit masking. Add better error messages when the kernel tries to enable unsupported stuff.	2013-06-03 12:28:41 +02:00
Andreas Hansson	74553c7d3f	stats: Update the stats to reflect bus and memory changes This patch updates the stats to reflect the addition of the bus stats, and changes to the bus layers. In addition it updates the stats to match the addition of the static pipeline latency of the memory controller and the addition of a stat tracking the bytes per activate.	2013-05-30 12:54:18 -04:00
Andreas Hansson	3bc4ecdcb4	mem: More descriptive DRAM config names This patch changes the class names of the various DRAM configurations to better reflect what memory they are based on. The speed and interface width are now part of the name, and also the alias that is used to select them on the command line. Some minor changes are done to the actual parameters, to better reflect the named configurations. As a result of these changes the regressions change slightly and the stats will be bumped in a separate patch.	2013-05-30 12:54:14 -04:00
Andreas Hansson	83d99aebb1	mem: Add bytes per activate DRAM controller stat This patch adds a histogram to track how many bytes are accessed in an open row before it is closed. This metric is useful in characterising a workload and the efficiency of the DRAM scheduler. For example, a DDR3-1600 device requires 44 cycles (tRC) before it can activate another row in the same bank. For a x32 interface (8 bytes per cycle) that means 8 x 44 = 352 bytes must be transferred to hide the preparation time.	2013-05-30 12:54:13 -04:00
Andreas Hansson	d82bffd297	mem: Add static latency to the DRAM controller This patch adds a frontend and backend static latency to the DRAM controller by delaying the responses. Two parameters expressing the frontend and backend contributions in absolute time are added to the controller, and the appropriate latency is added to the responses when adding them to the (infinite) queued port for sending. For writes and reads that hit in the write buffer, only the frontend latency is added. For reads that are serviced by the DRAM, the static latency is the sum of the pipeline latencies of the entire frontend, backend and PHY. The default values are chosen based on having roughly 10 pipeline stages in total at 500 MHz. In the future, it would be sensible to make the controller use its clock and convert these latencies (and a few of the DRAM timings) to cycles.	2013-05-30 12:54:12 -04:00
Andreas Hansson	7da851d1a8	mem: Spring cleaning of MSHR and MSHRQueue This patch does some minor tidying up of the MSHR and MSHRQueue. The clean up started as part of some ad-hoc tracing and debugging, but seems worthwhile enough to go in as a separate patch. The highlights of the changes are reduced scoping (private members) where possible, avoiding redundant new/delete, and constructor initialisation to please static code analyzers.	2013-05-30 12:54:11 -04:00
Andreas Hansson	42191522cc	mem: Fix MSHR print format This patch fixes an incorrect print format string by adding an additional string element.	2013-05-30 12:54:09 -04:00
Andreas Hansson	4d7d8393ed	cpu: Prune the stale TraceCPU This patch prunes the TraceCPU as the code is stale and the functionality that it provided can now be achieved with the TrafficGen using its trace playback mode. The TraceCPU was able to play back pre-recorded memory traces of a few different formats, and to achieve this level of flexibility with the TrafficGen, use the util/encode_packet_trace (with suitable modifications) to create a protobuf trace off-line.	2013-05-30 12:54:09 -04:00
Sascha Bischoff	6f4be9bd4c	cpu: Check that minimum TrafficGen period is less than max period Add a check which ensures that the minimum period for the LINEAR and RANDOM traffic generator states is less than or equal to the maximum period. If the minimum period is greater than the maximum period a fatal is triggered.	2013-05-30 12:54:08 -04:00
Sascha Bischoff	04ccc79134	cpu: Fix bug when reading in TrafficGen state transitions This patch fixes a bug with the traffic generator which occurred when reading in the state transitions from the configuration file. Previously, the size of the vector which stored the transitions was used to get the size of the transitions matrix, rather than using the number of states. Therefore, if there were more transitions than states, i.e. some transitions have a probability of less than 1, then the traffic generator would fatal when trying to check the transitions. This issue has been addressed by using the number of input states, rather than the number of transitions.	2013-05-30 12:54:07 -04:00
Andreas Hansson	fc09bc8678	cpu: Add request elasticity to the traffic generator This patch adds an optional request elasticity to the traffic generator, effectively compensating for it in the case of the linear and random generators, and adding it in the case of the trace generator. The accounting is left with the top-level traffic generator, and the individual generators do the necessary math as part of determining the next packet tick. Note that in the linear and random generators we have to compensate for the blocked time to not be elastic, i.e. without this patch the aforementioned generators will slow down in the case of back-pressure.	2013-05-30 12:54:06 -04:00
Andreas Hansson	4931414ca7	cpu: Block traffic generator when requests have to retry This patch changes the queued port for a conventional master port and stalls the traffic generator when requests are not immediately accepted. This is a first step to allowing elasticity in the injection of requests. The patch also adds stats for the sent packets and retries, and slightly changes how the nextPacketTick and getNextPacket interact. The advancing of the trace is now moved to getNextPacket and nextPacketTick is only responsible for answering the question when the next packet should be sent.	2013-05-30 12:54:05 -04:00
Andreas Hansson	c9c35da934	cpu: Move traffic generator sending out of generator states This patch moves the responsibility for sending packets out of the generator states and leaves it with the top-level traffic generator. The main aim of this patch is to enable a transition to non-queued ports, i.e. with send/retry flow control, and to do so it is much more convenient to not wrap the port interactions and instead leave it all local to the traffic generator. The generator states now only govern when they are ready to send something new, and the generation of the packets to send. They thus have no knowledge of the port that is used.	2013-05-30 12:54:04 -04:00
Andreas Hansson	ba11a02cf2	cpu: Fold together the StateGraph and the TrafficGen This patch simplifies the object hierarchy of the traffic generator by getting rid of the StateGraph class and folding this functionality into the traffic generator itself. The main goal of this patch is to facilitate upcoming changes by reducing the number of affected layers.	2013-05-30 12:54:03 -04:00
Andreas Hansson	7e13c4d046	mem: Make returning snoop responses occupy response layer This patch introduces a mirrored internal snoop port to facilitate easy addition of flow control for the snoop responses that are turned into normal responses on their return. To perform this, the slave ports of the coherent bus are wrapped in internal master ports that are passed as the source ports to the response layer in question. As a result of this patch, there is more contention for the response resources, and as such system performance will decrease slightly. A consequence of the mirrored internal port is that the port the bus tells to retry (the internal one) and the port actually retrying (the mirrored one) are not the same. Thus, the existing check in tryTiming is no longer correct. In fact, the test is redundant as the layer is only in the retry state while calling sendRetry on the waiting port, and if the latter does not immediately call the bus then the retry state is left. Consequently the check is removed.	2013-05-30 12:54:02 -04:00
Andreas Hansson	2308f812ef	mem: Make the buses multi layered This patch makes the buses multi layered, and effectively creates a crossbar structure with distributed contention ports at the destination ports. Before this patch, a bus could have a single request, response and snoop response in flight at any time, and with these changes there can be as many requests as connected slaves (bus master ports), and as many responses as connected masters (bus slave ports). Together with address interleaving, this patch enables us to create high-throughput memory interconnects, e.g. 50+ GByte/s.	2013-05-30 12:54:01 -04:00
Andreas Hansson	e82996d9da	mem: Separate the two snoop response cases in the bus This patch makes the flow control and state updates of the coherent bus more clear by separating the two cases, i.e. forward as a snoop response, or turn it into a normal response. With this change it is also more clear what resources are being occupied, and that we effectively bypass the busy check for the second case. As a result of the change in resource usage some stats change.	2013-05-30 12:54:00 -04:00
Andreas Hansson	cb62d39835	mem: Tidy up a few variables in the bus This patch does some minor housekeeping on the bus code, removing redundant code, and moving the extraction of the destination id to the top of the functions using it.	2013-05-30 12:53:59 -04:00
Uri Wiener	91f7b065a9	mem: Add basic stats to the buses This patch adds a basic set of stats which are hard to impossible to implement using only communication monitors, and are needed for insight such as bus utilization, transactions through the bus etc. Stats added include throughput and transaction distribution, and also a two-dimensional vector capturing how many packets and how much data is exchanged between the masters and slaves connected to the bus.	2013-05-30 12:53:58 -04:00
Andreas Hansson	e1e73c5f39	mem: Use unordered set in bus request tracking This patch changes the set used to track outstanding requests to an unordered set (part of C++11 STL). There is no need to maintain the order, and hopefully there might even be a small performance benefit.	2013-05-30 12:53:57 -04:00
Andreas Hansson	82397921a5	mem: Check for waiting state in bus draining This patch fixes a bug in the bus where the bus transitions from busy to idle and still has a port that is waiting for a retry from a peer.	2013-05-30 12:53:57 -04:00
Andreas Hansson	bf6291460d	mem: Add a LPDDR3-1600 configuration This patch adds a typical (leaning towards fast) LPDDR3 configuration based on publicly available data. As expected, it looks very similar to the LPDDR2-S4 configuration, only with a slightly lower burst time.	2013-05-30 12:53:56 -04:00
Andreas Hansson	ce1ad84abd	mem: Adapt the LPDDR2 to match a single x32 channel This patch adapts the existing LPDDR2 configuration to make use of the multi-channel functionality. Thus, to get a x64 interface two controllers should be instantiated using the makeMultiChannel method. The page size and ranks are also adapted to better suit a typical LPDDR2 part.	2013-05-30 12:53:55 -04:00
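
The bytes-per-activate figure quoted in changeset 83d99aebb1 (8 x 44 = 352 bytes) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the commit message, not code from gem5: the function name bytes_to_hide_activate and its parameter names are hypothetical, and the inputs (tRC = 44 cycles, 8 bytes per cycle on a x32 interface) are taken directly from that message.

```python
def bytes_to_hide_activate(trc_cycles: int, bytes_per_cycle: int) -> int:
    """Bytes that must be read from an open row to cover the row cycle time.

    Hypothetical helper: it multiplies the number of cycles before another
    row in the same bank can be activated (tRC) by the bytes moved per cycle.
    """
    return trc_cycles * bytes_per_cycle


# Figures quoted in the commit message: DDR3-1600, tRC = 44 cycles,
# x32 interface transferring 8 bytes per cycle.
required = bytes_to_hide_activate(trc_cycles=44, bytes_per_cycle=8)
print(required)
```

A workload whose bytes-per-activate histogram sits well below this value is closing rows before enough data has been transferred to hide the activate latency.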

9808 commits