sanchayanmaity/gem5 - Sanchayan Maity's repositories

Author	SHA1	Message	Date
Andreas Hansson	36dc93a5fa	mem: Move crossbar default latencies to subclasses This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden.	2015-03-02 04:00:47 -05:00
Andreas Hansson	6563ec8634	cpu: Tidy up the MemTest and make false sharing more obvious The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed. The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work. In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.	2015-02-11 10:23:28 -05:00
Andreas Hansson	1f6d5f8f84	mem: Rename Bus to XBar to better reflect its behaviour This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus. As part of this renaming, the code has also been clean up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code is due to the delay variables in the packet. --HG-- rename : src/mem/Bus.py => src/mem/XBar.py rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh rename : src/mem/bus.cc => src/mem/xbar.cc rename : src/mem/bus.hh => src/mem/xbar.hh	2014-09-20 17:18:32 -04:00
Akash Bagdia	e7e17f92db	power: Add voltage domains to the clock domains This patch adds the notion of voltage domains, and groups clock domains that operate under the same voltage (i.e. power supply) into domains. Each clock domain is required to be associated with a voltage domain, and the latter requires the voltage to be explicitly set. A voltage domain is an independently controllable voltage supply being provided to section of the design. Thus, if you wish to perform dynamic voltage scaling on a CPU, its clock domain should be associated with a separate voltage domain. The current implementation of the voltage domain does not take into consideration cases where there are derived voltage domains running at ratio of native voltage domains, as with the case where there can be on-chip buck/boost (charge pumps) voltage regulation logic. The regression and configuration scripts are updated with a generic voltage domain for the system, and one for the CPUs.	2013-08-19 03:52:28 -04:00
Akash Bagdia	7d7ab73862	sim: Add the notion of clock domains to all ClockedObjects This patch adds the notion of source- and derived-clock domains to the ClockedObjects. As such, all clock information is moved to the clock domain, and the ClockedObjects are grouped into domains. The clock domains are either source domains, with a specific clock period, or derived domains that have a parent domain and a divider (potentially chained). For piece of logic that runs at a derived clock (a ratio of the clock its parent is running at) the necessary derived clock domain is created from its corresponding parent clock domain. For now, the derived clock domain only supports a divider, thus ensuring a lower speed compared to its parent. Multiplier functionality implies a PLL logic that has not been modelled yet (create a separate clock instead). The clock domains should be used as a mechanism to provide a controllable clock source that affects clock for every clocked object lying beneath it. The clock of the domain can (in a future patch) be controlled by a handler responsible for dynamic frequency scaling of the respective clock domains. All the config scripts have been retro-fitted with clock domains. For the System a default SrcClockDomain is created. For CPUs that run at a different speed than the system, there is a seperate clock domain created. This domain incorporates the CPU and the associated caches. As before, Ruby runs under its own clock domain. The clock period of all domains are pre-computed, such that no virtual functions or multiplications are needed when calling clockPeriod. Instead, the clock period is pre-computed when any changes occur. For this to be possible, each clock domain tracks its children.	2013-06-27 05:49:49 -04:00
Akash Bagdia	076d04a653	config: Add a system clock command-line option This patch adds a 'sys_clock' command-line option and use it to assign clocks to the system during instantiation. As part of this change, the default clock in the System class is removed and whenever a system is instantiated a system clock value must be set. A default value is provided for the command-line option. The configs and tests are updated accordingly.	2013-06-27 05:49:49 -04:00
Akash Bagdia	7eccb1b779	config: Remove redundant explicit setting of default clocks This patch removes the explicit setting of the clock period for certain instances of CoherentBus, NonCoherentBus and IOCache where the specified clock is same as the default value of the system clock. As all the values used are the defaults, there are no performance changes. There are similar cases where the toL2Bus is set to use the parent CPU clock which is already the default behaviour. The main motivation for these simplifications is to ease the introduction of clock domains.	2013-06-27 05:49:49 -04:00
Andreas Hansson	9cbe1cb653	config: Unify caches used in regressions and adjust L2 MSHRs This patch unified the L1 and L2 caches used throughout the regressions instead of declaring different, but very similar, configurations in the different scripts. The patch also changes the default L2 configuration to match what it used to be for the fs and se scripts (until the last patch that updated the regressions to also make use of the cache config). The MSHRs and targets per MSHR are now set to a more realistic default of 20 and 12, respectively. As a result of both the aforementioned changes, many of the regression stats are changed. A follow-on patch will bump the stats.	2012-10-30 07:44:08 -04:00
Andreas Hansson	88554790c3	Mem: Use cycles to express cache-related latencies This patch changes the cache-related latencies from an absolute time expressed in Ticks, to a number of cycles that can be scaled with the clock period of the caches. Ultimately this patch serves to enable future work that involves dynamic frequency scaling. As an immediate benefit it also makes it more convenient to specify cache performance without implicitly assuming a specific CPU core operating frequency. The stat blocked_cycles that actually counter in ticks is now updated to count in cycles. As the timing is now rounded to the clock edges of the cache, there are some regressions that change. Plenty of them have very minor changes, whereas some regressions with a short run-time are perturbed quite significantly. A follow-on patch updates all the statistics for the regressions.	2012-10-15 08:10:54 -04:00
Andreas Hansson	072a91ee51	Configs: Set the memtest clock to a reasonable value This patch changes the memtest clock from 1THz (the default) to 2GHz, similar to the CPUs in the other regressions. This is useful as the caches will adopt the same clock as the CPU. The bus clock rate is scaled accordingly, and the L1-L2 bus is kept at the CPU clock while the memory bus is at half that frequency. A separate patch updates the affected stats.	2012-10-15 08:09:57 -04:00
Mrinmoy Ghosh	6fc0094337	Cache: add a response latency to the caches In the current caches the hit latency is paid twice on a miss. This patch lets a configurable response latency be set of the cache for the backward path.	2012-09-25 11:49:41 -05:00
Andreas Hansson	f00cba34eb	Mem: Make SimpleMemory single ported This patch changes the simple memory to have a single slave port rather than a vector port. The simple memory makes no attempts at modelling the contention between multiple ports, and any such multiplexing and demultiplexing could be done in a bus (or crossbar) outside the memory controller. This scenario also matches with the ongoing work on a SimpleDRAM model, which will be a single-ported single-channel controller that can be used in conjunction with a bus (or crossbar) to create a multi-port multi-channel controller. There are only very few regressions that make use of the vector port, and these are all for functional accesses only. To facilitate these cases, memtest and memtest-ruby have been updated to also have a "functional" bus to perform the (de)multiplexing of the functional memory accesses.	2012-07-12 12:56:13 -04:00
Andreas Hansson	0d32940711	Bus: Split the bus into a non-coherent and coherent bus This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses. A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses. A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect. The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses. A bit of minor tidying up has also been done. --HG-- rename : src/mem/bus.cc => src/mem/coherent_bus.cc rename : src/mem/bus.hh => src/mem/coherent_bus.hh rename : src/mem/bus.cc => src/mem/noncoherent_bus.cc rename : src/mem/bus.hh => src/mem/noncoherent_bus.hh	2012-05-31 13:30:04 -04:00
Andreas Hansson	b00949d88b	MEM: Enable multiple distributed generalized memories This patch removes the assumption on having on single instance of PhysicalMemory, and enables a distributed memory where the individual memories in the system are each responsible for a single contiguous address range. All memories inherit from an AbstractMemory that encompasses the basic behaviuor of a random access memory, and provides untimed access methods. What was previously called PhysicalMemory is now SimpleMemory, and a subclass of AbstractMemory. All future types of memory controllers should inherit from AbstractMemory. To enable e.g. the atomic CPU and RubyPort to access the now distributed memory, the system has a wrapper class, called PhysicalMemory that is aware of all the memories in the system and their associated address ranges. This class thus acts as an infinitely-fast bus and performs address decoding for these "shortcut" accesses. Each memory can specify that it should not be part of the global address map (used e.g. by the functional memories by some testers). Moreover, each memory can be configured to be reported to the OS configuration table, useful for populating ATAG structures, and any potential ACPI tables. Checkpointing support currently assumes that all memories have the same size and organisation when creating and resuming from the checkpoint. A future patch will enable a more flexible re-organisation. --HG-- rename : src/mem/PhysicalMemory.py => src/mem/AbstractMemory.py rename : src/mem/PhysicalMemory.py => src/mem/SimpleMemory.py rename : src/mem/physical.cc => src/mem/abstract_mem.cc rename : src/mem/physical.hh => src/mem/abstract_mem.hh rename : src/mem/physical.cc => src/mem/simple_mem.cc rename : src/mem/physical.hh => src/mem/simple_mem.hh	2012-04-06 13:46:31 -04:00
Andreas Hansson	5a9a743cfc	MEM: Introduce the master/slave port roles in the Python classes This patch classifies all ports in Python as either Master or Slave and enforces a binding of master to slave. Conceptually, a master (such as a CPU or DMA port) issues requests, and receives responses, and conversely, a slave (such as a memory or a PIO device) receives requests and sends back responses. Currently there is no differentiation between coherent and non-coherent masters and slaves. The classification as master/slave also involves splitting the dual role port of the bus into a master and slave port and updating all the system assembly scripts to use the appropriate port. Similarly, the interrupt devices have to have their int_port split into a master and slave port. The intdev and its children have minimal changes to facilitate the extra port. Note that this patch does not enforce any port typing in the C++ world, it merely ensures that the Python objects have a notion of the port roles and are connected in an appropriate manner. This check is carried when two ports are connected, e.g. bus.master = memory.port. The following patches will make use of the classifications and specialise the C++ ports into masters and slaves.	2012-02-13 06:43:09 -05:00
Dam Sunwoo	230540e655	mem: fix cache stats to use request ids correctly This patch fixes the cache stats to use the new request ids. Cache stats also display the requestor names in the vector subnames. Most cache stats now include "nozero" and "nonan" flags to reduce the amount of excessive cache stat dump. Also, simplified incMissCount()/incHitCount() functions.	2012-02-12 16:07:39 -06:00
Gabe Black	ec20ee2f7c	SE/FS: Make SE vs. FS mode a runtime parameter.	2012-01-28 07:24:34 -08:00
Andreas Hansson	f85286b3de	MEM: Add port proxies instead of non-structural ports Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy. The following replacements are made: FunctionalPort > PortProxy TranslatingPort > SETranslatingPortProxy VirtualPort > FSTranslatingPortProxy --HG-- rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh	2012-01-17 12:55:08 -06:00
Ali Saidi	a432d8e085	Mem: Fix issue with dirty block being lost when entire block transferred to non-cache. This change fixes the problem for all the cases we actively use. If you want to try more creative I/O device attachments (E.g. sharing an L2), this won't work. You would need another level of caching between the I/O device and the cache (which you actually need anyway with our current code to make sure writes propagate). This is required so that you can mark the cache in between as top level and it won't try to send ownership of a block to the I/O device. Asserts have been added that should catch any issues.	2011-03-17 19:20:19 -05:00
Lisa Hsu	1d3228481f	cache: Make caches sharing aware and add occupancy stats. On the config end, if a shared L2 is created for the system, it is parameterized to have n sharers as defined by option.num_cpus. In addition to making the cache sharing aware so that discriminating tag policies can make use of context_ids to make decisions, I added an occupancy AverageStat and an occ % stat to each cache so that you could know which contexts are occupying how much cache on average, both in terms of blocks and percentage. Note that since devices have context_id -1, having an array of occ stats that correspond to each context_id will break here, so in FS mode I add an extra bucket for device blocks. This bucket is explicitly not added in SE mode in order to not only avoid ugliness in the stats.txt file, but to avoid broken stats (some formulas break when a bucket is 0).	2010-02-23 09:34:22 -08:00
Steve Reinhardt	07f091d6ed	Get rid of remaining traces of obsolete CoherenceProtocol object. --HG-- extra : convert_revision : c5555b00bef1b304a84886188ad2c0dcb4d7c5b9	2007-06-30 17:59:45 -07:00
Steve Reinhardt	0305159abf	PhysicalMemory has vector of uniform ports instead of one special one. configs/example/memtest.py: PhysicalMemory has vector of uniform ports instead of one special one. Other updates to fix obsolete brokenness. src/mem/physical.cc: src/mem/physical.hh: src/python/m5/objects/PhysicalMemory.py: Have vector of uniform ports instead of one special one. src/python/swig/pyobject.cc: Add comment. --HG-- extra : convert_revision : a4a764dcdcd9720bcd07c979d0ece311fc8cb4f1	2007-05-19 00:24:34 -04:00
Ali Saidi	634d2e9d83	remove hit_latency and make latency do the right thing set the latency parameter in terms of a latency add caches to tsunami-simple configs configs/common/Caches.py: tests/configs/memtest.py: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: set the latency parameter in terms of a latency configs/common/FSConfig.py: give the bridge a default latency too src/mem/cache/cache_builder.cc: src/python/m5/objects/BaseCache.py: remove hit_latency and make latency do the right thing tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: add caches to tsunami-simple configs --HG-- extra : convert_revision : 37bef7c652e97c8cdb91f471fba62978f89019f1	2007-05-10 18:24:48 -04:00
Steve Reinhardt	d486700149	Add short memtest run to quick regressions. Caveats: - Even though memtest is ISA-independent, it will only run for the Alpha builds, since there's no way to specify ISA-independent reference files and I didn't want to commit 3 copies since I'm not sure we want to run it for all the different ISAs anyway. - Reference outputs were generated on my laptop, so performance numbers will be low compared to zizzer. --HG-- extra : convert_revision : 210fe4abecc19fbab0b15402c6cb4863650bab66	2007-02-06 21:16:33 -08:00
Ron Dreslinski	780aa0a0eb	Fix corner case on assertion. I need to move over to using the fixPacket function so I don't have to make the same changes everywhere. Still a functional access bug someplace I need to track down in timing mode. src/mem/cache/base_cache.cc: src/mem/cache/cache_impl.hh: Fix corner case on assertion tests/configs/memtest.py: Updated memtester with uncacheable addresses and functional accesses --HG-- extra : convert_revision : e6fa851621700ff9227b83cc5cac20af4fc8444f	2006-10-19 21:26:46 -04:00
Ron Dreslinski	60252f8e63	Interesting memtest finally. Get over 500,000 reads on each of 8 testers before memory leak becomes large. tests/configs/memtest.py: Update test to be more interesting --HG-- extra : convert_revision : 4258b798fbeeed2a376f1bfac100a109eb05620e	2006-10-11 01:18:20 -04:00
Ron Dreslinski	cc78d86661	Fix several bugs pertaining to upgrades/mem leaks. src/mem/cache/base_cache.cc: Fix a bug about not having a request to send src/mem/cache/base_cache.hh: Fix a bug with the blocking code src/mem/cache/cache.hh: AFix a bug with snoop hits in WB buffer src/mem/cache/cache_impl.hh: Fix a bug with snoop hits in WB buffer Also, add better DPRINTF's src/mem/cache/miss/miss_queue.cc: Fix a bug with upgrades (Need to clean it up later) src/mem/cache/miss/mshr.cc: Fix a memory leak bug, still some outstanding with writebacks not being deleted src/mem/cache/miss/mshr_queue.cc: Fix a bug about upgrades (need to clean up later) src/mem/packet.hh: Fix for newly added cmd attribute for upgrades tests/configs/memtest.py: More interesting testcase --HG-- extra : convert_revision : fcb4f17dd58b537bb4f67a8c835f50e455e8c688	2006-10-10 01:32:18 -04:00
Ron Dreslinski	b9fb4d4870	Make memtest work with 8 memtesters src/mem/physical.cc: Update comment to match memtest use src/python/m5/objects/PhysicalMemory.py: Make memtester have a way to connect functionally tests/configs/memtest.py: Properly create 8 memtesters and connect them to the memory system --HG-- extra : convert_revision : e5a2dd9c8918d58051b553b5c6a14785d48b34ca	2006-10-09 17:13:50 -04:00
Ron Dreslinski	6c7ab02682	Update the Memtester, commit a config file/test for it. src/cpu/SConscript: Add memtester to the compilation environment. Someone who knows this better should make the MemTest a cpu model parameter. For now attached with the build of o3 cpu. src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: Update Memtest for new mem system src/python/m5/objects/MemTest.py: Update memtest python description --HG-- extra : convert_revision : d6a63e08fda0975a7abfb23814a86a0caf53e482	2006-10-09 00:26:10 -04:00

29 commits