__ __ ____ _ _____ ____ _
| \/ | ___| _ \ / \|_ _| | __ ) ___| |_ __ _
| |\/| |/ __| |_) / _ \ | | | _ \ / _ \ __|/ _` |
| | | | (__| __/ ___ \| | | |_) | __/ |_| (_| |
|_| |_|\___|_| /_/ \_\_| |____/ \___|\__|\__,_|
McPAT: Multicore Power, Area, and Timing
Current version 0.8Beta
===============================
McPAT is an architectural modeling tool for chip multiprocessors (CMP)
The main focus of McPAT is accurate power and area
modeling, and a target clock rate is used as a design constraint.
McPAT performs automatic extensive search to find optimal designs
that satisfy the target clock frequency.
For complete documentation of the McPAT, please refer McPAT 1.0
technical report and the following paper,
"McPAT: An Integrated Power, Area, and Timing Modeling
Framework for Multicore and Manycore Architectures",
that appears in MICRO 2009. Please cite the paper, if you use
McPAT in your work. The bibtex entry is provided below for your convenience.
@inproceedings{mcpat:micro,
author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
title = "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
year = {2009},
pages = {469--480},
}
Current McPAT is in its beta release.
List of features of beta release
===============================
The following are the list of features supported by the tool.
* Power, area, and timing models for CMPs with:
Inorder cores both single and multithreaded
OOO cores both single and multithreaded
Shared/coherent caches with directory hardware:
including directory cache, shadowed tag directory
and static bank mapped tag directory
Network-on-Chip
On-chip memory controllers
* Internal models are based on real modern processors:
Inorder models are based on Sun Niagara family
OOO models are based on Intel P6 for reservation
station based OOO cores, and on Intel Netburst and
Alpha 21264 for physical register file based OOO cores.
* Leakage power modeling considers both sub-threshold leakage
and gate leakage power. The impact of operating temperature
on both leakage power are considered. Longer channel devices
that can reduce leakage significantly with modest performance
penalty are also modeled.
* McPAT supports automatic extensive search to find optimal designs
that satisfy the target clock frequency. The timing constraint
include both throughput and latency.
* Interconnect model with different delay, power, and area
properties, as well as both the aggressive and conservative
interconnect projections on wire technologies.
* All process specific values used by the McPAT are obtained
from ITRS and currently, the McPAT supports 90nm, 65nm, 45nm,
32nm, and 22nm technology nodes. At 32nm and 22nm nodes, SOI
and DG devices are used. After 45nm, Hi-K metal gates are used.
How to use the tool?
====================
McPAT takes input parameters from an XML-based interface,
then it computes area and peak power of the
Please note that the peak power is the absolute worst case power,
which could be even higher than TDP.
1. Steps to run McPAT:
-> define the target processor using inorder.xml or OOO.xml
-> run the "mcpat" binary:
./mcpat -infile <*.xml> -print_level < level of detailed output>
./mcpat -h (or mcpat --help) will show the quick help message.
Rather than being hardwired to certain simulators, McPAT
uses an XML-based interface to enable easy integration
with various performance simulators. Our collaborator,
Richard Strong, at University of California, San Diego,
designed an experimental parser for the M5 simulator, aiming for
streamlining the integration of McPAT and M5. Please check the M5
repository/ for the latest version of the parser.
2. Optimize:
McPAT will try its best to satisfy the target clock rate.
When it cannot find a valid solution, it gives out warnings,
while still giving a solution that is closest to the timing
constraints and calculate power based on it. The optimization
will lead to larger power/area numbers for target higher clock
rate. McPAT also provides the option "-opt_for_clk" to turn on
("-opt_for_clk 1") and off this strict optimization for the
timing constraint. When it is off, McPAT always optimize
component for ED^2P without worrying about meeting the
target clock frequency. By turning it off, the computation time
can be reduced, which suites for situations where target clock rate
is conservative.
3. The output:
McPAT outputs results in a hierarchical manner. Increasing
the "-print_level" will show detailed results inside each
component. For each component, major parts are shown, and associated
pipeline registers/control logic are added up in total area/power of each
components. In general, McPAT does not model the area/overhead of the pad
frame used in a processor die.
4. How to use the XML interface for McPAT
4.1 Set up the parameters
Parameters of target designs need to be set in the *.xml file for
entries taged as "param". McPAT have very detailed parameter settings.
please remove the structure parameter from the file if you want
to use the default values. Otherwise, the parameters in the xml file
will override the default values.
4.2 Pass the statistics
There are two options to get the correct stats: a) the performance
simulator can capture all the stats in detail and pass them to McPAT;
b). Performance simulator can only capture partial stats and pass
them to McPAT, while McPAT can reason about the complete stats using
the partial information and the configuration. Therefore, there are
some overlap for the stats.
4.3 Interface XML file structures (PLEASE READ!)
The XML is hierarchical from processor level to micro-architecture
level. McPAT support both heterogeneous and homogeneous manycore processors.
1). For heterogeneous processor setup, each component (core, NoC, cache,
and etc) must have its own instantiations (core0, core1, ..., coreN).
Each instantiation will have different parameters as well as its stats.
Thus, the XML file must have multiple "instantiation" of each type of
heterogeneous components and the corresponding hetero flags must be set
in the XML file. Then state in the XML should be the stats of "a" instantiation
(e.g. "a" cores). The reported runtime dynamic is of a single instantiation
(e.g. "a" cores). Since the stats for each (e.g. "a" cores) may be different,
we will see a whole list of (e.g. "a" cores) with different dynamic power,
and total power is just a sum of them.
2). For homogeneous processors, the same method for heterogeneous can
also be used by treating all homogeneous instantiations as heterogeneous.
However, a preferred approach is to use a single representative for all
the same components (e.g. core0 to represent all cores) and set the
processor to have homogeneous components (e.g. <param name="homogeneous_cores
" value="1"/> ). Thus, the XML file only has one instantiation to represent
all others with the same architectural parameters. The corresponding homo
flags must be set in the XML file. Then, the stats in the XML should be
the aggregated stats of the sum of all instantiations (e.g. aggregated stats
of all cores). In the final results, McPAT will only report a single
instantiation of each type of component, and the reported runtime dynamic power
is the sum of all instantiations of the same type. This approach can run fast
and use much less memory.
5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
The detailed work flow of McPAT has two phases: the initialization phase and
the computation phase. Specifically, in order to start the initialization phase a
user specifies static configurations, including parameters at all three levels,
namely, architectural, circuit, and technology levels. During the initialization
phase, McPAT will generate the internal chip representation using the configurations
set by the user.
The computation phase of McPAT is called by McPAT or the performance simulator
during simulation to generate runtime power numbers. Before calling McPAT to
compute runtime power numbers, the performance simulator needs to pass the
statistics, namely, the activity factors of each individual components to McPAT
via the XML interface.
The initialization phase is very time-consuming, since it will repeat many
times until valid configurations are found or the possible configurations are
exhausted. To reduce the overhead, a user can let the simulator to call McPAT
directly for computation phase and only call initialization phase once at the
beginning of simulation. In this case, the XML interface file is bypassed,
please refer to processor.cc to see how the two phases are called.
6. Sample input files:
This package provide sample XML files for validating target processors. Please find the
enclosed Niagara1.xml (for the Sun Niagara1 processor), Niagara2.xml (for the Sun Niagara2
processor), Alpha21364.xml (for the Alpha21364 processor), and Xeon.xml (for the Intel
Xeon Tulsa processor).
Special instructions for using Xeon.xml:
McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
designs follow ITRS projections, there are designs with special technologies.
For example, the 65nm Xeon Tulsa processor uses 1.25 V rather than 1.1V
for the core voltage domain, which results in the changes in threshold voltage,
leakage current density, saturation current, and etc, besides the different
supply voltage. We use MASTAR to match the special technology as used in Xeon
core domain. Therefore, in order to generate accurate results of Xeon
Tulsa cores, users need to do make TAR=mcpatXeonCore and use the generated
special executable. The L3 cache and buses must be computed using standard
ITRS technology.
====================
McPAT is in its beginning stage. We are still improving
the tool and refining the code. Please come back to its website
for newer versions. If you have any comments,
questions, or suggestions, please write to us.
Version history and roadmap
McPAT Alpha: released Sep. 2009 Experimental release
McPAT Beta (0.6): released Nov. 2009 New code base and technology base
McPAT Beta (0.7): released May. 2010 Added various new models,
including long channel devices, buses model; together
with bug fixes and extensive code optimization to reduce
memory usage.
McPAT Beta (0.8): released Aug. 2010 Added various new models,
including on-chip 10Gb ethernet units, PCIe, and flash controllers.
Next major release:
McPAT 1.0: including advance power-saving states
Future releases may include the modeling of embedded low-power
processors as well as vector processors and GPGPUs.
Sheng Li
sheng.li@hp.com