gem5/src/cpu/o3/mem_dep_unit_impl.hh

588 lines
17 KiB
C++
Raw Normal View History

/*
* Copyright (c) 2004-2006 The Regents of The University of Michigan
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met: redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer;
* redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution;
* neither the name of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Authors: Kevin Lim
*/
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
#include <map>
#include "cpu/o3/inst_queue.hh"
shuffle files around for new directory structure --HG-- rename : cpu/base_cpu.cc => cpu/base.cc rename : cpu/base_cpu.hh => cpu/base.hh rename : cpu/beta_cpu/2bit_local_pred.cc => cpu/o3/2bit_local_pred.cc rename : cpu/beta_cpu/2bit_local_pred.hh => cpu/o3/2bit_local_pred.hh rename : cpu/beta_cpu/alpha_full_cpu.cc => cpu/o3/alpha_cpu.cc rename : cpu/beta_cpu/alpha_full_cpu.hh => cpu/o3/alpha_cpu.hh rename : cpu/beta_cpu/alpha_full_cpu_builder.cc => cpu/o3/alpha_cpu_builder.cc rename : cpu/beta_cpu/alpha_full_cpu_impl.hh => cpu/o3/alpha_cpu_impl.hh rename : cpu/beta_cpu/alpha_dyn_inst.cc => cpu/o3/alpha_dyn_inst.cc rename : cpu/beta_cpu/alpha_dyn_inst.hh => cpu/o3/alpha_dyn_inst.hh rename : cpu/beta_cpu/alpha_dyn_inst_impl.hh => cpu/o3/alpha_dyn_inst_impl.hh rename : cpu/beta_cpu/alpha_impl.hh => cpu/o3/alpha_impl.hh rename : cpu/beta_cpu/alpha_params.hh => cpu/o3/alpha_params.hh rename : cpu/beta_cpu/bpred_unit.cc => cpu/o3/bpred_unit.cc rename : cpu/beta_cpu/bpred_unit.hh => cpu/o3/bpred_unit.hh rename : cpu/beta_cpu/bpred_unit_impl.hh => cpu/o3/bpred_unit_impl.hh rename : cpu/beta_cpu/btb.cc => cpu/o3/btb.cc rename : cpu/beta_cpu/btb.hh => cpu/o3/btb.hh rename : cpu/beta_cpu/comm.hh => cpu/o3/comm.hh rename : cpu/beta_cpu/commit.cc => cpu/o3/commit.cc rename : cpu/beta_cpu/commit.hh => cpu/o3/commit.hh rename : cpu/beta_cpu/commit_impl.hh => cpu/o3/commit_impl.hh rename : cpu/beta_cpu/full_cpu.cc => cpu/o3/cpu.cc rename : cpu/beta_cpu/full_cpu.hh => cpu/o3/cpu.hh rename : cpu/beta_cpu/cpu_policy.hh => cpu/o3/cpu_policy.hh rename : cpu/beta_cpu/decode.cc => cpu/o3/decode.cc rename : cpu/beta_cpu/decode.hh => cpu/o3/decode.hh rename : cpu/beta_cpu/decode_impl.hh => cpu/o3/decode_impl.hh rename : cpu/beta_cpu/fetch.cc => cpu/o3/fetch.cc rename : cpu/beta_cpu/fetch.hh => cpu/o3/fetch.hh rename : cpu/beta_cpu/fetch_impl.hh => cpu/o3/fetch_impl.hh rename : cpu/beta_cpu/free_list.cc => cpu/o3/free_list.cc rename : cpu/beta_cpu/free_list.hh => cpu/o3/free_list.hh rename : cpu/beta_cpu/iew.cc => cpu/o3/iew.cc rename : cpu/beta_cpu/iew.hh => cpu/o3/iew.hh rename : cpu/beta_cpu/iew_impl.hh => cpu/o3/iew_impl.hh rename : cpu/beta_cpu/inst_queue.cc => cpu/o3/inst_queue.cc rename : cpu/beta_cpu/inst_queue.hh => cpu/o3/inst_queue.hh rename : cpu/beta_cpu/inst_queue_impl.hh => cpu/o3/inst_queue_impl.hh rename : cpu/beta_cpu/mem_dep_unit.cc => cpu/o3/mem_dep_unit.cc rename : cpu/beta_cpu/mem_dep_unit.hh => cpu/o3/mem_dep_unit.hh rename : cpu/beta_cpu/mem_dep_unit_impl.hh => cpu/o3/mem_dep_unit_impl.hh rename : cpu/beta_cpu/ras.cc => cpu/o3/ras.cc rename : cpu/beta_cpu/ras.hh => cpu/o3/ras.hh rename : cpu/beta_cpu/regfile.hh => cpu/o3/regfile.hh rename : cpu/beta_cpu/rename.cc => cpu/o3/rename.cc rename : cpu/beta_cpu/rename.hh => cpu/o3/rename.hh rename : cpu/beta_cpu/rename_impl.hh => cpu/o3/rename_impl.hh rename : cpu/beta_cpu/rename_map.cc => cpu/o3/rename_map.cc rename : cpu/beta_cpu/rename_map.hh => cpu/o3/rename_map.hh rename : cpu/beta_cpu/rob.cc => cpu/o3/rob.cc rename : cpu/beta_cpu/rob.hh => cpu/o3/rob.hh rename : cpu/beta_cpu/rob_impl.hh => cpu/o3/rob_impl.hh rename : cpu/beta_cpu/sat_counter.cc => cpu/o3/sat_counter.cc rename : cpu/beta_cpu/sat_counter.hh => cpu/o3/sat_counter.hh rename : cpu/beta_cpu/store_set.cc => cpu/o3/store_set.cc rename : cpu/beta_cpu/store_set.hh => cpu/o3/store_set.hh rename : cpu/beta_cpu/tournament_pred.cc => cpu/o3/tournament_pred.cc rename : cpu/beta_cpu/tournament_pred.hh => cpu/o3/tournament_pred.hh rename : cpu/ooo_cpu/ooo_cpu.cc => cpu/ozone/cpu.cc rename : cpu/ooo_cpu/ooo_cpu.hh => cpu/ozone/cpu.hh rename : cpu/ooo_cpu/ooo_impl.hh => cpu/ozone/cpu_impl.hh rename : cpu/ooo_cpu/ea_list.cc => cpu/ozone/ea_list.cc rename : cpu/ooo_cpu/ea_list.hh => cpu/ozone/ea_list.hh rename : cpu/simple_cpu/simple_cpu.cc => cpu/simple/cpu.cc rename : cpu/simple_cpu/simple_cpu.hh => cpu/simple/cpu.hh rename : cpu/full_cpu/smt.hh => cpu/smt.hh rename : cpu/full_cpu/op_class.hh => encumbered/cpu/full/op_class.hh extra : convert_revision : c4a891d8d6d3e0e9e5ea56be47d851da44d8c032
2005-06-05 02:50:10 +02:00
#include "cpu/o3/mem_dep_unit.hh"
#include "debug/MemDepUnit.hh"
#include "params/DerivO3CPU.hh"
template <class MemDepPred, class Impl>
MemDepUnit<MemDepPred, Impl>::MemDepUnit()
: loadBarrier(false), loadBarrierSN(0), storeBarrier(false),
storeBarrierSN(0), iqPtr(NULL)
{
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
template <class MemDepPred, class Impl>
MemDepUnit<MemDepPred, Impl>::MemDepUnit(DerivO3CPUParams *params)
: _name(params->name + ".memdepunit"),
depPred(params->store_set_clear_period, params->SSITSize,
params->LFSTSize),
loadBarrier(false), loadBarrierSN(0), storeBarrier(false),
storeBarrierSN(0), iqPtr(NULL)
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
{
DPRINTF(MemDepUnit, "Creating MemDepUnit object.\n");
}
template <class MemDepPred, class Impl>
MemDepUnit<MemDepPred, Impl>::~MemDepUnit()
{
for (ThreadID tid = 0; tid < Impl::MaxThreads; tid++) {
ListIt inst_list_it = instList[tid].begin();
MemDepHashIt hash_it;
while (!instList[tid].empty()) {
hash_it = memDepHash.find((*inst_list_it)->seqNum);
assert(hash_it != memDepHash.end());
memDepHash.erase(hash_it);
instList[tid].erase(inst_list_it++);
}
}
#ifdef DEBUG
assert(MemDepEntry::memdep_count == 0);
#endif
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::init(DerivO3CPUParams *params, ThreadID tid)
{
DPRINTF(MemDepUnit, "Creating MemDepUnit %i object.\n",tid);
_name = csprintf("%s.memDep%d", params->name, tid);
id = tid;
depPred.init(params->store_set_clear_period, params->SSITSize,
params->LFSTSize);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::regStats()
{
insertedLoads
.name(name() + ".insertedLoads")
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
.desc("Number of loads inserted to the mem dependence unit.");
insertedStores
.name(name() + ".insertedStores")
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
.desc("Number of stores inserted to the mem dependence unit.");
conflictingLoads
.name(name() + ".conflictingLoads")
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
.desc("Number of conflicting loads.");
conflictingStores
.name(name() + ".conflictingStores")
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
.desc("Number of conflicting stores.");
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::switchOut()
{
assert(instList[0].empty());
assert(instsToReplay.empty());
assert(memDepHash.empty());
// Clear any state.
for (int i = 0; i < Impl::MaxThreads; ++i) {
instList[i].clear();
}
instsToReplay.clear();
memDepHash.clear();
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::takeOverFrom()
{
// Be sure to reset all state.
loadBarrier = storeBarrier = false;
loadBarrierSN = storeBarrierSN = 0;
depPred.clear();
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::setIQ(InstructionQueue<Impl> *iq_ptr)
{
iqPtr = iq_ptr;
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::insert(DynInstPtr &inst)
{
ThreadID tid = inst->threadNumber;
MemDepEntryPtr inst_entry = new MemDepEntry(inst);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
// Add the MemDepEntry to the hash.
memDepHash.insert(
std::pair<InstSeqNum, MemDepEntryPtr>(inst->seqNum, inst_entry));
#ifdef DEBUG
MemDepEntry::memdep_insert++;
#endif
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
instList[tid].push_back(inst);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
inst_entry->listIt = --(instList[tid].end());
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Check any barriers and the dependence predictor for any
// producing memrefs/stores.
InstSeqNum producing_store;
if (inst->isLoad() && loadBarrier) {
DPRINTF(MemDepUnit, "Load barrier [sn:%lli] in flight\n",
loadBarrierSN);
producing_store = loadBarrierSN;
} else if (inst->isStore() && storeBarrier) {
DPRINTF(MemDepUnit, "Store barrier [sn:%lli] in flight\n",
storeBarrierSN);
producing_store = storeBarrierSN;
} else {
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
producing_store = depPred.checkInst(inst->instAddr());
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr store_entry = NULL;
// If there is a producing store, try to find the entry.
if (producing_store != 0) {
DPRINTF(MemDepUnit, "Searching for producer\n");
MemDepHashIt hash_it = memDepHash.find(producing_store);
if (hash_it != memDepHash.end()) {
store_entry = (*hash_it).second;
DPRINTF(MemDepUnit, "Proucer found\n");
}
}
// If no store entry, then instruction can issue as soon as the registers
// are ready.
if (!store_entry) {
DPRINTF(MemDepUnit, "No dependency for inst PC "
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
"%s [sn:%lli].\n", inst->pcState(), inst->seqNum);
inst_entry->memDepReady = true;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
if (inst->readyToIssue()) {
inst_entry->regsReady = true;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
moveToReady(inst_entry);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
} else {
// Otherwise make the instruction dependent on the store/barrier.
DPRINTF(MemDepUnit, "Adding to dependency list; "
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
"inst PC %s is dependent on [sn:%lli].\n",
inst->pcState(), producing_store);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
if (inst->readyToIssue()) {
inst_entry->regsReady = true;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
// Clear the bit saying this instruction can issue.
inst->clearCanIssue();
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Add this instruction to the list of dependents.
store_entry->dependInsts.push_back(inst_entry);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
if (inst->isLoad()) {
++conflictingLoads;
} else {
++conflictingStores;
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
if (inst->isStore()) {
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
DPRINTF(MemDepUnit, "Inserting store PC %s [sn:%lli].\n",
inst->pcState(), inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
depPred.insertStore(inst->instAddr(), inst->seqNum, inst->threadNumber);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
++insertedStores;
} else if (inst->isLoad()) {
++insertedLoads;
} else {
panic("Unknown type! (most likely a barrier).");
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::insertNonSpec(DynInstPtr &inst)
{
ThreadID tid = inst->threadNumber;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr inst_entry = new MemDepEntry(inst);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Insert the MemDepEntry into the hash.
memDepHash.insert(
std::pair<InstSeqNum, MemDepEntryPtr>(inst->seqNum, inst_entry));
#ifdef DEBUG
MemDepEntry::memdep_insert++;
#endif
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Add the instruction to the list.
instList[tid].push_back(inst);
inst_entry->listIt = --(instList[tid].end());
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Might want to turn this part into an inline function or something.
// It's shared between both insert functions.
if (inst->isStore()) {
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
DPRINTF(MemDepUnit, "Inserting store PC %s [sn:%lli].\n",
inst->pcState(), inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
depPred.insertStore(inst->instAddr(), inst->seqNum, inst->threadNumber);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
++insertedStores;
} else if (inst->isLoad()) {
++insertedLoads;
} else {
panic("Unknown type! (most likely a barrier).");
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::insertBarrier(DynInstPtr &barr_inst)
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
{
InstSeqNum barr_sn = barr_inst->seqNum;
// Memory barriers block loads and stores, write barriers only stores.
if (barr_inst->isMemBarrier()) {
loadBarrier = true;
loadBarrierSN = barr_sn;
storeBarrier = true;
storeBarrierSN = barr_sn;
DPRINTF(MemDepUnit, "Inserted a memory barrier %s SN:%lli\n",
barr_inst->pcState(),barr_sn);
} else if (barr_inst->isWriteBarrier()) {
storeBarrier = true;
storeBarrierSN = barr_sn;
DPRINTF(MemDepUnit, "Inserted a write barrier\n");
}
ThreadID tid = barr_inst->threadNumber;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr inst_entry = new MemDepEntry(barr_inst);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// Add the MemDepEntry to the hash.
memDepHash.insert(
std::pair<InstSeqNum, MemDepEntryPtr>(barr_sn, inst_entry));
#ifdef DEBUG
MemDepEntry::memdep_insert++;
#endif
// Add the instruction to the instruction list.
instList[tid].push_back(barr_inst);
inst_entry->listIt = --(instList[tid].end());
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
template <class MemDepPred, class Impl>
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
void
MemDepUnit<MemDepPred, Impl>::regsReady(DynInstPtr &inst)
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
{
DPRINTF(MemDepUnit, "Marking registers as ready for "
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
"instruction PC %s [sn:%lli].\n",
inst->pcState(), inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr inst_entry = findInHash(inst);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
inst_entry->regsReady = true;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
if (inst_entry->memDepReady) {
DPRINTF(MemDepUnit, "Instruction has its memory "
"dependencies resolved, adding it to the ready list.\n");
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
moveToReady(inst_entry);
} else {
DPRINTF(MemDepUnit, "Instruction still waiting on "
"memory dependency.\n");
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::nonSpecInstReady(DynInstPtr &inst)
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
{
DPRINTF(MemDepUnit, "Marking non speculative "
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
"instruction PC %s as ready [sn:%lli].\n",
inst->pcState(), inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr inst_entry = findInHash(inst);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
moveToReady(inst_entry);
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::reschedule(DynInstPtr &inst)
{
instsToReplay.push_back(inst);
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::replay(DynInstPtr &inst)
{
DynInstPtr temp_inst;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
// For now this replay function replays all waiting memory ops.
while (!instsToReplay.empty()) {
temp_inst = instsToReplay.front();
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
MemDepEntryPtr inst_entry = findInHash(temp_inst);
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
DPRINTF(MemDepUnit, "Replaying mem instruction PC %s [sn:%lli].\n",
temp_inst->pcState(), temp_inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
moveToReady(inst_entry);
instsToReplay.pop_front();
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::completed(DynInstPtr &inst)
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
{
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
DPRINTF(MemDepUnit, "Completed mem instruction PC %s [sn:%lli].\n",
inst->pcState(), inst->seqNum);
ThreadID tid = inst->threadNumber;
// Remove the instruction from the hash and the list.
MemDepHashIt hash_it = memDepHash.find(inst->seqNum);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
assert(hash_it != memDepHash.end());
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
instList[tid].erase((*hash_it).second->listIt);
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
(*hash_it).second = NULL;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
memDepHash.erase(hash_it);
#ifdef DEBUG
MemDepEntry::memdep_erase++;
#endif
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::completeBarrier(DynInstPtr &inst)
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
{
wakeDependents(inst);
completed(inst);
InstSeqNum barr_sn = inst->seqNum;
DPRINTF(MemDepUnit, "barrier completed: %s SN:%lli\n", inst->pcState(),
inst->seqNum);
if (inst->isMemBarrier()) {
if (loadBarrierSN == barr_sn)
loadBarrier = false;
if (storeBarrierSN == barr_sn)
storeBarrier = false;
} else if (inst->isWriteBarrier()) {
if (storeBarrierSN == barr_sn)
storeBarrier = false;
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::wakeDependents(DynInstPtr &inst)
{
// Only stores and barriers have dependents.
if (!inst->isStore() && !inst->isMemBarrier() && !inst->isWriteBarrier()) {
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
return;
}
MemDepEntryPtr inst_entry = findInHash(inst);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
for (int i = 0; i < inst_entry->dependInsts.size(); ++i ) {
MemDepEntryPtr woken_inst = inst_entry->dependInsts[i];
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
if (!woken_inst->inst) {
// Potentially removed mem dep entries could be on this list
continue;
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
DPRINTF(MemDepUnit, "Waking up a dependent inst, "
"[sn:%lli].\n",
woken_inst->inst->seqNum);
if (woken_inst->regsReady && !woken_inst->squashed) {
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
moveToReady(woken_inst);
} else {
woken_inst->memDepReady = true;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
inst_entry->dependInsts.clear();
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::squash(const InstSeqNum &squashed_num,
ThreadID tid)
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
{
if (!instsToReplay.empty()) {
ListIt replay_it = instsToReplay.begin();
while (replay_it != instsToReplay.end()) {
if ((*replay_it)->threadNumber == tid &&
(*replay_it)->seqNum > squashed_num) {
instsToReplay.erase(replay_it++);
} else {
++replay_it;
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
}
ListIt squash_it = instList[tid].end();
--squash_it;
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
MemDepHashIt hash_it;
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
while (!instList[tid].empty() &&
(*squash_it)->seqNum > squashed_num) {
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
DPRINTF(MemDepUnit, "Squashing inst [sn:%lli]\n",
(*squash_it)->seqNum);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
if ((*squash_it)->seqNum == loadBarrierSN)
loadBarrier = false;
if ((*squash_it)->seqNum == storeBarrierSN)
storeBarrier = false;
hash_it = memDepHash.find((*squash_it)->seqNum);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
assert(hash_it != memDepHash.end());
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
(*hash_it).second->squashed = true;
(*hash_it).second = NULL;
memDepHash.erase(hash_it);
#ifdef DEBUG
MemDepEntry::memdep_erase++;
#endif
instList[tid].erase(squash_it--);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
// Tell the dependency predictor to squash as well.
depPred.squash(squashed_num, tid);
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::violation(DynInstPtr &store_inst,
DynInstPtr &violating_load)
{
DPRINTF(MemDepUnit, "Passing violating PCs to store sets,"
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
" load: %#x, store: %#x\n", violating_load->instAddr(),
store_inst->instAddr());
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
// Tell the memory dependence unit of the violation.
depPred.violation(store_inst->instAddr(), violating_load->instAddr());
Update to make multiple instruction issue and different latencies work. Also change to ref counted DynInst. SConscript: Add branch predictor, BTB, load store queue, and storesets. arch/isa_parser.py: Specify the template parameter for AlphaDynInst base/traceflags.py: Add load store queue, store set, and mem dependence unit to the list of trace flags. cpu/base_dyn_inst.cc: Change formating, add in debug statement. cpu/base_dyn_inst.hh: Change DynInst to be RefCounted, add flag to clear whether or not this instruction can commit. This is likely to be removed in the future. cpu/beta_cpu/alpha_dyn_inst.cc: AlphaDynInst has been changed to be templated, so now this CC file is just used to force instantiations of AlphaDynInst. cpu/beta_cpu/alpha_dyn_inst.hh: Changed AlphaDynInst to be templated on Impl. Removed some unnecessary functions. cpu/beta_cpu/alpha_full_cpu.cc: AlphaFullCPU has been changed to be templated, so this CC file is now just used to force instantation of AlphaFullCPU. cpu/beta_cpu/alpha_full_cpu.hh: Change AlphaFullCPU to be templated on Impl. cpu/beta_cpu/alpha_impl.hh: Update it to reflect AlphaDynInst and AlphaFullCPU being templated on Impl. Also removed time buffers from here, as they are really a part of the CPU and are thus in the CPU policy now. cpu/beta_cpu/alpha_params.hh: Make AlphaSimpleParams inherit from the BaseFullCPU so that it doesn't need to specifically declare any parameters that are already in the BaseFullCPU. cpu/beta_cpu/comm.hh: Changed the structure of the time buffer communication structs. Now they include the size of the packet of instructions it is sending. Added some parameters to the backwards communication struct, mainly for squashing. cpu/beta_cpu/commit.hh: Update typenames to reflect change in location of time buffer structs. Update DynInst to DynInstPtr (it is refcounted now). cpu/beta_cpu/commit_impl.hh: Formatting changes mainly. Also sends back proper information on branch mispredicts so that the bpred unit can update itself. Updated behavior for non-speculative instructions (stores, any other non-spec instructions): once they reach the head of the ROB, the ROB signals back to the IQ that it can go ahead and issue the non-speculative instruction. The instruction itself is updated so that commit won't try to commit it again until it is done executing. cpu/beta_cpu/cpu_policy.hh: Added branch prediction unit, mem dependence prediction unit, load store queue. Moved time buffer structs from AlphaSimpleImpl to here. cpu/beta_cpu/decode.hh: Changed typedefs to reflect change in location of time buffer structs and also the change from DynInst to ref counted DynInstPtr. cpu/beta_cpu/decode_impl.hh: Continues to buffer instructions even while unblocking now. Changed how it loops through groups of instructions so it can properly block during the middle of a group of instructions. cpu/beta_cpu/fetch.hh: Changed typedefs to reflect change in location of time buffer structs and the change to ref counted DynInsts. Also added in branch brediction unit. cpu/beta_cpu/fetch_impl.hh: Add in branch prediction. Changed how fetch checks inputs and its current state to make for easier logic. cpu/beta_cpu/free_list.cc: Changed int regs and float regs to logically use one flat namespace. Future change will be moving them to a single scoreboard to conserve space. cpu/beta_cpu/free_list.hh: Mostly debugging statements. Might be removed for performance in future. cpu/beta_cpu/full_cpu.cc: Added in some debugging statements. Updated BaseFullCPU to take a params object. cpu/beta_cpu/full_cpu.hh: Added params class within BaseCPU that other param classes will be able to inherit from. Updated typedefs to reflect change in location of time buffer structs and ref counted DynInst. cpu/beta_cpu/iew.hh: Updated typedefs to reflect change in location of time buffer structs and use of ref counted DynInsts. cpu/beta_cpu/iew_impl.hh: Added in load store queue, updated iew to be able to execute non- speculative instructions, instead of having them execute in commit. cpu/beta_cpu/inst_queue.hh: Updated change to ref counted DynInsts. Changed inst queue to hold non-speculative instructions as well, which are issued only when commit signals backwards that a nonspeculative instruction is at the head of the ROB. cpu/beta_cpu/inst_queue_impl.hh: Updated to allow for non-speculative instructions to be in the inst queue. Also added some debug functions. cpu/beta_cpu/regfile.hh: Added debugging statements, changed formatting. cpu/beta_cpu/rename.hh: Updated typedefs, added some functions to clean up code. cpu/beta_cpu/rename_impl.hh: Moved some code into functions to make it easier to read. cpu/beta_cpu/rename_map.cc: Changed int and float reg behavior to use a single flat namespace. In the future, the rename maps can be combined to a single rename map to save space. cpu/beta_cpu/rename_map.hh: Added destructor. cpu/beta_cpu/rob.hh: Updated it with change from DynInst to ref counted DynInst. cpu/beta_cpu/rob_impl.hh: Formatting, updated to use ref counted DynInst. cpu/static_inst.hh: Updated forward declaration for AlphaDynInst now that it is templated. --HG-- extra : convert_revision : 1045f240ee9b6a4bd368e1806aca029ebbdc6dd3
2004-09-23 20:06:03 +02:00
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::issue(DynInstPtr &inst)
{
DPRINTF(MemDepUnit, "Issuing instruction PC %#x [sn:%lli].\n",
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
inst->instAddr(), inst->seqNum);
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
depPred.issued(inst->instAddr(), inst->seqNum, inst->isStore());
}
template <class MemDepPred, class Impl>
inline typename MemDepUnit<MemDepPred,Impl>::MemDepEntryPtr &
MemDepUnit<MemDepPred, Impl>::findInHash(const DynInstPtr &inst)
{
MemDepHashIt hash_it = memDepHash.find(inst->seqNum);
assert(hash_it != memDepHash.end());
return (*hash_it).second;
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
template <class MemDepPred, class Impl>
inline void
MemDepUnit<MemDepPred, Impl>::moveToReady(MemDepEntryPtr &woken_inst_entry)
{
DPRINTF(MemDepUnit, "Adding instruction [sn:%lli] "
"to the ready list.\n", woken_inst_entry->inst->seqNum);
assert(!woken_inst_entry->squashed);
iqPtr->addReadyMemInst(woken_inst_entry->inst);
}
template <class MemDepPred, class Impl>
void
MemDepUnit<MemDepPred, Impl>::dumpLists()
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
{
for (ThreadID tid = 0; tid < Impl::MaxThreads; tid++) {
cprintf("Instruction list %i size: %i\n",
tid, instList[tid].size());
ListIt inst_list_it = instList[tid].begin();
int num = 0;
while (inst_list_it != instList[tid].end()) {
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
cprintf("Instruction:%i\nPC: %s\n[sn:%i]\n[tid:%i]\nIssued:%i\n"
"Squashed:%i\n\n",
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-31 08:07:20 +01:00
num, (*inst_list_it)->pcState(),
(*inst_list_it)->seqNum,
(*inst_list_it)->threadNumber,
(*inst_list_it)->isIssued(),
(*inst_list_it)->isSquashed());
inst_list_it++;
++num;
}
}
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
cprintf("Memory dependence hash size: %i\n", memDepHash.size());
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
#ifdef DEBUG
cprintf("Memory dependence entries: %i\n", MemDepEntry::memdep_count);
#endif
Check in of various updates to the CPU. Mainly adds in stats, improves branch prediction, and makes memory dependence work properly. SConscript: Added return address stack, tournament predictor. cpu/base_cpu.cc: Added debug break and print statements. cpu/base_dyn_inst.cc: cpu/base_dyn_inst.hh: Comment out possibly unneeded variables. cpu/beta_cpu/2bit_local_pred.cc: 2bit predictor no longer speculatively updates itself. cpu/beta_cpu/alpha_dyn_inst.hh: Comment formatting. cpu/beta_cpu/alpha_full_cpu.hh: Formatting cpu/beta_cpu/alpha_full_cpu_builder.cc: Added new parameters for branch predictors, and IQ parameters. cpu/beta_cpu/alpha_full_cpu_impl.hh: Register stats. cpu/beta_cpu/alpha_params.hh: Added parameters for IQ, branch predictors, and store sets. cpu/beta_cpu/bpred_unit.cc: Removed one class. cpu/beta_cpu/bpred_unit.hh: Add in RAS, stats. Changed branch predictor unit functionality so that it holds a history of past branches so it can update, and also hold a proper history of the RAS so it can be restored on branch mispredicts. cpu/beta_cpu/bpred_unit_impl.hh: Added in stats, history of branches, RAS. Now bpred unit actually modifies the instruction's predicted next PC. cpu/beta_cpu/btb.cc: Add in sanity checks. cpu/beta_cpu/comm.hh: Add in communication where needed, remove it where it's not. cpu/beta_cpu/commit.hh: cpu/beta_cpu/rename.hh: cpu/beta_cpu/rename_impl.hh: Add in stats. cpu/beta_cpu/commit_impl.hh: Stats, update what is sent back on branch mispredict. cpu/beta_cpu/cpu_policy.hh: Change the bpred unit being used. cpu/beta_cpu/decode.hh: cpu/beta_cpu/decode_impl.hh: Stats. cpu/beta_cpu/fetch.hh: Stats, change squash so it can handle squashes from decode differently than squashes from commit. cpu/beta_cpu/fetch_impl.hh: Add in stats. Change how a cache line is fetched. Update to work with caches. Also have separate functions for different behavior if squash is coming from decode vs commit. cpu/beta_cpu/free_list.hh: Remove some old comments. cpu/beta_cpu/full_cpu.cc: cpu/beta_cpu/full_cpu.hh: Added function to remove instructions from back of instruction list until a certain sequence number. cpu/beta_cpu/iew.hh: Stats, separate squashing behavior due to branches vs memory. cpu/beta_cpu/iew_impl.hh: Stats, separate squashing behavior for branches vs memory. cpu/beta_cpu/inst_queue.cc: Debug stuff cpu/beta_cpu/inst_queue.hh: Stats, change how mem dep unit works, debug stuff cpu/beta_cpu/inst_queue_impl.hh: Stats, change how mem dep unit works, debug stuff. Also add in parameters that used to be hardcoded. cpu/beta_cpu/mem_dep_unit.hh: cpu/beta_cpu/mem_dep_unit_impl.hh: Add in stats, change how memory dependence unit works. It now holds the memory instructions that are waiting for their memory dependences to resolve. It provides which instructions are ready directly to the IQ. cpu/beta_cpu/regfile.hh: Fix up sanity checks. cpu/beta_cpu/rename_map.cc: Fix loop variable type. cpu/beta_cpu/rob_impl.hh: Remove intermediate DynInstPtr cpu/beta_cpu/store_set.cc: Add in debugging statements. cpu/beta_cpu/store_set.hh: Reorder function arguments to match the rest of the calls. --HG-- extra : convert_revision : aabf9b1fecd1d743265dfc3b174d6159937c6f44
2004-10-22 00:02:36 +02:00
}