From: Bobby R. Bruce
Date: Fri, 3 Apr 2020 21:45:28 +0000 (-0700)
Subject: misc: Removed unneeded Doxygen pages
X-Git-Tag: v20.0.0.0~162
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=8be39b3059bee5be69e7d2687f6ef180acf6dae2;p=gem5.git

misc: Removed unneeded Doxygen pages

These removed doxygen files have already been migrated to the gem5 website.

inside-minor.doxygen:
    www.gem5.org/documentation/general_docs/cpu_models/minor_cpu
memory_system.doxygen:
    www.gem5.org/documentation/general_docs/memory_system/gem5_memory_system
power_thermal_model.doxygen:
    www.gem5.org/documentation/general_docs/thermal_model

Issue-on: https://gem5.atlassian.net/browse/GEM5-229
Change-Id: Ib36c364def2dae06a0efbedd3d398763ae7d4e21
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27487
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Reviewed-by: Jason Lowe-Power
Maintainer: Jason Lowe-Power
---

diff --git a/src/doc/inside-minor.doxygen b/src/doc/inside-minor.doxygen
deleted file mode 100644
index 9db3d6876..000000000
--- a/src/doc/inside-minor.doxygen
+++ /dev/null
@@ -1,1089 +0,0 @@
# Copyright (c) 2014 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder.  You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

namespace Minor
{

/*!

\page minor Inside the Minor CPU model

\tableofcontents

This document contains a description of the structure and function of the
Minor gem5 in-order processor model.  It is recommended reading for anyone who
wants to understand Minor's internal organisation, design decisions, C++
implementation and Python configuration.  A familiarity with gem5 and some of
its internal structures is assumed.
This document is meant to be read alongside the Minor source code and to
explain its general structure without being too slavish about naming every
function and data type.

\section whatis What is Minor?

Minor is an in-order processor model with a fixed pipeline but configurable
data structures and execute behaviour.  It is intended to be used to model
processors with strict in-order execution behaviour and allows visualisation
of an instruction's position in the pipeline through the
MinorTrace/minorview.py format/tool.  The intention is to provide a framework
for micro-architecturally correlating the model with a particular, chosen
processor with similar capabilities.

\section philo Design philosophy

\subsection mt Multithreading

The model isn't currently capable of multithreading but there are THREAD
comments in key places where stage data needs to be arrayed to support
multithreading.

\subsection structs Data structures

Decorating data structures with large amounts of life-cycle information is
avoided.  Only instructions (MinorDynInst) contain a significant proportion of
their data content whose values are not set at construction.

All internal structures have fixed sizes on construction.  Data held in queues
and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface to
allow a distinct 'bubble'/no data value option for each type.

Inter-stage 'struct' data is packaged in structures which are passed by value.
Only MinorDynInst, the line data in ForwardLineData and the memory-interfacing
objects Fetch1::FetchRequest and LSQ::LSQRequest are '::new' allocated while
running the model.

\section model Model structure

Objects of class MinorCPU are provided by the model to gem5.  MinorCPU
implements the interfaces of BaseCPU (cpu.hh) and can provide data and
instruction interfaces for connection to a cache system.  The model is
configured in a similar way to other gem5 models through Python.  That
configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
actually implements the processor pipeline.

The hierarchy of major unit ownership from MinorCPU down looks like this:
 - MinorCPU
   - Pipeline - container for the pipeline, owns the cyclic 'tick' event
     mechanism and the idling (cycle skipping) mechanism.
     - Fetch1 - instruction fetch unit responsible for fetching cache lines
       (or parts of lines) from the I-cache interface
       - Fetch1::IcachePort - interface to the I-cache from Fetch1
     - Fetch2 - line to instruction decomposition
     - Decode - instruction to micro-op decomposition
     - Execute - instruction execution and data memory interface
       - LSQ - load store queue for memory ref. instructions
       - LSQ::DcachePort - interface to the D-cache from Execute
\section keystruct Key data structures

\subsection ids Instruction and line identity: InstId (dyn_inst.hh)

An InstId contains the sequence numbers and thread numbers that describe the
life cycle and instruction stream affiliations of individual fetched cache
lines and instructions.

An InstId is printed in one of the following forms:

 - T/S.P/L - for fetched cache lines
 - T/S.P/L/F - for instructions before Decode
 - T/S.P/L/F.E - for instructions from Decode onwards

for example:

 - 0/10.12/5/6.7

InstId's fields are:

<table>
<tr><th>Field</th><th>Symbol</th><th>Generated by</th><th>Checked by</th>
    <th>Function</th></tr>
<tr><td>InstId::threadId</td><td>T</td><td>Fetch1</td>
    <td>Everywhere the thread number is needed</td>
    <td>Thread number (currently always 0).</td></tr>
<tr><td>InstId::streamSeqNum</td><td>S</td><td>Execute</td>
    <td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
    <td>Stream sequence number as chosen by Execute.  Stream sequence
    numbers change after changes of PC (branches, exceptions) in Execute
    and are used to separate pre and post branch instruction
    streams.</td></tr>
<tr><td>InstId::predictionSeqNum</td><td>P</td><td>Fetch2</td>
    <td>Fetch2 (while discarding lines after prediction)</td>
    <td>Prediction sequence numbers represent branch prediction decisions.
    This is used by Fetch2 to mark lines/instructions according to the last
    followed branch prediction made by Fetch2.  Fetch2 can signal to Fetch1
    that it should change its fetch address and mark lines with a new
    prediction sequence number (which it will only do if the stream sequence
    number Fetch1 expects matches that of the request).</td></tr>
<tr><td>InstId::lineSeqNum</td><td>L</td><td>Fetch1</td>
    <td>(Just for debugging)</td>
    <td>Line fetch sequence number of this cache line or the line this
    instruction was extracted from.</td></tr>
<tr><td>InstId::fetchSeqNum</td><td>F</td><td>Fetch2</td>
    <td>Fetch2 (as the inst. sequence number for branches)</td>
    <td>Instruction fetch order assigned by Fetch2 when lines are
    decomposed into instructions.</td></tr>
<tr><td>InstId::execSeqNum</td><td>E</td><td>Decode</td>
    <td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
    <td>Instruction order after micro-op decomposition.</td></tr>
</table>

The sequence number fields are all independent of each other and although, for
instance, InstId::execSeqNum for an instruction will always be >=
InstId::fetchSeqNum, the comparison is not useful.

The originating stage of each sequence number field keeps a counter for that
field which can be incremented in order to generate new, unique numbers.

\subsection insts Instructions: MinorDynInst (dyn_inst.hh)

MinorDynInst represents an instruction's progression through the pipeline.  An
instruction can be three things:
<table>
<tr><th>Thing</th><th>Predicate</th><th>Explanation</th></tr>
<tr><td>A bubble</td><td>MinorDynInst::isBubble()</td>
    <td>no instruction at all, just a space-filler</td></tr>
<tr><td>A fault</td><td>MinorDynInst::isFault()</td>
    <td>a fault to pass down the pipeline in an instruction's clothing</td></tr>
<tr><td>A decoded instruction</td><td>MinorDynInst::isInst()</td>
    <td>instructions are actually passed to the gem5 decoder in Fetch2 and so
    are created fully decoded.  MinorDynInst::staticInst is the decoded
    instruction form.</td></tr>
</table>

Instructions are reference counted using the gem5 RefCountingPtr
(base/refcnt.hh) wrapper.  They therefore usually appear as MinorDynInstPtr in
code.  Note that as RefCountingPtr initialises as nullptr rather than an
object that supports BubbleIF::isBubble, passing raw MinorDynInstPtrs to
Queue%s and other similar structures from stage.hh without boxing is
dangerous.

\subsection fld ForwardLineData (pipe_data.hh)

ForwardLineData is used to pass cache lines from Fetch1 to Fetch2.  Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying or can contain a line (partial line) fetched by Fetch1.  The
data carried by ForwardLineData is owned by a Packet object returned from
memory, is explicitly memory managed and so must be deleted once processed
(by Fetch2 deleting the Packet).

\subsection fid ForwardInstData (pipe_data.hh)

ForwardInstData can contain up to ForwardInstData::width() instructions in its
ForwardInstData::insts vector.  This structure is used to carry instructions
between Fetch2, Decode and Execute and to store input buffer vectors in Decode
and Execute.

\subsection fr Fetch1::FetchRequest (fetch1.hh)

FetchRequests represent I-cache line fetch requests.  They are used in the
memory queues of Fetch1 and are pushed into/popped from Packet::senderState
while traversing the memory system.

FetchRequests contain a memory system Request (mem/request.hh) for that fetch
access, a packet (Packet, mem/packet.hh), if the request gets to memory, and a
fault field that can be populated with a TLB-sourced prefetch fault (if any).

\subsection lsqr LSQ::LSQRequest (execute.hh)

LSQRequests are similar to FetchRequests but for D-cache accesses.  They carry
the instruction associated with a memory access.

\section pipeline The pipeline

\verbatim
------------------------------------------------------------------------------
 Key:

 [] : inter-stage MinorBuffer
 ,--.
 |  | : pipeline stage
 `--'
 ---> : forward communication
 <--- : backward communication

 rv : reservation information for input buffers

               ,------.     ,------.     ,------.     ,-------.
 (from --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
  Execute)  |  |      |<-[]-|      |<-rv-|      |<-rv-|       |     & Fetch2)
            |  `------'<-rv-|      |     |      |     |       |
            `-------------->|      |     |      |     |       |
                            `------'     `------'     `-------'
------------------------------------------------------------------------------
\endverbatim

The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled.  There is a MinorBuffer between adjacent
stages in the forward direction (for example: passing lines from Fetch1 to
Fetch2) and, between Fetch2 and Fetch1, a buffer in the backwards direction
carrying branch predictions.

Stages Fetch2, Decode and Execute have input buffers which, each cycle, can
accept input data from the previous stage and can hold that data if the stage
is not ready to process it.  Input buffers store data in the same form as it
is received and so Decode and Execute's input buffers contain the output
instruction vector (ForwardInstData (pipe_data.hh)) from their previous stages
with the instructions and bubbles in the same positions as a single buffer
entry.

Stage input buffers provide a Reservable (stage.hh) interface to their
previous stages, to allow slots to be reserved in their input buffers, and
communicate their input buffer occupancy backwards to allow the previous stage
to plan whether it should make an output in a given cycle.

\subsection events Event handling: MinorActivityRecorder (activity.hh,
pipeline.hh)

Minor is essentially a cycle-callable model with some ability to skip cycles
based on pipeline activity.  External events are mostly received by callbacks
(e.g.
Fetch1::IcachePort::recvTimingResp) and cause the pipeline to be woken
up to service advancing request queues.

Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject.  It provides a Ticked::start/stop
interface to start and pause clock events from being periodically issued.
Pipeline is a derived class of Ticked.

During evaluate calls, stages can signal that they still have work to do in
the next cycle by calling either MinorCPU::activityRecorder->activity() (for
non-callable related activity) or MinorCPU::wakeupOnEvent() (for
stage callback-related 'wakeup' activity).

Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.

Within Pipeline (pipeline.hh), the stages are evaluated in reverse order (and
so will ::evaluate in reverse order) and their backwards data can be
read immediately after being written in each cycle allowing output decisions
to be 'perfect' (allowing synchronous stalling of the whole pipeline).  Branch
predictions from Fetch2 to Fetch1 can also be transported in 0 cycles making
fetch1ToFetch2BackwardDelay the only configurable delay which can be set as
low as 0 cycles.

The MinorCPU::activateContext and MinorCPU::suspendContext interface can be
called to start and pause threads (threads in the MT sense) and to start and
pause the pipeline.  Executing instructions can call this interface
(indirectly through the ThreadContext) to idle the CPU/their threads.
\subsection stages Each pipeline stage

In general, the behaviour of a stage (each cycle) is:

\verbatim
 evaluate:
     push input to inputBuffer
     setup references to input/output data slots

     do 'every cycle' 'step' tasks

     if there is input and there is space in the next stage:
         process and generate a new output
         maybe re-activate the stage

     send backwards data

     if the stage generated output to the following FIFO:
         signal pipe activity

     if the stage has more processable input and space in the next stage:
         re-activate the stage for the next cycle

     commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

The Execute stage differs from this model as its forward output (branch) data
is unconditionally sent to Fetch1 and Fetch2.  To allow this behaviour, Fetch1
and Fetch2 must be unconditionally receptive to that data.

\subsection fetch1 Fetch1 stage

Fetch1 is responsible for fetching cache lines or partial cache lines from the
I-cache and passing them on to Fetch2 to be decomposed into instructions.  It
can receive 'change of stream' indications from both Execute and Fetch2 to
signal that it should change its internal fetch address and tag newly fetched
lines with new stream or prediction sequence numbers.  When both Execute and
Fetch2 signal changes of stream at the same time, Fetch1 takes Execute's
change.

Every line issued by Fetch1 will bear a unique line sequence number which can
be used for debugging stream changes.

When fetching from the I-cache, Fetch1 will ask for data from the current
fetch address (Fetch1::pc) up to the end of the 'data snap' size set in the
parameter fetch1LineSnapWidth.  Subsequent autonomous line fetches will fetch
whole lines at a snap boundary and of size fetch1LineWidth.

Fetch1 will only initiate a memory fetch if it can reserve space in Fetch2's
input buffer.  That input buffer serves as the fetch queue/LFL for the system.
Fetch1 contains two queues: requests and transfers to handle the stages of
translating the address of a line fetch (via the TLB) and accommodating the
request/response of fetches to/from memory.

Fetch requests from Fetch1 are pushed into the requests queue as newly
allocated FetchRequest objects once they have been sent to the ITLB with a
call to itb->translateTiming.

A response from the TLB moves the request from the requests queue to the
transfers queue.  If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of the
requests queue.  In that case, the TLB response is marked up as a state change
to Translated in the request object, and advancing the request to transfers
(and the memory system) is left to calls to Fetch1::stepQueues which is called
in the cycle following any event being received.

Fetch1::tryToSendToTransfers is responsible for moving requests between the
two queues and issuing requests to memory.  Failed TLB lookups (prefetch
aborts) continue to occupy space in the queues until they are recovered at the
head of transfers.

Responses from memory change the request object state to Complete and
Fetch1::evaluate can pick up response data, package it in the ForwardLineData
object, and forward it to Fetch2%'s input buffer.

As space is always reserved in Fetch2::inputBuffer, setting the input buffer's
size to 1 results in non-prefetching behaviour.

When a change of stream occurs, translated requests queue members and
completed transfers queue members can be unconditionally discarded to make way
for new transfers.

\subsection fetch2 Fetch2 stage

Fetch2 receives a line from Fetch1 into its input buffer.  The data in the
head line in that buffer is iterated over and separated into individual
instructions which are packed into a vector of instructions which can be
passed to Decode.  Packing instructions can be aborted early if a fault is
found in either the input line as a whole or a decomposed instruction.

\subsubsection bp Branch prediction

Fetch2 contains the branch prediction mechanism.  This is a wrapper around the
branch predictor interface provided by gem5 (cpu/pred/...).

Branches are predicted for any control instructions found.  If prediction is
attempted for an instruction, the MinorDynInst::triedToPredict flag is set on
that instruction.

When a branch is predicted to be taken, the MinorDynInst::predictedTaken flag
is set and MinorDynInst::predictedTarget is set to the predicted target PC
value.  The predicted branch instruction is then packed into Fetch2%'s output
vector, the prediction sequence number is incremented, and the branch is
communicated to Fetch1.

After signalling a prediction, Fetch2 will discard its input buffer contents
and will reject any new lines which have the same stream sequence number as
that branch but have a different prediction sequence number.  This allows
following sequentially fetched lines to be rejected without ignoring new lines
generated by a change of stream indicated from a 'real' branch from Execute
(which will have a new stream sequence number).

The program counter value provided to Fetch2 by Fetch1 packets is only updated
when there is a change of stream.  Fetch2::havePC indicates whether the PC
will be picked up from the next processed input line.  Fetch2::havePC is
necessary to allow line-wrapping instructions to be tracked through decode.

Branches (and instructions predicted to branch) which are processed by Execute
will generate BranchData (pipe_data.hh) data explaining the outcome of the
branch which is sent forwards to Fetch1 and Fetch2.  Fetch1 uses this data to
change stream (and update its stream sequence number and address for new
lines).  Fetch2 uses it to update the branch predictor.  Minor does not
communicate branch data to the branch predictor for instructions which are
discarded on the way to commit.

BranchData::BranchReason (pipe_data.hh) encodes the possible branch scenarios:
<table>
<tr><th>Branch enum val.</th><th>In Execute</th><th>Fetch1 reaction</th>
    <th>Fetch2 reaction</th></tr>
<tr><td>NoBranch</td><td>(output bubble data)</td><td>-</td><td>-</td></tr>
<tr><td>CorrectlyPredictedBranch</td><td>Predicted, taken</td><td>-</td>
    <td>Update BP as taken branch</td></tr>
<tr><td>UnpredictedBranch</td><td>Not predicted, taken and was taken</td>
    <td>New stream</td><td>Update BP as taken branch</td></tr>
<tr><td>BadlyPredictedBranch</td><td>Predicted, not taken</td>
    <td>New stream to restore to old inst. source</td>
    <td>Update BP as not taken branch</td></tr>
<tr><td>BadlyPredictedBranchTarget</td>
    <td>Predicted, taken, but to a different target than predicted one</td>
    <td>New stream</td><td>Update BTB to new target</td></tr>
<tr><td>SuspendThread</td><td>Hint to suspend fetching</td>
    <td>Suspend fetch for this thread (branch to next inst. as wakeup
    fetch addr)</td><td>-</td></tr>
<tr><td>Interrupt</td><td>Interrupt detected</td><td>New stream</td>
    <td>-</td></tr>
</table>
The parameter decodeInputWidth sets the number of instructions which can be
packed into the output per cycle.  If the parameter fetch2CycleInput is true,
Fetch2 can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection decode Decode stage

Decode takes a vector of instructions from Fetch2 (via its input buffer) and
decomposes those instructions into micro-ops (if necessary) and packs them
into its output instruction vector.

The parameter executeInputWidth sets the number of instructions which can be
packed into the output per cycle.  If the parameter decodeCycleInput is true,
Decode can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection execute Execute stage

Execute provides all the instruction execution and memory access mechanisms.
An instruction's passage through Execute can take multiple cycles with its
precise timing modelled by a functional unit pipeline FIFO.

A vector of instructions (possibly including fault 'instructions') is provided
to Execute by Decode and can be queued in the Execute input buffer before
being issued.  Setting the parameter executeCycleInput allows Execute to
examine more than one input buffer entry (more than one instruction vector).
The number of instructions in the input vector can be set with
executeInputWidth and the depth of the input buffer can be set with the
parameter executeInputBufferSize.

\subsubsection fus Functional units

The Execute stage contains pipelines for each functional unit comprising the
computational core of the CPU.  Functional units are configured via the
executeFuncUnits parameter.  Each functional unit has a number of instruction
classes it supports, a stated delay between instruction issues, a delay from
instruction issue to (possible) commit, and an optional timing annotation
capable of more complicated timing.
Each active cycle, Execute::evaluate performs this action:

\verbatim
 Execute::evaluate:
     push input to inputBuffer
     setup references to input/output data slots and branch output slot

     step D-cache interface queues (similar to Fetch1)

     if interrupt posted:
         take interrupt (signalling branch to Fetch1/Fetch2)
     else
         commit instructions
         issue new instructions

     advance functional unit pipelines

     reactivate Execute if the unit is still active

     commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

\subsubsection fifos Functional unit FIFOs

Functional units are implemented as SelfStallingPipelines (stage.hh).  These
are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires.  They respond
to SelfStallingPipeline::advance in the same way as TimeBuffers unless
there is data at the far, 'pop', end of the FIFO.  A 'stalled' flag is
provided for signalling stalling and to allow a stall to be cleared.  The
intention is to provide a pipeline for each functional unit which will never
advance an instruction out of that pipeline until it has been processed and
the pipeline is explicitly unstalled.

The actions 'issue', 'commit', and 'advance' act on the functional units.

\subsubsection issue Issue

Issuing instructions involves iterating over both the input buffer
instructions and the heads of the functional units to try and issue
instructions in order.  The number of instructions which can be issued each
cycle is limited by the parameter executeIssueLimit, how executeCycleInput is
set, the availability of pipeline space and the policy used to choose a
pipeline in which the instruction can be issued.

At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence.  For greater flexibility,
better (and more specific) policies will need to be possible.
Memory operation instructions traverse their functional units to perform their
EA calculations.  On 'commit', the ExecContext::initiateAcc execution phase is
performed and any memory access is issued (via ExecContext::{read,write}Mem
calling LSQ::pushRequest) to the LSQ.

Note that faults are issued as if they are instructions and can (currently) be
issued to *any* functional unit.

Every issued instruction is also pushed into the Execute::inFlightInsts queue.
Memory ref. instructions are also pushed into the Execute::inFUMemInsts queue.

\subsubsection commit Commit

Instructions are committed by examining the head of the Execute::inFlightInsts
queue (which is decorated with the functional unit number to which the
instruction was issued).  Instructions which can then be found in their
functional units are executed and popped from Execute::inFlightInsts.

Memory operation instructions are committed into the memory queues (as
described above) and exit their functional unit pipeline but are not popped
from the Execute::inFlightInsts queue.  The Execute::inFUMemInsts queue
provides ordering to memory operations as they pass through the functional
units (maintaining issue order).  On entering the LSQ, instructions are popped
from Execute::inFUMemInsts.

If the parameter executeAllowEarlyMemoryIssue is set, memory operations can be
sent from their FU to the LSQ before reaching the head of
Execute::inFlightInsts but after their dependencies are met.
MinorDynInst::instToWaitFor is marked up with the latest dependent instruction
execSeqNum required to be committed for a memory operation to progress to the
LSQ.

Once a memory response is available (by testing the head of
Execute::inFlightInsts against LSQ::findResponse), commit will process that
response (ExecContext::completeAcc) and pop the instruction from
Execute::inFlightInsts.

Any branch, fault or interrupt will cause a stream sequence number change and
signal a branch to Fetch1/Fetch2.  Only instructions with the current stream
sequence number will be issued and/or committed.

\subsubsection advance Advance

All non-stalled pipelines are advanced and may, thereafter, become stalled.
Potential activity in the next cycle is signalled if there are any
instructions remaining in any pipeline.

\subsubsection sb Scoreboard

The scoreboard (Scoreboard) is used to control instruction issue.  It contains
a count of the number of in-flight instructions which will write each general
purpose CPU integer or float register.  Instructions will only be issued where
the scoreboard contains a count of 0 instructions which will write to any of
the instruction's source registers.

Once an instruction is issued, the scoreboard counts for each destination
register of that instruction will be incremented.

The estimated delivery time of the instruction's result is marked up in the
scoreboard by adding the length of the issued-to FU to the current time.  The
timings parameter on each FU provides a list of additional rules for
calculating the delivery time.  These are documented in the parameter comments
in MinorCPU.py.

On commit (for memory operations, memory response commit), the scoreboard
counters for an instruction's destination registers are decremented.

\subsubsection ifi Execute::inFlightInsts

The Execute::inFlightInsts queue will always contain all instructions in
flight in Execute in the correct issue order.  Execute::issue is the only
process which will push an instruction into the queue.  Execute::commit is the
only process that can pop an instruction.

\subsubsection lsq LSQ

The LSQ can support multiple outstanding transactions to memory in a number of
conservative cases.

There are three queues to contain requests: requests, transfers and the store
buffer.  The requests and transfers queues operate in a similar manner to the
queues in Fetch1.
The store buffer is used to decouple the delay of
completing store operations from following loads.

Requests are issued to the DTLB as their instructions leave their functional
unit.  At the head of requests, cacheable load requests can be sent to memory
and on to the transfers queue.  Cacheable stores will be passed to transfers
unprocessed and progress through that queue maintaining order with other
transactions.

The conditions in LSQ::tryToSendToTransfers dictate when requests can
be sent to memory.

All uncacheable transactions, split transactions and locked transactions are
processed in order at the head of requests.  Additionally, store results
residing in the store buffer can have their data forwarded to cacheable loads
(removing the need to perform a read from memory) but no cacheable load can be
issued to the transfers queue until that queue's stores have drained into the
store buffer.

At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute and committed
(ExecContext::completeAcc) and, for stores, sent to the store buffer.

Barrier instructions do not prevent cacheable loads from progressing to memory
but do cause a stream change which will discard that load.  Stores will not be
committed to the store buffer if they are in the shadow of the barrier but
before the new instruction stream has arrived at Execute.  As all other memory
transactions are delayed at the end of the requests queue until they are at
the head of Execute::inFlightInsts, they will be discarded by any barrier
stream change.

After commit, LSQ::BarrierDataRequest requests are inserted into the
store buffer to track each barrier until all preceding memory transactions
have drained from the store buffer.  No further memory transactions will be
issued from the ends of FUs until after the barrier has drained.
\subsubsection drain Draining

Draining is mostly handled by the Execute stage.  When initiated by calling
MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
each cycle and keeps the pipeline active until draining is complete.  It is
Pipeline that signals the completion of draining.  Execute is triggered by
MinorCPU::drain and starts stepping through its Execute::DrainState state
machine, starting from state Execute::NotDraining, in this order:

<table>
<tr><th>State</th><th>Meaning</th></tr>
<tr><td>Execute::NotDraining</td>
    <td>Not trying to drain, normal execution</td></tr>
<tr><td>Execute::DrainCurrentInst</td>
    <td>Draining micro-ops to complete inst.</td></tr>
<tr><td>Execute::DrainHaltFetch</td>
    <td>Halt fetching instructions</td></tr>
<tr><td>Execute::DrainAllInsts</td>
    <td>Discarding all instructions presented</td></tr>
</table>
- -When complete, a drained Execute unit will be in the Execute::DrainAllInsts -state where it will continue to discard instructions but has no knowledge of -the drained state of the rest of the model. - -\section debug Debug options - -The model provides a number of debug flags which can be passed to gem5 with -the --debug-flags option. - -The available flags are: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Debug flagUnit which will generate debugging output
ActivityDebug ActivityMonitor actions
BranchFetch2 and Execute branch prediction decisions
MinorCPUCPU global actions such as wakeup/thread suspension
DecodeDecode
MinorExecExecute behaviour
FetchFetch1 and Fetch2
MinorInterruptExecute interrupt handling
MinorMemExecute memory interactions
MinorScoreboardExecute scoreboard activity
MinorTraceGenerate MinorTrace cyclic state trace output (see below)
MinorTimingMinorTiming instruction timing modification operations
- -The group flag Minor enables all of the flags beginning with Minor. - -\section trace MinorTrace and minorview.py - -The debug flag MinorTrace causes cycle-by-cycle state data to be printed which -can then be processed and viewed by the minorview.py tool. This output is -very verbose and so it is recommended it only be used for small examples. - -\subsection traceformat MinorTrace format - -There are three types of line outputted by MinorTrace: - -\subsubsection state MinorTrace - Ticked unit cycle state - -For example: - -\verbatim - 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0 -\endverbatim - -For each time step, the MinorTrace flag will cause one MinorTrace line to be -printed for every named element in the model. - -\subsubsection traceunit MinorInst - summaries of instructions issued by \ - Decode - -For example: - -\verbatim - 140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \ - inst=" mov r0, #0" class=IntAlu -\endverbatim - -MinorInst lines are currently only generated for instructions which are -committed. - -\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by \ - Fetch1 - -For example: - -\verbatim - 92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \ - vaddr=0x5c paddr=0x5c -\endverbatim - -\subsection minorview minorview.py - -Minorview (util/minorview.py) can be used to visualise the data created by -MinorTrace. 
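As an aside, the three MinorTrace line formats shown above are regular enough
to pull apart with a short script.  The sketch below is illustrative Python,
not minorview.py's actual parser: it splits a record into its timestamp, unit
name, record type and key=value fields.

```python
import re

# Matches lines such as:
#  110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
LINE_RE = re.compile(r"\s*(\d+): (\S+): (MinorTrace|MinorInst|MinorLine): (.*)")

def parse_minor_line(line):
    """Split one trace line into (tick, unit, kind, {key: value}), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # not a Minor trace record (e.g. other debug output)
    tick, unit, kind, rest = m.groups()
    # Naive split: quoted values with spaces (e.g. MinorInst's inst="...")
    # would need more careful tokenising than this sketch does.
    fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    return int(tick), unit, kind, fields

rec = parse_minor_line(
    " 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning"
    " in_tlb_mem=0/0")
assert rec == (110000, "system.cpu.dcachePort", "MinorTrace",
               {"state": "MemoryRunning", "in_tlb_mem": "0/0"})
```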
\verbatim
usage: minorview.py [-h] [--picture picture-file] [--prefix name]
                    [--start-time time] [--end-time time] [--mini-views]
                    event-file

Minor visualiser

positional arguments:
  event-file

optional arguments:
  -h, --help            show this help message and exit
  --picture picture-file
                        markup file containing blob information (default:
                        /minor.pic)
  --prefix name         name prefix in trace for CPU to be visualised
                        (default: system.cpu)
  --start-time time     time of first event to load from file
  --end-time time       time of last event to load from file
  --mini-views          show tiny views of the next 10 time steps
\endverbatim

Raw debugging output can be passed to minorview.py as the event-file.  It
will pick out the MinorTrace lines; other lines which name units in the
simulation (such as system.cpu.dcachePort in the above example) will appear
as 'comments' when those units are clicked on in the visualiser.

Clicking on a unit which contains instructions or lines will bring up a speech
bubble giving extra information derived from the MinorInst/MinorLine lines.

--start-time and --end-time allow only sections of debug files to be loaded.

--prefix allows the name prefix of the CPU to be inspected to be supplied.
This defaults to 'system.cpu'.

In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.

The diagonally striped coloured blocks show the InstId of the
instruction or line they represent.  Note that lines in Fetch1 and f1ToF2.F
only show the id fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers.  The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display.  Useful combinations are:

<table>
<tr><th>Combination</th><th>Reason</th></tr>
<tr><td>E</td><td>just show the final execute sequence number</td></tr>
<tr><td>F/E</td><td>show the instruction-related numbers</td></tr>
<tr><td>S/P</td><td>show just the stream-related numbers (watch the stream
    sequence change with branches and not change with predicted
    branches)</td></tr>
<tr><td>S/E</td><td>show instructions and their stream</td></tr>
</table>

The key to the right shows all the displayable colours (some of the colour
choices are quite bad!):

<table>
<tr><th>Symbol</th><th>Meaning</th></tr>
<tr><td>U</td><td>Unknown data</td></tr>
<tr><td>B</td><td>Blocked stage</td></tr>
<tr><td>-</td><td>Bubble</td></tr>
<tr><td>E</td><td>Empty queue slot</td></tr>
<tr><td>R</td><td>Reserved queue slot</td></tr>
<tr><td>F</td><td>Fault</td></tr>
<tr><td>r</td><td>Read (used as the leftmost stripe on data in the
    dcachePort)</td></tr>
<tr><td>w</td><td>Write (used in the same way)</td></tr>
<tr><td>0 to 9</td><td>last decimal digit of the corresponding data</td></tr>
</table>

\verbatim
 ,---------------.     .--------------.      *U
 | |=|->|=|->|=| |     ||=|||->||->|| |      *-   <- Fetch queues/LSQ
 `---------------'     `--------------'      *R
  ===  ======                                *w   <- Activity/Stage activity
                   ,--------------.          *1
 ,--. ,.  ,.       | ============ |          *3   <- Scoreboard
 |  |-\[]-\||-\[]-\||-\[]-\|  ============ | *5   <- Execute::inFlightInsts
 |  | :[] :||-/[]-/||-/[]-/| -.  -------- |  *7
 |  |-/[]-/||  ^  ||       |  | --------- |  *9
 |  |      ||  |  ||       |  |    ------ |
[]->|  |  ->||  |  ||       |  |      ---- |
 |  |<-[]<-||<-+-<-||<-[]<-|  |    ------ |->[]  <- Execute to Fetch1,
 '--`      `'  ^  `'       | -'    ------ |         Fetch2 branch data
  ---.         |  ---.     `--------------'
  ---'         |  ---'        ^        ^
   |           ^   |          `------------ Execute
 MinorBuffer --'  input       `------------ Execute input buffer
                  buffer
\endverbatim

Stages show the colours of the instructions currently being
generated/processed.

Forward FIFOs between stages show the data being pushed into them at the
current tick (to the left), the data in transit, and the data available at
their outputs (to the right).

The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.

In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked.  Each FIFO
has, therefore, an extra slot to show the asserted new input data, as well as
all the data currently within the FIFO.

Input buffers for each stage are shown below the corresponding stage and show
the contents of those buffers as horizontal strips.  Strips marked as reserved
(cyan by default) are reserved to be filled by the previous stage.  An input
buffer with all reserved or occupied slots will, therefore, block the previous
stage from generating output.

Fetch queues and LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours at the top of their frames.
Inside Execute, the horizontal bars represent the individual FU pipelines.
The vertical bar to the left is the input buffer and the bar to the right, the
instructions committed this cycle.  The background of Execute shows
instructions which are being committed this cycle in their original FU
pipeline positions.

The strip at the top of the Execute block shows the current streamSeqNum that
Execute is committing.  A similar stripe at the top of Fetch1 shows that
stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
issuing predictionSeqNum.

The scoreboard shows the number of instructions in flight which will commit a
result to the register in the position shown.  The scoreboard contains slots
for each integer and floating point register.

The Execute::inFlightInsts queue shows all the instructions in flight in
Execute with the oldest instruction (the next instruction to be committed) to
the right.

'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.

\subsection picformat minor.pic format

The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser.  Its format is described in the supplied
minor.pic file.

*/

}
diff --git a/src/doc/memory_system.doxygen b/src/doc/memory_system.doxygen
deleted file mode 100644
index 4fe982068..000000000
--- a/src/doc/memory_system.doxygen
+++ /dev/null
@@ -1,278 +0,0 @@
# Copyright (c) 2012 ARM Limited
# All rights reserved
# [BSD licence text as in the notice above]
# Author: Djordje Kovacevic

/*! \page gem5MemorySystem Memory System in gem5

  \tableofcontents

  This document describes the memory subsystem in gem5, with a focus on
  program flow during the CPU's simple memory transactions (read or write).
\section gem5_MS_MH MODEL HIERARCHY

The model used in this document consists of two out-of-order (O3)
ARM v7 CPUs with corresponding L1 data caches and Simple Memory.  It is
created by running gem5 with the following parameters:

configs/example/fs.py --caches --cpu-type=arm_detailed --num-cpus=2

gem5 uses SimObject-derived objects as the basic blocks for building the
memory system.  They are connected via ports with an established master/slave
hierarchy.  Data flow is initiated on the master port, while response messages
and snoop queries appear on the slave port.  The following figure shows the
hierarchy of Simulation Objects used in this document:

\image html "gem5_MS_Fig1.PNG" "Simulation Object hierarchy of the model" width=3cm

\section gem5_CPU CPU

It is not in the scope of this document to describe the O3 CPU model in
detail, so here are only a few relevant notes about the model:

- A read access is initiated by sending a message through the port towards the
  DCache object.  If DCache rejects the message (for being blocked or busy),
  the CPU will flush the pipeline and the access will be re-attempted later
  on.  The access is completed upon receiving a reply message (ReadResp) from
  DCache.

- A write access is initiated by storing the request into the store buffer,
  whose content is emptied and sent to DCache on every tick.  DCache may also
  reject the request.  A write access is completed when a write reply
  (WriteResp) message is received from DCache.

- The load and store buffers (for read and write access) don't impose any
  restriction on the number of active memory accesses.  Therefore, the
  maximum number of outstanding CPU memory access requests is not limited by
  the CPU Simulation Object but by the underlying memory system model.

- Split memory access is implemented.

- The message that is sent by the CPU contains the memory type (Normal,
  Device, Strongly Ordered, and cacheability) of the accessed region.
However, this is not used by the rest of the model, which takes a more
simplified approach to memory types.

\section gem5_DCache DATA CACHE OBJECT

The Data Cache object implements a standard cache structure:

\image html "gem5_MS_Fig2.PNG" "DCache Simulation Object" width=3cm

- Cached memory reads that match a particular cache tag (with Valid & Read
  flags) will be completed (by sending ReadResp to the CPU) after a
  configurable time.  Otherwise, the request is forwarded to the Miss Status
  and Handling Register (MSHR) block.

- Cached memory writes that match a particular cache tag (with Valid, Read
  & Write flags) will be completed (by sending WriteResp to the CPU) after
  the same configurable time.  Otherwise, the request is forwarded to the
  Miss Status and Handling Register (MSHR) block.

- Uncached memory reads are forwarded to the MSHR block.

- Uncached memory writes are forwarded to the WriteBuffer block.

- Evicted (& dirty) cache lines are forwarded to the WriteBuffer block.

The CPU's access to the Data Cache is blocked if any of the following is true:

- The MSHR block is full.  (The size of the MSHR's buffer is configurable.)

- The Writeback block is full.  (The size of the block's buffer is
  configurable.)

- The number of outstanding memory accesses against the same memory cache
  line has reached a configurable threshold value; see MSHR and Write Buffer
  for details.

A Data Cache in the blocked state will reject any request from the slave port
(from the CPU) regardless of whether it would result in a cache hit or miss.
Note that incoming messages on the master port (response messages and snoop
requests) are never rejected.

A cache hit on an uncacheable memory region (UNPREDICTABLE behaviour
according to the ARM ARM) will invalidate the cache line and fetch the data
from memory.

\subsection gem5_MS_TAndDBlock Tags & Data Block

Cache lines (referred to as blocks in the source code) are organised into
sets with configurable associativity and size.
They have the following status flags:

- Valid.  It holds data; the address tag is valid.

- Read.  No read request will be accepted without this flag being set.  For
  example, a cache line is valid but unreadable while it is waiting for a
  pending write access to complete.

- Write.  It may accept writes.  A cache line with the Write flag is in the
  Unique state: no other cache memory holds a copy.

- Dirty.  It needs writeback when evicted.

A read access will hit a cache line if the address tags match and the Valid
and Read flags are set.  A write access will hit a cache line if the address
tags match and the Valid, Read and Write flags are set.

\subsection gem5_MS_Queues MSHR and Write Buffer Queues

The Miss Status and Handling Register (MSHR) queue holds the list of the
CPU's outstanding memory requests that require read access to the lower
memory level.  They are:

- Cached read misses.

- Cached write misses.

- Uncached reads.

The WriteBuffer queue holds the following memory requests:

- Uncached writes.

- Writebacks from evicted (& dirty) cache lines.

\image html "gem5_MS_Fig3.PNG" "MSHR and Write Buffer Blocks" width=6cm

Each memory request is assigned to a corresponding MSHR object (READ or WRITE
in the diagram above) that represents the particular block (cache line) of
memory that has to be read or written in order to complete the command(s).
As shown in the figure above, cached reads/writes against the same cache line
share a common MSHR object and will be completed with a single memory access.

The size of the block (and therefore the size of the read/write access to
lower memory) is:

- the size of the cache line for cached accesses & writebacks;

- as specified in the CPU instruction for uncached accesses.

In general, the Data Cache model distinguishes between just two memory types:

- Normal Cached memory.  It is always treated as write-back, read and write
  allocate.
- Normal Uncached, Device and Strongly Ordered types are treated equally
  (as uncached memory).

\subsection gem5_MS_Ordering Memory Access Ordering

A unique order number is assigned to each CPU read/write request (as they
appear on the slave port).  The order numbers of MSHR objects are copied from
the first assigned read/write.

Memory reads/writes from each of these two queues are executed in order
(according to the assigned order number).  When both queues are non-empty,
the model will execute a memory read from the MSHR block unless the
WriteBuffer is full.  It will, however, always preserve the order of
reads/writes on the same (or overlapping) memory cache line (block).

In summary:

- The order of accesses to cached memory is not preserved unless they target
  the same cache line.  For example, accesses #1, #5 & #10 will complete
  simultaneously in the same tick (still in order).  Access #5 will complete
  before #3.

- The order of all uncached memory writes is preserved.  Write#6 always
  completes before Write#13.

- The order of all uncached memory reads is preserved.  Read#2 always
  completes before Read#8.

- The order of an uncached read and an uncached write is not necessarily
  preserved unless their access regions overlap.  Therefore, Write#6 always
  completes before Read#8 (they target the same memory block), but Write#13
  may complete before Read#8.

\section gem5_MS_Bus COHERENT BUS OBJECT

\image html "gem5_MS_Fig4.PNG" "Coherent Bus Object" width=3cm

The Coherent Bus object provides basic support for the snoop protocol:

- All requests on a slave port are forwarded to the appropriate master port.
  Requests for cached memory regions are also forwarded to the other slave
  ports (as snoop requests).

- Master port replies are forwarded to the appropriate slave port.

- Master port snoop requests are forwarded to all slave ports.

- Slave port snoop replies are forwarded to the port that was the source of
  the request.
(Note that the source of a snoop request can be either a slave or a master
port.)

The bus declares itself blocked for a configurable period of time after any
of the following events:

- A packet is sent (or fails to be sent) to a slave port.

- A reply message is sent to a master port.

- A snoop response from one slave port is sent to another slave port.

The bus in the blocked state rejects the following incoming messages:

- Slave port requests.

- Master port replies.

- Master port snoop requests.

\section gem5_MS_SimpleMemory SIMPLE MEMORY OBJECT

- It never blocks access on the slave port.

- Memory reads/writes take immediate effect (the read or write is performed
  when the request is received).

- The reply message is sent after a configurable period of time.

\section gem5_MS_MessageFlow MESSAGE FLOW

\subsection gem5_MS_ReadAccess Read Access

The following diagram shows a read access that hits a Data Cache line with
the Valid and Read flags:

\image html "gem5_MS_Fig5.PNG" "Read Hit (Read flag must be set in cache line)" width=3cm

A cache-miss read access will generate the following sequence of messages:

\image html "gem5_MS_Fig6.PNG" "Read Miss with snoop reply" width=3cm

Note that the bus object never gets a response from both the DCache2 and
Memory objects.  It sends the very same ReadReq packet (message) object to
memory and the data cache.  When the Data Cache wants to reply to a snoop
request, it marks the message with the MEM_INHIBIT flag, which tells the
Memory object not to process the message.

\subsection gem5_MS_WriteAccess Write Access

The following diagram shows a write access that hits a DCache1 cache line
with the Valid & Write flags:

\image html "gem5_MS_Fig7.PNG" "Write Hit (with Write flag set in cache line)" width=3cm

The next figure shows a write access that hits a DCache1 cache line with
Valid but no Write flag, which qualifies as a write miss.  DCache1 issues an
UpgradeReq to obtain write permission.
DCache2::snoopTiming will invalidate the cache line that has been hit.  Note
that the UpgradeResp message doesn't carry data.

\image html "gem5_MS_Fig8.PNG" "Write Miss – matching tag with no Write flag" width=3cm

The next diagram shows a write miss in the DCache.  ReadExReq invalidates the
cache line in DCache2.  ReadExResp carries the content of the memory cache
line.

\image html "gem5_MS_Fig9.PNG" "Miss - no matching tag" width=3cm

*/
diff --git a/src/doc/power_thermal_model.doxygen b/src/doc/power_thermal_model.doxygen
deleted file mode 100644
index 8fc3c0ddb..000000000
--- a/src/doc/power_thermal_model.doxygen
+++ /dev/null
@@ -1,129 +0,0 @@
# Copyright (c) 2016 ARM Limited
# All rights reserved
# [BSD licence text as in the notice above]
# Author: David Guillen Fandos

/*! \page gem5PowerModel gem5 Power & Thermal model

  \tableofcontents

  This document gives an overview of the power and thermal modelling
  infrastructure in gem5.  The purpose is to give a high-level view of all
  the pieces involved and how they interact with each other and the
  simulator.

  \section gem5_PM_CD Class overview

  Classes involved in the power model are:

  - PowerModel: Represents a power model for a hardware component.

  - PowerModelState: Represents a power model for a hardware component in a
    certain power state.  It is an abstract class that defines an interface
    that must be implemented for each model.

  - MathExprPowerModel: A simple implementation of PowerModelState that
    assumes power can be modelled using a simple equation involving
    statistics.

  Classes involved in the thermal model are:

  - ThermalModel: Contains the system thermal model logic and state.  It
    performs the power query and temperature update.  It also enables gem5
    to query for temperature (for OS reporting).

  - ThermalDomain: Represents an entity that generates heat.
It is essentially a group of SimObjects, grouped under a SubSystem
component, that has its own thermal behaviour.

- ThermalNode: Represents a node in the equivalent thermal circuit.  The
  node has a temperature and interacts with other nodes through connections
  (thermal resistors and capacitors).

- ThermalReference: A temperature reference for the thermal model
  (essentially a thermal node with a fixed temperature); it can be used to
  model air or any other constant-temperature domain.

- ThermalEntity: A thermal component that connects two thermal nodes and
  models a thermal impedance between them.  This class is just an abstract
  interface.

- ThermalResistor: Implements ThermalEntity to model a thermal resistance
  between the two nodes it connects.  Thermal resistances model the capacity
  of a material to transfer heat (units in K/W).

- ThermalCapacitor: Implements ThermalEntity to model a thermal capacitance.
  Thermal capacitors model a material's thermal capacitance, that is, the
  ability to change the material's temperature (units in J/K).

\section gem5_thermal Thermal model

The thermal model works by creating a circuit equivalent of the simulated
platform.  Each node in the circuit has a temperature (as the voltage
equivalent) and power flows between nodes (as current does in a circuit).

To build this equivalent temperature model, the platform is required to
group the power actors (any component that has a power model) under
SubSystems and attach ThermalDomains to those subsystems.  Other components
may also be created (like ThermalReferences) and connected together by
creating thermal entities (capacitors and resistors).

The last step to complete the thermal model is to create the ThermalModel
instance itself and attach all the instances used to it, so that it can
properly update them at runtime.
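To make the circuit analogy concrete, here is a minimal sketch (a generic
explicit-Euler RC step, not gem5's actual solver) of one thermal node with
capacitance C, connected through resistance R to a fixed ambient reference,
and heated by power P, following dT/dt = (P - (T - T_amb)/R) / C:

```python
def step_temperature(temp, power, t_amb, r_thermal, c_thermal, dt):
    """Advance one thermal node by one explicit-Euler time step.

    temp:      node temperature (K)
    power:     heat injected by the power model (W)
    t_amb:     fixed ambient reference temperature (K)
    r_thermal: thermal resistance to ambient (K/W)
    c_thermal: thermal capacitance of the node (J/K)
    dt:        time step (s); must be small relative to R*C for stability
    """
    heat_out = (temp - t_amb) / r_thermal          # W flowing to ambient
    return temp + dt * (power - heat_out) / c_thermal

# With no power dissipated, the node relaxes towards ambient...
assert step_temperature(330.0, 0.0, 300.0, 2.0, 10.0, 1.0) < 330.0

# ...and with constant power it settles at T_amb + P*R (300 + 5*2 = 310 K):
t = 300.0
for _ in range(10000):
    t = step_temperature(t, 5.0, 300.0, 2.0, 10.0, 0.1)
assert abs(t - 310.0) < 1e-3
```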
Only one thermal model instance is supported right now, and it will
automatically report temperature when appropriate (i.e. to platform sensor
devices).

\section gem5_power Power model

Every ClockedObject has a power model associated with it.  If this power
model is non-null, power will be calculated at every stats dump (although it
is possible to force power evaluation at any other point; if the power model
uses the stats, it is a good idea to keep both events in sync).  The
definition of a power model is quite vague, in the sense that it is as
flexible as users want it to be.  The only constraint enforced so far is
that a power model has several power state models, one for each possible
power state of that hardware block.  When it comes to computing power
consumption, the power is just the weighted average of the power state
models.

A power state model is essentially an interface that allows us to define two
power functions, for dynamic and static power.  As an example
implementation, a class called MathExprPowerModel has been provided.  This
implementation allows the user to define a power model as an equation
involving several statistics.  There are also some automatic (or "magic")
variables such as "temp", which reports temperature.
*/