[gem5.git] / src / doc / inside-minor.doxygen
# Copyright (c) 2014 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Andrew Bardsley

namespace Minor
{

/*!

\page minor Inside the Minor CPU model

\tableofcontents

This document contains a description of the structure and function of the
Minor gem5 in-order processor model. It is recommended reading for anyone who
wants to understand Minor's internal organisation, design decisions, C++
implementation and Python configuration. A familiarity with gem5 and some of
its internal structures is assumed. This document is meant to be read
alongside the Minor source code and to explain its general structure without
being too slavish about naming every function and data type.

\section whatis What is Minor?

Minor is an in-order processor model with a fixed pipeline but configurable
data structures and execute behaviour. It is intended to be used to model
processors with strict in-order execution behaviour and allows visualisation
of an instruction's position in the pipeline through the
MinorTrace/minorview.py format/tool. The intention is to provide a framework
for micro-architecturally correlating the model with a particular, chosen
processor with similar capabilities.

\section philo Design philosophy

\subsection mt Multithreading

The model isn't currently capable of multithreading but there are THREAD
comments in key places where stage data needs to be arrayed to support
multithreading.

\subsection structs Data structures

Decorating data structures with large amounts of life-cycle information is
avoided. Only instructions (MinorDynInst) contain a significant proportion of
their data content whose values are not set at construction.

All internal structures have fixed sizes on construction. Data held in queues
and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface to
allow a distinct 'bubble'/no data value option for each type.

Inter-stage 'struct' data is packaged in structures which are passed by value.
Only MinorDynInst, the line data in ForwardLineData and the memory-interfacing
objects Fetch1::FetchRequest and LSQ::LSQRequest are '::new' allocated while
running the model.

\section model Model structure

Objects of class MinorCPU are provided by the model to gem5. MinorCPU
implements the interfaces of BaseCPU (cpu.hh) and can provide data and
instruction interfaces for connection to a cache system. The model is
configured in a similar way to other gem5 models through Python. That
configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
actually implements the processor pipeline.

The hierarchy of major unit ownership from MinorCPU down looks like this:

<ul>
<li>MinorCPU</li>
<ul>
<li>Pipeline - container for the pipeline, owns the cyclic 'tick'
event mechanism and the idling (cycle skipping) mechanism.</li>
<ul>
<li>Fetch1 - instruction fetch unit responsible for fetching cache
lines (or parts of lines) from the I-cache interface</li>
<ul>
<li>Fetch1::IcachePort - interface to the I-cache from
Fetch1</li>
</ul>
<li>Fetch2 - line to instruction decomposition</li>
<li>Decode - instruction to micro-op decomposition</li>
<li>Execute - instruction execution and data memory
interface</li>
<ul>
<li>LSQ - load store queue for memory ref. instructions</li>
<li>LSQ::DcachePort - interface to the D-cache from
Execute</li>
</ul>
</ul>
</ul>
</ul>

\section keystruct Key data structures

\subsection ids Instruction and line identity: InstId (dyn_inst.hh)

An InstId contains the sequence numbers and thread numbers that describe the
life cycle and instruction stream affiliations of individual fetched cache
lines and instructions.

An InstId is printed in one of the following forms:

- T/S.P/L - for fetched cache lines
- T/S.P/L/F - for instructions before Decode
- T/S.P/L/F.E - for instructions from Decode onwards

for example:

- 0/10.12/5/6.7

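The printed forms above can be sketched as a small formatting routine. The
class below is an illustrative stand-in, not the real InstId from
dyn_inst.hh; it assumes a sequence number of 0 means "not yet assigned":

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative stand-in for Minor's InstId (dyn_inst.hh).
// A sequence number of 0 is taken to mean "not yet assigned".
struct InstId
{
    uint16_t threadId = 0;         // T
    uint64_t streamSeqNum = 0;     // S
    uint64_t predictionSeqNum = 0; // P
    uint64_t lineSeqNum = 0;       // L
    uint64_t fetchSeqNum = 0;      // F
    uint64_t execSeqNum = 0;       // E

    // Print T/S.P/L for lines, .../F before Decode, .../F.E afterwards.
    std::string str() const
    {
        std::ostringstream os;
        os << threadId << '/' << streamSeqNum << '.'
           << predictionSeqNum << '/' << lineSeqNum;
        if (fetchSeqNum) {
            os << '/' << fetchSeqNum;
            if (execSeqNum)
                os << '.' << execSeqNum;
        }
        return os.str();
    }
};
```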
InstId's fields are:

<table>
<tr>
<td><b>Field</b></td>
<td><b>Symbol</b></td>
<td><b>Generated by</b></td>
<td><b>Checked by</b></td>
<td><b>Function</b></td>
</tr>

<tr>
<td>InstId::threadId</td>
<td>T</td>
<td>Fetch1</td>
<td>Everywhere the thread number is needed</td>
<td>Thread number (currently always 0).</td>
</tr>

<tr>
<td>InstId::streamSeqNum</td>
<td>S</td>
<td>Execute</td>
<td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
<td>Stream sequence number as chosen by Execute. Stream
sequence numbers change after changes of PC (branches, exceptions) in
Execute and are used to separate pre and post branch instruction
streams.</td>
</tr>

<tr>
<td>InstId::predictionSeqNum</td>
<td>P</td>
<td>Fetch2</td>
<td>Fetch2 (while discarding lines after prediction)</td>
<td>Prediction sequence numbers represent branch prediction decisions.
This is used by Fetch2 to mark lines/instructions according to the last
followed branch prediction made by Fetch2. Fetch2 can signal to Fetch1
that it should change its fetch address and mark lines with a new
prediction sequence number (which it will only do if the stream sequence
number Fetch1 expects matches that of the request).</td>
</tr>

<tr>
<td>InstId::lineSeqNum</td>
<td>L</td>
<td>Fetch1</td>
<td>(Just for debugging)</td>
<td>Line fetch sequence number of this cache line or the line
this instruction was extracted from.
</td>
</tr>

<tr>
<td>InstId::fetchSeqNum</td>
<td>F</td>
<td>Fetch2</td>
<td>Fetch2 (as the inst. sequence number for branches)</td>
<td>Instruction fetch order assigned by Fetch2 when lines
are decomposed into instructions.
</td>
</tr>

<tr>
<td>InstId::execSeqNum</td>
<td>E</td>
<td>Decode</td>
<td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
<td>Instruction order after micro-op decomposition.</td>
</tr>

</table>

The sequence number fields are all independent of each other and although, for
instance, InstId::execSeqNum for an instruction will always be >=
InstId::fetchSeqNum, the comparison is not useful.

The originating stage of each sequence number field keeps a counter for that
field which can be incremented in order to generate new, unique numbers.

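A minimal sketch of such a per-stage counter (the class name here is
hypothetical, not a real Minor type):

```cpp
#include <cstdint>

// Hypothetical sketch: each originating stage owns a counter like this
// and stamps newly created lines/instructions with the next unique
// sequence number. 0 is reserved for "not yet assigned".
class SeqNumSource
{
    uint64_t nextNum = 1;
  public:
    // Hand out a new, unique sequence number.
    uint64_t issue() { return nextNum++; }
    // The most recently issued number.
    uint64_t lastIssued() const { return nextNum - 1; }
};
```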
\subsection insts Instructions: MinorDynInst (dyn_inst.hh)

MinorDynInst represents an instruction's progression through the pipeline. An
instruction can be three things:

<table>
<tr>
<td><b>Thing</b></td>
<td><b>Predicate</b></td>
<td><b>Explanation</b></td>
</tr>
<tr>
<td>A bubble</td>
<td>MinorDynInst::isBubble()</td>
<td>no instruction at all, just a space-filler</td>
</tr>
<tr>
<td>A fault</td>
<td>MinorDynInst::isFault()</td>
<td>a fault to pass down the pipeline in an instruction's clothing</td>
</tr>
<tr>
<td>A decoded instruction</td>
<td>MinorDynInst::isInst()</td>
<td>instructions are actually passed to the gem5 decoder in Fetch2 and so
are created fully decoded. MinorDynInst::staticInst is the decoded
instruction form.</td>
</tr>
</table>

Instructions are reference counted using the gem5 RefCountingPtr
(base/refcnt.hh) wrapper. They therefore usually appear as MinorDynInstPtr in
code. Note that as RefCountingPtr initialises as nullptr rather than an
object that supports BubbleIF::isBubble, passing raw MinorDynInstPtrs to
Queue%s and other similar structures from stage.hh without boxing is
dangerous.

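The hazard can be sketched with a boxing wrapper that answers isBubble even
for a null pointer. This is an illustrative sketch using std::shared_ptr as a
stand-in for RefCountingPtr; the type names are hypothetical:

```cpp
#include <memory>

// Illustrative stand-in for a MinorDynInst-like payload.
struct Inst
{
    bool bubble = false;
    bool isBubble() const { return bubble; }
};

// Stand-in for MinorDynInstPtr (really a RefCountingPtr).
using InstPtr = std::shared_ptr<Inst>;

// Boxing wrapper: a null pointer is treated as a bubble instead of
// being dereferenced, so queue code can always call isBubble() safely.
struct BoxedInst
{
    InstPtr inst;
    bool isBubble() const { return !inst || inst->isBubble(); }
    static BoxedInst bubble() { return BoxedInst{nullptr}; }
};
```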
\subsection fld ForwardLineData (pipe_data.hh)

ForwardLineData is used to pass cache lines from Fetch1 to Fetch2. Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying or can contain a line (partial line) fetched by Fetch1. The
data carried by ForwardLineData is owned by a Packet object returned from
memory, is explicitly memory managed, and so must be deleted once processed
(by Fetch2 deleting the Packet).

\subsection fid ForwardInstData (pipe_data.hh)

ForwardInstData can contain up to ForwardInstData::width() instructions in its
ForwardInstData::insts vector. This structure is used to carry instructions
between Fetch2, Decode and Execute and to store input buffer vectors in Decode
and Execute.

\subsection fr Fetch1::FetchRequest (fetch1.hh)

FetchRequests represent I-cache line fetch requests. They are used in the
memory queues of Fetch1 and are pushed into/popped from Packet::senderState
while traversing the memory system.

FetchRequests contain a memory system Request (mem/request.hh) for that fetch
access, a packet (Packet, mem/packet.hh), if the request gets to memory, and a
fault field that can be populated with a TLB-sourced prefetch fault (if any).

\subsection lsqr LSQ::LSQRequest (execute.hh)

LSQRequests are similar to FetchRequests but for D-cache accesses. They carry
the instruction associated with a memory access.

\section pipeline The pipeline

\verbatim
------------------------------------------------------------------------------
Key:

[] : inter-stage MinorBuffer
,--.
|  | : pipeline stage
`--'
---> : forward communication
<--- : backward communication

rv : reservation information for input buffers

              ,------.     ,------.     ,------.     ,-------.
(from --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
Execute)   |  |      |<-[]-|      |<-rv-|      |<-rv-|       |    & Fetch2)
           |  `------'<-rv-|      |     |      |     |       |
           `-------------->|      |     |      |     |       |
                           `------'     `------'     `-------'
------------------------------------------------------------------------------
\endverbatim

The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled. There is a MinorBuffer between adjacent
stages in the forward direction (for example: passing lines from Fetch1 to
Fetch2) and, between Fetch2 and Fetch1, a buffer in the backwards direction
carrying branch predictions.

Stages Fetch2, Decode and Execute have input buffers which, each cycle, can
accept input data from the previous stage and can hold that data if the stage
is not ready to process it. Input buffers store data in the same form as it
is received and so Decode and Execute's input buffers contain the output
instruction vector (ForwardInstData (pipe_data.hh)) from their previous stages
with the instructions and bubbles in the same positions as a single buffer
entry.

Stage input buffers provide a Reservable (stage.hh) interface to their
previous stages, to allow slots to be reserved in their input buffers, and
communicate their input buffer occupancy backwards to allow the previous stage
to plan whether it should make an output in a given cycle.

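The Reservable idea can be sketched as follows (a hypothetical simplification
of stage.hh, not the real interface): the sender reserves a slot before
producing output, so the buffer can never overflow.

```cpp
#include <cstddef>

// Hypothetical sketch of a Reservable input buffer: the previous stage
// reserves a slot before sending, and occupancy counts both filled and
// reserved slots so the sender can plan its output.
class ReservableBuffer
{
    std::size_t capacity;
    std::size_t occupied = 0;
    std::size_t reserved = 0;
  public:
    explicit ReservableBuffer(std::size_t cap) : capacity(cap) {}

    bool canReserve() const { return occupied + reserved < capacity; }
    void reserve() { if (canReserve()) ++reserved; }

    // The sender later fills the slot it reserved.
    void push() { --reserved; ++occupied; }
    // The owning stage consumes an entry.
    void pop() { --occupied; }

    // Occupancy communicated backwards to the previous stage.
    std::size_t occupancy() const { return occupied + reserved; }
};
```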
\subsection events Event handling: MinorActivityRecorder (activity.hh,
pipeline.hh)

Minor is essentially a cycle-callable model with some ability to skip cycles
based on pipeline activity. External events are mostly received by callbacks
(e.g. Fetch1::IcachePort::recvTimingResp) and cause the pipeline to be woken
up to service advancing request queues.

Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject. It provides a Ticked::start/stop
interface to start and pause clock events from being periodically issued.
Pipeline is a derived class of Ticked.

During evaluate calls, stages can signal that they still have work to do in
the next cycle by calling either MinorCPU::activityRecorder->activity() (for
non-callable related activity) or MinorCPU::wakeupOnEvent(<stageId>) (for
stage callback-related 'wakeup' activity).

Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.

Within Pipeline (pipeline.hh), the stages are evaluated in reverse order (and
so will ::evaluate in reverse order) and their backwards data can be
read immediately after being written in each cycle allowing output decisions
to be 'perfect' (allowing synchronous stalling of the whole pipeline). Branch
predictions from Fetch2 to Fetch1 can also be transported in 0 cycles making
fetch1ToFetch2BackwardDelay the only configurable delay which can be set as
low as 0 cycles.

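The reverse-order evaluation can be sketched like this (hypothetical names,
not real Minor types): because later stages run first, a stall written to a
backward wire is visible to earlier stages in the same cycle.

```cpp
#include <vector>

// Hypothetical sketch: backward data written by a later stage is read
// by an earlier stage in the same cycle because stages are evaluated
// back-to-front, so stalls propagate 'perfectly' in one tick.
struct BackwardWire { bool stall = false; };

struct Stage
{
    BackwardWire *toPrev = nullptr;   // written by this stage
    BackwardWire *fromNext = nullptr; // read from the following stage
    bool busy = false;    // this stage cannot accept input this cycle
    bool stalled = false;

    void evaluate()
    {
        // A stage stalls if it is busy or the stage after it stalls.
        stalled = busy || (fromNext && fromNext->stall);
        if (toPrev)
            toPrev->stall = stalled;
    }
};

// Evaluate back-to-front, as Pipeline::evaluate does.
inline void evaluatePipeline(std::vector<Stage> &stages)
{
    for (auto it = stages.rbegin(); it != stages.rend(); ++it)
        it->evaluate();
}
```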
The MinorCPU::activateContext and MinorCPU::suspendContext interfaces can be
called to start and pause threads (threads in the MT sense) and to start and
pause the pipeline. Executing instructions can call this interface
(indirectly through the ThreadContext) to idle the CPU/their threads.

\subsection stages Each pipeline stage

In general, the behaviour of a stage (each cycle) is:

\verbatim
evaluate:
    push input to inputBuffer
    setup references to input/output data slots

    do 'every cycle' 'step' tasks

    if there is input and there is space in the next stage:
        process and generate a new output
        maybe re-activate the stage

    send backwards data

    if the stage generated output to the following FIFO:
        signal pipe activity

    if the stage has more processable input and space in the next stage:
        re-activate the stage for the next cycle

    commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

The Execute stage differs from this model as its forward output (branch) data
is unconditionally sent to Fetch1 and Fetch2. To allow this behaviour, Fetch1
and Fetch2 must be unconditionally receptive to that data.

\subsection fetch1 Fetch1 stage

Fetch1 is responsible for fetching cache lines or partial cache lines from the
I-cache and passing them on to Fetch2 to be decomposed into instructions. It
can receive 'change of stream' indications from both Execute and Fetch2 to
signal that it should change its internal fetch address and tag newly fetched
lines with new stream or prediction sequence numbers. When both Execute and
Fetch2 signal changes of stream at the same time, Fetch1 takes Execute's
change.

Every line issued by Fetch1 will bear a unique line sequence number which can
be used for debugging stream changes.

When fetching from the I-cache, Fetch1 will ask for data from the current
fetch address (Fetch1::pc) up to the end of the 'data snap' size set in the
parameter fetch1LineSnapWidth. Subsequent autonomous line fetches will fetch
whole lines at a snap boundary and of size fetch1LineWidth.

Fetch1 will only initiate a memory fetch if it can reserve space in Fetch2's
input buffer. That input buffer serves as the fetch queue/LFL for the system.

Fetch1 contains two queues, requests and transfers, to handle the stages of
translating the address of a line fetch (via the TLB) and accommodating the
request/response of fetches to/from memory.

Fetch requests from Fetch1 are pushed into the requests queue as newly
allocated FetchRequest objects once they have been sent to the ITLB with a
call to itb->translateTiming.

A response from the TLB moves the request from the requests queue to the
transfers queue. If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of the
requests queue. In that case, the TLB response is marked up as a state change
to Translated in the request object, and advancing the request to transfers
(and the memory system) is left to calls to Fetch1::stepQueues which is called
in the cycle following the receipt of any event.

Fetch1::tryToSendToTransfers is responsible for moving requests between the
two queues and issuing requests to memory. Failed TLB lookups (prefetch
aborts) continue to occupy space in the queues until they are recovered at the
head of transfers.

Responses from memory change the request object state to Complete and
Fetch1::evaluate can pick up response data, package it in the ForwardLineData
object, and forward it to Fetch2%'s input buffer.

As space is always reserved in Fetch2::inputBuffer, setting the input buffer's
size to 1 results in non-prefetching behaviour.

When a change of stream occurs, translated requests queue members and
completed transfers queue members can be unconditionally discarded to make way
for new transfers.

\subsection fetch2 Fetch2 stage

Fetch2 receives a line from Fetch1 into its input buffer. The data in the
head line in that buffer is iterated over and separated into individual
instructions which are packed into a vector of instructions which can be
passed to Decode. Packing instructions can be aborted early if a fault is
found in either the input line as a whole or a decomposed instruction.

\subsubsection bp Branch prediction

Fetch2 contains the branch prediction mechanism. This is a wrapper around the
branch predictor interface provided by gem5 (cpu/pred/...).

Branches are predicted for any control instructions found. If prediction is
attempted for an instruction, the MinorDynInst::triedToPredict flag is set on
that instruction.

When a branch is predicted taken, the MinorDynInst::predictedTaken flag is
set and MinorDynInst::predictedTarget is set to the predicted target PC value.
The predicted branch instruction is then packed into Fetch2%'s output vector,
the prediction sequence number is incremented, and the branch is communicated
to Fetch1.

After signalling a prediction, Fetch2 will discard its input buffer contents
and will reject any new lines which have the same stream sequence number as
that branch but have a different prediction sequence number. This allows
following sequentially fetched lines to be rejected without ignoring new lines
generated by a change of stream indicated from a 'real' branch from Execute
(which will have a new stream sequence number).

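That discard rule can be sketched as a small predicate (hypothetical names,
not the real Fetch2 code): a line is dropped only when its stream matches the
current stream but its prediction sequence number is stale.

```cpp
#include <cstdint>

// Hypothetical sketch of the Fetch2 line-discard rule described above.
struct LineTag
{
    uint64_t streamSeqNum;
    uint64_t predictionSeqNum;
};

// Discard lines from the same stream that carry a stale prediction
// sequence number; lines from a new stream are judged elsewhere.
inline bool
shouldDiscard(const LineTag &line, uint64_t expectedStream,
              uint64_t expectedPrediction)
{
    return line.streamSeqNum == expectedStream &&
        line.predictionSeqNum != expectedPrediction;
}
```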
The program counter value provided to Fetch2 by Fetch1 packets is only updated
when there is a change of stream. Fetch2::havePC indicates whether the PC
will be picked up from the next processed input line. Fetch2::havePC is
necessary to allow line-wrapping instructions to be tracked through decode.

Branches (and instructions predicted to branch) which are processed by Execute
will generate BranchData (pipe_data.hh) data explaining the outcome of the
branch which is sent forwards to Fetch1 and Fetch2. Fetch1 uses this data to
change stream (and update its stream sequence number and address for new
lines). Fetch2 uses it to update the branch predictor. Minor does not
communicate branch data to the branch predictor for instructions which are
discarded on the way to commit.

BranchData::BranchReason (pipe_data.hh) encodes the possible branch scenarios:

<table>
<tr>
<td>Branch enum val.</td>
<td>In Execute</td>
<td>Fetch1 reaction</td>
<td>Fetch2 reaction</td>
</tr>
<tr>
<td>NoBranch</td>
<td>(output bubble data)</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>CorrectlyPredictedBranch</td>
<td>Predicted taken, and taken</td>
<td>-</td>
<td>Update BP as taken branch</td>
</tr>
<tr>
<td>UnpredictedBranch</td>
<td>Not predicted, but taken</td>
<td>New stream</td>
<td>Update BP as taken branch</td>
</tr>
<tr>
<td>BadlyPredictedBranch</td>
<td>Predicted taken, but not taken</td>
<td>New stream to restore to old inst. source</td>
<td>Update BP as not taken branch</td>
</tr>
<tr>
<td>BadlyPredictedBranchTarget</td>
<td>Predicted taken, but to a different target than the predicted one</td>
<td>New stream</td>
<td>Update BTB to new target</td>
</tr>
<tr>
<td>SuspendThread</td>
<td>Hint to suspend fetching</td>
<td>Suspend fetch for this thread (branch to next inst. as wakeup
fetch addr)</td>
<td>-</td>
</tr>
<tr>
<td>Interrupt</td>
<td>Interrupt detected</td>
<td>New stream</td>
<td>-</td>
</tr>
</table>

The parameter decodeInputWidth sets the number of instructions which can be
packed into the output per cycle. If the parameter fetch2CycleInput is true,
Fetch2 can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection decode Decode stage

Decode takes a vector of instructions from Fetch2 (via its input buffer) and
decomposes those instructions into micro-ops (if necessary) and packs them
into its output instruction vector.

The parameter executeInputWidth sets the number of instructions which can be
packed into the output per cycle. If the parameter decodeCycleInput is true,
Decode can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection execute Execute stage

Execute provides all the instruction execution and memory access mechanisms.
An instruction's passage through Execute can take multiple cycles with its
precise timing modelled by a functional unit pipeline FIFO.

A vector of instructions (possibly including fault 'instructions') is provided
to Execute by Decode and can be queued in the Execute input buffer before
being issued. Setting the parameter executeCycleInput allows Execute to
examine more than one input buffer entry (more than one instruction vector).
The number of instructions in the input vector can be set with
executeInputWidth and the depth of the input buffer can be set with parameter
executeInputBufferSize.

\subsubsection fus Functional units

The Execute stage contains pipelines for each functional unit comprising the
computational core of the CPU. Functional units are configured via the
executeFuncUnits parameter. Each functional unit has a number of instruction
classes it supports, a stated delay between instruction issues, a delay
from instruction issue to (possible) commit, and an optional timing annotation
capable of more complicated timing.

Each active cycle, Execute::evaluate performs this action:

\verbatim
Execute::evaluate:
    push input to inputBuffer
    setup references to input/output data slots and branch output slot

    step D-cache interface queues (similar to Fetch1)

    if interrupt posted:
        take interrupt (signalling branch to Fetch1/Fetch2)
    else
        commit instructions
        issue new instructions

    advance functional unit pipelines

    reactivate Execute if the unit is still active

    commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

\subsubsection fifos Functional unit FIFOs

Functional units are implemented as SelfStallingPipelines (stage.hh). These
are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires. They respond
to SelfStallingPipeline::advance in the same way as TimeBuffers <b>unless</b>
there is data at the far, 'pop', end of the FIFO. A 'stalled' flag is
provided for signalling stalling and to allow a stall to be cleared. The
intention is to provide a pipeline for each functional unit which will never
advance an instruction out of that pipeline until it has been processed and
the pipeline is explicitly unstalled.

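A minimal sketch of the self-stalling behaviour (an illustrative
simplification, not the real SelfStallingPipeline from stage.hh):

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative sketch of a self-stalling pipeline: a fixed-depth FIFO
// whose advance() shifts entries towards the pop end unless an
// un-popped entry is already waiting there.
template <typename T>
class SelfStallingFifo
{
    std::vector<std::optional<T>> slots; // slots[0] is the pop end
  public:
    bool stalled = false;

    explicit SelfStallingFifo(std::size_t depth) : slots(depth) {}

    // Push into the far (input) end.
    void push(const T &item) { slots.back() = item; }

    // Shift everything one slot unless the pop end is occupied.
    void advance()
    {
        stalled = slots.front().has_value();
        if (stalled)
            return;
        for (std::size_t i = 1; i < slots.size(); ++i) {
            slots[i - 1] = slots[i];
            slots[i].reset();
        }
    }

    bool canPop() const { return slots.front().has_value(); }

    // Popping clears the stall so the pipeline can advance again.
    T pop()
    {
        T item = *slots.front();
        slots.front().reset();
        stalled = false;
        return item;
    }
};
```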
The actions 'issue', 'commit', and 'advance' act on the functional units.

\subsubsection issue Issue

Issuing instructions involves iterating over both the input buffer
instructions and the heads of the functional units to try and issue
instructions in order. The number of instructions which can be issued each
cycle is limited by the parameter executeIssueLimit, how executeCycleInput is
set, the availability of pipeline space and the policy used to choose a
pipeline in which the instruction can be issued.

At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence. For greater flexibility,
better (and more specific) policies will need to be made possible.

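The round-robin visiting can be sketched like this (hypothetical names, not
the real Execute issue code): FUs are visited in order starting after the
last one chosen, and the first that can accept the instruction wins.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a strict round-robin issue policy.
class RoundRobinIssue
{
    std::size_t numFUs;
    std::size_t lastChoice = 0;
  public:
    explicit RoundRobinIssue(std::size_t n) : numFUs(n) {}

    // canAccept[i] says whether FU i can take this instruction's class.
    // Returns the chosen FU, or -1 if none can accept.
    int choose(const std::vector<bool> &canAccept)
    {
        for (std::size_t i = 0; i < numFUs; ++i) {
            std::size_t fu = (lastChoice + 1 + i) % numFUs;
            if (canAccept[fu]) {
                lastChoice = fu;
                return static_cast<int>(fu);
            }
        }
        return -1;
    }
};
```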
Memory operation instructions traverse their functional units to perform their
EA calculations. On 'commit', the ExecContext::initiateAcc execution phase is
performed and any memory access is issued (via ExecContext::{read,write}Mem
calling LSQ::pushRequest) to the LSQ.

Note that faults are issued as if they are instructions and can (currently) be
issued to *any* functional unit.

Every issued instruction is also pushed into the Execute::inFlightInsts queue.
Memory ref. instructions are pushed into the Execute::inFUMemInsts queue.

\subsubsection commit Commit

Instructions are committed by examining the head of the Execute::inFlightInsts
queue (which is decorated with the functional unit number to which the
instruction was issued). Instructions which can then be found in their
functional units are executed and popped from Execute::inFlightInsts.

Memory operation instructions are committed into the memory queues (as
described above) and exit their functional unit pipeline but are not popped
from the Execute::inFlightInsts queue. The Execute::inFUMemInsts queue
provides ordering to memory operations as they pass through the functional
units (maintaining issue order). On entering the LSQ, instructions are popped
from Execute::inFUMemInsts.

If the parameter executeAllowEarlyMemoryIssue is set, memory operations can be
sent from their FU to the LSQ before reaching the head of
Execute::inFlightInsts but after their dependencies are met.
MinorDynInst::instToWaitFor is marked up with the latest dependent instruction
execSeqNum required to be committed for a memory operation to progress to the
LSQ.

Once a memory response is available (by testing the head of
Execute::inFlightInsts against LSQ::findResponse), commit will process that
response (ExecContext::completeAcc) and pop the instruction from
Execute::inFlightInsts.

Any branch, fault or interrupt will cause a stream sequence number change and
signal a branch to Fetch1/Fetch2. Only instructions with the current stream
sequence number will be issued and/or committed.

\subsubsection advance Advance

All non-stalled pipelines are advanced and may, thereafter, become stalled.
Potential activity in the next cycle is signalled if there are any
instructions remaining in any pipeline.

\subsubsection sb Scoreboard

The scoreboard (Scoreboard) is used to control instruction issue. It contains
a count of the number of in-flight instructions which will write to each
general purpose CPU integer or float register. Instructions will only be
issued when the scoreboard's count is 0 for every one of the instruction's
source registers.

Once an instruction is issued, the scoreboard count for each of the
instruction's destination registers is incremented.

The estimated delivery time of the instruction's result is marked up in the
scoreboard by adding the length of the issued-to FU to the current time. The
timings parameter on each FU provides a list of additional rules for
calculating the delivery time. These are documented in the parameter comments
in MinorCPU.py.

On commit (for memory operations, on memory response commit), the scoreboard
counters for the instruction's destination registers are decremented.

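The counting scheme can be sketched as follows (an illustrative
simplification of the real Scoreboard; the method names here are
hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the scoreboard: per-register counts of
// in-flight writers; an instruction may issue only when every one of
// its source registers has a count of zero.
class Scoreboard
{
    std::vector<int> writers; // in-flight writers per register
  public:
    explicit Scoreboard(std::size_t numRegs) : writers(numRegs, 0) {}

    bool canIssue(const std::vector<int> &srcRegs) const
    {
        for (int reg : srcRegs)
            if (writers[reg] != 0)
                return false;
        return true;
    }

    // On issue, count each destination register's pending write.
    void markUp(const std::vector<int> &destRegs)
    {
        for (int reg : destRegs)
            ++writers[reg];
    }

    // On commit (or memory response commit), release them.
    void clear(const std::vector<int> &destRegs)
    {
        for (int reg : destRegs)
            --writers[reg];
    }
};
```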
\subsubsection ifi Execute::inFlightInsts

The Execute::inFlightInsts queue will always contain all instructions in
flight in Execute in the correct issue order. Execute::issue is the only
process which will push an instruction into the queue. Execute::commit is the
only process that can pop an instruction.

\subsubsection lsq LSQ

The LSQ can support multiple outstanding transactions to memory in a number of
conservative cases.

There are three queues to contain requests: requests, transfers and the store
buffer. The requests and transfers queues operate in a similar manner to the
queues in Fetch1. The store buffer is used to decouple the delay of
completing store operations from following loads.

Requests are issued to the DTLB as their instructions leave their functional
unit. At the head of requests, cacheable load requests can be sent to memory
and on to the transfers queue. Cacheable stores will be passed to transfers
unprocessed and progress through that queue maintaining order with other
transactions.

The conditions in LSQ::tryToSendToTransfers dictate when requests can
be sent to memory.

All uncacheable transactions, split transactions and locked transactions are
processed in order at the head of requests. Additionally, store results
residing in the store buffer can have their data forwarded to cacheable loads
(removing the need to perform a read from memory) but no cacheable load can be
issued to the transfers queue until that queue's stores have drained into the
store buffer.

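The store-to-load forwarding idea can be sketched as follows. This is an
illustrative fragment only: the structures and the function name are invented
here, and the real checks in LSQ::tryToSendToTransfers are considerably more
involved. A load may take its data from a buffered store only when the store
fully covers the load's address range:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A completed store sitting in the store buffer (hypothetical layout).
struct BufferedStore
{
    std::uint64_t addr;
    std::vector<std::uint8_t> data;
};

// Try to satisfy a cacheable load from the store buffer instead of
// performing a read from memory.
std::optional<std::vector<std::uint8_t>>
tryForward(const std::vector<BufferedStore> &storeBuffer,
           std::uint64_t loadAddr, std::size_t loadSize)
{
    // Search newest-first so the most recent overlapping store wins.
    for (auto it = storeBuffer.rbegin(); it != storeBuffer.rend(); ++it) {
        if (it->addr <= loadAddr &&
            loadAddr + loadSize <= it->addr + it->data.size()) {
            std::size_t offset = loadAddr - it->addr;
            return std::vector<std::uint8_t>(
                it->data.begin() + offset,
                it->data.begin() + offset + loadSize);
        }
    }
    return std::nullopt; // no full cover: the load must go to memory
}
```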
At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute and committed
(ExecContext::completeAcc) and, for stores, sent to the store buffer.

Barrier instructions do not prevent cacheable loads from progressing to memory
but do cause a stream change which will discard that load. Stores will not be
committed to the store buffer if they are in the shadow of the barrier but
before the new instruction stream has arrived at Execute. As all other memory
transactions are delayed at the end of the requests queue until they are at
the head of Execute::inFlightInsts, they will be discarded by any barrier
stream change.

After commit, LSQ::BarrierDataRequest requests are inserted into the
store buffer to track each barrier until all preceding memory transactions
have drained from the store buffer. No further memory transactions will be
issued from the ends of FUs until after the barrier has drained.

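The barrier-draining rule can be sketched as follows (hypothetical structures
for illustration; the real store buffer holds LSQ request objects and issues
its stores to memory rather than simply popping them):

```cpp
#include <cassert>
#include <deque>

// Sketch of barrier tracking in the store buffer: a barrier entry inserted
// after commit must wait until every store queued before it has drained to
// memory before later memory transactions may issue.
struct StoreBufferSketch
{
    struct Entry { bool isBarrier; };
    std::deque<Entry> entries; // oldest at the front

    void pushStore()   { entries.push_back({false}); }
    void pushBarrier() { entries.push_back({true}); }

    // A barrier at the head has no preceding stores left and has drained.
    bool barrierComplete() const
    {
        return !entries.empty() && entries.front().isBarrier;
    }

    // Model one store at the head of the buffer draining to memory.
    void drainOneStore()
    {
        if (!entries.empty() && !entries.front().isBarrier)
            entries.pop_front();
    }
};
```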
\subsubsection drain Draining

Draining is mostly handled by the Execute stage. When initiated by calling
MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
each cycle and keeps the pipeline active until draining is complete. It is
Pipeline that signals the completion of draining. Execute is triggered by
MinorCPU::drain and starts stepping through its Execute::DrainState state
machine, starting from state Execute::NotDraining, in this order:

<table>
<tr>
<td><b>State</b></td>
<td><b>Meaning</b></td>
</tr>
<tr>
<td>Execute::NotDraining</td>
<td>Not trying to drain, normal execution</td>
</tr>
<tr>
<td>Execute::DrainCurrentInst</td>
<td>Draining micro-ops to complete inst.</td>
</tr>
<tr>
<td>Execute::DrainHaltFetch</td>
<td>Halt fetching instructions</td>
</tr>
<tr>
<td>Execute::DrainAllInsts</td>
<td>Discarding all instructions presented</td>
</tr>
</table>

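As a sketch, the sequence in the table corresponds to stepping through an
enum like this (illustrative only; the names mirror the table but the real
transitions inside Execute also depend on instruction and fetch state):

```cpp
#include <cassert>

// Drain states in the order Execute steps through them.
enum class DrainState
{
    NotDraining,        // normal execution
    DrainCurrentInst,   // finish the current instruction's micro-ops
    DrainHaltFetch,     // stop fetching new instructions
    DrainAllInsts       // discard all instructions presented
};

// Advance one step towards the fully drained state; DrainAllInsts is
// terminal until the whole model has drained.
DrainState stepDrain(DrainState state)
{
    switch (state) {
      case DrainState::NotDraining:      return DrainState::DrainCurrentInst;
      case DrainState::DrainCurrentInst: return DrainState::DrainHaltFetch;
      case DrainState::DrainHaltFetch:   return DrainState::DrainAllInsts;
      case DrainState::DrainAllInsts:    return DrainState::DrainAllInsts;
    }
    return state;
}
```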
When complete, a drained Execute unit will be in the Execute::DrainAllInsts
state where it will continue to discard instructions but has no knowledge of
the drained state of the rest of the model.

\section debug Debug options

The model provides a number of debug flags which can be passed to gem5 with
the --debug-flags option.

The available flags are:

<table>
<tr>
<td><b>Debug flag</b></td>
<td><b>Unit which will generate debugging output</b></td>
</tr>
<tr>
<td>Activity</td>
<td>Debug ActivityMonitor actions</td>
</tr>
<tr>
<td>Branch</td>
<td>Fetch2 and Execute branch prediction decisions</td>
</tr>
<tr>
<td>MinorCPU</td>
<td>CPU global actions such as wakeup/thread suspension</td>
</tr>
<tr>
<td>Decode</td>
<td>Decode</td>
</tr>
<tr>
<td>MinorExec</td>
<td>Execute behaviour</td>
</tr>
<tr>
<td>Fetch</td>
<td>Fetch1 and Fetch2</td>
</tr>
<tr>
<td>MinorInterrupt</td>
<td>Execute interrupt handling</td>
</tr>
<tr>
<td>MinorMem</td>
<td>Execute memory interactions</td>
</tr>
<tr>
<td>MinorScoreboard</td>
<td>Execute scoreboard activity</td>
</tr>
<tr>
<td>MinorTrace</td>
<td>Generate MinorTrace cyclic state trace output (see below)</td>
</tr>
<tr>
<td>MinorTiming</td>
<td>MinorTiming instruction timing modification operations</td>
</tr>
</table>

The group flag Minor enables all of the flags beginning with Minor.

\section trace MinorTrace and minorview.py

The debug flag MinorTrace causes cycle-by-cycle state data to be printed which
can then be processed and viewed by the minorview.py tool. This output is
very verbose and so it is recommended that it only be used for small examples.

\subsection traceformat MinorTrace format

There are three types of line output by MinorTrace:

\subsubsection state MinorTrace - Ticked unit cycle state

For example:

\verbatim
110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
\endverbatim

For each time step, the MinorTrace flag will cause one MinorTrace line to be
printed for every named element in the model.

\subsubsection traceunit MinorInst - summaries of instructions issued by \
Decode

For example:

\verbatim
140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \
inst=" mov r0, #0" class=IntAlu
\endverbatim

MinorInst lines are currently only generated for instructions which are
committed.

\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by \
Fetch1

For example:

\verbatim
92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \
vaddr=0x5c paddr=0x5c
\endverbatim

\subsection minorview minorview.py

Minorview (util/minorview.py) can be used to visualise the data created by
MinorTrace.

\verbatim
usage: minorview.py [-h] [--picture picture-file] [--prefix name]
                    [--start-time time] [--end-time time] [--mini-views]
                    event-file

Minor visualiser

positional arguments:
  event-file

optional arguments:
  -h, --help            show this help message and exit
  --picture picture-file
                        markup file containing blob information (default:
                        <minorview-path>/minor.pic)
  --prefix name         name prefix in trace for CPU to be visualised
                        (default: system.cpu)
  --start-time time     time of first event to load from file
  --end-time time       time of last event to load from file
  --mini-views          show tiny views of the next 10 time steps
\endverbatim

Raw debugging output can be passed to minorview.py as the event-file. It will
pick out the MinorTrace lines; other lines in which units in the simulation
are named (such as system.cpu.dcachePort in the above example) will appear as
'comments' when units are clicked on in the visualiser.

Clicking on a unit which contains instructions or lines will bring up a speech
bubble giving extra information derived from the MinorInst/MinorLine lines.

--start-time and --end-time allow only sections of debug files to be loaded.

--prefix allows the name prefix of the CPU to be inspected to be supplied.
This defaults to 'system.cpu'.

In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.

The diagonally striped coloured blocks show the InstId of the
instruction or line they represent. Note that lines in Fetch1 and f1ToF2.F
only show the id fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers. The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display. Useful combinations are:

<table>
<tr>
<td><b>Combination</b></td>
<td><b>Reason</b></td>
</tr>
<tr>
<td>E</td>
<td>just show the final execute sequence number</td>
</tr>
<tr>
<td>F/E</td>
<td>show the instruction-related numbers</td>
</tr>
<tr>
<td>S/P</td>
<td>show just the stream-related numbers (watch the stream sequence
change with branches and not change with predicted branches)</td>
</tr>
<tr>
<td>S/E</td>
<td>show instructions and their stream</td>
</tr>
</table>

The key to the right shows all the displayable colours (some of the colour
choices are quite bad!):

<table>
<tr>
<td><b>Symbol</b></td>
<td><b>Meaning</b></td>
</tr>
<tr>
<td>U</td>
<td>Unknown data</td>
</tr>
<tr>
<td>B</td>
<td>Blocked stage</td>
</tr>
<tr>
<td>-</td>
<td>Bubble</td>
</tr>
<tr>
<td>E</td>
<td>Empty queue slot</td>
</tr>
<tr>
<td>R</td>
<td>Reserved queue slot</td>
</tr>
<tr>
<td>F</td>
<td>Fault</td>
</tr>
<tr>
<td>r</td>
<td>Read (used as the leftmost stripe on data in the dcachePort)</td>
</tr>
<tr>
<td>w</td>
<td>Write (used as the leftmost stripe on data in the dcachePort)</td>
</tr>
<tr>
<td>0 to 9</td>
<td>last decimal digit of the corresponding data</td>
</tr>
</table>

\verbatim

,---------------. .--------------. *U
| |=|->|=|->|=| | ||=|||->||->|| | *- <- Fetch queues/LSQ
`---------------' `--------------' *R
=== ====== *w <- Activity/Stage activity
,--------------. *1
,--. ,. ,. | ============ | *3 <- Scoreboard
| |-\[]-\||-\[]-\||-\[]-\| ============ | *5 <- Execute::inFlightInsts
| | :[] :||-/[]-/||-/[]-/| -. -------- | *7
| |-/[]-/|| ^ || | | --------- | *9
| | || | || | | ------ |
[]->| | ->|| | || | | ---- |
| |<-[]<-||<-+-<-||<-[]<-| | ------ |->[] <- Execute to Fetch1,
'--` `' ^ `' | -' ------ | Fetch2 branch data
---. | ---. `--------------'
---' | ---' ^ ^
| ^ | `------------ Execute
MinorBuffer ----' input `-------------------- Execute input buffer
buffer
\endverbatim

Stages show the colours of the instructions currently being
generated/processed.

Forward FIFOs between stages show the data being pushed into them at the
current tick (to the left), the data in transit, and the data available at
their outputs (to the right).

The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.

In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked. Each FIFO
has, therefore, an extra slot to show the asserted new input data, and all the
data currently within the FIFO.

Input buffers for each stage are shown below the corresponding stage and show
the contents of those buffers as horizontal strips. Strips marked as reserved
(cyan by default) are reserved to be filled by the previous stage. An input
buffer with all reserved or occupied slots will, therefore, block the previous
stage from generating output.

Fetch queues and the LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours of the top of their frames.

Inside Execute, the horizontal bars represent the individual FU pipelines.
The vertical bar to the left is the input buffer and the bar to the right, the
instructions committed this cycle. The background of Execute shows
instructions which are being committed this cycle in their original FU
pipeline positions.

The strip at the top of the Execute block shows the current streamSeqNum that
Execute is committing. A similar stripe at the top of Fetch1 shows that
stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
issuing predictionSeqNum.

The scoreboard shows the number of instructions in flight which will commit a
result to the register in the position shown. The scoreboard contains slots
for each integer and floating point register.

The Execute::inFlightInsts queue shows all the instructions in flight in
Execute with the oldest instruction (the next instruction to be committed) to
the right.

'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.

\subsection picformat minor.pic format

The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser. Its format is described in the supplied
minor.pic file.

*/

}