From 5a9b973e92b1fa61b91212a8e37681a5e65f5ebc Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Sat, 31 Aug 2019 09:48:33 +0100
Subject: [PATCH] state clarification on VBLOCK

---
 simple_v_extension/vblock_format.mdwn | 68 ++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 18 deletions(-)

diff --git a/simple_v_extension/vblock_format.mdwn b/simple_v_extension/vblock_format.mdwn
index 5def4fb1f..9ddbf4a1b 100644
--- a/simple_v_extension/vblock_format.mdwn
+++ b/simple_v_extension/vblock_format.mdwn
@@ -51,13 +51,20 @@ in a single instruction.
 
 # VBLOCK Prefix
 
-The purpose of the VBLOCK Prefix is to specify the context in which a block of RV Scalar instructions are "vectorised" and/or predicated.
+The purpose of the VBLOCK Prefix is to specify the context in which a
+block of RV Scalar instructions are "vectorised" and/or predicated.
 
-As there are not very many bits available without going into a prefix format longer than 16 bits, some abbreviations are used. Two bits are dedicated to specifying whether the Register and Predicate formats are 16 or 8 bit.
+As there are not very many bits available without going into a prefix
+format longer than 16 bits, some abbreviations are used. Two bits are
+dedicated to specifying whether the Register and Predicate formats are
+16 or 8 bit.
 
-Also, the number of entries in each table is specified with an unusual encoding, on the basis that if registers are to be Vectorised, it is highly likely that they will be predicated as well.
+Also, the number of entries in each table is specified with an unusual
+encoding, on the basis that if registers are to be Vectorised, it is
+highly likely that they will be predicated as well.
 
-The VL Block is optional and also only 16 bits: this because an RVC opcode is limited by comparison.
+The VL Block is optional and also only 16 bits: this is because an RVC
+opcode is limited by comparison.
 
 The format is explained as follows:
 
The format is explained as follows: @@ -189,10 +196,10 @@ trap to occur even part-way through decode, in order to reduce latency. The format is as follows: -| 31:30 | 29 | 28:26 | 25:24 | 23:22 | 21 | 20:5 | 4:0 | -|--------|-------|-------|-------|-------|------|-------|-------| +| 31:30 | 29 | 28:26 | 25:24 | 23:22 | 21 | 20:5 | 4:0 | +|--------|-------|-------|-------|-------|------|---------|-------| | status | vlset | 16xil | pplen | rplen | mode | vblock2 | opptr | -| 2 | 1 | 3 | 2 | 2 | 1 | 16 | 5 | +| 2 | 1 | 3 | 2 | 2 | 1 | 16 | 5 | * status is the key field that effectively exposes the inner FSM (Finite State Machine) directly. @@ -200,14 +207,14 @@ The format is as follows: is instead in standard RV Scalar opcode execution mode. The processor will leave this mode only after it encounters the beginning of a valid VBLOCK opcode. -* status = 0b01 indicates that vlset, 16xil, pplen, rplen and mode have +* status=0b01 indicates that vlset, 16xil, pplen, rplen and mode have all been copied directly from the VBLOCK so that they do not need to be - read again from the instruction stream, - and that VBLOCK2 has also been read and stored, if 16xil was - equal to 0b111. -* status=0b10 indicates that the VL Block has been read from the - instruction stream and actioned. - (This means that a SETVL instruction has been created and executed). + read again from the instruction stream, and that VBLOCK2 has also been + read and stored, if 16xil was equal to 0b111. +* status=0b10 indicates that the VL Block has been read from the instruction + stream and actioned. (This means that a SETVL instruction has been + created and executed). It also indicates that reading of the + Predicate, Register and Swizzle Blocks are now being read. 
 * status=0b11 indicates that the Predicate and Register Blocks have been
   read from the instruction stream (and put into internal Vector Context)
   Simpler implementations are permitted to reset status back to 0b10 and
@@ -226,15 +233,40 @@ The format is as follows:
   and Register Context destroyed (Note: the STATE CSR is **not** altered
   purely by exit from a VBLOCK Context).
 
+During the transition from status=0b00 to status=0b01, it is assumed
+that the instruction stream is being read at a minimum of 32 bits at
+a time. Therefore it is reasonable to expect that VBLOCK2 would be
+successfully read simultaneously with the initial VBLOCK header.
+For this reason there is no separate state in the FSM for updating
+of the vblock2 field in PCVBLK.
+
+When the transition from status=0b01 to status=0b10 occurs, actioning the
+VL Block state *actually* and literally **must** be as if a SETVL instruction
+had occurred. This can result in updating of the VL and MVL CSRs (and
+the VL destination register target). Note, below, that this means that
+a context-switch may save/restore VL and MVL (and the integer register file),
+where the remaining tables have no such opportunity.
+
+When status=0b10, and before status=0b11, there is no external indicator
+as to how far the hardware has got in the process of reading the
+Predicate, Register, and Swizzle Blocks. Implementations are free to use
+any internal means to track progress; however, given that if a trap occurs
+the read process will need to be restarted (in simpler implementations),
+there is no point having external indicators of progress. By complete
+contrast, given that a SETVL actually writes to VL (and MVL), the VL
+Block state *has* been actioned and thus would be successfully restored
+by a context-switch.
+
 When status=0b11, opptr may be written to using CSRRWI. Doing so will
 cause execution to jump within the block, exactly as if PC had been set
-in normal RISC-V eexecution. Writing a value outside of the range of the
+in normal RISC-V execution. Writing a value outside of the range of the
 instruction block will cause an illegal instruction exception. Writing
 a value (any value) when status is not 0b11 likewise causes an illegal
-instruction exception.
+instruction exception. To be clear: CSRRWI PCVBLK does **not** have the same
+behaviour as CSRRW PCVBLK.
 
-In privileged modes, obviously the above rules do not apply to the
-completely seoarate (x)ePCVBLK CSRs because these are copies of state,
+In privileged modes, obviously the above rules do not apply to the completely
+separate (x)ePCVBLK CSRs because these are (inactive) *copies* of state,
 not the actual active PCVBLK. Writing to PCVBLK during a trap however,
 clearly the rules must apply.
 
-- 
2.30.2
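For readers implementing the PCVBLK behaviour described in the patch above, the bit layout from the format table and the CSRRWI rule for opptr can be modelled in a few lines. The following is a minimal, non-normative Python sketch under stated assumptions: PCVBLK is treated as a plain 32-bit integer, the `16xil` field is renamed `xil16` so that it is a valid identifier, and the helper names (`decode_pcvblk`, `encode_pcvblk`, `csrrwi_opptr`), the `IllegalInstruction` exception and the `block_len` bound used for "outside of the range of the instruction block" are all hypothetical. Following the text, it assumes that CSRRWI replaces only the opptr field and leaves every other field untouched.

```python
"""Non-normative sketch of the PCVBLK CSR layout and the CSRRWI opptr rule."""

# (lsb, width) for each PCVBLK field, taken from the format table:
# status=31:30, vlset=29, 16xil=28:26, pplen=25:24, rplen=23:22,
# mode=21, vblock2=20:5, opptr=4:0.
FIELDS = {
    "status": (30, 2),
    "vlset": (29, 1),
    "xil16": (26, 3),  # "16xil" in the table, renamed to be a valid identifier
    "pplen": (24, 2),
    "rplen": (22, 2),
    "mode": (21, 1),
    "vblock2": (5, 16),
    "opptr": (0, 5),
}

STATUS_CONTEXT_READY = 0b11  # Predicate and Register Blocks have been read


class IllegalInstruction(Exception):
    """Hypothetical stand-in for raising an illegal instruction exception."""


def decode_pcvblk(value: int) -> dict:
    """Split a 32-bit PCVBLK value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FIELDS.items()}


def encode_pcvblk(**fields: int) -> int:
    """Pack named fields into a PCVBLK value; unspecified fields are zero."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name}={field_value} does not fit in {width} bits")
        value |= field_value << lsb
    return value


def csrrwi_opptr(pcvblk: int, uimm: int, block_len: int) -> int:
    """Return the new PCVBLK value after a CSRRWI write of opptr.

    block_len (hypothetical) is the number of valid opptr positions in the
    current block; writing opptr >= block_len counts as out of range.
    """
    if not 0 <= uimm < 32:
        raise ValueError("CSRRWI can only encode a 5-bit immediate")
    if decode_pcvblk(pcvblk)["status"] != STATUS_CONTEXT_READY:
        raise IllegalInstruction("PCVBLK written while status != 0b11")
    if uimm >= block_len:
        raise IllegalInstruction("opptr outside the instruction block")
    lsb, width = FIELDS["opptr"]
    mask = ((1 << width) - 1) << lsb
    # Only opptr changes; status, vblock2 and the other fields are preserved.
    return (pcvblk & ~mask) | (uimm << lsb)


if __name__ == "__main__":
    word = encode_pcvblk(status=0b11, vlset=1, xil16=0b111,
                         vblock2=0xBEEF, opptr=3)
    word = csrrwi_opptr(word, 7, block_len=12)
    fields = decode_pcvblk(word)
    print(fields["opptr"], hex(fields["vblock2"]))  # prints: 7 0xbeef
    try:
        csrrwi_opptr(encode_pcvblk(status=0b10), 1, block_len=12)
    except IllegalInstruction as exc:
        print("trap:", exc)  # prints: trap: PCVBLK written while status != 0b11
```

opptr is 5 bits wide, which matches the 5-bit zero-extended immediate (`uimm`) that CSRRWI can carry, so any opptr value is reachable from a single CSRRWI. The status check is done before the range check here; the text does not say which exception cause takes priority, so that ordering is also an assumption.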