From: Luke Kenneth Casson Leighton Date: Sat, 1 Apr 2023 12:15:10 +0000 (+0100) Subject: whitespace X-Git-Tag: opf_rfc_ls012_v1~201 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=82a6d4b3c08fa679e93bce057a837f5d8bd967d4;p=libreriscv.git whitespace --- diff --git a/openpower/sv/rfc/ls010.mdwn b/openpower/sv/rfc/ls010.mdwn index 3f64df68e..6480051e8 100644 --- a/openpower/sv/rfc/ls010.mdwn +++ b/openpower/sv/rfc/ls010.mdwn @@ -21,12 +21,12 @@ Links: # Introduction Simple-V is a type of Vectorisation best described as a "Prefix Loop -Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR` instruction and to the 8086 `REP` -Prefix instruction. More advanced features are similar to the Z80 -`CPIR` instruction. If viewed one-dimensionally as an actual Vector ISA it introduces -over 1.5 million 64-bit Vector instructions. SVP64, the instruction -format used by Simple-V, is therefore best viewed as an orthogonal RISC-paradigm "Prefixing" -subsystem instead. +Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR` instruction and +to the 8086 `REP` Prefix instruction. More advanced features are similar +to the Z80 `CPIR` instruction. If viewed one-dimensionally as an actual +Vector ISA it introduces over 1.5 million 64-bit Vector instructions. +SVP64, the instruction format used by Simple-V, is therefore best viewed +as an orthogonal RISC-paradigm "Prefixing" subsystem instead. Except where explicitly stated all bit numbers remain as in the rest of the Power ISA: in MSB0 form (the bits are numbered from 0 at the MSB on @@ -104,10 +104,10 @@ because there is only one `MSR`. ## Register files, elements, and Element-width Overrides -In the Upper Compliancy Levels of SVP64 the size of the GPR and FPR Register -files are expanded from 32 to 128 entries, and the number of CR Fields -expanded from CR0-CR7 to CR0-CR127. (Note: A future version of SVP64 is anticipated -to extend the VSR register file). +In the Upper Compliancy Levels of SVP64 the size of the GPR and FPR +Register files are expanded from 32 to 128 entries, and the number of +CR Fields expanded from CR0-CR7 to CR0-CR127. (Note: A future version +of SVP64 is anticipated to extend the VSR register file). Memory access remains exactly the same: the effects of `MSR.LE` remain exactly the same, affecting as they already do and remain **only** @@ -262,22 +262,22 @@ order to store the new CR Fields, CR8-CR15, CR16-CR23 etc. sequentially. The top 32 MSBs in each new SVP64 Condition Register are *also* not used: only the bottom 32 bits (numbered 32:63 in MSB0 numbering). -*Programmer's note: using `sv.mfcr` without element-width overrides to take -into account the fact that the top 32 MSBs are zero and thus effectively -doubling the number of GPR registers required to hold all 128 CR Fields -would seem the only option because normally elwidth overrides would -halve the capacity of the instruction. However in this case it is -possible to use destination element-width overrides (for `sv.mfcr`. -source overrides would be used on the GPR of `sv.mtocrf`), -whereupon truncation -of the 64-bit Condition Register(s) occurs, throwing away the zeros and -storing the remaining (valid, desired) 32-bit values sequentially into -(LSB0-convention) lower-numbered and upper-numbered halves of GPRs respectively. -The programmer is expected to be aware however that the full width of -the entire 64-bit Condition Register is considered to be "an element". -This is **not** like any other Condition-Register instructions because -all other CR instructions, on closer investigation, will be observed -to all be CR-bit or CR-Field related. Thus a `VL` of 16 must be used* +*Programmer's note: using `sv.mfcr` without element-width overrides +to take into account the fact that the top 32 MSBs are zero and thus +effectively doubling the number of GPR registers required to hold all 128 +CR Fields would seem the only option because normally elwidth overrides +would halve the capacity of the instruction. However in this case it +is possible to use destination element-width overrides (for `sv.mfcr`. +source overrides would be used on the GPR of `sv.mtocrf`), whereupon +truncation of the 64-bit Condition Register(s) occurs, throwing away +the zeros and storing the remaining (valid, desired) 32-bit values +sequentially into (LSB0-convention) lower-numbered and upper-numbered +halves of GPRs respectively. The programmer is expected to be aware +however that the full width of the entire 64-bit Condition Register +is considered to be "an element". This is **not** like any other +Condition-Register instructions because all other CR instructions, +on closer investigation, will be observed to all be CR-bit or CR-Field +related. Thus a `VL` of 16 must be used* ## Future expansion. @@ -301,8 +301,8 @@ scope for this version of SVP64. # New 64-bit Instruction Encoding spaces -The following seven new areas are defined within Primary Opcode 9 (EXT009) as a -new 64-bit encoding space, alongside EXT1xx. +The following seven new areas are defined within Primary Opcode 9 (EXT009) +as a new 64-bit encoding space, alongside EXT1xx. | 0-5 | 6 | 7 | 8-31 | 32| Description | |-----|---|---|-------|---|------------------------------------| @@ -314,12 +314,11 @@ new 64-bit encoding space, alongside EXT1xx. | PO | 1 | 0 | !zero | n | SVP64Single:EXT000-063 or `RESERVED4` | | PO | 1 | 1 | nnnn | n | SVP64:EXT000-063 | -Note that for the future SVP64Single Encoding (currently RESERVED3 and 4) it -is prohibited to have bits 8-31 be zero, unlike for SVP64 Vector space, -for which bits 8-31 -can be zero (termed `scalar identity behaviour`). This -prohibition allows SVP64Single to share its -Encoding space with Scalar Ext232-263 and Scalar EXT300-363. +Note that for the future SVP64Single Encoding (currently RESERVED3 and 4) +it is prohibited to have bits 8-31 be zero, unlike for SVP64 Vector space, +for which bits 8-31 can be zero (termed `scalar identity behaviour`). This +prohibition allows SVP64Single to share its Encoding space with Scalar +Ext232-263 and Scalar EXT300-363. Also that RESERVED1 and 2 are candidates for future Major opcode areas EXT200-231 and EXT300-363 respectively, however as RESERVED areas @@ -355,10 +354,10 @@ Prefixed with SVP64Single. # Remapped Encoding (`RM[0:23]`) -In the SVP64 Vector Prefix spaces, the 24 bits 8-31 are termed `RM`. Bits 32-37 are -the Primary Opcode of the Suffix "Defined Word". 38-63 are the remainder of the -Defined Word. Note that the new EXT232-263 SVP64 area it is obviously mandatory -that bit 32 is required to be set to 1. +In the SVP64 Vector Prefix spaces, the 24 bits 8-31 are termed `RM`. Bits +32-37 are the Primary Opcode of the Suffix "Defined Word". 38-63 are the +remainder of the Defined Word. Note that the new EXT232-263 SVP64 area +it is obviously mandatory that bit 32 is required to be set to 1. | 0-5 | 6 | 7 | 8-31 | 32-37 | 38-64 |Description | |-----|---|---|----------|--------|----------|-----------------------| @@ -366,13 +365,14 @@ that bit 32 is required to be set to 1. | PO | 1 | 1 | RM[0:23] | nnnnnn | xxxxxxxx | SVP64:EXT000-063 | It is important to note that unlike v3.1 64-bit prefixed instructions -there is insufficient space in `RM` to provide identification of any SVP64 -Fields without first partially decoding the 32-bit suffix. Similar to -the "Forms" (X-Form, D-Form) the `RM` format is individually associated -with every instruction. However this still does not adversely affect Multi-Issue -Decoding because the identification of the *length* of anything in the -64-bit space has been kept brutally simple (EXT009), and further decoding -of any number of 64-bit Encodings in parallel at that point is fully independent. +there is insufficient space in `RM` to provide identification of +any SVP64 Fields without first partially decoding the 32-bit suffix. +Similar to the "Forms" (X-Form, D-Form) the `RM` format is individually +associated with every instruction. However this still does not adversely +affect Multi-Issue Decoding because the identification of the *length* +of anything in the 64-bit space has been kept brutally simple (EXT009), +and further decoding of any number of 64-bit Encodings in parallel at +that point is fully independent. Extreme caution and care must be taken when extending SVP64 in future, to not create unnecessary relationships between prefix and @@ -468,14 +468,15 @@ Note: [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) is reserved for a future implementation of SV -Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) shall -perform its operation at **half** the ELWIDTH then padded back out -to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation that is then "padded" to fill out to an IEEE754 FP32. When ELWIDTH=DEFAULT +Note that any IEEE754 FP operation in Power ISA ending in "s" (`fadds`) +shall perform its operation at **half** the ELWIDTH then padded back out +to ELWIDTH. `sv.fadds/ew=f32` shall perform an IEEE754 FP16 operation +that is then "padded" to fill out to an IEEE754 FP32. When ELWIDTH=DEFAULT clearly the behaviour of `sv.fadds` is performed at 32-bit accuracy then padded back out to fit in IEEE754 FP64, exactly as for Scalar -v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16 -or ELWIDTH=bf16 is reserved and must raise an illegal instruction -(IEEE754 FP8 or BF8 are not defined). +v3.0B "single" FP. Any FP operation ending in "s" where ELWIDTH=f16 or +ELWIDTH=bf16 is reserved and must raise an illegal instruction (IEEE754 +FP8 or BF8 are not defined). ### Elwidth for CRs (no meaning) @@ -991,19 +992,20 @@ When N=1 the same occurs except that the result is saturated to the min or max of a signed result, and for FP to the min and max value rather than returning +/- INF. -When Rc=1, the CR "overflow" bit is set on the CR associated with the -element, to indicate whether saturation occurred. Note that due to -the hugely detrimental effect it has on parallel processing, XER.SO is -**ignored** completely and is **not** brought into play here. The CR -overflow bit is therefore simply set to zero if saturation did not occur, -and to one if it did. This behaviour (ignoring XER.SO) is actually optional in -the SFFS Compliancy Subset: for SVP64 it is made mandatory *but only on -Vectorised instructions*. +When Rc=1, the CR "overflow" bit is set on the CR associated with +the element, to indicate whether saturation occurred. Note that +due to the hugely detrimental effect it has on parallel processing, +XER.SO is **ignored** completely and is **not** brought into play here. +The CR overflow bit is therefore simply set to zero if saturation did +not occur, and to one if it did. This behaviour (ignoring XER.SO) is +actually optional in the SFFS Compliancy Subset: for SVP64 it is made +mandatory *but only on Vectorised instructions*. Note also that saturate on operations that set OE=1 must raise an Illegal -Instruction due to the conflicting use of the CR.so bit for storing if -saturation occurred. Vectorised Integer Operations that produce a Carry-Out (CA, -CA32): these two bits will be `UNDEFINED` if saturation is also requested. +Instruction due to the conflicting use of the CR.so bit for storing +if saturation occurred. Vectorised Integer Operations that produce a +Carry-Out (CA, CA32): these two bits will be `UNDEFINED` if saturation +is also requested. Note that the operation takes place at the maximum bitwidth (max of src and dest elwidth) and that truncation occurs to the range of the @@ -1097,16 +1099,16 @@ not so limited. Thus it is possible to use for example `sv.cror/ff=gt/vli *0,*0,*0`, which is not a `nop` because it allows Fail-First Mode to perform a test and truncate VL.* -*Hardware implementor's note: effective Sequential Program Order must be preserved. -Speculative Execution is perfectly permitted as long as the speculative elements -are held back from writing to register files (kept in Resevation Stations), -until such time as the relevant -CR Field bit(s) has been analysed. All Speculative elements sequentially beyond the -test-failure point **MUST** be cancelled. This is no different from standard -Out-of-Order Execution and the modification effort to efficiently support -Data-Dependent Fail-First within a pre-existing Multi-Issue Out-of-Order Engine -is anticipated to be minimal. In-Order systems on the other hand are expected, -unavoidably, to be low-performance*. +*Hardware implementor's note: effective Sequential Program Order must +be preserved. Speculative Execution is perfectly permitted as long as +the speculative elements are held back from writing to register files +(kept in Resevation Stations), until such time as the relevant CR Field +bit(s) has been analysed. All Speculative elements sequentially beyond +the test-failure point **MUST** be cancelled. This is no different from +standard Out-of-Order Execution and the modification effort to efficiently +support Data-Dependent Fail-First within a pre-existing Multi-Issue +Out-of-Order Engine is anticipated to be minimal. In-Order systems on +the other hand are expected, unavoidably, to be low-performance*. Two extremely important aspects of ffirst are: @@ -1215,11 +1217,12 @@ Memory infrastructure (and the ISA itself) correspondingly needs Vector Memory Operations as well. Vectorised Load and Store also presents an extra dimension (literally) -which creates scenarios unique to Vector applications, that a Scalar -(and even a SIMD) ISA simply never encounters. SVP64 endeavours to add -the modes typically found in *all* Scalable Vector ISAs, without changing -the behaviour of the underlying Base (Scalar) v3.0B operations in any way. -(The sole apparent exception is Post-Increment Mode on LD/ST-update instructions) +which creates scenarios unique to Vector applications, that a Scalar (and +even a SIMD) ISA simply never encounters. SVP64 endeavours to add the +modes typically found in *all* Scalable Vector ISAs, without changing the +behaviour of the underlying Base (Scalar) v3.0B operations in any way. +(The sole apparent exception is Post-Increment Mode on LD/ST-update +instructions) ## Modes overview