From: Luke Kenneth Casson Leighton
Date: Mon, 29 May 2023 12:37:42 +0000 (+0100)
Subject: replace "Defined word" with "Defined Word-instruction"
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=337dbea2b80;p=libreriscv.git

replace "Defined word" with "Defined Word-instruction"
---
diff --git a/openpower/sv/cr_ops.mdwn b/openpower/sv/cr_ops.mdwn
index ddb54d787..d4e81fc46 100644
--- a/openpower/sv/cr_ops.mdwn
+++ b/openpower/sv/cr_ops.mdwn
@@ -44,7 +44,7 @@ considered to be a "co-result". Such CR Field "co-result" arithmeric operations are firmly out of scope for this section, being covered fully by [[sv/normal]].
-* Examples of Vectorizeable Defined Words to which this section does
+* Examples of Vectorizeable Defined Word-instructions to which this section does
apply is - `mfcr` and `cmpi` (3 bit operands) and - `crnor` and `crand` (5 bit operands).
diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index cbebc0032..04f6de582 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -1,6 +1,6 @@
# SV Load and Store
-This section describes how Standard Load/Store Defined Words are exploited as
+This section describes how Standard Load/Store Defined Word-instructions are exploited as
Element-level Load/Stores and augmented to create direct equivalents of Vector Load/Store instructions.
@@ -63,7 +63,7 @@ and are a critical part of its value*. Also included in SVP64 LD/ST is Element-width overrides and Twin-Predication. Note also that Indexed [[sv/remap]] mode may be applied to both Scalar
-LD/ST Immediate Defined Words *and* LD/ST Indexed Defined Words.
+LD/ST Immediate Defined Word-instructions *and* LD/ST Indexed Defined Word-instructions.
LD/ST-Indexed should not be conflated with Indexed REMAP mode: clarification is provided below.
@@ -524,7 +524,7 @@ zero in the predicate will be the NULL pointer* RA=0 # vec - first one is valid, contains ptr imm = 8 # offset_of(ptr->next) for i in range(VL):
- # this part is the Scalar Defined Word (standard scalar ld operation)
+ # this part is the Scalar Defined Word-instruction (standard scalar ld operation)
EA = GPR(RA+i) + imm # ptr + offset(next) data = MEM(EA, 8) # 64-bit address of ptr->next # was a normal vector-ld up to this point. now the Data-Fail-First
@@ -566,7 +566,7 @@ Vertical-First Mode. Although Rc=1 on LD/ST is a rare occurrence at present, future versions of Power ISA *might* conceivably have Rc=1 LD/ST Scalar instructions, and with the SVP64 Vectorization Prefixing being itself a RISC-paradigm that
-is itself fully-independent of the Scalar Suffix Defined Words, prohibiting
+is itself fully-independent of the Scalar Suffix Defined Word-instructions, prohibiting
the possibility of Rc=1 Data-Dependent Mode on future potential LD/ST operations is not strategically sound.
@@ -747,7 +747,7 @@ REMAP will need to be used. **Parallel Reduction REMAP** No REMAP Schedule is prohibited in SVP64 because the RISC-paradigm Prefix
-is completely separate from the RISC-paradigm Scalar Defined Words. Although
+is completely separate from the RISC-paradigm Scalar Defined Word-instructions. Although
obscure there does exist the outside possibility that a potential use for Parallel Reduction Schedules on LD/ST would find a use in Computer Science.
Readers are invited to contact the authors of this document if one is ever
diff --git a/openpower/sv/po9_encoding.mdwn b/openpower/sv/po9_encoding.mdwn
index f0489c359..d832f5065 100644
--- a/openpower/sv/po9_encoding.mdwn
+++ b/openpower/sv/po9_encoding.mdwn
@@ -7,7 +7,7 @@ In its simplest form, the Simple-V Loop/Vector concept is a Prefixing system (similar to the 8086 `REP` instruction and the Z80 `LDIR`) that both augments its
-following Defined Word Suffix, and also may repeat that instruction
+following Defined Word-instruction Suffix, and also may repeat that instruction
with optional sequential register offsets from those given in the Suffix. Register numbers may also be extended (larger register files). More advanced features add predication, element-width overrides, and
@@ -17,9 +17,9 @@ Vertical-First Mode. SVP64 is a well-defined implementation of the Simple-V Loop/Vector concept, in a 32-bit Prefix format, that exploits the following instruction
-(the Defined Word) using it as a "template". It requires 24 bits,
+(the Defined Word-instruction) using it as a "template". It requires 24 bits,
some of which are common to all Suffixes, and some Mode bits are specific
-to the Defined Word class: Load/Store-Immediate, Load/Store-Indexed,
+to the Defined Word-instruction class: Load/Store-Immediate, Load/Store-Indexed,
Arithmetic/Logical, Condition Register operations, and Branch-Conditional. Anything not falling into those five categories is termed "Unvectorizable".
@@ -45,7 +45,7 @@ instruction is a clear priority. **Definition of SVP64Single Prefixing:**
-A 32-bit Prefix in front of a Defined Word that extends register
+A 32-bit Prefix in front of a Defined Word-instruction that extends register
numbers (allows larger register files), adds single-bit predication, element-width overrides, and optionally adds Saturation to Arithmetic instructions that normally would not have it.
*SVP64Single is in Draft only*
@@ -62,7 +62,7 @@ Unvectorizable instructions are required to be detected as such if Prefixed (either SVP64 or SVP64Single) and an Illegal Instruction Trap raised. *Hardware Architectural Note: Given that a "pre-classification" Decode Phase
-is required (identifying whether the Suffix - Defined Word - is
+is required (identifying whether the Suffix - Defined Word-instruction - is
Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional), adding "Unvectorizable" to this phase is not unreasonable.*
@@ -89,7 +89,7 @@ Key:
* **x** - a `RESERVED` encoding. Illegal Instruction Trap must be raised
* **n** - a specification-defined value
* **!zero** - a non-zero specification-defined value
-* **DWd** - when including bit 32 is a "Defined Word" as explained in
+* **DWd** - when including bit 32 is a "Defined Word-instruction" as explained in
Book I Section 1.6 (Public v3.1 p11) Note that for the future SVP64Single Encoding (currently RESERVED3 and 4)
@@ -105,17 +105,17 @@ they may equally be allocated entirely differently. *Architectural Resource Allocation Note: Similar to ARM's `MOVPRFX` instruction and the original x86 REP instruction, despite "influence" over the Suffix, the Suffix is entirely independent of the Prefix. Therefore
-**under no circumstances** must different Defined Words (different from
-the same **Un-Prefixed** Defined Word) be allocated within any `EXT{z}`
+**under no circumstances** must different Defined Word-instructions (different from
+the same **Un-Prefixed** Defined Word-instruction) be allocated within any `EXT{z}`
prefixed or unprefixed space for a given value of `z` of 0, 2 or 3: the results would be catastrophic.
Even if Unvectorizable an instruction
-Defined Word space **must** have the exact same Instruction and exact same
+Defined Word-instruction space **must** have the exact same Instruction and exact same
Instruction Encoding in all spaces being RESERVED (Illegal Instruction Trap if Unvectorizable) or not be allocated at all. This is required as an inviolate hard rule governing Primary Opcode 9 that may not be revoked under any circumstances. A useful way to think of this is that the Prefix Encoding is, like the 8086 REP instruction, an independent
-32-bit Defined Word. The only semi-exceptions are the Post-Increment
+32-bit Defined Word-instruction. The only semi-exceptions are the Post-Increment
Mode of LD/ST-Update and Vectorized Branch-Conditional.* Note a particular consequence of the application of the above paragraph:
@@ -127,8 +127,8 @@ be revoked rescinded removed or recalled), named `SVP64:EXT022` and revoked is if EXT022 itself is revoked. **All and any** re-definitions modifications enhancements clarifications that apply to EXT022 **also apply to these two new areas** because due to the Prefixes being
-independent Defined Words the three areas are actually one and the same
-area, just as *all* Scalar Defined Words are.
+independent Defined Word-instructions the three areas are actually one and the same
+area, just as *all* Scalar Defined Word-instructions are.
Encoding spaces and their potential are illustrated:
@@ -144,7 +144,7 @@ Notes:
* **PO9**-PO1 Prefixed-Prefixed (96-bit) instructions are prohibited. EXT1xx is thus inherently Unvectorizable as the EXT1xx prefix is 32-bit
- on top of an SVP64 prefix which is 32-bit on top of a Defined Word
+ on top of an SVP64 prefix which is 32-bit on top of a Defined Word-instruction
and the complexity at the Decoder becomes too great for High Performance Multi-Issue systems.
* There is however no reason why PO9-PO1 (EXT901?)
as an entirely new RESERVED
diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 9b6a5a82c..545be7ecf 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -329,7 +329,7 @@ Interrupts and exceptions can therefore also be precise. The final result will be in the first non-predicate-masked-out destination element, but due again to the deterministic schedule programmers may find uses for the intermediate
-results, even for non-commutative Defined Word operations.
+results, even for non-commutative Defined Word-instruction operations.
Additionally, because the intermediate results are always written out it is possible to service Precise Interrupts without affecting latency (a common limitation of Vector ISAs implementing explicit
@@ -435,7 +435,7 @@ not work. To achieve Sub-Vector Horizontal Reduction, Pack/Unpack should be enabled, which will turn the Schedule around such that issuing of the Scalar
-Defined Words is done with SUBVL looping as the inner loop not the
+Defined Word-instructions is done with SUBVL looping as the inner loop not the
outer loop. Rc=1 with Sub-Vectors (SUBVL=2,3,4) is `UNDEFINED` behaviour. *Programmer's Note: Overwrite Parallel Reduction with Sub-Vectors
diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index ea0942180..b3c6ebb31 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -649,7 +649,7 @@ With Simple-V being a type of [Zero-Overhead Loop](https://en.m.wikipedia.org/wiki/Zero-overhead_looping) Engine on top of Scalar operations some clear guidelines are needed on how both
-existing "Defined Words" (Public v3.1 Section 1.6.3 term) and future
+existing "Defined Word-instructions" (Public v3.1 Section 1.6.3 term) and future
Scalar operations are added within the 64-bit space. Examples of legal and illegal allocations are given later.
@@ -663,7 +663,7 @@ being added in the Scalar space, and vice-versa, *even if Unvectorizeable*.
This is extremely important because the worst possible situation is if a conflicting Scalar instruction is added by another Stakeholder, which then turns out to be Vectorizeable: it would then have to be
-added to the Vector Space with a *completely different Defined Word*
+added to the Vector Space with a *completely different Defined Word-instruction*
and things go rapidly downhill in the Decode Phase from there. Setting a simple inviolate rule helps avoid this scenario but does need to be borne in mind when discussing potential allocation
@@ -784,7 +784,7 @@ that is as follows:
* EXT009, like EXT001 of Public v3.1, is **defined** as a 64-bit encoding. This makes Multi-Issue Length-identification trivial.
* bit 6 if 0b1 is 100% for Simple-V augmentation of (Public v3.1 1.6.3)
- "Defined Words" (aka EXT000-063), with the exception of 0x26000000
+ "Defined Word-instructions" (aka EXT000-063), with the exception of 0x26000000
as a Prefix, which is a new RESERVED encoding.
* when bit 6 is 0b0 and bits 32-33 are 0b11 are **defined** as also allocated to Simple-V
@@ -833,7 +833,7 @@ and reserved areas, QTY 1of 32-bit, and QTY 3of 55-bit, are: which **must** be granted corresponding space in SVP64.
* Anything Vectorized-EXT000-063 is **automatically** being
- requested as 100% Reserved for every single "Defined Word"
+ requested as 100% Reserved for every single "Defined Word-instruction"
(Public v3.1 1.6.3 definition). Vectorized-EXT001 or EXT009 is defined as illegal.
* Any **future** instruction
@@ -877,16 +877,16 @@ on their merits. **EXT000-EXT063** These are Scalar word-encodings. Often termed "v3.0 Scalar" in this document
-Power ISA v3.1 Section 1.6.3 Book I calls it a "defined word".
+Power ISA v3.1 Section 1.6.3 Book I calls it a "Defined Word-instruction".
| 0-5 | 6-31 |
|--------|--------|
-| PO | EXT000-063 "Defined word" |
+| PO | EXT000-063 "Defined Word-instruction" |
**SVP64Single:{EXT000-063}** bit6=old bit7=scalar This encoding, identical to SVP64Single:{EXT248-263},
-introduces SVP64Single Augmentation of Scalar "defined words".
+introduces SVP64Single Augmentation of Scalar "Defined Word-instructions".
All meanings must be identical to EXT000-063, and is is likewise prohibited to add an instruction in this area without also adding the exact same (non-Augmented) instruction in EXT000-063 with the
@@ -1048,7 +1048,7 @@ EXT300-363. the use of 0x12345678 for fredmv in scalar but fishmv in Vector is illegal. the suffix in both 64-bit locations must be allocated to a Vectorizeable EXT000-063
-"Defined Word" (Public v3.1 Section 1.6.3 definition)
+"Defined Word-instruction" (Public v3.1 Section 1.6.3 definition)
or not at all. \newpage{}
@@ -1070,7 +1070,7 @@ and:
| 64bit | sv.fishmv | 0x25nnnnnn | 0x12345678| vector SVP64:EXT2nn |
Both of these Simple-V operations are illegally-allocated. The fact that
-there does not exist a scalar "Defined Word" (even for EXT200-263) - the
+there does not exist a scalar "Defined Word-instruction" (even for EXT200-263) - the
unallocated block - means that the instruction may **not** be allocated in the Simple-V space.
@@ -1082,7 +1082,7 @@ the Simple-V space.
| 64bit | ss.fishmv | 0x24!zero | 0x10345678| scalar SVP64Single:EXT2nn |
| 64bit | sv.fishmv | 0x25nnnnnn | 0x10345678| vector SVP64:EXT2nn |
-This is an illegal attempt to place an EXT004 "Defined Word"
+This is an illegal attempt to place an EXT004 "Defined Word-instruction"
(Public v3.1 Section 1.6.3) into the EXT2nn Vector space. This is not just illegal it is not even possible to achieve. If attempted, by dropping EXT004 into bits 32-37, the top two
@@ -1341,7 +1341,7 @@ of elements copied (VL), rather than decrementing simply by one.
[^likeext001]: SVP64-Single is remarkably similar to the "bit 1" of EXT001 being set to indicate that the 64-bits is to be allocated in full to a new encoding, but in fact SVP64-single still embeds v3.0 Scalar operations.
[^pseudorewrite]: elwidth overrides does however mean that all SFS / SFFS pseudocode will need rewriting to be in terms of XLEN. This has the indirect side-effect of automatically making a 32-bit Scalar Power ISA Specification possible, as well as a future 128-bit one (Cross-reference: RISC-V RV32 and RV128
[^only2]: reminder that this proposal only needs 75% of two POs for Scalar instructions. The rest of EXT200-263 is for general use.
-[^ext001]: Recall that EXT100 to EXT163 is for Public v3.1 64-bit-augmented Operations prefixed by EXT001, for which, from Section 1.6.3, bit 6 is set to 1. This concept is where the above scheme originated. Section 1.6.3 uses the term "defined word" to refer to pre-existing EXT000-EXT063 32-bit instructions so prefixed to create the new numbering EXT100-EXT163, respectively
+[^ext001]: Recall that EXT100 to EXT163 is for Public v3.1 64-bit-augmented Operations prefixed by EXT001, for which, from Section 1.6.3, bit 6 is set to 1. This concept is where the above scheme originated. Section 1.6.3 uses the term "Defined Word-instruction" to refer to pre-existing EXT000-EXT063 32-bit instructions so prefixed to create the new numbering EXT100-EXT163, respectively
[^futurevsx]: A future version or other Stakeholder *may* wish to drop Simple-V onto VSX: this would be a separate RFC
[^vsx256]: imagine a hypothetical future VSX-256 using the exact same instructions as VSX. the binary incompatibility introducrd would catastrophically **and retroactively** damage existing IBM POWER8,9,10 hardware's reputation and that of Power ISA overall.
[^autovec]: Compiler auto-vectorization for best exploitation of SIMD and Vector ISAs on Scalar programming languages (c, c++) is an Indusstry-wide known-hard decades-long problem.
Cross-reference the number of hand-optimised assembler algorithms.
diff --git a/openpower/sv/rfc/ls001/discussion.mdwn b/openpower/sv/rfc/ls001/discussion.mdwn
index a1e4453a3..38cdc9f94 100644
--- a/openpower/sv/rfc/ls001/discussion.mdwn
+++ b/openpower/sv/rfc/ls001/discussion.mdwn
@@ -49,12 +49,12 @@ Section 1.6.3:
```
Prefix bits 6:7 are used to identify one of four prefix for-
mat types. When bit 6 is set to 0 (prefix types 00 and
-01), the suffix is not a defined word instruction (i.e.,
+01), the suffix is not a Defined Word-instruction instruction (i.e.,
requires the prefix to identify the alternate opcode space the suffix is assigned to as well as additional or extended operand and/or control fields); when bit 6 is set to 1 (prefix types 10 and 11), the prefix is modifying
-the behavior of a defined word instruction in the suffix.
+the behavior of a Defined Word-instruction instruction in the suffix.
```
thus, we have:
diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index 84ed809fb..77db7d476 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -42,11 +42,11 @@ can only go via the External RFC Process. Also be advised and aware that "Libre-SOC" != "RED Semiconductor Ltd". The two are completely **separate** organisations*.
-Worth bearing in mind during evaluation that every "Defined Word" may
-or may not be Vectorizeable, but that every "Defined Word" should have
+Worth bearing in mind during evaluation that every "Defined Word-instruction" may
+or may not be Vectorizeable, but that every "Defined Word-instruction" should have
merits on its own, not just when Vectorized, precisely because the instructions are Scalar.
An example of a borderline
-Vectorizeable Defined Word is `mv.swizzle` which only really becomes
+Vectorizeable Defined Word-instruction is `mv.swizzle` which only really becomes
high-priority for Audio/Video, Vector GPU and HPC Workloads, but has less merit as a Scalar-only operation, yet when SVP64Single-Prefixed can be part of an atomic Compare-and-Swap sequence.
@@ -64,7 +64,7 @@ Good examples here include `bmask`. SVP64 Prefixing - also known by the terms "Zero-Overhead-Loop-Prefixing" as well as "True-Scalable-Vector Prefixing" - also literally brings new
-dimensions to the Power ISA. Thus when adding new Scalar "Defined Words"
+dimensions to the Power ISA. Thus when adding new Scalar "Defined Word-instructions"
it has to unavoidably and simultaneously be taken into consideration their value when Vector-Prefixed, *as well as* SVP64Single-Prefixed.
@@ -148,7 +148,7 @@ standard MAC or FMAC instruction), but Inner is required for Warshall Transitive Closure (on top of a cumulatively-applied max instruction). Excpt for `svstep` which is Vectorizeable the Management Instructions
-themselves are all 32-bit Defined Words (Scalar Operations), so
+themselves are all 32-bit Defined Word-instructions (Scalar Operations), so
PO1-Prefixing is perfectly reasonable. SVP64 Management instructions of which there are only 6 are all 5 or 6 bit XO, meaning that the opcode space they take up in EXT0xx is not alarmingly high for their intrinsic
diff --git a/openpower/sv/rfc/ls014.mdwn b/openpower/sv/rfc/ls014.mdwn
index 82050215c..4990c5bb7 100644
--- a/openpower/sv/rfc/ls014.mdwn
+++ b/openpower/sv/rfc/ls014.mdwn
@@ -70,7 +70,7 @@ Desirable savings in general binary size are achieved.
1. bmask is a synthesis and generalisation of every "TBM" instruction with additional options not found in any other ISA BMI group.
-2. grevluti as a 32-bit Defined Word is capable of generating over a thousand useful
+2.
grevluti as a 32-bit Defined Word-instruction is capable of generating over a thousand useful
regular-patterned 64-bit "magic constants" that otherwise require either a Load or require several instructions to synthesise
3. word halfword byte nibble 2-bit 1-bit reversal at multiple levels are all achieved
diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index 397f54ab6..5c0b3d9ea 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -66,8 +66,8 @@ ranges are inclusive (so `4:6` means bits 4, 5, and 6, in MSB0 order). which is a different convention from that used elsewhere in the Power ISA. The SVP64 prefix always comes before the suffix in PC order and must be
-considered an independent "Defined word-instruction"[^dwi] that augments the behaviour of
-the following instruction (also a Defined word-instruction), but does **not** change the actual Decoding
+considered an independent "Defined Word-instruction"[^dwi] that augments the behaviour of
+the following instruction (also a Defined Word-instruction), but does **not** change the actual Decoding
of that following instruction just because it is Prefixed. Unlike EXT100-163, where the Suffix is considered an entirely new Opcode Space, SVP64-Prefixed instructions **MUST NEVER** be treated or regarded
@@ -99,7 +99,7 @@ as decoding of the Suffix. Multi-Issue Implementations may even Decode multiple 32-bit words in parallel and follow up with a second cycle of joining Prefix and Suffix "after-the-fact". Mixing and overlaying 64-bit Opcode Encodings into the
-{SVP64 24-bit Prefix}{Defined word-instruction} space creates
+{SVP64 24-bit Prefix}{Defined Word-instruction} space creates
a hard dependency that catastrophically damages Multi-Issue Decoding. Therefore it has to be prohibited to accept RFCs which fundamentally violate this hard requirement. Under no circumstances
@@ -157,7 +157,7 @@ RESERVED 32-bit future Opcode spaces. See [[sv/po9_encoding]].
## Definition of "SVP64-Prefix"
A 24-bit RISC-Paradigm Encoding area for Loop-Augmentation of the following
-"Defined word-instruvtion".
+"Defined Word-instruction-instruction".
Used in the context of "An SVP64-Prefixed Defined Word-instruction", as separate and distinct from the 32-bit PO9-Prefix that holds a 24-bit SVP64 Prefix.
@@ -176,7 +176,7 @@ Prefixed (either SVP64 or SVP64Single) and an Illegal Instruction Trap raised. *Architectural Note: Given that a "pre-classification" Decode Phase is
-required (identifying whether the Suffix - Defined Word - is
+required (identifying whether the Suffix - Defined Word-instruction - is
Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional), adding "Unvectorized" to this phase is not unreasonable.*
@@ -743,8 +743,8 @@ of scope for this version of SVP64.
## SVP64 Remapped Encoding (`RM[0:23]`)
In the SVP64 Vector Prefix spaces, the 24 bits 8-31 are termed `RM`. Bits
-32-37 are the Primary Opcode of the Suffix "Defined Word". 38-63 are the
-remainder of the Defined Word. Note that the new EXT232-263 SVP64 area
+32-37 are the Primary Opcode of the Suffix "Defined Word-instruction". 38-63 are the
+remainder of the Defined Word-instruction. Note that the new EXT232-263 SVP64 area
it is obviously mandatory that bit 32 is required to be set to 1.
| 0-5 | 6 | 7 | 8-31 | 32-37 | 38-64 |Description |