operations are firmly out of scope for this section, being covered fully
by [[sv/normal]].
-* Examples of Vectorizeable Defined Words to which this section does
+* Examples of Vectorizeable Defined Word-instructions to which this section does
apply are
- `mfcr` and `cmpi` (3 bit operands) and
- `crnor` and `crand` (5 bit operands).
# SV Load and Store
-This section describes how Standard Load/Store Defined Words are exploited as
+This section describes how Standard Load/Store Defined Word-instructions are exploited as
Element-level Load/Stores and augmented to create direct equivalents of
Vector Load/Store instructions.
Also included in SVP64 LD/ST are Element-width overrides and Twin-Predication.
Note also that Indexed [[sv/remap]] mode may be applied to both Scalar
-LD/ST Immediate Defined Words *and* LD/ST Indexed Defined Words.
+LD/ST Immediate Defined Word-instructions *and* LD/ST Indexed Defined Word-instructions.
LD/ST-Indexed should not be conflated with Indexed REMAP mode:
clarification is provided below.
RA=0 # vec - first one is valid, contains ptr
imm = 8 # offset_of(ptr->next)
for i in range(VL):
- # this part is the Scalar Defined Word (standard scalar ld operation)
+ # this part is the Scalar Defined Word-instruction (standard scalar ld operation)
EA = GPR(RA+i) + imm # ptr + offset(next)
data = MEM(EA, 8) # 64-bit address of ptr->next
# was a normal vector-ld up to this point. now the Data-Fail-First
Although Rc=1 on LD/ST is a rare occurrence at present, future versions
of Power ISA *might* conceivably have Rc=1 LD/ST Scalar instructions, and
with the SVP64 Vectorization Prefixing being itself a RISC-paradigm that
-is itself fully-independent of the Scalar Suffix Defined Words, prohibiting
+is itself fully-independent of the Scalar Suffix Defined Word-instructions, prohibiting
the possibility of Rc=1 Data-Dependent Mode on future potential LD/ST
operations is not strategically sound.
**Parallel Reduction REMAP**
No REMAP Schedule is prohibited in SVP64 because the RISC-paradigm Prefix
-is completely separate from the RISC-paradigm Scalar Defined Words. Although
+is completely separate from the RISC-paradigm Scalar Defined Word-instructions. Although
obscure, there does exist the outside possibility that Parallel Reduction
Schedules on LD/ST would find a use in Computer Science.
Readers are invited to contact the authors of this document if one is ever
In its simplest form, the Simple-V Loop/Vector concept is a Prefixing
system (similar to the 8086 `REP` instruction and the Z80 `LDIR`)
that both augments its
-following Defined Word Suffix, and also may repeat that instruction
+following Defined Word-instruction Suffix, and also may repeat that instruction
with optional sequential register offsets from those given in the
Suffix. Register numbers may also be extended (larger register files).
More advanced features add predication, element-width overrides, and
SVP64 is a well-defined implementation of the Simple-V Loop/Vector concept,
in a 32-bit Prefix format, that exploits the following instruction
-(the Defined Word) using it as a "template". It requires 24 bits,
+(the Defined Word-instruction) using it as a "template". It requires 24 bits,
some of which are common to all Suffixes, and some Mode bits are specific
-to the Defined Word class: Load/Store-Immediate, Load/Store-Indexed,
+to the Defined Word-instruction class: Load/Store-Immediate, Load/Store-Indexed,
Arithmetic/Logical, Condition Register operations, and Branch-Conditional.
Anything not falling into those five categories is termed "Unvectorizable".
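
As an illustration only, the Loop/Vector concept described above (a Prefix that repeats its Suffix with optional sequential register offsets) can be sketched in Python. The names `sv_loop`, `regs` and the `*_vec` flags are hypothetical and not taken from the specification, and the sketch deliberately leaves out predication, element-width overrides and every Mode.

```python
import operator

def sv_loop(op, regs, rt, ra, rb, vl, rt_vec=True, ra_vec=True, rb_vec=True):
    """Repeat a two-operand scalar operation `vl` times.

    An operand marked as a vector has the element index added to its
    register number; an operand marked as scalar keeps the same register
    number on every iteration. (Hypothetical model, not the normative
    pseudocode.)
    """
    for i in range(vl):
        d = rt + (i if rt_vec else 0)   # destination register for element i
        a = ra + (i if ra_vec else 0)   # first source register for element i
        b = rb + (i if rb_vec else 0)   # second source register for element i
        regs[d] = op(regs[a], regs[b])  # the unmodified scalar "template" operation
    return regs

# Usage: a 4-element add of r0..r3 and r8..r11 into r16..r19.
regs = list(range(32))
sv_loop(operator.add, regs, rt=16, ra=0, rb=8, vl=4)
```

The scalar operation itself is passed in unchanged, which is the point of the "template" description: the loop only decides which registers each repetition uses.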
**Definition of SVP64Single Prefixing:**
-A 32-bit Prefix in front of a Defined Word that extends register
+A 32-bit Prefix in front of a Defined Word-instruction that extends register
numbers (allows larger register files), adds single-bit predication,
element-width overrides, and optionally adds Saturation to Arithmetic
instructions that normally would not have it. *SVP64Single is in Draft only*
(either SVP64 or SVP64Single) and an Illegal Instruction Trap raised.
*Hardware Architectural Note: Given that a "pre-classification" Decode Phase
-is required (identifying whether the Suffix - Defined Word - is
+is required (identifying whether the Suffix - Defined Word-instruction - is
Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional), adding
"Unvectorizable" to this phase is not unreasonable.*
* **x** - a `RESERVED` encoding. Illegal Instruction Trap must be raised
* **n** - a specification-defined value
* **!zero** - a non-zero specification-defined value
-* **DWd** - when including bit 32 is a "Defined Word" as explained in
+* **DWd** - when including bit 32 is a "Defined Word-instruction" as explained in
Book I Section 1.6 (Public v3.1 p11)
Note that for the future SVP64Single Encoding (currently RESERVED3 and 4)
*Architectural Resource Allocation Note: Similar to ARM's `MOVPRFX`
instruction and the original x86 REP instruction, despite "influence" over
the Suffix, the Suffix is entirely independent of the Prefix. Therefore
-**under no circumstances** must different Defined Words (different from
-the same **Un-Prefixed** Defined Word) be allocated within any `EXT{z}`
+**under no circumstances** must different Defined Word-instructions (different from
+the same **Un-Prefixed** Defined Word-instruction) be allocated within any `EXT{z}`
prefixed or unprefixed space for a given value of `z` of 0, 2 or 3: the
results would be catastrophic. Even if Unvectorizable an instruction
-Defined Word space **must** have the exact same Instruction and exact same
+Defined Word-instruction space **must** have the exact same Instruction and exact same
Instruction Encoding in all spaces being RESERVED (Illegal Instruction
Trap if Unvectorizable) or not be allocated at all. This is required
as an inviolate hard rule governing Primary Opcode 9 that may not be
revoked under any circumstances. A useful way to think of this is that
the Prefix Encoding is, like the 8086 REP instruction, an independent
-32-bit Defined Word. The only semi-exceptions are the Post-Increment
+32-bit Defined Word-instruction. The only semi-exceptions are the Post-Increment
Mode of LD/ST-Update and Vectorized Branch-Conditional.*
Note a particular consequence of the application of the above paragraph:
revoked is if EXT022 itself is revoked. **All and any** re-definitions,
modifications, enhancements and clarifications that apply to EXT022 **also
apply to these two new areas** because, due to the Prefixes being
-independent Defined Words the three areas are actually one and the same
-area, just as *all* Scalar Defined Words are.
+independent Defined Word-instructions the three areas are actually one and the same
+area, just as *all* Scalar Defined Word-instructions are.
Encoding spaces and their potential are illustrated:
* **PO9**-PO1 Prefixed-Prefixed (96-bit) instructions are prohibited. EXT1xx is
thus inherently Unvectorizable as the EXT1xx prefix is 32-bit
- on top of an SVP64 prefix which is 32-bit on top of a Defined Word
+ on top of an SVP64 prefix which is 32-bit on top of a Defined Word-instruction
and the complexity at the Decoder becomes too great for High
Performance Multi-Issue systems.
* There is however no reason why PO9-PO1 (EXT901?) as an entirely new RESERVED
can therefore also be precise. The final result will be in the first
non-predicate-masked-out destination element, but due again to
the deterministic schedule programmers may find uses for the intermediate
-results, even for non-commutative Defined Word operations.
+results, even for non-commutative Defined Word-instruction operations.
Additionally, because the intermediate results are always written out
it is possible to service Precise Interrupts without affecting latency
(a common limitation of Vector ISAs implementing explicit
To achieve Sub-Vector Horizontal Reduction, Pack/Unpack should be enabled,
which will turn the Schedule around such that issuing of the Scalar
-Defined Words is done with SUBVL looping as the inner loop not the
+Defined Word-instructions is done with SUBVL looping as the inner loop not the
outer loop. Rc=1 with Sub-Vectors (SUBVL=2,3,4) is `UNDEFINED` behaviour.
*Programmer's Note: Overwrite Parallel Reduction with Sub-Vectors
[Zero-Overhead Loop](https://en.m.wikipedia.org/wiki/Zero-overhead_looping)
Engine on top of
Scalar operations some clear guidelines are needed on how both
-existing "Defined Words" (Public v3.1 Section 1.6.3 term) and future
+existing "Defined Word-instructions" (Public v3.1 Section 1.6.3 term) and future
Scalar operations are added within the 64-bit space. Examples of
legal and illegal allocations are given later.
This is extremely important because the worst possible situation
is if a conflicting Scalar instruction is added by another Stakeholder,
which then turns out to be Vectorizeable: it would then have to be
-added to the Vector Space with a *completely different Defined Word*
+added to the Vector Space with a *completely different Defined Word-instruction*
and things go rapidly downhill in the Decode Phase from there.
Setting a simple inviolate rule helps avoid this scenario but does
need to be borne in mind when discussing potential allocation
* EXT009, like EXT001 of Public v3.1, is **defined** as a 64-bit
encoding. This makes Multi-Issue Length-identification trivial.
* bit 6 if 0b1 is 100% for Simple-V augmentation of (Public v3.1 1.6.3)
- "Defined Words" (aka EXT000-063), with the exception of 0x26000000
+ "Defined Word-instructions" (aka EXT000-063), with the exception of 0x26000000
as a Prefix, which is a new RESERVED encoding.
* when bit 6 is 0b0 and bits 32-33 are 0b11, the encodings are **defined** as also
allocated to Simple-V
which **must** be granted corresponding space
in SVP64.
* Anything Vectorized-EXT000-063 is **automatically** being
- requested as 100% Reserved for every single "Defined Word"
+ requested as 100% Reserved for every single "Defined Word-instruction"
(Public v3.1 1.6.3 definition). Vectorized-EXT001 or EXT009
is defined as illegal.
* Any **future** instruction
**EXT000-EXT063**
These are Scalar word-encodings. Often termed "v3.0 Scalar" in this document
-Power ISA v3.1 Section 1.6.3 Book I calls it a "defined word".
+Power ISA v3.1 Section 1.6.3 Book I calls it a "Defined Word-instruction".
| 0-5 | 6-31 |
|--------|--------|
-| PO | EXT000-063 "Defined word" |
+| PO | EXT000-063 "Defined Word-instruction" |
**SVP64Single:{EXT000-063}** bit6=old bit7=scalar
This encoding, identical to SVP64Single:{EXT248-263},
-introduces SVP64Single Augmentation of Scalar "defined words".
+introduces SVP64Single Augmentation of Scalar "Defined Word-instructions".
All meanings must be identical to EXT000-063, and it is likewise
prohibited to add an instruction in this area without also adding
the exact same (non-Augmented) instruction in EXT000-063 with the
the use of 0x12345678 for fredmv in scalar but fishmv in Vector is
illegal. The suffix in both 64-bit locations
must be allocated to a Vectorizeable EXT000-063
-"Defined Word" (Public v3.1 Section 1.6.3 definition)
+"Defined Word-instruction" (Public v3.1 Section 1.6.3 definition)
or not at all.
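
The rule above can be expressed as a small consistency check. This is a sketch under stated assumptions: the function name `find_allocation_conflicts` and the dictionary-based allocation tables are hypothetical, the mnemonics `fredmv` and `fishmv` are the document's own illustrative names, and whether an instruction is Vectorizeable is not modelled.

```python
def find_allocation_conflicts(scalar, prefixed):
    """List every prefixed-space allocation that differs from the scalar one.

    `scalar` maps a 32-bit suffix encoding to its mnemonic in EXT000-063.
    `prefixed` maps a space name to a table of the same shape.
    (Hypothetical data model, for illustration only.)
    """
    conflicts = []
    for space, table in prefixed.items():
        for suffix, name in table.items():
            # Legal only if the un-prefixed encoding exists and is identical.
            if scalar.get(suffix) != name:
                conflicts.append((space, hex(suffix), name, scalar.get(suffix)))
    return conflicts

scalar = {0x12345678: "fredmv"}
good = {"SVP64Single": {0x12345678: "fredmv"}, "SVP64": {0x12345678: "fredmv"}}
bad = {"SVP64": {0x12345678: "fishmv"}}

print(find_allocation_conflicts(scalar, good))
print(find_allocation_conflicts(scalar, bad))
```

An entry whose scalar mnemonic is reported as `None` corresponds to the "or not at all" case being violated: the prefixed space has an allocation with no scalar counterpart.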
\newpage{}
| 64bit | sv.fishmv | 0x25nnnnnn | 0x12345678| vector SVP64:EXT2nn |
Both of these Simple-V operations are illegally-allocated. The fact that
-there does not exist a scalar "Defined Word" (even for EXT200-263) - the
+there does not exist a scalar "Defined Word-instruction" (even for EXT200-263) - the
unallocated block - means that the instruction may **not** be allocated in
the Simple-V space.
| 64bit | ss.fishmv | 0x24!zero | 0x10345678| scalar SVP64Single:EXT2nn |
| 64bit | sv.fishmv | 0x25nnnnnn | 0x10345678| vector SVP64:EXT2nn |
-This is an illegal attempt to place an EXT004 "Defined Word"
+This is an illegal attempt to place an EXT004 "Defined Word-instruction"
(Public v3.1 Section 1.6.3) into the EXT2nn Vector space.
This is not just illegal: it is not even possible to achieve.
If attempted, by dropping EXT004 into bits 32-37, the top two
[^likeext001]: SVP64-Single is remarkably similar to the "bit 1" of EXT001 being set to indicate that the 64-bits is to be allocated in full to a new encoding, but in fact SVP64-single still embeds v3.0 Scalar operations.
[^pseudorewrite]: elwidth overrides do however mean that all SFS / SFFS pseudocode will need rewriting to be in terms of XLEN. This has the indirect side-effect of automatically making a 32-bit Scalar Power ISA Specification possible, as well as a future 128-bit one (Cross-reference: RISC-V RV32 and RV128)
[^only2]: reminder that this proposal only needs 75% of two POs for Scalar instructions. The rest of EXT200-263 is for general use.
-[^ext001]: Recall that EXT100 to EXT163 is for Public v3.1 64-bit-augmented Operations prefixed by EXT001, for which, from Section 1.6.3, bit 6 is set to 1. This concept is where the above scheme originated. Section 1.6.3 uses the term "defined word" to refer to pre-existing EXT000-EXT063 32-bit instructions so prefixed to create the new numbering EXT100-EXT163, respectively
+[^ext001]: Recall that EXT100 to EXT163 is for Public v3.1 64-bit-augmented Operations prefixed by EXT001, for which, from Section 1.6.3, bit 6 is set to 1. This concept is where the above scheme originated. Section 1.6.3 uses the term "Defined Word-instruction" to refer to pre-existing EXT000-EXT063 32-bit instructions so prefixed to create the new numbering EXT100-EXT163, respectively
[^futurevsx]: A future version or other Stakeholder *may* wish to drop Simple-V onto VSX: this would be a separate RFC
[^vsx256]: Imagine a hypothetical future VSX-256 using the exact same instructions as VSX. The binary incompatibility introduced would catastrophically **and retroactively** damage existing IBM POWER8,9,10 hardware's reputation and that of Power ISA overall.
[^autovec]: Compiler auto-vectorization for best exploitation of SIMD and Vector ISAs on Scalar programming languages (c, c++) is an Industry-wide known-hard decades-long problem. Cross-reference the number of hand-optimised assembler algorithms.
```
Prefix bits 6:7 are used to identify one of four prefix for-
mat types. When bit 6 is set to 0 (prefix types 00 and
-01), the suffix is not a defined word instruction (i.e.,
+01), the suffix is not a Defined Word-instruction (i.e.,
requires the prefix to identify the alternate opcode
space the suffix is assigned to as well as additional or
extended operand and/or control fields); when bit 6 is
set to 1 (prefix types 10 and 11), the prefix is modifying
-the behavior of a defined word instruction in the suffix.
+the behavior of a Defined Word-instruction in the suffix.
```
thus, we have:
that "Libre-SOC" != "RED Semiconductor Ltd". The two are completely
**separate** organisations*.
-Worth bearing in mind during evaluation that every "Defined Word" may
-or may not be Vectorizeable, but that every "Defined Word" should have
+Worth bearing in mind during evaluation that every "Defined Word-instruction" may
+or may not be Vectorizeable, but that every "Defined Word-instruction" should have
merits on its own, not just when Vectorized, precisely because the
instructions are Scalar. An example of a borderline
-Vectorizeable Defined Word is `mv.swizzle` which only really becomes
+Vectorizeable Defined Word-instruction is `mv.swizzle` which only really becomes
high-priority for Audio/Video, Vector GPU and HPC Workloads, but has
less merit as a Scalar-only operation, yet when SVP64Single-Prefixed
can be part of an atomic Compare-and-Swap sequence.
SVP64 Prefixing - also known by the terms "Zero-Overhead-Loop-Prefixing"
as well as "True-Scalable-Vector Prefixing" - also literally brings new
-dimensions to the Power ISA. Thus when adding new Scalar "Defined Words"
+dimensions to the Power ISA. Thus when adding new Scalar "Defined Word-instructions"
their value when Vector-Prefixed, *as well as* SVP64Single-Prefixed, must
unavoidably and simultaneously be taken into consideration.
Transitive Closure (on top of a cumulatively-applied max instruction).
Except for `svstep`, which is Vectorizeable, the Management Instructions
-themselves are all 32-bit Defined Words (Scalar Operations), so
+themselves are all 32-bit Defined Word-instructions (Scalar Operations), so
PO1-Prefixing is perfectly reasonable. SVP64 Management instructions
(of which there are only 6) are all 5 or 6 bit XO, meaning that the opcode
space they take up in EXT0xx is not alarmingly high for their intrinsic
1. bmask is a synthesis and generalisation of every "TBM" instruction with additional
options not found in any other ISA BMI group.
-2. grevluti as a 32-bit Defined Word is capable of generating over a thousand useful
+2. grevluti as a 32-bit Defined Word-instruction is capable of generating over a thousand useful
regular-patterned 64-bit "magic constants" that otherwise require either a Load
or require several instructions to synthesise
3. word, halfword, byte, nibble, 2-bit and 1-bit reversal at multiple levels are all achieved
which is a different convention from that used elsewhere in the Power ISA.
The SVP64 prefix always comes before the suffix in PC order and must be
-considered an independent "Defined word-instruction"[^dwi] that augments the behaviour of
-the following instruction (also a Defined word-instruction), but does **not** change the actual Decoding
+considered an independent "Defined Word-instruction"[^dwi] that augments the behaviour of
+the following instruction (also a Defined Word-instruction), but does **not** change the actual Decoding
of that following instruction just because it is Prefixed. Unlike EXT100-163,
where the Suffix is considered an entirely new Opcode Space,
SVP64-Prefixed instructions **MUST NEVER** be treated or regarded
Decode multiple 32-bit words in parallel and follow up with a second
cycle of joining Prefix and Suffix "after-the-fact".
Mixing and overlaying 64-bit Opcode Encodings into the
-{SVP64 24-bit Prefix}{Defined word-instruction} space creates
+{SVP64 24-bit Prefix}{Defined Word-instruction} space creates
a hard dependency that catastrophically damages Multi-Issue Decoding.
Therefore it has to be prohibited to accept RFCs
which fundamentally violate this hard requirement. Under no circumstances
## Definition of "SVP64-Prefix"
A 24-bit RISC-Paradigm Encoding area for Loop-Augmentation of the following
-"Defined word-instruvtion".
+"Defined Word-instruction".
Used in the context of "An SVP64-Prefixed Defined Word-instruction", as separate and
distinct from the 32-bit PO9-Prefix that holds a 24-bit SVP64 Prefix.
Trap raised.
*Architectural Note: Given that a "pre-classification" Decode Phase is
-required (identifying whether the Suffix - Defined Word - is
+required (identifying whether the Suffix - Defined Word-instruction - is
Arithmetic/Logical, CR-op, Load/Store or Branch-Conditional),
adding "Unvectorizable" to this phase is not unreasonable.*
## SVP64 Remapped Encoding (`RM[0:23]`)
In the SVP64 Vector Prefix spaces, the 24 bits 8-31 are termed `RM`. Bits
-32-37 are the Primary Opcode of the Suffix "Defined Word". 38-63 are the
-remainder of the Defined Word. Note that the new EXT232-263 SVP64 area
+32-37 are the Primary Opcode of the Suffix "Defined Word-instruction". 38-63 are the
+remainder of the Defined Word-instruction. Note that for the new EXT232-263 SVP64 area
bit 32 must be set to 1.
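
The field boundaries just described can be checked with a short Python sketch. It assumes the Power ISA convention that bit 0 is the most significant bit of the 64-bit Prefix-plus-Suffix pair; the helper names `msb0_field` and `split_prefixed` are hypothetical, and the example value is arbitrary (it only follows the `0x25nnnnnn` pattern and the `0x12345678` suffix used in this document's examples).

```python
def msb0_field(value, start, end, width=64):
    """Extract bits start..end inclusive, numbering bit 0 as the MSB."""
    nbits = end - start + 1
    shift = width - 1 - end          # distance of the field from the LSB
    return (value >> shift) & ((1 << nbits) - 1)

def split_prefixed(insn64):
    """Split a 64-bit prefixed instruction into the fields named in the text."""
    return {
        "prefix_po": msb0_field(insn64, 0, 5),      # Primary Opcode of the Prefix
        "bit6": msb0_field(insn64, 6, 6),
        "bit7": msb0_field(insn64, 7, 7),
        "rm": msb0_field(insn64, 8, 31),            # the 24-bit RM field
        "suffix_po": msb0_field(insn64, 32, 37),    # Primary Opcode of the Suffix
        "suffix_rest": msb0_field(insn64, 38, 63),  # remainder of the Suffix
        "bit32": msb0_field(insn64, 32, 32),
    }

# Arbitrary example: prefix word 0x25ABCDEF followed by suffix word 0x12345678.
fields = split_prefixed((0x25ABCDEF << 32) | 0x12345678)
print({name: hex(val) for name, val in fields.items()})
```

Because the Suffix fields are extracted without reference to `rm`, the sketch also reflects the requirement that the Suffix decodes identically whether or not it is Prefixed.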
| 0-5 | 6 | 7 | 8-31 | 32-37 | 38-63 |Description |