Simple-V is a type of Vectorisation best described as a "Prefix Loop
Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR` instruction and
to the 8086 `REP` Prefix instruction. More advanced features are similar
-to the Z80 `CPIR` instruction. If naively viewed one-dimensionally as an actual
-Vector ISA it introduces over 1.5 million 64-bit True-Scalable Vector instructions
-on the SFFS Subset and closer to 10 million 64-bit True-Scalable Vector
-instructions if introduced on VSX.
-SVP64, the instruction format used by Simple-V, is therefore best viewed
-as an orthogonal RISC-paradigm "Prefixing" subsystem instead.
+to the Z80 `CPIR` instruction. If naively viewed one-dimensionally as an
+actual Vector ISA it introduces over 1.5 million 64-bit True-Scalable
+Vector instructions on the SFFS Subset and closer to 10 million 64-bit
+True-Scalable Vector instructions if introduced on VSX. SVP64, the
+instruction format used by Simple-V, is therefore best viewed as an
+orthogonal RISC-paradigm "Prefixing" subsystem instead.
Except where explicitly stated all bit numbers remain as in the rest of
the Power ISA: in MSB0 form (the bits are numbered from 0 at the MSB on
of that following instruction. **All prefixed 32-bit instructions
(Defined Words) retain their non-prefixed encoding and definition**.
-Two apparent exceptions to the above hard rule exist: SV Branch-Conditional
-operations and LD/ST-update "Post-Increment" Mode. Post-Increment
-was considered sufficiently high priority (significantly reducing hot-loop
-instruction count) that one bit in the Prefix is reserved for it
-(Note the intention to release that bit and move Post-Increment instructions
-to EXT2xx).
-Vectorised Branch-Conditional operations "embed" the original Scalar
-Branch-Conditional behaviour into a much more advanced variant that
-is highly suited to High-Performance Computation (HPC), Supercomputing,
-and parallel GPU Workloads.
+Two apparent exceptions to the above hard rule exist: SV
+Branch-Conditional operations and LD/ST-update "Post-Increment" Mode.
+Post-Increment was considered sufficiently high priority (significantly
+reducing hot-loop instruction count) that one bit in the Prefix
+is reserved for it (Note the intention to release that bit and move
+Post-Increment instructions to EXT2xx). Vectorised Branch-Conditional
+operations "embed" the original Scalar Branch-Conditional behaviour into
+a much more advanced variant that is highly suited to High-Performance
+Computation (HPC), Supercomputing, and parallel GPU Workloads.
*Architectural Resource Allocation note: it is prohibited to accept RFCs
which fundamentally violate this hard requirement. Under no circumstances
* element-width overrides set the width of the *elements* in the
sequentially-numbered contiguous array.
-The relationship is best defined in Canonical form, below, in ANSI c
-as a union data structure. A key difference is that VSR elements are bounded
+The relationship is best defined in Canonical form, below, in ANSI c as a
+union data structure. A key difference is that VSR elements are bounded
fixed at 128-bit, where SVP64 elements are conceptually unbounded and
only limited by the Maximum Vector Length.
incrementally to the MSB end (confusingly numbered the lowest in
MSB0 ordering).
-When exclusively using MSB0-numbering, SVP64
-becomes unnecessarily complex to both express and subsequently understand:
-the required conditional subtractions from 63,
-31, 15 and 7 needed to express the fact that elements are LSB0-sequential
-unfortunately become a hostile minefield, obscuring both
-intent and meaning. Therefore for the
-purposes of this section the more natural **LSB0 numbering is assumed**
-and it is left to the reader to translate to MSB0 numbering.
+When exclusively using MSB0-numbering, SVP64 becomes unnecessarily complex
+to both express and subsequently understand: the required conditional
+subtractions from 63, 31, 15 and 7 needed to express the fact that
+elements are LSB0-sequential unfortunately become a hostile minefield,
+obscuring both intent and meaning. Therefore for the purposes of this
+section the more natural **LSB0 numbering is assumed** and it is left
+to the reader to translate to MSB0 numbering.
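The "conditional subtractions" mentioned above can be sketched in a few lines of c. This is an illustrative helper only: the name `flip_bit_numbering` is hypothetical and does not appear in the specification. It assumes a field width of 8, 16, 32 or 64 bits, and relies on the fact that the same subtraction converts in either direction.

```c
#include <assert.h>

/* Hypothetical helper (an assumption for illustration, not part of the
 * specification): convert a bit number between MSB0 and LSB0 numbering
 * within a field of `width` bits. Subtracting from width-1 (63, 31, 15
 * or 7) is its own inverse, so one function covers both directions. */
unsigned flip_bit_numbering(unsigned bit, unsigned width)
{
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    assert(bit < width);
    return (width - 1u) - bit;
}
```

For example, MSB0 bit 0 of a 64-bit register is LSB0 bit 63, and MSB0 bits 48 to 63 are LSB0 bits 15 down to 0. The subtraction must be chosen per field width, which is why text written purely in MSB0 terms becomes hard to follow.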
The Canonical specification for how element-sequential numbering and
element-width overrides are defined is expressed in the following c
int_regfile[RT].hwords[i] = int_regfile[RA].hwords[i] + int_regfile[RB].hwords[i]
```
-The most fundamental aspect here to understand is that the wrapping into
-subsequent Scalar GPRs that occurs on larger-numbered elements
-including and especially on smaller element widths is **deliberate and intentional**.
-From this Canonical definition it should be clear that sequential elements begin
-at the LSB end of any given underlying Scalar GPR, progress to the MSB end, and
-then to the LSB end of the *next numerically-larger Scalar GPR*. In the
-example above if VL=5 and RT=1 then the contents of GPR(1) and GPR(2) will
-be as follows. For clarity in the table below:
+The most fundamental aspect here to understand is that the wrapping
+into subsequent Scalar GPRs that occurs on larger-numbered elements
+including and especially on smaller element widths is **deliberate
+and intentional**. From this Canonical definition it should be clear
+that sequential elements begin at the LSB end of any given underlying
+Scalar GPR, progress to the MSB end, and then to the LSB end of the
+*next numerically-larger Scalar GPR*. In the example above if VL=5
+and RT=1 then the contents of GPR(1) and GPR(2) will be as follows.
+For clarity in the table below:
* Both MSB0-ordered bitnumbering *and* LSB-ordered bitnumbering are shown
* The GPR-numbering is considered LSB0-ordered
```
Note that the upper 48 bits of GPR(2) would **not** be modified due to
-the example having VL=5. Thus on "wrapping" - sequential progression from
-GPR(1) into GPR(2) - the 5th result modifies
-**only** the bottom 16 LSBs of GPR(1).
+the example having VL=5. Thus on "wrapping" - sequential progression
+from GPR(1) into GPR(2) - the 5th result modifies **only** the bottom
+16 LSBs of GPR(2).
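The wrapping behaviour can be checked with a small host-independent model. This is a sketch under stated assumptions rather than the Canonical definition: the register file is modelled as an array of 128 `uint64_t` values, and the names `gpr`, `get_hword`, `set_hword` and `sv_add_hwords` are hypothetical. Element `i` at 16-bit element width lives in register `RT + i/4`, at a shift of `(i%4)*16` bits from the LSB end.

```c
#include <stdint.h>

#define NUM_GPRS 128

/* Assumption: a plain array of 64-bit values stands in for the GPR file. */
uint64_t gpr[NUM_GPRS];

/* Read 16-bit element i of the vector that starts at register r. */
uint16_t get_hword(unsigned r, unsigned i)
{
    unsigned reg   = r + i / 4;       /* four halfwords per 64-bit GPR  */
    unsigned shift = (i % 4) * 16;    /* element 0 sits at the LSB end  */
    return (uint16_t)(gpr[reg] >> shift);
}

/* Write 16-bit element i; all other bits of the target GPR are kept. */
void set_hword(unsigned r, unsigned i, uint16_t value)
{
    unsigned reg   = r + i / 4;
    unsigned shift = (i % 4) * 16;
    uint64_t mask  = (uint64_t)0xffff << shift;
    gpr[reg] = (gpr[reg] & ~mask) | ((uint64_t)value << shift);
}

/* Sketch of an element-wise add at 16-bit element width over vl elements. */
void sv_add_hwords(unsigned rt, unsigned ra, unsigned rb, unsigned vl)
{
    for (unsigned i = 0; i < vl; i++)
        set_hword(rt, i, (uint16_t)(get_hword(ra, i) + get_hword(rb, i)));
}
```

With `rt=1` and `vl=5`, elements 0 to 3 fill all of `gpr[1]` and element 4 lands in the bottom 16 bits of `gpr[2]`, leaving the upper 48 bits of `gpr[2]` untouched. The sketch does no bounds checking on the register number; a real model would need to handle vectors that run past the end of the register file.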
Hardware Architectural note: to avoid a Read-Modify-Write at the register
file it is strongly recommended to implement byte-level write-enable lines
```
In other words, this perspective really is no different from the situation
-where the actual Register File is treated as an Industry-standard byte-level-addressable
-Little-Endian-addressed SRAM. Note that this perspective does **not**
-involve `MSR.LE` in any way shape or form because `MSR.LE` is directly
-in control of the Memory-to-Register byte-ordering. This section is
-exclusively about how to correctly perceive Simple-V-Augmented **Register**
-Files.
+where the actual Register File is treated as an Industry-standard
+byte-level-addressable Little-Endian-addressed SRAM. Note that
+this perspective does **not** involve `MSR.LE` in any way shape or
+form because `MSR.LE` is directly in control of the Memory-to-Register
+byte-ordering. This section is exclusively about how to correctly perceive
+Simple-V-Augmented **Register** Files.
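The SRAM perspective can be sketched as follows. This is a minimal illustration with assumed names (`regfile_sram`, `elt_byte_addr`, `sram_write_elt` and `sram_read_gpr` are hypothetical): the register file is a flat byte array, each 64-bit register occupies 8 consecutive bytes with the least significant byte at the lowest address, and element `i` of a vector starting at register `rt` begins at byte `rt*8 + i*(elwidth/8)`. Nothing here involves `MSR.LE`, since no memory access takes place.

```c
#include <stddef.h>
#include <stdint.h>

#define SRAM_GPRS 128

/* Assumption: the whole register file as little-endian-addressed bytes. */
uint8_t regfile_sram[SRAM_GPRS * 8];

/* Byte address of element i (elwidth_bits wide) of a vector at register rt. */
size_t elt_byte_addr(unsigned rt, unsigned i, unsigned elwidth_bits)
{
    return (size_t)rt * 8 + (size_t)i * (elwidth_bits / 8);
}

/* Store one element, least significant byte first. */
void sram_write_elt(unsigned rt, unsigned i, unsigned elwidth_bits,
                    uint64_t value)
{
    size_t addr = elt_byte_addr(rt, i, elwidth_bits);
    for (unsigned b = 0; b < elwidth_bits / 8; b++)
        regfile_sram[addr + b] = (uint8_t)(value >> (8 * b));
}

/* Reassemble a full 64-bit register from its 8 bytes. */
uint64_t sram_read_gpr(unsigned r)
{
    uint64_t v = 0;
    for (unsigned b = 0; b < 8; b++)
        v |= (uint64_t)regfile_sram[(size_t)r * 8 + b] << (8 * b);
    return v;
}
```

In this view the "wrap" into the next register is nothing more than the byte address passing a multiple of 8: 16-bit element 4 of a vector at register 1 starts at byte 16, which is the first byte of register 2.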
**Comparative equivalent using VSR registers**
For a comparative data point the VSR Registers may be expressed in the
same fashion. The c code below is directly an expression of Figure 97 in
-Power ISA Public v3.1 Book I Section 6.3 page 258, *after compensating for
-MSB0 numbering in both bits and elements, adapting in full to LSB0 numbering,
-and obeying LE ordering*.
+Power ISA Public v3.1 Book I Section 6.3 page 258, *after compensating
+for MSB0 numbering in both bits and elements, adapting in full to LSB0
+numbering, and obeying LE ordering*.
-**Crucial to understanding why the subtraction from 1,3,7,15 is present
-is because the Power ISA numbers VSX Registers elements also in MSB0 order**.
+**Crucial to understanding why the subtraction from 1,3,7,15 is present is
+because the Power ISA numbers VSX Registers elements also in MSB0 order**.
SVP64 very specifically numbers elements in **LSB0** order with the first
-element (numbered zero) being at the bitwise-numbered **LSB** end of the register, where VSX
-does the reverse: places the numerically-*highest* (last-numbered) element at
-the LSB end of the register.
+element (numbered zero) being at the bitwise-numbered **LSB** end of the
+register, where VSX does the reverse: places the numerically-*highest*
+(last-numbered) element at the LSB end of the register.
```
}
```
-For VSR Registers one key difference is that the overlay of different element
-widths is clearly a *bounded static quantity*, whereas for Simple-V the
-elements are
-unrestrained and permitted to flow into *successive underlying Scalar registers*.
-This difference is absolutely critical to a full understanding of the entire
-Simple-V paradigm and why element-ordering, bit-numbering *and register numbering*
-are all so strictly defined.
+For VSR Registers one key difference is that the overlay of different
+element widths is clearly a *bounded static quantity*, whereas for
+Simple-V the elements are unrestrained and permitted to flow into
+*successive underlying Scalar registers*. This difference is absolutely
+critical to a full understanding of the entire Simple-V paradigm and
+why element-ordering, bit-numbering *and register numbering* are all so
+strictly defined.
-Implementations are not permitted to violate the Canonical definition. Software
-will be critically relying on the wrapped (overflow) behaviour inherently
-implied by the unbounded variable-length c arrays.
+Implementations are not permitted to violate the Canonical
+definition. Software will be critically relying on the wrapped (overflow)
+behaviour inherently implied by the unbounded variable-length c arrays.
-Illustrating the exact same loop with the exact same effect as achieved by Simple-V
-we are first forced to create wrapper functions, to cater for the fact
-that VSR register elements are static bounded:
+Illustrating the exact same loop with the exact same effect as achieved
+by Simple-V we are first forced to create wrapper functions, to cater
+for the fact that VSR register elements are static bounded:
```
int calc_VSR_reg_offs(int elt, int width) {
acts as if SV had not been applied at all to the instruction (an
"identity transformation").
-The fact that `VL` is dynamic and can be set to any value at runtime based
-on program conditions and behaviour means very specifically that
-`scalar identity behaviour` is **not** a redundant encoding. If the
-only means by which VL could be set was by way of static-compiled
-immediates then this assertion would be false. VL should not
-be confused with MAXVL when understanding this key aspect of SimpleV.
+The fact that `VL` is dynamic and can be set to any value at runtime
+based on program conditions and behaviour means very specifically that
+`scalar identity behaviour` is **not** a redundant encoding. If the only
+means by which VL could be set was by way of static-compiled immediates
+then this assertion would be false. VL should not be confused with
+MAXVL when understanding this key aspect of SimpleV.
## Register Naming and size
-As indicated above SV Registers are simply the GPR, FPR and CR
-register files extended linearly to larger sizes; SV Vectorisation
-iterates sequentially through these registers (LSB0 sequential ordering
-from 0 to VL-1).
+As indicated above SV Registers are simply the GPR, FPR and CR register
+files extended linearly to larger sizes; SV Vectorisation iterates
+sequentially through these registers (LSB0 sequential ordering from 0
+to VL-1).
Where the integer regfile in standard scalar Power ISA v3.0B/v3.1B is
r0 to r31, SV extends this as r0 to r127. Likewise FP registers are