Simple-V is effectively a type of "Zero-Overhead Loop Control" to which
an entire 24 bits are exclusively dedicated in a fully RISC-abstracted
manner. Within those 24 bits there are no Scalar instructions and
no Vector instructions: there is only "Loop Control".

This is why there are no actual Vector operations in Simple-V: *all* suitable
Scalar Operations are Vectorised or not at all. This has some extremely
important implications when considering adding new instructions, and
especially when allocating the Opcode Space for them.
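
To illustrate what "only Loop Control" means in practice, a deliberately-simplified
conceptual sketch is given below in plain Python (the toy register file, the helper
name `svp64_loop` and the use of a Python callable standing in for a Scalar operation
are illustrative assumptions, not the SVP64 specification pseudocode): the 24-bit
prefix contributes only the loop, and the body of that loop is always an existing,
unmodified Scalar instruction.

```python
# Conceptual sketch only: SVP64 Vectorisation as a hardware for-loop
# wrapped around an unmodified Scalar operation.  VL (the Vector Length)
# and the starting register numbers come from loop-control state; the
# operation performed inside the loop is always a plain Scalar instruction.
def svp64_loop(scalar_op, regs, RT, RA, RB, VL):
    for i in range(VL):
        regs[RT + i] = scalar_op(regs[RA + i], regs[RB + i])

# Example: "sv.add" is nothing more than the existing Scalar add, repeated.
regs = list(range(32)) + [0] * 96           # toy 128-entry register file
svp64_loop(lambda a, b: a + b, regs, RT=64, RA=8, RB=16, VL=4)
print(regs[64:68])                          # [24, 26, 28, 30]
```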
Scalar Instructions must be simultaneously added in the corresponding
SVP64 opcode space with the exact same 32-bit "Defined Word", or they
must not be added at all. Likewise, instructions planned for addition
in what is considered (wrongly) to be the exclusive "Vector" domain
must correspondingly be added in the Scalar space with the exact same
32-bit "Defined Word", or they must not be added at all.

Some explanation of the above is needed. Firstly, "Defined Word" is a term
used in Section 1.6.3 of the Power ISA v3.1 Book I: it means, in short,
"a 32-bit instruction", which can then be Prefixed by EXT001 to extend it
to 64-bit (the Prefixed forms being named EXT100-163).
Prefixed-Prefixed (96-bit Variable-Length) encodings are
prohibited in v3.1 and they are just as prohibited in Simple-V: it's too
complex in hardware. This means that **only** 32-bit "Defined Words"
may be Vectorised, and in particular it means that no 64-bit instruction
(EXT100-163) may **ever** be Vectorised.
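
A conceptual sketch of that decode rule is shown below (plain Python; the
two-word argument, the caller-supplied `is_prefix` predicate and the exception
name are assumptions made purely for illustration, not the actual Power ISA
bit-level decode): a prefix may wrap exactly one 32-bit "Defined Word", and a
prefix followed by another prefix is simply Illegal.

```python
class IllegalInstruction(Exception):
    """Stand-in for the architectural Illegal Instruction trap."""

def classify(words, is_prefix):
    """words: 32-bit instruction words in fetch order;
    is_prefix: caller-supplied predicate for EXT001 / SVP64 prefix words."""
    if not is_prefix(words[0]):
        return "32-bit Defined Word (the only Vectoriseable form)"
    if is_prefix(words[1]):
        # Prefixed-Prefixed (96-bit) encodings: prohibited in v3.1 and in Simple-V
        raise IllegalInstruction("no 96-bit Variable-Length encodings")
    return "64-bit Prefixed instruction (EXT100-163): never Vectorised"
```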
Secondly, the term "Vectoriseable" was used. This refers to "instructions
which if SVP64-Prefixed are actually meaningful". `sc` is meaningless
to Vectorise, for example, as are `sync` and `mtmsr` (there is only ever
going to be one MSR).

The problem comes if the rationale "if unused, Unvectoriseable opcodes
can therefore be allocated to alternative instructions mixed into the
SVP64 Opcode space" is applied: this unfortunately results in huge,
inadvisable complexity in HDL at the Decode Phase, attempting to discern
between the two types. Worse than that,
if the alternate 64-bit instruction is Vectoriseable but the 32-bit Scalar
"Defined Word" is already allocated, how can there ever be a Scalar version
of the alternate instruction? It would have to be added as a **completely
different** 32-bit "Defined Word", and things go rapidly downhill in the
Decoder, as well as the ISA, from there.
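
To make the Decoder problem concrete, the following sketch shows the shape of
the *rejected* scheme (every name in it is hypothetical, chosen only for
illustration): the meaning of a 32-bit "Defined Word" would come to depend on
whether an SVP64 Prefix happens to be present, forcing a second, divergent
decode path.

```python
# Sketch of the rejected "reuse Unvectoriseable SVP64 space" idea.
# recycled: hypothetical set of 32-bit encodings given a *different*
# meaning when (and only when) an SVP64 prefix is present.
def decode_suffix(defined_word, svp64_prefixed, recycled,
                  decode_scalar, decode_alternate):
    if svp64_prefixed and defined_word in recycled:
        return decode_alternate(defined_word)   # second, divergent decode path
    return decode_scalar(defined_word)          # the normal Scalar meaning
```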
Therefore, to avoid risk and long-term damage to the Power ISA:
Unvectoriseable operations must have their SVP64-Prefixed encoding
explicitly reserved (as an Illegal Instruction), and operations which
only make sense as Vector operations must likewise have a 32-bit
"Defined Word" explicitly reserved in the *Scalar* Power ISA
opcode space, as an Illegal Instruction.
A good example of the former is `mtmsr` because there is only one
MSR register (`sv.mtmsr` is meaningless, as is `sv.sc`),
and a good example of the latter is [[sv/mv.x]],
which is so deeply problematic to add to any Scalar ISA that it was
rejected outright and an alternative route taken (Indexed REMAP).
Another good example would be Cross Product, which has no meaning
at all in a Scalar ISA (Cross Product as a concept only applies
to Mathematical Vectors). If any such Vector operation were ever added,
it would be **critically** important to reserve the exact same *Scalar*
opcode with the exact same "Defined Word" in the *Scalar* Power ISA
opcode space, as an Illegal Instruction. There are
good reasons why Cross Product has not been proposed, but it serves
to illustrate the point as far as Architectural Resource Allocation is
concerned.
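
The same rule can be expressed as a purely illustrative check (the mnemonic
sets below are examples only, and `crossproduct` is a hypothetical instruction
that has never actually been proposed): both directions reserve the encoding
rather than recycling it.

```python
class IllegalInstruction(Exception):
    pass

# Illustrative only: Unvectoriseable Scalar ops reserve their SVP64-Prefixed
# encodings; Vector-only concepts reserve an equivalent Scalar "Defined Word".
RESERVED_IN_SVP64_SPACE  = {"mtmsr", "sc", "sync"}
RESERVED_IN_SCALAR_SPACE = {"crossproduct"}      # hypothetical example

def check_allocation(space, mnemonic):
    if space == "svp64" and mnemonic in RESERVED_IN_SVP64_SPACE:
        raise IllegalInstruction(f"sv.{mnemonic} is reserved, not reallocated")
    if space == "scalar" and mnemonic in RESERVED_IN_SCALAR_SPACE:
        raise IllegalInstruction(f"{mnemonic} Defined Word is reserved, not reallocated")
```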
The bottom line is that whilst this seems wasteful, the alternatives are
destabilisation of the Power ISA and impractically-complex Hardware
Decoders. With the Scalar Power ISA (v3.0, v3.1) already being comprehensive
in its number of instructions, keeping further Decode complexity down is a
high priority.

# Other Scalable Vector ISAs
These Scalable Vector ISAs are listed to aid in understanding and