the Power ISA is an entire area of research on its own deemed unlikely to be
achievable.
-## fclass
+## fclass and GPR-FPR moves
[[sv/fclass]] - just one instruction. With SFFS being locked down to exclude VSX,
and there being no desire within the nascent OpenPOWER ecosystem outside of IBM to
such that it is stand-alone capable. One omission based on the assumption
that VSX would always be present is an equivalent to `xvtstdcsp`.
+Similar arguments apply to the GPR-FPR move operations, with the opportunity taken
+to add rounding modes present in other ISAs that Power ISA VSX PackedSIMD does not
+have. JavaScript rounding, one of the worst offenders in Computer Science, requires
+a phenomenal 35 instructions with *six branches* to emulate in Power ISA! For
+desktop as well as Server HTML/JS back-end execution of JavaScript this becomes an
+obvious priority, one already recognised by ARM, to give just one example.
+
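To make the cost of emulation concrete, here is a minimal sketch of the conversion semantics involved. It assumes that "JavaScript rounding" refers to the ECMAScript `ToInt32` operation (truncate toward zero, wrap modulo 2^32, map NaN and infinities to zero), which is what ARM's `FJCVTZS` instruction implements. The function name `js_to_int32` is illustrative only and is not taken from the RFC.

```python
import math

def js_to_int32(x: float) -> int:
    """Sketch of ECMAScript ToInt32 semantics (assumed meaning of
    'JavaScript rounding'); not taken from any Power ISA definition."""
    # NaN and +/-Infinity convert to zero rather than saturating.
    if math.isnan(x) or math.isinf(x):
        return 0
    # Round toward zero; Python ints are exact, so no precision is lost.
    t = math.trunc(x)
    # Wrap modulo 2**32 instead of saturating on overflow.
    t &= 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return t - (1 << 32) if t >= (1 << 31) else t
```

Each special case here (NaN, infinity, out-of-range wrap, sign reinterpretation) is a test or branch that software emulation has to perform when no dedicated conversion instruction exists.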
## (f)mv.swizzle
[[sv/mv.swizzle]] is dicey. It is a 2-in 2-out operation whose value as a Scalar
way that ARM SVE predicated-move extends 3-operand "overwrite" opcodes to full
independent 3-in 1-out.
-
+# BMI (bitmanipulation) group.
+
+Whilst the [[sv/vector_ops]] instructions are only two in number, in reality the
+`bmask` instruction has a Mode field allowing it to cover **24** instructions,
+more than ARM, Intel or AMD have added to any of their CPUs. Analysis of
+the BMI sets of these CPUs shows simple patterns that can greatly simplify both
+Decode and implementation. These instructions are used commonly enough, and reduce
+instruction count regularly enough, to justify going into EXT0xx.
+
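The "simple patterns" can be illustrated with a short sketch. Several well-known x86 BMI1/TBM operations have the form "`x` or `~x`, combined by AND/OR/XOR with `x+1` or `x-1`, optionally inverted". The parameterisation below (`invert_x`, `delta`, `invert_y`, `op`) is hypothetical and is not the actual `bmask` Mode encoding. It only shows how one regular datapath could cover many named instructions.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def bit_pattern(x, invert_x, delta, invert_y, op):
    """Hypothetical mode decoding, NOT the real bmask Mode field."""
    a = (~x if invert_x else x) & MASK64   # first operand: x or ~x
    y = (x + delta) & MASK64               # second operand: x+1 or x-1
    if invert_y:
        y = ~y & MASK64                    # optionally inverted
    if op == "and":
        return a & y
    if op == "or":
        return a | y
    if op == "xor":
        return a ^ y
    raise ValueError(f"unknown op: {op}")

x = 0b101100
blsr   = bit_pattern(x, False, -1, False, "and")  # x & (x-1): clear lowest set bit
blsmsk = bit_pattern(x, False, -1, False, "xor")  # x ^ (x-1): mask up to lowest set bit
blsi   = bit_pattern(x, False, -1, True,  "and")  # x & -x: isolate lowest set bit
tzmsk  = bit_pattern(x, True,  -1, False, "and")  # ~x & (x-1): mask of trailing zeros
blcs   = bit_pattern(x, False, +1, False, "or")   # x | (x+1): set lowest clear bit
```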
+The other instruction is `cprop` - Carry-Propagation - which takes the P and Q
+from carry-propagation algorithms and generates carry look-ahead. It greatly
+increases the efficiency of arbitrary-precision integer arithmetic by combining
+what would otherwise be half a dozen instructions into one. However, unlike `bmask`,
+it is still not a huge priority, so is probably best placed in EXT2xx.
+
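As an illustration of generating carry look-ahead from two masks, the sketch below assumes that P is a "propagate" mask and that Q plays the role of a "generate" mask (called `g` here), with the two never set at the same bit position. The text above does not spell out these roles, and this is one well-known formulation rather than the definition of `cprop`. A bit-by-bit reference loop is included for comparison.

```python
def carries_ripple(p, g, width):
    """Reference: walk the positions one at a time.
    Bit i of the result is the carry INTO position i."""
    result, carry = 0, 0
    for i in range(width):
        if carry:
            result |= 1 << i
        # A position emits a carry if it generates one,
        # or if it propagates an incoming carry.
        carry = ((g >> i) & 1) | (((p >> i) & 1) & carry)
    if carry:
        result |= 1 << width  # final carry-out
    return result

def carries_lookahead(p, g):
    """Same result in three operations: one integer add does the
    whole propagation chain. Assumes p & g == 0."""
    return ((p | g) + g) ^ p

# Usage: recover a full addition from propagate/generate of two operands.
a, b = 0b10111011, 0b01000111
p, g = a ^ b, a & b
total = p ^ carries_lookahead(p, g)  # per-bit sums XOR incoming carries
```

The explicit loop is replaced by an OR, an add and an XOR, which gives a flavour of the kind of multi-instruction sequence a single fused operation could absorb.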
+# Float-Load-Immediate
+
+Very easily justified. As explained in [[ls002]], these instructions
+always save one LD L1/2/3 D-Cache memory-lookup operation, by virtue of the Immediate
+FP value being on the I-Cache side. They are such a high priority that adding them
+to EXT0xx is easily justified, despite requiring a 16-bit immediate.
+Designing the second-half instruction as a Read-Modify-Write saves on XO
+bitlength (only 5 bits), and allows it to be macro-op fused with its first-half to store a
+full IEEE754 FP32 immediate into a register.
+
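The two-half scheme can be modelled as follows. This sketch assumes the first-half instruction supplies the upper 16 bits of the FP32 bit pattern and the Read-Modify-Write second half fills in the lower 16 bits. The actual split is defined in [[ls002]], and any conversion to the register's internal format is ignored here. All function names are illustrative.

```python
import struct

def split_fp32(value):
    """Split the FP32 encoding of `value` into two 16-bit immediates."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16, bits & 0xFFFF

def first_half(hi16):
    # Assumed behaviour: write the upper 16 bits, zero the lower 16.
    return hi16 << 16

def second_half(reg_bits, lo16):
    # Read-Modify-Write: keep the upper 16 bits, fill in the lower 16.
    return (reg_bits & 0xFFFF0000) | lo16

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Values with short mantissas need only the first half.
hi, lo = split_fp32(1.5)
reg = first_half(hi)             # already exact: lo is zero

# Values such as 0.1 need both halves for full FP32 precision.
hi2, lo2 = split_fp32(0.1)
reg2 = second_half(first_half(hi2), lo2)
```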
+There is little point in putting these instructions into EXT2xx. Their very benefit
+and inherent value *is* as 32-bit instructions, not 64-bit ones. Likewise there is
+less value in taking up EXT1xx Encoding space because EXT1xx only brings an additional
+16 bits (approx) to the table, and that is provided already by the second-half
+instruction.
+
+Thus they qualify as both high priority and also EXT0xx candidates.
[[!inline pages="openpower/sv/rfc/ls012/areas.mdwn" raw=yes ]]