From: lkcl <lkcl@web>
Date: Fri, 16 Sep 2022 10:38:46 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~405
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=f5f7baf590e10e583e4bdabc257cb1537ba80cf0;p=libreriscv.git

---
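
Note on the **Data-Dependent Fail-First** bullet in the hunk below: a minimal
Python sketch of the behaviour that bullet describes (the first failing test
stops the loop and truncates VL, with an option to include or exclude the
failing element). This is an illustrative model only. The names
`ddffirst_model`, `op`, `test` and `inclusive` are hypothetical and are not
part of SVP64 or the Power ISA; the `Rc=1 BO test` is stood in for by an
arbitrary Python predicate, and predication and register-file details are
left out.

```python
def ddffirst_model(values, op, test, vl, inclusive=False):
    """Toy model of Data-Dependent Fail-First (hypothetical helper).

    values    -- source elements (stand-in for a vector of registers)
    op        -- per-element operation (stand-in for the scalar instruction)
    test      -- predicate on each result (stand-in for the Rc=1 BO test)
    vl        -- requested vector length
    inclusive -- assumption: True keeps the failing element, False drops it
    Returns (results, new_vl).
    """
    results = []
    for i in range(vl):
        r = op(values[i])
        if not test(r):
            # First failure: stop looping and truncate VL at this point.
            if inclusive:
                results.append(r)
            return results, len(results)
        results.append(r)
    # No element failed: VL is unchanged.
    return results, vl


# strcpy-like use: copy bytes until the NUL terminator is seen.
src = [104, 105, 0, 120]
copied, new_vl = ddffirst_model(src, lambda b: b, lambda r: r != 0, vl=4)
copied_nul, new_vl_nul = ddffirst_model(
    src, lambda b: b, lambda r: r != 0, vl=4, inclusive=True
)
```

With `inclusive=True` the terminating zero is kept, which is the variant a
string copy would want; with the default the truncated length stops just
before it.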

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 459b3bedf..8dd3d42bb 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -224,19 +224,18 @@ These Modes do not interact with SVSTATE per se.  SVSTATE
 primarily controls the looping (quantity, order), RM
 influences the *elements* (the Suffix).  There is however
 some close interaction when it comes to predication.
-REMAP is separately
-outlined in another section.
-
+REMAP is outlined separately.
 The primary options all of which are aimed at reducing instruction
 count and reducing assembler complexity are:
 
-* element-width overrides, which dynamically redefine each SFFS or SFS
+* **element-width overrides**, which dynamically redefine each SFFS or SFS
   Scalar prefixed instruction to be 8-bit, 16-bit, 32-bit or 64-bit
   operands **without requiring new 8/16/32 instructions.**[^pseudorewrite]
   This results in full BF16 and FP16 opcodes being added to the Power ISA
   **without adding BF16 or FP16 opcodes** including full conversion
   between all formats.
-* predication.  this is an absolutely essential feature for a 3D GPU VPU ISA.
+* **predication**.
+  This is an absolutely essential feature for a 3D GPU VPU ISA.
   CR Fields are available as Predicate Masks hence the reason for their
   extension to 128. Twin-Predication is also provided: this may best
   be envisaged as back-to-back VGATHER-VSCATTER but is not restricted
@@ -244,25 +243,27 @@ count and reducing assembler complexity are:
   of the predicates provides all of the other types of operations
   found in Vector ISAs (VEXTRACT, VINSERT etc) again with no need
   to actually provide explicit such instructions.
-* Saturation. **all** LD/ST and Arithmetic and Logical operations may
+* **Saturation**. **all** LD/ST and Arithmetic and Logical operations may
   be saturated (without adding explicit scalar saturated opcodes)
-* Reduction and Prefix-Sum (Fibonnacci Series) Modes
-* vec2/3/4  "Packing" and "Unpacking" (similar to VSX `vpack` and `vpkss`)
+* **Reduction and Prefix-Sum** (Fibonacci Series) Modes, including a
+  "Reverse Gear".
+* **vec2/3/4 "Packing" and "Unpacking"** (similar to VSX `vpack` and `vpkss`)
   accessible in a way that is easier than REMAP, added for the same reasons
   that drove `vpack` and `vpkss` etc. to be added: pixel, audio, and 3D
   data manipulation. With Pack/Unpack being part of SVSTATE it can be
   applied *in-place* saving register file space (no copy/mv needed).
-* Load/Store speculative "fault-first" behaviour, identical to SVE and RVV
+* **Load/Store "fault-first"** speculative behaviour,
+  identical to SVE and RVV
   Fault-first: provides auto-truncation of a speculative sequential parallel
   LD/ST batch, helping
   solve the "SIMD Considered Harmful" stripmining problem from a Memory
   Access perspective.
-* Data-Dependent Fail-First: a 100% Deterministic extension of the LDST
+* **Data-Dependent Fail-First**: a 100% Deterministic extension of the LDST
   ffirst concept: first `Rc=1 BO test` failure terminates looping and 
   truncates VL to that exact point. Useful for implementing algorithms
   such as `strcpy` in around 14 high-performance Vector instructions, the
   option exists to include or exclude the failing element.
-* Predicate-result: a strategic mode that effectively turns all and any
+* **Predicate-result**: a strategic mode that effectively turns all and any
   operations into a type of `cmp`. An `Rc=1 BO test` is performed and if
   failing that element result is **not** written to the regfile. The `Rc=1`
   Vector of co-results **is** always written (subject to usual predication).