From: lkcl
Date: Fri, 16 Sep 2022 10:38:46 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~405
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=f5f7baf590e10e583e4bdabc257cb1537ba80cf0;p=libreriscv.git
---
diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 459b3bedf..8dd3d42bb 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -224,19 +224,18 @@ These Modes do not interact with SVSTATE per se.
 SVSTATE primarily controls the looping (quantity, order), RM
 influences the *elements* (the Suffix). There is however
 some close interaction when it comes to predication.
-REMAP is separately
-outlined in another section.
-
+REMAP is outlined separately.
 The primary options all of which are aimed at reducing instruction
 count and reducing assembler complexity are:
 
-* element-width overrides, which dynamically redefine each SFFS or SFS
+* **element-width overrides**, which dynamically redefine each SFFS or SFS
   Scalar prefixed instruction to be 8-bit, 16-bit, 32-bit or 64-bit operands
   **without requiring new 8/16/32 instructions.**[^pseudorewrite]
   This results in full BF16 and FP16 opcodes being added to the Power ISA
   **without adding BF16 or FP16 opcodes** including full conversion
   between all formats.
-* predication. this is an absolutely essential feature for a 3D GPU VPU ISA.
+* **predication**.
+  this is an absolutely essential feature for a 3D GPU VPU ISA.
   CR Fields are available as Predicate Masks hence the reason for their
   extension to 128. Twin-Predication is also provided: this may best
   be envisaged as back-to-back VGATHER-VSCATTER but is not restricted
@@ -244,25 +243,27 @@ count and reducing assembler complexity are:
   of the predicates provides all of the other types of operations found
   in Vector ISAs (VEXTRACT, VINSERT etc) again with no need to actually
   provide explicit such instructions.
-* Saturation. **all** LD/ST and Arithmetic and Logical operations may
+* **Saturation**. **all** LD/ST and Arithmetic and Logical operations may
   be saturated (without adding explicit scalar saturated opcodes)
-* Reduction and Prefix-Sum (Fibonnacci Series) Modes
-* vec2/3/4 "Packing" and "Unpacking" (similar to VSX `vpack` and `vpkss`)
+* **Reduction and Prefix-Sum** (Fibonnacci Series) Modes, including a
+  "Reverse Gear".
+* **vec2/3/4 "Packing" and "Unpacking"** (similar to VSX `vpack` and `vpkss`)
   accessible in a way that is easier than REMAP, added for the same
   reasons that drove `vpack` and `vpkss` etc. to be added: pixel,
   audio, and 3D data manipulation. With Pack/Unpack being part of
   SVSTATE it can be applied *in-place* saving register file space
   (no copy/mv needed).
-* Load/Store speculative "fault-first" behaviour, identical to SVE and RVV
+* **Load/Store "fault-first"** speculative behaviour,
+  identical to SVE and RVV
   Fault-first: provides auto-truncation of a speculative sequential parallel
   LD/ST batch, helping solve the "SIMD Considered Harmful" stripmining
   problem from a Memory Access perspective.
-* Data-Dependent Fail-First: a 100% Deterministic extension of the LDST
+* **Data-Dependent Fail-First**: a 100% Deterministic extension of the LDST
   ffirst concept: first `Rc=1 BO test` failure terminates looping and
   truncates VL to that exact point. Useful for implementing algorithms
   such as `strcpy` in around 14 high-performance Vector instructions, the
   option exists to include or exclude the failing element.
-* Predicate-result: a strategic mode that effectively turns all and any
+* **Predicate-result**: a strategic mode that effectively turns all and any
   operations into a type of `cmp`. An `Rc=1 BO test` is performed and if
   failing that element result is **not** written to the regfile. The `Rc=1`
   Vector of co-results **is** always written (subject to usual predication).
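The element-width override and Saturation bullets describe one scalar operation being reinterpreted at a narrower width, optionally with saturation, instead of adding new opcodes. The Python sketch below is an illustrative model only, not the SVP64 pseudocode. It assumes elements are plain Python integers rather than packed register-file bytes, and the helper names `saturate`, `wrap` and `sv_add` are hypothetical.

```python
def saturate(value: int, elwidth: int, signed: bool) -> int:
    """Clamp value to the range of an elwidth-bit integer."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


def wrap(value: int, elwidth: int, signed: bool) -> int:
    """Truncate value to elwidth bits (ordinary modulo arithmetic)."""
    value &= (1 << elwidth) - 1
    if signed and value >> (elwidth - 1):
        value -= 1 << elwidth  # reinterpret the top bit as a sign bit
    return value


def sv_add(ra, rb, vl, elwidth=64, sat=False, signed=False):
    """Hypothetical model: add the first vl elements at an overridden width.

    The same "add" serves every width; only the element width and the
    optional saturation change, so no separate 8/16/32-bit or saturating
    add is needed in this model.
    """
    fix = saturate if sat else wrap
    return [fix(ra[i] + rb[i], elwidth, signed) for i in range(vl)]


# 8-bit unsigned elements: 200 + 100 overflows the element width
print(sv_add([200, 100, 10], [100, 100, 10], vl=3, elwidth=8))            # [44, 200, 20]
print(sv_add([200, 100, 10], [100, 100, 10], vl=3, elwidth=8, sat=True))  # [255, 200, 20]
```

Passing `signed=True` switches both paths to two's-complement ranges, so at 8 bits `100 + 100` saturates to 127 and wraps to -56.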
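Data-Dependent Fail-First, as described above, stops at the first element whose result fails the test and truncates VL at that point, optionally keeping the failing element. The sketch below assumes the per-element `Rc=1 BO test` can be reduced to an arbitrary Python predicate, uses the hypothetical name `ddffirst`, and, as a simplification, returns only the results that fall inside the new VL. The usage mirrors the `strcpy` case: stop at the NUL terminator and choose whether the terminator itself is included.

```python
from typing import Callable, List, Tuple


def ddffirst(op: Callable[[int], int], src: List[int], vl: int,
             test: Callable[[int], bool],
             include_failing: bool = False) -> Tuple[List[int], int]:
    """Apply op element-by-element; the first result failing test ends the loop.

    Returns (results, new_vl). With include_failing=True the failing element
    is kept and VL is truncated just after it; otherwise VL is truncated to
    exclude it.
    """
    results: List[int] = []
    for i in range(vl):
        r = op(src[i])
        if not test(r):
            if include_failing:
                results.append(r)
                return results, i + 1
            return results, i
        results.append(r)
    return results, vl  # no failure: VL is left unchanged


def nonzero(r: int) -> bool:
    return r != 0


# strcpy-like use: copy bytes until the NUL terminator
data = list(b"hi\0xyz")
print(ddffirst(lambda c: c, data, len(data), nonzero, include_failing=True))  # ([104, 105, 0], 3)
print(ddffirst(lambda c: c, data, len(data), nonzero))                        # ([104, 105], 2)
```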
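Predicate-result turns any operation into a kind of `cmp`: a result that fails the test is not written, while the co-result is always written, subject to predication. In this sketch each element's CR co-result is simplified to a single boolean, the predicate mask is a list of booleans (zeroing is ignored), and `pred_result` is a hypothetical name. Elements disabled by the mask record neither a result nor a co-result, shown as `None`.

```python
from typing import Callable, List, Optional, Tuple


def pred_result(op: Callable[[int], int], src: List[int], dest: List[int],
                vl: int, test: Callable[[int], bool],
                mask: Optional[List[bool]] = None
                ) -> Tuple[List[int], List[Optional[bool]]]:
    """Hypothetical model of predicate-result mode.

    A result is written to dest only if it passes test; the co-result
    (reduced here to a bool) is written for every element the mask enables.
    """
    out = list(dest)                  # copy: untouched elements keep old values
    co: List[Optional[bool]] = [None] * vl
    for i in range(vl):
        if mask is not None and not mask[i]:
            continue                  # masked out: neither result nor co-result
        r = op(src[i])
        co[i] = test(r)
        if co[i]:
            out[i] = r                # only passing results reach the "regfile"
    return out, co


def positive(r: int) -> bool:
    return r > 0


print(pred_result(lambda x: x - 5, [3, 7, 5, 10], [0, 0, 0, 0], 4, positive))
# ([0, 2, 0, 5], [False, True, False, True])
```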