bug 676: noted a way to reduce the number of instructions

diff --git a/openpower/sv/setvl.mdwn b/openpower/sv/setvl.mdwn

index f7132b6b5468559bb9a215b7fe2ceb1267d65339..d2a2031a5b6e3e633ab7c6614d1b1fdd62682a08 100644
--- a/openpower/sv/setvl.mdwn
+++ b/openpower/sv/setvl.mdwn
@@ -1,16 +1,20 @@
  # setvl: Set Vector Length
  
+<!-- hide -->
  See links:
  
  * <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
  * <https://bugs.libre-soc.org/show_bug.cgi?id=535>
  * <https://bugs.libre-soc.org/show_bug.cgi?id=587>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=914> TODO: setvl should not set SO
  * <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
  * <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
  * <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
+* <https://bugs.libre-soc.org/show_bug.cgi?id=1222> Rc=1 enhancement needed
  * <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
  * [[sv/svstep]]
  * pseudocode [[openpower/isa/simplev]]
+<!-- show -->
  
  Add the following section to the Simple-V Chapter
  
@@ -71,30 +75,31 @@ Special Registers Altered:
  * `vs` - bit 24 - allows for setting of VL
  * `vf` - bit 25 - sets "Vertical First Mode".
  
-Note that in immediate setting mode VL and MVL start from **one**
-but that this is compensated for in the assembly notation.
-i.e. that an immediate value of 1 in assembler notation
-actually places the value 0b0000000 in the `SVi` field bits:
-on execution the `setvl` instruction adds one to the decoded
-`SVi` field bits, resulting in
-VL/MVL being set to 1. This allows VL to be set to values
-ranging from 1 to 128 with only 7 bits instead of 8.
-Setting VL/MVL
-to 0 would result in all Vector operations becoming `nop`.  If this is
-truly desired (nop behaviour) then setting VL and MVL to zero is to be
-done via the [[SVSTATE SPR|sv/sprs]].
+Note that in immediate setting mode VL and MVL start from **one**, but
+this is compensated for in the assembly notation: an immediate
+value of 1 in assembler notation actually places the value 0b0000000 in
+the `SVi` field bits, and on execution the `setvl` instruction adds one to
+the decoded `SVi` field bits, resulting in VL/MVL being set to 1. In future
+this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
+instead of 8.  Setting VL/MVL to 0 would result in all Vector operations
+becoming `nop`.  If this is truly desired (nop behaviour) then setting
+VL and MVL to zero is to be done via the [[SVSTATE SPR|sv/sprs]].
  
  Note that setmvli is a pseudo-op, based on RA/RT=0, and setvli likewise
  
+```
      setvli   VL=8   : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
      setvli.  VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
      setmvli  MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
      setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
+```
  
  Additional pseudo-op for obtaining VL without modifying it (or any state):
  
+```
      getvl  r5      : setvl  r5, r0, vf=0, vs=0, ms=0
      getvl. r5      : setvl. r5, r0, vf=0, vs=0, ms=0
+```
  
  Note that whilst it is possible to set both MVL and VL from the same
  immediate, it is not possible to set them to different immediates in
@@ -117,45 +122,56 @@ from different sources is as follows:
  The reasoning here is that the opportunity to set RT equal to the
  immediate `SVi+1` is sacrificed in favour of setting from CTR.
  
-## Unusual Rc=1 behaviour
+**Unusual Rc=1 behaviour**
  
-Normally, the return result from an instruction is in `RT`. With
-it being possible for `RT=0` to mean that `CTR` mode is to be read,
-some different semantics are needed.
+Normally, the return result from an instruction is in `RT`. With it
+being possible for `RT=0` to mean that `CTR` mode is to be read, some
+different semantics are needed.
  
  CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
  overflow may occur: `VL`, if set either from an immediate or from `CTR`,
  may not exceed `MAXVL`, and if it does, `CR0.SO` must be set.
  
-In reality it is **`VL`** being set. Therefore, rather
-than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE
-is set if `VL` is non-zero.
+In reality it is **`VL`** being set. Therefore, rather than `CR0`
+testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE is set if `VL`
+is non-zero.
  
  **SUBVL**
  
  Sub-vector elements are not to be considered "Vertical". The vec2/3/4
  is to be considered as if it were the "single element".  Caveats exist for
-[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
-due to the order in which VL and SUBVL loops are applied being
-swapped (outer-inner becomes inner-outer)
+[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, due
+to the order in which VL and SUBVL loops are applied being swapped
+(outer-inner becomes inner-outer)
  
  ## Examples
  
  ### Core concept loop
  
+This example illustrates the Cray-style Loop concept. However, where most Cray
+Vectors have a Max Vector Length hard-coded into the architecture, Simple-V
+allows MVL to be set, but only as a static immediate, so that compilers may
+embed the register resource allocation statically at compile-time.
+
  ```
  loop:
      setvl a3, a0, MVL=8    #  update a3 with vl
                             # (# of elements this iteration)
-                           # set MVL to 8
+                           # set MVL to 8 and
+                           # set a3=VL=MIN(a0,MVL)
      # do vector operations at up to 8 length (MVL=8)
      # ...
-    sub a0, a0, a3   # Decrement count by vl
+    sub. a0, a0, a3   # Decrement count by vl, set CR0.eq
      bnez a0, loop    # Any more?
  ```
  
  ### Loop using Rc=1
  
+In this example, the `setvl.` instruction enables Rc=1, which
+sets CR0.eq when VL becomes zero. Testing of `r4` (cmpi) is thus redundant,
+saving one instruction.
+
+```
      my_fn:
        li r3, 1000
        b test
@@ -167,23 +183,28 @@ loop:
        bne cr0, loop
      end:
        blr
+```
  
  ### Load/Store-Multi (selective)
  
-Up to 64 FPRs will be loaded, here.  `r3` is set one per bit
-for each FP register required to be loaded.  The block of memory
-from which the registers are loaded is contiguous (no gaps):
-any FP register which has a corresponding zero bit in `r3`
-is *unaltered*.  In essence this is a selective LD-multi with
-"Scatter" capability.
+Up to 64 FPRs will be loaded, here.  `r3` has one bit set for each
+FP register required to be loaded.  The block of memory from which the
+registers are loaded is contiguous (no gaps): any FP register which has
+a corresponding zero bit in `r3` is *unaltered*.  In essence this is a
+selective LD-multi with "Scatter" (`VCOMPRESS`) capability.
  
+```
      setvli r0, MVL=64, VL=64
      sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers
+```
  
-Up to 64 FPRs will be saved, here.  Again, `r3` 
+Up to 64 FPRs will be saved, here.  Again, `r3` specifies which
+registers are set in a `VEXPAND` fashion.
  
+```
      setvli r0, MVL=64, VL=64
      sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
+```
  
  [[!tag standards]]
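
Reviewer sketch (not part of the patch): the immediate encoding and the loop behaviour described in the patched text can be modelled in a few lines of Python. This is only an illustration under stated assumptions. The function names (`encode_svi`, `decode_svi`, `setvl`, `strip_mine`) are hypothetical and do not come from the Libre-SOC simulator. The model covers only `VL = MIN(requested, MVL)` and the `CR0.eq` flag, and it assumes the full 1 to 128 range that the text says is planned for the future.

```python
def encode_svi(value):
    """Assembler immediate (1..128) -> 7-bit SVi field (hypothetical helper)."""
    if not 1 <= value <= 128:
        raise ValueError("immediate must be in the range 1..128")
    return value - 1  # an immediate of 1 is stored as 0b0000000


def decode_svi(field):
    """7-bit SVi field -> VL/MVL value: setvl adds one on execution."""
    return (field & 0x7F) + 1


def setvl(requested, mvl):
    """Simplified model of setvl.: returns (VL, CR0.eq).

    Assumption: VL is clamped to MVL, and CR0.eq is set when VL is zero.
    """
    vl = min(requested, mvl)
    return vl, vl == 0


def strip_mine(n, mvl=8):
    """Cray-style loop: return the VL used on each iteration."""
    lengths = []
    while True:
        vl, cr0_eq = setvl(n, mvl)
        if cr0_eq:        # Rc=1 result replaces a separate compare
            break
        lengths.append(vl)
        n -= vl           # decrement the remaining element count by VL
    return lengths


if __name__ == "__main__":
    print(encode_svi(8), decode_svi(encode_svi(8)))
    print(strip_mine(20, mvl=8))
```

The loop exits on the flag returned by `setvl` rather than on a separate comparison of the remaining count, which mirrors the one-instruction saving the commit message refers to.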