openpower/sv/setvl.mdwn

[[!tag standards]]

# OpenPOWER SV setvl/setvli

See links:

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>

Use of setvl results in changes to the MVL, VL and STATE SPRs. See [[sv/sprs]].

# Behaviour and Rationale

SV's Vector Engine is based on Cray-style Variable-length Vectorisation,
just like RVV.  However, unlike RVV, SV sits on top of the standard Scalar
regfiles: there is no separate Vector register numbering.  Therefore, also
unlike RVV, SV does not have hard-coded "Lanes".  The relevant parameter
in RVV is "MAXVL", and this is architecturally hard-coded into RVV systems,
anywhere from 1 to tens of thousands of Lanes in supercomputers.

SV is more like how MMX used to sit on top of the x86 FP regfile.  Therefore,
when Vector operations are performed, the question has to be asked: "well,
how much of the regfile do you want to allocate to this operation?"  If the
allocation is too small, performance may be affected.  If it is too large,
other registers would overlap and cause data corruption, or, even if
allocated correctly, would require spill to memory.

The answer effectively needs to be parameterised.  Hence: MAXVL
(MVL) is set from an immediate, so that the compiler may decide, statically,
on a guaranteed resource allocation according to the needs of the application.

Other than being able to set MVL, SV's VL (Vector Length) works just like
RVV's VL, with one minor twist.  RVV permits the `setvl` instruction to set
VL to an arbitrary value.  Given that RVV only works on Vector Loops, this is
fine and part of its value and design.  However, SV sits on top of the
standard register files.  When MVL=VL=2, a Vector Add on `r3` will perform
two Scalar Adds: one on `r3` and one on `r4`.

Thus there is the opportunity to set VL to an explicit value (within the
limits of MVL) with the reasonable expectation that if two operations are
requested (by setting VL=2) then two operations are guaranteed.  This avoids
the need for a loop (with its not-insignificant use of the regfiles for
counters).  Just two instructions are needed:

    setvli r0, MVL=64, VL=64
    ld r0.v, 0(r30) # load 64 registers from memory

Page Faults etc. aside, this is *guaranteed* 100% without fail to perform 64
unit-strided LDs starting from the address pointed to by r30 and put the
contents into r0 through r63.  Thus it becomes a "LOAD-MULTI".  Twin
Predication could even be used to only load relevant registers from the
stack.  This *only works if VL is set to the requested value* (the caveat
being that it is limited to not exceed MVL).

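As a rough illustration of the register-file mapping described above, the following Python sketch models a vectorised add and a vectorised load as plain loops over consecutive scalar registers.  The helper names (`sv_add`, `sv_ld`), the 128-entry `regs` list and the dict-based memory are assumptions made for this illustration only and are not part of the specification.  All three add operands are treated as vectors, and predication and element width are ignored.

```python
# Illustrative model only: a flat register file and a dict used as memory.
NUM_REGS = 128          # assumed size, chosen for this sketch
regs = [0] * NUM_REGS

def sv_add(rt, ra, rb, vl):
    """Model a vectorised add as VL scalar adds on consecutive registers."""
    for i in range(vl):
        regs[rt + i] = regs[ra + i] + regs[rb + i]

def sv_ld(rt, base_addr, mem, vl, width=8):
    """Model a unit-strided vector load: VL scalar loads into rt, rt+1, ..."""
    for i in range(vl):
        regs[rt + i] = mem[base_addr + i * width]

# VL=2: an add targeting r3 also writes r4
regs[10], regs[11] = 1, 2
regs[20], regs[21] = 30, 40
sv_add(3, 10, 20, vl=2)
add_result = (regs[3], regs[4])
print(add_result)            # (31, 42)

# VL=64: a single load fills r0 through r63 ("LOAD-MULTI")
mem = {0x1000 + 8 * i: i * i for i in range(64)}
sv_ld(0, 0x1000, mem, vl=64)
print(regs[0], regs[63])     # 0 3969
```

The second call also shows the hazard mentioned earlier: with VL=64 the load overwrites r3, r4, r10 and every other register below r64, so the allocation has to be chosen with the rest of the program in mind.
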
# Format

*(Allocation of opcode TBD pending OPF ISA WG approval)*

| 0..5 | 6..10 | 11..15 | 16..20 | 21..25     | 26..30  | 31 | name    |
| ---- | ----- | ------ | ------ | ---------- | ------- | -- | ------- |
| 19   | RT    | RA     |        | XO[0:4]    | XO[5:9] | Rc | XL-Form |
| 19   | RT    | RA     | imm    | i // vs ms | NNNNN   | Rc | setvl   |

Note that imm spans 7 bits (16 to 22), and that bit 22 is reserved and must
be zero.  Setting bit 22 causes an illegal instruction exception.

Note that in immediate setting mode VL and MVL start from **one**, i.e. an
immediate value of zero will result in VL/MVL being set to 1, and 0b111111
results in VL/MVL being set to 64.  This is because setting VL/MVL to 1
results in "scalar identity" behaviour, whereas setting VL/MVL to 0 would
result in all Vector operations becoming `nop`.  If this is truly desired
(nop behaviour) then setting VL and MVL to zero is to be done via the
[[SV SPRs|sv/sprs]].

Note that setmvli is a pseudo-op, based on RA/RT=0, and setvli likewise:

    setvli VL=8    : setvl r0, r0, VL=8
    setmvli MVL=8  : setvl r0, r0, MVL=8

Additional pseudo-op for obtaining VL without modifying it:

    getvl r5       : setvl r5, r0, vs=0, ms=0

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in the
same instruction.  That would require two instructions.

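The off-by-one immediate encoding described above can be sketched in Python as follows.  The function names are hypothetical, and the sketch only covers the 6-bit value (0b000000 to 0b111111); it deliberately does not model where the reserved bit 22 sits within the instruction word.

```python
def encode_vl_immediate(length):
    """Return the 6-bit immediate for a requested VL/MVL of 1..64."""
    if not 1 <= length <= 64:
        raise ValueError("VL/MVL immediate can only express 1..64")
    return length - 1

def decode_vl_immediate(imm):
    """Return the VL/MVL that a 6-bit immediate stands for."""
    if not 0 <= imm <= 0b111111:
        raise ValueError("immediate must fit in 6 bits")
    return imm + 1

print(decode_vl_immediate(0))          # 1
print(decode_vl_immediate(0b111111))   # 64
print(encode_vl_immediate(8))          # 7
```

A length of zero is rejected here because, as noted above, it is not expressible through the immediate and has to be written via the SPRs instead.
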
# Pseudocode

    // instruction fields:
    rt = get_rt_field();         // bits 6..10
    ra = get_ra_field();         // bits 11..15
    vs = get_vs_field();         // bit 24
    ms = get_ms_field();         // bit 25
    Rc = get_Rc_field();         // bit 31
    // add one. MVL/VL=1..64 not 0..63
    vlimmed = get_immed_field()+1; // bits 16..22

    // set VL (or not).
    // 3 options: from SPR, from immed, from ra
    if vs {
       // VL to be sourced from fields/regs
       if ra != 0 {
           VL = GPR[ra]
       } else {
           VL = vlimmed
       }
    } else {
       // VL not to change (except if MVL is reduced)
       // read from SPRs
       VL = SPR[SV_VL]
    }

    // set MVL (or not).
    // 2 options: from SPR, from immed
    if ms {
       MVL = vlimmed
    } else {
       // MVL not to change, read from SPRs
       MVL = SPR[SV_MVL]
    }

    // calculate (limit) VL
    VL = min(VL, MVL)

    // store VL, MVL
    SPR[SV_VL] = VL
    SPR[SV_MVL] = MVL

    // write rt
    if rt != 0 {
        // rt is not zero
        GPR[rt] = VL;
    }
    // write CR?
    if Rc {
        // update CR from VL (not rt)
        CR0.eq = (VL == 0)
        ...
        ...
    }

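For experimentation, the pseudocode above can be transcribed into a small executable Python model.  This is a sketch rather than a reference implementation: the `state` dict standing in for the SPRs and CR0, the 128-entry `gpr` list and the function signature are all assumptions of this example, and only the `CR0.eq` bit is modelled because the remaining CR0 updates are elided in the pseudocode.

```python
def setvl(state, gpr, rt, ra, imm, vs, ms, rc):
    """Model of the setvl pseudocode; `state` holds 'VL', 'MVL' and 'CR0_eq'."""
    vlimmed = imm + 1                      # immediates encode 1..64 as 0..63

    # VL: from GPR[ra], from the immediate, or unchanged from the SPR
    if vs:
        vl = gpr[ra] if ra != 0 else vlimmed
    else:
        vl = state["VL"]

    # MVL: from the immediate, or unchanged from the SPR
    mvl = vlimmed if ms else state["MVL"]

    vl = min(vl, mvl)                      # VL may never exceed MVL
    state["VL"], state["MVL"] = vl, mvl

    if rt != 0:
        gpr[rt] = vl
    if rc:
        state["CR0_eq"] = (vl == 0)        # other CR0 bits are not modelled
    return vl

state = {"VL": 0, "MVL": 0, "CR0_eq": False}
gpr = [0] * 128                            # assumed register count
gpr[3] = 1000

# VL requested from r3 (1000), MVL=64 from the immediate, result into r4
first = setvl(state, gpr, rt=4, ra=3, imm=63, vs=1, ms=1, rc=1)
print(first, gpr[4], state["CR0_eq"])      # 64 64 False

# getvl r5: neither VL nor MVL is modified
second = setvl(state, gpr, rt=5, ra=0, imm=0, vs=0, ms=0, rc=0)
print(second, gpr[5])                      # 64 64

# Reducing MVL to 8 without touching VL still clamps VL
third = setvl(state, gpr, rt=0, ra=0, imm=7, vs=0, ms=1, rc=0)
print(third, state["VL"], state["MVL"])    # 8 8 8
```

The last call exercises the "VL not to change (except if MVL is reduced)" comment: VL is read back from the SPR and then limited by the new, smaller MVL.
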
# Examples

## Core concept loop

    loop:
    setvl a3, a0, MVL=8    #  update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub a0, a0, a3   # Decrement count by vl
    bnez a0, loop    # Any more?

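The loop above is a strip-mining pattern: each pass handles at most MVL elements and the remaining count shrinks by the VL that was actually granted.  A minimal Python sketch of the same control flow is given below, assuming (per the pseudocode) that `setvl` yields `min(count, MVL)`.  The function name `strip_mine` is hypothetical.

```python
def strip_mine(count, mvl):
    """Return the VL chosen on each pass of the loop for `count` elements."""
    lengths = []
    while True:
        vl = min(count, mvl)     # setvl a3, a0, MVL=mvl
        lengths.append(vl)       # vector operations on vl elements go here
        count -= vl              # sub a0, a0, a3
        if count == 0:           # bnez a0, loop
            break
    return lengths

print(strip_mine(20, 8))   # [8, 8, 4]
print(strip_mine(8, 8))    # [8]
```

In this model a count of zero still makes one pass with VL=0, since the test sits at the bottom of the loop.
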
## Loop using Rc=1

    my_fn:
      li r3, 1000
      b test
    loop:
      sub r3, r3, r4
      ...
    test:
      setvli. r4, r3, MVL=64
      bne cr0, loop
    end:
      blr