# Under consideration <a name="issues"></a>

For element-grouping, if there is unused space within a register
(three 16-bit elements in a 64-bit register, for example), the recommendation is:

* For the unused elements in an integer register, the used element
  closest to the MSB is sign-extended on write and the unused elements
  are ignored on read.
* The unused elements in a floating-point register are treated as if
  they are set to all ones on write and are ignored on read, matching the
  existing standard for storing smaller FP values in larger registers.
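
As a rough illustration of the two recommendations above, the sketch below packs three 16-bit elements into a 64-bit register image. The helper names (`pack_int_elements`, `pack_fp_elements`, `read_elements`) are hypothetical, and placing element 0 in the least-significant bits is an assumption made for the example, not something stated here.

```python
XLEN = 64  # register width in bits (assumed for this example)


def _pack(elems, elwidth):
    """Pack elements LSB-first (assumed layout) and return (value, used_bits)."""
    mask = (1 << elwidth) - 1
    reg = 0
    for i, e in enumerate(elems):
        reg |= (e & mask) << (i * elwidth)
    return reg, len(elems) * elwidth


def _upper_ones(used_bits, reglen):
    """Bit mask covering the unused space above the packed elements."""
    return ((1 << reglen) - 1) & ~((1 << used_bits) - 1)


def pack_int_elements(elems, elwidth, reglen=XLEN):
    """Integer case: sign-extend the used element closest to the MSB."""
    reg, used_bits = _pack(elems, elwidth)
    assert 0 < used_bits <= reglen
    top_sign = (elems[-1] >> (elwidth - 1)) & 1
    if top_sign:
        reg |= _upper_ones(used_bits, reglen)
    return reg


def pack_fp_elements(elems, elwidth, reglen=XLEN):
    """FP case: unused space is treated as all ones (raw bit patterns in)."""
    reg, used_bits = _pack(elems, elwidth)
    assert 0 < used_bits <= reglen
    return reg | _upper_ones(used_bits, reglen)


def read_elements(reg, count, elwidth):
    """Unused elements are ignored on read: only `count` elements come back."""
    mask = (1 << elwidth) - 1
    return [(reg >> (i * elwidth)) & mask for i in range(count)]
```

With these assumptions, a negative top element fills the spare 16 bits with ones and a non-negative one leaves them zero, while the FP variant always fills them with ones.
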

> no, because it wastes space.

> One solution is to just not support LR/SC wider than a fixed
> implementation-dependent size, which must be at least
> 1 XLEN word, which can be read from a read-only CSR
> that can also be used for info like the kind and width of
> hw parallelism supported (128-bit SIMD, minimal virtual
> parallelism, etc.) and other things (like maybe the number
> of registers supported).
>
> That CSR would have to have a flag to make a read trap so
> a hypervisor can simulate different values.

> And what about instructions like JALR?

answer: they're not vectorised, so not a problem

TODO: document the different register lengths for the INT / FP regfiles, and
provide them as part of the info CSR: 00=32, 01=64, 10=128, 11=reserved.
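
A minimal sketch of decoding that two-bit encoding follows. It assumes the values are register widths in bits and takes the field already extracted, since the field's position within the info CSR is not defined here; the name `decode_regfile_width` is hypothetical.

```python
# Encoding from the TODO above: 00=32, 01=64, 10=128, 11=reserved.
REGFILE_WIDTH_BITS = {0b00: 32, 0b01: 64, 0b10: 128}


def decode_regfile_width(field):
    """Map a 2-bit width field to a register width (assumed to be in bits).

    Raises ValueError for the reserved encoding (0b11) or any other value.
    """
    if field not in REGFILE_WIDTH_BITS:
        raise ValueError(f"reserved or invalid width encoding: {field!r}")
    return REGFILE_WIDTH_BITS[field]
```

Rejecting `0b11` outright, rather than mapping it to a default, keeps the reserved value free for a later revision.
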

Could the 8-bit Register VBLOCK format use regnum<<1 instead, only accessing regs 0 to 64?

Expand the range of SUBVL and its associated svsrcoffs and svdestoffs by
adding a 2nd STATE CSR (or extending STATE to 64 bits). Future version?

TODO: evaluate - BRIEFLY (under 1 hour MAXIMUM) - why these rules exist,
by illustrating with pseudo-assembly DAXPY

1. Trap if imm > XLEN.
3. Else If regs[rs1] > 2 * imm, then
4. Else If regs[rs1] > imm, then
    1. Set VL to regs[rs1] / 2 rounded down.
5. Otherwise,
    1. Set VL to regs[rs1].
6. Set regs[rd] to VL.

TODO: adapt to the above rules.

    # a0 is n, a1 is pointer to x[0], a2 is pointer to y[0], fa0 is a
      4:  vsetdcfg t0             # enable 2 64b Fl.Pt. registers
    loop:
      8:  setvl  t0, a0           # vl = t0 = min(mvl, n)
      c:  vld    v0, a1           # load vector x
     10:  slli   t1, t0, 3        # t1 = vl * 8 (in bytes)
     14:  vld    v1, a2           # load vector y
     18:  add    a1, a1, t1       # increment pointer to x by vl*8
     1c:  vfmadd v1, v0, fa0, v1  # v1 += v0 * fa0 (y = a * x + y)
     20:  sub    a0, a0, t0       # n -= vl (t0)
     24:  vst    v1, a2           # store y
     28:  add    a2, a2, t1       # increment pointer to y by vl*8
     2c:  bnez   a0, loop         # repeat if n != 0
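
For reference, the strip-mined loop in the listing can be modelled in plain Python as below. This is only a sketch: `daxpy_stripmined` and its `mvl` parameter are hypothetical names, list indices stand in for the `a1` / `a2` pointers, and `setvl` is modelled as `min(mvl, n)` as in the listing's own comment, not as the numbered rules above.

```python
def daxpy_stripmined(n, a, x, y, mvl):
    """Compute y[i] = a * x[i] + y[i] for i < n, at most mvl elements per pass."""
    if mvl <= 0:
        raise ValueError("mvl must be positive")
    xi = yi = 0                    # element indices standing in for a1 / a2
    while n != 0:                  # bnez a0, loop
        vl = min(mvl, n)           # setvl t0, a0
        v0 = x[xi:xi + vl]         # vld v0, a1
        v1 = y[yi:yi + vl]         # vld v1, a2
        v1 = [xe * a + ye for xe, ye in zip(v0, v1)]  # vfmadd v1, v0, fa0, v1
        y[yi:yi + vl] = v1         # vst v1, a2
        xi += vl                   # add a1, a1, t1 (in elements, not bytes)
        yi += vl                   # add a2, a2, t1
        n -= vl                    # sub a0, a0, t0
    return y
```

The final pass handles a partial vector automatically because `vl` shrinks to the remaining element count, which is the point of the `setvl` idiom.
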

Swizzle needs a MV; see [[mv.x]].