# notes from lxo

this section covers assembly notation for the immediate and indexed LD/ST.
in summary: in immediate mode for LD, when the destination register is
Vectorized (`RT.v`) but the source `imm(RA)` is scalar, the notation does not
make it clear that the memory being read is *still a vector load*
(known as "unit or element strides").

This anomaly is made clear with the following notation:

    sv.ld RT.v, imm(RA).v

The following notation, although technically correct because it is implicitly identical to the above, is prohibited and is a syntax error:

    sv.ld RT.v, imm(RA)

Notes taken from IRC conversation

    <lxo> sv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r#
    <lxo> sv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses
    <lxo> similarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r#
    <lxo> whereas sv.ldx r#.v, r#.v, r# -> vector of addresses
    <lxo> point being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector
    <lxo> as in asm ("sv.ld1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem));
    <lxo> (and ld%U1 got mangled into underline; %U expands to x if the address is a sum of registers

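As a rough illustration of the rule lxo describes, the following Python sketch classifies an immediate-mode memory operand by where the `.v` suffix sits. The function name `classify_mem_operand`, the regular expression and the returned labels are all hypothetical and are not part of any assembler. The case where `.v` appears both inside and outside the brackets is not covered by the notes, so the sketch rejects it.

```python
import re

# Hypothetical pattern for the memory operand in the notation above:
#   ofst(reg)     plain scalar address
#   ofst(reg).v   whole vector is at ofst+reg
#   ofst(reg.v)   reg is a vector of addresses
_MEM_OPERAND = re.compile(
    r"^(?P<ofst>[^()\s]+)\((?P<reg>[^().\s]+)(?P<inner>\.v)?\)(?P<outer>\.v)?$"
)


def classify_mem_operand(operand):
    """Return a descriptive label for an immediate-mode memory operand."""
    m = _MEM_OPERAND.match(operand.strip())
    if m is None:
        raise ValueError("not an ofst(reg) style operand: %r" % operand)
    inner = m.group("inner") is not None   # .v inside the brackets
    outer = m.group("outer") is not None   # .v appended to the whole operand
    if inner and outer:
        # assumption: the notes do not define this combination
        raise ValueError("combination not covered by the notes: %r" % operand)
    if inner:
        return "vector of addresses"
    if outer:
        return "whole vector at ofst+reg"
    return "scalar address"


if __name__ == "__main__":
    for op in ("ofst(r#).v", "ofst(r#.v)", "ofst(r#)"):
        print(op, "->", classify_mem_operand(op))
```
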
permutations of vector selection, to identify above asm-syntax:

    imm(RA)  RT.v   RA.v   nonstrided
        sv.ld r#.v, ofst(r#2.v) -> r#2 is a vector of addresses
          mem@     ofst+r#2  ofst+(r#2+1)  ofst+(r#2+2)
          destreg  r#        r#+1          r#+2
    imm(RA)  RT.s   RA.v   nonstrided
        sv.ld r#, ofst(r#2.v) -> r#2 is a vector of addresses
          (dest r# is scalar) -> VSELECT mode
    imm(RA)  RT.v   RA.s   fixed stride: unit or element
        sv.ld r#.v, ofst(r#2).v -> whole vector is at ofst+r#2
          mem@r#2  +0   +1    +2
          destreg  r#   r#+1  r#+2
        sv.ld/els r#.v, ofst(r#2).v -> vector at ofst*elidx+r#2
          mem@r#2  +0 ... +ofst ... +ofst*2
          destreg  r#     r#+1      r#+2
    imm(RA)  RT.s   RA.s   not vectorized
        sv.ld r#, ofst(r#2)

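The address patterns in the immediate-mode table can be sketched in Python as below. This is an illustration under stated assumptions, not the specification: the register file is modelled as a plain dict, the mode labels and the name `effective_addresses` are hypothetical, and unit stride is assumed to step by an element width of 8 bytes (the table itself counts in elements, `+0 +1 +2`).

```python
# Assumption: 8-byte elements, as for a doubleword ld.
ELWIDTH_BYTES = 8


def effective_addresses(gpr, ra, ofst, vl, mode):
    """Return the list of memory addresses read, one per element.

    gpr  : dict mapping register number -> value (a stand-in register file)
    ra   : register number of the base register (r#2 in the table)
    ofst : the immediate offset
    vl   : vector length (number of elements)
    mode : hypothetical label selecting a row of the table:
             "addr_vector"  sv.ld     r#.v, ofst(r#2.v)
             "unit"         sv.ld     r#.v, ofst(r#2).v
             "element"      sv.ld/els r#.v, ofst(r#2).v
             "scalar"       sv.ld     r#,   ofst(r#2)
    """
    if mode == "addr_vector":
        # each successive register supplies its own base address
        return [ofst + gpr[ra + i] for i in range(vl)]
    if mode == "unit":
        # whole vector is contiguous, starting at ofst+r#2
        return [ofst + gpr[ra] + i * ELWIDTH_BYTES for i in range(vl)]
    if mode == "element":
        # vector at ofst*elidx+r#2: the immediate is the stride
        return [gpr[ra] + ofst * i for i in range(vl)]
    if mode == "scalar":
        return [ofst + gpr[ra]]
    raise ValueError("unknown mode: %r" % mode)


if __name__ == "__main__":
    regs = {4: 0x1000, 5: 0x2000, 6: 0x3000}
    for mode in ("addr_vector", "unit", "element", "scalar"):
        addrs = effective_addresses(regs, 4, 16, 3, mode)
        print(mode, [hex(a) for a in addrs])
```

Passing the mode as a string keeps the sketch short; the real distinction is made by the `.v` placement and the `/els` suffix in the assembly syntax.
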
indexed mode:

    RA,RB    RT.v  RA.v  RB.v
        sv.ldx r#.v, r#2, r#3.v -> whole vector at r#2+r#3
    RA,RB    RT.v  RA.s  RB.v
        sv.ldx r#.v, r#2.v, r#3.v -> whole vector at r#2+r#3
    RA,RB    RT.v  RA.v  RB.s
        sv.ldx r#.v, r#2.v, r#3 -> vector of addresses
    RA,RB    RT.v  RA.s  RB.s
        sv.ldx r#.v, r#2, r#3 -> VSPLAT mode
    RA,RB    RT.s  RA.v  RB.v
    RA,RB    RT.s  RA.s  RB.v
    RA,RB    RT.s  RA.v  RB.s
    RA,RB    RT.s  RA.s  RB.s  not vectorized
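
The eight rows of the indexed-mode list are simply every scalar/vector combination of RT, RA and RB. A small Python sketch (the function name `indexed_permutations` is hypothetical) reproduces the row headers in the same order, which is a quick way to check that no combination has been missed. It makes no claim about what each combination means.

```python
from itertools import product


def indexed_permutations():
    """List the RT/RA/RB scalar-or-vector combinations in table order."""
    rows = []
    # product() varies the last item fastest, so unpack as (RT, RB, RA)
    # to match the order used above: RA toggles first, then RB, then RT.
    for rt, rb, ra in product("vs", repeat=3):
        rows.append("RA,RB RT.%s RA.%s RB.%s" % (rt, ra, rb))
    return rows


if __name__ == "__main__":
    for row in indexed_permutations():
        print(row)
```
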