pseudo-code would look like this:
 function op_load(rd, rs) # LD not VLD!
-  rd = int_csr[rd].active ? int_csr[rd].regidx : rd;
-  rs = int_csr[rs].active ? int_csr[rs].regidx : rs;
+  rdv = int_csr[rd].active ? int_csr[rd].regidx : rd;
+  rsv = int_csr[rs].active ? int_csr[rs].regidx : rs;
   ps = get_pred_val(FALSE, rs); # predication on src
   pd = get_pred_val(FALSE, rd); # ... AND on dest
   for (int i = 0, int j = 0; i < VL && j < VL;):
     if (int_csr[rs].isvec) while (!(ps & 1<<i)) i++;
     if (int_csr[rd].isvec) while (!(pd & 1<<j)) j++;
-    if (int_csr[rd].isvec)
-      srcbase = ireg[rs+i];
+    if (int_csr[rs].isvec) # vectorised rs: indirect mode
+      srcbase = ireg[rsv+i];
     else
-      srcbase = ireg[rs] + i * XLEN/8; # offset in bytes
-    ireg[rd+j] <= mem[srcbase + imm_offs];
+      srcbase = ireg[rsv] + j * XLEN/8; # scalar rs: offset in bytes
+    ireg[rdv+j] <= mem[srcbase + imm_offs];
     if (!int_csr[rs].isvec &&
         !int_csr[rd].isvec) break # scalar-scalar LD
     if (int_csr[rs].isvec) i++;
     if (int_csr[rd].isvec) j++;
-The test for whether both source and destination are scalar is
-what makes the above pseudo-code provide the "standard" RV Base
-behaviour for LD operations. The offset in bytes (XLEN/8)
-changes depending on whether the operation is a LB (1 byte),
-LH (2 byes), LW (4 bytes) or LD (8 bytes), and also whether the element
-width is over-ridden (see special element width section).
+Notes:
+
+* For simplicity, zeroing and elwidth are not included in the above:
+ the key focus here is the decision-making for srcbase. A vectorised
+ rs means that sequentially-numbered registers supply the indirection
+ addresses, whereas a scalar rs selects "offset" mode.
+* The test near the end of the loop, which breaks out when both source
+ and destination are scalar, is what makes the above pseudo-code
+ provide the "standard" RV Base behaviour for LD operations.
+* The offset in bytes (XLEN/8) changes depending on whether the
+ operation is an LB (1 byte), LH (2 bytes), LW (4 bytes) or LD
+ (8 bytes), and also on whether the element width is overridden
+ (see the special element width section).
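A small executable model of the load loop can help when checking the srcbase decision. This is a sketch under stated assumptions, not part of the spec. `CSREntry`, the `op_load` argument list, the `width` parameter (standing in for XLEN/8 so that LB/LH/LW/LD can all be modelled), and `mem` as a dict from byte address to an already-loaded element value are all hypothetical. Following the notes, a vectorised rs selects indirect addressing through registers rsv+i, and a scalar rs selects offset mode. In offset mode only the destination index j advances, so the sketch derives the byte offset from j. It also bounds the predicate-skipping loops by VL, which the pseudo-code leaves implicit. Zeroing and elwidth are omitted, as in the pseudo-code.

```python
# Hypothetical model of the SV scalar-opcode LOAD loop. CSREntry, op_load's
# signature, `width` and the dict-based `mem` are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class CSREntry:
    active: bool = False  # register is redirected via the CSR table
    regidx: int = 0       # redirected register index (used when active)
    isvec: bool = False   # register is tagged as a vector


def op_load(ireg, mem, int_csr, rd, rs, ps, pd, vl, imm_offs=0, width=8):
    """Model LD (width=8), LW (4), LH (2) or LB (1); no zeroing, no elwidth."""
    csr_d = int_csr.get(rd, CSREntry())
    csr_s = int_csr.get(rs, CSREntry())
    rdv = csr_d.regidx if csr_d.active else rd
    rsv = csr_s.regidx if csr_s.active else rs
    i = j = 0
    while i < vl and j < vl:
        # Skip masked-out source / destination elements (bounded by vl).
        if csr_s.isvec:
            while i < vl and not (ps & (1 << i)):
                i += 1
        if csr_d.isvec:
            while j < vl and not (pd & (1 << j)):
                j += 1
        if i >= vl or j >= vl:
            break
        if csr_s.isvec:
            srcbase = ireg[rsv + i]          # indirect: address held in rsv+i
        else:
            srcbase = ireg[rsv] + j * width  # offset mode: contiguous elements
        ireg[rdv + j] = mem[srcbase + imm_offs]
        if not csr_s.isvec and not csr_d.isvec:
            break  # scalar-scalar: behaves as a plain RV load
        if csr_s.isvec:
            i += 1
        if csr_d.isvec:
            j += 1
    return ireg
```

For example, with a scalar rs holding 0x100, a vectorised rd, VL=4 and all destination predicate bits set, the model loads the elements at 0x100, 0x108, 0x110 and 0x118 into four consecutive registers starting at rdv.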
## Compressed Stack LOAD / STORE Instructions <a name="c_ld_st"></a>