From 92168e48bf273c1a462c6046c09473d7b2028006 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 20 Nov 2018 15:57:16 +0000
Subject: [PATCH] update LD/ST section

---
 simple_v_extension/specification.mdwn | 60 ++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/simple_v_extension/specification.mdwn b/simple_v_extension/specification.mdwn
index 25be844af..5e038cf10 100644
--- a/simple_v_extension/specification.mdwn
+++ b/simple_v_extension/specification.mdwn
@@ -1251,7 +1251,8 @@ Similar rules apply to the destination register.
 
 ## LOAD / STORE Instructions and LOAD-FP/STORE-FP
 
-An earlier draft of SV modified the behaviour of LOAD/STORE. This
+An earlier draft of SV modified the behaviour of LOAD/STORE (modified
+the interpretation of the instruction fields). This
 actually undermined the fundamental principle of SV, namely that there
 be no modifications to the scalar behaviour (except where absolutely
 necessary), in order to simplify an implementor's task if considering
@@ -1259,21 +1260,48 @@ converting a pre-existing scalar design to support parallelism.
 
 So the original RISC-V scalar LOAD/STORE and LOAD-FP/STORE-FP functionality
 do not change in SV, however just as with C.MV it is important to note
-that dual-predication is possible. Using the template outlined in
-the section "Vectorised dual-op instructions", the pseudo-code covering
-scalar-scalar, scalar-vector, vector-scalar and vector-vector applies,
-where SCALAR\_OPERATION is as follows, exactly as for a standard
-scalar RV LOAD operation:
-
-    srcbase = ireg[rs+i];
-    return mem[srcbase + imm];
-
-Whilst LOAD and STORE remain as-is when compared to their scalar
-counterparts, the incrementing on the source register (for LOAD)
-means that pointers-to-structures can be easily implemented, and
-if contiguous offsets are required, those pointers (the contents
-of the contiguous source registers) may simply be set up to point
-to contiguous locations.
+that dual-predication is possible.
+
+In vectorised architectures there are usually at least two different modes
+for LOAD/STORE:
+
+* Read (or write for STORE) from sequential locations, where one
+  register specifies the address, and the one address is incremented
+  by a fixed amount.
+* Read (or write) from multiple indirected addresses, where the
+  vector elements each specify separate and distinct addresses.
+
+To support these different addressing modes, the CSR "isvector"
+bit is used. So, for a LOAD, when the src register is set to
+scalar, the LOADs are sequentially incremented by the src register
+element width, and when the src register is set to "vector", the
+elements are treated as indirection addresses. Simplified
+pseudo-code would look like this:
+
+    function op_load(rd, rs) # LD not VLD!
+      rd = int_csr[rd].active ? int_csr[rd].regidx : rd;
+      rs = int_csr[rs].active ? int_csr[rs].regidx : rs;
+      ps = get_pred_val(FALSE, rs); # predication on src
+      pd = get_pred_val(FALSE, rd); # ... AND on dest
+      for (int i = 0, int j = 0; i < VL && j < VL;):
+        if (int_csr[rs].isvec) while (!(ps & 1<
-- 
2.30.2
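
Note (illustrative only, not part of the patch): the two addressing modes described in the added text can be sketched as a small Python model. This is a minimal sketch under stated assumptions and does not reproduce or complete the pseudo-code in the patch: predication and the immediate offset are left out, vector elements are assumed to live in consecutive registers starting at `rs`, the element width is assumed to be given in bytes, and the names `load_addresses`, `src_is_vector` and `elwidth_bytes` are hypothetical.

```python
def load_addresses(regs, rs, vl, src_is_vector, elwidth_bytes):
    """Return the addresses a vectorised LOAD would read (hypothetical model).

    regs           -- dict mapping register number to its integer contents
    rs             -- source (address) register number
    vl             -- vector length (number of elements)
    src_is_vector  -- models the CSR "isvector" bit for the src register
    elwidth_bytes  -- src element width in bytes (assumed unit)
    """
    if src_is_vector:
        # Indirected mode: each element supplies its own address.
        # Assumption: elements occupy consecutive registers rs, rs+1, ...
        return [regs[rs + i] for i in range(vl)]
    # Sequential mode: one base address, stepped by the element width.
    base = regs[rs]
    return [base + i * elwidth_bytes for i in range(vl)]


# Example register file: three unrelated pointers in x5..x7.
regs = {5: 0x1000, 6: 0x2040, 7: 0x0800}

sequential = load_addresses(regs, rs=5, vl=3, src_is_vector=False, elwidth_bytes=8)
indirect = load_addresses(regs, rs=5, vl=3, src_is_vector=True, elwidth_bytes=8)

print([hex(a) for a in sequential])
print([hex(a) for a in indirect])
```

With the src register marked scalar, only the contents of `x5` matter and the addresses step by the element width; with it marked vector, the contents of `x5`, `x6` and `x7` are each used as an independent address.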