From 7bb8074894a520529b71020edff02350326f1546 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Thu, 22 Nov 2018 01:30:38 +0000
Subject: [PATCH] update load/store

---
 simple_v_extension/specification.mdwn | 38 +++++++--------------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/simple_v_extension/specification.mdwn b/simple_v_extension/specification.mdwn
index 5e93369f5..b5392e716 100644
--- a/simple_v_extension/specification.mdwn
+++ b/simple_v_extension/specification.mdwn
@@ -1278,7 +1278,7 @@ element width, and when the src register is set to "vector", the elements
 are treated as indirection addresses. Simplified pseudo-code would look like
 this:
 
-    function op_load(rd, rs) # LD not VLD!
+    function op_ld(rd, rs) # LD not VLD!
       rdv = int_csr[rd].active ? int_csr[rd].regidx : rd;
       rsv = int_csr[rs].active ? int_csr[rs].regidx : rs;
       ps = get_pred_val(FALSE, rs); # predication on src
@@ -1313,47 +1313,27 @@ Notes:
 ## Compressed Stack LOAD / STORE Instructions
 
 C.LWSP / C.SWSP and floating-point etc. are also source-dest twin-predicated,
-where it is implicit in C.LWSP/FLWSP that x2 is the source register.
+where it is implicit in C.LWSP/FLWSP etc. that x2 is the source register.
 It is therefore possible to use predicated C.LWSP to efficiently
 pop registers off the stack (by predicating x2 as the source), cherry-picking
 which registers to store to (by predicating the destination). Likewise
 for C.SWSP. In this way, LOAD/STORE-Multiple is efficiently achieved.
 
-However, to do so, the behaviour of C.LWSP/C.SWSP needs to be slightly
-different: where x2 is marked as vectorised, instead of incrementing
-the register on each loop (x2, x3, x4...), instead it is the *immediate*
-that must be incremented. Pseudo-code follows:
-
-    function lwsp(rd, rs):
-      rd = int_csr[rd].active ? int_csr[rd].regidx : rd;
-      rs = x2 # effectively no redirection on x2.
-      ps = get_pred_val(FALSE, rs); # predication on src
-      pd = get_pred_val(FALSE, rd); # ... AND on dest
-      for (int i = 0, int j = 0; i < VL && j < VL;):
-        if (int_csr[rs].isvec) while (!(ps & 1<
-- 
2.30.2
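
The source/destination twin predication used by op_ld (ps on the source, pd on the destination) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the specification's pseudo-code. The name `twin_predicated_ld`, the flat `regs` list and the `mem` dictionary are hypothetical. Both registers are assumed to be marked as vectors, the register redirection through `int_csr` is left out, and masked-out elements are assumed to be skipped rather than zeroed.

```python
def twin_predicated_ld(regs, mem, rd, rs, ps, pd, vl):
    """Load mem[regs[rs+i]] into regs[rd+j] for each active (i, j) pair.

    Assumptions (not taken from the patch): rs and rd are both vectors,
    and a clear predicate bit means "skip this element".
    """
    i = j = 0
    while True:
        # Skip source elements whose predicate bit is clear.
        while i < vl and not (ps >> i) & 1:
            i += 1
        # Skip destination elements whose predicate bit is clear.
        while j < vl and not (pd >> j) & 1:
            j += 1
        if i >= vl or j >= vl:
            break
        # Each source element is treated as an indirection address.
        regs[rd + j] = mem[regs[rs + i]]
        i += 1
        j += 1
    return regs


# Hypothetical register file and memory, for illustration only.
regs = [0] * 32
regs[8:12] = [100, 104, 108, 112]           # source vector of addresses
mem = {100: 11, 104: 22, 108: 33, 112: 44}  # word-addressed toy memory
twin_predicated_ld(regs, mem, rd=16, rs=8, ps=0b1011, pd=0b1110, vl=4)
print(regs[16:20])
```

With these masks, source elements 0, 1 and 3 land in destination elements 1, 2 and 3. Source element 2 is never read and destination element 0 is left untouched, which is the cherry-picking the Compressed Stack LOAD / STORE text describes.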