add SIMD/bitwidth explanation

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Tue, 17 Apr 2018 01:37:07 +0000 (02:37 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Tue, 17 Apr 2018 01:37:07 +0000 (02:37 +0100)
diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn

index e8b73affe67f5dd54a647e466bed52fab27afbe1..73c47dc43b585a68f01170ee639296b7ca9c02ec 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -84,7 +84,7 @@ standard RV instructions and overflow registers extremely powerful
  
  # Analysis and discussion of Vector vs SIMD
  
-There are four combined areas between the two proposals that help with
+There are five combined areas between the two proposals that help with
  parallelism without over-burdening the ISA with a huge proliferation of
  instructions:
  
@@ -547,6 +547,9 @@ vew may be one of the following (giving a table "bytestable", used below):
  | 110 | rsvd     |
  | 111 | rsvd     |
  
+Extending this table (with extra bits) is covered in the section
+"Implementing RVV on top of Simple-V".
+
  Note that when using the "vsetl rs1, rs2" instruction, taking bitwidth
  into account, it becomes:
  
@@ -559,47 +562,34 @@ into account, it becomes:
      vlen = CSRvectorlen[rs1] * simdmult
      CSRvlength = MIN(MIN(vlen, MAXVECTORDEPTH), rs2)
  
-## Stride
-
-**TODO**: propose two LOAD/STORE offset CSRs, which mark a particular
-register as being "if you use this reg in LOAD/STORE, use the offset
-amount CSRoffsN (N=0,1) instead of treating LOAD/STORE as contiguous".
-can be used for matrix spanning.
-
-> For LOAD/STORE, could a better option be to interpret the offset in the
-> opcode as a stride instead, so "LOAD t3, 12(t2)" would, if t3 is
-> configured as a length-4 vector base, result in t3 = *t2, t4 = *(t2+12),
-> t5 = *(t2+24), t6 = *(t2+32)?  Perhaps include a bit in the
-> vector-control CSRs to select between offset-as-stride and unit-stride
-> memory accesses?
-
-So there would be an instruction like this:
-
-| SETOFF | On=rN | OBank={float|int} | Smode={offs|unit} | OFFn=rM |
-| opcode | 5 bit | 1 bit             | 1 bit             | 5 bit, OFFn=XLEN |
+The reason for multiplying the vector length by the number of SIMD elements
+(in each individual register) is so that each SIMD element may optionally be
+predicated.
  
+Example:
  
-which would mean:
+* RV32 assumed
+* CSRintbitwidth[2] = 010 # integer r2 is 16-bit
+* CSRintvlength[2] = 3 # integer r2 is a vector of length 3
+* vsetl rs1, 5 # set the vector length to 5
  
-* CSR-Offset register n <= (float|int) register number N
-* CSR-Offset Stride-mode = offset or unit
-* CSR-Offset amount register n = contents of register M
+This is interpreted as follows:
  
-LOAD rN, ldoffs(rM) would then be (assuming packed bit-width not set):
+* Given that the context is RV32, ELEN=32.
+* With ELEN=32 and bitwidth=16, the number of SIMD elements is 2.
+* Therefore the actual vector length is up to *six* elements.
  
-    offs = 0
-    stride = 1
-    vector-len = CSR-Vector-length register N
+So when using an operation that uses r2 as a source (or destination)
+the operation is carried out as follows:
  
-    for (o = 0, o < 2, o++)
-      if (CSR-Offset register o == M)
-          offs = CSR-Offset amount register o
-          if CSR-Offset Stride-mode == offset:
-              stride = ldoffs
-          break
+* 16-bit operation on r2(15..0) - vector element index 0
+* 16-bit operation on r2(31..16) - vector element index 1
+* 16-bit operation on r3(15..0) - vector element index 2
+* 16-bit operation on r3(31..16) - vector element index 3
+* 16-bit operation on r4(15..0) - vector element index 4
+* 16-bit operation on r4(31..16) **NOT** carried out due to length being 5
  
-    for (i = 0, i < vector-len; i++)
-      r[N+i] = mem[(offs*i + r[M+i])*stride]
+Predication has been left out of the above example for simplicity.
  
  # Example of vector / vector, vector / scalar, scalar / scalar => vector add
  
@@ -1468,6 +1458,7 @@ the question is asked "How can each of the proposals effectively implement
  * Extra register file: vector-file
  * Setup of Vector length and bitwidth CSRs now can specify vector-file
    as well as integer or float file.
+* Extend CSR tables (bitwidth) with extra bits
  * TODO
  
  # Implementing P (renamed to DSP) on top of Simple-V
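
Note on the example added in this commit: the 16-bit packing walk-through (RV32, r2 as a length-3 vector, vsetl with 5) can be sketched as a short Python model. This is only an illustrative sketch, not part of the proposal: the function names `effective_vlen` and `simd_element_schedule` are hypothetical, and `MAXVECTORDEPTH` is given an assumed value of 32 because the diff does not state one. Predication is left out, as in the example itself.

```python
# Illustrative model of the vsetl/bitwidth example; names are hypothetical.
MAXVECTORDEPTH = 32  # assumed value, not taken from the specification


def effective_vlen(csr_vectorlen, simdmult, requested):
    """Mirror of: CSRvlength = MIN(MIN(vlen, MAXVECTORDEPTH), rs2)."""
    vlen = csr_vectorlen * simdmult
    return min(min(vlen, MAXVECTORDEPTH), requested)


def simd_element_schedule(elen, bitwidth, base_reg, vl):
    """List (register, high_bit, low_bit, element_index) for each element."""
    if elen % bitwidth != 0:
        raise ValueError("bitwidth must divide ELEN evenly")
    simdmult = elen // bitwidth  # SIMD elements packed into one register
    schedule = []
    for index in range(vl):
        reg = base_reg + index // simdmult      # move on once a register is full
        low = (index % simdmult) * bitwidth     # lowest bit of this element
        schedule.append((reg, low + bitwidth - 1, low, index))
    return schedule


if __name__ == "__main__":
    simdmult = 32 // 16                      # RV32 with 16-bit elements
    vl = effective_vlen(3, simdmult, 5)      # CSRintvlength[2] = 3, vsetl 5
    for reg, high, low, index in simd_element_schedule(32, 16, 2, vl):
        print(f"16-bit operation on r{reg}({high}..{low}) - element {index}")
```

With the values from the example, the model yields five operations covering r2, r3 and the lower half of r4, and the sixth slot, r4(31..16), is never visited because the length is 5.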