From e8ba996ec4635fae2e68b21ccba58df0264372d1 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Fri, 19 Oct 2018 13:07:22 +0100
Subject: [PATCH] clarify polymorphic widths

---
 simple_v_extension/specification.mdwn | 82 ++++++++++++++++++++++-----
 1 file changed, 68 insertions(+), 14 deletions(-)

diff --git a/simple_v_extension/specification.mdwn b/simple_v_extension/specification.mdwn
index 2009aab21..f799ea9f0 100644
--- a/simple_v_extension/specification.mdwn
+++ b/simple_v_extension/specification.mdwn
@@ -1142,8 +1142,10 @@ Element bitwidth is best covered as its own special section, as it
 is quite involved and applies uniformly across-the-board.
 
 The effect of setting an element bitwidth is to re-cast each entry
-in the register table to a completely different width.  In c-style terms,
-on an RV64 architecture, effectively each register looks like this:
+in the register table, and for all memory operations involving
+load/stores of certain specific sizes, to a completely different width.
+Thus, in C-style terms, on an RV64 architecture, each register effectively
+now looks like this:
 
     typedef union {
         uint8_t  b[8];
@@ -1155,25 +1157,36 @@ on an RV64 architecture, effectively each register looks like this:
     // integer table: assume maximum SV 7-bit regfile size
     reg_t int_regfile[128];
 
-However this hides the fact that setting VL greater than 8, for example,
-when the bitwidth is 8, accessing one specific register "spills over"
-to the following parts of the register file in a sequential fashion.
-So a much more accurate way to reflect this would be:
+where the CSR Register table entry (not the instruction alone) determines
+which of those union entries is to be used on each operation, and the
+VL element offset in the hardware-loop specifies the index into each array.
+
+However, a naive interpretation of the data structure above masks the
+fact that when VL is set greater than 8, for example, with a bitwidth of 8,
+accessing one specific register "spills over" to the following parts of
+the register file in a sequential fashion.  So a much more accurate way
+to reflect this would be:
 
     typedef union {
         uint8_t  actual_bytes[8]; // 8 for RV64, 4 for RV32, 16 for RV128
-        uint8_t  b[];
-        uint16_t s[];
-        uint32_t i[];
-        uint64_t l[];
-        uint128_t d[];
+        uint8_t  b[0]; // array of type uint8_t
+        uint16_t s[0];
+        uint32_t i[0];
+        uint64_t l[0];
+        uint128_t d[0];
     } reg_t;
 
     reg_t int_regfile[128];
 
-Where it is up to the implementor to ensure that, towards the end
-of the register file, an exception is thrown if attempts to access
-beyond the "real" register bytes is ever attempted.
+where, when accessing any individual int_regfile[n].b entry, it is permitted
+(in C) to arbitrarily over-run the *declared* length of the array (zero),
+and thus "overspill" to consecutive register file entries in a fashion
+that is completely transparent to a greatly-simplified software / pseudo-code
+representation.
+It is however critical to note that it is the responsibility of
+the implementor to ensure that, towards the end of the register file,
+an exception is thrown if an access beyond the "real" register
+bytes is ever attempted.
 
 Now we may modify pseudo-code an operation where all element bitwidths have
 been set to the same size, where this pseudo-code is otherwise identical
@@ -1312,6 +1325,8 @@ be clear that;
   stored in the destination.  i.e. truncation (if required) to the
   destination width occurs **after** the operation **not** before.
 
+## Polymorphic floating-point operation exceptions and error-handling
+
 For floating-point operations, conversion takes place without
 raising any kind of exception.  Exactly as specified in the standard
 RV specification, NAN (or appropriate) is stored if the result
@@ -1329,6 +1344,45 @@ provide hardware-level 8-bit support rather than throw a trap to emulate
 in software should contact the author of this specification before
 proceeding.
 
+## Polymorphic shift operators
+
+A special note is needed for changing the element width of left and right
+shift operators, particularly right-shift.  Even for standard RV base,
+in order for correct results to be returned, the second operand RS2 must
+be truncated to be within the range of RS1's bitwidth.  Spike's implementation
+of sll, for example, is as follows:
+
+    WRITE_RD(sext_xlen(zext_xlen(RS1) << (RS2 & (xlen-1))));
+
+which means: where XLEN is 32 (for RV32), restrict RS2 to cover the
+range 0..31 so that RS1 will only be left-shifted by the amount that
+is possible to fit into a 32-bit register.  Whilst this appears not
+to matter for hardware, it matters greatly in software implementations,
+and it also matters where an RV64 system is set to "RV32" mode, such
+that the underlying registers RS1 and RS2 comprise 64 hardware bits
+each.
+
+For SV, where each operand's element bitwidth may be over-ridden, the
+rule about determining the operation's bitwidth *still applies*, being
+defined as the maximum bitwidth of RS1 and RS2.  *However*, this rule
+**also applies to the truncation of RS2**.  In other words, *after*
+determining the maximum bitwidth, RS2's range must **also be truncated**
+to ensure a correct answer.  Example:
+
+* RS1 is over-ridden to a 16-bit width
+* RS2 is over-ridden to an 8-bit width
+* RD is over-ridden to a 64-bit width
+* the maximum bitwidth is thus determined to be 16-bit, i.e. max(8,16)
+* RS2 is **truncated to a range of values from 0 to 15**: RS2 & (16-1)
+
+Pseudocode for this example would therefore be:
+
+    WRITE_RD(sext_xlen(zext_16bit(RS1) << (RS2 & (16-1))));
+
+This example illustrates that considerable care needs to be
+taken to ensure that left and right shift operations are implemented
+correctly.
+
 # Exceptions
 
 TODO: expand.  Exceptions may occur at any time, in any given underlying
-- 
2.30.2