From 747b1106f2cfd26a6b249482982b87206f269f8d Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sat, 14 Apr 2018 14:21:24 +0100
Subject: [PATCH] update

---
 simple_v_extension.mdwn | 42 +++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)
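
The "Example Instruction translation" section added by this patch can be sketched in a few lines of Python. This is only an illustrative model, not the proposal's implementation: the `expand` helper, the `VECTOR_LENGTH` table, and the rule that the number of issued instructions is the longest vector length among the operands are all assumptions made for the sketch. The value 3 for r4 is inferred from the three columns shown for r4 in the Virtual Register Reordering table.

```python
from collections import deque

# Assumed state for illustration: r4 is vectorised with a vector length
# of 3; every register absent from this table is treated as a scalar.
VECTOR_LENGTH = {4: 3}


def expand(op, rd, rs1, rs2, vector_length=VECTOR_LENGTH):
    """Expand one instruction into the scalar instructions it stands for."""
    regs = (rd, rs1, rs2)
    # Assumption: issue as many instructions as the longest vector length
    # among the operands (1 if all operands are scalar).
    count = max(vector_length.get(r, 1) for r in regs)
    issued = []
    for i in range(count):
        # Vectorised operands advance to the next register; scalars stay put.
        issued.append((op,) + tuple(r + i if r in vector_length else r for r in regs))
    return issued


# Place the generated instructions into a FIFO, pending execution.
fifo = deque(expand("ADD", 2, 4, 4))
for op, rd, rs1, rs2 in fifo:
    print(f"{op} r{rd} r{rs1} r{rs2}")
```

With the assumed table, `ADD r2 r4 r4` expands to the three instructions listed in the patch; an instruction whose operands are all scalar passes through as a single entry.
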
diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn
index 88cc9ed1d..55e48ece5 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -2,6 +2,22 @@
 
 [[!toc ]]
 
+# Summary
+
+Key insight: Simple-V is intended as an abstraction layer that provides
+a consistent "API" for parallelising existing *and future* operations.
+*Actual* internal hardware-level parallelism is *not* required: Simple-V
+may be viewed as a "compact" or "consolidated" means of issuing
+multiple near-identical arithmetic instructions to an instruction FIFO,
+pending execution.
+
+*Actual* parallelism, if added independently of Simple-V in the form
+of out-of-order restructuring (including parallel ALU lanes), VLIW
+implementations, SIMD or anything else, would then benefit *if*
+Simple-V were added on top.
+
+# Introduction
+
 This proposal exists so as to be able to satisfy several disparate
 requirements: power-conscious, area-conscious, and performance-conscious
 designs all pull an ISA and its implementation in different conflicting
@@ -1034,7 +1050,7 @@ translates effectively to:
 
 # Register reordering <a name="register_reordering"></a>
 
-Register File 
+## Register File 
 
 | Reg Num | Bits |
 | ------- | ---- |
@@ -1047,13 +1063,16 @@ Register File
 | r6 | (32..0) |
 | r7 | (32..0) |
 
-Vectorised CSR
+## Vectorised CSR
+
+This may not be an actual CSR: it may be generated from the Vector Length
+CSR, as a single bit is less burdensome on the instruction decode phase.
 
 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
 | - | - | - | - | - | - | - | - |  
 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
 
-Vector Length CSR
+## Vector Length CSR
 
 | Reg Num | (3..0) |
 | ------- | ---- |
@@ -1066,7 +1085,7 @@ Vector Length CSR
 | r6 | 0 |
 | r7 | 1 |
 
-Virtual Register Reordering:
+## Virtual Register Reordering:
 
 | Reg Num | Bits (0) | Bits (1) | Bits (2) |
 | ------- | -------- | -------- | -------- |
@@ -1076,6 +1095,17 @@ Virtual Register Reordering:
 | r4 | (32..0) | (32..0) | (32..0) |
 | r7 | (32..0) |
 
+## Example Instruction translation: <a name="example_translation"></a>
+
+The instruction "ADD r2 r4 r4" would result in three instructions being
+generated and placed into the FIFO:
+
+* ADD r2 r4 r4
+* ADD r2 r5 r5
+* ADD r2 r6 r6
+
+## Insights 
+
 SIMD register file splitting still to consider.  For RV64, benefits of doubling
 (quadrupling in the case of Half-Precision IEEE754 FP) the apparent
 size of the floating point register file to 64 (128 in the case of HP)
@@ -1087,10 +1117,6 @@ be achieved by *actually* splitting the regfile into 64 virtual 32-bit
 registers such that a 64-bit FP scalar operation is dropped into (r0.H
 r0.L) tuples.  Implementation therefore hidden through register renaming.
 
-Instructions "ADD r2 r4 r4" would result in three instructions being
-generated and placed into the FIFO: ADD r2 r4 r4; ADD r2 r5 r5;
-ADD r2 r6 r6;
-
 Implementations intending to introduce VLIW, OoO and parallelism
 (even without Simple-V) would then find that the instructions are
 generated quicker (or in a more compact fashion that is less heavy
-- 
2.30.2