From 4af323d95355a1a0c60c8ae96b130d7256337583 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Tue, 22 May 2018 09:48:44 +0100
Subject: [PATCH] add slide

---
 simple_v_extension.mdwn                      | 30 ++++++++++++++++++++
 simple_v_extension/simple_v_chennai_2018.tex | 10 ++++---
 2 files changed, 36 insertions(+), 4 deletions(-)
diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn
index 53f75f490..51fe43d14 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -1774,6 +1774,36 @@ discussion then led to the question of OoO architectures
 > relevant, is that the imprecise model increases the size of the context
 > structure, as the microarchitectural guts have to be spilled to memory.)
 
+-----
+
+>> > it just occurred to me that there's another reason why the data
+>> > should be left instead of zeroed. if the standard register file is
+>> > used, such that vectorised operations are translated to mean "please
+>> > insert multiple register-contiguous operations into the instruction
+>> > FIFO" and predication is used to *skip* some of those, then if the
+>> > next "vector" operation uses the (standard) registers that were masked
+>> > *out* of the previous operation it may proceed without blocking.
+>> >
+>> > if however zeroing is made mandatory then that optimisation becomes
+>> > flat-out impossible to deploy.
+>> >
+>> > whilst i haven't fully thought through the full implications, i
+>> > suspect RVV might also be able to benefit by being able to fit more
+>> > overlapping operations into the available SRAM by doing something
+>> > similar.
+>
+>
+> Luke, this is called density time masking. It doesn't apply to only your
+> model where the "standard register file" is used. It applies to any
+> architecture that attempts to speed up by skipping computation and writeback
+> of masked elements.
+>
+> That said, the writing of zeros need not be explicit. It is possible to add
+> a âzero bitâ per element that, when set, forces a zero to be read from the
+> vector (although the underlying storage may have old data). In this case,
+> there may be a way to implement DTM as well.
+
+
 
 ## Implementation Paradigms <a name="implementation_paradigms"></a>
 
diff --git a/simple_v_extension/simple_v_chennai_2018.tex b/simple_v_extension/simple_v_chennai_2018.tex
index 0d0a47759..114b67912 100644
--- a/simple_v_extension/simple_v_chennai_2018.tex
+++ b/simple_v_extension/simple_v_chennai_2018.tex
@@ -28,7 +28,8 @@
  \begin{itemize}
    \item The Designers of RISC-V\vspace{15pt}
    \item The RVV Working Group and contributors\vspace{15pt}
-   \item Jacob Bachmeyer, Xan Phung, Chuanhua Chang and others\vspace{15pt}
+   \item Jacob Bachmeyer, Xan Phung, Chuanhua Chang,\\
+	     Guy Lemieux and others\vspace{15pt}
    \item ISA-Dev Group Members\vspace{10pt}
   \end{itemize}
 }
@@ -165,9 +166,10 @@
   \end{itemize}
   Key differences from RVV:\vspace{10pt}
    \begin{itemize}
-   \item Predication in INT regs as a BIT field (max VL=XLEN)\vspace{10pt}
-   \item Minimum VL must be Num Regs - 1 (all regs single LD/ST)\vspace{10pt}
-   \item NO ZEROING: non-predicated elements are skipped\vspace{10pt}
+   \item Predication in INT regs as a BIT field (max VL=XLEN)
+   \item Minimum VL must be Num Regs - 1 (all regs single LD/ST)
+   \item SV may condense sparse Vecs: RVV lets ALU do predication
+   \item NO ZEROING: non-predicated elements are skipped
   \end{itemize}
 }
 
-- 
2.30.2