+\frame{\frametitle{Why are overlaps allowed in Regfiles?}
+
+ \begin{itemize}
+ \item Same register(s) can have multiple ``interpretations''\vspace{6pt}
+ \item xBitManip plus SIMD plus xBitManip = Hi/Lo bitops\vspace{6pt}
+ \item (32-bit GREV plus 4x8-bit SIMD plus 32-bit GREV)\vspace{6pt}
+ \item RGB 565 (video): BEXTW plus 4x8-bit SIMD plus BDEPW\vspace{6pt}
+ \item Same register(s) can be offset (no need for VSLIDE)\vspace{6pt}
+ \end{itemize}
+ Note:\vspace{10pt}
+ \begin{itemize}
+ \item xBitManip reduces O($N^{6}$) SIMD down to O($N^{3}$)
+ \item High-performance: macro-op fusion (more pipeline stages?)
+ \end{itemize}
+}
+
+
+\frame{\frametitle{Why no Zeroing (place zeros in non-predicated elements)?}
+
+ \begin{itemize}
+ \item Zeroing is an implementation optimisation favouring OoO\vspace{8pt}
+ \item Simple implementations may skip non-predicated operations\vspace{8pt}
+ \item With zeroing, simple implementations must explicitly destroy data\vspace{8pt}
+ \item Complex implementations may use reg-renames to save power\\
+ Zeroing on predication chains makes optimisation harder
+ \end{itemize}
+ Considerations:\vspace{10pt}
+ \begin{itemize}
+ \item Complex implementations barely impacted, simple ones impacted a LOT
+ \item Overlapping ``Vectors'' may issue overlapping ops
+ \item Please don't use Vectors for ``security'' (use Sec-Ext)
+ \end{itemize}
+}
+% With overlapping "vectors" (bearing in mind that "vectors" are just
+% a remap onto the standard register file): if the top bits of the
+% predicate are zero, and a second vector happens to use some of the
+% same registers (the ones that are predicated out), then the second
+% vector op may be issued *at the same time*, provided there are
+% parallel ALUs available to do so.
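+%
+% A rough sketch of the two options, as pseudocode. The names VL, pred,
+% zeroing, rd, rs1, rs2 and ireg are illustrative assumptions made for
+% this note, not identifiers taken from any specification:
+%
+%   for (i = 0; i < VL; i++) {
+%     if (pred & (1 << i))
+%       ireg[rd+i] = ireg[rs1+i] + ireg[rs2+i]; /* element enabled */
+%     else if (zeroing)
+%       ireg[rd+i] = 0;  /* zeroing: extra write, old data destroyed */
+%     /* non-zeroing: nothing issued, ireg[rd+i] left untouched */
+%   }
+%
+% Under these assumptions a simple in-order implementation can skip
+% masked-out elements entirely when zeroing is off, whereas zeroing
+% obliges it to issue a write for every element.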
+
+
+\frame{\frametitle{Predication key-value CSR store}
+
+ \begin{itemize}
+ \item type: key is an int or an FP register number (1 bit)\vspace{6pt}
+ \item register to be predicated if referred to (5 bits, key)\vspace{6pt}
+ \item register to store actual predication in (5 bits, value)\vspace{6pt}
+ \item predication is inverted (1 bit)\vspace{6pt}
+ \item non-predicated elements are to be zeroed (1 bit)\vspace{6pt}
+ \end{itemize}
+ Notes:\vspace{10pt}
+ \begin{itemize}
+ \item Table should be expanded out for high-speed implementations
+ \item Multiple ``keys'' (and values) theoretically permitted
+ \item RVV rules on deleting higher-indexed CSRs are followed
+ \end{itemize}
+}
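+% Bit budget per predication entry, taking the field widths listed on
+% the slide at face value (a back-of-envelope sum, not a defined
+% encoding):
+%   $1 + 5 + 5 + 1 + 1 = 13$ bits
+% so, if packed contiguously (an assumption), one entry would fit in a
+% 16-bit field with 3 bits to spare.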
+
+
+\frame{\frametitle{Register key-value CSR store}
+
+ \begin{itemize}
+ \item type: key is an int or an FP register number (1 bit)\vspace{6pt}
+ \item treated as vector if referred to in op (5 bits, key)\vspace{6pt}
+ \item starting register to actually be used (5 bits, value)\vspace{6pt}
+ \item element bitwidth: default/8/16/32/64/rsvd (3 bits)\vspace{6pt}
+ \item element type: still under consideration\vspace{6pt}
+ \end{itemize}
+ Notes:\vspace{10pt}
+ \begin{itemize}
+ \item Same notes apply as for the predication CSR table (previous slide)
+ \item Level of indirection has implications for pipeline latency
+ \end{itemize}
+}
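+% Bit budget per register entry, again taking the slide's field widths
+% at face value (not a defined encoding):
+%   $1 + 5 + 5 + 3 = 14$ bits
+% excluding the element type, whose width is still undecided. If an
+% entry were packed into 16 bits (an assumption), that would leave
+% 2 bits for it.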
+
+
+\frame{\frametitle{C.MV is extremely flexible!}
+
+ \begin{itemize}
+ \item scalar-to-vector (w/no pred): VSPLAT
+ \item scalar-to-vector (w/dest-pred): Sparse VSPLAT
+ \item scalar-to-vector (w/single dest-pred): VINSERT
+ \item vector-to-scalar (w/src-pred): VEXTRACT
+ \item vector-to-vector (w/no pred): Vector Copy
+ \item vector-to-vector (w/src xor dest pred): Sparse Vector Copy
+ \item vector-to-vector (w/src and dest pred): Vector Gather/Scatter
+ \end{itemize}
+ \vspace{8pt}
+ Notes:\vspace{10pt}
+ \begin{itemize}
+ \item Really powerful!
+ \item Any other options?
+ \end{itemize}
+}
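+% One way to read the table above is as a single copy loop with
+% independent source and destination predicates. The pseudocode below
+% is only a sketch of that reading: VL, src_pred, dest_pred,
+% src_is_vec, dest_is_vec, rs, rd and ireg are hypothetical names, and
+% "no pred" is assumed to mean an all-ones predicate.
+%
+%   i = 0; j = 0;                  /* src / dest element indices */
+%   while (i < VL && j < VL) {
+%     if (src_is_vec)              /* skip masked-out src elements */
+%       while (i < VL && !(src_pred & (1 << i))) i++;
+%     if (dest_is_vec)             /* skip masked-out dest elements */
+%       while (j < VL && !(dest_pred & (1 << j))) j++;
+%     if (i == VL || j == VL) break;
+%     ireg[rd + j] = ireg[rs + i];
+%     if (src_is_vec) i++;         /* scalar src: i stays at 0 */
+%     if (dest_is_vec) j++; else break; /* scalar dest: one copy only */
+%   }
+%
+% With a scalar source and a vector destination this behaves like
+% VSPLAT (sparse if dest_pred has gaps, VINSERT if it has one bit set);
+% with a vector source and a scalar destination it copies the first
+% enabled element, like VEXTRACT.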
+
+