\begin{itemize}
\item Extremely powerful (extensible to 256 registers)\vspace{10pt}
\item Supports polymorphism, several datatypes (inc. FP16)\vspace{10pt}
- \item Requires a separate Register File (16 w/ext to 256)\vspace{10pt}
+ \item Requires a separate Register File (32 w/ext to 256)\vspace{10pt}
\item Implemented as a separate pipeline (no impact on scalar)\vspace{10pt}
\end{itemize}
However...\vspace{10pt}
Note: EVERYTHING is parallelised:
\begin{itemize}
\item All LOAD/STORE (inc. Compressed, Int/FP versions)
- \item All ALU ops (soft / hybrid / full HW, on per-op basis)
+ \item All ALU ops (Int, FP, SIMD, DSP, everything)
\item All branches become predication targets (C.FNE added?)
\item C.MV of particular interest (s/v, v/v, v/s)
\item FCVT, FMV, FSGNJ etc. very similar to C.MV
\item If s1 and s2 are both scalars, a standard branch occurs
\item Predication stored in integer regfile as a bitfield
\item Scalar-vector and vector-vector supported
+ \item Overload branch immediate as predication target (rs3)
\end{itemize}
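As a rough illustration of the bullets above, the following C sketch models a predicated C.MV and a vectorised BEQ. It is an assumption-laden model, not the specification. The names \texttt{operand\_t}, \texttt{sv\_cmv}, \texttt{sv\_beq} and \texttt{MAXVL} are hypothetical. The predicate is taken to be a 64-bit integer register whose bit $i$ enables element $i$, and a scalar operand is treated as a vector of length 1. The v/s case is assumed to copy the first enabled source element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand model: a vector uses regs[0..vl-1],
 * a scalar uses regs[0] only (a vector of length 1). */
typedef struct {
    int64_t *regs;
    bool is_vector;
} operand_t;

/* Assumption: the predicate bitfield lives in one 64-bit integer register. */
#define MAXVL 64

/* Predicated C.MV over vl elements; bit i of pred enables element i.
 * s/v broadcasts the scalar, v/v copies element-wise, and v/s (assumed)
 * copies the first enabled source element, then stops.
 * zeroing=true writes 0 to disabled vector destination elements;
 * zeroing=false leaves them unchanged. */
void sv_cmv(operand_t rd, operand_t rs, uint64_t pred,
            unsigned vl, bool zeroing)
{
    assert(vl <= MAXVL);
    for (unsigned i = 0; i < vl; i++) {
        unsigned d = rd.is_vector ? i : 0;
        unsigned s = rs.is_vector ? i : 0;
        if ((pred >> i) & 1u) {
            rd.regs[d] = rs.regs[s];
            if (!rd.is_vector)
                return; /* scalar destination: first enabled element only */
        } else if (zeroing && rd.is_vector) {
            rd.regs[d] = 0;
        }
    }
}

/* Vectorised BEQ sketch. If both operands are scalar, return whether the
 * standard branch is taken. Otherwise, compare enabled elements, write the
 * per-element results as a bitfield to *pred_out (usable as a later
 * predicate), and return false (no branch taken, by assumption). */
bool sv_beq(operand_t rs1, operand_t rs2, uint64_t pred,
            unsigned vl, uint64_t *pred_out)
{
    if (!rs1.is_vector && !rs2.is_vector)
        return rs1.regs[0] == rs2.regs[0]; /* standard scalar branch */
    assert(vl <= MAXVL);
    uint64_t result = 0;
    for (unsigned i = 0; i < vl; i++) {
        if (!((pred >> i) & 1u))
            continue; /* element masked out: result bit stays 0 */
        unsigned a = rs1.is_vector ? i : 0;
        unsigned b = rs2.is_vector ? i : 0;
        if (rs1.regs[a] == rs2.regs[b])
            result |= (uint64_t)1 << i;
    }
    *pred_out = result;
    return false;
}
```

When both operands of \texttt{sv\_beq} are scalars, it falls through to an ordinary branch. If either operand is a vector, it instead yields a bitfield in an integer register, which is how a branch becomes a predication target.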
\end{frame}
\vspace{4pt}
Notes:
\begin{itemize}
- \item Surprisingly powerful!
+ \item Surprisingly powerful! Zero-predication even more so
\item Same arrangement for FCVT, FMV, FSGNJ etc.
\end{itemize}
}
(scalar ops are just vectors of length 1)\vspace{4pt}
\item Tightly coupled with the core (instruction issue)\\
could be disabled through a MISA switch\vspace{4pt}
- \item An extra pipeline phase is pretty much essential\\
+ \item An extra pipeline phase is almost certainly essential\\
for fast low-latency implementations\vspace{4pt}
\item With zeroing off, skipping non-predicated elements is hard:\\
it is, however, only an optimisation (and could be omitted).\vspace{4pt}