\item Does not require sacrificing 32-bit Major Opcodes.
\item Does not require adding duplicates of instructions
(popcnt, popcntw, popcntd, vpopcntb, vpopcnth, vpopcntw, vpopcntd)
+\item Fully abstracted: does not create Micro-architectural dependencies
+ (no fixed ``Lane'' size).
\item Specifically designed to be easily implemented
on top of an existing Micro-architecture (especially
Superscalar Out-of-Order Multi-issue) without
dramatically reduced instruction count, and power consumption expected
to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
(TI MSP, Qualcomm Hexagon)
-\item Fail-First Load/Store allows strncpy to be implemented in around 14
+\item Fail-First Load/Store allows Vectorised high-performance
+ strncpy to be implemented in around 14
instructions (hand-optimised \acs{VSX} assembler is 240).
\item Inner loop of MP3 implemented in under 100 instructions
(gcc produces 450 for the same function on POWER9).