[[!tag whitepapers]]

# Why in the 2020s would you invent a new Vector ISA?

Inventing a new Scalar ISA from scratch, including its simulators and
compilers, is a task of more than a decade: OpenRISC 1200 took 12 years to
mature. For a Vector or SIMD ISA, stable general-purpose auto-vectorisation
compiler support has never been achieved in the history of computing, not
even with the combined resources of ARM, Intel, AMD, MIPS, Sun Microsystems,
SGI, Cray, and many more. Instead, GPUs rely on ultra-specialist compilers,
designed from the ground up to support Vector/SIMD parallelism, and on
associated standards managed by the Khronos Group, all sustained by
multi-man-century development commitment from multiple
billion-dollar-revenue companies.

That raises the question: why on earth would anyone consider taking on this
task?

Hints as to the answer emerge from the article
"[SIMD Instructions Considered Harmful](https://www.sigarch.org/simd-instructions-considered-harmful/)",
which illustrates the catastrophic rabbit-hole that the industry giants ARM,
Intel and AMD have been going down since the 90s (over three decades): SIMD,
an Order(N^6) opcode proliferation nightmare whose mantra, "make it easy for
hardware engineers, let software sort out the mess", is literally
overwhelming programmers.
Worse than that, specialists who charge clients for optimisation services
are finding that AVX-512, to take one example, is anything but optimal:
overall performance with AVX-512 can actually *decrease* even as power
consumption goes up, partly because heavy AVX-512 use forces the CPU to
lower its clock frequency.

Cray-style Vectors solved the opcode proliferation nightmare over thirty
years ago, by holding the vector length in a register set at runtime instead
of baking it into every opcode. However, only the NEC SX-Aurora truly kept
the Cray Vector flame alive, until RISC-V RVV, then SVP64 and, more
recently, MRISC32 joined it.
ARM's SVE/SVE2 is critically flawed, lacking the Cray `setvl` instruction
that makes a Vector ISA truly ubiquitous, in ways that will become apparent
over time as adoption increases. In the meantime programmers, in direct
violation of ARM's advice on how to use SVE2, are trying desperately to use
it as if it were Packed SIMD NEON. The advice not to write SVE2 assembly
code that is hardcoded to fixed widths is being disregarded in favour of
writing *multiple functionally identical implementations* of a function,
each for a different hardware width, and compelling software to choose one
at runtime after probing the hardware.
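To make the `setvl` argument concrete, here is a minimal Python sketch. It is a behavioural model, not real SVE, RVV or SVP64 code, and the names `setvl`, `vector_add`, `simd4_add` and `maxvl` are hypothetical illustrations. `setvl` models the Cray-style rule "VL = min(elements remaining, hardware maximum)", so a single strip-mined loop runs unchanged on hardware of any width. The fixed-width version, by contrast, bakes its width of 4 into the code and needs a scalar tail loop, so supporting another width means writing another copy.

```python
from typing import List


def setvl(remaining: int, maxvl: int) -> int:
    """Hypothetical model of a Cray-style setvl: VL = min(requested, hardware max)."""
    return min(remaining, maxvl)


def vector_add(a: List[int], b: List[int], maxvl: int) -> List[int]:
    """Strip-mined add: the same loop is correct for any hardware maxvl >= 1."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        vl = setvl(n - i, maxvl)  # hardware decides how many elements this pass covers
        # one "vector instruction" acting on vl elements; the last pass simply
        # gets a shorter vl, so no separate scalar tail loop is needed
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return out


def simd4_add(a: List[int], b: List[int]) -> List[int]:
    """Packed-SIMD style (hypothetical): width 4 is hardcoded, so a tail loop is required."""
    n = len(a)
    out = [0] * n
    i = 0
    while i + 4 <= n:  # only full 4-wide chunks
        out[i:i + 4] = [x + y for x, y in zip(a[i:i + 4], b[i:i + 4])]
        i += 4
    while i < n:  # leftover elements, one at a time
        out[i] = a[i] + b[i]
        i += 1
    return out
```

Calling `vector_add(a, b, w)` with `w` set to 1, 4, 8 or 64 gives the same result from the same code, which is the property a width-agnostic Vector ISA is meant to provide. Supporting an 8-wide or 16-wide machine in the `simd4_add` style would instead mean adding further near-identical functions plus a runtime dispatcher.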

Even RISC-V, for all that we can be grateful to the RISC-V Founders for
reviving Cray Vectors, has severe performance and implementation
limitations that are only really apparent to exceptionally experienced
assembly-level developers with broad, deep knowledge of multiple ISAs.
One of the best and clearest explanations of these limitations is a
[Hacker News post](https://news.ycombinator.com/item?id=24459041)
by adrian_b.

Adrian logically and concisely points out that the fundamental design
assumptions and simplifications that went into the RISC-V ISA irrevocably
damage its viability for high-performance use. That is not to say that it
is not ideal for low-performance embedded scenarios: in private, custom,
secretive commercial usage it is perfect. Ubiquitous, everyday usage in
scenarios currently occupied by ARM, Intel, AMD and IBM? Not so much. Thus,
even though RISC-V has Cray-style Vectors, the ISA as a whole is,
unfortunately, fundamentally flawed as far as power-efficient high
performance is concerned.

By this point a realisation should be slowly sinking in: there are far
fewer truly viable Vector ISAs out there than might be expected, because
the ones evolving in the general direction of Vectorisation are each
flawed, in completely different ways.