[[!tag whitepapers]]

# Why in the 2020s would you invent a new Vector ISA

Inventing a new Scalar ISA from scratch is a task of over a decade, including simulators and compilers: OpenRISC 1200 took 12 years to mature. No Vector or SIMD ISA has ever reached stable general-purpose auto-vectorisation compiler support in the history of computing, not even with the combined resources of ARM, Intel, AMD, MIPS, Sun Microsystems, SGI, Cray, and many more. Rather: GPUs have ultra-specialist compilers that are designed from the ground up to support Vector/SIMD parallelism, and associated standards managed by the Khronos Group, with multi-man-century development commitment from multiple billion-dollar-revenue companies, to sustain them.

This raises the question: why on earth would anyone consider this task?

Hints as to the answer emerge from an article "[SIMD considered harmful](https://www.sigarch.org/simd-instructions-considered-harmful/)" which illustrates a catastrophic rabbit-hole taken by Industry Giants ARM, Intel and AMD since the 90s (over 3 decades), whereby SIMD, an Order(N^6) opcode proliferation nightmare with its mantra "make it easy for hardware engineers, let software sort out the mess", is literally overwhelming programmers. Worse than that, specialists who charge clients for Optimisation Services are finding that AVX-512, to take an example, is anything but optimal: overall performance of AVX-512 actually *decreases* even as power consumption goes up.

Cray-style Vectors solved the opcode proliferation nightmare over thirty years ago. Only the NEC SX Aurora, however, truly kept the Cray Vector flame alive, until RISC-V RVV, and now SVP64 and recently MRISC32, joined it.

ARM's SVE/SVE2 is critically flawed (lacking the Cray `setvl` instruction that makes a truly ubiquitous Vector ISA) in ways that will become apparent over time as adoption increases.
In the meantime programmers are, in direct violation of ARM's advice on how to use SVE2, trying desperately to use it as if it were Packed SIMD NEON. The advice not to create SVE2 assembler that is hardcoded to fixed widths is being disregarded, in favour of writing *multiple identical implementations* of a function, each with a different hardware width, and compelling software to choose one at runtime after probing the hardware.

Even RISC-V, for all that we can be grateful to the RISC-V Founders for reviving Cray Vectors, has severe performance and implementation limitations that are only really apparent to exceptionally experienced assembly-level developers with wide and deep experience of multiple ISAs: one of the best and clearest explanations is a [ycombinator post](https://news.ycombinator.com/item?id=24459041) by adrian_b.

Adrian logically and concisely points out that the fundamental design assumptions and simplifications that went into the RISC-V ISA have an irrevocably damaging effect on its viability for high performance use. That is not to say that its use in low-performance embedded scenarios is not ideal: in private custom secretive commercial usage it is perfect. Ubiquitous and common everyday usage in scenarios currently occupied by ARM, Intel, AMD and IBM? Not so much. Thus, even though RISC-V has Cray-style Vectors, the whole ISA is, unfortunately, fundamentally flawed as far as power-efficient high performance is concerned.

Slowly, at this point, a realisation should be sinking in: there are not actually that many truly viable Vector ISAs out there, because the ones that are evolving in the general direction of Vectorisation are each, in completely different ways, flawed.
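
To make the earlier `setvl` point a little more concrete, here is a rough C sketch that contrasts a fixed-width Packed SIMD loop with a Cray-style strip-mined loop. It is a software model only, under stated assumptions: `setvl`, `vec_add`, `simd_add_fixed4` and the `maxvl` parameter are hypothetical names invented for this illustration and are not the intrinsics or semantics of any particular ISA. Each inner `for` loop stands in for what would be a single SIMD or vector instruction in hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Cray-style setvl instruction: given the
 * number of elements still to process and the hardware's maximum vector
 * length, return how many elements this iteration will handle. */
size_t setvl(size_t remaining, size_t maxvl)
{
    assert(maxvl > 0);
    return remaining < maxvl ? remaining : maxvl;
}

/* Vector-style add: written once, independent of hardware width.
 * The last iteration simply runs with a shorter vl, so there is no
 * separate tail loop. */
void vec_add(const int *a, const int *b, int *out, size_t n, size_t maxvl)
{
    size_t i = 0;
    while (i < n) {
        size_t vl = setvl(n - i, maxvl);
        for (size_t k = 0; k < vl; k++)   /* models one vector add of vl elements */
            out[i + k] = a[i + k] + b[i + k];
        i += vl;
    }
}

/* Packed-SIMD-style add, hardcoded to a width of 4 (assumed for the
 * example). A different hardware width needs a different copy of this
 * function, plus a scalar loop for the leftover elements. */
void simd_add_fixed4(const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)            /* models one 4-wide SIMD add */
        for (size_t k = 0; k < 4; k++)
            out[i + k] = a[i + k] + b[i + k];
    for (; i < n; i++)                    /* scalar tail cleanup */
        out[i] = a[i] + b[i];
}
```

With seven elements and `maxvl` of 4, `vec_add` runs two iterations (`vl` of 4, then 3). The same source would work unchanged with a `maxvl` of 8 or 16, whereas the fixed-width version would need a new variant per width and a runtime choice between them.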