general-purpose auto-vectorisation compiler support has never been
achieved in the history of computing, not with the combined resources of
ARM, Intel, AMD, MIPS, Sun Microsystems, SGI, Cray, and many more.
-GPUs have ultra-specialist compilers, and standards managed by the
-Khronos Group, with multi-man-century development committment.
+Rather: GPUs have ultra-specialist compilers that are designed
+from the ground up to support Vector/SIMD parallelism,
+and associated standards managed by the
+Khronos Group, with multi-man-century development commitment from
+multiple billion-dollar-revenue companies to sustain them.
This therefore raises the question: why on earth would anyone consider
this task?
let software sort out the mess" literally overwhelming programmers.
Worse than that, specialists who charge clients for Optimisation Services
are finding that AVX-512, to take an example, is anything but optimal:
-overall performance actually *decreases* even as power consumption goes
-up.
+overall performance of AVX-512 actually *decreases* even as power
+consumption goes up.
Cray-style Vectors solved, over thirty years ago, the opcode proliferation
nightmare. Only the NEC SX Aurora, however, truly kept the Cray Vector flame
alive, until RISC-V RVV and now SVP64 and recently MRISC32 joined it.
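
To make concrete how Cray-style Vectors avoid opcode proliferation, here is
a minimal C sketch of the `setvl` strip-mining idiom. It is illustrative
only: the `setvl()` function and the `MVL` constant are hypothetical
stand-ins for the hardware instruction and the machine's maximum vector
length (not any vendor's actual API), and the inner loop models what would
be a single vector instruction.

```c
#include <stddef.h>
#include <stdio.h>

enum { MVL = 8 };   /* hypothetical hardware maximum vector length */

/* Stand-in for a Cray-style `setvl` (or RVV `vsetvli`) instruction:
 * the hardware returns min(remaining, MVL); software never hard-codes MVL. */
static size_t setvl(size_t remaining)
{
    return remaining < MVL ? remaining : MVL;
}

/* DAXPY, strip-mined the Cray way: one loop and one set of vector opcodes
 * handle any element count n on any hardware register width, with no
 * scalar tail loop.  Packed SIMD has historically needed new opcodes and
 * new hand-written tail handling for every width (64/128/256/512-bit). */
static void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = setvl(n - i);        /* hardware picks this pass's length */
        for (size_t j = 0; j < vl; j++)  /* models a single vector instruction */
            y[i + j] += a * x[i + j];
        i += vl;
    }
}

int main(void)
{
    double x[13], y[13];
    for (int i = 0; i < 13; i++) { x[i] = i; y[i] = 1.0; }
    daxpy(13, 2.0, x, y);            /* 13 is deliberately not a multiple of MVL */
    printf("%g %g\n", y[0], y[12]);  /* prints: 1 25 */
    return 0;
}
```

The key property is that neither the source nor the resulting machine code
ever encodes the register width: a wider implementation simply executes
fewer passes of the same loop, which is precisely what width-specific
Packed SIMD opcodes cannot offer.
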
+ARM's SVE/SVE2 is critically flawed (it lacks the Cray `setvl` instruction
+that makes a truly ubiquitous Vector ISA possible) in ways that will become
+apparent over time as adoption increases. In the meantime programmers are,
+in direct violation of ARM's advice on how to use SVE2, trying desperately
+to use it as if it were Packed SIMD NEON.
+
+Even RISC-V, for all that we can be grateful to the RISC-V Founders for
+reviving Cray Vectors, has severe performance and implementation
+limitations that are only really apparent to exceptionally experienced
+assembly-level developers with wide, diverse experience across multiple
+ISAs: one of the best and clearest explanations is a
+[ycombinator post](