Andes in Audio DSPs, WD in HDDs and SSDs. These are all
astoundingly commercially successful
multi-billion-unit mass volume markets that almost nobody
- knows anything about. Included for completeness.
+ knows anything about, outside their specialised proprietary
+ niche. Included for completeness.
In order of least controlled to most controlled, the viable
candidates for further advancement are:
Boolean Logic in a Vector context, on top of an already-powerful
Scalar Branch-Conditional/Counter instruction
+All of these features are added as "Augmentations", to create of
+the order of 1.5 *million* instructions, none of which decode the
+32-bit scalar suffix any differently.
+
**What is missing from Power Scalar ISA that a Vector ISA needs?**
Remarkably, very little: the devil is in the details though.
why Matrix Multiplication Schedules may not be applied to Integer
Mul-and-Accumulate, Galois Field Mul-and-Accumulate, Logical
AND-and-OR, or any other future instruction such as Complex-Number
-Multiply-and-Accumulate that a future version of the Power ISA might
+Multiply-and-Accumulate or Abs-Diff-and-Accumulate
+that a future version of the Power ISA might
support. The flexibility is not only enormous, but the compactness
-unprecedented. RADIX2 in-place DCT Triple-loop Schedules may be created in
-around 11 instructions. The only other processors well-known to have
+unprecedented. RADIX2 in-place DCT may be created in
+around 11 instructions using the Triple-loop DCT Schedule. The only other processors well-known to have
this type of compact capability are both VLIW DSPs: TI's TMS320 Series
and Qualcomm's Hexagon, and both are targeted at FFTs only.