SVP64 tree reduction needs a single instruction to work properly.
2. if you implement any of the FP min/max modes, the rest are not much more
hardware.
-3. FP min/max are rather complex to implement in software, the most commonly
+3. TODO(lkcl): fill out: using VSX may have a different meaning (SVP64/VSX),
+ so it is *really* crucial to have SVP64/SFFS ops.
+4. FP min/max are rather complex to implement in software: the most commonly
used FP max function `fmax` from glibc compiled for SFFS is 32 (!)
instructions.