advancements follow naturally from analysis
of the problem-space:
* Expanding the GPR, FPR and CR register files to
  128 entries each. This is the bare minimum for GPUs, whose
  workloads must be kept as close as possible to a LOAD-COMPUTE-STORE
  batching model.
* Predication (an absolutely critical component for a Vector ISA):
  the next logical advancement is to allow separate predication masks
  to be applied to *both* the source *and* the destination, independently.
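To illustrate what independent source and destination masks make possible, here is a simplified sketch of a twin-predicated vector move. It assumes a plain element-by-element move with no zeroing mode and no element-width overrides, and the function name `twin_predicated_mv` is hypothetical: it is not the actual SVP64 pseudocode. The source index skips elements whose source-mask bit is clear, the destination index skips elements whose destination-mask bit is clear, and the two advance independently.

```python
def twin_predicated_mv(src, dst, srcmask, dstmask, vl):
    """Hypothetical sketch of a twin-predicated vector move.

    Assumptions: no zeroing, no element-width overrides.
    Bit i of srcmask enables source element i; bit j of dstmask
    enables destination element j. Returns a new list.
    """
    out = list(dst)          # masked-out destination elements are left unchanged
    srcstep = dststep = 0
    while srcstep < vl and dststep < vl:
        # advance past masked-out source elements
        if not (srcmask >> srcstep) & 1:
            srcstep += 1
            continue
        # advance past masked-out destination elements
        if not (dstmask >> dststep) & 1:
            dststep += 1
            continue
        out[dststep] = src[srcstep]
        srcstep += 1
        dststep += 1
    return out


# Source mask only: behaves like a "compress" of the selected elements.
print(twin_predicated_mv([10, 20, 30, 40], [0, 0, 0, 0], 0b1010, 0b1111, 4))
# Destination mask only: behaves like an "expand" into the selected slots.
print(twin_predicated_mv([1, 2, 3, 4], [0, 0, 0, 0], 0b1111, 0b0101, 4))
```

With only the source mask set the move packs selected elements together; with only the destination mask set it scatters consecutive elements into the enabled slots. A single destination-only mask could not express the first case, which is why applying separate masks to both sides is worthwhile.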