loop on all elements with the same instruction before moving
on to the next instruction. Predication needs to be pre-calculated
for the entire Vector in order to exclude certain elements from
-the computation. In this case, that's an expensive inconvenience.
+the computation. In this case, that's an expensive inconvenience
+(similar to the problems associated with Memory-to-Memory
+Vector Machines such as the CDC Star-100).
Vertical-First allows *scalar* temporary registers to be utilised
in the assessment as to whether a particular Vector element should
Vertical-First Loop allows a Multi-Issue Out-of-Order Engine to
*amortise in-flight scalar looped operations into SIMD batches*
as long as the loop is kept small enough to entirely fit into
-in-flight Reservation Stations.
+in-flight Reservation Stations in the first place.
*<blockquote>
(With thanks and gratitude to Mitch Alsup on comp.arch for
spending considerable time explaining VVM, how its Loop
Construct explicitly identifies loop-invariant registers,
-and how to exploit GB-OoO Micro-architectures)
+and how that helps to exploit GB-OoO Micro-architectures)
</blockquote>*
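The following is a rough illustration of the predication comparison above. It models the two styles in plain scalar C, not in SVP64 or Power ISA assembly. The function names `hf_scale_positive` and `vf_scale_positive` and the "double every positive element" workload are hypothetical and chosen only for illustration. The Horizontal-First version has to build a mask for the whole vector in a separate pass, and needs storage for that mask, before the operation can run. The Vertical-First version takes each element through every step in turn and lets a scalar temporary plus an ordinary branch decide whether that element is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Horizontal-First analogy: the predicate for the
   entire vector is pre-calculated into a caller-supplied mask of
   n entries, then a second full pass applies the operation. */
void hf_scale_positive(int *out, const int *in, size_t n,
                       unsigned char *mask)
{
    assert(mask != NULL);                /* mask storage is required */
    for (size_t i = 0; i < n; i++)       /* pass 1: build whole mask */
        mask[i] = (unsigned char)(in[i] > 0);
    for (size_t i = 0; i < n; i++)       /* pass 2: masked operation */
        if (mask[i])
            out[i] = in[i] * 2;
}

/* Hypothetical Vertical-First analogy: each element goes through
   all steps before moving to the next; a scalar temporary and an
   ordinary branch replace the pre-computed mask. */
void vf_scale_positive(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int t = in[i];                   /* scalar temporary */
        if (t <= 0)
            continue;                    /* skip this element */
        out[i] = t * 2;
    }
}
```

Both functions leave the same result in `out`. Calling either with `in = {3, -1, 0, 5, -7, 2}` on a zero-initialised output gives `{6, 0, 0, 10, 0, 4}`. The difference is that the Horizontal-First form needs the n-entry mask and an extra full pass, which stands in for the pre-calculation cost described in the text.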
**Use-case: More powerful in-memory PEs**