ARM has already added `vqrdmulhq_s16/32` instructions, because their inclusion
in any ISA replaces **eight** non-Twin-Butterfly instructions, which
are often loop-unrolled, resulting in L1 I-Cache stripmining as well
-as requiring far greater resources or much more complex hardware to
+as requiring far greater resources (double the number of intermediate
+Vector registers) or much more complex hardware to
get efficient execution.
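
For readers unfamiliar with the ARM operation named above, the following is a minimal scalar sketch of what one 16-bit lane of `vqrdmulhq_s16` computes (the intrinsic maps to the signed saturating rounding doubling multiply-high instruction). The helper name `sqrdmulh_s16_lane` is hypothetical and used only for illustration. This models the per-lane arithmetic only, not the Twin-Butterfly operation itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of ONE lane of vqrdmulhq_s16:
 * saturate((2*a*b + 2^15) >> 16), i.e. a rounded Q15 fixed-point multiply.
 * Assumes >> on a negative value is an arithmetic shift (true on mainstream
 * compilers, but implementation-defined in ISO C). */
static int16_t sqrdmulh_s16_lane(int16_t a, int16_t b)
{
    /* Full-precision product: magnitude is at most 2^30, fits in int32_t. */
    int32_t prod = (int32_t)a * (int32_t)b;

    /* Double, add the rounding constant, keep the high half.
     * Widen to 64 bits because 2*prod can reach 2^31. */
    int64_t r = ((int64_t)prod * 2 + (1 << 15)) >> 16;

    /* Saturate; only a == b == INT16_MIN can actually overflow. */
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

In Q15 terms, `sqrdmulh_s16_lane(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192 (0.25), while the single overflowing case, `INT16_MIN` times `INT16_MIN`, saturates to `INT16_MAX`.
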
**Notes and Observations**: