<!-- show -->
Although best used with SVP64 REMAP, these instructions may be used in a Scalar-only
-context to save considerably on DCT, DFT and FFT processing.
+context to save considerably on DCT, DFT and FFT processing. Whilst some hardware
+implementations may not implement them efficiently (e.g. slower micro-coding),
+savings still come from the reduction in temporary registers as well as in
+instruction count.
# Rationale for Twin Butterfly Integer DCT Instruction(s)
The following applies to the double-coefficient butterfly instruction.
-`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
+In a 32-bit context, `fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, with:
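```
/* rounds to nearest: adds 2^(n-1) before shifting right by n bits */
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
```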
These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including, but not limited to, VP8/VP9 and AV1.
-Arm includes special instructions to optimize these operations, although
+ARM's NEON extension includes dedicated instructions for these operations, although
they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
The suggestion is to have a single instruction to calculate both values