From: lkcl <lkcl@web>
Date: Mon, 1 May 2023 09:08:40 +0000 (+0100)
Subject: (no commit message)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=30a8763113ca2a5c6b17ac015e0d5f7094ab7640;p=libreriscv.git

---

diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index 5af3c7cb6..3a78acdab 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -11,7 +11,10 @@
 <!-- show -->
 
 Although best used with SVP64 REMAP these instructions may be used in a Scalar-only
-context to save considerably on DCT, DFT and FFT processing.  
+context to save considerably on DCT, DFT and FFT processing.  Whilst some hardware
+implementations may not necessarily implement them efficiently (slower micro-coding),
+savings still come from the reduction in temporary registers as well as instruction
+count.
 
 # Rationale for Twin Butterfly Integer DCT Instruction(s)
 
@@ -35,7 +38,7 @@ For the single-coefficient butterfly instruction, and:
 
 For the double-coefficient butterfly instruction.
 
-`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
+In a 32-bit context, `fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
 
 ```
     #define ROUND_POWER_OF_TWO(value, n) \
@@ -44,7 +47,7 @@ For the double-coefficient butterfly instruction.
 
 These instructions are at the core of **ALL** FDCT calculations in many
 major video codecs, including -but not limited to- VP8/VP9, AV1, etc.
-Arm includes special instructions to optimize these operations, although
+ARM includes special instructions to optimize these operations, although
 they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
 
 The suggestion is to have a single instruction to calculate both values