From 30a8763113ca2a5c6b17ac015e0d5f7094ab7640 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Mon, 1 May 2023 10:08:40 +0100
Subject: [PATCH]

---
 openpower/sv/twin_butterfly.mdwn | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index 5af3c7cb6..3a78acdab 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -11,7 +11,10 @@
 <!-- show -->
 
 Although best used with SVP64 REMAP these instructions may be used in a Scalar-only
-context to save considerably on DCT, DFT and FFT processing.  
+context to save considerably on DCT, DFT and FFT processing.  Whilst some hardware
+implementations may not execute them efficiently (for example, via slower micro-coding),
+savings still come from the reduction in both temporary registers and instruction
+count.
 
 # Rationale for Twin Butterfly Integer DCT Instruction(s)
 
@@ -35,7 +38,7 @@ For the single-coefficient butterfly instruction, and:
 
 For the double-coefficient butterfly instruction.
 
-`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
+In a 32-bit context, `fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`.
 
 ```
     #define ROUND_POWER_OF_TWO(value, n) \
@@ -44,7 +47,7 @@ For the double-coefficient butterfly instruction.
 
 These instructions are at the core of **ALL** FDCT calculations in many
 major video codecs, including -but not limited to- VP8/VP9, AV1, etc.
-Arm includes special instructions to optimize these operations, although
+ARM includes special instructions to optimize these operations, although
 they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
 
 The suggestion is to have a single instruction to calculate both values
-- 
2.30.2