From: lkcl
Date: Mon, 1 May 2023 09:08:40 +0000 (+0100)
Subject: (no commit message)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=30a8763113ca2a5c6b17ac015e0d5f7094ab7640;p=libreriscv.git

---

diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index 5af3c7cb6..3a78acdab 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -11,7 +11,10 @@
 
 
 Although best used with SVP64 REMAP these instructions may be used in a Scalar-only
-context to save considerably on DCT, DFT and FFT processing.
+context to save considerably on DCT, DFT and FFT processing. Whilst some hardware
+implementations may not necessarily implement them efficiently (slower Micro-coding)
+savings still come from the reduction in temporary registers as well as instruction
+count.
 
 # Rationale for Twin Butterfly Integer DCT Instruction(s)
 
@@ -35,7 +38,7 @@ For the single-coefficient butterfly instruction, and:
 
 For the double-coefficient butterfly instruction.
 
-`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
+In a 32-bit context `fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
 
 ```
 #define ROUND_POWER_OF_TWO(value, n) \
@@ -44,7 +47,7 @@ For the double-coefficient butterfly instruction.
 
 These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including
 -but not limited to- VP8/VP9, AV1, etc.
-Arm includes special instructions to optimize these operations, although
+ARM includes special instructions to optimize these operations, although
 they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
 
 The suggestion is to have a single instruction to calculate both values
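
Reviewer note: the patched text describes `fdct_round_shift` as a rounding right shift by 14 bits and proposes a single instruction that calculates both butterfly values. The C sketch below illustrates that idea under stated assumptions. The helper names `round_shift` and `butterfly` are hypothetical, the rounding shown (add half, then shift) is one common way to round a power-of-two division and is not copied from the truncated macro in the hunk, and the sum/difference form of the butterfly is assumed, since its defining equations fall outside this diff.

```c
#include <stdint.h>

/* Hypothetical helper: right shift by n bits (n >= 1), rounding to nearest
 * by adding half of 2^n first. Assumes >> on a negative signed value is an
 * arithmetic shift, which is implementation-defined in C. */
static int32_t round_shift(int64_t x, unsigned n)
{
    return (int32_t)((x + ((int64_t)1 << (n - 1))) >> n);
}

/* Hypothetical single-coefficient butterfly (assumed form, not taken from
 * the diff): scale the sum and the difference of a and b by a fixed-point
 * coefficient c with 14 fractional bits, producing both results together. */
static void butterfly(int32_t a, int32_t b, int32_t c,
                      int32_t *out0, int32_t *out1)
{
    /* Widen before adding so neither the sum nor the product overflows. */
    *out0 = round_shift(((int64_t)a + b) * c, 14);
    *out1 = round_shift(((int64_t)a - b) * c, 14);
}
```

With `c = 11585` (roughly cos(pi/4) scaled by 2^14), calling `butterfly(100, 50, 11585, &s, &d)` gives `s = 106` and `d = 35`. Written as plain scalar code this takes an add, a subtract, two multiplies and two rounding shifts plus temporaries, which is the instruction-count and register cost the commit text refers to.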