From: Konstantinos Margaritis
Date: Thu, 27 Apr 2023 16:51:00 +0000 (+0000)
Subject: remove autogen code, add rationale
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=7251019d429f469b1c717de850eae98c53c115b7;p=libreriscv.git

remove autogen code, add rationale
---

diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index 063671fec..da1a70bd5 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -25,6 +25,9 @@ For the double-coefficient butterfly instruction.
 #define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
 ```
 
+These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including -but not limited to- VP8/VP9, AV1, etc.
+Arm includes special instructions to optimize these operations, although they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
+
 The suggestion is to have a single instruction to calculate both values `((a + b) * c) >> N`, and `((a - b) * c) >> N`.
 The instruction will run in accumulate mode, so in order to calculate the 2-coeff version one would just have to call the same instruction with different order a, b and a different constant c.
 
@@ -61,24 +64,6 @@ Where BF-Form is defined in fields.txt:
 
 ```
 
-The resulting autogenerated code is:
-
-```
-class butterfly:
-    @inject()
-    def op_maddsubrs(self, RA, RB, RC, RT):
-        RT2 = copy_assign_rhs(RT + 1)
-        sum = copy_assign_rhs(RA + RB)
-        diff = copy_assign_rhs(RA - RB)
-        prod1 = copy_assign_rhs(self.MUL(RC, sum))
-        prod2 = copy_assign_rhs(self.MUL(RC, diff))
-        res1 = copy_assign_rhs(self.ROTL64(prod1, SH))
-        res2 = copy_assign_rhs(self.ROTL64(prod2, SH))
-        RT = copy_assign_rhs(RT + res1)
-        RT2 = copy_assign_rhs(RT2 + res2)
-        return (RT,)
-```
-
 The instruction has been added to `minor_59.csv`:
 ```
 1111011111,ALU,OP_MADDSUBRS,RA,RB,RC,RT,NONE,CR1,0,0,ZERO,0,NONE,0,0,0,0,1,0,RC_ONLY,0,0,maddsubrs,A,,1,unofficial until submitted and approved/renumbered by the opf isa wg