From b964851542395ec8194cc492cb1de78669fe1337 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Sat, 29 Apr 2023 17:08:22 +0100
Subject: [PATCH] whitespace

---
 openpower/sv/twin_butterfly.mdwn | 160 +++++++++++++++----------------
 1 file changed, 77 insertions(+), 83 deletions(-)

diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index a5c29a379..7b71dca02 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -13,12 +13,11 @@
 
 # Rationale for Twin Butterfly Integer DCT Instruction(s)
 
-The number of general-purpose uses for DCT is huge. The
-number of instructions needed instead of these Twin-Butterfly
-instructions is also huge (**eight**) and given that it is
-extremely common to explicitly loop-unroll them quantity
-hundreds to thousands of instructions are dismayingly common
-(for all ISAs).
+The number of general-purpose uses for DCT is huge. The number of
+instructions needed instead of these Twin-Butterfly instructions is also
+huge (**eight**), and given that it is extremely common to explicitly
+loop-unroll them, quantities of hundreds to thousands of instructions are
+dismayingly common (for all ISAs).
 
 The goal is to implement instructions that calculate the expression:
 
@@ -41,11 +40,16 @@ For the double-coefficient butterfly instruction.
 (((value) + (1 << ((n)-1))) >> (n))
 ```
 
-These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including -but not limited to- VP8/VP9, AV1, etc.
-Arm includes special instructions to optimize these operations, although they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
+These instructions are at the core of **ALL** FDCT calculations in many
+major video codecs, including (but not limited to) VP8/VP9, AV1, etc.
+Arm includes special instructions to optimize these operations, although
+they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
 
-The suggestion is to have a single instruction to calculate both values `((a + b) * c) >> N`, and `((a - b) * c) >> N`.
-The instruction will run in accumulate mode, so in order to calculate the 2-coeff version one would just have to call the same instruction with different order a, b and a different constant c.
+The suggestion is to have a single instruction to calculate both values
+`((a + b) * c) >> N` and `((a - b) * c) >> N`. The instruction will
+run in accumulate mode, so in order to calculate the 2-coeff version
+one would just have to call the same instruction again with the order of
+a and b swapped and a different constant c.
 
 ## Integer Butterfly Multiply Add/Sub FFT/DCT
 
@@ -82,14 +86,12 @@ Pseudo-code:
 RS <- (res2 & m | smask2) + s64_2
 ```
 
-Note that if Rc=1 an Illegal Instruction is raised.
-Rc=1 is `RESERVED`
+Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.
 
-Similar to `RTp`, this instruction produces an implicit result,
-`RS`, which under Scalar circumstances is defined as `RT+1`.
-For SVP64 if `RT` is a Vector, `RS` begins immediately after the
-Vector `RT` where the length of `RT` is set by `SVSTATE.MAXVL`
-(Max Vector Length).
+Similar to `RTp`, this instruction produces an implicit result, `RS`,
+which under Scalar circumstances is defined as `RT+1`. For SVP64 if
+`RT` is a Vector, `RS` begins immediately after the Vector `RT` where
+the length of `RT` is set by `SVSTATE.MAXVL` (Max Vector Length).
 
 Special Registers Altered:
 
@@ -122,24 +124,22 @@ Pseudo-code:
 The Floating-Point operand in register FRT is added to the
 floating-point operand in register FRB and the result stored in FRS.
 
-Using the exact same operand input register values from FRT and FRB that
-were used to create FRS, the Floating-Point operand in register FRB
-is subtracted from the floating-point operand in register FRT and the
-result then multiplied by FRA to create an intermediate result that is
-stored in FRT.
+Using the exact same operand input register values from FRT and FRB
+that were used to create FRS, the Floating-Point operand in register
+FRB is subtracted from the floating-point operand in register FRT and
+the result then multiplied by FRA to create an intermediate result that
+is stored in FRT.
 
-The add into FRS is treated exactly as `fadd`. The creation
-of the result FRT is exact!y that of `fmsub`. The creation of FRS and FRT are
+The add into FRS is treated exactly as `fadd`. The creation of the
+result FRT is exactly that of `fmsub`. The creation of FRS and FRT are
 treated as parallel independent operations which occur at the same time.
 
-Note that if Rc=1 an Illegal Instruction is raised.
-Rc=1 is `RESERVED`
+Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.
 
-Similar to `FRTp`, this instruction produces an implicit result,
-`FRS`, which under Scalar circumstances is defined as `FRT+1`.
-For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
-Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
-(Max Vector Length).
+Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
+which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
+`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
+where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).
 
 Special Registers Altered:
 
@@ -178,21 +178,20 @@ The two operations are performed.
 
 
 
-The floating-point operand in register FRT is multiplied
-by the floating-point operand in register FRA. The float-
-ing-point operand in register FRB is added to
-this intermediate result, and the intermediate stored in FRS.
-
-Using the exact same values of FRT, FRT and FRB as used to create FRS,
-the floating-point operand in register FRT is multiplied
-by the floating-point operand in register FRA. The float-
-ing-point operand in register FRB is subtracted from
-this intermediate result, and the intermediate stored in FRT.
-
-FRT is created as if
-a `fmadds` operation had been performed. FRS is created as if
-a `fnmsubs` operation had simultaneously been performed with
-the exact same register operands, in parallel, independently,
+The floating-point operand in register FRT is multiplied by the
+floating-point operand in register FRA. The floating-point operand in
+register FRB is added to this intermediate result, and the intermediate
+stored in FRS.
+
+Using the exact same values of FRT, FRA and FRB as used to create
+FRS, the floating-point operand in register FRT is multiplied by the
+floating-point operand in register FRA. The floating-point operand
+in register FRB is subtracted from this intermediate result, and the
+intermediate stored in FRT.
+
+FRT is created as if a `fmadds` operation had been performed. FRS is
+created as if a `fnmsubs` operation had simultaneously been performed
+with the exact same register operands, in parallel, independently,
 at exactly the same time.
 
 FRT is a Read-Modify-Write operation.
@@ -237,24 +236,22 @@ Pseudo-code:
 The Floating-Point operand in register FRT is added to the
 floating-point operand in register FRB and the result stored in FRS.
 
-Using the exact same operand input register values from FRT and FRB that
-were used to create FRS, the Floating-Point operand in register FRB
-is subtracted from the floating-point operand in register FRT and the
-result then multiplied by FRA to create an intermediate result that is
-stored in FRT.
+Using the exact same operand input register values from FRT and FRB
+that were used to create FRS, the Floating-Point operand in register
+FRB is subtracted from the floating-point operand in register FRT and
+the result then multiplied by FRA to create an intermediate result that
+is stored in FRT.
 
-The add into FRS is treated exactly as `fadd`. The creation
-of the result FRT is exact!y that of `fmsub`. The creation of FRS and FRT are
+The add into FRS is treated exactly as `fadd`. The creation of the
+result FRT is exactly that of `fmsub`. The creation of FRS and FRT are
 treated as parallel independent operations which occur at the same time.
 
-Note that if Rc=1 an Illegal Instruction is raised.
-Rc=1 is `RESERVED`
+Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.
 
-Similar to `FRTp`, this instruction produces an implicit result,
-`FRS`, which under Scalar circumstances is defined as `FRT+1`.
-For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
-Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
-(Max Vector Length).
+Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
+which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
+`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
+where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).
 
 Special Registers Altered:
 
@@ -293,33 +290,30 @@ The two operations are performed.
 
 
 
-The floating-point operand in register FRT is multiplied
-by the floating-point operand in register FRA. The float-
-ing-point operand in register FRB is added to
-this intermediate result, and the intermediate stored in FRS.
-
-Using the exact same values of FRT, FRT and FRB as used to create FRS,
-the floating-point operand in register FRT is multiplied
-by the floating-point operand in register FRA. The float-
-ing-point operand in register FRB is subtracted from
-this intermediate result, and the intermediate stored in FRT.
-
-FRT is created as if
-a `fmadd` operation had been performed. FRS is created as if
-a `fnmsub` operation had simultaneously been performed with
-the exact same register operands, in parallel, independently,
+The floating-point operand in register FRT is multiplied by the
+floating-point operand in register FRA. The floating-point operand in
+register FRB is added to this intermediate result, and the intermediate
+stored in FRS.
+
+Using the exact same values of FRT, FRA and FRB as used to create
+FRS, the floating-point operand in register FRT is multiplied by the
+floating-point operand in register FRA. The floating-point operand
+in register FRB is subtracted from this intermediate result, and the
+intermediate stored in FRT.
+
+FRT is created as if a `fmadd` operation had been performed. FRS is
+created as if a `fnmsub` operation had simultaneously been performed
+with the exact same register operands, in parallel, independently,
 at exactly the same time.
 
-FRT is a Read-Modify-Write operation. 
+FRT is a Read-Modify-Write operation.
 
-Note that if Rc=1 an Illegal Instruction is raised.
-Rc=1 is `RESERVED`
+Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.
 
-Similar to `FRTp`, this instruction produces an implicit result,
-`FRS`, which under Scalar circumstances is defined as `FRT+1`.
-For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
-Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
-(Max Vector Length).
+Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
+which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
+`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
+where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).
 
 Special Registers Altered:
 
-- 
2.30.2
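The integer twin butterfly whose rationale and `ROUND_POWER_OF_TWO` rounding appear in the first two hunks computes both `((a + b) * c) >> N` and `((a - b) * c) >> N` from the same inputs. The following is a minimal C sketch of that scalar arithmetic, for illustration only. It is not the specification's pseudo-code. The function names `round_power_of_two` and `twin_butterfly`, and the choice of a 16-bit coefficient type, are assumptions. Accumulate mode and SVP64 vectorisation are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding right shift, following the ROUND_POWER_OF_TWO macro quoted in
 * the patch: add half of 2^n, then shift right by n.
 * Note: >> on a negative signed value is implementation-defined before C23;
 * GCC and Clang perform an arithmetic (sign-propagating) shift. */
int64_t round_power_of_two(int64_t value, unsigned n)
{
    assert(n >= 1 && n <= 62); /* n = 0 would mean a shift by -1 */
    return (value + ((int64_t)1 << (n - 1))) >> n;
}

/* Hypothetical scalar model of one twin-butterfly step:
 *   *sum  = ((a + b) * c) >> n, rounded
 *   *diff = ((a - b) * c) >> n, rounded
 * Both results are computed from the same a, b and c. With a 16-bit
 * coefficient, |(a +/- b) * c| stays below 2^48, so the 64-bit
 * intermediate cannot overflow. */
void twin_butterfly(int32_t a, int32_t b, int16_t c, unsigned n,
                    int64_t *sum, int64_t *diff)
{
    *sum  = round_power_of_two(((int64_t)a + b) * c, n);
    *diff = round_power_of_two(((int64_t)a - b) * c, n);
}
```

As an example, take the DCT-style constant 11585 (roughly cos(pi/4) * 2^14) with n = 14. Then `twin_butterfly(100, 28, 11585, 14, &s, &d)` sets `s` to 91 and `d` to 51. The 2-coeff form described in the patch would reuse the same step with a and b swapped and a different constant.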
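The floating-point add/sub variant has two key properties. First, FRS (`FRT + FRB`) and FRT (`(FRT - FRB) * FRA`) are both computed from the *original* FRT and FRB values, even though FRT is overwritten. Second, in the Scalar case FRS is implicitly register FRT+1. The C sketch below illustrates both points under stated assumptions. The register-file array `fpr` and the function name `fp_twin_butterfly_sketch` are hypothetical. Plain C `double` arithmetic stands in for the ISA's behaviour, so FPSCR flags, rounding modes, single-precision rounding, fused-versus-unfused multiply, and the SVP64 Vector case are all left unmodelled.

```c
#include <assert.h>

#define NUM_FPRS 32

/* Hypothetical scalar model of the add/sub-then-multiply butterfly:
 *   FRS <- FRT + FRB            (treated like fadd)
 *   FRT <- (FRT - FRB) * FRA
 * where FRS is the implicit second result, register FRT+1.
 * All inputs are read before either result is written, so both results
 * see the original operand values even if FRA or FRB is FRS. */
void fp_twin_butterfly_sketch(double fpr[NUM_FPRS],
                              unsigned rt, unsigned ra, unsigned rb)
{
    assert(rt + 1 < NUM_FPRS && ra < NUM_FPRS && rb < NUM_FPRS);

    /* Read phase: capture every operand first. */
    const double frt = fpr[rt];
    const double fra = fpr[ra];
    const double frb = fpr[rb];

    /* Write phase: the two results do not depend on each other. */
    fpr[rt + 1] = frt + frb;         /* FRS */
    fpr[rt]     = (frt - frb) * fra; /* FRT (overwrites an input) */
}
```

The read-then-write ordering matters most when FRB aliases FRS (`rb == rt + 1`). An in-place update that wrote FRS first would then feed the new FRS value into the FRT computation.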