From: lkcl
Date: Sun, 17 Apr 2022 21:58:34 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2746
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=6be8d76e88e9f7f4fc591ea5e12f825054b1d4dc;p=libreriscv.git

---
diff --git a/openpower/sv/bitmanip/appendix.mdwn b/openpower/sv/bitmanip/appendix.mdwn
index 359e94f1f..4b8e20f72 100644
--- a/openpower/sv/bitmanip/appendix.mdwn
+++ b/openpower/sv/bitmanip/appendix.mdwn
@@ -126,6 +126,14 @@ Transformation of 4-in, 2-out into a pair of operations:
+A trick used in the DCT and FFT twin-butterfly instructions,
+originally borrowed from `lq` and LD/ST-with-update, is to
+have a second hidden (implicit) destination register, RS.
+RS is calculated as RT+VL, where all scalar operations
+assume VL=1. With `sv.msubx` *creating* a pair of Vector
+results, `sv.weirdaddx` correspondingly has to pick the
+pair up in order to carry on the algorithm.
+
 **msubx RT, RA, RB, RC** (RS=RT+VL for SVP64, RS=RT+1 for scalar)
 
     prod[0:127] = (RA) * (RB)
@@ -146,12 +154,16 @@ These two combine as, simply:
     # RS=RT+VL, assume VL=8, therefore RS starts at r8.v
     # q       : r16
-    # dividend: r24.v
-    # divisor : r32.v
+    # dividend: r20.v
+    # divisor : r28.v
     # carry   : r40
-    li r40, 0
-    sv.msubx r0.v, r16, r24.v, r32.v
-    sv.weirdaddx r0.v, r40, r8.v
+    li r17, 0
+    sv.msubx r0.v, r16, r20.v, r28.v
+    sv.weirdaddx r0.v, r17, r8.v
+
+As a result, a big-integer subtract and multiply may be carried out
+in only 3 instructions, one of which is setting a scalar integer to
+zero.
 
 ## EXT004 Opcode map
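
Reviewer note: the register numbering in the updated example can be checked with a short sketch. This is only an illustration of the RS=RT+VL rule stated in the patch, not the specification's pseudocode. The helper names `implicit_rs`, `vector_regs` and `overlapping` are hypothetical, and it assumes that a vector operand such as `r20.v` occupies VL consecutive registers starting at the named one.

```python
from itertools import combinations


def implicit_rs(rt, vl=1):
    """Hidden second destination: RS = RT + VL (scalar ops assume VL=1)."""
    return rt + vl


def vector_regs(start, vl):
    """Assumption: a vector operand spans VL consecutive registers."""
    return set(range(start, start + vl))


def overlapping(operands):
    """Return the pairs of operand names whose register sets intersect."""
    return [(a, b) for (a, ra), (b, rb) in combinations(operands.items(), 2)
            if ra & rb]


VL = 8
RT = 0
RS = implicit_rs(RT, VL)  # r8, matching "RS starts at r8.v" in the patch

# Register usage taken from the new (+) lines of the example.
operands = {
    "RT": vector_regs(RT, VL),
    "RS": vector_regs(RS, VL),
    "q": {16},
    "carry": {17},
    "dividend": vector_regs(20, VL),
    "divisor": vector_regs(28, VL),
}

print("RS starts at r%d" % RS)
print("overlapping operands:", overlapping(operands))
```

Under these assumptions the operands in the new lines do not overlap: RT and RS cover r0 to r15, the scalars sit at r16 and r17, and the two source vectors cover r20 to r35.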