From cad60e729727017520dac103fc26a4c5873b949c Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 27 Apr 2018 05:30:03 +0100
Subject: [PATCH]

---
 harmonised_rvv_rvp/discussion.mdwn | 35 ++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/harmonised_rvv_rvp/discussion.mdwn b/harmonised_rvv_rvp/discussion.mdwn
index 128409895..1d86fcc9a 100644
--- a/harmonised_rvv_rvp/discussion.mdwn
+++ b/harmonised_rvv_rvp/discussion.mdwn
@@ -7,3 +7,38 @@
 * Likewise the last (and first) of 2-wide 16-bit operations?
 * What about predication within a 4-wide 8-bit group?
 * Likewise what about predication within a 2-wide 16-bit group?
+
+## Providing "cross-over" between elements in a group
+
+what do you think of the "CSR cross[32][6]" idea? sorry below may
+not be exactly clear, it's basically a way to generalise all
+cross-operations, even the SUNPKD810 rt, ra and ZUNPKD810 rt, ra would
+reduce down to one instruction as opposed to 8 right now.
+
+    def butterfly_remap(remap_me):
+        # hmmm a little hazy on the details here....
+        # help, help! logic-dyslexia kicking in!
+        # erm do some crossover using the 6 bits from
+        # the CSR cross map. first 2 bits swap
+        # elements in index positions 0,1 and 2,3
+        # second 2 bits swap elements in positions 0,2 and 1,3
+        # then swap 0,1 and 2,3 a second time.
+        # gives full set of all permutations.
+        return something, something
+
+    def crossover(elidx, destreg):
+        base = elidx & ~0x7
+        return butterfly_remap(CSR_cross[destreg][elidx & 0x7])
+
+    def op(v1, v2, v3):
+        for l in vlen:
+            remap_src1, remap_src2 = crossover(i, v1)
+            # remap_srcN references byte offsets? erm.... :)
+            GPR[v1] = scalar_op(GPR[v2][remap_src1],
+                                GPR[v3][remap_src2])
+
+Otherwise, VSHUFFLE and so on (and possibly xBitManip) would
+need to be used. xBitManip would not be a bad idea, except
+consideration of VLIW-like DSP (TI C67*) architectures needs
+to be given, which do not do register-renaming and have fixed
+pipeline phases with no stalling on register-dependencies.
-- 
2.30.2
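
The comments in `butterfly_remap` describe three stages of conditional swaps, each driven by two of the six CSR bits. A minimal Python sketch of that idea follows. It is only one possible reading of the patch: the assignment of bits to swaps (`STAGES`), the choice to return a tuple of source indices, and the helper `reachable_permutations` are all assumptions made for illustration, not something the patch specifies.

```python
# Assumed bit layout (hypothetical, not given in the patch): bit k of the
# 6-bit control value enables the k-th conditional swap in this list.
STAGES = [
    (0, 1), (2, 3),  # first 2 bits: swap positions 0,1 and 2,3
    (0, 2), (1, 3),  # second 2 bits: swap positions 0,2 and 1,3
    (0, 1), (2, 3),  # last 2 bits: swap 0,1 and 2,3 a second time
]


def butterfly_remap(ctrl):
    """Return, for each of the 4 positions in a group, the index of the
    element that ends up there after applying the enabled swaps."""
    idx = [0, 1, 2, 3]
    for bit, (a, b) in enumerate(STAGES):
        if (ctrl >> bit) & 1:
            idx[a], idx[b] = idx[b], idx[a]
    return tuple(idx)


def reachable_permutations():
    """Enumerate every distinct remap produced by the 64 control values."""
    return {butterfly_remap(ctrl) for ctrl in range(64)}


if __name__ == "__main__":
    print(len(reachable_permutations()))
```

Under these assumptions, enumerating all 64 control values yields every one of the 24 orderings of four elements, which matches the "full set of all permutations" comment. The encoding is redundant, since several control values map to the same ordering.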