From: lkcl Date: Sat, 11 Jun 2022 17:10:17 +0000 (+0100) Subject: (no commit message) X-Git-Tag: opf_rfc_ls005_v1~1847 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=3c7417532f34151fbbf124667ed56f86c6af3090;p=libreriscv.git --- diff --git a/openpower/sv/mv.vec.mdwn b/openpower/sv/mv.vec.mdwn index c7cc06704..87464f572 100644 --- a/openpower/sv/mv.vec.mdwn +++ b/openpower/sv/mv.vec.mdwn @@ -7,90 +7,11 @@ In the SIMD VSX set, section 6.8.1 and 6.8.2 p254 of v3.0B has a series of pack * See * -Note that some of these may be covered by [[remap]]. - -# move to/from vec2/3/4 - -Basic idea: mv operations where either the src or dest is specifically marked as having SUBVL apply to it, but, crucially, the *other* argument does *not*. Note that this is highly unusual in SimpleV, which normally only allows SUBVL to be applied uniformly across all dest and all src. - - mv.srcvec r3, r4.vec2 - mv.destvec r2.vec4, r5 - -TODO: evaluate whether this will fit with [[mv.swizzle]] involved as well -(yes it probably will) - -* M=0 is mv.srcvec -* M=1 is mv.destvec - -mv.srcvec (leaving out elwidths and chop): - - for i in range(VL): - regs[rd+i] = regs[rs+i*SUBVL] - -mv.destvec (leaving out elwidths and chop): - - for i in range(VL): - regs[rd+i*SUBVL] = regs[rs+i] - -Note that these mv operations only become significant when elwidth is set on the vector to a small value. SUBVL=4, src elwidth=8, dest elwidth=32 for example. - -intended to cover: - - rd = (rs >> 0 * 8) & (2^8 - 1) - rd+1 = (rs >> 1 * 8) & (2^8 - 1) - rd+2 = (rs >> 2 * 8) & (2^8 - 1) - rd+3 = (rs >> 3 * 8) & (2^8 - 1) - -and variants involving vec3 into 32 bit (4th byte set to zero). -TODO: include this pseudocode which shows how the vecN can do that. -in this example RA elwidth=32 and RB elwidth=8, RB is a vec4. - - for i in range(VL): - if predicate_bit_not_set(i) continue - uint8_t *start_point = (uint8_t*)(int_regfile[RA].i[i]) - for j in range(SUBVL): # vec4 - start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j]) - - - -## Twin Predication, saturation, swizzle, and elwidth overrides - -Note that mv is a twin-predicated operation, and is swizzlable. This implies that from the vec2, vec3 or vec4, 1 to 8 bytes may be selected and re-ordered (XYZW), mixed with 0 and 1 constants, skipped by way of twin predicate pack and unpack, and a huge amount besides. - -Also saturation can be applied to individual elements, including the elements within a vec2/3/4. - -# mv.zip and unzip - -These are Scalar equivalents to VSX Pack and Unpack: v3.1 -Book I Section 6.8 p278. Saturated variants do not need -adding because SVP64 overrides add Saturation already. -More detailed merging may be achieved with [[sv/bitmanip]] -instructions. - -| 0.5 |6.10|11.15|16..20|21..25|26.....30|31| name | -|-----|----|-----|------|------|---------|--|--------------| -| 19 | RTp| RC | RB/0 | RA/0 | XO[5:9] |Rc| mv.zip | -| 19 | RT | RC | RS/0 | RA/0 | XO[5:9] |Rc| mv.unzip | - -these are specialist operations that zip or unzip to/from multiple regs to/from one vector including vec2/3/4. when SUBVL!=1 the vec2/3/4 is the contiguous unit that is copied (as if one register). different elwidths result in zero-extension or truncation except if saturation is enabled, where signed/unsigned may be applied as usual. - -mv.zip, RA=0, RB=0 - - for i in range(VL): - regs[rt+i] = regs[rc+i] - -mv.zip, RA=0, RB!=0 - - for i in range(VL): - regs[rt+i*2 ] = regs[rb+i] - regs[rt+i*2+1] = regs[rc+i] - -mv.zip, RA!=0, RB!=0 - - for i in range(VL): - regs[rt+i*3 ] = regs[rb+i] - regs[rt+i*3+1] = regs[rc+i] - regs[rt+i*3+2] = regs[ra+i] +Pack and unpack may be covered by [[remap]] by using Matrix 2D layouts on either source or destination but is quite expensive to do so. Additionally, +with pressure on the Scalar 32-bit opcode space it is more appropriate to +compromise by adding required capability in SVP64 on top of a +base pre-existing Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently +unusual to justify a base Scalar 32-bit instruction but pack/unpack is not. # REMAP concept for pack/unpack @@ -144,7 +65,6 @@ This results in a 3-element "pack" Both examples become particularly fun when Twin Predication is thrown into the mix. - # RM Mode Concept: MVRM-2P-2S1D: