X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=Comparative_analysis_Harmonised_RVP_vs_Andes_Packed_SIMD_ISA_proposal.mdwn;h=98d2adac845ebb01c13930a40b092ffd85925bba;hb=fef9b52c18997028aa57c14198073580b66673cd;hp=0b5397b08818f2d1c654c9a08f7bd92c25702d35;hpb=805d1563f5c5bb1985beceeddc507a50ac85ba28;p=libreriscv.git diff --git a/Comparative_analysis_Harmonised_RVP_vs_Andes_Packed_SIMD_ISA_proposal.mdwn b/Comparative_analysis_Harmonised_RVP_vs_Andes_Packed_SIMD_ISA_proposal.mdwn index 0b5397b08..98d2adac8 100644 --- a/Comparative_analysis_Harmonised_RVP_vs_Andes_Packed_SIMD_ISA_proposal.mdwn +++ b/Comparative_analysis_Harmonised_RVP_vs_Andes_Packed_SIMD_ISA_proposal.mdwn @@ -1,19 +1,101 @@ -# Comparative analysis with Andes Packed ISA proposal +# Comparative analysis of Andes Packed ISA proposal vs Harmonised RVP + +Harmonised RVP is a proposal to provide SIMD functionality comparable to the Andes Packed SIMD ISA, but in a manner that is forwards compatible ("harmonised") with the RV Vector specification. + +An example use case is a string copy operation - using Harmonised RVP, code can use integer register SIMD instructions to copy a string. This code can then also execute (unchanged) on a full RV Vector processor and use the dedicated vector unit to copy the string. Harmonised RVP also upwards compatibility between RV32 and RV64 SIMD using this same approach. + +## Register file comparison + +The Andes Packed SIMD ISA permits any GPR to be used for either INT8 or INT16 vector operations. +In contrast, the default Harmonised RVP GPR register file is divided into a lower bank of Vector[INT8] and an upper banxk of Vector[INT16]. +(Effectively, the vector element size is encoded by the most significant bit of the 5 bit register specifiers. +However programmers can reconfigure the register file data types, if the default configuration is unsuitable.) + +| Register | Andes ISA | Harmonised RVP ISA | +| ------------------ | ------------------------- | ------------------- | +| v0 | Hardwired zero | Hardwired zero | +| v1 | 32bit GPR or Vector[4xINT8 or 2xINT16] | Predicate mask | +| | | | +| v2 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v3 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v4 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v5 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v6 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v7 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] | +| v8 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v9 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v10 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v11 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v12 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v13 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v14 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| v15 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] | +| | | | +| v16 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v17 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v18 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v19 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v20 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v21 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v22 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v23 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] | +| v24 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| v25 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| v26 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| v27 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| v28 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| v29 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] | +| | | | +| v30 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[1xSINT32] | +| v31 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[1xSINT32] | + +Both Andes Packed SIMD and Harmonised RVP are intended to be "low end" SIMD implementations for processors without dedicated vector registers. +Instead, the integer register file is used for SIMD operations. To maintain forwards compatibility with "high end" RV Vector implementations, programmer should use VLD and VST to load/store vectors. The implementation will then load/store a vector to/from the register file supported by the implementation. + +To keep implementations simple and focused on within-register SIMD only, there is a strict 1:1 mapping between vectors (v0-v31) and integer registers (r0-r31). Standard calling conventions apply and so callee saved integer registers should be saved before being used as vector registers. + Strided (VLDS/VSTS) and indexed (VLDX/VSTX) load/stores are complex, and simple implementations will trap on these instructions, permitting emulation in software. + +## Proposed Harmonised RVP vector op instruction encoding + +Register x 2 -> register operations: + +| 31 30 29 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 | +| ----------------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- | +| func_6 | 0 | rs2 | rs1 | 0 | mm | rd1 | VOP opcode | + +Immediate + register -> register operations: + +| 31 30 29 | 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 | +| -------- | -------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- | +| func_3 | imm[7:5] | 1 | imm[4:0] | rs1 | 0 | mm | rd1 | VOP opcode | + +Register x 3 -> register operations: + +| 31 30 29 28 27 | 26 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 | +| ----------------------- | -------------- | -------------- | -- | ----- | ----------- | ------------- | +| rs3 | func_2 | rs2 | rs1 | 1 | mm | rd1 | VOP opcode | + +Values for mm field (bits 12:13 above): + +* mm = 00 -> no predicate mask, and use current global saturation / rounding settings +* mm = 00 -> no predicate mask, and force saturation or rounding for this instruction only +* mm = 10 -> use v1 as predicate mask, and use global saturation / rounding settings +* mm = 11 -> use ~v1 as predicate mask, and use global saturation / rounding settings ## 16-bit Arithmetic | Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| ADD16 rt, ra, rb | Add | VADD (r16 <= rt,ra,rb <= r29), mm=00| -| RADD16 rt, ra, rb | Signed Halving add | RADD (r16 <= rt,ra,rb <= r23), mm=00| -| URADD16 rt, ra, rb | Unsigned Halving add | RADD (r24 <= rt,ra,rb <= r29), mm=00| -| KADD16 rt, ra, rb | Signed Saturating add | VADD (r16 <= rt,ra,rb <= r23), mm=01| -| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (r24 <= rt,ra,rb <= r29), mm=01| -| SUB16 rt, ra, rb | Subtract | VSUB (r16 <= rt,ra,rb <= r29), mm=00| -| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (r16 <= rt,ra,rb <= r23), mm=00| -| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (r24 <= rt,ra,rb <= r29), mm=00| -| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (r16 <= rt,ra,rb <= r23), mm=01| -| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (r24 <= rt,ra,rb <= r29), mm=01| +| ADD16 rt, ra, rb | Add | VADD (v16 <= rt,ra,rb <= v29), mm=00| +| RADD16 rt, ra, rb | Signed Halving add | RADD (v16 <= rt,ra,rb <= v23), mm=00| +| URADD16 rt, ra, rb | Unsigned Halving add | RADD (v24 <= rt,ra,rb <= v29), mm=00| +| KADD16 rt, ra, rb | Signed Saturating add | VADD (v16 <= rt,ra,rb <= v23), mm=01| +| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (v24 <= rt,ra,rb <= v29), mm=01| +| SUB16 rt, ra, rb | Subtract | VSUB (v16 <= rt,ra,rb <= v29), mm=00| +| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (v16 <= rt,ra,rb <= v23), mm=00| +| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (v24 <= rt,ra,rb <= v29), mm=00| +| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (v16 <= rt,ra,rb <= v23), mm=01| +| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (v24 <= rt,ra,rb <= v29), mm=01| | CRAS16 rt, ra, rb | Cross Add & Sub | | | RCRAS16 rt, ra, rb | Signed Halving Cross Add & Sub | | | URCRAS16 rt, ra, rb| Unsigned Halving Cross Add & Sub | | @@ -29,16 +111,16 @@ | Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| ADD8 rt, ra, rb | Add | VADD (r2 <= rt,ra,rb <= r15), mm=00 | -| RADD8 rt, ra, rb | Signed Halving add | RADD (r2 <= rt,ra,rb <= r7), mm=00 | -| URADD8 rt, ra, rb | Unsigned Halving add | RADD (r8 <= rt,ra,rb <= r15), mm=00 | -| KADD8 rt, ra, rb | Signed Saturating add | VADD (r2 <= rt,ra,rb <= r7), mm=01 | -| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (r8 <= rt,ra,rb <= r15), mm=01 | -| SUB8 rt, ra, rb | Subtract | VSUB (r2 <= rt,ra,rb <= r15), mm=00 | -| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (r2 <= rt,ra,rb <= r7), mm=00 | -| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (r8 <= rt,ra,rb <= r15), mm=00 | -| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (r2 <= rt,ra,rb <= r7), mm=01 | -| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (r8 <= rt,ra,rb <= r15), mm=01 | +| ADD8 rt, ra, rb | Add | VADD (v2 <= rt,ra,rb <= v15), mm=00 | +| RADD8 rt, ra, rb | Signed Halving add | RADD (v2 <= rt,ra,rb <= v7), mm=00 | +| URADD8 rt, ra, rb | Unsigned Halving add | RADD (v8 <= rt,ra,rb <= v15), mm=00 | +| KADD8 rt, ra, rb | Signed Saturating add | VADD (v2 <= rt,ra,rb <= v7), mm=01 | +| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (v8 <= rt,ra,rb <= v15), mm=01 | +| SUB8 rt, ra, rb | Subtract | VSUB (v2 <= rt,ra,rb <= v15), mm=00 | +| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (v2 <= rt,ra,rb <= v7), mm=00 | +| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (v8 <= rt,ra,rb <= v15), mm=00 | +| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (v2 <= rt,ra,rb <= v7), mm=01 | +| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (v8 <= rt,ra,rb <= v15), mm=01 | ## 16-bit Shifts @@ -48,18 +130,18 @@ The “K” (Saturation) and “u” (Rounding) variants could be encoded using | Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (r16 <= rt,ra,rb <= r29), mm=00| -| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (r16 <= rt,ra <= r29), mm=00| -| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (r16 <= rt,ra,rb <= r29), mm=01| -| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (r16 <= rt,ra <= r29), mm=01| -| SRL16 rt, ra, rb | Shift right logical | VSRL (r16 <= rt,ra,rb <= r29), mm=00| -| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (r16 <= rt,ra <= r29), mm=00| -| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (r16 <= rt,ra,rb <= r29), mm=01| -| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSLRI (r16 <= rt,ra <= r29), mm=01| -| SLL16 rt, ra, rb | Shift left logical | VSLL (r16 <= rt,ra,rb <= r29), mm=00| -| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (r16 <= rt,ra <= r29), mm=00| -| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (r16 <= rt,ra,rb <= r29), mm=01| -| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (r16 <= rt,ra <= r29), mm=01| +| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=00| +| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=00| +| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=01| +| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=01| +| SRL16 rt, ra, rb | Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=00| +| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=00| +| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=01| +| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSLRI (v16 <= rt,ra <= v29), mm=01| +| SLL16 rt, ra, rb | Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=00| +| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=00| +| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=01| +| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=01| | KSLRA16 rt, ra, rb | Saturating Shift left logical or Shift right arithmetic || | KSLRA16.u rt, ra, rb | Saturating Shift left logical or Rounding Shift right arithmetic || @@ -70,35 +152,76 @@ Andes SIMD Packed ISA omits 8 bit shifts, but these can be encoded in Harmonised | Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| n/a | Shift right arithmetic | VSRA (r2 <= rt,ra,rb <= r15), mm=00| -| n/a | Shift right arithmetic imm | VSRAI (r2 <= rt,ra <= r15), mm=00| -| n/a | Rounding Shift right arithmetic | VSRA (r2 <= rt,ra,rb <= r15), mm=01| -| n/a | Rounding Shift right arithmetic imm | VSRAI (r2 <= rt,ra <= r15), mm=01| -| n/a | Shift right logical | VSRL (r2 <= rt,ra,rb <= r15), mm=00| -| n/a | Shift right logical imm | VSRLI (r2 <= rt,ra <= r15), mm=00| -| n/a | Rounding Shift right logical | VSRL (r2 <= rt,ra,rb <= r15), mm=01| -| n/a | Rounding Shift right logical imm | VSLRI (r2 <= rt,ra <= r15), mm=01| -| n/a | Shift left logical | VSLL (r2 <= rt,ra,rb <= r15), mm=00| -| n/a | Shift left logical imm | VSLLI (r2 <= rt,ra <= r15), mm=00| -| n/a | Saturating Shift left logical | VSLL (r2 <= rt,ra,rb <= r15), mm=01| -| n/a | Saturating Shift left logical imm | VSLLI (r2 <= rt,ra <= r15), mm=01| +| n/a | Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=00| +| n/a | Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=00| +| n/a | Rounding Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=01| +| n/a | Rounding Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=01| +| n/a | Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=00| +| n/a | Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=00| +| n/a | Rounding Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=01| +| n/a | Rounding Shift right logical imm | VSLRI (v2 <= rt,ra <= v15), mm=01| +| n/a | Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=00| +| n/a | Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=00| +| n/a | Saturating Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=01| +| n/a | Saturating Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=01| ## 16-bit Comparison instructions | Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (r16 <= rt,ra,rb <= r29), mm=00| -| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (r16 <= rt,ra,rb <= r23), mm=00| -| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (r16 <= rt,ra,rb <= r23), mm=00| -| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (r24 <= rt,ra,rb <= r29), mm=00| -| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (r24 <= rt,ra,rb <= r29), mm=00| +| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (v16 <= rt,ra,rb <= v29), mm=00| +| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (v16 <= rt,ra,rb <= v23), mm=00| +| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (v16 <= rt,ra,rb <= v23), mm=00| +| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (v24 <= rt,ra,rb <= v29), mm=00| +| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (v24 <= rt,ra,rb <= v29), mm=00| ## 8-bit Comparison instructions | Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent | | ------------------ | ------------------------- | ------------------- | -| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (r2 <= rt,ra,rb <= r7), mm=00| -| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (r2 <= rt,ra,rb <= r7), mm=00| -| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (r2 <= rt,ra,rb <= r7), mm=00| -| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (r8 <= rt,ra,rb <= r15), mm=00| -| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (r8 <= rt,ra,rb <= r15), mm=00| +| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (v2 <= rt,ra,rb <= v7), mm=00| +| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (v2 <= rt,ra,rb <= v7), mm=00| +| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (v2 <= rt,ra,rb <= v7), mm=00| +| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (v8 <= rt,ra,rb <= v15), mm=00| +| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (v8 <= rt,ra,rb <= v15), mm=00| + +## 16-bit Miscellaneous instructions + +| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent | +| ------------------ | ------------------------ | ------------------- | +| SMIN16 rt, ra, rb | Signed minimum | VMIN (v16 <= rt,ra,rb <= v23), mm=00| +| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (v24 <= rt,ra,rb <= v29), mm=00| +| SMAX16 rt, ra, rb | Signed maximum | VMAX (v16 <= rt,ra,rb <= v23), mm=00| +| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (v24 <= rt,ra,rb <= v29), mm=00| +| SCLIP16 rt, ra, im | Signed clip | ?VCLIP (v16 <= rt,ra,rb <= v23), mm=01| +| UCLIP16 rt, ra, im | Unsigned clip | ?VCLIP (v24 <= rt,ra,rb <= v29), mm=01| +| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (v16 <= rt,ra,rb <= v23), mm=01| +| KMULX16 rt, ra, rb | Signed crossed multiply 16x16->16 | | +| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (30 <= rt <= 31, v16 <= ra,rb <= v23), mm=00| +| SMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | | +| UMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (30 <= rt <= 31, v24 <= ra,rb <= r31), mm=00| +| UMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | | +| KABS16 rt, ra | Saturated absolute value | VSGNX (v16 <= rt <= v29, v16 <= ra,rb <= v23, mm=01) | + +## 8-bit Miscellaneous instructions + +| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent | +| ------------------ | ------------------------- | ------------------- | +| SMIN8 rt, ra, rb | Signed minimum | VMIN (v2 <= rt,ra,rb <= v7), mm=00| +| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (v8 <= rt,ra,rb <= v15), mm=00| +| SMAX8 rt, ra, rb | Signed maximum | VMAX (v2 <= rt,ra,rb <= v7), mm=00| +| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (v8 <= rt,ra,rb <= v15), mm=00| +| KABS8 rt, ra | Saturated absolute value | VSGNX (v2 <= rt <= v15, v2 <= ra,rb <= v8, mm=01) | + +## 8-bit Unpacking instructions + +| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent | +| ------------------ | ------------------------- | ------------------- | +| SUNPKD810 rt, ra | Signed unpack bytes 1 & 0 | VMV (v16<= rt <= 23, v2 <= ra <= v7), mm=00| +| SUNPKD820 rt, ra | Signed unpack bytes 2 & 0 | | +| SUNPKD830 rt, ra | Signed unpack bytes 3 & 0 | | +| SUNPKD831 rt, ra | Signed unpack bytes 3 & 1 | | +| ZUNPKD810 rt, ra | Unsigned unpack bytes 1 & 0 | VMV (v24<= rt <= 31, v8 <= ra <= v15), mm=00| +| ZUNPKD820 rt, ra | Unsigned unpack bytes 2 & 0 | | +| ZUNPKD830 rt, ra | Unsigned unpack bytes 3 & 0 | | +| ZUNPKD831 rt, ra | Unsigned unpack bytes 3 & 1 | |