-# Comparative analysis with Andes Packed ISA proposal
-
-## Register file
-
-| Register | Andes ISA | Harmonised RVP ISA |
-| ------------------ | ------------------------- | ------------------- |
-| v0 | Hardwired zero | Hardwired zero |
-| v1 | 32bit GPR or Vector[4xB|2xH] | Predicate masks |
-| v2 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v3 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v4 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v5 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v6 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v7 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xSB] |
-| v8 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v9 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v10 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v11 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v12 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v13 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v14 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v15 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[4xUB] |
-| v16 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v17 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v18 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v19 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v20 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v21 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v22 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v23 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xSH] |
-| v24 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v25 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v26 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v27 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v28 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v29 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[2xUH] |
-| v30 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[1xUW] |
-| v31 | 32bit GPR or Vector[4xB|2xH] | 32bit GPR or Vector[1xUW] |
-
-
-
-| RADD16 rt, ra, rb | Signed Halving add | RADD (r16 <= rt,ra,rb <= r23), mm=00|
-| URADD16 rt, ra, rb | Unsigned Halving add | RADD (r24 <= rt,ra,rb <= r29), mm=00|
-| KADD16 rt, ra, rb | Signed Saturating add | VADD (r16 <= rt,ra,rb <= r23), mm=01|
-| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (r24 <= rt,ra,rb <= r29), mm=01|
-| SUB16 rt, ra, rb | Subtract | VSUB (r16 <= rt,ra,rb <= r29), mm=00|
-| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (r16 <= rt,ra,rb <= r23), mm=00|
+# Comparative analysis of Andes Packed ISA proposal vs Harmonised RVP
+
+Harmonised RVP is a proposal to provide SIMD functionality comparable to the Andes Packed SIMD ISA, but in a manner that is forwards compatible ("harmonised") with the RV Vector specification.
+
+An example use case is a string copy operation: using Harmonised RVP, code can use integer register SIMD instructions to copy a string. The same code can then execute unchanged on a full RV Vector processor and use the dedicated vector unit to copy the string. Harmonised RVP also provides upwards compatibility between RV32 and RV64 SIMD using this same approach.
+
+## Register file comparison
+
+The default Harmonised RVP GPR register file is divided into a lower bank of Vector[INT8] and an upper bank of Vector[INT16].
+In contrast, the Andes Packed SIMD ISA permits any GPR to be used for either INT8 or INT16 vector operations.
+
+| Register | Andes ISA | Harmonised RVP ISA |
+| ------------------ | ------------------------- | ------------------- |
+| v0 | Hardwired zero | Hardwired zero |
+| v1 | 32bit GPR or Vector[4xINT8 or 2xINT16] | Predicate mask |
+| | | |
+| v2 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v3 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v4 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v5 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v6 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v7 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xSINT8] |
+| v8 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v9 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v10 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v11 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v12 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v13 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v14 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| v15 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[4xUINT8] |
+| | | |
+| v16 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v17 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v18 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v19 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v20 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v21 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v22 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v23 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xSINT16] |
+| v24 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| v25 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| v26 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| v27 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| v28 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| v29 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[2xUINT16] |
+| | | |
+| v30 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[1xSINT32] |
+| v31 | 32bit GPR or Vector[4xINT8 or 2xINT16] | 32bit GPR or Vector[1xSINT32] |
+
+However, programmers may reconfigure the Harmonised RVP register file if the default configuration is unsuitable.
+To keep implementations simple and focused on within-register SIMD only, there is a strict 1:1 mapping between vectors (v0-v31) and integer registers (r0-r31).
+Programmers needing forwards compatibility with RV Vector implementations should use VLD and VST to load/store from vector registers (even though these are then mapped into integer registers).
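
The default bank layout in the table above can be captured in a small lookup, which is handy when checking whether a register operand is legal for a given element type. This is only a sketch: the names `DEFAULT_BANKS` and `default_layout` are hypothetical, and the ranges are transcribed from the Harmonised RVP column of the table.

```python
# Default Harmonised RVP banks: (first reg, last reg, lanes, lane width, signed).
# Values are transcribed from the register table; the names are illustrative.
DEFAULT_BANKS = [
    (2, 7, 4, 8, True),      # Vector[4xSINT8]
    (8, 15, 4, 8, False),    # Vector[4xUINT8]
    (16, 23, 2, 16, True),   # Vector[2xSINT16]
    (24, 29, 2, 16, False),  # Vector[2xUINT16]
    (30, 31, 1, 32, True),   # Vector[1xSINT32]
]


def default_layout(reg):
    """Return 'zero', 'predicate', or (lanes, width, signed) for v<reg>."""
    if reg == 0:
        return "zero"
    if reg == 1:
        return "predicate"
    for first, last, lanes, width, signed in DEFAULT_BANKS:
        if first <= reg <= last:
            return (lanes, width, signed)
    raise ValueError(f"no such register: v{reg}")
```

For example, `default_layout(20)` reports two signed 16-bit lanes, which is why the 16-bit signed rows in the tables that follow restrict their operands to v16-v23.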
+
+## Proposed Harmonised RVP vector op instruction encoding
+
+Register x 2 -> register operations:
+
+| 31 30 29 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| ----------------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| func_6 | 0 | rs2 | rs1 | 0 | mm | rd1 | VOP opcode |
+
+Immediate + register -> register operations:
+
+| 31 30 29 | 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| -------- | -------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| func_3 | imm[7:5] | 1 | imm[4:0] | rs1 | 0 | mm | rd1 | VOP opcode |
+
+Register x 3 -> register operations:
+
+| 31 30 29 28 27 | 26 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| -------------- | ----- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| rs3 | func_2 | rs2 | rs1 | 1 | mm | rd1 | VOP opcode |
+
+Values for the mm field (bits 13:12 above):
+
+* mm = 00 -> no predicate mask, and use current global saturation / rounding settings
+* mm = 01 -> no predicate mask, and force saturation or rounding for this instruction only
+* mm = 10 -> use v1 as predicate mask, and use global saturation / rounding settings
+* mm = 11 -> use ~v1 as predicate mask, and use global saturation / rounding settings
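
The three layouts above can be sketched as bit-packing helpers. The field positions follow the encoding tables. The function names are hypothetical, and the numeric values of the function fields and the VOP opcode are left as parameters because the proposal does not fix them here. The `MM_MEANING` table assumes mm=01 is the "force saturation or rounding" setting, which is how the mapping tables in this document use it.

```python
def pack_fields(fields):
    """Concatenate (value, width) pairs into a 32-bit word, MSB field first."""
    word = 0
    total = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
        total += width
    if total != 32:
        raise ValueError("fields must add up to 32 bits")
    return word


def encode_rr(func6, rs2, rs1, mm, rd, opcode):
    """Register x 2 -> register: bit 25 = 0, bit 14 = 0."""
    return pack_fields([(func6, 6), (0, 1), (rs2, 5), (rs1, 5),
                        (0, 1), (mm, 2), (rd, 5), (opcode, 7)])


def encode_imm(func3, imm8, rs1, mm, rd, opcode):
    """Immediate + register -> register: bit 25 = 1, imm split as [7:5] and [4:0]."""
    return pack_fields([(func3, 3), (imm8 >> 5, 3), (1, 1), (imm8 & 0x1F, 5),
                        (rs1, 5), (0, 1), (mm, 2), (rd, 5), (opcode, 7)])


def encode_r3(rs3, func2, rs2, rs1, mm, rd, opcode):
    """Register x 3 -> register: bit 14 = 1."""
    return pack_fields([(rs3, 5), (func2, 2), (rs2, 5), (rs1, 5),
                        (1, 1), (mm, 2), (rd, 5), (opcode, 7)])


def mm_of(word):
    """Extract the mm field (bits 13:12) from an encoded instruction."""
    return (word >> 12) & 0b11


# Assumed reading of the mm values (mm=01 taken from the mapping tables).
MM_MEANING = {
    0b00: "no predicate, global saturation/rounding",
    0b01: "no predicate, forced saturation/rounding",
    0b10: "predicate v1, global saturation/rounding",
    0b11: "predicate ~v1, global saturation/rounding",
}
```
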
## 16-bit Arithmetic
| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| ADD16 rt, ra, rb | Add | VADD (r16 <= rt,ra,rb <= r29), mm=00|
-| RADD16 rt, ra, rb | Signed Halving add | RADD (r16 <= rt,ra,rb <= r23), mm=00|
-| URADD16 rt, ra, rb | Unsigned Halving add | RADD (r24 <= rt,ra,rb <= r29), mm=00|
-| KADD16 rt, ra, rb | Signed Saturating add | VADD (r16 <= rt,ra,rb <= r23), mm=01|
-| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (r24 <= rt,ra,rb <= r29), mm=01|
-| SUB16 rt, ra, rb | Subtract | VSUB (r16 <= rt,ra,rb <= r29), mm=00|
-| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (r16 <= rt,ra,rb <= r23), mm=00|
-| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (r24 <= rt,ra,rb <= r29), mm=00|
-| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (r16 <= rt,ra,rb <= r23), mm=01|
-| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (r24 <= rt,ra,rb <= r29), mm=01|
+| ADD16 rt, ra, rb | Add | VADD (v16 <= rt,ra,rb <= v29), mm=00|
+| RADD16 rt, ra, rb | Signed Halving add | RADD (v16 <= rt,ra,rb <= v23), mm=00|
+| URADD16 rt, ra, rb | Unsigned Halving add | RADD (v24 <= rt,ra,rb <= v29), mm=00|
+| KADD16 rt, ra, rb | Signed Saturating add | VADD (v16 <= rt,ra,rb <= v23), mm=01|
+| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (v24 <= rt,ra,rb <= v29), mm=01|
+| SUB16 rt, ra, rb | Subtract | VSUB (v16 <= rt,ra,rb <= v29), mm=00|
+| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (v16 <= rt,ra,rb <= v23), mm=00|
+| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (v24 <= rt,ra,rb <= v29), mm=00|
+| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (v16 <= rt,ra,rb <= v23), mm=01|
+| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (v24 <= rt,ra,rb <= v29), mm=01|
| CRAS16 rt, ra, rb | Cross Add & Sub | |
| RCRAS16 rt, ra, rb | Signed Halving Cross Add & Sub | |
| URCRAS16 rt, ra, rb| Unsigned Halving Cross Add & Sub | |
| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| ADD8 rt, ra, rb | Add | VADD (r2 <= rt,ra,rb <= r15), mm=00 |
-| RADD8 rt, ra, rb | Signed Halving add | RADD (r2 <= rt,ra,rb <= r7), mm=00 |
-| URADD8 rt, ra, rb | Unsigned Halving add | RADD (r8 <= rt,ra,rb <= r15), mm=00 |
-| KADD8 rt, ra, rb | Signed Saturating add | VADD (r2 <= rt,ra,rb <= r7), mm=01 |
-| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (r8 <= rt,ra,rb <= r15), mm=01 |
-| SUB8 rt, ra, rb | Subtract | VSUB (r2 <= rt,ra,rb <= r15), mm=00 |
-| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (r2 <= rt,ra,rb <= r7), mm=00 |
-| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (r8 <= rt,ra,rb <= r15), mm=00 |
-| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (r2 <= rt,ra,rb <= r7), mm=01 |
-| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (r8 <= rt,ra,rb <= r15), mm=01 |
+| ADD8 rt, ra, rb | Add | VADD (v2 <= rt,ra,rb <= v15), mm=00 |
+| RADD8 rt, ra, rb | Signed Halving add | RADD (v2 <= rt,ra,rb <= v7), mm=00 |
+| URADD8 rt, ra, rb | Unsigned Halving add | RADD (v8 <= rt,ra,rb <= v15), mm=00 |
+| KADD8 rt, ra, rb | Signed Saturating add | VADD (v2 <= rt,ra,rb <= v7), mm=01 |
+| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (v8 <= rt,ra,rb <= v15), mm=01 |
+| SUB8 rt, ra, rb | Subtract | VSUB (v2 <= rt,ra,rb <= v15), mm=00 |
+| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (v2 <= rt,ra,rb <= v7), mm=00 |
+| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (v8 <= rt,ra,rb <= v15), mm=00 |
+| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (v2 <= rt,ra,rb <= v7), mm=01 |
+| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (v8 <= rt,ra,rb <= v15), mm=01 |
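
The difference between the plain, halving and saturating adds in the tables above can be illustrated with a lane-wise reference model on a 32-bit register. This is a behavioural sketch only: the function names are hypothetical, signedness is passed explicitly rather than derived from the register bank, and the `saturate` flag stands in for mm=01.

```python
def split_lanes(word, width, signed):
    """Split a 32-bit word into lane values, lowest lane first."""
    mask = (1 << width) - 1
    lanes = []
    for i in range(32 // width):
        value = (word >> (i * width)) & mask
        if signed and value >> (width - 1):
            value -= 1 << width  # reinterpret as two's complement
        lanes.append(value)
    return lanes


def join_lanes(lanes, width):
    """Pack lane values back into a 32-bit word, truncating each to width bits."""
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & mask) << (i * width)
    return word


def vadd(a, b, width, signed, saturate=False):
    """Lane-wise add; wraps by default, clamps to the lane range if saturate."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    out = []
    for x, y in zip(split_lanes(a, width, signed), split_lanes(b, width, signed)):
        total = x + y
        if saturate:
            total = max(lo, min(hi, total))
        out.append(total)
    return join_lanes(out, width)


def radd(a, b, width, signed):
    """Lane-wise halving add: (x + y) >> 1 computed without intermediate overflow."""
    out = [(x + y) >> 1
           for x, y in zip(split_lanes(a, width, signed), split_lanes(b, width, signed))]
    return join_lanes(out, width)
```

With 16-bit signed lanes, adding `0x7FFF0001` and `0x00010001` wraps the upper lane to `0x8000` in the plain add, clamps it to `0x7FFF` in the saturating add, and yields `0x4000` in the halving add.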
## 16-bit Shifts
| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (r16 <= rt,ra,rb <= r29), mm=00|
-| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (r16 <= rt,ra <= r29), mm=00|
-| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (r16 <= rt,ra,rb <= r29), mm=01|
-| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (r16 <= rt,ra <= r29), mm=01|
-| SRL16 rt, ra, rb | Shift right logical | VSRL (r16 <= rt,ra,rb <= r29), mm=00|
-| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (r16 <= rt,ra <= r29), mm=00|
-| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (r16 <= rt,ra,rb <= r29), mm=01|
-| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSLRI (r16 <= rt,ra <= r29), mm=01|
-| SLL16 rt, ra, rb | Shift left logical | VSLL (r16 <= rt,ra,rb <= r29), mm=00|
-| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (r16 <= rt,ra <= r29), mm=00|
-| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (r16 <= rt,ra,rb <= r29), mm=01|
-| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (r16 <= rt,ra <= r29), mm=01|
+| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=00|
+| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=00|
+| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=01|
+| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=01|
+| SRL16 rt, ra, rb | Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=00|
+| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=00|
+| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=01|
+| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=01|
+| SLL16 rt, ra, rb | Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=00|
+| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=00|
+| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=01|
+| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=01|
| KSLRA16 rt, ra, rb | Saturating Shift left logical or Shift right arithmetic ||
| KSLRA16.u rt, ra, rb | Saturating Shift left logical or Rounding Shift right arithmetic ||
| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| n/a | Shift right arithmetic | VSRA (r2 <= rt,ra,rb <= r15), mm=00|
-| n/a | Shift right arithmetic imm | VSRAI (r2 <= rt,ra <= r15), mm=00|
-| n/a | Rounding Shift right arithmetic | VSRA (r2 <= rt,ra,rb <= r15), mm=01|
-| n/a | Rounding Shift right arithmetic imm | VSRAI (r2 <= rt,ra <= r15), mm=01|
-| n/a | Shift right logical | VSRL (r2 <= rt,ra,rb <= r15), mm=00|
-| n/a | Shift right logical imm | VSRLI (r2 <= rt,ra <= r15), mm=00|
-| n/a | Rounding Shift right logical | VSRL (r2 <= rt,ra,rb <= r15), mm=01|
-| n/a | Rounding Shift right logical imm | VSLRI (r2 <= rt,ra <= r15), mm=01|
-| n/a | Shift left logical | VSLL (r2 <= rt,ra,rb <= r15), mm=00|
-| n/a | Shift left logical imm | VSLLI (r2 <= rt,ra <= r15), mm=00|
-| n/a | Saturating Shift left logical | VSLL (r2 <= rt,ra,rb <= r15), mm=01|
-| n/a | Saturating Shift left logical imm | VSLLI (r2 <= rt,ra <= r15), mm=01|
+| n/a | Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=00|
+| n/a | Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=00|
+| n/a | Rounding Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=01|
+| n/a | Rounding Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=01|
+| n/a | Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=00|
+| n/a | Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=00|
+| n/a | Rounding Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=01|
+| n/a | Rounding Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=01|
+| n/a | Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=00|
+| n/a | Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=00|
+| n/a | Saturating Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=01|
+| n/a | Saturating Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=01|
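
The rounding variants (mm=01) of the right shifts can be modelled as below. This sketch makes two assumptions that the tables do not state: rounding is taken to mean adding half of the discarded range before shifting (round to nearest), and a single scalar shift amount is applied to every lane. The function name `vsra` mirrors the mnemonic but is otherwise illustrative.

```python
def vsra(word, shamt, width, rounding=False):
    """Lane-wise arithmetic shift right on a 32-bit word.

    Assumption: rounding adds 1 << (shamt - 1) before shifting, so the
    most significant discarded bit rounds the result up.
    """
    mask = (1 << width) - 1
    out = 0
    for i in range(32 // width):
        value = (word >> (i * width)) & mask
        if value >> (width - 1):
            value -= 1 << width  # sign-extend the lane
        if rounding and shamt > 0:
            value = (value + (1 << (shamt - 1))) >> shamt
        else:
            value >>= shamt
        out |= (value & mask) << (i * width)
    return out
```

For 16-bit lanes holding 7 and -7, a shift by one gives 3 and -4 without rounding, and 4 and -3 with rounding.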
## 16-bit Comparison instructions
| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (r16 <= rt,ra,rb <= r29), mm=00|
-| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (r16 <= rt,ra,rb <= r23), mm=00|
-| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (r16 <= rt,ra,rb <= r23), mm=00|
-| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (r24 <= rt,ra,rb <= r29), mm=00|
-| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (r24 <= rt,ra,rb <= r29), mm=00|
+| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (v16 <= rt,ra,rb <= v29), mm=00|
+| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (v16 <= rt,ra,rb <= v23), mm=00|
+| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (v16 <= rt,ra,rb <= v23), mm=00|
+| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (v24 <= rt,ra,rb <= v29), mm=00|
+| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (v24 <= rt,ra,rb <= v29), mm=00|
## 8-bit Comparison instructions
| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (r2 <= rt,ra,rb <= r7), mm=00|
-| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (r2 <= rt,ra,rb <= r7), mm=00|
-| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (r2 <= rt,ra,rb <= r7), mm=00|
-| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (r8 <= rt,ra,rb <= r15), mm=00|
-| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (r8 <= rt,ra,rb <= r15), mm=00|
+| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (v2 <= rt,ra,rb <= v7), mm=00|
+| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (v2 <= rt,ra,rb <= v7), mm=00|
+| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (v2 <= rt,ra,rb <= v7), mm=00|
+| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (v8 <= rt,ra,rb <= v15), mm=00|
+| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (v8 <= rt,ra,rb <= v15), mm=00|
## 16-bit Miscellaneous instructions
| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------ | ------------------- |
-| SMIN16 rt, ra, rb | Signed minimum | VMIN (r16 <= rt,ra,rb <= r23), mm=00|
-| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (r24 <= rt,ra,rb <= r29), mm=00|
-| SMAX16 rt, ra, rb | Signed maximum | VMAX (r16 <= rt,ra,rb <= r23), mm=00|
-| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (r24 <= rt,ra,rb <= r29), mm=00|
-| SCLIP16 rt, ra, rb | Signed clip | ?VCLIP (r16 <= rt,ra,rb <= r23), mm=01|
-| UCLIP16 rt, ra, rb | Unsigned clip | ?VCLIP (r24 <= rt,ra,rb <= r29), mm=01|
-| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (r16 <= rt,ra,rb <= r23), mm=01|
+| SMIN16 rt, ra, rb | Signed minimum | VMIN (v16 <= rt,ra,rb <= v23), mm=00|
+| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (v24 <= rt,ra,rb <= v29), mm=00|
+| SMAX16 rt, ra, rb | Signed maximum | VMAX (v16 <= rt,ra,rb <= v23), mm=00|
+| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (v24 <= rt,ra,rb <= v29), mm=00|
+| SCLIP16 rt, ra, im | Signed clip | ?VCLIP (v16 <= rt,ra,rb <= v23), mm=01|
+| UCLIP16 rt, ra, im | Unsigned clip | ?VCLIP (v24 <= rt,ra,rb <= v29), mm=01|
+| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (v16 <= rt,ra,rb <= v23), mm=01|
| KMULX16 rt, ra, rb | Signed crossed multiply 16x16->16 | |
-| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (30 <= rt <= 31, r16 <= ra,rb <= r23), mm=00|
+| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (v30 <= rt <= v31, v16 <= ra,rb <= v23), mm=00|
| SMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | |
-| UMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (30 <= rt <= 31, r24 <= ra,rb <= r31), mm=00|
+| UMUL16 rt, ra, rb | Unsigned multiply 16x16->32 | VMUL (v30 <= rt <= v31, v24 <= ra,rb <= v31), mm=00|
| UMULX16 rt, ra, rb | Unsigned crossed multiply 16x16->32 | |
-| KABS16 rt, ra, rb | Saturated absolute value | VSGNX (r16 <= rt <= r29, r16 <= ra,rb <= r23, mm=01) |
+| KABS16 rt, ra | Saturated absolute value | VSGNX (v16 <= rt <= v29, v16 <= ra,rb <= v23, mm=01) |
## 8-bit Miscellaneous instructions
| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
-| SMIN8 rt, ra, rb | Signed minimum | VMIN (r2 <= rt,ra,rb <= r7), mm=00|
-| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (r8 <= rt,ra,rb <= r15), mm=00|
-| SMAX8 rt, ra, rb | Signed maximum | VMAX (r2 <= rt,ra,rb <= r7), mm=00|
-| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (r8 <= rt,ra,rb <= r15), mm=00|
-| KABS8 rt, ra, rb | Saturated absolute value | VSGNX (r2 <= rt <= r15, r2 <= ra,rb <= r8, mm=01) |
+| SMIN8 rt, ra, rb | Signed minimum | VMIN (v2 <= rt,ra,rb <= v7), mm=00|
+| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (v8 <= rt,ra,rb <= v15), mm=00|
+| SMAX8 rt, ra, rb | Signed maximum | VMAX (v2 <= rt,ra,rb <= v7), mm=00|
+| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (v8 <= rt,ra,rb <= v15), mm=00|
+| KABS8 rt, ra | Saturated absolute value | VSGNX (v2 <= rt <= v15, v2 <= ra,rb <= v7, mm=01) |
+
+## 8-bit Unpacking instructions
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| SUNPKD810 rt, ra | Signed unpack bytes 1 & 0 | VMV (v16 <= rt <= v23, v2 <= ra <= v7), mm=00|
+| SUNPKD820 rt, ra | Signed unpack bytes 2 & 0 | |
+| SUNPKD830 rt, ra | Signed unpack bytes 3 & 0 | |
+| SUNPKD831 rt, ra | Signed unpack bytes 3 & 1 | |
+| ZUNPKD810 rt, ra | Unsigned unpack bytes 1 & 0 | VMV (v24 <= rt <= v31, v8 <= ra <= v15), mm=00|
+| ZUNPKD820 rt, ra | Unsigned unpack bytes 2 & 0 | |
+| ZUNPKD830 rt, ra | Unsigned unpack bytes 3 & 0 | |
+| ZUNPKD831 rt, ra | Unsigned unpack bytes 3 & 1 | |
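
The unpack operations in the table above widen two selected bytes of a register into two 16-bit halves. A reference model, assuming the first byte index in the mnemonic lands in the upper half and the second in the lower half, might look like this (the function name `unpack8` is hypothetical):

```python
def unpack8(word, hi_byte, lo_byte, signed):
    """Widen two bytes of a 32-bit word into two 16-bit lanes.

    hi_byte and lo_byte are byte indices (0 = least significant).
    signed=True sign-extends each byte (SUNPKD8xy), signed=False
    zero-extends it (ZUNPKD8xy).
    """
    def extend(index):
        byte = (word >> (8 * index)) & 0xFF
        if signed and byte & 0x80:
            byte -= 0x100  # sign-extend
        return byte & 0xFFFF

    return (extend(hi_byte) << 16) | extend(lo_byte)
```

For instance, `unpack8(0x1234F680, 1, 0, signed=True)` models SUNPKD810 and `unpack8(0x1234F680, 3, 1, signed=False)` models ZUNPKD831.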