--- /dev/null
+# 16 bit Compressed
+
+This one is a conundrum. The OpenPOWER ISA was never designed with 16
+bit in mind. VLE was added 10 years ago, but only by way of marking
+an entire 64k page as "VLE". With no means to mix 32 bit and 16 bit
+within a page, jumping between the two would have been painful and
+taken up space.
+
+Here, in order to embed 16 bit into a predominantly 32 bit stream,
+using an entire 16 bits just to switch into Compressed mode is itself
+a significant overhead. The situation is made worse by 5 bits being
+taken up by Major Opcode space, leaving only 11 bits to allocate to
+actual instructions.
+
+In addition we would like to add SV-C32 which is a Vectorised version
+of 16 bit Compressed, and ideally have a variant that adds the 27-bit
+prefix format from SV-P64, as well.
+
+Potential ways to reduce pressure on the 16 bit space are:
+
+* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
+* To enter "16 bit mode" for durations specified at the start
+* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained
+
+In the Vector context this last bit could be given an alternative
+meaning: it would determine whether the instruction is 11-bit
+prefixed or 27-bit prefixed:
+
+ 0 1 2 3 4 5 6 7 8 9 a b c d e f |
+ |major op | 11 bit vector prefix|
+ |16 bit opcode alt vec. mode ^ |
+ | extra vector prefix if alt set|
+
+Using a major opcode to enter 16 bit mode leaves 11 bits that still
+need a use:
+
+ 0 1 2 3 4 5 6 7 8 9 a b c d e f |
+ |major op | what to do here 1 |
+ |16 bit stay in 16bit mode 1 |
+ |16 bit stay in 16bit mode 1 |
+ |16 bit exit 16bit mode 0 |
+
+One possibility is that the 11 bits are used for bank selection, with
+some room for additional context such as altering which scalar
+registers the 16 bit operations use (register bank selection).
+
+Another is to use the 11 bits for only the most commonly used
+instructions. In that case one of those 11 bits would also need to be
+dedicated to saying whether 16 bit mode is to be continued, so only
+10 bits remain for actual opcodes!
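
The "stay in 16 bit mode" bit described above can be sketched as a small stream walker. This is only an illustration under stated assumptions: bits are numbered 0 to f with bit 0 as the most significant bit (as in the diagrams), bit f is taken to be the continuation bit, and the helper names `bit` and `compressed_run_length` are hypothetical.

```python
def bit(halfword: int, index: int) -> int:
    """Return bit `index` of a 16 bit value, where bit 0 is the MSB."""
    return (halfword >> (15 - index)) & 1


def compressed_run_length(halfwords: list[int]) -> int:
    """Count halfwords consumed from the entry instruction until 16 bit
    mode is left. Assumption: the halfword whose bit f is 0 is itself
    still a 16 bit instruction, and 32 bit decoding resumes after it."""
    consumed = 0
    for hw in halfwords:
        consumed += 1
        if bit(hw, 0xF) == 0:  # continuation bit clear: exit after this one
            break
    return consumed
```

With an entry halfword followed by two "stay" halfwords and one "exit" halfword, the run covers four halfwords, matching the four rows of the diagram.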
+
+# Opcode Allocation Ideas
+
+## Opcodes exploration (Attempt 1)
+
+### Branch
+
+10 bit mode may be expanded by 16 bit mode later, adding capabilities
+that do not fit in the extremely limited space.
+
+ | 0 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
+ | offs2 | | 0 0 0 | offs | LK | 1 | b
+ | BO2 | BI3 | | 0 0 1 | 00 | BI | BO | LK | 1 | bclr
+ | BO2 | BI3 | | 0 0 1 | 01 | BI | BO | LK | 1 | bctar
+
+16 bit mode:
+
+* offs2 extends offset in MSBs
+* BI3 extends BI in MSBs to allow selection of full CR
+* BO2 extends BO
+
+10 bit mode:
+
+* BO[0] enables CR check, BO[1] inverts check
+* BI refers to the 4 bits of CR0 only
+* no Branch Conditional with immediate
+* no Absolute Address
+* no CTR mode (and no bctr)
+* offs is signed and 2 byte aligned
+* all branch targets are 2 byte aligned
+
+### LD/ST
+
+ | 0 | 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
+ | F | RA2 | RT | | 0 0 1 | 11 | RA | RB | 0 | 1 | ld
+ | F | RT2 | RB | | 0 0 1 | 11 | RA | RT | 1 | 1 | st
+
+* elwidth overrides can set different widths
+
+16 bit mode:
+
+* F=1 is FLD, FST
+* RA2 extends RA to 3 bits (MSB)
+* RT2 extends RT to 3 bits (MSB)
+
+10 bit mode:
+
+* RA and RB are only 2 bits (0-3)
+* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
+* for ST, there is no offset: st RT, RA(0)
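
A minimal decoding sketch for the 10 bit LD/ST rows above, assuming bit 0 is the most significant bit of the halfword. The helper names `field` and `decode_ldst_10bit` and the dictionary layout are hypothetical, chosen only to show the implicit RT=RB rule for loads and the missing offset for stores.

```python
def field(halfword: int, first: int, last: int) -> int:
    """Extract bits first..last (inclusive) with bit 0 as the MSB."""
    width = last - first + 1
    return (halfword >> (15 - last)) & ((1 << width) - 1)


def decode_ldst_10bit(halfword: int):
    """Decode the 10 bit ld/st encodings, or return None if no match."""
    if field(halfword, 5, 7) != 0b001 or field(halfword, 8, 9) != 0b11:
        return None
    ra = field(halfword, 0xA, 0xB)   # 2 bit RA, registers 0-3
    reg = field(halfword, 0xC, 0xD)  # RB for ld, RT for st
    if field(halfword, 0xE, 0xE) == 0:
        # ld: RT is implicitly the same register as RB
        return {"op": "ld", "RT": reg, "RA": ra, "RB": reg}
    # st: no offset, so only RT and RA are encoded
    return {"op": "st", "RT": reg, "RA": ra}
```

For example, `decode_ldst_10bit(0x01E5)` yields a load with RA=2 and RT=RB=1.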
+
+### Arithmetic
+
+ | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
+ | | | | 0 1 0 | RB | RA | 0 | 1 | add
+ | | | | 0 1 0 | RB | RA | 1 | 1 | mul
+ | | | | 0 1 1 | RB | (RA|0)| 0 | 1 | sub
+ | | | | 0 1 1 | RB | (RA|0)| 1 | 1 | cmp
+
+10 bit mode:
+
+* cmp default target is CR0
+* for (RA|0) when RA=0 the input is a zero immediate,
+ meaning that sub becomes neg, and cmp becomes cmp-against-zero
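
The (RA|0) rule can be illustrated with a small decoder for the four arithmetic rows. This is a sketch under assumptions: bit 0 is the most significant bit, and the function names and returned mnemonic strings are hypothetical labels rather than settled assembler syntax.

```python
def field(halfword: int, first: int, last: int) -> int:
    """Extract bits first..last (inclusive) with bit 0 as the MSB."""
    width = last - first + 1
    return (halfword >> (15 - last)) & ((1 << width) - 1)


def decode_arith_10bit(halfword: int):
    """Return (mnemonic, RB, RA) for the 10 bit arithmetic group."""
    group = field(halfword, 5, 7)
    if group not in (0b010, 0b011):
        return None
    rb = field(halfword, 8, 0xA)
    ra = field(halfword, 0xB, 0xD)
    alt = field(halfword, 0xE, 0xE)  # bit e picks the second op of the pair
    if group == 0b010:
        return ("mul" if alt else "add", rb, ra)
    name = "cmp" if alt else "sub"
    if ra == 0:
        # (RA|0): RA=0 means a zero immediate, not register 0
        name = "cmp-against-zero" if alt else "neg"
    return (name, rb, ra)
```

Here `0x0275` decodes as an add with RB=3 and RA=5, while `0x0341` (a sub encoding with RA=0) decodes as neg.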
+
+### Logical
+
+ | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
+ | | | | 1 0 0 | RB | RA | 0 | 1 | and
+ | | | | 1 0 0 | RB | RA | 1 | 1 | nand
+ | | | | 1 0 1 | RB | RA | 0 | 1 | or
+ | | | | 1 0 1 | RB | (RA|0)| 1 | 1 | nor
+
+10 bit mode:
+
+* for (RA|0) when RA=0 the input is a zero immediate,
+ meaning that nor becomes not
+
+### Floating Point
+
+ | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
+ | | RT | | 1 1 0 | RB | RA!=0 | 0 | 1 | fadd
+ | | RT | | 1 1 0 | RB | 0 0 0 | 0 | 1 | fabs
+ | | RT | | 1 1 0 | RB | RA | 1 | 1 | fmul
+ | | RT | | 1 1 1 | RB | (RA|0)| 0 | 1 | fsub
+ | | RT | | 1 1 1 | RB | (RA|0)| 1 | 1 | fcmp
+
+10 bit mode:
+
+* fcmp default target is CR1
+* for (RA|0) when RA=0 the input is a zero immediate,
+ meaning that fsub becomes fneg, and fcmp becomes fcmp-against-zero
+
+### Condition Register
+
+ | 0 1 2 3 | 4 | | 5 6 7 | 8 9 | a b | c d e | f |
+ | 0 0 0 0 | BF2 | | 0 0 1 | 10 | BF | BFA | 1 | mcrf
+
+10 bit mode:
+
+* BF is only 2 bits which means the destination is only CR0-CR3
+
# 16 bit Compressed
-This one is a conundrum. OpenPOWER ISA was never designed with 16
-bit in mind. VLE was added 10 years ago but only by way of marking
-an entire 64k page as "VLE". With no means to mix 32 bit and 16 bit,
-jumping between the two would have been painful and taken up space.
-
-Here, in order to embed 16 bit into a predominantly 32 bit stream the
-overhead of using an entire 16 bits just to switch into Compressed mode
-is itself a significant overhead. The situation is made worse by 5 bits
-being taken up by Major Opcode space, leaving only 11 bits to allocate
-to actual instructions.
-
-In addition we would like to add SV-C32 which is a Vectorised version
-of 16 bit Compressed, and ideally have a variant that adds the 27-bit
-prefix format from SV-P64, as well.
-
-Potential ways to reduce pressure on the 16 bit space are:
-
-* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
-* To enter "16 bit mode" for durations specified at the start
-* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained
-
-This latter would be useful in the Vector context to have an alternative
-meaning: as the bit which determines whether the instruction is 11-bit
-prefixed or 27-bit prefixed:
-
- 0 1 2 3 4 5 6 7 8 9 a b c d e f |
- |major op | 11 bit vector prefix|
- |16 bit opcode alt vec. mode ^ |
- | extra vector prefix if alt set|
-
-Using a major opcode to enter 16 bit mode, leaves 11 bits to find
-something to use them for:
-
- 0 1 2 3 4 5 6 7 8 9 a b c d e f |
- |major op | what to do here 1 |
- |16 bit stay in 16bit mode 1 |
- |16 bit stay in 16bit mode 1 |
- |16 bit exit 16bit mode 0 |
-
-One possibility is that the 11 bits are used for bank selection, with
-some room for additional context such as altering the registers used
-for the 16 bit operations (bank selection of which scalar regs)
-
-Another is to use the 11 bits for only the utmost commonly used
-instructions. That being the case then even one of those 11 bits would
-also need to be dedicated to saying if 16 bit mode is to be continued.
-10 bits remain for actual opcodes!
-
-## 16 bit Compressed opcodes exploration
-
-### Branch
-
-10 bit mode may be expanded by 16 bit mode later, adding capabilities
-that do not fit in the extreme limited space.
-
- | 0 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
- | offs2 | | 0 0 0 | offs | LK | 1 | b
- | BO2 | BI3 | | 0 0 1 | 00 | BI | BO | LK | 1 | bclr
- | BO2 | BI3 | | 0 0 1 | 01 | BI | BO | LK | 1 | bctar
-
-16 bit mode:
-
-* offs2 extends offset in MSBs
-* BI3 extends BI in MSBs to allow selection of full CR
-* BO2 extends BO
-
-10 bit mode:
-
-* BO[0] enables CR check, BO[1] inverts check
-* BI refers to CR0 only (4 bits of)
-* no Branch Conditional with immediate
-* no Absolute Address
-* no CTR mode (and no bctr)
-* offs is to 2 byte (signed) aligned
-* all branches to 2 byte aligned
-
-### LD/ST
-
- | 0 | 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
- | F | RA2 | RT | | 0 0 1 | 11 | RA | RB | 0 | 1 | ld
- | F | RT2 | RB | | 0 0 1 | 11 | RA | RT | 1 | 1 | st
-
-* elwidth overrides can set different widths
-
-16 bit mode:
-
-* F=1 is FLD, FST
-* RA2 extends RA to 3 bits (MSB)
-* RT2 extends RT to 3 bits (MSB)
-
-10 bit mode:
-
-* RA and RB are only 2 bit (0-3)
-* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
-* for ST, there is no offset: st RT, RA(0)
-
-### Arithmetic
-
- | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
- | | | | 0 1 0 | RB | RA | 0 | 1 | add
- | | | | 0 1 0 | RB | RA | 1 | 1 | mul
- | | | | 0 1 1 | RB | (RA|0)| 0 | 1 | sub
- | | | | 0 1 1 | RB | (RA|0)| 1 | 1 | cmp
-
-10 bit mode:
-
-* cmp default target is CR0
-* for (RA|0) when RA=0 the input is a zero immediate,
- meaning that sub becomes neg, and cmp becomes cmp-against-zero
-
-### Logical
-
- | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
- | | | | 1 0 0 | RB | RA | 0 | 1 | and
- | | | | 1 0 0 | RB | RA | 1 | 1 | nand
- | | | | 1 0 1 | RB | RA | 0 | 1 | or
- | | | | 1 0 1 | RB | (RA|0)| 1 | 1 | nor
-
-10 bit mode:
-
-* for (RA|0) when RA=0 the input is a zero immediate,
- meaning that nor becomes not
-
-### Floating Point
-
- | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
- | | RT | | 1 1 0 | RB | RA!=0 | 0 | 1 | fadd
- | | RT | | 1 1 0 | RB | 0 0 0 | 0 | 1 | fabs
- | | RT | | 1 1 0 | RB | RA | 1 | 1 | fmul
- | | RT | | 1 1 1 | RB | (RA|0)| 0 | 1 | fsub
- | | RT | | 1 1 1 | RB | (RA|0)| 1 | 1 | fcmp
-
-10 bit mode:
-
-* fcmp default target is CR1
-* for (RA|0) when RA=0 the input is a zero immediate,
- meaning that fsub becomes fneg, and fcmp becomes fcmp-against-zero
-
-### Condition Register
-
- | 0 1 2 3 | 4 | | 5 6 7 | 8 9 | a b | c d e | f |
- | 0 0 0 0 | BF2 | | 0 0 1 | 10 | BF | BFA | 1 | mcrf
-
-10 bit mode:
-
-* BF is only 2 bits which means the destination is only CR0-CR3
+See [[16_bit_compressed]]