add ls001.po9 RFC

diff --git a/openpower/sv/major_opcode_allocation.mdwn b/openpower/sv/major_opcode_allocation.mdwn

index 2e5b7cc02f6e984f3b1d370540851deb68b0d289..f94733e8391450430f4ca77417ae255706fa8c30 100644
--- a/openpower/sv/major_opcode_allocation.mdwn
+++ b/openpower/sv/major_opcode_allocation.mdwn
@@ -15,14 +15,94 @@ This **only** in "LibreSOC Mode".  Candidates for moving elsewhere
  include mulli, twi and tdi.
  
  * 2 opcodes for 16-bit Compressed instructions with 11 bits available
-* 2 opcodes are required in order to give SV-P48 (and SV-C32) the 11 bits needed for prefixing
-* 2 opcodes are likewise required for SV-P64 (and SV-C48) to have 27 bits available
-* 2 opcodes for SV VBLOCK
+* 2 opcodes are required in order to give SV-P48 the 11 bits needed for prefixing
+* 2 opcodes are likewise required for SV-P64 to have 27 bits available
+* 2 opcodes for SV-C32 and SV-C48 (32 bit versions of P48 and P64)
  
  With only 11 bits for 16-bit Compressed, it may be better to use the
-opportunity to switch into "16 bit mode".  Interestingly SV-P32 could
+opportunity to switch into "16 bit mode".  Interestingly SV-C32 could
  likewise switch into the same.
  
+VBLOCK can be added later by using further VSX dedicated major opcodes
+(EXT62, EXT60)
+
+* EXT00 - unused (one instruction: attn)
+* EXT01 - v3.1B prefix
+* EXT02 - tdi
+* EXT03 - twi
+* EXT04 - vector/bcd
+* EXT05 - unused
+* EXT06 - vector
+* EXT07 - mulli
+* EXT09 - reserved
+* EXT17 - unused (2 instructions: sc, scv)
+* EXT22 - reserved sandbox
+* EXT46 - lmw
+* EXT47 - stmw
+* EXT56 - lq
+* EXT57 - vector ld
+* EXT58 - ld (leave ok)
+* EXT59 - FP (leave ok)
+* EXT60 - vector
+* EXT61 - st (leave ok)
+* EXT62 - vector st
+* EXT63 - FP (leave ok)
+
+Potential allocations:
+
+    |  hword 0   |  hword 1   |  hword 2   |  hword 3   |
+    EXT00/01 - C 10bit -> 16bit
+    EXT60/62 - VBLOCK
+    EXT09/17 - SV-C32 and other SV-C
+    EXT06/07 - SV-C32-Swizzle and other SV-C-Swizzle
+    EXT02/03 - SV-P48
+    EXT04/05 - SV-P64
+    EXT56/57 - Predicated SV-P48
+    EXT46/47 - Predicated SV-P64
+
+Spare:
+
+* EXT22
+
+## C10/16 FSM
+
+    if EXT == 00/01:
+         start @ 10bit
+    if state==10bit:
+         if bit15:
+             next = 16bit
+         else:
+             next = Standard
+    if state==16bit:
+         if bit0 & bit15:
+             insn = C.immediate
+         if ~bit15:
+             if ~bit0:
+                 next = Standard
+             else:
+                 next = Standard.then.16bit
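
As a rough illustration, the C10/16 state machine above could be modelled in Python as follows. This is only a sketch under stated assumptions: the names `State`, `bit`, `major_opcode`, `step` and `is_c_immediate` are made up here, bits are numbered MSB0 within the 16-bit halfword (so bit 15 is the least significant bit), and remaining in 16-bit mode when bit 15 is set is an assumption, since the pseudocode leaves that case implicit.

```python
from enum import Enum


class State(Enum):
    STANDARD = "Standard"
    C10 = "10bit"
    C16 = "16bit"
    STANDARD_THEN_C16 = "Standard.then.16bit"


def bit(hword, n):
    """Return bit n of a 16-bit halfword, MSB0 numbering (assumed)."""
    return (hword >> (15 - n)) & 1


def major_opcode(hword):
    """Major opcode: bits 0:5 of the first halfword."""
    return (hword >> 10) & 0x3F


def step(state, hword):
    """Return (mode used to decode hword, state for the next instruction)."""
    if state is State.STANDARD_THEN_C16:
        # One Standard instruction, then back to 16-bit mode.
        return State.STANDARD, State.C16
    if state is State.STANDARD:
        if major_opcode(hword) not in (0, 1):
            return State.STANDARD, State.STANDARD
        state = State.C10  # EXT00/01: start at 10-bit
    if state is State.C10:
        return state, (State.C16 if bit(hword, 15) else State.STANDARD)
    # Remaining case: state is State.C16
    if bit(hword, 15):
        return state, State.C16  # assumption: bit 15 set means "stay"
    if bit(hword, 0):
        return state, State.STANDARD_THEN_C16
    return state, State.STANDARD


def is_c_immediate(state, hword):
    """In 16-bit mode, bit0 & bit15 selects the C.immediate form."""
    return state is State.C16 and bool(bit(hword, 0) & bit(hword, 15))
```

Here `hword` is the first halfword of whichever instruction is being decoded; for a Standard instruction the second halfword plays no part in the state transition.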
+
+## SV-Compressed FSM
+
+    if EXT == 09/17:
+        if bit0:
+             SV.mode = 
+
+# Major opcode map
+
+Table 9: Primary Opcode Map (opcode bits 0:5)
+
+        |  000   |   001 |  010  | 011   |  100  |    101 |  110  |  111
+    000 |        |       |  tdi  | twi   | EXT04 |        |       | mulli | 000
+    001 | subfic |       | cmpli | cmpi  | addic | addic. | addi  | addis | 001
+    010 | bc/l/a | EXT17 | b/l/a | EXT19 | rlwimi| rlwinm |       | rlwnm | 010
+    011 |  ori   | oris  | xori  | xoris | andi. | andis. | EXT30 | EXT31 | 011
+    100 |  lwz   | lwzu  | lbz   | lbzu  | stw   | stwu   | stb   | stbu  | 100
+    101 |  lhz   | lhzu  | lha   | lhau  | sth   | sthu   | lmw   | stmw  | 101
+    110 |  lfs   | lfsu  | lfd   | lfdu  | stfs  | stfsu  | stfd  | stfdu | 110
+    111 |  lq    | EXT57 | EXT58 | EXT59 | EXT60 | EXT61  | EXT62 | EXT63 | 111
+        |  000   |   001 |   010 |  011  |   100 |   101  | 110   |  111
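
To read the table, split the 6-bit major opcode into two 3-bit halves: bits 0:2 select the row and bits 3:5 select the column. A small sketch of that lookup follows; the function name `opcode_map_position` is hypothetical and exists only for illustration.

```python
def opcode_map_position(major):
    """Return (row, column) bit strings for a 6-bit major opcode."""
    if not 0 <= major <= 0x3F:
        raise ValueError("major opcode must fit in 6 bits")
    row = major >> 3        # opcode bits 0:2 (most significant)
    col = major & 0b111     # opcode bits 3:5 (least significant)
    return format(row, "03b"), format(col, "03b")
```

For example, `opcode_map_position(58)` gives row `111`, column `010`, which is where EXT58 sits in the table.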
+
  # LE/BE complications.
  
  See <https://bugs.libre-soc.org/show_bug.cgi?id=529> for discussion
@@ -56,152 +136,24 @@ With the Major Opcode then always being in the 1st 2 bytes it becomes
  much simpler for the pre-analysis phase to determine instruction length,
  regardless of what that length is (16/32/48/64/VBLOCK).
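
A minimal sketch of such a pre-analysis step is given below. It assumes BE instruction order, so that the major opcode is the top 6 bits of the first byte, and it borrows its lengths from the "Potential allocations" list, which is only a proposal. The table `REALLOCATED` and the function `insn_length_bits` are hypothetical names; opcodes whose length is not fixed by that list are mapped to `None`, meaning further decoding would be needed.

```python
from typing import Optional

# Hypothetical lengths in bits, taken from "Potential allocations".
# None: the list does not fix a single length for this opcode pair.
REALLOCATED = {
    0: 16, 1: 16,        # C 10bit -> 16bit
    2: 48, 3: 48,        # SV-P48
    4: 64, 5: 64,        # SV-P64
    6: None, 7: None,    # SV-C32-Swizzle and other SV-C-Swizzle
    9: None, 17: None,   # SV-C32 and other SV-C
    46: None, 47: None,  # Predicated SV-P64
    56: None, 57: None,  # Predicated SV-P48
    60: None, 62: None,  # VBLOCK
}


def insn_length_bits(first_two_bytes: bytes) -> Optional[int]:
    """Guess the instruction length from its first two bytes (BE order)."""
    if len(first_two_bytes) < 2:
        raise ValueError("need at least the first two bytes")
    major = first_two_bytes[0] >> 2   # opcode bits 0:5
    # Anything not reallocated is treated as a standard 32-bit instruction.
    return REALLOCATED.get(major, 32)
```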
  
-# 16 bit Compressed
-
-This one is a conundrum.  OpenPOWER ISA was never designed with 16
-bit in mind.  VLE was added 10 years ago but only by way of marking
-an entire 64k page as "VLE".  With no means to mix 32 bit and 16 bit,
-jumping between the two would have been painful and taken up space.
-
-Here, in order to embed 16 bit into a predominantly 32 bit stream the
-overhead of using an entire 16 bits just to switch into Compressed mode
-is itself a significant overhead.  The situation is made worse by 5 bits
-being taken up by Major Opcode space, leaving only 11 bits to allocate
-to actual instructions.
-
-In addition we would like to add SV-C32 which is a Vectorised version
-of 16 bit Compressed, and ideally have a variant that adds the 27-bit
-prefix format from SV-P64, as well.
-
-Potential ways to reduce pressure on the 16 bit space are:
-
-* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
-* To enter "16 bit mode" for durations specified at the start
-* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained
-
-This latter would be useful in the Vector context to have an alternative
-meaning: as the bit which determines whether the instruction is 11-bit
-prefixed or 27-bit prefixed:
-
-    0 1 2 3 4 5 6 7 8 9 a b c d e f |
-    |major op | 11 bit vector prefix|
-    |16 bit opcode  alt vec. mode ^ |
-    | extra vector prefix if alt set|
-
-Using a major opcode to enter 16 bit mode, leaves 11 bits to find
-something to use them for:
-
-    0 1 2 3 4 5 6 7 8 9 a b c d e f |
-    |major op | what to do here   1 |
-    |16 bit    stay in 16bit mode 1 |
-    |16 bit    stay in 16bit mode 1 |
-    |16 bit       exit 16bit mode 0 |
-
-One possibility is that the 11 bits are used for bank selection, with
-some room for additional context such as altering the registers used
-for the 16 bit operations (bank selection of which scalar regs)
-
-Another is to use the 11 bits for only the utmost commonly used
-instructions.  That being the case then even one of those 11 bits would
-also need to be dedicated to saying if 16 bit mode is to be continued.
-10 bits remain for actual opcodes!
-
-## 16 bit Compressed opcodes exploration
-
-### Branch
-
-10 bit mode may be expanded by 16 bit mode later, adding capabilities
-that do not fit in the extreme limited space.
-
-    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e  | f |
-    |   offs2     | | 0 0 0 |     offs        | LK | 1 | b
-    | BO2 | BI3   | | 0 0 1 | 00  | BI  | BO  | LK | 1 | bclr
-    | BO2 | BI3   | | 0 0 1 | 01  | BI  | BO  | LK | 1 | bctar
-
-16 bit mode:
+Option 3:
  
-* offs2 extends offset in MSBs
-* BI3 extends BI in MSBs to allow selection of full CR
-* BO2 extends BO
+Just as in VLE, require instructions to be in BE order. Data, which has nothing to do with instruction order, may optionally remain in LE order.
  
-10 bit mode:
+## Why does VLE use a separate 64k page?
  
-* BO[0] enables CR check, BO[1] inverts check
-* BI refers to CR0 only (4 bits of)
-* no Branch Conditional with immediate
-* no Absolute Address
-* no CTR mode (and no bctr)
-* offs is to 2 byte (signed) aligned
-* all branches to 2 byte aligned
+VLE requires that the memory page be marked as VLE-encoded.  It also requires that the instructions be in BE order even when 32 bit standard opcodes are mixed in.
  
-### LD/ST
+Questions:
  
-    | 0 | 1   | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
-    | F | RA2 |  RT   | | 0 0 1 | 11  | RA  | RB  | 0 | 1 | ld
-    | F | RT2 |  RB   | | 0 0 1 | 11  | RA  | RT  | 1 | 1 | st
+* What would happen without the page being marked, when attempting to call ppc64le ABI code?
+* How would ppc64le code in the same page be distinguished from SVPrefix code?
  
-* elwidth overrides can set different widths
+The answers are that it is either impossible or that it requires a special mode-switching instruction to be called on entry and exit from functions, transitioning to and from ppc64le mode.
  
-16 bit mode:
+This transition may be achieved very simply by marking the 64k page.
  
-* F=1 is FLD, FST
-* RA2 extends RA to 3 bits (MSB)
-* RT2 extends RT to 3 bits (MSB)
-
-10 bit mode:
-
-* RA and RB are only 2 bit (0-3)
-* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
-* for ST, there is no offset: st RT, RA(0)
-
-### Arithmetic
-
-    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
-    |     |       | | 0 1 0 | RB    | RA    | 0 | 1 | add
-    |     |       | | 0 1 0 | RB    | RA    | 1 | 1 | mul
-    |     |       | | 0 1 1 | RB    | (RA|0)| 0 | 1 | sub
-    |     |       | | 0 1 1 | RB    | (RA|0)| 1 | 1 | cmp
-
-10 bit mode:
-
-* cmp default target is CR0
-* for (RA|0) when RA=0 the input is a zero immediate,
-  meaning that sub becomes neg, and cmp becomes cmp-against-zero
-
-### Logical
-
-    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
-    |     |       | | 1 0 0 | RB    | RA    | 0 | 1 | and
-    |     |       | | 1 0 0 | RB    | RA    | 1 | 1 | nand
-    |     |       | | 1 0 1 | RB    | RA    | 0 | 1 | or
-    |     |       | | 1 0 1 | RB    | (RA|0)| 1 | 1 | nor
-
-10 bit mode:
-
-* for (RA|0) when RA=0 the input is a zero immediate,
-  meaning that nor becomes not
-
-### Floating Point
-
-    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
-    |     |  RT   | | 1 1 0 | RB    | RA!=0 | 0 | 1 | fadd
-    |     |  RT   | | 1 1 0 | RB    | 0 0 0 | 0 | 1 | fabs
-    |     |  RT   | | 1 1 0 | RB    | RA    | 1 | 1 | fmul
-    |     |  RT   | | 1 1 1 | RB    | (RA|0)| 0 | 1 | fsub
-    |     |  RT   | | 1 1 1 | RB    | (RA|0)| 1 | 1 | fcmp
-
-10 bit mode:
-
-* fcmp default target is CR1
-* for (RA|0) when RA=0 the input is a zero immediate,
-  meaning that fsub becomes fneg, and fcmp becomes fcmp-against-zero
-
-### Condition Register
-
-    | 0 1 2 3 | 4   | | 5 6 7 | 8 9 | a b | c d e  | f |
-    | 0 0 0 0 | BF2 | | 0 0 1 | 10  | BF  | BFA    | 1 | mcrf
-
-10 bit mode:
+# 16 bit Compressed
  
-* BF is only 2 bits which means the destination is only CR0-CR3
+See [[16_bit_compressed]]