+++ /dev/null
-# OPF ISA WG External RFC ls005 v1: XLEN
-
-* RFC Author: Luke Kenneth Casson Leighton.
-* RFC Contributors/Ideas: Jacob Lifshay, Toshaan Bharvani
-* Funded by NLnet under the NGI Zero Entrust EU Horizon Europe Grant 101069594
-
-**URLs**:
-
-* <https://libre-soc.org/openpower/sv/rfc/ls005/>
-* <https://bugs.libre-soc.org/show_bug.cgi?id=988>
-* <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=openpower/isa;hb=HEAD>
-* <https://git.openpower.foundation/isa/PowerISA/issues/104>
-
-**Severity**: Major
-
-**Status**: New
-
-**Date**: 22 Dec 2022 v2 TODO
-
-**Target** v3.2B
-
-**Books and Section affected**:
-
-```
- Everything (in a consistent, regular and systematic fashion)
-```
-
-**Summary**
-
-```
- Exactly as is already done in RISC-V, convert the entire use of 64-bit hard-coding to "XLEN".
- Exactly as is in RISC-V, options then include PowerISA-32, PowerISA-64 and PowerISA-128.
- Unlike in RISC-V, the concept of PowerISA-16 and PowerISA-8 is also floated, for Embedded,
- AI, Edge, Processing-in-Memory, Distributed Computing and other purposes.
-```
-
-**Submitter**: Luke Leighton (Libre-SOC)
-
-**Requester**: Libre-SOC
-
-**Impact on processor**:
-
-```
- Entirely new processors, entirely new markets.
-```
-
-**Impact on software**:
-
-```
- Massive but regular, consistent, and systematic.
-```
-
-**Keywords**:
-
-```
- XLEN
-```
-
-**Motivation**
-
-The Power ISA is far too massive, making it wholly unsuited for Embedded
-markets and adversely impacting its reach and potential. The RISC paradigm
-it is based on has gone too far into PackedSIMD (128-bit). Fixing this is
-relatively and conceptually straightforward: allow 32-bit and even 16-bit
-and 8-bit implementations, and use the opportunity to allow future Scalar
-128-bit implementations in the exact same strategic way that RISC-V has RV128.
-
-Register files are redefined to XLEN width but are permitted to "group"
-registers together to create 16-bit, 32-bit and 64-bit addresses.
-In this way, the limitations of what would otherwise restrict the usefulness
-of a severely-targetted application-specific processor may be overcome in
-order to make it still possible to (at reduced performance) still run
-general-purpose applications.
-AI application-specific processing or other Processing-In-Memory or other
-specialist design therefore may for example focus a balance
-of raw computing power heavily onto 8-bit or 16-bit computation, but still
-gain the benefit of the Power ISA and everything it brings. Contrast
-this with the more "normal" approach of creating heavily-focussed
-specialist "AI" Engines incapable of Turing-completeness and the benefits
-are clear.
-
-Note 1: SVP64 **requires** this change as a 100% critical dependency.
-SIMD back-end ALUs process Vectors of "Elements" at 8, 16 and 32-bit (and
-64-bit), read from, processed, and returned to, the standard **Scalar**
-Register Files, with byte-level write-enable lines. The proposal is
-therefore made as an opportunity for others interested in Scalar ISA
-8/16/32-bit (and future 128-bit variants of Scalar Power ISA) to take
-**and complete** that work in an incremental fashion, without having
-to be faced with a massive bulk and body of work as a prerequisite.
-
-Examples include that whilst an SVP64 Prefixed '''lbz''' instruction
-('''sv.lbz''') is well-defined and has strict well-defined behaviour,
-a pure **Scalar-only** (non-SVP64) over-ridden '''lbz''' instruction
-has not been so well-defined, and would require a Stakeholder interested
-in 8/16/32-bit (and future 128-bit) to think through the implications
-and incrementally submit further OPF ISA RFCs. With RISC-V **already
-having done this type of work** it is not technically difficult: it
-just requires another Stakeholder to do it.
-
-Note 2: one alternative to this proposal, as far as SVP64 is concerned,
-is to literally duplicate the entirety of Chapters 3 and 4 Book III,
-and to create - and then maintain - multiple identical copies of the
-instructions including identical copies of the pseudocode except for
-substitution of occurrences of "64" with a "32" variant, "16" variant,
-"8" variant (and future "128" variant), and so on. This would add
-over 700 additional pages to the Power ISA Specification and it should
-be clear that it would become a maintenance nightmare.
-
-Another alternative is to poison and irredemably damage the Power ISA
-(as a powerful and lean RISC ISA) by adding several hundred (close to 1,000)
-additional specific 8-bit, 16-bit and 32-bit (and in future 128-bit) Scalar
-instructions. Given that the 32-bit Opcode Allocation Space is already
-under pressure such a move would be extremely unwise for that reason alone.
-
-**Changes**
-
-For all pseudocode right across the board in all Scalar operations, replace
-hard-coded "64" with "XLEN". **This work is already underway as sponsored
-by NLnet in the Libre-SOC Power ISA Pseudocode**. The default is obviously
-recommended to be "XLEN=64" in order to create zero disruption.
-
-Definitions of the Register File(s) for GPR and FPR are then changed to be
-"XLEN" wide. However, for Embedded purposes (XLEN=32/16/8), an SPR controls
-whether (and how many) sequentially-grouped registers are taken together to
-create 16-bit, 32-bit and 64-bit addresses (depending on application need).
-GPR is obvious, FPR is quirky. SVP64 redefines FP ops (those not ending in "s")
-to be "full width" and all ops ending in "s" to be "half of
-the full width".
-
-* XLEN=64 keeps FPR "full width" exactly as presently defined, and
- "half width" exactly as presently defined.
-* XLEN=32 overrides FPR "full width" operations to
- full BFP32, and "half width" to be "BFP16 stored in an BFP32"
-* XLEN=16 redefines FPR "full width" operations to full [IEEE BFP16](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) and leaves
- "half width" RESERVED (there is no IEEE version of [FP8](https://web.archive.org/web/20221223085833/https://wccftech.com/nvidia-intel-arm-bet-their-ai-future-on-fp8-whitepaper-for-8-bit-fp-published/)).
-* XLEN=8 redefines FPR "full width" operations to [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) and leaves
- "half width" RESERVED.
-
-----------------
-
-# Examples
-
-## pseudocode examples demonstrating modification.
-
-before for popcntb:
-
-```
-do i = 0 to 7
- n <- 0
- do j = 0 to 7
- if (RS)[(i*8)+j] = 1 then
- n <- n+1
- RA[(i*8):(i*8)+7] <- n
-```
-
-after:
-
-```
-do i = 0 to ((XLEN/8)-1)
- n <- 0
- do j = 0 to 7
- if (RS)[(i*8)+j] = 1 then
- n <- n+1
- RA[(i*8):(i*8)+7] <- n
-```
-
-Here as the instruction's intent is to count bytes, and RA contains on
-a per-byte basis a SIMD-style count of each byte's 1s, it becomes possible
-to simply count less bytes.
-
-Should it be more useful to redefine popcntb in terms of always returning
-eight results? For example `sv.popcntb/w=16` to return 8 2-bit counts of
-the number of bits in each 2-bit group in RS?
-
-## no modification needed, but function changes
-
-For the `addi` instruction there is no apparent change:
-
-```
-RT <- (RA|0) + EXTS(SI)
-```
-
-However behind the scenes, RA is XLEN bits wide, therefore EXTS performs an
-increase in bitlength not to exactly 64 but to XLEN. Obviousy for XLEN=16
-there is no sign-extension, and for XLEN=8 truncation of `SI` will occur.
-Illustrates that there are subtle quirks involved, requiring some thought.
-
-The reason for keeping as many bits of the Immediate as possible should be clear.
-
-## Compare Ranged Byte (cmprb BF,L,RA,RB)
-
-```
- src1 <- EXTZ((RA)[XLEN-8:XLEN-1])
- src21hi <- EXTZ((RB)[XLEN-32:XLEN-23])
- src21lo <- EXTZ((RB)[XLEN-24:XLEN-17])
- src22hi <- EXTZ((RB)[XLEN-16:XLEN-9])
- src22lo <- EXTZ((RB)[XLEN-8:XLEN-1])
- if L=0 then
- in_range <- (src22lo <= src1) & (src1 <= src22hi)
- else
- in_range <- (((src21lo <= src1) & (src1 <= src21hi)) |
- ((src22lo <= src1) & (src1 <= src22hi)))
- CR[4*BF+32] <- 0b0
- CR[4*BF+33] <- in_range
- CR[4*BF+34] <- 0b0
- CR[4*BF+35] <- 0b0
-```
-
-Compare Ranged Byte takes either one or two ranges from RB as individual bytes,
-thus requiring a minimum 16-bit (32-bit when L=1) operand RB.
-src1 on the other hand is only
-8-bit long: the first byte of RA.
-
-Therefore a little more thought is required. Should this simply be UNDEFINED
-behaviour when XLEN=8/16 and L=1? When XLEN=16, L=0 the instruction is still
-valid. Would it be costly at the Decoder?
-
-## Trap Word Immediate
-
-Like FP Single operations there also exist operations at "half of regfile width"
-in the Integer realm. They are discernable with the designation "Word" in their
-title, such as "Trap WORD Immediate".
-
-```
- a <- EXTS((RA)[XLEN/2:XLEN-1])
- if (a < EXTS(SI)) & TO[0] then TRAP
- if (a > EXTS(SI)) & TO[1] then TRAP
- if (a = EXTS(SI)) & TO[2] then TRAP
- if (a <u EXTS(SI)) & TO[3] then TRAP
- if (a >u EXTS(SI)) & TO[4] then TRAP
-```
-
-Here, EXTS receives **half** of the bits of its input register operand, RA.
-Note this is **not** "32 bit because a Word is 32-bit". The definition
-"Trap Word Immediate" has to be replaced with "Trap Half-register-width Immediate"
-but this is very clumsy.
-
-When XLEN=8 "half register width" is clearly 4 bit, thus the LSB nibble is tested,
-but still sign-extended for comparison
-against the 16-bit signed immediate.
-
-## Extend Sign byte/half/word
-
-This instruction can be redefined again in terms of:
-
-* "Word" meaning "Half of register width"
-* "Half-word" meaning "Quarter of register width"
-* "Byte" meaning "One-eighth of register width"
-
-And a table results as follows:
-
-```
- XLEN=8:
- extsb: 1-bit -> 8-bit sign extension
- extsh: 2-bit -> 8-bit sign extension
- extsw: 4-bit -> 8-bit sign extension
- XLEN=16:
- extsb: 2-bit -> 16-bit sign extension
- extsh: 4-bit -> 16-bit sign extension
- extsw: 8-bit -> 16-bit sign extension
- XLEN=32:
- extsb: 4-bit -> 32-bit sign extension
- extsh: 8-bit -> 32-bit sign extension
- extsw: 16-bit -> 32-bit sign extension
- XLEN=64:
- extsb: 8-bit -> 64-bit sign extension
- extsh: 16-bit -> 64-bit sign extension
- extsw: 32-bit -> 64-bit sign extension
-```
-
-If the instructions were kept as presently defined then there
-is a loss of functionality and opportunity:
-
-```
- XLEN=8: # completely wasted opportunity
- extsb: 8-bit -> 8-bit does nothing
- extsh: 16-bit -> 8-bit truncates
- extsw: 32-bit -> 8-bit truncates
- XLEN=16: # wasted 2/3 of encoding
- extsb: 8-bit -> 16-bit sign extension
- extsh: 16-bit -> 16-bit does nothing
- extsw: 32-bit -> 16-bit truncates
- XLEN=32: # wasted 1/3 of encoding
- extsb: 8-bit -> 32-bit sign extension
- extsh: 16-bit -> 32-bit sign extension
- extsw: 32-bit -> 32-bit does nothing
- XLEN=64: # unchanged (default) behaviour
- extsb: 8-bit -> 64-bit sign extension
- extsh: 16-bit -> 64-bit sign extension
- extsw: 32-bit -> 64-bit sign extension
-```
-
-The RTL for `extsb` becomes:
-
-```
- in <- (RA)[XLEN-8:XLEN-1] # extract first byte
- if XLEN = 8 then RT <- in[7] * 8 # 1->8
- if XLEN = 16 then RT <- in[6] * 15 || in[7] # 2->16
- if XLEN = 32 then RT <- in[4] * 29 || in[5:7] # 4->32
- if XLEN = 64 then RT <- in[0] * 57 || in[1:7] # 8->64
-```
-
-And `extsh` and `extsw` follow similar logic. Interestingly there is
-no loss of functionality compared to keeping `extsb` always as "byte
-sign-extending" and ironically the loss of opportunity *is* to keep
-`extsb` the same (extend *byte* regardless of XLEN).
-
-[[!tag opf_rfc]]
-
-\newpage{}
-
--- /dev/null
+# OPF ISA WG External RFC ls005.xlen v1: XLEN
+
+* RFC Author: Luke Kenneth Casson Leighton.
+* RFC Contributors/Ideas: Jacob Lifshay, Toshaan Bharvani
+* Funded by NLnet under the NGI Zero Entrust EU Horizon Europe Grant 101069594
+
+**URLs**:
+
+* <https://libre-soc.org/openpower/sv/rfc/ls005.xlen/>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=988>
+* <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=openpower/isa;hb=HEAD>
+* <https://git.openpower.foundation/isa/PowerISA/issues/104>
+
+**Severity**: Major
+
+**Status**: New
+
+**Date**: 22 Dec 2022 v2 TODO
+
+**Target** v3.2B
+
+**Books and Section affected**:
+
+```
+ Everything (in a consistent, regular and systematic fashion)
+```
+
+**Summary**
+
+```
+ Exactly as is already done in RISC-V, convert the entire use of 64-bit hard-coding to "XLEN".
+ Exactly as is in RISC-V, options then include PowerISA-32, PowerISA-64 and PowerISA-128.
+ Unlike in RISC-V, the concept of PowerISA-16 and PowerISA-8 is also floated, for Embedded,
+ AI, Edge, Processing-in-Memory, Distributed Computing and other purposes.
+```
+
+**Submitter**: Luke Leighton (Libre-SOC)
+
+**Requester**: Libre-SOC
+
+**Impact on processor**:
+
+```
+ Entirely new processors, entirely new markets.
+```
+
+**Impact on software**:
+
+```
+ Massive but regular, consistent, and systematic.
+```
+
+**Keywords**:
+
+```
+ XLEN
+```
+
+**Motivation**
+
+The Power ISA is far too massive, making it wholly unsuited for Embedded
+markets and adversely impacting its reach and potential. The RISC paradigm
+it is based on has gone too far into PackedSIMD (128-bit). Fixing this is
+relatively and conceptually straightforward: allow 32-bit and even 16-bit
+and 8-bit implementations, and use the opportunity to allow future Scalar
+128-bit implementations in the exact same strategic way that RISC-V has RV128.
+
+Register files are redefined to XLEN width but are permitted to "group"
+registers together to create 16-bit, 32-bit and 64-bit addresses.
+In this way, the limitations that would otherwise restrict the usefulness
+of a severely-targeted application-specific processor may be overcome,
+making it still possible (at reduced performance) to run
+general-purpose applications.
+An AI application-specific processor, a Processing-In-Memory design, or
+another specialist design may therefore, for example, focus its balance
+of raw computing power heavily onto 8-bit or 16-bit computation, but still
+gain the benefit of the Power ISA and everything it brings. Contrast
+this with the more "normal" approach of creating heavily-focussed
+specialist "AI" Engines incapable of Turing-completeness and the benefits
+are clear.
+
+Note 1: SVP64 **requires** this change as a 100% critical dependency.
+SIMD back-end ALUs process Vectors of "Elements" at 8, 16 and 32-bit (and
+64-bit), read from, processed, and returned to, the standard **Scalar**
+Register Files, with byte-level write-enable lines. The proposal is
+therefore made as an opportunity for others interested in Scalar ISA
+8/16/32-bit (and future 128-bit variants of Scalar Power ISA) to take
+**and complete** that work in an incremental fashion, without having
+to be faced with a massive bulk and body of work as a prerequisite.
+
+Examples include that whilst an SVP64 Prefixed `lbz` instruction
+(`sv.lbz`) is well-defined and has strict well-defined behaviour,
+a pure **Scalar-only** (non-SVP64) over-ridden `lbz` instruction
+has not been so well-defined, and would require a Stakeholder interested
+in 8/16/32-bit (and future 128-bit) to think through the implications
+and incrementally submit further OPF ISA RFCs. With RISC-V **already
+having done this type of work** it is not technically difficult: it
+just requires another Stakeholder to do it.
+
+Note 2: one alternative to this proposal, as far as SVP64 is concerned,
+is to literally duplicate the entirety of Chapters 3 and 4 Book III,
+and to create - and then maintain - multiple identical copies of the
+instructions including identical copies of the pseudocode except for
+substitution of occurrences of "64" with a "32" variant, "16" variant,
+"8" variant (and future "128" variant), and so on. This would add
+over 700 additional pages to the Power ISA Specification and it should
+be clear that it would become a maintenance nightmare.
+
+Another alternative is to poison and irredeemably damage the Power ISA
+(as a powerful and lean RISC ISA) by adding several hundred (close to 1,000)
+additional specific 8-bit, 16-bit and 32-bit (and in future 128-bit) Scalar
+instructions. Given that the 32-bit Opcode Allocation Space is already
+under pressure such a move would be extremely unwise for that reason alone.
+
+**Changes**
+
+For all pseudocode right across the board in all Scalar operations, replace
+hard-coded "64" with "XLEN". **This work is already underway as sponsored
+by NLnet in the Libre-SOC Power ISA Pseudocode**. The default is obviously
+recommended to be "XLEN=64" in order to create zero disruption.
+
+Definitions of the Register File(s) for GPR and FPR are then changed to be
+"XLEN" wide. However, for Embedded purposes (XLEN=32/16/8), an SPR controls
+whether (and how many) sequentially-grouped registers are taken together to
+create 16-bit, 32-bit and 64-bit addresses (depending on application need).
+GPR is obvious, FPR is quirky. SVP64 redefines FP ops (those not ending in "s")
+to be "full width" and all ops ending in "s" to be "half of
+the full width".
+
+* XLEN=64 keeps FPR "full width" exactly as presently defined, and
+ "half width" exactly as presently defined.
+* XLEN=32 overrides FPR "full width" operations to
+ full BFP32, and "half width" to be "BFP16 stored in a BFP32"
+* XLEN=16 redefines FPR "full width" operations to full [IEEE BFP16](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) and leaves
+ "half width" RESERVED (there is no IEEE version of [FP8](https://web.archive.org/web/20221223085833/https://wccftech.com/nvidia-intel-arm-bet-their-ai-future-on-fp8-whitepaper-for-8-bit-fp-published/)).
+* XLEN=8 redefines FPR "full width" operations to [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) and leaves
+ "half width" RESERVED.
+
+----------------
+
+# Examples
+
+## pseudocode examples demonstrating modification.
+
+before for popcntb:
+
+```
+do i = 0 to 7
+ n <- 0
+ do j = 0 to 7
+ if (RS)[(i*8)+j] = 1 then
+ n <- n+1
+ RA[(i*8):(i*8)+7] <- n
+```
+
+after:
+
+```
+do i = 0 to ((XLEN/8)-1)
+ n <- 0
+ do j = 0 to 7
+ if (RS)[(i*8)+j] = 1 then
+ n <- n+1
+ RA[(i*8):(i*8)+7] <- n
+```
+
+Here, as the instruction's intent is to count bits on a per-byte basis, and
+RA contains a SIMD-style count of each byte's 1s, it becomes possible
+to simply count fewer bytes.
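+
+As a worked illustration, assuming the MSB0 bit-numbering of the Power ISA
+continues to apply within XLEN (an assumption, as are the operand values),
+the loop runs only for i = 0 and i = 1 when XLEN=16:
+
+```
+    # XLEN=16, (RS) = 0xF001
+    # i=0: (RS)[0:7]  = 0xF0 -> n = 4 -> RA[0:7]  <- 0x04
+    # i=1: (RS)[8:15] = 0x01 -> n = 1 -> RA[8:15] <- 0x01
+    # result: RA = 0x0401
+```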
+
+Should it be more useful to redefine popcntb in terms of always returning
+eight results? For example `sv.popcntb/w=16` to return 8 2-bit counts of
+the number of bits in each 2-bit group in RS?
+
+## no modification needed, but function changes
+
+For the `addi` instruction there is no apparent change:
+
+```
+RT <- (RA|0) + EXTS(SI)
+```
+
+However behind the scenes, RA is XLEN bits wide, therefore EXTS performs an
+increase in bitlength not to exactly 64 but to XLEN. Obviously for XLEN=16
+there is no sign-extension, and for XLEN=8 truncation of `SI` will occur.
+This illustrates that there are subtle quirks involved, requiring some thought.
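+
+A worked illustration with the 16-bit immediate SI = 0xFFFE (-2). The XLEN=8
+row assumes that truncation keeps the least-significant bits, which this RFC
+does not spell out:
+
+```
+    XLEN=64: EXTS(SI) = 0xFFFF_FFFF_FFFF_FFFE   # sign-extended
+    XLEN=32: EXTS(SI) = 0xFFFF_FFFE             # sign-extended
+    XLEN=16: EXTS(SI) = 0xFFFE                  # unchanged
+    XLEN=8:  EXTS(SI) = 0xFE                    # truncated (assumed low 8 bits)
+```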
+
+The reason for keeping as many bits of the Immediate as possible should be clear.
+
+## Compare Ranged Byte (cmprb BF,L,RA,RB)
+
+```
+ src1 <- EXTZ((RA)[XLEN-8:XLEN-1])
+ src21hi <- EXTZ((RB)[XLEN-32:XLEN-25])
+ src21lo <- EXTZ((RB)[XLEN-24:XLEN-17])
+ src22hi <- EXTZ((RB)[XLEN-16:XLEN-9])
+ src22lo <- EXTZ((RB)[XLEN-8:XLEN-1])
+ if L=0 then
+ in_range <- (src22lo <= src1) & (src1 <= src22hi)
+ else
+ in_range <- (((src21lo <= src1) & (src1 <= src21hi)) |
+ ((src22lo <= src1) & (src1 <= src22hi)))
+ CR[4*BF+32] <- 0b0
+ CR[4*BF+33] <- in_range
+ CR[4*BF+34] <- 0b0
+ CR[4*BF+35] <- 0b0
+```
+
+Compare Ranged Byte takes either one or two ranges from RB as individual bytes,
+thus requiring a minimum 16-bit (32-bit when L=1) operand RB.
+src1, on the other hand, is only
+8 bits long: the least-significant byte of RA.
+
+Therefore a little more thought is required. Should this simply be UNDEFINED
+behaviour when XLEN=8/16 and L=1? When XLEN=16, L=0 the instruction is still
+valid. Would it be costly at the Decoder?
+
+## Trap Word Immediate
+
+Like FP Single operations, there also exist operations at "half of regfile width"
+in the Integer realm. They are discernible by the designation "Word" in their
+title, such as "Trap WORD Immediate".
+
+```
+ a <- EXTS((RA)[XLEN/2:XLEN-1])
+ if (a < EXTS(SI)) & TO[0] then TRAP
+ if (a > EXTS(SI)) & TO[1] then TRAP
+ if (a = EXTS(SI)) & TO[2] then TRAP
+ if (a <u EXTS(SI)) & TO[3] then TRAP
+ if (a >u EXTS(SI)) & TO[4] then TRAP
+```
+
+Here, EXTS receives **half** of the bits of its input register operand, RA.
+Note this is **not** "32 bit because a Word is 32-bit". The definition
+"Trap Word Immediate" has to be replaced with "Trap Half-register-width Immediate"
+but this is very clumsy.
+
+When XLEN=8, "half register width" is clearly 4 bits, thus the LSB nibble is tested,
+but still sign-extended for comparison
+against the 16-bit signed immediate.
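+
+A worked illustration for XLEN=8, assuming (as the text above implies) that
+both operands are sign-extended to a common width before comparison; the
+operand values are chosen purely for illustration:
+
+```
+    # XLEN=8, (RA) = 0x0F, SI = 0xFFFF (-1)
+    a <- EXTS((RA)[4:7])       # EXTS(0b1111) = -1
+    # a = EXTS(SI), so TRAP is taken if TO[2] = 1
+```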
+
+## Extend Sign byte/half/word
+
+These instructions can again be redefined in terms of:
+
+* "Word" meaning "Half of register width"
+* "Half-word" meaning "Quarter of register width"
+* "Byte" meaning "One-eighth of register width"
+
+And a table results as follows:
+
+```
+ XLEN=8:
+ extsb: 1-bit -> 8-bit sign extension
+ extsh: 2-bit -> 8-bit sign extension
+ extsw: 4-bit -> 8-bit sign extension
+ XLEN=16:
+ extsb: 2-bit -> 16-bit sign extension
+ extsh: 4-bit -> 16-bit sign extension
+ extsw: 8-bit -> 16-bit sign extension
+ XLEN=32:
+ extsb: 4-bit -> 32-bit sign extension
+ extsh: 8-bit -> 32-bit sign extension
+ extsw: 16-bit -> 32-bit sign extension
+ XLEN=64:
+ extsb: 8-bit -> 64-bit sign extension
+ extsh: 16-bit -> 64-bit sign extension
+ extsw: 32-bit -> 64-bit sign extension
+```
+
+If the instructions were kept as presently defined then there
+is a loss of functionality and opportunity:
+
+```
+ XLEN=8: # completely wasted opportunity
+ extsb: 8-bit -> 8-bit does nothing
+ extsh: 16-bit -> 8-bit truncates
+ extsw: 32-bit -> 8-bit truncates
+ XLEN=16: # wasted 2/3 of encoding
+ extsb: 8-bit -> 16-bit sign extension
+ extsh: 16-bit -> 16-bit does nothing
+ extsw: 32-bit -> 16-bit truncates
+ XLEN=32: # wasted 1/3 of encoding
+ extsb: 8-bit -> 32-bit sign extension
+ extsh: 16-bit -> 32-bit sign extension
+ extsw: 32-bit -> 32-bit does nothing
+ XLEN=64: # unchanged (default) behaviour
+ extsb: 8-bit -> 64-bit sign extension
+ extsh: 16-bit -> 64-bit sign extension
+ extsw: 32-bit -> 64-bit sign extension
+```
+
+The RTL for `extsb` becomes:
+
+```
+ in <- (RA)[XLEN-8:XLEN-1] # extract first byte
+ if XLEN = 8 then RT <- in[7] * 8 # 1->8
+ if XLEN = 16 then RT <- in[6] * 15 || in[7] # 2->16
+ if XLEN = 32 then RT <- in[4] * 29 || in[5:7] # 4->32
+ if XLEN = 64 then RT <- in[0] * 57 || in[1:7] # 8->64
+```
+
+And `extsh` and `extsw` follow similar logic. Interestingly there is
+no loss of functionality compared to keeping `extsb` always as "byte
+sign-extending" and ironically the loss of opportunity *is* to keep
+`extsb` the same (extend *byte* regardless of XLEN).
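+
+A sketch of how the four cases of the `extsb` RTL could collapse into one
+parameterised form that also covers `extsh` and `extsw`. The name `n` (source
+field width in bits) is introduced here for illustration only and is not part
+of the proposal; the replication notation follows the `extsb` RTL:
+
+```
+    # n = XLEN/8 for extsb, XLEN/4 for extsh, XLEN/2 for extsw
+    in <- (RA)[XLEN-n:XLEN-1]        # least-significant n bits of RA
+    RT <- in[0] * (XLEN-n) || in     # sign bit replicated, then the field
+```
+
+With XLEN=64 and n=8 this gives `in[0] * 56 || in[0:7]`, the same 64-bit
+result as the XLEN=64 line of the `extsb` RTL.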
+
+[[!tag opf_rfc]]
+
+\newpage{}
+
Related RFCs are [[ls008]] for the two Management instructions `setvl`
and `svstep`, and [[ls009]] for the REMAP Subsystem. Also [[ls001]] is
a Dependency as it introduces Primary Opcode 9 64-bit encoding. An
-additional RFC [[ls005]] introduced XLEN on which SVP64 is also critically
+additional RFC [[ls005.xlen]] introduced XLEN on which SVP64 is also critically
dependent, for Element-width Overrides.
**Changes**