# unit / element stride mode, compute 64 bit address
srcbase = get_polymorphed_reg(RA, 64, 0)
# adjust for unit/el-stride
srcbase += ....  # unit/el-stride adjustment uses op_width here
# read the underlying memory
memread <= MEM(srcbase + imm_offs, op_width)
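# (byteswap of memread into the regfile's LE-defined order happens at
#  this point; it is shown explicitly in the LD/Indexed pseudocode below)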
# truncate/extend to over-ridden dest width.
memread = adjust_wid(memread, op_width, svctx.elwidth)
# takes care of inserting memory-read (now correctly byteswapped)
# into regfile underlying LE-defined order, into the right place
# using Element-Packing starting at register RT, respecting destination
# element bitwidth, and the element index (j)
set_polymorphed_reg(RT, svctx.elwidth, j, memread)
# increments both src and dest element indices (no predication here)
i++;
j++;
```
Note above that the source elwidth is *not used at all* in LD-immediate: RA
never has elwidth overrides, leaving the elwidth free for truncation/extension
of the result.

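To make the truncate/extend and Element-Packing steps concrete, here is a
minimal executable sketch in Python. The flat byte-array regfile, the
zero-extension behaviour and the helper signatures are illustrative
assumptions only, not the normative definitions of `adjust_wid` and
`set_polymorphed_reg`:

```
# toy model: 128 x 64-bit registers held as a flat little-endian byte
# array, so element-packing at any elwidth reduces to byte addressing
regfile = bytearray(128 * 8)

def adjust_wid(value, op_width, dest_width):
    # truncate (or trivially zero-extend) a memory read performed at
    # op_width bits to the destination element width; sign-extending
    # variants are omitted from this sketch
    return value & ((1 << dest_width) - 1)

def set_polymorphed_reg(rt, elwidth, j, value):
    # pack element j, elwidth bits wide, contiguously starting at
    # register rt, in the regfile's underlying LE byte order
    bytes_per_el = elwidth // 8
    offs = rt * 8 + j * bytes_per_el
    regfile[offs:offs + bytes_per_el] = value.to_bytes(bytes_per_el, "little")

# four 16-bit loads packed into register 4: elements 0..3 fill bytes
# 32..39, i.e. exactly one 64-bit register
for j, memread in enumerate([0x1234, 0xBEEF, 0x0042, 0x7FFF]):
    set_polymorphed_reg(4, 16, j, adjust_wid(memread, 16, 16))
assert regfile[32:40] == bytes.fromhex("3412efbe4200ff7f")
```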
For LD/Indexed, the key is that in the calculation of the Effective Address,
RA has no elwidth override but RB does. The pseudocode below is simplified:

```
memread = byteswap(memread, op_width)
# truncate/extend to the implicitly-determined dest width
# (there is no dest elwidth override here: see Programmer's note below)
dest_width = op_width if RT.isvec else 64
memread = adjust_wid(memread, op_width, dest_width)
# takes care of inserting memory-read (now correctly byteswapped)
# into regfile underlying LE-defined order, into the right place
# within the NEON-like register, respecting destination element
# bitwidth, and the element index (j)
set_polymorphed_reg(RT, dest_width, j, memread)
# increments both src and dest element indices (no predication here)
i++;
j++;
```
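The simplified pseudocode above omits the Effective Address step itself. As
a hedged sketch only: RA is read as a plain 64-bit scalar while the
per-element offset comes from RB at its overridden elwidth; whether the
narrowed offset is sign- or zero-extended is an assumption made here
(sign-extension shown):

```
def indexed_ea(ra_val, rb_elem, rb_elwidth):
    # RA: full 64-bit scalar read, no elwidth override
    # rb_elem: the i-th element of RB, read at the overridden elwidth
    if rb_elem & (1 << (rb_elwidth - 1)):  # assumption: the offset is
        rb_elem -= 1 << rb_elwidth         # sign-extended to 64 bits
    return (ra_val + rb_elem) & ((1 << 64) - 1)

# a 16-bit offset element of 0xFFFE acts as -2 under this assumption
assert indexed_ea(0x1000, 0xFFFE, 16) == 0x0FFE
```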
*Programmer's note: with no destination elwidth override the destination
width must be implicitly ascertained. The assumption is that if the
destination is a Scalar then the entire 64-bit register must be written,
thus the width is extended to 64-bit. If however the destination is a
Vector then it is deemed appropriate to use the LD/ST width and to perform
contiguous register element packing at that width. The justification is
that if further sign-extension or saturation is required after a LD, these
may be performed by a follow-up instruction that uses a source elwidth
override matching the exact width of the LD operation. Correspondingly,
for a ST, a destination elwidth override on a prior instruction may match
the exact width of the ST instruction.*

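The Scalar/Vector rule from the note can be captured in one line. Below is
a small sketch restating the `dest_width` assignment from the pseudocode
above, with `rt_isvec` standing in for RT.isvec:

```
def implicit_dest_width(op_width, rt_isvec):
    # no destination elwidth override on the LD/ST:
    # Scalar RT -> write the whole 64-bit register
    # Vector RT -> pack elements at the LD/ST operation width
    return op_width if rt_isvec else 64

assert implicit_dest_width(16, rt_isvec=False) == 64  # e.g. lhz to a Scalar
assert implicit_dest_width(16, rt_isvec=True) == 16   # e.g. lhz to a Vector
```
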
## Remapped LD/ST
In the [[sv/remap]] page the concept of "Remapping" is described. Whilst