of which are packed into the first 16-bit destination element, the
second four of which are packed into the second 16-bit destination element.
-Pseudocode example (dest elwidth overrides not included):
+Pseudocode example. Note that destination elwidth overrides affect the
+packing of results: BB.elwidth in effect requests how many 4-bit
+results are to be packed, but RT.elwidth sets the upper limit. Any
+bits of a destination element not occupied by results are set to zero.
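The packing rule described above can be illustrated with a small stand-alone Python sketch. It is not the specification's algorithm. `pack_results` and its `requested` parameter are hypothetical, and only the `BWID` table is copied from the `bwid` line of the pseudocode. The sketch also assumes three things: each result occupies one full 4-bit nibble, with result 0 in the least-significant nibble; results beyond the RT.elwidth limit are discarded; and the `t4`/`t8` computation in the pseudocode is not reproduced.

```python
from typing import List

# Destination element width in bits per RT.elwidth encoding,
# copied from the bwid table in the pseudocode.
BWID = {0b00: 64, 0b01: 8, 0b10: 16, 0b11: 32}


def pack_results(results: List[int], requested: int, rt_elwidth: int) -> List[int]:
    """Hypothetical helper: pack 4-bit results into destination elements.

    requested  -- results wanted per destination element (the role BB.elwidth plays)
    rt_elwidth -- RT.elwidth encoding; the element width caps how many fit
    Assumptions (not from the spec): result 0 goes in the least-significant
    nibble, and results that do not fit are dropped.
    """
    bwid = BWID[rt_elwidth]
    per_elem = min(requested, bwid // 4)  # RT.elwidth sets the upper limit
    dest = []
    for base in range(0, len(results), requested):
        group = results[base:base + requested][:per_elem]
        elem = 0  # bits not holding a result stay zero
        for j, r in enumerate(group):
            elem |= (r & 0xF) << (4 * j)
        dest.append(elem)
    return dest


print([hex(x) for x in pack_results(list(range(1, 9)), 4, 0b10)])
# prints ['0x4321', '0x8765']
```

With four results requested per 16-bit element, the first four results land in the first element and the next four in the second. Switching to `rt_elwidth=0b01` (8-bit) halves the limit, so only two results per element survive in this sketch.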
for i in range(VL):
if BB.isvec:
n3 = mask[3] & (mode[3] == creg[3])
result = n0||n1||n2||n3 # 4-bit result
if RT.isvec:
- # TODO: RT.elwidth override to be also added here
+ # RT.elwidth override can affect the packing
+ bwid = {0b00:64, 0b01:8, 0b10:16, 0b11:32}[RT.elwidth]
+ t4, t8 = min(4, bwid//2), min(8, bwid//2)
# yes, really, the CR's elwidth field determines
# the bit-packing into the INT!
if BB.elwidth == 0b00: