From: lkcl Date: Sat, 21 Nov 2020 21:54:17 +0000 (+0000) Subject: (no commit message) X-Git-Tag: convert-csv-opcode-to-binary~1698 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a319a480c95f4b5e1468cc71b5ba5b4619283cbb;p=libreriscv.git --- diff --git a/openpower/sv/vector_swizzle.mdwn b/openpower/sv/vector_swizzle.mdwn index 451699412..4b6688a19 100644 --- a/openpower/sv/vector_swizzle.mdwn +++ b/openpower/sv/vector_swizzle.mdwn @@ -10,7 +10,7 @@ and many more. Lane-based Vector Processors not having the 2/3/4 inter-lane crossing have some difficulty processing such data and require it to be pushed into memory and retrieved, which is prohibitively costly in both instructions, time, and power consumption. -The cost is so great and the requirement so common that it easily justifies augmenting the ISA of a GPU to be able to specify the reordering of vec2/3/4 elements, often drastically increasing the instruction size in the process. +The lane reordering cost is so great and the requirement so common that it easily justifies augmenting the ISA of a GPU to be able to specify the reordering of vec2/3/4 elements, often drastically increasing the instruction size in the process. The reason for the dramatic increase is that the reordering of each element in vec4 requires 2 bits per element, plus a predicate mask. This means a minimum of 3 bits per element: 12 bits for a vec4, and if there are 2 src operands this is a whopping 24 bits of immediate data, per instruction.