and many more. Lane-based Vector Processors that lack 2/3/4 inter-lane crossing have difficulty processing such data: it must be pushed out to memory and retrieved, which is prohibitively costly in instruction count, time, and power consumption.
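As a rough, hypothetical sketch in plain C (no particular vector ISA assumed, and the names `lanes`, `spill`, and `perm` are purely illustrative), the memory round trip amounts to storing every lane and then loading the elements back in the permuted order:

```c
#include <stdio.h>

int main(void)
{
    float lanes[4] = {1.0f, 2.0f, 3.0f, 4.0f};   /* a vec4 spread across lanes */
    float spill[4];
    float reordered[4];
    const int perm[4] = {3, 2, 1, 0};            /* desired ".wzyx" reordering */

    /* 1. push the vector out to memory (per-lane stores, or one wide store) */
    for (int i = 0; i < 4; i++)
        spill[i] = lanes[i];

    /* 2. retrieve it element by element in the permuted order (gather loads) */
    for (int i = 0; i < 4; i++)
        reordered[i] = spill[perm[i]];

    for (int i = 0; i < 4; i++)
        printf("%.1f ", reordered[i]);
    printf("\n");
    return 0;
}
```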
The lane reordering cost is so great and the requirement so common that it easily justifies augmenting the ISA of a GPU to be able to specify the reordering of vec2/3/4 elements, often drastically increasing the instruction size in the process.
The reason for the dramatic increase is that selecting the source position of each element of a vec4 requires 2 bits, plus a predicate (mask) bit per element. This means a minimum of 3 bits per element: 12 bits for a vec4, and with 2 src operands that is a whopping 24 bits of immediate data per instruction.
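To make the arithmetic concrete, here is a minimal C sketch of one way such an immediate could be packed; the field layout and the `pack_swizzle` helper are assumptions for illustration, not any actual GPU encoding:

```c
#include <stdint.h>
#include <stdio.h>

/* Pack one vec4 swizzle specifier: sel[i] (2 bits) chooses which source
 * element feeds element i, pred[i] (1 bit) enables writing element i.
 * 3 bits x 4 elements = 12 bits per operand. */
static uint32_t pack_swizzle(const uint8_t sel[4], const uint8_t pred[4])
{
    uint32_t imm = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t field = (uint32_t)(sel[i] & 3) | ((uint32_t)(pred[i] & 1) << 2);
        imm |= field << (3 * i);   /* 3 bits per element */
    }
    return imm;                    /* 12 bits used */
}

int main(void)
{
    /* Example: the ".wzyx" reversal with all elements enabled. */
    const uint8_t sel[4]  = {3, 2, 1, 0};
    const uint8_t pred[4] = {1, 1, 1, 1};

    uint32_t src1 = pack_swizzle(sel, pred);
    uint32_t src2 = pack_swizzle(sel, pred);

    /* Two source operands: 2 x 12 = 24 bits of immediate data. */
    uint32_t imm24 = src1 | (src2 << 12);
    printf("per-operand imm: 0x%03x, combined: 0x%06x\n", src1, imm24);
    return 0;
}
```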