(no commit message)
author lkcl <lkcl@web>
Sat, 25 Sep 2021 21:05:49 +0000 (22:05 +0100)
committer IkiWiki <ikiwiki.info>
Sat, 25 Sep 2021 21:05:49 +0000 (22:05 +0100)
3d_gpu/architecture/dynamic_simd/assign.mdwn

index 1ca9557c67be2018ca38e91cca9c140496f6cdd2..4c2b081f40a18d7eadf802b6b59ff6630dacf408 100644 (file)
@@ -84,10 +84,20 @@ This is similar to the parallel case except A is repeated
 | partition | o3         | o2         | o1         | o0         |
 | --------- | --         | --         | --         | --         |
 | 000       | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
-| 001       | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0   | [A1A1]A1A0 |
-| 010       | [A3A3A3A3] | A3A2A1A0   | [A3A3A3A3] | A3A2A1A0   |
-| 011       | [A3A3A3A3] | A3A2A1A0   | [A1A1]A1A0 | [A1A1]A1A0 |
-| 100       | [A1A1]A1A0 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0   |
-| 101       | [A1A1]A1A0 | [A3A3A3A3] | A3A2A1A0   | [A1A1]A1A0 |
-| 110       | [A1A1]A1A0 | [A1A1]A1A0 | [A3A3A3A3] | A3A2A1A0   |
-| 111       | [A1A1]A1A0 | [A1A1]A1A0 | [A1A1]A1A0 | [A1A1]A1A0 |
+| 001       | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   |
+| 010       | A7A6A5A4   | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   |
+| 011       | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   |
+| 100       | A3A2A1A0   | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
+| 101       | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   |
+| 110       | A3A2A1A0   | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   |
+| 111       | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   |
+
+Note that when the entire partition set is open (1x 16-bit output),
+all of A is copied out, and the top half of the output is either
+zero or sign extended.  At the other extreme are the 4x 4-bit
+output partitions, which hold four copies of A, each truncated
+to the lowest 4 bits of A.
+
+Unlike the parallel case, A is not itself partitioned, so as much of it
+as will fit is copied into each output partition.  In some cases, such as
+`1x 4-bit, 1x 12-bit`, the 8-bit scalar source will need sign or zero extending.
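
The scalar-assign table in this hunk can be reproduced with a small software model. The following is only an illustrative sketch, not the project's hardware implementation: the function name `scalar_assign` and its parameters are hypothetical, and it assumes (as the table suggests) that bit i of the partition mask, counted from the least significant bit, marks a break between output nibbles o(i) and o(i+1).

```python
def scalar_assign(a, part, a_width=8, lane_width=4, lanes=4, signed=False):
    """Model copying scalar `a` into every partition of the output.

    Hypothetical helper for illustration only.
    part: partition mask; bit i set means a break between lane i and lane i+1
          (assumed ordering, inferred from the table).
    """
    a &= (1 << a_width) - 1
    out = 0
    start = 0  # lowest lane of the partition currently being filled
    for lane in range(lanes):
        is_last = lane == lanes - 1
        if is_last or (part >> lane) & 1:
            width = (lane - start + 1) * lane_width
            if width <= a_width:
                # partition no wider than A: truncate to the low bits
                val = a & ((1 << width) - 1)
            elif signed and (a >> (a_width - 1)) & 1:
                # partition wider than A, negative value: sign extend
                val = a | (((1 << width) - 1) & ~((1 << a_width) - 1))
            else:
                # partition wider than A: zero extend
                val = a
            out |= val << (start * lane_width)
            start = lane + 1
    return out


if __name__ == "__main__":
    for part in range(8):
        print(f"{part:03b} {scalar_assign(0xA5, part, signed=True):04x}")
```

With `a = 0xA5`, mask `000` gives one 16-bit partition holding the extended value, while mask `111` gives four copies of the low nibble, matching the first and last rows of the table.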