From: lkcl <lkcl@web>
Date: Sat, 25 Sep 2021 20:45:28 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: DRAFT_SVP64_0_1~4
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=daf639b06a8b4d2adf30b01661f1f8afea86ba00;p=libreriscv.git

---

diff --git a/3d_gpu/architecture/dynamic_simd/assign.mdwn b/3d_gpu/architecture/dynamic_simd/assign.mdwn
index df5b08c0c..9934cee4e 100644
--- a/3d_gpu/architecture/dynamic_simd/assign.mdwn
+++ b/3d_gpu/architecture/dynamic_simd/assign.mdwn
@@ -74,3 +74,20 @@ the whole of the target are copied.  In this example, B is 8 bits so only
 When the partitions are all closed (4x SIMD) each partition of B is
 2 bits wide, therefore only the *first two* bits of A are copied into
 *each* of the four 2-bit partitions in B.
+
+Where A is narrower than the output B, sign or zero extension is
+required. Here A is assumed to be 8 bits wide and B 16 bits wide.
+This is similar to the parallel case except A is repeated (broadcast)
+across all of B. Bits in square brackets are sign-extension bits.
+
+
+| partition | o3         | o2         | o1         | o0         |
+| --------- | --         | --         | --         | --         |
+| 000       | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
+| 001       | [A7A7A7A7] | [A7A7]A7A6 | A5A4A3A2   | [A1A1]A1A0 |
+| 010       | [A7A7A7A7] | A7A6A5A4   | [A3A3A3A3] | A3A2A1A0   |
+| 011       | [A7A7A7A7] | A7A6A5A4   | [A3A3]A3A2 | [A1A1]A1A0 |
+| 100       | [A7A7]A7A6 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0   |
+| 101       | [A7A7]A7A6 | [A5A5A5A5] | A5A4A3A2   | [A1A1]A1A0 |
+| 110       | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3A3A3] | A3A2A1A0   |
+| 111       | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3]A3A2 | [A1A1]A1A0 |
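
Reviewer note: the table above can be cross-checked with a small software model. The following is a minimal sketch in plain Python, not the hardware implementation from this repository. The function name `partitioned_sext_assign` and its parameters are hypothetical. It assumes that bit 0 of the partition mask (the rightmost digit in the table's first column) is the break between o0 and o1, and it follows the table as written, where each partition of B receives the matching slice of A, sign-extended to the partition width.

```python
def partitioned_sext_assign(a, mask, lanes=4, a_lane_bits=2, b_lane_bits=4):
    """Model the table: sign-extend each partition of A into B.

    Assumptions (hypothetical, for illustration only):
      * A is lanes * a_lane_bits wide (8 bits by default).
      * B is lanes * b_lane_bits wide (16 bits by default).
      * Bit i of mask set means lanes i and i+1 are separated.
    """
    out = 0
    start = 0  # index of the first lane of the current partition
    for lane in range(lanes):
        is_last = lane == lanes - 1
        if is_last or (mask >> lane) & 1:
            n = lane - start + 1                  # lanes in this partition
            a_w = n * a_lane_bits                 # source slice width
            b_w = n * b_lane_bits                 # destination slice width
            chunk = (a >> (start * a_lane_bits)) & ((1 << a_w) - 1)
            if chunk >> (a_w - 1):                # top bit of the slice is set
                # fill the upper (b_w - a_w) bits with copies of the sign bit
                chunk |= ((1 << b_w) - 1) & ~((1 << a_w) - 1)
            out |= chunk << (start * b_lane_bits)
            start = lane + 1
    return out
```

For example, with only A7 set (`a = 0x80`), mask `0b000` gives `0xFF80` (one 16-bit partition), and mask `0b111` gives `0xE000`, matching the `[A7A7]A7A6` entry in column o3 of the last row.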