From daf639b06a8b4d2adf30b01661f1f8afea86ba00 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sat, 25 Sep 2021 21:45:28 +0100
Subject: [PATCH]

---
 3d_gpu/architecture/dynamic_simd/assign.mdwn | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/3d_gpu/architecture/dynamic_simd/assign.mdwn b/3d_gpu/architecture/dynamic_simd/assign.mdwn
index df5b08c0c..9934cee4e 100644
--- a/3d_gpu/architecture/dynamic_simd/assign.mdwn
+++ b/3d_gpu/architecture/dynamic_simd/assign.mdwn
@@ -74,3 +74,20 @@ the whole of the target are copied. In this example, B is 8 bits so only
 When the partitions are all closed (4x SIMD) each partition of B is 2
 bits wide, therefore only the *first two* bits of A are copied into
 *each* of the four 2-bit partitions in B.
+
+For the case where A is shorter than the output B, sign or zero extension
+is required. Here we assume A is 8 bits and B is 16 bits. This is similar
+to the parallel case, except that for sign extension the top bit of each
+partition of A is repeated (broadcast) across the rest of the matching
+partition of B, shown in [brackets]. For zero extension these bits are 0.
+
+| partition | o3 | o2 | o1 | o0 |
+| --------- | -- | -- | -- | -- |
+| 000 | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4 | A3A2A1A0 |
+| 001 | [A7A7A7A7] | [A7A7]A7A6 | A5A4A3A2 | [A1A1]A1A0 |
+| 010 | [A7A7A7A7] | A7A6A5A4 | [A3A3A3A3] | A3A2A1A0 |
+| 011 | [A7A7A7A7] | A7A6A5A4 | [A3A3]A3A2 | [A1A1]A1A0 |
+| 100 | [A7A7]A7A6 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0 |
+| 101 | [A7A7]A7A6 | [A5A5A5A5] | A5A4A3A2 | [A1A1]A1A0 |
+| 110 | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3A3A3] | A3A2A1A0 |
+| 111 | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3]A3A2 | [A1A1]A1A0 |
-- 
2.30.2
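The sign/zero extension table in this patch can be reproduced with a small behavioural model. What follows is a minimal Python sketch, not the Libre-SOC nmigen implementation: the helper names (`lanes`, `render`, `extend_symbolic`, `extend_value`) are hypothetical, and it assumes that bit i of the 3-bit partition mask opens a lane boundary between o(i) and o(i+1), which is how the table reads (001 splits o0 from o1, 100 splits o2 from o3).

```python
from typing import List, Tuple

# Widths from the example: A is 8 bits, B is 16 bits, and both are split
# into four equal chunks (o0..o3 for B, 2-bit chunks for A).
A_WIDTH = 8
B_WIDTH = 16
N_CHUNKS = 4
A_CHUNK = A_WIDTH // N_CHUNKS  # 2 bits of A per chunk
B_CHUNK = B_WIDTH // N_CHUNKS  # 4 bits of B per chunk


def lanes(partition: int) -> List[List[int]]:
    """Group chunk indices 0..3 into SIMD lanes.

    Assumption: bit i of `partition` set means a lane boundary
    between chunk i and chunk i+1 (bit 0 splits o0 from o1).
    """
    result = [[0]]
    for i in range(1, N_CHUNKS):
        if (partition >> (i - 1)) & 1:
            result.append([i])      # start a new lane
        else:
            result[-1].append(i)    # extend the current lane
    return result


def render(bits: List[Tuple[str, bool]]) -> str:
    """Format one 4-bit chunk MSB first, bracketing the extension bits."""
    msb_first = bits[::-1]
    ext = "".join(name for name, is_ext in msb_first if is_ext)
    real = "".join(name for name, is_ext in msb_first if not is_ext)
    # Extension bits always sit above the copied bits within a chunk.
    return (f"[{ext}]" if ext else "") + real


def extend_symbolic(partition: int) -> List[str]:
    """Return the symbolic contents of o0..o3 for sign extension."""
    out = [""] * N_CHUNKS
    for lane in lanes(partition):
        lo, hi = lane[0], lane[-1]
        # Bits of A that feed this lane, LSB first.
        src = list(range(lo * A_CHUNK, (hi + 1) * A_CHUNK))
        sign = src[-1]
        # Copy the lane's bits of A, then repeat its top (sign) bit.
        dst = [(f"A{b}", False) for b in src]
        dst += [(f"A{sign}", True)] * (len(lane) * B_CHUNK - len(src))
        for k, chunk in enumerate(lane):
            out[chunk] = render(dst[k * B_CHUNK:(k + 1) * B_CHUNK])
    return out


def extend_value(a: int, partition: int, signed: bool = True) -> int:
    """Numerically sign- or zero-extend each lane of 8-bit `a` into 16 bits."""
    b = 0
    for lane in lanes(partition):
        lo = lane[0]
        src_w = len(lane) * A_CHUNK
        dst_w = len(lane) * B_CHUNK
        val = (a >> (lo * A_CHUNK)) & ((1 << src_w) - 1)
        if signed and (val >> (src_w - 1)) & 1:
            # Fill bits src_w..dst_w-1 with ones (the replicated sign bit).
            val |= ((1 << dst_w) - 1) & ~((1 << src_w) - 1)
        b |= val << (lo * B_CHUNK)
    return b


if __name__ == "__main__":
    # Print the table rows in the same o3..o0 column order.
    for p in range(1 << (N_CHUNKS - 1)):
        print(f"{p:03b}", " | ".join(reversed(extend_symbolic(p))))
```

Running it prints the eight rows in o3..o0 order for comparison with the table. Numerically, `extend_value(0x8F, 0b010)` returns `0xF8FF`: two 4-bit lanes of A (`0xF` and `0x8`), each sign-extended into an 8-bit lane of B; with `signed=False` the bracketed positions are filled with zeros instead.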