From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Fri, 7 Feb 2020 17:04:27 +0000 (+0000)
Subject: continue dynamic shift doc
X-Git-Tag: convert-csv-opcode-to-binary~3523
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=7f3da47fdb5ed3359ba4260cbdc323446a705272;p=libreriscv.git

continue dynamic shift doc
---

diff --git a/3d_gpu/architecture/dynamic_simd/shift.mdwn b/3d_gpu/architecture/dynamic_simd/shift.mdwn
index 83ca7d399..bd97bcbf2 100644
--- a/3d_gpu/architecture/dynamic_simd/shift.mdwn
+++ b/3d_gpu/architecture/dynamic_simd/shift.mdwn
@@ -9,4 +9,26 @@ Partitioned Shifting will also require to have an NxN matrix, however it is slig
 
 then, we compute the following matrix:
 
-    a0 << b0    a1 << b0    a2 << b0   a3 << b0
+    | a0 << b0 | a1 << b0 | a2 << b0 | a3 << b0
+    | a0 << b1 | a1 << b1 | a2 << b1 | a3 << b1
+    | a0 << b2 | a1 << b2 | a2 << b2 | a3 << b2
+    | a0 << b3 | a1 << b3 | a2 << b3 | a3 << b3
+
+Where multiply would perform a cascading-add across those partial results,
+shift is different in that we *know* (assume) that for each shift-amount
+(operand b), within each partition the topmost bits are **zero**.
+
+This is because, in a typical 64-bit shift, the operation is actually:
+
+    result[63..0] = a[63..0] << b[5..0]
+
+**NOT** b[63..0], i.e. the amount to shift a 64-bit number by is held in the
+*lower* six bits of b.  Likewise, for a 32-bit number, it is the lower five bits.
+
+Therefore, in principle, it should be possible to simply use Muxes on the
+partial-result matrix, ORing them together.  Assuming (again) a 32-bit
+input and a 4-way partition:
+
+    out0 = p00[7..0]
+    out1 = pmask[0] ? p01[7..0] : p00[15..8]
+
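
Note (not part of the patch): the partial-result matrix and the shift-amount truncation described in the added text can be modelled in plain Python. This is only a minimal sketch under stated assumptions: the helper names `split_lanes`, `shift_matrix` and `shift64` are hypothetical, the 32-bit input is assumed to split into four 8-bit lanes (matching the `[7..0]` slices above), and the Mux selection step is deliberately left out because the document has not finished specifying it.

```python
# Hypothetical software model; names are illustrative, not from the project.

def split_lanes(value, lanes=4, lane_bits=8):
    """Split an integer into `lanes` chunks of `lane_bits` bits, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(lanes)]

def shift_matrix(a, b, lanes=4, lane_bits=8):
    """Build the matrix from the doc: row j, column i holds a_i << b_j."""
    a_parts = split_lanes(a, lanes, lane_bits)
    b_parts = split_lanes(b, lanes, lane_bits)
    return [[ai << bj for ai in a_parts] for bj in b_parts]

def shift64(a, b):
    """64-bit left shift that only honours the lower six bits of b."""
    return (a << (b & 0x3F)) & ((1 << 64) - 1)

if __name__ == "__main__":
    # a lanes are 1, 2, 3, 4 and b lanes are 0, 1, 2, 3 (lane 0 first).
    for row in shift_matrix(0x04030201, 0x03020100):
        print(row)
    # Bits of b above bit 5 are ignored, so shifting by 65 acts like shifting by 1.
    print(shift64(1, 65))
```

Each row of the returned list corresponds to one shift amount `b_j` and each column to one input lane `a_i`, which is the same layout as the matrix in the patch.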