From dc210c5be7e1d46d418ff56f9c88ee1ad2bde440 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 17 Jan 2021 01:37:20 +0000
Subject: [PATCH]

---
 3d_gpu/architecture/dynamic_simd/mul.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/3d_gpu/architecture/dynamic_simd/mul.mdwn b/3d_gpu/architecture/dynamic_simd/mul.mdwn
index a269850c7..f6aba5572 100644
--- a/3d_gpu/architecture/dynamic_simd/mul.mdwn
+++ b/3d_gpu/architecture/dynamic_simd/mul.mdwn
@@ -4,7 +4,7 @@ This is complicated!

 It is necessary to compute a full NxN matrix of partial multiplication results, then perform a cascade of adds (long multipication, in binary), using PartitionedAdd, which will "automatically" break the results down into segments, at all times, keeping each partitioned result separate.

-Therefore, for a full 64 bit multiply, with 7 partitions, a matrix of 8x8 multiplications are performed, then added up in each column of the same magnitude, in exactly the same way as described by Vedic Mathematics. Ultimately it is the partitions on the adds that allows the entire multiply to be broken into SIMD pieces.
+Therefore, for a full 64 bit multiply, with 7 partitions, a matrix of 8x8 multiplications are performed, then added up in each column of the same magnitude, in exactly the same way as described by Vedic Mathematics. Ultimately it is the partitions on the adds that allows the entire multiply to be broken into SIMD pieces: the Partitioned Adders are what stops carry rolling over to "affect" the result of another partition.

 The [Wallace Tree](https://en.wikipedia.org/wiki/Wallace_tree) algorithm is presently deployed, here: we need to use the (more efficient) [Dadda algorithm](https://en.wikipedia.org/wiki/Dadda_multiplier)

-- 
2.30.2
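
The scheme the patched paragraph describes can be modelled in plain Python. This is only a software sketch under stated assumptions, not the project's hardware code: `partitioned_add`, `lane_bounds` and `partitioned_mul` are hypothetical names, the partition points are passed as a set of byte-boundary indices (0 to 6), only the low half of each lane's product is kept, cross-lane partial products are simply skipped, and the terms are accumulated one after another rather than through a Wallace or Dadda tree.

```python
def byte(x, k):
    """Return byte k (0 = least significant) of integer x."""
    return (x >> (8 * k)) & 0xFF


def partitioned_add(x, y, breaks):
    """Add two 64-bit values byte by byte.

    Boundary k sits between byte k and byte k+1. If k is in `breaks`,
    the carry out of byte k is dropped, so it cannot affect the
    neighbouring partition.
    """
    result, carry = 0, 0
    for k in range(8):
        s = byte(x, k) + byte(y, k) + carry
        result |= (s & 0xFF) << (8 * k)
        carry = 0 if k in breaks else s >> 8
    return result


def lane_bounds(breaks):
    """For each byte index 0..7, return (first, last) byte of its lane."""
    bounds, lo = [None] * 8, 0
    for k in range(8):
        if k in breaks or k == 7:
            for m in range(lo, k + 1):
                bounds[m] = (lo, k)
            lo = k + 1
    return bounds


def partitioned_mul(a, b, breaks):
    """Lane-wise multiply of two 64-bit values (low half of each product).

    Walks the 8x8 matrix of byte partial products and sums the ones
    that belong to the same lane using partitioned_add.
    """
    bounds = lane_bounds(breaks)
    acc = 0
    for i in range(8):
        for j in range(8):
            if bounds[i] != bounds[j]:
                continue  # assumption: cross-lane terms contribute nothing
            lo, hi = bounds[i]
            lane_mask = ((1 << (8 * (hi - lo + 1))) - 1) << (8 * lo)
            # Column position of a_i * b_j, measured from the lane's base.
            term = (byte(a, i) * byte(b, j)) << (8 * (i + j - lo))
            acc = partitioned_add(acc, term & lane_mask, breaks)
    return acc
```

With no breaks, `partitioned_mul(a, b, set())` behaves as one 64-bit multiply truncated to 64 bits; with `{3}` it behaves as two independent 32-bit multiplies. The carry blocking is visible in isolation: `partitioned_add(0x00FF, 0x0001, {0})` gives `0x0000`, whereas with no breaks the same add gives `0x0100`.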