From a94a093c9ea01ea19d09ebea82f5330fc28d994b Mon Sep 17 00:00:00 2001
From: lkcl
Date: Tue, 28 Sep 2021 17:18:12 +0100
Subject: [PATCH] ghostmansd: update submitted RFPs

---
 3d_gpu/architecture/dynamic_simd.mdwn | 88 ++++++++++++++++++++++++++-
 3mdeb/ghostmansd.mdwn                 | 16 ++++-
 2 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/3d_gpu/architecture/dynamic_simd.mdwn b/3d_gpu/architecture/dynamic_simd.mdwn
index 18c401ed4..5ab225d33 100644
--- a/3d_gpu/architecture/dynamic_simd.mdwn
+++ b/3d_gpu/architecture/dynamic_simd.mdwn
@@ -28,8 +28,91 @@ into all 8 possible combinations of the 3 Partition bits:
     exp-a : ....1....1....0.... 2x 8-bit, 1x 16-bit
     exp-a : ....1....1....1.... 4x 8-bit
 
-
-Links:
+A simple example, a "min" function:
+
+    # declare x, y and out as 16-bit scalar Signals
+    x = Signal(16)
+    y = Signal(16)
+    out = Signal(16)
+
+    # compare x against y and set output accordingly
+    with m.If(x < y):
+        comb += out.eq(x)
+    with m.Else():
+        comb += out.eq(y)
+
+This is very straightforward and obvious, that the 16-bit output is the
+lesser of the x and y inputs.  We require the exact same obviousness
+and under no circumstances any change of any kind to any nmigen language
+construct:
+
+    # a mask of length 3 indicates a desire to partition Signals at
+    # 3 points into 4 equally-spaced SIMD "partitions".
+    mask = Signal(3)
+    # x y and out are all 16-bit so are subdivided at:
+    # |       mask[0]     mask[1]     mask[2]        |
+    # |  0-3    |    4-7    |   8-11    |   12-15    |
+
+    x = PartitionedSignal(mask, 16)    # identical except for mask
+    y = PartitionedSignal(mask, 16)    # identical except for mask
+    out = PartitionedSignal(mask, 16)  # identical except for mask
+
+    # all code here is required to be absolutely identical to the
+    # scalar case, and identical in nmigen language behaviour in
+    # every way.
+    # no changes to the nmigen language or its use are permitted
+
+    with m.If(x < y):
+        comb += out.eq(x)
+    with m.Else():
+        comb += out.eq(y)
+
+The purpose of PartitionedSignal is therefore to provide full 100%
+transparent SIMD run-time dynamic behaviour as far as end-usage is
+concerned.
+
+The alternative is absolutely awful and completely unacceptable
+for both maintenance cost and development cost:
+
+    # declare x, y and out as 16-bit scalar Signals
+    x = Signal(16)
+    y = Signal(16)
+    out = Signal(16)
+
+    # start an absolutely awful unmaintainable duplication of
+    # SIMD behaviour.
+    with m.If(mask == 0b111): # 1x 16-bit
+        # compare x against y and set output accordingly
+        with m.If(x < y):
+            comb += out.eq(x)
+        with m.Else():
+            comb += out.eq(y)
+    with m.Elif(mask == 0b101): # 2x 8-bit
+        for i in range(2):
+            xh = x[i*8:(i+1)*8]
+            yh = y[i*8:(i+1)*8]
+            outh = out[i*8:(i+1)*8]
+            # compare halves of x against halves of y and set
+            # halves of output accordingly
+            with m.If(xh < yh):
+                comb += outh.eq(xh)
+            with m.Else():
+                comb += outh.eq(yh)
+    with m.Elif(mask == 0b000): # 4x 4-bit
+        ....
+    with m.Elif(mask == 0b100): # 1x 8-bit followed by 2x 4-bit
+        ....
+    with m.Elif(....):
+        ....
+    with m.Elif(....):
+        ....
+    with m.Elif(....):
+        ....
+
+
+
+
+# Links
 
 * m.If/Switch
 * top level SIMD
@@ -38,6 +121,7 @@ Links:
 * Formal proof of PartitionedSignal
 * Formal proof of PartitionedSignal nmigen interaction
 
+# Rationale / Introduction
 
 To save hugely on gate count the normal practice of having separate
 scalar ALUs and separate SIMD ALUs is not followed.
diff --git a/3mdeb/ghostmansd.mdwn b/3mdeb/ghostmansd.mdwn
index c3a9a942d..a25e3be41 100644
--- a/3mdeb/ghostmansd.mdwn
+++ b/3mdeb/ghostmansd.mdwn
@@ -20,10 +20,22 @@ dmitry.selyutin@3mdeb.com
 - EUR 325 dmitry
 - EUR 75 maciej
 
-## Done
+## Submitted RFPs
 
+- First Steps documentation page
+    - 5/9/21
+    - EUR 100
+    - 50:50 dmitry/maciej
 - BCD instructions unit tests
+    - 5/9/21
+    - EUR 150
+    - 50:50 dmitry/maciej
 - BCD instructions implementation
-- First Steps documentation page
+    - 5/9/21
+    - EUR 125
+    - 50:50 dmitry/maciej
+
+## Done
+
 - not cherry-picking popcntw XLEN or cnttz XLEN
 - bpermd XLEN update needs refinement
-- 
2.30.2