From 2b70697a2cf9ce1a870bf5bf965b5cd3325d3f33 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Fri, 1 Jun 2018 23:18:47 +0100
Subject: [PATCH] add rogier brussee's parallelism extension which uses bitmaps

---
 bitmap_parallelism_extension.mdwn | 64 +++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 bitmap_parallelism_extension.mdwn

diff --git a/bitmap_parallelism_extension.mdwn b/bitmap_parallelism_extension.mdwn
new file mode 100644
index 000000000..8880d4b32
--- /dev/null
+++ b/bitmap_parallelism_extension.mdwn
@@ -0,0 +1,64 @@
+# Parallelism using Bitmaps
+
+If you think about it this way, you can combine setvl, predication,
+and indeed vector length, by always working with bitmaps.
+
+So: you have 32 WARL CSRs, called X0, ... X31 (or perhaps 2 banks of
+32 CSRs, with a set of additional CSRs FX0, ... FX31).
+
+Each contains a bitmap of length 32 (assuming we only have the standard
+registers).
+
+By default, X0 contains 1<<0, X1 contains 1<<1, X2 contains 1<<2, ...
+
+Now an instruction like
+
+    add x1, x2, x3
+
+is reinterpreted as referring to the CSRs rather than individual
+registers, i.e. under Simple-V it means
+
+    add X1, X2, X3
+
+and it has the following semantics:
+
+    let rds  = registers in bitmap X1, in order of register number
+    let rs1s = registers in bitmap X2, repeated periodically in order of register number to the length of rds
+    let rs2s = registers in bitmap X3, repeated periodically in order of register number to the length of rds
+
+    parallelfor (rd, rs1, rs2) in (rds[i], rs1s[i], rs2s[i]) where i = 0 to length(rds) - 1
+        add rd, rs1, rs2
+
+
+example:
+
+    X1 <- 0b111110    # bits 1..5 set: x1, x2, x3, x4, x5
+    X2 <- 0b001101    # bits 0, 2, 3 set: x0, x2, x3
+    X3 <- 0b001000    # bit 3 set: x3
+
+then
+
+    rds  = [x1, x2, x3, x4, x5]
+    rs1s = [x0, x2, x3, x0, x2]
+    rs2s = [x3, x3, x3, x3, x3]
+
+and
+
+    add X1, X2, X3
+
+is interpreted as
+
+    parallel {
+        add x1, x0, x3
+        add x2, x2, x3
+        add x3, x3, x3
+        add x4, x0, x3    # x2 and x3 have their original values!
+        add x5, x2, x3    # x2 and x3 have their original values!
+    }
+
+This means that the analogue of setvl is simply the "write any" of
+setting the bitmap, and the analogue of the return value of setvl is
+the "read legal" of the CSR. Moreover, popc of the destination bitmap
+tells you how many operations are scheduled in parallel, so you know
+how often you have to repeat a sequential loop.
+
-- 
2.30.2
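
Reviewer's sketch (not part of the patch): the semantics described in the page can be modelled in a few lines of Python. This is only an illustration under stated assumptions. The function names `regs_in_bitmap`, `expand`, `parallel_add` and `popc` are hypothetical, bit n of a bitmap is taken to stand for register xn (following the `1<<n` defaults), register values are unbounded Python integers rather than fixed-width words, and the page does not say what an empty source bitmap should do, so this sketch simply rejects it.

```python
from itertools import cycle, islice

NUM_REGS = 32  # assumption: only the standard 32 integer registers


def regs_in_bitmap(bitmap):
    """Register numbers whose bit is set, in ascending register order."""
    return [n for n in range(NUM_REGS) if (bitmap >> n) & 1]


def popc(bitmap):
    """Population count: number of operations a destination bitmap schedules."""
    return bin(bitmap & ((1 << NUM_REGS) - 1)).count("1")


def expand(rd_map, rs1_map, rs2_map):
    """Expand three bitmaps into a list of (rd, rs1, rs2) register triples."""
    rds = regs_in_bitmap(rd_map)
    rs1_regs = regs_in_bitmap(rs1_map)
    rs2_regs = regs_in_bitmap(rs2_map)
    if rds and not (rs1_regs and rs2_regs):
        # Not specified in the page; rejecting is this sketch's own choice.
        raise ValueError("empty source bitmap")
    # Sources repeat periodically until they match the destination count.
    rs1s = list(islice(cycle(rs1_regs), len(rds)))
    rs2s = list(islice(cycle(rs2_regs), len(rds)))
    return list(zip(rds, rs1s, rs2s))


def parallel_add(regfile, rd_map, rs1_map, rs2_map):
    """Return a new register file after a bitmap-addressed parallel add."""
    ops = expand(rd_map, rs1_map, rs2_map)
    # Read every source first so all lanes see the original values.
    results = [(rd, regfile[rs1] + regfile[rs2]) for rd, rs1, rs2 in ops]
    new_regs = list(regfile)
    for rd, value in results:
        new_regs[rd] = value
    return new_regs


if __name__ == "__main__":
    X1, X2, X3 = 0b111110, 0b001101, 0b001000
    print(expand(X1, X2, X3))
    print(popc(X1))
```

With destinations x1..x5, sources x0, x2, x3 and second source x3, `expand` yields the five triples listed in the page's example, and `popc(X1)` is 5. Collecting all results before writing any of them is what models the "x2 and x3 have their original values" remark: a sequential loop would instead feed the updated x2 and x3 into the last two adds.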