From 95e64e71f200b7b882237d746e13495922cc0d46 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Wed, 11 Apr 2018 14:00:32 +0100
Subject: [PATCH] add auto-increment idea

---
 simple_v_extension.mdwn | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn
index 3116fb9a3..14cdc6a30 100644
--- a/simple_v_extension.mdwn
+++ b/simple_v_extension.mdwn
@@ -661,6 +661,46 @@ existing non-Simple-V implementation.  i say that despite really *really*
 wanting IEEE 704 FP Half-precision to end up somewhere in RISC-V in some
 fashion, for optimising 3D Graphics.  *sigh*.
 
+## TODO: analyse, auto-increment on unit-stride and constant-stride
+
+so i thought about that for a day or so, and wondered if it would be
+possible to propose a variant of zero-overhead loop that included
+auto-incrementing the two address registers a2 and a3, as well as
+providing a means to interact between the zero-overhead loop and the
+vsetvl instruction. a sort-of pseudo-assembly of that would look like:
+
+> # a2 to be auto-incremented by t0*4
+> zero-overhead-set-auto-increment a2, t0, 4
+> # a2 to be auto-incremented by t0*4
+> zero-overhead-set-auto-increment a3, t0, 4
+> zero-overhead-set-loop-terminator-condition a0 zero
+> zero-overhead-set-start-end stripmine, stripmine+endoffset
+> stripmine:
+> vsetvl t0,a0
+> vlw v0, a2
+> vlw v1, a3
+> vfma v1, a1, v0, v1
+> vsw v1, a3
+> sub a0, a0, t0
+>stripmine+endoffset:
+
+the question is: would something like this even be desirable? it's a
+variant of auto-increment [1]. last time i saw any hint of auto-increment
+register opcodes was in the 1980s... 68000 if i recall correctly... yep
+see [1]
+
+[1] http://fourier.eng.hmc.edu/e85_old/lectures/instruction/node6.html
+
+Reply:
+
+Another option for auto-increment is for vector-memory-access instructions
+to support post-increment addressing for unit-stride and constant-stride
+modes. This can be implemented by the scalar unit passing the operation
+to the vector unit while itself executing an appropriate multiply-and-add
+to produce the incremented address. This does *not* require additional
+ports on the scalar register file, unlike scalar post-increment addressing
+modes.
+
 ## TODO: instructions (based on Hwacha) V-Ext duplication analysis
 
 This is partly speculative due to lack of access to an up-to-date
-- 
2.30.2
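
Review note: the strip-mined loop proposed in this patch can be modelled in software to check what the auto-increment is expected to do. The Python below is only a rough sketch and not part of the proposal. `MAXVL`, `vsetvl()` and `stripmine_axpy()` are hypothetical names made up for illustration. It also assumes that the auto-increment is applied once per pass, at the end of the loop body, and that `vfma v1, a1, v0, v1` computes `v1 = a1*v0 + v1`.

```python
MAXVL = 4  # assumed hardware vector length in elements (hypothetical value)


def vsetvl(remaining, maxvl=MAXVL):
    """Model of vsetvl: grant at most maxvl of the remaining elements."""
    return min(remaining, maxvl)


def stripmine_axpy(a1, x, y, elem_bytes=4, maxvl=MAXVL):
    """Update y in place with a1*x + y, one strip of up to maxvl elements at a time.

    a2 and a3 are byte offsets standing in for the two address registers.
    Returns their final values so the auto-increment can be inspected.
    """
    a2 = 0       # byte offset into x
    a3 = 0       # byte offset into y
    a0 = len(x)  # elements still to process
    while a0 != 0:  # assumed loop-terminator condition: stop when a0 is zero
        t0 = vsetvl(a0, maxvl)                        # vsetvl t0, a0
        i, j = a2 // elem_bytes, a3 // elem_bytes
        v0 = x[i:i + t0]                              # vlw v0, a2
        v1 = y[j:j + t0]                              # vlw v1, a3
        v1 = [a1 * p + q for p, q in zip(v0, v1)]     # vfma v1, a1, v0, v1
        y[j:j + t0] = v1                              # vsw v1, a3
        a0 -= t0                                      # sub a0, a0, t0
        a2 += t0 * elem_bytes  # assumed auto-increment of a2 by t0*4
        a3 += t0 * elem_bytes  # assumed auto-increment of a3 by t0*4
    return a2, a3
```

With ten elements and `MAXVL = 4` the model runs three passes of 4, 4 and 2 elements. The `t0 * elem_bytes` update is the same multiply-and-add that the Reply suggests the scalar unit could perform for post-increment addressing.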