From: Luke Kenneth Casson Leighton
Date: Sat, 17 Jul 2021 22:47:47 +0000 (+0100)
Subject: add abstract / proposal for openpower2021
X-Git-Tag: DRAFT_SVP64_0_1~610
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=0ead93ebd8b4c02e54cc09c77daaf83e719108d0;p=libreriscv.git

add abstract / proposal for openpower2021
---

diff --git a/conferences/openpower2021.mdwn b/conferences/openpower2021.mdwn
index 8e404afa6..e764a80fe 100644
--- a/conferences/openpower2021.mdwn
+++ b/conferences/openpower2021.mdwn
@@ -3,3 +3,35 @@ Links
 *
+*
+
+# Abstract
+
+*Draft SVP64 in-place Matrix Multiply and FFT / DCT for OpenPOWER*
+
+Advanced Cray-style Vectors are being developed for the Power ISA, as a
+Draft Extension for submission to the new OpenPOWER ISA Working Group,
+named SVP64. Whilst in-place Matrix Multiply was planned for a much
+later advanced version of SVP64, an investigation into putting FFMPEG's
+MP3 CODEC inner loop into Vectorised Assembler resulted in such a large
+drop in code size (over 4x reduction) that it warranted priority
+investigation.
+
+Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT)
+and Number-Theory Transform (NTT) form the basis of too numerous
+high-priority algorithms to count. Normal SIMD Processors and even
+normal Vector Processors have a hard time dealing with them: inspecting
+FFMPEG's source code reveals that heavily optimised inline assembler (no
+loops, just hundreds to thousands of lines of assembler) is not uncommon.
+
+The focus of this NLnet-sponsored research is therefore to create enhancements
+to SVP64 to be able to cover DFT, DCT, NTT and Matrix-Multiply entirely
+in-place. In-place is crucially important for many applications (3D, Video)
+to keep power consumption down by avoiding register spill as well as L1/L2
+cache strip-mining. General-purpose RADIX-2 DCT and complex DFT will be
+shown and explained, as well as the in-place Matrix Multiply which does
+not require transposing or register spill for any sized Matrices up to
+128 FMACs. The basics of SVP64, covered in the Overview [1], will also
+be briefly described.
+
+[1] https://libre-soc.org/openpower/sv/overview/