# OpenPOWER Summit 2021

Links

* <https://cfp.openpower.foundation/summit2021/cfp>
* <https://cfp.openpower.foundation/summit2021/talk/review/CA7XEWT9ZKMJ3D7NRXXEK9SYPXBAHPCD>

# Abstract

*Draft SVP64 in-place Matrix Multiply and FFT / DCT for OpenPOWER*

Advanced Cray-style Vectors, named SVP64, are being developed for the
Power ISA as a Draft Extension for submission to the new OpenPOWER ISA
Working Group.  Whilst in-place Matrix Multiply was planned for a much
later advanced version of SVP64, an investigation into putting FFMPEG's
MP3 CODEC inner loop into Vectorised Assembler resulted in such a large
drop in code size (over 4x reduction) that this area warranted priority
investigation.

Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT)
and Number-Theoretic Transform (NTT) form the basis of high-priority
algorithms too numerous to count.  Normal SIMD Processors and even
normal Vector Processors have a hard time dealing with them: inspecting
FFMPEG's source code reveals that heavily optimised inline assembler (no
loops, just hundreds to thousands of lines of assembler) is not uncommon.

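To make the shape of these algorithms concrete, the following is a minimal
sketch of a textbook iterative radix-2 FFT that overwrites its input.  It is
plain Python, not the SVP64 schedule and not FFMPEG's code; the name
`fft_inplace` is hypothetical and chosen for illustration only.  The point is
the structure: a bit-reversal permutation followed by three nested butterfly
loops, which is the pattern that tends to get unrolled by hand.

```python
import cmath


def fft_inplace(x):
    """Textbook iterative radix-2 FFT (illustrative sketch), overwriting x."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    # Bit-reversal permutation: reorder the inputs so that the butterflies
    # below can write their results back to the slots they read from.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # log2(n) stages of butterflies; each stage doubles the block size.
    size = 2
    while size <= n:
        half = size // 2
        step = -2j * cmath.pi / size
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(step * k)          # twiddle factor
                a = x[start + k]
                b = x[start + k + half] * w
                x[start + k] = a + b             # results overwrite inputs
                x[start + k + half] = a - b
        size *= 2
    return x
```

Calling `fft_inplace` on a list whose length is a power of two transforms it
without allocating a second buffer; a unit impulse such as `[1, 0, 0, 0]`
comes back with every element equal to one.
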
The focus of this NLnet-sponsored research is therefore to create enhancements
to SVP64 that cover DFT, DCT, NTT and Matrix-Multiply entirely in-place.
In-place operation is crucially important for many applications (3D, Video):
it keeps power consumption down by avoiding register spill as well as L1/L2
cache strip-mining.  General-purpose RADIX-2 DCT and complex DFT will be
shown and explained, as well as the in-place Matrix Multiply, which requires
neither transposing nor register spill for Matrices of any size
(including non-power-of-two) up to 128 FMACs.  The basics of SVP64, covered
in the Overview [1], will also be briefly described.

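As a rough illustration of what "no transposing, any size up to 128 FMACs"
means in terms of indexing, the sketch below walks a Matrix Multiply as one
flat sequence of multiply-accumulates over row-major arrays.  It is ordinary
Python index arithmetic and makes no claim about how SVP64 generates its
offsets; `matmul_schedule`, `matmul_flat` and `MAX_FMACS` are hypothetical
names, and the limit of 128 is simply the figure quoted above.

```python
MAX_FMACS = 128  # figure taken from the abstract; assumed here as a hard cap


def matmul_schedule(rows, inner, cols):
    """Yield (c_idx, a_idx, b_idx) flat offsets, one triple per FMAC."""
    total = rows * inner * cols
    if total > MAX_FMACS:
        raise ValueError(f"{total} FMACs exceeds the limit of {MAX_FMACS}")
    for step in range(total):
        i, rest = divmod(step, inner * cols)   # row of A and of C
        k, j = divmod(rest, cols)              # inner index, column of B and C
        # B is read in its stored row-major order, so no transpose is needed.
        yield i * cols + j, i * inner + k, k * cols + j


def matmul_flat(a, b, rows, inner, cols):
    """Multiply row-major A (rows x inner) by B (inner x cols)."""
    c = [0.0] * (rows * cols)
    for c_idx, a_idx, b_idx in matmul_schedule(rows, inner, cols):
        c[c_idx] += a[a_idx] * b[b_idx]        # one fused multiply-accumulate
    return c
```

A 3x4 by 4x5 product, for example, needs 3 * 4 * 5 = 60 multiply-accumulates
and so fits under the assumed cap, even though none of its dimensions is a
power of two.
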
[1] <https://libre-soc.org/openpower/sv/overview/>