From 986e7d007d0a0e603767a40d3797d27cd6627353 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 6 Feb 2024 13:51:53 +0000
Subject: [PATCH] bug 676: tidyup and add svp64 assembler to cookbook page

---
 openpower/sv/cookbook/fortran_maxloc.mdwn | 61 ++++++++++++++++-------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/openpower/sv/cookbook/fortran_maxloc.mdwn b/openpower/sv/cookbook/fortran_maxloc.mdwn
index b06e34b4f..367d0d12a 100644
--- a/openpower/sv/cookbook/fortran_maxloc.mdwn
+++ b/openpower/sv/cookbook/fortran_maxloc.mdwn
@@ -1,26 +1,28 @@
 # Fortran MAXLOC SVP64 demo
+
+
 MAXLOC is a notoriously difficult function for SIMD to cope with.
-SVP64 however has similar capabilities to Z80 CPIR and LDIR
+Typical approaches are to perform leaf-node (depth-first) parallel
+operations, merging the results mapreduce-style to gauge a final
+index.
 
-
+SVP64 however has similar capabilities to Z80 CPIR and LDIR and
+therefore hardware may transparently implement back-end parallel
+operations whilst the developer programs in a simple sequential
+algorithm.
+
+A clear reference implementation of MAXLOC is as follows:
 
 ```
-int m2(int * const restrict a, int n)
-{
-  int m, nm;
-  int i;
-
-  m = INT_MIN;
-  nm = -1;
-  for (i=0; i<n; i++)
-  {
-    if (a[i] > m)
-    {
-      m = a[i];
-      nm = i;
-    }
-  }
+int maxloc(int * const restrict a, int n) {
+    int m = INT_MIN, nm = 0;
+    for (int i=0; i<n; i++) {
+        if (a[i] > m) {
+            m = a[i];
+            nm = i;
+        }
+    }
     return nm;
 }
 ```
@@ -87,6 +89,31 @@ search seems to be a common technique.
 
+# Implementation in SVP64 Assembler
+
+The core algorithm (inner part, in-register) is below: 11 instructions.
+Loading of data, and offsetting the "core" below is relatively
+straightforward: estimated another 6 instructions and needing one
+more branch (outer loop).
+
+```
+# while (im):
+sv.minmax./ff=le/m=ge 4,*10,4,1 # uses r4 as accumulator
+crternlogi 0,1,2,127            # test greater/equal or VL=0
+sv.crand *19,*16,0              # clear if CR0.eq=0
+# nm = i (count masked bits. could use crweirds here TODO)
+sv.svstep/mr/m=so 1, 0, 6, 1    # svstep: get vector dststep
+sv.creqv *16,*16,*16            # set mask on already-tested
+bc 12,0, -0x40                  # CR0 lt bit clear, branch back
+```
+
 [[!tag svp64_cookbook ]]
-- 
2.30.2
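
Note on the reference loop in the hunk above: a minimal, compilable sketch of the same sequential algorithm, together with the "leaf-node then merge" approach that the new prose describes for SIMD. This is an illustration only, not the page's SVP64 implementation. The name `maxloc_chunked` and its `chunk` parameter are hypothetical, and initialising `m` and `nm` in separate declarators is an assumption about the intent of the combined initialiser in the patch.

```c
#include <assert.h>
#include <limits.h>

/* Sequential reference: index of the first maximum element.
 * Returns 0 when n == 0, matching the nm = 0 initial value. */
int maxloc(const int *restrict a, int n) {
    int m = INT_MIN, nm = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > m) {   /* strict '>' keeps the first occurrence */
            m = a[i];
            nm = i;
        }
    }
    return nm;
}

/* Hypothetical chunked variant: find a local maxloc per chunk
 * (the "leaf-node" step), then merge the per-chunk winners in
 * ascending order so that ties still resolve to the lowest index. */
int maxloc_chunked(const int *restrict a, int n, int chunk) {
    assert(chunk > 0);
    int best = INT_MIN, best_idx = 0;
    for (int base = 0; base < n; base += chunk) {
        int len = (n - base < chunk) ? n - base : chunk;
        int local = base + maxloc(a + base, len);
        if (a[local] > best) {
            best = a[local];
            best_idx = local;
        }
    }
    return best_idx;
}
```

Both functions return the same index for the same input. The chunked form only makes explicit the merge step that a SIMD implementation has to perform by hand, which is the step the patch text says SVP64 hardware may handle transparently.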