From: lkcl
Date: Wed, 20 Apr 2022 15:59:06 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2678
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=920b9ab21509944e33b9f3e96e21369a499bcaee;p=libreriscv.git

---

diff --git a/openpower/sv/biginteger.mdwn b/openpower/sv/biginteger.mdwn
index bb3b9b9e1..1a26cfafb 100644
--- a/openpower/sv/biginteger.mdwn
+++ b/openpower/sv/biginteger.mdwn
@@ -27,14 +27,14 @@ whilst multiply and divide are O(N^2).
 ## Add and Subtract
 
 Surprisingly, no new additional instructions are required to perform
-a straightforward big-integer add or subtract.  Vectorised `addeo`
+a straightforward big-integer add or subtract.  Vectorised `adde`
 or `addex` is perfectly sufficient to produce arbitrary-length
 big-integer add due to the rules set in SVP64 that all Vector Operations
 are directly equivalent to the strict Program Order Execution of
 their element-level operations.
 
-Thus, due to sequential execution of `addeo` both consuming and producing
-a CA Flag, `sv.addeo` is in effect an alias for Vectorised add.  As such,
+Thus, due to sequential execution of `adde` both consuming and producing
+a CA Flag, `sv.adde` is in effect an alias for Vectorised add.  As such,
 implementors are entirely at liberty to recognise Horizontal-First Vector
 adds and send the vector of registers to a much larger and wider back-end ALU.
 
@@ -96,19 +96,19 @@ as noted by Intel in their notes on mulx,
 RA*RB+RC+RD cannot overflow, so does not require
 setting an additional CA flag.

-Combined with a Vectorised big-int `sv.addeo` the key inner loop of
+Combined with a Vectorised big-int `sv.adde` the key inner loop of
 Knuth's Algorithm M may be achieved in four instructions, two of which
 are scalar initialisation:
 
     li r16, 0 # zero accumulator
     addic r16, r16, 0 # CA to zero as well
     sv.madde r0.v, r8.v, r17, r16 # mul vector
-    sv.addeo r24.v, r24.v, r0.v # big-add row to result
+    sv.adde r24.v, r24.v, r0.v # big-add row to result
 
 Normally, in a Scalar ISA, the use of a register as both a source
 and destination like this would create costly Dependency Hazards, so
 such an instruction would never be proposed.  However: it turns out
-that, just as with repeated chained application of `addeo`, macro-op
+that, just as with repeated chained application of `adde`, macro-op
 fusion may be internally applied to a sequence of these strange multiply
 operations.  Such a trick works equally as well in Scalar-only.
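
The effect of the four-instruction inner loop in the patched text can be modelled in software. The following is only a rough Python sketch, not the SVP64 specification. It assumes 64-bit elements. It assumes `sv.madde` writes the low half of `a[i] * scalar + hi` to each destination element and passes the high half on to the next element through the scalar register. It assumes `sv.adde` chains the CA flag from element to element in strict program order. All function names (`vec_madde`, `vec_adde`, `mul_row_accumulate`, `to_limbs`, `from_limbs`) are hypothetical helpers invented for illustration.

```python
MASK = (1 << 64) - 1  # assumed 64-bit element width


def vec_adde(a, b, ca=0):
    """Model of a vectorised add-with-carry: each element consumes and
    produces the carry flag, in strict element (program) order."""
    out = []
    for x, y in zip(a, b):
        s = x + y + ca
        out.append(s & MASK)
        ca = s >> 64
    return out, ca


def vec_madde(a, scalar, hi=0):
    """Assumed model of the multiply-and-add: the low half of
    a[i]*scalar + hi is the result element, the high half is carried
    into the next element. The sum cannot exceed 128 bits."""
    out = []
    for x in a:
        p = x * scalar + hi
        out.append(p & MASK)
        hi = p >> 64
    return out, hi


def mul_row_accumulate(result, a, scalar):
    """One row of schoolbook multiplication (inner loop of Algorithm M):
    multiply vector a by one scalar limb, then big-add into result."""
    row, hi = vec_madde(a, scalar, 0)      # li + sv.madde
    summed, ca = vec_adde(result, row, 0)  # addic (CA=0) + sv.adde
    return summed, hi, ca


def to_limbs(n, count):
    """Split a non-negative integer into little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Rebuild an integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

In this model the leftover high limb `hi` and the final carry `ca` both belong one limb position above the row, so a caller would need to fold them into the next limb of the result. The four instructions shown in the patch do not perform that step themselves.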