(no commit message)

diff --git a/openpower/sv/biginteger.mdwn b/openpower/sv/biginteger.mdwn
index 266924467edff09fbeff32480080836475aebb52..c84971b82731f7fadf7de77a34abcbef480c43df 100644
--- a/openpower/sv/biginteger.mdwn
+++ b/openpower/sv/biginteger.mdwn
@@ -6,6 +6,7 @@
  
  * [[discussion]] page for notes
  * <https://bugs.libre-soc.org/show_bug.cgi?id=817> bugreport
+* <https://bugs.libre-soc.org/show_bug.cgi?id=937> 128/64 shifts
  * [[biginteger/analysis]]
  * [[openpower/isa/svfixedarith]] pseudocode
  
@@ -28,32 +29,78 @@ Dynamic SIMD ALUs for maximum performance and effectiveness.
  Covered in [[biginteger/analysis]] the summary is that standard `adde`
  is sufficient for SVP64 Vectorisation of big-integer addition (and `subfe`
  for subtraction) but that big-integer shift, multiply and divide require an
-extra 3-in 2-out instructions, similar to Intel's `shld`, `shrd`,
-`mulx` and `idiv`, to be efficient.
-The same instruction (`maddedu`) is used for both because 'maddedu''s primary
+extra 3-in 2-out instructions, similar to Intel's
+[shld](https://www.felixcloutier.com/x86/shld)
+and [shrd](https://www.felixcloutier.com/x86/shrd),
+[mulx](https://www.felixcloutier.com/x86/mulx) and
+[divq](https://www.felixcloutier.com/x86/div),
+to be efficient.
+The same instruction (`maddedu`) is used in both
+big-divide and big-multiply because `maddedu`'s primary
  purpose is to perform a fused 64-bit scalar multiply with a large vector,
  where that result is Big-Added for Big-Multiply, but Big-Subtracted for
  Big-Divide.
  
+Chaining the operations together gives Scalar-by-Vector 
+operations, except for `sv.adde` and `sv.subfe` which are
+Vector-by-Vector Chainable (through the `CA` flag).
  Macro-op Fusion and back-end massively-wide SIMD ALUs may be deployed in a
  fashion that is hidden from the user, behind a consistent, stable ISA API.
  The same macro-op fusion may theoretically be deployed even on Scalar
  operations.
  
-# dsld and dsrd
+# **DRAFT** dsld
  
-**DRAFT**
+|0.....5|6..10|11..15|16..20|21.25|26..30|31|
+|-------|-----|------|------|-----|------|--|
+| EXT04 | RT  |  RA  |  RB  | RC  |  XO  |Rc|
+
+VA2-Form
+
+* dsld    RT,RA,RB,RC  (Rc=0)
+* dsld.   RT,RA,RB,RC  (Rc=1)
+
+Pseudo-code:
+
+    n <- (RB)[58:63]
+    v <- ROTL64((RA), n)
+    mask <- MASK(0, 63-n)
+    RT <- (v[0:63] & mask) | ((RC) & ¬mask)
+    RS <- v[0:63] & ¬mask
+    overflow = 0
+    if RS != [0]*64:
+        overflow = 1
+
+Special Registers Altered:
+
+    CR0                    (if Rc=1)
  
-`dsld` and `dsrd` are is similar to v3.0 `sld`, and
-is Z23-Form in "overwrite" on RT.
+# **DRAFT** dsrd
  
-|0.....5|6..10|11..15|16..20|21.22|23..30|31|
+|0.....5|6..10|11..15|16..20|21.25|26..30|31|
  |-------|-----|------|------|-----|------|--|
-| EXT04 | RT  |  RA  |  RB  | sm  |  XO  |Rc|
+| EXT04 | RT  |  RA  |  RB  | RC  |  XO  |Rc|
+
+VA2-Form
+
+* dsrd    RT,RA,RB,RC  (Rc=0)
+* dsrd.   RT,RA,RB,RC  (Rc=1)
+
+Pseudo-code:
+
+    n <- (RB)[58:63]
+    v <- ROTL64((RA), 64-n)
+    mask <- MASK(n, 63)
+    RT <- (v[0:63] & mask) | ((RC) & ¬mask)
+    RS <- v[0:63] & ¬mask
+    overflow = 0
+    if RS != [0]*64:
+        overflow = 1
+
+Special Registers Altered:
+
+    CR0                    (if Rc=1)
  
-Both instructions take two 64-bit sources, concatenate
-them together then extract 64 bits from it, the offset
-location determined by a third source.
  
  # maddedu
  
@@ -79,8 +126,12 @@ to it; the lower half of that result stored in RT and the upper half
  in RS.
  
  The differences here to `maddhdu` are that `maddhdu` stores the upper
-half in RT, where `maddedu` stores the upper half in RS. There is no
-equivalent to `maddld` because `maddld` performs sign-extension on RC.
+half in RT, where `maddedu` stores the upper half in RS.
+
+The value stored in RT is exactly equivalent to `maddld` despite `maddld`
+performing sign-extension on RC, because RT is the full mathematical result
+modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
+results modulo 2^64. This is why there is no `maddldu` instruction.
  
  *Programmer's Note:
  As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
@@ -130,7 +181,7 @@ that is near-identical to `divdeu` except that:
  
  RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
  division, producing a (pair) of 64 bit result(s), in the same way that
-Intel [idiv](https://www.felixcloutier.com/x86/idiv) works.
+Intel [divq](https://www.felixcloutier.com/x86/div) works.
  Overflow conditions
  are detected in exactly the same fashion as `divdeu`, except that rather
  than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
@@ -139,7 +190,7 @@ zeros on overflow.
  *Programmer's note: there are no Rc variants of any of these VA-Form
  instructions. `cmpi` will need to be used to detect overflow conditions:
  the saving in instruction count is that both RT and RS will have already
-been set to useful values (all 1s and all zeros redpectively)
+been set to useful values (all 1s and all zeros respectively)
  needed as part of implementing Knuth's
  Algorithm D*
  
@@ -164,10 +215,14 @@ Pseudo-code:
  
  For the Opcode map (XO Field)
  see Power ISA v3.1, Book III, Appendix D, Table 13 (sheet 7 of 8), p1357.
-Proposed is the addition of `maddedu` (**DRAFT, NOT APPROVED**) in `110010`
-and `divmod2du` in `110100`
+Proposed is the addition of:
+
+* `maddedu` in `110010`
+* `divmod2du` in `111010`
+* `pcdec` in `111000`
  
-|110000|110001 |110010    |110011|110100       |110101|110110|110111|
-|------|-------|----------|------|-------------|------|------|------|
-|maddhd|maddhdu|**maddedu**|maddld|**divmod2du**|rsvd  |rsvd  |rsvd  |
+|v >|   000|   001 |   010    |   011|   100  |   101  |   110   |   111  |
+|---|------|-------|----------|------|--------|--------|---------|--------|
+|110|maddhd|maddhdu|maddedu   |maddld|rsvd    |rsvd    |rsvd     |rsvd    |
+|111|pcdec.|rsvd   |divmod2du |vpermr|vaddequm|vaddecuq|vsubeuqm |vsubecuq|
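
A minimal Python sketch of the draft `dsld` and `dsrd` pseudo-code added in this diff, assuming Power ISA MSB0 bit numbering (so `(RB)[58:63]` is the low six bits of RB, and `MASK(x, y)` sets bits x through y counted from the most significant bit). The helper names `rotl64`, `mask_msb0`, `big_shl` and `big_shr` are hypothetical and not part of the proposal. Feeding each element's RS into the next element's RC is only an illustration of the chaining described in the summary, not a statement of how the vectorised forms are finally specified; the `overflow` / CR0 handling is omitted.

```python
M64 = (1 << 64) - 1  # 64-bit register mask


def rotl64(x, n):
    """Rotate a 64-bit value left by n (taken modulo 64)."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & M64


def mask_msb0(start, stop):
    """MASK(start, stop) in MSB0 numbering, assuming start <= stop."""
    width = stop - start + 1
    return ((1 << width) - 1) << (63 - stop)


def dsld(ra, rb, rc):
    """Model of the draft dsld pseudo-code; returns (RT, RS)."""
    n = rb & 63                       # n <- (RB)[58:63]
    v = rotl64(ra, n)
    mask = mask_msb0(0, 63 - n)       # upper 64-n bits
    rt = (v & mask) | (rc & ~mask & M64)
    rs = v & ~mask & M64              # bits shifted out of the top
    return rt, rs


def dsrd(ra, rb, rc):
    """Model of the draft dsrd pseudo-code; returns (RT, RS)."""
    n = rb & 63
    v = rotl64(ra, 64 - n)
    mask = mask_msb0(n, 63)           # lower 64-n bits
    rt = (v & mask) | (rc & ~mask & M64)
    rs = v & ~mask & M64              # bits shifted out of the bottom
    return rt, rs


def big_shl(limbs, n):
    """Shift a little-endian list of 64-bit limbs left by n (0..63).

    Assumed chaining: RS of one element becomes RC of the next.
    """
    carry, out = 0, []
    for limb in limbs:                # least significant limb first
        rt, carry = dsld(limb, n, carry)
        out.append(rt)
    return out, carry


def big_shr(limbs, n):
    """Shift a little-endian list of 64-bit limbs right by n (0..63)."""
    carry, out = 0, []
    for limb in reversed(limbs):      # most significant limb first
        rt, carry = dsrd(limb, n, carry)
        out.append(rt)
    out.reverse()
    return out, carry
```

With these definitions, `dsld(0x8000000000000001, 4, 0xF)` yields RT = `0x1F` and RS = `0x8`: the low four bits of RT come from RC, and the top bit rotated out of RA lands in RS.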