FSIN | sin | half\_sin | native\_sin | NONE |
FCOS | cos | half\_cos | native\_cos | NONE |
FTAN | tan | half\_tan | native\_tan | NONE |
+NONE (1) | sincos | NONE | NONE | NONE |
FASIN | asin | NONE | NONE | NONE |
FACOS | acos | NONE | NONE | NONE |
+FATAN | atan | NONE | NONE | NONE |
FSINPI | sinpi | NONE | NONE | NONE |
FCOSPI | cospi | NONE | NONE | NONE |
FTANPI | tanpi | NONE | NONE | NONE |
FPOW | pow | NONE | NONE | NONE |
FROOT | rootn | NONE | NONE | NONE |
FHYPOT | hypot | NONE | NONE | NONE |
+FRECIP | NONE | half\_recip | native\_recip | NONE |
"""]]
+Note (1) FSINCOS is macro-op fused (see below).
+
# List of 2-arg opcodes
[[!table data="""
-opcode | Description | pseudo-code | Extension |
-FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
-FATAN2PI | atan arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
-FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
-FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
-FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
+opcode | Description | pseudo-code | Extension |
+FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
+FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
+FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
+FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
+FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]
# List of 1-arg transcendental opcodes
[[!table data="""
-opcode | Description | pseudo-code | Extension |
+opcode | Description | pseudo-code | Extension |
FRSQRT | Reciprocal Square-root | rd = 1.0 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
+FRECIP | Reciprocal | rd = 1.0 / rs1 | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
-FEXPM1 | exponent minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
+FEXPM1 | exponential minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Zftrans |
-FEXP | exponent | rd = pow(e, rs1) | ZftransExt |
+FEXP | exponential | rd = pow(e, rs1) | ZftransExt |
FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt |
FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
+FATAN | arctan (radians) | rd = atan(rs1) | Zarctrignpi |
FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
-FASINPI | arcsin times pi | rd = asin(pi * rs1) | Zarctrigpi |
-FACOSPI | arccos times pi | rd = acos(pi * rs1) | Zarctrigpi |
-FATANPI | arctan times pi | rd = atan(pi * rs1) | Zarctrigpi |
+FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
+FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
+FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp |
FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp |
FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp |
(loop invariant) set to "1.0" at the beginning of a function or other
suitable code block.
-* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
-* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
-* FATANPI - pseudo alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).
fmv.x.s ft0, t0
fatan2pi.s rd, rs1, ft0
-Hypotenuse example (obviates need for Zfhyp except for high-performance):
+Hyperbolic function example (obviates need for Zfhyp except for
+high-performance or correctly-rounding):
- ASINH( x ) = ln( x + SQRT(x**2+1)
+ ASINH( x ) = ln( x + SQRT(x**2+1))
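The identity above can be checked numerically. The following is a minimal sketch, assuming Python's `math` module as a stand-in for the LOG and square-root opcodes; `asinh_via_log` is a hypothetical helper name, not something defined by this proposal.

```python
import math

def asinh_via_log(x: float) -> float:
    """Evaluate asinh(x) as ln(|x| + sqrt(x**2 + 1)) and restore the sign.

    Working on |x| avoids cancellation between x and the square root
    when x is negative.
    """
    ax = abs(x)
    return math.copysign(math.log(ax + math.sqrt(ax * ax + 1.0)), x)
```

For moderate arguments this agrees with `math.asinh` to within a few ULP. For very small `|x|` the sum rounds to a value close to 1.0 and relative accuracy degrades, and for very large `|x|` the square overflows, which is consistent with the caveat about correctly-rounded implementations.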
-LOG / LOGP1 example:
+# Reciprocal
- LOG(x) = LOGP1(x) + 1.0
- EXP(x) = EXPM1(x-1.0)
+Used to be an alias. Some implementors may wish to implement divide as y times recip(x).
# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
a decision
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
+> > correctly-rounded LOG will return different results than LOGP1 and ADD.
+> > Likewise for EXP and EXPM1
+
+> ok, they stay in as real opcodes, then.
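The point about differing results can be illustrated with a small sketch, assuming binary64 arithmetic and Python's `math` module in place of the hardware opcodes. The helper names `log_via_log1p` and `exp_via_expm1` are hypothetical and only show what emulating one opcode with the other plus an add or subtract would look like.

```python
import math

def log_via_log1p(x: float) -> float:
    """Emulate LOG with LOG1P: the subtraction x - 1.0 rounds before the log."""
    return math.log1p(x - 1.0)

def exp_via_expm1(x: float) -> float:
    """Emulate EXP with EXPM1: the final add of 1.0 rounds after the exp."""
    return math.expm1(x) + 1.0

# Inputs chosen so that the extra rounding step is visible.
small = 3e-16        # x - 1.0 cannot represent x exactly this close to -1.0
very_negative = -50.0  # expm1(x) rounds to -1.0, so adding 1.0 gives 0.0
```

With these inputs, `log_via_log1p(small)` differs from `math.log(small)` in the second significant digit, and `exp_via_expm1(very_negative)` returns exactly zero while `math.exp(very_negative)` does not.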
+
+# ATAN / ATAN2 commentary
+
+Discussion starts here:
+<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002470.html>
+
+from Mitch Alsup:
+
+I would like to point out that the general implementations of ATAN2 do a
+bunch of special case checks and then simply call ATAN.
+
+ double ATAN2( double y, double x )
+ { // IEEE 754-2008 quality ATAN2
+
+ // deal with NANs
+ if( ISNAN( x ) ) return x;
+ if( ISNAN( y ) ) return y;
+
+ // deal with infinities
+ if( x == +∞ && |y|== +∞ ) return copysign( π/4, y );
+ if( x == +∞ ) return copysign( 0.0, y );
+ if( x == -∞ && |y|== +∞ ) return copysign( 3π/4, y );
+ if( x == -∞ ) return copysign( π, y );
+ if( |y|== +∞ ) return copysign( π/2, y );
+
+ // deal with signed zeros
+ if( x == 0.0 && y != 0.0 ) return copysign( π/2, y );
+ if( x >=+0.0 && y == 0.0 ) return copysign( 0.0, y );
+ if( x <=-0.0 && y == 0.0 ) return copysign( π, y );
+
+ // calculate ATAN2 textbook style
+ if( x > 0.0 ) return copysign( ATAN( |y / x| ), y );
+ if( x < 0.0 ) return copysign( π - ATAN( |y / x| ), y );
+ }
+
+
+Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent
+a constant and then call/use ATAN2.
+
+When one considers an implementation of ATAN, one must consider several
+ranges of evaluation::
+
+ x [ -∞, -1.0]:: ATAN( x ) = -π/2 - ATAN( 1/x );
+ x (-1.0, +1.0]:: ATAN( x ) = + ATAN( x );
+ x [ 1.0, +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x );
+
+I should point out that the add/sub of π/2 can not lose significance
+since the result of ATAN(1/x) is bounded 0..π/2
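Illustration (added alongside the quoted message, not part of it): a minimal sketch of the three-range evaluation, assuming the standard identity atan(x) = sign(x)·π/2 − atan(1/x) for |x| > 1. `atan_core` is a hypothetical stand-in for a hardware kernel that is only valid on [-1, 1]; here it simply defers to `math.atan`.

```python
import math

def atan_core(r: float) -> float:
    """Stand-in for a kernel that only accepts |r| <= 1 (assumption)."""
    assert abs(r) <= 1.0
    return math.atan(r)

def atan_reduced(x: float) -> float:
    """Evaluate atan(x) using only arguments of magnitude <= 1."""
    if math.isnan(x):
        return x
    if abs(x) <= 1.0:
        return atan_core(x)
    # |x| > 1: reflect through the reciprocal; 1/inf == 0 covers infinities.
    return math.copysign(math.pi / 2, x) - atan_core(1.0 / x)
```

Because `atan_core(1.0 / x)` is bounded in magnitude by π/4 on this branch, the subtraction from ±π/2 never cancels catastrophically.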
+
+The bottom line is that I think you are choosing to make too many of
+these into OpCodes, making the hardware function/calculation unit (and
+sequencer) more complicated than necessary.
+
+--------------------------------------------------------
+
+I might suggest that if there were a way for a calculation to be performed
+and the result of that calculation chained to a subsequent calculation
+such that the precision of the result-becomes-operand is wider than what
+will fit in a register, then you can dramatically reduce the count of
+instructions in this category while retaining acceptable accuracy:
+
+ z = x / y
+
+can be calculated as::
+
+ z = x * (1/y)
+
+Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008
+accurate, but GPUs want speed and 1/y is fully pipelined (F32) while x/y
+cannot be (at reasonable area). It is also not "that inaccurate",
+displaying 0.625-to-0.52 ULP.
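Illustration (added alongside the quoted message, not part of it): a small experiment on the effect of carrying extra fraction bits in the reciprocal. It is a sketch under stated assumptions: F32 rounding is emulated with `struct`, and binary64 stands in for the wider intermediate, which is more generous than the 26-to-32 bits mentioned above. The names `f32` and `compare` are hypothetical.

```python
import struct

def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def compare(n: int = 100):
    """Count F32 results of x * (1/y) that differ from the rounded quotient.

    Returns (narrow_bad, wide_bad): mismatches when the reciprocal is
    rounded to F32 before the multiply, and when it is kept wider.
    """
    narrow_bad = wide_bad = 0
    for xi in range(1, n + 1):
        for yi in range(1, n + 1):
            x, y = float(xi), float(yi)
            exact = f32(x / y)               # correctly rounded F32 quotient
            narrow = f32(x * f32(1.0 / y))   # reciprocal rounded to F32 first
            wide = f32(x * (1.0 / y))        # reciprocal kept at binary64 width
            narrow_bad += narrow != exact
            wide_bad += wide != exact
    return narrow_bad, wide_bad
```

Calling `compare()` shows that the narrow path produces some results that are not the correctly rounded quotient, while the wide path matches it for every pair in this small integer range.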
+
+Given that one has the ability to carry (and process) more fraction bits,
+one can then do high precision multiplies of π or other transcendental
+radixes.
+
+And GPUs have been doing this almost since the dawn of 3D.
+
+ // calculate ATAN2 high performance style
+ // Note: at this point x != y
+ //
+ if( x > 0.0 )
+ {
+ if( y < 0.0 && |y| < |x| ) return + ATAN( y / x );
+ if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
+ if( y > 0.0 && |y| < |x| ) return + ATAN( y / x );
+ if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
+ }
+ if( x < 0.0 )
+ {
+ if( y < 0.0 && |y| < |x| ) return - π + ATAN( y / x );
+ if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
+ if( y > 0.0 && |y| < |x| ) return + π + ATAN( y / x );
+ if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
+ }
+
+This way the adds and subtracts from the constant are not in a precision
+precarious position.
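Illustration (added alongside the quoted message, not part of it): a runnable sketch of an octant-style ATAN2 in which every ATAN argument has magnitude at most 1. It assumes finite, nonzero `x` and `y`, on the understanding that NaNs, infinities and zeros were handled by earlier special-case checks. `atan2_octant` is a hypothetical name, and `math.atan` stands in for the hardware ATAN.

```python
import math

def atan2_octant(y: float, x: float) -> float:
    """atan2(y, x) for finite, nonzero inputs, using only |ratio| <= 1."""
    if abs(y) <= abs(x):
        base = math.atan(y / x)              # |y/x| <= 1
        if x > 0.0:
            return base                      # right half-plane: no offset
        return base + math.copysign(math.pi, y)   # left half-plane: +/- pi
    # |y| > |x|: measure from the y axis instead.
    return math.copysign(math.pi / 2, y) - math.atan(x / y)
```

Since the ATAN term is bounded by π/4 in magnitude, the offsets of ±π/2 and ±π are added to or subtracted from a smaller quantity, so the constant dominates and no significant cancellation occurs.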