X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=ztrans_proposal.mdwn;h=615d0eab43434243255808021fde252f3f6ff1a9;hb=fd755ad6aa92330ce0d17636acb4cdeaac31f418;hp=9cc4112eb14a1554fee34407de9e1d79d583af2f;hpb=894a7d11587c87e9cb3a2d96122cfe0a147ea4fc;p=libreriscv.git diff --git a/ztrans_proposal.mdwn b/ztrans_proposal.mdwn index 9cc4112eb..615d0eab4 100644 --- a/ztrans_proposal.mdwn +++ b/ztrans_proposal.mdwn @@ -1,61 +1,296 @@ -# Ztrans - transcendental operations +# Zftrans - transcendental operations See: * * +* Discussion: +* [[rv_major_opcode_1010011]] for opcode listing. +* [[zfpacc_proposal]] for accuracy settings proposal Extension subsets: -* **Ztrans**: standard transcendentals (best suited to 3D) -* **ZtransExt**: extra functions (useful, not generally needed for 3D) -* **ZtransAdv**: much more complex to implement in hardware +* **Zftrans**: standard transcendentals (best suited to 3D) +* **ZftransExt**: extra functions (useful, not generally needed for 3D, + can be synthesised using Ztrans) +* **Ztrigpi**: trig. xxx-pi sinpi cospi tanpi +* **Ztrignpi**: trig non-xxx-pi sin cos tan +* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi asinpi acospi +* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos +* **Zfhyp**: hyperbolic/inverse-hyperbolic. sinh, cosh, tanh, asinh, + acosh, atanh (can be synthesised - see below) +* **ZftransAdv**: much more complex to implement in hardware +* **Zfrsqrt**: Reciprocal square-root. + +Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi, +Zarctrignpi [[!toc levels=2]] +# TODO: + +* Decision on accuracy, moved to [[zfpacc_proposal]] + +* Errors **MUST** be repeatable. +* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded? + + Accuracy requirements for dual (triple) purpose implementations must + meet the higher standard. +* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as + it is desirable on its own by other implementors. This to be evaluated. + +# Proposed Opcodes vs Khronos OpenCL Opcodes + +This list shows the (direct) equivalence between proposed opcodes and +their Khronos OpenCL equivalents. + +See + + +Special FP16 opcodes are *not* being proposed, except by indirect / inherent +use of the "fmt" field that is already present in the RISC-V Specification. + +"Native" opcodes are *not* being proposed: implementors will be expected +to use the (equivalent) proposed opcode covering the same function. + +"Fast" opcodes are *not* being proposed, because the Khronos Specification +fast\_length, fast\_normalise and fast\_distance OpenCL opcodes require +vectors (or can be done as scalar operations using other RISC-V instructions). + +The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes. +Deviation from conformance with the Khronos Specification - including the +Khronos Specification accuracy requirements - is not an option. + +[[!table data=""" +Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast | +FSIN | sin | half\_sin | native\_sin | NONE | +FCOS | cos | half\_cos | native\_cos | NONE | +FTAN | tan | half\_tan | native\_tan | NONE | +NONE (1) | sincos | NONE | NONE | NONE | +FASIN | asin | NONE | NONE | NONE | +FACOS | acos | NONE | NONE | NONE | +FATAN | atan | NONE | NONE | NONE | +FSINPI | sinpi | NONE | NONE | NONE | +FCOSPI | cospi | NONE | NONE | NONE | +FTANPI | tanpi | NONE | NONE | NONE | +FASINPI | asinpi | NONE | NONE | NONE | +FACOSPI | acospi | NONE | NONE | NONE | +FATANPI | atanpi | NONE | NONE | NONE | +FSINH | sinh | NONE | NONE | NONE | +FCOSH | cosh | NONE | NONE | NONE | +FTANH | tanh | NONE | NONE | NONE | +FASINH | asinh | NONE | NONE | NONE | +FACOSH | acosh | NONE | NONE | NONE | +FATANH | atanh | NONE | NONE | NONE | +FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE | +FCBRT | cbrt | NONE | NONE | NONE | +FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE | +FLOG2 | log2 | half\_log2 | native\_log2 | NONE | +FEXPM1 | expm1 | NONE | NONE | NONE | +FLOG1P | log1p | NONE | NONE | NONE | +FEXP | exp | half\_exp | native\_exp | NONE | +FLOG | log | half\_log | native\_log | NONE | +FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE | +FLOG10 | log10 | half\_log10 | native\_log10 | NONE | +FATAN2 | atan2 | NONE | NONE | NONE | +FATAN2PI | atan2pi | NONE | NONE | NONE | +FPOW | pow | NONE | NONE | NONE | +FROOT | rootn | NONE | NONE | NONE | +FHYPOT | hypot | NONE | NONE | NONE | +FRECIP | NONE | half\_recip | native\_recip | NONE | +"""]] + +Note (1) FSINCOS is macro-op fused (see below). + # List of 2-arg opcodes [[!table data=""" -opcode | Description | pseudo-code | Extension | -FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Ztrans | -FATAN2PI | atan arc tangent / pi | rd = atan2(rs2, rs1) / pi | ZtransExt | -FPOW | x power of y | rd = pow(rs1, rs2) | ZtransAdv | -FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZtransAdv | +opcode | Description | pseudo-code | Extension | +FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi | +FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi | +FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv | +FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv | +FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans | +"""]] + +# List of 1-arg transcendental opcodes + +[[!table data=""" +opcode | Description | pseudo-code | Extension | +FRSQRT | Reciprocal Square-root | rd = sqrt(rs1) | Zfrsqrt | +FCBRT | Cube Root | rd = pow(rs1, 3) | Zftrans | +FRECIP | Reciprocal | rd = 1.0 / rs1 | Zftrans | +FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans | +FLOG2 | log2 | rd = log2(rs1) | Zftrans | +FEXPM1 | exponential minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans | +FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Zftrans | +FEXP | exponential | rd = pow(e, rs1) | ZftransExt | +FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt | +FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt | +FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt | """]] -# List of 1-arg opcodes +# List of 1-arg trigonometric opcodes [[!table data=""" opcode | Description | pseudo-code | Extension | -FCBRT | Cube Root | rd = pow(rs1, 3) | Ztrans | -FEXP2 | power-of-2 | rd = pow(2, rs1) | Ztrans | -FLOG2 | log2 | rd = log2(rs1) | Ztrans | -FEXPM1 | exponent minus 1 | rd = pow(e, rs1) - 1.0 | Ztrans | -FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Ztrans | -FEXP | exponent | rd = pow(e, rs1) | ZtransExt | -FLOG | natural log (base e) | rd = log(e, rs1) | ZtransExt | -FEXP10 | power-of-10 | rd = pow(10, rs1) | ZtransExt | -FLOG10 | log base 10 | rd = log10(rs1) | ZtransExt | -FSIN | sin (radians) | | Ztrans | -FCOS | cos (radians) | | Ztrans | -FTAN | tan (radians) | | Ztrans | -FASIN | arcsin (radians) | rd = asin(rs1) | Ztrans | -FACOS | arccos (radians) | rd = acos(rs1) | Ztrans | -FATAN | arctan (radians) | rd = atan(rs1) | Ztrans | -FSINPI | sin times pi | rd = sin(pi * rs1) | ZtransExt | -FCOSPI | cos times pi | rd = cos(pi * rs1) | ZtransExt | -FTANPI | tan times pi | rd = tan(pi * rs1) | ZtransExt | -FSINH | hyperbolic sin (radians) | | ZtransExt | -FCOSH | hyperbolic cos (radians) | | ZtransExt | -FTANH | hyperbolic tan (radians) | | ZtransExt | -FASINH | inverse hyperbolic sin | | ZtransExt | -FACOSH | inverse hyperbolic cos | | ZtransExt | -FATANH | inverse hyperbolic tan | | ZtransExt | +FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi | +FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi | +FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi | +FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi | +FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi | +FATAN | arctan (radians) | rd = atan(rs1) | Zarctrignpi | +FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi | +FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi | +FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi | +FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi | +FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi | +FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi | +FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp | +FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp | +FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp | +FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp | +FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp | +FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp | """]] -# Pseudo-code ops +# Synthesis, Pseudo-code ops and macro-ops + +The pseudo-ops are best left up to the compiler rather than being actual +pseudo-ops, by allocating one scalar FP register for use as a constant +(loop invariant) set to "1.0" at the beginning of a function or other +suitable code block. + +* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order). +* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order). + +FATANPI example pseudo-code: + + lui t0, 0x3F800 // upper bits of f32 1.0 + fmv.x.s ft0, t0 + fatan2pi.s rd, rs1, ft0 + +Hyperbolic function example (obviates need for Zfhyp except for +high-performance or correctly-rounding): + + ASINH( x ) = ln( x + SQRT(x**2+1)) + +# Reciprocal + +Used to be an alias. Some imolementors may wish to implement divide as y times recip(x) + +# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)? + +RISC principle says "exclude LOG because it's covered by LOGP1 plus an ADD". +Research needed to ensure that implementors are not compromised by such +a decision + + +> > correctly-rounded LOG will return different results than LOGP1 and ADD. +> > Likewise for EXP and EXPM1 + +> ok, they stay in as real opcodes, then. + +# ATAN / ATAN2 commentary + +Discussion starts here: + + +from Mitch Alsup: + +would like to point out that the general implementations of ATAN2 do a +bunch of special case checks and then simply call ATAN. + + double ATAN2( double y, double x ) + { // IEEE 754-2008 quality ATAN2 + + // deal with NANs + if( ISNAN( x ) ) return x; + if( ISNAN( y ) ) return y; + + // deal with infinities + if( x == +∞ && |y|== +∞ ) return copysign( π/4, y ); + if( x == +∞ ) return copysign( 0.0, y ); + if( x == -∞ && |y|== +∞ ) return copysign( 3π/4, y ); + if( x == -∞ ) return copysign( π, y ); + if( |y|== +∞ ) return copysign( π/2, y ); + + // deal with signed zeros + if( x == 0.0 && y != 0.0 ) return copysign( π/2, y ); + if( x >=+0.0 && y == 0.0 ) return copysign( 0.0, y ); + if( x <=-0.0 && y == 0.0 ) return copysign( π, y ); + + // calculate ATAN2 textbook style + if( x > 0.0 ) return ATAN( |y / x| ); + if( x < 0.0 ) return π - ATAN( |y / x| ); + } + + +Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent +a constant and then call/use ATAN2. + +When one considers an implementation of ATAN, one must consider several +ranges of evaluation:: + + x [ -∞, -1.0]:: ATAN( x ) = -π/2 + ATAN( 1/x ); + x (-1.0, +1.0]:: ATAN( x ) = + ATAN( x ); + x [ 1.0, +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x ); + +I should point out that the add/sub of π/2 can not lose significance +since the result of ATAN(1/x) is bounded 0..π/2 + +The bottom line is that I think you are choosing to make too many of +these into OpCodes, making the hardware function/calculation unit (and +sequencer) more complicated that necessary. + +-------------------------------------------------------- + +I might suggest that if there were a way for a calculation to be performed +and the result of that calculation + +chained to a subsequent calculation such that the precision of the +result-becomes-operand is wider than + +what will fit in a register, then you can dramatically reduce the count +of instructions in this category while retaining + +acceptable accuracy: + + z = x / y + +can be calculated as:: + + z = x * (1/y) + +Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008 +accurate, but GPUs want speed and + +1/y is fully pipelined (F32) while x/y cannot be (at reasonable area). It +is also not "that inaccurate" displaying 0.625-to-0.52 ULP. + +Given that one has the ability to carry (and process) more fraction bits, +one can then do high precision multiplies of π or other transcendental +radixes. + +And GPUs have been doing this almost since the dawn of 3D. -* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1 -* SINCOS - fused macro-op between FSIN and FCOS (issued in that order). -* SINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order). + // calculate ATAN2 high performance style + // Note: at this point x != y + // + if( x > 0.0 ) + { + if( y < 0.0 && |y| < |x| ) return - π/2 - ATAN( x / y ); + if( y < 0.0 && |y| > |x| ) return + ATAN( y / x ); + if( y > 0.0 && |y| < |x| ) return + ATAN( y / x ); + if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y ); + } + if( x < 0.0 ) + { + if( y < 0.0 && |y| < |x| ) return + π/2 + ATAN( x / y ); + if( y < 0.0 && |y| > |x| ) return + π - ATAN( y / x ); + if( y > 0.0 && |y| < |x| ) return + π - ATAN( y / x ); + if( y > 0.0 && |y| > |x| ) return +3π/2 + ATAN( x / y ); + } +This way the adds and subtracts from the constant are not in a precision +precarious position.