openpower/transcendentals/discussion.mdwn

   1 # Discussion
   2
   3 * <http://bugs.libre-soc.org/show_bug.cgi?id=127>
   4 * <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
   5 * Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
   6 * [[power_trans_ops]] for opcode listing.
   7 * https://mathr.co.uk/blog/2015-04-21_approximating_cosine.html
   8
   9 TODO:
  10
  11 * Decision on accuracy, moved to [[zfpacc_proposal]]
  12 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
  13 * Errors **MUST** be repeatable.
  14 * How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
  15 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  16   Accuracy requirements for dual (triple) purpose implementations must
  17   meet the higher standard.
  18 * Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  19   it is desirable on its own by other implementors.  This to be evaluated.
  20
  21 # Evaluation and commentary
  22
  23 This section now in discussion
  24
  25 ## Reciprocal
  26
  27 Used to be an alias. Some implementors may wish to implement divide as
  28 y times recip(x).
  29
  30 Others may have shared hardware for recip and divide, others may not.
  31
  32 To avoid penalising one implementor over another, recip stays.
  33
  34 ## To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
  35
  36 RISC principle says "exclude LOG because it's covered by LOGP1 plus an ADD".
  37 Research needed to ensure that implementors are not compromised by such
  38 a decision
  39 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
  40
  41 > > correctly-rounded LOG will return different results than LOGP1 and ADD.
  42 > > Likewise for EXP and EXPM1
  43
  44 > ok, they stay in as real opcodes, then.
  45
  46 ## ATAN / ATAN2 commentary
  47
  48 Discussion starts here:
  49 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002470.html>
  50
  51 from Mitch Alsup:
  52
  53 would like to point out that the general implementations of ATAN2 do a
  54 bunch of special case checks and then simply call ATAN.
  55
  56     double ATAN2( double y, double x )
  57     {   // IEEE 754-2008 quality ATAN2
  58
  59         // deal with NANs
  60         if( ISNAN( x )             ) return x;
  61         if( ISNAN( y )             ) return y;
  62
  63         // deal with infinities
  64         if( x == +∞    && |y|== +∞  ) return copysign(  π/4, y );
  65         if( x == +∞                 ) return copysign(  0.0, y );
  66         if( x == -∞    && |y|== +∞  ) return copysign( 3π/4, y );
  67         if( x == -∞                 ) return copysign(    π, y );
  68         if(               |y|== +∞  ) return copysign(  π/2, y );
  69
  70         // deal with signed zeros
  71         if( x == 0.0  &&  y != 0.0 ) return copysign(  π/2, y );
  72         if( x >=+0.0  &&  y == 0.0 ) return copysign(  0.0, y );
  73         if( x <=-0.0  &&  y == 0.0 ) return copysign(    π, y );
  74
  75         // calculate ATAN2 textbook style
  76         if( x  > 0.0               ) return     ATAN( |y / x| );
  77         if( x  < 0.0               ) return π - ATAN( |y / x| );
  78     }
  79
  80
  81 Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent
  82 a constant and then call/use ATAN2.
  83
  84 When one considers an implementation of ATAN, one must consider several
  85 ranges of evaluation::
  86
  87      x  [  -∞, -1.0]:: ATAN( x ) = -π/2 + ATAN( 1/x );
  88      x  (-1.0, +1.0]:: ATAN( x ) =      + ATAN(   x );
  89      x  [ 1.0,   +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x );
  90
  91 I should point out that the add/sub of π/2 can not lose significance
  92 since the result of ATAN(1/x) is bounded 0..π/2
  93
  94 The bottom line is that I think you are choosing to make too many of
  95 these into OpCodes, making the hardware function/calculation unit (and
  96 sequencer) more complicated that necessary.
  97
  98 --------------------------------------------------------
  99
 100 We therefore I think have a case for bringing back ATAN and including ATAN2.
 101
 102 The reason is that whilst a microcode-like GPU-centric platform would
 103 do ATAN2 in terms of ATAN, a UNIX-centric platform would do it the other
 104 way round.
 105
 106 (that is the hypothesis, to be evaluated for correctness. feedback requested).
 107
 108 This because we cannot compromise or prioritise one platfrom's
 109 speed/accuracy over another. That is not reasonable or desirable, to
 110 penalise one implementor over another.
 111
 112 Thus, all implementors, to keep interoperability, must both have both
 113 opcodes and may choose, at the architectural and routing level, which
 114 one to implement in terms of the other.
 115
 116 Allowing implementors to choose to add either opcode and let traps sort it
 117 out leaves an uncertainty in the software developer's mind: they cannot
 118 trust the hardware, available from many vendors, to be performant right
 119 across the board.
 120
 121 Standards are a pig.
 122
 123 ---
 124
 125 I might suggest that if there were a way for a calculation to be performed
 126 and the result of that calculation chained to a subsequent calculation
 127 such that the precision of the result-becomes-operand is wider than
 128 what will fit in a register, then you can dramatically reduce the count
 129 of instructions in this category while retaining
 130
 131 acceptable accuracy:
 132
 133      z = x / y
 134
 135 can be calculated as::
 136
 137      z = x * (1/y)
 138
 139 Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008
 140 accurate, but GPUs want speed and
 141
 142 1/y is fully pipelined (F32) while x/y cannot be (at reasonable area). It
 143 is also not "that inaccurate" displaying 0.625-to-0.52 ULP.
 144
 145 Given that one has the ability to carry (and process) more fraction bits,
 146 one can then do high precision multiplies of  π or other transcendental
 147 radixes.
 148
 149 And GPUs have been doing this almost since the dawn of 3D.
 150
 151     // calculate ATAN2 high performance style
 152     // Note: at this point x != y
 153     //
 154     if( x  > 0.0             )
 155     {
 156         if( y < 0.0 && |y| < |x| ) return - π/2 - ATAN( x / y );
 157         if( y < 0.0 && |y| > |x| ) return       + ATAN( y / x );
 158         if( y > 0.0 && |y| < |x| ) return       + ATAN( y / x );
 159         if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
 160     }
 161     if( x  < 0.0             )
 162     {
 163         if( y < 0.0 && |y| < |x| ) return + π/2 + ATAN( x / y );
 164         if( y < 0.0 && |y| > |x| ) return + π   - ATAN( y / x );
 165         if( y > 0.0 && |y| < |x| ) return + π   - ATAN( y / x );
 166         if( y > 0.0 && |y| > |x| ) return +3π/2 + ATAN( x / y );
 167     }
 168
 169 This way the adds and subtracts from the constant are not in a precision
 170 precarious position.