(no commit message)
[libreriscv.git] / openpower / transcendentals / discussion.mdwn
1 # Discussion
2
3 * <http://bugs.libre-soc.org/show_bug.cgi?id=127>
4 * <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
5 * Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
6 * [[power_trans_ops]] for opcode listing.
7 * https://mathr.co.uk/blog/2015-04-21_approximating_cosine.html
8
9 TODO:
10
11 * Decision on accuracy, moved to [[zfpacc_proposal]]
12 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
13 * Errors **MUST** be repeatable.
14 * How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
15 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
16 Accuracy requirements for dual (triple) purpose implementations must
17 meet the higher standard.
18 * Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
19 it is desirable on its own by other implementors. This to be evaluated.
20
21 # Evaluation and commentary
22
23 This section now in discussion
24
25 ## Reciprocal
26
27 Used to be an alias. Some implementors may wish to implement divide as
28 y times recip(x).
29
30 Others may have shared hardware for recip and divide, others may not.
31
32 To avoid penalising one implementor over another, recip stays.
33
34 ## To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
35
36 RISC principle says "exclude LOG because it's covered by LOGP1 plus an ADD".
37 Research needed to ensure that implementors are not compromised by such
38 a decision
39 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
40
41 > > correctly-rounded LOG will return different results than LOGP1 and ADD.
42 > > Likewise for EXP and EXPM1
43
44 > ok, they stay in as real opcodes, then.
45
46 ## ATAN / ATAN2 commentary
47
48 Discussion starts here:
49 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002470.html>
50
51 from Mitch Alsup:
52
53 would like to point out that the general implementations of ATAN2 do a
54 bunch of special case checks and then simply call ATAN.
55
56 double ATAN2( double y, double x )
57 { // IEEE 754-2008 quality ATAN2
58
59 // deal with NANs
60 if( ISNAN( x ) ) return x;
61 if( ISNAN( y ) ) return y;
62
63 // deal with infinities
64 if( x == +∞ && |y|== +∞ ) return copysign( π/4, y );
65 if( x == +∞ ) return copysign( 0.0, y );
66 if( x == -∞ && |y|== +∞ ) return copysign( 3π/4, y );
67 if( x == -∞ ) return copysign( π, y );
68 if( |y|== +∞ ) return copysign( π/2, y );
69
70 // deal with signed zeros
71 if( x == 0.0 && y != 0.0 ) return copysign( π/2, y );
72 if( x >=+0.0 && y == 0.0 ) return copysign( 0.0, y );
73 if( x <=-0.0 && y == 0.0 ) return copysign( π, y );
74
75 // calculate ATAN2 textbook style
76 if( x > 0.0 ) return ATAN( |y / x| );
77 if( x < 0.0 ) return π - ATAN( |y / x| );
78 }
79
80
81 Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent
82 a constant and then call/use ATAN2.
83
84 When one considers an implementation of ATAN, one must consider several
85 ranges of evaluation::
86
87 x [ -∞, -1.0]:: ATAN( x ) = -π/2 + ATAN( 1/x );
88 x (-1.0, +1.0]:: ATAN( x ) = + ATAN( x );
89 x [ 1.0, +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x );
90
91 I should point out that the add/sub of π/2 can not lose significance
92 since the result of ATAN(1/x) is bounded 0..π/2
93
94 The bottom line is that I think you are choosing to make too many of
95 these into OpCodes, making the hardware function/calculation unit (and
96 sequencer) more complicated that necessary.
97
98 --------------------------------------------------------
99
100 We therefore I think have a case for bringing back ATAN and including ATAN2.
101
102 The reason is that whilst a microcode-like GPU-centric platform would
103 do ATAN2 in terms of ATAN, a UNIX-centric platform would do it the other
104 way round.
105
106 (that is the hypothesis, to be evaluated for correctness. feedback requested).
107
108 This because we cannot compromise or prioritise one platfrom's
109 speed/accuracy over another. That is not reasonable or desirable, to
110 penalise one implementor over another.
111
112 Thus, all implementors, to keep interoperability, must both have both
113 opcodes and may choose, at the architectural and routing level, which
114 one to implement in terms of the other.
115
116 Allowing implementors to choose to add either opcode and let traps sort it
117 out leaves an uncertainty in the software developer's mind: they cannot
118 trust the hardware, available from many vendors, to be performant right
119 across the board.
120
121 Standards are a pig.
122
123 ---
124
125 I might suggest that if there were a way for a calculation to be performed
126 and the result of that calculation chained to a subsequent calculation
127 such that the precision of the result-becomes-operand is wider than
128 what will fit in a register, then you can dramatically reduce the count
129 of instructions in this category while retaining
130
131 acceptable accuracy:
132
133 z = x / y
134
135 can be calculated as::
136
137 z = x * (1/y)
138
139 Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008
140 accurate, but GPUs want speed and
141
142 1/y is fully pipelined (F32) while x/y cannot be (at reasonable area). It
143 is also not "that inaccurate" displaying 0.625-to-0.52 ULP.
144
145 Given that one has the ability to carry (and process) more fraction bits,
146 one can then do high precision multiplies of π or other transcendental
147 radixes.
148
149 And GPUs have been doing this almost since the dawn of 3D.
150
151 // calculate ATAN2 high performance style
152 // Note: at this point x != y
153 //
154 if( x > 0.0 )
155 {
156 if( y < 0.0 && |y| < |x| ) return - π/2 - ATAN( x / y );
157 if( y < 0.0 && |y| > |x| ) return + ATAN( y / x );
158 if( y > 0.0 && |y| < |x| ) return + ATAN( y / x );
159 if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
160 }
161 if( x < 0.0 )
162 {
163 if( y < 0.0 && |y| < |x| ) return + π/2 + ATAN( x / y );
164 if( y < 0.0 && |y| > |x| ) return + π - ATAN( y / x );
165 if( y > 0.0 && |y| < |x| ) return + π - ATAN( y / x );
166 if( y > 0.0 && |y| > |x| ) return +3π/2 + ATAN( x / y );
167 }
168
169 This way the adds and subtracts from the constant are not in a precision
170 precarious position.