<HTML>

<TITLE>Shading Language Support</TITLE>

<link rel="stylesheet" type="text/css" href="mesa.css"></head>

<BODY>

<H1>Shading Language Support</H1>

<p>
This page describes the features and status of Mesa's support for the
<a href="http://opengl.org/documentation/glsl/" target="_parent">
OpenGL Shading Language</a>.
</p>

<p>
Last updated on 28 March 2007.
</p>

<p>
Contents
</p>
<ul>
<li><a href="#unsup">Unsupported Features</a>
<li><a href="#notes">Implementation Notes</a>
<li><a href="#hints">Programming Hints</a>
<li><a href="#standalone">Stand-alone GLSL Compiler</a>
<li><a href="#implementation">Compiler Implementation</a>
<li><a href="#validation">Compiler Validation</a>
<li><a href="#120">GLSL 1.20 support</a>
</ul>


<a name="unsup">
<h2>Unsupported Features</h2>

<p>
The following features of the shading language are not yet supported
in Mesa:
</p>

<ul>
<li>Dereferencing arrays with non-constant indexes
<li>Comparison of user-defined structs
<li>Linking of multiple shaders
<li>gl_ClipVertex
<li>The derivative functions such as dFdx()
<li>The inverse trig functions asin(), acos(), and atan()
<li>Perspective-correct interpolation of the gl_Color and gl_SecondaryColor
varying vars (they are currently interpolated without perspective correction)
<li>Floating point literal suffixes 'f' and 'F'
</ul>

<p>
All other major features of the shading language should function.
</p>


<a name="notes">
<h2>Implementation Notes</h2>

<ul>
<li>Shading language programs are compiled into low-level programs
very similar to those of GL_ARB_vertex/fragment_program.
<li>All vector types (vec2, vec3, vec4, bvec2, etc.) currently occupy full
float[4] registers.
<li>Float constants and variables are packed so that up to four floats
can occupy one program parameter/register.
<li>All function calls are inlined.
<li>Shaders which use too many registers will not compile.
<li>The quality of generated code is pretty good; register usage is fair.
<li>Shader error detection and error reporting (InfoLog) are not
very good yet.
<li>The ftransform() function doesn't necessarily match the results of
fixed-function transformation.
</ul>

<p>
These issues will be addressed/resolved in the future.
</p>
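
<p>
The register notes above (vectors take a whole float[4] register, while
floats are packed up to four per register) can be pictured with a small C
sketch. This is only an illustration, not Mesa's allocator: the names
<code>reg_file</code>, <code>reg_loc</code>, <code>alloc_vec</code> and
<code>alloc_float</code>, and the register count, are all hypothetical.
</p>

```c
#include <assert.h>

#define MAX_REGS 8   /* hypothetical register file size */

/* A storage location: register index plus first component (0=x .. 3=w). */
struct reg_loc {
   int index;   /* -1 if no register was available */
   int comp;
};

struct reg_file {
   int used[MAX_REGS];   /* components in use in each register (0..4) */
};

/* Vector types always claim a whole, empty float[4] register. */
static struct reg_loc alloc_vec(struct reg_file *rf)
{
   struct reg_loc loc = { -1, 0 };
   int i;
   assert(rf);
   for (i = 0; i < MAX_REGS; i++) {
      if (rf->used[i] == 0) {
         rf->used[i] = 4;
         loc.index = i;
         break;
      }
   }
   return loc;
}

/* Floats are packed: take the next free component of the first
 * register that still has room. */
static struct reg_loc alloc_float(struct reg_file *rf)
{
   struct reg_loc loc = { -1, 0 };
   int i;
   assert(rf);
   for (i = 0; i < MAX_REGS; i++) {
      if (rf->used[i] < 4) {
         loc.index = i;
         loc.comp = rf->used[i];
         rf->used[i]++;
         break;
      }
   }
   return loc;
}
```

<p>
In this sketch, four floats share one register and a fifth float spills
into the next free one, while every vector costs a full register regardless
of its size.
</p>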


<a name="hints">
<h2>Programming Hints</h2>

<ul>
<li>Declare <em>in</em> function parameters as <em>const</em> whenever possible.
This improves the efficiency of function inlining.
</li>
<br>
<li>To reduce register usage, declare variables within smaller scopes.
For example, the following code:
<pre>
void main()
{
   vec4 a1, a2, b1, b2;
   gl_Position = expression using a1, a2;
   gl_FrontColor = expression using b1, b2;
}
</pre>
can be rewritten as follows to use half as many registers:
<pre>
void main()
{
   {
      vec4 a1, a2;
      gl_Position = expression using a1, a2;
   }
   {
      vec4 b1, b2;
      gl_FrontColor = expression using b1, b2;
   }
}
</pre>
Alternatively, rather than using several float variables, use
a vec4 instead. Use swizzling and writemasks to access the
components of the vec4 as floats.
</li>
<br>
<li>Use the built-in library functions whenever possible.
For example, instead of writing this:
<pre>
float x = 1.0 / sqrt(y);
</pre>
Write this:
<pre>
float x = inversesqrt(y);
</pre>
</li>
<li>
Use ++i when possible as it's more efficient than i++.
</li>
</ul>
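
<p>
The register saving from smaller scopes can be reasoned about with a toy
model in C. The sketch below assumes that a register is released as soon as
the scope declaring its variable ends; that is an assumption about the
compiler's behaviour, and <code>temp_pool</code>, <code>scope_begin</code>,
<code>declare_vec4</code> and <code>scope_end</code> are hypothetical names.
</p>

```c
#include <assert.h>

struct temp_pool {
   int in_use;   /* registers currently allocated */
   int peak;     /* most registers ever in use at the same time */
};

/* Remember how many registers were live when the scope was entered. */
static int scope_begin(const struct temp_pool *p)
{
   return p->in_use;
}

/* Each vec4 variable costs one float[4] register. */
static void declare_vec4(struct temp_pool *p)
{
   p->in_use++;
   if (p->in_use > p->peak)
      p->peak = p->in_use;
}

/* Leaving a scope releases everything declared inside it. */
static void scope_end(struct temp_pool *p, int mark)
{
   assert(mark <= p->in_use);
   p->in_use = mark;
}

/* First version of main(): a1, a2, b1, b2 all live in one scope. */
static int peak_single_scope(void)
{
   struct temp_pool p = { 0, 0 };
   int m = scope_begin(&p);
   declare_vec4(&p);   /* a1 */
   declare_vec4(&p);   /* a2 */
   declare_vec4(&p);   /* b1 */
   declare_vec4(&p);   /* b2 */
   scope_end(&p, m);
   return p.peak;
}

/* Second version: a1, a2 and b1, b2 live in separate inner scopes. */
static int peak_nested_scopes(void)
{
   struct temp_pool p = { 0, 0 };
   int m = scope_begin(&p);
   declare_vec4(&p);   /* a1 */
   declare_vec4(&p);   /* a2 */
   scope_end(&p, m);
   m = scope_begin(&p);
   declare_vec4(&p);   /* b1 */
   declare_vec4(&p);   /* b2 */
   scope_end(&p, m);
   return p.peak;
}
```

<p>
Under this model the single-scope version peaks at four registers and the
nested version at two, which matches the "half as many registers" claim.
</p>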


<a name="standalone">
<h2>Stand-alone GLSL Compiler</h2>

<p>
A stand-alone GLSL compiler driver has been added to Mesa.
</p>

<p>
The stand-alone compiler (like a conventional command-line compiler)
is a tool that accepts Shading Language programs and emits low-level
GPU programs.
</p>

<p>
This tool is useful for:
</p>
<ul>
<li>Inspecting GPU code to gain insight into compilation
<li>Generating initial GPU code for subsequent hand-tuning
<li>Debugging the GLSL compiler itself
</ul>

<p>
After building Mesa, the glslcompiler can be built by manually running:
</p>
<pre>
cd src/mesa/drivers/glslcompiler
make
</pre>


<p>
Here's an example of using the compiler to compile a fragment shader and
emit GL_ARB_fragment_program-style instructions:
</p>
<pre>
bin/glslcompiler --debug --numbers --fs progs/glsl/CH06-brick.frag.txt
</pre>
<p>
results in:
</p>
<pre>
# Fragment Program/Shader
  0: RCP TEMP[4].x, UNIFORM[2].xxxx;
  1: RCP TEMP[4].y, UNIFORM[2].yyyy;
  2: MUL TEMP[3].xy, VARYING[0], TEMP[4];
  3: MOV TEMP[1], TEMP[3];
  4: MUL TEMP[0].w, TEMP[1].yyyy, CONST[4].xxxx;
  5: FRC TEMP[1].z, TEMP[0].wwww;
  6: SGT.C TEMP[0].w, TEMP[1].zzzz, CONST[4].xxxx;
  7: IF (NE.wwww); # (if false, goto 9);
  8:    ADD TEMP[1].x, TEMP[1].xxxx, CONST[4].xxxx;
  9: ENDIF;
 10: FRC TEMP[1].xy, TEMP[1];
 11: SGT TEMP[2].xy, UNIFORM[3], TEMP[1];
 12: MUL TEMP[1].z, TEMP[2].xxxx, TEMP[2].yyyy;
 13: LRP TEMP[0], TEMP[1].zzzz, UNIFORM[0], UNIFORM[1];
 14: MUL TEMP[0].xyz, TEMP[0], VARYING[1].xxxx;
 15: MOV OUTPUT[0].xyz, TEMP[0];
 16: MOV OUTPUT[0].w, CONST[4].yyyy;
 17: END
</pre>

<p>
Note that some shading language constructs (such as uniform and varying
variables) aren't expressible in ARB or NV-style programs.
Therefore, the resulting output is not always legal according to the
definitions of those program languages.
</p>
<p>
Also note that this compiler driver is still under development.
Over time, the correctness of the GPU programs, with respect to the ARB
and NV languages, should improve.
</p>
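
<p>
As an aid to reading the listing above, here is a rough C model of two of
the less obvious opcodes and of destination writemasks such as
<code>.xy</code>. It assumes the opcodes behave as defined for LRP in
GL_ARB_fragment_program and for SGT in GL_NV_fragment_program; source
swizzles are left out for brevity, and the function names are made up for
this sketch.
</p>

```c
#include <assert.h>

/* Writemask bits for the x, y, z and w components. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* SGT: per component, 1.0 if a > b, otherwise 0.0. */
static void op_sgt(float dst[4], const float a[4], const float b[4])
{
   int i;
   for (i = 0; i < 4; i++)
      dst[i] = (a[i] > b[i]) ? 1.0f : 0.0f;
}

/* LRP: per component linear interpolation, t*a + (1-t)*b. */
static void op_lrp(float dst[4], const float t[4],
                   const float a[4], const float b[4])
{
   int i;
   for (i = 0; i < 4; i++)
      dst[i] = t[i] * a[i] + (1.0f - t[i]) * b[i];
}

/* Store only the components selected by the writemask;
 * the other components of dst keep their old values. */
static void write_masked(float dst[4], const float value[4], unsigned mask)
{
   int i;
   assert((mask & ~0xFu) == 0);
   for (i = 0; i < 4; i++) {
      if (mask & (1u << i))
         dst[i] = value[i];
   }
}
```

<p>
Read this way, instruction 11 produces 0.0 or 1.0 in the x and y components
of TEMP[2], and instruction 13 uses TEMP[1].z as the blend weight between
UNIFORM[0] and UNIFORM[1].
</p>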



<a name="implementation">
<h2>Compiler Implementation</h2>

<p>
The source code for Mesa's shading language compiler is in the
<code>src/mesa/shader/slang/</code> directory.
</p>

<p>
The compiler follows a fairly standard design and basically works as follows:
</p>
<ul>
<li>The input string is tokenized (see grammar.c) and parsed
(see slang_compile_*.c) to produce an Abstract Syntax Tree (AST).
The nodes in this tree are slang_operation structures
(see slang_compile_operation.h).
The nodes are decorated with symbol table, scoping and datatype information.
<li>The AST is converted into an Intermediate Representation (IR) tree
(see the slang_codegen.c file).
The IR nodes represent basic GPU instructions, like add, dot product,
move, etc.
The IR tree is mostly a binary tree, but a few nodes have three or four
children.
In principle, the IR tree could be executed by doing an in-order traversal.
<li>The IR tree is traversed in-order to emit code (see slang_emit.c).
This is also when registers are allocated to store variables and temps.
<li>In the future, a pattern-matching code generator-generator may be
used for code generation.
Programs such as L-BURG (Bottom-Up Rewrite Generator) and Twig look for
patterns in IR trees, compute weights for subtrees and use the weights
to select the best instructions to represent the subtree.
<li>The emitted GPU instructions (see prog_instruction.h) are stored in a
gl_program object (see mtypes.h).
<li>When a fragment shader and vertex shader are linked (see slang_link.c)
the varying vars are matched up, uniforms are merged, and vertex
attributes are resolved (rewriting instructions as needed).
</ul>

<p>
The final vertex and fragment programs may be interpreted in software
(see prog_execute.c) or translated into a specific hardware architecture
(see drivers/dri/i915/i915_fragprog.c for example).
</p>
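
<p>
The step where an IR tree is walked to emit instructions and allocate
temporaries can be sketched in a few lines of C. This is a simplified
illustration only, not the code in slang_emit.c: the <code>ir_node</code>,
<code>reg</code>, <code>inst</code> and <code>emitter</code> types are
hypothetical, only binary ADD and MUL nodes are handled, and temporaries are
never freed. Each node's operands are emitted first so that their result
registers are known when the node's own instruction is written.
</p>

```c
#include <assert.h>

enum ir_opcode { IR_INPUT, IR_ADD, IR_MUL };

/* One IR node: a leaf naming an input register, or a binary operation. */
struct ir_node {
   enum ir_opcode op;
   int input;                       /* IR_INPUT only: input register index */
   const struct ir_node *kids[2];   /* IR_ADD / IR_MUL only: the operands */
};

enum reg_file { FILE_INPUT, FILE_TEMP };

struct reg {
   enum reg_file file;
   int index;
};

/* One emitted instruction: dst = src[0] op src[1]. */
struct inst {
   enum ir_opcode op;
   struct reg dst;
   struct reg src[2];
};

#define MAX_INSTS 16

struct emitter {
   int next_temp;                  /* next unused temporary register */
   int num_insts;
   struct inst insts[MAX_INSTS];   /* emitted program, in order */
};

/* Emit code for the subtree rooted at n and return the register that
 * holds its value. */
static struct reg emit(struct emitter *e, const struct ir_node *n)
{
   struct reg result, a, b;
   struct inst *inst;

   if (n->op == IR_INPUT) {
      /* Leaves need no instruction; they just name a register. */
      result.file = FILE_INPUT;
      result.index = n->input;
      return result;
   }

   assert(n->kids[0] && n->kids[1]);
   a = emit(e, n->kids[0]);
   b = emit(e, n->kids[1]);

   /* Allocate a temporary for the result and append the instruction. */
   result.file = FILE_TEMP;
   result.index = e->next_temp++;

   assert(e->num_insts < MAX_INSTS);
   inst = &e->insts[e->num_insts++];
   inst->op = n->op;
   inst->dst = result;
   inst->src[0] = a;
   inst->src[1] = b;
   return result;
}
```

<p>
For the tree of (in0 + in1) * in2, this sketch emits an ADD into TEMP[0]
followed by a MUL of TEMP[0] and INPUT[2] into TEMP[1].
</p>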

<h3>Code Generation Options</h3>

<p>
Internally, there are several options that control the compiler's code
generation and instruction selection.
These options are seen in the gl_shader_state struct and may be set
by the device driver to indicate its preferences:
</p>

<pre>
struct gl_shader_state
{
   ...
   /** Driver-selectable options: */
   GLboolean EmitHighLevelInstructions;
   GLboolean EmitCondCodes;
   GLboolean EmitComments;
};
</pre>

<ul>
<li>EmitHighLevelInstructions
<br>
This option controls instruction selection for loops and conditionals.
If the option is set, high-level IF/ELSE/ENDIF, LOOP/ENDLOOP, CONT/BRK
instructions will be emitted.
Otherwise, those constructs will be implemented with BRA instructions.
</li>

<li>EmitCondCodes
<br>
If set, condition codes (as in GL_NV_fragment_program) will be used for
branching and looping.
Otherwise, ordinary registers will be used (the IF instruction will
examine the first operand's X component and do the if-part if non-zero).
This option is only relevant if EmitHighLevelInstructions is set.
</li>

<li>EmitComments
<br>
If set, instructions will be annotated with comments to help with debugging.
Extra NOP instructions will also be inserted.
</li>

</ul>
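
<p>
How the first two options combine can be summarised in C. The sketch below
is an assumption-laden illustration, not Mesa code:
<code>shader_options</code> is a hypothetical stand-in for the three option
fields of gl_shader_state, and <code>select_if_style</code> and the
<code>if_style</code> enum are invented names for the decision described
above.
</p>

```c
#include <assert.h>

typedef unsigned char GLboolean;   /* same underlying type as in GL/gl.h */
#define GL_FALSE 0
#define GL_TRUE  1

/* Hypothetical stand-in for the option fields of gl_shader_state. */
struct shader_options {
   GLboolean EmitHighLevelInstructions;
   GLboolean EmitCondCodes;
   GLboolean EmitComments;
};

/* The three ways a GLSL "if" could be lowered, per the text above. */
enum if_style {
   IF_WITH_BRA,          /* low-level BRA instructions */
   IF_TESTS_REGISTER,    /* IF examines the X component of a register */
   IF_TESTS_COND_CODE    /* IF tests a condition code */
};

static enum if_style select_if_style(const struct shader_options *opts)
{
   assert(opts);
   /* EmitCondCodes only matters when high-level instructions are on. */
   if (!opts->EmitHighLevelInstructions)
      return IF_WITH_BRA;
   return opts->EmitCondCodes ? IF_TESTS_COND_CODE : IF_TESTS_REGISTER;
}
```

<p>
The listing in the stand-alone compiler section, with its
<code>SGT.C</code> and <code>IF (NE.wwww)</code> instructions, corresponds
to the condition-code case.
</p>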


<a name="validation">
<h2>Compiler Validation</h2>

<p>
A new <a href="http://glean.sf.net" target="_parent">Glean</a> test has
been created to exercise the GLSL compiler.
</p>
<p>
The <em>glsl1</em> test runs over 150 sub-tests to check that the language
features and built-in functions work properly.
This test should be run frequently while working on the compiler to catch
regressions.
</p>
<p>
The test coverage is reasonably broad and complete but additional tests
should be added.
</p>



<a name="120">
<h2>GLSL 1.20 support</h2>

<p>
Support for GLSL version 1.20 is underway. The status is as follows.
</p>

<h3>Supported</h3>
<ul>
<li><code>mat2x3, mat2x4</code>, etc. types and functions
<li><code>transpose(), outerProduct(), matrixCompMult()</code> functions
(but untested)
<li>precision qualifiers (lowp, mediump, highp)
</ul>

<h3>Partially Complete</h3>
<ul>
<li><code>invariant</code> qualifier
</ul>

<h3>Not Completed</h3>
<ul>
<li><code>array.length()</code> method
<li><code>float[5] a;</code> array syntax
<li><code>centroid</code> qualifier
<li>unsized array constructors
<li>initializers for uniforms
<li>const initializers calling built-in functions
</ul>


</BODY>
</HTML>