<HTML>

<head>
<TITLE>Shading Language Support</TITLE>

<link rel="stylesheet" type="text/css" href="mesa.css"></head>

<BODY>

<H1>Shading Language Support</H1>

<p>
This page describes the features and status of Mesa's support for the
<a href="http://opengl.org/documentation/glsl/" target="_parent">
OpenGL Shading Language</a>.
</p>

<p>
Last updated on 15 December 2008.
</p>

<p>
Contents
</p>
<ul>
<li><a href="#120">GLSL 1.20 support</a>
<li><a href="#unsup">Unsupported Features</a>
<li><a href="#notes">Implementation Notes</a>
<li><a href="#hints">Programming Hints</a>
<li><a href="#standalone">Stand-alone GLSL Compiler</a>
<li><a href="#implementation">Compiler Implementation</a>
<li><a href="#validation">Compiler Validation</a>
</ul>



<a name="120"></a>
<h2>GLSL 1.20 support</h2>

<p>
GLSL version 1.20 is supported in Mesa 7.3.
Among the features and differences of GLSL 1.20 are:
</p>
<ul>
<li><code>mat2x3, mat2x4</code>, etc. types and functions
<li><code>transpose(), outerProduct(), matrixCompMult()</code> functions
(but untested)
<li>precision qualifiers (lowp, mediump, highp)
<li><code>invariant</code> qualifier
<li><code>array.length()</code> method
<li><code>float[5] a;</code> array syntax
<li><code>centroid</code> qualifier
<li>unsized array constructors
<li>initializers for uniforms
<li>const initializers calling built-in functions
</ul>



<a name="unsup"></a>
<h2>Unsupported Features</h2>

<p>
The following features of the shading language are not yet supported
in Mesa:
</p>

<ul>
<li>Linking of multiple shaders is not supported
<li>gl_ClipVertex
<li>The gl_Color and gl_SecondaryColor varying vars are interpolated
without perspective correction
</ul>

<p>
All other major features of the shading language should function.
</p>


<a name="notes"></a>
<h2>Implementation Notes</h2>

<ul>
<li>Shading language programs are compiled into low-level programs
very similar to those of GL_ARB_vertex/fragment_program.
<li>All vector types (vec2, vec3, vec4, bvec2, etc.) currently occupy full
float[4] registers.
<li>Float constants and variables are packed so that up to four floats
can occupy one program parameter/register.
<li>All function calls are inlined.
<li>Shaders which use too many registers will not compile.
<li>The quality of generated code is pretty good; register usage is fair.
<li>Shader error detection and error reporting (InfoLog) are not
very good yet.
<li>The ftransform() function doesn't necessarily match the results of
fixed-function transformation.
</ul>

<p>
These issues will be addressed/resolved in the future.
</p>


<a name="hints"></a>
<h2>Programming Hints</h2>

<ul>
<li>Declare <em>in</em> function parameters as <em>const</em> whenever possible.
This improves the efficiency of function inlining.
</li>
<br>
<li>To reduce register usage, declare variables within smaller scopes.
For example, the following code:
<pre>
    void main()
    {
        vec4 a1, a2, b1, b2;
        gl_Position = expression using a1, a2;
        gl_FrontColor = expression using b1, b2;
    }
</pre>
can be rewritten as follows to use half as many registers:
<pre>
    void main()
    {
        {
            vec4 a1, a2;
            gl_Position = expression using a1, a2;
        }
        {
            vec4 b1, b2;
            gl_FrontColor = expression using b1, b2;
        }
    }
</pre>
Alternatively, rather than using several float variables, use
a vec4 instead. Use swizzling and writemasks to access the
components of the vec4 as floats.
</li>
<br>
<li>Use the built-in library functions whenever possible.
For example, instead of writing this:
<pre>
    float x = 1.0 / sqrt(y);
</pre>
Write this:
<pre>
    float x = inversesqrt(y);
</pre>
</li>
<li>
Use ++i when possible, as it's more efficient than i++.
</li>
</ul>


<a name="standalone"></a>
<h2>Stand-alone GLSL Compiler</h2>

<p>
A stand-alone GLSL compiler driver has been added to Mesa.
</p>

<p>
The stand-alone compiler (like a conventional command-line compiler)
is a tool that accepts Shading Language programs and emits low-level
GPU programs.
</p>

<p>
This tool is useful for:
</p>
<ul>
<li>Inspecting GPU code to gain insight into compilation
<li>Generating initial GPU code for subsequent hand-tuning
<li>Debugging the GLSL compiler itself
</ul>

<p>
After building Mesa, the glslcompiler can be built by manually running:
</p>
<pre>
    cd src/mesa/drivers/glslcompiler
    make
</pre>


<p>
Here's an example of using the compiler to compile a fragment shader and
emit GL_ARB_fragment_program-style instructions:
</p>
<pre>
    bin/glslcompiler --debug --numbers --fs progs/glsl/CH06-brick.frag.txt
</pre>
<p>
results in:
</p>
<pre>
# Fragment Program/Shader
0: RCP TEMP[4].x, UNIFORM[2].xxxx;
1: RCP TEMP[4].y, UNIFORM[2].yyyy;
2: MUL TEMP[3].xy, VARYING[0], TEMP[4];
3: MOV TEMP[1], TEMP[3];
4: MUL TEMP[0].w, TEMP[1].yyyy, CONST[4].xxxx;
5: FRC TEMP[1].z, TEMP[0].wwww;
6: SGT.C TEMP[0].w, TEMP[1].zzzz, CONST[4].xxxx;
7: IF (NE.wwww); # (if false, goto 9);
8: ADD TEMP[1].x, TEMP[1].xxxx, CONST[4].xxxx;
9: ENDIF;
10: FRC TEMP[1].xy, TEMP[1];
11: SGT TEMP[2].xy, UNIFORM[3], TEMP[1];
12: MUL TEMP[1].z, TEMP[2].xxxx, TEMP[2].yyyy;
13: LRP TEMP[0], TEMP[1].zzzz, UNIFORM[0], UNIFORM[1];
14: MUL TEMP[0].xyz, TEMP[0], VARYING[1].xxxx;
15: MOV OUTPUT[0].xyz, TEMP[0];
16: MOV OUTPUT[0].w, CONST[4].yyyy;
17: END
</pre>

<p>
Note that some shading language constructs (such as uniform and varying
variables) aren't expressible in ARB or NV-style programs.
Therefore, the resulting output is not always legal according to the
definitions of those program languages.
</p>
<p>
Also note that this compiler driver is still under development.
Over time, the correctness of the GPU programs, with respect to the ARB
and NV languages, should improve.
</p>



<a name="implementation"></a>
<h2>Compiler Implementation</h2>

<p>
The source code for Mesa's shading language compiler is in the
<code>src/mesa/shader/slang/</code> directory.
</p>

<p>
The compiler follows a fairly standard design and basically works as follows:
</p>
<ul>
<li>The input string is tokenized (see grammar.c) and parsed
(see slang_compiler_*.c) to produce an Abstract Syntax Tree (AST).
The nodes in this tree are slang_operation structures
(see slang_compile_operation.h).
The nodes are decorated with symbol table, scoping and datatype information.
<li>The AST is converted into an Intermediate Representation (IR) tree
(see the slang_codegen.c file).
The IR nodes represent basic GPU instructions, like add, dot product,
move, etc.
The IR tree is mostly a binary tree, but a few nodes have three or four
children.
In principle, the IR tree could be executed by doing an in-order traversal.
<li>The IR tree is traversed in-order to emit code (see slang_emit.c).
This is also when registers are allocated to store variables and temps.
<li>In the future, a pattern-matching code generator-generator may be
used for code generation.
Programs such as L-BURG (Bottom-Up Rewrite Generator) and Twig look for
patterns in IR trees, compute weights for subtrees and use the weights
to select the best instructions to represent each subtree.
<li>The emitted GPU instructions (see prog_instruction.h) are stored in a
gl_program object (see mtypes.h).
<li>When a fragment shader and vertex shader are linked (see slang_link.c)
the varying vars are matched up, uniforms are merged, and vertex
attributes are resolved (rewriting instructions as needed).
</ul>

<p>
The final vertex and fragment programs may be interpreted in software
(see prog_execute.c) or translated into a specific hardware architecture
(see drivers/dri/i915/i915_fragprog.c for example).
</p>

<h3>Code Generation Options</h3>

<p>
Internally, there are several options that control the compiler's code
generation and instruction selection.
These options are found in the gl_shader_state struct and may be set
by the device driver to indicate its preferences:
</p>

<pre>
struct gl_shader_state
{
   ...
   /** Driver-selectable options: */
   GLboolean EmitHighLevelInstructions;
   GLboolean EmitCondCodes;
   GLboolean EmitComments;
};
</pre>

<ul>
<li>EmitHighLevelInstructions
<br>
This option controls instruction selection for loops and conditionals.
If the option is set, high-level IF/ELSE/ENDIF, LOOP/ENDLOOP, CONT/BRK
instructions will be emitted.
Otherwise, those constructs will be implemented with BRA instructions.
</li>

<li>EmitCondCodes
<br>
If set, condition codes (as in GL_NV_fragment_program) will be used for
branching and looping.
Otherwise, ordinary registers will be used (the IF instruction will
examine the first operand's X component and do the if-part if non-zero).
This option is only relevant if EmitHighLevelInstructions is set.
</li>

<li>EmitComments
<br>
If set, instructions will be annotated with comments to help with debugging.
Extra NOP instructions will also be inserted.
</li>

</ul>


<a name="validation"></a>
<h2>Compiler Validation</h2>

<p>
A <a href="http://glean.sf.net" target="_parent">Glean</a> test has
been created to exercise the GLSL compiler.
</p>
<p>
The <em>glsl1</em> test runs over 170 sub-tests to check that the language
features and built-in functions work properly.
This test should be run frequently while working on the compiler to catch
regressions.
</p>
<p>
The test coverage is reasonably broad and complete, but additional tests
should be added.
</p>


</BODY>
</HTML>