.. _context:

Context
=======

A Gallium rendering context encapsulates the state which affects 3D
rendering such as blend state, depth/stencil state, texture samplers,
etc.

Note that resource/texture allocation is not per-context but per-screen.


Methods
-------

CSO State
^^^^^^^^^

All Constant State Object (CSO) state is created, bound, and destroyed
with triplets of methods that all follow a specific naming scheme.
For example, ``create_blend_state``, ``bind_blend_state``, and
``destroy_blend_state``.

CSO objects handled by the context object:

* :ref:`Blend`: ``*_blend_state``
* :ref:`Sampler`: Texture sampler states are bound separately for fragment,
  vertex and geometry samplers. Note that sampler states are set en masse.
  If M is the max number of sampler units supported by the driver and N
  samplers are bound with ``bind_fragment_sampler_states`` then sampler
  units N..M-1 are considered disabled/NULL.
* :ref:`Rasterizer`: ``*_rasterizer_state``
* :ref:`Depth, Stencil, & Alpha`: ``*_depth_stencil_alpha_state``
* :ref:`Shader`: These are create, bind and destroy methods for vertex,
  fragment and geometry shaders.
* :ref:`Vertex Elements`: ``*_vertex_elements_state``


Resource Binding State
^^^^^^^^^^^^^^^^^^^^^^

This state describes how resources in various flavours (textures,
buffers, surfaces) are bound to the driver.


* ``set_constant_buffer`` sets a constant buffer to be used for a given shader
  type. ``index`` indicates which buffer to set (some APIs may allow
  multiple ones to be set, and binding a specific one later, though drivers
  are mostly restricted to the first one right now).

* ``set_framebuffer_state``

* ``set_vertex_buffers``

* ``set_index_buffer``


Non-CSO State
^^^^^^^^^^^^^

These pieces of state are too small, variable, and/or trivial to have CSO
objects. They all follow simple, one-method binding calls, e.g.
``set_blend_color``.

* ``set_stencil_ref`` sets the stencil front and back reference values
  which are used as comparison values in the stencil test.
* ``set_blend_color``
* ``set_sample_mask``
* ``set_clip_state``
* ``set_polygon_stipple``
* ``set_scissor_state`` sets the bounds for the scissor test, which culls
  pixels before blending to render targets. If the :ref:`Rasterizer` does
  not have the scissor test enabled, then the scissor bounds never need to
  be set since they will not be used. Note that scissor xmin and ymin are
  inclusive, but xmax and ymax are exclusive. The inclusive ranges in x
  and y would be [xmin..xmax-1] and [ymin..ymax-1].
* ``set_viewport_state``
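
The inclusive/exclusive convention of the scissor bounds can be illustrated
with a small containment test. This is only a sketch: ``struct scissor_rect``
and ``scissor_contains`` are hypothetical names made up for this example, not
the Gallium structures themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scissor rectangle (not the real Gallium type). */
struct scissor_rect {
   unsigned xmin, ymin;   /* inclusive lower bounds */
   unsigned xmax, ymax;   /* exclusive upper bounds */
};

/* Returns true if pixel (x, y) survives the scissor test. */
static bool
scissor_contains(const struct scissor_rect *s, unsigned x, unsigned y)
{
   return x >= s->xmin && x < s->xmax &&
          y >= s->ymin && y < s->ymax;
}
```

With bounds (10, 20)-(30, 40), pixel (29, 39) passes and pixel (30, 39) is
culled, matching the ranges [xmin..xmax-1] and [ymin..ymax-1].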


Sampler Views
^^^^^^^^^^^^^

These are the means to bind textures to shader stages. To create one, specify
its format, swizzle and LOD range in a sampler view template.

If the texture format differs from the template format, the texture is said
to be cast to another format. Casting can be done only between compatible
formats, that is, formats that have matching component order and sizes.

Swizzle fields specify the way in which fetched texel components are placed
in the result register. For example, ``swizzle_r`` specifies what is going to be
placed in the first component of the result register.

The ``first_level`` and ``last_level`` fields of the sampler view template
specify the LOD range the texture is going to be constrained to. Note that these
values are in addition to the respective ``min_lod``, ``max_lod`` values in the
``pipe_sampler_state`` (that is, if ``min_lod`` is 2.0 and ``first_level`` is 3,
the first mip level used for sampling from the resource is effectively level 5).

The ``first_layer`` and ``last_layer`` fields specify the layer range the
texture is going to be constrained to. Similar to the LOD range, this is added
to the array index which is used for sampling.
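
The additive behaviour of the view's level and layer ranges can be written
out as plain arithmetic. The helper names below are hypothetical and exist
only for this illustration; clamping against ``last_level`` and
``last_layer`` is deliberately left out.

```c
#include <assert.h>

/* First mip level actually sampled: the view's first_level is added to the
 * sampler state's min_lod. */
static float
effective_min_level(unsigned first_level, float min_lod)
{
   return (float)first_level + min_lod;
}

/* Resource layer actually sampled: the view's first_layer is added to the
 * array index supplied when sampling. */
static unsigned
effective_layer(unsigned first_layer, unsigned array_index)
{
   return first_layer + array_index;
}
```

For example, ``effective_min_level(3, 2.0f)`` yields 5.0, the case described
in the text.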

* ``set_fragment_sampler_views`` binds an array of sampler views to the
  fragment shader stage. Every binding point acquires a reference
  to a respective sampler view and releases a reference to the previous
  sampler view. If M is the maximum number of sampler units and N units
  are passed to ``set_fragment_sampler_views``, the driver should unbind the
  sampler views for units N..M-1.

* ``set_vertex_sampler_views`` binds an array of sampler views to the vertex
  shader stage. Every binding point acquires a reference to a respective
  sampler view and releases a reference to the previous sampler view.

* ``create_sampler_view`` creates a new sampler view. ``texture`` is associated
  with the sampler view, which results in the sampler view holding a reference
  to the texture. The format specified in the template must be compatible
  with the texture format.

* ``sampler_view_destroy`` destroys a sampler view and releases its reference
  to the associated texture.

Surfaces
^^^^^^^^

These are the means to use resources as color render targets or depth/stencil
attachments. To create one, specify the mip level, the range of layers, and
the bind flags (either PIPE_BIND_DEPTH_STENCIL or PIPE_BIND_RENDER_TARGET).
Note that layer values are in addition to what is indicated by the geometry
shader output variable XXX_FIXME (that is, if ``first_layer`` is 3 and the
geometry shader indicates index 2, layer 5 of the resource will be used). The
``first_layer`` and ``last_layer`` parameters are only used for 1D array,
2D array, cube, and 3D textures; otherwise they are 0.

* ``create_surface`` creates a new surface.

* ``surface_destroy`` destroys a surface and releases its reference to the
  associated resource.

Stream output targets
^^^^^^^^^^^^^^^^^^^^^

Stream output, also known as transform feedback, allows writing the primitives
produced by the vertex pipeline to buffers. This is done after the geometry
shader, or after the vertex shader if no geometry shader is present.

The stream output targets are views into buffer resources which can be bound
as stream outputs and specify a memory range where it's valid to write
primitives. The pipe driver must implement memory protection such that any
primitives written outside of the specified memory range are discarded.

Two stream output targets can use the same resource at the same time, but
only with disjoint memory ranges.

Additionally, the stream output target internally maintains the offset
into the buffer, which is incremented every time something is written to it.
The internal offset is equal to how much data has already been written.
It can be stored in device memory, so the CPU doesn't have to query it.

The stream output target can be used in a draw command to provide
the vertex count. The vertex count is derived from the internal offset
discussed above.

* ``create_stream_output_target`` creates a new target.

* ``stream_output_target_destroy`` destroys a target. Users of this should
  use ``pipe_so_target_reference`` instead.

* ``set_stream_output_targets`` binds stream output targets. The parameter
  ``append_bitmask`` is a bitmask, where the i-th bit specifies whether new
  primitives should be appended to the i-th buffer (writing starts at
  the internal offset), or whether writing should start at the beginning
  (the internal offset is effectively set to 0).

NOTE: The currently-bound vertex or geometry shader must be compiled with
the properly-filled-in structure ``pipe_stream_output_info`` describing which
outputs should be written to buffers and how. The structure is part of
``pipe_shader_state``.
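
The interaction between the internal offset, ``append_bitmask`` and the
memory protection rule can be modelled on the CPU. The sketch below is not
driver code: ``struct so_target_model`` and the ``so_*`` helpers are
hypothetical, and deriving the vertex count by dividing the internal offset by
a per-vertex stride is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side model of a stream output target. */
struct so_target_model {
   unsigned buffer_offset;    /* start of the writable range in the buffer */
   unsigned buffer_size;      /* size of the writable range in bytes */
   unsigned internal_offset;  /* bytes already written to the range */
};

/* Models binding: bit i set means "append", bit i clear means "restart at 0". */
static void
so_bind_targets(struct so_target_model **targets, unsigned num_targets,
                unsigned append_bitmask)
{
   for (unsigned i = 0; i < num_targets; i++) {
      if (targets[i] && !(append_bitmask & (1u << i)))
         targets[i]->internal_offset = 0;
   }
}

/* Models one write; data that would fall outside the range is discarded.
 * Returns the number of bytes accepted. */
static unsigned
so_write_vertex(struct so_target_model *t, unsigned vertex_size)
{
   if (t->internal_offset + vertex_size > t->buffer_size)
      return 0;
   t->internal_offset += vertex_size;
   return vertex_size;
}

/* Assumption: vertex count = bytes written / bytes per vertex. */
static unsigned
so_vertex_count(const struct so_target_model *t, unsigned stride)
{
   return t->internal_offset / stride;
}
```

Rebinding two targets with ``append_bitmask`` = 0x1 keeps the first target's
offset and resets the second one, so later writes to the second target start
at the beginning of its range.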

Clearing
^^^^^^^^

Clear is one of the most difficult concepts to nail down to a single
interface (due to both different requirements from APIs and also driver/hw
specific differences).

``clear`` initializes some or all of the surfaces currently bound to
the framebuffer to particular RGBA, depth, or stencil values.
Currently, this does not take into account color or stencil write masks (as
used by GL), and always clears the whole surfaces (no scissoring as used by
GL clear or explicit rectangles like d3d9 uses). It can, however, also clear
only depth or stencil in a combined depth/stencil surface, if the driver
supports PIPE_CAP_DEPTHSTENCIL_CLEAR_SEPARATE.
If a surface includes several layers then all layers will be cleared.

``clear_render_target`` clears a single color render target with the specified
color value. While it is only possible to clear one surface at a time (which can
include several layers), this surface need not be bound to the framebuffer.

``clear_depth_stencil`` clears a single depth, stencil or depth/stencil surface
with the specified depth and stencil values (for combined depth/stencil buffers,
it is also possible to only clear one or the other part). While it is only
possible to clear one surface at a time (which can include several layers),
this surface need not be bound to the framebuffer.


Drawing
^^^^^^^

``draw_vbo`` draws a specified primitive. The primitive mode and other
properties are described by ``pipe_draw_info``.

The ``mode``, ``start``, and ``count`` fields of ``pipe_draw_info`` specify
the mode of the primitive and the vertices to be fetched, in the range from
``start`` to ``start + count - 1``, inclusive.

Every instance with instanceID in the range from ``start_instance`` to
``start_instance + instance_count - 1``, inclusive, will be drawn.

If there is an index buffer bound, and the ``indexed`` field is true, all vertex
indices will be looked up in the index buffer.

In an indexed draw, ``min_index`` and ``max_index`` respectively provide a lower
and upper bound of the indices contained in the index buffer inside the range
from ``start`` to ``start + count - 1``. This allows the driver to
determine which subset of vertices will be referenced during the draw call
without having to scan the index buffer. Providing an overestimate of
the true bounds, for example, a ``min_index`` and ``max_index`` of 0 and
0xffffffff respectively, must give exactly the same rendering, albeit with less
performance due to unreferenced vertex buffers being unnecessarily DMA'ed or
processed. Providing an underestimate of the true bounds will result in
undefined behavior, but should not result in program or system failure.

In the case of a non-indexed draw, ``min_index`` should be set to
``start`` and ``max_index`` should be set to ``start + count - 1``.
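
A caller that does not already know the index bounds could compute tight
values by scanning the index range once. The following is a minimal sketch
assuming 16-bit indices; ``struct index_bounds`` and both helper functions are
hypothetical names for this example and are not part of the interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair holding the values for min_index and max_index. */
struct index_bounds {
   uint32_t min_index;
   uint32_t max_index;
};

/* Tight bounds for an indexed draw: scan indices[start .. start+count-1].
 * The caller must pass count > 0. */
static struct index_bounds
scan_index_bounds(const uint16_t *indices, unsigned start, unsigned count)
{
   struct index_bounds b = { UINT32_MAX, 0 };

   for (unsigned i = start; i < start + count; i++) {
      if (indices[i] < b.min_index)
         b.min_index = indices[i];
      if (indices[i] > b.max_index)
         b.max_index = indices[i];
   }
   return b;
}

/* Bounds for a non-indexed draw, as described in the text above. */
static struct index_bounds
non_indexed_bounds(unsigned start, unsigned count)
{
   struct index_bounds b = { start, start + count - 1 };
   return b;
}
```

Passing the conservative pair 0 and 0xffffffff instead is always valid and
only costs performance, whereas bounds narrower than the scanned result are
undefined behavior.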

``index_bias`` is a value added to every vertex index after lookup and before
fetching vertex attributes.

When drawing indexed primitives, the primitive restart index can be
used to draw disjoint primitive strips. For example, several separate
line strips can be drawn by designating a special index value as the
restart index. The ``primitive_restart`` flag enables/disables this
feature. The ``restart_index`` field specifies the restart index value.

When primitive restart is in use, array indexes are compared to the
restart index before adding the ``index_bias`` offset.
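
The ordering of the restart comparison and the bias addition matters when a
biased index happens to equal the restart value. A minimal sketch of that
ordering, where ``resolve_index`` is a hypothetical helper rather than a
Gallium function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Resolves one raw index fetched from the index buffer.
 * Returns false if the index is a restart marker (no vertex is fetched);
 * otherwise stores the biased vertex index in *vertex and returns true. */
static bool
resolve_index(uint32_t raw, bool primitive_restart, uint32_t restart_index,
              int32_t index_bias, uint32_t *vertex)
{
   /* The comparison uses the raw index, before index_bias is applied. */
   if (primitive_restart && raw == restart_index)
      return false;

   *vertex = (uint32_t)((int64_t)raw + index_bias);
   return true;
}
```

With a restart index of 0xffff and a bias of 5, the raw index 0xfffa resolves
to vertex 0xffff and is still drawn, because only the raw value 0xffff
restarts the primitive.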

If a given vertex element has ``instance_divisor`` set to 0, it is said to
contain per-vertex data, and the effective vertex attribute address needs
to be recalculated for every index.

  attribAddr = ``stride`` * index + ``src_offset``

If a given vertex element has ``instance_divisor`` set to non-zero,
it is said to contain per-instance data, and the effective vertex attribute
address needs to be recalculated for every ``instance_divisor``-th instance.

  attribAddr = ``stride`` * (instanceID / ``instance_divisor``) + ``src_offset``

In the above formulas, ``src_offset`` is taken from the given vertex element,
``stride`` is taken from the vertex buffer associated with the given
vertex element, and the division is an integer division.

The calculated attribAddr is used as an offset into the vertex buffer to
fetch the attribute data.
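
Both address formulas fit into one small function. This is a sketch for
illustration; ``attrib_offset`` is a hypothetical helper, and it assumes the
per-instance case uses integer division, so the address only changes every
``instance_divisor``-th instance.

```c
#include <assert.h>

/* Byte offset into the vertex buffer for one attribute fetch.
 * stride comes from the vertex buffer; src_offset and instance_divisor
 * come from the vertex element. */
static unsigned
attrib_offset(unsigned stride, unsigned src_offset, unsigned instance_divisor,
              unsigned index, unsigned instance_id)
{
   if (instance_divisor == 0)
      return stride * index + src_offset;                 /* per-vertex */

   return stride * (instance_id / instance_divisor) + src_offset; /* per-instance */
}
```

With a stride of 16 and a ``src_offset`` of 4, vertex index 3 is fetched from
offset 52 in the per-vertex case, while instances 4 and 5 with a divisor of 2
both read from offset 36.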

The value of ``instanceID`` can be read in a vertex shader through a system
value register declared with the INSTANCEID semantic name.


Queries
^^^^^^^

Queries gather some statistic from the 3D pipeline over one or more
draws. Queries may be nested, though only d3d1x currently exercises this.

Queries can be created with ``create_query`` and deleted with
``destroy_query``. To start a query, use ``begin_query``, and when finished,
use ``end_query`` to end the query.

``get_query_result`` is used to retrieve the results of a query. If
the ``wait`` parameter is TRUE, then the ``get_query_result`` call
will block until the results of the query are ready (and TRUE will be
returned). Otherwise, if the ``wait`` parameter is FALSE, the call
will not block and the return value will be TRUE if the query has
completed or FALSE otherwise.

The interface currently includes the following types of queries:

``PIPE_QUERY_OCCLUSION_COUNTER`` counts the number of fragments which
are written to the framebuffer without being culled by
:ref:`Depth, Stencil, & Alpha` testing or shader KILL instructions.
The result is an unsigned 64-bit integer.
This query can be used with ``render_condition``.

In cases where a boolean result of an occlusion query is enough,
``PIPE_QUERY_OCCLUSION_PREDICATE`` should be used. It is just like
``PIPE_QUERY_OCCLUSION_COUNTER`` except that the result is a boolean
value of FALSE for cases where COUNTER would result in 0 and TRUE
for all other cases.
This query can be used with ``render_condition``.

``PIPE_QUERY_TIME_ELAPSED`` returns the amount of time, in nanoseconds,
the context takes to perform operations.
The result is an unsigned 64-bit integer.

``PIPE_QUERY_TIMESTAMP`` returns a device/driver internal timestamp,
scaled to nanoseconds, recorded after all commands issued prior to
``end_query`` have been processed.
This query does not require a call to ``begin_query``.
The result is an unsigned 64-bit integer.

``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check whether the
internal timer resolution is good enough to distinguish between the
events at ``begin_query`` and ``end_query``.
The result is a 64-bit integer specifying the timer resolution in Hz,
followed by a boolean value indicating whether the timer has incremented.

``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
the number of primitives processed by the pipeline.

``PIPE_QUERY_PRIMITIVES_EMITTED`` returns a 64-bit integer indicating
the number of primitives written to stream output buffers.

``PIPE_QUERY_SO_STATISTICS`` returns two 64-bit integers corresponding to
the results of
``PIPE_QUERY_PRIMITIVES_EMITTED`` and
``PIPE_QUERY_PRIMITIVES_GENERATED``, in this order.

``PIPE_QUERY_SO_OVERFLOW_PREDICATE`` returns a boolean value indicating
whether the stream output targets have overflowed as a result of the
commands issued between ``begin_query`` and ``end_query``.
This query can be used with ``render_condition``.

``PIPE_QUERY_GPU_FINISHED`` returns a boolean value indicating whether
all commands issued before ``end_query`` have completed. However, this
does not imply serialization.
This query does not require a call to ``begin_query``.

``PIPE_QUERY_PIPELINE_STATISTICS`` returns an array of the following
64-bit integers:

* Number of vertices read from vertex buffers.
* Number of primitives read from vertex buffers.
* Number of vertex shader threads launched.
* Number of geometry shader threads launched.
* Number of primitives generated by geometry shaders.
* Number of primitives forwarded to the rasterizer.
* Number of primitives rasterized.
* Number of fragment shader threads launched.
* Number of tessellation control shader threads launched.
* Number of tessellation evaluation shader threads launched.

If a shader type is not supported by the device/driver,
the corresponding values should be set to 0.

Gallium does not guarantee the availability of any query types; one must
always check the capabilities of the :ref:`Screen` first.


Conditional Rendering
^^^^^^^^^^^^^^^^^^^^^

A drawing command can be skipped depending on the outcome of a query
(typically an occlusion query). The ``render_condition`` function specifies
the query which should be checked prior to rendering anything.

If ``render_condition`` is called with ``query`` = NULL, conditional
rendering is disabled and drawing takes place normally.

If ``render_condition`` is called with a non-null ``query``, subsequent
drawing commands will be predicated on the outcome of the query. If
the query result is zero, subsequent drawing commands will be skipped.

If ``mode`` is PIPE_RENDER_COND_WAIT the driver will wait for the
query to complete before deciding whether to render.

If ``mode`` is PIPE_RENDER_COND_NO_WAIT and the query has not yet
completed, the drawing command will be executed normally. If the query
has completed, drawing will be predicated on the outcome of the query.

If ``mode`` is PIPE_RENDER_COND_BY_REGION_WAIT or
PIPE_RENDER_COND_BY_REGION_NO_WAIT, rendering will be predicated as above
for the non-REGION modes, but in the case that an occlusion query returns
a non-zero result, regions which were occluded may be omitted by subsequent
drawing commands. This can result in better performance with some GPUs.
Normally, if the occlusion query returned a non-zero result, subsequent
drawing happens normally, so fragments may be generated, shaded and
processed even where they're known to be obscured.
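
The whole-draw part of these rules can be summarized as a decision function.
The sketch below uses hypothetical stand-ins (``enum cond_mode``,
``struct query_model``, ``should_draw``) instead of the real Gallium types,
assumes that a waiting mode has already blocked until the result is final,
and does not model the optional per-region skipping of the BY_REGION modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the four render condition modes. */
enum cond_mode {
   COND_WAIT,
   COND_NO_WAIT,
   COND_BY_REGION_WAIT,
   COND_BY_REGION_NO_WAIT
};

/* Hypothetical model of a query as seen at draw time. */
struct query_model {
   bool completed;    /* has the result become available? */
   uint64_t result;   /* e.g. occlusion counter value */
};

/* Decides whether a draw command is executed at all. */
static bool
should_draw(const struct query_model *query, enum cond_mode mode)
{
   bool waits = (mode == COND_WAIT || mode == COND_BY_REGION_WAIT);

   if (query == NULL)
      return true;              /* conditional rendering disabled */

   if (!waits && !query->completed)
      return true;              /* NO_WAIT: draw normally if not ready */

   /* In a waiting mode the driver would block here until completion. */
   return query->result != 0;   /* a zero result skips the draw */
}
```

A completed query with a zero result skips the draw in every mode, while an
unfinished query only lets the draw through in the two NO_WAIT modes.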


Flushing
^^^^^^^^

``flush``


Resource Busy Queries
^^^^^^^^^^^^^^^^^^^^^

``is_resource_referenced``



Blitting
^^^^^^^^

These methods emulate classic blitter controls.

These methods operate directly on ``pipe_resource`` objects, and stand
apart from any 3D state in the context. Blitting functionality may be
moved to a separate abstraction at some point in the future.

``resource_copy_region`` blits a region of a resource to a region of another
resource, provided that both resources have the same format, or compatible
formats, i.e., formats for which copying the bytes from the source resource
unmodified to the destination resource will achieve the same effect as a
textured quad blitter. The source and destination may be the same resource,
but overlapping blits are not permitted.

``resource_resolve`` resolves a multisampled resource into a non-multisampled
one. Their formats must match. This function must be present if a driver
supports multisampling.
The region that is to be resolved is described by ``pipe_resolve_info``, which
provides a source and a destination rectangle.
The source rectangle may be vertically flipped, but otherwise the dimensions
of the rectangles must match, unless PIPE_CAP_SCALED_RESOLVE is supported,
in which case scaling and horizontal flipping are allowed as well.
The result of resolving depth/stencil values may be any function of the values at
the sample points, but returning the value of the centermost sample is preferred.

The interfaces to these calls are likely to change to make it easier
for a driver to batch multiple blits with the same source and
destination.

Transfers
^^^^^^^^^

These methods are used to get data to/from a resource.

``get_transfer`` creates a transfer object.

``transfer_destroy`` destroys the transfer object. May cause
data to be written to the resource at this point.

``transfer_map`` creates a memory mapping for the transfer object.
The returned map points to the start of the mapped range according to
the box region, not the beginning of the resource.

``transfer_unmap`` removes the memory mapping for the transfer object.
Any pointers into the map should be considered invalid and discarded.

``transfer_inline_write`` performs a simplified transfer for simple writes.
Basically ``get_transfer``, ``transfer_map``, data write, ``transfer_unmap``,
and ``transfer_destroy`` all in one.


The box parameter to some of these functions defines a 1D, 2D or 3D
region of pixels. This is self-explanatory for 1D, 2D and 3D texture
targets.

For PIPE_TEXTURE_1D_ARRAY, the box::y and box::height fields refer to the
array dimension of the texture.

For PIPE_TEXTURE_2D_ARRAY, the box::z and box::depth fields refer to the
array dimension of the texture.

For PIPE_TEXTURE_CUBE, the box::z and box::depth fields refer to the
faces of the cube map (z + depth <= 6).
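
The mapping from box fields to array layers or cube faces can be expressed as
a lookup per texture target. Everything in this sketch is hypothetical
(``enum tex_target``, ``struct box_model``, ``box_layer_range``); it only
restates the three rules above and adds the cube face limit as a validity
check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of texture targets, not the PIPE_TEXTURE_* values. */
enum tex_target { TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_CUBE, TEX_OTHER };

/* Hypothetical stand-in for the box parameter. */
struct box_model {
   int x, y, z;
   int width, height, depth;
};

/* Extracts the first layer (or face) and the layer count addressed by a box.
 * Returns false if the target has no array dimension or the box is invalid. */
static bool
box_layer_range(enum tex_target target, const struct box_model *box,
                int *first, int *count)
{
   switch (target) {
   case TEX_1D_ARRAY:            /* array dimension lives in y/height */
      *first = box->y;
      *count = box->height;
      return true;
   case TEX_2D_ARRAY:            /* array dimension lives in z/depth */
      *first = box->z;
      *count = box->depth;
      return true;
   case TEX_CUBE:                /* faces live in z/depth, at most 6 */
      if (box->z < 0 || box->depth < 0 || box->z + box->depth > 6)
         return false;
      *first = box->z;
      *count = box->depth;
      return true;
   default:
      return false;
   }
}
```

A cube map box with z = 4 and depth = 2 addresses the last two faces, whereas
z = 4 and depth = 3 is rejected because it would run past the sixth face.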



.. _transfer_flush_region:

transfer_flush_region
%%%%%%%%%%%%%%%%%%%%%

If a transfer was created with ``FLUSH_EXPLICIT``, it will not automatically
be flushed on write or unmap. Flushes must be requested with
``transfer_flush_region``. Flush ranges are relative to the mapped range, not
the beginning of the resource.



.. _redefine_user_buffer:

redefine_user_buffer
%%%%%%%%%%%%%%%%%%%%

This function notifies a driver that the user buffer content has been changed.
The updated region starts at ``offset`` and is ``size`` bytes large.
The ``offset`` is relative to the pointer specified in ``user_buffer_create``.
While uploading the user buffer, the driver is allowed to skip uploading
the memory outside of this region.
The ``width0`` is redefined to ``MAX2(width0, offset+size)``.
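
The size update amounts to growing the buffer so that it covers the end of
the changed region, and never shrinking it. A minimal sketch, in which
``struct user_buffer_model`` and ``model_redefine_user_buffer`` are
hypothetical and ``MAX2`` is defined locally for the example:

```c
#include <assert.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical model of a user buffer; only the size is tracked. */
struct user_buffer_model {
   unsigned width0;   /* buffer size in bytes */
};

/* Applies the width0 rule for a changed region [offset, offset + size). */
static void
model_redefine_user_buffer(struct user_buffer_model *buf,
                           unsigned offset, unsigned size)
{
   buf->width0 = MAX2(buf->width0, offset + size);
}
```

A 64-byte buffer stays at 64 bytes after an update of bytes 16..47, and grows
to 96 bytes after an update that starts at offset 80 and is 16 bytes large.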



.. _texture_barrier:

texture_barrier
%%%%%%%%%%%%%%%

This function flushes all pending writes to the currently-set surfaces and
invalidates all read caches of the currently-set samplers.



.. _pipe_transfer:

PIPE_TRANSFER
^^^^^^^^^^^^^

These flags control the behavior of a transfer object.

``PIPE_TRANSFER_READ``
  Resource contents are read back (or accessed directly) at transfer create
  time.

``PIPE_TRANSFER_WRITE``
  Resource contents will be written back at transfer_destroy time (or modified
  as a result of being accessed directly).

``PIPE_TRANSFER_MAP_DIRECTLY``
  A transfer should directly map the resource. May return NULL if not supported.

``PIPE_TRANSFER_DISCARD_RANGE``
  The memory within the mapped region is discarded. Cannot be used with
  ``PIPE_TRANSFER_READ``.

``PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE``
  Discards all memory backing the resource. It should not be used with
  ``PIPE_TRANSFER_READ``.

``PIPE_TRANSFER_DONTBLOCK``
  Fail if the resource cannot be mapped immediately.

``PIPE_TRANSFER_UNSYNCHRONIZED``
  Do not synchronize pending operations on the resource when mapping. The
  interaction of any writes to the map and any operations pending on the
  resource is undefined. Cannot be used with ``PIPE_TRANSFER_READ``.

``PIPE_TRANSFER_FLUSH_EXPLICIT``
  Written ranges will be notified later with :ref:`transfer_flush_region`.
  Cannot be used with ``PIPE_TRANSFER_READ``.
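
The restrictions on combining flags with ``PIPE_TRANSFER_READ`` could be
checked up front, for instance in a debug build. In the sketch below the
``XFER_*`` constants are hypothetical stand-ins with made-up bit values (they
are not the real ``PIPE_TRANSFER_*`` definitions), and treating the "should
not" wording for the whole-resource discard as a hard error is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real PIPE_TRANSFER_* values may differ. */
enum {
   XFER_READ                   = 1 << 0,
   XFER_WRITE                  = 1 << 1,
   XFER_DISCARD_RANGE          = 1 << 2,
   XFER_DISCARD_WHOLE_RESOURCE = 1 << 3,
   XFER_UNSYNCHRONIZED         = 1 << 4,
   XFER_FLUSH_EXPLICIT         = 1 << 5
};

/* Returns true if the combination respects the documented restrictions. */
static bool
transfer_flags_valid(unsigned flags)
{
   /* Flags the text says cannot (or should not) be combined with READ. */
   const unsigned not_with_read = XFER_DISCARD_RANGE |
                                  XFER_DISCARD_WHOLE_RESOURCE |
                                  XFER_UNSYNCHRONIZED |
                                  XFER_FLUSH_EXPLICIT;

   if ((flags & XFER_READ) && (flags & not_with_read))
      return false;

   return true;
}
```

A write-only mapping with ``XFER_DISCARD_RANGE`` passes the check, while a
read-write mapping with the same discard flag is rejected.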