José Fonseca [Tue, 17 Sep 2013 18:22:44 +0000 (19:22 +0100)]
util/u_blit: Support blits from cubemaps.
By calling util_map_texcoords2d_onto_cubemap.
A new parameter for util_blit_pixels_tex is necessary, as
pipe_sampler_view::first_layer is always supposed to point to the first
face when sampling from cubemaps.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
José Fonseca [Tue, 17 Sep 2013 18:01:11 +0000 (19:01 +0100)]
vega: Use pipe_context::blit instead of util_blit_pixels_tex.
Only compile-tested but it seems straightforward.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Wed, 18 Sep 2013 06:32:10 +0000 (23:32 -0700)]
i965: Rename brw_{fs,vec4}_emit.cpp to brw_{fs,vec4}_generator.cpp.
The previous names were really confusing to talk about:
- brw_fs_visitor() contained methods named emit_whatever().
- brw_fs_generator() contained methods named generate_whatever(), but
lived in brw_fs_emit.cpp.
So when someone said "the emit layer", or "emit code", we weren't sure
whether they meant the visitor's emit() functions or the generator in
brw_fs_emit.cpp.
By renaming these files, the method names, class names, and file names
all match, which is much less confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Matt Turner [Fri, 6 Sep 2013 22:05:10 +0000 (15:05 -0700)]
glsl: Correctly validate fma()'s types.
lrp() can take a scalar as a third argument, and fma() cannot.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 9 Sep 2013 18:13:20 +0000 (11:13 -0700)]
glsl: Add frexp signatures and implementation.
I initially implemented frexp() as an IR opcode with a lowering pass,
but since it returns a value and has an out-parameter, it would break
assumptions our optimization passes make about ir_expressions being pure
(i.e., having no side effects).
For example, if opt_tree_grafting encounters this code:
    uniform float u;
    void main()
    {
        int exp;
        float f = frexp(u, out exp);
        float g = float(exp)/256.0;
        float h = float(exp) + 1.0;
        gl_FragColor = vec4(f, g, h, g + h);
    }
it may try to optimize it to this:
    uniform float u;
    void main()
    {
        int exp;
        float g = float(exp)/256.0;
        float h = float(exp) + 1.0;
        gl_FragColor = vec4(frexp(u, out exp), g, h, g + h);
    }
Some hardware has an instruction which performs frexp(), but we would
need some other compiler infrastructure to be able to generate it, such
as an intrinsics system that would allow backends to emit specific code
for particular bits of IR.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
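The side-effect problem described in the frexp commit above can be reproduced in plain C, whose standard frexp() from <math.h> has the same shape as the GLSL built-in: it returns the significand and writes the exponent through a pointer. The following is only a sketch; split_float is a hypothetical helper and is not part of Mesa.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper mirroring the shader in the commit message.
 * frexp() returns the significand AND stores the exponent through its
 * second argument, so the call is not a pure expression. */
static float split_float(float u, float *g, float *h)
{
    assert(g != NULL && h != NULL);

    int exp = 0;
    float f = (float)frexp(u, &exp);  /* must execute before 'exp' is read */

    *g = (float)exp / 256.0f;
    *h = (float)exp + 1.0f;
    return f;
}
```

If the frexp() call were moved below the two reads of exp, as in the mis-optimized shader, g and h would be computed from the stale value of exp.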
Matt Turner [Sat, 3 Aug 2013 18:34:30 +0000 (11:34 -0700)]
i965: Lower ldexp.
v2: Drop frexp lowering.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Matt Turner [Sat, 3 Aug 2013 18:02:59 +0000 (11:02 -0700)]
glsl: Add ldexp_to_arith lowering pass.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
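As a rough illustration of how ldexp() can be expressed with ordinary arithmetic, the sketch below builds the float 2^exp directly from its IEEE 754 bit pattern and multiplies by it. This is an assumption-laden example, not a description of what the ldexp_to_arith pass actually emits; ldexp_sketch is a hypothetical name, and the sketch ignores exponents outside the normal float range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Mesa's lowering pass: scale x by 2^exp.
 * Only valid while 2^exp is a normal float, i.e. -126 <= exp <= 127. */
static float ldexp_sketch(float x, int exp)
{
    assert(exp >= -126 && exp <= 127);

    /* Biased exponent in bits 23..30, sign and mantissa bits all zero. */
    uint32_t bits = (uint32_t)(exp + 127) << 23;

    float scale;
    memcpy(&scale, &bits, sizeof scale);  /* reinterpret the bits as a float */
    return x * scale;
}
```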
Matt Turner [Mon, 5 Aug 2013 22:15:37 +0000 (15:15 -0700)]
glsl: Allow vectors to be created from ir_constant().
Note the parameter name change in the int version of ir_constant, to
avoid the conflict with the loop iterator.
v2: Make analogous change to builtin_builder::imm().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Matt Turner [Thu, 22 Aug 2013 20:31:18 +0000 (13:31 -0700)]
glsl: Add support for ldexp.
v2: Drop frexp. Rebase on builtins rewrite.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Paul Berry [Mon, 2 Sep 2013 00:52:20 +0000 (17:52 -0700)]
i965: Add some missing bits to {mesa,brw,cache}_bits[].
These data structures are used for debug output, so it wasn't hurting
anything that there were missing bits. But it's good to keep things
up to date.
This patch also adds static asserts so that the {brw,cache}_bits[]
arrays are the proper size, so that we don't forget to add to them in
the future. Unfortunately there's no convenient way to assert that
mesa_bits[] is the proper size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
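The static-assert technique mentioned in the {mesa,brw,cache}_bits[] commit above can be sketched as follows. All names here (demo_cache_id, demo_cache_bits, demo_bit_name) are hypothetical stand-ins, not the i965 identifiers; the point is only that the table size is tied to the enum at compile time.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical set of state flags; the last enumerator counts them. */
enum demo_cache_id { DEMO_CACHE_VS, DEMO_CACHE_GS, DEMO_CACHE_FS, DEMO_MAX_CACHE };

struct demo_dirty_bit { unsigned bit; const char *name; };

/* Debug-output table: one entry per flag plus a terminator. */
static const struct demo_dirty_bit demo_cache_bits[] = {
    { 1u << DEMO_CACHE_VS, "DEMO_CACHE_VS" },
    { 1u << DEMO_CACHE_GS, "DEMO_CACHE_GS" },
    { 1u << DEMO_CACHE_FS, "DEMO_CACHE_FS" },
    { 0, NULL }
};

/* Compilation fails if a flag is added to the enum but not to the table. */
static_assert(ARRAY_SIZE(demo_cache_bits) == DEMO_MAX_CACHE + 1,
              "demo_cache_bits[] is out of date");

/* Look up the printable name of a single flag bit, or NULL if unknown. */
static const char *demo_bit_name(unsigned bit)
{
    for (size_t i = 0; demo_cache_bits[i].name != NULL; i++) {
        if (demo_cache_bits[i].bit == bit)
            return demo_cache_bits[i].name;
    }
    return NULL;
}
```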
Paul Berry [Mon, 12 Aug 2013 15:00:10 +0000 (08:00 -0700)]
i965/gs: Implement basic gl_PrimitiveIDIn functionality.
If the geometry shader refers to the built-in variable
gl_PrimitiveIDIn, we need to set a bit in 3DSTATE_GS to tell the
hardware to dispatch primitive ID to r1, and we need to leave room for
it when allocating registers.
Note: this feature doesn't yet work properly when software primitive
restart is in use (the primitive ID counter will incorrectly reset
with each primitive restart, since software primitive restart works by
performing multiple draw calls). I plan to address that in a future
patch series.
Fixes piglit test "spec/glsl-1.50/execution/geometry/primitive-id-in".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 27 Aug 2013 04:20:12 +0000 (21:20 -0700)]
i965/gs: New gs primitive types are supported by HW primitive restart.
When we previously implemented primitive restart, we didn't add cases
to brw_primitive_restart.c's can_cut_index_handle_prims() for the
primitive types that are introduced with geometry shaders. It turns
out that all of the new primitive types are supported by hardware
primitive restart.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sun, 28 Apr 2013 14:43:18 +0000 (07:43 -0700)]
i965/gs: Add new primitive types.
As part of its support for geometry shaders, GL 3.2 introduces four
new primitive types: GL_LINES_ADJACENCY, GL_LINE_STRIP_ADJACENCY,
GL_TRIANGLES_ADJACENCY, and GL_TRIANGLE_STRIP_ADJACENCY.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Roland Scheidegger [Fri, 13 Sep 2013 17:52:09 +0000 (19:52 +0200)]
gallivm: some bits of seamless cube filtering implementation
Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a
correct implementation for nearest filtering, and it's way better than
using repeat wrap for instance for linear filtering (though obviously this
doesn't actually do seamless filtering).
v2: fix s/t wrap not r/s...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Kenneth Graunke [Sat, 14 Sep 2013 03:01:08 +0000 (20:01 -0700)]
i965: Remove MIPLAYOUT_BELOW from Gen4-6 constant buffer surface state.
Specifying a miptree layout makes no sense for constant buffers.
This has no functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW is
just a #define for 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kristian Høgsberg [Tue, 17 Sep 2013 05:22:49 +0000 (22:22 -0700)]
egl: Also add EGL_TEXTURE_FORMAT as a valid eglQueryWaylandBufferWL attribute
Now that we have a table of accepted eglQueryWaylandBufferWL() attributes,
we should also list EGL_TEXTURE_FORMAT.
Stanislav Vorobiov [Mon, 16 Sep 2013 09:02:46 +0000 (13:02 +0400)]
egl: add EGL_WAYLAND_Y_INVERTED_WL attribute
This enables querying of wl_buffer's orientation
Kenneth Graunke [Fri, 13 Sep 2013 21:41:04 +0000 (14:41 -0700)]
i965: Use gen7_upload_constant_state for 3DSTATE_CONSTANT_PS as well.
Now we use gen7_upload_constant_state() for all three shader stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Fri, 13 Sep 2013 21:37:09 +0000 (14:37 -0700)]
i965: Set brw_stage_state::push_const_size for PS constants.
This paves the way for using gen7_upload_constant_state for PS data.
The formula is copied from gen7_wm_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Fri, 13 Sep 2013 21:34:48 +0000 (14:34 -0700)]
i965: Introduce a prog_data temporary in gen6_upload_wm_push_constants.
This saves a bit of typing and shortens a few lines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Paul Berry [Tue, 3 Sep 2013 19:37:47 +0000 (12:37 -0700)]
i965/gen6+: Support 128 varying components.
GL 3.2 requires us to support 128 varying components for geometry
shader outputs and fragment shader inputs, and 64 varying components
otherwise. But there's no hardware limitation that restricts us to 64
varying components, and core Mesa doesn't currently allow different
stages to have different maximum values, so just go ahead and enable
128 varying components for all stages. This gets us better test
coverage anyway.
Even though we are only working on GL 3.2 support for gen7 right now,
gen6 also supports 128 varying components, so go ahead and switch it
on there too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 21:38:19 +0000 (14:38 -0700)]
i965/ff_gs: Generate URB writes using a loop.
Previously we only ever did 1 URB write, since the maximum number of
varyings we support is small enough to fit in 1 URB write (when using
BRW_URB_SWIZZLE_NONE, which is what the pre-Gen7 GS always uses). But
we're about to increase the number of varying components we support
from 64 to 128.
With 128 varyings, the most URB writes we'll have to do is 2, but it's
just as easy to write a general-purpose loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 21:19:18 +0000 (14:19 -0700)]
i965/gen6: Fix assertions on VS/GS URB size.
The "{VS,GS} URB Entry Allocation Size" fields of 3DSTATE_URB allow
values in the range 0-4, but they are U8-1 fields, so the range of
possible allocation sizes is 1-5. We were erroneously prohibiting a
size of 5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
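The off-by-one in the URB size assertion above comes from the "minus one" field encoding. A minimal sketch, using a hypothetical helper name and the ranges stated in the commit message:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a "U8-1" field stores (value - 1), so a field
 * whose legal encodings are 0..4 describes allocation sizes 1..5. */
static uint32_t demo_encode_urb_entry_size(unsigned size)
{
    assert(size >= 1 && size <= 5);  /* 5 is legal: it encodes as 4 */
    return size - 1;
}
```

An assertion written against the encoded range (size <= 4) instead of the decoded range would wrongly reject a size of 5.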
Paul Berry [Tue, 3 Sep 2013 19:30:06 +0000 (12:30 -0700)]
i965/vec4: Generate URB writes using a loop.
Previously we only ever did 1 or 2 URB writes, since the maximum
number of varyings we support is small enough to fit in 2 URB writes.
But GL 3.2 requires the geometry shader to support 128 output varying
components, and this could require up to 3 URB writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
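The general shape of "generate URB writes using a loop" might look like the sketch below. The structure, the function name, and the per-message capacity passed in by the caller are all hypothetical; the real code emits hardware messages rather than filling an array.

```c
#include <assert.h>

/* Hypothetical description of one URB write message. */
struct demo_urb_write {
    int first_slot;  /* first varying slot covered by this write */
    int num_slots;   /* number of slots in this write */
    int last;        /* nonzero for the final write */
};

/* Split 'total_slots' into as many writes as needed, each carrying at
 * most 'max_per_write' slots. Returns the number of writes planned. */
static int demo_plan_urb_writes(int total_slots, int max_per_write,
                                struct demo_urb_write *out, int out_cap)
{
    assert(total_slots > 0 && max_per_write > 0);

    int n = 0;
    for (int slot = 0; slot < total_slots; slot += max_per_write) {
        assert(n < out_cap);
        int remaining = total_slots - slot;
        out[n].first_slot = slot;
        out[n].num_slots = remaining < max_per_write ? remaining : max_per_write;
        out[n].last = (slot + max_per_write >= total_slots);
        n++;
    }
    return n;
}
```

With arbitrary illustrative numbers, 34 slots at 14 slots per message yields three writes of 14, 14, and 6 slots; the loop needs no special cases for one, two, or three writes.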
Paul Berry [Tue, 3 Sep 2013 19:15:53 +0000 (12:15 -0700)]
i965/fs: When >64 input components, order them to match prev pipeline stage.
Since the SF/SBE stage is only capable of performing arbitrary
reorderings of 16 varying slots, we can't arrange the fragment shader
inputs in an arbitrary order if there are more than 16 input varying
slots in use. We need to make sure that slots 16-31 match the
corresponding outputs of the previous pipeline stage.
The easiest way to accomplish this is to just make all varying slots
match up with the previous pipeline stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 18:55:17 +0000 (11:55 -0700)]
i965/fs: Simplify computation of key.input_slots_valid during precompile.
The for loop was rather silly. In addition to checking brw->gen < 6
on each loop iteration, it took pains to exclude bits from
fp->Base.InputsRead that don't correspond to fragment shader inputs.
But those bits would never have been set in the first place, since the
only bits that are ever set in fp->Base.InputsRead are fragment shader
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 2 Sep 2013 21:02:22 +0000 (14:02 -0700)]
i965/gs: Stop storing an input VUE map in the GS program key.
Now that the vertex shader output VUE map is determined solely by a
64-bit bitfield, we don't have to store it in its entirety in the
geometry shader program key; instead, we can just store the bitfield,
and let the geometry shader infer the VUE map at compile time.
This dramatically reduces the size of the geometry shader program key,
which we want to keep small since it gets recomputed whenever the
active program changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 2 Sep 2013 20:46:25 +0000 (13:46 -0700)]
i965/gen6+: Remove VUE map dependency on userclip_active.
Previously, on Gen6+, we laid out the vertex (or geometry) shader VUE
map differently depending whether user clipping was active. If it was
active, we put the clip distances in slots 2 and 3 (where the clipper
expects them); if it was inactive, we assigned them in the order of
the gl_varying_slot enum.
This made for unnecessary recompiles, since turning clipping on/off
for a shader that used gl_ClipDistance might rearrange the varyings.
It also required extra bookkeeping, since it required the user
clipping flag to be provided to brw_compute_vue_map() as a parameter.
With this patch, we always put clip distances in slots 2 and 3 if
they are written to. do_vs_prog() and do_gs_prog() are responsible
for ensuring that clip distances are written to when user clipping is
enabled (as do_vs_prog() previously did for gen4-5).
This makes the only input to brw_compute_vue_map() a bitfield of which
varyings the shader writes to, a fact that we'll take advantage of in
forthcoming patches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
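A map that depends only on a bitfield of written varyings, as the commit above describes, could be sketched like this. Everything here is hypothetical: the enum is a tiny stand-in for gl_varying_slot, and the assumption that slot 0 holds a header and slot 1 holds the position is made only so that clip distances land in slots 2 and 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for the varying slot enum. */
enum demo_varying {
    DEMO_VARYING_POS,
    DEMO_VARYING_COL0,
    DEMO_VARYING_TEX0,
    DEMO_VARYING_CLIP_DIST0,
    DEMO_VARYING_CLIP_DIST1,
    DEMO_VARYING_COUNT
};

#define DEMO_BIT(v) (UINT64_C(1) << (v))

struct demo_vue_map {
    int varying_to_slot[DEMO_VARYING_COUNT];  /* -1 if not present */
    int num_slots;
};

/* Give 'v' the next free slot unless it already has one. */
static void demo_assign(struct demo_vue_map *map, enum demo_varying v)
{
    if (map->varying_to_slot[v] == -1)
        map->varying_to_slot[v] = map->num_slots++;
}

/* The only input is the bitfield of varyings the shader writes. */
static void demo_compute_vue_map(struct demo_vue_map *map, uint64_t slots_valid)
{
    assert(map != NULL);
    for (int v = 0; v < DEMO_VARYING_COUNT; v++)
        map->varying_to_slot[v] = -1;

    map->num_slots = 1;                  /* slot 0: header (assumed) */
    demo_assign(map, DEMO_VARYING_POS);  /* slot 1: position (assumed) */

    /* Clip distances come next whenever they are written, whether or
     * not user clipping is currently enabled. */
    if (slots_valid & DEMO_BIT(DEMO_VARYING_CLIP_DIST0))
        demo_assign(map, DEMO_VARYING_CLIP_DIST0);
    if (slots_valid & DEMO_BIT(DEMO_VARYING_CLIP_DIST1))
        demo_assign(map, DEMO_VARYING_CLIP_DIST1);

    /* Everything else follows in enum order. */
    for (int v = 0; v < DEMO_VARYING_COUNT; v++) {
        if (slots_valid & DEMO_BIT(v))
            demo_assign(map, (enum demo_varying)v);
    }
}
```

Because no clipping flag is consulted, toggling user clipping cannot change the layout, and any stage that knows the bitfield can rebuild the same map.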
Paul Berry [Tue, 3 Sep 2013 05:18:27 +0000 (22:18 -0700)]
i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
Previously, if a fragment shader accessed gl_FragCoord or
gl_FrontFacing, we would assign them their own slots in the fragment
shader input attribute array, using up space that could be made
available to real varyings. This was not strictly necessary (since
these values are not true varyings, and are instead computed from
other data available in the FS payload). But we had to do it anyway
because the SF/SBE setup code assumed that every 1 bit in the
gl_program::InputsRead bitfield corresponded to a genuine varying
variable.
Now that the SF/SBE code consults brw_wm_prog_data and only sets up
the attributes that the fragment shader actually needs, we don't have
to do this anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 04:59:04 +0000 (21:59 -0700)]
i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
Previously, the SF/SBE setup code delivered varying inputs to the FS
in the order in which they appear in the gl_program::InputsRead
bitfield, since that's what the FS expects.
When we add support for more than 64 varying components, this will no
longer always be the case, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots. So,
when there are more than 16 vec4's worth of varying inputs, the FS
will have to adjust the order of its input varyings in order to partially
match the order of outputs from the geometry or vertex shader.
To allow extra flexibility in the ordering of FS varyings, this patch
causes the SF/SBE to deliver varying inputs to the FS in exactly the
order that the FS requests, by consulting brw_wm_prog_data::urb_setup
and brw_wm_prog_data::num_varying_inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 01:09:08 +0000 (18:09 -0700)]
i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 2 Sep 2013 15:43:02 +0000 (08:43 -0700)]
i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
We always program the SF unit to start reading the vertex URB entry at
offset 1. In upcoming patches, we'll be adding FS code that relies on
this. So consistently use the constant BRW_SF_URB_ENTRY_READ_OFFSET
rather than hardcoding a 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 18:30:19 +0000 (11:30 -0700)]
i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
Previously, we assumed that the number of varying inputs consumed by
the fragment shader was equal to the number of bits set in
gl_program::InputsRead. However, we'll soon be making two changes
that will cause that not to be true:
- We'll stop wasting varying input space for gl_FragCoord and
gl_FrontFacing, which aren't varyings.
- For fragment shaders that have more than 16 varying inputs, we'll
adjust the layout of the inputs to account for the fact that the
SF/SBE pipeline stage can't reorder inputs beyond the first 16; if
there are GS outputs that the FS doesn't use (or vice versa) this
may cause the number of FS varying inputs to change.
So, instead of trying to guess the number of FS inputs from
gl_program::InputsRead, simply read it from
brw_wm_prog_data::num_varying_inputs, which is guaranteed to be correct
since it's populated by fs_visitor::calculate_urb_setup().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Tue, 3 Sep 2013 00:35:32 +0000 (17:35 -0700)]
i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
On gen4-5, the FS stage reads varying inputs from URB entries that
were output by the SF thread, where each register stores the
interpolation setup for two components of a vec4; therefore the FS
urb_read_length is twice the number of FS input varyings. On gen6+,
varying inputs are directly deposited in the FS payload by the SF/SBE
fixed function logic, so urb_read_length is irrelevant.
However, in future patches, it will be nice to be able to consult
brw_wm_prog_data to determine how many varying inputs the FS expects
(rather than inferring it from gl_program::InputsRead). So instead of
storing urb_read_length, we simply store num_varying_inputs in
brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the URB
read length.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
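The gen4-5 relationship stated in the commit above is a one-line computation. A sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper: on gen4-5 each register holds the interpolation
 * setup for two components of a vec4, so the URB read length is twice
 * the number of FS input varyings. On gen6+ the value is not used, so
 * the helper refuses other generations. */
static unsigned demo_fs_urb_read_length(int gen, unsigned num_varying_inputs)
{
    assert(gen == 4 || gen == 5);
    return num_varying_inputs * 2;
}
```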
Paul Berry [Tue, 3 Sep 2013 00:24:19 +0000 (17:24 -0700)]
i965/fs: Expose "urb_setup" as part of brw_wm_prog_data.
At the moment, for Gen6+, the FS assumes that all varying inputs are
delivered to it in the order in which they appear in the
gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
that they are delivered in this order.
When we add support for more than 64 varying components, this will no
longer always be possible, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.
To allow extra flexibility in the ordering of FS varyings, this patch
causes the FS to advertise exactly what ordering it expects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chia-I Wu [Fri, 13 Sep 2013 03:34:19 +0000 (11:34 +0800)]
ilo: make ilo_bind_sampler_states return void
So that it can be hooked up pipe_context::bind_sampler_states that is
currently living on another branch.
Kenneth Graunke [Mon, 16 Sep 2013 15:25:44 +0000 (08:25 -0700)]
glsl/tests: Update .gitignore for new unit test.
I rarely run 'git status', so I failed to notice this was missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Thu, 12 Sep 2013 06:57:26 +0000 (23:57 -0700)]
glsl/tests: Add a test for properties of sampler types.
For each sampler type, this tests that:
- The base type is GLSL_TYPE_SAMPLER.
- The dimensionality is set correctly.
- The returned data type is correct.
- The sampler_array and sampler_shadow flags are set correctly.
- sampler_coordinate_components() returns the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Dave Airlie [Tue, 10 Sep 2013 04:46:23 +0000 (14:46 +1000)]
st/mesa: don't dereference stObj->pt if NULL
It seems a user app can get us into this state. I trigger the failure
by running fbo-maxsize inside virgl: it fails to create the backing
storage for the texture object, but then segfaults here when it
should fail the completeness test.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 10 Sep 2013 02:02:30 +0000 (12:02 +1000)]
nouveau: fix regression since float comparison instructions (v2)
Fix the return type and allow src and dst types for comparison
to be separate. This at least fixes the two test cases I've written.
v2: drop the u32->s32 change
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rico Schüller [Sat, 14 Sep 2013 18:27:07 +0000 (20:27 +0200)]
vdpau/decode: Check max width and max height.
Reviewed-by: Christian König <christian.koenig@amd.com>
Rob Clark [Wed, 11 Sep 2013 14:08:08 +0000 (10:08 -0400)]
freedreno: PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 11 Sep 2013 14:06:29 +0000 (10:06 -0400)]
freedreno/a3xx: fix VFD_INDEX_MAX overflow
max_index may be 0xffffffff. The hardware does not need 1 + max_index
(although it does not hurt unless max_index wraps around to zero).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
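The wraparound in the VFD_INDEX_MAX commit above is ordinary unsigned 32-bit arithmetic. In this sketch both function names are hypothetical; they only contrast the overflowing computation with passing the value through unchanged.

```c
#include <stdint.h>

/* Adding 1 wraps to zero when max_index is 0xffffffff, which would
 * tell the hardware that no index is valid. */
static uint32_t demo_index_max_naive(uint32_t max_index)
{
    return max_index + 1;  /* unsigned overflow wraps modulo 2^32 */
}

/* Passing max_index through unchanged cannot overflow. */
static uint32_t demo_index_max_fixed(uint32_t max_index)
{
    return max_index;
}
```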
Rob Clark [Tue, 10 Sep 2013 15:35:58 +0000 (11:35 -0400)]
freedreno: add debug option to disable GMEM bypass
Useful for debugging.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 9 Sep 2013 15:31:20 +0000 (11:31 -0400)]
freedreno/a3xx: handle front_ccw
Used by supertuxkart.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 8 Sep 2013 21:00:40 +0000 (17:00 -0400)]
freedreno/a3xx: stencil fixes
For mem->gmem we don't sample depth/stencil as its native type. So we
need to set up the swizzle state for the sampler based on the format used
for sampling.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 8 Sep 2013 17:49:54 +0000 (13:49 -0400)]
freedreno/a3xx: alpha-test
Needed by some games, like etuxracer and supertuxkart which use alpha
test rather than blending, to handle texture transparency.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 7 Sep 2013 23:57:04 +0000 (19:57 -0400)]
freedreno/a3xx/compiler: implement SUB
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 6 Sep 2013 22:21:25 +0000 (18:21 -0400)]
freedreno/a3xx: use INDIRECT state load for shaders
With a debug option to force DIRECT (mainly to make it easier for
capturing cmdstream dumps). Using INDIRECT for large shaders at least
makes a noticeable reduction in CPU load, which helps for CPU limited
games.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 6 Sep 2013 17:20:46 +0000 (13:20 -0400)]
freedreno: avoid stalling at ringbuffer wraparound
Because of how the tiling works, we can't really flush at arbitrary
points very easily. So wraparound is handled by resetting to top of
ringbuffer. Previously this would stall until current rendering is
complete. Instead cycle through multiple ringbuffers to avoid a stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
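The idea of rotating through several ringbuffers on wraparound can be sketched as a simple round-robin index. The ring count, structure, and function name are hypothetical and do not reflect freedreno's actual data structures.

```c
#include <assert.h>

#define DEMO_NUM_RINGS 4  /* hypothetical number of ringbuffers */

struct demo_ctx {
    int cur_ring;  /* index of the ring currently being filled */
    int switches;  /* how many times we rotated instead of stalling */
};

/* On wraparound, move on to the next ring instead of waiting for the
 * GPU to finish with the current one. Returns the new ring index. */
static int demo_next_ring(struct demo_ctx *ctx)
{
    assert(ctx->cur_ring >= 0 && ctx->cur_ring < DEMO_NUM_RINGS);
    ctx->cur_ring = (ctx->cur_ring + 1) % DEMO_NUM_RINGS;
    ctx->switches++;
    return ctx->cur_ring;
}
```

A real driver would presumably still have to wait if the ring it rotates onto has not been consumed yet; the sketch leaves that out.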
Rob Clark [Fri, 6 Sep 2013 16:47:18 +0000 (12:47 -0400)]
freedreno: emit markers to scratch registers
Emit markers by writing to scratch registers in order to "triangulate"
gpu lockup position from post-mortem register dump. By comparing
register values in post-mortem dump to command-stream, it is possible to
narrow down which DRAW_INDX caused the lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 6 Sep 2013 14:23:14 +0000 (10:23 -0400)]
freedreno: split out WFI helper
Mostly just to give an easy debug/instrumentation point.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 2 Sep 2013 11:32:22 +0000 (07:32 -0400)]
freedreno: fd_draw helper
Have a single helper that all draws come through, mainly for a
convenient debug and instrumentation point.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 4 Sep 2013 02:00:47 +0000 (22:00 -0400)]
freedreno/a3xx: fix gpu lockup in some piglit tests
The varying-out config comes from the inputs of the frag shader (so that
we aren't exporting unneeded varyings). The varyings-count should come
from the frag shader as well, to avoid a discrepancy in configuration
and resulting gpu lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 1 Sep 2013 15:35:56 +0000 (11:35 -0400)]
freedreno/a3xx/compiler: add LIT
Needed by glxgears and etuxracer ;-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 31 Aug 2013 13:14:27 +0000 (09:14 -0400)]
freedreno: multi-slice resources (cubemap, mipmap, etc)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Paul Berry [Thu, 12 Sep 2013 16:11:37 +0000 (09:11 -0700)]
glsl/builtins: Fix {texture1D,texture2D,shadow1D}ArrayLod availability.
These functions are defined in EXT_texture_array, which makes no
mention of what shader types they should be allowed in. At the time
EXT_texture_array was introduced, functions ending in "Lod" were
available only in vertex shaders, however this restriction was lifted
in later spec versions and extensions.
We already have the function lod_exists_in_stage() for figuring out
whether functions ending in "Lod" should be available, so just re-use
that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Mon, 2 Sep 2013 00:31:54 +0000 (17:31 -0700)]
i965: Use brw_stage_state for WM data as well.
This gets the VS, GS, and PS all using the same data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Mon, 2 Sep 2013 00:18:22 +0000 (17:18 -0700)]
i965: Increase the size of brw_stage_state::surf_offset.
Since BRW_MAX_WM_SURFACES is greater than BRW_MAX_VEC4_SURFACES, the
existing array isn't large enough to be used by the WM. Increasing it
will make it possible to share them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Mon, 2 Sep 2013 00:14:25 +0000 (17:14 -0700)]
i965: Add comments to the new brw_stage_state structure's fields.
These are largely based on the similar fields in brw->wm.
v2: Add a better comment than "Scratch buffer".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Ian Romanick [Thu, 12 Sep 2013 16:40:00 +0000 (11:40 -0500)]
mesa: Rename MESA_shader_integer_mix to EXT_shader_integer_mix
Everyone at the Khronos meeting was as surprised that GLSL didn't
already support this as we were. Several vendors said they'd ship it,
but there didn't seem to be enough interest to put in the effort to make
it ARB or KHR.
v2: Fix a couple typos and rename the spec file to
EXT_shader_integer_mix.spec. Suggested by Roland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Fri, 6 Sep 2013 19:59:29 +0000 (21:59 +0200)]
radeonsi: fix and enable transform feedback for CIK
The CP_STRMOUT_CNTL register was moved again.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Thu, 5 Sep 2013 13:39:57 +0000 (15:39 +0200)]
radeonsi: fix gl_InstanceID with non-zero start_instance
start_instance doesn't affect gl_InstanceID.
There's no piglit test, but it's kinda obvious the code was wrong.
Reviewed-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Thu, 5 Sep 2013 13:38:42 +0000 (15:38 +0200)]
gallium: comment that INSTANCEID doesn't include start_instance
Reviewed-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 18 Aug 2013 01:05:19 +0000 (03:05 +0200)]
radeonsi: enable streamout AKA transform feedback for SI
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Sep 2013 21:59:06 +0000 (23:59 +0200)]
radeonsi: implement streamout shader support
The shader is responsible for writing to streamout buffers using
the TBUFFER_STORE_FORMAT_* instructions.
The locations of some input SGPRs and VGPRs are assigned dynamically, because
the input SGPRs controlling streamout are not declared if they are not needed,
decreasing the indices of all following inputs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Mon, 26 Aug 2013 16:17:09 +0000 (18:17 +0200)]
radeonsi: implement glDrawTransformFeedback functionality
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 21 Aug 2013 12:27:17 +0000 (14:27 +0200)]
radeonsi: fix streamout queries
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Mon, 2 Sep 2013 10:57:46 +0000 (12:57 +0200)]
radeonsi: implement streamout flush properly
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 18 Aug 2013 00:34:23 +0000 (02:34 +0200)]
radeonsi: bind streamout buffers to VGT and the vertex shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 18 Aug 2013 01:05:34 +0000 (03:05 +0200)]
radeonsi: handle rasterizer_discard and set GS_OUT_PRIM_TYPE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Fri, 30 Aug 2013 22:13:43 +0000 (00:13 +0200)]
radeonsi: initialize the first CS like any other
So that the "init" state is always emitted first and not later in draw_vbo.
This fixes streamout where the "init" state, which disables streamout,
was emitted in draw_vbo after streamout was enabled.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 13 Aug 2013 23:52:38 +0000 (01:52 +0200)]
radeonsi: integrate shared streamout state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Sep 2013 21:00:28 +0000 (23:00 +0200)]
radeon: don't emit streamout state if there are no streamout buffers
This could happen if set_stream_output_targets is called twice
in a row without a draw call in between.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Aug 2013 00:32:22 +0000 (02:32 +0200)]
radeon: don't emit VGT_STRMOUT_BUFFER_BASE on SI
The register doesn't exist on SI.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Kenneth Graunke [Fri, 6 Sep 2013 22:41:19 +0000 (15:41 -0700)]
mesa: Disallow relinking if a program is used by an active XFB object.
Paused transform feedback objects may refer to a program other than the
current program. If any active objects refer to a program, LinkProgram
must reject the request to relink.
The code to detect this is ugly since _mesa_HashWalk is awkward to use,
but unfortunately we can't use hash_table_foreach since there's no way
to get at the underlying struct hash_table (and even then, we'd need to
handle locking somehow).
Fixes the last subcase of Piglit's new ARB_transform_feedback2
api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
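The callback-style walk that the relinking commit above calls awkward might look roughly like the sketch below. A plain array stands in for the hash table, and every name is hypothetical; it only shows why a walk with a user_data pointer needs a small query struct to carry the answer back out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_program { int name; };

struct demo_xfb_object {
    bool active;                         /* stays true while paused */
    const struct demo_program *program;  /* program recorded at begin time */
};

typedef void (*demo_walk_fn)(struct demo_xfb_object *obj, void *user_data);

/* Stand-in for a callback-based table walk. */
static void demo_walk(struct demo_xfb_object *objs, size_t count,
                      demo_walk_fn fn, void *user_data)
{
    for (size_t i = 0; i < count; i++)
        fn(&objs[i], user_data);
}

/* Carries the question in and the answer out of the callback. */
struct demo_usage_query {
    const struct demo_program *program;
    bool found;
};

static void demo_check_usage(struct demo_xfb_object *obj, void *user_data)
{
    struct demo_usage_query *q = user_data;
    if (obj->active && obj->program == q->program)
        q->found = true;
}

/* Relinking is allowed only if no active object refers to the program. */
static bool demo_can_relink(struct demo_xfb_object *objs, size_t count,
                            const struct demo_program *program)
{
    assert(program != NULL);
    struct demo_usage_query q = { program, false };
    demo_walk(objs, count, demo_check_usage, &q);
    return !q.found;
}
```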
Kenneth Graunke [Fri, 6 Sep 2013 21:51:26 +0000 (14:51 -0700)]
mesa: Reject ResumeTransformFeedback if the wrong program is bound.
This is actually a pretty important error condition: otherwise, you
could set up transform feedback with one program, and resume it with
a program that generates a completely different set of outputs.
Fixes a subcase of Piglit's new ARB_transform_feedback2 api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kenneth Graunke [Fri, 6 Sep 2013 21:47:19 +0000 (14:47 -0700)]
mesa: Track the vertex program active at BeginTransformFeedback() time.
The next few patches will use this for API error checking.
All of the drivers appear to CALLOC_STRUCT transform feedback objects,
so this should be properly NULL initialized on creation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kenneth Graunke [Fri, 6 Sep 2013 19:38:12 +0000 (12:38 -0700)]
mesa: Disallow TransformFeedbackVaryings when active.
Fixes a subcase of Piglit's new ARB_transform_feedback2 api-errors test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Christian König [Mon, 9 Sep 2013 08:49:55 +0000 (10:49 +0200)]
radeon/uvd: move more logic into the common files
Move the code back into the common UVD files since we now
have base structures for R600 and radeonsi.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Sat, 7 Sep 2013 17:40:34 +0000 (11:40 -0600)]
radeon/uvd: use more sane defaults for bitstream buffer size
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Andreas Boll [Wed, 11 Sep 2013 12:27:08 +0000 (14:27 +0200)]
os: First check for __GLIBC__ and then for PIPE_OS_BSD
Fixes FTBFS on kfreebsd-*
Debian GNU/kFreeBSD doesn't provide getprogname() since it uses stdlib.h
from glibc. Instead it provides program_invocation_short_name from glibc.
You can find the same order in src/mesa/drivers/dri/common/xmlconfig.c
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
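The ordering issue in the commit above can be sketched as follows. This is an illustration under assumptions, not the actual Mesa code: EXAMPLE_OS_BSD is a hypothetical stand-in for PIPE_OS_BSD, and process_name() is a made-up helper. The point is that a GNU/kFreeBSD system matches both the BSD kernel test and the glibc test, so the glibc test has to come first.

```c
#include <stdlib.h> /* getprogname() on the BSDs; also defines __GLIBC__ on glibc */

/* Hypothetical stand-in for PIPE_OS_BSD: true for BSD kernels, which
 * includes Debian GNU/kFreeBSD even though its libc is glibc. */
#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || \
    defined(__NetBSD__) || defined(__OpenBSD__)
#define EXAMPLE_OS_BSD 1
#endif

#if defined(__GLIBC__)
/* glibc exports this variable; declaring it here avoids needing _GNU_SOURCE. */
extern char *program_invocation_short_name;
#endif

static const char *process_name(void)
{
#if defined(__GLIBC__)
   /* Checked first: with glibc there is no getprogname(), whatever the kernel. */
   return program_invocation_short_name;
#elif defined(EXAMPLE_OS_BSD) || defined(__APPLE__)
   return getprogname();
#else
   return "unknown"; /* fallback for other C libraries */
#endif
}
```

With the two tests in the opposite order, the BSD branch would be taken on kfreebsd-* and the build would fail on the missing getprogname().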
José Fonseca [Wed, 11 Sep 2013 11:04:29 +0000 (12:04 +0100)]
llvmpipe: Remove the special path for TGSI_OPCODE_EXP.
It was wrong for EXP.y, as we clamped the source before computing the
fractional part, and this opcode should be rarely used, so it's not
worth the hassle.
José Fonseca [Wed, 4 Sep 2013 17:10:35 +0000 (18:10 +0100)]
trace: Several enhancements to dump_state.py
- Handle more calls
- Handle more state
- Try to normalize the output a bit, to eliminate spurious differences
José Fonseca [Wed, 4 Sep 2013 13:54:31 +0000 (14:54 +0100)]
trace: Support bigger TGSI shaders.
Trivial.
Kenneth Graunke [Wed, 11 Sep 2013 18:20:36 +0000 (11:20 -0700)]
glsl: Use sampler_coordinate_components instead of passing it by hand.
We used to pass the number of components actually used for the
coordinate (rather than padding, shadow comparators, and projectors) by
hand, specifying it on every _texture() call.
The new helper function can just compute this, eliminating a lot of
potential mistakes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Wed, 11 Sep 2013 18:14:14 +0000 (11:14 -0700)]
glsl: Add a new glsl_type::sampler_coordinate_components() function.
This computes the number of components necessary to address a sampler
based on its dimensionality. It will be useful for texturing built-ins.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
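A rough sketch of what such a helper computes, assuming the usual GLSL addressing rules (one coordinate per spatial dimension, a direction vector for cube maps, plus one layer index for array samplers). The enum and the function name coordinate_components() are hypothetical; the real glsl_type method may handle more cases than shown here.

```c
#include <stdbool.h>

/* Hypothetical dimensionality enum for illustration only. */
enum sampler_dim { DIM_1D, DIM_2D, DIM_RECT, DIM_3D, DIM_CUBE, DIM_BUFFER };

/* Number of coordinate components needed to address a sampler. */
static int coordinate_components(enum sampler_dim dim, bool is_array)
{
   int n = 0;

   switch (dim) {
   case DIM_1D:
   case DIM_BUFFER:
      n = 1;
      break;
   case DIM_2D:
   case DIM_RECT:
      n = 2;
      break;
   case DIM_3D:
   case DIM_CUBE: /* cube maps are addressed by a 3-component direction */
      n = 3;
      break;
   }

   /* Array samplers take one extra component for the layer index. */
   if (is_array)
      n += 1;

   return n;
}
```

Deriving the count from the sampler type in one place is what removes the need to pass it by hand at every call site.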
Johannes Obermayr [Tue, 20 Aug 2013 18:14:00 +0000 (20:14 +0200)]
Move nv30, nv50 and nvc0 to nouveau.
It is planned to ship openSUSE 13.1 with -shared libs.
nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau
related targets.
This change makes it possible to easily build one shared libnouveau.so which is
then LIBADDed.
Also dlopen will be faster for one library instead of three and build time on
-jX will be reduced.
Whitespace fixes were requested by 'git am'.
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Paul Berry [Sun, 21 Apr 2013 15:51:33 +0000 (08:51 -0700)]
i965/gs: implement EndPrimitive() functionality in the visitor.
According to GLSL, the shader may call EndPrimitive() at any point
during its execution, causing the line or triangle strip currently
being output to be terminated and a new strip to be begun.
This is implemented in gen7 hardware by using one control data bit per
vertex, to indicate whether EndPrimitive() was called after that
vertex was emitted.
In order to make this work without sacrificing too much efficiency, we
accumulate 32 control data bits at a time in a GRF. When we have
accumulated 32 bits (or when the shader terminates), we output them to
the appropriate DWORD in the control data header and reset the
accumulator to 0.
We have to take special care to make sure that EndPrimitive() calls
that occur prior to the first vertex have no effect.
Since geometry shaders that output a large number of vertices are
likely to be rare, an optimization kicks in if max_vertices <= 32. In
this case, we know that we can wait until the end of shader execution
before any control data bits need to be output.
I've tried to write the code in such a way that in the future, we can
easily adapt it to output stream ID bits (which are two bits/vertex
instead of one).
Fixes piglit tests "spec/glsl-1.50/glsl-1.50-geometry-end-primitive *".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
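The accumulation scheme from the EndPrimitive() commit above can be modelled on the CPU roughly as below. This is a simplified simulation, not the code the visitor generates: struct gs_sim, the helper names, and the MAX_HEADER_DWORDS capacity are all hypothetical, and callers are assumed to emit at most 32 * MAX_HEADER_DWORDS vertices.

```c
#include <stdint.h>

#define MAX_HEADER_DWORDS 8 /* hypothetical capacity: up to 256 vertices */

struct gs_sim {
   uint32_t header[MAX_HEADER_DWORDS]; /* control data header, 1 bit/vertex */
   uint32_t accum;                     /* bits for the current group of 32 */
   unsigned vertex_count;              /* vertices emitted so far */
};

static void emit_vertex(struct gs_sim *gs)
{
   /* The flush is deferred until the next group starts, so an EndPrimitive()
    * issued after the 32nd vertex of a group still lands in the accumulator. */
   if (gs->vertex_count != 0 && gs->vertex_count % 32 == 0) {
      gs->header[gs->vertex_count / 32 - 1] = gs->accum;
      gs->accum = 0;
   }
   gs->vertex_count++;
}

static void end_primitive(struct gs_sim *gs)
{
   /* Calls made before the first vertex must have no effect. */
   if (gs->vertex_count == 0)
      return;
   /* Set the bit belonging to the most recently emitted vertex. */
   gs->accum |= UINT32_C(1) << ((gs->vertex_count - 1) % 32);
}

/* Shader termination: write out whatever is still accumulated. */
static void finish(struct gs_sim *gs)
{
   if (gs->vertex_count != 0)
      gs->header[(gs->vertex_count - 1) / 32] = gs->accum;
}
```

When max_vertices <= 32, emit_vertex() never reaches its flush branch in this model, which corresponds to the optimization of writing the bits only once at the end of the shader.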
Paul Berry [Sun, 21 Apr 2013 15:51:33 +0000 (08:51 -0700)]
i965/vec4: Add the ability to emit opcodes with just a dst register.
This is needed for GS_OPCODE_PREPARE_CHANNEL_MASKS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sun, 21 Apr 2013 15:51:33 +0000 (08:51 -0700)]
i965/gs: Add opcodes needed for EndPrimitive().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 12 Aug 2013 03:29:34 +0000 (20:29 -0700)]
i965/gen7: Add the ability to send URB_WRITE_OWORD messages.
Previously, brw_urb_WRITE() would always generate a URB_WRITE_HWORD
message, since we always wanted to write data to the URB in pairs of
varying slots or larger (an HWORD is 32 bytes, which is 2 varying slots).
In order to support geometry shader EndPrimitive functionality, we'll
need the ability to write to just a single OWORD (16 byte) slot, since
we'll only be outputting 32 of the control data bits at a time. So
this patch adds a flag that will cause brw_urb_WRITE to generate a
URB_WRITE_OWORD message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sun, 11 Aug 2013 04:57:59 +0000 (21:57 -0700)]
i965/gen7: Allow URB_WRITE channel masks to be used.
Previously, brw_urb_WRITE() would unconditionally override the channel
masks in the URB_WRITE message to 0xff (indicating that all channels
should be written to the URB).
In order to support geometry shader EndPrimitive functionality, we'll
need the ability to set the channel masks programmatically, so that we
can output just 32 of the control data bits at a time. So this patch
adds a flag that will prevent brw_urb_WRITE() from overriding them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
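The behaviour change in the channel mask commit above amounts to something like the following. The flag and function names are hypothetical and only illustrate the idea of an opt-in flag; they are not the identifiers used by brw_urb_WRITE().

```c
#include <stdint.h>

/* Hypothetical flag names for illustration only. */
enum urb_write_flags {
   URB_WRITE_NO_FLAGS = 0,
   URB_WRITE_USE_CHANNEL_MASKS = 1 << 0
};

/* Channel mask that ends up in the message for a given caller request. */
static uint8_t effective_channel_mask(uint8_t requested, unsigned flags)
{
   /* New behaviour: the caller's masks are left alone when asked. */
   if (flags & URB_WRITE_USE_CHANNEL_MASKS)
      return requested;

   /* Old behaviour, still the default: write all channels. */
   return 0xff;
}
```

Keeping the override as the default means existing callers that never set the masks keep writing every channel.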
Paul Berry [Mon, 19 Aug 2013 04:18:19 +0000 (21:18 -0700)]
i965/gs: Set control data header size/format appropriately for EndPrimitive().
The gen7 geometry shader uses a "control data header" at the beginning
of the output URB entry to store either
(a) flag bits (1 bit/vertex) indicating whether EndPrimitive() was
called after each vertex, or
(b) stream ID bits (2 bits/vertex) indicating which stream each vertex
should be sent to (when multiple transform feedback streams are in
use).
Fortunately, OpenGL only requires separate streams to be supported
when the output type is points, and EndPrimitive() only has an effect
when the output type is line_strip or triangle_strip, so it's not a
problem that these two uses of the control data header are mutually
exclusive.
This patch modifies do_vec4_gs_prog() to determine the correct
hardware settings for configuring the control data header, and
modifies upload_gs_state() to propagate these settings to the
hardware.
In addition, it modifies do_vec4_gs_prog() to ensure that the output
URB entry is large enough to contain both the output vertices *and*
the control data header.
Finally, it modifies vec4_gs_visitor so that it accounts for the size
of the control data header when computing the offset within the URB
where output vertex data should be stored.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Fixed incorrect handling of IVB/HSW differences.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
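A sketch of the sizing logic described in the control data header commit above. The bits-per-vertex values (1 for EndPrimitive() flags, 2 for stream IDs) come from the commit message. Everything else is an assumption made for illustration: the names are hypothetical, and the header is assumed to be rounded up to whole 32-byte HWORDs.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum gs_output_type { OUT_POINTS, OUT_LINE_STRIP, OUT_TRIANGLE_STRIP };

/* The two uses of the header are mutually exclusive: stream IDs only
 * matter for points, EndPrimitive() only for strips. */
static unsigned control_bits_per_vertex(enum gs_output_type out,
                                        bool uses_end_primitive,
                                        bool uses_streams)
{
   if (out == OUT_POINTS)
      return uses_streams ? 2 : 0;
   return uses_end_primitive ? 1 : 0;
}

/* ASSUMPTION: header size is rounded up to 32-byte (256-bit) units. */
static unsigned header_size_bytes(unsigned max_vertices,
                                  unsigned bits_per_vertex)
{
   unsigned bits = max_vertices * bits_per_vertex;
   return (bits + 255) / 256 * 32;
}

/* The URB entry must hold the header *and* the output vertices; the
 * vertex data starts right after the header. */
static unsigned urb_entry_size_bytes(unsigned max_vertices,
                                     unsigned bits_per_vertex,
                                     unsigned vertex_size_bytes)
{
   return header_size_bytes(max_vertices, bits_per_vertex) +
          max_vertices * vertex_size_bytes;
}
```

In this model the offset of the first output vertex is simply header_size_bytes(), which is the adjustment the visitor has to account for.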
Paul Berry [Mon, 19 Aug 2013 03:59:37 +0000 (20:59 -0700)]
glsl: During linking, record whether a GS uses EndPrimitive().
This information will be useful in the i965 back end, since we can
save some compilation effort if we know from the outset that the
shader never calls EndPrimitive().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Wed, 27 Mar 2013 20:21:36 +0000 (13:21 -0700)]
i965/gs: Add a state atom to set up geometry shader state.
v2: Do not attempt to share the code that uploads
3DSTATE_BINDING_TABLE_POINTERS_GS, 3DSTATE_SAMPLER_STATE_POINTERS_GS,
or 3DSTATE_GS with VS.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v3: Add _NEW_TRANSFORM to gen7_gs_state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 9 Sep 2013 14:28:17 +0000 (07:28 -0700)]
i965/gen7: Extract a function for setting up a shader stage's constants.
This will allow us to reuse some code when setting up the geometry
shader stage.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Torsten Duwe [Tue, 10 Sep 2013 21:36:48 +0000 (23:36 +0200)]
wayland-egl.pc requires wayland-client.pc.
Mesa provides the wayland-egl libs and the pkgconfig file, but the headers
originate from the wayland package. Ensure everything matches, by requiring
application builds to look at the wayland headers as well.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Johannes Obermayr [Tue, 10 Sep 2013 21:36:47 +0000 (23:36 +0200)]
st/gbm: Add $(WAYLAND_CFLAGS) for HAVE_EGL_PLATFORM_WAYLAND.
Maarten Lankhorst [Mon, 9 Sep 2013 11:02:08 +0000 (13:02 +0200)]
st/dri: do not create a new context for msaa copy
Commit b77316ad7594f ("st/dri: always copy new DRI front and back buffers
to corresponding MSAA buffers") introduced creating a pipe_context for
every call to validate, which is not required because the callers have a
context anyway.
The only exception is egl_g3d_create_pbuffer_from_client_buffer; can someone
test whether it still works with NULL passed as the context for validate?
From examining the code I believe it does, but I didn't thoroughly test it.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>