git.libre-soc.org Git - mesa.git/log

nv50: implement new float comparison instructions

untested.

Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>

ilo: implement new float comparison instructions

untested.

Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

gallivm: already pass coords in the right place in the sampler interface

This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.

v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: change coordinate handling throughout functions

Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use fixed location for the shadow reference value I
want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)

v2: fix up soa split path (unreachable currently but still...)

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: fix border color with normalized texture formats

We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.

Reviewed-by: Zack Rusin <zackr@vmware.com>

llvmpipe: fix pipeline statistics with a null ps

If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

draw: make sure that the stages setup outputs

Calling the prepare outputs cleans up the slot assignments
for outputs, unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, this
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

glsl: Fix incorrect pattern matching in ir_set_program_inouts

In commit 8fc41df (glsl: Modify ir_set_program_inouts to handle
geometry shaders), when attempting to pattern match the "foo" part of
expressions such as:

   foo[i][j]
   foo[i]

I incorrectly called as_dereference_variable() on the subexpression
foo[i] instead of foo.  As a result, the pattern never matched, so
ir_set_program_inouts would fall back on marking the entire variable
as used, rather than just the portion indexed by the array.

This didn't result in incorrect behaviour, but it could have resulted
in inefficiency by causing the back-end to allocate resources for
unused parts of an input or output array.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

vl: Add support for max level query v2

This patch adds the level query support to the video decoders
and uses some more reasonable defaults.

v2: (ck) add commit message

Reviewed-by: Christian König <christian.koenig@amd.com>

glsl: Emit better warnings for things that look like default precision statements

Previously we would emit a warning for empty declarations like

float;

We would also emit the same warning for things like

highp float;

However, this second case is most likely the application trying to set
the default precision.  This makes the compiler generate a stronger
warning with some suggestion of a fix.

It really seems like this should be an error.  I'll bet that 100% of the
time someone writes 'highp float;' the actually meant 'precision highp
float;'.  Alas, both AMD and NVIDIA accept this syntax, and the spec
doesn't explicitly forbid it.

This makes piglit's precision-05.vert generate the following warnings:

0:12(11): warning: empty declaration with precision qualifier, to set the default precision, use `precision lowp float;'
0:13(12): warning: empty declaration with precision qualifier, to set the default precision, use `precision mediump int;'

v2: Add { } around a one-line if body and fix a comment.  Suggested by
Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

glsl/ast: Don't perform GS input array checks on non-inputs.

Previously, we were accidentally calling
handle_geometry_shader_input_decl() on non-input interface block
declarations, resulting in bogus error checking.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/ast: Fix assertion failure when GS input declared as non-array.

Previously, if a geometry shader input was declared as a non-array, we
would flag the proper compiler error, but then before we got a chance
to report it to the client, handle_geometry_shader_input_decl() would
assertion fail.

With this patch, handle_geometry_shader_input_decl() ignores
non-arrays.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/ast: Check that geometry shader interface block inputs are arrays.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen7+: Fix build error introduced by renaming upload_3dstate_so_decl_list.

Commit 9f9ccf707c54156b4559a4b1206022c2ca2d45cd renamed
upload_3dstate_so_decl_list to gen7_upload_3dstate_so_decl_list but
forgot to update the caller.

radeon/llvm: Add missing "%s" format string to fprintf.

This fixes a compilation warning with -Wformat-security.

CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

i965: Move arrays brw_multisample_positions* to new header

Move the arrays to the new header brw_multisample_state.h, which will be
shared with Broadwell code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

i965: Refactor names of sample_positions_8/4x arrays

Place each array in the brw namespace by renaming it:
sample_positions_4x -> brw_multisample_positions_4x
sample_positions_8x -> brw_multisample_positions_8x

This prepares for moving the arrays to a header shared by gen6 and gen8.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

i965/gen7+: Mark upload_3dstate_so_decl_list as non-static (v2)

We will reuse this for Broadwell.

v2: Prefix function name with 'gen7'. (chadv)

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Mark a few brw_draw_upload.c functions as non-static

We will reuse these for Broadwell.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

glsl: Require function return type arrays be explicitly sized

Fixes piglit array-function-return-unsized.vert.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

glsl: Move and refine test for unsized arrays in GLSL ES

GLSL ES does not allow unsized arrays, and GLSL ES 1.00 does not allow
array initializers.  However, GLSL ES 3.00 allows array initializers,
and the initializer can explicitly size the array.  The specification
even includes some examples of this:

    float x[] = float[2] (1.0, 2.0);     // declares an array of size 2
    float y[] = float[] (1.0, 2.0, 3.0); // declares an array of size 3

    float a[5];
    float b[] = a;

Move the unsized array check to after the initializer has been
processed.  If the array is still unsized, generate the error.  This
should have no effect in GLSL ES 1.00 because, as previously mentioned,
array initializers are not allowed.

Fixes piglit "glsl-es-3.00 compiler array-sized-by-initializer.vert".

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "9.1 9.2" <mesa-stable@lists.freedesktop.org>

glx: Generate GLXBadDrawable when drawable is zero

Fixes piglit glx-query-drawable-GLXBadDrawable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

mesa: Use _mesa_detach_renderbuffer when deleting a texture

The functional change is that now invalidate_framebuffer is called if
the texture is actually detached from one of the currently bound FBOs.
Previously this was only done for renderbuffers.

The remaining changes make the texture delete path look more similar to
the renderbuffer delete path. This includes adding relevant spec
quotations to justify the behavior.

Fixes piglit fbo-incomplete "delete texture of bound FBO" test.

v2: Move 'fb->Attachment[i].Texture == att' check from previous patch to
this patch... where it was intended to be in the first place. Noticed
by Chad.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

mesa: Make detach_renderbuffer available outside fbobject.c

Also add a return value indicating whether any work was done.

This will be used by the next patch.

v2: Move 'fb->Attachment[i].Texture == att' check to the next
patch... where it was intended to be in the first place. Noticed by
Chad.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

meta: Don't call _mesa_Ortho with width or height of 0

Fixes failures in oglconform fbo mipmap.manual.color,
mipmap.manual.colorAndDepth, mipmap.automatic, and
mipmap.manualIterateTexTargets subtests.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

r600g/sb: use MULADD workaround on R7xx for MULADD_IEEE

Looks like the same issue that was seen with MULADD in trans slot on
R7xx also affects MULADD_IEEE (maybe all OP3 instructions and MULADD is
just a most frequently used?). So the workaround is to not allow affected
instructions to be placed into the trans slot.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

gallivm: implement new float comparison instructions returning integer masks

FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.

Reviewed-by: Zack Rusin <zackr@vmware.com>

tgsi: implement new float comparison instructions returning integer masks

Also while here add a bunch of other forgotten (integer) instructions to
tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing
away unused input components), though it may still be incomplete.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallium: add new float comparison instructions returning integer masks

Newer graphic languages don't want messy float mask results but instead true
"boolean" mask results for float comparisons. Otherwise just need to convert
the floats back to integers. Need to keep the old opcodes however due to both
legacy (gl and d3d9) needing them and because older hw can't really deal with
integers. These new FSEQ/FSGE/FSLT/FSNE opcodes are part of integer API and
hence must be supported if a driver claims to support glsl 1.30 (or
PIPE_SHADER_CAP_INTEGERS).

Reviewed-by: Zack Rusin <zackr@vmware.com>

ilo: enable dumping of WM PCB

It was disabled because it wasn't supported.

ilo: no binding table change when constants are pushed

When constants can be pushed, and nothing else requires new SURFACE_STATEs,
there is no need to emit BINDING_TABLE_STATE.

ilo: support push constant model in shaders

Source constants from URB constant data when the constant data can fit in the
PCB.

ilo: support copying constant buffer 0 to PCB

Add ILO_KERNEL_PCB_CBUF0_SIZE so that a kernel can specify how many bytes of
constant buffer 0 need to be copied to PCB.

ilo: make constant buffer 0 upload optional

Add ILO_KERNEL_SKIP_CBUF0_UPLOAD so that we can skip constant buffer 0 upload
when the kernel does not need it.

Revert "ilo: initialize constant buffer SURFACE_STATE early"

This reverts commit a9b800aa81cffdcaef2490ff49986099feae2663. With push
constant support, the constructed SURFACE_STATE is unused and wasted. The
change only slows things down.

gbm: Link to libwayland-drm if Wayland EGL platform is enabled

We were relying on libEGL to pull in libwayland-client symbols, but with
commit 2c2e64edaba0f6aeb181ca5b51eb8dea8e9b39f9 cleaned up the
symbol leak.

https://bugs.freedesktop.org/show_bug.cgi?id=67962

gallivm: fix exec_mask interaction with geometry shader after end of main

Because we must maintain an exec_mask even if there's currently nothing
on the mask stack, we can still have an exec_mask at the end of the program.
Effectively, this mask should be set back to default when returning from main.
Without relying on END/RET opcode (I think it's valid to have neither) it is
actually difficult to do this, as there doesn't seem any reasonable place to
do it, so instead let's just say the exec_mask is invalid outside main (which
it really is effectively).
The problem is that geometry shader called end_primitive outside the shader
(in the epilogue), and as a result used a bogus mask, leading to bugs if we
had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
the mask combining function when called from outside the shader.

Reviewed-by: Zack Rusin <zackr@vmware.com>

draw: simplify prim mask construction

The code was quite weird, the second comparison was in fact a complete no-op
and we can also do the comparison with the vector directly instead of scalar,
which should not also be faster but it is way more obvious how that mask
is actually going to look like.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: simplify geometry shader mask handling a bit

Instead of reducing masks to 0/1 simply use the mask directly as -1.
Also use some signed comparison instead of unsigned (as far as I understand
these values have to be (very) small and signed means llvm doesn't have to
apply additional logic to do the unsigned comparisons the cpu can't do).
Saves a couple of instructions in some test geometry shader here.

v2: that was a bit to much optimization, don't skip combining the masks...

Reviewed-by: Zack Rusin <zackr@vmware.com>

draw: (trivial) dump tgsi for geometry shaders with GALLIVM_DEBUG_TGSI

And dump the variant key too (same as vs does).
Just so I can stop wondering why I see the tgsi dump for fs and vs but not
gs...

gallivm: (trivial) fix typo in argument declaration of lp_build_size_query_soa

Was meant to match the name used elsewhere, spotted by Anthony.

i965/fs: Add dump_instruction() support for ARF destinations.

CMP instructions use BRW_ARF_NULL as a destination. Prior to this
patch, dump_instruction() decoded the destination as "???".

Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Remove extraneous newline in dump_instruction() for CMP.

This resulted in printouts like:

246: cmp.cmod.f0.0
???, vgrf152, 0.000000f, (null),

With this patch, CMP is properly printed on one line.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.

Many GLSL shaders contain code of the form:

   x = condition ? foo : bar

The compiler emits an ir_if tree for this, since each subexpression
might be a complex tree that could have side-effects and short-circuit
logic operations.

However, the common case is to simply pick one of two constants or
variable's values---which is exactly what SEL is for.  Replacing IF/ELSE
with SEL also simplifies the control flow graph, making optimization
passes which work on basic blocks more effective.

The shader-db statistics:

   total instructions in shared programs: 1655247 -> 1503234 (-9.18%)
   instructions in affected programs:     949188 -> 797175 (-16.02%)

   2,970 shaders were helped, none hurt.  Gained 181 SIMD16 programs.

This helps Valve's Source Engine games (max -41.33%), The Cave
(max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%),
Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7
(max -1.94%).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Consider predicated SEL instructions as whole variable writes.

The instruction

(+f0.0) SEL dst, src0, src1

will write either src0 or src1 to dst, depending on the predicate.
Unlike most predicated instructions, it always writes to dst.

fs_inst::is_partial_write() is supposed to return true if the whole
register is guaranteed to be written. The !inst->predicated check makes
sense for most instructions, which might not write the whole register,
but SEL is a special case.

This caused live interval analysis to ignore the destination of
predicated SEL instructions when computing "def" information.

Requires the previous commit to avoid regressions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Explicitly disallow CSE on predicated instructions.

The existing inst->is_partial_write() already disallows predicated
instructions, so this has no functional change. However, it's worth
doing explicitly since the CSE pass does not consider the flag register.
This means it could blindly factor out operations that use the same
sources, but which have different condition codes set.

This prevents a regression in the next commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Log a performance warning if skipping 16-wide due to pulls.

Usually, the driver creates both 8-wide and 16-wide variants of every
fragment shader. When 16-wide compilation fails, it logs a performance
warning explaining why only an 8-wide program exists.

However, when there are pull parameters, the driver won't even bother
trying the 16-wide compile (since it would fail). In this case, it
failed to emit a performance warning, leaving no explanation for the
missing 16-wide program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

ilo: initialize constant buffer SURFACE_STATE early

Fix ilo_gpe_init_view_surface_for_buffer to allow buffer to be NULL, and add
ilo_gpe_set_view_surface_bo to set it later. This allows us to set up
SURFACE_STATE early for constant buffers backed by user buffers.

ilo: 3DSTATE_INDEX_BUFFER may be wrongly skipped

In finalize_index_buffer(), when the current index buffer was destroyed due to
u_upload_data(), it may happen that the new index buffer is at the same
address as the old one. Comparing the pointers to the two buffers could fail
to work, and 3DSTATE_INDEX_BUFFER would be incorrectly skipped.

Holding a reference to the current index buffer before calling u_upload_data()
should fix the problem.

i965: add missing BRW_NEW_INTERPOLATION_MAP to state dump

Makes this flag appear in the output for INTEL_DEBUG=state

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add a new debug mode for the VUE map

INTEL_DEBUG=vue now emits a listing of each slot in the VUE map,
and the corresponding interpolation mode.

V2: Fix whitespace issues.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Don't allow const on out or inout function parameters

Fixes piglit tests const-inout-parameter.frag and
const-out-parameter.frag.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

gallivm: set non-existing values really to zero in size queries for d3d10

My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: use texture target from shader instead of static state for size query

d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: honor d3d10's wishes of out-of-bounds behavior for texture size query

Specifically, must return 0 for non-existent mip levels (and non-existent
textures which is an unsolved problem) for everything but total mip count.

Reviewed-by: Zack Rusin <zackr@vmware.com>

glsl: Enable ARB_fragment_coord_conventions functionality in GLSL 1.50.

GLSL 1.50 incorporates the functionality of the
ARB_fragment_coord_conventions extension, so we need to make this
functionality available even if the extension isn't enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

main: Fix deprecation of glLineWidth()

From section E.1 (Profiles and Deprecated Features of OpenGL 3.0)
of the OpenGL 3.0 spec:

"LineWidth is not deprecated, but values greater than 1.0
will generate an INVALID VALUE error"

From context it is clear that values greater than 1.0 should only
generate an INVALID VALUE error in a forward-compatible context.

The code was correctly quoting this spec text, but it was disallowing
all line widths in forward-compatible contexts, instead of just widths
greater than 1.0.

This patch introduces the correct check, so that setting a line width
of 1.0 or less is permitted.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

util: (trivial) fix asm input/output list for fxsave

Otherwise gcc might do very unsafe optimizations, spotted by Uros Bizjak.
Hopefully this time it's finally right?

r600g: disable GPUVM by default

Cayman and trinity systems still seem to suffer from
stability problems with GPUVM. This also fixes compute
on these asics. It can still be enabled for testing
by setting env var RADEON_VA=true.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=65958

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>

softpipe: fix the regressions

softpipe has a really weird handling of the draw attrs, lets
just not inject outputs in its data.
Trivial.

draw: rewrite primitive assembler

We can't be injecting the primitive id's in the pipeline because
by that time the primitives have already been decomposed. To
properly number the primitives we need to handle the adjacency
primitives by hand. This patch moves the prim id injection into
the original primitive assembler and completely removes the
useless pipeline stage.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

draw: reset the vertex id when injecting new primitive id

Without reseting the vertex id, with primitives where the same
vertex is used with different primitives (e.g. tri/lines strips)
our vbuf module won't re-emit those vertices with the changed
primitive id. So lets reset the vertex id whenever injecting
new primitive id to make sure that the vertex data is correctly
emitted.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

draw: cleanup the extra attribs

Before inserting new front face and prim id outputs cleanup
the old extra outputs, otherwise our cache will use previous
output slots which will break as soon as outputs of the current
shader don't match the last.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

util: (trivial) fix more compile errors in u_cpu_detect (gcc/x86 this time).

Oops. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=67921

egl: Do not export private symbols

libEGL was incorrectly exporting *all* symbols, public and private.
This patch adds -fvisibility=hidden to libEGL's linker flags to ensure
that only symbols annotated with __attribute__((visibility("default")))
get exported.

Sanity-checked with libEGL's builtin DRI2 driver and the i965 DRI driver
by running Piglit on X/EGL and by running weston-gears on Weston as an
X client.

Sanity-checked with libEGL's Gallium driver (which is not built-in) and
the swrast Gallium driver by running es2gears_x11.

Kristian reviewed the symbol diff in `nm libEGL.so`.

CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: Ian Romanick <idr@freedesktop.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

i965: Remember to call intel_prepare_render() before blitting.

Otherwise, blits to the window system buffer may cause crashes,
since dst_irb->mt may be NULL.

This code is lifted straight out of brw_blorp_framebuffer()'s
try_blorp_blit() helper.

Fixes crashes in Piglit's fbo-sys-blit on systems without BLORP.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65919
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>

util: (trivial) fix compile error with MSVC on x86

gallivm: honor d3d10 floating point rules for shadow comparisons

d3d10 specifies ordered comparisons for everything but not_equal which is
unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx).
OpenGL probably doesn't care.

Reviewed-by: Zack Rusin <zackr@vmware.com>

softpipe: don't clamp reference value for shadow comparison for float formats

Clamping is only done for fixed-point formats as part of conversion to
texture format.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: don't clamp reference value for shadow comparison for float formats

This is wrong both for OpenGL and d3d. (In fact clamping is a side effect
of converting to depth format, so this should really do quantization too
at least in d3d10 for the comparisons to be truly correct.)

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: propagate scalar_lod to emit_size_query too

Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallium: clarify SVIEWINFO opcode

This opcode is quite problematic in tgsi, while it tries to mirror
d3d10 resinfo it can't really do what's stated there due to missing
the crazy return type modifiers. Hence specify this is ignored along
with the swizzle.
(Other options would be to have multiple opcodes or specify the ret
type modifier maybe in dst_reg as there's padding bits left there but
it is the only instruction allowing this.)

Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: fix out-of-bounds behavior for fetch/ld

For d3d10 and ARB_robust_buffer_access_behavior, we are required to return
0 for out-of-bounds coordinates (for which we can just enable the code already
there was just disabled). Additionally, also need to return 0 for
out-of-bounds mip level and out-of-bounds layer. This changes the logic
so instead of clamping the level/layer, an out-of-bound mask is computed
instead in this case (actual clamping then can be omitted just like with
coordinates, since we set the fetch offset to zero if that happens anyway).

Reviewed-by: Zack Rusin <zackr@vmware.com>

util: try much harder to set DAZ flag

While so far this only causes some harmless test failures, there's lots more
cpus with DAZ. All 64bit capable ones can do it (particularly relevant for
AMD cpus as they supported sse3 very very late) but if really necessary we
can check support for that for real with some more magic.
(In fact just about ANY cpu with sse2 can support DAZ, I believe the only
exception are first gen P4 (Willamette) and from those only early steppings
which can't do it it's almost like intel forgot to add it... - a real pity
though docs say you can't just try to set it as they will throw a GPF.)
While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672
it does not fix it. Most likely the tests need fixing as I don't think
there's any guarantee about denorm handling in the reference math library
functions if the flags aren't set to standard values. Nevertheless enabling
DAZ on all cpus which can do it should be the right thing to do.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

util: implement table-based + linear interpolation linear-to-srgb conversion

Should be much faster, seems to work in softpipe.
While here (also it's now disabled) fix up the pow factor - the former value
is what is in GL core it is however not actually accurate to fp32 standard
(as it is 1.0/2.4), and if someone would do all the accurate math there's no
reason to waste 8 mantissa bits or so...

v2: use real table generating function instead of just printing the values
(might take a bit longer as it does calculations on some 3+ million floats
but much more descriptive obviously).
Also fix up another inaccurate pow factor (this time in the python code) -
wondering where the couple one bit errors came from :-(.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>

gallivm: fix comment wrt srgb accuracy.

I think it's actually not good enough now...

ilo: get rid of GPE tables completely

Move the estimate functions out of the tables and kill the tables.

ilo: clean up GPE header inclusions

This reduces the number of source files need to be recompiled when GPE
functions are changed other than regular clean ups.

ilo: initialize alpha test state in ilo_gpe_init_dsa

This could speed up BLEND_STATE and COLOR_CALC_STATE emission a bit.

ilo: fold gen6_translate_index_size into the caller

There is only one caller so fold it.

ilo: fold gen6_translate_depth_format into the caller

There is only one caller so fold it.

ilo: Call GPE emit functions directly.

Eliminate pipeline and GPE function vectors and have the pipeline functions
call the GPE emit functions directly.

ilo: move emit functions so that they can be inlined.

r300g/compiler/tests: Pass the required LDFLAGS when building the test program

CC: "9.2 <mesa-stable@lists.freedesktop.org>"

r300g/compiler/tests: Fix segfault

CC: "9.2" <mesa-stable@lists.freedesktop.org>

gallium-egl: Commit the rest of the native_wayland_drm_bufmgr_helper v2 patch

I missed Anders v2 on the list which fixed non-wayland compilation:

http://lists.freedesktop.org/archives/mesa-dev/2013-July/042062.html

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>

egl: Update to Wayland 1.2 server API

Since Wayland 1.2, struct wl_buffer and a few functions are deprecated.

References to wl_buffer are replaced with wl_resource and some getter
functions and calls to deprecated functions are replaced with the proper
new API. The latter changes are related to resource versioning.

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

gallium-egl: Don't add a listener for wl_drm twice in wayland platform

A listener is added just after the interface is bound, in
registry_handle_global().

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

gallium-egl: Simplify native_wayland_drm_bufmgr_helper interface

The helper provides a series of functions to easy the implementation
of the WL_bind_wayland_display extension on different platforms. But
even with the helpers there was still a bit of duplicated code between
platforms, with the drm authentication being the only part that
differs.

This patch changes the bufmgr interface to provide a self contained
object with a create function that takes a drm authentication callback
as an argument. That way all the helper functions are made static and
the "_helper" suffix was removed from the sources file name.

This change also removes the mix of Wayland client and server code in
the wayland drm platform source file. All the uses of libwayland-server
are now contained in native_wayland_drm_bufmgr.c.

Changes to the drm platform are only compile tested.

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

ilo: speed up 3DSTATE_VERTEX_BUFFERS emission a bit

Ignore vbuffer_mask which does not gain us anything.

ilo: skip state emission when reducing sampler count

When the number of sampler states bound is reduced, we are good to keep
referencing the old SAMPLER_STATE array and skip emitting a new one.

ilo: simplify setting of shader samplers and views

Remove the special path that unbinds all samplers/views not in the range.
Just make another call to unbind them.

ilo: correctly check for stencil ref change

I intended to do a memcmp(), not a memcpy()...

draw: fix slot detection

Nowadays -1 for slots means that the semantic is not present, so
we need to store it in a signed variables, otherwise <0 comparisons
are pointless. Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe, edgeflags don't work wit llvmpipe)

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: Fix build - Remove TargetOptions.RealignStack for llvm>=3.4

Since llvm -3.4svn r187618, TargetOptions doesn't provide
RealignStack, so only enable it with llvm<3.4

This option must now be specified using function attributes, see LLVM
commit r187618

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

i965: Add #defines for the MI_LOAD_REGISTER_MEM command.

This command reads a value from memory and writes it to a register (the
opposite of MI_STORE_REGISTER_MEM). It's only available on Gen7+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>

i965: Initialize the intel_context::bufmgr pointer earlier.

This prevents a crash in a future patch.

_mesa_initialize_context() creates a default transform feedback object
by calling the NewTransformFeedbackObject() driver hook. Eventually,
we'll want to subclass that and allocate a buffer object. This means
passing brw->bufmgr to drm_intel_alloc_bo(), and crashing if it isn't
initialized yet.

The buffer manager is actually already initialized; we just hadn't
copied the pointer from intel_screen to intel_context quite early
enough.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>

i965: Tidy preprocessor macros for SO_PRIM_STORAGE_NEEDED registers.

Gen7+ supports four transform feedback streams.  Using a function-like
macro makes it easy to access them by stream number or loop over them.
"GEN7_" prefixes are more common than "_IVB" suffixes, so use that.

Gen6 only supports a single stream, so the single #define should be
fine.  However, SO_NUM_PRIM_STORAGE_NEEDED was a poor name.  For one,
the word "NUM" doesn't appear in the actual name of the register.
It's also confusingly generic, as it doesn't exist on Gen7+.  Add a
"GEN6_" prefix for clarity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>

i965: Tidy preprocessor macros for SO_NUM_PRIMS_WRITTEN registers.

Gen7+ supports four transform feedback streams.  Using a function-like
macro makes it easy to access them by stream number or loop over them.
"GEN7_" prefixes are more common than "_IVB" suffixes, so we use that.

Gen6 only supports a single stream, so the single #define should be
fine.  However, SO_NUM_PRIMS_WRITTEN was confusingly generic, as it
doesn't exist on Gen7+.  Add a "GEN6_" prefix for clarity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>

nvc0: don't access array out of bounds on unexpected sample count