git.libre-soc.org Git - mesa.git/log

radeonsi: Add missing error-checking to si_create_compute_state (v2)

When the uploading of shader fails on si_shader_binary_upload(),
it returns -ENOMEM. We should handle si_shader_binary_upload() failure path
on si_create_compute_state().

CID 1394027

v2: Fixes from Edward O'Callaghan's review
a) Update explicitly return value check with "si_shader_binary_upload() < 0"
b) Update commit message.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

draw: drop some overflow computations

It turns out that noone actually cares if the address computations overflow,
be it the stride mul or the offset adds.
Wrap around seems to be explicitly permitted even by some other API (which
is a _very_ surprising result, as these overflow computations were added just
for that and made some tests pass at that time - I suspect some later fixes
fixed the actual root cause...). So the requirements in that other api were
actually sane there all along after all...
Still need to make sure the computed buffer size needed is valid, of course.
This ditches the shiny new widening mul from these codepaths, ah well...

And now that I really understand this, change the fishy min limiting
indices to what it really should have done. Which is simply to prevent
fetching more values than valid for the last loop iteration. (This makes
the code path in the loop minimally more complex for the non-indexed case
as we have to skip the optimization combining two adds. I think it should
be safe to skip this actually there, but I don't care much about this
especially since skipping that optimization actually makes the code easier
to read elsewhere.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: simplify fetch some more

Don't keep the ofbit. This is just a minor simplification, just adjust
the buffer size so that there will always be an overflow if buffers aren't
valid to fetch from.
Also, get rid of control flow from the instanced path too. Not worried about
performance, but it's simpler and keeps the code more similar to ordinary
fetch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: unify linear and elts draw jit functions

The code for elts and linear paths was nearly 100% identical by now - with
the elts path simply having some additional gather for the elements in the
main loop (with some additional small differences before the main loop).

Hence nuke the separate functions and decide this at jit shader execution
time (simply based on the presence of the elts pointer).

Some analysis shows that the generated vs jit functions seem to be just very
minimally more complex than the former elts functions, and almost none of the
additional complexity is in the main loop (basically just the branch logic
for the branch fetching the actual indices).
Compared to linear, the codesize of the function is of course a bit larger,
however the actual executed code in the main loop appears to be near 100%
identical (the additional code looking up indices is skipped as expected).

So, I would not expect a (meaningful) performance difference with the
generated code, neither with elts nor linear, this does however roughly
half the compilation time (the compiled shaders should also use only half
the memory of course).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: use same argument order for jit draw linear / elts functions

This is a bit simpler. Mostly to make it easier to unify the paths later...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: drop unnecessary index overflow handling from vsplit code

This was kind of strange, since it replaced indices which were only
overflowing due to bias with MAX_UINT. This would cause an overflow later
in the shader, except if stride was 0, however the vertex id would be
essentially random then (-1 + eltBias). No test cared about it, though.
So, drop this and just use ordinary int arithmetic wraparound as usual.
This is much simpler to understand and the results are "more correct" or
at least more consistent (vertex id as well as actual fetch results just
correspond to wrapped around arithmetic).
There's only one catch, it is now possible to hit the cache initialization
value also with ushort and ubyte elts path (this wouldn't be an issue if
we'd simply handle the eltBias itself later in the shader). Hence, we need
to make sure the cache logic doesn't think this element has already been
emitted when it has not (I believe some seriously bad things could happen
otherwise). So, borrow the logic which handled this from the uint case, but
not before fixing it up...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: simplify vsplit elts code a bit

vsplit_get_base_idx explicitly returned idx 0 and set the ofbit
in case of overflow. We'd then check the ofbit and use idx 0 instead of
looking it up. This was necessary because DRAW_GET_IDX used to return
DRAW_MAX_FETCH_IDX and not 0 in case of overflows.
However, this is all unnecessary, we can just let DRAW_GET_IDX return 0
in case of overflow. In fact before bbd1e60198548a12be3405fc32dd39a87e8968ab
the code already did that, not sure why this particular bit was changed
(might have been one half of an attempt to get these indices to actual draw
shader execution - in fact I think this would make things less awkward, it
would require moving the eltBias handling to the shader as well).
Note there's other callers of DRAW_GET_IDX - those code paths however
explicitly do not handle index buffer overflows, therefore the overflow
value doesn't matter for them.

Also do some trivial simplification - for (unsigned) a + b, checking res < a
is sufficient for overflow detection, we don't need to check for res < b too
(similar for signed).

And an index buffer overflow check looked bogus - eltMax is the number of
elements in the index buffer, not the maximum element which can be fetched.
(Drop the start check against the idx buffer though, this is already covered
by end check and end < start).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium: Add support for SWR compilation

Include swr library and include -DHAVE_SWR in the compile line.

v3: split to a separate commit

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gallium: swr: Added swr build for windows

v4: Add windows-specific gen_knobs.{cpp|h} changes
v5: remove aggresive squashing of gen_knobs.py to this commit; added
SConscript to EXTRA_DIST in Makefile.am

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

swr: Modify gen_knobs.{cpp|h} creation script

Modify gen_knobs.py so that each invocation creates a single generated
file. This is more similar to how the other generators behave.

v5: remove Scoscript edits from this commit; moved to commit that first
adds SConscript

Acked-by: Emil Velikov <emil.velikov@collabora.com>

scons: Add swr compile option

To buils The SWR driver (currently optional, not compiled by default)

v3: add option as opposed to target

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

swr: Windows-related changes

- Handle dynamic library loading for windows
- Implement swap for gdi
- fix prototypes
- update include paths on configure-based build for swr_loader.cpp

v2: split to multiple patches
v3: split and reshuffle some more; renamed title
v4: move Makefile.am changes to other commit. Modify header files

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

swr: renamed duplicate swr_create_screen()

There are 2 swr_create_screen() functions. One in swr_loader.cpp, which
is used during driver init, and the other is hiding in swr_screen.cpp,
which ends up in the arch-specific .dll/.so.

Rename the second one to swr_create_screen_internal(), to avoid confusion
in header files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

swr: Handle windows.h and NOMINMAX

Reorder header files so that we have a chance to defined NOMINMAX before
mesa include files include windows.h

v3: split from bigger patch

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gallium: Added SWR support for gdi

Added hooks for screen creation and swap. Still keep llvmpipe the default
software renderer.

v2: split from bigger patch
v3: reword commit message

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

scons: add llvm 3.9 support.

v2: reworded commit message

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

scons: ignore .hpp files in parse_source_list()

Drivers that contain C++ .hpp files need to ignore them too, along
with .h files, when building source file lists.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

mesa: removed redundant #else

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

i965/hsw: Set integer mode in sampling state for stencil texturing

Fixes:

ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot
ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot
ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_pot
ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_npot
ES31-CTS.functional.texture.border_clamp.unused_channels.depth24_stencil8_sample_stencil
ES31-CTS.functional.texture.border_clamp.unused_channels.depth32f_stencil8_sample_stencil

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

reviewers: add Rob H for the Android EGL+build parts

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

docs: recommend using --enable-mangling over the manual -DUSE...

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: rework/update install.html

Still far from perfect, but a few small steps in the right direction.

- Split build systems, compilers, third party tools
- Mention building mesa for Android (part of AOSP)
- Drop explicit "other" dependencies. Reference to disto methods to
get them.
- HTML 4.01 Traditional compliance fixes - mixed ul and br tags.
- nuke dead links README.{CYGWIN,VMS}

v2: Squash typos, add note about buggy flex 2.6.2 (Eric), add Suse
zipper command (Tobias).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: sourcetree.html misc updates

A mixed bag of updates/fixes - mostly aiming at removing no longer
applicable directories.

Add a few more state-trackers, drivers, etc. alongside "XXX more" where
applicable. Attribute for the GLSL/NIR movement and nukage of
src/egl/docs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: flesh out releasing.html

Properly document the whole process:
- Brief on what, when, where
- Picking, testing, branchpoints, pre-release announcement
- Releasing, announcement, website and bugzilla updates

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs/submittingpatches: fix tags mis/abuse

Fix the odd tag so that we're HTML 4.01 Traditional compliant

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs/submittingpatches: flesh out "how to nominate" methods

Currently they are buried within the text, making it hard to find.
Move them to the top and be clear what is _not_ a good idea.

v2: Minor commit polish, use only "resending" as suggested by Matt.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs/autoconf: update glx driver / enable-debug text

With earlier commit we folded all the xlib handling in --enable-glx, but
we forgot to update the documentation.

Elaborate on --enable-debug and drop mentions about depenencies.

v2: Grammar - s|haven't|hasn't| (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs/repository: refer to Submitting patches

v2: Improve grammar - add missing "to" (Eric).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: split Submitting Patches into separate document

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: split Codying style into separate document

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: mention/suggest testing your patch against dEQP

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

docs: mention that coding style can differ between drivers

... and point people to use/honour the EditorConfig/Emacs files, where
applicable.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

revieweds: add Tomasz for the Android/EGL implementation

As mentioned/requested on the mailing list.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

mesa: fold always true conditional

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

mesa: drop unneeded assert

As seen a couple of lines above - there's no way for the assert to
trigger.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

egl/wayland: remove non-applicable destroyDrawable from error path

If we fail to create the drawable there's not much point in attampting
to destroy it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

loader: automake: whitespace cleanup

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

gbm: automake: remove unused defines

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

intel: aubinator: Fix resource leak in gen_spec_load_from_path

This fixes resource leak in gen_spec_load_from_path XML_ParserCreate
failure path

CID 1373564

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

egl/android: Use gralloc::lock_ycbcr for resolving YUV formats (v2)

There is an interface that can be used to query YUV buffers for their
internal format. Specifically, if gralloc:lock_ycbcr() is given no SW
usage flags, it's supposed to return plane offsets instead of pointers.
Let's use this interface to implement support for YUV formats in Android
EGL backend.

v2: Fixes from Emil's review:
a) Added comments for parts that might be not clear,
b) Changed get_fourcc_yuv() to return -1 on failure,
c) Changed is_yuv() to use bool.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

egl/android: Get gralloc module in dri2_initialize_android() (v2)

Currently droid_open_device() gets a reference to the gralloc module
only for its own use and does not store it anywhere. To make it possible
to call gralloc methods from code added in further patches, let's
refactor current code to get gralloc module in dri2_initialize_android()
and store it in dri2_dpy.

v2: fixes from Emil's review:
a) remove duplicate initialization of 'err'.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

egl/android: Remove handling of RGB_888 pixel format

It is currently completely broken, as it ends up using RGBX_8888 on
hardware side, due to no way of distinguishing between these two in the
DRI API, while HAL_PIXEL_FORMAT_RGB_888 is clearly defined to be the
3-byte per pixel RGB format.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

radeonsi: Fix resource leak in gs_copy_shader allocation failure path

CID 1394028

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

glsl/lower_output_reads: remove unused mem_ctx

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl/lower_output_reads: bail early in tessellation control shaders

This whole pass is a no-op.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl/lower_output_reads: fix geometry shader output handling with conditional emit

Consider a geometry shader that contains code like this:

   some_out = expr;

   if (cond) {
      ...
      EmitVertex();
   } else {
      ...
      EmitVertex();
   }

Both branches should see the correct value of some_out.

Since this is a rather subtle and rare case, I'm submitting a piglit test
for this as well.

GLSL says that the values of output variables are undefined after
EmitVertex(). With this change, the values will now be defined and
unmodified. This may reduce optimization opportunities in the probably
quite rare case where subsequent compiler passes cannot prove that the
value of the output variable is overwritten.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: store group_size_variable in struct si_compute

For compute shaders, we free the selector after the shader has been
compiled, so we need to save this bit somewhere else. Also, make sure that
this type of bug cannot re-appear, by NULL-ing the selector pointer after
we're done with it.

This bug has been there since the feature was added, but was only exposed
in piglit arb_compute_variable_group_size-local-size by commit
9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated).

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl: don't flatten if-blocks with dynamic array indices

This fixes the regression of radeonsi in
glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
caused by commit 74e39de9324d2d2333cda6adca50ae2a3fc36de2.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

anv/state: enable coordinate address rounding for Min/Mag filters

This patch improves pass rate of dEQP-VK.texture.explicit_lod.2d.sizes.*
from 68.0% (98/144) to 83.3% (120/144) by enabling sampler address
rounding mode when the selected filter is not nearest, which is the same
thing we do for OpenGL.

These tests check texture filtering for various texture sizes and mipmap
levels. The failures (without this patch) affect cases where the target
texture has odd dimensions (like 57x35) and either the Min or the Mag filter
is not nearest.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: Implement a depth stall restriction on gen7

Fixes around 60 Vulkan CTS tests on Haswell

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>

nvc0/ir: use levelZero flag when the lod is set to 0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: spir-v allows texture size query with and without lod.

The translation to llvm was failing here due to required lod.

This fixes some new SteamVR shaders.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: fix image view creation for depth and stencil only

This fixes the image view for sampling just the depth.

It removes some pointless swizzle code, and adds
a missing case for the x8_d24 format.

Fixes:
dEQP-VK.renderpass.formats.d32_sfloat_s8_uint.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: make sure to flush input attachments correctly.

This fixes 9 of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

tnl: remove unneeded #include "util/simple_list.h"

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

radeon: remove unneeded #include "util/simple_list.h"

Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

r200: remove unneeded #include "util/simple_list.h"

And include "util/simple_list.h" where it is needed in r200_state.c
Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

i915: remove unneeded #include "util/simple_list.h"

Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

mesa: remove unneeded #includes in errors.c

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

mesa: remove trailing whitespace in errors.c

Reviewed-by: Vinson Lee <vlee@freedesktop.org>

nir: Add a C wrapper for glsl_type::is_array_of_arrays().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Store a clip_distance_mask field similar to cull_distance_mask.

This isn't useful for legacy GL, but will be used in Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Use shader_info for brw_vue_prog_data::cull_distance_mask.

This also allows us to move it from a GL specific location to a
part of the compiler shared by both GL and Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

compiler: Store the clip/cull distance array sizes in shader_info.

We switched from a boolean to array lengths in gl_program a while back.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Fix GS push inputs with enhanced layouts.

We weren't taking first_component into account when handling GS push
inputs. We hardly ever push GS inputs, so this was not caught by
existing tests. When I started using component qualifiers for the
gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size
started catching this.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Delete unused variable.

I forgot to delete this in 9ef2b9277d3bead6dbfa47e95794ca61e8be4e84.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

intel: Share URB configuration code between GL and Vulkan.

This code is far too complicated to cut and paste.

v2: Update the newly added genX_gpu_memcpy.c; const a few things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Use arrays in Gen7+ URB code.

So much of this code was cut and pasted per stage. We can accomplish
much of it by looping over shader stages.

Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049%
(n = 71) at 1024x768 on Cherryview.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c.

The context fields are for Gen4-5; setting them has always been useless.
There's no point in spending the cost in the hottest path in the driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Switch to roundf in HS/DS URB code.

Matt intentionally switched the VS calculation to be float-based in
commit c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support
was written before this and rebased forward, and missed the change.

Now it's consistent.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Make URB code use prog_data for GS/tessellation enable checks.

If geometry/tessellation shaders are disabled, prog_data will be NULL
(see brw_state_upload.c). This consolidates dirty bits a little.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

intel: Convert devinfo->urb.min_*_entries into an array.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

intel: Convert devinfo->urb.max_*_entries into an array.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

docs: document MESA_DEBUG=context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

swr: mark streamout buffers as written

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_program

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

mesa: create new gl_shader_program_data struct

This will be used to share data between gl_program and gl_shader_program
allowing for greater code simplification as we can remove a number of
awkward uses of gl_shader_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glsl: add new program driver function to standalone compiler

This fixes a regression with the standalone compiler caused by
9d96d3803ab5dc

Note that we change standalone_compiler_cleanup() to no longer
explicitly free the linked shaders as the will be freed when
we free the parent ctx whole_program.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98774

i965: Disable depth writes when depth test is GL_EQUAL.

There's no point in performing depth writes when the depth test
comparison function is set to GL_EQUAL - it would just write out
the same value that's already there (if it is written at all).  While
this is harmless from a functional perspective, it hurts performance.
Obviously, writing to memory is not free, but there's another more
subtle impact as well: it can prevent early depth optimizations.

Depth writes aren't supposed to happen for pixels that are killed
by fragment shader discard statements or the alpha test.  So, with
depth writes enabled and either of those, the pixel shader must be
invoked to determine whether or not to perform the write.  This is
fairly stupid in the EQUAL case - we're running a shader to decide
whether to replace the existing depth value with itself.

By disabling these pointless writes, we allow early depth even with
discards and alpha testing, allowing the hardware to skip the pixel
shader altogether if the depth test fails.

Improves performance of Unigine Valley:

- Skylake GT2:    +17.8%
- Broadwell GT3e: +11.5%
- Cherrytrail:    +19.4%

Huge thanks to Mark Janes for building frameretrace [1], the performance
analysis tool that helped us find this issue, and to Robert Bragg for
providing us performance metrics on Linux.  Mark also spent the time to
analyze Valley performance on Windows vs. Linux and discovered a
discrepancy in early depth test metrics.  Once he had isolated a draw
call and drawn attention to the problem, fixing it was pretty simple.

[1] https://github.com/janesma/apitrace/wiki/frameretrace-branch

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

glsl: tidy up entries temporary

Here we just move initialisation of entries to where it is needed i.e.
outside the loop and after the continue checks.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glsl/i965: move per stage AtomicBuffers list to gl_program

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glsl: create gl_program at the start of linking rather than the end

This will allow us to directly store metadata we want to retain in
gl_program this metadata is currently stored in gl_linked_shader and
will be lost if relinking fails even though the program will remain
in use and is still valid according to the spec.

"If a program object that is active for any shader stage is re-linked
unsuccessfully, the link status will be set to FALSE, but any existing
executables and associated state will remain part of the current
rendering state until a subsequent call to UseProgram,
UseProgramStages, or BindProgramPipeline removes them from use."

This change will also help avoid the double handing that happens in
_mesa_copy_linked_program_data().

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

st/mesa/i965: simplify gl_program references and stop leaking

In i965 we were calling _mesa_reference_program() after creating
gl_program and then later calling it again with NULL as a param
to get the refcount back down to 1. This changes things to not
use _mesa_reference_program() at all and just have gl_linked_shader
take ownership of gl_program since refcount starts at 1.

The st and ir_to_mesa linkers were worse as they were both getting
in a state were the refcount would never get to 0 and we would leak
the program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

anv/cmd_buffer: Enable stencil-only HZ clears

The HZ sequence modifies less state than the blorp path and requires
less CPU time to generate the necessary packets.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv/cmd_buffer: Manage Anv state around HZ op emission

Move the assignment to a less surprising location.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv/cmd_buffer: Clarify HZ rectangle behavior

This behavior differs from what's described in the PRMs and was
observed by analyzing CTS test results.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

mesa/fbobject: Update CubeMapFace when reusing textures

Framebuffer attachments can be specified through FramebufferTexture*
calls. Upon specifying a depth (or stencil) framebuffer attachment that
internally reuses a texture, the cube map face of the new attachment
would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X).
Fix this issue by actually updating the CubeMapFace field.

This bug manifested itself in BindFramebuffer calls performed on
framebuffers whose stencil attachments internally reused a depth
texture. When binding a framebuffer, we walk through the framebuffer's
attachments and update each one's corresponding gl_renderbuffer. Since
the framebuffer's depth and stencil attachments may share a
gl_renderbuffer and the walk visits the stencil attachment after
the depth attachment, the uninitialized CubeMapFace forced rendering
to TEXTURE_CUBE_MAP_POSITIVE_X.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

mesa: add NV_image_formats extension support

This extension can be enabled automatically as it is a subset of
ARB_shader_image_load_store.

v2: Replace helper function by qualifier struct field (Ilia)
Enable NV_image_formats using ARB_shader_image_load_store (Ilia)

v3: Drop extension field from gl_extensions (Ilia)
Release notes (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

mesa: fix old classic drivers to use ralloc for ARB asm programs

These changes were missed in 0ad69e6b5.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98767

st/mesa: silence warnings in optimized builds

Mark variables and static functions that only occur in assert()s as
MAYBE_UNUSED.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

radeonsi: emit sample locations also when nr_samples == 1

Since the state tracker now enables MSAA in the hardware for the case
nr_samples == 1 as well, we need to set sample locations correctly for
this case.

The Polaris override is still needed for the non-MSAA case (when
nr_samples == 0).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

radeonsi: allow sample mask export for single-sample framebuffers

This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

st/mesa: remove a redundant call to _mesa_is_multisample_enabled

We called it immediately prior, so re-use the previously returned value.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

mesa/main: consider multisampling enabled when number of samples == 1

There are some differences between how non-multisampled framebuffers (i.e.
samples == 0) and multisampled framebuffers with a single sample should be
treated. For example, alpha to coverage and writing to gl_SampleMask has an
effect with single-sample multisample framebuffers, but not on
non-multisample framebuffers.

This fixes GL45-CTS.sample_variables.mask.*.samples_1.* at least for
Gallium drivers (and possibly others, though at least radeonsi needs an
additional fix).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

i965: Delete fs_visitor::nir_setup_single_output_varying prototype.

I deleted this function in 59864e8e02057cc6fa0448a8af067a3cf53389da.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: fix empty program log length

In case we have empty log (""), we should return 0. This fixes
Khronos WebGL conformance test 'program-infolog'.

From OpenGL ES 3.1 (and OpenGL 4.5 Core) spec:
   "If pname is INFO_LOG_LENGTH , the length of the info log, including
    a null terminator, is returned. If there is no info log, zero is
    returned."

v2: apply same fix for get_shaderiv and _mesa_GetProgramPipelineiv (Ian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97321
Cc: "13.0" <mesa-stable@lists.freedesktop.org>

draw: finally optimize bool clip mask generation

lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here also make it a "real" 8bit boolean - cuts one instruction but
more importantly similar to ordinary booleans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: use vectorized calculations for fetch (v2)

Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar.
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
no longer the same element is fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to llvm not recognizing it's all
the same fetch, since it would have been possible some of the fetches
getting replaced with zeros in case vector size exceeds remaining fetch
count - the values of such fetches don't matter at all though).

Also, for elts gathering, use vectorized code as well.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).

v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments why the code is doing the seemingly
wrong stuff... Fortunately it's not actually more complex in the end...
Also change function order slightly just to make the diff more readable.

No piglit change. Passes some internal testing with another api too...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

i965/gen7: Minify blit size for stencil tree copy

Found by the piglit 'fbo-depth-array stencil-clear' test when
implementing blorp blit splitting for gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

mesa: Drop PATH_MAX usage.

GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary
limitation, so this failed to compile.  Apparently glibc does not
enforce PATH_MAX restrictions anyway, so it's kind of a hoax:

https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html

MSVC uses a different name (_MAX_PATH) as well, which is annoying.

We don't really need it.  We can simply asprintf() the filenames.
If the filename exceeds an OS path limit, presumably fopen() will
fail, and we already check that.  (We actually use ralloc_asprintf
because Mesa provides that everywhere, and it doesn't look like we've
provided an implementation of GNU's asprintf() for all platforms.)

Fixes the build on GNU/Hurd.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>