git.libre-soc.org Git - mesa.git/log

nir: rename nir_link_constant_varyings() nir_link_opt_varyings()

The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st

This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

radeonsi: make use of ac_are_tessfactors_def_in_all_invocs()

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs()

The following patch will use this with the radeonsi NIR backend
but I've added it to ac so we can use it with RADV in future.

This is a NIR implementation of the tgsi function
tgsi_scan_tess_ctrl().

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: remove unrequired param in si_nir_scan_tess_ctrl()

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

tgsi/scan: correctly walk instructions in tgsi_scan_tess_ctrl()

The previous code used a do while loop and continues after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.

Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

tgsi/scan: fix loop exit point in tgsi_scan_tess_ctrl()

This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statment
and the loop.

The second bug is fixed in the following patch.

Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nv30: disable rendering to 3D textures

There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.

Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

radv: Do a cache flush if needed before reading predicates.

This caused random failures for two conditional rendering tests:

dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard

These wrote the predicate with the vertex shader, did a barrier and then
started the conditional rendering. However the cache flushes for the barrier
only happen on first draw, so after the predicate has been read.

Fixes: e45ba51ea45 "radv: add support for VK_EXT_conditional_rendering"
Reviewed-by: Dave Airlie <airlied@redhat.com>

anv/autotools: make sure tests link with -msse2

Without this, I get the following error when building the tests with
autotools on i686:

---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>

anv/meson: make sure tests link with -msse2

Without this, I get the following error when building the tests using
meson on i686:

---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
                 from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>

nv30: fix some s3tc layout issues

s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:

  - Don't set a uniform pitch for POT-sized compressed textures
  - Adjust define_rect API to be less confused about block sizes
  - Only mark a texture as linear if it has a uniform pitch set

This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv30: use correct helper to get blocks in y direction

This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv30: add support for multi-layer transfers

This logic mirrors what we do on nv50. The relatively new
texture_subdata callback can cause this to happen with 3D textures,
which is triggered at least by xonotic, and probably many piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv30: fix rare issue with fp unbinding not finding the bufctx

If the last-active context gets deleted, the pushbuf doesn't have a
bufctx to reference. Then there could be a sequence of binds which would
trigger a reset on that bin before validation was done. Instead we just
pass in the bufctx in question directly.

All other instances of PUSH_RESET happen strictly after a validation is
run.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv30: avoid setting user_priv without setting cur_ctx

The whole user_priv thing is a mess, but as long as it's there, it
basically has to map 1:1 to the cur_ctx. Unfortunately we were setting
user_priv to some context, then that context could get deleted without
any draws/validations in it, leading user_priv to become NULL, with
cur_ctx still pointing at some old context. Then we wouldn't run the
switch logic, which in turn led to a NULL bufctx being dereferenced.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

v3d: Add support for gl_HelperInvocation.

We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation. Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.

v3d: Add support for textureSize() on MSAA textures.

Fixes failures in
dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d
in the GLES3.1 suite.

v3d: Add support for requesting the sample offsets.

v3d: Add support for non-constant texture offsets.

Fixes
dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat
and others.

v3d: Force sampling from base level for tg4.

This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear

v3d: Add a note for a potential performance win on multop/umul24.

Noticed while debugging a testcase.

v3d: Dead-code eliminate unused flags updates.

The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.

total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)

v3d: Don't generate temps for comparisons.

This was just generated work for vir_opt_dead_code and cluttered up the
dumps.

v3d: Move "does this instruction have flags" from sched to generic helpers.

I wanted to reuse it for DCE of flags updates.

v3d: Drop incorrect dependency for flpop.

It is just shifting probably-means-flags bits out of a value, it doesn't
actually update the flags on its own.

v3d: Drop unused count_nir_instrs() helper.

This was for shader-db, but I haven't cared about NIR instruction counts
in a long time.

v3d: Hook up some shader-db output to GL_ARB_debug_output.

This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.

v3d: Add a "precompile" debug flag for shader-db.

I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.

v3d: Fix uniform pretty printing assertion failure with branches.

Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")

meson: Override C++ standard to gnu++11 when building with altivec on ppc64

Otherwise there will be symbol collisions for the vector name.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108943
Distro Bug: https://bugs.gentoo.org/673622
Fixes: 42ea0631f108d82554339530d6c88aa1b448af1e
("meson: build clover")
Acked-by: Matt Turner <mattst88@gmail.com>

intel/aub_viewer: highlight true booleans

Useful to spot PIPE_CONTROL flags.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/aub_viewer: fold binding/sampler table items

Makes things easier to read rather than a long block of text.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/aub_viewer: fix shader view

Not decoding the shader at the right offset.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/aub_viewer: print address of missing shader

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/aub_viewer: fixup 0x address prefix

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/aub_viewer: fix shader get_bo

Instruction addresses are always in ppgtt space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

radeonsi: Enable adaptive_sync by default for radeon

It's better to let most applications make use of adaptive sync
by default. Problematic applications can be placed on the blacklist
or the user can manually disable the feature.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property

The DDX driver can be notified of adaptive sync suitability by
flagging the application's window with the _VARIABLE_REFRESH property.

This property is set on the first swap the application performs
when adaptive_sync is set to true in the drirc.

It's performed here instead of when the loader is initialized for
two reasons:

(1) The window's drawable can be missing during loader init.
This can be observed during the Unigine Superposition benchmark.

(2) Adaptive sync will only be enabled closer to when the application
actually begins rendering.

If adaptive_sync is false then the _VARIABLE_REFRESH property
is deleted on loader init.

The property is only managed on the glx DRI3 backend for now. This
should cover most common applications and games on modern hardware.

Vulkan support can be implemented in a similar manner but would likely
require splitting the function out into a common helper function.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

drirc: Initial blacklist for adaptive sync

Applications that don't present at a predictable rate (ie. not games)
shouldn't have adapative sync enabled. This list covers some of the
common desktop compositors, web browsers and video players.

[ Michel Dänzer: Added entry for firefox-esr ]

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

util: Add adaptive_sync driconf option

This option lets the user decide whether mesa should notify the
window manager / DDX driver that the current application is adaptive
sync capable.

It's off by default.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

util: Get program name based on path when possible

Some programs start with the path and command line arguments in
argv[0] (program_invocation_name). Chromium is an example of
an application using mesa that does this.

This tries to query the real path for the symbolic link /proc/self/exe
to find the program name instead. It only uses the realpath if it
was a prefix of the invocation to avoid breaking wine programs.

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

etnaviv: Consolidate buffer references from framebuffers

We were leaking surfaces because the references taken in
etna_set_framebuffer_state weren't being released on context destroy.

Instead of just directly releasing those references in
etna_context_destroy, use the util_copy_framebuffer_state helper.

Take the chance to remove the duplicated buffer references in
compiled_framebuffer_state to avoid confusion.

The leak can be reproduced with a client that continuously creates and
destroys contexts.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

virgl/vtest: fix front buffer flush with protocol version 0.

Older versions of virglrenderer before 33da7361aec486290df0aec4ad8dfa8ff6adde2c
in vtest mode, misrender gears.

Fixes: 9d81cd8e7c (virgl: Pass resource size and transfer offsets)
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

docs/autoconf: Mark autoconf as being replaced

I know it's not what anyone wants, but how about we start with a
message in the documentation that encourages people to try meson.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>

docs/install: Update python dependency section

Note that meson requires python 3, scons requires python 2, and
autotools works with either.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>

docs/meson: Update LLVM section with information about native files

Reviewed-by: Eric Engeström <eric@engestrom.ch>

docs/install: Add meson to the main install page

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>

docs: update calendar, add news item and link release notes for 18.2.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

docs: add sha256 checksums for 18.2.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 24c31bc0e237148b1c44811b17c61fc71f09bd93)

docs: add release notes for 18.2.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 785e09e3b32980380eb2081eeb48c157306f99ba)

nv50,nvc0: add missing CAPs for unsupported features

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: enable GL_NV_shader_atomic_float on pre-Maxwell

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv50/ir: add support for converting ATOMFADD to proper ir

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

st/mesa: expose GL_NV_shader_atomic_float when ATOMFADD is supported

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: select ATOMFADD when source type is float

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: add PIPE_CAP_TGSI_ATOMFADD to indicate support

ATOMFADD is a little special -- make drivers have to specify it
explicitly.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

tgsi: add ATOMFADD operation

This is supported by at least NVIDIA hardware, and exposeable via GL
extensions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: allow glDrawElements to work with GL_SELECT feedback

Not sure if this ever worked, but the current logic for setting the
min/max index is definitely wrong for indexed draws. While we're at it,
bring in all the usual logic from the non-indirect drawing path.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109086
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallium/ttn: Fix setup of outputs_written.

We need a 64-bit value, otherwise we only handle the low 32, and happen to
sign-extend to claim to write all varying slots if VARYING_SLOT_VAR2 was
used.

Fixes: 4d0b2c7aaac3 ("ttn: Update shader->info as we generate code.")
Reviewed-by: Rob Clark <robdclark@gmail.com>

anv: don't do partial resolve on layer > 0

We've made the choice not to use fast clears on layer > 0 with
multilayer images. This is partly because we would need to store
multiple clear colors for each layer, making the existing memory
layout, already including aux surfaces, fast clear color, image state,
etc... even more complex.

Partial resolves are the operations transfering the clear colors into
the auxiliary buffers. This operation is currently implemented in
Blorp by loading the clear color from the image's BO, into a shader
that then samples from the auxiliary buffer and writes the color only
if it isn't there already.

The problem here is that because we store only one clear color for all
layers and it is used for partial resolves. If you trigger a partial
clear on a layer > 0, then you're likely to deal with a color that is
not what you actually want. In the particular issues below, we have
multiple layers, each cleared with a different color but the partial
resolve just writes the wrong color into the auxiliary buffers for
layers > 0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org

st/nine: Increase the limit of cached ff shaders

100 is too small for some games, which triggers recompilations
every frame. Increase to 1024.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

st/nine: Add src reference to nine_context_range_upload

Just like nine_context_box_upload, nine_context_range_upload
should reference the src, which holds the ram source buffer.

Fixes: https://github.com/iXit/Mesa-3D/issues/327
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Cc: mesa-stable@lists.freedesktop.org

st/nine: Bind src not dst in nine_context_box_upload

nine_context_box_upload uploads a ram buffer (from src)
to a pipe_resource (dst).
We already have a refcount on the pipe_resource,
what needs to be protected from release is the ram buffer,
thus a reference to src.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Cc: mesa-stable@lists.freedesktop.org

st/nine: Fix volumetexture dtor on ctor failure

The dtor is called on allocation failure,
thus we must check the volumes are allocated
before trying to release them.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Cc: mesa-stable@lists.freedesktop.org

st/nine: Switch to presentation buffer if resize is detected

This enables to match the window size
on resize on all cases, as it only works
currently with presentation buffers.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

st/nine: Use helper to release swapchain buffers later

This patch introduces a structure to release the
present_handles only when they are fully released
by the server, thus making
"DestroyD3DWindowBuffer" actually release the buffer
right away when called.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

freedreno/a6xx: fix 3d texture layout

Maybe not 100% perfect, but seems to be a pretty good approximation of
that.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: improve setup_slices() debug msgs

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: simplify special case for 3d layout

This logic can be re-written as the two cases for 3d (ie. before/after
the miplevel sizes start reducing) vs everything else. I think it is
easier to read this way.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno: combine fd_resource_layer_offset()/fd_resource_offset()

We really only need this logic in one place.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: don't treat all inputs/outputs as vec4

This was a hold-over from the early TGSI days, and mostly not needed
with NIR. This avoids burning an entire 4 consecutive scalar regs
for vec3 outputs, for example. Which fixes a few places that we were
doing worse that we should on register usage.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: fix fallout of extra assert

Fixes the following crash that happened after d6110d4d

The problem happens if we first compile a "vanilla" shader with nothing
lowered in NIR, which perform the final lowering passes on so->shader->
nir (including nir_lower_locals_to_regs()), and then later we have
compile a shader with some lowering. The second time through we would
have already done nir_lower_locals_to_regs().

Arguably this was already a bug, just one we hadn't noticed yet.

Fixes: d6110d4d547 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
Signed-off-by: Rob Clark <robdclark@gmail.com>

st/nir: Drop unused gl_program parameter in VS input handling helper.

Nobody uses this, so let's drop it. This makes the helper callable
from places without a gl_program.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/nir: Gather info after applying lowering FS variant features

DrawPixels lowering, for example, adds new varyings that need to be
accounted for in inputs_read. The earlier info gathering at link time
cannot account for this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: Combine the DrawPixels and Bitmap passthrough VS programs.

They're now identical, so we can just compile it once.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: Don't open code the drawpixels vertex shader.

Now that we always copy color, we can just use the util function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: Drop !passColor optimization in drawpixels shaders.

The glDrawPixels passthrough vertex shader copies position and texcoord
vertex attributes to varying outputs.  It also optionally copies a third
gl_Color attribute, which sometimes is unnecessary.  Until now, we've
compiled separate variants of the shader, one of which does this extra
copy, and the other of which doesn't.  We have done this since 2007.

But, the vertex shader runs for a whopping four vertices, and so the
cost of a copying a single input to output is likely inconsequential.
In theory, we could bind one fewer vertex element - but we always bind
all three regardless.  So, we don't even get that savings.

This patch unifies the two, so we always copy the optional color,
and save having to compile the variant.  It also makes the VS input
interface match up with the vertex element state without any dead
(unused) input attributes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: Drop dead 'passthrough_fs' field.

Dead since 2015 (commit 5142564734bd68f165b02e29e384ebbcf91cce38).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radv: Fix wrongly positioned paren.

Trivial.

Fixes: 9f0bfbed11f "radv: Work around non-renderable 128bpp compressed 3d textures on GFX9."

docs: add note about using backticks for rbs in gitlab

So that gitlab will render the < and > correctly allowing the tag to be
copy-n-pasted without additional formatting.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

pci_ids: add new VegaM pci id

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org

gallivm: abort when trying to use non-existing intrinsic

Whenever llvm removes an intrinsic (we're using), we're hitting segfaults
due to llvm doing calls to address 0 in the jitted code instead.
However, Jose figured out we can actually detect this with
LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
what got broken. (Of course, someone still needs to fix the code to
no longer use this intrinsic.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallivm: don't use pavg.b intrinsic on llvm >= 6.0

This intrinsic disppeared with llvm 6.0, using it ends up in segfaults
(due to llvm issuing call to NULL address in the jited shaders).
Add code doing the same thing as the autoupgrade code in llvm so it
can be matched and replaced back with a pavgb.

While here, also improve lp_test_format, so it tests both with and without
cache (as it was, it tested the cache versions only, whereas cache is
actually disabled in llvmpipe, and in any case even with it enabled
vertex and geometry shaders wouldn't use it). (Although at least for
the unorm8 uncached fetch, the code is still quite different to what
llvmpipe is using, since that would use unorm8x16 type, whereas
the test code is using unorm8x4 type, hence disabling some intrinsic
paths.)

Fixes: 6f4083143bb8 ("gallivm: use llvm jit code for decoding s3tc")
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>

travis: meson: port gallium build combinations over

This commit adds a number of build combinations:

- Gallium Drivers {SWR, RadeonSI, Others)
Each one has different LLVM requirements. Building SWR alone is twice
as slow as all other drivers combined.

- Gallium ST Clover LLVM {5,6,7}
Because C++ API changes all the time. Analogous to above building
Clover takes as much time as building all other ST combined.

- Gallium ST Others
Nouveau is used, instead of i915g since meson has explicit target
tracking. Meaning that a configure error is thrown if we use i915g
with say va, vdpau or others.

Note: LLVM prior to 5.0 is intentionally dropped. If needed we can add
that later.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: meson: add explicit handling to gallium ST

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: meson: explicitly control the DRI loaders

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: meson: add unwind handling

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

travis: meson: use FOO_DRIVERS directly

It makes for a shorter MESON_OPTIONS and cleaner handling.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: meson: enable unit tests

v2: [Emil] pass the argument directly to meson

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

travis: Don't try to read libdrm out of configure.ac

Since we're going to delete it shortly

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

travis: meson: use native files to override llvm-config

This is the supported way to do this, and should be more robust and
reliable.

v2: [Emil]
- enable backslash escapes
- don't hardcode the path
- pass the argument directly to meson

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: printout llvm-config --version

Provides quick and easy feedback.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>

travis: meson: print the configured state

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

travis: flip to distro xenial, drop sudo false

The latter is the default these days and Travis will be removing sudo
soonish.

Flipping to xenial, allows us to remove a bunch of hacks we have. Plus
it prevents us from adding new ones, to workaround what seems like a
gcc/binutils bug. For example (from the upcoming meson build):

FAILED: ccache c++ -o src/gallium/targets/pipe-loader/pipe_r600.so ...
... src/util/libmesa_util.a ... /usr/lib/x86_64-linux-gnu/libz.so ...

src/util/libmesa_util.a(disk_cache.c.o): In function `deflate_and_write_to_disk':
_build/../src/util/disk_cache.c:746: undefined reference to `deflateInit_'
_build/../src/util/disk_cache.c:765: undefined reference to `deflate'
...

As we can see, even though libz.so is explicitly passed after the
object that requires it - the linker still fails to see the symbols.
Avoid all those situations - flip the switch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

configure: add CXX11_CXXFLAGS to LLVM_CXXFLAGS

Seemingly with LLVM7 and GCC 5.0, the former won't properly advertise
-std=c++11 and the latter will choke.

dd this temporary workaround, otherwise we'll get errors like:

In file included from /usr/include/c++/5/type_traits:35:0,
                 from /usr/lib/llvm-7/include/llvm/Support/type_traits.h:18,
                 from /usr/lib/llvm-7/include/llvm/ADT/Optional.h:22,
                 from /usr/lib/llvm-7/include/llvm/ADT/STLExtras.h:20,
                 from /usr/lib/llvm-7/include/llvm/ADT/StringRef.h:13,
                 from /usr/lib/llvm-7/include/llvm/Target/TargetMachine.h:17,
                 from ../../../src/amd/common/ac_llvm_helper.cpp:36:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

glx/test: meson: assorted include fixes

Swap '..' with the symbolic inc_glx and add glproto as dependency. That
will pull the correct include, effectively fixing the tests on macOS.

Fixes: a47c525f328 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

glx: meson: wire up the dispatch-index-check test

Accidentally dropped with earlier commit.!

Fixes: 4ccb9816737 ("meson: Use consistent style for tests")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

glx: meson: drop includes from a link-only library

When producing the final libGL.so/libGLX_mesa.so we only link the local
static helper lib (libglx). Thus there's no reason for the includes.

Fixes: a47c525f328 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>