git.libre-soc.org Git - mesa.git/log

intel/batch_decoder: Fix dynamic state printing

Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address. Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/decoder: Print ISL formats for vertex elements

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/decoder: Clean up field iteration and fix sub-dword fields

First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
gen_decode_field in the one call-site.  Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit.  This is fixed by adding a
iter_start_field helper which sets the field and also sets up the other
bits we need.  This fixes decoding of 3DSTATE_SBE_SWIZ.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.

Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.

Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.

This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).

intel: decoder: unify MI_BB_START field naming

The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

docs: Update calendar, news, relnotes for 18.1.7

docs: Add mesa 18.1.7 notes

docs: Add mesa 18.1.7 docs

docs: update calendar 18.2.0-rc4 is out, extend to 18.2.0-rc5

Signed-off-by: Andres Gomez <agomez@igalia.com>

docs/relnotes: Mark NV_fragment_shader_interlock support in i965

Acked-by: Jason Ekstrand <jason@jlekstrand.net>

egl/drm: use gbm_dri_bo() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>

egl/drm: use gbm_dri_surface() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>

egl/drm: use gbm_dri_device() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>

egl/android: simplify device open/probe

Currently droid_probe_device, does not do any 'probing' but filtering
out a device if it doesn't match the vendor string given.

Rename the function, straighten the return type and call it only as
needed - an actual vendor string is provided.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

egl/android: remove drmVersion::name NULL check

The name string is guaranteed to be non-NULL.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

egl/android: remove droid_probe_driver()

The function name is misleading - it effectively checks if
loader_get_driver_for_fd fails. Which can happen only only on strdup
error - a close to impossible scenario.

Drop the function - we call the loader API at at later stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

egl/android: use strcmp with drmVersion::name

The name string is guaranteed to be NULL terminated. Drop the explicit
length check that comes with strncmp().

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

egl/android: use drmDevice instead of the manual /dev/dri iteration

Replace the manual handling of /dev/dri in favor of the drmDevice API.
The latter provides a consistent way of enumerating the devices,
providing device details as needed.

v2:
- Use ARRAY_SIZE (Frank)
- s/famour/favor/ typo (Frank)
- Make MAX_DRM_DEVICES a macro - fix vla errors (RobF)
- Remove left-over dev_path instance (RobF)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com> (v1)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

Revert "configure: allow building with python3"

This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html

Revert "travis: use python3 for the autoconf builds"

This reverts commit 855af9a5a209f061355513b92f3ba4576f48d091.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html

Revert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES"

This reverts commit 095515e16ca3cb2c9f1813b6602ee57ae28325a8.

This breaks KHR-GL46.map_buffer_alignment.functional on i965.

This code was apparently not reviewed and I don't know why we would
move from a driver configurable constant to a hardcoded value for all
drivers. This really looks like an accidental hack push.

Revert recent changes about not including compute in combined limits.

As far as I can tell, no one reviewed these changes, they made i965
assert fail on driver load, and I am not certain they are correct.
(Hopefully reverting these does not break radeonsi too badly...)

The uniform related changes seem fine and reasonable, but the texture
image units change is possibly incorrect.  According to the
OES_tessellation_shader spec issue 5:

   (5) How are aggregate shader limits computed?

    RESOLVED: Following the GL 4.4 model, but we restrict uniform
    buffer bindings to 12/stage instead of 14, this results in

        MAX_UNIFORM_BUFFER_BINDINGS = 72
            This is 12 bindings/stage * 6 shader stages, allowing a static
            partitioning of the bindings even though at most 5 stages can
            appear in a program object).
        MAX_COMBINED_UNIFORM_BLOCKS = 60
            This is 12 blocks/stage * 5 stages, since compute shaders can't
            be mixed with other stages.
        MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
            This is 16 textures/stage * 6 stages.

which definitely is including compute shaders in that last limit.
Not including compute shaders breaks the following test:
dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger

There was enough breakage that I figured we should just send this back
to the drawing board.

Revert "i965: don't include compute resources in "Combined" limits"
Revert "st/mesa: don't include compute resources in "Combined" limits"
Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"

This reverts commit b03dcb1e5f507c5950d0de053a6f76e6306ee71f.
This reverts commit cff290df4c09547cd2cb3b129ec59bdebdadba90.
This reverts commit 45f87a48f94148b484961f18a4f1ccf86f066b1c.

gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0

These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders making it difficult to detect when
intrinsics vanish.)

Luckily the signed ones are still there, I helped convincing llvm
removing them is a bad idea for now, since while the unsigned ones have
sort of agreed-upon simplest patterns to replace them with, this is not
the case for the signed ones, and they require _significantly_ more
complex patterns - to the point that the recognition is IMHO probably
unlikely to ever work reliably in practice (due to other optimizations
interfering). (Even for the relatively trivial unsigned patterns, llvm
already added test cases where recognition doesn't work, unsaturated
add followed by saturated add may produce atrocious code.)
Nevertheless, it seems there's a serious quest to squash all
cpu-specific intrinsics going on, so I'd expect patches to nuke them as
well to resurface.

Adapt the existing fallback code to match the simple patterns llvm uses
and hope for the best. I've verified with lp_test_blend that it does
produce the expected saturated assembly instructions. Though our
cmp/select build helpers don't use boolean masks, but it doesn't seem
to interfere with llvm's ability to recognize the pattern.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

st/mesa: expose KHR_texture_compression_astc_sliced_3d

This is ASTC 2D LDR allowing texture arrays and 3D, compressing each
slice as a separate 2D image. Tested by piglit. Trivial.

st/mesa: expose EXT_disjoint_timer_query

same cap as ARB_timer_query, no changes needed, tested by piglit

mesa: expose EXT_vertex_attrib_64bit

because the closed driver exposes it.
It's the same as the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: expose AMD_query_buffer_object

it's a subset of the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: expose AMD_multi_draw_indirect

because the closed driver exposes it.
This is equivalent to the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: expose AMD_gpu_shader_int64

because the closed driver exposes it.

It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: expose ARB_post_depth_coverage in the Compatibility profile

It only contains GLSL changes.

v2: allow the layout qualifier on GLSL <= 1.30

intel/nir: Enable nir_opt_find_array_copies

We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add an array copy optimization

This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for a OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/nir: Use nir_shrink_vec_array_vars

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add a array-of-vector variable shrinking pass

This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/nir: Use the new structure and array splitting passes

We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add an array splitting pass

This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.

This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays. If a back-end does implement arrays using scratch or indirects
on registers, having more smaller arrays is likely to have better memory
efficiency.

v2 (Jason Ekstrand):
- Better comments and naming (some from Caio)
- Rework to use one hash map instead of two

v2.1 (Jason Ekstrand):
- Fix a couple of bugs that were added in the rework including one
which basically prevented it from running

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add a structure splitting pass

This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir/types: Add array_or_matrix helpers

Reviewed-by: Thomas Helland<thomashelland90@gmail.com>

i965: don't include compute resources in "Combined" limits

The combined limits should only include shader stages that can be active
at the same time. We don't need to include compute.

See also cff290df4c09547cd2cb3b129ec59bdebdadba90 for st/mesa.

Unbreaks i965 from assert failing on driver load since Marek's
45f87a48f94148b484961f18a4f1ccf86f066b1c, which dropped the core
Mesa capabilities before adjusting driver limits down to match.

radeonsi: increase the maximum UBO size to 2 GB

Same as the closed driver.

This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
bug.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

radeonsi: bump MAX_GS_INVOCATIONS

same as the closed driver

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

gallium: add PIPE_CAP_MAX_GS_INVOCATIONS

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

tgsi/ureg: don't call tgsi_sanity when it's too slow

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

st/mesa: fix up uniform limits to be able to expose large UBOs

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

st/mesa: don't include compute resources in "Combined" limits

The combined limits should only include shader stages that can be active
at the same time.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

st/mesa: set ctx->Const.SubPixelBits

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

glsl: fix error checking against MAX_UNIFORM_LOCATIONS

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

mesa: make MaxCombinedUniformComponents 64-bit to allow large UBOs

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

mesa: add ctx->Const.MaxGeometryShaderInvocations

radeonsi wants to report a different value

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

mesa: don't include compute resources in MAX_COMBINED_* limits

5 is the maximum number of shader stages that can be used by 1 execution
call at the same time (e.g. a draw call). The limit ensures that each
stage can use all of its binding points.

Compute is separate and doesn't need the 5x multiplier.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES

same number as our closed GL driver

v2: don't use MaxArrayLockSize

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

mesa: remove incorrect change for EXT_disjoint_timer_query

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

glapi: actually implement GL_EXT_robustness for GLES

The extension was exposed but not the functions.

This fixes:
    dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.readn_pixels
    dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformfv
    dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformiv

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/decoder: Decode SFIXED values.

This lets us example SAMPLER_STATE's LOD Bias field, among other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

travis: use python3 for the autoconf builds

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

configure: allow building with python3

Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.

Note:
- python3.4 is used as it's the earliest supported version
- python3 chosen prior to python2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>

bin/git_sha1_gen.py: remove execute bit/shebang

The script is executed explicitly via the build system, that uses
PYTHON/prog_python and equivalent.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

vk/wsi: avoid reading uninitialised memory

It will be ignored by x11_swapchain_result() anyway (because reaching
the `fail` label without setting `result` means the swapchain status was
already a hard error), but the compiler still complains about reading
uninitialised memory.

While at it, drop the unused assignment right before returning.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

egl: drop unused _EGL_BUILT_IN_DRIVER_DRI2

Unused since b174a1ae720cb404738c "egl: Simplify the "driver" interface".

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>

radv/gfx9: implement coherent shaders for VK_ACCESS_SHADER_READ_BIT

Single-sample color and single-sample depth (not stencil)
are coherent with shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl

bin/install_megadrivers.py: Remove shebang and executable bit

Since the script is never executed directly, but launched by Meson as an
argument to the Python interpreter, those are not needed any more.

In addition, they are the reason this script was missed when I moved the
Meson buildsystem to Python 3, so removing them helps avoiding future
confusion.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: Run the install script with Python 3

The script was being run directly as an executable, and it has a
Python 2 shebang.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glsl: remove execute bit and shebang from python tests

Just like the rest of the tree - these should be run either as part of
the build system check target, or at the very least with an explicitly
versioned python executable.

Fixes: db8cd8e3677 ("glcpp/tests: Convert shell scripts to a python script")
Fixes: 97c28cb0823 ("glsl/tests: Convert optimization-test.sh to pure python")
Fixes: 3b52d292273 ("glsl/tests: reimplement warnings-test in python")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: update required mako version

The requirement was bumped a while back, but we forgot to update the
docs.

Fixes: ed871af91c2 ("configure.ac: raise Mako required version to
0.8.0")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

configure: use distutils in ax_check_python_mako_module

Handling the version comparison by hand is a bad idea. Python has a handy
module distutils for that - use it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

configure: enforce python 2.7 with AM_PATH_PYTHON

Currently we use AC_CHECK_PROGS looking for python2.7, python2 and
finally python. That is due to the varying names used across the
different OS.

Use the handy AM_PATH_PYTHON which finds the correct name and checks for
the version.

Note: python2.7 has been an unofficial requirement for quite some time.
Update the docs to reflect that.

Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

i965: Enable INTEL_shader_atomic_float_minmax on Gen9+

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965: Sort Gen9+ extension enables

This is a strictly alphabetic sort, as is done in extensions_table.h
There are other options. We should pick one and document it. Right
now, this file is chaos.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages

v2: Split changes to the message type field to another patch. Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/compiler: Expand untyped atomic message type field by a bit

This is necessary for a new Gen9 message type that will be added in the
next patch. There are also Gen8 message types that need the extra bit
(mostly for bindless).

v2: Split off from the next patch. Suggested by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/compiler: Silence unused parameter warnings

src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
__attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                             ^~~~~
src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
__attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                          ^~
src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
         unsigned _reg_file,
                  ^~~~~~~~~
src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
                                 unsigned dims, unsigned size,
                                                         ^~~~

v2: Update commit message.  brw_fs_generator.cpp warnings were already
fixed by another patch.  Noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add floating point atomic min, max, and compare-swap instrinsics

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add floating point atomic add instrinsics

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl: Add support for lowering shared-variable float atomics

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl: Add support for lowering SSBO float atomics

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl: Add built-in functions for INTEL_shader_atomic_float_minmax

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

mesa: Extension boilerplate for INTEL_shader_atomic_float_minmax

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

docs: Initial version of INTEL_shader_atomic_float_minmax spec

v2: Describe interactions with the capabilities added by
SPV_INTEL_shader_atomic_float_minmax

v3: Remove 64-bit float support.

v4: Explain NaN issues. Explain issues with atomicMin(-0, +0) and
atomicMax(-0, +0).

v5: Fix whitespace issues noticed by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl: Add built-in functions for NV_shader_atomic_float

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

mesa: Extension boilerplate for NV_shader_atomic_float

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

meson: fix egl build for android

Haven't tested this, but we do include loader.h
in platform_android.c

Fixes: c5ec1556859b7d33637c9fad13d3473c7b2f9eb3 ("meson: wire up egl/android")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

meson: fix egl build for surfaceless

Without this, I get:

> platform_surfaceless.c:38:10: fatal error: 'loader.h' file not found
> #include "loader.h"
> ^~~~~~~~~~
> 1 error generated.

Fixes: 108d257a16859898f5ce02f4759c5c58f9b8c050 ("meson: build libEGL")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
v2: Split up patches, modify commit message (Dylan)

nir: Give end_block its own index

Since there's no particular reason for the index to be 0, choose an
index that is not used by other block. This is convenient when we
store "per-block" data in an array AND look for the successors
data (e.g. any kind of backwards data-flow analysis).

v2: Add a note about end_block's index. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Skip common instructions when comparing deref paths

Deref paths may share the same deref instructions in their chains,
e.g.

    ssa_100 = deref_var A
    ssa_101 = deref_struct "array_field" of ssa_100
    ssa_102 = deref_array "[1]" of ssa_101
    ssa_103 = deref_struct "field_a" of ssa_102
    ssa_104 = deref_struct "field_a" of ssa_103

when comparing the two last deref instructions, their paths will share
a common sequence ssa_100, ssa_101, ssa_102.  This patch skips to next
iteration if the deref instructions are the same.  Path[0] (the var)
is still handled specially, so in the case above, only ssa_101 and
ssa_102 will be skipped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Export deref comparison functions

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

util/dynarray: add a clone function

v2: Fix mem_ctx parameter type. (Thomas)

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

amd/addrlib: Fix include path for c99_compat.h

Without this patch mesa doesn't compile:

In file included from ../mesa-9999/src/amd/addrlib/addrinterface.cpp:39:
../mesa-9999/src/util/macros.h:29:10: fatal error: c99_compat.h: No such file or directory
#include "c99_compat.h"
^~~~~~~~~~~~~~
compilation terminated.

Fixes: 15ca5ce99a80d9ebb5ef2b1aca6ea00784931de4
("amd/addrlib: mark returnCode as MAYBE_UNUSED in")
Signed-off-by: Mariusz Ceier <mceier+mesa-dev@gmail.com>
Acked-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

vulkan/wsi: fix pointer-integer conversion warnings

For 32bit build. Trivial.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: use different builtin shader cache for 32bit

Currently if 64bit and 32bit programs are used interchangeably, radv
will keep overwriting the cache. Use separate cache files to avoid
that.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: place pointer length into cache uuid

Thanks to reproducible builds, binary file timestamps may be identical
for both 32bit and 64bit packages when built from the same source.
This means radv will use the same cache for both 32 and 64 bit
processes, which leads to crashes.

Conveniently there is a spare byte in cache_uuid, let's place the
pointer size there.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
CC: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107601
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105904
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

llvmpipe: add cc clobber to inline asm

The bsr instruction modifies flags, so that needs to be indicated to the
compiler. No effect on generated code, but still needed for correctness.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

intel/isl: Avoid tiling some 16K-wide render targets

Fix rendering issues on BDW and SKL.

Fixes: 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3
("i965/miptree: Use the correct BLT pitch")

Fixes the following regressions seen

exclusively on SKL:
* KHR-GL46.texture_barrier_ARB.disjoint-texels
* KHR-GL46.texture_barrier_ARB.overlapping-texels
* KHR-GL46.texture_barrier.disjoint-texels
* KHR-GL46.texture_barrier.overlapping-texels

and both on BDW and SKL:
* GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners
* GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners

v2: Note the fixed tests (Andres).
Don't cause failures with multisampled buffers (Andres).
Don't hamper SKL GT4 (Ken).
v3: Fix the Fixes tag (Dylan).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107359
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/miptree: Fix can_blit_slice()

Check the destination's row pitch against the BLT engine's row pitch
limitation as well.

Fixes: 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3
("i965/miptree: Use the correct BLT pitch")

v2: Fix the Fixes tag (Dylan).
Check the destination row pitch (Chris).

Reported-by: Dylan Baker <dylan@pnwbakers.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/miptree: Use miptree_map in map_blit functions

This struct contains all the data of interest. can_blit_slice() will use
it in the next patch to calculate the correct pitch.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/tools/aubwrite: Always use physical addresses for traces.

It looks like we can't rely on the simulator to always translate virtual
addresses to physical ones correctly. So let's use physical everywhere.

Since our current GGTT maps virtual to physical addresses in a 1:1 way,
no further changes are required.

Additionally, we have other address spaces not in use right now. So
let's make it easier to switch which one we are using but putting the
default one into the aub_file struct.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools/aubwrite: Rename "legacy" to "Trace Block".

Hopefully it's a little more descriptive, and more accurate.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

nir/vars_to_ssa: Don't build deref nodes for non-local variables

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

ac: fix WAITCNT flags for GFX9

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

amd/addrlib: mark physicalSliceSize as MAYBE_UNUSED in Addr::V1::EgBasedLib::HwlGetSizeAdjustmentMicroTiled

Only used, when asserts are enabled.

Fixes an unused-but-set-variable warning with GCC 8:
../../../src/amd/addrlib/r800/egbaddrlib.cpp: In member function 'virtual long long unsigned int Addr::V1::EgBasedLib::HwlGetSizeAdjustmentMicroTiled(unsigned int, unsigned int, ADDR_SURFACE_FLAGS, unsigned int, unsigned int, unsigned int, unsigned int*, unsigned int*) const':
../../../src/amd/addrlib/r800/egbaddrlib.cpp:4111:13: warning: variable 'physicalSliceSize' set but not used [-Wunused-but-set-variable]
UINT_64 physicalSliceSize;
^~~~~~~~~~~~~~~~~

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>