git.libre-soc.org Git - mesa.git/log

genxml: add a sorting script

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

bin: drop unused import from install_megadrivers.py

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

anv: advertise 8 subtexel/mipmap precision bits

So far ANV was advertising 4 bits for both subTexelPrecisionBits and
mipmapPrecisionBits. But these values were not actually verified.

But it seems the right value is actually 8 bits for both cases.

Unfortunately Intel PRM does not clarify how many bits the hardware use.
For the mipmap case, there is the following reference in PRM Volume 6
(3D Media GPGPU), specifically in LOD Computation Pseudocode:

```
Bias: S4.8
MinLod: U4.8
MaxLod: U4.8
Base: U4.1
MIPCnt: U4
SurfMinLod: U4.8
ResMinLod: U4.8
``

We have other clues, though:

- On one side, dEQP-VK.texture.explicit_lod.* tests fail when using 4
bits, but work when using 8 bits. These tests try to mimic the expected
behaviour as much real as possible, and they use the reported
subTexelPrecisionBits and mipmapPrecisionBits reported to get this.

- On the other side, the equivalent driver for Windows is reporting 8
bits for both elements. Not sure if they got to verify it from the PRM
or from a diffent source.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

st/va: reverse qt matrix back to its original order

The quantiser matrix that VAAPI provides has been applied with inverse z-scan.
However, what we expect in MPEG2 picture description is the original order.
Therefore, we need to reverse it back to its original order.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110257
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

glsl/linker: location aliasing requires types to have the same width

From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):

  " Further, when location aliasing, the aliases sharing the location
    must have the same underlying numerical type and bit
    width (floating-point or integer, 32-bit versus 64-bit, etc.) and
    the same auxiliary storage and interpolation qualification."

Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.

Fixes the following piglit test:

tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test

v2:
  - Do not assert if we see invalid numerical types. These come
    straight from shader code, so we should produce linker errors if
    shaders attempt to do location aliasing on variables that are not
    numerical such as records.
  - While we are at it, improve error reporting for the case of
    numerical type mismatch to include the shader stage.

v3:
  - Allow location aliasing of images and samplers. If we get these
    it means bindless support is active and they should be handled
    as 64-bit integers (Ilia)
  - Make sure we produce link errors for any non-numerical type
    for which we attempt location aliasing, not just structs.

v4:
  - Rebased with minor fixes (Andres).
  - Added fixing tag to the commit log (Andres).

v5:
  - Remove the helper function and check individually for the
    underlying numerical type and bit width (Timothy).
  - Implicitly, assume that any non-treated type which is checked for
    its underlying numerical type is either integer or
    float and has a defined bit width (Timothy).
  - Implicitly, assume that structs are the only non-treated
    non-numerical type (Timothy).
  - Improve the linker error descriptions and commit log (Andres).

Fixes: 13652e7516a ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

softpipe: Enable PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT

The offset alignment must be set to s16 because the tile cache is
implemented to require this.

This enables ARB_buffer_texture_range and OES_texture_buffer for
softpipe. The according deqp-gles31 tests pass.

Also update the feature table.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

softpipe: Add an extra code path for the buffer texel lookup

With buffers the addressing is done on a per-byte bases so the code
path for normal textures doesn't work properly. Also add an assert
to make sure that the bit cound for storing the X coordinate is
large enough.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

softpipe: raise number of bits used for X coordinate texture lookup

With buffers the addressing is done on a per byte basis and we with
a maximal block size of 16 byte we have to take into acount four more
bits. For simplicity just remove the TEX_TILE_SIZE_LOG2, which is 5 bit.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

softpipe: Don't use mag filter for gather op

For the gather op no magnifictaion filter is provided, so always use
the filter given for minification (which is the linear filter)

Fixes: 0dff1533f25951adda3c36be6d9efa944741befb
softpipe: Use mag texture filter also for clamped lod == 0

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

nir: Get rid of global registers

We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Get rid of nir_register::is_packed

All we ever do is initialize it to zero, clone it, print it, and
validate it. No one ever sets or uses it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

virgl: add support for ARB_indirect_parameters

The protocol changes are already in place for it.

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

virgl: add support for ARB_multi_draw_indirect

This will pass the multi draw through to the host if it has
support for it instead of using the st to emulate it

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

virgl: add support for missing command buffer binding.

When I added indirect support I forgot this, however to use it
now we need to check for a new enough capability on the host side.

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

docs: Add NV_compute_shader_derivatives to 19.1.0 relnotes

anv: Implement VK_NV_compute_shader_derivatives

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: Add support for DerivativeGroup capabilities

As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).

Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

iris: Enable NV_compute_shader_derivatives

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

gallium: Add PIPE_CAP_COMPUTE_SHADER_DERIVATIVES

To enable NV_compute_shader_derivatives, which allows derivatives (and
texture lookups with implicit derivatives) in compute shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Advertise NV_compute_shader_derivatives

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

intel/fs: Use NIR_PASS_V when lowering CS intrinsics

This will make that step visible in NIR_PRINT=1.

v2: Also use the macro for the cleanup passes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/fs: Don't loop when lowering CS intrinsics

This was needed when certain intrinsics were lowered to other ones
that were defined by the same pass. After 060817b2 "intel,nir: Move
gl_LocalInvocationID lowering to nir_lower_system_values" we don't
need the loop anymore.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/fs: Add support for CS to group invocations in quads

When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the two next in the "current" row and two
next in the "next row".  A side effect is that a thread will execute
the indices in a different order.

We now perform the lowering of both local invocation ID and index
together -- and don't rely anymore on lowering done by
nir_lower_system_values.  That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.

When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.

This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.

v2: Take subgroup_id into account, otherwise only values in the first
    subgroup would be used. (Jason)

v3: Calculate invocation index and ID together, to avoid duplicating
    some math in the quads case when both index and ID are used. (Jason)

v4: Don't call cleanup passes as part of the lowering, let that to the
    call site. (Jason)
    Change calculation to use less instructions. (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/fs: Use TEX_LOGICAL whenever implicit lod is supported

Make sure we include compute shaders that have a derivative group
defined.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Don't set LOD=0 for compute shader that has derivative group

When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.

Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir/algebraic: Lower CS derivatives to zero when no group defined

In compute shaders if no derivative group is defined, the derivatives
will always be zero. Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules. (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

glsl: Parse and propagate derivative_group to shader_info

NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan. So parse and propagate
those up to shader_info.h.

v2: Do not fail when ARB_compute_variable_group_size is being used,
since we are still clarifying what is the right thing to do here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl: Enable texture builtins for NV_compute_shader_derivatives

Renamed a few predicates from "fs_only" to be "derivative_only" (or
similar pairs).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl: Enable derivative builtins for NV_compute_shader_derivatives

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl: Remove redundant conditions when asserting in_qualifier

As the code evolved, we ended up with a redundant conditions. Clean
this up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Extension boilerplate for NV_compute_shader_derivatives

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

nir/radv: remove restrictions on opt_if_loop_last_continue()

When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After: 80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

softpipe: add support for vertex streams (v2)

This enables the ARB_gpu_shader5 vertex streams on softpipe.

v2: only enable when not using llvm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

draw: add support to tgsi paths for geometry streams. (v2)

This hooks up the geometry shader processing to the TGSI
support added in the previous commits.

It doesn't change the llvm interface other than to
keep things building.

v2: fix some regressions caused by primitiveoffsets

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

softpipe: add support for indexed queries.

We need indexed queries to retrieve the geom shader info.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

tgsi: add support for geometry shader streams.

This adds support to retrieve the primitive counts
for each stream, along with the offset for each
primitive into the output array.

It also adds support for parsing the stream argument
to the emit and end instructions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

draw: add stream member to stats callback

This just adds space for the member to the callback, doesn't
change anything else.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

vulkan/wsi: make wl_drm optional

When wl_drm is missing and the driver supports modifiers, use
zwp_linux_dmabuf_v1 for the list of supported formats and for buffer
creation.

Limit the supported formats to those with modifiers, which are
WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

vulkan/wsi: add wsi_wl_display_dmabuf

Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

vulkan/wsi: add wsi_wl_display_drm

Add wsi_wl_display_drm for wl_drm-related states. We will move
formats into the struct in a later commit.

Remove the unnecessary check for wl_registry_bind failures.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

vulkan/wsi: refactor drm_handle_format

Refactor the swtich statement in drm_handle_format out to
wsi_wl_display_add_wl_format.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

vulkan/wsi: create wl_drm wrapper as needed

When modifiers are specified, we have to use dmabuf rather than
wl_drm. We don't need the wrapper in that case.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

vulkan/wsi: move modifier array into wsi_wl_swapchain

This avoids repeated checks for each wsi_wl_image.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

drisw: Try harder to probe whether MIT-SHM works

XQueryExtension merely tells you whether the extension exists, it
doesn't tell you whether you're local enough for it to work.
XShmQueryVersion is not enough to discover this either, you need to
provoke the server to do actual work, and if it thinks you're remote it
will throw BadRequest at you. So send an invalid ShmDetach and use the
error code to distinguish local from remote.

[airlied: fixed bug not resetting xshm_error to 0 on success,
which made later stuff fail completely.]

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>

nir/search: Search for all combinations of commutative ops

Consider the following search expression and NIR sequence:

    ('iadd', ('imul', a, b), b)

    ssa_2 = imul ssa_0, ssa_1
    ssa_3 = iadd ssa_2, ssa_0

The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success.  In the above
example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.

This commit instead counts the number of commutative ops in each
expression and assigns an index to each.  It then does a loop and loops
over the full combinatorial matrix of commutative operations.  In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations).  There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
    instructions in affected programs: 1797123 -> 1789467 (-0.43%)
    helped: 6751
    HURT: 2264

    total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
    cycles in affected programs: 15931005 -> 15786914 (-0.90%)
    helped: 6024
    HURT: 3436

    total loops in shared programs: 4360 -> 4360 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 23675 -> 23666 (-0.04%)
    spills in affected programs: 235 -> 226 (-3.83%)
    helped: 5
    HURT: 1

    total fills in shared programs: 32040 -> 32032 (-0.02%)
    fills in affected programs: 190 -> 182 (-4.21%)
    helped: 6
    HURT: 2

    LOST:   18
    GAINED: 5

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>

intel: add dependency on genxml generated files

Drivers using genxml will start compilation before generated files are
created, so add a dependency to it.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Cc: mesa-stable@lists.freedesktop.org

radeonsi: fix a crash when unbinding sampler states

Acked-by: James Zhu <James.Zhu@amd.com>

radv: fix getting the vertex strides if the bindings aren't contiguous

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349
Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

anv: implement VK_KHR_swapchain revision 70

This revision allows for images to be :

   - created by reusing image parameters from swapchain

   - bound to memory from a swapchain

v2: Add color attachment flag
    Use same implicit WSI parameters (tiling, samples, usage)

v3: Fix missing break in vk_foreach_struct_const() switch (Lionel)

v4: Fix accessing image aspects before android resolve (Tapani)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

vk/util: remove unneeded array index

This is an array of 1, so [0] is the only content, and meson already
flattens the list so this is unnecessary.
Also, all the other uses of vk_api_xml don't do that.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

ac/nir: fix intrinsic names for atomic operations with LLVM 9+

This fixes the following LLVM error when using RADV_DEBUG=checkir:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32
i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add

The cmpswap operation still uses the old intrinsic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

panfrost: Remove "mali_unknown6" nonsense

This structure was used maaaany moons ago as a placeholder for the
varying meta (now unified with mali_attr_meta and essentially fully
decoded). I don't know why it's still in the file. Let's wack it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/midgard: Enable lower_find_lsb

This is exactly what the blob does.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/midgard: Add ibitcount8 op

The mechanics of this opcode are a little opaque, but essentially, it's
used in 8-bit mode to do a bit count in parallel of a uint and then
doing a ton of clever iadd/imov ops to recombine.

v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward
typo!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/midgard: Add ilzcnt op

Used for implementing findLSB/MSB

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/midgard: Add umin/umax opcodes

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: Add tilebuffer load? branch

Also document branches better.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/decode: Add flags for tilebuffer readback

These flags are set when reading back the tilebuffer from a fragment
shader via various mechanisms (including ARM_shader_framebuffer_fetch
and EXT_pixel_local_storage).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost/midgard: use nir_src_is_const and nir_src_as_uint

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

gallium/util: Add const to u_range_intersect

This doesn't modify the range, so it can accept a const pointer.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gallium/hud: add CPU usage support for FreeBSD

Reviewed-by: Eric Engestrom <eric@engestrom.ch>

iris: Silence unused variable warnings in release mode

nir/algebraic: Add some logical OR and AND patterns

The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/algebraic: Drop some @bool specifiers

Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

util: clean the 24-bit unused field to avoid an issues

This is a field of FLOAT_32_UNSIGNED_INT_24_8_REV texture pixel.
OpenGL spec "8.4.4.2 Special Interpretations" is saying:
"the second word contains a packed 24-bit unused field,
followed by an 8-bit index"
The spec doesn't require us to clear this unused field
however it make sense to do it to avoid some
undefined behavior in some apps.

Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110305
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>

nir: Take if_uses into account when repairing SSA

If a def is used as an condition before its definition, we should also
consider this a case to repair. When repairing, make sure we rewrite
any if conditions too.

Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch. We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ede4b "nir: Add a pass to repair SSA form"

tegra: fix the build after the set_shader_buffers change

gallium/auxiliary/vl: Add barrier/unbind after compute shader launch.

Add memory barrier sync for multiple launch cases, and unbind completed
resources after launch.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/auxiliary/vl: Fixed blank issue with compute shader

Multiple init buffer within one open instance will cause blank issue.
Updating viewport per frame will fix this issue.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: Bruno Milreu <bmilreu@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/auxiliary/vl: Fixed blur issue with weave compute shader

Correct wrong interpolatation with top/bottom row which caused blur issue.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: Bruno Milreu <bmilreu@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

docs: update calendar, add news item and link release notes for 18.3.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

docs: add sha256 checksums for 18.3.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit eb9da68cbf23aafb1192beed084b2f05df65dd04)

docs: add release notes for 18.3.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b03f51c4b4dfa54775e866b75f68a41862c062c2)

nir: do not pack varying with different types

The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

softpipe: Use mag texture filter also for clamped lod == 0

Follow the spec when selecting the magnification filter (OpenGL 4.5,
section 8.14):

  If λ(x, y) is less than or equal to the constant c (see section 8.15)
  the texture is said to be magnified;

While we're here also silence a potential warning about implicit float
to double conversion.

v2: Update commit message to contain a reference to the spec as pointed
    out by Eric.

Fixes a number of dEQP GLES2 and GLES3 test out of:
dEQP-GLES2.functional.texture.filtering.*
dEQP-GLES2.functional.texture.vertex.2d.filtering.*
dEQP-GLES3.functional.texture.vertex.*.filtering.*
dEQP-GLES3.functional.texture.filtering.*
dEQP-GLES3.functional.texture.shadow.2d.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

iris: handle aux properly in iris_resource_get_handle

Disable aux when resource seen the first time and EXPLICIT_FLUSH
not being set. This fixes issues seen when launching Xorg and
CCS_E getting utilized.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

v3d: Add some more new packets for V3D 4.x.

The T/G shader references and common state will be needed for GLES 3.2.

v3d: Don't try to use the TFU blit path if a scissor is enabled.

We'll need to do a render-based blit for scissors, since the TFU (as seen
in this conditional) can only update a whole surface.

Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
Fixes piglit fbo-scissor-blit.

v3d: Bump the maximum texture size to 4k for V3D 4.x.

4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in
the CTS at 8k and 16k. 4k at least lets us get one 4k display working.

Cc: mesa-stable@lists.freedesktop.org

v3d: Add support for handling OOM signals from the simulator.

I have v3d allocating enough initial allocation memory that we've been
passing tests without it, but to match kernel behavior more it would be
good to actually exercise the OOM path.

mesa/main: Fix multisample texture initialize

Sampler of Multisample textures wasn't initialized correct. So when
texture object created as multisample its sampler is initialized in a
individual case. We change the initial state of TEXTURE_MIN_FILTER and
TEXTURE_MAG_FILTER to NEAREST.
These changes are approved by KhronosGroup.
https://github.com/KhronosGroup/OpenGL-API/issues/45

Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109057

glsl: Fix input/output structure matching across shader stages

Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says:

    "Variables or block members declared as structures are considered
     to match in type if and only if structure members match in name,
     type, qualification, and declaration order."

Fixes:
     * layout-location-struct.shader_test

v2: rebased against master and small fixes

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250

ddebug: add compute functions to help hang detection

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

iris: avoid use after free in shader destruction

While playing with compute shaders, I was getting a random crash,
noticed that bind_state was using the old shader info for comparision,
but gallium allows the shader to be deleted while bound, so this could
lead to a use after free.

This can't happen using the cso cache. As it tracks all of this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

radeonsi: set exact shader buffer read/write usage in CS

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

glsl: remember which SSBOs are not read-only and pass it to gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

gallium: add writable_bitmask parameter into set_shader_buffers

to indicate write usage per buffer.
This is just a hint (it will be used by radeonsi).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

iris: Fix assert when using vertex attrib without buffer binding

The GL 4.5 spec says:
"If any enabled array’s buffer binding is zero when DrawArrays or
one of the other drawing commands defined in section 10.4 is called,
the result is undefined."

The result is undefined but it should not crash.

Fixes: gl-3.1-vao-broken-attrib
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

iris: move iris_flush_resource so we can call it from get_handle

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

iris: Save/restore MI_PREDICATE_RESULT, not MI_PREDICATE_DATA.

MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE
command's calculations - it holds the result of the subtraction when
the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual
result of the predication is MI_PREDICATE_RESULT, which is what we
want to copy from the render context to the compute context.

util/process: document memory leak

We consider it acceptable, but let's still document it in case people
notice it and are not sure why it's there.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

simplify LLVM version string printing

Figure it out once in the build system, then just use that all over the place.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/u_dump: util_dump_sampler_view: Dump u.tex.first_level

Dump u.tex.first_level instead of dumping u.tex.last_level twice.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: ddebug: Add missing fence related wrappers

Without that `GALLIUM_DDEBUG=always kmscube -A` would segfault like

  #0  0x0000000000000000 in  ()
  #1  0x0000ffffa72a3c54 in dri2_get_fence_fd (_screen=0xaaaaed4f2090, _fence=0xaaaaed9ef880) at ../src/gallium/state_trackers/dri/dri_helpers.c:140
  #2  0x0000ffffa8744824 in dri2_dup_native_fence_fd (drv=0xaaaaed5010c0, disp=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/drivers/dri2/egl_dri2.c:3050
  #3  0x0000ffffa87339b8 in eglDupNativeFenceFDANDROID (dpy=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/main/eglapi.c:2107
  #4  0x0000aaaabd29ca90 in  ()
  #5  0x0000aaaabd401000 in  ()

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>

st/mesa: Fix GL_MAP_COLOR with glDrawPixels GL_COLOR_INDEX

Documentation for glDrawPixels with GL_COLOR_INDEX says:
"If the GL is in color index mode, and if GL_MAP_COLOR is true,
the index is replaced with the value that it references in
lookup table GL_PIXEL_MAP_I_TO_I"

We are always in RGBA mode and there is nothing in documentation
about GL_MAP_COLOR in RGBA mode for GL_COLOR_INDEX.

Scale and bias are also only applicable for RGBA format and not
mentioned for GL_COLOR_INDEX.

Thus the behaviour will be on par with i965.

Fixes: gl-1.0-drawpixels-color-index
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

gallium/hud: fix rounding error in nic bps computation

While at it, fix typo in "rounding error" :P

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/hud: prevent buffer overflow

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/hud: fix memory leaks

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>