git.libre-soc.org Git - mesa.git/log

docs: update calendar to extended the 18.1 cycle by one more release

Due to having 2 additional RCs for 18.2.

Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Juan A. Suarez <jasuarez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>

intel: Introducing Amber Lake platform

Amber Lake uses the same gen graphics as Kaby Lake, including a id
that were previously marked as reserved on Kaby Lake, but that
now is moved to AML page.

This follows the ids and approach used on kernel's commit
e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform")

Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel: aubinator: Adding missed platforms to the error message.

Many new platforms got added to gen_device_name_to_pci_device_id()
but the error message inside aubinator didn't reflected those
changes. So syncing on the same order to be sure that we are not
missing any now.

Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

i965/gen7_urb: Re-emit PUSH_CONSTANT_ALLOC on some gen9

According to internal docs, some gen9 platforms have a pixel shader push
constant synchronization issue. Although not listed among said
platforms, this issue seems to be present on the GeminiLake 2x6's we've
tested.

We consider the available workarounds to be too detrimental on
performance. Instead, we mitigate the issue by applying part of one of
the workarounds. Re-emit PUSH_CONSTANT_ALLOC at the top of every batch
(as suggested by Ken).

Fixes ext_framebuffer_multisample-accuracy piglit test failures with the
following options:
* 6 depth_draw small depthstencil
* 8 stencil_draw small depthstencil
* 6 stencil_draw small depthstencil
* 8 depth_resolve small
* 6 stencil_resolve small depthstencil
* 4 stencil_draw small depthstencil
* 16 stencil_draw small depthstencil
* 16 depth_draw small depthstencil
* 2 stencil_resolve small depthstencil
* 6 stencil_draw small
* all_samples stencil_draw small
* 2 depth_draw small depthstencil
* all_samples depth_draw small depthstencil
* all_samples stencil_resolve small
* 4 depth_draw small depthstencil
* all_samples depth_draw small
* all_samples stencil_draw small depthstencil
* 4 stencil_resolve small depthstencil
* 4 depth_resolve small depthstencil
* all_samples stencil_resolve small depthstencil

v2: Include more platforms in WA (Ken).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106865
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93355
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

imx: make use of loader_open_render_node(..) helper

Gets rid of hard-coded gpu device path.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

tegra: make use loader_open_render_node(..) helper

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

loader: add loader_open_render_node(..)

This helper is almost a 1:1 copy of tegra_open_render_node().

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

tegra: fix memory leak

Fixes: 1755f608f52 ("tegra: Initial support")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

st/dri: Don't expose sRGB formats to clients

Though the SARGB8888 format is used internally through its FourCC value,
it is not a real format as defined by drm_fourcc.h; it cannot be used
with KMS or other interfaces expecting drm_fourcc.h format codes.

Ensure we don't advertise it through the dmabuf format/modifier query
interfaces, preventing us from tripping over an assert.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 8c1b9882b2e0 ("egl/dri2: Guard against invalid fourcc formats")
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>

radv: add missing support for protected memory properties

Fixes Vulkan CTS CL#2849. Similar to the ANV driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: remove dead code in scan_shader_output_decl()

Never used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: remove radv_shader_context::num_output_{clips,culls}

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: adjust the cull dist mask in scan_shader_output_decl()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: get length of the clip/cull distances array from usage mask

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: do not recompute the output usage mask for clipdist twice

The shader info pass takes care of this now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: gather the output usage mask for clip/cull distances correctly

It's a special case because both are combined into a single array.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: add set_output_usage_mask() helper

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: fix passing clip/cull distances from VS to PS

CTS doesn't test input clip/cull distances for the fragment
shader stage, which explains why this was totally broken. I
wrote a simple test locally that works now.

This fixes a crash with GTA V and DXVK.

Note that we are exporting unused parameters from the vertex
shader now, but this can't be optimized easily because we don't
keep the fragment shader info...

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107477
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

egl/wayland: do not leak wl_buffer when it is locked

If color buffer is locked, do not set its wayland buffer to NULL;
otherwise it can not be freed later.

Rather, flag it in order to destroy it later on the release event.

v2: instruct release event to unlock only or free wl_buffer too (Daniel)

This also fixes dEQP-EGL.functional.swap_buffers_with_damage.* tests.

CC: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>

ac/radeonsi: fix CIK copy max size

While adding transfer queues to radv, I started writing some tests,
the first test I wrote fell over copying a buffer larger than this
limit.

Checked AMDVLK and found the correct limit.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: fix regression in indirect input swizzles.

This fixes:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.shader_test
since I reworked the 64-bit swizzles.

Fixes: bb17ae49ee2 (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: fix tess/gs fetchs for new swizzle.

I have piglit results from my machine, but I must have messed up,
and not built mesa in between properly.

Fixes: bb17ae49ee2 (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: ignore VAO IDs equal to 0 in glDeleteVertexArrays

This fixes a firefox crash.

Fixes: 781a78914c798dc64005b37c6ca1224ce06803fc
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Revert "intel/tools/aubwrite: Always use physical addresses for traces."

This reverts commit f8cfc7766016d0ff7d52953e7a992b1e77c521d0.

This appears to break intel_dump_gpu for Gen9 systems - I can load them
in the simulator, but nothing happens. Reverting the patch makes the
simulator properly execute our commands and shaders again.

intel/nir: Lowering image loads and stores trashes all metadata

This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8
platforms where the lack of metadata dirtying was causing another pass
to accidentally delete a much needed loop.

https://bugs.freedesktop.org/show_bug.cgi?id=107745
Fixes: 37f7983bcca1 "intel/compiler: Do image load/store lowering..."
Jason Ekstrand <jason@jlekstrand.net> writes:
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965/screen: Allow modifiers on sRGB formats

This effectively reverts a26693493570a9d0f0fba1be617e01ee7bfff4db which
was a misguided attempt at protecting intel_query_dma_buf_modifiers from
invalid formats. Unfortunately, in some internal EGL cases, we can get
an SRGB format validly in this function. Rejecting such formats caused
us to not allow CCS in some cases where we should have been allowing it.
This regressed the performance of some SynMark tests as well as GfxBench
ALU2, Tessellation and Manhattan 3.0 tests

There's some question of whether or not we really should be using SRGB
"fourcc" formats that aren't actually in drm_foucc.h but there's not
much harm in allowing them through here.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223
Fixes: a26693493570 "i965/screen: Return false for unsupported..."
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

egl/dri2: Guard against invalid fourcc formats

We already reject attempts to import images with invalid fourcc formats
but don't really guard the queries all that well. This makes us error
out in any calls to eglQueryDmaBufModifiersEXT if the given format is
not a valid fourcc format. We also add an assert to ensure that drivers
don't advertise any non-fourcc formats.

Cc: mesa-stable@lists.freedesktop.org
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

egl/dri2: Add a helper for the number of planes for a FOURCC format

This also serves as a convenient "is this a fourcc format" check as well
which we'll take advantage of in the next commit.

Cc: mesa-stable@lists.freedesktop.org
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv/meta: Set num_components on image_store intrinsics

Now that image load/store intrinsics are variable-width, we need to set
num_components accordingly. In 15d39f474b890, both glsl_to_nir and
spirv_to_nir were updated to properly set num_components but radv meta
was left behind.

Fixes: 15d39f474b890 "nir: Make image load/store intrinsics..."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

gallivm: Detect VSX separately from Altivec

Previously gallivm would attempt to use VSX instructions on all systems
where it detected that Altivec is supported; however, VSX was added to
POWER long after Altivec, causing lots of crashes on older POWER/PPC
hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can
automatically disable it on hardware that supports Altivec but not VSX

Signed-off-by: Vicki Pfau <vi@endrift.com>

nv50: bump compat glsl level to same as core

Passes the compat piglits. I'm sure that there will be odd issues that
aren't caught by them, but at least it should basically work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: bump compat GLSL version to match core

This passes the handful of tests in piglit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: avoid lowering texcoord array except in simple cases

With compat creeping up to geometry and tess shaders, lowering texcoord
accesses/writes becomes more complicated. Since it's an optimization
anyways, just avoid the complication for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

docs: update calendar 18.2.0-rc5 is out, extend to 18.2.0-rc6

Signed-off-by: Andres Gomez <agomez@igalia.com>

st/mesa, gallium: add a workaround for No Mans Sky

The spec seems clear this is not allowed but the Nvidia binary
forces apps to add layout qualifiers so this works around the
issue for No Mans Sky until the CTS can be sorted out.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl: add a mechanism to allow layout qualifiers on function params

The spec is quite clear this is not allowed:

    From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:

       "Layout qualifiers can appear in several forms of declaration.
       They can appear as part of an interface block definition or
       block member, as shown in the grammar in the previous section.
       They can also appear with just an interface-qualifier to establish
       layouts of other declarations made with that qualifier:

          layout-qualifier interface-qualifier ;

       Or, they can appear with an individual variable declared with
       an interface qualifier:

          layout-qualifier interface-qualifier declaration ;"

    From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:

       "Layout qualifiers cannot be used on formal function parameters,
       and layout qualification is not included in parameter matching."

However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.

I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl: skip stringification in preprocessor if in unreachable branch

This fixes compilation of some "No Mans Sky" shaders where the stringification
happens in branches intended for DX12.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

radv: Add missing checks in radv_get_image_format_properties.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallivm: allow to pass two swizzles into fetches.

This hijacks the top 16-bits of swizzle, to pass in the swizzle
for the second channel.

This fixes handling .yx swizzles of 64-bit values.

This should fixup radeonsi and llvmpipe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: enable radeonsi_zerovram for No Mans Sky

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: add radeonsi_zerovram driconfig option

More and more games seem to require this so lets make it a config
option.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: enable GL 4.5 in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: enable ARB_direct_state_access in compat for GL3.1+

We could enable it for lower versions of GL but this allows us
to just use the existing version/extension checks that are already
used by the core profile.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: add a thorough clear/copy_buffer benchmark

radeonsi: let internal compute dispatches tune WAVES_PER_SH

radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI

radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA_SI for measuring DMA on SI

DMA on SI doesn't support the timestamp packet, so it's emulated.

radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance

radeonsi: add flag L2_STREAM for minimal cache usage

gallium: add TGSI_MEMORY_STREAM_CACHE_POLICY

For internal radeonsi shaders.

intel/compiler: Remove surface_idx from brw_image_param

Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel: Use TXS for image_size when we have a typed surface

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

anv,i965: Lower away image derefs in the driver

Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params.  As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index.  There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.

This also has the side-effect of making most image use compile-time
binding table indices.  Previously, all image access pulled the binding
table index from a uniform.  Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart.  Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
    instructions in affected programs: 115834 -> 113255 (-2.23%)
    helped: 191
    HURT: 0

    total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
    cycles in affected programs: 4757115 -> 4642085 (-2.42%)
    helped: 73
    HURT: 67

    total spills in shared programs: 10951 -> 10926 (-0.23%)
    spills in affected programs: 742 -> 717 (-3.37%)
    helped: 7
    HURT: 0

    total fills in shared programs: 22226 -> 22201 (-0.11%)
    fills in affected programs: 1146 -> 1121 (-2.18%)
    helped: 7
    HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Add handle/index-based image intrinsics

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Use a bitfield for image access qualifiers

This commit expands the current memory access enum to contain the extra
two bits provided for images. We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/link,i965: Make ImageAccess four-state

The GLSL spec allows you to set both the "readonly" and "writeonly"
qualifiers on images to indicate that it can only be used with
imageSize.  However, we had no way of representing this int he linked
shader and flagged it as GL_READ_ONLY.  This is good from a "does it use
this buffer?" perspective but not from a format and access lowering
perspective.  By using GL_NONE for if "readonly" and "writeonly" are
both set, we can detect this case in the driver and handle it correctly.

Nothing currently relies on the type of surface in the "readonly" +
"writeonly" case but that's about to change.  i965 is the only drier
which uses the ImageAccess field and gl_bindless_image::access is
currently unused.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Use two components for 1D array image sizes

Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code. The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

isl: Use the view array length for the image size

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Do image load/store lowering to NIR

This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/types: Add a wrapper for coordinate_components

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

anv/pipeline: Remove dead image loads in lower_input_attacnments

Dead code will get rid of them eventually but it's better if they're
just gone so we guarantee they won't trip up later passes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Make image load/store intrinsics variable-width

Instead of requiring 4 components, this allows them to potentially use
fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so
drivers which assume 4 components should be safe. However, we want to
be able to shrink them for i965.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Fix a bitmask in unpack_11f11f10f

Fixes: 4e337b42f9a2 "nir/format_convert: Add pack/unpack for R11F_G11F_B10F"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f

This matches the unpack function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Add [us]norm conversion helpers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Rename nir_format_bitcast_uint_vec

We have a name for that, it's called a uvec. This just makes the
function name a bit shorter. While we're here, we also add an assert
for one of the assumptions this function makes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Add vec mask and sign-extend helpers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/format_convert: Add support for unpacking signed integers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/opcodes: Make unpack_half_2x16_split_* variable-width

There is nothing inherent about these opcodes that requires them to only
take scalars. It's very convenient if we let them take vectors as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/algebraic: Add some max/min optimizations

Found by inspection.  This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166916 -> 15166910 (<.01%)
    instructions in affected programs: 761 -> 755 (-0.79%)
    helped: 6
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/algebraic: Add more extract_[iu](8|16) optimizations

This adds the "(a << N) >> M" family of mask or sign-extensions.  Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166918 -> 15166916 (<.01%)
    instructions in affected programs: 36 -> 34 (-5.56%)
    helped: 2
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/algebraic: Be more careful converting ushr to extract_u8/16

If it's not the right bit-size, it may not actually be the correct
extraction. For now, we'll only worry about 32-bit versions.

Fixes: 905ff8619824 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa84a "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/tools: new i965_disasm tool

Adds a new i965 instruction disassemble tool

v2: 1) fix a few nits (Matt Turner)
    2) Remove i965_disasm header (Matt Turner)

v3: 1) Redirect output to correct file descriptors (Matt Turner)
    2) Refactor code (Matt Turner)
    3) Use better formatting style (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>

st/mesa: Disable blending for integer formats.

Blending isn't valid for integer formats. Rather than having drivers
worry about this, just disable blending in this case. This hopefully
will increase hits in the CSO cache as well, by eliminating most of the
meaningless fields in this case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

svga: add missing switch cases for shadow textures

This doesn't seem to make any difference in testing, but it fixes a
failed assertion when dumping sm3 shaders.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: fix vgpu9 sprite coordinate bug

Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for
vgpu9. We can use the rasterizer sprite_coord_enable bitfield as-is.
We need to index into it using the TGSI semantic index, not the
register index.

This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests.

Testing done: Piglit, Mesa sprite demos

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue

The flag_rect and flag_buffer fields didn't sufficiently capture
the state changes needed for those resource types.  For example,
if a texture binding was changed from a 500x500 rect texture to a
400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS.  But
we need to do that to emit the new texcoord scale factors to the
constant buffers.  Rather than track the sizes of all bound
resources, just set the flag if the resource is a rect.  Same
story with texture buffers.

Also, since rect/buffer textures are usable with VS/GS shaders,
add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting
VS/GS constants.

This seems to help with XFCE / xfwm4 desktop scaling.
VMware issue 2156696.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: minor improvements in svga_state_constants.c

Add const qualifiers. Add 'f' suffix on floats to avoid double
promotion.

Remove unneeded shader type assertion since the switch statement
handled it already.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

anv: Free the app and engine name

Fixes: 8c048af5890d4 "anv: Copy the appliation info into the instance"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

nv50/ir: silence partitionLoadStore() unused function warning

Move this now-unused function into the existing comment block, which was its only prior use.

../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)

Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl/linker: Link all out vars from a shader objects on a single stage

During intra stage linking some out variables can be dropped because
it is not used in a shader with the main function. But these out vars
can be referenced on later stages which can lead to further linking
errors.

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731

anv: blorp: support multiple aspect blits

Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca0e41 ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

mesa: allow GL_UNSIGNED_BYTE type for SNORM reads

OpenGL ES spec states:
   "For normalized fixed-point rendering surfaces, the combination format
    RGBA and type UNSIGNED_BYTE is accepted."

This fixes following failing VK-GL-CTS tests:

   KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=107658
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andres Gomez <agomez@igalia.com>

nir: add loop unroll support for wrapper loops

This adds support for unrolling the classic

    do {
        // ...
    } while (false)

that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.

shader-db results IVB:

total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir/opt_loop_unroll: Remove unneeded phis if we make progress

Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: add complex_loop bool to loop info

In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: always attempt to find loop terminators

This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI

This fixes VM faults and corruption.

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Refactor image atomics to be a bit more like other atomics

This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: Silence unused parameter warnings in brw_eu.h

All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it. Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_pixel_interp_desc(const struct gen_device_info *devinfo,
^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965: enable AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: add functional changes for AMD_depth_clamp_separate

Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.

z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.

v2: 1) Use better coding style (Ian Romanick)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: add EXTRA_EXT for AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for GL_AMD_depth_clamp_separate tokens

_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.

v2: Remove unnecessary parentheses (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Add support for AMD_depth_clamp_separate

Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.

Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.

Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.

v2: 1) Remove unnecessary parentheses (Marek Olsak)
    2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
       GL_DEPTH_CLAMP only (Marek Olsak)
    3) Clamp against near and far plane separately (Marek Olsak)
    4) Clip point separately for near and far Z clipping plane (Marek
       Olsak)

v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
    and [0, max(n,f)] for far plane (Marek Olsak)

v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Add types for AMD_depth_clamp_separate.

Add some basic types and storage for the AMD_depth_clamp_separate
extension.

v2: 1) Drop unnecessary definition (Marek Olsak)
2) Expose extension in compatibility profile (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glapi: define AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>