git.libre-soc.org Git - mesa.git/log

panfrost: Only take the fast paths on buffers aligned to block size

As the functions operate on 16-byte blocks.

Fixes this Valgrind error:

Invalid read of size 4
   at 0x5857568: swizzle_bpp1_align16 (pan_swizzle.c:85)
   by 0x585780F: panfrost_texture_swizzle (pan_swizzle.c:171)
   by 0x584F587: panfrost_tile_texture (pan_resource.c:489)
   by 0x584F641: panfrost_transfer_unmap (pan_resource.c:525)
   by 0x587718D: u_transfer_helper_transfer_unmap (u_transfer_helper.c:516)
   by 0x5875D85: pipe_transfer_unmap (u_inlines.h:515)
   by 0x5875F13: u_default_texture_subdata (u_transfer.c:80)
   by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
   by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
   by 0x5391353: teximage (teximage.c:3105)
   by 0x5391353: teximage_err (teximage.c:3132)
   by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
   by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
Address 0x1e94f1e8 is 0 bytes after a block of size 16 alloc'd
   at 0x483F5C8: malloc (vg_replace_malloc.c:299)
   by 0x584F47D: panfrost_transfer_map (pan_resource.c:467)
   by 0x587694D: u_transfer_helper_transfer_map (u_transfer_helper.c:243)
   by 0x5875EA7: u_default_texture_subdata (u_transfer.c:59)
   by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
   by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
   by 0x5391353: teximage (teximage.c:3105)
   by 0x5391353: teximage_err (teximage.c:3132)
   by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
   by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
   by 0x4DA8AB: glu::CallLogWrapper::glTexImage2D(unsigned int, int, int, int, int, int, unsigned int, unsigned int, void const*) (in /home/tomeu/deqp-build/modules/gles2/deqp-gles2)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: 19.1 <mesa-stable@lists.freedesktop.org>

panfrost: Fix two uninitialized accesses in compiler

Valgrind was complaining of those.

NIR_PASS only sets progress to TRUE if there was progress.

nir_const_load_to_arr() only sets as many constants as components has
the instruction.

This was causing some dEQP tests to flip-flop, such as:

dEQP-GLES2.functional.fragment_ops.blend.equation_src_func_dst_func.add_src_color_constant_color

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: 14531d676b11 ("nir: make nir_const_value scalar")

panfrost: ci: Skip running some tests

These tests add too much time to the total run time, and some of them
even hang the DUTs, even if I haven't been able to reproduce it locally.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Don't restart Weston

There doesn't seem to actually be any noticeably memory leaks on Weston
when running dEQP. We do seem to leak quiet a bit in the client, so we
still have to run the dEQP runner in batches.

This removes the risk of Weston not restarting properly and introducing
spurious failures.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Update list of expected failures

This matches the current state of things on both RK3288 and RK3399.
Hopefully, from now on we'll only remove stuff from this list.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Tweak dEQP to improve throughput

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Fix list of tests to run

Make sure we have only test case names in the list, excluding names of
test groups.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Check for incomplete runs

To improve robustness, check that we got the expected number of results.
Right now we hard-code the expected number of tests run, but with some
effort we may be able to infer it.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Add tests to flip-flop list

These tests aren't giving reliable results. Mask them for now.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

panfrost: ci: Add support for running the tests on RK3288

Build artifacts for armhf and schedule them on a Veyron Chromebook with
RK3288.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

lima: fix tile buffer reloading

Buffer needs to be reloaded every time unless explicit clear() was
called.

Fixes rendering issues with wayland compositors.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

anv: Remove special allocation for anv_push_constants

The key reason for that mechanism is gone: all the extra optional data
that could be in the anv_push_constants was moved elsewhere.  At this
point, just put anv_push_constants directly in anv_cmd_state (part of
anv_cmd_buffer).

v2: Remove a NULL check we don't need anymore in
    anv_cmd_buffer_push_constants().  (Lionel)
    Fix size we consider for valid push params.  (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

iris: Expose PIPE_CAP_DEVICE_RESET_STATUS_QUERY

This provides a way for the application to query whether any resets have
happened, which lets us expose "robust" contexts. This also enables the
KHR_robust_buffer_access_behavior tests.

iris: Hook up device reset callbacks

This mechanism lets the driver inform the state tracker about GPU
resets, say for destroying a robust API context and reporting a "device
lost" error to the application, making it take action to deal with this.

iris: Try to recover from GPU hangs.

The iris batch module now tries to detect that the kernel has banned
our GEM context, creates a new non-banned context, and informs the
iris context module that all assumptions about state are now invalid
and it needs to reinitialize the relevant state.

Based on Chris Wilson's work, but significantly rewritten by me.

iris: Add helpers to clone a hardware context.

(Chris Wilson wrote this code in a patch titled "i965: Be resilient in
the face of GPU hangs"; Ken fixed a bug and copied it to iris.)

iris: Mark render batches as non-recoverable.

Adapted from Chris Wilson's patch.  The comment is largely his.

Currently, when iris hangs the GPU, it will continue sending batches
which incrementally update the state, assuming it's preserved across
batches.  However, the kernel's GPU reset support reinitializes the
guilty context to the default GPU state (reasonably not wanting to
trust the current state).  This ends up resetting critical things
like STATE_BASE_ADDRESS, causing memory accesses in all subsequent
batches to be garbage, and almost certainly result in more hangs
until we're banned or we kill the machine.

We now ask the kernel to ban our render context immediately, so we
notice we've gone off the rails as fast as possible.  Eventually, we'll
attempt to recover and continue.  For now, we just avoid torching the
GPU over and over.

freedreno/ir3: fix rasterflat/glxgears

Ofc legacy gl features that are broken don't trigger fails in deqp. I
should remember to test glxgears more often.

Fixes: 7ff6705b8d8 freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>

anv: Use corresponding type from the vector allocation

We didn't notice this issue much because the 2 struct share a similar
layout, expect for the additional fields...

We run into that issue in Anv :

==15236== Invalid write of size 8
==15236==    at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211)
==15236==    by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264)
==15236==    by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312)
==15236==    by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167)
==15236==    by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190)
==15236==    by 0x8CF60871: alloc_surface_state (anv_image.c:1122)
==15236==    by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519)
==15236==    by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358)
==15236==  Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd
==15236==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15236==    by 0x8D2578E6: u_vector_init (u_vector.c:47)
==15236==    by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168)
==15236==    by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921)
==15236==    by 0x8CF56517: anv_CreateDevice (anv_device.c:1909)
==15236==    by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073)
==15236==    by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so)
==15236==    by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so)
==15236==    by 0x8BCB35C6: loader_create_device_chain (loader.c:5449)
==15236==    by 0x8BCBC230: vkCreateDevice (trampoline.c:838)

v2: Rename mmap_cleanups to avoid confusion (Caio)

v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

docs: update calendar, and news item and link release notes for 19.0.4

docs: Add SHA256 sums for mesa 19.0.4

Docs: add 19.0.4 release notes

mesa: fix GL_PROGRAM_BINARY_RETRIEVABLE_HINT handling

When first implemented in fefd03e16c16 Mesa's behavior was aligned on behavior
of Nvidia's driver. This caused a failing test in piglit but was ok since the
specification is unclear on this subject.

Nvidia's driver behavior has been modified because using version 410.104, the
problematic test (program_binary_retrievable_hint) now passes.

This commit defers BinaryRetrievableHint update until the next linking so the
test passes on Mesa as well.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

nir: Initialize lower_flrp_progress everywhere

I don't know why I thought NIR_PASS always set the progress variable.
Derp.

Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989

gallium: fix typo in comment

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

meson: fix a couple typos in comments

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

i965_asm: avoid free()ing uninitialized pointers

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

i965_asm: fix memleak

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

radv: fix setting the number of rectangles when it's dyanmic

We need to know the number of rectangles.

This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*.

Fixes: 5db0bf99944 ("radv: Implement VK_EXT_discard_rectangles.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

iris: Reorganise execbuf to have a single point of failure

Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure
if need be. We retain the current behaviour to abort() at the first sign
of trouble -- for a non-robustness context, arguably this is the right
thing to do as the client cannot recover, and the system state is lost.
How to properly integrate with KHR_robustness and reset-strategy is
left as a future exercise.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

drm-uapi: Update i915_drm.h for I915_CONTEXT_PARAM_RECOVERABLE

Pull i915_drm.h to include

kernel commit ba4fda620a5f7db521aa9e0262cf49854c1b1d9c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 18 10:58:21 2019 +0000

drm/i915: Optionally disable automatic recovery after a GPU reset

for improved resilience in handling GPU hangs.

kmsro: add _dri.so to two of the kmsro drivers.

Fixes: 8cfc17bdda3 (kmsro: Add the rest of the current set of tinydrm drivers.)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

iris: Report the same video memory settings as i965.

This just copy and pastes Ian's code from i965.

gitlab-ci: add the vulkan overlay layer to the vulkan build

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

gitlab-ci: add the vulkan overlay layer to the vulkan build

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
[ Michel Dänzer: Take changes affecting the docker image from !299,
plus remove the unzip package again before generating the image ]

gitlab-ci: Don't install WINE packages

They were just making the docker image larger for no benefit at this
point.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gitlab-ci: Reorder jobs a bit to be generally ordered longer => shorter

This makes the longer jobs likely to run earlier, which can help the
overall pipeline duration.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gitlab-ci: Build clover against all supported versions of LLVM

And consolidate it all into a single job.

It doesn't take much longer than a single version, thanks to ccache.
Overall, this single job might be faster or at least use fewer CPU
cycles than the two jobs before, while covering thrice as many versions
of LLVM.

v2:
* Move "rm -rf _build" to meson-build.sh.
* Set GALLIUM_DRIVERS the same way both times in the meson-clover job,
for symmetry.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1

gitlab-ci: Move meson job script to separate file

No functional change intended (except for no longer running meson
--version separately, as the version appears early in meson's output
anyway).

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gitlab-ci: Remove superfluous comment about image tag counter suffix

We really shouldn't ever need a suffix, otherwise it indicates a failure
in coordination. :) In which case, it doesn't really matter how the tag
is disambiguated.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

meson: Force the use of config-tool for llvm

meson git now has a cmake find method for llvm, but it lacks a couple of
features that we use from the config tool version. Until that reaches
parity we need to use the config-tool version.

CC: 19.0 19.1 <<mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gallium/util: fix two MSVC compiler warnings

Remove stray const qualifier.
s/unsigned/enum tgsi_semantic/

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallium/pp: s/uint/enum tgsi_semantic/ to fix MSVC warning

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

noop: s/enum pipe_transfer_usage/unsigned/ to fix MSVC warning

The function pointer declaration in pipe_context uses unsigned
for the bitmask.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

ddebug: fix a few MSVC compiler warnings

Don't return an expression in void functions.
Replace an unsigned int with proper enum.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

glsl: s/GLboolean/bool/ to silence MSVC compiler warning

It complains about mixing GLboolean and bool in the |= expression.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

nir/flrp: Reassociate add in flrp(±1, b, c) lowering path

With this reassociation, this lowering path is still beneficial.

Ice Lake
total instructions in shared programs: 17220191 -> 17207181 (-0.08%)
instructions in affected programs: 999871 -> 986861 (-1.30%)
helped: 3703
HURT: 17
helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3
helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35%
HURT stats (abs)   min: 1 max: 9 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55%
95% mean confidence interval for instructions value: -4.01 -2.99
95% mean confidence interval for instructions %-change: -2.29% -2.11%
Instructions are helped.

total cycles in shared programs: 360871298 -> 360755040 (-0.03%)
cycles in affected programs: 9931334 -> 9815076 (-1.17%)
helped: 2388
HURT: 1569
helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18
helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1917 x̄: 68.27 x̃: 22
HURT stats (rel)   min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72%
95% mean confidence interval for cycles value: -39.48 -19.28
95% mean confidence interval for cycles %-change: -0.86% -0.46%
Cycles are helped.

total spills in shared programs: 12355 -> 12159 (-1.59%)
spills in affected programs: 295 -> 99 (-66.44%)
helped: 2
HURT: 1

total fills in shared programs: 25398 -> 25207 (-0.75%)
fills in affected programs: 288 -> 97 (-66.32%)
helped: 2
HURT: 1

LOST:   3
GAINED: 44

Iron Lake
total instructions in shared programs: 8169225 -> 8159729 (-0.12%)
instructions in affected programs: 1025712 -> 1016216 (-0.93%)
helped: 3352
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05%
95% mean confidence interval for instructions value: -2.86 -2.80
95% mean confidence interval for instructions %-change: -1.56% -1.46%
Instructions are helped.

total cycles in shared programs: 188656796 -> 188612280 (-0.02%)
cycles in affected programs: 18633584 -> 18589068 (-0.24%)
helped: 3085
HURT: 14
helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -14.55 -14.18
95% mean confidence interval for cycles %-change: -0.76% -0.69%
Cycles are helped.

GM45
total instructions in shared programs: 5026905 -> 5021856 (-0.10%)
instructions in affected programs: 584169 -> 579120 (-0.86%)
helped: 1776
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3
helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98%
95% mean confidence interval for instructions value: -2.88 -2.80
95% mean confidence interval for instructions %-change: -1.50% -1.37%
Instructions are helped.

total cycles in shared programs: 129047376 -> 129018918 (-0.02%)
cycles in affected programs: 12941924 -> 12913466 (-0.22%)
helped: 1722
HURT: 14
helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -16.65 -16.13
95% mean confidence interval for cycles %-change: -0.76% -0.66%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

nir/flrp: Fix typo on the flrp(±1, b, c) path

After Samuel reported the bisect, I was able to find the bug by
inspection.  Good thing for well-named varibles. :)

Unfortunately, this undoes almost all of the benefit of the original
patch.

Ice Lake
total instructions in shared programs: 17183159 -> 17218166 (0.20%)
instructions in affected programs: 1308722 -> 1343729 (2.67%)
helped: 98
HURT: 4746
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57%
HURT stats (abs)   min: 1 max: 691 x̄: 7.40 x̃: 8
HURT stats (rel)   min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83%
95% mean confidence interval for instructions value: 6.82 7.64
95% mean confidence interval for instructions %-change: 5.22% 6.15%
Instructions are HURT.

total cycles in shared programs: 360705959 -> 360853522 (0.04%)
cycles in affected programs: 10754380 -> 10901943 (1.37%)
helped: 1594
HURT: 3331
helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60
helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64%
HURT stats (abs)   min: 1 max: 10208 x̄: 101.63 x̃: 38
HURT stats (rel)   min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78%
95% mean confidence interval for cycles value: 21.11 38.81
95% mean confidence interval for cycles %-change: 3.76% 5.15%
Cycles are HURT.

total spills in shared programs: 12158 -> 12355 (1.62%)
spills in affected programs: 98 -> 295 (201.02%)
helped: 1
HURT: 2

total fills in shared programs: 25204 -> 25398 (0.77%)
fills in affected programs: 94 -> 288 (206.38%)
helped: 0
HURT: 3

LOST:   15
GAINED: 8

Iron Lake
total instructions in shared programs: 8121430 -> 8166733 (0.56%)
instructions in affected programs: 1148353 -> 1193656 (3.95%)
helped: 2
HURT: 4046
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89%
HURT stats (abs)   min: 1 max: 43 x̄: 11.20 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87%
95% mean confidence interval for instructions value: 11.02 11.37
95% mean confidence interval for instructions %-change: 6.84% 7.94%
Instructions are HURT.

total cycles in shared programs: 188376326 -> 188601568 (0.12%)
cycles in affected programs: 27416674 -> 27641916 (0.82%)
helped: 68
HURT: 3947
helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 57.31 x̃: 64
HURT stats (rel)   min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09%
95% mean confidence interval for cycles value: 55.01 57.20
95% mean confidence interval for cycles %-change: 2.88% 5.19%
Cycles are HURT.

LOST:   35
GAINED: 3

GM45
total instructions in shared programs: 4979794 -> 5003551 (0.48%)
instructions in affected programs: 635174 -> 658931 (3.74%)
helped: 1
HURT: 2142
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85%
HURT stats (abs)   min: 1 max: 43 x̄: 11.09 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53%
95% mean confidence interval for instructions value: 10.85 11.33
95% mean confidence interval for instructions %-change: 6.25% 7.74%
Instructions are HURT.

total cycles in shared programs: 128519586 -> 128654990 (0.11%)
cycles in affected programs: 17635304 -> 17770708 (0.77%)
helped: 46
HURT: 2088
helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 65.25 x̃: 66
HURT stats (rel)   min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99%
95% mean confidence interval for cycles value: 61.75 65.15
95% mean confidence interval for cycles %-change: 2.58% 5.34%
Cycles are HURT.

LOST:   38
GAINED: 38

Fixes: 5b908db604b ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently")
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

anv: fix use after free

Once mem->bo is removed from the cache, it is likely to be freed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b80930a6fea075 ("anv: add support for VK_EXT_memory_budget")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: rework queries writes to ensure ordering memory writes

We use a mix of MI & PIPE_CONTROL commands to write our queries' data
(results & availability). Those commands' memory write order is not
guaranteed with regard to their order in the command stream, unless CS
stalls are inserted between them. This is problematic for 2 reasons :

   1. We copy results from the device using MI commands even though
      the values are generated from PIPE_CONTROL, meaning we could
      copy unlanded values into the results and then copy the
      availability that is inconsistent with the values.

   2. We allow the user to poll on the availability values of the
      query pool from the CPU. If the availability lands in memory
      before the values then we could return invalid values.

This change does 2 things to address this problem :

      - We use either PIPE_CONTROL or MI commands to write both
        queries values and availability, so that the ordering of the
        memory writes guarantees that if availability is visible,
        results are also visible.

      - For the occlusion & timestamp queries we apply a CS stall
        before copying the results on the device, to ensure copying
        with MI commands see the correct values of previous
        PIPE_CONTROL writes of availability (required by the Vulkan
        spec).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radv: call constant folding before opt algebraic

The pattern of calling opt algebraic first seems to have originated
in i965. The order in OpenGL drivers generally doesn't matter
because the GLSL IR optimisations do constant folding before
opt algebraic.

However in Vulkan drivers calling opt algebraic first can result
in missed constant folding opportunities.

vkpipeline-db results (VEGA64):

Totals from affected shaders:
SGPRS: 3160 -> 3176 (0.51 %)
VGPRS: 3588 -> 3580 (-0.22 %)
Spilled SGPRs: 52 -> 44 (-15.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 12 -> 12 (0.00 %) dwords per thread
Code Size: 261812 -> 261036 (-0.30 %) bytes
LDS: 7 -> 7 (0.00 %) blocks
Max Waves: 346 -> 348 (0.58 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

docs: drop h1 in header

It's generally frowned upon to have more than one H1 per document in
HTML4. So let's put the text directly inside the header. This means we
can drop the flex-based centering, which makes things a bit easier. We
also need to change the padding to rem instead of em, because the em has
now changed.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: harmonize headings and titles

We're pretty insonsistent in what the headings and titles are, especially
compared to what the articles are listed as in the sidebar. Let's
harmonize this.

There's a notable exception for meson.html, where the sidebar uses a
short-hand form that makes sense in the sidebar, but not in the article
due to the visible context being different.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: renumber headings

It's generally frowned upon to have multiple H1 headings in HTML4. So
let's make sure each article has a primary heading for the article, and
that that heading is the title that is used in the sidebar.

While we're at it, let's update the title in the articles to match the
title from the sidebar as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: give download-article a primary heading

It's generally frowned upon to have multiple H1 headings in HTML4. So
let's add a primary heading for the article, and source that from the
title used in the sidebar.

While we're at it, let's update the title in the article to match the
title from the sidebar as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: use title-casing for all headings in sidebar

We generally use title-casing for headings in the sidebar. But not
all headings was constently cased like that. Let's make sure this
is consistent.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>

docs: spell out "and" in sidebar

There's no need to keep this short, we can just spell out "and" here.
Besides, a slash kind of implies "or", but these articles are about
both of these, not either.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: remove pointless list-entry

It's quite visible that there's more docs below, we don't need to spell
it out for the reader.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: spell out faq in sidebar

We're not short on space here, so there's little point in abbreviating
this. This also matches the heading in the article.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

docs: spell out "and" in sidebar

We're not short on space here, so let's just spell out "and" instead of
using the ampersand. This is more consistent with the entry above in the
sidebar.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

glsl_to_nir: remove unused type_is_int()

This was missed in e00fa99b08b3.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Revert "glx: Fix synthetic error generation in __glXSendError"

This reverts commit e91ee763c378d03883eb88cf0eadd8aa916f7878.

This seems to have broken a number of wine games. Lets revert
everything for now and try again later.

Acked-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110632
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110590

radeonsi: add an AMD_TEX_ANISO environment variable

This brings it inline with the recently added AMD_DEBUG.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619

i965: leave the top 4Gb of the high heap VMA unused

This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv
to i965. Thanks to Lionel for noticing that it was missing!

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Force VMA alignment to be a multiple of the page size.

This should happen regardless, but let's be paranoid.

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Fix BRW_MEMZONE_LOW_4G heap size.

The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.

So we can't use the top page.

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: Unset flag reg when FB write is not predicated

In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes. Since the instruction
doesn't support predication before Haswell, we unset the predicate so we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/disasm: Disassemble immediate value properly for dim

On haswell, for dim instruction we encode immediate float value operand
into double float,

v2: Fix comment (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/disasm: Disassemble JIP offset for while

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Replicate 16 bit immediate value correctly

For the W or UW (signed or unsigned word) source types, the 16-bit value
must be replicated in both the low and high words of the 32-bit
immediate value.

v2: Fix replication in other places as well
V3: fix a few nits (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Print quad value in hex format

Print quad value same as unsigned quad so that we can distinguish in
between quater control disassembled values for e.g 1/2/3[Q] and
immediate quad value for e.g 1Q. This allows round-tripping through the
assembler/disassembler.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/tools: Add unit tests for assembler

v1: Pass executable object from meson to test(Dylan Baker)
v2: Ignore generated output files from git status(Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

intel/tools: Initialize offset correctly for i965_asm

If we leave offset uninitialized, access to store
will be random depending on stack value and can
segfault.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/tools: Add meson pthread dependancy for i965_asm

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/tools: New i965 instruction assembler tool

Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who
mentored me through out this project.

v2: Fix memory leaks and naming convention (Caio)
v3: Fix meson changes (Dylan Baker)
v4: Fix usage options (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141

iris: Also handle res->offset for buffer sampler/image views

iris: support dmabuf imports with offsets

this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264

fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

gallivm: fix broken 8-wide s3tc decoding

Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, leading to decode artifacts.
Fix these all up, which makes the code actually work 8-wide. Note that
it's still not used - I've verified it works, and the generated assembly
does look quite a bit simpler actually (20-30% less instructions for the
s3tc decode part with avx2), however in practice it still seems to be
sligthly slower for some unknown reason (tested with openarena) on my
haswell box, so for now continue to split things into 4-wide vectors
before decoding.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

docs: Add relnotes stub for 19.2

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Bump version for 19.1 branch

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

lima: enable sin and cos lowering for GP

GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

nir: implement lowering for fsin and fcos

Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

freedreno/ir3: move const_state to ir3_shader

For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass. So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this. The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: split out const_state setup

Next patch moves const_state to ir3_shader, before the compile context
is created. So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: move immediates to const_state

They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: consolidate const state

Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout. Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: move ir3_pointer_size()

Move to ir3_compiler so it doesn't depend on the compile context. Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>

vulkan/overlay-layer: fix cast errors

Not quite sure what version of GCC/Clang produces errors (8.3.0
locally was fine).

v2: also fix an integer literal issue (Karol)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: fix alphaToCoverage when there is no color attachment

There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering only, we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were no handling correctly either and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: Don't always require precise lowering of flrp

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/algebraic: Reassociate open-coded flrp(1, b, c)

In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Lower flrp(a, b, #c) differently

This doesn't help on Intel GPUs now because we always take the
"always_precise" path first. It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists

There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first. It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists

This doesn't help on Intel GPUs now because we always take the
"always_precise" path first. It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform. Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
instructions in affected programs: 1199037 -> 1163061 (-3.00%)
helped: 4124
HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: -8.84 -8.56
95% mean confidence interval for instructions %-change: -5.12% -4.77%
Instructions are helped.

total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
cycles in affected programs: 27505596 -> 27325850 (-0.65%)
helped: 4026
HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: -44.75 -42.87
95% mean confidence interval for cycles %-change: -2.44% -2.17%
Cycles are helped.

LOST:   3
GAINED: 35

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Lower flrp(#a, #b, c) differently

If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5

Previously lower_flrp32 was only set for vertex shaders.  Fragment
shaders performed a(1-c)+bc lowering during code generation.

The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
text-identical fragment shader.

v2: Rebase on 26391cceaa1 ("intel/compiler: Lower ffma on Gen4 and
Gen5").

v3: Rebase on a004e95dd73 ("radeonsi/nir: create si_nir_opts() helper")

Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
instructions in affected programs: 2503898 -> 2478487 (-1.01%)
helped: 9936
HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: -2.43 -2.25
95% mean confidence interval for instructions %-change: -1.41% -1.33%
Instructions are helped.

total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
cycles in affected programs: 71541604 -> 71419616 (-0.17%)
helped: 11649
HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: -9.42 -8.63
95% mean confidence interval for cycles %-change: -0.54% -0.50%
Cycles are helped.

total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.

LOST:   3
GAINED: 12

GM45
total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
instructions in affected programs: 1303584 -> 1290871 (-0.98%)
helped: 5010
HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: -2.45 -2.20
95% mean confidence interval for instructions %-change: -1.40% -1.28%
Instructions are helped.

total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
cycles in affected programs: 44845402 -> 44768292 (-0.17%)
helped: 6079
HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: -11.63 -10.34
95% mean confidence interval for cycles %-change: -0.58% -0.52%
Cycles are helped.

total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2

total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0
HURT: 1

total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0
HURT: 1

LOST:   15
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]

nir: Use the flrp lowering pass instead of nir_opt_algebraic

I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/flrp: Add new lowering pass for flrp instructions

This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp. Rebase on top of
3766334923e ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/algebraic: Pull common multiplication out of flrp arguments

All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>