mesa.git
5 years agospecs: Remove GLES profile interaction text from GLX_MESA_query_renderer
Adam Jackson [Fri, 9 Nov 2018 16:27:27 +0000 (11:27 -0500)]
specs: Remove GLES profile interaction text from GLX_MESA_query_renderer

In one place we say, if GLES isn't supported then the profile version
will be 0.0. Then later we say, if the GLES profile extension isn't
supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not
mentioned in the spec. A strict reading of the latter would mean that
GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token,
and the query should instead return False.

The implementation does not check for the GLES profile extensions, and
the additional complexity doesn't seem worth it. Removing the
interaction text makes the spec match the implementation.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
5 years agofreedreno/ir3: Make imageStore use num components from image format
Eduardo Lima Mitev [Mon, 17 Dec 2018 20:41:24 +0000 (21:41 +0100)]
freedreno/ir3: Make imageStore use num components from image format

emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.

This patch uses the actual number of components from the image format.

For example, in a shader like:

layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
   ...
   imageStore (u_image, some_offset, vec4(1.0));
   ...
}

instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).

This obviously reduces register pressure as well.

v2: - Added support for image formats from NV_image_format extension
    (Ilia Mirkin).
    - Return 4 components by default instead of asserting. (Rob Clark).

v3: Added more missing formats (Ilia Mirkin).

v4: Added a debug message for unknown image formats (Rob Clark).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agonir/dead_write_vars: Get modes directly from derefs
Jason Ekstrand [Thu, 13 Dec 2018 22:30:24 +0000 (16:30 -0600)]
nir/dead_write_vars: Get modes directly from derefs

Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/copy_prop_vars: Get modes directly from derefs
Jason Ekstrand [Thu, 13 Dec 2018 22:28:23 +0000 (16:28 -0600)]
nir/copy_prop_vars: Get modes directly from derefs

Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/lower_wpos_center: Look at derefs for modes
Jason Ekstrand [Thu, 13 Dec 2018 21:03:28 +0000 (15:03 -0600)]
nir/lower_wpos_center: Look at derefs for modes

This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/lower_io_to_scalar: Look at derefs for modes
Jason Ekstrand [Thu, 13 Dec 2018 21:01:05 +0000 (15:01 -0600)]
nir/lower_io_to_scalar: Look at derefs for modes

This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/lower_io_arrays_to_elements: Look at derefs for modes
Jason Ekstrand [Thu, 13 Dec 2018 20:33:41 +0000 (14:33 -0600)]
nir/lower_io_arrays_to_elements: Look at derefs for modes

This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/linking_helpers: Look at derefs for modes
Jason Ekstrand [Thu, 13 Dec 2018 20:31:54 +0000 (14:31 -0600)]
nir/linking_helpers: Look at derefs for modes

This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/propagate_invariant: Skip unknown vars
Jason Ekstrand [Thu, 13 Dec 2018 20:06:48 +0000 (14:06 -0600)]
nir/propagate_invariant: Skip unknown vars

If we can't find the variable from the deref, just assume it isn't
invariant and continue on.  This can happen if, for instance, we're
writing to a deref that points into an SSBO.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoRevert "nir/lower_indirect: Bail early if modes == 0"
Ian Romanick [Tue, 18 Dec 2018 02:48:17 +0000 (18:48 -0800)]
Revert "nir/lower_indirect: Bail early if modes == 0"

    "There's no point in walking the program if we're never going to
    actually lower anything."

Except we might lower compacted local arrays.  In that case, modes will
be 0, but there is still lowering to be done.

This reverts commit 7f75cf2a9408b9af562e033ef6c1d1fd15141421.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
5 years agost/dri: replace format conversion functions with single mapping table
Lucas Stach [Thu, 17 May 2018 09:03:57 +0000 (11:03 +0200)]
st/dri: replace format conversion functions with single mapping table

Each time I have to touch the buffer import/export functions in the dri
state tracker I get lost in the maze of functions converting between
DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format.

Rip it out and replace by a single table, which defines the correspondence
between the different representations.

Also this now stores all the known representations in the __DRIimageRec,
to avoid the loss of information we currently have when importing a buffer
with a fourcc, which doesn't have a corresponding dri format.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agost/dri: allow both render and sampler compatible dma-buf formats
Lucas Stach [Mon, 26 Mar 2018 14:39:28 +0000 (16:39 +0200)]
st/dri: allow both render and sampler compatible dma-buf formats

Currently all the EGL APIs are missing a way to specify how an imported
dma-buf is intended to be used. Demanding the format to be both usable
for sampling and rendering artificially restricts the list of formats a
driver is able to import.

Looking at how the Intel driver implements those DRI2 image APIs it
doesn't distinguish between render or sampler compatible formats. So
this patch aligns behavior between Intel and Gallium based drivers.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoetnaviv: use surface format directly
Lucas Stach [Wed, 14 Nov 2018 14:11:07 +0000 (15:11 +0100)]
etnaviv: use surface format directly

There is no need to do the detour over the resource behind the
surface to get the format. Use the surface format directly.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
5 years agomeson: Add toggle for glx-direct
Dylan Baker [Tue, 4 Dec 2018 18:06:08 +0000 (10:06 -0800)]
meson: Add toggle for glx-direct

GNU Hurd needs to turn off glx-direct, rather than special case it,
we'll just add a toggle.

CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomeson: Add support for gnu hurd
Dylan Baker [Tue, 4 Dec 2018 17:56:30 +0000 (09:56 -0800)]
meson: Add support for gnu hurd

CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomeson: remove duplicate definition
Dylan Baker [Tue, 4 Dec 2018 17:48:42 +0000 (09:48 -0800)]
meson: remove duplicate definition

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomeson: Fix ppc64 little endian detection
Dylan Baker [Tue, 4 Dec 2018 17:28:10 +0000 (09:28 -0800)]
meson: Fix ppc64 little endian detection

Old versions of meson returned ppc64le as the cpu_family for little
endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't
work in that case. This generalizes the check to work for both old and
new versions of meson.

Fixes: 34bbb24ce7702658cdc4e9d34a650e169716c39e
       ("meson: Add support for ppc assembly/optimizations")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoanv: Bump the patch version to 96
Jason Ekstrand [Mon, 17 Dec 2018 16:48:47 +0000 (10:48 -0600)]
anv: Bump the patch version to 96

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoi965: Don't override subslice count to 4 on Gen11.
Kenneth Graunke [Fri, 14 Dec 2018 23:38:29 +0000 (15:38 -0800)]
i965: Don't override subslice count to 4 on Gen11.

Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up.  Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agointel/compiler: More peephole_select for pre-Gen6
Ian Romanick [Tue, 19 Jun 2018 00:09:41 +0000 (17:09 -0700)]
intel/compiler: More peephole_select for pre-Gen6

No shader-db changes on any Gen6+ platform.

All of the shaders with cycles hurt by more than ~2% are from Master of
Orion.  All of the shaders have instructions helped.  It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things.  These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.

Iron Lake
total instructions in shared programs: 8214327 -> 8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.

total cycles in shared programs: 187736850 -> 187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs)   min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel)   min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 5044014 -> 5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.

total cycles in shared programs: 128143042 -> 128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs)   min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel)   min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agonir/opt_peephole_select: Don't peephole_select expensive math instructions
Ian Romanick [Mon, 18 Jun 2018 23:11:55 +0000 (16:11 -0700)]
nir/opt_peephole_select: Don't peephole_select expensive math instructions

On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agointel/compiler: More peephole select
Ian Romanick [Wed, 23 May 2018 01:56:41 +0000 (18:56 -0700)]
intel/compiler: More peephole select

Shader-db results:

The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.

v2: Fix typo in comment noticed by Caio.

v3: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15072761 -> 15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs)   min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel)   min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.

total cycles in shared programs: 369738930 -> 369535732 (-0.05%)
cycles in affected programs: 68027851 -> 67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs)   min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel)   min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.

total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1

total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1

Ivy Bridge
total instructions in shared programs: 12021190 -> 11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.

total cycles in shared programs: 179077826 -> 178570196 (-0.28%)
cycles in affected programs: 63205667 -> 62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs)   min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10852569 -> 10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.

total cycles in shared programs: 154732047 -> 154608941 (-0.08%)
cycles in affected programs: 4063110 -> 3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs)   min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel)   min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.

No change on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agonir/opt_peephole_select: Don't try to remove flow control around indirect loads
Ian Romanick [Wed, 27 Jun 2018 18:41:19 +0000 (11:41 -0700)]
nir/opt_peephole_select: Don't try to remove flow control around indirect loads

That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoi965/vec4: Propagate conditional modifiers from more compares to other compares
Ian Romanick [Wed, 20 Jun 2018 01:09:05 +0000 (18:09 -0700)]
i965/vec4: Propagate conditional modifiers from more compares to other compares

If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component.  The specific
case that I saw was:

    cmp.l.f0(8)     g42<1>.xF       g61<4>.xF       (abs)g18<4>.zF
    ...
    cmp.nz.f0(8)    null<1>D        g42<4>.xD       0D

In this case we can just delete the second CMP.

No changes on Broadwell or Skylake because they do not use the vec4
backend.  Also no changes on GM45 or Iron Lake.

Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.

total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs)   min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoi965/fs: Eliminate unary op on operand of compare-with-zero
Ian Romanick [Fri, 22 Jun 2018 15:34:03 +0000 (08:34 -0700)]
i965/fs: Eliminate unary op on operand of compare-with-zero

The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.

All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.

total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.

Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0

total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0

GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoi965/vec4/dce: Don't narrow the write mask if the flags are used
Ian Romanick [Thu, 21 Jun 2018 00:18:30 +0000 (17:18 -0700)]
i965/vec4/dce: Don't narrow the write mask if the flags are used

In an instruction sequence like

            cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
    (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD

The other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.

To my surprise, nothing in shader-db or any test suite appears to hit
this.  However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen.  This fix prevents a couple dozen
regressions in that patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
5 years agoi965/vec4: Silence unused parameter warnings in vec4 compiler tests
Ian Romanick [Thu, 21 Jun 2018 21:22:02 +0000 (14:22 -0700)]
i965/vec4: Silence unused parameter warnings in vec4 compiler tests

src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction* copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg* register_coalesce_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction* register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg* cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction* cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoradv: Fix multiview depth clears
Bas Nieuwenhuizen [Sun, 16 Dec 2018 19:31:10 +0000 (20:31 +0100)]
radv: Fix multiview depth clears

We were not using the view mask for depth clears, causing only the
first view to be cleared.

Fixes: 2e86f6b2597 "radv: Add multiview clears."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Remove redundant format check.
Bas Nieuwenhuizen [Mon, 30 Jul 2018 13:43:19 +0000 (15:43 +0200)]
radv: Remove redundant format check.

The switch directly after the check has a default case that returns
NULL too, so the effective return value is not changed. Also this
check is wrong once we start dealing with formats introduced by an
extension (e.g. YUV formats).

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir: Fix clamping of uints for image store lowering.
Eric Anholt [Sat, 15 Dec 2018 00:53:18 +0000 (16:53 -0800)]
nir: Fix clamping of uints for image store lowering.

I botched some copy-and-paste and clamped to signed int max instead of
uint max.  Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.

Fixes: d3e046e76c06 ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agov3d: Fix the argument type for vir_BRANCH().
Eric Anholt [Sun, 16 Dec 2018 06:59:23 +0000 (22:59 -0800)]
v3d: Fix the argument type for vir_BRANCH().

Apparently this has been spewing warnings for Jason's clang, but not my
gcc.

5 years agovc4: Reuse nir_format_convert.h in our blend lowering.
Eric Anholt [Sun, 16 Dec 2018 03:31:22 +0000 (19:31 -0800)]
vc4: Reuse nir_format_convert.h in our blend lowering.

These helpers came along after and have effectively the same
implementation.

5 years agoradv: report Vulkan version 1.1.90 for real
Samuel Pitoiset [Fri, 14 Dec 2018 09:23:22 +0000 (10:23 +0100)]
radv: report Vulkan version 1.1.90 for real

I thought the value was correctly propagated, but actually not.

Fixes: 2ac6d55f38c ("radv: bump reported version to 1.1.90")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoanv,radv: Re-enable VK_EXT_pci_bus_info
Jason Ekstrand [Mon, 17 Dec 2018 15:36:03 +0000 (09:36 -0600)]
anv,radv: Re-enable VK_EXT_pci_bus_info

Now at version 2 with the fixed header.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan: Update the XML and headers to 1.1.96
Jason Ekstrand [Mon, 17 Dec 2018 15:35:17 +0000 (09:35 -0600)]
vulkan: Update the XML and headers to 1.1.96

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoradv: switch from nir_bcsel to nir_b32csel
Rhys Perry [Mon, 17 Dec 2018 13:51:09 +0000 (13:51 +0000)]
radv: switch from nir_bcsel to nir_b32csel

Fixes: 191a1dce928 ('nir: Add 1-bit Boolean opcodes')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: don't set surf_index for stencil-only images
Rhys Perry [Fri, 14 Dec 2018 16:47:09 +0000 (16:47 +0000)]
radv: don't set surf_index for stencil-only images

Fixes: f8d5b377c8b ('radv: set cb base tile swizzles for MRT speedups (v4)')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108116
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir: Release per-block metadata in nir_sweep
Ian Romanick [Mon, 26 Nov 2018 23:12:30 +0000 (15:12 -0800)]
nir: Release per-block metadata in nir_sweep

nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.

mean soft fp64 using uint64:   1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11:    63,555,571 =>    61,889,811
deus ex mankind divided 148:      62,845,304 =>    62,829,640
deus ex mankind divided 2890:     71,922,686 =>    71,922,686
dirt showdown 676:                69,238,607 =>    69,238,607
dolphin ubershaders 210:          77,822,072 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Fix holes in nir_instr
Ian Romanick [Wed, 31 Oct 2018 02:15:18 +0000 (19:15 -0700)]
nir: Fix holes in nir_instr

Found using pahole.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,555,571
deus ex mankind divided 148:      62,887,728 =>    62,845,304
deus ex mankind divided 2890:     72,399,750 =>    71,922,686
dirt showdown 676:                69,464,023 =>    69,238,607
dolphin ubershaders 210:          78,359,728 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/phi_builder: Use per-value hash table to store [block] -> def mapping
Ian Romanick [Wed, 21 Nov 2018 21:46:51 +0000 (13:46 -0800)]
nir/phi_builder: Use per-value hash table to store [block] -> def mapping

Replace the old array in each value with a hash table in each value.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
deus ex mankind divided 148:      62,887,728 =>    62,887,728
deus ex mankind divided 2890:     72,402,222 =>    72,399,750
dirt showdown 676:                74,466,431 =>    69,464,023
dolphin ubershaders 210:         109,630,376 =>    78,359,728

Run-time change for a full run on shader-db on my Haswell desktop (with
-march=native) is 1.22245% +/- 0.463879% (n=11).  This is about +2.9
seconds on a 237 second run.  The first time I sent this version of this
patch out, the run-time data was quite different.  I had misconfigured
the script that ran the test, and none of the tests from higher GLSL
versions were run.  These are generally more complex shaders, and they
are more affected by this change.

The previous version of this patch used a single hash table for the
whole phi builder.  The mapping was from [value, block] -> def, so a
separate allocation was needed for each [value, block] tuple.  There was
quite a bit of per-allocation overhead (due to ralloc), so the patch was
followed by a patch that added the use of the slab allocator.  The
results of those two patches was not quite as good:

mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
deus ex mankind divided 148:      62,887,728 =>    62,887,728
deus ex mankind divided 2890:     72,402,222 =>    72,402,222 *
dirt showdown 676:                74,466,431 =>    72,443,591 *
dolphin ubershaders 210:         109,630,376 =>    81,034,320 *

The * denote tests that are better now.  In the tests that are the same
in both patches, the "after" peak memory usage was at a different
location.  I did not check the local peaks.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/hash_table: Add _mesa_hash_table_init function
Ian Romanick [Wed, 21 Nov 2018 21:45:52 +0000 (13:45 -0800)]
util/hash_table: Add _mesa_hash_table_init function

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agost/nir: Use nir_src_as_uint for tokens
Jason Ekstrand [Fri, 14 Dec 2018 16:56:16 +0000 (10:56 -0600)]
st/nir: Use nir_src_as_uint for tokens

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradv: Fix a stupid if in gather_intrinsic_info
Jason Ekstrand [Sun, 16 Dec 2018 06:59:08 +0000 (00:59 -0600)]
radv: Fix a stupid if in gather_intrinsic_info

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/algebraic: Add some optimizations for D3D-style Booleans
Jason Ekstrand [Fri, 19 Oct 2018 21:58:36 +0000 (16:58 -0500)]
nir/algebraic: Add some optimizations for D3D-style Booleans

D3D Booleans use a 32-bit 0/-1 representation.  Because this previously
matched NIR exactly, we didn't have to really optimize for it.  Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
    instructions in affected programs: 2457021 -> 2288154 (-6.87%)
    helped: 8318
    HURT: 10

    total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
    cycles in affected programs: 151029683 -> 137186984 (-9.17%)
    helped: 7749
    HURT: 682

    total loops in shared programs: 4431 -> 4399 (-0.72%)
    loops in affected programs: 32 -> 0
    helped: 21
    HURT: 0

    total spills in shared programs: 10290 -> 10051 (-2.32%)
    spills in affected programs: 2532 -> 2293 (-9.44%)
    helped: 18
    HURT: 18

    total fills in shared programs: 22203 -> 21732 (-2.12%)
    fills in affected programs: 3319 -> 2848 (-14.19%)
    helped: 18
    HURT: 18

Note that a large chunk of the improvement fixing regressions caused by
switching to 1-bit Booleans.  Previously, our ability to optimize D3D
booleans was improved by using the D3D representation directly in NIR.
Now that NIR does 1-bit bools, we need a few more optimizations.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/algebraic: Optimize 1-bit Booleans
Jason Ekstrand [Thu, 6 Dec 2018 03:36:10 +0000 (21:36 -0600)]
nir/algebraic: Optimize 1-bit Booleans

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Switch to using 1-bit Booleans for almost everything
Jason Ekstrand [Fri, 19 Oct 2018 16:14:47 +0000 (11:14 -0500)]
nir: Switch to using 1-bit Booleans for almost everything

This is a squash of a few distinct changes:

    glsl,spirv: Generate 1-bit Booleans

    Revert "Use 32-bit opcodes in the NIR producers and optimizations"

    Revert "nir/builder: Generate 32-bit bool opcodes transparently"

    nir/builder: Generate 1-bit Booleans in nir_build_imm_bool

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add a bool to int32 lowering pass
Jason Ekstrand [Thu, 18 Oct 2018 17:04:09 +0000 (12:04 -0500)]
nir: Add a bool to int32 lowering pass

We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add 1-bit Boolean opcodes
Jason Ekstrand [Fri, 19 Oct 2018 15:40:20 +0000 (10:40 -0500)]
nir: Add 1-bit Boolean opcodes

We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/algebraic: Generalize an optimization
Jason Ekstrand [Thu, 6 Dec 2018 18:56:33 +0000 (12:56 -0600)]
nir/algebraic: Generalize an optimization

This just makes it nicely scale across bit sizes.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/large_constants: Properly handle 1-bit bools
Jason Ekstrand [Thu, 6 Dec 2018 17:20:26 +0000 (11:20 -0600)]
nir/large_constants: Properly handle 1-bit bools

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add support for 1-bit data types
Jason Ekstrand [Thu, 18 Oct 2018 16:59:40 +0000 (11:59 -0500)]
nir: Add support for 1-bit data types

This commit adds support for 1-bit Booleans and integers.  Booleans
obviously take a value of true or false.  Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1.  1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/constant_expressions: Rework Boolean handling
Jason Ekstrand [Mon, 22 Oct 2018 20:35:06 +0000 (15:35 -0500)]
nir/constant_expressions: Rework Boolean handling

This commit contains three related changes.  First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions.  Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool.  Third, for destinations, we use a signed integer
type and just do -(int)bool_val which will give us the 0/-1 behavior we
want and neatly scales to all bit widths.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Rename Boolean-related opcodes to include 32 in the name
Jason Ekstrand [Thu, 18 Oct 2018 16:44:38 +0000 (11:44 -0500)]
nir: Rename Boolean-related opcodes to include 32 in the name

This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/algebraic: Make an optimization more specific
Jason Ekstrand [Thu, 6 Dec 2018 03:32:11 +0000 (21:32 -0600)]
nir/algebraic: Make an optimization more specific

Later in this series, bool is not going to imply 32-bit.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Drop support for lower_b2f
Jason Ekstrand [Thu, 6 Dec 2018 03:26:46 +0000 (21:26 -0600)]
nir: Drop support for lower_b2f

This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/algebraic: Optimize x2b(xneg(a)) -> a
Jason Ekstrand [Thu, 6 Dec 2018 19:02:01 +0000 (13:02 -0600)]
nir/algebraic: Optimize x2b(xneg(a)) -> a

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15072525 -> 15072525 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This helps prevent regressions in later commits.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/constant_folding: Fix source bit size logic
Jason Ekstrand [Thu, 6 Dec 2018 20:31:20 +0000 (14:31 -0600)]
nir/constant_folding: Fix source bit size logic

Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.

Fixes: 35baee5dce5 "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289d "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/tgsi: Use nir_bany in ttn_kill_if
Jason Ekstrand [Sat, 15 Dec 2018 18:29:32 +0000 (12:29 -0600)]
nir/tgsi: Use nir_bany in ttn_kill_if

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/lower_idiv: Use ilt instead of bit twiddling
Jason Ekstrand [Sun, 16 Dec 2018 06:42:01 +0000 (00:42 -0600)]
nir/lower_idiv: Use ilt instead of bit twiddling

The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative.  This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agov3d: Use the original bit size when scalarizing uniform loads.
Eric Anholt [Sun, 16 Dec 2018 06:17:52 +0000 (22:17 -0800)]
v3d: Use the original bit size when scalarizing uniform loads.

Prevents a regression in jekstrand's 1-bit series.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovc4: Use the original bit size when scalarizing uniform loads.
Eric Anholt [Sun, 16 Dec 2018 03:42:57 +0000 (19:42 -0800)]
vc4: Use the original bit size when scalarizing uniform loads.

Prevents a regression in jekstrand's 1-bit series.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoac: split 16-bit ssbo loads that may not be dword aligned
Rhys Perry [Thu, 13 Dec 2018 17:03:23 +0000 (17:03 +0000)]
ac: split 16-bit ssbo loads that may not be dword aligned

Fixes: 7e7ee826982 ('ac: add support for 16bit buffer loads')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: refactor visit_load_buffer
Rhys Perry [Wed, 5 Dec 2018 13:42:47 +0000 (13:42 +0000)]
ac: refactor visit_load_buffer

This is so that we can split different types of loads more easily.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir: fix constness in nir_intrinsic_align()
Rhys Perry [Fri, 14 Dec 2018 11:08:51 +0000 (11:08 +0000)]
nir: fix constness in nir_intrinsic_align()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoclover: Fix build after clang r348827
Jan Vesely [Thu, 13 Dec 2018 20:53:42 +0000 (15:53 -0500)]
clover: Fix build after clang r348827

CodeGenOptions were moved to Basic.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
CC: mesa-stable@lists.freedesktop.org
5 years agoglx: Fix compilation with GLX_USE_WINDOWSGL
Jon Turney [Fri, 14 Dec 2018 13:20:10 +0000 (13:20 +0000)]
glx: Fix compilation with GLX_USE_WINDOWSGL

Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical
(because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs)

Include <sys/time.h> again, as functions prototyped by it are used in
the GLX_USE_WINDOWSGL path.

Make the include guard around the __glxGetMscRate() definition match the
one at it's declaration again, as it's referenced from dri_common.c
which is built for GLX_USE_WINDOWSGL.

Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms")
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agov3d: Drop in a bunch of notes about performance improvement opportunities.
Eric Anholt [Fri, 14 Dec 2018 22:46:48 +0000 (14:46 -0800)]
v3d: Drop in a bunch of notes about performance improvement opportunities.

These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.

5 years agov3d: Do uniform pretty-printing in the QPU dump.
Eric Anholt [Wed, 12 Dec 2018 07:02:37 +0000 (23:02 -0800)]
v3d: Do uniform pretty-printing in the QPU dump.

If you're trying to trace what's going on in a QPU dump, this will
definitely help you find your way.

5 years agov3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code.
Eric Anholt [Wed, 12 Dec 2018 06:43:56 +0000 (22:43 -0800)]
v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code.

This will be a lot easier than my usual "38400.000000?  that looks like a
viewport scale" decoding strategy.

5 years agov3d: Move uniform pretty-printing to its own helper function.
Eric Anholt [Wed, 12 Dec 2018 06:31:41 +0000 (22:31 -0800)]
v3d: Move uniform pretty-printing to its own helper function.

I want to reuse it in the QPU dump.

5 years agov3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms().
Eric Anholt [Wed, 12 Dec 2018 06:42:13 +0000 (22:42 -0800)]
v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms().

Follows 3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top
of the loop.") which showed a large performance win for vc4, but also
cleans up the code a decent bit.

5 years agov3d: Avoid assertion failures when removing end-of-shader instructions.
Eric Anholt [Tue, 11 Dec 2018 22:07:52 +0000 (14:07 -0800)]
v3d: Avoid assertion failures when removing end-of-shader instructions.

After generating VIR, we leave c->cursor pointing at the end of the
shader.  If the shader had dead code at the end (for example from preamble
instructions in a shader with no side effects), we would assertion fail
that we were leaving the cursor pointing at freed memory.  Since anything
following DCE should be setting up a new cursor anyway, just clear the
cursor at the start.

5 years agov3d: Add support for draw indirect for GLES3.1.
Eric Anholt [Sat, 8 Dec 2018 05:37:26 +0000 (21:37 -0800)]
v3d: Add support for draw indirect for GLES3.1.

In trying to enable compute shaders, I found that a bunch of deqp-gles31's
compute stuff wanted to interact with indirect dispatch.  This was easy to
do on its own.

5 years agov3d: Add missing flagging of SYNCB as a TSY op.
Eric Anholt [Tue, 11 Dec 2018 00:47:13 +0000 (16:47 -0800)]
v3d: Add missing flagging of SYNCB as a TSY op.

Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
5 years agov3d: Make sure that a thrsw doesn't split a multop from its umul24.
Eric Anholt [Wed, 12 Dec 2018 00:14:03 +0000 (16:14 -0800)]
v3d: Make sure that a thrsw doesn't split a multop from its umul24.

The thrsw will invalidate rtop, just like accumulators and flags.  Caught
by simulator assertions in CS imulextended/umulextended tests.

Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
5 years agov3d: Add safety checks for resource_create().
Eric Anholt [Fri, 30 Nov 2018 01:06:25 +0000 (17:06 -0800)]
v3d: Add safety checks for resource_create().

This should ease my debugging next time I screw it up.

5 years agov3d: Add support for texturing from linear.
Eric Anholt [Thu, 29 Nov 2018 01:22:45 +0000 (17:22 -0800)]
v3d: Add support for texturing from linear.

Just like vc4, we have to support linear shared BOs for X11 on arbitrary
displays.  When we're faced with a request to texture from one of those,
make a shadow image that we copy using the TFU at the start of the draw
call.

5 years agov3d: Add support for using the TFU to do some blits.
Eric Anholt [Thu, 29 Nov 2018 01:59:51 +0000 (17:59 -0800)]
v3d: Add support for using the TFU to do some blits.

This will be useful in particular for blits from raster to UIF for X11.

5 years agov3d: Don't forget to bump the number of writes when doing TFU ops.
Eric Anholt [Thu, 13 Dec 2018 23:47:29 +0000 (15:47 -0800)]
v3d: Don't forget to bump the number of writes when doing TFU ops.

generatemipmap is just filling out the rest of the mipmap that's already
been written (by a mapping or a draw call), so it didn't matter.  As I
reuse the TFU code for linear-to-UIF conversions, it'll start mattering.

5 years agov3d: Set up the right stride for raster TFU.
Eric Anholt [Thu, 13 Dec 2018 23:46:51 +0000 (15:46 -0800)]
v3d: Set up the right stride for raster TFU.

I didn't have any raster images in the generatemipmap path, so the
pixels-vs-bytes mixup didn't matter here.

5 years agov3d: Don't forget to wait for our TFU job before rendering from it.
Eric Anholt [Fri, 14 Dec 2018 17:43:15 +0000 (09:43 -0800)]
v3d: Don't forget to wait for our TFU job before rendering from it.

Otherwise we may race to read old contents.  This didn't show up in the
CTS and piglit for me, but it did once I started using the TFU to do
linear->UIF blits for X11.

Fixes: 2ebca177dc18 ("v3d: Use the TFU to do generatemipmap.")
5 years agonvc0: always keep TSC slot 0 bound to fix TXF
Ilia Mirkin [Sun, 2 Dec 2018 18:19:01 +0000 (13:19 -0500)]
nvc0: always keep TSC slot 0 bound to fix TXF

Same as on nv50, the TXF op always uses the TSC bound to slot 0,
returning blank values if nothing is bound.

An earlier change arranges for the TSC entries list to always have valid
data at entry 0, so here we just make use of it.

Fixes arb_texture_buffer_object-subdata-sync among others.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agonvc0: replace use of explicit default_tsc with entry 0
Ilia Mirkin [Sun, 2 Dec 2018 17:27:23 +0000 (12:27 -0500)]
nvc0: replace use of explicit default_tsc with entry 0

This was used for implementing FBFETCH. However that uses TXF, which
doesn't do much with a TSC. The only important bit is that sRGB-decoding
works as expected, which we can achieve since all samplers we ever
generate enable sRGB-decoding. Always point to entry 0 in the TSC table,
and ensure that even before it ever gets initialized, the sRGB-decoding
enable bit is set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agofreedreno/a6xx: fix corrupted uniforms
Rob Clark [Fri, 14 Dec 2018 19:35:54 +0000 (14:35 -0500)]
freedreno/a6xx: fix corrupted uniforms

For older gen's fd_wfi() is used to conditionally insert a WFI if there
hasn't already been one since last draw.  But this doesn't work out well
with stateobj since the order the stateobj is evaluated might not be
what you expect.  (Ie. stateobj might not be evaluated until a later
draw if there is no geometry from the current draw in a given tile.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agopci_ids: add new vega20 pci id
Alex Deucher [Fri, 7 Dec 2018 21:10:33 +0000 (16:10 -0500)]
pci_ids: add new vega20 pci id

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agopci_ids: add new vega10 pci ids
Alex Deucher [Fri, 7 Dec 2018 21:09:16 +0000 (16:09 -0500)]
pci_ids: add new vega10 pci ids

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agoi965/gen9: Add workarounds for object preemption.
Rafael Antognolli [Mon, 29 Oct 2018 17:19:54 +0000 (10:19 -0700)]
i965/gen9: Add workarounds for object preemption.

Gen9 hardware requires some workarounds to disable preemption depending
on the type of primitive being emitted.

We implement this by adding a function that checks the primitive type
and number of instances right before the 3DPRIMITIVE.

For now, we just ignore blorp.  The only primitive it emits is
3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
safely leave preemption enabled when it happens. Or it will be disabled
by a previous 3DPRIMITIVE, which should be fine too.

v3:
 - Apply missing workarounds for instanced rendering and line loop (Ken)
 - Move workaround code to brw_draw_single_prim()

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoi965/gen10+: Enable object level preemption.
Rafael Antognolli [Mon, 29 Oct 2018 17:19:53 +0000 (10:19 -0700)]
i965/gen10+: Enable object level preemption.

Set bit when initializing context.

v3:
 - Always toggle preemption bool to false before enabling it for the
 first time, so the state gets emitted (Chris Wilson).
 - Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agointel/genxml: Add register for object preemption.
Rafael Antognolli [Mon, 29 Oct 2018 17:19:52 +0000 (10:19 -0700)]
intel/genxml: Add register for object preemption.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoutil/slab: Rename slab_mempool typed parameters to mempool
Ian Romanick [Mon, 26 Nov 2018 18:28:02 +0000 (10:28 -0800)]
util/slab: Rename slab_mempool typed parameters to mempool

Now everything with type 'struct slab_child_pool *' is name pool, and
everything with type 'struct slab_mempool *' is named mempool.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too
Ian Romanick [Tue, 30 Oct 2018 16:46:26 +0000 (09:46 -0700)]
nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoetnaviv: drop redundant ctx function parameter
Christian Gmeiner [Wed, 12 Dec 2018 13:45:56 +0000 (14:45 +0100)]
etnaviv: drop redundant ctx function parameter

There is no need to have an extra ctx paramter as all the other
parameters carry all the needed information.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
5 years agogenxml: Consistently use a numeric "MOCS" field
Kenneth Graunke [Tue, 11 Dec 2018 08:34:11 +0000 (00:34 -0800)]
genxml: Consistently use a numeric "MOCS" field

When we first started using genxml, we decided to represent MOCS as an
actual structure, and pack values.  However, in many places, it was more
convenient to use a numeric value rather than treating it as a struct,
so we added secondary setters in a bunch of places as well.

We were not entirely consistent, either.  Some places only had one.
Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens
only had the struct-based setters.  The names were sometimes "Constant
Buffer Object Control State" instead of "Memory", making it harder to
find.  Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer
packet...which is a bit redundant.

On modern hardware, MOCS is simply an index into a table, but we were
still carrying around the structure with an "Index to MOCS Table" field,
in addition to the direct numeric setters.  This is clunky - we really
just want a number on new hardware.

This patch eliminates the struct-based setters, and makes the numeric
setters be consistently called "MOCS".  We leave the struct definition
around on Gen7-8 for reference purposes, but it is unused.

v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agonir: fix opt_if_loop_last_continue()
Timothy Arceri [Thu, 13 Dec 2018 23:23:27 +0000 (10:23 +1100)]
nir: fix opt_if_loop_last_continue()

The pass did not correctly handle loops ending in:

if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}

The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.

Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agofreedreno/a6xx: fix resource_copy_region()
Rob Clark [Thu, 13 Dec 2018 14:15:33 +0000 (09:15 -0500)]
freedreno/a6xx: fix resource_copy_region()

pctx->resource_copy_region() needs to fall back to sw copy for
non-renderable formats.  But previously for things that we could
not use the blitter for, would fall back to 3d.  Which won't work
if 3d can't render to the dst format either.

Instead rework things to fallback to fd_resource_copy_region(),
which will try 3d core and then fall back to memcpy().

Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: move fd_resource_copy_region()
Rob Clark [Thu, 13 Dec 2018 14:14:48 +0000 (09:14 -0500)]
freedreno: move fd_resource_copy_region()

Code-motion prep for next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: more blitter fixes
Rob Clark [Tue, 11 Dec 2018 17:36:52 +0000 (12:36 -0500)]
freedreno/a6xx: more blitter fixes

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: update generated headers
Rob Clark [Tue, 11 Dec 2018 15:59:53 +0000 (10:59 -0500)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agogallium/aux: add is_unorm() helper
Rob Clark [Tue, 11 Dec 2018 16:13:49 +0000 (11:13 -0500)]
gallium/aux: add is_unorm() helper

We already had one for is_snorm() but not unorm.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: fix blitter crash
Rob Clark [Mon, 10 Dec 2018 15:40:31 +0000 (10:40 -0500)]
freedreno/a6xx: fix blitter crash

Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot

Also fixes gpu hangs with some formats that are supported, but which we
don't know what internal-format to use for the blitter, for ex
dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot

Signed-off-by: Rob Clark <robdclark@gmail.com>