git.libre-soc.org Git - mesa.git/log

Erico Nunes [Tue, 16 Apr 2019 20:49:41 +0000 (22:49 +0200)]

nir/algebraic: add lowering for fsign

The Mali Utgard PP doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
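
The identity behind this lowering can be modeled in plain C. This is a sketch, not the NIR pattern itself, and the function names are hypothetical. It assumes the lowered fsign computes b2f(0.0 < a) - b2f(a < 0.0), so positive inputs give +1.0, negative inputs give -1.0, and zero gives 0.0 (as does NaN in this sketch). For isign, clamping to [-1, 1] is shown as one equivalent formulation.

```c
/* Hypothetical scalar model of the fsign lowering:
 * sign(a) = b2f(0.0 < a) - b2f(a < 0.0).
 * A NaN input yields 0.0 here, because both comparisons are false. */
static float lowered_fsign(float a)
{
   return (float)(0.0f < a) - (float)(a < 0.0f);
}

/* One equivalent integer formulation: clamp a to [-1, 1],
 * i.e. imin(imax(a, -1), 1). */
static int lowered_isign(int a)
{
   int t = a > -1 ? a : -1; /* imax(a, -1) */
   return t < 1 ? t : 1;    /* imin(t, 1)  */
}
```

For example, lowered_fsign(-2.5f) returns -1.0f and lowered_isign(42) returns 1.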

Brian Paul [Fri, 19 Apr 2019 14:30:27 +0000 (08:30 -0600)]

docs: s/Aptril/April/

Found by Manuel Huber. Trivial.

Erico Nunes [Tue, 16 Apr 2019 21:21:24 +0000 (23:21 +0200)]

lima/ppir: support ppir_op_ceil

Add a few missing ppir_op_ceil enum handling entries to implement
nir_op_fceil in lima ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

Bas Nieuwenhuizen [Thu, 14 Mar 2019 10:20:53 +0000 (11:20 +0100)]

radv: Support VK_EXT_inline_uniform_block.

Basically just reserve the memory in the descriptor sets.

On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.

This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.

v2: - rebased on top of master (Samuel)
    - remove the loading resources rework (Samuel)
    - only load UBO descriptors if it's a pointer (Samuel)
    - use LLVMBuildPtrToInt to avoid IR failures (Samuel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)

Samuel Pitoiset [Thu, 18 Apr 2019 07:09:55 +0000 (09:09 +0200)]

ac/nir: use the new raw/struct SSBO atomic intrinsics for comp_swap

This is actually fixed now.

This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit test will hang:

tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Samuel Pitoiset [Thu, 18 Apr 2019 07:06:49 +0000 (09:06 +0200)]

ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+

They are buggy with older LLVM versions, see r358579.

Fixes: 78c551aca1c ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Samuel Pitoiset [Thu, 18 Apr 2019 07:17:04 +0000 (09:17 +0200)]

ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+

They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.

Fixes: dd0172e865f ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Kenneth Graunke [Thu, 18 Apr 2019 21:35:26 +0000 (14:35 -0700)]

iris: Be less aggressive at postdraw work skipping

We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed.  So, we now do the cache set tracking unconditionally.

For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here.  Time will tell if this works.

Partly reverts commit 365886ebe1a54f893b688b457553eead6aa572ea, and
fixes Unigine Valley rendering on Gen9+.  Drops drawoverhead scores
by about 10-12%.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353
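
The commit contains no code at this point; the following hypothetical C model (the struct, the integer BO indices, and both postdraw functions are invented for illustration, not iris code) shows why skipping the tracking when bindings are unchanged loses the framebuffer BOs once a batch flush has emptied the cache set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BOS 64

/* Hypothetical per-batch cache set: which BOs the render cache may hold
 * data for. It is emptied whenever the batch is flushed. */
struct cache_set {
   bool present[MAX_BOS];
};

static void batch_flush(struct cache_set *set)
{
   memset(set->present, 0, sizeof(set->present));
}

static void track_bos(struct cache_set *set, const int *bos, int n)
{
   for (int i = 0; i < n; i++) {
      assert(bos[i] >= 0 && bos[i] < MAX_BOS);
      set->present[bos[i]] = true;
   }
}

/* Skipping variant: only track framebuffer BOs when bindings changed.
 * After a flush with unchanged bindings, the set stays empty. */
static void postdraw_skip_if_unchanged(struct cache_set *set,
                                       const int *fb_bos, int n,
                                       bool bindings_dirty)
{
   if (bindings_dirty)
      track_bos(set, fb_bos, n);
}

/* Unconditional variant: always re-add the framebuffer BOs. */
static void postdraw_always_track(struct cache_set *set,
                                  const int *fb_bos, int n)
{
   track_bos(set, fb_bos, n);
}
```

After a flush, only the unconditional variant repopulates the set; the price is doing that work on every draw, which is consistent with the drawoverhead drop reported above.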

Jason Ekstrand [Sat, 13 Apr 2019 21:01:50 +0000 (16:01 -0500)]

intel/fs: Account for live range lengths in spill costs

The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in overall register
pressure.
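
The heuristic is described only in words here. A hypothetical sketch of the idea, assuming a benefit of interference times live-range length divided by the number of uses (the struct fields, the weighting, and the function names are illustrative and are not Mesa's actual cost function):

```c
#include <assert.h>

/* Hypothetical per-register data, for illustration only. */
struct vreg {
   int uses;         /* spill/fill instructions spilling would add; > 0 */
   int interference; /* number of interfering nodes */
   int live_start;   /* first instruction index where the value is live */
   int live_end;     /* last instruction index where the value is live */
};

/* Interference-only benefit: favors cheap registers regardless of how
 * long they live. */
static float spill_benefit_old(const struct vreg *r)
{
   assert(r->uses > 0);
   return (float)r->interference / (float)r->uses;
}

/* Length-aware benefit: scale by the live-range length so long-lived
 * values become preferred spill candidates. */
static float spill_benefit_new(const struct vreg *r)
{
   assert(r->uses > 0);
   int length = r->live_end - r->live_start + 1;
   return (float)r->interference * (float)length / (float)r->uses;
}

/* Return the index of the register with the highest benefit. */
static int pick_spill(const struct vreg *regs, int n,
                      float (*benefit)(const struct vreg *))
{
   int best = 0;
   for (int i = 1; i < n; i++)
      if (benefit(&regs[i]) > benefit(&regs[best]))
         best = i;
   return best;
}
```

With a short-lived, single-use register and a long-lived, four-use register of equal interference, the interference-only benefit picks the short-lived one, while the length-aware benefit picks the long-lived one.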

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst-hurt
shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP, each hurt by around 13% in fills and 9% in spills.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.
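
The percentages in these shader-db summaries can be reproduced by hand. A tiny check, assuming the relative change is computed as (after - before) / before; the function name is hypothetical:

```c
#include <assert.h>

/* Relative change in percent, assuming the summaries compute
 * (after - before) / before. */
static double percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For example, percent_change(23664, 12050) evaluates to about -49.08, matching the Kaby Lake "total spills" line.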

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

Gurchetan Singh [Sat, 15 Dec 2018 00:07:19 +0000 (16:07 -0800)]

virgl/vtest: bump up protocol version + support encoded transfers

This more accurately reflects what the drm winsys does.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Sat, 15 Dec 2018 00:36:07 +0000 (16:36 -0800)]

virgl/vtest: wait after issuing a transfer get

Otherwise, there are artifacts when running Unigine Valley with
protocol version 2.

We can get away with not waiting for most buffers, but let's
be conservative.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Thu, 13 Dec 2018 02:01:06 +0000 (18:01 -0800)]

virgl/vtest: modify sending and receiving data for shared memory

We need to copy the shared memory region to the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Wed, 12 Dec 2018 17:49:35 +0000 (09:49 -0800)]

virgl/vtest: receive and handle shared memory fd

The only tricky part is that with protocol 0 we can have either
a display target or a resource backing store. With protocol
2 we can have both. Make the map/unmap functions only deal
with the resource backing store.

v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Wed, 12 Dec 2018 18:08:06 +0000 (10:08 -0800)]

virgl/vtest: plumb support for shared memory

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Wed, 12 Dec 2018 01:01:34 +0000 (17:01 -0800)]

virgl/vtest: add utilities for receiving fds

v2: recieve --> receive (airlied@)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
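
The commit does not show the helpers themselves. The standard POSIX way to pass a descriptor over a UNIX-domain socket is sendmsg()/recvmsg() with an SCM_RIGHTS control message; a minimal sketch of that general technique follows. The function names are hypothetical, and the one-byte payload is an assumption of this sketch, not part of the vtest protocol.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control-message buffer sized and aligned for exactly one int fd. */
union fd_cmsg {
   char buf[CMSG_SPACE(sizeof(int))];
   struct cmsghdr align;
};

/* Send fd together with a one-byte payload (stream sockets need at least
 * one byte of ordinary data to carry ancillary data). 0 on success. */
static int send_fd(int sock, int fd)
{
   char byte = 0;
   struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
   union fd_cmsg u;
   struct msghdr msg;

   memset(&u, 0, sizeof(u));
   memset(&msg, 0, sizeof(msg));
   msg.msg_iov = &iov;
   msg.msg_iovlen = 1;
   msg.msg_control = u.buf;
   msg.msg_controllen = sizeof(u.buf);

   struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
   cmsg->cmsg_level = SOL_SOCKET;
   cmsg->cmsg_type = SCM_RIGHTS;
   cmsg->cmsg_len = CMSG_LEN(sizeof(int));
   memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

   return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the payload byte and a single fd. Returns the new fd or -1. */
static int receive_fd(int sock)
{
   char byte;
   struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
   union fd_cmsg u;
   struct msghdr msg;

   memset(&msg, 0, sizeof(msg));
   msg.msg_iov = &iov;
   msg.msg_iovlen = 1;
   msg.msg_control = u.buf;
   msg.msg_controllen = sizeof(u.buf);

   if (recvmsg(sock, &msg, 0) <= 0)
      return -1;

   struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
   if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
       cmsg->cmsg_type != SCM_RIGHTS ||
       cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
      return -1;

   int fd;
   memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
   return fd;
}
```

The received descriptor is a new fd number in the receiving process that refers to the same open file, which is what lets a client map shared memory created by the other side.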

Gurchetan Singh [Wed, 12 Dec 2018 23:43:43 +0000 (15:43 -0800)]

virgl/vtest: execute a transfer_get when flushing the front buffer

This just moves everything to a helper function -- "flush_front_buffer"
will be used later.

virgl_vtest_resource_map / virgl_vtest_resource_unmap already take
care to map the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Gurchetan Singh [Tue, 16 Apr 2019 03:36:54 +0000 (20:36 -0700)]

virgl: wait after a flush

We really need to wait under certain circumstances, or we can end
up writing to memory the same time the host is reading.

Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans").

Test cases:
   - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata
     on vtest protocol version 2
   - Flickering during Alien Isolation

Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>

Lionel Landwerlin [Thu, 18 Apr 2019 16:39:36 +0000 (17:39 +0100)]

anv: fix uninitialized pthread cond clock domain

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 843775bab78a6b ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
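
The commit message does not spell out the fix. A common form of this bug is a condition variable left on its default clock (CLOCK_REALTIME) while timeouts are computed from CLOCK_MONOTONIC. Assuming that is the kind of mismatch involved, a minimal POSIX sketch of the consistent pattern follows; the helper names are hypothetical, not anv functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Create a condition variable whose timed waits use CLOCK_MONOTONIC
 * rather than the default CLOCK_REALTIME. Returns 0 or an error code. */
static int cond_init_monotonic(pthread_cond_t *cond)
{
   pthread_condattr_t attr;
   int ret = pthread_condattr_init(&attr);
   if (ret != 0)
      return ret;
   ret = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
   if (ret == 0)
      ret = pthread_cond_init(cond, &attr);
   pthread_condattr_destroy(&attr);
   return ret;
}

/* Wait at most timeout_ns; the caller must hold mutex. The absolute
 * deadline is read from the same clock that was set on the condvar.
 * Returns 0 when woken (possibly spuriously) or ETIMEDOUT. */
static int cond_wait_ns(pthread_cond_t *cond, pthread_mutex_t *mutex,
                        long long timeout_ns)
{
   assert(timeout_ns >= 0);
   struct timespec deadline;
   clock_gettime(CLOCK_MONOTONIC, &deadline);
   long long nsec = deadline.tv_nsec + timeout_ns % 1000000000LL;
   deadline.tv_sec += timeout_ns / 1000000000LL + nsec / 1000000000LL;
   deadline.tv_nsec = nsec % 1000000000LL;
   return pthread_cond_timedwait(cond, mutex, &deadline);
}
```

If the attribute step is skipped, the monotonic deadline is interpreted against the realtime clock, so waits can end immediately or last far longer than intended.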

Kristian H. Kristensen [Thu, 18 Apr 2019 17:31:31 +0000 (10:31 -0700)]

.gitignore: Remove autotool artifacts

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

Eric Anholt [Wed, 17 Apr 2019 21:44:44 +0000 (14:44 -0700)]

v3d: Fix atomic cmpxchg in shaders on hardware.

In what might be my first case of finding a divergence between hardware
and simpenrose for v3d 4.x, it seems that despite what the spec claims,
you actually need specific values in the TYPE field for atomic ops.

Fixes dEQP-GLES31.functional.*.compswap.*

Eric Anholt [Wed, 17 Apr 2019 21:07:20 +0000 (14:07 -0700)]

v3d: Fix an invalid reuse of flags generation from before a thrsw.

Noticed while debugging the last GLES 3.1 failure, though it doesn't seem
to affect that bug.

Jason Ekstrand [Thu, 18 Apr 2019 20:04:42 +0000 (15:04 -0500)]

anv: Drop some unneeded ANV_FROM_HANDLE for physical devices

Ever since 48ed2a7bb009618ed, we've had one at the top of the function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Jason Ekstrand [Thu, 18 Apr 2019 19:19:29 +0000 (14:19 -0500)]

anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Marek Olšák [Wed, 17 Apr 2019 15:17:18 +0000 (11:17 -0400)]

radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Ian Romanick [Tue, 18 Dec 2018 06:29:26 +0000 (22:29 -0800)]

nir/algebraic: Strength reduce some compares of x and -x

Converting the x vs -x comparison to an x vs 0 comparison enables cmod
propagation to help.

This seems to be a win everywhere except Gen7.
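
Why the rewrite is safe: for IEEE floats, x < -x holds exactly when x < 0, and the same holds for the other comparison directions, including signed zeros, infinities, and NaN (where both forms are false). The set of comparisons below is an illustrative assumption; the commit does not list which patterns were added, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Each pair: the form comparing x with -x, then the form comparing
 * x with 0.0 that it can be strength-reduced to. */
static bool lt_neg(float x)  { return x < -x; }
static bool lt_zero(float x) { return x < 0.0f; }
static bool neg_lt(float x)  { return -x < x; }
static bool zero_lt(float x) { return 0.0f < x; }
static bool ge_neg(float x)  { return x >= -x; }
static bool ge_zero(float x) { return x >= 0.0f; }

/* True when every pair agrees for this input. */
static bool identities_hold(float x)
{
   return lt_neg(x) == lt_zero(x) &&
          neg_lt(x) == zero_lt(x) &&
          ge_neg(x) == ge_zero(x);
}
```

identities_hold() is true for negative, positive, and signed-zero inputs as well as infinities and NaN, provided the code is not compiled with fast-math options that relax IEEE semantics.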

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick [Tue, 18 Dec 2018 05:34:11 +0000 (21:34 -0800)]

nir/algebraic: Fix some 1-bit Boolean weirdness

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick [Thu, 6 Sep 2018 03:45:19 +0000 (20:45 -0700)]

nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel

All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple of similar patterns, but
the effect on cycles was generally negative.  It may be worth revisiting
this later.
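
The commit does not quote the pattern. As an illustration of why an AND with a Boolean can act as a select: if the Boolean is materialized as 0 or ~0 (all bits set), then AND-ing it with x yields either x or 0, which is exactly bcsel(b, x, 0). The encoding and the function names are assumptions of this sketch, not the exact rule added to nir_opt_algebraic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: false -> 0, true -> ~0 (all bits set). */
static uint32_t bool_mask(bool b)
{
   return b ? ~0u : 0u;
}

/* iand form: mask AND x. */
static uint32_t iand_form(bool b, uint32_t x)
{
   return bool_mask(b) & x;
}

/* bcsel form: select x when b is true, otherwise 0. */
static uint32_t bcsel_form(bool b, uint32_t x)
{
   return b ? x : 0u;
}
```

Both forms agree for every x and both values of b under that encoding.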

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick [Thu, 22 Feb 2018 02:30:20 +0000 (18:30 -0800)]

nir/algebraic: Recognize open-coded copysign(1.0, a)

All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spills, and fills were hurt.
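
For reference, copysign(1.0, a) is +1.0 or -1.0 depending only on the sign bit of a, so unlike sign(a) it never returns 0.0 (and -0.0 gives -1.0). A bit-level C sketch, assuming 32-bit IEEE floats; the function name is hypothetical and this is not the NIR pattern from this commit.

```c
#include <stdint.h>
#include <string.h>

/* Keep a's sign bit and OR in the bit pattern of 1.0f (0x3f800000). */
static float copysign_one(float a)
{
   uint32_t bits;
   memcpy(&bits, &a, sizeof(bits));
   bits = (bits & 0x80000000u) | 0x3f800000u;
   float r;
   memcpy(&r, &bits, sizeof(r));
   return r;
}
```

For example, copysign_one(-3.0f) returns -1.0f and copysign_one(0.0f) returns 1.0f.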

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick [Tue, 26 Jun 2018 02:55:31 +0000 (19:55 -0700)]

intel/fs: Generate better code for fsign multiplied by a value

v2: Rebase on v2 changes in previous two commits.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

shader-db results:

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15297100 -> 15282141 (-0.10%)
instructions in affected programs: 956685 -> 941726 (-1.56%)
helped: 4527
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2
helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37%
95% mean confidence interval for instructions value: -3.48 -3.12
95% mean confidence interval for instructions %-change: -1.88% -1.81%
Instructions are helped.

total cycles in shared programs: 372809551 -> 372597886 (-0.06%)
cycles in affected programs: 13645512 -> 13433847 (-1.55%)
helped: 4362
HURT: 125
helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28
helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39%
HURT stats (abs)   min: 1 max: 1836 x̄: 76.90 x̃: 28
HURT stats (rel)   min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42%
95% mean confidence interval for cycles value: -50.98 -43.37
95% mean confidence interval for cycles %-change: -2.67% -2.55%
Cycles are helped.

total spills in shared programs: 23465 -> 23463 (<.01%)
spills in affected programs: 42 -> 40 (-4.76%)
helped: 1
HURT: 0

total fills in shared programs: 31766 -> 31763 (<.01%)
fills in affected programs: 69 -> 66 (-4.35%)
helped: 1
HURT: 0

Haswell
total instructions in shared programs: 13839992 -> 13828311 (-0.08%)
instructions in affected programs: 712503 -> 700822 (-1.64%)
helped: 3477
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52%
95% mean confidence interval for instructions value: -3.58 -3.14
95% mean confidence interval for instructions %-change: -2.01% -1.92%
Instructions are helped.

total cycles in shared programs: 387026330 -> 386872483 (-0.04%)
cycles in affected programs: 11329966 -> 11176119 (-1.36%)
helped: 3307
HURT: 139
helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18
helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79%
HURT stats (abs)   min: 1 max: 2314 x̄: 72.68 x̃: 20
HURT stats (rel)   min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96%
95% mean confidence interval for cycles value: -49.31 -39.98
95% mean confidence interval for cycles %-change: -2.15% -2.01%
Cycles are helped.

LOST:   1
GAINED: 0

Ivy Bridge
total instructions in shared programs: 12045602 -> 12038463 (-0.06%)
instructions in affected programs: 623837 -> 616698 (-1.14%)
helped: 2498
HURT: 0
helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05%
95% mean confidence interval for instructions value: -2.96 -2.75
95% mean confidence interval for instructions %-change: -1.34% -1.26%
Instructions are helped.

total cycles in shared programs: 181025675 -> 180891323 (-0.07%)
cycles in affected programs: 11329329 -> 11194977 (-1.19%)
helped: 2439
HURT: 47
helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26
helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64%
HURT stats (abs)   min: 1 max: 1269 x̄: 102.51 x̃: 43
HURT stats (rel)   min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34%
95% mean confidence interval for cycles value: -59.91 -48.17
95% mean confidence interval for cycles %-change: -1.99% -1.82%
Cycles are helped.

Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10896368 -> 10896339 (<.01%)
instructions in affected programs: 3767 -> 3738 (-0.77%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1
helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73%
95% mean confidence interval for instructions value: -2.27 -1.14
95% mean confidence interval for instructions %-change: -5.14% -2.03%
Instructions are helped.

total cycles in shared programs: 155091109 -> 155091021 (<.01%)
cycles in affected programs: 47241 -> 47153 (-0.19%)
helped: 15
HURT: 8
helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4
helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71%
HURT stats (abs)   min: 14 max: 32 x̄: 18.50 x̃: 17
HURT stats (rel)   min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71%
95% mean confidence interval for cycles value: -14.59 6.93
95% mean confidence interval for cycles %-change: -1.41% 1.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]

Ian Romanick [Tue, 26 Jun 2018 02:53:38 +0000 (19:53 -0700)]

intel/fs: Add a scale factor to emit_fsign

Normally fsign generates -1, 0, or +1. The new scale factor, S, causes
fsign to generate -S, 0, or +S.
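
A scalar C model of fsign scaled by S, that is fsign(x) * S computed without a multiply: the result is 0 when x is zero, otherwise S with its sign flipped when x is negative. The sign-bit XOR is an assumption about how such a scaled fsign can be done at the bit level, not a transcription of emit_fsign, and the helper names are hypothetical. With S = 1.0 it reduces to plain fsign.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof(u)); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof(f)); return f; }

/* fsign(x) * s: 0 for x == +/-0, otherwise s with x's sign XORed in.
 * Note: IEEE 0 * inf or 0 * NaN would give NaN; this sketch returns 0. */
static float fsign_scaled(float x, float s)
{
   if (x == 0.0f)
      return 0.0f;
   return u2f((f2u(x) & 0x80000000u) ^ f2u(s));
}
```

For example, fsign_scaled(-2.0f, 3.0f) returns -3.0f and fsign_scaled(-1.0f, -4.0f) returns 4.0f.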

v2: Rebase on v2 changes in previous commit.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]

Ian Romanick [Tue, 26 Jun 2018 02:50:56 +0000 (19:50 -0700)]

intel/fs: Refactor code generation for nir_op_fsign to its own function

v2: Call emit_fsign from inside the existing switch statement.
Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick [Sun, 9 Sep 2018 18:37:24 +0000 (11:37 -0700)]

intel/fs: Eliminate dead code first

This simplifies the later patch "i965/fs: Generate better code for fsign
multiplied by a value".

shader-db results:

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 372808735 -> 372809551 (<.01%)
cycles in affected programs: 1519520 -> 1520336 (0.05%)
helped: 243
HURT: 277
helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5
helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27%
HURT stats (abs)   min: 1 max: 1810 x̄: 32.82 x̃: 5
HURT stats (rel)   min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29%
95% mean confidence interval for cycles value: -7.18 10.32
95% mean confidence interval for cycles %-change: -0.17% 0.46%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown)
total cycles in shared programs: 155091458 -> 155091109 (<.01%)
cycles in affected programs: 370797 -> 370448 (-0.09%)
helped: 24
HURT: 36
helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41
helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56%
HURT stats (abs)   min: 1 max: 291 x̄: 59.08 x̃: 10
HURT stats (rel)   min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15%
95% mean confidence interval for cycles value: -37.92 26.28
95% mean confidence interval for cycles %-change: -0.88% 0.45%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (GM45 shown)
total cycles in shared programs: 129133970 -> 129133978 (<.01%)
cycles in affected programs: 111966 -> 111974 (<.01%)
helped: 3
HURT: 1
helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 16 max: 16 x̄: 16.00 x̃: 16
HURT stats (rel)   min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
95% mean confidence interval for cycles value: -12.93 16.93
95% mean confidence interval for cycles %-change: -0.05% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>

Kristian H. Kristensen [Thu, 18 Apr 2019 17:44:02 +0000 (10:44 -0700)]

freedreno: Fix format string warning

Modifiers are uint64_t.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
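
The message does not quote the warning. The portable way to print a uint64_t such as a format modifier is the PRIx64 macro from <inttypes.h>, since "%lx" or "%llx" only match uint64_t on some ABIs. A sketch; the function name is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit modifier as 0x followed by 16 hex digits. */
static int format_modifier(char *buf, size_t len, uint64_t modifier)
{
   return snprintf(buf, len, "0x%016" PRIx64, modifier);
}
```

For example, formatting 0x0100000000000001 produces the 18-character string "0x0100000000000001".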

Kristian H. Kristensen [Thu, 18 Apr 2019 17:40:45 +0000 (10:40 -0700)]

freedreno/a6xx: Add helper for incrementing regid

Increments the regid by the specified amount unless regid is
r63.x (invalid).

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
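
A sketch of such a helper, assuming a packing of four components per register (regid = num << 2 | comp) with r63.x as the invalid marker; the macro and function names are hypothetical, not the freedreno definitions.

```c
#include <stdint.h>

/* Assumed encoding: four components per register. */
#define REGID(num, comp) ((uint32_t)(((num) << 2) | ((comp) & 0x3)))
#define REGID_INVALID REGID(63, 0)

/* Advance by inc components, leaving the invalid marker untouched. */
static uint32_t regid_inc(uint32_t regid, uint32_t inc)
{
   if (regid == REGID_INVALID)
      return regid;
   return regid + inc;
}
```

Because the increment is in component units, advancing past .w rolls over into the next register: regid_inc(REGID(1, 3), 1) equals REGID(2, 0).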

Kristian H. Kristensen [Thu, 18 Apr 2019 17:38:56 +0000 (10:38 -0700)]

freedreno: Use enum values from matching enum

We get a couple of warnings from using mismatched enum values. This
fixes that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>

Kristian H. Kristensen [Wed, 10 Apr 2019 20:08:00 +0000 (13:08 -0700)]

freedreno/a2xx: Fix redundant if statement

We test the condition, declare a few variables, then test the exact
same condition again. Let's not do that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>

Kristian H. Kristensen [Wed, 10 Apr 2019 20:06:39 +0000 (13:06 -0700)]

freedreno/ir3: Mark ir3_context_error() as NORETURN

Fixes a few warnings.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
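
What marking an error function noreturn buys: once the compiler knows the function cannot return, a non-void function whose default branch calls it no longer triggers a "control reaches end of non-void function" warning. Mesa has its own NORETURN macro; the local definition and the function names below are stand-ins for a self-contained example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a project-wide NORETURN macro. */
#if defined(__GNUC__)
#define NORETURN __attribute__((noreturn))
#else
#define NORETURN
#endif

/* Hypothetical error reporter: prints and aborts, never returns. */
NORETURN static void context_error(const char *msg)
{
   fprintf(stderr, "compile error: %s\n", msg);
   abort();
}

/* Without NORETURN above, compilers may warn that control can reach
 * the end of this non-void function through the default branch. */
static int component_count(int type)
{
   switch (type) {
   case 0:
      return 1;
   case 1:
      return 4;
   default:
      context_error("unknown type");
   }
}
```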

Jason Ekstrand [Wed, 17 Apr 2019 22:18:19 +0000 (17:18 -0500)]

nir: Add a nir_src_as_intrinsic() helper

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Jason Ekstrand [Wed, 17 Apr 2019 22:10:18 +0000 (17:10 -0500)]

nir: Rework nir_src_as_alu_instr to not take a pointer

Other nir_src_as_* functions just take a nir_src. It's not that much
more memory copying, and preserving constness really isn't worth the
cognitive dissonance.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
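
A sketch of the by-value calling convention, using made-up miniature types rather than the real NIR structures (every name below is hypothetical). The source struct is small, so copying it is cheap and callers can pass it directly without taking its address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for NIR types. */
enum instr_type { INSTR_ALU, INSTR_INTRINSIC };

struct instr {
   enum instr_type type;
};

struct src {
   bool is_ssa;
   struct instr *parent_instr; /* producing instruction, if SSA */
};

/* Takes the source by value and returns its producer when that
 * producer is an ALU instruction, or NULL otherwise. */
static struct instr *src_as_alu(struct src src)
{
   if (!src.is_ssa || src.parent_instr == NULL)
      return NULL;
   return src.parent_instr->type == INSTR_ALU ? src.parent_instr : NULL;
}
```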

Jason Ekstrand [Wed, 17 Apr 2019 22:01:14 +0000 (17:01 -0500)]

nir: Drop "struct" from some nir_* declarations

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Lionel Landwerlin [Thu, 18 Apr 2019 11:00:19 +0000 (12:00 +0100)]

anv: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

Lionel Landwerlin [Thu, 18 Apr 2019 11:00:08 +0000 (12:00 +0100)]

i965: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

Lionel Landwerlin [Thu, 18 Apr 2019 10:57:57 +0000 (11:57 +0100)]

iris: implement WaEnableStateCacheRedirectToCS

This 3D performance workaround was initially put in the kernel, but the
media driver requires different settings, so the register has been
whitelisted in i915 [1] and userspace drivers are left to initialize it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 22 Jun 2018 09:41:28 +0000 (11:41 +0200)]

anv/device: expose VK_KHR_shader_float16_int8 in gen8+

v2 (Jason):
- Merge shaderFloat16 and shaderInt8 enablement into a single patch.
- Merge extension enable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 22 Jan 2019 10:26:03 +0000 (11:26 +0100)]

anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+

v2:
  - Merge Float16 and Int8 capabilities into a single patch (Jason)
  - Merged patch that enabled SPIR-V front-end checks for these caps
    (except for Int8, which was already merged)

v3:
- Keep capabilities sorted (Jason)

v4:
- SpvCapabilityFloat16 support already added in master (Juan)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 22 Jan 2019 10:27:09 +0000 (11:27 +0100)]

compiler/spirv: move the check for Int8 capability

So it is right after the checks for the other various Int* capabilities.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Wed, 6 Feb 2019 08:13:22 +0000 (09:13 +0100)]

intel/compiler: validate region restrictions for mixed float mode

v2:
- Adapted unit tests to make them consistent with the changes done
   to the validation of half-float conversions.

v3 (Curro):
- Check all the accumulators
- Constify declarations
- Do not check src1 type in single-source instructions.
- Check for all instructions that read accumulator (either implicitly or
  explicitly)
- Check restrictions in src1 too.
- Merge conditional block
- Add invalid test case.

v4 (Curro):
- Assert on 3-src instructions, as they are not validated.
- Get rid of types_are_mixed_float(), as we know instruction is mixed
  float at that point.
- Remove conditions from not verified case.
- Fix brackets on conditional.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 8 Feb 2019 08:20:56 +0000 (09:20 +0100)]

intel/compiler: validate conversions between 64-bit and 8-bit types

v2:
- Add some tests with UB type too (Jason)

v3:
- consider implicit conversions from 2src instructions too (Curro).

v4:
- Do not check src1 type in single-source instructions (Curro).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 1 Feb 2019 10:41:33 +0000 (11:41 +0100)]

intel/compiler: validate region restrictions for half-float conversions

v2:
- Consider implicit conversions in 2-src instructions too (Curro)
- For restrictions that involve destination stride requirements
   only validate them for Align1, since Align16 always requires
   packed data.
- Skip the general rule for the dst/execution type size ratio for
   mixed float instructions on CHV and SKL+; these have their own
   set of rules that will be validated separately.

v3 (Curro):
- Do not check src1 type in single-source instructions.
- Check restriction on src1.
- Remove invalid test.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 5 Feb 2019 12:50:09 +0000 (13:50 +0100)]

intel/compiler: also set F execution type for mixed float mode in BDW

The section 'Execution Data Types' of the 3D Media GPGPU volume, which
describes execution types, is exactly the same in BDW and SKL+.

Also, this section states that there is a single execution type, so it
makes sense that this is the wider of the two floating point types
involved in mixed float mode, which is what we do for SKL+ and CHV.
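A minimal C sketch of the rule described above: in mixed float mode the
single execution type is the wider float type among the destination and the
sources. The enum and function names are hypothetical and do not mirror the
backend's actual types.

```c
#include <assert.h>

/* Hypothetical, simplified type enum covering only the two float types
 * that can appear in a mixed-float instruction. */
enum float_type { TYPE_HF, TYPE_F };

/* Execution type of a mixed-float instruction: the wider of the float
 * types involved, counting the destination as well as the sources. */
static enum float_type
mixed_float_exec_type(enum float_type dst, const enum float_type *srcs,
                      int num_srcs)
{
   assert(num_srcs >= 1 && num_srcs <= 3);

   if (dst == TYPE_F)
      return TYPE_F;
   for (int i = 0; i < num_srcs; i++) {
      if (srcs[i] == TYPE_F)
         return TYPE_F;
   }
   return TYPE_HF;
}
```

Checking the destination first is what the v2 note below asks for: an
HF-sources, F-destination instruction still executes as F.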

v2:
- Make sure we also account for the destination type in mixed mode (Curro).

Acked-by: Francisco Jerez <currojerez@riseup.net>

commit | commitdiff | tree

Iago Toral Quiroga [Thu, 14 Mar 2019 09:35:58 +0000 (10:35 +0100)]

intel/compiler: implement SIMD16 restrictions for mixed-float instructions

v2: f32to16/f16to32 can use a :W destination (Curro)
v3: check destination is packed (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 12 Feb 2019 08:34:10 +0000 (09:34 +0100)]

intel/compiler: skip MAD algebraic optimization for half-float or mixed mode

It is very likely that this optimization is never useful and we'll probably
just end up removing it, so let's not bother adding more cases to it for
now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 12 Feb 2019 11:43:30 +0000 (12:43 +0100)]

intel/compiler: remove inexact algebraic optimizations from the backend

NIR already has these and correctly considers exact/inexact qualification,
whereas the backend doesn't and can apply the optimizations where it
shouldn't. This happened to be the case in a handful of Tomb Raider shaders,
where NIR would skip the optimizations because of a precise qualification
but the backend would then (incorrectly) apply them anyway.
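To illustrate why such rewrites must respect precise/exact qualification,
here is a C check of one classic inexact identity, a * 0.0 == 0.0. The commit
does not say which backend rules were removed; this identity is only an
example of the kind of fold that changes results for some IEEE-754 inputs,
and the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Returns true when folding a * 0.0f into the constant +0.0f would give
 * the same result the multiplication actually produces for this input. */
static bool
mul_by_zero_fold_is_exact(float a)
{
   float product = a * 0.0f;   /* what the hardware would compute */

   /* inf * 0 and NaN * 0 yield NaN, and a negative a yields -0.0;
    * neither matches the folded constant +0.0. */
   return !isnan(product) && !signbit(product);
}
```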

Besides this, considering that we are not emitting much math in the backend
these days it is unlikely that these optimizations are useful in general. A
shader-db run confirms that MAD and LRP optimizations, for example, were only
being triggered in cases where NIR would skip them due to precise
requirements, so in the near future we might want to remove more of these,
but for now we just remove the ones that are not completely correct.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 19 Nov 2018 12:08:07 +0000 (13:08 +0100)]

intel/compiler: fix cmod propagation for non 32-bit types

v2:
- Do not propagate if the bit-size changes

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 20 Nov 2018 13:04:26 +0000 (14:04 +0100)]

intel/compiler: add a brw_reg_type_is_integer helper

v2:
- Fixed typo: meant BRW_REGISTER_TYPE_UB instead of BRW_REGISTER_TYPE_UV
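A minimal C sketch of what such a helper checks, using a hypothetical local
enum named after the BRW_REGISTER_TYPE_* suffixes. How the real helper treats
the packed-vector immediate types (V, UV) is not described here; this sketch
leaves them out of the integer set, consistent with the v2 fix of UB rather
than UV, but that is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of backend register types. */
enum reg_type {
   REG_TYPE_DF, REG_TYPE_F, REG_TYPE_HF, REG_TYPE_VF,
   REG_TYPE_Q, REG_TYPE_UQ, REG_TYPE_D, REG_TYPE_UD,
   REG_TYPE_W, REG_TYPE_UW, REG_TYPE_B, REG_TYPE_UB,
   REG_TYPE_V, REG_TYPE_UV,
};

/* True for the scalar signed/unsigned integer types of every size. */
static bool
reg_type_is_integer(enum reg_type type)
{
   switch (type) {
   case REG_TYPE_Q:
   case REG_TYPE_UQ:
   case REG_TYPE_D:
   case REG_TYPE_UD:
   case REG_TYPE_W:
   case REG_TYPE_UW:
   case REG_TYPE_B:
   case REG_TYPE_UB:   /* UB (unsigned byte), not the UV vector immediate */
      return true;
   default:
      return false;
   }
}
```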

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 26 Oct 2018 11:40:27 +0000 (13:40 +0200)]

intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16 bits.

v2:
- Fix is_zero with half-float to consider -0 as well (Jason).
- Fix is_negative_one for word type.
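A minimal C sketch of these checks, assuming a hypothetical immediate struct
holding the 32-bit payload and an element type. Because 16-bit values are
replicated in both words, only the low word is inspected; half-float zero
accepts both +0 (0x0000) and -0 (0x8000), and 1.0 and -1.0 are 0x3c00 and
0xbc00 in IEEE binary16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical immediate: 32-bit payload plus its element type. */
enum imm_type { IMM_B, IMM_W, IMM_HF };

struct imm {
   enum imm_type type;
   uint32_t bits;
};

/* 16-bit immediates are replicated in both words of the payload. */
static uint16_t
imm_low_word(struct imm imm)
{
   assert(imm.type != IMM_B);   /* there are no 8-bit immediates */
   return (uint16_t)(imm.bits & 0xffff);
}

static bool
imm_is_zero(struct imm imm)
{
   uint16_t w = imm_low_word(imm);
   if (imm.type == IMM_HF)
      return (w & 0x7fff) == 0;   /* +0.0 or -0.0: ignore the sign bit */
   return w == 0;
}

static bool
imm_is_one(struct imm imm)
{
   uint16_t w = imm_low_word(imm);
   return imm.type == IMM_HF ? w == 0x3c00 : w == 1;
}

static bool
imm_is_negative_one(struct imm imm)
{
   uint16_t w = imm_low_word(imm);
   /* -1 as a signed 16-bit word is all ones. */
   return imm.type == IMM_HF ? w == 0xbc00 : w == 0xffff;
}
```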

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 29 Jan 2019 09:58:49 +0000 (10:58 +0100)]

intel/compiler: generalize the combine constants pass

At the very least we need it to handle HF too, since we are doing
constant propagation for MAD and LRP, which relies on this pass
to promote the immediates to GRF in the end, but ideally
we want it to support even more types so we can take advantage
of it to improve register pressure in some scenarios.

v2 (Jason):
- Support 64-bit types too.
- Check if we need to set the half-float flag if the immediate already
existed.
- Multiply the size of the immediate by the width of the copy

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Wed, 7 Nov 2018 11:08:02 +0000 (12:08 +0100)]

intel/eu: force stride of 2 on NULL register for Byte instructions

The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.

Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.
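A minimal C sketch of such an emission-time fixup, assuming a hypothetical
simplified destination struct. Since stride 1 is the value restricted to raw
byte MOVs, forcing stride 2 on any NULL destination with a byte-sized type is
assumed to be safe in every case, so callers never need to build a correct
NULL:B themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified destination operand. */
struct dst_reg {
   bool is_null;         /* the NULL register */
   unsigned type_size;   /* bytes per element: 1 for :B/:UB */
   unsigned hstride;     /* horizontal stride, in elements */
};

/* Patch a NULL:B destination right before emitting the instruction. */
static void
fixup_null_byte_dst(struct dst_reg *dst)
{
   assert(dst != NULL);

   if (dst->is_null && dst->type_size == 1)
      dst->hstride = 2;
}
```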

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 4 Jan 2019 09:15:39 +0000 (10:15 +0100)]

intel/compiler: ask for an integer type if requesting an 8-bit type

v2:
- Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 17 Jul 2018 07:02:27 +0000 (09:02 +0200)]

intel/compiler: rework conversion opcodes

Now that we have the regioning lowering pass we can just put all of these
opcodes together in a single block and we can just assert on the few cases
of conversion instructions that are not supported in hardware and that should
be lowered in brw_nir_lower_conversions.

The only cases that we still handle separately are the conversions from float
to half-float, since the rounding variants would need to fall through and we
are already doing this for boolean opcodes (since they need to negate), plus
there is also a large comment about these opcodes that we probably want to
keep so it is just easier to keep these separate.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 13 Jul 2018 08:03:14 +0000 (10:03 +0200)]

intel/compiler: activate 16-bit bit-size lowerings also for 8-bit

In particular, we need the same lowerings we use for 16-bit
integers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 10 Jul 2018 07:52:46 +0000 (09:52 +0200)]

intel/compiler: split is_partial_write() into two variants

This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (where we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.
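A minimal C sketch of the two predicates, assuming a hypothetical simplified
destination descriptor. Both apply the same test and only the unit differs:
a full 32-byte register for is_partial_reg_write(), and, for
is_partial_var_write(), the size of one value across the dispatch, capped at
a register. This is an illustration of the split described above, not the
patch's code.

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32u   /* bytes in one SIMD register (GRF) */

/* Hypothetical, simplified view of an instruction's destination. */
struct dst_write {
   unsigned exec_size;   /* channels written by the instruction */
   unsigned type_size;   /* bytes per channel: 2 for 16-bit, 4 for 32-bit */
   unsigned offset;      /* byte offset of the write */
   bool contiguous;      /* destination stride of 1 */
   bool predicated;      /* predicated writes are treated as partial */
};

/* Shared test: partial unless the write covers whole, aligned units. */
static bool
partial_write_of(const struct dst_write *w, unsigned unit)
{
   return w->predicated || !w->contiguous ||
          w->exec_size * w->type_size < unit ||
          w->offset % unit != 0;
}

/* Register view (liveness, register allocation). */
static bool
is_partial_reg_write(const struct dst_write *w)
{
   return partial_write_of(w, REG_SIZE);
}

/* Variable view (copy propagation, CSE, ...): for a 16-bit SIMD8
 * dispatch a full value is only 16 bytes. */
static bool
is_partial_var_write(const struct dst_write *w, unsigned dispatch_width)
{
   unsigned var_size = dispatch_width * w->type_size;
   if (var_size > REG_SIZE)
      var_size = REG_SIZE;
   assert(var_size > 0);
   return partial_write_of(w, var_size);
}
```

With these definitions a SIMD8 16-bit MOV at offset 16B is a partial register
write but a full variable write, while a SIMD8 32-bit MOV gets the same
answer from both, matching the claim that 32-bit instructions are unaffected.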

Testing against ~5000 CTS tests involving 16-bit instructions produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |
================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.
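The SIMD8 percentage is the relative change of the patched count against
master:

```latex
\[
\frac{621{,}900 - 706{,}721}{706{,}721}
  = \frac{-84{,}821}{706{,}721}
  \approx -0.1200 = -12.00\%
\]
```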

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 21 Jan 2019 11:11:44 +0000 (12:11 +0100)]

intel/compiler: workaround for SIMD8 half-float MAD in gen8

Empirical testing shows that gen8 has a bug where MAD instructions with
a half-float source starting at a non-zero offset fail to execute
properly.

This scenario usually happened in SIMD8 executions, where we used to
pack vector components Y and W in the second half of SIMD registers
(therefore, with a 16B offset). It looks like we are not currently doing
this anymore, but this would handle the situation properly if we ever
happen to produce code like this again.

v2 (Jason):
- Move this workaround to the lower_regioning pass as an additional case
to has_invalid_src_region()
- Do not apply the workaround if the stride of the source operand is 0,
testing suggests the problem doesn't exist in that case.

v3 (Jason):
- We want offset % REG_SIZE > 0, not just offset > 0
- Use a helper to compute the offset
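A minimal C sketch of the resulting predicate, assuming a hypothetical
simplified source-operand struct and a REG_SIZE of 32 bytes: only a
half-float source that starts partway into a register (offset % REG_SIZE > 0,
not merely offset > 0) and is not a scalar stride-0 region is flagged. The
text only mentions gen8, so the sketch simply tests gen == 8.

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32u   /* bytes in one SIMD register */

/* Hypothetical, simplified MAD source operand. */
struct src_region {
   bool is_half_float;
   unsigned byte_offset;  /* offset of the operand in the register file */
   unsigned stride;       /* 0 means a scalar (replicated) region */
};

static bool
gen8_hf_mad_src_needs_lowering(unsigned gen, const struct src_region *src)
{
   /* Half-float 3-source operands only exist since gen8. */
   assert(gen >= 8 || !src->is_half_float);

   return gen == 8 && src->is_half_float && src->stride != 0 &&
          src->byte_offset % REG_SIZE > 0;
}
```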

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)

commit | commitdiff | tree

Iago Toral Quiroga [Wed, 30 May 2018 10:14:14 +0000 (12:14 +0200)]

intel/compiler: fix ddy for half-float in Broadwell

Broadwell has restrictions that apply to Align16 half-float that
make the Align16 implementation of this invalid for this platform.
Use the gen11 path for this instead, which uses Align1 mode.

The restriction is not present in Cherryview, gen9 or gen10, where
the Align16 implementation seems to work just fine.

v2:
- Rework the comment in the code, move the PRM citation from the
commit message to the comment in the code (Matt)
- Cherryview isn't affected, only Broadwell (Matt)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 28 May 2018 10:32:08 +0000 (12:32 +0200)]

intel/compiler: fix ddx and ddy for 16-bit float

We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components
in a single SIMD register, so, for example, component Y of a 16-bit vec2
starts at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated, we should not stomp whatever base offset we
have, but instead add to it.

v2
- Use byte_offset() helper (Jason)
- Merge the fix for SIMD8: using byte_offset() fixes that too.
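A minimal C sketch of the corrected addressing, assuming a hypothetical
register-reference struct: the channel offset uses the real element size and
is added to the existing base offset. add_byte_offset() only mimics the idea
of the byte_offset() helper mentioned in v2; it is not its real signature.

```c
#include <assert.h>

/* Hypothetical, simplified register reference. */
struct reg_ref {
   unsigned nr;       /* register number */
   unsigned offset;   /* byte offset relative to register nr */
};

/* Advance a reference by some bytes, keeping its existing offset. */
static struct reg_ref
add_byte_offset(struct reg_ref r, unsigned bytes)
{
   r.offset += bytes;
   return r;
}

/* Address channel `chan` of a value starting at `base`. */
static struct reg_ref
channel_ref(struct reg_ref base, unsigned chan, unsigned type_size)
{
   assert(type_size == 2 || type_size == 4 || type_size == 8);
   return add_byte_offset(base, chan * type_size);
}
```

For the Y component of a 16-bit vec2 in SIMD8 (base offset 16B), channel 1
lands at 18B; the old code would have produced 4B by assuming 32-bit elements
and discarding the base offset.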

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 22 May 2018 06:17:38 +0000 (08:17 +0200)]

intel/compiler: set correct precision fields for 3-source float instructions

Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need to set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.

v2:
- Set the bit separately for each source based on its type so we can
   do mixed floating-point mode in the future (Topi).

v3:
- Use regular citation style for the comment referencing the PRM (Matt).
- Decided not to add asserts in the emission code to check that only
   mixed HF/F types are used since such checks would break negative tests
   for brw_eu_validate.c (Matt)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 22 May 2018 06:17:17 +0000 (08:17 +0200)]

intel/compiler: allow half-float on 3-source instructions since gen8

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 22 May 2018 08:21:29 +0000 (10:21 +0200)]

intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

We are now using these bits, so don't assert that they are not set. In gen8,
if these bits are set, compaction is not possible. On gen9 and CHV platforms,
set_3src_control_index() checks these bits (and others) against a table to
validate if the particular bit combination is eligible for compaction or not.

v2
- Add more detail in the commit message explaining the situation for SKL+
and CHV (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 21 May 2018 12:42:42 +0000 (14:42 +0200)]

intel/compiler: add new half-float register type for 3-src instructions

This is available since gen8.

v2: restore previously existing assertion.

v3: don't use separate tables for gen7 and gen8, just assert that we
don't use half-float before gen8 (Matt)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 21 May 2018 12:34:01 +0000 (14:34 +0200)]

intel/compiler: add instruction setters for Src1Type and Src2Type.

The original SrcType is a 3-bit field that takes a subset of the types
supported by the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating point operations
can use mixed precision mode, where not all the operands have the
same floating-point precision. While the precision for the first operand
is taken from the type in SrcType, the bits in Src1Type (bit 36) and
Src2Type (bit 35) define the precision for the other operands
(0: normal precision, 1: half precision).
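A minimal C sketch of setters for these two bits, treating the low 64 bits
of the instruction as a plain uint64_t. The bit positions come from the text
above; the helper names and the single-word encoding are hypothetical
simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRC2_TYPE_BIT 35
#define SRC1_TYPE_BIT 36

/* Set or clear a precision bit: false = normal, true = half precision. */
static uint64_t
set_src_precision_bit(uint64_t inst, unsigned bit, bool half)
{
   assert(bit == SRC1_TYPE_BIT || bit == SRC2_TYPE_BIT);

   if (half)
      return inst | (UINT64_C(1) << bit);
   return inst & ~(UINT64_C(1) << bit);
}

static bool
get_src_precision_bit(uint64_t inst, unsigned bit)
{
   return (inst >> bit) & 1;
}
```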

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 21 Jan 2019 08:47:59 +0000 (09:47 +0100)]

intel/compiler: drop unnecessary temporary from 32-bit fsign implementation

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 21 Jan 2019 08:47:15 +0000 (09:47 +0100)]

intel/compiler: implement 16-bit fsign

v2:
- make 16-bit be its own separate case (Jason)

v3:
- Drop the result_int temporary (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Thu, 26 Apr 2018 08:26:22 +0000 (10:26 +0200)]

intel/compiler: handle extended math restrictions for half-float

Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.
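A minimal C sketch of the decision described above, with hypothetical names:
half-float extended math is lowered to 32-bit before gen9, and from gen9 on
it stays half-float but is split into SIMD8 pieces when wider. Only the
half-float rules from this commit are modeled; 32-bit operands pass through
unchanged.

```c
#include <assert.h>

enum math_lowering {
   MATH_KEEP,             /* emit as-is */
   MATH_LOWER_TO_32BIT,   /* convert operands to 32-bit float first */
   MATH_SPLIT_TO_SIMD8,   /* keep half-float, emit SIMD8 pieces */
};

static enum math_lowering
plan_extended_math(unsigned gen, unsigned bit_size, unsigned exec_size)
{
   assert(bit_size == 16 || bit_size == 32);
   assert(exec_size >= 1 && exec_size <= 32);

   if (bit_size != 16)
      return MATH_KEEP;
   if (gen < 9)
      return MATH_LOWER_TO_32BIT;
   return exec_size > 8 ? MATH_SPLIT_TO_SIMD8 : MATH_KEEP;
}
```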

v2: squashed together the following patches (Jason):
  - intel/compiler: allow extended math functions with HF operands
  - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  - intel/compiler: extended Math is limited to SIMD8 on half-float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  (allow extended math functions with HF operands,
   extended Math is limited to SIMD8 on half-float)

commit | commitdiff | tree

Iago Toral Quiroga [Thu, 26 Apr 2018 08:12:12 +0000 (10:12 +0200)]

intel/compiler: lower some 16-bit float operations to 32-bit

The hardware doesn't support half-float for these.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Tue, 18 Dec 2018 08:27:21 +0000 (09:27 +0100)]

intel/compiler: assert restrictions on conversions to half-float

There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.

v2:
- rebased on top of regioning lowering pass

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Thu, 22 Nov 2018 09:59:59 +0000 (10:59 +0100)]

intel/compiler: handle b2i/b2f with other integer conversion opcodes

Since we handle booleans as integers this makes more sense.

v2:
- rebased to incorporate new boolean conversion opcodes

v3:
- rebased on top of the regioning lowering pass

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)

commit | commitdiff | tree

Iago Toral Quiroga [Fri, 2 Mar 2018 12:37:59 +0000 (13:37 +0100)]

intel/compiler: split float to 64-bit opcodes from int to 64-bit

Going forward having these split is a bit more convenient since these two
groups have different restrictions.

v2:
- Rebased on top of new regioning lowering pass.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Iago Toral Quiroga [Mon, 17 Dec 2018 08:17:06 +0000 (09:17 +0100)]

intel/compiler: add a NIR pass to lower conversions

Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level reduces the complexity of the backend a bit.

v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.

v3
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bit-size (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)

v4:
- Be careful about the intermediate types we use so we don't lose
range and avoid incorrect rounding semantics (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Dominik Drees [Mon, 15 Apr 2019 09:05:46 +0000 (11:05 +0200)]

Add no_aos_sampling GALLIVM_PERF option

This forces using general sampling and should improve precision and
performance in some cases.

commit | commitdiff | tree

Samuel Pitoiset [Tue, 26 Mar 2019 11:37:39 +0000 (12:37 +0100)]

ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+

This change requires LLVM r356465.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 26 Mar 2019 11:24:52 +0000 (12:24 +0100)]

ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+

This change requires LLVM r356465.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Mon, 15 Apr 2019 13:23:58 +0000 (15:23 +0200)]

ac: add support for more types with struct/raw LLVM intrinsics

LLVM 9+ now supports 8-bit and 16-bit types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 16 Apr 2019 08:38:24 +0000 (10:38 +0200)]

radv: add VK_KHR_shader_atomic_int64 but disable it for now

No support for 64-bit compare&swap atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 16 Apr 2019 08:38:23 +0000 (10:38 +0200)]

ac/nir: add 64-bit SSBO atomic operations support

Except compare&swap, which is still buggy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 16 Apr 2019 08:38:22 +0000 (10:38 +0200)]

ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap

Use the raw version (i.e. IDXEN=0) because vindex is unused.
Use the old intrinsic for compare&swap because the new one
hangs the GPU for some reason.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Roland Scheidegger [Wed, 17 Apr 2019 00:34:01 +0000 (02:34 +0200)]

gallivm: fix saturated signed add / sub with llvm 9

llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and
now llvm 9 removed the signed versions as well - they were proposed for
removal earlier, but the pattern to recognize those was very complex,
so it wasn't done then. However, instead of these arch-specific
intrinsics, there are now arch-independent intrinsics for saturated
add / sub, both for signed and unsigned, so use these.
They should have only advantages (work with arbitrary vector sizes,
optimal code for all archs), although I don't know how well they work
in practice for other archs (at least for x86 they do the right thing).
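For reference, a scalar C sketch of the per-lane semantics of signed
saturating add/sub on 8-bit lanes, which is what LLVM's generic llvm.sadd.sat
and llvm.ssub.sat intrinsics compute: results are clamped to the type's range
instead of wrapping. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate to the int8_t range. */
static int8_t
clamp_i8(int v)
{
   if (v > INT8_MAX)
      return INT8_MAX;
   if (v < INT8_MIN)
      return INT8_MIN;
   return (int8_t)v;
}

/* int arithmetic cannot overflow for 8-bit operands. */
static int8_t
sat_add_i8(int8_t a, int8_t b)
{
   return clamp_i8((int)a + (int)b);
}

static int8_t
sat_sub_i8(int8_t a, int8_t b)
{
   return clamp_i8((int)a - (int)b);
}
```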

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454

Reviewed-by: Brian Paul <brianp@vmware.com>

commit | commitdiff | tree

Juan A. Suarez Romero [Wed, 17 Apr 2019 09:38:00 +0000 (09:38 +0000)]

meson: Add dependency on genxml to anvil genfiles

This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure.

v2: add dependency on idep_genxml (Lionel)

Fixes: d1992255bb29054fa51763376d125183a9f602f
("meson: Add build Intel "anv" vulkan driver")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 5 Oct 2018 16:29:17 +0000 (17:29 +0100)]

intel/perf: constify accumulator parameter

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Tue, 2 Oct 2018 14:41:41 +0000 (15:41 +0100)]

intel/perf: drop counter size field

We can deduce the size from another field, so let's just save some space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Mon, 18 Jun 2018 10:40:24 +0000 (11:40 +0100)]

i965: perf: add mdapi pipeline statistics queries on gen10/11

The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 8 Jun 2018 16:26:49 +0000 (17:26 +0100)]

intel/perf: stub gen10/11 missing definitions

Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 8 Jun 2018 21:18:46 +0000 (22:18 +0100)]

i965: move mdapi guid into intel/perf

One more thing we want to share between the different APIs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 8 Jun 2018 16:53:08 +0000 (17:53 +0100)]

i965: move mdapi result data format to intel/perf

We want to reuse this in Anv.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 8 Jun 2018 16:51:33 +0000 (17:51 +0100)]

i965: move brw_timebase_scale to device info

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Fri, 8 Jun 2018 14:29:51 +0000 (15:29 +0100)]

i965: move OA accumulation code to intel/perf

We'll want to reuse this in our Vulkan extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Thu, 7 Jun 2018 17:18:43 +0000 (18:18 +0100)]

i965: move mdapi data structure to intel/perf

We'll want to reuse those structures later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Sun, 27 May 2018 19:33:25 +0000 (20:33 +0100)]

i965: extract performance query metrics

We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Lionel Landwerlin [Sun, 27 May 2018 19:36:49 +0000 (20:36 +0100)]

i965: store device revision in gen_device_info

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Topi Pohjolainen [Wed, 27 Mar 2019 16:38:15 +0000 (09:38 -0700)]

intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27

Similarly to 1cc17fb731466c68586915acbb916586457b19bc

Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier
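A minimal C sketch of why the field width matters, using a hypothetical
bit-field extractor on a 32-bit payload word: with bits 24:27 only the low 4
bits of the barrier ID survive, so IDs of 16 or more are truncated, whereas
bits 24:30 keep 7 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the inclusive bit field [lo, hi] from a 32-bit word. */
static uint32_t
extract_bits(uint32_t word, unsigned lo, unsigned hi)
{
   assert(lo <= hi && hi < 32);

   uint32_t width = hi - lo + 1;
   uint32_t mask = width == 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
   return (word >> lo) & mask;
}
```

For example, a barrier ID of 0x45 stored at bit 24 reads back as 0x45 from
bits 24:30 but as 0x5 from bits 24:27.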

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

commit | commitdiff | tree

Erik Faye-Lund [Tue, 9 Apr 2019 12:25:51 +0000 (14:25 +0200)]

virgl: document potentially failing blit

This blit can fail, but this is not new; in the old version we
didn't even try to blit in this case. So let's just document the
limitation for now, and leave this for another day.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>