Jordan Justen [Tue, 26 Feb 2019 01:17:29 +0000 (17:17 -0800)]
intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jordan Justen [Tue, 26 Feb 2019 01:13:48 +0000 (17:13 -0800)]
nir: Add int64/doubles options into nir_shader_compiler_options
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Fri, 1 Mar 2019 22:41:59 +0000 (14:41 -0800)]
nir/algebraic: Optimize away an fsat of a b2f
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 1 Mar 2019 22:39:14 +0000 (14:39 -0800)]
intel/fs: Don't assert on b2f with a saturate modifier
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d609387 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
Lionel Landwerlin [Sat, 23 Feb 2019 23:27:17 +0000 (23:27 +0000)]
anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)
v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)
v3: Make binding/instruction/state/surface available (Lionel)
v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)
v5: Clear decoded batch buffer var after use (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Eric Anholt [Fri, 1 Mar 2019 20:48:51 +0000 (12:48 -0800)]
v3d: Fix build of NEON code with Mesa's cflags not targeting NEON.
v3d may be built as part of a set of drivers in a system not requiring
NEON, but we know V3D devices will be paired with CPUs with NEON so we
should be able to use this asm.
Fixes: 0c05198d6b5b ("v3d: Always enable the NEON utile load/store code.")
Matt Turner [Thu, 15 Feb 2018 22:43:30 +0000 (14:43 -0800)]
intel/compiler: Add commas on final values of compaction table arrays
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ian Romanick [Sat, 23 Feb 2019 00:47:06 +0000 (16:47 -0800)]
nir/algebraic: Replace a-fract(a) with floor(a)
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.
In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?
As far as I can tell, it's just because our scheduler is terrible. In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).
Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader. Once that happens, everything
is different.
I looked at one vertex shader that was hurt (from Goat Simulator). In
that case, both the floor and fract are used. The optimization
eliminates the add, and it should allow better scheduling. In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.
In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions. It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs:
15437001 ->
15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.
total cycles in shared programs:
383007378 ->
382997063 (<.01%)
cycles in affected programs:
1650825 ->
1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs:
5043616 ->
5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs:
128139812 ->
128135762 (<.01%)
cycles in affected programs:
3829724 ->
3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Ian Romanick [Mon, 3 Dec 2018 20:06:50 +0000 (12:06 -0800)]
intel/fs: Generate if instructions with inverted conditions
Per-platform results were all over the place, so I have included all the
results here. There is an important note at the bottom of the commit
message.
Skylake
total instructions in shared programs:
15184683 ->
15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs:
370961367 ->
370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total instructions in shared programs:
15465366 ->
15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs:
410938419 ->
410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs) min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).
Haswell
total instructions in shared programs:
13749760 ->
13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0
total cycles in shared programs:
385398913 ->
385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total cycles in shared programs:
180663626 ->
180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs) min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel) min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.
Sandy Bridge
total cycles in shared programs:
154568664 ->
154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs) min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel) min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.
Iron Lake
total instructions in shared programs:
8206589 ->
8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.
total cycles in shared programs:
187657428 ->
187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.
GM45
total instructions in shared programs:
5037569 ->
5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.
total cycles in shared programs:
128101478 ->
128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.
This change has almost no effect right now. However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:
Skylake
total instructions in shared programs:
15071022 ->
15089710 (0.12%)
instructions in affected programs:
1022219 ->
1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs) min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel) min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.
total cycles in shared programs:
369777134 ->
370092923 (0.09%)
cycles in affected programs:
17516573 ->
17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs) min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel) min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.
total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0
total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0
LOST: 1
GAINED: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 3 Dec 2018 22:41:07 +0000 (14:41 -0800)]
nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
I have not investigated the result of doing this during code
generation. That should be possible, but it would be a bit more
effort.
All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs:
370961508 ->
370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs:
8206587 ->
8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2
total cycles in shared programs:
187657422 ->
187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2
This change has almost no effect right now. However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:
Skylake
total instructions in shared programs:
15071804 ->
15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2
total cycles in shared programs:
369914348 ->
369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 3 Dec 2018 23:53:36 +0000 (15:53 -0800)]
intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0. This is equivalent to 1.0 +
float(boolBitsToInt(a)). On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:
add g47F, g26D, 1D
v2: Fix swizzles.
v3: Fix typos in comments. Noticed by Ken.
All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs:
15185583 ->
15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.
total cycles in shared programs:
370964249 ->
370961508 (<.01%)
cycles in affected programs:
1487586 ->
1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs) min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel) min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.
No changes on Iron Lake or GM45.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 9 Feb 2017 15:21:47 +0000 (15:21 +0000)]
intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.
v2: Fix swizzles. Fix types for cmod propagation.
v3: Simplify logic for inverting source of inot(ixor(a, b)). Suggested
by Ken.
Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs:
15185593 ->
15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs:
370977726 ->
370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs) min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel) min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 9 Feb 2017 15:20:04 +0000 (15:20 +0000)]
intel/fs: Emit logical-not of operands on Gen8+
On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not. Take advantage of this to generate fewer
instructions.
v2: Major rebase. Use nir_src_as_alu_instr. Fix swizzle handling.
No changes on any pre-Gen8 platform.
Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs:
15466902 ->
15466274 (<.01%)
instructions in affected programs:
1262953 ->
1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs) min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel) min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.
total cycles in shared programs:
410996093 ->
410950440 (-0.01%)
cycles in affected programs:
144389048 ->
144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs) min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel) min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.
total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4
total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Wed, 5 Dec 2018 19:35:37 +0000 (11:35 -0800)]
intel/fs: Refactor ALU source and destination handling to a separate function
Other places will need to do this soon to properly handle source
swizzles. The patch looks a little odd, but the change is pretty
straight forward. All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.
I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive. I am open to
suggestions of alternatives.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 13 Dec 2018 02:14:34 +0000 (18:14 -0800)]
intel/fs: Handle OR source modifiers in algebraic optimization
Found by inspection.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 19 Jun 2018 20:34:57 +0000 (13:34 -0700)]
intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:
and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD
mov.nz.f0(16) null<1>F g31<8,8,1>D
A similar change to the vec4 backend had no effect.
Somewhere between
c1ec5820593 and
40fc4b5acd6 (1,094 commits) the
effectiveness of this patch diminished, and as of commit
d7e0d47b9de
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.
A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 4 Dec 2018 00:30:44 +0000 (16:30 -0800)]
nir/algebraic: Replace i2b used by bcsel or if-statement with comparison
All of the helped shaders are in Deus Ex. I looked at a couple shaders,
and they have a pattern like:
vec1 32 ssa_373 = i2b32 ssa_345.w
vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
...
vec1 32 ssa_377 = ine ssa_345.w, ssa_0
if ssa_377 {
...
vec1 32 ssa_416 = i2b32 ssa_385.w
vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
...
}
The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.
v2: Rebase on 1-bit Boolean changes.
v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by
Bas.
Skylake
total instructions in shared programs:
15241394 ->
15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.
total cycles in shared programs:
373846583 ->
371023357 (-0.76%)
cycles in affected programs:
118972102 ->
116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.
total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0
total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0
Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs:
13764783 ->
13750679 (-0.10%)
instructions in affected programs:
1176256 ->
1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.
total cycles in shared programs:
386511910 ->
385402528 (-0.29%)
cycles in affected programs:
143831110 ->
142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.
total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21
total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9
No changes on Ivy Bridge or earlier platforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 13 Dec 2018 23:39:49 +0000 (15:39 -0800)]
intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this. Emit
the immediate value during code generation to possible avoid spurious
loads of constants into registers.
I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison". The final result was quite a bit better
than that...
No shader-db changes on any Gen8+ platform.
v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken. This should never happen because we don't normally
use vec4 for Gen8+ (requires and environment variable to force it), and
there's no code to generate these negations. Still, erring on the side
of caution is better.
Haswell
total instructions in shared programs:
13776218 ->
13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel) min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.
total cycles in shared programs:
386818984 ->
386511910 (-0.08%)
cycles in affected programs:
20379636 ->
20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs) min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel) min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.
total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1
total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0
Ivy Bridge
total instructions in shared programs:
12000562 ->
11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.
total cycles in shared programs:
180958504 ->
180664500 (-0.16%)
cycles in affected programs:
19991810 ->
19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs) min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel) min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.
Sandy Bridge
total instructions in shared programs:
10842336 ->
10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.
total cycles in shared programs:
154590074 ->
154569050 (-0.01%)
cycles in affected programs:
8159932 ->
8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs) min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel) min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs:
8212357 ->
8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.
total cycles in shared programs:
187668016 ->
187657422 (<.01%)
cycles in affected programs:
14856234 ->
14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs) min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel) min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Fri, 1 Mar 2019 16:37:31 +0000 (16:37 +0000)]
Revert "swr/rast: Archrast codegen updates"
This reverts the following commits:
71a76a47ccb34c5c259781ed49b0013e86dfaa31 "swr/codegen: fix autotools build"
7763e664cefd1e394101b37fbc552b50f820f44a "meson/swr: replace hard-coded path with current_build_dir()"
773b3ceacaf6d32135348e07878b8514a4350b0e "swr/rast: Fix autotools and scons codegen"
16e10b8c304481e423e76311f70de5de9e7424b1 "swr/rast: Add general SWTag statistics"
b45a15a39f7630d569fcf1296dac1415eb758249 "swr/rast: Add string handling to AR event framework"
8608a747aafe6aef42fba148bfcdbb3ca136e7de "swr/rast: Add initial SWTag proto definitions"
93cd9905c8fbb98985ae1a61c0eebdb225fd1325 "swr/rast: Cleanup and generalize gen_archrast"
The last one in this list broke all the build systems that can build
this (meson, autotools & scons).
See MR !304 for more details:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/304
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fritz Koenig [Wed, 27 Feb 2019 02:31:38 +0000 (18:31 -0800)]
freedreno/a6xx: Enable UBWC modifier
Adding the supported_modifiers allows buffers
to be created with UBWC
Fritz Koenig [Mon, 7 Jan 2019 20:00:41 +0000 (12:00 -0800)]
freedreno: UBWC allocator
UBWC requires space for a metadata or flag buffer
that contains compression data. Each 16x4 tile of image
data corresponds to a byte of compression data.
This buffer needs to be stored before (at a lower address)
the image buffer in order to match up with what the
display driver. This allows the display driver to directly
scan-out at UBWC buffer.
Fritz Koenig [Mon, 7 Jan 2019 19:58:53 +0000 (11:58 -0800)]
freedreno/a6xx: UBWC support
Universal bandwidth compression(UBWC) reduces memory bandwidth
by compressing buffers. This compression takes the form of
a full sized image buffer as well as a smaller metadata buffer.
Fritz Koenig [Wed, 27 Feb 2019 04:06:31 +0000 (20:06 -0800)]
freedreno: pass count to query_dmabuf_modifiers
query_dmabuf_modifiers needs to know the max number
of modifiers that the list will hold.
Eric Engestrom [Wed, 27 Feb 2019 12:02:47 +0000 (12:02 +0000)]
anv: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Wed, 27 Feb 2019 10:43:40 +0000 (10:43 +0000)]
anv: remove spaces around kwargs assignment
pylint complains:
> C0326: No space allowed around keyword argument assignment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Wed, 27 Feb 2019 10:40:52 +0000 (10:40 +0000)]
anv: drop unused parameter
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Wed, 27 Feb 2019 10:40:14 +0000 (10:40 +0000)]
anv: simplify chained comparison
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Caio Marcelo de Oliveira Filho [Tue, 26 Feb 2019 04:37:59 +0000 (20:37 -0800)]
nir/copy_prop_vars: handle indirect vector elements
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load. Such SSA
values don't help loading of the indirect unless we emit an if-ladder.
Copy_derefs are supported for indirects.
Also enable two tests that now pass.
v2: Remove unnecessary temporaries. Be clearer when identifying the
case where copy_entry doesn't help when we are dealing with an
indirect array_deref (of a vector). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Tue, 29 Jan 2019 20:39:28 +0000 (12:39 -0800)]
nir/copy_prop_vars: prefer using entries from equal derefs
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.
This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Tue, 29 Jan 2019 14:35:20 +0000 (06:35 -0800)]
nir/copy_prop_vars: add tests for indirect array deref
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access. The vector tests are disabled and
will be enabled by a later commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Mon, 14 Jan 2019 23:28:33 +0000 (15:28 -0800)]
nir/copy_prop_vars: handle load/store of vector elements
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.
Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w. Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.
The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.
It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.
Tests related to these cases are now enabled.
v2: Instead of asserting on invalid indices, "load" an undef and
remove the store. (Jason)
v3: Merge code path for the cases of is_array_deref_of_vector into the
regular code path. Add a base_index parameter to
value_set_from_value. (code changes by Jason)
v4: Removed the get_entry_for_deref helper, now being used only once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Wed, 16 Jan 2019 19:48:32 +0000 (11:48 -0800)]
nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS
Also replace uses of 0xf with the appropriate full mask created from
the number of components.
Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Wed, 16 Jan 2019 19:27:43 +0000 (11:27 -0800)]
nir/copy_prop_vars: rename/refactor store_to_entry helper
The name reflected this function role back when the pass also did dead
write elimination. So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Christian Gmeiner [Fri, 1 Mar 2019 07:42:04 +0000 (08:42 +0100)]
etnaviv: fix compile warnings
Fixes the following compile warnings:
[591/629] Compiling C object 'src/gallium/drivers/etnaviv/
df32d18@@etnaviv@sta/etnaviv_context.c.o'.
../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_context.c: In function 'etna_cmd_stream_reset_notify':
../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_context.c:334:22: warning: unused variable 'entry' [-Wunused-variable]
struct set_entry *entry;
^~~~~
[604/629] Compiling C object 'src/gallium/drivers/etnaviv/
df32d18@@etnaviv@sta/etnaviv_resource.c.o'.
../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_resource.c: In function 'etna_resource_used':
../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_resource.c:649:22: warning: unused variable 'entry' [-Wunused-variable]
struct set_entry *entry;
^~~~~
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Sat, 23 Feb 2019 15:15:19 +0000 (16:15 +0100)]
etnaviv: fix resource usage tracking across different pipe_context's
A pipe_resource can be shared by all the pipe_context's hanging off the
same pipe_screen.
Changes from v2 -> v3:
- add locking with mtx_*() to resource and screen (Marek)
Changes from v3 -> v4:
- drop rsc->lock, just use screen->lock for the entire serialization (Marek)
- simplify etna_resource_used() flush condition, which also prevents
potentially flushing resources twice (Marek)
- don't remove resouces from screen->used_resources in
etna_cmd_stream_reset_notify(), they may still be used in other
contexts and may need flushing there later on (Marek)
Changes from v4 -> v5:
- Fix coding style issues reported by Guido
Changes from v5 -> v6:
- Add missing locking in etna_transfer_map(..) (Boris)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Christian Gmeiner [Thu, 28 Feb 2019 06:26:40 +0000 (07:26 +0100)]
etnaviv: enable ETC2 texture compression support for HALTI0 GPUs
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Thu, 28 Feb 2019 06:26:39 +0000 (07:26 +0100)]
etnaviv: hook-up etc2 patching
Changes v1 -> v2:
- Avoid the GPU sampling from the resource that gets mutated by the the
transfer map by setting DRM_ETNA_PREP_WRITE.
Changes v2 -> v3:
- make use of likely(..)
- drop minor optimization regarding rsc->layout == ETNA_LAYOUT_LINEAR
- better documentation why DRM_ETNA_PREP_WRITE is needed
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Thu, 28 Feb 2019 06:26:38 +0000 (07:26 +0100)]
etnaviv: keep track of mapped bo address
Saves us from calling etna_bo_map(..) and saves us from doing the
same offset calcs for map() and unmap() operations.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Thu, 28 Feb 2019 06:26:37 +0000 (07:26 +0100)]
etnaviv: implement ETC2 block patching for HALTI0
ETC2 is supported with HALTI0, however that implementation is buggy
in hardware. The blob driver does per-block patching to work around
this. We need to swap colors for t-mode etc2 blocks.
Changes v2 -> v3:
- Drop redundant format check
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Jason Ekstrand [Thu, 21 Feb 2019 16:41:59 +0000 (10:41 -0600)]
intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there. Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 21 Feb 2019 16:32:01 +0000 (10:32 -0600)]
intel/schedule_instructions: Move some comments
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 21 Feb 2019 16:14:17 +0000 (10:14 -0600)]
intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops. There's no reason to keep these opcodes around
anymore.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 21 Feb 2019 15:59:35 +0000 (09:59 -0600)]
intel/fs: Get rid of the IMAGE_SIZE opcode
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 21 Feb 2019 16:12:07 +0000 (10:12 -0600)]
intel/vec4: Drop dead code for handling typed surface messages
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Mon, 11 Feb 2019 22:11:35 +0000 (16:11 -0600)]
intel/fs: Drop the fs_surface_builder
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes. The only thing that the surface
builder is providing at this point is extra levels of functions to call
through. I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Mon, 11 Feb 2019 22:15:50 +0000 (16:15 -0600)]
intel/fs: Re-order logical surface arguments
It makes more sense to start at the surface then move on to the address
and then the data. Also, this is a really good test of whether or not
we got all the places that use the sources by explicit integer number.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Mon, 11 Feb 2019 20:51:02 +0000 (14:51 -0600)]
intel/fs: Add an enum type for logical sampler inst sources
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jose Fonseca [Thu, 28 Feb 2019 09:53:28 +0000 (09:53 +0000)]
scons: Workaround failures with MSVC when using SCons 3.0.[2-4].
This change applies the workaround suggested by Bill Deegan on the
affected SCons versions.
It also adds a comment with the URL explaining why we were using
customizing the decider and max_drift in the first place, as I had
forgotten all about it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109443
Tested-by: liviuprodea@yahoo.com
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Kristian H. Kristensen [Tue, 26 Feb 2019 06:14:53 +0000 (22:14 -0800)]
freedreno: Fix a couple of warnings
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Kristian H. Kristensen [Sat, 23 Feb 2019 19:23:54 +0000 (11:23 -0800)]
freedreno/a6xx: Don't zero SO buffer addresses
Just disable SO in VPC_SO_BUF_CNTL. Less noise in dumps.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Kristian H. Kristensen [Sat, 23 Feb 2019 19:12:23 +0000 (11:12 -0800)]
freedreno/a6xx: Only output MRT control for used framebuffers
Not much of an optimization, but makes for less noise in the command
buffer dumps.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Eric Engestrom [Wed, 27 Feb 2019 09:48:46 +0000 (09:48 +0000)]
gitlab-ci: install xmllint to validate 00-mesa-defaults.conf
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Engestrom [Tue, 26 Feb 2019 12:32:04 +0000 (12:32 +0000)]
driconf: add DTD to allow the drirc xml (00-mesa-defaults.conf) to be validated
This DTD can be used to validate the drirc xml:
$ xmllint --noout --valid 00-mesa-defaults.conf
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Engestrom [Thu, 28 Feb 2019 14:48:09 +0000 (14:48 +0000)]
vulkan: use VkBase{In,Out}Structure instead of a custom struct
VkBaseInStructure and VkBaseOutStructure are part of vulkan_core.h
(which is part of vulkan.h)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Mon, 25 Feb 2019 17:48:14 +0000 (17:48 +0000)]
vulkan/overlay: add support for fps output in file
Also make the sampling period configurable.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Mon, 25 Feb 2019 12:37:27 +0000 (12:37 +0000)]
vulkan/overlay: rework option parsing
Makes adding new options easier.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Tue, 26 Feb 2019 12:44:36 +0000 (12:44 +0000)]
vulkan/overlay: fix min/max computations
This shouldn't be condition to the acquire time being visible.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Tue, 19 Feb 2019 14:08:08 +0000 (14:08 +0000)]
egl/sl: use kms_swrast with vgem instead of a random GPU
VGEM and kms_swrast were introduced to work with one another.
All we do is CPU rendering to dumb buffers. There is no reason to carve
out GPU memory, increasing the memory pressure on a device that could
make a better use of it.
Note:
- The original code did not work out of the box, since the dumb buffer
ioctls are not exposed to render nodes.
- This requires libdrm commit
3df8a7f0 ("xf86drm: fallback to MODALIAS
for OF less platform devices")
- The non-kms, swrast is unaffected by this change.
v2:
- elaborate what and how is/isn't working (Eric)
- simplify driver_name handling (Eric)
v3:
- move node_type outside of the loop (Eric)
- kill no longer needed DRM_RENDER_DEV_NAME define
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Emil Velikov [Tue, 19 Feb 2019 14:08:07 +0000 (14:08 +0000)]
egl/sl: use drmDevice API to enumerate available devices
This provides for a more comprehensive iteration and slightly more
straight-forward codebase.
v2:
- s/dpy/disp/
- keep original 64 devices (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Emil Velikov [Tue, 19 Feb 2019 14:08:06 +0000 (14:08 +0000)]
egl/sl: split out swrast probe into separate function
Make the code a bit easier to read.
As a bonus point this makes it obvious that we forgot to call
_eglAddDevice() for the device - do so.
v2:
- s/dpy/disp/ (Eric)
- free(driver_name) on dri2_load_driver_swrast() failure (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Juan A. Suarez Romero [Wed, 27 Feb 2019 10:42:00 +0000 (10:42 +0000)]
nir/spirv: return after emitting a branch in block
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.
This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:
%1 = OpLabel
OpLoopMerge %2 %3 None
OpBranchConditional %false %2 %2
%3 = OpLabel
OpBranch %1
%2 = OpLabel
[...]
We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.
This fixes dEQP-VK.graphicsfuzz.continue-and-merge.
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Engestrom [Fri, 9 Nov 2018 11:55:10 +0000 (11:55 +0000)]
egl/android: replace magic 0=CbCr,1=CrCb with simple enum
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Caio Marcelo de Oliveira Filho [Wed, 27 Feb 2019 06:29:27 +0000 (22:29 -0800)]
st/nir: count num_uniforms for FS bultin shader
Usually the uniforms will be assigned locations and have their slots
counted automatically, but for builtin shaders the location assignment
is manual. So count them too otherwise we get num_uniforms == 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ray Zhang [Wed, 27 Feb 2019 06:54:05 +0000 (06:54 +0000)]
glx: fix shared memory leak in X11
call XShmDetach to allow X server to free shared memory
Fixes: bcd80be49a8260c2233d "drisw/glx: use XShm if possible"
Signed-off-by: Ray Zhang <zhanglei002@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Wed, 27 Feb 2019 03:30:29 +0000 (14:30 +1100)]
radeonsi/nir: move si_lower_nir() call into compiler thread
This helps improve compile times. For example the shader-db dolphin
shader shaders/dolphin/ubershaders/120.shader_test goes from
~1.69 -> ~1.57 seconds on my machine with this change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 27 Feb 2019 07:26:07 +0000 (18:26 +1100)]
glsl: fix shader cache for packed param list
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.
Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.
This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.
Fixes: edded1237607 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Yevhenii Kolesnikov [Mon, 25 Feb 2019 14:21:48 +0000 (16:21 +0200)]
i965: Fix allow_higher_compat_version workaround limited by OpenGL 3.0
Added check for higher compat profile being allowed
before assigning certain extensions.
Fixes: 272fe9494232 (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile)
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052
Lionel Landwerlin [Wed, 27 Feb 2019 15:53:21 +0000 (15:53 +0000)]
intel/compiler: use correct swizzle for replacement
The optimization in
4cd1a0be76883c introduced a replacement of :
cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D
By :
cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D
The first cmp instruction is storing in x while the second mov is
sourcing from y. We need to take into account where the replacement on
the scan_inst destination is going to store thing so that the
replacement mov can source things from the correct location.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cd1a0be76883c ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jonathan Marek [Tue, 26 Feb 2019 16:54:56 +0000 (11:54 -0500)]
freedreno: catch failing fd_blit and fallback to software blit
Fixes cases where the fd_blit fails and never happens (ex: blit to etc1)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Jonathan Marek [Wed, 20 Feb 2019 10:50:47 +0000 (11:50 +0100)]
freedreno: use renderonly path for buffers allocated with modifiers
Now that freedreno has create_with_modifiers(), this "hack" is needed to
make some cases work. Copied from vc4.
Fixes: 41ddf1d1
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Jonathan Marek [Tue, 26 Feb 2019 17:00:01 +0000 (12:00 -0500)]
freedreno: a2xx: fix mipmapping for NPOT textures
Fixes: 3a273a4a
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Jonathan Marek [Tue, 19 Feb 2019 18:01:55 +0000 (19:01 +0100)]
freedreno: a2xx: fix fast clear for some gmem configurations
In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here
should be the same.
Fixes: 912a9c8d
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Jonathan Marek [Tue, 19 Feb 2019 21:41:54 +0000 (22:41 +0100)]
freedreno: a2xx: add use_hw_binning function
Fixes: cb2322c7
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Jonathan Marek [Fri, 22 Feb 2019 18:21:27 +0000 (19:21 +0100)]
freedreno: a2xx: don't write 4th vertex in mem2gmem
There is only room for 3 vertices now (RECT has 3 vertices).
Fixes: 6ef7700a
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Erik Faye-Lund [Tue, 26 Feb 2019 13:53:25 +0000 (14:53 +0100)]
swr/codegen: fix autotools build
When the output directory was changed, the BUILT_SOURCES and build-rule
target-path was no longer correct, leading to races to generate the
sources and compiling them.
Fix this by updating both sets of paths, so automake see what's going on
here.
Fixes: 773b3ceacaf ("swr/rast: Fix autotools and scons codegen")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
Timo Aaltonen [Wed, 6 Feb 2019 08:02:40 +0000 (10:02 +0200)]
util/os_misc: Add check for PIPE_OS_HURD
Fix build on Hurd.
Signed-off-by: Timo Aaltonen <tjaalton@debian.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Lionel Landwerlin [Tue, 26 Feb 2019 23:23:03 +0000 (23:23 +0000)]
vulkan/overlay: install layer binary in libdir
This will allow multilib.
v2: Drop path from json file, dlopen should be able to locate the lib in libdir
v3: Switch from configure_file to install_data (Dylan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109788
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Eric Engestrom [Tue, 26 Feb 2019 14:30:53 +0000 (14:30 +0000)]
meson/swr: replace hard-coded path with current_build_dir()
Fixes: 93cd9905c8fbb98985ae "swr/rast: Cleanup and generalize gen_archrast"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Gert Wollny [Sat, 2 Feb 2019 17:38:17 +0000 (18:38 +0100)]
nir: Add posibility to not lower to source mod 'abs' for ops with three sources
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources
v2: Use correct logic to enable lowering to abs source mod (Eric Anhold)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gurchetan Singh [Fri, 14 Dec 2018 23:12:48 +0000 (15:12 -0800)]
virgl/vtest: deprecate protocol version 1
This is a partial revert of 9d81cd ("virgl: Pass resource size and
transfer offsets").
The adjustments made in the client code means there's various
mismatches when transfering data.
Let's fallback to protocol version 0 and deprecate protocol
version 1. We can still use the protocol version 1 slots for
a shared memory transfer mechanism later.
Fixes:
dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.*_renderbuffer
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Tapani Pälli [Tue, 26 Feb 2019 10:51:07 +0000 (12:51 +0200)]
util: fix a warning when building against clang7 headers
Header xmmintrin.h conditionally includes emmintrin.h that defines
_MM_DENORMALS_ZERO_MASK, add ifndef to fix this warning.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tapani Pälli [Tue, 26 Feb 2019 08:21:32 +0000 (10:21 +0200)]
iris: add libmesa_iris_gen8 library to the build
Patch fixes iris build on Android.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tapani Pälli [Tue, 26 Feb 2019 08:27:15 +0000 (10:27 +0200)]
android: make libbacktrace optional on USE_LIBBACKTRACE
Otherwise with VNDK enabled we fail linking:
src/gallium/targets/dri/Android.mk: error: gallium_dri (native:vendor)
should not link to libbacktrace.vendor (native:vndk_private)
Option makes it possible to use libbacktrace only when VNDK is not
enabled.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tapani Pälli [Thu, 21 Feb 2019 13:00:10 +0000 (15:00 +0200)]
android: add liblog to libmesa_intel_common build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Alyssa Rosenzweig [Mon, 25 Feb 2019 03:42:12 +0000 (03:42 +0000)]
panfrost/midgard: Allow flt to run on most units
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 25 Feb 2019 03:31:29 +0000 (03:31 +0000)]
panfrost: Expose perf counters in environment
Previously, we were guarded by an #ifdef, which is generally a bad form.
This patch instead guards them behind an environmental variable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 24 Feb 2019 06:28:39 +0000 (06:28 +0000)]
panfrost: Identify 4-bit channel texture formats
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 24 Feb 2019 05:43:14 +0000 (05:43 +0000)]
panfrost: Add RGB565, RGB5A1 texture formats
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Jose Maria Casanova Crespo [Tue, 26 Feb 2019 13:37:23 +0000 (14:37 +0100)]
iris: Enable ARB_shader_draw_parameters support
Additional VERTEX_ELEMENT_STATE are used to store basevertex and
baseinstance and drawid updating the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command.
This passes all piglit tests for spec.*draw_parameters.* tests
and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.
Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.
We enable PIPE_CAP_DRAW_PARAMETERS on Iris.
There is no edge flag support in the Vertex Elements setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pierre Moreau [Sat, 2 Feb 2019 14:33:51 +0000 (15:33 +0100)]
clover: Fix indentation issues
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Wed, 30 Jan 2019 17:38:18 +0000 (18:38 +0100)]
clover: Only use devices supporting IR_NATIVE
Currently clover will advertise any device that advertises
PIPE_CAP_COMPUTE, even if they do not support PIPE_SHADER_IR_NATIVE,
which is the IR used internally by clover.
This avoids clover advertising devices as available even though they
actually are not supported.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Thu, 18 Jan 2018 22:42:51 +0000 (23:42 +0100)]
clover: Move platform extensions definitions to clover/platform.cpp
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Pierre Moreau [Sun, 21 Jan 2018 17:49:00 +0000 (18:49 +0100)]
clover: Move device extensions definitions to core/device.cpp
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Pierre Moreau [Wed, 30 Jan 2019 21:27:54 +0000 (22:27 +0100)]
clover: Validate program and library linking options
Program linking options are only valid if the library was created with
the `-enable-link-options` option, which itself is only valid when
creating a library, and only when creating an executable.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Sat, 27 Jan 2018 17:25:31 +0000 (18:25 +0100)]
clover: Disallow creating libraries from other libraries
If creating a library, do not allow non-compiled object in it, as
executables are not allowed, and libraries would make it really hard to
enforce the "-enable-link-options" flag.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Pierre Moreau [Sat, 27 Jan 2018 17:12:16 +0000 (18:12 +0100)]
clover/api: Fail if trying to build a non-executable binary
From the OpenCL 1.2 Specification, Section 5.6.2 (about clBuildProgram):
> If program is created with clCreateProgramWithBinary, then the
> program binary must be an executable binary (not a compiled binary or
> library).
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Sat, 27 Jan 2018 17:11:17 +0000 (18:11 +0100)]
clover/api: Rework the validation of devices for building
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Tue, 3 Oct 2017 19:07:45 +0000 (21:07 +0200)]
clover: Add an helper for checking if an IR is supported
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Sat, 10 Feb 2018 15:56:11 +0000 (16:56 +0100)]
clover: Remove the TGSI backend as unused
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pierre Moreau [Fri, 1 Feb 2019 11:33:37 +0000 (12:33 +0100)]
clover: Avoid warnings from new OpenCL headers
* Avoid warnings from references to deprecated CL 1.0, 1.2, 2.0 and 2.1 APIs.
* Avoid warnings from not defining CL_TARGET_OPENCL_VERSION.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>