mesa.git
6 years agonir: Simplify min and max of b2f
Ian Romanick [Tue, 8 Mar 2016 19:11:00 +0000 (11:11 -0800)]
nir: Simplify min and max of b2f

v2: Rebase after almost 2 years.  Require that one of the arguments to fmin
or fmax be used only once.  This prevents some regressions.
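
A minimal sketch of the kind of identity the title suggests, assuming b2f
maps false to 0.0 and true to 1.0: the minimum of two such values then
behaves like a logical AND and the maximum like a logical OR.  The helper
names below are hypothetical and the check is a plain Python model, not
the NIR pattern itself.

```python
from itertools import product


def b2f(flag: bool) -> float:
    """Toy model of b2f: false becomes 0.0, true becomes 1.0."""
    return 1.0 if flag else 0.0


def check_b2f_min_max() -> bool:
    """Exhaustively compare min/max of b2f values against AND/OR."""
    for a, b in product((False, True), repeat=2):
        if min(b2f(a), b2f(b)) != b2f(a and b):
            return False
        if max(b2f(a), b2f(b)) != b2f(a or b):
            return False
    return True
```

With only four input combinations the check is exhaustive, so it returns
True exactly when both identities hold in this model.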

shader-db results:

Skylake and Broadwell had similar results.  Skylake shown.
total instructions in shared programs: 14526021 -> 14525913 (<.01%)
instructions in affected programs: 4613 -> 4505 (-2.34%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%

total cycles in shared programs: 533118710 -> 533118403 (<.01%)
cycles in affected programs: 34334 -> 34027 (-0.89%)
helped: 24
HURT: 0
helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%

No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
6 years agonir: Undo possible damage caused by rearranging or-compounded float compares
Ian Romanick [Fri, 5 Jan 2018 21:29:26 +0000 (13:29 -0800)]
nir: Undo possible damage caused by rearranging or-compounded float compares

shader-db results:

Skylake and Broadwell had similar results (Skylake shown)
total instructions in shared programs: 14525898 -> 14525836 (<.01%)
instructions in affected programs: 1964 -> 1902 (-3.16%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1
helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86%
95% mean confidence interval for instructions value: -9.46 0.60
95% mean confidence interval for instructions %-change: -3.97% -0.24%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 533119892 -> 533115756 (<.01%)
cycles in affected programs: 96061 -> 91925 (-4.31%)
helped: 13
HURT: 1
helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300
helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46%
95% mean confidence interval for cycles value: -379.43 -211.43
95% mean confidence interval for cycles %-change: -4.84% -3.01%
Cycles are helped.

Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown).
total instructions in shared programs: 9033948 -> 9033898 (<.01%)
instructions in affected programs: 535 -> 485 (-9.35%)
helped: 2
HURT: 0

total cycles in shared programs: 84631402 -> 84628949 (<.01%)
cycles in affected programs: 63197 -> 60744 (-3.88%)
helped: 13
HURT: 2
helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140
helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01%
HURT stats (abs)   min: 4 max: 8 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for cycles value: -253.40 -73.67
95% mean confidence interval for cycles %-change: -4.24% -2.25%
Cycles are helped.

No changes on GM45 or Iron Lake.
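
For readers checking the report, a small sketch of how the percentages
and verdict lines above can be recomputed from the raw numbers.  The
function names and the short verdict labels are assumptions made for
illustration; this is not the shader-db report script.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def verdict(ci_low: float, ci_high: float) -> str:
    """Classify a 95% mean confidence interval of the value change."""
    if ci_low <= 0.0 <= ci_high:
        return "inconclusive"  # the interval includes 0
    return "helped" if ci_high < 0.0 else "hurt"


# Skylake instructions in affected programs: 1964 -> 1902
skl_instructions = round(percent_change(1964, 1902), 2)
# Skylake cycles in affected programs: 96061 -> 91925
skl_cycles = round(percent_change(96061, 91925), 2)
```

Feeding the Skylake intervals into verdict() gives "inconclusive" for
instructions (-9.46 to 0.60) and "helped" for cycles (-379.43 to -211.43),
matching the report lines.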

v2: Add a couple more tautological compares.  Suggested by Elie.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
6 years agonir: Be more conservative about rearranging or-compounded compares
Ian Romanick [Thu, 4 Jan 2018 21:30:49 +0000 (13:30 -0800)]
nir: Be more conservative about rearranging or-compounded compares

If both comparisons are used as sources for instructions other than the
ior, this transformation is detrimental.  If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.
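
The reasoning above can be sketched with a toy model, assuming the
rearrangement turns (a < b) || (a < c) into a < fmax(b, c).  The value
set, the helper names, and the instruction-count model are illustrative
assumptions, and NaN inputs are deliberately left out.

```python
from itertools import product

VALUES = (-2.0, -0.5, 0.0, 0.5, 2.0, float("inf"), float("-inf"))


def original(a: float, b: float, c: float) -> bool:
    """Two compares joined by an or."""
    return (a < b) or (a < c)


def rearranged(a: float, b: float, c: float) -> bool:
    """One compare against the larger of the two bounds."""
    return a < max(b, c)


def forms_agree() -> bool:
    """Check both forms on every combination of the sample values."""
    return all(original(a, b, c) == rearranged(a, b, c)
               for a, b, c in product(VALUES, repeat=3))


def original_cost() -> int:
    """Two compares plus the ior."""
    return 3


def rearranged_cost(compares_with_other_uses: int,
                    bounds_are_constant: bool) -> int:
    """Compares kept alive for other users, plus fmax, plus the new compare."""
    fmax_cost = 0 if bounds_are_constant else 1  # constant fmax folds away
    return compares_with_other_uses + fmax_cost + 1
```

In this model the rearranged form costs 4 against 3 when both compares
have other users, and never costs more than the original when the fmax
folds to a constant.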

shader-db results:

Skylake
total instructions in shared programs: 14526147 -> 14525898 (<.01%)
instructions in affected programs: 70239 -> 69990 (-0.35%)
helped: 102
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.86 -2.02
95% mean confidence interval for instructions %-change: -0.46% -0.31%
Instructions are helped.

total cycles in shared programs: 533120531 -> 533119892 (<.01%)
cycles in affected programs: 994875 -> 994236 (-0.06%)
helped: 76
HURT: 26
helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13
helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18%
HURT stats (abs)   min: 1 max: 167 x̄: 54.62 x̃: 26
HURT stats (rel)   min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39%
95% mean confidence interval for cycles value: -19.44 6.91
95% mean confidence interval for cycles %-change: -0.30% 0.15%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 14816005 -> 14815787 (<.01%)
instructions in affected programs: 64658 -> 64440 (-0.34%)
helped: 97
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.62 -1.87
95% mean confidence interval for instructions %-change: -0.45% -0.30%
Instructions are helped.

total cycles in shared programs: 559340386 -> 559339907 (<.01%)
cycles in affected programs: 1090491 -> 1090012 (-0.04%)
helped: 66
HURT: 28
helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16
helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27%
HURT stats (abs)   min: 2 max: 226 x̄: 39.07 x̃: 11
HURT stats (rel)   min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20%
95% mean confidence interval for cycles value: -15.94 5.75
95% mean confidence interval for cycles %-change: -0.35% 0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

Haswell
total instructions in shared programs: 9034106 -> 9033948 (<.01%)
instructions in affected programs: 24096 -> 23938 (-0.66%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4
helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64%
95% mean confidence interval for instructions value: -4.71 -3.60
95% mean confidence interval for instructions %-change: -0.84% -0.58%
Instructions are helped.

total cycles in shared programs: 84631628 -> 84631402 (<.01%)
cycles in affected programs: 148674 -> 148448 (-0.15%)
helped: 14
HURT: 14
helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12
helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21%
HURT stats (abs)   min: 1 max: 10 x̄: 6.00 x̃: 5
HURT stats (rel)   min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: -19.42 3.28
95% mean confidence interval for cycles %-change: -0.59% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 10015456 -> 10015293 (<.01%)
instructions in affected programs: 27701 -> 27538 (-0.59%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4
helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52%
95% mean confidence interval for instructions value: -4.87 -3.71
95% mean confidence interval for instructions %-change: -0.82% -0.51%
Instructions are helped.

total cycles in shared programs: 87524771 -> 87524569 (<.01%)
cycles in affected programs: 112324 -> 112122 (-0.18%)
helped: 6
HURT: 12
helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20
helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26%
HURT stats (abs)   min: 1 max: 16 x̄: 5.50 x̃: 5
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -29.14 6.69
95% mean confidence interval for cycles %-change: -0.93% 0.08%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Sandy Bridge
total instructions in shared programs: 10545655 -> 10545465 (<.01%)
instructions in affected programs: 37198 -> 37008 (-0.51%)
helped: 42
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4
helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49%
95% mean confidence interval for instructions value: -5.14 -3.91
95% mean confidence interval for instructions %-change: -0.68% -0.47%
Instructions are helped.

total cycles in shared programs: 146113059 -> 146112427 (<.01%)
cycles in affected programs: 423514 -> 422882 (-0.15%)
helped: 32
HURT: 10
helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12
helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11%
HURT stats (abs)   min: 12 max: 19 x̄: 14.70 x̃: 14
HURT stats (rel)   min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14%
95% mean confidence interval for cycles value: -26.03 -4.07
95% mean confidence interval for cycles %-change: -0.43% -0.05%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886959 -> 7886925 (<.01%)
instructions in affected programs: 1340 -> 1306 (-2.54%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8
helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43%
95% mean confidence interval for instructions value: -20.44 3.44
95% mean confidence interval for instructions %-change: -5.78% 0.89%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 178116996 -> 178116888 (<.01%)
cycles in affected programs: 6262 -> 6154 (-1.72%)
helped: 2
HURT: 2
helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61
helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62%
HURT stats (abs)   min: 6 max: 8 x̄: 7.00 x̃: 7
HURT stats (rel)   min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -93.27 39.27
95% mean confidence interval for cycles %-change: -5.38% 2.27%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857887 -> 4857870 (<.01%)
instructions in affected programs: 674 -> 657 (-2.52%)
helped: 2
HURT: 0

total cycles in shared programs: 122180816 -> 122180744 (<.01%)
cycles in affected programs: 3764 -> 3692 (-1.91%)
helped: 1
HURT: 1
helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78
helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
6 years agonir: See through an fneg to apply existing optimizations
Ian Romanick [Tue, 9 Jan 2018 23:32:47 +0000 (15:32 -0800)]
nir: See through an fneg to apply existing optimizations

Doing the same for the existing feq and fne transformations didn't help
anything in shader-db.
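
As an illustration only, here are identities of the sort that become
visible once a negation is looked through.  They are assumptions chosen
because they are easy to verify; they are not necessarily the patterns
this commit adds, and NaN inputs are excluded.

```python
def b2f(flag: bool) -> float:
    """Toy model of b2f: false becomes 0.0, true becomes 1.0."""
    return 1.0 if flag else 0.0


SAMPLES = (-3.0, -0.0, 0.0, 1.5, float("inf"), float("-inf"))


def check_fneg_identities() -> bool:
    """Verify the negated comparisons against their simplified forms."""
    for a in (False, True):
        if (-b2f(a) >= 0.0) != (not a):   # -b2f(a) >= 0  is  !a
            return False
        if (-b2f(a) < 0.0) != a:          # -b2f(a) <  0  is  a
            return False
    for x in SAMPLES:
        if (-abs(x) >= 0.0) != (x == 0.0):  # -|x| >= 0  is  x == 0
            return False
        if (-abs(x) < 0.0) != (x != 0.0):   # -|x| <  0  is  x != 0
            return False
    return True
```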

shader-db results:

Broadwell and Skylake (Skylake shown)
total instructions in shared programs: 14529463 -> 14526147 (-0.02%)
instructions in affected programs: 402420 -> 399104 (-0.82%)
helped: 2136
HURT: 131
helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1
helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57%
95% mean confidence interval for instructions value: -1.51 -1.41
95% mean confidence interval for instructions %-change: -3.06% -2.78%
Instructions are helped.

total cycles in shared programs: 533146915 -> 533120531 (<.01%)
cycles in affected programs: 10356261 -> 10329877 (-0.25%)
helped: 1933
HURT: 844
helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16
helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88%
HURT stats (abs)   min: 1 max: 423 x̄: 36.17 x̃: 12
HURT stats (rel)   min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59%
95% mean confidence interval for cycles value: -11.78 -7.22
95% mean confidence interval for cycles %-change: -1.98% -1.65%
Cycles are helped.

Haswell
total instructions in shared programs: 9037416 -> 9034106 (-0.04%)
instructions in affected programs: 389831 -> 386521 (-0.85%)
helped: 2184
HURT: 120
helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57%
95% mean confidence interval for instructions value: -1.49 -1.39
95% mean confidence interval for instructions %-change: -2.68% -2.41%
Instructions are helped.

total cycles in shared programs: 84636243 -> 84631628 (<.01%)
cycles in affected programs: 4745058 -> 4740443 (-0.10%)
helped: 1904
HURT: 960
helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18
helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38%
HURT stats (abs)   min: 1 max: 1080 x̄: 55.11 x̃: 14
HURT stats (rel)   min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81%
95% mean confidence interval for cycles value: -4.51 1.29
95% mean confidence interval for cycles %-change: -1.64% -1.25%
Inconclusive result (value mean confidence interval includes 0).

LOST:   1
GAINED: 0

Sandy Bridge and Ivy Bridge (Ivy Bridge shown)
total instructions in shared programs: 10018873 -> 10015456 (-0.03%)
instructions in affected programs: 512820 -> 509403 (-0.67%)
helped: 2268
HURT: 162
helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88%
HURT stats (abs)   min: 1 max: 4 x̄: 1.59 x̃: 1
HURT stats (rel)   min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50%
95% mean confidence interval for instructions value: -1.46 -1.35
95% mean confidence interval for instructions %-change: -2.38% -2.12%
Instructions are helped.

total cycles in shared programs: 87538223 -> 87524771 (-0.02%)
cycles in affected programs: 5435520 -> 5422068 (-0.25%)
helped: 1916
HURT: 946
helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18
helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97%
HURT stats (abs)   min: 1 max: 633 x̄: 45.41 x̃: 11
HURT stats (rel)   min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62%
95% mean confidence interval for cycles value: -7.34 -2.06
95% mean confidence interval for cycles %-change: -1.62% -1.26%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 7888446 -> 7886959 (-0.02%)
instructions in affected programs: 331581 -> 330094 (-0.45%)
helped: 1160
HURT: 97
helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1
helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25%
95% mean confidence interval for instructions value: -1.25 -1.12
95% mean confidence interval for instructions %-change: -0.91% -0.75%
Instructions are helped.

total cycles in shared programs: 178130766 -> 178116996 (<.01%)
cycles in affected programs: 12534564 -> 12520794 (-0.11%)
helped: 1856
HURT: 187
helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4
helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.55 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02%
95% mean confidence interval for cycles value: -7.41 -6.07
95% mean confidence interval for cycles %-change: -0.28% -0.22%
Cycles are helped.

GM45
total instructions in shared programs: 4858912 -> 4857887 (-0.02%)
instructions in affected programs: 237565 -> 236540 (-0.43%)
helped: 867
HURT: 57
helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: -1.18 -1.04
95% mean confidence interval for instructions %-change: -0.88% -0.71%
Instructions are helped.

total cycles in shared programs: 122189118 -> 122180816 (<.01%)
cycles in affected programs: 8776418 -> 8768116 (-0.09%)
helped: 1213
HURT: 166
helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4
helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.35 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02%
95% mean confidence interval for cycles value: -6.78 -5.26
95% mean confidence interval for cycles %-change: -0.24% -0.18%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
6 years agost/glsl_to_nir: disable io lowering and array splitting of fs inputs
Timothy Arceri [Mon, 15 Jan 2018 00:48:16 +0000 (11:48 +1100)]
st/glsl_to_nir: disable io lowering and array splitting of fs inputs

We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.

The lowering and splitting are made conditional on lower_all_io_to_temps
because vc4 and freedreno both expect these passes to be enabled, and
neither supports GLSL 4.00, so they don't need to deal with the
interpolateAt builtins.

We leave the other stages alone for now to avoid regressions. Ideally we
could remove the stage checks and just set the nir options correctly
for each stage. However, all gallium drivers currently return the same
nir compiler options for all stages, and it's probably more trouble
than it's worth to change this.
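
One possible reading of the decision described above, written as a toy
predicate.  The Stage enum and the function name are hypothetical, and
the assumption that every other stage keeps lowering its inputs is
inferred from the message rather than taken from the code.

```python
from enum import Enum


class Stage(Enum):
    VERTEX = "vertex"
    GEOMETRY = "geometry"
    FRAGMENT = "fragment"


def lowers_inputs_to_temps(stage: Stage, lower_all_io_to_temps: bool) -> bool:
    """Whether shader inputs are copied to temps and split in this model."""
    if stage is Stage.FRAGMENT:
        # fs inputs are only lowered for drivers that ask for everything
        return lower_all_io_to_temps
    # other stages are left as they were (assumed: still lowered)
    return True
```

Under this reading a driver that sets the flag, as the message says vc4
and freedreno expect, still gets fragment inputs lowered, while a driver
that leaves it unset keeps fragment inputs intact for the interpolateAt
builtins.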

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agonir: add lower_all_io_to_temps flag
Timothy Arceri [Mon, 29 Jan 2018 23:55:19 +0000 (10:55 +1100)]
nir: add lower_all_io_to_temps flag

This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agonir/st_glsl_to_nir: add param to disable splitting of inputs
Timothy Arceri [Fri, 19 Jan 2018 02:05:35 +0000 (13:05 +1100)]
nir/st_glsl_to_nir: add param to disable splitting of inputs

We need this because we will always copy fs outputs to temps and
split the arrays, but do not want to do either of these with fs
inputs as it is unnecessary and makes handling interpolateAt
builtins difficult.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/glsl_to_nir: copy nir compiler options to context
Timothy Arceri [Tue, 30 Jan 2018 00:51:31 +0000 (11:51 +1100)]
st/glsl_to_nir: copy nir compiler options to context

Various nir passes may expect this to be here as does the nir
serialisation pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: add input support for arrays that have not been copied to temps and...
Timothy Arceri [Mon, 15 Jan 2018 00:45:37 +0000 (11:45 +1100)]
radeonsi/nir: add input support for arrays that have not been copied to temps and split

We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoac/radeonsi: add lookup_interp_param and load_sample_position to the abi
Timothy Arceri [Sun, 14 Jan 2018 09:54:20 +0000 (20:54 +1100)]
ac/radeonsi: add lookup_interp_param and load_sample_position to the abi

This will enable the interpolateAt builtins to work on the radeonsi
nir backend.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: add prim_mask to the abi
Timothy Arceri [Sun, 14 Jan 2018 09:51:35 +0000 (20:51 +1100)]
radeonsi/nir: add prim_mask to the abi

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: adjust load_sample_position() to be shared between backends
Timothy Arceri [Sun, 14 Jan 2018 09:49:40 +0000 (20:49 +1100)]
radeonsi/nir: adjust load_sample_position() to be shared between backends

With this interface change it can be shared between the tgsi and
nir backends.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: add si_nir_lookup_interp_param() helper
Timothy Arceri [Tue, 30 Jan 2018 03:54:13 +0000 (14:54 +1100)]
radeonsi/nir: add si_nir_lookup_interp_param() helper

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoac/nir_to_llvm: move some interp defines to the header
Timothy Arceri [Tue, 30 Jan 2018 03:52:43 +0000 (14:52 +1100)]
ac/nir_to_llvm: move some interp defines to the header

These will be used in the following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: move the interpolation qualifier scanning
Timothy Arceri [Sun, 14 Jan 2018 09:43:40 +0000 (20:43 +1100)]
radeonsi/nir: move the interpolation qualifier scanning

We need to collect this when scanning over the instructions rather
than when scanning over the inputs, otherwise we might get conflicting
values for inputs that are used by the interpolateAt* builtins.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: add interpolate at intrinsics to scan_instruction()
Timothy Arceri [Sun, 14 Jan 2018 08:52:24 +0000 (19:52 +1100)]
radeonsi/nir: add interpolate at intrinsics to scan_instruction()

V2: use the uses_*_opcode_interp_* flags

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradv: Merge raster state with PM4 generation.
Bas Nieuwenhuizen [Tue, 16 Jan 2018 19:44:48 +0000 (20:44 +0100)]
radv: Merge raster state with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Move gs state out of pipeline.
Bas Nieuwenhuizen [Tue, 16 Jan 2018 12:03:44 +0000 (13:03 +0100)]
radv: Move gs state out of pipeline.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Split out cliprect rule generation.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 14:26:41 +0000 (15:26 +0100)]
radv: Split out cliprect rule generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge VGT_GS_MODE computation with PM4 generation.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 13:27:12 +0000 (14:27 +0100)]
radv: Merge VGT_GS_MODE computation with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Split out processing the vertex input state.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 12:23:48 +0000 (13:23 +0100)]
radv: Split out processing the vertex input state.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Move tessellation state out of pipeline.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 12:11:20 +0000 (13:11 +0100)]
radv: Move tessellation state out of pipeline.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Move blend state out of pipeline.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 11:32:57 +0000 (12:32 +0100)]
radv: Move blend state out of pipeline.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Split out generating VGT_SHADER_STAGES_EN.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 11:32:25 +0000 (12:32 +0100)]
radv: Split out generating VGT_SHADER_STAGES_EN.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Split out the ia_multi_vgt_param precomputation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 23:41:59 +0000 (00:41 +0100)]
radv: Split out the ia_multi_vgt_param precomputation.

Also move everything into a struct and return the struct from the
helper function, so it is clear in the caller which part of the
pipeline gets modified.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Split out db_shader_control computation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 22:40:43 +0000 (23:40 +0100)]
radv: Split out db_shader_control computation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Compute shader_z_format when emitting it.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 22:25:46 +0000 (23:25 +0100)]
radv: Compute shader_z_format when emitting it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge depth stencil state with PM4 generation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 19:00:40 +0000 (20:00 +0100)]
radv: Merge depth stencil state with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge ps_input_cntl computation with PM4 generation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 15:05:01 +0000 (16:05 +0100)]
radv: Merge ps_input_cntl computation with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge vtx_reuse_depth computation with PM4 generation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 01:46:49 +0000 (02:46 +0100)]
radv: Merge vtx_reuse_depth computation with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge vs state computation with PM4 generation.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 01:43:08 +0000 (02:43 +0100)]
radv: Merge vs state computation with PM4 generation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Merge binning state generation with pm4 emission.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 01:18:53 +0000 (02:18 +0100)]
radv: Merge binning state generation with pm4 emission.

We don't need the pipeline state struct anymore.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Constify some pipeline helpers.
Bas Nieuwenhuizen [Mon, 15 Jan 2018 11:34:33 +0000 (12:34 +0100)]
radv: Constify some pipeline helpers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Add PM4 pregeneration for compute pipelines.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 20:20:20 +0000 (21:20 +0100)]
radv: Add PM4 pregeneration for compute pipelines.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Record a PM4 sequence for graphics pipeline switches.
Bas Nieuwenhuizen [Sun, 14 Jan 2018 01:03:38 +0000 (02:03 +0100)]
radv: Record a PM4 sequence for graphics pipeline switches.

This gives about 2% performance improvement on dota2 for me.

This is mostly a mechanical copy and replacement, but at bind time
we still do:

1) Some stuff that is only based on num_samples changes.
2) Some command buffer state setting.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Determine unneeded dynamic states.
Bas Nieuwenhuizen [Tue, 16 Jan 2018 13:32:35 +0000 (14:32 +0100)]
radv: Determine unneeded dynamic states.

This avoids setting or emitting them.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agomesa: check for invalid index on UUID glGet queries
Andres Rodriguez [Fri, 22 Dec 2017 00:18:59 +0000 (19:18 -0500)]
mesa: check for invalid index on UUID glGet queries

This fixes the piglit test:
spec/ext_semaphore/api-errors/usigned-byte-i-v-bad-value

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: fix glGet for ext_external_objects parameters
Andres Rodriguez [Fri, 22 Dec 2017 00:00:29 +0000 (19:00 -0500)]
mesa: fix glGet for ext_external_objects parameters

This allows the client to actually query the enums specified in the
ext_external_objects spec.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: fix error codes for importing memory/semaphore FDs
Andres Rodriguez [Thu, 21 Dec 2017 22:59:07 +0000 (17:59 -0500)]
mesa: fix error codes for importing memory/semaphore FDs

This fixes the following piglit tests:
spec/ext_semaphore_fd/api-errors/import-semaphore-fd-bad-enum
spec/ext_memory_object_fd/api-errors/import-memory-fd-bad-enum

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: fix fence_server_sync() holding up extra work v2
Andres Rodriguez [Wed, 20 Dec 2017 00:31:41 +0000 (19:31 -0500)]
radeonsi: fix fence_server_sync() holding up extra work v2

When calling si_fence_server_sync(), the wait operation is associated
with the next kernel submission. Therefore, any unflushed work
submitted prior to fence_server_sync() will also be affected by
the wait.

To avoid adding the dependency to the unflushed work, we flush before
emitting the fence dependency.
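
The effect can be sketched with a toy model in which a wait attaches to
whatever is submitted next.  ToyContext and its methods are hypothetical
stand-ins written for this illustration; they are not the radeonsi or
winsys interfaces.

```python
class ToyContext:
    """Records work and groups it into submissions with their waits."""

    def __init__(self) -> None:
        self.unflushed = []      # work recorded but not yet submitted
        self.pending_waits = []  # fences the next submission waits on
        self.submissions = []    # (waits, work) tuples in submit order

    def draw(self, name: str) -> None:
        self.unflushed.append(name)

    def flush(self) -> None:
        if self.unflushed or self.pending_waits:
            self.submissions.append(
                (tuple(self.pending_waits), tuple(self.unflushed)))
            self.unflushed = []
            self.pending_waits = []

    def fence_server_sync(self, fence: str, flush_first: bool) -> None:
        if flush_first:
            self.flush()  # submit earlier work without the dependency
        self.pending_waits.append(fence)


def run(flush_first: bool) -> list:
    """Draw A, wait on fence F, draw B, then flush."""
    ctx = ToyContext()
    ctx.draw("A")
    ctx.fence_server_sync("F", flush_first)
    ctx.draw("B")
    ctx.flush()
    return ctx.submissions
```

Without the early flush, A and B end up in one submission that waits on
F; with it, A is submitted on its own and only B waits.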

v2: s/semaphore/fence

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: implement semaphore_server_signal v2
Andres Rodriguez [Thu, 26 Oct 2017 21:13:03 +0000 (17:13 -0400)]
radeonsi: implement semaphore_server_signal v2

Syncobj based waits or signals only happen at submission boundaries. In
order to guarantee that the requested signal event will occur when the
state tracker requested it, we must issue a flush.

v2: s/fence/semaphore for pipe objects

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: add support for importing PIPE_FD_TYPE_SYNCOBJ semaphores
Andres Rodriguez [Fri, 15 Dec 2017 05:13:50 +0000 (00:13 -0500)]
radeonsi: add support for importing PIPE_FD_TYPE_SYNCOBJ semaphores

Hook up importing semaphores of type PIPE_FD_TYPE_SYNCOBJ.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agowinsys/amdgpu: add support for syncobj signaling v3
Andres Rodriguez [Fri, 27 Oct 2017 02:42:08 +0000 (22:42 -0400)]
winsys/amdgpu: add support for syncobj signaling v3

Add the ability to signal a syncobj when a cs completes execution.

v2: corresponding changes for gallium fence->semaphore rename
v3: s/semaphore/fence for pipe objects

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa/st: add support for semaphore object signal/wait v4
Andres Rodriguez [Wed, 18 Oct 2017 19:11:27 +0000 (15:11 -0400)]
mesa/st: add support for semaphore object signal/wait v4

Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject

v2:
  - corresponding changes for gallium fence->semaphore rename
  - flushing moved to mesa/main

v3: s/semaphore/fence for pipe objects
v4: add bitmap flushing

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add support for semaphore object signal/wait v3
Andres Rodriguez [Tue, 17 Oct 2017 00:10:31 +0000 (20:10 -0400)]
mesa: add support for semaphore object signal/wait v3

Memory synchronization is left for a future patch.

v2: flush vertices/bitmaps moved to mesa/main
v3: removed spaces before/after braces

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add semaphore parameter stub v2
Andres Rodriguez [Fri, 6 Oct 2017 21:50:20 +0000 (17:50 -0400)]
mesa: add semaphore parameter stub v2

EXT_semaphore and EXT_semaphore_fd define no pnames. Therefore there
isn't much to do besides determining the correct error code.

v2: removed useless return

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa/st: add support for semaphore object create/import/delete v3
Andres Rodriguez [Tue, 17 Oct 2017 00:09:46 +0000 (20:09 -0400)]
mesa/st: add support for semaphore object create/import/delete v3

Add basic semaphore object operations.

v2: s/semaphore/fence for pipe objects
v3: added missing license headers

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add support for semaphore object creation/import/delete v3
Andres Rodriguez [Fri, 6 Oct 2017 21:17:54 +0000 (17:17 -0400)]
mesa: add support for semaphore object creation/import/delete v3

Used by EXT_semaphore and EXT_semaphore_fd

v2: Removed unnecessary dummy callback initialization
v3: Fixed attempting to free the DummySemaphoreObject

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa/st: introduce EXT_semaphore and EXT_semaphore_fd v2
Andres Rodriguez [Tue, 3 Oct 2017 19:35:57 +0000 (15:35 -0400)]
mesa/st: introduce EXT_semaphore and EXT_semaphore_fd v2

Guarded by PIPE_CAP_SEMAPHORE_SIGNAL

v2: corresponding changes for PIPE_CAP_SEMAPHORE_SIGNAL rename

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agou_threaded_context: add support for fence_server_signal v2
Andres Rodriguez [Thu, 26 Oct 2017 23:16:51 +0000 (19:16 -0400)]
u_threaded_context: add support for fence_server_signal v2

v2: s/semaphore/fence

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: add fence_server_signal() v2
Andres Rodriguez [Thu, 14 Dec 2017 05:24:46 +0000 (00:24 -0500)]
gallium: add fence_server_signal() v2

Calling this function will emit a fence signal operation into the
GPU's command stream.

v2: documentation typos

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: introduce PIPE_FD_TYPE_SYNCOBJ
Andres Rodriguez [Tue, 5 Dec 2017 20:44:04 +0000 (15:44 -0500)]
gallium: introduce PIPE_FD_TYPE_SYNCOBJ

Denotes that a fd is backed by a syncobj. For example, radv shared
semaphores.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: introduce PIPE_CAP_FENCE_SIGNAL v2
Andres Rodriguez [Wed, 4 Oct 2017 21:30:23 +0000 (17:30 -0400)]
gallium: introduce PIPE_CAP_FENCE_SIGNAL v2

Protects semaphore signaling functionality required by GL_EXT_semaphore.

v2: s/semaphore/fence

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: add type parameter to create_fence_fd
Andres Rodriguez [Mon, 4 Dec 2017 20:27:08 +0000 (15:27 -0500)]
gallium: add type parameter to create_fence_fd

An fd can potentially have different types of objects backing it.
Specifying the type helps us make sure we treat the FD correctly.

This is in preparation to allow importing syncobj fence FDs in addition
to native sync FDs.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoac/llvm: bump the number of results to 8.
Dave Airlie [Tue, 30 Jan 2018 03:58:05 +0000 (13:58 +1000)]
ac/llvm: bump the number of results to 8.

This function can get access for a 64-bit dvec4, which means we
have to load 8 components.

This fixes:
R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
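
The component arithmetic behind the "8 results" figure can be sketched as
below. This is an illustration only, not the ac/llvm code: the function
name is hypothetical, and it assumes results are counted in 32-bit units.

```c
#include <assert.h>

/* Hypothetical helper: number of 32-bit slots needed to hold a vector
 * with num_components components of bit_size bits each. */
static unsigned
num_32bit_slots(unsigned num_components, unsigned bit_size)
{
   /* Round up so that sizes below 32 bits still occupy whole slots. */
   return (num_components * bit_size + 31) / 32;
}

/* A 64-bit dvec4 needs 4 * 64 / 32 = 8 slots; a 32-bit vec4 needs 4. */
```

With this counting, a result array sized for 4 entries is too small for a
dvec4, which matches the motivation given in the commit message.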
6 years agor600/sb: insert the else clause when we might depart from a loop
Dave Airlie [Tue, 30 Jan 2018 06:38:51 +0000 (16:38 +1000)]
r600/sb: insert the else clause when we might depart from a loop

If there is a break inside the else clause and this means we
are breaking from a loop, the loop finalise will want to insert
the LOOP_BREAK/CONTINUE instruction, however if we don't emit
the else there is nowhere for these to end up, so they will end
up in the wrong place.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101442
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agomesa: remove invalid assertion in _mesa_enable_vertex_array_attrib()
Brian Paul [Tue, 30 Jan 2018 17:11:49 +0000 (10:11 -0700)]
mesa: remove invalid assertion in _mesa_enable_vertex_array_attrib()

The meta module passes some 0-based attrib values.  Should fix Piglit
regressions reported by Mark Janes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104863
Fixes: 4ab7e03e1fc7ac ("mesa: add an assertion in
_mesa_enable_vertex_array_attrib()")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agomesa: use gl_vert_attrib enum type in more places
Brian Paul [Tue, 30 Jan 2018 18:02:06 +0000 (11:02 -0700)]
mesa: use gl_vert_attrib enum type in more places

Slightly better readability.

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: rename some 'client' array functions
Brian Paul [Fri, 26 Jan 2018 18:35:43 +0000 (11:35 -0700)]
mesa: rename some 'client' array functions

A long time ago gl_vertex_array was gl_client_array.  Update some function
names to be consistent.

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: s/src/attribs/ in _mesa_update_client_array()
Brian Paul [Fri, 26 Jan 2018 18:27:59 +0000 (11:27 -0700)]
mesa: s/src/attribs/ in _mesa_update_client_array()

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: check/assert array index in _mesa_bind_vertex_buffer()
Brian Paul [Fri, 26 Jan 2018 18:27:33 +0000 (11:27 -0700)]
mesa: check/assert array index in _mesa_bind_vertex_buffer()

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: trivial comment typo fix in arrayobj.c
Brian Paul [Fri, 26 Jan 2018 18:09:44 +0000 (11:09 -0700)]
mesa: trivial comment typo fix in arrayobj.c

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: add an assertion in _mesa_enable_vertex_array_attrib()
Brian Paul [Fri, 26 Jan 2018 18:03:57 +0000 (11:03 -0700)]
mesa: add an assertion in _mesa_enable_vertex_array_attrib()

Some of the enable/disable vertex array functions take a zero-based
generic index, while others take a VERT_ATTRIB_GENERIC0-based value.
Add an assertion to clarify that in one place.

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: rename some vars in client_state()
Brian Paul [Fri, 26 Jan 2018 18:03:29 +0000 (11:03 -0700)]
mesa: rename some vars in client_state()

Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
6 years agomesa: Care for differences in fog mode only if fog is consumed.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Care for differences in fog mode only if fog is consumed.

When creating fixed function vertex shader hash keys, only care
about producing the varying output if fog is enabled and the
varying is consumed in the fragment stage.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Reduce ffvertex_prog state_key to 36 bytes.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Reduce ffvertex_prog state_key to 36 bytes.

Using lower alignment restrictions for the state key fields finally
yields a smaller hashing state key.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Remove unused ffvertex_prog texunit_really_enabled.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Remove unused ffvertex_prog texunit_really_enabled.

Remove set but not read field from the state key used for hashing
fixed function vertex shaders.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Remove unused bit in ffvertex_prog state_key.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Remove unused bit in ffvertex_prog state_key.

Remove set but not read field from the state key used for hashing
fixed function vertex shaders.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: texgen_enabled is only 1 bit.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: texgen_enabled is only 1 bit.

For the state key for hashing fixed function vertex shaders, the
texgen_enabled field requires only a single bit.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Encode fog modes in a 2 bit field.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Encode fog modes in a 2 bit field.

For the state key for hashing fixed function
vertex shaders, encode the different fog modes, including
if fog is generally enabled or not, into a 2 bit field.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
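
A minimal sketch of such a 2-bit encoding, assuming value 0 stands for
"fog disabled" and the remaining three values select a mode. All enum,
struct and function names here are hypothetical and do not reflect the
real ffvertex_prog state key layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2-bit encoding: 0 means fog is off, 1..3 select a mode. */
enum fog_key_mode {
   FOG_KEY_NONE   = 0,
   FOG_KEY_LINEAR = 1,
   FOG_KEY_EXP    = 2,
   FOG_KEY_EXP2   = 3
};

static_assert(FOG_KEY_EXP2 <= 3, "fog key mode must fit in 2 bits");

/* Illustrative stand-in for the API-level fog mode. */
enum api_fog_mode { API_FOG_LINEAR, API_FOG_EXP, API_FOG_EXP2 };

/* Illustrative state key fragment, not the real structure. */
struct example_state_key {
   unsigned fog_mode:2;
};

/* Fold "is fog enabled" and "which mode" into a single 2-bit value. */
static unsigned
encode_fog_mode(bool fog_enabled, enum api_fog_mode mode)
{
   if (!fog_enabled)
      return FOG_KEY_NONE;

   switch (mode) {
   case API_FOG_LINEAR: return FOG_KEY_LINEAR;
   case API_FOG_EXP:    return FOG_KEY_EXP;
   case API_FOG_EXP2:   return FOG_KEY_EXP2;
   }
   return FOG_KEY_NONE;
}
```

Folding the enable flag into the mode value means two keys that differ
only in the fog mode while fog is disabled hash identically.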
6 years agomesa: Move seperate_specular into the lighting section.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Move seperate_specular into the lighting section.

For the state key for hashing fixed function
vertex shaders, the information is only evaluated
if lighting is generally switched on.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Get the point size array state from varying_vp_inputs.
Mathias Fröhlich [Sat, 27 Jan 2018 19:09:00 +0000 (12:09 -0700)]
mesa: Get the point size array state from varying_vp_inputs.

For the state key for hashing fixed function
vertex shaders, the varying_vp_inputs bitmask already
contains the point size array enabled information.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: Remove unused gl_fog_attrib::_Scale.
Mathias Fröhlich [Tue, 30 Jan 2018 15:37:21 +0000 (08:37 -0700)]
mesa: Remove unused gl_fog_attrib::_Scale.

The patch removes a variable that is only written to.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agoanv/pipeline: lower constant initializers on output variables earlier
Iago Toral Quiroga [Tue, 16 Jan 2018 08:37:11 +0000 (09:37 +0100)]
anv/pipeline: lower constant initializers on output variables earlier

If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.

Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: move disk cache from brw_context to intel_screen
Tapani Pälli [Fri, 26 Jan 2018 06:20:07 +0000 (08:20 +0200)]
i965: move disk cache from brw_context to intel_screen

Now every context refers to the same disk_cache instance in the screen.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years agomesa: Correctly print glTexImage dimensions
Elie Tournier [Thu, 25 Jan 2018 15:18:10 +0000 (15:18 +0000)]
mesa: Correctly print glTexImage dimensions

texture_format_error_check_gles() displays errors like "glTexImage%dD".
This patch just replaces the %d with the correct dimension.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomesa: shrink size of gl_array_attributes (v2)
Brian Paul [Mon, 29 Jan 2018 21:09:54 +0000 (14:09 -0700)]
mesa: shrink size of gl_array_attributes (v2)

Inspired by Marek's earlier patch, but even smaller.  Sort fields from
largest to smallest.  Use bitfields for more fields (sometimes with an
extra bit for MSVC).  Reduce Stride field to GLshort.

Note that some fields cannot be bitfields because they're accessed via
pointers (such as for glEnableClientState(GL_VERTEX_ARRAY) to set the
Enabled field).

Reduces size from 48 to 24 bytes.
Also reduces size of gl_vertex_array_object from 3632 to 2864 bytes.

And add some assertions in init_array().

v2: use s/GLuint/unsigned/, improve commit comments.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
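
The general technique (sort fields from largest to smallest, use
bitfields for flags, keep pointer-accessed fields as whole bytes) can be
sketched as below. Both structs are hypothetical and are not the real
gl_array_attributes layout; the exact sizes depend on the ABI and are
deliberately not stated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" layout: small fields interleaved with larger
 * ones, so the compiler inserts padding after each small field. */
struct attrib_loose {
   unsigned char enabled;
   const void *ptr;
   unsigned char normalized;
   unsigned int stride;
   unsigned char integer;
   unsigned int offset;
};

/* Hypothetical "after" layout: largest fields first, a narrower stride,
 * and one-bit flags packed into bitfields. */
struct attrib_packed {
   const void *ptr;
   unsigned int offset;
   short stride;
   unsigned char enabled;   /* whole byte: may be written via a pointer */
   unsigned normalized:1;
   unsigned integer:1;
};

static_assert(sizeof(struct attrib_packed) <= sizeof(struct attrib_loose),
              "reordering and bitfields should never grow the struct");

/* Bytes saved per element by the reordered layout on this ABI. */
static size_t
attrib_bytes_saved(void)
{
   return sizeof(struct attrib_loose) - sizeof(struct attrib_packed);
}
```

Because such a struct is typically embedded many times in a larger
object, a per-element saving multiplies, which is consistent with the
gl_vertex_array_object reduction reported above.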
6 years agomesa: shrink gl_vertex_array
Brian Paul [Fri, 26 Jan 2018 21:49:41 +0000 (14:49 -0700)]
mesa: shrink gl_vertex_array

Inspired by Marek's earlier patch, but goes a little further.
Sort fields from largest to smallest.  Use bitfields.

Reduced from 48 bytes to 32.  Also reduces size of gl_vertex_array_object
from 4144 to 3632

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: replace GLenum with GLenum16 in common structures (v4)
Marek Olšák [Fri, 26 Jan 2018 21:25:53 +0000 (14:25 -0700)]
mesa: replace GLenum with GLenum16 in common structures (v4)

v2: - fix glGet*
    - also use GLenum16 for DrawBuffers
v3: - rebase to top of tree (BrianP) and incorporate Ian's suggestions
v4: - fix a GLenum16 bug in VBO/save code, add some STATIC_ASSERT()s

gl_context = 152432 -> 136840 bytes
vbo_context = 22096 -> 20608 bytes

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomesa: fix incorrect size/error test in _mesa_GetUnsignedBytevEXT()
Brian Paul [Mon, 29 Jan 2018 21:19:44 +0000 (14:19 -0700)]
mesa: fix incorrect size/error test in _mesa_GetUnsignedBytevEXT()

get_value_size() returns -1 for an error.  The similar check in
_mesa_GetUnsignedBytei_vEXT() is correct.

Found by chance.  There are apparently no Piglit tests which exercise
glGetUnsignedBytei_vEXT() or glGetUnsignedBytevEXT().

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
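
A sketch of a caller that honours a "-1 means error" size convention.
The commit message does not show the old check, so this does not
reproduce it; every name below is hypothetical and only the convention
itself comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes for a known pname, -1 on error. */
static int
example_value_size(int pname)
{
   switch (pname) {
   case 1:  return 16;
   case 2:  return 8;
   default: return -1;
   }
}

/* Copies the value for pname into out. Returns the number of bytes
 * written, or -1 if pname is unknown or out is too small. */
static int
example_get_bytes(int pname, unsigned char *out, size_t out_size)
{
   static const unsigned char storage[16] = { 0xAA, 0xBB, 0xCC };
   int size = example_value_size(pname);

   /* The error value is negative, so test for < 0 before using size. */
   if (size < 0)
      return -1;
   if ((size_t)size > out_size)
      return -1;

   memcpy(out, storage, (size_t)size);
   return size;
}
```

Checking the sign before converting to size_t matters: a -1 passed
straight to memcpy() would be interpreted as a huge length.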
6 years agosvga: Check rasterization state object before checking poly_stipple_enable
Neha Bhende [Mon, 22 Jan 2018 23:01:20 +0000 (15:01 -0800)]
svga: Check rasterization state object before checking poly_stipple_enable

Sometimes the rasterization state object can be empty. This causes a
segfault on hw8,9,10 for some traces.

This patch fixes enemy_territory_quake_wars_high,
enemy_territory_quake_wars_low, etqw-demo, lightsmark2008, quake1
glretrace crashes on hw 8,9,10.

Tested with mtt-glretrace and mtt-piglit.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agosvga: Adjust alpha for S3TC_DXT1_EXT RGB formats
Neha Bhende [Mon, 29 Jan 2018 16:32:19 +0000 (09:32 -0700)]
svga: Adjust alpha for S3TC_DXT1_EXT RGB formats

According to the spec, S3TC_DXT1_EXT RGB formats are supposed to be
opaque. The corresponding svga formats do not handle this, so explicitly
set alpha to 1.0.
This fixes piglit test spec@ext_texture_compression_s3tc@s3tc-targeted
Note: This test is the testcase for freedesktop bug 100925

Tested with mtt-piglit and mtt-glretrace on 8,9,10,11 and 15

Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa/st/glsl_to_tgsi: Mark first write as unconditional when appropriate
Gert Wollny [Mon, 29 Jan 2018 12:24:00 +0000 (05:24 -0700)]
mesa/st/glsl_to_tgsi: Mark first write as unconditional when appropriate

In the register lifetime estimation if the first write is unconditional or
conditional but not within a loop then this is an unconditional dominant
write in the sense of register life time estimation.
Add a test case and record the write accordingly.

Fixes: 807e2539e512ca6c96f059da855473eb7be99ba1 ("mesa/st/glsl_to_tgsi: Add
tracking of ifelse writes in register merging")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104803
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agomesa: skip validation of legality of size/type queries for format queries
Roland Scheidegger [Sat, 27 Jan 2018 00:39:35 +0000 (01:39 +0100)]
mesa: skip validation of legality of size/type queries for format queries

The size/type query is always legal (if we made it that far).
Removing this causes a difference for GL_TEXTURE_BUFFER - the reason is that
these parameters are valid only with GetTexLevelParameter() if gl 3.1 is
supported, but not if only ARB_texture_buffer_object is supported.
However, while the spec says that these queries return "the same information
as querying GetTexLevelParameter" I believe we're not expected to return just
zeros here. By definition, these pnames are always valid (unlike for the
GetTexLevelParameter() function which would return an error without GL 3.1).
The spec is a bit inconsistent there and open to interpretation - while
mentioning the "same information as querying GetTexLevelParameter" is
returned, it also mentions that 0 is returned for size/type if the
target/format is not supported - implying correct results to be returned
if it is supported, regardless that GetTexLevelParameter would return
an error. (Also, the bit about this returning the same as
GetTexLevelParameter also includes querying stencil type, which isn't
even possible with GetTexLevelParameter.)

This breaks some piglit arb_internalformat_query2 tests (which I believe to
be wrong).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
6 years agomesa: restrict formats being supported by target type for formatquery
Roland Scheidegger [Sat, 27 Jan 2018 00:25:26 +0000 (01:25 +0100)]
mesa: restrict formats being supported by target type for formatquery

The code just considered all formats as being supported if they were either
a valid fbo or texture format.
This was quite awkward since then the query would return "supported" for
e.g. GL_RGB9E5 or compressed formats and target RENDERBUFFER (albeit the driver
could still refuse it in theory). However, when then querying for instance the
internalformat sizes, it would just return 0 (due to the checks being more
strict there).
It was also a problem for texture buffer targets, which have a more restricted
list of formats which are allowed (and again, it would return supported but
then querying sizes would return 0).
So only take validation of formats into account which make sense for a given
target.
Can also toss out some special checks for rgb9e5 later, since we'd never get
there if it wasn't supported in the first place.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
6 years agomesa: (trivial) add TODO comment for default results for internal queries
Roland Scheidegger [Tue, 30 Jan 2018 00:03:49 +0000 (01:03 +0100)]
mesa: (trivial) add TODO comment for default results for internal queries

6 years agomesa: remove misleading gles checks for formatquery
Roland Scheidegger [Sat, 27 Jan 2018 00:12:52 +0000 (01:12 +0100)]
mesa: remove misleading gles checks for formatquery

Testing for gles there is just confusing - this is about target being
supported, if it was valid at all was already determined earlier
(in _legal_parameters). It didn't make sense at all in any case, since
it would only have said false there for gles for 2d but not 2d arrays etc.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
6 years agoi965: Emit PIPE_CONTROL with ISP bit on older platforms.
Rafael Antognolli [Fri, 26 Jan 2018 01:14:47 +0000 (17:14 -0800)]
i965: Emit PIPE_CONTROL with ISP bit on older platforms.

Emit it on all platforms since gen7.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoanv/cmd_buffer: Emit PIPE_CONTROL with ISP bit on older platforms.
Rafael Antognolli [Fri, 26 Jan 2018 01:13:26 +0000 (17:13 -0800)]
anv/cmd_buffer: Emit PIPE_CONTROL with ISP bit on older platforms.

Emit it on all platforms since gen7.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agost/glsl_to_nir: remove dead io after conversion to nir
Timothy Arceri [Mon, 29 Jan 2018 06:33:57 +0000 (17:33 +1100)]
st/glsl_to_nir: remove dead io after conversion to nir

This fixes an assert in nir_lower_var_copies() for some bioshock
shaders where an unused clipdistance array has no size.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: add support vs double inputs
Timothy Arceri [Fri, 15 Dec 2017 03:22:16 +0000 (14:22 +1100)]
radeonsi/nir: add support vs double inputs

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: pass input_idx to declare_nir_input_vs()
Timothy Arceri [Fri, 15 Dec 2017 03:16:01 +0000 (14:16 +1100)]
radeonsi: pass input_idx to declare_nir_input_vs()

This makes it consistent with declare_nir_input_fs() and will allow
us to support doubles.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: add bitcast_inputs() helper
Timothy Arceri [Fri, 15 Dec 2017 03:13:11 +0000 (14:13 +1100)]
radeonsi: add bitcast_inputs() helper

Will be used in a following patch to help support doubles.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: fix num_inputs for doubles in vs
Timothy Arceri [Fri, 15 Dec 2017 00:22:56 +0000 (11:22 +1100)]
radeonsi/nir: fix num_inputs for doubles in vs

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agonir: partially revert c2acf97fcc9b32e
Timothy Arceri [Thu, 14 Dec 2017 06:22:23 +0000 (17:22 +1100)]
nir: partially revert c2acf97fcc9b32e

c2acf97fcc9b32e changed the use of double_inputs_read to be
inconsistent with its previous meaning. Here we re-enable the
gather info code that was removed as the modified code from
c2acf97fcc9b32e now uses the double_inputs member rather than
double_inputs_read.

This change allows us to use double_inputs_read with gallium
drivers without impacting double_inputs which is used by i965.

We also make use of the compiler option vs_inputs_dual_locations
to allow for the difference in behaviour between drivers that handle
vs inputs as taking up two locations for doubles, versus those that
treat them as taking a single location.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonir: add vs_inputs_dual_locations compiler option
Timothy Arceri [Sun, 7 Jan 2018 23:37:27 +0000 (10:37 +1100)]
nir: add vs_inputs_dual_locations compiler option

Allows nir drivers to use either a single location or dual locations
for vs double inputs.

i965 uses dual locations for both OpenGL and Vulkan drivers, for
now gallium OpenGL drivers only use a single location.

The following patch will also make use of this option when
calling nir_shader_gather_info().

Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agocompiler: tidy up double_inputs_read uses
Timothy Arceri [Sat, 16 Dec 2017 03:06:23 +0000 (14:06 +1100)]
compiler: tidy up double_inputs_read uses

First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.

We add the new field because c2acf97fcc9b changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradv/gfx9: fix block compression texture views. (v2)
Dave Airlie [Mon, 29 Jan 2018 04:15:09 +0000 (04:15 +0000)]
radv/gfx9: fix block compression texture views. (v2)

This ports a fix from amdvlk, to fix the sizing for mip levels
when block compressed images are viewed using uncompressed views.

My original fix didn't port the clamping, but it looks like
the clamping is required to stop the sizing going too large.

Fixes:
dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
Doesn't crash DOW3 anymore.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Signal fence correctly after sparse binding.
Bas Nieuwenhuizen [Sat, 27 Jan 2018 13:51:12 +0000 (14:51 +0100)]
radv: Signal fence correctly after sparse binding.

It did not signal syncobjs in the fence, and also signalled too early
if there was work on the queue already, as we have to wait till that
work is done.

Fixes: d27aaae4d2 "radv: Add external fence support."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agomesa/vbo: replace vbo_draw_method() with _mesa_set_drawing_arrays()
Brian Paul [Wed, 24 Jan 2018 16:14:35 +0000 (09:14 -0700)]
mesa/vbo: replace vbo_draw_method() with _mesa_set_drawing_arrays()

The arrays specified by ctx->Array._DrawArrays are used for all
vertex drawing via vbo_context::draw_prims().  Different arrays are
used for immediate mode, vertex arrays, display lists, etc.  Changing
from one to another requires updating derived/driver array state.

Before, we indirectly specified the arrays with the gl_draw_method values.
Now we just directly specify the arrays instead.  This is simpler and
will allow a subsequent display list optimization.

In the future, it might make sense to get rid of ctx->Array._DrawArrays
entirely and just pass the arrays as another parameter to
vbo_context::draw_prims().

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>