George Kyriazis [Wed, 28 Feb 2018 23:33:13 +0000 (17:33 -0600)]
swr/rast: Add tracking for stream out topology
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Mon, 26 Feb 2018 23:55:23 +0000 (17:55 -0600)]
swr/rast: Add split draw and other state information to DrawInfoEvent.
Removed specific split draw events.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Mon, 26 Feb 2018 21:19:08 +0000 (15:19 -0600)]
swr/rast: Refactor api and worker event handlers.
In the API event handler we want to share information between the core
layer and the API. Specifically, around associating various ids with
different kinds of events. For example, associate render pass id with
draw ids, or command buffer ids with draw ids.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Sat, 24 Feb 2018 00:51:18 +0000 (18:51 -0600)]
swr/rast: Add support for generalized late and early z/stencil stats
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Fri, 23 Feb 2018 22:11:04 +0000 (16:11 -0600)]
swr/rast: Rasterized Subspans stats support
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
George Kyriazis [Wed, 21 Feb 2018 01:24:55 +0000 (19:24 -0600)]
swr/rast: Added comment
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Eric Engestrom [Mon, 26 Feb 2018 13:34:54 +0000 (13:34 +0000)]
vulkan/wsi: clean up cleanup path
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bas Nieuwenhuizen [Fri, 9 Mar 2018 13:08:38 +0000 (14:08 +0100)]
radv: Fix the autotools build take 2.
Forgot to remove a word....
Fixes: 04ffabf17a "radv: Fix autotools build."
Lucas Stach [Wed, 7 Mar 2018 13:31:59 +0000 (14:31 +0100)]
etnaviv: allow mixing different bit depths for color and depth surfaces
Vivante hardware supports this just fine. There is no reason why this shouldn't
be advertised as a valid combination.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Thierry Reding [Thu, 22 Feb 2018 17:21:45 +0000 (18:21 +0100)]
autotools: Add tegra to AM_DISTCHECK_CONFIGURE_FLAGS
This allows the driver to be built on a make distcheck and makes sure
that it properly builds when a distribution tarball is made.
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Tue, 27 May 2014 22:36:48 +0000 (00:36 +0200)]
tegra: Initial support
Tegra K1 and later use a GPU that can be driven by the Nouveau driver.
But the GPU is a pure render node and has no display engine, hence the
scanout needs to happen on the Tegra display hardware. The GPU and the
display engine each have a separate DRM device node exposed by the
kernel.
To make the setup appear as a single device, this driver instantiates
a Nouveau screen with each instance of a Tegra screen and forwards GPU
requests to the Nouveau screen. For purposes of scanout it will import
buffers created on the GPU into the display driver. Handles that
userspace requests are those of the display driver so that they can be
used to create framebuffers.
This has been tested with some GBM test programs, as well as kmscube and
weston. All of those run without modifications, but I'm sure there is a
lot that can be improved.
Some fixes contributed by Hector Martin <marcan@marcan.st>.
Changes in v2:
- duplicate file descriptor in winsys to avoid potential issues
- require nouveau when building the tegra driver
- check for nouveau driver name on render node
- remove unneeded dependency on libdrm_tegra
- remove zombie references to libudev
- add missing headers to C_SOURCES variable
- drop unneeded tegra/ prefix for includes
- open device files with O_CLOEXEC
- update copyrights
Changes in v3:
- properly unwrap resources in ->resource_copy_region()
- support vertex buffers passed by user pointer
- allocate custom stream and const uploader
- silence error message on pre-Tegra124
- support X without explicit PRIME
Changes in v4:
- ship Meson build files in distribution tarball
- drop duplicate driver_tegra dependency
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Wed, 11 Oct 2017 12:38:56 +0000 (14:38 +0200)]
nouveau: Add framebuffer modifier support
This adds support for framebuffer modifiers to Nouveau. This will be
used by the Tegra driver to share metadata about the format of buffers
(such as the tiling mode or compression).
Changes in v2:
- remove unused parameters to nouveau_buffer_create()
- move format modifier query code to nvc0 backend
- restrict format modifiers to 2D textures
- implement ->query_dmabuf_modifiers()
Changes in v4:
- add UAPI include path on meson builds
Changes in v5:
- remove unnecessary includes
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Wed, 11 Oct 2017 12:41:26 +0000 (14:41 +0200)]
nouveau/nvc0: Extract common tile mode macro
Add a new macro that can be used to extract the tiling mode from a
tile_mode value. This is will be used to determine the number of GOBs
used in block linear mode.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Tue, 20 Feb 2018 14:48:37 +0000 (15:48 +0100)]
drm/tegra: Sanitize format modifiers
The existing format modifier definitions were merged prematurely, and
recent work has unveiled that the definitions are suboptimal in several
ways:
- The format specifiers, except for one, are not Tegra specific, but
the names don't reflect that.
- The number space is split into two, reserving 32 bits for some
"parameter" which most of the modifiers are not going to have.
- Symbolic names for the modifiers are not using the standard
DRM_FORMAT_MOD_* prefix, which makes them awkward to use.
- The vendor prefix NV is somewhat ambiguous.
Fortunately, nobody's started using these modifiers, so we can still fix
the above issues. Do so by using the standard prefix. Also, remove TEGRA
from the name of those modifiers that exist on NVIDIA GPUs as well. In
case of the block linear modifiers, make the "parameter" smaller (4
bits, though only 6 values are valid) and don't let that leak into any
of the other modifiers.
Finally, also use the more canonical NVIDIA instead of the ambiguous NV
prefix.
This is based on commit
268892cb63a822315921a8dab48ac3e4abf7dd03 from
Linux v4.16-rc1.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Tue, 20 Feb 2018 14:47:25 +0000 (15:47 +0100)]
drm/fourcc: Fix fourcc_mod_code() definition
Avoid a compiler warnings when the val parameter is an expression.
This is based on commit
5843f4e02fbe86a59981e35adc6cabebee46fdc0 from
Linux v4.16-rc1.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Bas Nieuwenhuizen [Fri, 9 Mar 2018 07:43:01 +0000 (08:43 +0100)]
radv: Fix autotools build.
Forgot it again ....
Fixes: b6347807a9 "radv: Generate icd files."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Thu, 8 Mar 2018 16:30:05 +0000 (17:30 +0100)]
ac/nir: set number of channels for packed mrt exports
Bit 0 enables VSRC0 (R in low bits, G high) and bit 2 enables
VSRC1 (B in low bits, A high).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Thu, 8 Mar 2018 23:49:57 +0000 (00:49 +0100)]
radv: Update version to 1.1.70.
Turns out they did not reset the patch number on release.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Thu, 8 Mar 2018 23:47:26 +0000 (00:47 +0100)]
radv: Generate icd files.
If the api version is too low, the loader clamps the application
requested version to the advertized version, which messes with
which extensions are enabled.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Ian Romanick [Thu, 22 Feb 2018 02:15:52 +0000 (18:15 -0800)]
nir: Don't i2b a value that is already Boolean
A bunch of shaders have sequences like:
i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0))))
Other optimizations (and NIR's typeless nature) reduce this to
i2b(x == y)
which is silly.
Skylake
total instructions in shared programs:
14498698 ->
14497948 (<.01%)
instructions in affected programs: 74480 -> 73730 (-1.01%)
helped: 277
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68%
95% mean confidence interval for instructions value: -3.35 -2.06
95% mean confidence interval for instructions %-change: -1.74% -1.16%
Instructions are helped.
total cycles in shared programs:
532015500 ->
531999238 (<.01%)
cycles in affected programs:
5943878 ->
5927616 (-0.27%)
helped: 251
HURT: 74
helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14
helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53%
HURT stats (abs) min: 1 max: 4550 x̄: 214.04 x̃: 15
HURT stats (rel) min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33%
95% mean confidence interval for cycles value: -158.51 58.43
95% mean confidence interval for cycles %-change: -1.07% -0.04%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4753 -> 4735 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.
Haswell and Broadwell had simliar results. (Broadwell shown)
total instructions in shared programs:
14791877 ->
14791127 (<.01%)
instructions in affected programs: 77326 -> 76576 (-0.97%)
helped: 278
HURT: 1
helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49%
95% mean confidence interval for instructions value: -3.33 -2.05
95% mean confidence interval for instructions %-change: -1.70% -1.13%
Instructions are helped.
total cycles in shared programs:
558250067 ->
558252872 (<.01%)
cycles in affected programs:
5806328 ->
5809133 (0.05%)
helped: 235
HURT: 83
helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16
helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51%
HURT stats (abs) min: 1 max: 10590 x̄: 265.19 x̃: 20
HURT stats (rel) min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54%
95% mean confidence interval for cycles value: -89.87 107.51
95% mean confidence interval for cycles %-change: -1.06% -0.32%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4735 -> 4717 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.
total fills in shared programs: 83111 -> 83110 (<.01%)
fills in affected programs: 28 -> 27 (-3.57%)
helped: 1
HURT: 0
Ivy Bridge
total instructions in shared programs:
11774173 ->
11773436 (<.01%)
instructions in affected programs: 70819 -> 70082 (-1.04%)
helped: 267
HURT: 0
helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2
helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63%
95% mean confidence interval for instructions value: -3.51 -2.01
95% mean confidence interval for instructions %-change: -1.94% -1.21%
Instructions are helped.
total cycles in shared programs:
257153833 ->
257148932 (<.01%)
cycles in affected programs: 585341 -> 580440 (-0.84%)
helped: 167
HURT: 100
helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16
helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88%
HURT stats (abs) min: 1 max: 200 x̄: 25.95 x̃: 16
HURT stats (rel) min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65%
95% mean confidence interval for cycles value: -33.25 -3.46
95% mean confidence interval for cycles %-change: -1.47% -0.54%
Cycles are helped.
total loops in shared programs: 3416 -> 3398 (-0.53%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.
LOST: 2
GAINED: 0
Sandy Bridge
total instructions in shared programs:
10499306 ->
10499094 (<.01%)
instructions in affected programs: 6051 -> 5839 (-3.50%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2
helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45%
95% mean confidence interval for instructions value: -7.66 -2.20
95% mean confidence interval for instructions %-change: -5.47% -3.12%
Instructions are helped.
total cycles in shared programs:
145862568 ->
145861370 (<.01%)
cycles in affected programs: 61733 -> 60535 (-1.94%)
helped: 36
HURT: 2
helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35
helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81%
HURT stats (abs) min: 18 max: 102 x̄: 60.00 x̃: 60
HURT stats (rel) min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48%
95% mean confidence interval for cycles value: -41.28 -21.77
95% mean confidence interval for cycles %-change: -6.16% -3.00%
Cycles are helped.
total loops in shared programs: 1803 -> 1785 (-1.00%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.
LOST: 4
GAINED: 0
No changes on Iron Lake of GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Ian Romanick [Sat, 17 Feb 2018 01:33:13 +0000 (17:33 -0800)]
i965/vec4: Allow CSE on subset VF constant loads
v2: Rewrite the code that generates the VF mask. Suggested by Ken.
No changes on other platforms.
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs:
13059891 ->
13059884 (<.01%)
instructions in affected programs: 431 -> 424 (-1.62%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -3.39% -0.71%
Instructions are helped.
total cycles in shared programs:
409260032 ->
409260018 (<.01%)
cycles in affected programs: 4228 -> 4214 (-0.33%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -1.15% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Sat, 17 Feb 2018 01:26:11 +0000 (17:26 -0800)]
i965/vec4: Relax writemask condition in CSE
If the previously seen instruction generates more fields than the new
instruction, still allow CSE to happen. This doesn't do much, but it
also enables a couple more shaders in the next patch. It helped quite a
bit in another change series that I have (at least for now) abandoned.
v2: Add some extra comentary about the parameters to instructions_match.
Suggested by Ken.
No changes on Skylake, Broadwell, Iron Lake or GM45.
Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs:
11780295 ->
11780294 (<.01%)
instructions in affected programs: 302 -> 301 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs:
257308315 ->
257308313 (<.01%)
cycles in affected programs: 2074 -> 2072 (-0.10%)
helped: 1
HURT: 0
Sandy Bridge
total instructions in shared programs:
10506687 ->
10506686 (<.01%)
instructions in affected programs: 335 -> 334 (-0.30%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 22 Feb 2018 02:06:56 +0000 (18:06 -0800)]
i965/fs: Merge CMP and SEL into CSEL on Gen8+
v2: Fix several problems handling inverted predicates. Add a much
bigger comment around the BRW_CONDITIONAL_NZ case.
v3: Allow uniforms and shader inputs as sources for the original SEL and
CMP instructions. This enables a LOT more shaders to receive CSEL
merging (5816 vs 8564 on SKL).
v4: Report progress.
Broadwell and Skylake had similar results. (Broadwell shown)
helped: 8527
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.03% max: 17.80% x̄: 1.12% x̃: 0.70%
95% mean confidence interval for instructions value: -2.51 -2.36
95% mean confidence interval for instructions %-change: -1.15% -1.10%
Instructions are helped.
total cycles in shared programs:
559442317 ->
558288357 (-0.21%)
cycles in affected programs:
372699860 ->
371545900 (-0.31%)
helped: 6748
HURT: 1450
helped stats (abs) min: 1 max: 32000 x̄: 182.41 x̃: 12
helped stats (rel) min: <.01% max: 66.08% x̄: 3.42% x̃: 0.70%
HURT stats (abs) min: 1 max: 2538 x̄: 53.08 x̃: 14
HURT stats (rel) min: <.01% max: 96.72% x̄: 3.32% x̃: 0.90%
95% mean confidence interval for cycles value: -179.01 -102.51
95% mean confidence interval for cycles %-change: -2.37% -2.08%
Cycles are helped.
LOST: 0
GAINED: 6
No changes on earlier platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 23 Nov 2015 04:12:17 +0000 (20:12 -0800)]
i965/fs: Add infrastructure for generating CSEL instructions.
v2 (idr): Don't allow CSEL with a non-float src2.
v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt.
v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't
reset the access mode afterwards (suggested by Samuel and Matt). Add
support for CSEL not modifying the flags to more places (requested by
Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 15 Feb 2018 22:49:55 +0000 (14:49 -0800)]
nir: Narrow some dot product operations
On vector platforms, this helps elide some constant loads.
v2: Reorder the transformations.
No changes on Broadwell or Skylake.
Haswell
total instructions in shared programs:
13093793 ->
13060163 (-0.26%)
instructions in affected programs:
1277532 ->
1243902 (-2.63%)
helped: 13216
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.57 -2.49
95% mean confidence interval for instructions %-change: -3.65% -3.54%
Instructions are helped.
total cycles in shared programs:
409580819 ->
409268463 (-0.08%)
cycles in affected programs:
71730652 ->
71418296 (-0.44%)
helped: 9898
HURT: 2352
helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
HURT stats (abs) min: 2 max: 276 x̄: 23.25 x̃: 6
HURT stats (rel) min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
95% mean confidence interval for cycles value: -33.19 -17.80
95% mean confidence interval for cycles %-change: -4.50% -4.26%
Cycles are helped.
total fills in shared programs: 82059 -> 82052 (<.01%)
fills in affected programs: 21 -> 14 (-33.33%)
helped: 7
HURT: 0
Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
total instructions in shared programs:
11811851 ->
11780605 (-0.26%)
instructions in affected programs:
1155007 ->
1123761 (-2.71%)
helped: 12304
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.56 -2.48
95% mean confidence interval for instructions %-change: -3.71% -3.59%
Instructions are helped.
total cycles in shared programs:
257618409 ->
257316805 (-0.12%)
cycles in affected programs:
71999580 ->
71697976 (-0.42%)
helped: 9155
HURT: 2380
helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
HURT stats (abs) min: 2 max: 290 x̄: 21.14 x̃: 4
HURT stats (rel) min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
95% mean confidence interval for cycles value: -34.32 -17.97
95% mean confidence interval for cycles %-change: -4.55% -4.29%
Cycles are helped.
GM45 and Iron Lake had nearly identical results (Iron Lake shown)
total instructions in shared programs:
7886750 ->
7879944 (-0.09%)
instructions in affected programs: 373781 -> 366975 (-1.82%)
helped: 3715
HURT: 47
helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
HURT stats (abs) min: 1 max: 6 x̄: 2.55 x̃: 2
HURT stats (rel) min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
95% mean confidence interval for instructions value: -1.85 -1.77
95% mean confidence interval for instructions %-change: -2.91% -2.73%
Instructions are helped.
total cycles in shared programs:
178114636 ->
178095452 (-0.01%)
cycles in affected programs:
7227666 ->
7208482 (-0.27%)
helped: 3349
HURT: 301
helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
HURT stats (abs) min: 2 max: 42 x̄: 9.13 x̃: 10
HURT stats (rel) min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
95% mean confidence interval for cycles value: -5.52 -4.99
95% mean confidence interval for cycles %-change: -0.81% -0.73%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Lionel Landwerlin [Wed, 7 Mar 2018 14:10:15 +0000 (14:10 +0000)]
i965: perf: consolidate unmapping oa perf bo outside accumulation
Do this in one place outside the only caller of the accumulation
function.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Tue, 6 Mar 2018 17:11:56 +0000 (17:11 +0000)]
i965: perf: count number of accumlated reports
This will be reused later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Tue, 6 Mar 2018 15:47:00 +0000 (15:47 +0000)]
i965: perf: reuse timescale base function from query
We already have the same function in brw_queryobj.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Wed, 7 Feb 2018 18:09:58 +0000 (18:09 +0000)]
i965: perf: store sysfs device entry into context
We want to reuse it later on.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Wed, 7 Feb 2018 18:10:57 +0000 (18:10 +0000)]
i965: perf: store the hw_id of the context in the query
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Tue, 6 Feb 2018 17:29:32 +0000 (17:29 +0000)]
i965: perf: default case for unknown query types
Just some extra safety before further changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Tue, 6 Mar 2018 23:30:06 +0000 (18:30 -0500)]
radeonsi: remove chip_class parameter from si_lower_nir
We can get it from si_screen.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Sun, 11 Sep 2016 19:53:20 +0000 (21:53 +0200)]
winsys/amdgpu: query GDS info
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Tue, 6 Mar 2018 20:03:09 +0000 (15:03 -0500)]
winsys/amdgpu: pad compute IBs
v2: pad with PKT2 NOPs on SI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 7 Mar 2018 16:36:26 +0000 (11:36 -0500)]
radeonsi: expand constbuf 0 address correctly to fix Vega10 hangs
This is only required with the latest libdrm.
This fixes 32-bit support with high addresses.
(and possibly 64-bit support too because the high bits need to be masked out)
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 7 Mar 2018 00:07:58 +0000 (19:07 -0500)]
radeonsi: align command buffer starting address to fix some Raven hangs
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Christian Gmeiner [Mon, 5 Mar 2018 22:26:43 +0000 (23:26 +0100)]
etnaviv: add get_driver_query_group_info(..)
This enables AMD_performance_monitor extension.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Mon, 5 Mar 2018 22:26:42 +0000 (23:26 +0100)]
etnaviv: add query_group_info for sw counters
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Dylan Baker [Wed, 28 Feb 2018 21:07:57 +0000 (13:07 -0800)]
meson: Fix building gallium media libs without egl
v2: - rebase on omx fix
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Dylan Baker [Wed, 28 Feb 2018 18:13:38 +0000 (10:13 -0800)]
meson: Allow building dri based EGL without GLX
It should be possible to build EGL without GLX, but the meson build
currently doesn't allow that because it too tightly couples glx and dri.
This patch eases dri and glx apart, so that EGL without GLX can be
built.
CC: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Thierry Reding [Tue, 6 Mar 2018 09:44:08 +0000 (10:44 +0100)]
glx/apple: Ship meson build file in tarball
The meson build file for Apple GLX is not listed in the EXTRA_DIST make
variable and therefore isn't shipped as part of the release tarball, so
meson builds from the tarball will fail.
Add the file to EXTRA_DIST to ensure it is included in the tarball.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Samuel Pitoiset [Thu, 8 Mar 2018 08:53:14 +0000 (09:53 +0100)]
ac/nir: do not emit unnecessary null exports in fragment shaders
Null exports should only be needed when no other exports are
emitted. This removes a bunch of 'exp null off, off, off, off done vm'.
Affected games are Dota 2 and Wolfenstein 2, not sure if that
really helps, but code size is decreasing there.
Polaris10:
Totals from affected shaders:
SGPRS: 8216 -> 8216 (0.00 %)
VGPRS: 7072 -> 7072 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 454968 -> 453896 (-0.24 %) bytes
Max Waves: 772 -> 772 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Thu, 8 Mar 2018 09:52:16 +0000 (09:52 +0000)]
drirc: whitespace fix
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Hellstrom [Mon, 26 Feb 2018 13:32:01 +0000 (14:32 +0100)]
drirc: Disable the GLX_SGI_video_sync extension for gnome-shell on vmware
With this extension enabled and a server GLX implementation that actually
honors it, Window movement lags considerably on gnome-shell/vmware, so
disable it by default.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Thomas Hellstrom [Mon, 26 Feb 2018 13:30:33 +0000 (14:30 +0100)]
gallium/st_dri: Honor the glx_disable_sgi_video_sync config option
This option is disabled by default. Primarily intended for drivers on
virtual hardware.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Thomas Hellstrom [Mon, 26 Feb 2018 13:27:40 +0000 (14:27 +0100)]
glx/dri: Add a driconf option to disable GLX_SGI_video_sync
Drivers on virtual hardware don't want to expose this extension to
GLX compositors, similarly to GLX_OML_sync_control, since that significantly
increases latency.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Timothy Arceri [Wed, 7 Mar 2018 22:46:42 +0000 (09:46 +1100)]
ac/radeonsi: add emit_kill to the abi
This should fix a regression with Rocket League grass rendering
on the NIR backend.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104717
Timothy Arceri [Wed, 7 Mar 2018 22:37:10 +0000 (09:37 +1100)]
radeonsi: add si_llvm_emit_kill() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 7 Mar 2018 23:37:52 +0000 (10:37 +1100)]
spirv: fix autotools builds
Fixes: 68a6a3b51acc "spirv: handle AMD_gcn_shader extended instructions"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Wed, 7 Mar 2018 00:10:54 +0000 (11:10 +1100)]
ac: make use of if/loop build helpers
These helpers insert the basic block in the same order as they
appear in NIR making it easier to follow LLVM IR dumps. The helpers
also insert more useful labels onto the blocks.
TGSI use the line number of the corresponding opcode in the TGSI
dump as the label id, here we use the corresponding block index
from NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 6 Mar 2018 23:55:47 +0000 (10:55 +1100)]
radeonsi: make use of if/loop build helpers in ac
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 6 Mar 2018 23:53:34 +0000 (10:53 +1100)]
ac: add if/loop build helpers
These have been ported over from radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Daniel Schürmann [Fri, 23 Feb 2018 12:55:01 +0000 (13:55 +0100)]
radv: enable AMD_gcn_shader extension
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 23 Feb 2018 12:55:00 +0000 (13:55 +0100)]
ac: implement AMD_gcn_shader extended instructions
Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 23 Feb 2018 12:54:59 +0000 (13:54 +0100)]
spirv: handle AMD_gcn_shader extended instructions
Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 23 Feb 2018 12:54:58 +0000 (13:54 +0100)]
nir: add AMD_gcn_shader extended instructions
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 23 Feb 2018 12:54:57 +0000 (13:54 +0100)]
spirv: import AMD extensions header from glslang
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dylan Baker [Tue, 6 Mar 2018 18:36:09 +0000 (10:36 -0800)]
meson: Fix indent in omx meson.build
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Dylan Baker [Tue, 6 Mar 2018 18:36:42 +0000 (10:36 -0800)]
meson: Use include directory variables instead of traversing
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Dylan Baker [Tue, 6 Mar 2018 18:11:38 +0000 (10:11 -0800)]
meson: Re-add auto option for omx
This re-adds the auto option for omx, without it we default to tizonia
and the build fails almost immediately, this is especially obnoxious
those building a driver that doesn't support the OMX state tracker to
begin with.
v2: - Only define OMX_FOO for auto cases if the dependencies are found.
This fixes building tizonia with auto (Julien, Eric)
CC: Gurkirpal Singh <gurkirpal204@gmail.com>
Fixes: bb5e27fab6087a5c1528a5faf507acce700e883c
("st/omx/bellagio: Rename st and target directories")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com> (v1)
Dylan Baker [Tue, 6 Mar 2018 19:33:16 +0000 (11:33 -0800)]
meson: fix tizonia compilation
It needs to have src/egl in it's includes as well.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Dylan Baker [Tue, 6 Mar 2018 19:32:23 +0000 (11:32 -0800)]
meson: combine state trackers and target if blocks
This is needed later since tizonia requires dri
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Marek Olšák [Fri, 23 Feb 2018 19:42:41 +0000 (20:42 +0100)]
st/mesa: expose 0 shader binary formats for compat profiles for Qt
Bugzilla: https://bugreports.qt.io/browse/QTBUG-66420
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105065
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Roland Scheidegger [Tue, 6 Mar 2018 20:33:16 +0000 (21:33 +0100)]
draw: fix line stippling with aa lines
In contrast to non-aa, where stippling is based on either dx or dy
(depending on if it's a x or y major line), stippling is based on
actual distance with smooth lines, so adjust for this.
(It looks like there's some minor artifacts with mesa demos
line-sample and stippling, it looks like the line endpoints
aren't quite right with aa + stippling - maybe due to the
integer math in the stipple stage, but I can't quite pinpoint it.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 6 Mar 2018 18:16:45 +0000 (19:16 +0100)]
draw: simplify (and correct) aaline fallback (v2)
The motivation actually was to get rid of the additional tex
instruction, since that requires the draw fallback code to intercept
all sampler / view calls (even if the fallback is never hit).
Basically, the idea is to use coverage of the pixel to calculate
the alpha value, and coverage is simply based on the distance
to the center of the line (in both line direction, which is useful
for wide lines, as well as perpendicular to the line).
This is much closer to what hw supporting this natively actually does.
It also fixes an issue with line width not quite being correct, as
well as endpoints getting stretched too far (in line direction) with
wide lines, which is apparent with mesa demo line-sample.
(For llvmpipe, it would probably make sense to do something like this
directly when drawing lines, since rendering two tris is twice as
expensive as a line, but it would need some changes with state
management.)
Since we're no longer relying on mipmapping to get the alpha value,
we also don't need to draw 3 rects (6 tris), one is sufficient.
There's still issues (as before):
- quite sure it's not correct without half_pixel_center, but can't test
this with GL.
- aaline + line stipple is incorrect (evident with line-sample demo).
Looking at the spec the stipple pattern should actually be based on
distance (not just dx or dy for x/y major lines as without aa).
- outputs (other than pos + the one used for line aa) should be
reinterpolated since we actually increase line length by half a pixel
(but there's no tests which would care).
v2: simplify the math (should be equivalent), don't need immediate
v3: use float versions of atan2,cos,sin, minor cleanups
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Bas Nieuwenhuizen [Wed, 7 Mar 2018 15:38:32 +0000 (16:38 +0100)]
radv: Don't emit a warning on VI-GFX9.
We are conformant:
https://www.khronos.org/conformance/adopters/conformant-products#submission_308
v2: Actually not emit it on gfx9.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 6 Feb 2018 00:40:00 +0000 (01:40 +0100)]
radv: Enable vulkan 1.1.0 for configurations that can support it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 22 Jan 2018 21:22:41 +0000 (22:22 +0100)]
radv: Disable sampler ycbcr conversion.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 23:34:08 +0000 (00:34 +0100)]
radv: Expose that we don't support any VK_KHR_16_bit_storage parts.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 21:34:11 +0000 (22:34 +0100)]
radv: Implement vkEnumerateInstanceVersion.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 16:13:26 +0000 (17:13 +0100)]
radv: Add trivial device group implementation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 15:32:38 +0000 (16:32 +0100)]
radv: Implement vkCmdDispatchBase.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 15:11:48 +0000 (16:11 +0100)]
radv: Implement VkGetDeviceQueue2.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 14:59:45 +0000 (15:59 +0100)]
radv: Support VkPhysicalDeviceProtectedMemoryFeatures.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 14:57:59 +0000 (15:57 +0100)]
radv: Support VkPhysicalDeviceShaderDrawParameterFeatures.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 14:53:03 +0000 (15:53 +0100)]
radv: Implement VK_KHR_maintenance3.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 14:06:10 +0000 (15:06 +0100)]
radv: Add minimal subgroup support.
Deliberately not implementing workgroup scopes as that is not needed
for core vulkan.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 12:55:26 +0000 (13:55 +0100)]
radv: Change client version check.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 21 Jan 2018 12:39:22 +0000 (13:39 +0100)]
radv: Update MAX_API_VERSION to 1.1.0
v2: Don't bump supported version.
v3: Update json files.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 5 Feb 2018 21:54:18 +0000 (22:54 +0100)]
ac/nir: Add vote_ieq/vote_feq lowering pass.
The old vote_eq implementation supported only booleans, but now
we have to support arbitrary values, so use the read_first_invocation
intrinsic + ballot.
I took this as an opportunity to figure out how easy it was to do this
in nir instead of in the nir_to_llvm pass, and it actually turned out
pretty okay IMO. Only creating the pass is some extra code.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 10 Nov 2017 03:17:29 +0000 (19:17 -0800)]
anv: Support version overrides
While always sketchy to do, this is useful for debugging.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 10 Nov 2017 03:17:17 +0000 (19:17 -0800)]
vulkan/util: Add a helper to get a version override
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 22 Sep 2017 14:44:10 +0000 (07:44 -0700)]
anv: Enable Vulkan 1.1
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 28 Apr 2017 08:22:39 +0000 (01:22 -0700)]
anv: Add support for SPIR-V 1.3 subgroup operations
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 22:18:02 +0000 (15:18 -0700)]
intel/fs: Add support for subgroup quad operations
NIR has code to lower these away for us but we can do significantly
better in many cases with register regioning and SIMD4x2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 05:12:48 +0000 (22:12 -0700)]
intel/fs: Implement reduce and scan opeprations
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 04:50:31 +0000 (21:50 -0700)]
intel/fs: Add a helper for emitting scan operations
This commit adds a helper to the builder for emitting "scan" operations.
Given a binary operation #, a scan takes the vector [a0, a1, ..., aN]
and returns the vector [a0, a0 # a1, ..., a0 # a1 # ... # aN] where each
channel contains the combination of all previous channels. The sequence
of instructions to perform the scan is fairly optimal; a 16-wide scan on
a 32-bit type is only 6 instructions. The subgroup scan and reduction
operations will be implemented in terms of this.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 04:45:30 +0000 (21:45 -0700)]
intel/fs: Add a couple of simple helper opcodes
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 30 Aug 2017 03:10:35 +0000 (20:10 -0700)]
spirv: Add support for subgroup arithmetic
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 30 Aug 2017 03:36:55 +0000 (20:36 -0700)]
nir: Add a helper for getting binop identities
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 30 Aug 2017 03:09:58 +0000 (20:09 -0700)]
nir: Add subgroup arithmetic reduction intrinsics
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 17:21:31 +0000 (10:21 -0700)]
spirv: Add subgroup quad support
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 17:20:56 +0000 (10:20 -0700)]
nir: Add quad operations and lowering
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 16:21:32 +0000 (09:21 -0700)]
i965/fs: Add support for nir_intrinsic_shuffle
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 16:44:44 +0000 (09:44 -0700)]
spirv: Add subgroup shuffle support
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 7 Dec 2017 05:41:47 +0000 (21:41 -0800)]
nir: Add subgroup shuffle intrinsics and lowering
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 00:38:53 +0000 (17:38 -0700)]
i965/fs: Support nir_intrinsic_vote_feq
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 29 Aug 2017 02:55:34 +0000 (19:55 -0700)]
nir/lower_subgroups: Add scalarizing for vote_eq
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Thu, 24 Aug 2017 18:01:22 +0000 (11:01 -0700)]
spirv: Add subgroup vote support
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Tue, 29 Aug 2017 00:33:33 +0000 (17:33 -0700)]
nir: Generalize nir_intrinsic_vote_eq
The SPIR-V extension wants us to be able to do an AllEqual on any vector
or scalar type. This has two implications:
1) We need to be able to handle vectors so we switch the vote_eq
intrinsics to be vectorized intrinsics.
2) We need to handle floats which have different behavior with respect
to +-0, NaN, etc. than the integer variant so we need two variants.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>