mesa.git
8 years ago radeonsi: generalize si_set_constant_buffer
Marek Olšák [Mon, 18 Apr 2016 21:09:55 +0000 (23:09 +0200)]
radeonsi: generalize si_set_constant_buffer

this will be used in the next commit

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years ago radeonsi: make RW buffer descriptor array global, not per shader stage
Marek Olšák [Mon, 18 Apr 2016 20:41:48 +0000 (22:41 +0200)]
radeonsi: make RW buffer descriptor array global, not per shader stage

v2: also simplify invalidation of RW buffer bindings (squashed)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years ago radeonsi: rename and rearrange RW buffer slots
Marek Olšák [Mon, 18 Apr 2016 20:16:54 +0000 (22:16 +0200)]
radeonsi: rename and rearrange RW buffer slots

- use an enum
- use a unique slot number regardless of the shader stage
  (the per-stage slots will go away for RW buffers)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years ago gallivm: fix bogus argument order to lp_build_sample_mipmap function
Roland Scheidegger [Thu, 21 Apr 2016 00:52:35 +0000 (02:52 +0200)]
gallivm: fix bogus argument order to lp_build_sample_mipmap function

Screwed up since 0753b135f6e83b171d8a1b08aea967374f3542bc.

(Only an issue with different min/mag filters, and then only in some cases,
which is probably why it went unnoticed for quite a while.
The effect should have simply been nearest mip filter instead of linear, iff
min was nearest, mag was linear, and all pixels hit the magnifying path.)

Fixes a bunch of dEQP failures.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years ago i965: Fix clear code for ignoring colormask for XRGB formats on Gen9+.
Kenneth Graunke [Wed, 20 Apr 2016 23:55:33 +0000 (16:55 -0700)]
i965: Fix clear code for ignoring colormask for XRGB formats on Gen9+.

In commit cda886a4851ab767fba40e8474d6fa8190347e4f, Neil made us stop
advertising RGBX formats on Gen9+, as the hardware apparently no longer
has working fast clear support for those formats.  Instead, we just
fall back to RGBA formats, and use SCS to override alpha to 1.0.

This is fine, but had one unintended side effect: it made us fall back
to slow clears when the color mask disables alpha.  Normally, we ignore
the color mask for non-existent channels.  This includes alpha for XRGB
formats as writing garbage to the X channel is harmless.  But, now that
we use RGBA, we think there's a real alpha channel, and can't do the
optimization.

To hack around this, check if _BaseFormat is GL_RGB and ignore alpha.

Improves WebGL Aquarium performance on Skylake GT3e by about 50%
by letting it use repclears instead of slow clears.
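
The check described above can be sketched roughly as follows. This is an illustration only, not the driver code: `enum base_format` and `color_mask_allows_fast_clear()` are hypothetical names standing in for the `_BaseFormat` test the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the base formats named in the commit message. */
enum base_format {
   BASE_FORMAT_RGB,
   BASE_FORMAT_RGBA,
};

/*
 * Return true when the color mask does not force a slow clear.
 * color_mask[i] is true when channel i (R, G, B, A) is write-enabled.
 */
static bool
color_mask_allows_fast_clear(enum base_format base, const bool color_mask[4])
{
   assert(color_mask != NULL);

   for (int i = 0; i < 4; i++) {
      /* An RGB base format has no real alpha, so its mask bit is ignored. */
      if (i == 3 && base == BASE_FORMAT_RGB)
         continue;
      if (!color_mask[i])
         return false;
   }
   return true;
}
```

With alpha writes masked off, this sketch still allows the fast path for an RGB base format and rejects it for RGBA.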

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years ago i965/blorp: Improve precision of blitting coordinates when clipping
Iago Toral Quiroga [Wed, 29 Jul 2015 14:01:21 +0000 (16:01 +0200)]
i965/blorp: Improve precision of blitting coordinates when clipping

We do this in two steps: first we clip the dst rect and adjust the src
rect accordingly. Then we do it the other way around. In both passes
the adjustment part involves multiplying by a scale factor that can lead
to a small precision loss. This is breaking a few dEQP tests.

Specifically, the problem happens when we need to clip the same coordinate
twice. For example, if srcX0 and dstX0 need both to be clipped we want to
avoid the situation where we clip srcX0 first, then adjust dstX0 accordingly
but then we realize that the resulting dstX0 still needs to be clipped, so
we clip dstX0 and adjust srcX0 again. Each of these two passes can lead
to precision loss. What we want to do here is detect the rect that leads
to the largest clip (accounting for the scale factor involved), clip that
rect and adjust the other one. With this we ensure that the adjusted
coordinate does not need to be clipped again and we can skip a second pass,
improving precision.
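
A minimal one-dimensional sketch of the single-pass idea, under simplifying assumptions: only the low edge is clipped, the blit is not mirrored, and `scale` is source units per destination unit. The names `clipped_edge` and `clip_low_edge_once()` are hypothetical and do not come from the driver.

```c
#include <assert.h>

/* Low edge of a 1D blit after clipping (hypothetical helper type). */
struct clipped_edge {
   float src0;
   float dst0;
};

static float
positive_part(float v)
{
   return v > 0.0f ? v : 0.0f;
}

/*
 * Clip the low edge of a non-mirrored 1D blit in one pass.
 * The side needing the larger clip (compared in source units) is clipped,
 * and the other side is adjusted by the scaled amount.
 */
static struct clipped_edge
clip_low_edge_once(float src0, float dst0, float src_min, float dst_min,
                   float scale)
{
   assert(scale > 0.0f);

   float src_clip = positive_part(src_min - src0);   /* in source units */
   float dst_clip = positive_part(dst_min - dst0);   /* in destination units */
   struct clipped_edge e = { src0, dst0 };

   if (src_clip >= dst_clip * scale) {
      /* The source needs the larger clip: clip it, then adjust dst. */
      e.src0 += src_clip;
      e.dst0 += src_clip / scale;
   } else {
      /* The destination needs the larger clip: clip it, then adjust src. */
      e.dst0 += dst_clip;
      e.src0 += dst_clip * scale;
   }
   return e;
}
```

Because the larger clip is applied first, the adjusted coordinate on the other side already lies inside its bounds (up to rounding), so only one scaled adjustment is made.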

Fixes the following 4 dEQP tests:
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_linear

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
8 years ago radeonsi: Add config parameter to si_shader_apply_scratch_relocs.
Bas Nieuwenhuizen [Thu, 21 Apr 2016 16:12:48 +0000 (18:12 +0200)]
radeonsi: Add config parameter to si_shader_apply_scratch_relocs.

shader->config is not updated for compute kernels.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
8 years ago glsl: Relax GLSL 1.10 float suffix error to a warning.
Matt Turner [Wed, 20 Apr 2016 19:29:23 +0000 (12:29 -0700)]
glsl: Relax GLSL 1.10 float suffix error to a warning.

Float suffixes are allowed in all subsequent GLSL specifications, and
it's obvious what the user meant if they specify one. Accept it with a
warning to avoid breaking applications, like Planeshift (although it
looks like between 0.6.1 and 0.6.3 they might have removed the suffixes
from their shaders).

Reviewed-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Readd opt_drop_redundant_mov_to_flags().
Matt Turner [Wed, 20 Apr 2016 21:22:53 +0000 (14:22 -0700)]
i965/fs: Readd opt_drop_redundant_mov_to_flags().

This reverts commit b449366587b5f3f64c6fb45fe22c39e4bc8a4309.

I removed the pass thinking that it was now not useful, but that was not
true. I believe I ran shader-db on HSW and saw no results, but HSW does
not use the unlit centroid workaround code and as a result does not emit
redundant MOV_DISPATCH_TO_FLAGS instructions.

On IVB, the shader-db results are:

total instructions in shared programs: 6650806 -> 6646303 (-0.07%)
instructions in affected programs: 106893 -> 102390 (-4.21%)
helped: 793

total cycles in shared programs: 56195538 -> 56103720 (-0.16%)
cycles in affected programs: 873048 -> 781230 (-10.52%)
helped: 553
HURT: 209

On SNB, the shader-db results are:

total instructions in shared programs: 7173074 -> 7168541 (-0.06%)
instructions in affected programs: 119757 -> 115224 (-3.79%)
helped: 799

total cycles in shared programs: 98128032 -> 98072938 (-0.06%)
cycles in affected programs: 1437104 -> 1382010 (-3.83%)
helped: 454
HURT: 237
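
The percentages in these reports are plain relative changes. A small sketch for re-deriving them from the before and after counts; `percent_change()` is a hypothetical helper and not part of shader-db.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double
percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For the IVB numbers above, `percent_change(106893, 102390)` is about -4.21 and `percent_change(873048, 781230)` is about -10.52, which matches the figures reported for the affected programs.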

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years ago i965/blorp: Do not emit pma stall on gen9+
Topi Pohjolainen [Thu, 21 Apr 2016 09:31:37 +0000 (12:31 +0300)]
i965/blorp: Do not emit pma stall on gen9+

This was left out from the original gen8 upload introduction.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago swr: add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT to get_param
Tim Rowley [Thu, 21 Apr 2016 16:10:29 +0000 (11:10 -0500)]
swr: add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT to get_param

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago i965: automake: remove gratuitous "+" during variable assignment
Emil Velikov [Thu, 21 Apr 2016 15:48:34 +0000 (16:48 +0100)]
i965: automake: remove gratuitous "+" during variable assignment

There is no initial assignment, so appending to it does not work.

Fixes: b27c85c4c08 "i965: add build rule for brw_nir_trig_workarounds.c"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago gbm: add GBM_FORMAT_XBGR8888 format support
Rob Herring [Tue, 19 Apr 2016 19:38:41 +0000 (14:38 -0500)]
gbm: add GBM_FORMAT_XBGR8888 format support

Add GBM_FORMAT_XBGR8888/__DRI_IMAGE_FORMAT_XBGR8888 format support which
is needed for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years ago st/dri: add 32-bit RGBX/RGBA formats
Rob Herring [Wed, 20 Apr 2016 22:39:54 +0000 (17:39 -0500)]
st/dri: add 32-bit RGBX/RGBA formats

Add support for 32-bit RGBX/RGBA formats which are preferred for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years ago dri/common: add MESA_FORMAT_R8G8B8{A8, X8}_UNORM formats as supported configs
Rob Herring [Tue, 19 Apr 2016 19:38:39 +0000 (14:38 -0500)]
dri/common: add MESA_FORMAT_R8G8B8{A8, X8}_UNORM formats as supported configs

Add MESA_FORMAT_R8G8B8A8_UNORM and MESA_FORMAT_R8G8B8X8_UNORM formats as
these are the preferred formats for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years ago i965: add build rule for brw_nir_trig_workarounds.c on Android
Rob Herring [Tue, 19 Apr 2016 19:51:02 +0000 (14:51 -0500)]
i965: add build rule for brw_nir_trig_workarounds.c on Android

Commit bfd17c76c126 ("i965: Port INTEL_PRECISE_TRIG=1 to NIR.") added a
generated file brw_nir_trig_workarounds.c which broke the Android build.
Add the necessary makefiles to the Android build.

Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: android: add back missing generated glcpp include path
Rob Herring [Thu, 14 Apr 2016 19:40:56 +0000 (14:40 -0500)]
glsl: android: add back missing generated glcpp include path

Commit 4db8f15a2576 ("glsl: move the android build scripts a level up")
dropped a generated include path for glcpp. Add it back adjusting for the
new location.

Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago loader: add a libdrm case for loader_get_device_name_for_fd
Jonathan Gray [Mon, 21 Dec 2015 05:39:55 +0000 (16:39 +1100)]
loader: add a libdrm case for loader_get_device_name_for_fd

Use dev_node_from_fd() with HAVE_LIBDRM to provide an implementation
of loader_get_device_name_for_fd() for non-Linux systems that
use libdrm but don't have udev or sysfs.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago i965/tiled_memcpy: don't unconditionally use __builtin_bswap32
Jonathan Gray [Tue, 19 Apr 2016 02:31:20 +0000 (12:31 +1000)]
i965/tiled_memcpy: don't unconditionally use __builtin_bswap32

Use the defines Mesa configure sets to indicate presence of the bswap32
builtins.  This lets i965 work on OpenBSD again after the changes that
were made in 0a5d8d9af42fd77fce1492d55f958da97816961a.
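
A sketch of the pattern, assuming the configure-provided define is spelled `HAVE___BUILTIN_BSWAP32` (the message does not name it, so treat that spelling as an assumption). `swap32()` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Byte-swap a 32-bit value. Use the compiler builtin only when the build
 * system has detected it; otherwise fall back to portable shifts and masks.
 */
static inline uint32_t
swap32(uint32_t v)
{
#if defined(HAVE___BUILTIN_BSWAP32)
   return __builtin_bswap32(v);
#else
   return (v >> 24) |
          ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) |
          (v << 24);
#endif
}
```

Both branches return the same result, so a compiler without the builtin loses only the intrinsic, not correctness.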

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago egl/x11: authenticate before doing chipset id ioctls
Jonathan Gray [Tue, 19 Apr 2016 02:29:36 +0000 (12:29 +1000)]
egl/x11: authenticate before doing chipset id ioctls

For systems without udev or sysfs that use drm ioctls in the loader,
drm authentication must take place earlier or the loader will fail with
"MESA-LOADER: failed to get param for i915".

Patch from Mark Kettenis.

Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
[Emil Velikov: remove gratuitous white-space]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago gallium/radeon: Silence possibly uninitialized variable warning.
Bas Nieuwenhuizen [Thu, 21 Apr 2016 11:23:36 +0000 (13:23 +0200)]
gallium/radeon: Silence possibly uninitialized variable warning.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years ago winsys/amdgpu: Silence possibly uninitialized variable warning.
Bas Nieuwenhuizen [Thu, 21 Apr 2016 11:22:08 +0000 (13:22 +0200)]
winsys/amdgpu: Silence possibly uninitialized variable warning.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years ago radeonsi: Enable loading into CE RAM.
Bas Nieuwenhuizen [Wed, 20 Apr 2016 23:22:02 +0000 (01:22 +0200)]
radeonsi: Enable loading into CE RAM.

We need to enable a bit in the CONTEXT_CONTROL packet for the
loads to work.

v2: Style issues.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years ago radeonsi: Use defines for CONTEXT_CONTROL instead of magic values.
Bas Nieuwenhuizen [Wed, 20 Apr 2016 23:19:28 +0000 (01:19 +0200)]
radeonsi: Use defines for CONTEXT_CONTROL instead of magic values.

v2: Use field names provided by Nicolai.
v3: Updated to use CONTEXT_CONTROL prefix.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years ago winsys/amdgpu: fix preamble IB size
Thomas Hindoe Paaboel Andersen [Wed, 20 Apr 2016 20:34:02 +0000 (22:34 +0200)]
winsys/amdgpu: fix preamble IB size

The missing break caused the IB size to be overwritten with
the size of IB_CONST.

This was introduced in: 7201230582e060aa2eb79c825d3188b437ef7bb8
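
The failure mode is the classic switch fall-through. A hedged sketch with made-up names (`ib_type`, `ib_sizes` and both functions are hypothetical; the real winsys structures differ):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IB kinds and sizes, for illustration only. */
enum ib_type { IB_CONST_PREAMBLE, IB_CONST, IB_MAIN };

struct ib_sizes {
   unsigned preamble_dw;
   unsigned const_dw;
   unsigned main_dw;
};

/* Buggy variant: the preamble case falls into the IB_CONST case. */
static unsigned
ib_size_missing_break(enum ib_type type, const struct ib_sizes *s)
{
   unsigned size = 0;

   assert(s != NULL);
   switch (type) {
   case IB_CONST_PREAMBLE:
      size = s->preamble_dw;
      /* fall through: this is the bug, the next case overwrites size */
   case IB_CONST:
      size = s->const_dw;
      break;
   case IB_MAIN:
      size = s->main_dw;
      break;
   }
   return size;
}

/* Fixed variant: each case ends with its own break. */
static unsigned
ib_size_fixed(enum ib_type type, const struct ib_sizes *s)
{
   unsigned size = 0;

   assert(s != NULL);
   switch (type) {
   case IB_CONST_PREAMBLE:
      size = s->preamble_dw;
      break;
   case IB_CONST:
      size = s->const_dw;
      break;
   case IB_MAIN:
      size = s->main_dw;
      break;
   }
   return size;
}
```

In the buggy variant a preamble request silently returns the IB_CONST size, which is the symptom the message describes.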

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
8 years ago i965/blorp: Reduce the urb size requirement for vertex buffer
Topi Pohjolainen [Tue, 19 Apr 2016 17:08:55 +0000 (20:08 +0300)]
i965/blorp: Reduce the urb size requirement for vertex buffer

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Reduce the size of vertex buffer
Topi Pohjolainen [Tue, 19 Apr 2016 16:57:43 +0000 (19:57 +0300)]
i965/blorp: Reduce the size of vertex buffer

Previously the vertex buffer consisted of eight floats per vertex,
of which six were constants. These can just as easily be provided by
the vertex fetcher, as it is capable of filling vertex elements with
constant one and zero. This reduces the size of the vertex buffer
from 3 * 8 * 4 = 96 to 3 * 2 * 4 = 24 bytes.
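
The size arithmetic can be checked with a short sketch. `EXAMPLE_NUM_VERTICES` and `vertex_buffer_bytes()` are hypothetical names, and the byte counts assume a 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>

/* Three vertices, as in the 3 * N * 4 arithmetic of the message. */
enum { EXAMPLE_NUM_VERTICES = 3 };

/* Size in bytes of a vertex buffer holding floats_per_vertex floats each. */
static size_t
vertex_buffer_bytes(size_t floats_per_vertex)
{
   assert(floats_per_vertex > 0);
   return EXAMPLE_NUM_VERTICES * floats_per_vertex * sizeof(float);
}
```

With eight floats per vertex this gives 96 bytes, and with only the two non-constant floats it gives 24 bytes.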

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Do not trigger urb re-configuration unnecessarily
Topi Pohjolainen [Fri, 15 Apr 2016 11:03:18 +0000 (14:03 +0300)]
i965/blorp: Do not trigger urb re-configuration unnecessarily

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Skip re-emitting urb config whenever possible
Topi Pohjolainen [Fri, 15 Apr 2016 10:39:48 +0000 (13:39 +0300)]
i965/blorp: Skip re-emitting urb config whenever possible

Otherwise clearing with blorp will regress performance in some
synthetic test cases.

v2: Used vsize >= 2 instead of vsize > 0, and updated the comment.
    Review by Ken in one of the earlier patches revealed this.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Prepare to switch from compute pipeline
Topi Pohjolainen [Fri, 15 Apr 2016 07:43:05 +0000 (10:43 +0300)]
i965/blorp: Prepare to switch from compute pipeline

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Skip uploading state/options not needed for clears
Topi Pohjolainen [Mon, 11 Apr 2016 21:18:45 +0000 (00:18 +0300)]
i965/blorp: Skip uploading state/options not needed for clears

If there is no source, the program does a simple clear or a resolve.
In that case there is no need to program sampling state or enable
pixel kill in the fragment shader.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Re-introduce clear programs
Topi Pohjolainen [Fri, 1 Apr 2016 12:57:54 +0000 (15:57 +0300)]
i965/blorp: Re-introduce clear programs

This partially reverts 2f28a0dc23165123cf1e8b5942acad37878edd8a

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Move check for srgb into is_color_fast_clear_compatible()
Topi Pohjolainen [Wed, 6 Apr 2016 08:38:59 +0000 (11:38 +0300)]
i965/meta: Move check for srgb into is_color_fast_clear_compatible()

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Expose check for fast clear compatibility
Topi Pohjolainen [Wed, 6 Apr 2016 07:53:04 +0000 (10:53 +0300)]
i965/meta: Expose check for fast clear compatibility

Also add the additional render format check to the same utility.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Expose fast clear value setup
Topi Pohjolainen [Mon, 4 Apr 2016 18:05:58 +0000 (21:05 +0300)]
i965/meta: Expose fast clear value setup

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Expose non-fast clear rectangle calculation
Topi Pohjolainen [Mon, 4 Apr 2016 10:43:24 +0000 (13:43 +0300)]
i965/meta: Expose non-fast clear rectangle calculation

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Expose resolve clear rectangle calculation
Topi Pohjolainen [Mon, 4 Apr 2016 08:28:03 +0000 (11:28 +0300)]
i965/meta: Expose resolve clear rectangle calculation

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/meta: Expose fast clear rectangle calculation
Topi Pohjolainen [Sun, 3 Apr 2016 19:10:14 +0000 (22:10 +0300)]
i965/meta: Expose fast clear rectangle calculation

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Declare input to mcs alignment calculation constant
Topi Pohjolainen [Sun, 3 Apr 2016 19:15:13 +0000 (22:15 +0300)]
i965: Declare input to mcs alignment calculation constant

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Switch the order of render and texture targets
Topi Pohjolainen [Sun, 3 Apr 2016 18:38:24 +0000 (21:38 +0300)]
i965/blorp: Switch the order of render and texture targets

On gen8 color resolving won't work anymore if the target isn't
the first entry in the binding table.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Reduce scope for generator and its inputs
Topi Pohjolainen [Tue, 5 Apr 2016 07:36:11 +0000 (10:36 +0300)]
i965/blorp: Reduce scope for generator and its inputs

Generator is only needed for getting the assembly.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Add support for disabling color blending
Topi Pohjolainen [Sun, 3 Apr 2016 15:51:43 +0000 (18:51 +0300)]
i965/blorp: Add support for disabling color blending

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Add support for setting fast clear operation
Topi Pohjolainen [Fri, 1 Apr 2016 18:42:21 +0000 (21:42 +0300)]
i965/blorp: Add support for setting fast clear operation

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Enable blits on gen8
Topi Pohjolainen [Wed, 30 Mar 2016 17:41:30 +0000 (20:41 +0300)]
i965/blorp: Enable blits on gen8

v2 (Ken): Moved switch cases for gen8/9 in texel_fetch() to
          earlier patch adding gen8/9 sampling support.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Prepare stencil sampling for gen8
Topi Pohjolainen [Thu, 7 Apr 2016 15:50:56 +0000 (18:50 +0300)]
i965/blorp: Prepare stencil sampling for gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Add check for supported sample numbers
Topi Pohjolainen [Fri, 1 Apr 2016 09:01:23 +0000 (12:01 +0300)]
i965/blorp: Add check for supported sample numbers

v2 (Ken): Fix the condition on using meta for stencil blits:
          use_blorp -> !use_blorp

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Add support for sampling 3D textures
Topi Pohjolainen [Fri, 8 Apr 2016 07:22:37 +0000 (10:22 +0300)]
i965/blorp: Add support for sampling 3D textures

This patch adds an additional MOV instruction to all blorp programs
that use SHADER_OPCODE_TXF. The alternative is to augment the blorp
program key to tell whether the z-coordinate is needed, add a condition
to the blorp blit compiler, and produce variants with and without the
MOV. This seems like overkill.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Add support for source swizzle
Topi Pohjolainen [Sat, 9 Apr 2016 16:48:14 +0000 (19:48 +0300)]
i965/blorp: Add support for source swizzle

To support cases where gen9 uses an RGBA format to back a
client-requested RGB format, we need a way to force the alpha channel to
one when the user-requested RGB surface is used as a blit source.

v2 (Ken): Use helper for constructing the swizzle (this should be
          changed to use brw_get_texture_swizzle() as a follow-up).
          Also calculate the swizzle for CopyTexSubImage.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Pipeline upload support for gen8
Topi Pohjolainen [Tue, 29 Mar 2016 07:50:42 +0000 (10:50 +0300)]
i965/blorp: Pipeline upload support for gen8

v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state
          Add emission of pma stall.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/gen8: Expose pma stall emission
Topi Pohjolainen [Thu, 21 Apr 2016 07:12:46 +0000 (10:12 +0300)]
i965/gen8: Expose pma stall emission

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Allow texture surface state setup to be used by blorp
Topi Pohjolainen [Thu, 7 Apr 2016 10:09:52 +0000 (13:09 +0300)]
i965: Allow texture surface state setup to be used by blorp

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Prepare sampling for gen9
Topi Pohjolainen [Fri, 1 Apr 2016 08:21:03 +0000 (11:21 +0300)]
i965/blorp: Prepare sampling for gen9

v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These
          were wrongly introduced in blit-enabling patch.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Prepare render target write for gen8
Topi Pohjolainen [Wed, 30 Mar 2016 17:50:41 +0000 (20:50 +0300)]
i965/blorp: Prepare render target write for gen8

v2 (Ken): Use payload directly instead of retyping it into vec8.
          Drop the implied header, it isn't used for gen6+ anyway.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp/gen6: Prepare vertex buffer setup logic for gen8
Topi Pohjolainen [Fri, 6 Mar 2015 12:21:25 +0000 (14:21 +0200)]
i965/blorp/gen6: Prepare vertex buffer setup logic for gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp/gen7: Expose state setup applicable to gen8
Topi Pohjolainen [Sun, 1 Mar 2015 20:38:59 +0000 (22:38 +0200)]
i965/blorp/gen7: Expose state setup applicable to gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Use 8k chunk size for urb allocation
Topi Pohjolainen [Fri, 15 Apr 2016 07:12:20 +0000 (10:12 +0300)]
i965/blorp: Use 8k chunk size for urb allocation

Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.

However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region.  This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.
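
The overlap follows directly from the chunk arithmetic. A small sketch, using only the numbers quoted above (8kB chunks, a 32kB push constant region); `EXAMPLE_URB_CHUNK_BYTES` and both helpers are hypothetical and are not the driver's actual allocation code.

```c
#include <stdbool.h>

/* URB starting addresses are expressed in 8kB chunks, per the message. */
#define EXAMPLE_URB_CHUNK_BYTES (8u * 1024u)

/* First chunk that lies entirely past the push constant region. */
static unsigned
first_free_urb_chunk(unsigned push_constant_bytes)
{
   return (push_constant_bytes + EXAMPLE_URB_CHUNK_BYTES - 1) /
          EXAMPLE_URB_CHUNK_BYTES;
}

/* True when VS URB data starting at vs_start_chunk overlaps push constants. */
static bool
vs_urb_overlaps_push_constants(unsigned push_constant_bytes,
                               unsigned vs_start_chunk)
{
   return vs_start_chunk * EXAMPLE_URB_CHUNK_BYTES < push_constant_bytes;
}
```

A hardcoded start of 2 chunks (16kB) overlaps a 32kB push constant region, while a start of 4 chunks does not.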

v2 (Ken): Better description of the change, and do not change vs_size
          from 2 to 1.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp/gen7: Prepare re-using for gen8
Topi Pohjolainen [Fri, 6 Mar 2015 13:55:02 +0000 (15:55 +0200)]
i965/blorp/gen7: Prepare re-using for gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/blorp: Let compiler calculate the vertex buffer size
Topi Pohjolainen [Tue, 12 Apr 2016 06:27:00 +0000 (09:27 +0300)]
i965/blorp: Let compiler calculate the vertex buffer size

Currently the size is sizeof(float) times too large. The code reserves
GEN6_BLORP_VBO_SIZE floats, whereas GEN6_BLORP_VBO_SIZE is the size of
the vertex buffer in bytes.
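
The mistake and the fix can be illustrated with a hypothetical byte-count macro. `EXAMPLE_VBO_SIZE_BYTES`, its value of 24, and the vertex data are made up for the sketch and are not the driver's `GEN6_BLORP_VBO_SIZE` or its contents.

```c
#include <stddef.h>

/* Hypothetical byte count, standing in for a macro that means "bytes". */
#define EXAMPLE_VBO_SIZE_BYTES 24

/* The mistake: a byte count used as the number of float elements. */
typedef float oversized_vbo[EXAMPLE_VBO_SIZE_BYTES];

/* The fix: declare the data and let the compiler compute its size. */
static const float example_vertices[3][2] = {
   { 0.0f, 0.0f },
   { 1.0f, 0.0f },
   { 1.0f, 1.0f },
};

static size_t
oversized_vbo_bytes(void)
{
   return sizeof(oversized_vbo);
}

static size_t
compiler_sized_vbo_bytes(void)
{
   return sizeof(example_vertices);
}
```

The first size is `sizeof(float)` times the intended byte count, while the second always matches the data that is actually declared.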

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/gen8: Expose state base address setup
Topi Pohjolainen [Mon, 2 Mar 2015 09:29:05 +0000 (11:29 +0200)]
i965/gen8: Expose state base address setup

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/gen8: Expose surface state helpers
Topi Pohjolainen [Tue, 29 Mar 2016 08:36:23 +0000 (11:36 +0300)]
i965/gen8: Expose surface state helpers

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/gen9: Use correct size for DS_STATE
Topi Pohjolainen [Thu, 31 Mar 2016 07:19:24 +0000 (10:19 +0300)]
i965/gen9: Use correct size for DS_STATE

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago glsl: add forgotten textureOffset function for sampler2DArrayShadow
Roland Scheidegger [Tue, 19 Apr 2016 00:21:35 +0000 (02:21 +0200)]
glsl: add forgotten textureOffset function for sampler2DArrayShadow

This was part of EXT_gpu_shader4 - as such it should have been supported
by glsl 130.
It was however forgotten, and not added until glsl 430 - with the wrong
syntax no less (glsl 430 mentions it was overlooked).
glsl 440 (but revision 8 only) fixed this finally for good.
At least nvidia supports this with just glsl version 1.30 as well
(the spec doesn't explicitly say it should be supported retroactively),
so just add this to the other glsl 130 textureOffset functions.

Passes a (hacked) piglit tex-miplevel-selection test (2DArrayShadow
textureOffset -auto) with llvmpipe.

v2: fix up comment (by Ian), add testing to commit message.

Reviewed-by: Dave Airlie <airlied@gmail.com>
8 years ago i965: Fix interpolateAtSample() on single sampled buffers.
Kenneth Graunke [Wed, 6 Apr 2016 05:32:45 +0000 (22:32 -0700)]
i965: Fix interpolateAtSample() on single sampled buffers.

Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago i965: Fix gl_SampleMaskIn[] in per-sample shading mode.
Kenneth Graunke [Wed, 6 Apr 2016 03:14:22 +0000 (20:14 -0700)]
i965: Fix gl_SampleMaskIn[] in per-sample shading mode.

The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.
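
A bit-level sketch of that AND, assuming the simplest case where each invocation processes exactly one sample, so the per-invocation mask is `1 << sample_id`. An invocation covering several samples would use a wider mask. `sample_mask_in()` is a hypothetical name, not the code the compiler backend emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Value a shader would see in gl_SampleMaskIn[0] under this sketch.
 * coverage_mask has one bit per covered sample of the pixel.
 */
static uint32_t
sample_mask_in(uint32_t coverage_mask, unsigned sample_id, bool per_sample)
{
   assert(sample_id < 32);

   if (!per_sample)
      return coverage_mask;

   /* Keep only the bit for the sample this invocation is shading. */
   return coverage_mask & (UINT32_C(1) << sample_id);
}
```

With full 4x coverage (0xF), the invocation for sample 2 sees 0x4 in per-sample mode and 0xF otherwise.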

Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:

sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago i965: Only enable oMask output when there's a multisample FBO.
Kenneth Graunke [Tue, 5 Apr 2016 09:09:08 +0000 (02:09 -0700)]
i965: Only enable oMask output when there's a multisample FBO.

The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."

The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:

"No changes to the fragment alpha or coverage values are made at this
 step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
 is not one."

oMask output alters coverage masks and can kill pixels.  We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.
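
The rule can be modelled as a function of the coverage mask. This is a sketch of the specified behaviour only, with a hypothetical helper name, and says nothing about how the hardware state is actually programmed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Coverage after the shader's sample mask output is applied.
 * Without a multisample FBO the shader mask must not change coverage.
 */
static uint32_t
apply_shader_sample_mask(uint32_t coverage, uint32_t shader_sample_mask,
                         bool multisample_fbo)
{
   if (!multisample_fbo)
      return coverage;

   return coverage & shader_sample_mask;
}
```

A shader that writes a zero mask therefore kills the pixel only when a multisample FBO is bound.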

Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188

Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0

Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7

Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.
Kenneth Graunke [Wed, 6 Apr 2016 02:35:46 +0000 (19:35 -0700)]
i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.

I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch.  This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.

The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value.  So we
can just eliminate the check and generalize the field.

v2: Also update the Vulkan driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Delete now dead persample_2x FS program key flag.
Kenneth Graunke [Wed, 6 Apr 2016 02:33:04 +0000 (19:33 -0700)]
i965: Delete now dead persample_2x FS program key flag.

This was only used by the old gl_SampleID calculations.  The new code
doesn't need to handle 2x specially.

v2: Delete it from the Vulkan driver, too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Simplify gl_SampleID setup on Gen8+.
Kenneth Graunke [Wed, 6 Apr 2016 02:29:36 +0000 (19:29 -0700)]
i965: Simplify gl_SampleID setup on Gen8+.

On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations.  We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case.  Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.

For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet.  However, it works fine on Gen8+.  So, land the
code and use it where it's working, and leave a TODO for later.

v2: Fix register types in the comment (caught by Matt Turner!).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Flip key->compute_sample_id check.
Kenneth Graunke [Tue, 19 Apr 2016 01:11:01 +0000 (18:11 -0700)]
i965: Flip key->compute_sample_id check.

This just moves the simple case first.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agost/mesa: Use correct size for compute CAPs.
Bas Nieuwenhuizen [Wed, 20 Apr 2016 13:31:22 +0000 (15:31 +0200)]
st/mesa: Use correct size for compute CAPs.

Some CAPs are stored as 64-bit values while Mesa stores
the related constant as a 32-bit value.
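
A minimal sketch of this kind of size mismatch, not the actual st/mesa
change: the query function, its name, and the value it returns are all
hypothetical. It shows one way to read a 64-bit result safely before
storing it in a 32-bit constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver query: always writes its result as 64 bits and
 * returns the number of bytes written. */
static size_t toy_get_compute_param(void *ret)
{
   const uint64_t value = 65536;
   if (ret)
      memcpy(ret, &value, sizeof value);
   return sizeof value;
}

/* Read the CAP into a temporary of the size the driver really uses,
 * then narrow it.  Passing the address of a uint32_t directly would let
 * the query write 8 bytes into a 4-byte object. */
static uint32_t toy_read_cap_as_u32(void)
{
   uint64_t wide = 0;
   size_t written = toy_get_compute_param(&wide);
   assert(written == sizeof wide);
   return wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
}
```

Clamping on overflow is an assumption made for the sketch; the commit
message does not say how out-of-range values are treated.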

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agoi965: Properly handle integer types in opt_vector_float().
Kenneth Graunke [Wed, 13 Apr 2016 23:58:10 +0000 (16:58 -0700)]
i965: Properly handle integer types in opt_vector_float().

Previously, opt_vector_float() always interpreted MOV sources as
floating point, and always created a MOV with a F-type destination.

This meant that we could mess up sequences of integer loads, such as:

   mov vgrf6.0.x:D, 0D
   mov vgrf6.0.y:D, 1D
   mov vgrf6.0.z:D, 2D
   mov vgrf6.0.w:D, 3D

Here, integer 0/1/2/3 become approximately 0.0f, so we generated:

   mov vgrf6.0:F, [0F, 0F, 0F, 0F]

which is clearly wrong.  We can properly handle this by converting
integer values to float (rather than bitcasting), and emitting a type
converting MOV:

   mov vgrf6.0:D, [0F, 1F, 2F, 3F]

To do this, we first see if the integer values (converted to float)
are representable.  If so, we use a D-type MOV.  If not, we then try
the floating point values and an F-type MOV.  We make zero not impose
type restrictions.  This is important because 0D would imply a D-type
MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D,
where we want to use an F-type MOV.
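
The selection rule above can be sketched in C as follows.  This is an
illustration, not the code from the patch: all toy_* names are
hypothetical, and the VF layout (1 sign bit, 3 exponent bits with a bias
of 3, 4 mantissa bits) is an assumption about the hardware format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Reinterpret the bits of an integer as a float (bitcast, no conversion). */
static float toy_bitcast_to_float(int32_t i)
{
   float f;
   memcpy(&f, &i, sizeof f);
   return f;
}

/* True if f is exactly representable in the assumed 8-bit VF layout. */
static bool toy_fits_in_vf(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);
   uint32_t exponent = (bits >> 23) & 0xff;
   uint32_t mantissa = bits & 0x7fffff;

   if (f == 0.0f)
      return true;                 /* +0.0 and -0.0 */
   if (mantissa & 0x7ffff)
      return false;                /* needs more than 4 mantissa bits */
   if (exponent < 124 || exponent > 131)
      return false;                /* outside the 3-bit exponent range */
   /* 0.125 would share its encoding with 0.0, so reject it. */
   return !(exponent == 124 && mantissa == 0);
}

enum toy_mov_type { TOY_ANY, TOY_TYPE_D, TOY_TYPE_F, TOY_UNREPRESENTABLE };

/* Decide which destination type a single integer immediate asks for. */
static enum toy_mov_type toy_pick_type(int32_t imm)
{
   if (imm == 0)
      return TOY_ANY;              /* zero imposes no type restriction */
   if (toy_fits_in_vf((float)imm))
      return TOY_TYPE_D;           /* integer value, converted to float */
   if (toy_fits_in_vf(toy_bitcast_to_float(imm)))
      return TOY_TYPE_F;           /* float bits stored in an integer */
   return TOY_UNREPRESENTABLE;
}
```

With these helpers, toy_pick_type(3) yields TOY_TYPE_D, while
toy_pick_type(0x3f800000) yields TOY_TYPE_F because those bits are 1.0f.
Bitcasting the integer 1 gives a denormal just above zero, which is why
the old code turned 0/1/2/3 into approximately 0.0f.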

Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend.  This
recently became visible due to changes in opt_vector_float() which
made it optimize more cases, but it was a pre-existing bug.

Apparently it also manages to turn more integer loads into VFs,
producing the following shader-db statistics on Haswell:

total instructions in shared programs: 7084195 -> 7082191 (-0.03%)
instructions in affected programs: 246027 -> 244023 (-0.81%)
helped: 1937

total cycles in shared programs: 65669642 -> 65651968 (-0.03%)
cycles in affected programs: 531064 -> 513390 (-3.33%)
helped: 1177

v2: Handle the type of zero better.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Make opt_vector_float() only handle non-type-conversion MOVs.
Kenneth Graunke [Wed, 13 Apr 2016 23:39:54 +0000 (16:39 -0700)]
i965: Make opt_vector_float() only handle non-type-conversion MOVs.

We don't handle this properly - we'd have to perform the type conversion
before trying to convert the value to a VF.

While we could do that, it doesn't seem particularly useful - most
vector loads should be consistently typed (all float or all integer).

As a special case, we do allow type-converting MOVs of integer 0, as
it's represented the same regardless of the type.  I believe this case
does actually come up.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Fold vectorize_mov() back into the one caller.
Kenneth Graunke [Wed, 13 Apr 2016 23:04:04 +0000 (16:04 -0700)]
i965: Fold vectorize_mov() back into the one caller.

After the previous patch, this helper is only called in one place.
So, just fold it back in - there are a lot of parameters here and
not much code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Rework opt_vector_float() control flow.
Kenneth Graunke [Wed, 13 Apr 2016 22:56:07 +0000 (15:56 -0700)]
i965: Rework opt_vector_float() control flow.

This reworks opt_vector_float() so that there's only one place that
flushes out any accumulated state and emits a VF.

v2: Don't break the sequence for non-representable numbers - just skip
    recording their values.  Only break it for non-MOVs or register
    changes.
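
A simplified sketch of this control flow, using a hypothetical
instruction record rather than the real vec4 IR.  It assumes a vector
load is only worth emitting when at least two components were recorded;
that threshold is an assumption of the sketch, not something stated in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_MOV_IMM, TOY_OTHER };

/* Hypothetical instruction record. */
struct toy_inst {
   enum toy_op op;
   int dst_reg;          /* destination register number */
   int comp;             /* destination component, 0..3 */
   bool representable;   /* can the immediate be stored in a VF? */
};

/* Returns how many vector loads would be emitted for the sequence. */
static int toy_count_vector_loads(const struct toy_inst *insts, size_t n)
{
   int emitted = 0;
   int cur_reg = -1;
   unsigned recorded = 0;   /* bitmask of components with a value */

   /* One extra iteration (inst == NULL) flushes the final sequence. */
   for (size_t i = 0; i <= n; i++) {
      const struct toy_inst *inst = i < n ? &insts[i] : NULL;
      bool is_mov = inst && inst->op == TOY_MOV_IMM;

      /* Only a non-MOV or a register change breaks the sequence. */
      if (!is_mov || inst->dst_reg != cur_reg) {
         /* The single place where accumulated state is flushed. */
         int count = 0;
         for (unsigned m = recorded; m; m >>= 1)
            count += m & 1;
         if (count > 1)
            emitted++;
         recorded = 0;
         cur_reg = is_mov ? inst->dst_reg : -1;
      }

      /* Non-representable values are skipped, not treated as a break. */
      if (is_mov && inst->representable) {
         assert(inst->comp >= 0 && inst->comp < 4);
         recorded |= 1u << inst->comp;
      }
   }
   return emitted;
}
```

Four MOVs to the same register, one of them non-representable, still
produce a single vector load for the other three components.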

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoanv: s/anv_batch_emit_blk/anv_batch_emit/
Jason Ekstrand [Tue, 19 Apr 2016 00:03:00 +0000 (17:03 -0700)]
anv: s/anv_batch_emit_blk/anv_batch_emit/

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv: Remove the old emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:53:11 +0000 (16:53 -0700)]
anv: Remove the old emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen7_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:51:12 +0000 (16:51 -0700)]
anv/gen7_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen7_cmd_buffer: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:33:46 +0000 (16:33 -0700)]
anv/gen7_cmd_buffer: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/device: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:32:29 +0000 (15:32 -0700)]
anv/device: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/state: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:24:59 +0000 (15:24 -0700)]
anv/state: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen8_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:53:14 +0000 (15:53 -0700)]
anv/gen8_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/genX_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:29:42 +0000 (15:29 -0700)]
anv/genX_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen8_cmd_buffer: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:08:49 +0000 (16:08 -0700)]
anv/gen8_cmd_buffer: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for queries
Jason Ekstrand [Mon, 18 Apr 2016 22:20:06 +0000 (15:20 -0700)]
anv/cmd_buffer: Use the new emit macro for queries

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for DRAWING_RECTANGLE
Jason Ekstrand [Mon, 18 Apr 2016 22:14:47 +0000 (15:14 -0700)]
anv/cmd_buffer: Use the new emit macro for DRAWING_RECTANGLE

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for compute shader dispatch
Jason Ekstrand [Mon, 18 Apr 2016 22:11:43 +0000 (15:11 -0700)]
anv/cmd_buffer: Use the new emit macro for compute shader dispatch

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for 3DSTATE_CONSTANT
Jason Ekstrand [Mon, 18 Apr 2016 21:55:10 +0000 (14:55 -0700)]
anv/cmd_buffer: Use the new emit macro for 3DSTATE_CONSTANT

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for DEPTH/STENCIL_BUFFER
Jason Ekstrand [Mon, 18 Apr 2016 21:48:33 +0000 (14:48 -0700)]
anv/cmd_buffer: Use the new emit macro for DEPTH/STENCIL_BUFFER

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for PIPE_CONTROL and STATE_BASE_ADDRESS
Jason Ekstrand [Mon, 18 Apr 2016 21:41:06 +0000 (14:41 -0700)]
anv/cmd_buffer: Use the new emit macro for PIPE_CONTROL and STATE_BASE_ADDRESS

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for 3DPRIMITIVE commands
Jason Ekstrand [Mon, 18 Apr 2016 21:27:29 +0000 (14:27 -0700)]
anv/cmd_buffer: Use the new emit macro for 3DPRIMITIVE commands

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv: Add a new block-based batch emit macro
Jason Ekstrand [Mon, 18 Apr 2016 21:25:03 +0000 (14:25 -0700)]
anv: Add a new block-based batch emit macro

This new macro uses a for loop to create an actual code block in which to
place the macro setup code.  One advantage of this is that you syntactically
use braces instead of parentheses.  Another is that the code in the block
doesn't even get executed if anv_batch_emit_dwords fails.
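
The general shape of such a macro can be sketched as below.  This is a
toy version, not the anv macro itself: the batch and command types, the
opcode value, and every toy_* name are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's batch and command types. */
struct toy_batch {
   uint32_t dwords[16];
   size_t used;
};

struct toy_cmd {
   uint32_t opcode;
   uint32_t value;
};

/* Reserve n dwords in the batch; returns NULL when the batch is full. */
static void *toy_batch_emit_dwords(struct toy_batch *batch, size_t n)
{
   assert(batch != NULL);
   if (batch->used + n > 16)
      return NULL;
   void *dst = &batch->dwords[batch->used];
   batch->used += n;
   return dst;
}

/* Serialize the filled-in struct into the reserved space. */
static void toy_cmd_pack(void *dst, const struct toy_cmd *cmd)
{
   const uint32_t dw[2] = { cmd->opcode, cmd->value };
   memcpy(dst, dw, sizeof dw);
}

/* The for loop declares the command struct, runs the user's block at
 * most once, then packs the struct and clears _dst to end the loop.
 * If the reservation fails, the block is never entered. */
#define toy_batch_emit(batch, name)                               \
   for (struct toy_cmd name = { .opcode = 0x42 },                 \
        *_dst = toy_batch_emit_dwords((batch), 2);                \
        _dst != NULL;                                             \
        toy_cmd_pack(_dst, &name), _dst = NULL)

/* Usage: the setup code sits in braces after the macro. */
static int toy_emit_value(struct toy_batch *batch, uint32_t v)
{
   int ran = 0;
   toy_batch_emit(batch, cmd) {
      cmd.value = v;
      ran = 1;
   }
   return ran;
}
```

toy_emit_value() returns 0 once the batch is full, since the block is
skipped entirely when toy_batch_emit_dwords() returns NULL.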

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agogk110/ir: make use of IMUL32I for all immediates
Samuel Pitoiset [Wed, 20 Apr 2016 17:06:24 +0000 (19:06 +0200)]
gk110/ir: make use of IMUL32I for all immediates

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years agogk110/ir: do not overwrite def value with zero for EXCH ops
Samuel Pitoiset [Wed, 20 Apr 2016 17:47:37 +0000 (19:47 +0200)]
gk110/ir: do not overwrite def value with zero for EXCH ops

This is only valid for other atomic operations (including CAS). This
fixes an invalid opcode error from dmesg. While we are at it, make sure
to initialize global addr to 0 for other atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years agoanv: fix build without Wayland platform
Marcin Ślusarz [Sat, 16 Apr 2016 20:48:09 +0000 (22:48 +0200)]
anv: fix build without Wayland platform

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoanv: fix building on i686 with -mcpu=generic
Laurent Carlier [Sat, 16 Apr 2016 19:50:39 +0000 (21:50 +0200)]
anv: fix building on i686 with -mcpu=generic

-mcpu=generic doesn't enable SSE2, and anvil definitely needs it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agospirv: Trivially handle the NonWriteable decoration
Jason Ekstrand [Wed, 20 Apr 2016 17:32:59 +0000 (10:32 -0700)]
spirv: Trivially handle the NonWriteable decoration

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir: rename nir_foreach_block*() to nir_foreach_block*_call()
Connor Abbott [Wed, 13 Apr 2016 20:25:34 +0000 (16:25 -0400)]
nir: rename nir_foreach_block*() to nir_foreach_block*_call()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonvc0: avoid tex read fault from compute shaders on GK110
Samuel Pitoiset [Sun, 10 Apr 2016 20:08:34 +0000 (22:08 +0200)]
nvc0: avoid tex read fault from compute shaders on GK110

After some investigation, it seems that disabling the UNK02C4
command avoids a read fault with texelFetch() from a compute shader.

I have no clue what this method actually does, but this keeps the
GPU from hanging with basic-texelFetch.shader_test without introducing
any compute-related regressions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agoi965/vec4: Always split uniforms in array_access_to_pull_constants
Jason Ekstrand [Tue, 19 Apr 2016 01:57:07 +0000 (18:57 -0700)]
i965/vec4: Always split uniforms in array_access_to_pull_constants

Normally, we split uniforms at the end but in Vulkan, we bail because we
don't want pull constants.  However, we still need them split because
pack_uniforms relies on it.

I really don't like this patch, not because it doesn't work (it does), but
because now that we're using MOV_INDIRECT, uniform numbers and sizes don't
really matter anymore.  In the FS backend, uniform splitting and packing is
handled all at once (actual re-assignment of locations happens later) and
we really should do it that way in vec4 eventually as well.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001

8 years agoi965/vec4: Use the correct offset for the swizzle shift in push constants
Jason Ekstrand [Tue, 19 Apr 2016 01:52:36 +0000 (18:52 -0700)]
i965/vec4: Use the correct offset for the swizzle shift in push constants

This was actually caught by Ken in review the first time around but somehow
didn't get fixed before the patches were pushed. :-(

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001