mesa.git
6 years agointel/compiler: Memory fence commit must always be enabled for gen10+
Anuj Phogat [Wed, 7 Feb 2018 01:09:09 +0000 (17:09 -0800)]
intel/compiler: Memory fence commit must always be enabled for gen10+

Commit bit in the message descriptor (Bit 13) must be always set
to true in CNL+ for memory fence messages. It also fixes a piglit
GPU hang on cnl+ in simulation environment.
Piglit test: arb_shader_image_load_store-shader-mem-barrier
See HSD ES # 1404612949
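
The rule above can be sketched in C as follows. This is only an illustration: the function name, the macro, and the gen/commit_requested parameters are hypothetical and not taken from the patch; only the bit position (13) and the gen10+ condition come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 13 of the message descriptor is the commit-enable bit (per the
 * commit message).  The macro and function names are hypothetical. */
#define FENCE_COMMIT_ENABLE (1u << 13)

/* Set the commit bit when the caller asks for it, and unconditionally
 * on gen10 and later. */
static uint32_t
fence_descriptor(uint32_t desc, int gen, bool commit_requested)
{
   if (gen >= 10 || commit_requested)
      desc |= FENCE_COMMIT_ENABLE;
   return desc;
}
```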

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agoRevert "i965/fs: Predicate byte scattered writes if needed"
Francisco Jerez [Sun, 25 Feb 2018 00:05:21 +0000 (16:05 -0800)]
Revert "i965/fs: Predicate byte scattered writes if needed"

This reverts commit a4031bdfa927fb4c3c5d0bdadc70634f3c1a5eac.  It's
redundant with the sample mask predication done at this point by the
common logical send lowering infrastructure, and rather buggy because
it wasn't applying the correct sample mask in shaders using discard,
since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS
doesn't reflect samples discarded by the shader, so it could have led
to data corruption in fragment shader invocations that execute discard
based on a non-dynamically uniform condition.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/fs: Handle surface opcode sample masks via predication.
Francisco Jerez [Tue, 12 Dec 2017 20:05:04 +0000 (12:05 -0800)]
intel/fs: Handle surface opcode sample masks via predication.

The main motivation is to enable HDC surface opcodes on ICL which no
longer allows the sample mask to be provided in a message header, but
this is enabled all the way back to IVB when possible because it
decreases the instruction count of some shaders using HDC messages
significantly, e.g. one of the SynMark2 CSDof compute shaders
decreases instruction count by about 40% due to the removal of header
setup boilerplate which in turn makes a number of send message
payloads more easily CSE-able.  Shader-db results on SKL:

 total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
 instructions in affected programs: 311532 -> 300597 (-3.51%)
 helped: 491
 HURT: 1

Shader-db results on BDW where the optimization needs to be disabled
in some cases due to hardware restrictions:

 total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
 instructions in affected programs: 220863 -> 214097 (-3.06%)
 helped: 351
 HURT: 0

The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
laptop with this change.  According to Eero this improves performance
of the same test by 9% on BYT and by 7-8% on BXT J4205 and on SKL GT2
desktop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
6 years agointel/eu: Plumb header present bit to codegen helpers for HDC messages.
Francisco Jerez [Tue, 12 Dec 2017 20:05:03 +0000 (12:05 -0800)]
intel/eu: Plumb header present bit to codegen helpers for HDC messages.

This makes sure that the header-present bit of the message descriptor
is in sync with the IR instruction fields, which gives the optimizer
more control to avoid the overhead of setting up a message header when
it's possible to do so.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/ir: Allow arbitrary scratch flag registers for SHADER_OPCODE_FIND_LIVE_CHANNEL.
Francisco Jerez [Thu, 22 Feb 2018 20:49:01 +0000 (12:49 -0800)]
intel/ir: Allow arbitrary scratch flag registers for SHADER_OPCODE_FIND_LIVE_CHANNEL.

This shouldn't cause any functional change at this point, it changes
SHADER_OPCODE_FIND_LIVE_CHANNEL to use the flag register specified at
the IR level instead of the hard-coded f1.0, now that it can be
represented in backend_instruction::flag_subreg.  This will be
necessary for scheduling to behave correctly once more things start
making use of f1.0.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/ir: Allow representing additional flag subregisters in the IR.
Francisco Jerez [Tue, 12 Dec 2017 20:05:02 +0000 (12:05 -0800)]
intel/ir: Allow representing additional flag subregisters in the IR.

This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.
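
As a rough illustration, a two-bit flag_subreg index can address f0.0, f0.1, f1.0 and f1.1. The mapping below (register = index / 2, subregister = index % 2) is an assumption made for this sketch, and the helper name is hypothetical.

```c
/* Hypothetical helper: split a 2-bit flag_subreg index (0..3) into a
 * flag register number and a subregister number.  The mapping is an
 * assumption of this sketch: 0 -> f0.0, 1 -> f0.1, 2 -> f1.0, 3 -> f1.1. */
static void
decode_flag_subreg(unsigned flag_subreg, unsigned *reg, unsigned *subreg)
{
   *reg = flag_subreg / 2;
   *subreg = flag_subreg % 2;
}
```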

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/l3: Don't allocate SLM partition on ICL+.
Francisco Jerez [Tue, 12 Dec 2017 20:05:00 +0000 (12:05 -0800)]
intel/l3: Don't allocate SLM partition on ICL+.

SLM has a chunk of special-purpose memory separate from L3 on ICL+, we
shouldn't allocate a partition for it on L3 anymore.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agosvga: add SVGA_NEW_PRESCALE to the tracked dirty mask for gs
Charmaine Lee [Tue, 27 Feb 2018 12:09:58 +0000 (04:09 -0800)]
svga: add SVGA_NEW_PRESCALE to the tracked dirty mask for gs

Since geometry shader also consumes prescale constants, the
geometry shader constant buffer will need to be updated when prescale
factor is changed.

Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agosvga: fix blending regression
Brian Paul [Thu, 22 Feb 2018 04:00:38 +0000 (21:00 -0700)]
svga: fix blending regression

The earlier Mesa commit 3d06c8afb5 ("st/mesa: don't translate blend
state when it's disabled for a colorbuffer") subtly changed the
details of gallium's per-RT blend state.

In particular, when pipe_rt_blend_state[i].blend_enabled is true,
we have to get the src/dst blend terms from pipe_rt_blend_state[i],
not [0] as before.

We now have to scan the blend targets to find the first one that's
enabled (if any).  We have to use the index of that target for getting
the src/dst blend terms.  And note that we have to set identical blend
terms for all targets.
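
A minimal sketch of the scan described above, using a simplified stand-in struct rather than gallium's pipe_rt_blend_state. All names here are hypothetical.

```c
#include <stdbool.h>

/* Simplified stand-in for a per-render-target blend state. */
struct rt_blend {
   bool enabled;
   int src_factor;
   int dst_factor;
};

/* Return the index of the first target with blending enabled, or -1. */
static int
first_enabled_target(const struct rt_blend *rt, int count)
{
   for (int i = 0; i < count; i++) {
      if (rt[i].enabled)
         return i;
   }
   return -1;
}

/* Give every target the blend terms of the first enabled one.
 * Does nothing if no target has blending enabled. */
static void
unify_blend_terms(struct rt_blend *rt, int count)
{
   int first = first_enabled_target(rt, count);
   if (first < 0)
      return;
   for (int i = 0; i < count; i++) {
      rt[i].src_factor = rt[first].src_factor;
      rt[i].dst_factor = rt[first].dst_factor;
   }
}
```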

This fixes the Piglit fbo-drawbuffers2-blend test.  VMware bug 2063493.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agosvga: check svga_have_vgpu10() in svga_delete_blend_state()
Brian Paul [Thu, 22 Feb 2018 20:22:11 +0000 (13:22 -0700)]
svga: check svga_have_vgpu10() in svga_delete_blend_state()

We were calling SVGA3D_vgpu10_DestroyBlendState() when vgpu10 was not
enabled (bs->id==0 by default), resulting in lots of device errors.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agosvga: if svga_update_state() fails, skip the draw call
Brian Paul [Wed, 21 Feb 2018 20:57:39 +0000 (13:57 -0700)]
svga: if svga_update_state() fails, skip the draw call

If svga_update_state() fails, we flush the command buffer and retry.
If it fails again, it likely means we were unable to translate a shader
for some reason (uses too many resources, for example).  In that case,
let's just skip the draw call.  The alternative, just disabling the
shader stage in question, would certainly lead to bad rendering anyway,
and probably device errors.
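
The retry-then-skip flow can be sketched like this. The fake_ctx struct and every function below are stand-ins invented for illustration; they do not mirror the svga driver's real types or signatures.

```c
#include <stdbool.h>

/* Stand-in context: counts what happened so the flow can be observed. */
struct fake_ctx {
   int failures_left;  /* how many state updates will still fail */
   int flushes;
   int draws;
};

static bool
fake_update_state(struct fake_ctx *ctx)
{
   if (ctx->failures_left > 0) {
      ctx->failures_left--;
      return false;
   }
   return true;
}

static void
fake_flush(struct fake_ctx *ctx)
{
   ctx->flushes++;
}

/* Try once; on failure flush the command buffer and try again. */
static bool
update_state_retry(struct fake_ctx *ctx)
{
   if (fake_update_state(ctx))
      return true;
   fake_flush(ctx);
   return fake_update_state(ctx);
}

/* Skip the draw entirely if the state could not be updated. */
static void
draw(struct fake_ctx *ctx)
{
   if (!update_state_retry(ctx))
      return;
   ctx->draws++;
}
```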

Fixes failed assertion running Piglit glsl-1.50/execution/
variable-indexing/gs-output-array-vec4-index-wr.shader_test since it
uses too many GS output registers (though the test still fails).
VMware bug 2063492.

v2: also call pipe_debug_message() so apps or apitrace can be notified
when this issue occurs.
v3: use svga_update_state_retry().

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agosvga: let svga_update_state_retry() return a bool
Brian Paul [Thu, 22 Feb 2018 16:32:33 +0000 (09:32 -0700)]
svga: let svga_update_state_retry() return a bool

This will allow minor simplifications elsewhere.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agosvga: s/unsigned/boolean/ for a few local vars
Brian Paul [Thu, 22 Feb 2018 21:43:41 +0000 (14:43 -0700)]
svga: s/unsigned/boolean/ for a few local vars

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agomeson: install vulkan_intel.h header
Dylan Baker [Fri, 2 Mar 2018 18:28:11 +0000 (10:28 -0800)]
meson: install vulkan_intel.h header

Fixes: d1992255bb29054fa51763376d125183a9f602f3
       ("meson: Add build Intel "anv" vulkan driver")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agost/omx_bellagio: add picture profile and entry point
Boyuan Zhang [Fri, 2 Mar 2018 16:11:01 +0000 (11:11 -0500)]
st/omx_bellagio: add picture profile and entry point

Profile and entry point were missing in the picture structure.
Therefore, add them back.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agoradeonsi: fix radeon create encoder return
Boyuan Zhang [Tue, 27 Feb 2018 22:29:44 +0000 (17:29 -0500)]
radeonsi: fix radeon create encoder return

Previous patch missed a "return" when trying to modify the create encoder
function, which made the whole logic fail. Therefore, add the return back.

Fixes: b38b208ff8886e799d6a2 "radeonsi:create uvd hevc enc entry"
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years agoloader: Add support for platform and host1x busses
Thierry Reding [Wed, 21 Dec 2016 13:15:06 +0000 (14:15 +0100)]
loader: Add support for platform and host1x busses

ARM SoCs usually have their DRM/KMS devices on the platform bus, so add
support for this bus in order to allow use of the DRI_PRIME environment
variable with those devices.

While at it, also support the host1x bus, which is effectively the same
but uses an additional layer in the bus hierarchy.

Note that it isn't enough to support the bus that has the rendering GPU
because the loader code will also try to construct an ID path tag for a
scanout-only device if it is the default that is being opened.

The ID path tag for a device can be obtained by running udevadm info on
the device node, as shown in this example on NVIDIA Tegra:

$ udevadm info /dev/dri/card0 | grep ID_PATH_TAG
E: ID_PATH_TAG=platform-50000000_host1x

The corresponding OF_FULLNAME property, from which the ID_PATH_TAG is
constructed, can be found in the sysfs "uevent" attribute for the card0
device's parent:

$ grep OF_FULLNAME /sys/devices/platform/50000000.host1x/drm/uevent
OF_FULLNAME=/host1x@50000000

Similarly, /dev/dri/card1 corresponds to the GPU:

$ udevadm info /dev/dri/card1 | grep ID_PATH_TAG
E: ID_PATH_TAG=platform-57000000_gpu

and:

$ grep OF_FULLNAME /sys/devices/platform/57000000.gpu/uevent
OF_FULLNAME=/gpu@57000000
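
For the two examples above, the relationship between OF_FULLNAME and the tag can be sketched as below. This is an illustration only: make_platform_tag is a hypothetical helper, it only handles nodes of the form name@address, and the loader's actual implementation may construct the tag differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "/host1x@50000000" into
 * "platform-50000000_host1x".  Returns 0 on success, -1 if the name has
 * no '@' or the output buffer is too small. */
static int
make_platform_tag(const char *of_fullname, char *tag, size_t size)
{
   /* Use the last path component of the device tree node name. */
   const char *name = strrchr(of_fullname, '/');
   name = name ? name + 1 : of_fullname;

   const char *at = strchr(name, '@');
   if (!at)
      return -1;

   int n = snprintf(tag, size, "platform-%s_%.*s",
                    at + 1, (int)(at - name), name);
   return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```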

Changes in v2:
- avoid confusing pre-increment in strdup()
- add examples of tags to commit message

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years agodisk cache: Link with -latomic if necessary
Thierry Reding [Fri, 23 Feb 2018 13:13:27 +0000 (14:13 +0100)]
disk cache: Link with -latomic if necessary

The disk cache implementation uses 64-bit atomic operations. For some
architectures, such as 32-bit ARM, GCC will not be able to translate
these operations into atomic, lock-free instructions and will instead
rely on the external atomics library to provide these operations.

Check at configuration time whether or not linking against libatomic
is necessary and if so, create a dependency that can be used while
linking the mesautil library.

This is the meson equivalent of 2ef7f23820a6 ("configure: check if
-latomic is needed for __atomic_*").
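
The kind of code that triggers the dependency looks roughly like the sketch below: a 64-bit operation using GCC's __atomic builtins. On targets without native 64-bit atomics, the compiler may emit a call into libatomic instead of inline instructions. The counter and function name are made up for illustration and are not the disk cache code.

```c
#include <stdint.h>

/* A 64-bit counter updated with GCC's __atomic builtins. */
static uint64_t counter;

/* Atomically add to the counter and return its previous value. */
static uint64_t
bump_counter(uint64_t amount)
{
   return __atomic_fetch_add(&counter, amount, __ATOMIC_SEQ_CST);
}
```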

For some background information on this, see:

https://gcc.gnu.org/wiki/Atomic/GCCMM

Changes in v2:
- clarify meaning of lock-free in commit message
- fix build if -latomic is not necessary

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years agoradv: do not set pending_reset_query in BeginCommandBuffer()
Samuel Pitoiset [Thu, 1 Mar 2018 09:53:49 +0000 (10:53 +0100)]
radv: do not set pending_reset_query in BeginCommandBuffer()

This is just useless for two reasons:
1) flush_bits is not set accordingly, so nothing will be flushed
   in BeginQuery().
2) we always flush caches in EndCommandBuffer(), so if a reset
   is done in a previous command buffer we are safe.

Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agor600/cayman: fix fragcoord loading recip generation.
Dave Airlie [Thu, 1 Mar 2018 03:38:32 +0000 (03:38 +0000)]
r600/cayman: fix fragcoord loading recip generation.

This fixes some hangs seen where the recip_ieee opcodes would
end up split across the wrong slots.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoi965: Allow 48-bit addressing on Gen8+.
Kenneth Graunke [Mon, 12 Feb 2018 15:18:29 +0000 (07:18 -0800)]
i965: Allow 48-bit addressing on Gen8+.

This allows most GPU objects to use the full 48-bit address space
offered by Gen8+ platforms, rather than being stuck with 32-bit.
This expands the available GPU memory from 4G to 256TB or so.

A few objects - instruction, scratch, and vertex buffers - need to
remain pinned in the low 4GB of the address space for various reasons.
We default everything to 48-bit but disable it in those cases.
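
A small sketch of the policy and the arithmetic behind the "4G to 256TB" figure. The enum, the function names, and the gen parameter are hypothetical; the driver's real flags and structures are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer categories for this sketch. */
enum buffer_kind { BUF_GENERIC, BUF_INSTRUCTION, BUF_SCRATCH, BUF_VERTEX };

/* Default to 48-bit on gen8+, except for the objects that must stay in
 * the low 4GB. */
static bool
supports_48bit(int gen, enum buffer_kind kind)
{
   if (gen < 8)
      return false;
   return kind == BUF_GENERIC;
}

/* 2^32 bytes = 4 GiB, 2^48 bytes = 256 TiB. */
static uint64_t
address_space_size(bool use_48bit)
{
   return use_48bit ? (UINT64_C(1) << 48) : (UINT64_C(1) << 32);
}
```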

Thanks to Jason Ekstrand for blazing this trail in anv first and
finding the nasty undocumented hardware issues.  This patch simply
rips off all of his findings.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: Shorten the name of the workaround BO.
Kenneth Graunke [Mon, 26 Feb 2018 23:51:04 +0000 (15:51 -0800)]
i965: Shorten the name of the workaround BO.

This makes the name shorter in debug printouts.  If "workaround_bo"
is good enough for the code, it's probably good enough for debugging.

6 years agoi965: Add debugging code to dump the validation list.
Kenneth Graunke [Tue, 28 Nov 2017 18:07:43 +0000 (10:07 -0800)]
i965: Add debugging code to dump the validation list.

When anything goes wrong with this code, dumping the validation list
is a useful way to figure out what's happening.

6 years agointel/fs: Set up sampler message headers in the visitor on gen7+
Jason Ekstrand [Thu, 1 Mar 2018 03:57:44 +0000 (19:57 -0800)]
intel/fs: Set up sampler message headers in the visitor on gen7+

This gives the scheduler visibility into the headers which should
improve scheduling.  More importantly, however, it lets the scheduler
know that the header gets written.  As-is, the scheduler thinks that a
texture instruction only reads its payload and is unaware that it may
write to the first register so it may reorder it with respect to a read
from that register.  This is causing issues in a couple of Dota 2 vertex
shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agoac: fix nir_intrinsic_shared_atomic_comp_swap handling
Timothy Arceri [Thu, 1 Mar 2018 09:17:38 +0000 (20:17 +1100)]
ac: fix nir_intrinsic_shared_atomic_comp_swap handling

Following on from 49879f377870 this makes sure we use the correct
src index.

Fixes cts test:
KHR-GL46.compute_shader.atomic-case3

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agost/glsl_to_nir: simplify st_nir_assign_var_locations() and fix for fs outputs
Timothy Arceri [Thu, 1 Mar 2018 02:39:20 +0000 (13:39 +1100)]
st/glsl_to_nir: simplify st_nir_assign_var_locations() and fix for fs outputs

We only need to check for previously processed location on user
defined varyings as they are the only ones that support component
packing. Therefore a single instance of processed_locs can be
shared by regular varyings and patches.

For simplicity we make processed_locs an array in order to handle
dual source blending.

Fixes the following piglit test on radeonsi:
tests/spec/arb_enhanced_layouts/execution/component-layout/fs-output.shader_test

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoanv: Enable MSAA fast-clears
Jason Ekstrand [Sat, 24 Feb 2018 05:12:50 +0000 (21:12 -0800)]
anv: Enable MSAA fast-clears

This speeds up the Sascha Willems multisampling demo by around 25% when
using 8x or 16x MSAA.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/cmd_buffer: Add support for MCS fast-clears and resolves
Jason Ekstrand [Sat, 24 Feb 2018 05:12:35 +0000 (21:12 -0800)]
anv/cmd_buffer: Add support for MCS fast-clears and resolves

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/cmd_buffer: Add helpers for computing resolve predicates
Jason Ekstrand [Sat, 24 Feb 2018 05:00:52 +0000 (21:00 -0800)]
anv/cmd_buffer: Add helpers for computing resolve predicates

We'll want to re-use the complex resolve predicate computations for MCS
resolves so it's nice to have them as helper functions.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/cmd_buffer: Handle MCS identical to CCS_E in compute_aux_usage
Jason Ekstrand [Sat, 24 Feb 2018 04:45:26 +0000 (20:45 -0800)]
anv/cmd_buffer: Handle MCS identical to CCS_E in compute_aux_usage

This doesn't actually do anything because att_state->fast_clear is
determined based on the return value of anv_layout_to_fast_clear_type
which currently returns NONE for multisampled images.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/blorp: Pass the clear address to blorp for subpass MSAA resolves
Jason Ekstrand [Sat, 24 Feb 2018 05:11:58 +0000 (21:11 -0800)]
anv/blorp: Pass the clear address to blorp for subpass MSAA resolves

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/blorp: Allow indirect clear colors on blorp sources on gen7
Jason Ekstrand [Sat, 24 Feb 2018 06:05:39 +0000 (22:05 -0800)]
anv/blorp: Allow indirect clear colors on blorp sources on gen7

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoanv/blorp: Add partial clear support to anv_image_mcs_op
Jason Ekstrand [Sat, 11 Nov 2017 22:32:21 +0000 (14:32 -0800)]
anv/blorp: Add partial clear support to anv_image_mcs_op

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agointel/blorp: Add indirect clear color support to mcs_partial_resolve
Jason Ekstrand [Sat, 11 Nov 2017 22:28:17 +0000 (14:28 -0800)]
intel/blorp: Add indirect clear color support to mcs_partial_resolve

This is a bit complicated because we have to get the indirect clear
color in there somehow.  In order to not do any more work in the shader
than needed, we set it up as its own vertex binding which points
directly at the clear color address specified by the client.

Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
6 years agointel/blorp: Add a helper for filling out VERTEX_BUFFER_STATE
Jason Ekstrand [Sat, 11 Nov 2017 21:40:03 +0000 (13:40 -0800)]
intel/blorp: Add a helper for filling out VERTEX_BUFFER_STATE

There are enough #ifs in there that it's kind-of pointless to duplicate
it for each buffer.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoi965: Fix RELOC_WRITE typo in brw_store_data_imm64()
Andriy Khulap [Thu, 1 Mar 2018 08:44:28 +0000 (10:44 +0200)]
i965: Fix RELOC_WRITE typo in brw_store_data_imm64()

Fixes: 6c530ad11605
("i965: Reduce passing 2x32b of reloc_domains to 2 bits")

Signed-off-by: Andriy Khulap <andriy.khulap@globallogic.com>
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agogallium/util: use sockets on PIPE_OS_UNIX in u_network
Jonathan Gray [Wed, 28 Feb 2018 10:21:14 +0000 (21:21 +1100)]
gallium/util: use sockets on PIPE_OS_UNIX in u_network

Instead of listing all the UNIX PIPE_OS platforms just use
PIPE_OS_UNIX.  Makes BSD sockets available on PIPE_OS_BSD.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agoutil: use clock_gettime() on PIPE_OS_BSD
Jonathan Gray [Wed, 28 Feb 2018 10:19:19 +0000 (21:19 +1100)]
util: use clock_gettime() on PIPE_OS_BSD

OpenBSD, FreeBSD, NetBSD and DragonFlyBSD all have clock_gettime()
so use it when PIPE_OS_BSD is defined.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years agonir/search: Include 8 and 16-bit support in construct_value
Jose Maria Casanova Crespo [Thu, 1 Mar 2018 17:06:52 +0000 (18:06 +0100)]
nir/search: Include 8 and 16-bit support in construct_value

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agonir/search: Support 8 and 16-bit constants in match_value
Jason Ekstrand [Wed, 28 Feb 2018 21:15:04 +0000 (13:15 -0800)]
nir/search: Support 8 and 16-bit constants in match_value

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
6 years agotravis: make Meson find the proper llvm-config
Andres Gomez [Wed, 28 Feb 2018 21:18:59 +0000 (23:18 +0200)]
travis: make Meson find the proper llvm-config

Travis CI has moved to LLVM 5.0, and meson is detecting automatically
the available version in /usr/local/bin based on the PATH env variable
order preference.

As of 0.44.x, Meson cannot receive the path to the llvm-config binary
as a configuration parameter. See
https://github.com/mesonbuild/meson/issues/2887 and
https://github.com/dcbaker/meson/commit/7c8b6ee3fa42f43c9ac7dcacc61a77eca3f1bcef

We want to use the custom (APT) installed version. Therefore, let's
make Meson find our wanted version sooner than the one at
/usr/local/bin

Once this is corrected, we would still need a patch similar to:
https://lists.freedesktop.org/archives/mesa-dev/2017-December/180217.html

v2: Create the link only to the specifically wanted LLVM version (Gert).

Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Gert Wollny <gw.fossdev@gmail.com>
Cc: Jon Turney <jon.turney@dronecode.org.uk>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agomeson: fix LLVM version detection when <= 3.4
Andres Gomez [Wed, 28 Feb 2018 21:15:07 +0000 (23:15 +0200)]
meson: fix LLVM version detection when <= 3.4

Three-digit LLVM versions only started with 3.4.1.

Hence, even if you can perfectly build with an old LLVM (< 3.4.1) in
the system while not needing LLVM at all (auto), when passing through
the LLVM version detection code, meson will fail when accessing
"_llvm_version[2]" due to:

"Index 2 out of bounds of array of size 2."

v2: Properly compare LLVM version and set patch version to 0
    if < 3.4.1 (Eric).

v3: Improve the commit log explanation (Eric).
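
The fix itself lives in the meson build files; the same idea, defaulting a missing patch component to 0 instead of indexing past the end of a two-element version, can be sketched in C. The struct and function names are hypothetical.

```c
#include <stdio.h>

struct version {
   int major, minor, patch;
};

/* Parse "major.minor" or "major.minor.patch".  A missing patch component
 * is treated as 0.  Returns 0 on success, -1 if fewer than two
 * components could be parsed. */
static int
parse_version(const char *s, struct version *v)
{
   v->major = v->minor = v->patch = 0;
   int n = sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->patch);
   return n >= 2 ? 0 : -1;
}
```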

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years agoi965/sbe: fix number of inputs for active components
Iago Toral Quiroga [Thu, 1 Mar 2018 06:59:42 +0000 (07:59 +0100)]
i965/sbe: fix number of inputs for active components

In 16631ca30ea6 we fixed gen9 active components to account for padded
inputs in the URB, which we can have with SSO programs. To do that,
instead of going through the bitfield of inputs (which doesn't include
padding information), we compute the number of inputs from the size
of the URB entry.

Unfortunately, there are some special inputs that are not stored in
the URB and that we also need to account for. These special inputs
are identified and handled during calculate_attr_overrides().

Instead of keeping track of the exact number of inputs, we just
program active components for all possible inputs like we do in
anvil.

This fixes a regression in a WebGL program that uses Point Sprite
functionality (specifically, VARYING_SLOT_PNTC).

v2:
 - Add 'Fixes' tag (Mark Janes)
 - make no_vue_inputs int instead of uint32_t, and add const qualifier
   to num_inputs variable (Ian)

v3:
 - Do not try to count inputs correctly, just program all input
   slots like we do in anvil (Ken)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs with over 16 inputs)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoradv: only emit cache flushes when the pool size is large enough
Samuel Pitoiset [Wed, 28 Feb 2018 19:28:53 +0000 (20:28 +0100)]
radv: only emit cache flushes when the pool size is large enough

This is an optimization which reduces the number of flushes for
small pool buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: keep track of the query pool size
Samuel Pitoiset [Wed, 28 Feb 2018 19:22:29 +0000 (20:22 +0100)]
radv: keep track of the query pool size

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: make sure to emit cache flushes before starting a query
Samuel Pitoiset [Wed, 28 Feb 2018 20:47:11 +0000 (21:47 +0100)]
radv: make sure to emit cache flushes before starting a query

This is needed if the query pool was previously reset using the
compute shader path.

Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir/serialize: handle var->name being NULL
Alejandro Piñeiro [Wed, 28 Feb 2018 12:01:56 +0000 (13:01 +0100)]
nir/serialize: handle var->name being NULL

var->name could be NULL under ARB_gl_spirv for example. And in any
case, the code is already handling the var name being NULL when reading
a variable, so it is consistent to do so when writing a variable too.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoanv: Enable VK_KHR_16bit_storage for PushConstant
Jose Maria Casanova Crespo [Fri, 23 Feb 2018 00:15:13 +0000 (01:15 +0100)]
anv: Enable VK_KHR_16bit_storage for PushConstant

Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agospirv/i965/anv: Relax push constant offset assertions being 32-bit aligned
Jose Maria Casanova Crespo [Tue, 20 Feb 2018 09:28:41 +0000 (10:28 +0100)]
spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets can be multiples of 2 bytes. Some assertions are
updated so that offsets only need to be a multiple of the size of the
base type, but in some cases we cannot assume even that, as doubles are
not always aligned to 8 bytes.

For 16-bit types, the push constant offset takes into account the
internal offset in the 32-bit uniform bucket, adding 2 bytes when we
access elements that are not 32-bit aligned. In all 32-bit aligned cases
it is just 0.
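
The offset split described above amounts to the following arithmetic. This is a sketch with a hypothetical helper name, not the backend code.

```c
/* Split a push constant byte offset into the index of the 32-bit uniform
 * slot and the byte offset inside that slot.  For 16-bit elements the
 * inner offset is either 0 (32-bit aligned) or 2. */
static void
split_push_offset(unsigned byte_offset, unsigned *dword_index,
                  unsigned *offset_in_dword)
{
   *dword_index = byte_offset / 4;
   *offset_in_dword = byte_offset % 4;
}
```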

v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agospirv: Calculate properly 16-bit vector sizes
Jose Maria Casanova Crespo [Thu, 22 Feb 2018 16:36:37 +0000 (17:36 +0100)]
spirv: Calculate properly 16-bit vector sizes

Range in 16-bit push constants load was being calculated
wrongly using 4-bytes per element instead of 2-bytes as it
should be.
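
In other words, the range has to be derived from the element's bit size rather than a fixed 4 bytes. A sketch with a hypothetical helper name:

```c
/* Size in bytes of a vector with the given component count and
 * per-component bit size (16, 32 or 64). */
static unsigned
vector_size_bytes(unsigned num_components, unsigned bit_size)
{
   return num_components * (bit_size / 8);
}
```

With this, a 3-component 16-bit vector covers 6 bytes, where the old 4-bytes-per-element assumption would have given 12.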

v2: Use glsl_get_bit_size instead of if statement
    (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoanv: Enable VK_KHR_16bit_storage for SSBO and UBO
Jose Maria Casanova Crespo [Mon, 20 Nov 2017 22:28:45 +0000 (23:28 +0100)]
anv: Enable VK_KHR_16bit_storage for SSBO and UBO

Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess
features of VK_KHR_16bit_storage for Gen8+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
Jose Maria Casanova Crespo [Wed, 31 Jan 2018 23:26:04 +0000 (00:26 +0100)]
i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.

Taking into account that VK_KHR_relaxed_block_layout is available
in ANV we can only guarantee that when we have a constant offset
that is multiple of 4. For non constant offsets we will always use
byte_scattered_write.
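
The selection rule can be sketched as below. Only the offset condition from the text is modelled; the enum and function are hypothetical, and any other constraints the backend applies are left out.

```c
#include <stdbool.h>

/* Hypothetical labels for the two message types named in the text. */
enum write_msg { UNTYPED_SURFACE_WRITE, BYTE_SCATTERED_WRITE };

/* Pick the message for a 16-bit SSBO store: the untyped write is only
 * safe when the offset is a known constant that is a multiple of 4. */
static enum write_msg
pick_16bit_write(bool offset_is_constant, unsigned offset)
{
   if (offset_is_constant && offset % 4 == 0)
      return UNTYPED_SURFACE_WRITE;
   return BYTE_SCATTERED_WRITE;
}
```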

v2: (Jason Ekstrand)
    - Assert offset_reg to be multiple of 4 if it is immediate.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout
Jose Maria Casanova Crespo [Wed, 31 Jan 2018 23:05:11 +0000 (00:05 +0100)]
i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiples of 4 bytes, as required by the
untyped_read message. This happens, for example, with f16mat3x3 when
VK_KHR_relaxed_block_layout is enabled.

Vector reads with non-constant offsets are implemented with multiple
byte_scattered_read messages, which do not require 32-bit aligned offsets.

Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32-bits, we calculate a
start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.
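
For the constant-offset case, the arithmetic can be sketched as follows: align the start down to 32 bits, align the end up, read (end - start) / 4 dwords, and skip the leading 16-bit components. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct read_plan {
   uint32_t start;            /* 32-bit aligned byte offset to read from */
   uint32_t num_dwords;       /* number of 32-bit components to read */
   uint32_t first_component;  /* 16-bit components to skip in the result */
};

/* Plan an aligned read covering num_components 16-bit values that start
 * at byte offset "offset" (assumed to be a multiple of 2). */
static struct read_plan
plan_16bit_read(uint32_t offset, uint32_t num_components)
{
   struct read_plan p;
   uint32_t end = (offset + num_components * 2 + 3) & ~3u;

   p.start = offset & ~3u;
   p.num_dwords = (end - p.start) / 4;
   p.first_component = (offset - p.start) / 2;
   return p;
}
```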

v2: (Jason Ekstrand)
    Use untyped_read_surface messages whenever we have constant offsets.

v3: (Jason Ekstrand)
    Simplify loop for reads with non constant offsets.
    Use end - start to calculate the number of 32-bit components to read with
    constant offsets.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/fs: shuffle_32bit_load_result_to_16bit_data now skips components
Jose Maria Casanova Crespo [Mon, 26 Feb 2018 19:28:34 +0000 (20:28 +0100)]
i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components

This helper, used to load 16-bit components from 32-bit reads, now allows
skipping components with the new parameter first_component. It skips
components until it reaches first_component, and then reads the number
of components passed to the function.

All previous uses of the helper are updated to use 0 as first_component.
This allows reading 16-bit components when the first one is not 32-bit
aligned, enabling more uses of untyped_reads with 16-bit types.
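
A CPU-side analogy of the new semantics, assuming each 32-bit word holds two 16-bit components with the lower-numbered one in the low half. The real helper emits GPU instructions; this function and its name are only illustrative.

```c
#include <stdint.h>

/* Copy num_components 16-bit values out of packed 32-bit words, starting
 * at 16-bit component index first_component. */
static void
unpack_16bit(const uint32_t *src, uint16_t *dst,
             unsigned first_component, unsigned num_components)
{
   for (unsigned i = 0; i < num_components; i++) {
      unsigned c = first_component + i;
      dst[i] = (uint16_t)(src[c / 2] >> (16 * (c % 2)));
   }
}
```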

v2: (Jason Ekstrand)
    Change parameters order to first_component, num_components

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoisl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit
Jose Maria Casanova Crespo [Tue, 30 Jan 2018 08:59:34 +0000 (09:59 +0100)]
isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

The surfaces that back the GPU buffers have a bounds check that
treats accesses to partial dwords as out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and
the last two bytes would always be read as 0 or writes to them ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-byte multiples so that partial dwords can be read and written.
Adding an unconditional +2 to the size of buffers that are not a multiple
of 4 solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = isl_align(buffer_size, 4) +
               (isl_align(buffer_size, 4) - buffer_size)

So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)
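
The two expressions above can be checked as a round trip with a small C
sketch. The function names are hypothetical and align_u32 stands in for
isl_align; only the arithmetic is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Encode the padding amount in the two low bits of the surface size. */
static uint32_t surface_size_from_buffer(uint32_t buffer_size)
{
   uint32_t aligned = align_u32(buffer_size, 4);
   return aligned + (aligned - buffer_size);
}

/* Recover the original buffer size from the padded surface size. */
static uint32_t buffer_size_from_surface(uint32_t surface_size)
{
   return (surface_size & ~3u) - (surface_size & 3u);
}
```

For a 6-byte buffer the surface size is 8 + 2 = 10, and decoding gives
8 - 2 = 6 again; sizes that are already multiples of 4 are unchanged.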

These buffer requirements are also exposed when robust buffer access
is enabled, so these buffer sizes are recommended to be a multiple of 4.

v2: (Jason Ekstrand)
    Move padding logic from anv to isl_surface_state.
    Move calculation of original size from spirv to driver backend.
v3: (Jason Ekstrand)
    Rename some variables and use a similar expression when calculating
    padding as when obtaining the original buffer size.
    Avoid use of unnecessary component call in brw_fs_nir.
v4: (Jason Ekstrand)
    Complete comment with buffer size calculation explanation in brw_fs_nir.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agovbo: Remove vbo_save_vertex_list::vertex_size.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::vertex_size.

As before, use local variables from compile_vertex_list instead.
Remove vertex_size from struct vbo_save_vertex_list.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove vbo_save_vertex_list::buffer_offset.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::buffer_offset.

The buffer_offset is used in aligned_vertex_buffer_offset.
But now that most of these decisions are done in compile_vertex_list
we can work on local variables instead of struct members in the
display list code. Clean that up and remove buffer_offset.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove vbo_save_vertex_list::start_vertex.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::start_vertex.

Replace last use on replay with _vbo_save_get_{min,max}_index. Apart from
that it is not used anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove vbo_save_vertex_list::attrsz.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::attrsz.

Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove vbo_save_vertex_list::attrtype.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::attrtype.

Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove vbo_save_vertex_list::enabled.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove vbo_save_vertex_list::enabled.

Is not used anymore on replay.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove reference to the vertex_store from the dlist node.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove reference to the vertex_store from the dlist node.

Since we now store a set of VAOs in the display list, use these objects
to get the reference to the VBO in several places.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Implement current values update in terms of the VAO.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Implement current values update in terms of the VAO.

Use the information already present in the VAO to update the current values
after display list replay. Set GL_OUT_OF_MEMORY on allocation failure
for the current value update storage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Implement vbo_loopback_vertex_list in terms of the VAO.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Implement vbo_loopback_vertex_list in terms of the VAO.

Use the information already present in the VAO to replay a display list
node using immediate mode draw commands. Use a handful of helper methods
that will also be useful for the next patches.

v2: Insert asserts, constify local variables.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Use a local variable for the dlist offsets.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Use a local variable for the dlist offsets.

The master value is now stored inside the VAO already present in
struct vbo_save_vertex_list. Remove the unneeded copy from dlist storage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove unused vbo_save_context::wrap_count.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove unused vbo_save_context::wrap_count.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agovbo: Remove unused vbo_save_vertex_list::dangling_attr_ref.
Mathias Fröhlich [Sun, 25 Feb 2018 17:01:07 +0000 (18:01 +0100)]
vbo: Remove unused vbo_save_vertex_list::dangling_attr_ref.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agoanv: Always set has_context_priority
Jason Ekstrand [Wed, 28 Feb 2018 23:25:48 +0000 (15:25 -0800)]
anv: Always set has_context_priority

We don't zalloc the physical device so we need to unconditionally set
everything.  Crucible helpfully initializes all allocations to 139 so it
was getting true regardless of whether or not the kernel actually
supports context priorities.
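
A minimal C sketch of why the unconditional assignment matters. Everything
here is a hypothetical stand-in (fake_device, alloc_poisoned and the init
functions are not anv code), and the conditional pattern is only an assumed
shape of the bug. The flag is an unsigned char so that reading the poisoned
byte is well defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the physical device struct. */
struct fake_device {
   unsigned char has_context_priority;
};

/* Mimic an allocator that fills new memory with 139 instead of zero. */
static struct fake_device *alloc_poisoned(void)
{
   struct fake_device *dev = malloc(sizeof(*dev));
   if (dev)
      memset(dev, 139, sizeof(*dev));
   return dev;
}

/* Fragile pattern: only assign the flag when the feature is present. */
static void init_conditional(struct fake_device *dev, int kernel_supports)
{
   if (kernel_supports)
      dev->has_context_priority = 1;
}

/* Robust pattern: always assign, so stale memory never leaks through. */
static void init_unconditional(struct fake_device *dev, int kernel_supports)
{
   dev->has_context_priority = kernel_supports ? 1 : 0;
}
```

With the poisoned allocation, init_conditional(dev, 0) leaves the flag
non-zero, while init_unconditional(dev, 0) clears it.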

Fixes: 6d8ab53303331 "anv: implement VK_EXT_global_priority extension"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoRevert "i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+"
Mark Janes [Thu, 1 Mar 2018 01:26:08 +0000 (17:26 -0800)]
Revert "i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+"

This reverts commit a2c1e48f15995a826dc759e064c2603882a37e0c.

On BDWGT3e and KBLGT3e systems, this commit regressed the following
tests:

  piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil

6 years agoradeonsi/nir: increase values to 8 for gs fetch.
Dave Airlie [Thu, 1 Mar 2018 00:01:33 +0000 (10:01 +1000)]
radeonsi/nir: increase values to 8 for gs fetch.

This stops a crash when running (still fails):
tests/spec/arb_gpu_shader_fp64/execution/explicit-location-gs-fs-vs.shader_test

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Use the syncobj wait ioctl to wait on fences if possible.
Bas Nieuwenhuizen [Mon, 26 Feb 2018 20:52:49 +0000 (21:52 +0100)]
radv: Use the syncobj wait ioctl to wait on fences if possible.

Handles the !waitAll and signal after the start of the wait cases correctly.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Implement more efficient !waitAll fence waiting.
Bas Nieuwenhuizen [Mon, 26 Feb 2018 22:48:27 +0000 (23:48 +0100)]
radv: Implement more efficient !waitAll fence waiting.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Implement waiting on non-submitted fences.
Bas Nieuwenhuizen [Mon, 26 Feb 2018 21:54:06 +0000 (22:54 +0100)]
radv: Implement waiting on non-submitted fences.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Implement WaitForFences with !waitAll.
Bas Nieuwenhuizen [Mon, 26 Feb 2018 21:50:41 +0000 (22:50 +0100)]
radv: Implement WaitForFences with !waitAll.

Nothing to do except using a busy wait loop. At least for old kernels.

A better implementation for newer kernels to come later.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoac/nir: fix shared atomic operations.
Dave Airlie [Wed, 28 Feb 2018 23:38:19 +0000 (09:38 +1000)]
ac/nir: fix shared atomic operations.

The nir->llvm conversion was using the wrong srcs.

Fixes:
tests/spec/arb_compute_shader/execution/shared-atomics.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoac/nir: don't apply slice rounding on txf_ms
Dave Airlie [Wed, 28 Feb 2018 23:24:01 +0000 (09:24 +1000)]
ac/nir: don't apply slice rounding on txf_ms

This matches the tgsi code.

Fixes arb_texture_multisample texelFetch piglit tests.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7914 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoradeonsi: set some context vars for nir path
Timothy Arceri [Tue, 13 Feb 2018 02:06:51 +0000 (13:06 +1100)]
radeonsi: set some context vars for nir path

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: remove llvm from ir struct
Timothy Arceri [Thu, 1 Feb 2018 21:50:09 +0000 (08:50 +1100)]
gallium: remove llvm from ir struct

This was added in 425dc4c4b366 but never used. Also since
100796c15c3a native has superseded llvm.

Acked-by: Dave Airlie <airlied@redhat.com>
6 years agoi965: Don't emit MOVs with undefined registers for Gen4 point clipping.
Kenneth Graunke [Wed, 28 Feb 2018 21:22:22 +0000 (13:22 -0800)]
i965: Don't emit MOVs with undefined registers for Gen4 point clipping.

Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0,
which means that c->reg.vertex[] isn't initialized.  It then emits MOVs
to stomp components of those uninitialized registers to 0.

This started causing assertions after Matt's recent series, when those
uninitialized registers started getting BRW_REGISTER_TYPE_NF, which
definitely doesn't exist on Gen4-5.

Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agobroadcom/vc5: Fix regression in the page-cache slice size alignment.
Eric Anholt [Fri, 23 Feb 2018 23:35:25 +0000 (15:35 -0800)]
broadcom/vc5: Fix regression in the page-cache slice size alignment.

We need to align the size of the slice, not the offset of the next slice.
Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge.
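
The difference between aligning the slice size and aligning the offset of
the next slice can be seen in a small C sketch. The function names and the
4096-byte page value used in the example are assumptions for illustration,
not values taken from the vc5 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Aligning the size: the slice always spans a whole number of pages. */
static uint32_t slice_size_aligned(uint32_t raw_size, uint32_t page)
{
   return align_u32(raw_size, page);
}

/* Aligning the next offset instead: the size is whatever is left over,
 * which is page-sized only if the slice happened to start on a page. */
static uint32_t slice_size_from_next_offset(uint32_t offset,
                                            uint32_t raw_size, uint32_t page)
{
   return align_u32(offset + raw_size, page) - offset;
}
```

For a 5000-byte slice starting at offset 64, aligning the size gives 8192
bytes, while aligning the next offset leaves a slice of 8128 bytes.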

Fixes: b4b4ada7616d ("broadcom/vc5: Fix layout of 3D textures.")
6 years agoi965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+
Jason Ekstrand [Fri, 3 Nov 2017 17:36:32 +0000 (10:36 -0700)]
i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: Be more clever about setting up our viewport clip
Jason Ekstrand [Fri, 3 Nov 2017 21:13:08 +0000 (14:13 -0700)]
i965: Be more clever about setting up our viewport clip

Before, we were trusting in the hardware to take the intersection
of the viewport clip with the drawing rectangle.  Unfortunately,
3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly
does a full pipeline stall.  If we're a bit more careful with our
viewport clipping, we can just re-emit it once at context creation
time.
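
Doing the intersection in software amounts to clamping the viewport
extents to the framebuffer before programming the clip. The C sketch below
shows that idea only; struct rect and clip_viewport are hypothetical names
and do not reflect the actual i965 state setup.

```c
#include <assert.h>

/* Half-open rectangle: [x0, x1) by [y0, y1). */
struct rect {
   int x0, y0, x1, y1;
};

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

/* Intersect the viewport with a framebuffer anchored at the origin. */
static struct rect clip_viewport(struct rect vp, int fb_width, int fb_height)
{
   assert(fb_width >= 0 && fb_height >= 0);

   struct rect r;
   r.x0 = max_i(vp.x0, 0);
   r.y0 = max_i(vp.y0, 0);
   r.x1 = min_i(vp.x1, fb_width);
   r.y1 = min_i(vp.y1, fb_height);

   /* Collapse to an empty rectangle if there is no overlap. */
   if (r.x1 < r.x0)
      r.x1 = r.x0;
   if (r.y1 < r.y0)
      r.y1 = r.y0;
   return r;
}
```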

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Re-add .vs_inputs_dual_locations = true
Matt Turner [Wed, 28 Feb 2018 21:25:21 +0000 (13:25 -0800)]
intel/compiler: Re-add .vs_inputs_dual_locations = true

Looks like a rebase mistake.

Fixes: 89fe5190a256 ("intel/compiler: Lower flrp32 on Gen11+")
6 years agor600/shader: when using images always load thread id gpr at start (v2)
Dave Airlie [Wed, 28 Feb 2018 06:42:53 +0000 (06:42 +0000)]
r600/shader: when using images always load thread id gpr at start (v2)

The delayed loading code would fail if we had control flow.

This fixes:
tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test

v2: don't use temp_reg before setting temp_reg up.

Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agor600: fix whitespace in recent 1d texture commit.
Dave Airlie [Wed, 28 Feb 2018 20:15:30 +0000 (20:15 +0000)]
r600: fix whitespace in recent 1d texture commit.

trivial fix.

6 years agointel/compiler: Add ICL to test_eu_validate.cpp
Matt Turner [Mon, 29 Jan 2018 23:52:39 +0000 (15:52 -0800)]
intel/compiler: Add ICL to test_eu_validate.cpp

With the Align16 tests now disabled, we can run the rest of the tests in
ICL mode (and see them pass!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Disable Align16 tests on Gen11+
Matt Turner [Thu, 8 Feb 2018 18:23:11 +0000 (10:23 -0800)]
intel/compiler: Disable Align16 tests on Gen11+

Align16 is no more.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Add instruction compaction support on Gen11
Matt Turner [Wed, 14 Jun 2017 23:43:05 +0000 (16:43 -0700)]
intel/compiler: Add instruction compaction support on Gen11

Gen11 only differs from SKL+ in that it uses a new datatype index table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Mark line, pln, and lrp as removed on Gen11+
Matt Turner [Wed, 14 Jun 2017 23:14:11 +0000 (16:14 -0700)]
intel/compiler: Mark line, pln, and lrp as removed on Gen11+

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Lower flrp32 on Gen11+
Matt Turner [Wed, 14 Jun 2017 23:20:41 +0000 (16:20 -0700)]
intel/compiler: Lower flrp32 on Gen11+

The LRP instruction is no more.
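
For reference, a lowered flrp is just the linear interpolation written out
with ordinary arithmetic. The C sketch below uses the NIR operand
convention flrp(x, y, a) = x * (1 - a) + y * a; the function name is
hypothetical, and the exact instruction sequence the compiler emits is not
shown here.

```c
#include <assert.h>

/* Linear interpolation between x and y with weight a, written without a
 * dedicated LRP instruction: one subtract, two multiplies and one add. */
static float lowered_flrp(float x, float y, float a)
{
   assert(a == a);               /* reject a NaN weight in this sketch */
   return x * (1.0f - a) + y * a;
}
```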

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Implement ddy without using align16 for Gen11+
Matt Turner [Fri, 16 Jun 2017 00:29:16 +0000 (17:29 -0700)]
intel/compiler/fs: Implement ddy without using align16 for Gen11+

Align16 is no more. We previously generated an align16 ADD instruction
to calculate DDY:

   add(16) g25<1>F  -g23<4>.xyxyF   g23<4>.zwzwF   { align16 1H };

Without align16, we now implement it as:

   add(4) g25<1>F   -g23<0,2,1>F    g23.2<0,2,1>F  { align1 1N };
   add(4) g25.4<1>F -g23.4<0,2,1>F  g23.6<0,2,1>F  { align1 1N };
   add(4) g26<1>F   -g24<0,2,1>F    g24.2<0,2,1>F  { align1 1N };
   add(4) g26.4<1>F -g24.4<0,2,1>F  g24.6<0,2,1>F  { align1 1N };

where only the first two instructions are needed in SIMD8 mode.

Note: an earlier version of the patch implemented this in two
instructions in SIMD16:

   add(8) g25<2>F   -g23<4,2,0>F    g23.2<4,2,0>F  { align1 1N };
   add(8) g25.1<2>F -g23.1<4,2,0>F  g23.3<4,2,0>F  { align1 1N };

but I realized that the channel enable bits will not be correct. If we
knew we were under uniform control flow, however, we could emit only
those two instructions.
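
The <0,2,1> regions used by the four add(4) instructions above can be
emulated on the CPU to see that they reproduce the .xyxy/.zwzw swizzles.
This C sketch models a register as a flat array of floats; region_read and
ddy_add4 are hypothetical helpers, not compiler code.

```c
#include <assert.h>

/* Read channel i of an align1 region <vstride,width,hstride> that starts
 * at element index start of a flat array of floats. */
static float region_read(const float *grf, unsigned start,
                         unsigned vstride, unsigned width, unsigned hstride,
                         unsigned i)
{
   assert(width != 0);
   return grf[start + (i / width) * vstride + (i % width) * hstride];
}

/* Emulate: add(4) dst<1>F  -src<0,2,1>F  src.2<0,2,1>F */
static void ddy_add4(float *dst, const float *grf, unsigned src)
{
   for (unsigned i = 0; i < 4; i++) {
      dst[i] = -region_read(grf, src, 0, 2, 1, i) +
                region_read(grf, src + 2, 0, 2, 1, i);
   }
}
```

With a vertical stride of 0 and a width of 2, channels 0..3 read elements
0, 1, 0, 1 of the region, so each add(4) covers one group of four values
and yields (z - x, w - y, z - x, w - y).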

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Simplify ddx/ddy code generation
Matt Turner [Fri, 16 Jun 2017 00:20:29 +0000 (17:20 -0700)]
intel/compiler/fs: Simplify ddx/ddy code generation

The brw_reg() constructor just obfuscates things here, in my opinion.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode
Matt Turner [Thu, 15 Jun 2017 22:41:40 +0000 (15:41 -0700)]
intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode

In a future patch, generate_ddy will want to inspect inst->exec_size.
Change generate_ddx as well for consistency.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Don't generate integer DWord multiply on Gen11
Matt Turner [Mon, 23 Oct 2017 17:44:39 +0000 (10:44 -0700)]
intel/compiler/fs: Don't generate integer DWord multiply on Gen11

Like CHV et al., Gen11 does not support 32x32 -> 32/64-bit integer
multiplies.
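
One way to build the low 32 bits of a 32x32 product without a full DWord
multiply is to split one operand into 16-bit halves. The C sketch below
shows that identity only; the commit text does not spell out the sequence
the compiler emits, so treat the function as an assumed illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a * b from two 32x16 partial products:
 *   a * b = a * b_lo + ((a * b_hi) << 16)   (mod 2^32)            */
static uint32_t mul32_via_16(uint32_t a, uint32_t b)
{
   uint32_t b_lo = b & 0xffffu;
   uint32_t b_hi = b >> 16;
   uint32_t lo = a * b_lo;          /* 32x16 partial product */
   uint32_t hi = a * b_hi;          /* 32x16 partial product */
   uint32_t result = lo + (hi << 16);

   assert(result == a * b);         /* matches the native multiply */
   return result;
}
```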

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+
Matt Turner [Wed, 14 Jun 2017 21:47:19 +0000 (14:47 -0700)]
intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+

The PLN instruction is no more. Its functionality is now implemented
using two MAD instructions with the new native-float type. Instead of

   pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F

we now have

   mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F
   mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F

... and in the case of SIMD8 only the first pair of MAD instructions is
used.
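
The arithmetic of the MAD pair can be written out in C. Here mad computes
src0 + src1 * src2, the plane coefficients a, b and c correspond to r10.4,
r10.5 and r10.7, and u and v are the per-channel values from the r4 and r5
registers. Modelling the :NF accumulator as a double is an assumption made
only to suggest its extra precision; linterp_via_mad is a hypothetical name.

```c
#include <assert.h>

/* Plane evaluation c + u * a + v * b as two chained multiply-adds. */
static float linterp_via_mad(float a, float b, float c, float u, float v)
{
   assert(a == a && b == b && c == c);      /* no NaN coefficients here */

   /* First MAD: acc0 = c + u * a, kept at higher precision. */
   double acc = (double)c + (double)u * (double)a;

   /* Second MAD: dst = acc0 + v * b, converted back to float. */
   return (float)(acc + (double)v * (double)b);
}
```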

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Return multiple_instructions_emitted from generate_linterp
Matt Turner [Wed, 14 Jun 2017 18:06:45 +0000 (11:06 -0700)]
intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp

If multiple instructions are emitted, special handling of things like
conditional mod and NoDDClr/NoDDChk needs to be performed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair
Matt Turner [Wed, 14 Jun 2017 21:47:19 +0000 (14:47 -0700)]
intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair

This isn't technically broken, but the next patch will make this
function report whether it generated multiple instructions, and that
information will be used to disable the application of conditional mod
by the generic code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Add Gen11+ native float type
Matt Turner [Wed, 14 Jun 2017 18:03:19 +0000 (11:03 -0700)]
intel/compiler: Add Gen11+ native float type

This new type exposes the additional precision offered by the
accumulator register and will be used in the next patch to implement the
functionality of the PLN instruction using a pair of MAD instructions.

One weird thing to note: align1 ternary instructions may only have an
accumulator in the dst or src1 normally, but when src0's type is :NF
the accumulator is read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Add Gen11 register types
Matt Turner [Fri, 25 Aug 2017 16:50:29 +0000 (09:50 -0700)]
intel/compiler: Add Gen11 register types

The hardware register types' encodings have changed on Gen11. Good thing
we have that superfluous looking brw_reg_type abstraction lying around!

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: Disable 64-bit extensions on platforms without 64-bit types
Matt Turner [Mon, 11 Dec 2017 21:59:13 +0000 (13:59 -0800)]
intel: Disable 64-bit extensions on platforms without 64-bit types

Gen11 does not support DF, Q, UQ types in hardware. As a result, we have
to disable some GL extensions until they can be reimplemented.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>