mesa.git
8 years agomesa/copyimage: report INVALID_VALUE for missing cube face
Dave Airlie [Thu, 2 Jun 2016 04:13:18 +0000 (14:13 +1000)]
mesa/copyimage: report INVALID_VALUE for missing cube face

The specs says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.

This fixes:
GL45-CTS.copy_image.non_existent_mipmap

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agomesa/copyimage: fix num samples check to handle renderbuffers.
Dave Airlie [Thu, 2 Jun 2016 03:41:28 +0000 (13:41 +1000)]
mesa/copyimage: fix num samples check to handle renderbuffers.

This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.

This fixes:
GL45-CTS.copy_image.invalid_target

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agofreedreno/a4xx: silence coverity warning
Rob Clark [Thu, 2 Jun 2016 15:47:11 +0000 (11:47 -0400)]
freedreno/a4xx: silence coverity warning

CID 1362451

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a3xx+a4xx: fix potential null ptr deref
Rob Clark [Thu, 2 Jun 2016 15:42:25 +0000 (11:42 -0400)]
freedreno/a3xx+a4xx: fix potential null ptr deref

Coverity spotted the a3xx case (not sure why not the a4xx).

CID 1362452

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/ir3: fix coverity warning
Rob Clark [Thu, 2 Jun 2016 15:19:43 +0000 (11:19 -0400)]
freedreno/ir3: fix coverity warning

CID 1362453

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/ir3: use nir_shader_get_entrypoint() helper
Rob Clark [Thu, 2 Jun 2016 15:13:26 +0000 (11:13 -0400)]
freedreno/ir3: use nir_shader_get_entrypoint() helper

Should also fix coverity warning: CID 1362454

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a4xx: fix incorrect enum type
Rob Clark [Thu, 2 Jun 2016 14:49:18 +0000 (10:49 -0400)]
freedreno/a4xx: fix incorrect enum type

a4xx has it's own enum, different from a2xx/a3xx.

Spotted by coverity: CID 13624581362459

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: fix coverity negative array index warning
Rob Clark [Thu, 2 Jun 2016 14:36:23 +0000 (10:36 -0400)]
freedreno: fix coverity negative array index warning

Never can happen, since query would not have been created in the first
place if pidx(query_type) return negative.  Lets let coverity realize
this.

CID 1362460

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: fix dereference before null check
Rob Clark [Thu, 2 Jun 2016 14:33:08 +0000 (10:33 -0400)]
freedreno: fix dereference before null check

ptr can actually never be null so just drop the check.

CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agogallium/util: remove u_staging
Rob Clark [Sun, 29 May 2016 16:25:32 +0000 (12:25 -0400)]
gallium/util: remove u_staging

Unused, and fixes a couple of coverity warnings: CID 13621711362170

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
8 years agofreedreno/a3xx: only update/emit bordercolor state when needed
Rob Clark [Wed, 1 Jun 2016 17:40:53 +0000 (13:40 -0400)]
freedreno/a3xx: only update/emit bordercolor state when needed

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a4xx: only update/emit bordercolor state when needed
Rob Clark [Wed, 1 Jun 2016 16:23:58 +0000 (12:23 -0400)]
freedreno/a4xx: only update/emit bordercolor state when needed

I noticed in stk that it was contributing to a lot of overhead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoi965: Add missing types to type_sz().
Matt Turner [Tue, 24 May 2016 22:10:25 +0000 (15:10 -0700)]
i965: Add missing types to type_sz().

Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years agomesa/extensions: Fix ES1 extension reporting
Nanley Chery [Tue, 24 May 2016 21:27:26 +0000 (14:27 -0700)]
mesa/extensions: Fix ES1 extension reporting

Commit eda15abd84af575d3bde432e2163e30d743a7c87 , unintentionally
advertised these extensions in ES1 contexts. Undo this error.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoegl: Check if API is supported when using eglBindAPI.
Plamena Manolova [Tue, 31 May 2016 16:32:38 +0000 (17:32 +0100)]
egl: Check if API is supported when using eglBindAPI.

According to the EGL specifications before binding an API
we must check whether it's supported first. If not eglBindAPI
should return EGL_FALSE and generate a EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agost/osmesa: remove double-write (overwriting)
Eric Engestrom [Tue, 31 May 2016 01:26:00 +0000 (19:26 -0600)]
st/osmesa: remove double-write (overwriting)

These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.

CoverityID: 1296205

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agost/vdpau: check for null pointer in get/put bits.
Nayan Deshmukh [Thu, 2 Jun 2016 06:41:58 +0000 (12:11 +0530)]
st/vdpau: check for null pointer in get/put bits.

Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
8 years agoradeon/uvd: fix the H264 level for Tonga v2
Christian König [Wed, 25 May 2016 14:55:48 +0000 (16:55 +0200)]
radeon/uvd: fix the H264 level for Tonga v2

We support 5.2 for a while now.

v2: we even support 5.2 for H264, 5.1 is for HEVC.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
8 years agomesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
Alejandro Piñeiro [Thu, 5 May 2016 09:27:05 +0000 (11:27 +0200)]
mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED

The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
Alejandro Piñeiro [Thu, 5 May 2016 09:28:37 +0000 (11:28 +0200)]
i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation

Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/eu: use simd8 when exec_size != EXECUTE_16
Alejandro Piñeiro [Wed, 1 Jun 2016 16:49:29 +0000 (18:49 +0200)]
i965/eu: use simd8 when exec_size != EXECUTE_16

Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoi965: Remove old CS local ID handling
Jordan Justen [Mon, 23 May 2016 05:31:06 +0000 (22:31 -0700)]
i965: Remove old CS local ID handling

The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Enable cross-thread constants and compact local IDs for hsw+
Jordan Justen [Tue, 31 May 2016 22:45:24 +0000 (15:45 -0700)]
i965: Enable cross-thread constants and compact local IDs for hsw+

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
 * Minimize size of patch that switches from the old local ID layout
   to the new layout (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoanv: Support new local ID generation & cross-thread constants
Jordan Justen [Fri, 27 May 2016 07:53:27 +0000 (00:53 -0700)]
anv: Support new local ID generation & cross-thread constants

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Support new local ID push constant & cross-thread constants
Jordan Justen [Mon, 23 May 2016 04:55:43 +0000 (21:55 -0700)]
i965: Support new local ID push constant & cross-thread constants

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add CS push constant info to brw_cs_prog_data
Jordan Justen [Mon, 23 May 2016 04:46:28 +0000 (21:46 -0700)]
i965: Add CS push constant info to brw_cs_prog_data

We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.

When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.

For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.

v4:
 * Support older local ID push constant layout as well. (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Store number of threads in brw_cs_prog_data
Jordan Justen [Thu, 26 May 2016 20:49:07 +0000 (13:49 -0700)]
i965: Store number of threads in brw_cs_prog_data

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add nir based intrinsic lowering and thread ID uniform
Jordan Justen [Sun, 22 May 2016 07:08:06 +0000 (00:08 -0700)]
i965: Add nir based intrinsic lowering and thread ID uniform

We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.

v2:
 * Create variable during lowering pass. (Ken)

v3:
 * Don't create a variable, but instead just insert an intrisic call
   to load a uniform from the allocated location. (Jason)

v4:
 * Don't run this pass if thread_local_id_index < 0

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Put CS local thread ID uniform in last push register
Jordan Justen [Mon, 23 May 2016 04:29:53 +0000 (21:29 -0700)]
i965: Put CS local thread ID uniform in last push register

This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)

v2:
 * Add variable in intrinsics lowering pass
 * Make sure the ID is pushed last in assign_constant_locations, and
   that we save a spot for the ID in the push constants

v3:
 * Simplify code based with Jason's suggestions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add uniform for a CS thread local base ID
Jordan Justen [Sun, 29 May 2016 06:45:21 +0000 (23:45 -0700)]
i965: Add uniform for a CS thread local base ID

v4:
 * Force thread_local_id_index to -1 for now, and have
   fs_visitor::setup_cs_payload look at thread_local_id_index. This
   enables us to more easily cut over from the old local ID layout to
   the new layout, as suggested by Jason.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add nir channel_num system value
Jordan Justen [Sun, 22 May 2016 23:33:44 +0000 (16:33 -0700)]
i965: Add nir channel_num system value

v2:
 * simd16/32 fixes (curro)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir: Make lowering gl_LocalInvocationIndex optional
Jordan Justen [Sun, 22 May 2016 22:54:48 +0000 (15:54 -0700)]
nir: Make lowering gl_LocalInvocationIndex optional

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoglsl: Add glsl LowerCsDerivedVariables option
Jordan Justen [Sat, 21 May 2016 21:21:32 +0000 (14:21 -0700)]
glsl: Add glsl LowerCsDerivedVariables option

v2:
 * Move lower flag to context constants. (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Copy the offset when lowering logical pull constant sends
Jason Ekstrand [Wed, 1 Jun 2016 22:01:04 +0000 (15:01 -0700)]
i965/fs: Copy the offset when lowering logical pull constant sends

This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoglsl/distance: make sure we use clip dist varying slot for lowered var.
Dave Airlie [Tue, 24 May 2016 20:03:24 +0000 (06:03 +1000)]
glsl/distance: make sure we use clip dist varying slot for lowered var.

When lowering, we always want to use the clip dist varying.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agowinsys/amdgpu: decay max_ib_size over time
Nicolai Hähnle [Sun, 8 May 2016 17:53:23 +0000 (12:53 -0500)]
winsys/amdgpu: decay max_ib_size over time

So that memory use will eventually decrease again after a temporary peak.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: implement IB chaining on the gfx ring
Nicolai Hähnle [Sat, 7 May 2016 02:33:17 +0000 (21:33 -0500)]
winsys/amdgpu: implement IB chaining on the gfx ring

As a consequence, CE IB size never triggers a flush anymore.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: consolidate IB size management in amdgpu_ib_finalize
Nicolai Hähnle [Mon, 9 May 2016 22:21:25 +0000 (17:21 -0500)]
winsys/amdgpu: consolidate IB size management in amdgpu_ib_finalize

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeon/winsys: introduce radeon_winsys_cs_chunk
Nicolai Hähnle [Fri, 6 May 2016 22:14:29 +0000 (17:14 -0500)]
radeon/winsys: introduce radeon_winsys_cs_chunk

We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi/sid: add packet definitions for IB chaining
Nicolai Hähnle [Sat, 7 May 2016 03:04:31 +0000 (22:04 -0500)]
radeonsi/sid: add packet definitions for IB chaining

While we're at it, add packet printing in si_debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: start with smaller IBs, growing as necessary
Nicolai Hähnle [Sat, 7 May 2016 15:58:13 +0000 (10:58 -0500)]
winsys/amdgpu: start with smaller IBs, growing as necessary

This avoids allocating giant IBs from the outset, especially for CE and DMA.

Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.

With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.

v2:
- clean up buffer_size calculation

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: add amdgpu_ib and amdgpu_cs_from_ib helper functions
Nicolai Hähnle [Sat, 7 May 2016 02:16:05 +0000 (21:16 -0500)]
winsys/amdgpu: add amdgpu_ib and amdgpu_cs_from_ib helper functions

The latter function allows getting the containing amdgpu_cs from any IB
(including non-main ones).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: extract IB big buffer allocation for re-use
Nicolai Hähnle [Sat, 7 May 2016 02:37:52 +0000 (21:37 -0500)]
winsys/amdgpu: extract IB big buffer allocation for re-use

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: add IB buffer in amdgpu_get_new_ib
Nicolai Hähnle [Sat, 7 May 2016 02:29:56 +0000 (21:29 -0500)]
winsys/amdgpu: add IB buffer in amdgpu_get_new_ib

Adding the buffer when we start using it for the IB makes the logic for
chaining a bit simpler.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/radeon: use cs_check_space throughout
Nicolai Hähnle [Fri, 6 May 2016 17:42:05 +0000 (12:42 -0500)]
gallium/radeon: use cs_check_space throughout

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeon/winsys: add cs_check_space
Nicolai Hähnle [Fri, 6 May 2016 17:34:25 +0000 (12:34 -0500)]
radeon/winsys: add cs_check_space

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: simplify interface of amdgpu_get_new_ib
Nicolai Hähnle [Sat, 7 May 2016 02:24:51 +0000 (21:24 -0500)]
winsys/amdgpu: simplify interface of amdgpu_get_new_ib

We'll want to have an amdgpu_cs pointer for future changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: add amdgpu_cs_has_user_fence
Nicolai Hähnle [Fri, 6 May 2016 02:09:13 +0000 (21:09 -0500)]
winsys/amdgpu: add amdgpu_cs_has_user_fence

v2: style change

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoi965: Fix isoline reads in scalar TES.
Kenneth Graunke [Wed, 1 Jun 2016 04:00:43 +0000 (21:00 -0700)]
i965: Fix isoline reads in scalar TES.

Isolines aren't reversed.  commit 5b2d8c2273c6f fixed this for the vec4
TES backend, but not the scalar one.

Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
8 years agost/mesa: implement PBO downloads for ReadPixels
Nicolai Hähnle [Tue, 26 Apr 2016 18:19:28 +0000 (13:19 -0500)]
st/mesa: implement PBO downloads for ReadPixels

v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of
    the texture targets, but all hardware that has shader images should also
    have this cap.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: hook up a no-op try_pbo_readpixels
Nicolai Hähnle [Tue, 26 Apr 2016 16:33:33 +0000 (11:33 -0500)]
st/mesa: hook up a no-op try_pbo_readpixels

For better bisectability given that the order of some of the fallback tests
in the blit path are rearranged.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: add layer_offset to PBO fragment shader
Nicolai Hähnle [Wed, 18 May 2016 04:45:24 +0000 (23:45 -0500)]
st/mesa: add layer_offset to PBO fragment shader

This will be used to select a slice of a 3D texture.

v2: fix a comment (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: create PBO download fragment shaders
Nicolai Hähnle [Tue, 26 Apr 2016 20:59:17 +0000 (15:59 -0500)]
st/mesa: create PBO download fragment shaders

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: add PBO download enable bit and fragment shaders
Nicolai Hähnle [Tue, 26 Apr 2016 02:56:47 +0000 (21:56 -0500)]
st/mesa: add PBO download enable bit and fragment shaders

For downloads, the fragment shader must know the source texture target, hence
we may cache multiple fragment shaders.

v2: break long line (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: move shareable parts of PBO upload state and draw to st_pbo.c
Nicolai Hähnle [Tue, 26 Apr 2016 20:52:53 +0000 (15:52 -0500)]
st/mesa: move shareable parts of PBO upload state and draw to st_pbo.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: move PBO buffer address calculation to st_pbo.c
Nicolai Hähnle [Tue, 26 Apr 2016 17:59:29 +0000 (12:59 -0500)]
st/mesa: move PBO buffer address calculation to st_pbo.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: move PBO upload fs creation to st_pbo.c
Nicolai Hähnle [Tue, 26 Apr 2016 02:56:01 +0000 (21:56 -0500)]
st/mesa: move PBO upload fs creation to st_pbo.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: rename pbo_upload to pbo
Nicolai Hähnle [Tue, 26 Apr 2016 02:49:27 +0000 (21:49 -0500)]
st/mesa: rename pbo_upload to pbo

At the same time, rename members that are upload-specific to say so.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: move PBO vertex and geometry shader creation to st_pbo.c
Nicolai Hähnle [Tue, 26 Apr 2016 02:40:53 +0000 (21:40 -0500)]
st/mesa: move PBO vertex and geometry shader creation to st_pbo.c

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: begin moving PBO functions into their own file
Nicolai Hähnle [Tue, 26 Apr 2016 02:35:10 +0000 (21:35 -0500)]
st/mesa: begin moving PBO functions into their own file

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/cso: allow saving the first fragment shader image slot
Nicolai Hähnle [Wed, 27 Apr 2016 00:54:41 +0000 (19:54 -0500)]
gallium/cso: allow saving the first fragment shader image slot

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/u_inlines: allow NULL src in util_copy_image_view
Nicolai Hähnle [Fri, 29 Apr 2016 21:06:13 +0000 (16:06 -0500)]
gallium/u_inlines: allow NULL src in util_copy_image_view

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium: add PIPE_BARRIER_ALL define
Nicolai Hähnle [Wed, 27 Apr 2016 02:22:37 +0000 (21:22 -0500)]
gallium: add PIPE_BARRIER_ALL define

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoglsl: Use Geom.VerticesOut == -1 to specify unset
Ian Romanick [Mon, 23 May 2016 22:53:10 +0000 (15:53 -0700)]
glsl: Use Geom.VerticesOut == -1 to specify unset

Because apparently layout(max_vertices=0) is a thing.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoi965: If control_data_header_size_bits is zero, don't do EndPrimitive
Ian Romanick [Mon, 23 May 2016 22:17:02 +0000 (15:17 -0700)]
i965: If control_data_header_size_bits is zero, don't do EndPrimitive

This can occur when max_vertices=0 is explicitly specified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agomesa: Fix bogus strncmp
Ian Romanick [Tue, 24 May 2016 17:10:40 +0000 (10:10 -0700)]
mesa: Fix bogus strncmp

The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned.  That string has length 3.  strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.

v2: Not the same as strncmp(..., 3).  Noticed by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoradeonsi: set correct stencil tile mode for texturing
Marek Olšák [Thu, 19 May 2016 18:12:10 +0000 (20:12 +0200)]
radeonsi: set correct stencil tile mode for texturing

Sadly, this doesn't affect SI and VI in any way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agowinsys/amdgpu: set flags correctly when allocating depth-stencil buffers
Marek Olšák [Thu, 19 May 2016 18:10:10 +0000 (20:10 +0200)]
winsys/amdgpu: set flags correctly when allocating depth-stencil buffers

This mimics Vulkan. It also documents how to fix stencil texturing.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agogallium/radeon: lower memory usage during texture transfers
Marek Olšák [Wed, 18 May 2016 12:31:36 +0000 (14:31 +0200)]
gallium/radeon: lower memory usage during texture transfers

This improves throughput by keeping TTM overhead down.

Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.

v2: use gart_size / 4 as the threshold

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/radeon: invalidate busy linear textures for whole-texture uploads
Marek Olšák [Wed, 18 May 2016 01:25:04 +0000 (03:25 +0200)]
gallium/radeon: invalidate busy linear textures for whole-texture uploads

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: degrade tiled textures mapped often to linear
Marek Olšák [Thu, 12 May 2016 11:33:06 +0000 (13:33 +0200)]
gallium/radeon: degrade tiled textures mapped often to linear

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: clean up and better comment use_staging_texture
Marek Olšák [Wed, 18 May 2016 01:06:04 +0000 (03:06 +0200)]
gallium/radeon: clean up and better comment use_staging_texture

Next commits will add other things around this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: set some colorbuffer register fields at emit time
Marek Olšák [Wed, 18 May 2016 00:21:59 +0000 (02:21 +0200)]
radeonsi: set some colorbuffer register fields at emit time

to allow reallocating the texture storage with different parameters

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: implement global resetting of texture descriptors
Marek Olšák [Tue, 17 May 2016 19:45:50 +0000 (21:45 +0200)]
radeonsi: implement global resetting of texture descriptors

it will be used by texture reallocation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: move code for setting one shader image into separate function
Marek Olšák [Tue, 17 May 2016 19:25:39 +0000 (21:25 +0200)]
radeonsi: move code for setting one shader image into separate function

v2: fix set_shader_images(..., NULL). Found by Christoph Haag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: set some image descriptor fields at bind time
Marek Olšák [Tue, 17 May 2016 18:46:42 +0000 (20:46 +0200)]
radeonsi: set some image descriptor fields at bind time

mainly the fields that can change by reallocating a texture and changing
the tile mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: strenghten some checking for DMA preparation
Marek Olšák [Wed, 11 May 2016 12:09:55 +0000 (14:09 +0200)]
gallium/radeon: strenghten some checking for DMA preparation

Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.

v1.1: fix the comment about DCC

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/util: add util_texrange_covers_whole_level from radeon
Marek Olšák [Mon, 9 May 2016 11:36:39 +0000 (13:36 +0200)]
gallium/util: add util_texrange_covers_whole_level from radeon

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agonir: allow sat on all float destination types
Ilia Mirkin [Tue, 31 May 2016 21:50:04 +0000 (17:50 -0400)]
nir: allow sat on all float destination types

With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoradeonsi: fix the raster config setup for 1 RB iceland chips
Alex Deucher [Mon, 23 May 2016 19:53:56 +0000 (15:53 -0400)]
radeonsi: fix the raster config setup for 1 RB iceland chips

I didn't realize there were 1 and 2 RB variants when this code
was originally added.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
8 years agomesa/sampler: fix error codes for sampler parameters.
Dave Airlie [Wed, 1 Jun 2016 06:35:59 +0000 (16:35 +1000)]
mesa/sampler: fix error codes for sampler parameters.

The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.

Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl: define some GLES3 constants in GLSL 4.1
Dave Airlie [Wed, 1 Jun 2016 06:16:30 +0000 (16:16 +1000)]
glsl: define some GLES3 constants in GLSL 4.1

The GLSL 4.1 spec adds:
gl_MaxVertexUniformVectors
gl_MaxFragmentUniformVectors
gl_MaxVaryingVectors

This fixes:
GL45-CTS.gtf31.GL3Tests.uniform_buffer_object.uniform_buffer_object_build_in_constants

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoi965: Add norbc debug option
Topi Pohjolainen [Tue, 31 May 2016 13:47:50 +0000 (16:47 +0300)]
i965: Add norbc debug option

This INTEL_DEBUG option disables lossless compression (also known
as render buffer compression).

v2: (Matt) Use likely(!lossless_compression_disabled) instead of
           !likely(lossless_compression_disabled)
    (Grazvydas) Update docs/envvars.html

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/gen9: Configure rbc buffers as plain for non-rbc tex views
Topi Pohjolainen [Tue, 31 May 2016 07:36:12 +0000 (10:36 +0300)]
i965/gen9: Configure rbc buffers as plain for non-rbc tex views

Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.

Fortunately none of tracked benchmarks showed a regression with
this.

v2 (Matt): Add missing space

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Fix the passthrough TCS for isolines.
Kenneth Graunke [Thu, 26 May 2016 07:29:56 +0000 (00:29 -0700)]
i965: Fix the passthrough TCS for isolines.

We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants.  We at least
need to initialize them to zero.  We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).

Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.

(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
8 years agoi965/xfb: skip components in correct buffer.
Dave Airlie [Wed, 1 Jun 2016 04:10:22 +0000 (14:10 +1000)]
i965/xfb: skip components in correct buffer.

The driver was adding the skip components but always for buffer 0.

This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl/linker: fix multiple streams transform feedback.
Dave Airlie [Tue, 31 May 2016 02:51:47 +0000 (12:51 +1000)]
glsl/linker: fix multiple streams transform feedback.

e2791b38b42f83add5b07298c39741bf0a6d7d4b
mesa/program_interface_query: fix transform feedback varyings.

caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.

The problem was it was using the skip components varying to set
the stream id, when it should wait until a varying was written,
this just adds the varying checks in the right place.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agomesa/bufferobj: use mapping range in BufferSubData.
Dave Airlie [Wed, 25 May 2016 04:02:27 +0000 (14:02 +1000)]
mesa/bufferobj: use mapping range in BufferSubData.

According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.

So we should use the if range is mapped path.

This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agonv50/ir: fix error finding free element in bitset in some situations
Ilia Mirkin [Tue, 31 May 2016 04:33:50 +0000 (00:33 -0400)]
nv50/ir: fix error finding free element in bitset in some situations

This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).

Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonv50/ir: print relevant file's bitset when showing RA info
Ilia Mirkin [Tue, 31 May 2016 04:33:19 +0000 (00:33 -0400)]
nv50/ir: print relevant file's bitset when showing RA info

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agoRevert "glsl: fix xfb_offset unsized array validation"
Timothy Arceri [Tue, 31 May 2016 23:21:01 +0000 (09:21 +1000)]
Revert "glsl: fix xfb_offset unsized array validation"

This reverts commit aac90ba2920cf5ceb4df6dba776dd3952780e456.

The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom

Also the CTS test it was meant to fix seems like it may be bogus.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoi965/fs: Allow scalar source regions on SNB math instructions.
Francisco Jerez [Sat, 28 May 2016 06:29:14 +0000 (23:29 -0700)]
i965/fs: Allow scalar source regions on SNB math instructions.

I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:

 "The supported regioning modes for math instructions are align16,
  align1 with the following restrictions:
   - Scalar source is supported.
  [...]
   - Source and destination offset must be the same, except the case of
     scalar source."

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965/fs: Fix constant combining for instructions that cannot accept source mods.
Francisco Jerez [Sat, 28 May 2016 06:29:10 +0000 (23:29 -0700)]
i965/fs: Fix constant combining for instructions that cannot accept source mods.

This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.
Francisco Jerez [Wed, 25 May 2016 20:17:41 +0000 (13:17 -0700)]
i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF...
Francisco Jerez [Fri, 27 May 2016 23:03:34 +0000 (16:03 -0700)]
i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.

Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already.  The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.

Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa514f2d06d0e4b33bfe6bca83142fbf0 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.
Francisco Jerez [Fri, 27 May 2016 21:17:28 +0000 (14:17 -0700)]
i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.

This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.
Francisco Jerez [Fri, 27 May 2016 20:15:55 +0000 (13:15 -0700)]
i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.

This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction.  In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix compute-to-mrf VGRF region coverage condition.
Francisco Jerez [Fri, 27 May 2016 23:41:35 +0000 (16:41 -0700)]
i965/fs: Fix compute-to-mrf VGRF region coverage condition.

Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ.  Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().
Francisco Jerez [Fri, 27 May 2016 19:50:28 +0000 (12:50 -0700)]
i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().

Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Teach regions_overlap() about COMPR4 MRF regions.
Francisco Jerez [Fri, 27 May 2016 06:53:31 +0000 (23:53 -0700)]
i965/fs: Teach regions_overlap() about COMPR4 MRF regions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>