mesa.git
7 years agoi965/miptree: Add a helper for getting the aux usage for texturing
Jason Ekstrand [Thu, 22 Jun 2017 03:19:32 +0000 (20:19 -0700)]
i965/miptree: Add a helper for getting the aux usage for texturing

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Partially resolve MCS for texture views
Jason Ekstrand [Fri, 23 Jun 2017 17:44:16 +0000 (10:44 -0700)]
i965/miptree: Partially resolve MCS for texture views

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Add support for partially resolving MCS
Jason Ekstrand [Fri, 23 Jun 2017 17:43:30 +0000 (10:43 -0700)]
i965/miptree: Add support for partially resolving MCS

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Tighten up finish_mcs_write
Jason Ekstrand [Fri, 23 Jun 2017 17:42:30 +0000 (10:42 -0700)]
i965/miptree: Tighten up finish_mcs_write

Multisample surfaces only have a single miplevel so there's no reason to
be passing the extra parameters around.  It only leads to confusion.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Make aux_state work in terms of logical layers
Jason Ekstrand [Mon, 17 Jul 2017 23:16:41 +0000 (16:16 -0700)]
i965/miptree: Make aux_state work in terms of logical layers

This commit changes layer_range_length to return logical layers and also
changes the way we allocate the aux_state field to not allocate extra
layers for MCS.  This will be important as we're about to start doing
significantly more detailed tracking of MCS state.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/blorp: Add a partial resolve pass for MCS
Jason Ekstrand [Fri, 23 Jun 2017 17:27:27 +0000 (10:27 -0700)]
intel/blorp: Add a partial resolve pass for MCS

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Remove some unneeded restrictions
Jason Ekstrand [Thu, 22 Jun 2017 04:33:41 +0000 (21:33 -0700)]
i965/miptree: Remove some unneeded restrictions

intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and
there's no reason why we can't do CCS_E on window system buffers so long
as we resolve.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Stop setting FOR_SCANOUT for renderbuffers
Jason Ekstrand [Wed, 19 Jul 2017 00:00:39 +0000 (17:00 -0700)]
i965/miptree: Stop setting FOR_SCANOUT for renderbuffers

Nothing created through intel_miptree_create_for_renderbuffer will ever
be exposed externally so there's no need to set FOR_SCANOUT.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/blorp: Do flushes around depth resolves
Jason Ekstrand [Wed, 19 Jul 2017 01:44:26 +0000 (18:44 -0700)]
i965/blorp: Do flushes around depth resolves

It turns out that if you have rendering in-flight with CCS_E enabled and
you go to do a depth resolve without flushing, the CCS data may never
hit the memory.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/blorp: Use the renderbuffer format for clears
Jason Ekstrand [Sun, 25 Jun 2017 05:50:53 +0000 (22:50 -0700)]
i965/blorp: Use the renderbuffer format for clears

This fixes the Piglit ARB_texture_views rendering-formats test.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoanv: Predicate fast-clear resolves
Nanley Chery [Tue, 18 Apr 2017 18:03:42 +0000 (11:03 -0700)]
anv: Predicate fast-clear resolves

Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.

v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agointel/blorp: Allow BLORP calls to be predicated
Nanley Chery [Tue, 25 Apr 2017 20:32:34 +0000 (13:32 -0700)]
intel/blorp: Allow BLORP calls to be predicated

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Skip some input attachment transitions
Nanley Chery [Wed, 24 May 2017 17:16:38 +0000 (10:16 -0700)]
anv/cmd_buffer: Skip some input attachment transitions

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Stop resolving CCS implicitly
Nanley Chery [Sat, 18 Mar 2017 05:36:05 +0000 (22:36 -0700)]
anv: Stop resolving CCS implicitly

With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Transition more color buffer layouts
Nanley Chery [Sat, 4 Mar 2017 07:59:16 +0000 (23:59 -0800)]
anv: Transition more color buffer layouts

v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).
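
The overflow and clamp items in v3 concern ordinary 32-bit wraparound: if a
base layer and a layer count are added as 32-bit values, a very large count
(for instance UINT32_MAX, the value Vulkan uses for
VK_REMAINING_ARRAY_LAYERS) wraps and defeats the clamp. A minimal sketch of
the idea follows; the function names are hypothetical and are not taken from
the anv sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the anv implementation. */

/* Buggy variant: the sum is computed in 32 bits and can wrap around. */
static uint32_t sketch_clamp_layers_narrow(uint32_t base_layer,
                                           uint32_t layer_count,
                                           uint32_t total_layers)
{
    assert(base_layer < total_layers);
    uint32_t end = base_layer + layer_count;   /* may wrap past UINT32_MAX */
    if (end > total_layers)
        return total_layers - base_layer;
    return layer_count;
}

/* Fixed variant: widening one operand makes the whole addition 64-bit. */
static uint32_t sketch_clamp_layers_wide(uint32_t base_layer,
                                         uint32_t layer_count,
                                         uint32_t total_layers)
{
    assert(base_layer < total_layers);
    uint64_t end = (uint64_t)base_layer + layer_count;
    if (end > total_layers)
        return total_layers - base_layer;
    return layer_count;
}
```

With base_layer = 4, layer_count = UINT32_MAX and six layers in total, the
narrow variant sees a wrapped end of 3 and returns the count unclamped, while
the wide variant returns 2.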

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Warn about not enabling CCS_E
Nanley Chery [Wed, 28 Jun 2017 17:29:04 +0000 (10:29 -0700)]
anv/cmd_buffer: Warn about not enabling CCS_E

Use the performance warning infrastructure to provide helpful
information when testing applications.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Move aux_usage assignment up
Nanley Chery [Wed, 28 Jun 2017 17:25:49 +0000 (10:25 -0700)]
anv/cmd_buffer: Move aux_usage assignment up

For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Always enable CCS_D in render passes
Nanley Chery [Fri, 31 Mar 2017 23:05:34 +0000 (16:05 -0700)]
anv/cmd_buffer: Always enable CCS_D in render passes

The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.

v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Disable CCS on gen7 color attachments upfront
Nanley Chery [Wed, 28 Jun 2017 16:35:08 +0000 (09:35 -0700)]
anv/cmd_buffer: Disable CCS on gen7 color attachments upfront

The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Ensure fast-clear values are current
Nanley Chery [Thu, 19 Jan 2017 18:12:36 +0000 (10:12 -0800)]
anv/cmd_buffer: Ensure fast-clear values are current

v2: Rewrite functions, change location of synchronization.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/gpu_memcpy: Add a lighter-weight GPU memcpy function
Nanley Chery [Wed, 25 Jan 2017 22:54:39 +0000 (14:54 -0800)]
anv/gpu_memcpy: Add a lighter-weight GPU memcpy function

We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.

v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Restrict fast clears in the GENERAL layout
Nanley Chery [Fri, 31 Mar 2017 20:52:53 +0000 (13:52 -0700)]
anv/cmd_buffer: Restrict fast clears in the GENERAL layout

v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Don't partially fast clear image layers
Nanley Chery [Fri, 10 Mar 2017 22:41:14 +0000 (14:41 -0800)]
anv/cmd_buffer: Don't partially fast clear image layers

v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Initialize the clear values buffer
Nanley Chery [Sat, 4 Mar 2017 07:59:16 +0000 (23:59 -0800)]
anv/cmd_buffer: Initialize the clear values buffer

v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/image: Append CCS/MCS with a fast-clear state buffer
Nanley Chery [Thu, 19 Jan 2017 01:39:53 +0000 (17:39 -0800)]
anv/image: Append CCS/MCS with a fast-clear state buffer

v2: Update comments, function signatures, and add assertions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/image: Disable CCS if the image doesn't support rendering
Nanley Chery [Wed, 5 Jul 2017 19:15:24 +0000 (12:15 -0700)]
anv/image: Disable CCS if the image doesn't support rendering

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agointel/isl: Add surface state clear value information
Nanley Chery [Tue, 24 Jan 2017 23:55:57 +0000 (15:55 -0800)]
intel/isl: Add surface state clear value information

This will be used to load and store clear values from surface state
objects.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Transition MCS buffers from the undefined layout
Nanley Chery [Tue, 11 Jul 2017 17:46:58 +0000 (10:46 -0700)]
anv: Transition MCS buffers from the undefined layout

v2: Define MCS buffers with any sample count (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agointel/isl: Tighten up restrictions for CCS on gen7
Jason Ekstrand [Sat, 22 Jul 2017 23:50:22 +0000 (16:50 -0700)]
intel/isl: Tighten up restrictions for CCS on gen7

It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7.  However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering.  Let's just play it safe and disallow it for now.  If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.

7 years agoi965/bufmgr: Add comments about GTT coherency issues.
Chris Wilson [Sat, 22 Jul 2017 19:03:06 +0000 (12:03 -0700)]
i965/bufmgr: Add comments about GTT coherency issues.

(Patch written by Ken, but entirely comments written by Chris.)

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Drop non-LLC lunacy in the program cache code.
Kenneth Graunke [Tue, 11 Jul 2017 21:19:46 +0000 (14:19 -0700)]
i965: Drop non-LLC lunacy in the program cache code.

The non-LLC story was a horror show.  We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU.  Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed.  This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.

Not only that, but our rudimentary busy tracking consisted of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO.  So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.

This is largely unnecessary - it's just ancient and crufty code.  We can
use the same persistent mapping paths on all platforms.  On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.

One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO.  This will suffer from UC
readback performance now.  To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms).  Gen4-5
are unfortunately going to be penalized.

v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Set MAP_PERSISTENT on program cache buffers.
Kenneth Graunke [Fri, 21 Jul 2017 20:09:17 +0000 (13:09 -0700)]
i965: Set MAP_PERSISTENT on program cache buffers.

Chris Wilson pointed out that this mapping really is persistent.

Shouldn't actually have any effect today, but best to set it anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Correctly set MAP_WRITE when creating the LLC program cache map.
Kenneth Graunke [Fri, 21 Jul 2017 20:07:22 +0000 (13:07 -0700)]
i965: Correctly set MAP_WRITE when creating the LLC program cache map.

Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965/bufmgr: Use write-combine mappings where available
Matt Turner [Tue, 11 Jul 2017 21:27:34 +0000 (22:27 +0100)]
i965/bufmgr: Use write-combine mappings where available

Write-combine mappings give much better performance on writes than
uncached access through the GTT.

Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).

v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
    updates, potential for CPU/WC maps failing, and other changes.

v3: (by Ken and Chris Wilson)

    (Ken): Rebase on set_domain -> gem_wait
    (Chris): Fix up a failed CPU/WC mmapping with a GTT mapping

    Not all objects will be mappable for direct access by the CPU
    (either using WC/CPU or WC paths), for example, a dmabuf wrapping an
    object on a foreign device or an object wrapping access to stolen
    memory. Since either the physical pages are not known or even do not
    exist, we need to use the mediated, indirect access via the GTT. (If
    one day, the kernel does suddenly start providing mediated access
    via a regular WB/WC mmapping, we no longer need the fallback.)

v4: Avoid falling back for MAP_RAW (Chris).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/bufmgr: Skip wait ioctl when not busy.
Kenneth Graunke [Mon, 17 Jul 2017 19:57:20 +0000 (12:57 -0700)]
i965/bufmgr: Skip wait ioctl when not busy.

If the buffer is idle, I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether.  We can't trust the
"idle" flag for external buffers, but for most, it should be fine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
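
The early-out described in the message above (skip the wait when the buffer
is believed idle, unless the buffer is external) can be sketched roughly as
follows. This is only an illustration under stated assumptions: `sketch_bo`,
`sketch_kernel_wait` and `sketch_bo_wait` are hypothetical names, and the
kernel call is replaced by a counter so the logic can be exercised without a
GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the i965 struct. */
struct sketch_bo {
    bool idle;       /* userspace's belief that the GPU is done with it */
    bool external;   /* shared with another process or device */
    int wait_calls;  /* counts simulated trips into the kernel */
};

/* Stand-in for the wait ioctl: in this sketch it only records the call. */
static void sketch_kernel_wait(struct sketch_bo *bo)
{
    bo->wait_calls++;
}

static void sketch_bo_wait(struct sketch_bo *bo)
{
    assert(bo != NULL);

    /* The idle flag is only trusted for buffers nobody else can touch. */
    if (bo->idle && !bo->external)
        return;

    sketch_kernel_wait(bo);
    bo->idle = true;   /* the wait has completed, so the buffer is idle now */
}
```

An external buffer always takes the slow path here, because another user may
have made it busy again without this process noticing.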
7 years agoi965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.
Kenneth Graunke [Mon, 17 Jul 2017 19:46:58 +0000 (12:46 -0700)]
i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.

With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense.  Buffers can be in use on both the CPU and GPU at
the same time.  In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies.  We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).

According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:

1. pins the backing storage (acquiring pages outside of the
   struct_mutex)

2. waits either for read/write access, including inter-device waits

3. updates the domain, clflushing as required

4. marks the object as used (for swapping)

5. turns off FBC/PSR/fancy scanout caching

Item (1) is not terribly important.  Most BOs are recycled via the
BO cache, so they already have pages.  Regardless, we fixed this
via an initial set_domain in the previous patch.

We implement item (2) with I915_GEM_WAIT.  This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading.  I believe this is pretty uncommon.  We
may want to extend the wait ioctl at some point.

Mesa already does item (3) itself.  For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent.  For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.

We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.

Item (5) should be okay because we avoid cached maps of scanout buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965/bufmgr: Allocate BO pages outside of the kernel's locking.
Kenneth Graunke [Fri, 21 Jul 2017 19:29:30 +0000 (12:29 -0700)]
i965/bufmgr: Allocate BO pages outside of the kernel's locking.

Suggested by Chris Wilson.

v2: Set the write domain to 0 (suggested by Chris).

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoglsl: rework misleading block layout code
Timothy Arceri [Fri, 21 Jul 2017 01:42:33 +0000 (11:42 +1000)]
glsl: rework misleading block layout code

From the ARB_uniform_buffer_object spec:

   ""shared" uniform blocks, the default layout, ..."

This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoglsl: remove placeholder comment
Timothy Arceri [Fri, 21 Jul 2017 01:09:33 +0000 (11:09 +1000)]
glsl: remove placeholder comment

This was added in 2d03f48a65a666 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agost/mesa: use proper resource target type in st_AllocTextureStorage()
Brian Paul [Thu, 20 Jul 2017 16:51:45 +0000 (10:51 -0600)]
st/mesa: use proper resource target type in st_AllocTextureStorage()

When we validate the texture sample count, pass the correct
pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.

Also add more comments about MSAA.

No piglit regressions with VMware driver.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agomesa: remove pointless assignments in init_teximage_fields_ms()
Brian Paul [Thu, 20 Jul 2017 15:57:32 +0000 (09:57 -0600)]
mesa: remove pointless assignments in init_teximage_fields_ms()

The NumSamples and FixedSampleLocation fields are set again later at
the end of the function so these earlier assignments aren't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agosvga: Limit number of immediates in shader
Neha Bhende [Fri, 21 Jul 2017 00:03:19 +0000 (17:03 -0700)]
svga: Limit number of immediates in shader

imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which
is not used very frequently. So allocate it only if lit instruction is used.

Tested with mtt piglit and mtt glretrace

v2: As per Charmaine's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: fix constant indices for texcoord scale factors and texture buffer size
Charmaine Lee [Tue, 18 Jul 2017 23:13:48 +0000 (16:13 -0700)]
svga: fix constant indices for texcoord scale factors and texture buffer size

This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agosvga: fix unnormalized->normalized texture coordinate conversion
Neha Bhende [Thu, 20 Jul 2017 20:59:36 +0000 (13:59 -0700)]
svga: fix unnormalized->normalized texture coordinate conversion

Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering.  Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear
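
A rough sketch of why the epsilon helps: scaling an integer texel coordinate
by 1/size and back can land a hair below the integer, so nearest filtering
picks the neighbouring texel. The code below assumes the 0.0001 is added to
the unnormalized coordinate before scaling; the function names are
hypothetical and this is not the svga shader code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NEAREST_EPSILON 0.0001f

/* Convert an unnormalized coordinate in [0, size) to a normalized one.
 * When nearest filtering is in use, nudge the coordinate upward first. */
static float sketch_normalize_coord(float coord, int size, bool nearest)
{
    assert(size > 0);
    if (nearest)
        coord += SKETCH_NEAREST_EPSILON;
    return coord * (1.0f / (float)size);
}

/* Nearest filtering: pick the texel whose span contains the coordinate. */
static int sketch_nearest_texel(float s, int size)
{
    int texel = (int)(s * (float)size);   /* truncation equals floor for s >= 0 */
    return texel < size ? texel : size - 1;
}
```

The epsilon is much larger than the rounding error of the two
multiplications for typical texture sizes, yet far too small to push a
coordinate into the next texel.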

Tested with mtt-piglit, mtt-glretrace

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: only support 4x, 8x, 16x msaa
Brian Paul [Thu, 20 Jul 2017 20:53:07 +0000 (14:53 -0600)]
svga: only support 4x, 8x, 16x msaa

Skip 2x MSAA, for example, since it's seldom used and just bloats
the list of pixel formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agomesa: include texture size in error messages
Brian Paul [Thu, 20 Jul 2017 13:56:03 +0000 (07:56 -0600)]
mesa: include texture size in error messages

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoi965: Support the mesa_no_error driconf option.
Kenneth Graunke [Sat, 22 Jul 2017 07:51:03 +0000 (00:51 -0700)]
i965: Support the mesa_no_error driconf option.

This allows us to override contexts to use no_error functionality
even if the applications themselves do not.

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoanv/blorp: Assert isl_surf_init success in do_buffer_copy
Jason Ekstrand [Sat, 22 Jul 2017 00:18:07 +0000 (17:18 -0700)]
anv/blorp: Assert isl_surf_init success in do_buffer_copy

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoanv/blorp: Explicitly set row_pitch in do_buffer_copy
Jason Ekstrand [Sat, 22 Jul 2017 00:14:52 +0000 (17:14 -0700)]
anv/blorp: Explicitly set row_pitch in do_buffer_copy

We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.

Fixes: a40f0430347c07bf2d5794642fe02f5dd248a473
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965: Delete gen8_draw_upload.c
Kenneth Graunke [Sat, 22 Jul 2017 07:42:28 +0000 (00:42 -0700)]
i965: Delete gen8_draw_upload.c

For some reason we left an empty file, rather than deleting it.

7 years agonv50/ir: disable mul+add to mad for precise instructions
Karol Herbst [Fri, 23 Jun 2017 18:30:29 +0000 (20:30 +0200)]
nv50/ir: disable mul+add to mad for precise instructions

fixes
    misrendering in TombRaider
    KHR-GL44.gpu_shader5.precise_qualifier
    KHR-GL45.gpu_shader5.precise_qualifier

v4: disable opt only for MAD, it's fine for SAD
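
For context, a fused multiply-add rounds once where a separate multiply and
add round twice, so the two forms can produce different results; that is why
the fusion must be skipped for precise expressions. The sketch below
illustrates the difference in plain C. The function names are hypothetical,
and the fused result is emulated in double precision, which is exact for the
inputs used here but is not a general-purpose fma replacement.

```c
#include <assert.h>

/* a*b rounded to float, then the sum rounded again: two rounding steps. */
static float sketch_mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* volatile keeps the product unfused */
    return product + c;
}

/* Single rounding at the end, emulated by doing the arithmetic in double. */
static float sketch_fused_mad(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* a = 1 + 2^-12, so a*a = 1 + 2^-11 + 2^-24; the last term is lost when
 * the product is rounded to float before the add. */
static void sketch_check_rounding_differs(void)
{
    float a = 1.0f + 1.0f / 4096.0f;
    float c = -(1.0f + 1.0f / 2048.0f);

    assert(sketch_mul_then_add(a, a, c) == 0.0f);
    assert(sketch_fused_mad(a, a, c) == 0x1p-24f);
}
```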

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
7 years agonv50/ir/tgsi: handle precise for most ALU instructions
Karol Herbst [Fri, 23 Jun 2017 18:30:28 +0000 (20:30 +0200)]
nv50/ir/tgsi: handle precise for most ALU instructions

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
7 years agonv50/ir: add precise field to Instruction
Karol Herbst [Fri, 23 Jun 2017 18:30:27 +0000 (20:30 +0200)]
nv50/ir: add precise field to Instruction

v4: initialize field with NULL

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
7 years agost/glsl_to_tgsi: don't optimize mul+add to mad if expression is precise
Karol Herbst [Fri, 23 Jun 2017 18:30:26 +0000 (20:30 +0200)]
st/glsl_to_tgsi: don't optimize mul+add to mad if expression is precise

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/docs: add precise instruction modifier
Karol Herbst [Fri, 23 Jun 2017 18:30:25 +0000 (20:30 +0200)]
gallium/docs: add precise instruction modifier

v4: add comment about intermediate rounding step to MAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agotgsi/text: parse _PRECISE modifier
Karol Herbst [Fri, 23 Jun 2017 18:30:24 +0000 (20:30 +0200)]
tgsi/text: parse _PRECISE modifier

v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: use is_digit_alpha_underscore to match end of mods

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agotgsi: populate precise
Karol Herbst [Fri, 23 Jun 2017 18:30:23 +0000 (20:30 +0200)]
tgsi: populate precise

Only implemented for glsl->tgsi. Other converters just set precise to 0.

v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/glsl_to_tgsi: handle precise modifier
Karol Herbst [Fri, 23 Jun 2017 18:30:22 +0000 (20:30 +0200)]
st/glsl_to_tgsi: handle precise modifier

All subexpressions inside an ir_assignment need to be tagged as precise.
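
Tagging every subexpression amounts to a recursive walk over the expression
tree on the right-hand side of the assignment. The sketch below shows that
walk on a hypothetical two-operand node type (`sketch_expr`); it is not the
GLSL IR or the visitor used by glsl_to_tgsi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression node for illustration only. */
struct sketch_expr {
    struct sketch_expr *operands[2];  /* NULL when an operand is absent */
    bool precise;
};

/* Mark the node and everything beneath it as precise. */
static void sketch_tag_precise(struct sketch_expr *expr)
{
    if (expr == NULL)
        return;
    expr->precise = true;
    for (size_t i = 0; i < 2; i++)
        sketch_tag_precise(expr->operands[i]);
}

/* Usage: tag the tree for "a * b + c" and check that no node was missed. */
static void sketch_tag_precise_demo(void)
{
    struct sketch_expr a = { { NULL, NULL }, false };
    struct sketch_expr b = { { NULL, NULL }, false };
    struct sketch_expr c = { { NULL, NULL }, false };
    struct sketch_expr mul = { { &a, &b }, false };
    struct sketch_expr add = { { &mul, &c }, false };

    sketch_tag_precise(&add);
    assert(add.precise && mul.precise && a.precise && b.precise && c.precise);
}
```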

v2: make precise handling more global inside the visitor

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agotgsi/dump: print _PRECISE modifier on Instructions
Karol Herbst [Fri, 23 Jun 2017 18:30:21 +0000 (20:30 +0200)]
tgsi/dump: print _PRECISE modifier on Instructions

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agotgsi: add precise flag to tgsi_instruction
Karol Herbst [Fri, 23 Jun 2017 18:30:20 +0000 (20:30 +0200)]
tgsi: add precise flag to tgsi_instruction

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agoi965: Set lower_vote_trivial in vector_nir_options_gen6 too.
Kenneth Graunke [Sat, 22 Jul 2017 01:08:37 +0000 (18:08 -0700)]
i965: Set lower_vote_trivial in vector_nir_options_gen6 too.

There's a second struct for Gen6+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoradv: reset non-syncobj semaphore context after wait.
Dave Airlie [Fri, 21 Jul 2017 22:56:02 +0000 (23:56 +0100)]
radv: reset non-syncobj semaphore context after wait.

When I ported from libdrm, I forgot to add the line to reset
the sem; we just need to reset the context.

This fixes a regression in DOOM.

Fixes: 9ac1432a571 ("radv: port to new libdrm API.")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agost/mesa: add destroy_drawable interface
Charmaine Lee [Thu, 20 Jul 2017 18:04:14 +0000 (11:04 -0700)]
st/mesa: add destroy_drawable interface

With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffers purge time.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772a ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agoradv: rebase radv_entrypoints_gen.py on anv_entrypoints_gen.py
Dylan Baker [Thu, 20 Jul 2017 00:53:42 +0000 (17:53 -0700)]
radv: rebase radv_entrypoints_gen.py on anv_entrypoints_gen.py

The two generators forked from each other, and they remain basically the
same. This rebases the radv version on the anv version, but with the
radv changes ported over. The result is that we get rid of the "cat |"
madness and gain mako, correct "generated by" attributions, and write
files out directly.

The only differences between the outputs are whitespace and comments.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoi965/miptree: Clean-up unused
Topi Pohjolainen [Tue, 20 Jun 2017 18:20:15 +0000 (21:20 +0300)]
i965/miptree: Clean-up unused

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Switch remaining surfaces to isl
Topi Pohjolainen [Tue, 27 Jun 2017 15:10:31 +0000 (18:10 +0300)]
i965/miptree: Switch remaining surfaces to isl

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Drop miptree_array_layout in get_isl_dim_layout()
Topi Pohjolainen [Thu, 29 Jun 2017 05:18:24 +0000 (08:18 +0300)]
i965/miptree: Drop miptree_array_layout in get_isl_dim_layout()

This was only needed for checking gen6 stencil which is already
using isl. One could delete GEN6_HIZ_STENCIL layout altogether
but that will be gone with the rest after a while anyway.

The dim_layout converter is needed even after transition to isl
when setting up surface states - see brw_emit_surface_state().
Hence dropping the unneeded argument separately.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Relax size alignment for linear surfaces
Topi Pohjolainen [Wed, 28 Jun 2017 09:11:16 +0000 (12:11 +0300)]
i965/miptree: Relax size alignment for linear surfaces

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Store compression flag also for isl based
Topi Pohjolainen [Fri, 30 Jun 2017 17:17:03 +0000 (20:17 +0300)]
i965/miptree: Store compression flag also for isl based

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Check tex image allocation failures
Topi Pohjolainen [Fri, 21 Jul 2017 08:17:57 +0000 (11:17 +0300)]
i965/miptree: Check tex image allocation failures

allowing graceful failure instead of crash on assert later on.

This can be hit, for example, on SNB when trying to allocate
8kx8k CUBE_MAP against isl: x-tiled buffer size becomes
2421161984 exceeding the maximum of 1 << 31 == 2147483648.
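
The numbers above can be checked with a small size test. This is a sketch
under assumptions: the limit is treated as inclusive, the names are
hypothetical, and the constant is built with a 64-bit type because a plain
`1 << 31` overflows a 32-bit int in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maximum size quoted in the commit message, held in 64 bits. */
#define SKETCH_MAX_SURFACE_BYTES (UINT64_C(1) << 31)

static_assert(SKETCH_MAX_SURFACE_BYTES == UINT64_C(2147483648),
              "1 << 31 must equal 2147483648");

/* Returns false when the allocation should fail gracefully. */
static bool sketch_surface_size_ok(uint64_t size_bytes)
{
    return size_bytes <= SKETCH_MAX_SURFACE_BYTES;
}
```

For the 8kx8k cube map case, sketch_surface_size_ok(2421161984) is false,
which is the condition the caller is expected to handle instead of hitting an
assert later.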

Another way to hit this on SNB is with multisampling of formats
wider than 64 bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agomain/teximage: Even on failure use valid format for init()
Topi Pohjolainen [Fri, 21 Jul 2017 08:49:08 +0000 (11:49 +0300)]
main/teximage: Even on failure use valid format for init()

Otherwise init_teximage_fields_ms() (called by
_mesa_init_teximage_fields()) will always assert as it can't
find valid base format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/isl/gen7: Don't allow multisampled surfaces with valign2
Topi Pohjolainen [Tue, 18 Jul 2017 14:10:59 +0000 (17:10 +0300)]
intel/isl/gen7: Don't allow multisampled surfaces with valign2

There is the same constraint later on as an assert in
isl_gen7_choose_image_alignment_el() so catch it earlier in order
to return error instead of crash.

Needed to avoid crashes with piglits on IVB and HSW:

arb_internalformat_query2.image_format_compatibility_type pname checks
arb_internalformat_query2.all internalformat_<x>_type pname checks
arb_internalformat_query2.max dimensions related pname checks
arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
arb_texture_float.multisample-fast-clear gl_arb_texture_float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/isl/gen7: Allow msaa with signed integer formats
Topi Pohjolainen [Tue, 18 Jul 2017 14:06:07 +0000 (17:06 +0300)]
intel/isl/gen7: Allow msaa with signed integer formats

These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try 16I/32I in addition to GL_RGBA8I.
IvyBridge passed all tests with all sample numbers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/isl/gen7: Allow msaa with 128-bit formats
Topi Pohjolainen [Tue, 18 Jul 2017 13:25:43 +0000 (16:25 +0300)]
intel/isl/gen7: Allow msaa with 128-bit formats

These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I.
IvyBridge passed all tests with all sample numbers and even
with 128-bit formats.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/isl: Allow 1D surfaces with compressed formats
Topi Pohjolainen [Wed, 5 Jul 2017 07:26:03 +0000 (10:26 +0300)]
intel/isl: Allow 1D surfaces with compressed formats

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agointel/isl: Align non-tiled horizontally by cache line
Topi Pohjolainen [Wed, 28 Jun 2017 09:07:32 +0000 (12:07 +0300)]
intel/isl: Align non-tiled horizontally by cache line

in order to support blit engine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree/gen4: Prepare x-tiled fallback for isl based
Topi Pohjolainen [Tue, 20 Jun 2017 17:33:08 +0000 (20:33 +0300)]
i965/miptree/gen4: Prepare x-tiled fallback for isl based

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Prepare non-tiled fallback for isl based
Topi Pohjolainen [Fri, 30 Jun 2017 06:34:48 +0000 (09:34 +0300)]
i965/miptree: Prepare non-tiled fallback for isl based

See brw_miptree_choose_tiling().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Prepare has_color_unresolved() for isl based
Topi Pohjolainen [Wed, 28 Jun 2017 07:04:10 +0000 (10:04 +0300)]
i965/miptree: Prepare has_color_unresolved() for isl based

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agogallivm: handle call attributes for llvm < 4.0 in lp_add_function_attr
Roland Scheidegger [Fri, 21 Jul 2017 18:27:43 +0000 (20:27 +0200)]
gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attr

We had some caller using LLVMAddInstrAttributes, which couldn't be
converted to lp_add_function_attr, because attributes were only handled
for functions in this case, so fix this.
For llvm >= 4.0, this already works correctly.
(radeonsi seems to avoid setting call site attributes prior to llvm 4.0,
the patch then citing it doesn't work when calling intrinsics. But at
least for calling external functions we always used that, albeit only
for actual call attributes, not call parameter attributes, though some
quick test shows llvm seems to handle that as well. The attribute index
is sort of iffy though, since attribute 0 of the call is the actual function,
attribute 1 corresponds to the first parameter of the called function.)
(Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct
attributes are shown for calls, both for llvm 4.0 and 3.3.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agoradv: Generate storage image descriptors unconditionally
Alex Smith [Fri, 21 Jul 2017 16:00:00 +0000 (17:00 +0100)]
radv: Generate storage image descriptors unconditionally

We can also use storage images internally for resolves, which don't
require TRANSFER_DST usage on the image, so currently we may not create
the needed descriptors.

Just create these descriptors unconditionally.

Fixes: 0e1886efb9e ("radv: Fix descriptors for cube images with VK_IMAGE_USAGE_STORAGE_BIT")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoswr/rast: quit using linux-specific gettid()
Tim Rowley [Fri, 21 Jul 2017 16:38:39 +0000 (11:38 -0500)]
swr/rast: quit using linux-specific gettid()

Linux-specific gettid() syscall shouldn't be used in portable code.
Fix does assume a 1:1 thread:LWP architecture, but works for our
current target platforms and can be revisited later if needed.

Fixes unresolved symbol in linux scons builds.

v2: add comment in code about the 1:1 assumption.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoradv: initial support for shared semaphores (v2)
Dave Airlie [Mon, 27 Feb 2017 19:14:00 +0000 (19:14 +0000)]
radv: initial support for shared semaphores (v2)

This adds support for sharing semaphores using kernel syncobjects.

Syncobj backed semaphores are used for any semaphore which is
created with external flags, and when a semaphore is imported,
otherwise we use the current non-kernel semaphores.

Temporary imports from syncobj fd are also available, these
just override the current user until the next wait, when the
temp syncobj is dropped.

v2: allocate more chunks upfront, fix off by one after
previous refactor of syncobj setup, remove unnecessary null
check.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/winsys: add syncobj hooks
Dave Airlie [Tue, 18 Jul 2017 05:00:44 +0000 (06:00 +0100)]
radv/winsys: add syncobj hooks

This just adds syncobj create/destroy/export/import paths into
the winsys interface.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoac/gpu: add code to detect if kernel supports sync objects.
Dave Airlie [Mon, 5 Jun 2017 00:54:52 +0000 (01:54 +0100)]
ac/gpu: add code to detect if kernel supports sync objects.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoswr/rast: fix memory paths for avx512 optimized avx/sse
Tim Rowley [Thu, 20 Jul 2017 15:51:30 +0000 (10:51 -0500)]
swr/rast: fix memory paths for avx512 optimized avx/sse

Source/destination will not be AVX512 aligned, use the
unaligned load/store intrinsics.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: cache line align hottile buffers
Tim Rowley [Thu, 20 Jul 2017 14:47:11 +0000 (09:47 -0500)]
swr/rast: cache line align hottile buffers

Prevents unalignment crashes with avx512 code on gcc/clang.
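
A minimal sketch of cache line aligned allocation, assuming 64-byte
cache lines and C11 aligned_alloc(). The helper names are hypothetical
and this is not the swr code itself; it only illustrates why a buffer
aligned this way is safe for 64-byte aligned loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64   /* assumption: 64-byte cache lines */

/* Hypothetical helper: allocate size bytes starting on a cache line
 * boundary. C11 aligned_alloc() wants the size to be a multiple of the
 * alignment, so the request is rounded up first. Release with free(). */
static void *
alloc_cache_line_aligned(size_t size)
{
   assert(size > 0);
   size_t rounded = (size + CACHE_LINE_SIZE - 1) &
                    ~(size_t)(CACHE_LINE_SIZE - 1);
   return aligned_alloc(CACHE_LINE_SIZE, rounded);
}

/* Returns non-zero when ptr sits on a cache line boundary. */
static int
is_cache_line_aligned(const void *ptr)
{
   return ((uintptr_t)ptr % CACHE_LINE_SIZE) == 0;
}
```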

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: simdlib changes for clang/gcc
Tim Rowley [Tue, 18 Jul 2017 17:04:41 +0000 (12:04 -0500)]
swr/rast: simdlib changes for clang/gcc

Tested with clang-4.0 and gcc-6.3.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoetnaviv: Avoid duplicates in formats table
Wladimir J. van der Laan [Fri, 21 Jul 2017 10:49:58 +0000 (12:49 +0200)]
etnaviv: Avoid duplicates in formats table

Remove the following duplicates from the formats table:

- R8G8B8A8_UNORM (V_,_T)
- R8G8B8X8_UNORM (_T,_T)
- DXT3_RGBA (_T,_T)

Only the first has an effect because the _T overrides the V_ initializer,
the latter two were harmless duplications of the same.
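
The override behaviour can be sketched as below, assuming the table is
built from C designated initializers. The enum, struct and macro names
are hypothetical stand-ins, not the etnaviv definitions. When the same
index is initialized twice, the later initializer replaces the earlier
one as a whole (compilers may warn about this).

```c
#include <assert.h>

enum fmt { FMT_A, FMT_B, FMT_COUNT };

struct fmt_entry {
   int vtx;   /* hypothetical vertex format code, -1 if unsupported */
   int tex;   /* hypothetical texture format code, -1 if unsupported */
};

/* Hypothetical stand-ins for a vertex-only and a texture-only entry. */
#define ENTRY_V(f, v) [f] = { .vtx = (v), .tex = -1 }
#define ENTRY_T(f, t) [f] = { .vtx = -1, .tex = (t) }

static const struct fmt_entry table[FMT_COUNT] = {
   ENTRY_V(FMT_A, 7),
   ENTRY_T(FMT_A, 3),   /* duplicate index: replaces the entry above */
   ENTRY_T(FMT_B, 5),
};

/* Bounds-checked lookup into the table. */
static const struct fmt_entry *
lookup(enum fmt f)
{
   assert(f < FMT_COUNT);
   return &table[f];
}
```

Here lookup(FMT_A) yields the texture-only entry, so the vertex code 7
from the first initializer is silently lost.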

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
7 years agoetnaviv: Add support for ETC2 texture compression
Wladimir J. van der Laan [Tue, 18 Jul 2017 10:01:14 +0000 (12:01 +0200)]
etnaviv: Add support for ETC2 texture compression

Add support for ETC2 compressed textures in the etnaviv driver.

One step closer towards GL ES 3 support.

For now, treat SRGB and RGB formats the same. It looks like these are
distinguished using a different bit in sampler state, and not part of
the format, but I have not yet been able to confirm this for sure.

(Only enabled on GC3000+ for now, as the GC2000 ETC2 decoder
implementation is buggy and we don't work around that)

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
7 years agogallium/util: Implement util_format_is_etc
Wladimir J. van der Laan [Tue, 18 Jul 2017 10:01:13 +0000 (12:01 +0200)]
gallium/util: Implement util_format_is_etc

This is the equivalent of util_format_is_s3tc, but for ETC.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
7 years agoAndroid: fix spirv_info.c generation
Chih-Wei Huang [Thu, 20 Jul 2017 10:30:57 +0000 (18:30 +0800)]
Android: fix spirv_info.c generation

It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.

See further:
https://patchwork.freedesktop.org/patch/167718/

Also simplify LOCAL_GENERATED_SOURCES.

Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoandroid: fix libmesa_nir build
Tapani Pälli [Wed, 19 Jul 2017 07:12:47 +0000 (10:12 +0300)]
android: fix libmesa_nir build

current build did not find required include 'spirv_info.h'

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agonir: Optimize find_lsb/imsb/umsb error checks
Matt Turner [Fri, 30 Jun 2017 22:48:19 +0000 (15:48 -0700)]
nir: Optimize find_lsb/imsb/umsb error checks

Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agoi965/fs: Match destination type to size for ballot
Matt Turner [Fri, 30 Jun 2017 22:11:15 +0000 (15:11 -0700)]
i965/fs: Match destination type to size for ballot

No use in taking a 64-bit value when we know the high 32-bits are zero.

7 years agonir: Reduce destination size of ballot intrinsic when possible
Matt Turner [Fri, 30 Jun 2017 22:07:10 +0000 (15:07 -0700)]
nir: Reduce destination size of ballot intrinsic when possible

Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.
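
A small software model of the idea, not the NIR pass: the function
names are hypothetical. A ballot sets bit i when invocation i passes
its predicate, so with at most 32 invocations the upper 32 bits of a
64-bit result are always zero and a 32-bit destination is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bit size needed for the ballot result given the
 * largest subgroup size the hardware supports. */
static unsigned
ballot_dest_bit_size(unsigned max_subgroup_size)
{
   assert(max_subgroup_size >= 1 && max_subgroup_size <= 64);
   return max_subgroup_size <= 32 ? 32 : 64;
}

/* Software model of a ballot over n invocations (n <= 64): bit i of
 * the result is set when pred[i] is true. */
static uint64_t
ballot(const bool *pred, unsigned n)
{
   uint64_t mask = 0;

   assert(n <= 64);
   for (unsigned i = 0; i < n; i++) {
      if (pred[i])
         mask |= UINT64_C(1) << i;
   }
   return mask;
}
```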

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Enable ARB_shader_ballot on Gen8+
Matt Turner [Fri, 23 Jun 2017 00:15:28 +0000 (17:15 -0700)]
i965: Enable ARB_shader_ballot on Gen8+

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/fs: Implement ARB_shader_ballot operations
Matt Turner [Thu, 22 Jun 2017 23:46:39 +0000 (16:46 -0700)]
i965/fs: Implement ARB_shader_ballot operations

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/fs: Do not move MOVs writing the flag outside of control flow
Matt Turner [Fri, 30 Jun 2017 21:58:22 +0000 (14:58 -0700)]
i965/fs: Do not move MOVs writing the flag outside of control flow

The implementation of ballotARB() will start by zeroing the flags
register. So, doing something like

        if (gl_SubGroupInvocationARB % 2u == 0u) {
                ... = ballotARB(true);
[...]
        } else {
                ... = ballotARB(true);
[...]
        }

(like fs-ballot-if-else.shader_test does) would generate identical MOVs
to the same destination (the flag register!), and we definitely do not
want to pull that out of the control flow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/fs: Handle explicit flag sources in flags_read()
Francisco Jerez [Thu, 22 Jun 2017 23:42:34 +0000 (16:42 -0700)]
i965/fs: Handle explicit flag sources in flags_read()

The implementations of the ARB_shader_ballot intrinsics will explicitly
read the flag as a source register.

Reviewed-by: Matt Turner <mattst88@gmail.com>