mesa.git
5 years agoRevert "iris: Enable auxiliary buffer support"
Kenneth Graunke [Thu, 21 Feb 2019 23:50:14 +0000 (15:50 -0800)]
Revert "iris: Enable auxiliary buffer support"

This reverts commit cd0ced49e7957182d23e21657445b720184ea425.

It breaks glxgears rendering.

5 years agoiris: Enable -msse2 and -mstackrealign
Kenneth Graunke [Thu, 21 Feb 2019 22:29:00 +0000 (14:29 -0800)]
iris: Enable -msse2 and -mstackrealign

This is needed for gen_clflush.h intrinsics to work on 32-bit builds.
i965 and anv both set these, and iris needs to as well.

Tested-by: Mark Janes <mark.a.janes@intel.com>
5 years agointel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Francisco Jerez [Fri, 18 Jan 2019 19:38:17 +0000 (11:38 -0800)]
intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.

Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence.  Return false from
has_dst_aligned_region_restriction() for such instructions as a
micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Implement extended strides greater than 4 for IR source regions.
Francisco Jerez [Fri, 18 Jan 2019 20:51:57 +0000 (12:51 -0800)]
intel/fs: Implement extended strides greater than 4 for IR source regions.

Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57a501bebb1ca2
in Vulkan 1.1 CTS tests on CHV/BXT platforms; however, it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185de3e821e1 and then (mistakenly?) reverted as
a31d0382084c8aa8.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Cap dst-aligned region stride to maximum representable hstride value.
Francisco Jerez [Thu, 17 Jan 2019 02:49:47 +0000 (18:49 -0800)]
intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Lower integer multiply correctly when destination stride equals 4.
Francisco Jerez [Thu, 17 Jan 2019 03:01:04 +0000 (19:01 -0800)]
intel/fs: Lower integer multiply correctly when destination stride equals 4.

Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Exclude control sources from execution type and region alignment calculations.
Francisco Jerez [Thu, 17 Jan 2019 02:30:08 +0000 (18:30 -0800)]
intel/fs: Exclude control sources from execution type and region alignment calculations.

Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
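
As a rough illustration of the intended calculation, here is a toy Python
model (not Mesa's IR).  The (size, is_control) tuples, the function names,
and the choice of which operands count as control sources are assumptions
made for this sketch.

```python
# Toy model: a source is (type_size_in_bytes, is_control_source).
# Assumption: for mov_indirect, the offset (vgrf2:ud) and the immediate
# (32u) are the control sources; vgrf1:w is the only data source.

def exec_type_size(sources):
    """Widest type size among data sources, ignoring control sources."""
    sizes = [size for size, is_control in sources if not is_control]
    return max(sizes) if sizes else 0


def naive_exec_type_size(sources):
    """The bogus calculation: every source contributes to the result."""
    return max(size for size, _ in sources)


# mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
mov_indirect_sources = [(2, False), (4, True), (4, True)]
```

With the control sources excluded the model reports a 16-bit (2-byte)
execution type, whereas the naive version reports 32 bits.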

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: clone instruction set rather than removing individual entries
Timothy Arceri [Wed, 20 Feb 2019 03:03:37 +0000 (14:03 +1100)]
nir: clone instruction set rather than removing individual entries

This reduces the time spent in nir_opt_cse() by almost half.
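
The idea can be sketched with a toy dominator-tree walk in Python.  This
is not the NIR implementation: the block names, the string "expressions",
and the cse_block() helper are all hypothetical, and a Python set copy
stands in for cloning the instruction set.

```python
def cse_block(block, dom_children, instrs, seen, replaced):
    """Walk the dominator tree, giving each block its own copy of the set."""
    # Clone once on entry instead of removing this block's entries one by
    # one on exit; siblings then never see each other's expressions.
    local = set(seen)
    for expr in instrs[block]:
        if expr in local:
            replaced.append((block, expr))  # redundant, already available
        else:
            local.add(expr)
    for child in dom_children.get(block, []):
        cse_block(child, dom_children, instrs, local, replaced)


# "A" dominates "B" and "C"; "B" and "C" do not dominate each other.
instrs = {"A": ["a+b"], "B": ["c*d", "a+b"], "C": ["c*d"]}
dom_children = {"A": ["B", "C"]}
replaced = []
cse_block("A", dom_children, instrs, set(), replaced)
```

Only the second "a+b" is treated as redundant; "c*d" in "C" is kept
because the copy made for "B" is never visible to its sibling.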

The massif tool from Valgrind reported no change in peak
memory use with the large Dolphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agogenxml: Remove extra space in gen4/45/5 field name
Jordan Justen [Fri, 18 Aug 2017 00:28:23 +0000 (17:28 -0700)]
genxml: Remove extra space in gen4/45/5 field name

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agogenxml/gen_bits_header.py: Use regex to strip non-alphanum chars
Jordan Justen [Thu, 17 Aug 2017 22:44:53 +0000 (15:44 -0700)]
genxml/gen_bits_header.py: Use regex to strip non-alphanum chars
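
A minimal sketch of this kind of regex-based stripping, using Python's
standard re module.  The pattern and the to_alphanum() name are
assumptions for illustration, not the script's actual code.

```python
import re

# Matches any character that is not an ASCII letter or digit.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def to_alphanum(name):
    """Drop every non-alphanumeric character from a field name."""
    return _NON_ALNUM.sub("", name)
```

For example, to_alphanum("3DSTATE_WM::Force Thread") yields
"3DSTATEWMForceThread".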

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoiris: Enable auxiliary buffer support
Kenneth Graunke [Thu, 14 Feb 2019 01:31:52 +0000 (17:31 -0800)]
iris: Enable auxiliary buffer support

This currently regresses KHR-GL4x.compute_shader.resource-texture,
but that's a pre-existing bug (https://bugs.freedesktop.org/109113)
which should be fixed up once we have fast clear support.

5 years agoiris: Flag ALL_DIRTY_BINDINGS on aux state change.
Rafael Antognolli [Wed, 20 Feb 2019 01:12:19 +0000 (17:12 -0800)]
iris: Flag ALL_DIRTY_BINDINGS on aux state change.

If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.
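
In rough terms, and as a Python sketch with made-up bit values (the real
IRIS_DIRTY_* definitions differ, and set_aux_state() is a hypothetical
helper):

```python
# Hypothetical per-stage "bindings dirty" bits, for illustration only.
DIRTY_BINDINGS_VS = 1 << 0
DIRTY_BINDINGS_TCS = 1 << 1
DIRTY_BINDINGS_TES = 1 << 2
DIRTY_BINDINGS_GS = 1 << 3
DIRTY_BINDINGS_FS = 1 << 4
DIRTY_BINDINGS_CS = 1 << 5
ALL_DIRTY_BINDINGS = (DIRTY_BINDINGS_VS | DIRTY_BINDINGS_TCS |
                      DIRTY_BINDINGS_TES | DIRTY_BINDINGS_GS |
                      DIRTY_BINDINGS_FS | DIRTY_BINDINGS_CS)


def set_aux_state(state, resource, new_aux):
    """Change a resource's aux state and conservatively dirty all stages."""
    if resource["aux_state"] != new_aux:
        resource["aux_state"] = new_aux
        # We do not know which stages have this resource bound, so flag
        # every stage's binding table for re-emission.
        state["dirty"] |= ALL_DIRTY_BINDINGS
```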

5 years agoiris: Skip resolve if there's no context.
Rafael Antognolli [Wed, 20 Feb 2019 01:08:14 +0000 (17:08 -0800)]
iris: Skip resolve if there's no context.

If iris_resource_get_handle() gets called without a context, we can't
resolve the resource. Hopefully it shouldn't be compressed anyway, so
let's just add an assert to ensure it's correct.

5 years agoiris/clear: Pass on render_condition_enabled.
Rafael Antognolli [Fri, 15 Feb 2019 23:23:56 +0000 (15:23 -0800)]
iris/clear: Pass on render_condition_enabled.

5 years agoiris: Avoid leaking if we fail to allocate the aux buffer.
Rafael Antognolli [Fri, 15 Feb 2019 22:16:04 +0000 (14:16 -0800)]
iris: Avoid leaking if we fail to allocate the aux buffer.

Otherwise we could leak the aux state map or the aux BO.

5 years agoiris: Only resolve compute resources for compute shaders
Kenneth Graunke [Thu, 14 Feb 2019 07:10:39 +0000 (23:10 -0800)]
iris: Only resolve compute resources for compute shaders

5 years agoiris: Fix aux usage in render resolve code
Kenneth Graunke [Thu, 14 Feb 2019 06:31:07 +0000 (22:31 -0800)]
iris: Fix aux usage in render resolve code

5 years agoiris: Pin HiZ buffers when rendering.
Rafael Antognolli [Wed, 13 Feb 2019 18:20:41 +0000 (10:20 -0800)]
iris: Pin HiZ buffers when rendering.

5 years agoiris: Flush before hiz_exec.
Rafael Antognolli [Wed, 6 Feb 2019 00:40:14 +0000 (16:40 -0800)]
iris: Flush before hiz_exec.

5 years agoiris: Allow disabling aux via INTEL_DEBUG options
Kenneth Graunke [Tue, 11 Dec 2018 08:43:05 +0000 (00:43 -0800)]
iris: Allow disabling aux via INTEL_DEBUG options

5 years agoiris: do flush for buffers still
Kenneth Graunke [Tue, 11 Dec 2018 07:13:23 +0000 (23:13 -0800)]
iris: do flush for buffers still

5 years agoiris: make surface states for CCS_D too
Kenneth Graunke [Tue, 11 Dec 2018 06:41:34 +0000 (22:41 -0800)]
iris: make surface states for CCS_D too

CCS_E can fall back to CCS_D with incompatible format views

CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...

5 years agoiris: Skip msaa16 on gen < 9.
Rafael Antognolli [Mon, 4 Feb 2019 23:16:18 +0000 (15:16 -0800)]
iris: Skip msaa16 on gen < 9.

Also needed to add gen information to KEY_INIT.

5 years agoiris: Set program key fields for MCS
Kenneth Graunke [Tue, 11 Dec 2018 06:03:14 +0000 (22:03 -0800)]
iris: Set program key fields for MCS

5 years agoiris: don't use hiz for MSAA buffers
Kenneth Graunke [Tue, 11 Dec 2018 05:54:44 +0000 (21:54 -0800)]
iris: don't use hiz for MSAA buffers

5 years agoiris: some initial HiZ bits
Kenneth Graunke [Mon, 10 Dec 2018 08:35:48 +0000 (00:35 -0800)]
iris: some initial HiZ bits

5 years agoiris: disable aux for external things
Kenneth Graunke [Mon, 10 Dec 2018 07:12:33 +0000 (23:12 -0800)]
iris: disable aux for external things

5 years agoiris: Resolves for compute
Kenneth Graunke [Mon, 10 Dec 2018 03:08:40 +0000 (19:08 -0800)]
iris: Resolves for compute

5 years agoiris: consider framebuffer parameter for aux usages
Kenneth Graunke [Mon, 10 Dec 2018 03:07:13 +0000 (19:07 -0800)]
iris: consider framebuffer parameter for aux usages

5 years agoiris: Make blit code use actual aux usages
Kenneth Graunke [Mon, 10 Dec 2018 00:09:55 +0000 (16:09 -0800)]
iris: Make blit code use actual aux usages

5 years agoiris: store modifier info in res
Kenneth Graunke [Sun, 9 Dec 2018 20:11:17 +0000 (12:11 -0800)]
iris: store modifier info in res

5 years agoiris: pin the buffers
Kenneth Graunke [Sat, 8 Dec 2018 19:52:55 +0000 (11:52 -0800)]
iris: pin the buffers

5 years agoiris: resolve before transfer maps
Kenneth Graunke [Sat, 8 Dec 2018 19:40:25 +0000 (11:40 -0800)]
iris: resolve before transfer maps

5 years agoiris: be sure to skip buffers in resolve code
Kenneth Graunke [Sat, 8 Dec 2018 10:01:19 +0000 (02:01 -0800)]
iris: be sure to skip buffers in resolve code

Buffers don't have ISL surfaces, and this can get us into trouble.

5 years agoiris: try to fix copyimage vs copybuffers
Kenneth Graunke [Sat, 8 Dec 2018 09:32:10 +0000 (01:32 -0800)]
iris: try to fix copyimage vs copybuffers

5 years agoiris: actually use the multiple surf states for aux modes
Kenneth Graunke [Sat, 8 Dec 2018 03:51:05 +0000 (19:51 -0800)]
iris: actually use the multiple surf states for aux modes

5 years agoiris: add some draw resolve hooks
Kenneth Graunke [Sat, 8 Dec 2018 02:13:07 +0000 (18:13 -0800)]
iris: add some draw resolve hooks

5 years agoiris: blorp using resolve hooks
Kenneth Graunke [Fri, 7 Dec 2018 21:33:25 +0000 (13:33 -0800)]
iris: blorp using resolve hooks

5 years agoiris: Initial import of resolve code
Kenneth Graunke [Fri, 7 Dec 2018 19:54:16 +0000 (11:54 -0800)]
iris: Initial import of resolve code

5 years agoiris: create aux surface if needed
Kenneth Graunke [Fri, 7 Dec 2018 19:54:02 +0000 (11:54 -0800)]
iris: create aux surface if needed

5 years agoiris: Fill out SURFACE_STATE entries for each possible aux usage
Kenneth Graunke [Fri, 7 Dec 2018 19:33:13 +0000 (11:33 -0800)]
iris: Fill out SURFACE_STATE entries for each possible aux usage

5 years agoiris: Fill out res->aux.possible_usages
Kenneth Graunke [Fri, 7 Dec 2018 19:02:50 +0000 (11:02 -0800)]
iris: Fill out res->aux.possible_usages

5 years agoiris: Add iris_resource fields for aux surfaces
Kenneth Graunke [Fri, 7 Dec 2018 18:46:04 +0000 (10:46 -0800)]
iris: Add iris_resource fields for aux surfaces

But without fast clears or HiZ per-level tracking just yet.

5 years agoiris: Emit default L3 config for the render pipeline
Jordan Justen [Thu, 14 Feb 2019 10:26:53 +0000 (02:26 -0800)]
iris: Emit default L3 config for the render pipeline

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agoiris: Always emit at least one BLEND_STATE
Kenneth Graunke [Fri, 15 Feb 2019 22:22:52 +0000 (14:22 -0800)]
iris: Always emit at least one BLEND_STATE

5 years agoiris: Add missing depth cache flushes
Kenneth Graunke [Thu, 14 Feb 2019 09:05:57 +0000 (01:05 -0800)]
iris: Add missing depth cache flushes

5 years agoiris: Simplify iris_get_depth_stencil_resources
Kenneth Graunke [Thu, 14 Feb 2019 06:12:01 +0000 (22:12 -0800)]
iris: Simplify iris_get_depth_stencil_resources

We can safely assume that the given resource is depth, depth/stencil,
or stencil already.  The stencil-only case is easily detectable with
a single format check, and all other cases are handled identically.

This saves some CPU overhead.

5 years agoiris: Make an IRIS_MAX_MIPLEVELS define
Kenneth Graunke [Thu, 14 Feb 2019 00:41:46 +0000 (16:41 -0800)]
iris: Make an IRIS_MAX_MIPLEVELS define

5 years agoiris: Store internal_format when getting resource from handle.
Rafael Antognolli [Wed, 13 Feb 2019 21:07:51 +0000 (13:07 -0800)]
iris: Store internal_format when getting resource from handle.

5 years agoiris: Move create and bind driver hooks to the end of iris_program.c
Kenneth Graunke [Tue, 1 Jan 2019 23:16:44 +0000 (15:16 -0800)]
iris: Move create and bind driver hooks to the end of iris_program.c

This just moves the code for dealing with pipe_shader_state /
pipe_compute_state / iris_uncompiled_shader to the end of the file.
Now that those do precompiles, they want to call the actual compile
functions.  Putting them at the end eliminates the need for a bunch
of prototypes.

5 years agoiris: implement clearing render target and depth stencil
Timur Kristóf [Mon, 11 Feb 2019 01:13:29 +0000 (02:13 +0100)]
iris: implement clearing render target and depth stencil

v2 (Kenneth Graunke): split color/depthstencil cases, fix iris_clear

5 years agoiris: Drop XXX about checking for swizzling
Kenneth Graunke [Tue, 12 Feb 2019 06:36:45 +0000 (22:36 -0800)]
iris: Drop XXX about checking for swizzling

Caio noted that this is not necessary on Gen8+:

   "Before Gen8, there was a historical configuration control field to
    swizzle address bit[6] for in X/Y tiling modes.  This was set in
    three different places: TILECTL[1:0], ARB_MODE[5:4], and
    DISP_ARB_CTL[14:13].  For Gen8 and subsequent generations, the
    swizzle fields are all reserved, and the CPU's memory controller
    performs all address swizzling modifications."

Since we don't support earlier hardware, we can skip it entirely.

5 years agoiris: Set HasWriteableRT correctly
Kenneth Graunke [Mon, 11 Feb 2019 20:07:51 +0000 (12:07 -0800)]
iris: Set HasWriteableRT correctly

A bit of irritating state cross dependency here, but nothing too hard

5 years agoiris: Set 3DSTATE_WM::ForceThreadDispatchEnable
Kenneth Graunke [Mon, 11 Feb 2019 22:22:50 +0000 (14:22 -0800)]
iris: Set 3DSTATE_WM::ForceThreadDispatchEnable

The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.

(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct.  But it shouldn't hurt to do so.)

5 years agoiris: Drop XXX about alpha testing
Kenneth Graunke [Mon, 11 Feb 2019 19:40:38 +0000 (11:40 -0800)]
iris: Drop XXX about alpha testing

I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.

5 years agoiris: improve PIPE_CAP_VIDEO_MEMORY bogus value
Andre Heider [Wed, 6 Feb 2019 09:53:18 +0000 (10:53 +0100)]
iris: improve PIPE_CAP_VIDEO_MEMORY bogus value

-1 is a little too bogus for most games ;)

Signed-off-by: Andre Heider <a.heider@gmail.com>
5 years agoiris: fix build with gallium nine
Andre Heider [Wed, 6 Feb 2019 01:26:45 +0000 (02:26 +0100)]
iris: fix build with gallium nine

Signed-off-by: Andre Heider <a.heider@gmail.com>
5 years agoiris: Stop chopping off the first nine characters of the renderer string
Kenneth Graunke [Mon, 11 Feb 2019 19:05:48 +0000 (11:05 -0800)]
iris: Stop chopping off the first nine characters of the renderer string

5 years agoiris: rework num textures to util_lastbit
Kenneth Graunke [Sun, 13 Jan 2019 19:36:10 +0000 (11:36 -0800)]
iris: rework num textures to util_lastbit

5 years agoiris: Add PIPE_CAP_MAX_VARYINGS
Kenneth Graunke [Sun, 10 Feb 2019 22:23:45 +0000 (14:23 -0800)]
iris: Add PIPE_CAP_MAX_VARYINGS

5 years agoiris: Make an iris_batch_reference_signal_syncpt helper function.
Kenneth Graunke [Thu, 7 Feb 2019 16:48:38 +0000 (08:48 -0800)]
iris: Make an iris_batch_reference_signal_syncpt helper function.

Suggested by Chris Wilson.  This makes it more obvious what's going on.

5 years agoiris: Use READ_ONCE and WRITE_ONCE for snapshots_landed
Kenneth Graunke [Thu, 7 Feb 2019 16:42:50 +0000 (08:42 -0800)]
iris: Use READ_ONCE and WRITE_ONCE for snapshots_landed

Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads.  It may also be necessary for
the compiler in a few cases.

5 years agoiris: Fix accidental busy-looping in query waits
Kenneth Graunke [Thu, 7 Feb 2019 16:41:29 +0000 (08:41 -0800)]
iris: Fix accidental busy-looping in query waits

When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0.  This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???".  This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed.  We want to sleep.

Add an effectively infinite timeout so that we sleep.
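
The difference can be modelled in Python with threading.Event standing
in for the sync-point.  This is an analogy rather than the driver code:
wait_for_snapshots() is a hypothetical helper, and a timeout of None
plays the role of the effectively infinite timeout.

```python
import threading


def wait_for_snapshots(landed, timeout_s):
    """Loop until 'landed' is set; return how many waits were issued."""
    polls = 0
    while not landed.is_set():
        # timeout_s == 0 returns immediately, so the loop spins.
        # timeout_s is None blocks until the event is set, so we sleep.
        landed.wait(timeout_s)
        polls += 1
    return polls
```

With a zero timeout the loop issues one wait per iteration until the
event fires; with no timeout it issues at most one wait and sleeps for
the rest of the time.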

5 years agoiris: Add a timeout_nsec parameter, rename check_syncpt to wait_syncpt
Kenneth Graunke [Thu, 7 Feb 2019 17:40:00 +0000 (09:40 -0800)]
iris: Add a timeout_nsec parameter, rename check_syncpt to wait_syncpt

I want to be able to wait with a non-zero timeout from elsewhere.

5 years agoiris: Don't allocate a BO per query object
Sagar Ghuge [Tue, 15 Jan 2019 22:15:07 +0000 (14:15 -0800)]
iris: Don't allocate a BO per query object

Instead of allocating a 4K BO per query object, we can create a large
blob of memory and split it into pieces as required.

With one BO shared by multiple query objects, we don't want to wait on
all of them.  Instead, when we write the last snapshot, we create a sync
point, and check that sync point while waiting on a particular object.
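
A minimal sketch of the suballocation idea in Python, assuming a 4K
backing buffer and a made-up 32-byte slot per query.  QueryPool and both
sizes are hypothetical, and a bytearray stands in for a BO.

```python
BO_SIZE = 4096    # assumed size of each backing buffer
SLOT_SIZE = 32    # assumed bytes of snapshot storage per query


class QueryPool:
    """Hands out (bo_index, offset) slots carved from shared buffers."""

    def __init__(self):
        self.bos = []
        self.next_offset = BO_SIZE  # forces an allocation on first use

    def alloc(self):
        # Start a fresh buffer only when the current one is full.
        if self.next_offset + SLOT_SIZE > BO_SIZE:
            self.bos.append(bytearray(BO_SIZE))
            self.next_offset = 0
        slot = (len(self.bos) - 1, self.next_offset)
        self.next_offset += SLOT_SIZE
        return slot
```

With these sizes, 128 queries share one buffer before a second one is
needed.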

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
5 years agoiris: Implement ALT mode for ARB_{vertex,fragment}_shader
Kenneth Graunke [Tue, 5 Feb 2019 07:36:47 +0000 (23:36 -0800)]
iris: Implement ALT mode for ARB_{vertex,fragment}_shader

Fixes gl-1.0-spot-light

5 years agoiris: Fix bug in bound vertex buffer tracking
Kenneth Graunke [Mon, 7 Jan 2019 04:22:15 +0000 (20:22 -0800)]
iris: Fix bug in bound vertex buffer tracking

res might be NULL, at which point this is an unbind.
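
Sketched in Python with a plain bitmask (set_vertex_buffer() and the
mask layout are assumptions, not the driver's code):

```python
def set_vertex_buffer(bound_mask, index, res):
    """Return the updated bound-buffer mask after binding slot 'index'."""
    if res is not None:
        return bound_mask | (1 << index)
    # A missing resource means the slot is being unbound.
    return bound_mask & ~(1 << index)
```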

5 years agoiris: minor tidying
Kenneth Graunke [Thu, 24 Jan 2019 17:26:38 +0000 (09:26 -0800)]
iris: minor tidying

5 years agoiris: Unreference some more things on state module teardown
Kenneth Graunke [Thu, 24 Jan 2019 17:01:53 +0000 (09:01 -0800)]
iris: Unreference some more things on state module teardown

5 years agoiris: Drop dead state_size hash table
Kenneth Graunke [Thu, 24 Jan 2019 01:03:54 +0000 (17:03 -0800)]
iris: Drop dead state_size hash table

I inherited this from i965.  It would be nice to track the state size
so INTEL_DEBUG=color,bat decoding can print the right number of e.g.
binding table entries or blend states, but...without a single point
of entry for state, it's a little tricky to get right.  Punt for now,
and drop the dead code in the meantime.

5 years agoiris: Drop comment about ISP_DIS
Kenneth Graunke [Thu, 24 Jan 2019 00:58:30 +0000 (16:58 -0800)]
iris: Drop comment about ISP_DIS

i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in
restoring the constants from the context.  Iris actually re-pins the
constant buffers properly across the batch, and avoids re-emitting the
constant packets unless it's necessary.  So, we don't want ISP_DIS.

5 years agoiris: Enable PIPE_CAP_COMPACT_ARRAYS
Kenneth Graunke [Wed, 23 Jan 2019 10:58:59 +0000 (02:58 -0800)]
iris: Enable PIPE_CAP_COMPACT_ARRAYS

5 years agoiris: Remap stream output indexes back to VARYING_SLOT_*.
Kenneth Graunke [Wed, 23 Jan 2019 07:28:39 +0000 (23:28 -0800)]
iris: Remap stream output indexes back to VARYING_SLOT_*.

Previously I had a hack in st/mesa to make it stop remapping
VARYING_SLOT_* into the naively compacted slots, which aren't
what we want.  But that wasn't very feasible, as we'd have to
update all drivers, or add capability bits, and it gets messy fast.

It turns out that I can map back to VARYING_SLOT_* in about 5 LOC,
so let's just do that.  It removes the need for hacks, and is easy.

This also fixes KHR-GL46.enhanced_layouts.xfb_capture_struct, which
apparently with my hack was still getting the wrong slot info.

5 years agoiris: Zero the compute predicate when changing the render condition
Kenneth Graunke [Tue, 22 Jan 2019 22:22:55 +0000 (14:22 -0800)]
iris: Zero the compute predicate when changing the render condition

1. Set a render condition.  We emit it immediately on the render
   engine, and stash q->bo as ice->state.compute_predicate in case
   the compute engine needs it.

2. Clear the render condition.  We were incorrectly leaving a stale
   compute_predicate kicking around...

3. Dispatch compute.  We would then read the stale compute predicate,
   and try to load it into MI_PREDICATE_DATA.  But q->bo may have been
   freed altogether, causing us to try and use garbage memory as a BO,
   adding it to the validation list, failing asserts, and tripping
   EINVALs in execbuf.
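
The sequence above can be modelled with a small Python sketch.  The
Context class and its method are hypothetical stand-ins; only the
compute_predicate name comes from the description.

```python
class Context:
    def __init__(self):
        self.compute_predicate = None

    def render_condition(self, query_bo):
        """Set (query_bo given) or clear (query_bo is None) the condition."""
        # Always reset first, so that clearing the render condition can
        # never leave a stale BO behind for a later compute dispatch.
        self.compute_predicate = None
        if query_bo is None:
            return
        self.compute_predicate = query_bo
```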

Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it.  Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.

5 years agoiris: always include an extra constbuf0 if using UBOs
Caio Marcelo de Oliveira Filho [Sat, 19 Jan 2019 19:32:37 +0000 (11:32 -0800)]
iris: always include an extra constbuf0 if using UBOs

In st_nir_lower_uniforms_to_ubo(), every UBO access in the shader has
its index incremented to make room for uniforms in constbuf0.  So if
we use UBOs, we always need to include the extra binding entry in the
table.

To avoid doing these checks both when compiling the shader and when
assigning binding tables, store the num_cbufs in iris_compiled_shader.
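
One way to picture the resulting count, as a Python sketch.  The
function and its parameters are assumptions for illustration; only the
num_cbufs name comes from the text above.

```python
def count_cbufs(num_ubos, num_uniforms, num_system_values):
    """Number of constant buffer binding entries a shader needs."""
    num_cbufs = num_ubos
    # UBO indices were shifted up by one to leave slot 0 for uniforms,
    # so slot 0 must exist whenever any UBO is used, even with no uniforms.
    if num_ubos > 0 or num_uniforms > 0 or num_system_values > 0:
        num_cbufs += 1
    return num_cbufs
```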

Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values.  Note that some tests fitting these criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Do binder address allocations per-context, not globally.
Kenneth Graunke [Fri, 18 Jan 2019 20:26:41 +0000 (12:26 -0800)]
iris: Do binder address allocations per-context, not globally.

iris_bufmgr allocates addresses across the entire screen, since buffers
may be shared between multiple contexts.  There used to be a single
special address, IRIS_BINDER_ADDRESS, that was per-context - and all
contexts used the same address.  When I moved to the multi-binder
system, I made a separate memory zone for them.  I wanted there to be
2-3 binders per context, so we could cycle them to avoid the stalls
inherent in pinning two buffers to the same address in back-to-back
batches.  But I figured I'd allow 100 binders just to be wildly
excessive/cautious.

What I didn't realize was that we need 2-3 binders per *context*,
and what I did was allocate 100 binders per *screen*.  Web browsers,
for example, might have 1-2 contexts per tab, leading to hundreds of
contexts, and thus binders.

To fix this, we stop allocating VMA for binders in bufmgr, and let
the binder handle it itself.  Binders are per-context, and they can
assign context-local addresses for the buffers by simply doing a
ringbuffer style approach.  We only hold on to one binder BO at a
time, so we won't ever have a conflicting address.
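
A ringbuffer-style address allocator along these lines might look like
the following Python sketch.  The Binder class, the zone bounds, and the
BO size are all hypothetical values chosen for illustration.

```python
class Binder:
    """Per-context allocator that cycles addresses through a fixed zone."""

    def __init__(self, zone_start, zone_size, bo_size):
        self.zone_start = zone_start
        self.zone_end = zone_start + zone_size
        self.bo_size = bo_size
        self.next_address = zone_start

    def alloc_bo_address(self):
        # Wrap around once the next BO would no longer fit in the zone.
        if self.next_address + self.bo_size > self.zone_end:
            self.next_address = self.zone_start
        address = self.next_address
        self.next_address += self.bo_size
        return address
```

Because consecutive binder BOs get different addresses, back-to-back
batches do not pin two buffers to the same address, and each context
needs only its own small zone.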

This fixes dEQP-EGL.functional.multicontext.non_shared_clear.

Huge thanks to Tapani Pälli for debugging this whole mess and
figuring out what was going wrong.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoiris: Fix memzone_for_address for the surface and binder zones
Kenneth Graunke [Fri, 18 Jan 2019 20:20:43 +0000 (12:20 -0800)]
iris: Fix memzone_for_address for the surface and binder zones

We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS
lives at the very start of that zone.  However, IRIS_MEMZONE_SURFACE and
IRIS_MEMZONE_BINDER are normal zones.  They used to be a single zone
(surface) with a single binder BO at the beginning, similar to the
border color pool.  But when I moved us to multiple binders, I made them
have a real zone (if a small one).  So both zones should use >=.
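
The comparison rules can be illustrated with a Python sketch over a
made-up address layout.  The constants below are hypothetical and far
smaller than real GPU addresses; only the shape of the checks matters.

```python
# Hypothetical zone layout, for illustration only.
BINDER_START = 0x1000
SURFACE_START = 0x2000
DYNAMIC_START = 0x3000
BORDER_COLOR_POOL_ADDRESS = DYNAMIC_START  # first address of the zone
OTHER_START = 0x4000


def memzone_for_address(address):
    if address >= OTHER_START:
        return "other"
    if address == BORDER_COLOR_POOL_ADDRESS:
        return "border_color_pool"
    # Strict '>' because the border color pool sits at the zone's start.
    if address > DYNAMIC_START:
        return "dynamic"
    # Ordinary zones own their first address, so these use '>='.
    if address >= SURFACE_START:
        return "surface"
    if address >= BINDER_START:
        return "binder"
    return "shader"
```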

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoiris: Don't whack SO dirty bits when finishing a BLORP op
Kenneth Graunke [Fri, 18 Jan 2019 08:01:05 +0000 (00:01 -0800)]
iris: Don't whack SO dirty bits when finishing a BLORP op

Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero
offsets.  Plus, it's just not necessary - BLORP doesn't change these.

5 years agoiris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits
Kenneth Graunke [Wed, 16 Jan 2019 09:19:44 +0000 (01:19 -0800)]
iris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits

INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting
3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write
offsets.  We would need to alter them to use 0xFFFFFFFF for the offset.

Also, have each upload function only flag bits relevant to its own
pipeline.

5 years agoiris: CS stall on VF cache invalidate workarounds
Kenneth Graunke [Fri, 18 Jan 2019 07:44:09 +0000 (23:44 -0800)]
iris: CS stall on VF cache invalidate workarounds

See commit 31e4c9ce400341df9b0136419b3b3c73b8c9eb7e in i965.

5 years agoiris: Pay attention to blit masks
Kenneth Graunke [Wed, 16 Jan 2019 10:02:19 +0000 (02:02 -0800)]
iris: Pay attention to blit masks

For combined depth/stencil formats, we may want to only blit one half.
If PIPE_BLIT_Z is set, blit depth; if PIPE_BLIT_S is set, blit stencil.
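
As a small Python sketch of the mask handling (the bit values and the
aspects_to_blit() helper are assumptions, not Gallium definitions):

```python
BLIT_Z = 1 << 0  # hypothetical bit for the depth half
BLIT_S = 1 << 1  # hypothetical bit for the stencil half


def aspects_to_blit(mask):
    """List which halves of a depth/stencil surface a blit should copy."""
    aspects = []
    if mask & BLIT_Z:
        aspects.append("depth")
    if mask & BLIT_S:
        aspects.append("stencil")
    return aspects
```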

5 years agoiris: Assert about blits with color masking
Kenneth Graunke [Wed, 16 Jan 2019 09:53:00 +0000 (01:53 -0800)]
iris: Assert about blits with color masking

st/mesa never asks for this today, but in theory someone might, and we
don't support it.

5 years agoiris: Don't enable smooth points when point sprites are enabled
Kenneth Graunke [Wed, 16 Jan 2019 07:41:34 +0000 (23:41 -0800)]
iris: Don't enable smooth points when point sprites are enabled

dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_*.primitives.points

5 years agoiris: Allow sample mask of 0
Kenneth Graunke [Wed, 16 Jan 2019 07:22:48 +0000 (23:22 -0800)]
iris: Allow sample mask of 0

I think this was an attempt to work around various sample mask bugs I
had early on.  It's not correct.  A sample mask of 0 is legal and means
to disable all samples.

Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*

5 years agoiris: fail to create screen for older unsupported HW
Kenneth Graunke [Mon, 14 Jan 2019 08:25:23 +0000 (00:25 -0800)]
iris: fail to create screen for older unsupported HW

loader shouldn't try, but let's be paranoid

5 years agoiris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability
Kenneth Graunke [Fri, 11 Jan 2019 21:39:04 +0000 (13:39 -0800)]
iris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability

I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type.  Use that instead, fixing
pipeline statistics queries since the rebase.

5 years agoiris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers.
Kenneth Graunke [Fri, 11 Jan 2019 08:21:06 +0000 (00:21 -0800)]
iris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers.

5 years agoiris: Fix Broadwell WaDividePSInvocationCountBy4
Kenneth Graunke [Fri, 11 Jan 2019 08:28:07 +0000 (00:28 -0800)]
iris: Fix Broadwell WaDividePSInvocationCountBy4

We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result().  We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.

Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.
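
The intended arrangement, sketched in Python with simplified signatures.
The parameters are assumptions; the real functions operate on query
objects rather than plain integers.

```python
def calculate_result_on_cpu(start, end, divide_by_4):
    """Compute a query result from two snapshots on the CPU."""
    result = end - start
    # Apply the workaround here, so that every consumer of the CPU path
    # (including writes to a QBO) sees the divided value.
    if divide_by_4:
        result //= 4
    return result


def get_query_result(start, end, divide_by_4):
    # No second division: the CPU path above has already applied it.
    return calculate_result_on_cpu(start, end, divide_by_4)
```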

5 years agoiris: Delete genx->bound_vertex_buffers
Kenneth Graunke [Sun, 6 Jan 2019 23:56:26 +0000 (15:56 -0800)]
iris: Delete genx->bound_vertex_buffers

This is actually stored in ice->state, as it isn't gen-specific

5 years agoiris: Drop a dead comment
Kenneth Graunke [Fri, 4 Jan 2019 06:34:49 +0000 (22:34 -0800)]
iris: Drop a dead comment

5 years agoiris: Don't check other batches for our batch BO
Kenneth Graunke [Wed, 2 Jan 2019 10:45:00 +0000 (02:45 -0800)]
iris: Don't check other batches for our batch BO

This is an awkward corner case.  We create batches in order, each of
which creates and pins a BO.  The other batches may not be set up yet,
so it may not be safe to ask whether they reference a BO.

Just avoid this for now.  We could avoid it for other context-local BOs
too, but we currently don't have a flag for that (and I'm not certain
whether it's worth it).

5 years agoiris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat
Kenneth Graunke [Tue, 1 Jan 2019 06:03:35 +0000 (22:03 -0800)]
iris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat

Various places in the transfer code need to know whether they must
read the existing resource's values.  Rather than checking both flags
everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag
PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can
discard a subrange, too.
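
In Python terms, with illustrative flag values (not necessarily the ones
Gallium defines) and hypothetical helper names:

```python
DISCARD_RANGE = 1 << 8            # illustrative value
DISCARD_WHOLE_RESOURCE = 1 << 12  # illustrative value


def normalize_usage(usage):
    """Make 'discard whole resource' imply 'discard range'."""
    if usage & DISCARD_WHOLE_RESOURCE:
        usage |= DISCARD_RANGE
    return usage


def must_read_existing(usage):
    """After normalization, a single flag check is enough."""
    return not (usage & DISCARD_RANGE)
```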

Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE,
but eventually u_threaded_context should handle swapping out buffers
for new idle buffers, anyway.  In the meantime, this is at least better.

5 years agoiris: Flush the render cache in flush_and_dirty_for_history
Kenneth Graunke [Sun, 23 Dec 2018 05:24:02 +0000 (21:24 -0800)]
iris: Flush the render cache in flush_and_dirty_for_history

BLORP uses the render engine to write to buffers, and we need to flush
that data out to the actual surface (finishing the write).  Then, the
rest of this function invalidates any caches that might have stale data
which needs to be refetched.

5 years agoiris: Implement multi-slice copy_region
Kenneth Graunke [Mon, 24 Dec 2018 07:04:37 +0000 (23:04 -0800)]
iris: Implement multi-slice copy_region

I don't know if this is required - surprisingly, I haven't seen it
matter - but I'd like to use it for multi-slice transfer maps.  We may
as well do the right thing.

5 years agoiris: Leave a comment about why Broadwell images are broken
Kenneth Graunke [Mon, 31 Dec 2018 17:19:07 +0000 (09:19 -0800)]
iris: Leave a comment about why Broadwell images are broken

There are a variety of ways to fix this, many of which are simple, but
I could use some advice on which ones other people prefer, and so we'll
punt until after the holidays.

5 years agoiris: Fix surface states for Gen8 lowered-to-untyped images
Kenneth Graunke [Wed, 26 Dec 2018 10:06:13 +0000 (02:06 -0800)]
iris: Fix surface states for Gen8 lowered-to-untyped images

We have to use SURFTYPE_BUFFER and ISL_FORMAT_RAW for these.

5 years agoiris: Fill out brw_image_params for storage images on Broadwell
Kenneth Graunke [Fri, 30 Nov 2018 10:27:07 +0000 (02:27 -0800)]
iris: Fill out brw_image_params for storage images on Broadwell

5 years agoiris: Don't make duplicate system values
Kenneth Graunke [Thu, 27 Dec 2018 09:27:44 +0000 (01:27 -0800)]
iris: Don't make duplicate system values

We were relying on CSE/GVN/etc to coalesce all intrinsics that load the
same value, but that's a bad idea.  We might have a couple intrinsics
that reload the same value.  If so, we only want to set up the uniform
on the first one we see.
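
A sketch of the deduplication in Python.  The intrinsic names and
assign_system_values() are hypothetical; the point is only that a
repeated load reuses the slot set up for the first one.

```python
def assign_system_values(intrinsics):
    """Map each distinct system value to one uniform slot."""
    slots = {}
    for name in intrinsics:
        # Only the first occurrence sets up a uniform; later loads of the
        # same value reuse that slot.
        if name not in slots:
            slots[name] = len(slots)
    return slots
```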

5 years agoiris: Don't enable push constants just because there are system values
Kenneth Graunke [Thu, 27 Dec 2018 08:49:56 +0000 (00:49 -0800)]
iris: Don't enable push constants just because there are system values

System values are built-in uniforms.  We set them up as UBO values, and
might pull or push them.  UBO push analysis will take care of that.  We
only want to enable push constants if there's an actual range being
pushed.  Otherwise, we might get into a scenario where 3DSTATE_PS
enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.

This fixes GPU hangs in Broadwell image load store tests which have
unused image param system values but no other uniforms.  (We shouldn't
be making those anyway, but that's a separate fix...)

5 years agoiris: Fix framebuffer layer count
Kenneth Graunke [Mon, 24 Dec 2018 02:22:44 +0000 (18:22 -0800)]
iris: Fix framebuffer layer count

cso_fb->layers is only valid for no-attachment framebuffers.  Use the
helper function to get the real value, then stash it so we don't have
to call the helper function on the old value for comparison, or at draw
time for Force Zero RTA Index setting.

This fixes Force Zero RTA Index being set even when attempting layered
rendering.