Denis Pauk [Tue, 26 Jun 2018 20:30:49 +0000 (23:30 +0300)]
mesa: add header for shared bptc decompress functions
Move shared bptc functions to texcompress_bptc_tmp.h:
* fetch_rgba_unorm_from_block
* fetch_rgb_float_from_block
* compress_rgba_unorm
* compress_rgb_float
Create decompress functions:
* decompress_rgba_unorm
* decompress_rgb_float
Functions will be reused in gallium/auxiliary code.
v2: Add block decompress function
v3: Move all shared code to header
Suggested-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Sat, 30 Jun 2018 04:57:08 +0000 (00:57 -0400)]
glsl/cache: save and restore ExternalSamplersUsed
Shaders that need special code for external samplers were broken if
they were loaded from the cache.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Timothy Arceri [Mon, 4 Jun 2018 06:26:46 +0000 (16:26 +1000)]
nir: fix selection of loop terminator when two or more have the same limit
We need to add loop terminators to the list in the order we come
across them; otherwise, if two or more have the same exit condition,
we will select the last one rather than the first one even though
it's unreachable.
This fix is for simple unrolls where we only have a single exit
point. When unrolling these types of loops the unreachable
terminators and their unreachable branch are removed prior to
unrolling. Because of the logic change we also switch some
list access in the complex unrolling logic to avoid breakage.
Fixes: 6772a17acc8e ("nir: Add a loop analysis pass")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
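The ordering issue described in the nir loop terminator fix above can be illustrated with a small sketch. This is not the NIR code: `struct terminator` and `pick_limiting_terminator()` are hypothetical names, and the tie-break rule (a strict `<`, so the earliest entry wins) is an assumption chosen to show why list order matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop terminator; not the real NIR struct. */
struct terminator {
    int discovery_index; /* order in which the terminator was found */
    int iterations;      /* trip count implied by its exit condition */
};

/* Return the index of the limiting terminator. The strict '<' keeps the
 * earliest entry on a tie, so the array must be in discovery order for the
 * first (reachable) terminator to be selected. */
static size_t pick_limiting_terminator(const struct terminator *t, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (t[i].iterations < t[best].iterations)
            best = i;
    }
    return best;
}
```

With the list in discovery order, a tie between the first two terminators resolves to the first one. With the same list reversed, the same function returns the later, unreachable terminator, which is the failure mode the commit message describes.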
Timothy Arceri [Mon, 25 Jun 2018 10:31:02 +0000 (20:31 +1000)]
radeonsi: enable OpenGL 4.4 compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 20 Jun 2018 03:05:05 +0000 (13:05 +1000)]
mesa: enable ARB_vertex_attrib_64bit in compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Thu, 28 Jun 2018 05:31:09 +0000 (15:31 +1000)]
mesa: add outstanding ARB_vertex_attrib_64bit dlist support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Thu, 28 Jun 2018 02:40:20 +0000 (12:40 +1000)]
vbo_save: add support for doubles to display list code
Required for ARB_vertex_attrib_64bit compat profile support.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Mon, 25 Jun 2018 00:32:58 +0000 (10:32 +1000)]
mesa: add compat profile support for ARB_multi_draw_indirect
v2: add missing ARB_base_instance support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Mon, 25 Jun 2018 00:31:34 +0000 (10:31 +1000)]
mesa: make valid_draw_indirect_multi() accessible externally
We will use this to add compat support to ARB_multi_draw_indirect
in the following patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Sat, 23 Jun 2018 07:09:13 +0000 (17:09 +1000)]
mesa: add ARB_draw_indirect support to compat profile
v2: add missing ARB_base_instance support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Sat, 23 Jun 2018 02:29:50 +0000 (12:29 +1000)]
mesa: generate GL_INVALID_OPERATION using draw indirect in dlist
The spec doesn't explicitly say to generate an error but since
DrawArraysInstanced* and DrawElementsInstanced* do, it makes
sense to do it for these functions also.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Thu, 28 Jun 2018 00:25:17 +0000 (10:25 +1000)]
mesa: add missing display list support for ARB_compute_shader
The extension is enabled for compat profile but there is currently
no display list support.
I filed a spec bug and it has been agreed that
glDispatchComputeIndirect should generate an INVALID_OPERATION
error when called during display list compilation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Thu, 21 Jun 2018 00:35:15 +0000 (10:35 +1000)]
mesa: expose some ARB_viewport_array dependent extensions in compat
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 20 Jun 2018 03:03:40 +0000 (13:03 +1000)]
mesa: enable ARB_viewport_array in compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Thu, 21 Jun 2018 00:14:36 +0000 (10:14 +1000)]
mesa: add ARB_viewport_array display list support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 20 Jun 2018 00:55:34 +0000 (10:55 +1000)]
mesa: enable ARB_shader_subroutine in compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 20 Jun 2018 01:08:35 +0000 (11:08 +1000)]
mesa: add glUniformSubroutinesuiv() display list support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 20 Jun 2018 00:16:20 +0000 (10:16 +1000)]
mesa: stop hiding remaining query parameters from OpenGL compat
I managed to miss these two in my last pass at this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 19 Jun 2018 09:35:17 +0000 (19:35 +1000)]
mesa: enable ARB_gpu_shader_fp64 in compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 19 Jun 2018 09:33:26 +0000 (19:33 +1000)]
mesa: add ProgramUniform*d display list support
This is required for fp64 to be enabled in compat profile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 19 Jun 2018 09:05:25 +0000 (19:05 +1000)]
mesa: add Uniform*d support to display lists
This is required so we can enable fp64 support in compat profile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Karol Herbst [Tue, 20 Feb 2018 16:56:47 +0000 (17:56 +0100)]
st/glsl_to_nir: run lower_output_reads on !PIPE_CAP_TGSI_CAN_READ_OUTPUTS
This is required for drivers which don't allow reading from outputs.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Eric Anholt [Thu, 28 Jun 2018 19:33:43 +0000 (12:33 -0700)]
v3d: Move GL shader state dumping out of per-version compilation.
It doesn't depend on V3D_VER, since it's just calling v3d_print_group.
Eric Anholt [Thu, 28 Jun 2018 20:08:59 +0000 (13:08 -0700)]
v3d: Add missing Stream field to transform feedback specs on V3D 4.1.
Noticed when trying to CLIF parse a transform feedback job that hangs on
HW.
Eric Anholt [Wed, 27 Jun 2018 23:40:36 +0000 (16:40 -0700)]
v3d: Add missing "tri strip or fan" flag in Primitive List Format.
Eric Anholt [Wed, 27 Jun 2018 23:31:19 +0000 (16:31 -0700)]
v3d: Fix the shader code address field widths on V3D 4.1+
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
Eric Anholt [Wed, 27 Jun 2018 23:28:25 +0000 (16:28 -0700)]
v3d: Add missing "no prim pack" field to the V3D4.1+ GL shader state.
It looks like we don't need this flag for anything (not that I'm clear on
what it does), but it makes our struct dumping line up with CLIF parsing.
Eric Anholt [Wed, 27 Jun 2018 23:00:16 +0000 (16:00 -0700)]
v3d: Express dithering mode in the same way that the CLIF parser does.
Eric Anholt [Wed, 27 Jun 2018 22:55:32 +0000 (15:55 -0700)]
v3d: Add missing "number of bin tile lists" field.
Noticed when trying to feed our dumps through the CLIF parser. Since this
is a "minus one" field, we were already filling in the value we wanted (0).
Eric Anholt [Wed, 27 Jun 2018 22:25:03 +0000 (15:25 -0700)]
v3d: Rewrite the color write masks to match CLIF format.
The render_target_* fields gave us pretty(ish) printing, but meant we were
incompatible with CLIF, and had much more verbose code generating them.
Eric Anholt [Wed, 27 Jun 2018 18:21:34 +0000 (11:21 -0700)]
v3d: Merge the V3D 4.1 and 4.2 XML into V3D 3.3's XML.
The XML ends up noisier if you're only looking at one version, but from
the diffstat there are obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
Eric Anholt [Wed, 27 Jun 2018 21:10:52 +0000 (14:10 -0700)]
v3d: Switch v3d_decoder.c to the XML's top min_ver/max_ver fields.
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now. Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
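A minimal sketch of the lookup described in the v3d_decoder.c commit above, assuming a table sorted by increasing min_ver. The names `struct xml_entry` and `find_xml_for_version()` are hypothetical and do not come from the commit; the version numbers in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one XML blob per starting version. */
struct xml_entry {
    int min_ver;      /* first GPU version the XML applies to */
    const char *text; /* XML text covering min_ver and later */
};

/* Pick the entry with the largest min_ver that does not exceed 'ver'.
 * Relies on the table being sorted by strictly increasing min_ver. */
static const struct xml_entry *
find_xml_for_version(const struct xml_entry *table, size_t n, int ver)
{
    const struct xml_entry *match = NULL;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            assert(table[i - 1].min_ver < table[i].min_ver);
        if (table[i].min_ver <= ver)
            match = &table[i];
    }
    return match;
}
```

For a table with entries starting at versions 21 and 33, a request for version 42 resolves to the second entry, and a request for a version below 21 yields NULL.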
Eric Anholt [Wed, 27 Jun 2018 18:10:07 +0000 (11:10 -0700)]
v3d: Create XML fields for min_ver and max_ver of a packet/struct/enum.
This will be used to merge together the V3D 3.3-4.1 XML with the variants
disabled based on the version.
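The min_ver/max_ver filtering mentioned in the commit above might look roughly like the following. This is a sketch under assumptions: the real generator is a script, and `struct ver_range`, `applies_to_version()`, and the convention that 0 means "no bound" are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version bounds for a packet/struct/enum; 0 means "no bound". */
struct ver_range {
    int min_ver;
    int max_ver;
};

/* True if an item with these bounds should be emitted for version 'ver'. */
static bool applies_to_version(struct ver_range r, int ver)
{
    assert(r.min_ver == 0 || r.max_ver == 0 || r.min_ver <= r.max_ver);
    if (r.min_ver != 0 && ver < r.min_ver)
        return false;
    if (r.max_ver != 0 && ver > r.max_ver)
        return false;
    return true;
}
```

An item with only min_ver set to 41 would be kept when generating for 4.1 or 4.2 and dropped for 3.3, while an item with only max_ver set to 33 would behave the other way around.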
Eric Anholt [Wed, 27 Jun 2018 17:46:04 +0000 (10:46 -0700)]
v3d: Pass the version being generated to the pack generator script.
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one. This is the start of
changing it so that one XML can generate headers for multiple versions.
Jose Maria Casanova Crespo [Thu, 28 Jun 2018 13:36:12 +0000 (15:36 +0200)]
anv: finish the binding_table_pool on destroyDevice when use_softpin
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run isolated they always passed.
createDevice and destroyDevice were called before and after every
test. Because the binding_table_pool was never closed, we reached the
maximum number of open file descriptors (ulimit -n) and when that
happened every call to createDevice implied a
VK_ERROR_INITIALIZATION_FAILED error.
Fixes: c7db0ed4e94dce563d722e1b098684fbd7315d51
("anv: Use a separate pool for binding tables when soft pinning")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
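The leak described in the anv commit above can be modelled with a counter standing in for open file descriptors. This is a simplified sketch, not the anv code: `struct pool`, `struct device`, and all function names are hypothetical, and each live pool is assumed to hold exactly one descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: every live pool holds one file descriptor. */
static int open_fds = 0;

struct pool { bool alive; };

struct device {
    bool use_softpin;
    struct pool state_pool;
    struct pool binding_table_pool; /* only used when soft pinning */
};

static void pool_init(struct pool *p)
{
    p->alive = true;
    open_fds++;
}

static void pool_finish(struct pool *p)
{
    if (!p->alive)
        return;
    assert(open_fds > 0);
    p->alive = false;
    open_fds--;
}

static void device_create(struct device *d, bool use_softpin)
{
    d->use_softpin = use_softpin;
    d->binding_table_pool.alive = false;
    pool_init(&d->state_pool);
    if (use_softpin)
        pool_init(&d->binding_table_pool);
}

static void device_destroy(struct device *d)
{
    /* The step the fix adds: release the soft-pin-only pool as well. */
    if (d->use_softpin)
        pool_finish(&d->binding_table_pool);
    pool_finish(&d->state_pool);
}

/* Create and destroy a device 'cycles' times; return descriptors leaked. */
static int leaked_fds_after(int cycles, bool use_softpin)
{
    int before = open_fds;
    for (int i = 0; i < cycles; i++) {
        struct device d;
        device_create(&d, use_softpin);
        device_destroy(&d);
    }
    return open_fds - before;
}
```

Without the `use_softpin` branch in `device_destroy()`, each create/destroy cycle in this model would leave one descriptor open, so a long batch run would eventually hit the `ulimit -n` ceiling while an isolated test would not.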
Marek Olšák [Mon, 25 Jun 2018 16:34:39 +0000 (12:34 -0400)]
gallium/util: remove dummy function util_format_is_supported
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Dylan Baker [Fri, 29 Jun 2018 18:04:22 +0000 (11:04 -0700)]
docs: update calendar, add news and link release notes to 18.1.3
Dylan Baker [Fri, 29 Jun 2018 18:00:48 +0000 (11:00 -0700)]
docs: Add SHA256 sums to notes for 18.1.3
Dylan Baker [Fri, 29 Jun 2018 17:35:37 +0000 (10:35 -0700)]
docs: Add release notes for 18.1.3
Rhys Perry [Fri, 29 Jun 2018 13:51:11 +0000 (14:51 +0100)]
nv50/ir: improve maintainability of Target*::initOpInfo()
This is mainly useful for when one needs to add new opcodes in a painless
and reliable way.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Rhys Perry [Tue, 5 Jun 2018 20:09:32 +0000 (21:09 +0100)]
nv50/ir: fix image stores with indirect handles
Having this if statement here prevented the next if statement from being
reached in the case of image stores, which is needed for instructions with
indirect bindless handles like "STORE TEMP[ADDR[2].x+1](1) ...".
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Ross Burton [Thu, 28 Jun 2018 22:01:59 +0000 (23:01 +0100)]
egl: fix build race in automake
There is a parallel make build issue in src/egl/drivers/dri2/
for wayland builds. Can be reproduced with:
$ rm src/egl/drivers/dri2/*.h src/egl/drivers/dri2/platform_wayland.lo
$ make -C src/egl/ drivers/dri2/platform_wayland.lo
../../../mesa-18.1.2/src/egl/drivers/dri2/platform_wayland.c:50:10: fatal error: linux-dmabuf-unstable-v1-client-protocol.h: No such file or directory
This patch adds the missing dependency.
Fixes: 02cc359372773800de817 "egl/wayland: Use linux-dmabuf interface for buffers"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
[Eric: fixed up the commit title]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Marek Olšák [Sat, 23 Jun 2018 05:44:14 +0000 (01:44 -0400)]
radeonsi: implement vertex color clamping for tess and GS
Marek Olšák [Sat, 23 Jun 2018 05:43:12 +0000 (01:43 -0400)]
radeonsi: move VS_STATE_SGPR before draw SGPRs
for vertex color clamping.
Marek Olšák [Sat, 23 Jun 2018 05:39:02 +0000 (01:39 -0400)]
radeonsi: don't use malloc in si_generate_gs_copy_shader
Marek Olšák [Mon, 18 Jun 2018 20:03:39 +0000 (16:03 -0400)]
radeonsi: disable DCC statistics gathering on everything but Stoney
I think we don't need it on other chips.
Marek Olšák [Mon, 18 Jun 2018 20:02:14 +0000 (16:02 -0400)]
radeonsi: don't enable DCC statistics gathering for small surfaces
Marek Olšák [Mon, 18 Jun 2018 19:40:07 +0000 (15:40 -0400)]
radeonsi: simplify logic around vi_separate_dcc_try_enable
Marek Olšák [Mon, 18 Jun 2018 19:53:47 +0000 (15:53 -0400)]
radeonsi: fix memory exhaustion issue with DCC statistics gathering with DRI2
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Thu, 14 Jun 2018 02:31:21 +0000 (22:31 -0400)]
radeonsi: remove references to Evergreen
Marek Olšák [Thu, 14 Jun 2018 06:43:19 +0000 (02:43 -0400)]
radeonsi: enable shader caching for compute shaders
Compute shaders were not using the shader cache.
Marek Olšák [Thu, 14 Jun 2018 06:25:00 +0000 (02:25 -0400)]
radeonsi: store compute local_size into tgsi_shader_info
This is kinda a hack, but it's enough for the shader cache.
Marek Olšák [Thu, 14 Jun 2018 06:09:05 +0000 (02:09 -0400)]
radeonsi: unify duplicated code for initial shader compilation
Marek Olšák [Thu, 14 Jun 2018 05:27:10 +0000 (01:27 -0400)]
ac: set +auto-waitcnt-before-barrier when needed
This removes useless s_waitcnt before barriers.
Only radeonsi uses this function.
Marek Olšák [Thu, 14 Jun 2018 05:10:54 +0000 (01:10 -0400)]
radeonsi/gfx9: insert the barrier between merged shaders inside the if block
Joe M. Kniss [Thu, 21 Jun 2018 00:55:10 +0000 (17:55 -0700)]
gallium: plumb invariant output attrib thru TGSI
Add support for glsl 'invariant' modifier for output data declarations.
Gallium drivers that use TGSI serialization currently lose invariant
modifiers in glsl shaders.
v2: use boolean for invariant instead of unsigned.
Tested: chromiumos on qemu with virglrenderer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Francisco Jerez [Wed, 27 Apr 2016 02:45:41 +0000 (19:45 -0700)]
intel/fs: Build 32-wide FS shaders.
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Fri, 18 May 2018 23:39:21 +0000 (16:39 -0700)]
intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 18 May 2018 06:26:02 +0000 (23:26 -0700)]
intel/fs: Add fields to wm_prog_data for SIMD32 dispatch
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Thu, 12 Jan 2017 03:55:33 +0000 (19:55 -0800)]
intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Mon, 9 Jan 2017 22:14:02 +0000 (14:14 -0800)]
intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:33:11 +0000 (15:33 -0800)]
intel/fs: Fix Gen6+ interpolation setup for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 24 May 2018 01:09:48 +0000 (18:09 -0700)]
intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
We can just emit the MOV in the two places where we use this.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 24 May 2018 00:54:54 +0000 (17:54 -0700)]
intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up. Just emit it once like we really want.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:33:45 +0000 (15:33 -0800)]
intel/fs: Generalize the unlit centroid workaround
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:32:05 +0000 (15:32 -0800)]
intel/fs: Fix sample id setup for SIMD32.
v2 (Jason Ekstrand):
- Disallow gl_SampleId in SIMD32 on gen7
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Sat, 14 Jan 2017 01:04:23 +0000 (17:04 -0800)]
intel/fs: Fix Gen7 compressed source region alignment restriction for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:40:38 +0000 (15:40 -0800)]
intel/fs: Implement 32-wide FS payload setup on Gen6+
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:36:51 +0000 (15:36 -0800)]
intel/fs: Extend thread payload layout to SIMD32
And handle 32-wide payload register reads in fetch_payload_reg().
v2 (Jason Ekstrand):
- Fix some whitespace and brace placement
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:23:48 +0000 (15:23 -0800)]
intel/fs: Wrap FS payload register look-up in a helper function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:18:07 +0000 (15:18 -0800)]
intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround
While we're here, we change to using horiz_offset() instead of abusing
half().
v2 (Jason Ekstrand):
- Use horiz_offset() instead of half()
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:53:00 +0000 (14:53 -0800)]
intel/fs: Simplify fs_visitor::emit_samplepos_setup
The original code manually handled splitting the MOVs to 8-wide to
handle various regioning restrictions. Now that we have a SIMD width
splitting pass that handles these things, we can just emit everything at
the full width and let the SIMD splitting pass handle it. We also now
have a useful "subscript" helper which is designed exactly for the case
where you want to take a W type and read it as a vector of Bs so we may
as well use that too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:02:05 +0000 (17:02 -0700)]
i965: Add plumbing for shader time in 32-wide FS dispatch mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:08:42 +0000 (17:08 -0700)]
intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 26 May 2018 05:23:30 +0000 (22:23 -0700)]
intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates
On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number. When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead. Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN which lays out the
four src1 registers differently in SIMD16 than LINE.
v2 (Jason Ekstrand):
- Take advantage of both accumulators and emit LINE LINE MAC MAC
(Based on a patch from Francisco Jerez)
- Unify the gen4 and gen4x-6 cases using a loop
v3 (Jason Ekstrand):
- Don't unify gen4 with gen4x-6 as this turns out to be more fragile
than first thought without reworking the gen4 barycentric coordinate
layout.
Reviewed-by: Matt Turner <mattst88@gmail.com>
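The choice described in the LINTERP commit above, for platforms that have PLN, can be sketched as a small predicate. The enum, the function name, and the `pln_needs_even_src1` parameter are hypothetical; the commit says the even-register restriction applies on g4x through Sandy Bridge.

```c
#include <assert.h>
#include <stdbool.h>

enum linterp_lowering {
    LINTERP_USE_PLN,
    LINTERP_USE_LINE_MAC
};

/* Decide how to lower LINTERP on a platform that has PLN.
 * pln_needs_even_src1: true where PLN requires an even src1 register.
 * src1_reg_nr: register number holding the coordinates. */
static enum linterp_lowering
choose_linterp_lowering(bool pln_needs_even_src1, unsigned src1_reg_nr)
{
    assert(src1_reg_nr < 128); /* the GRF has 128 registers */
    if (pln_needs_even_src1 && (src1_reg_nr % 2) != 0)
        return LINTERP_USE_LINE_MAC;
    return LINTERP_USE_PLN;
}
```

The sketch only covers the selection. As the commit notes, the LINE+MAC path still has to cope with inputs laid out for PLN, which is why it cannot simply reuse the gen4 code.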
Jason Ekstrand [Mon, 28 May 2018 16:42:49 +0000 (09:42 -0700)]
intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs. In both cases, the accumulator is written
by the first of the two instructions and read by the second. Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed so we need to make the scheduler
aware. Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between an instruction which writes the
accumulator and another which tries to use that result.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
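The scheduling hazard in the commit above can be shown with a toy dependency check. This is not the i965 scheduler: `struct sched_inst` and `may_schedule_between()` are hypothetical, and the model tracks only the accumulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction flags; only the accumulator is modelled. */
struct sched_inst {
    bool writes_accumulator;
    bool reads_accumulator;
};

/* May 'candidate' be placed between 'writer' and 'reader'? It may not if
 * the accumulator is live across the gap and the candidate clobbers it. */
static bool may_schedule_between(const struct sched_inst *writer,
                                 const struct sched_inst *candidate,
                                 const struct sched_inst *reader)
{
    assert(writer != NULL && candidate != NULL && reader != NULL);
    bool accumulator_live =
        writer->writes_accumulator && reader->reads_accumulator;
    return !(accumulator_live && candidate->writes_accumulator);
}
```

If LINTERP is not marked as writing the accumulator, this check wrongly allows it between a producer and a consumer of the accumulator. Marking it, as the commit does, makes the check reject that placement.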
Francisco Jerez [Tue, 26 Apr 2016 01:06:13 +0000 (18:06 -0700)]
intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send. This is less code over-all and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 03:51:24 +0000 (20:51 -0700)]
intel/fs: Add the group to the flag subreg number on SNB and older
We want consistent behavior in the meaning of the flag_subreg field
between SNB and IVB+.
v2 (Jason Ekstrand):
- Add some extra commentary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 10 Jan 2017 00:43:24 +0000 (16:43 -0800)]
intel/fs: Fix FB read header setup for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:25:37 +0000 (14:25 -0800)]
intel/fs: Fix logical FB write lowering for SIMD32
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:22:19 +0000 (14:22 -0800)]
intel/fs: Fix FB write message control codegen for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 6 Jan 2017 22:41:27 +0000 (14:41 -0800)]
intel/fs: Don't enable dual source blend if no outputs are written
This prevents a crash in some arb_enhanced_layouts tests that would be
caused by the next commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 02:28:21 +0000 (19:28 -0700)]
intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 02:20:49 +0000 (19:20 -0700)]
intel/eu: Fix pixel interpolator queries for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 6 Jan 2017 01:51:51 +0000 (17:51 -0800)]
intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
Current discard handling requires dedicating the second flag register to
discard. However, control-flow in SIMD32 requires both flag registers
so it's incompatible with the current discard handling. Just don't
support SIMD32+discard for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:29:57 +0000 (17:29 -0700)]
intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
The hardware's control flow logic is 16-wide so we're out of luck
here. We could, in theory, support SIMD32 if we know the control-flow
is uniform but we don't have that information at this point.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
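Taken together, the two restrictions from the commits above (no SIMD32 with discard, and no SIMD32 with control flow on Gen4-6) amount to a simple eligibility check. The function name and parameters below are hypothetical, and the real compiler applies further conditions that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical eligibility check covering only the two rules above. */
static bool simd32_dispatch_allowed(int gen, bool uses_discard,
                                    bool has_control_flow)
{
    assert(gen >= 4);

    /* Discard handling reserves the second flag register, which SIMD32
     * control flow also needs. */
    if (uses_discard)
        return false;

    /* Control flow logic on Gen4-6 is only 16 channels wide. */
    if (gen <= 6 && has_control_flow)
        return false;

    return true;
}
```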
Jason Ekstrand [Mon, 21 May 2018 16:51:50 +0000 (09:51 -0700)]
intel/fs: Split instructions low to high in lower_simd_width
Commit 0d905597f fixed an issue with the placement of the zip and unzip
instructions. However, as a side-effect, it reversed the order in which
we were emitting the split instructions so that they went from high
group to low instead of low to high. This is fine for most things like
texture instructions and the like but certain render target writes
really want to be emitted low to high. This commit just switches the
order back around to be low to high.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
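The emission order restored by the lower_simd_width commit above can be sketched as follows. The helper is hypothetical and only computes the starting channel of each split instruction, lowest group first; it does not model the zip and unzip instructions.

```c
#include <assert.h>

/* Fill 'groups' with the first channel of each split instruction, in the
 * order they would be emitted (low to high). Returns the number of splits. */
static unsigned split_groups_low_to_high(unsigned exec_size,
                                         unsigned lower_width,
                                         unsigned *groups,
                                         unsigned capacity)
{
    assert(lower_width > 0 && exec_size % lower_width == 0);
    unsigned n = exec_size / lower_width;
    assert(n <= capacity);
    for (unsigned i = 0; i < n; i++)
        groups[i] = i * lower_width;
    return n;
}
```

Splitting a 32-wide instruction into 8-wide pieces this way yields groups starting at channels 0, 8, 16 and 24, in that order.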
Jason Ekstrand [Fri, 18 May 2018 06:49:29 +0000 (23:49 -0700)]
intel/fs: Rework KSP data to be SIMD width-based
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 06:17:17 +0000 (23:17 -0700)]
intel/compiler: Add and use helpers for working with KSP indices
The pixel shader dispatch table is kind of a confusing mess. This adds
some helpers for dealing with it and for easily extracting the correct
data from wm_prog_data.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 20:34:33 +0000 (13:34 -0700)]
i965: Re-arrange shader kernel setup in WM state
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:20:35 +0000 (17:20 -0700)]
intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 15:46:03 +0000 (08:46 -0700)]
intel/fs: Set up FB write message headers in the visitor
Doing instruction header setup in the generator is awful for a number
of reasons. For one, we can't schedule the header setup at all. For
another, it means lots of implied writes which the instruction scheduler
and other passes can't properly reason about. The second isn't a huge
problem for FB writes since they always happen at the end. We made a
similar change to sampler handling in ff4726077d86.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:18:22 +0000 (14:18 -0800)]
intel/fs: Fix implied_mrf_writes() for headerless FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:17:20 +0000 (14:17 -0800)]
intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:16:12 +0000 (14:16 -0800)]
intel/eu: Return new instruction to caller from brw_fb_WRITE().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 01:47:19 +0000 (18:47 -0700)]
intel/fs: Pull FB write implied headers from src[0]
Now that we have the implied header in src[0] for tracking purposes, we
may as well use it in the generator. This makes things a tiny bit more
general.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 22:40:48 +0000 (15:40 -0700)]
intel/fs: Properly track implied header regs read by FB writes
The FB write opcode on gen4-5 does implied copies from g0 and g1 to the
message payload. With this commit, we start tracking that as part of
the IR by having the FB write read from g0-1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 00:51:10 +0000 (17:51 -0700)]
intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
It doesn't matter since we don't ever run replicated write shaders
through the optimizer but it's good to be complete.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dylan Baker [Thu, 28 Jun 2018 17:06:44 +0000 (10:06 -0700)]
docs: Add news item for mesa 18.1.2
Which I forgot to do when 18.1.2 came out.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Rhys Perry [Fri, 22 Jun 2018 20:47:43 +0000 (21:47 +0100)]
nvc0: remove magic values in nve4_set_tex_handles()
With this commit, things no longer break if NVC0_CB_AUX_TEX_INFO is
changed to anything other than 0x20.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>