Jason Ekstrand [Thu, 24 May 2018 01:09:48 +0000 (18:09 -0700)]
intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
We can just emit the MOV in the two places where we use this.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 24 May 2018 00:54:54 +0000 (17:54 -0700)]
intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up. Just emit it once like we really want.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:33:45 +0000 (15:33 -0800)]
intel/fs: Generalize the unlit centroid workaround
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:32:05 +0000 (15:32 -0800)]
intel/fs: Fix sample id setup for SIMD32.
v2 (Jason Ekstrand):
- Disallow gl_SampleId in SIMD32 on gen7
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Sat, 14 Jan 2017 01:04:23 +0000 (17:04 -0800)]
intel/fs: Fix Gen7 compressed source region alignment restriction for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:40:38 +0000 (15:40 -0800)]
intel/fs: Implement 32-wide FS payload setup on Gen6+
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:36:51 +0000 (15:36 -0800)]
intel/fs: Extend thread payload layout to SIMD32
And handle 32-wide payload register reads in fetch_payload_reg().
v2 (Jason Ekstrand);
- Fix some whitespace and brace placement
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:23:48 +0000 (15:23 -0800)]
intel/fs: Wrap FS payload register look-up in a helper function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:18:07 +0000 (15:18 -0800)]
intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround
While we're here, we change to using horiz_offset() instead of abusing
half().
v2 (Jason Ekstrand):
- Use horiz_offset() instead of half()
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:53:00 +0000 (14:53 -0800)]
intel/fs: Simplify fs_visitor::emit_samplepos_setup
The original code manually handled splitting the MOVs to 8-wide to
handle various regioning restrictions. Now that we have a SIMD width
splitting pass that handles these things, we can just emit everything at
the full width and let the SIMD splitting pass handle it. We also now
have a useful "subscript" helper which is designed exactly for the case
where you want to take a W type and read it as a vector of Bs so we may
as well use that too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:02:05 +0000 (17:02 -0700)]
i965: Add plumbing for shader time in 32-wide FS dispatch mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:08:42 +0000 (17:08 -0700)]
intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 26 May 2018 05:23:30 +0000 (22:23 -0700)]
intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates
On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number. When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead. Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN which lays out the
four src1 registers differently in SIMD16 than LINE.
v2 (Jason Ekstrand):
- Take advantage of both accumulators and emit LINE LINE MAC MAC
(Based on a patch from Francisco Jerez)
- Unify the gen4 and gen4x-6 cases using a loop
v3 (Jason Ekstrand):
- Don't unify gen4 with gen4x-6 as this turns out to be more fragile
than first thought without reworking the gen4 barycentric coordinate
layout.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 28 May 2018 16:42:49 +0000 (09:42 -0700)]
intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs. In both cases, the accumulator is written
by the first of the two instructions and read by the second. Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed so we need to make the scheduler
aware. Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between another an instruction which writes the
accumulator and another which tries to use that result.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 01:06:13 +0000 (18:06 -0700)]
intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send. This is less code over-all and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 03:51:24 +0000 (20:51 -0700)]
intel/fs: Add the group to the flag subreg number on SNB and older
We want consistent behavior in the meaning of the flag_subreg field
between SNB and IVB+.
v2 (Jason Ekstrand):
- Add some extra commentary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 10 Jan 2017 00:43:24 +0000 (16:43 -0800)]
intel/fs: Fix FB read header setup for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:25:37 +0000 (14:25 -0800)]
intel/fs: Fix logical FB write lowering for SIMD32
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:22:19 +0000 (14:22 -0800)]
intel/fs: Fix FB write message control codegen for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 6 Jan 2017 22:41:27 +0000 (14:41 -0800)]
intel/fs: Don't enable dual source blend if no outputs are written
This prevents a crash in some arb_enhanced_layouts tests that would be
caused by the next commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 02:28:21 +0000 (19:28 -0700)]
intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 02:20:49 +0000 (19:20 -0700)]
intel/eu: Fix pixel interpolator queries for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 6 Jan 2017 01:51:51 +0000 (17:51 -0800)]
intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
Current discard handling requires dedicating the second flag register to
discard. However, control-flow in SIMD32 requires both flag registers
so it's incompatible with the current discard handling. Just don't
support SIMD32+discard for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:29:57 +0000 (17:29 -0700)]
intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
The hardware's control flow logic is 16-wide so we're out of luck
here. We could, in theory, support SIMD32 if we know the control-flow
is uniform but we don't have that information at this point.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 21 May 2018 16:51:50 +0000 (09:51 -0700)]
intel/fs: Split instructions low to high in lower_simd_width
Commit
0d905597f fixed an issue with the placement of the zip and unzip
instructions. However, as a side-effect, it reversed the order in which
we were emitting the split instructions so that they went from high
group to low instead of low to high. This is fine for most things like
texture instructions and the like but certain render target writes
really want to be emitted low to high. This commit just switches the
order back around to be low to high.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
Jason Ekstrand [Fri, 18 May 2018 06:49:29 +0000 (23:49 -0700)]
intel/fs: Rework KSP data to be SIMD width-based
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 06:17:17 +0000 (23:17 -0700)]
intel/compiler: Add and use helpers for working with KSP indices
The pixel shader dispatch table is kind-of a confusing mess. This adds
some helpers for dealing with it and for easily extracting the correct
data from wm_prog_data.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 20:34:33 +0000 (13:34 -0700)]
i965: Re-arrange shader kernel setup in WM state
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:20:35 +0000 (17:20 -0700)]
intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 15:46:03 +0000 (08:46 -0700)]
intel/fs: Set up FB write message headers in the visitor
Doing instruction header setup in the generator is awful for a number
of reasons. For one, we can't schedule the header setup at all. For
another, it means lots of implied writes which the instruction scheduler
and other passes can't properly read about. The second isn't a huge
problem for FB writes since they always happen at the end. We made a
similar change to sampler handling in
ff4726077d86.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:18:22 +0000 (14:18 -0800)]
intel/fs: Fix implied_mrf_writes() for headerless FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:17:20 +0000 (14:17 -0800)]
intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:16:12 +0000 (14:16 -0800)]
intel/eu: Return new instruction to caller from brw_fb_WRITE().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 18 May 2018 01:47:19 +0000 (18:47 -0700)]
intel/fs: Pull FB write implied headers from src[0]
Now that we have the implied header in src[0] for tracking purposes, we
may as well use it in the generator. This makes things a tiny bit more
general.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 22:40:48 +0000 (15:40 -0700)]
intel/fs: Properly track implied header regs read by FB writes
The FB write opcode on gen4-5 does implied copies from g0 and g1 to the
message payload. With this commit, we start tracking that as part of
the IR by having the FB write read from g0-1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 17 May 2018 00:51:10 +0000 (17:51 -0700)]
intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
It doesn't matter since we don't ever run replicated write shaders
through the optimizer but it's good to be complete.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dylan Baker [Thu, 28 Jun 2018 17:06:44 +0000 (10:06 -0700)]
docs: Add news item for mesa 18.1.2
Which I forgot to do when 18.1.2 came out.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Rhys Perry [Fri, 22 Jun 2018 20:47:43 +0000 (21:47 +0100)]
nvc0: remove magic values in nve4_set_tex_handles()
With this commit, things no longer break if NVC0_CB_AUX_TEX_INFO is
changed to anything other than 0x20.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Rhys Perry [Thu, 28 Jun 2018 13:26:33 +0000 (14:26 +0100)]
nvc0/ir: fix TargetNVC0::insnCanLoadOffset()
Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset
could be set to a specific value. The IndirectPropagation pass expected
it to return whether the offset could be increased by a specific value,
which is what TargetNV50::insnCanLoadOffset() does.
Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812
("nvc0/ir: be careful about propagating very large offsets into const load")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Alok Hota [Fri, 22 Jun 2018 14:11:26 +0000 (09:11 -0500)]
swr/rast: Updating code style based on current clang-format rules
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Vinson Lee [Mon, 25 Jun 2018 14:52:19 +0000 (09:52 -0500)]
swr/rast: Fix addPassesToEmitFile usage with llvm-7.0.
Fix build error after llvm-7.0svn r332881 ("CodeGen: Add a dwo output
file argument to addPassesToEmitFile and hook it up to dwo output.").
CXX rasterizer/jitter/libmesaswr_la-JitManager.lo
rasterizer/jitter/JitManager.cpp:368:93: error: too few arguments to function call, expected at least 4, have 3
pTarget->addPassesToEmitFile(*pMPasses, filestream, TargetMachine::CGFT_AssemblyFile);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Mon, 25 Jun 2018 14:52:18 +0000 (09:52 -0500)]
swr/rast: Handling removed LLVM intrinsics in trunk
- Functionality replaced with emulated intrinsics
- Fixes Bug 106558
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Mon, 25 Jun 2018 14:52:17 +0000 (09:52 -0500)]
swr/rast: Adding SCATTERPS functionality to BuilderGfxMem
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Mon, 25 Jun 2018 14:52:16 +0000 (09:52 -0500)]
swr/rast: Adding Read/Write specifier to TranslateGfxAddress stack
- Removing unused generic translate function
- Requiring read/write specifier in builder_gfx_mem
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Chad Versace [Fri, 1 Jun 2018 02:57:55 +0000 (19:57 -0700)]
gallium: Fix automake for Android (v2)
Chromium OS uses Autotools and pkg-config when building Mesa for
Android. The gallium drivers were failing to find the headers and
libraries for zlib and Android's libbacktrace.
v2:
- Don't add a check for zlib.pc. configure.ac already checks for
zlib.pc elsewhere. [for tfiga]
- Check for backtrace.pc separately from the other Android libs.
[for tfiga]
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Wed, 27 Jun 2018 23:23:20 +0000 (09:23 +1000)]
glsl: skip comparison opt when adding vars of different size
The spec allows adding scalars with a vector or matrix. In this case
the opt was losing swizzle and size information.
This fixes a bug with Doom (2016) shaders.
Fixes: 34ec1a24d61f ("glsl: Optimize (x + y cmp 0) into (x cmp -y).")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Wed, 27 Jun 2018 21:09:51 +0000 (14:09 -0700)]
Revert "anv: Print the actual enum for ignored structure types"
This reverts commit
fda7014c35e5f5dfa26f078ad0512d13ead8b717. It was
hitting an unreachable when the sType was unknown.
Jason Ekstrand [Tue, 26 Jun 2018 20:33:29 +0000 (13:33 -0700)]
anv: Print the actual enum for ignored structure types
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jason Ekstrand [Wed, 27 Jun 2018 01:32:38 +0000 (18:32 -0700)]
i965/bufmgr: Use the correct argument order for bo_alloc_internal
The memzone and flags parameters were accidentally flipped in the call
from brw_bo_alloc_tiled_2d.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Keith Packard [Tue, 26 Jun 2018 23:01:45 +0000 (16:01 -0700)]
vulkan/wsi_common_display: Return SURFACE_LOST for fatal DRM errors
Instead of encouraging the client to re-create the swapchain and keep
going with an OUT_OF_DATE error, tell the client that further use of
the current surface will not succeed as the associated kernel objects
are no longer valid.
In particular, when a DRM lease is revoked, then the client needs to
get another lease and create a new surface for that.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Thu, 21 Jun 2018 23:39:15 +0000 (16:39 -0700)]
glsl: Make sure that packed varyings reflect always_active_io properly.
The always_active_io flag was only set according to the first variable
that got packed in, so NIR io compaction would end up compacting XFB
varyings that shouldn't move at that point.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Mon, 25 Jun 2018 20:29:42 +0000 (13:29 -0700)]
v3d: Fix Z clipping when viewport.scale[2] is negative.
Fixes:
dEQP-GLES3.functional.shaders.builtin_variable.depth_range_fragment
dEQP-GLES3.functional.shaders.builtin_variable.depth_range_vertex
Eric Anholt [Tue, 26 Jun 2018 22:58:21 +0000 (15:58 -0700)]
v3d: Convert a bunch of our "minus one" fields over to the new XML attr.
This fixes up their formatting for CLIF files and makes the code more
legible.
Eric Anholt [Tue, 26 Jun 2018 22:53:26 +0000 (15:53 -0700)]
v3d: Add pack/unpack/decode support for fields with a "- 1" modifier.
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be. However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.
(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
Tapani Pälli [Thu, 14 Jun 2018 11:08:11 +0000 (14:08 +0300)]
i965: small cleanup in blorp debug printing output (trivial)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tapani Pälli [Thu, 14 Jun 2018 11:08:10 +0000 (14:08 +0300)]
mesa: add a space between headers and source (trivial)
There used to be one and it looks like it was removed by
eb63640c1d.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tapani Pälli [Thu, 14 Jun 2018 11:08:09 +0000 (14:08 +0300)]
features.txt: mark some extensions as done
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Danylo Piliaiev [Thu, 21 Jun 2018 09:34:15 +0000 (12:34 +0300)]
mesa: Return number of result bits for GL_ANY_SAMPLES_PASSED_CONSERVATIVE
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106986
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Samuel Pitoiset [Tue, 26 Jun 2018 09:19:26 +0000 (11:19 +0200)]
radv: use separate bind points for the dynamic buffers
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 26 Jun 2018 20:35:04 +0000 (22:35 +0200)]
radv: remove unused 'predicated' parameter from some functions
It's always false.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 8 Jun 2018 00:02:20 +0000 (10:02 +1000)]
virgl: add ARB_texture_view support
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Jason Ekstrand [Mon, 25 Jun 2018 23:18:19 +0000 (16:18 -0700)]
nir/opt_if: Remove unneeded phis if we make progress
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This fixes validation issues with some Vulkan CTS tests with the new
deref instructions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Samuel Pitoiset [Fri, 22 Jun 2018 17:16:43 +0000 (19:16 +0200)]
radv: emit PIPELINESTAT_{START,STOP} events for pipeline stats queries
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tapani Pälli [Thu, 14 Jun 2018 08:10:20 +0000 (11:10 +0300)]
glsl: serialize data from glTransformFeedbackVaryings
While XFB has been enabled for cache, we did not serialize enough
data for the whole API to work (such as glGetProgramiv).
Fixes: 6d830940f7 "Allow shader cache usage with transform feedback"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106907
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Samuel Pitoiset [Mon, 25 Jun 2018 13:56:46 +0000 (15:56 +0200)]
radv: enable VK_EXT_shader_stencil_export
The driver already supports exporting the stencil value.
The following CTS test now pass:
dEQP-VK.pipeline.shader_stencil_export.op_replace
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 25 Jun 2018 14:22:43 +0000 (16:22 +0200)]
radv: ignore pInheritanceInfo for primary command buffers
From the Vulkan spec:
"If this is a primary command buffer, then this value is ignored."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andrii Simiklit [Fri, 22 Jun 2018 07:59:57 +0000 (10:59 +0300)]
i965/gen6/gs: Handle case where a GS doesn't allocate VUE
We can not use the VUE Dereference flags combination for EOT
message under ILK and SNB because the threads are not initialized
there with initial VUE handle unlike Pre-IL.
So to avoid GPU hangs on SNB and ILK we need
to avoid usage of the VUE Dereference flags combination.
(Was tested only on SNB but according to the specification
SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
the ILK must behave itself in the similar way)
v2: Approach to fix this issue was changed.
Instead of different EOT flags in the program end
we will create VUE every time even if GS produces no output.
v3: Clean up the patch.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Dave Airlie [Tue, 26 Jun 2018 00:43:14 +0000 (10:43 +1000)]
radeon: duplicate cmask surface for now.
The radeon winsys isn't linked against the ac code, I have vague
memories of this causing some problems before, for now fix the build
but just duplicating the code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Fri, 22 Jun 2018 04:02:47 +0000 (00:02 -0400)]
radeonsi: rename r600_transfer -> si_transfer
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 04:00:11 +0000 (00:00 -0400)]
radeonsi: properly set cmask_buffer in si_reallocate_texture_inplace
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 03:54:20 +0000 (23:54 -0400)]
radeonsi: remove redundant si_texture::cmask_size
cmask_buffer and surface.cmask_size can replace its role.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 03:00:56 +0000 (23:00 -0400)]
radeonsi: inline struct r600_cmask_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 02:54:59 +0000 (22:54 -0400)]
radeonsi: move CMASK size computation into ac_surface
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 02:50:51 +0000 (22:50 -0400)]
ac/surface: move cmask_size/alignment into radeon_surf
cmask_size is changed to uint32_t because it can't be greater than 4GB.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 02:16:07 +0000 (22:16 -0400)]
radeonsi: rename r600_surface -> si_surface
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 02:07:24 +0000 (22:07 -0400)]
radeonsi: rename r600_memory_object -> si_memory_object
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 02:04:33 +0000 (22:04 -0400)]
radeonsi: remove unused r600_memory_object::offset
The real offset is passed through resource_from_memobj.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 22 Jun 2018 00:41:06 +0000 (20:41 -0400)]
radeonsi: unify duplicated texture_from_handle & texture_from_memobj
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Fri, 15 Jun 2018 19:28:28 +0000 (15:28 -0400)]
radeonsi: reorder and initialize more fields in si_reallocate_texture_inplace
Some fields shouldn't be initialized, like framebuffers_bound and other stats.
It's hopefully complete now.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Thu, 21 Jun 2018 23:19:49 +0000 (19:19 -0400)]
radeonsi: stop using lp_build_emit_llvm_unary/binary
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:52:47 +0000 (18:52 -0400)]
radeonsi: stop using lp_build_alloc
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:52:21 +0000 (18:52 -0400)]
radeonsi: use gallivm less
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:20:59 +0000 (18:20 -0400)]
radeonsi: stop using lp_bld_intr.h
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:20:59 +0000 (18:20 -0400)]
radeonsi: remove last uses of lp_build_context::undef
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:18:42 +0000 (18:18 -0400)]
radeonsi: stop using lp_bld_arit.h
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:06:23 +0000 (18:06 -0400)]
radeonsi: stop using lp_build_gather_values
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 22:03:06 +0000 (18:03 -0400)]
radeonsi: clean up some #includes
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Jun 2018 05:36:22 +0000 (01:36 -0400)]
radeonsi: clean up passing the is_monolithic flag for compilation
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Robert Foss [Wed, 18 Apr 2018 15:27:40 +0000 (17:27 +0200)]
egl/android: Add DRM node probing and filtering
This patch both adds support for probing & filtering DRM nodes
and switches away from using the GRALLOC_MODULE_PERFORM_GET_DRM_FD
gralloc call.
Currently the filtering is based just on the driver name,
and the desired name is supplied using the "drm.gpu.vendor_name"
Android property.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Rob Herring [Thu, 26 Apr 2018 14:02:01 +0000 (16:02 +0200)]
egl/android: #ifdef out flink name support
Maintaining both flink names and prime fd support which are provided by
2 different gralloc implementations is problematic because we have a
dependency on a specific gralloc implementation header.
This mostly disables the dependency on the gralloc implementation and
headers. The dependency on GRALLOC_MODULE_PERFORM_GET_DRM_FD remains for
now, but the definition is added locally to remove the header
dependency.
drm_gralloc support can be enabled by setting
BOARD_USES_DRM_GRALLOC=true in BoardConfig.mk.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Robert Foss [Fri, 4 May 2018 15:05:13 +0000 (17:05 +0200)]
gallium/util: Fix build error due to cast to different size
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Mon, 25 Jun 2018 11:34:10 +0000 (13:34 +0200)]
radv: fix HTILE metadata initialization in presence of subpass clears
If the driver ends up by performing a slow depthstencil clear,
the HTILE metadata won't be initialized correctly.
This fixes random VM faults on Polaris while running CTS
with Bas's runner. This doesn't seem to regress performance.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Gert Wollny [Thu, 31 May 2018 23:20:54 +0000 (01:20 +0200)]
r600/sb: give the scheduler more margin to find valid instructions groups
For instruction sequences that change the address register with every load
the current limit to bail out of the scheduler and reject the optimisation
was too tight, i.e. it was expected that at least one pending instruction
would be scheduled each time.
Give the scheduler more margin to sort out these load sequences by allowing
a number of rounds where no instruction is scheduled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106163
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Gert Wollny [Thu, 31 May 2018 21:25:09 +0000 (23:25 +0200)]
r600/sb: fix rotated register in while loop
This patch is based on
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185805.html
Dave Airlie:
"A bunch of CTS tests led me to write
tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
which r600/sb always fell over on.
GCM seems to move some of the copies into other basic blocks,
if we don't allow this to happen then it doesn't seem to schedule
them badly.
Everything I've read on SSA/phi copies say they have to happen
in parallel, so keeping them in the same basic block seems like
a good way to keep some of that property."
This patch differs from the one proposed by Dave in that it only adds
the NF_DONT_MOVE flag to copy_move instructions that are created by split_phi*
and that are located in loops.
Fixes piglit: tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
(no regressions in the shader set). It also fixes all failing tests from
dEQP-GLES3.functional.shaders.loops.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Rob Clark [Sat, 23 Jun 2018 15:02:50 +0000 (11:02 -0400)]
freedreno/ir3: fix deref conversion fallout
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 23 Jun 2018 10:47:49 +0000 (06:47 -0400)]
freedreno/ir3: fix unused variable warning
Fixes: cf0c7258ee0 freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 22 Jun 2018 20:09:25 +0000 (16:09 -0400)]
freedreno: fix HW_ATOMIC_COUNTERS cap
This was mistakenly exposed, even though we want atomic counters to be
lowered to atomic ops on an SSBO like nearly every other GPU. Which
somehow recently started getting segfaults due to calling a null
pipe->set_hw_atomic_buffers().
Fixes a crash in stk, and probably other things.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Keith Packard [Fri, 16 Jun 2017 04:00:56 +0000 (21:00 -0700)]
radv: add VK_EXT_display_control to radv driver [v5]
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2:
Rework fence integration into the driver so that waiting for
any of a mixture of fence types (wsi, driver or syncobjs)
causes the driver to poll, while a list of just syncobjs or
just driver fences will block. When we get syncobjs for wsi
fences, we'll adapt to use them.
v3: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v5: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Keith Packard [Fri, 16 Jun 2017 04:00:56 +0000 (21:00 -0700)]
anv: add VK_EXT_display_control to anv driver [v5]
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v4: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
v5: use zalloc2 instead of alloc2 for the WSI fence.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Keith Packard [Fri, 16 Jun 2017 04:00:56 +0000 (21:00 -0700)]
vulkan: add VK_EXT_display_control [v10]
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2: Remove DRM_CRTC_SEQUENCE_FIRST_PIXEL_OUT flag. This has
been removed from the proposed kernel API.
Add NULL parameter to drmCrtcQueueSequence ioctl as we
don't care what sequence the event was actually queued to.
v3: Adapt to pthread clock switch to MONOTONIC
v4: Fix scope for wsi_display_mode andwsi_display_connector allocs
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Use wsi_rel_to_abs_time helper function to convert relative
timeouts to absolute timeouts without causing overflow.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Change WSI fence wait function to return VkResult instead of
bool. This makes the meaning of the return value easier to
understand, and allows for the indication of failure.
Also change the WSI fence wait function to take only absolute
timeouts and not provide an option for a relative timeout. No
users wanted relative timeouts, and it's simpler if that option
isn't available.
Terminate the DPMS property loop once we've found the property.
Assert that the fence hasn't already been destroyed in
wsi_display_fence_destroy.
Rearrange the event handler function order in the file to place
routines in an easier to find order.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Adapt to API changes for surface_get_capabilities
v8:
Use wsi->alloc in register_display_event so that callers
don't have to dig out an allocator for us.
v9:
Fix a few minor formatting issues
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v10:
Use wsi->alloc if none provided in wsi_display_fence_alloc.
Now that drivers are expected to pass the allocator argument
straight through from the application, we need to check those
for NULL everywhere.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>