Jason Ekstrand [Thu, 6 Dec 2018 20:31:20 +0000 (14:31 -0600)]
nir/constant_folding: Fix source bit size logic
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce5 "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289d "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 15 Dec 2018 18:29:32 +0000 (12:29 -0600)]
nir/tgsi: Use nir_bany in ttn_kill_if
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Sun, 16 Dec 2018 06:42:01 +0000 (00:42 -0600)]
nir/lower_idiv: Use ilt instead of bit twiddling
The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative. This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Sun, 16 Dec 2018 06:17:52 +0000 (22:17 -0800)]
v3d: Use the original bit size when scalarizing uniform loads.
Prevents a regression in jekstrand's 1-bit series.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Sun, 16 Dec 2018 03:42:57 +0000 (19:42 -0800)]
vc4: Use the original bit size when scalarizing uniform loads.
Prevents a regression in jekstrand's 1-bit series.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rhys Perry [Thu, 13 Dec 2018 17:03:23 +0000 (17:03 +0000)]
ac: split 16-bit ssbo loads that may not be dword aligned
Fixes: 7e7ee826982 ('ac: add support for 16bit buffer loads')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rhys Perry [Wed, 5 Dec 2018 13:42:47 +0000 (13:42 +0000)]
ac: refactor visit_load_buffer
This is so that we can split different types of loads more easily.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rhys Perry [Fri, 14 Dec 2018 11:08:51 +0000 (11:08 +0000)]
nir: fix constness in nir_intrinsic_align()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jan Vesely [Thu, 13 Dec 2018 20:53:42 +0000 (15:53 -0500)]
clover: Fix build after clang r348827
CodeGenOptions were moved to Basic.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
CC: mesa-stable@lists.freedesktop.org
Jon Turney [Fri, 14 Dec 2018 13:20:10 +0000 (13:20 +0000)]
glx: Fix compilation with GLX_USE_WINDOWSGL
Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical
(because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs)
Include <sys/time.h> again, as functions prototyped by it are used in
the GLX_USE_WINDOWSGL path.
Make the include guard around the __glxGetMscRate() definition match the
one at it's declaration again, as it's referenced from dri_common.c
which is built for GLX_USE_WINDOWSGL.
Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms")
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Anholt [Fri, 14 Dec 2018 22:46:48 +0000 (14:46 -0800)]
v3d: Drop in a bunch of notes about performance improvement opportunities.
These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.
Eric Anholt [Wed, 12 Dec 2018 07:02:37 +0000 (23:02 -0800)]
v3d: Do uniform pretty-printing in the QPU dump.
If you're trying to trace what's going on in a QPU dump, this will
definitely help you find your way.
Eric Anholt [Wed, 12 Dec 2018 06:43:56 +0000 (22:43 -0800)]
v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code.
This will be a lot easier than my usual "38400.000000? that looks like a
viewport scale" decoding strategy.
Eric Anholt [Wed, 12 Dec 2018 06:31:41 +0000 (22:31 -0800)]
v3d: Move uniform pretty-printing to its own helper function.
I want to reuse it in the QPU dump.
Eric Anholt [Wed, 12 Dec 2018 06:42:13 +0000 (22:42 -0800)]
v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms().
Follows
3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top
of the loop.") which showed a large performance win for vc4, but also
cleans up the code a decent bit.
Eric Anholt [Tue, 11 Dec 2018 22:07:52 +0000 (14:07 -0800)]
v3d: Avoid assertion failures when removing end-of-shader instructions.
After generating VIR, we leave c->cursor pointing at the end of the
shader. If the shader had dead code at the end (for example from preamble
instructions in a shader with no side effects), we would assertion fail
that we were leaving the cursor pointing at freed memory. Since anything
following DCE should be setting up a new cursor anyway, just clear the
cursor at the start.
Eric Anholt [Sat, 8 Dec 2018 05:37:26 +0000 (21:37 -0800)]
v3d: Add support for draw indirect for GLES3.1.
In trying to enable compute shaders, I found that a bunch of deqp-gles31's
compute stuff wanted to interact with indirect dispatch. This was easy to
do on its own.
Eric Anholt [Tue, 11 Dec 2018 00:47:13 +0000 (16:47 -0800)]
v3d: Add missing flagging of SYNCB as a TSY op.
Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
Eric Anholt [Wed, 12 Dec 2018 00:14:03 +0000 (16:14 -0800)]
v3d: Make sure that a thrsw doesn't split a multop from its umul24.
The thrsw will invalidate rtop, just like accumulators and flags. Caught
by simulator assertions in CS imulextended/umulextended tests.
Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
Eric Anholt [Fri, 30 Nov 2018 01:06:25 +0000 (17:06 -0800)]
v3d: Add safety checks for resource_create().
This should ease my debugging next time I screw it up.
Eric Anholt [Thu, 29 Nov 2018 01:22:45 +0000 (17:22 -0800)]
v3d: Add support for texturing from linear.
Just like vc4, we have to support linear shared BOs for X11 on arbitrary
displays. When we're faced with a request to texture from one of those,
make a shadow image that we copy using the TFU at the start of the draw
call.
Eric Anholt [Thu, 29 Nov 2018 01:59:51 +0000 (17:59 -0800)]
v3d: Add support for using the TFU to do some blits.
This will be useful in particular for blits from raster to UIF for X11.
Eric Anholt [Thu, 13 Dec 2018 23:47:29 +0000 (15:47 -0800)]
v3d: Don't forget to bump the number of writes when doing TFU ops.
generatemipmap is just filling out the rest of the mipmap that's already
been written (by a mapping or a draw call), so it didn't matter. As I
reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
Eric Anholt [Thu, 13 Dec 2018 23:46:51 +0000 (15:46 -0800)]
v3d: Set up the right stride for raster TFU.
I didn't have any raster images in the generatemipmap path, so the
pixels-vs-bytes mixup didn't matter here.
Eric Anholt [Fri, 14 Dec 2018 17:43:15 +0000 (09:43 -0800)]
v3d: Don't forget to wait for our TFU job before rendering from it.
Otherwise we may race to read old contents. This didn't show up in the
CTS and piglit for me, but it did once I started using the TFU to do
linear->UIF blits for X11.
Fixes: 2ebca177dc18 ("v3d: Use the TFU to do generatemipmap.")
Ilia Mirkin [Sun, 2 Dec 2018 18:19:01 +0000 (13:19 -0500)]
nvc0: always keep TSC slot 0 bound to fix TXF
Same as on nv50, the TXF op always uses the TSC bound to slot 0,
returning blank values if nothing is bound.
An earlier change arranges for the TSC entries list to always have valid
data at entry 0, so here we just make use of it.
Fixes arb_texture_buffer_object-subdata-sync among others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 2 Dec 2018 17:27:23 +0000 (12:27 -0500)]
nvc0: replace use of explicit default_tsc with entry 0
This was used for implementing FBFETCH. However that uses TXF, which
doesn't do much with a TSC. The only important bit is that sRGB-decoding
works as expected, which we can achieve since all samplers we ever
generate enable sRGB-decoding. Always point to entry 0 in the TSC table,
and ensure that even before it ever gets initialized, the sRGB-decoding
enable bit is set.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Rob Clark [Fri, 14 Dec 2018 19:35:54 +0000 (14:35 -0500)]
freedreno/a6xx: fix corrupted uniforms
For older gen's fd_wfi() is used to conditionally insert a WFI if there
hasn't already been one since last draw. But this doesn't work out well
with stateobj since the order the stateobj is evaluated might not be
what you expect. (Ie. stateobj might not be evaluated until a later
draw if there is no geometry from the current draw in a given tile.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Alex Deucher [Fri, 7 Dec 2018 21:10:33 +0000 (16:10 -0500)]
pci_ids: add new vega20 pci id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Alex Deucher [Fri, 7 Dec 2018 21:09:16 +0000 (16:09 -0500)]
pci_ids: add new vega10 pci ids
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Rafael Antognolli [Mon, 29 Oct 2018 17:19:54 +0000 (10:19 -0700)]
i965/gen9: Add workarounds for object preemption.
Gen9 hardware requires some workarounds to disable preemption depending
on the type of primitive being emitted.
We implement this by adding a function that checks the primitive type
and number of instances right before the 3DPRIMITIVE.
For now, we just ignore blorp. The only primitive it emits is
3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
safely leave preemption enabled when it happens. Or it will be disabled
by a previous 3DPRIMITIVE, which should be fine too.
v3:
- Apply missing workarounds for instanced rendering and line loop (Ken)
- Move workaround code to brw_draw_single_prim()
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rafael Antognolli [Mon, 29 Oct 2018 17:19:53 +0000 (10:19 -0700)]
i965/gen10+: Enable object level preemption.
Set bit when initializing context.
v3:
- Always toggle preemption bool to false before enabling it for the
first time, so the state gets emitted (Chris Wilson).
- Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rafael Antognolli [Mon, 29 Oct 2018 17:19:52 +0000 (10:19 -0700)]
intel/genxml: Add register for object preemption.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 26 Nov 2018 18:28:02 +0000 (10:28 -0800)]
util/slab: Rename slab_mempool typed parameters to mempool
Now everything with type 'struct slab_child_pool *' is name pool, and
everything with type 'struct slab_mempool *' is named mempool.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Tue, 30 Oct 2018 16:46:26 +0000 (09:46 -0700)]
nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Christian Gmeiner [Wed, 12 Dec 2018 13:45:56 +0000 (14:45 +0100)]
etnaviv: drop redundant ctx function parameter
There is no need to have an extra ctx paramter as all the other
parameters carry all the needed information.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Kenneth Graunke [Tue, 11 Dec 2018 08:34:11 +0000 (00:34 -0800)]
genxml: Consistently use a numeric "MOCS" field
When we first started using genxml, we decided to represent MOCS as an
actual structure, and pack values. However, in many places, it was more
convenient to use a numeric value rather than treating it as a struct,
so we added secondary setters in a bunch of places as well.
We were not entirely consistent, either. Some places only had one.
Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens
only had the struct-based setters. The names were sometimes "Constant
Buffer Object Control State" instead of "Memory", making it harder to
find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer
packet...which is a bit redundant.
On modern hardware, MOCS is simply an index into a table, but we were
still carrying around the structure with an "Index to MOCS Table" field,
in addition to the direct numeric setters. This is clunky - we really
just want a number on new hardware.
This patch eliminates the struct-based setters, and makes the numeric
setters be consistently called "MOCS". We leave the struct definition
around on Gen7-8 for reference purposes, but it is unused.
v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Timothy Arceri [Thu, 13 Dec 2018 23:23:27 +0000 (10:23 +1100)]
nir: fix opt_if_loop_last_continue()
The pass did not correctly handle loops ending in:
if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}
The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.
Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Rob Clark [Thu, 13 Dec 2018 14:15:33 +0000 (09:15 -0500)]
freedreno/a6xx: fix resource_copy_region()
pctx->resource_copy_region() needs to fall back to sw copy for
non-renderable formats. But previously for things that we could
not use the blitter for, would fall back to 3d. Which won't work
if 3d can't render to the dst format either.
Instead rework things to fallback to fd_resource_copy_region(),
which will try 3d core and then fall back to memcpy().
Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 13 Dec 2018 14:14:48 +0000 (09:14 -0500)]
freedreno: move fd_resource_copy_region()
Code-motion prep for next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 11 Dec 2018 17:36:52 +0000 (12:36 -0500)]
freedreno/a6xx: more blitter fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 11 Dec 2018 15:59:53 +0000 (10:59 -0500)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 11 Dec 2018 16:13:49 +0000 (11:13 -0500)]
gallium/aux: add is_unorm() helper
We already had one for is_snorm() but not unorm.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 10 Dec 2018 15:40:31 +0000 (10:40 -0500)]
freedreno/a6xx: fix blitter crash
Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Also fixes gpu hangs with some formats that are supported, but which we
don't know what internal-format to use for the blitter, for ex
dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 13 Dec 2018 18:50:50 +0000 (13:50 -0500)]
freedreno/ir3: don't remove unused input components
Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 10 Dec 2018 15:39:28 +0000 (10:39 -0500)]
freedreno/ir3: fix crash
Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z
Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 4 Dec 2018 13:07:50 +0000 (08:07 -0500)]
freedreno: also set DUMP flag on shaders
If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP
flag as a hint to kernel that this is a useful buffer to snapshot for
debug dumps.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 30 Nov 2018 13:29:51 +0000 (08:29 -0500)]
freedreno: debug GEM obj names
With a recent enough kernel, set debug names for GEM BOs, which will
show up in $debugfs/gem
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 28 Nov 2018 13:50:19 +0000 (08:50 -0500)]
freedreno/drm: sync uapi and enable softpin
Pull in updated UAPI and use kernel API version to enable softpin.
Since MSM_SUBMIT_BO_DUMP flag was added at same time, use that to
signal to kernel that cmdstream buffers are useful to dump for
debugging/cmdstream-traces.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Eric Anholt [Tue, 11 Dec 2018 21:49:28 +0000 (13:49 -0800)]
nir: Move intel's half-float image store lowering to to nir_format.h.
I needed the same function for v3d. This was originally in
d3e046e76c06
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made am istake about simplifying the function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Thu, 13 Dec 2018 19:25:08 +0000 (11:25 -0800)]
Revert "intel: Simplify the half-float packing in image load/store lowering."
This reverts commit
06fbcd2cd5cc5702c9039c26d20082a99bc157bf.
nir_pack_half_2x16_split *isn't* vectorizable, it's 1-component only, thus
why we had this split-scalar code in the first place.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Thu, 13 Dec 2018 19:15:07 +0000 (11:15 -0800)]
nir: Print the format of image variables.
This helps a lot when debugging image load/store lowering on large
testcases. Unfortunately the Mesa enum name stuff is under src/mesa and
we can't get at it from the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Thu, 6 Dec 2018 00:08:12 +0000 (16:08 -0800)]
mesa/st: Expose compute shaders when NIR support is advertised.
We have a NIR path, and V3D doesn't have TGSI input for compute (only what
TTN can handle for the various gallium-internal shaders).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Dave Airlie [Thu, 13 Dec 2018 03:29:04 +0000 (03:29 +0000)]
radv/xfb: fix counter buffer bounds checks.
If we gave this function 0 counter buffers, we'd still try and
access pCounterBuffers[0] as this check was incorrect.
Fixes crash with ext_transform_feedback-pipeline-basic-primgen
on zink on radv.
Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jason Ekstrand [Fri, 29 Dec 2017 03:53:36 +0000 (19:53 -0800)]
i965: Enable nir_opt_idiv_const for 32 and 64-bit integers
The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers. Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8 and 16-bit types, it's
probably best to just leave it disabled for now.
Shader-db results on Sky Lake:
total instructions in shared programs:
15105795 ->
15111403 (0.04%)
instructions in affected programs: 72774 -> 78382 (7.71%)
helped: 0
HURT: 265
Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Jason Ekstrand [Mon, 8 Oct 2018 22:33:10 +0000 (17:33 -0500)]
i965/vec4: Implement nir_op_uadd_sat
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Ian Romanick [Sat, 6 Oct 2018 02:04:47 +0000 (21:04 -0500)]
i965/fs: Implement nir_op_uadd_sat
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 28 Dec 2017 21:06:28 +0000 (13:06 -0800)]
nir: Add a pass for lowering integer division by constants
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Ian Romanick [Sat, 6 Oct 2018 01:22:41 +0000 (20:22 -0500)]
nir: Add a saturated unsigned integer add opcode
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Fri, 29 Dec 2017 22:38:55 +0000 (14:38 -0800)]
nir/lower_int64: Add support for [iu]mul_high
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Jason Ekstrand [Fri, 29 Dec 2017 21:49:43 +0000 (13:49 -0800)]
nir: Allow [iu]mul_high on non-32-bit types
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Emil Velikov [Tue, 11 Dec 2018 16:20:40 +0000 (16:20 +0000)]
glx: mandate xf86vidmode only for "drm" dri platforms
Currently we have the three dri "platforms" - drm, apple and windows.
Since xf86vidmode is a thing only for the drm one, adjust the
preprocessor guards and correctly check for the dependency.
v2: terminate the GLX_USE_WINDOWSGL hunk
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Fixes: 5bc509363b6 ("glx: make xf86vidmode mandatory for direct rendering")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Alejandro Piñeiro [Thu, 13 Dec 2018 14:25:57 +0000 (15:25 +0100)]
nir: remove unused variable
To avoid the following warning:
./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable]
nir_shader *ns = impl->function->shader;
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Erik Faye-Lund [Tue, 11 Dec 2018 14:02:53 +0000 (14:02 +0000)]
virgl: work around bad assumptions in virglrenderer
Virglrenderer does the wrong thing when given an instance divisor;
it tries to use the element-index rather than the binding-index as
the argument to glVertexBindingDivisor(). This worked fine as long
as there was a 1:1 relationship between elements and bindings,
which was the case util
19a91841c34 "st/mesa: Use Array._DrawVAO in
st_atom_array.c.".
So let's detect instance divisors, and restore a 1:1 relationship in
that case. This will make old versions of virglrenderer behave
correctly. For newer versions, we can consider making a better
interface, where the instance divisor isn't specified per element,
but rather per binding. But let's save that for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 19a91841c34 "st/mesa: Use Array._DrawVAO in st_atom_array.c."
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Erik Faye-Lund [Tue, 11 Dec 2018 12:34:38 +0000 (12:34 +0000)]
virgl: wrap vertex element state in a struct
This just has one member for now; the handle. But this is about to
change.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Erik Faye-Lund [Tue, 11 Dec 2018 11:16:47 +0000 (11:16 +0000)]
virgl: simplify virgl_hw_set_index_buffer
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Erik Faye-Lund [Tue, 11 Dec 2018 11:16:35 +0000 (11:16 +0000)]
virgl: simplify virgl_hw_set_vertex_buffers
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Juan A. Suarez Romero [Thu, 13 Dec 2018 14:45:20 +0000 (15:45 +0100)]
docs: update calendar, add news item and link release notes for 18.2.7
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Thu, 13 Dec 2018 14:42:26 +0000 (15:42 +0100)]
docs: add sha256 checksums for 18.2.7
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
e90429cc6dc5b5ada4253fc0a8517645d86a4f6c)
Juan A. Suarez Romero [Thu, 13 Dec 2018 13:58:30 +0000 (14:58 +0100)]
docs: add release notes for 18.2.7
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
419ee20097597bed77e73fd283e6d15e8dcb89e9)
Samuel Pitoiset [Wed, 12 Dec 2018 13:15:53 +0000 (14:15 +0100)]
radv: don't check if format is depth in radv_image_can_enable_hile()
This is always TRUE if htile_size is not 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 12 Dec 2018 13:15:52 +0000 (14:15 +0100)]
radv: check if addrlib enabled HTILE in radv_image_can_enable_htile()
When hile_size is 0, we can't enable HTILE. This doesn't change
anything, except not calling radv_image_alloc_htile().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 9 Oct 2018 12:15:12 +0000 (14:15 +0200)]
radv: switch on EOP when primitive restart is enabled with triangle strips
Otherwise, Yakuza hangs the GPU with DXVK. We don't know if
linetrip and pointlist are affected, so my point is to do that
only for triangle strips.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 10 Dec 2018 12:00:33 +0000 (13:00 +0100)]
radv: allow to skip DCC decompressions with the new predicate
Feral games aren't affected because they don't decompress DCC.
F1 2018 has one DCC decompression per frame, but I don't see
any performance improvements. This new predicate will be
probably more useful for DCC/MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 10 Dec 2018 11:57:34 +0000 (12:57 +0100)]
radv: add a predicate for reflecting DCC decompression state
It's somehow similar to the FCE predicate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jordan Justen [Mon, 12 Nov 2018 02:01:56 +0000 (18:01 -0800)]
i965/compute: Emit GPGPU_WALKER in genX_state_upload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jordan Justen [Mon, 12 Nov 2018 01:46:33 +0000 (17:46 -0800)]
i965/genX_state: Add register access functions
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Wed, 12 Dec 2018 19:29:29 +0000 (11:29 -0800)]
intel: Simplify the half-float packing in image load/store lowering.
This was noted by Jason in review when I tried to make a helper for the
old path.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Tue, 11 Dec 2018 21:49:28 +0000 (13:49 -0800)]
nir: Pull some of intel's image load/store format conversion to nir_format.h
I needed the same functions for v3d. Note that the color value in the
Intel lowering has already been cut down to image.chans num_components.
v2: Drop the half float one, since it was a 1-liner after cleanup.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Tue, 11 Dec 2018 21:40:54 +0000 (13:40 -0800)]
nir: Add some more consts to the nir_format_convert.h helpers.
Most of the bits were constant, but a few were missed. Avoids warnings
from v3d's upcoming static const bits declarations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Mon, 26 Nov 2018 01:05:00 +0000 (12:05 +1100)]
nir: detect more induction variables
This allows loop analysis to detect inductions variables that
are incremented in both branches of an if rather than in a main
loop block. For example:
loop {
block block_1:
/* preds: block_0 block_7 */
vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4
vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4
vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21
vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22
vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9
vec1 32 ssa_14 = ige ssa_8, ssa_5
/* succs: block_2 block_3 */
if ssa_14 {
block block_2:
/* preds: block_1 */
break
/* succs: block_8 */
} else {
block block_3:
/* preds: block_1 */
/* succs: block_4 */
}
block block_4:
/* preds: block_3 */
vec1 32 ssa_15 = ilt ssa_6, ssa_8
/* succs: block_5 block_6 */
if ssa_15 {
block block_5:
/* preds: block_4 */
vec1 32 ssa_16 = iadd ssa_8, ssa_7
vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000*/)
/* succs: block_7 */
} else {
block block_6:
/* preds: block_4 */
vec1 32 ssa_18 = iadd ssa_8, ssa_7
vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000*/)
/* succs: block_7 */
}
block block_7:
/* preds: block_5 block_6 */
vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18
vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4
vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19
/* succs: block_1 */
}
Unfortunatly GCM could move the addition out of the if for us
(making this patch unrequired) but we still cannot enable the GCM
pass without regressions.
This unrolls a loop in Rise of The Tomb Raider.
vkpipeline-db results (VEGA):
Totals from affected shaders:
SGPRS: 88 -> 96 (9.09 %)
VGPRS: 56 -> 52 (-7.14 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2168 -> 4560 (110.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 4 -> 4 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
Timothy Arceri [Mon, 26 Nov 2018 01:04:35 +0000 (12:04 +1100)]
nir: reword code comment
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Timothy Arceri [Sun, 25 Nov 2018 23:14:28 +0000 (10:14 +1100)]
nir: in loop analysis track actual control flow type
This will allow us to improve analysis to find more induction
variables.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Danylo Piliaiev [Sun, 25 Nov 2018 23:59:52 +0000 (10:59 +1100)]
nir: add if opt opt_if_loop_last_continue()
Removing the last continue can allow more loops to unroll. Also
inserting code into the if branch can allow the various if opts
to progress further.
The insertion of some loops into the if branch also reduces VGPR
use in some shaders.
vkpipeline-db results (VEGA):
Totals from affected shaders:
SGPRS: 6552 -> 6576 (0.37 %)
VGPRS: 6544 -> 6532 (-0.18 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 481952 -> 478032 (-0.81 %) bytes
LDS: 13 -> 13 (0.00 %) blocks
Max Waves: 241 -> 242 (0.41 %)
Wait states: 0 -> 0 (0.00 %)
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 168 -> 168 (0.00 %)
VGPRS: 144 -> 140 (-2.78 %)
Spilled SGPRs: 157 -> 157 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 8524 -> 8488 (-0.42 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 7 -> 7 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: (Timothy Arceri):
- allow for continues in either branch
- move any trailing loops inside the if as well as blocks.
- leave nir_opt_trivial_continues() to actually remove the
continue.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
Timothy Arceri [Thu, 15 Nov 2018 10:28:31 +0000 (21:28 +1100)]
nir: rework force_unroll_array_access()
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Timothy Arceri [Thu, 15 Nov 2018 08:51:20 +0000 (19:51 +1100)]
nir: factor out some of the complex loop unroll code to a helper
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Jordan Justen [Tue, 27 Nov 2018 23:39:10 +0000 (15:39 -0800)]
docs: Document GitLab merge request process (email alternative)
This documents a process for using GitLab Merge Requests as an second
way to submit code changes for Mesa. Only one of the two methods is
allowed for each patch series.
We will *not* require all patches to be emailed. Some code changes may
be reviewed and merged without any discussion on the mesa-dev email
list.
v2:
* No longer require email. Allow submitter to choose email or a
GitLab merge request.
* Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason,
Matt, Michel and Rob.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Rob Clark <robdclark@gmail.com>
Rhys Kidd [Wed, 12 Dec 2018 07:45:18 +0000 (02:45 -0500)]
meson: libfreedreno depends upon libdrm (for fence support)
Error message building freedreno Gallium driver with meson:
../src/gallium/drivers/freedreno/freedreno_fence.c:27:21: fatal error: libsync.h: No such file or directory
\#include <libsync.h>
Fixes: 4aa69cc4257 ("meson: build freedreno")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Jason Ekstrand [Mon, 29 Oct 2018 17:00:40 +0000 (12:00 -0500)]
nir: Document the function inlining process
This has thrown a few people off recently and it's good to have the
process and all the rational for it documented somewhere. A comment at
the top of nir_inline_functions seems as good a place as any.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 10 Feb 2018 18:52:51 +0000 (10:52 -0800)]
intel/blorp: Assert that we don't re-layout a compressed surface
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Tue, 13 Feb 2018 03:34:48 +0000 (19:34 -0800)]
anv/pipeline: Set the correct binding count for compute shaders
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Samuel Pitoiset [Tue, 11 Dec 2018 12:53:05 +0000 (13:53 +0100)]
radv: bump reported version to 1.1.90
After going through the spec changelog, it looks like RADV
is up to date. Note that ANV also reports 1.1.90.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Erik Faye-Lund [Mon, 10 Dec 2018 14:57:32 +0000 (14:57 +0000)]
virgl: force linear texturing support
When I made sure that half-float texture-filtering was required for ES3,
I didn't realize that virgl doesn't report support for this correctly.
This regressed the GLES version available on top of several drivers,
including i965 from 3.2 to 2.0.
This is going to need protocol changes to fix properly, so let's just
restore the previous behavior by enabling floating-point filtering
unconditionally for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: fcf9fcee3c8 "mesa/main: do not require float-texture filtering for es3"
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Iago Toral Quiroga [Mon, 28 May 2018 11:03:24 +0000 (13:03 +0200)]
intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments
The implementation of these opcodes in the generator assumes that their
arguments are packed, and it generates register regions based on that
assumption.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Wed, 3 Oct 2018 03:04:09 +0000 (22:04 -0500)]
anv: Advertise support for MinLod on Skylake+
These are usually used for dealing with sparse resources but there's no
reason why we can't hook them up before we have sparse. We have the
hardware; let's light it up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Thu, 11 Oct 2018 20:57:50 +0000 (15:57 -0500)]
intel/fs: Support min_lod parameters on texture instructions
We have to lower some shadow instructions because they don't exist in
hardware and we have to lower txb+offset+clamp because the message gets
too big and we run into the sampler message length limit of 11 regs.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Thu, 11 Oct 2018 19:14:29 +0000 (14:14 -0500)]
nir/lower_tex: Add lowering for some min_lod cases
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Thu, 11 Oct 2018 19:27:26 +0000 (14:27 -0500)]
nir/lower_tex: Modify txd instructions instead of replacing them
I don't know if one is better than the other or not but this approach
has the advantage that we never forget to copy information over and
we're not hard-coding quite as many assumptions. It's also a lot
simpler and much less code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Thu, 11 Oct 2018 17:56:21 +0000 (12:56 -0500)]
nir/lower_tex: Simplify lower_gradient logic
Instead of having to call two different lower_gradient functions based
on whether or not it's a cube, just make lower_gradient handle cubes.
This significantly simplifies some of the logic.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Wed, 3 Oct 2018 02:15:47 +0000 (21:15 -0500)]
spirv: Add support for MinLod
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>