mesa.git
4 years ago radv/gfx10: determine the number of vertices per primitive for TES
Samuel Pitoiset [Thu, 5 Sep 2019 10:19:22 +0000 (12:19 +0200)]
radv/gfx10: determine the number of vertices per primitive for TES

This doesn't fix anything known but it's correct now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago nir/lower_io_to_vector: add flat mode
Rhys Perry [Fri, 17 May 2019 14:04:39 +0000 (15:04 +0100)]
nir/lower_io_to_vector: add flat mode

This has lower_io_to_vector try to turn variables into arrays of 4-sized
vectors when possible and fall back to the old approach when that isn't
possible.

This is so that lower_io_to_vector can guarantee that only one variable is
used for each fragment shader output.

v2: handle dual-source blending
v3: don't try to merge structs and non-32-bit types in get_flat_type()
v3: fix per-vertex inputs
v3: fix and clean up location advancement in get_flat_type() and its
    calling code
v4: prioritize the original mode over the flat mode
v4: don't create flat variables to merge only one variable
v5: don't skip an entire slot when encountering structs in the old mode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years ago nir/lower_io_to_vector: allow FS outputs to be vectorized
Rhys Perry [Fri, 17 May 2019 10:53:32 +0000 (11:53 +0100)]
nir/lower_io_to_vector: allow FS outputs to be vectorized

v2: handle dual-source blending
v3: use a higher MAX_SLOTS

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years ago radv/gfx10: make use of the output usage mask when exporting NGG GS params
Samuel Pitoiset [Fri, 6 Sep 2019 08:34:35 +0000 (10:34 +0200)]
radv/gfx10: make use of the output usage mask when exporting NGG GS params

It shouldn't matter much because output varyings should have been
compacted during NIR shader linking but it mirrors what the driver
does when emitting NGG GS vertex parameters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv/gfx10: account for the subpass view for the NGG GS storage
Samuel Pitoiset [Fri, 6 Sep 2019 08:32:13 +0000 (10:32 +0200)]
radv/gfx10: account for the subpass view for the NGG GS storage

If the fragment shader needs the layer index, we have to allocate
one more dword in the NGG GS storage. Found by inspection. This
doesn't fix anything known.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago panfrost/ci: Increase timeouts
Tomeu Vizoso [Fri, 6 Sep 2019 14:17:26 +0000 (16:17 +0200)]
panfrost/ci: Increase timeouts

Sometimes LAVA jobs time out due to transient issues, and the GitLab
job fails in that case. Increase the timeouts to reduce the
likelihood of that happening and reduce false positives.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years ago panfrost/ci: Use special runner for LAVA jobs
Tomeu Vizoso [Fri, 6 Sep 2019 13:56:01 +0000 (15:56 +0200)]
panfrost/ci: Use special runner for LAVA jobs

So repositories don't need to be specially configured with a token to
access LAVA, store this token in a bind volume for a special runner.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years ago panfrost/ci: Re-add support for armhf
Tomeu Vizoso [Mon, 2 Sep 2019 06:33:11 +0000 (08:33 +0200)]
panfrost/ci: Re-add support for armhf

Now that Volt supports armhf, build images again and submit them to LAVA
for RK3288.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years ago radv: calculate esgs_itemsize in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:20:07 +0000 (18:20 +0200)]
radv: calculate esgs_itemsize in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: calculate the GSVS vertex size in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:16:33 +0000 (18:16 +0200)]
radv: calculate the GSVS vertex size in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: gather primitive ID in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:12:33 +0000 (18:12 +0200)]
radv: gather primitive ID in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: gather layer in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:09:00 +0000 (18:09 +0200)]
radv: gather layer in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: gather viewport in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:05:25 +0000 (18:05 +0200)]
radv: gather viewport in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: gather pointsize in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 16:04:43 +0000 (18:04 +0200)]
radv: gather pointsize in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: gather clip/cull distances in the shader info pass
Samuel Pitoiset [Tue, 3 Sep 2019 15:55:02 +0000 (17:55 +0200)]
radv: gather clip/cull distances in the shader info pass

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: move ac_fill_shader_info() to radv_nir_shader_info_pass()
Samuel Pitoiset [Tue, 3 Sep 2019 15:48:07 +0000 (17:48 +0200)]
radv: move ac_fill_shader_info() to radv_nir_shader_info_pass()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: merge radv_shader_variant_info into radv_shader_info
Samuel Pitoiset [Tue, 3 Sep 2019 15:39:23 +0000 (17:39 +0200)]
radv: merge radv_shader_variant_info into radv_shader_info

Having two different structs is useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radeon: Fix mjpeg issue for ARCTURUS
Zhu, James [Wed, 4 Sep 2019 17:59:39 +0000 (17:59 +0000)]
radeon: Fix mjpeg issue for ARCTURUS

ARCTURUS mjpeg is using direct register access.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
4 years ago radeon/vcn: add RENOIR VCN decode support
Leo Liu [Wed, 4 Sep 2019 17:27:02 +0000 (13:27 -0400)]
radeon/vcn: add RENOIR VCN decode support

It has the same VCN2.x block as Navi1x.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
4 years ago glsl: Fix unroll of do{} while(false) like loops
Danylo Piliaiev [Thu, 22 Aug 2019 10:32:50 +0000 (13:32 +0300)]
glsl: Fix unroll of do{} while(false) like loops

For loops whose condition is false on the first iteration, the
iteration count was miscalculated under the assumption that the
loop's condition is true until it becomes false, meaning it is
true at least once.
Such loops are now reported as having 0 iterations.
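
The guard described above can be illustrated on a plain counted loop. This is
only a sketch under assumptions: loop_iterations() is a hypothetical helper
for a loop of the form `for (i = init; i < limit; i += step)`, not the GLSL
IR loop analysis itself.

```c
#include <assert.h>

/* Hypothetical helper: trip count of `for (i = init; i < limit; i += step)`. */
static unsigned loop_iterations(int init, int limit, int step)
{
   assert(step > 0);

   /* Guard first: a condition that is already false never runs the body. */
   if (!(init < limit))
      return 0;

   /* From here on the condition held at least once, so round up. */
   return (unsigned)((limit - init + step - 1) / step);
}
```

Without the early return, the rounding formula would silently assume that the
body runs at least once.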

Similar to the fix e71fc7f2 done in NIR.

Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago tgsi_to_nir: Remove dependency on libglsl.
Timur Kristóf [Wed, 4 Sep 2019 13:56:09 +0000 (16:56 +0300)]
tgsi_to_nir: Remove dependency on libglsl.

This commit removes the GLSL dependency in TTN by manually recording
the textures used and calling nir_lower_samplers
instead of its GL counterpart.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years ago nir: Carve out nir_lower_samplers from GLSL code.
Timur Kristóf [Wed, 28 Aug 2019 20:34:14 +0000 (22:34 +0200)]
nir: Carve out nir_lower_samplers from GLSL code.

Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
keep it only in the GLSL code.

This commit introduces nir_lower_samplers to compiler/nir,
while maintaining the GL-specific function too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years ago radeonsi: Release storage for sdma_uploads when the context is destroyed
Gert Wollny [Tue, 3 Sep 2019 17:24:09 +0000 (19:24 +0200)]
radeonsi: Release storage for sdma_uploads when the context is destroyed

This fixes a memory leak in the flush code:

Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 in __interceptor_realloc .../gcc-8.3.0/libsanitizer/asan/asan_malloc_linux.cc:105
    #1 in si_buffer_do_flush_region src/gallium/drivers/radeonsi/si_buffer.c:573
    #2 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:608
    #3 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:597

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago android: mesa: revert "Enable asm unconditionally"
Mauro Rossi [Sun, 14 Jul 2019 08:53:19 +0000 (10:53 +0200)]
android: mesa: revert "Enable asm unconditionally"

This patch partially reverts 20294dc ("mesa: Enable asm unconditionally, ...")

Android makefile build logic needs to disable assembler optimization
in 32-bit builds to avoid text relocations in the libglapi.so shared library.

Fixes the following build error with the Android x86 32-bit target:

[  0% 4/477] target SharedLib: libglapi (out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so)
FAILED: out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so
...
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: warning: shared library text segment is not shareable
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: error: treating warnings as errors
clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 20294dc ("mesa: Enable asm unconditionally, now that gen_matypes is gone.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
4 years ago radv/gfx10: always set ballot_mask_bits to 64
Samuel Pitoiset [Tue, 27 Aug 2019 07:01:02 +0000 (09:01 +0200)]
radv/gfx10: always set ballot_mask_bits to 64

The codegen handles it and it adds the correct casts. This fixes
a bunch of LLVM validation errors when enabling Wave32 for compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago nir/lower_explicit_io: Handle 1 bit loads and stores
Caio Marcelo de Oliveira Filho [Wed, 28 Aug 2019 01:32:07 +0000 (18:32 -0700)]
nir/lower_explicit_io: Handle 1 bit loads and stores

Load a 32-bit value, then convert it to 1-bit.  Convert a 1-bit value to
32-bit, then store it.
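
The two conversions can be sketched in plain C. This is a minimal
illustration, assuming a hypothetical shared_mem array in which each boolean
occupies one 32-bit slot; store_bool() and load_bool() are illustrative names,
not NIR functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

/* Hypothetical stand-in for shared memory: one 32-bit slot per boolean. */
static uint32_t shared_mem[NUM_SLOTS];

static void store_bool(unsigned slot, bool value)
{
   assert(slot < NUM_SLOTS);
   /* Convert the 1-bit value to 32-bit, then store it. */
   shared_mem[slot] = value ? 1u : 0u;
}

static bool load_bool(unsigned slot)
{
   assert(slot < NUM_SLOTS);
   /* Load the 32-bit value, then convert it to 1-bit. */
   return shared_mem[slot] != 0;
}
```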

These cases started to appear when we changed Anvil to use derefs for
shared memory.

v2: Use `bit_size` in a couple of places we were missing.  (Jason)
    Reassign `value` instead of `src[0]`.  (Jason)

Fixes: 024a46a4079 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years ago Revert "intel/fs: Move the scalar-region conversion to the generator."
Jason Ekstrand [Mon, 2 Sep 2019 03:12:07 +0000 (22:12 -0500)]
Revert "intel/fs: Move the scalar-region conversion to the generator."

This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f.  Now that
we're doing interpolation lowering in NIR, we can continue to stride the
FS input registers directly in the brw_fs_nir code like we did before.
This fixes SIMD32 fragment shaders which broke because lower_simd_width
depended on the 0 stride to split PLN instructions correctly.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
4 years ago intel/fs: Fix FB write inst groups
Jason Ekstrand [Mon, 2 Sep 2019 02:57:05 +0000 (21:57 -0500)]
intel/fs: Fix FB write inst groups

This commit does two things.  First, it simplifies the way we compute
the FB write group bit.  There's no reason to use a ternary because
inst->group / 16 can only be 0 or 1.  Second, it fixes an order-of-
operations bug where the ternary wasn't selecting between (1 << 11) and
0 but between (1 << 11) and 0 | brw_dp_write_desc(...).
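
The order-of-operations bug is easy to reproduce in isolation. A minimal
sketch, assuming a plain integer `desc` in place of the value returned by
brw_dp_write_desc(...); both function names are hypothetical and are not the
driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: `|` binds tighter than `?:`, so this parses as
 * cond ? (1 << 11) : (0 | desc) and drops desc when the bit is set. */
static uint32_t fb_write_desc_buggy(unsigned group, uint32_t desc)
{
   return (group / 16) == 1 ? (1 << 11) : 0 | desc;
}

/* Simplified form: group / 16 is 0 or 1, so shift it into bit 11 directly. */
static uint32_t fb_write_desc_fixed(unsigned group, uint32_t desc)
{
   assert(group / 16 <= 1);
   return ((uint32_t)(group / 16) << 11) | desc;
}
```

Both forms agree for the first group; they differ only when the group bit is
set, which is why the bug was easy to miss.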

Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago lima/ppir: don't lower phis to scalar
Vasily Khoruzhick [Thu, 29 Aug 2019 06:09:38 +0000 (23:09 -0700)]
lima/ppir: don't lower phis to scalar

Utgard PP is a vec4 architecture, so lowering phis to scalars
increases instruction count and potentially interferes with
spilling.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago freedreno/a2xx: formats update
Jonathan Marek [Thu, 5 Sep 2019 02:34:23 +0000 (22:34 -0400)]
freedreno/a2xx: formats update

For render formats, update fd2_pipe2color to only work with HW supported
render formats, and remove the format whitelist in is_format_supported. This
patch enables float render formats (which work).

For vertex/texture formats, use a generic function which translates using
the bitsize of the channels. Since we fake support for some vertex formats,
check for these in is_format_supported to avoid enabling them as sampler
formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years ago freedreno/a2xx: fix depth gmem restore
Jonathan Marek [Wed, 4 Sep 2019 19:23:27 +0000 (15:23 -0400)]
freedreno/a2xx: fix depth gmem restore

Use fd_gmem_restore_format() to avoid trying to use unsupported Z24S8/Z16
render formats for gmem restore.

Also apply this change to gmem2mem so it doesn't depend on fd2_pipe2color
working with depth formats.

gmem2mem/mem2gmem also doesn't need to use the swap/swizzle, since dst/src
formats are the same.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years ago freedreno/a2xx: implement polygon offset
Jonathan Marek [Thu, 5 Sep 2019 21:21:54 +0000 (17:21 -0400)]
freedreno/a2xx: implement polygon offset

Fixes failures in the following deqp tests:
dEQP-GLES2.functional.polygon_offset.*

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: fix SRC_ALPHA_SATURATE for alpha blend function
Jonathan Marek [Thu, 5 Sep 2019 02:36:00 +0000 (22:36 -0400)]
freedreno/a2xx: fix SRC_ALPHA_SATURATE for alpha blend function

Fixes failures in the following deqp tests:
dEQP-GLES2.functional.fragment_ops.*src_alpha_saturate*

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: ir2: update register state in scalar insert
Jonathan Marek [Thu, 5 Sep 2019 15:25:07 +0000 (11:25 -0400)]
freedreno/a2xx: ir2: update register state in scalar insert

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years ago freedreno/a2xx: ir2: fix incorrect instruction reordering
Jonathan Marek [Thu, 5 Sep 2019 15:23:53 +0000 (11:23 -0400)]
freedreno/a2xx: ir2: fix incorrect instruction reordering

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years ago freedreno/a2xx: ir2: check opcode on the right instruction in export cp
Jonathan Marek [Thu, 5 Sep 2019 15:21:16 +0000 (11:21 -0400)]
freedreno/a2xx: ir2: check opcode on the right instruction in export cp

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: ir2: fix saturate in cp
Jonathan Marek [Thu, 5 Sep 2019 15:19:21 +0000 (11:19 -0400)]
freedreno/a2xx: ir2: fix saturate in cp

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: ir2: set lower_fdph
Jonathan Marek [Thu, 5 Sep 2019 15:18:45 +0000 (11:18 -0400)]
freedreno/a2xx: ir2: set lower_fdph

The fdph opcode is not supported.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: ir2: remove pointcoord y invert
Jonathan Marek [Thu, 5 Sep 2019 15:17:45 +0000 (11:17 -0400)]
freedreno/a2xx: ir2: remove pointcoord y invert

Fixes the following deqp test:
dEQP-GLES2.functional.shaders.builtin_variable.pointcoord

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/a2xx: ir2: fix lowering of instructions after float lowering
Jonathan Marek [Wed, 4 Sep 2019 19:18:09 +0000 (15:18 -0400)]
freedreno/a2xx: ir2: fix lowering of instructions after float lowering

Some instructions generated by int/bool float lowering need to be lowered
by opt_algebraic.

Fixes: 43dbd7d6
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalar
Vasily Khoruzhick [Fri, 30 Aug 2019 04:28:36 +0000 (21:28 -0700)]
lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalar

Utgard PP has a vector fcsel operation, but its condition is scalar. Add a
filtering callback that checks whether the {b,f}csel condition is a vector,
so that {b,f}csel is lowered to scalar only in that case.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago nir: allow specifying filter callback in lower_alu_to_scalar
Vasily Khoruzhick [Fri, 30 Aug 2019 04:14:54 +0000 (21:14 -0700)]
nir: allow specifying filter callback in lower_alu_to_scalar

A set of opcodes isn't flexible enough in certain cases. E.g.
Utgard PP has a vector conditional select operation, but its condition is
always scalar. Lowering all the vector selects to scalar increases the
instruction count, so we need a way to filter only those ops that can't be
handled in hardware.
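
A filter callback of this kind might look roughly as follows. This is a
sketch under assumptions: struct alu_instr, lower_filter_cb and
count_lowered() are hypothetical simplifications for illustration, not the
real nir_alu_instr or the lower_alu_to_scalar interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ALU instruction. */
enum alu_op { OP_FADD, OP_FCSEL };

struct alu_instr {
   enum alu_op op;
   unsigned num_components;   /* width of the result */
   unsigned cond_components;  /* width of the select condition, 0 if none */
};

/* Returns true if the instruction should be lowered to scalar. */
typedef bool (*lower_filter_cb)(const struct alu_instr *instr, const void *data);

/* Utgard-PP-like rule: only selects with a vector condition need lowering. */
static bool lower_vector_cond_select(const struct alu_instr *instr, const void *data)
{
   (void)data;
   return instr->op == OP_FCSEL && instr->cond_components > 1;
}

/* Counts the vector instructions the pass would scalarize; a NULL filter
 * means "lower everything". */
static size_t count_lowered(const struct alu_instr *instrs, size_t count,
                            lower_filter_cb cb, const void *data)
{
   assert(instrs != NULL || count == 0);
   size_t lowered = 0;
   for (size_t i = 0; i < count; i++) {
      if (instrs[i].num_components > 1 && (!cb || cb(&instrs[i], data)))
         lowered++;
   }
   return lowered;
}
```

Passing the decision as a function rather than an opcode set lets the driver
inspect operands, which a set of opcodes cannot express.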

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago util: android logging support
Rob Clark [Tue, 3 Sep 2019 18:43:40 +0000 (11:43 -0700)]
util: android logging support

In particular, it would be nice for failed debug_assert() msgs to show
up in logcat.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years ago freedreno/ir3: allow copy propagation for relative
Rob Clark [Fri, 9 Aug 2019 16:08:20 +0000 (09:08 -0700)]
freedreno/ir3: allow copy propagation for relative

This appears to work fine (with the additional constraint of keeping the
indirect load in the same block that a0.x was loaded).

We can probably lift this restriction on earlier gens after testing.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/ir3: fix cp cmps.s opt
Rob Clark [Wed, 4 Sep 2019 18:28:26 +0000 (11:28 -0700)]
freedreno/ir3: fix cp cmps.s opt

Need to use ir3_instr_set_address(), otherwise the instruction might not
get added to the indirects table.  This becomes a problem when we turn
on copy propagation for relative accesses, as check_instr() in the sched
pass won't realize there is an indirect consumer of address register
load that is ready to be scheduled.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/ir3: assert that only single address
Rob Clark [Mon, 2 Sep 2019 17:08:37 +0000 (10:08 -0700)]
freedreno/ir3: assert that only single address

An instruction can reference only a single address register value.
Add an assert to catch bugs.

The address value should also be local to the same block as the
instruction.

(The one spot where changing the instruction address is actually legit
needs to clear the address first.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/ir3: fix mad copy propagation special case
Rob Clark [Fri, 30 Aug 2019 21:28:01 +0000 (14:28 -0700)]
freedreno/ir3: fix mad copy propagation special case

After the next patch enabling copy propagation for relative sources,
we'll need to dereference the n'th src in valid_flags(), so we actually
need to swap the sources before calling valid_flags().

But the logic was already a bit cumbersome, so move it into a helper
function.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/ir3: fix addr/pred spilling
Rob Clark [Mon, 12 Aug 2019 18:34:18 +0000 (11:34 -0700)]
freedreno/ir3: fix addr/pred spilling

The live_values and use_count were not being properly updated.  This
starts triggering problems with the next patch, where we allow copy
propagation for RELATIV access.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago freedreno/ir3: cleanup "partially const" ubo srcs
Rob Clark [Thu, 8 Aug 2019 22:09:23 +0000 (15:09 -0700)]
freedreno/ir3: cleanup "partially const" ubo srcs

Move the constant part of the indirect offset into nir intrinsic base.
When we have multiple indirect accesses with different constant offsets,
this lets other opt passes clean up things to use a single address
register value.
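
The idea can be sketched with a toy model. Everything here is a hypothetical
simplification: struct offset_expr and struct ubo_load only model "an
indirect value plus a constant" and are not ir3 or NIR data structures.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical offset expression: value(indirect_id) + constant. */
struct offset_expr {
   unsigned indirect_id;  /* id of the dynamic part */
   unsigned constant;     /* constant part of the offset */
};

/* Hypothetical UBO load: reads at base + value(indirect_id). */
struct ubo_load {
   unsigned base;
   unsigned indirect_id;
};

/* Fold the constant part of the offset into the load's base, leaving only
 * the shared dynamic part as the indirect source. */
static struct ubo_load make_ubo_load(unsigned base, struct offset_expr off)
{
   assert(off.constant <= UINT_MAX - base);
   struct ubo_load load = { base + off.constant, off.indirect_id };
   return load;
}
```

In this model, two accesses such as `i + 4` and `i + 8` end up with different
bases but the same indirect source, so one address value can serve both.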

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago lima/ppir: improve regalloc spill cost calculation
Erico Nunes [Tue, 27 Aug 2019 23:09:12 +0000 (01:09 +0200)]
lima/ppir: improve regalloc spill cost calculation

Now that spilling ops can be inserted into existing instructions, it
makes sense to increase cost to spill registers that would cause the
creation of a new instruction.
Experimental results showed that penalizing this too heavily made
results worse; however, it is beneficial as a tie-breaker between
registers with the same number of components.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago lima/ppir: optimizations in regalloc spilling code
Erico Nunes [Tue, 27 Aug 2019 23:07:55 +0000 (01:07 +0200)]
lima/ppir: optimizations in regalloc spilling code

Avoid creating unnecessary instructions for the load/store temp nodes
when not required, to further reduce register pressure.

The store_temp operation seems to be unable to do any spilling.
At least the offline shader seems to never output instructions accessing
swizzled components, and attempting to output that in ppir results in
errors. So, force spilled registers to allocate a full vec4 register.
This seems to be the optimal way as it is possible to always keep stores
and temps in a single instruction that can be pipelined.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago lima/ppir: mark regalloc created ssa unspillable
Erico Nunes [Mon, 26 Aug 2019 18:59:57 +0000 (20:59 +0200)]
lima/ppir: mark regalloc created ssa unspillable

One ssa created in the spilling code in ppir_update_spilled_src was not
properly being marked 'spilled', which made it a candidate for future
spilling attempts.
Since it was being inserted by the spilling code itself, let's mark it
unspillable to avoid an infinite spilling loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years ago v3d: writes to magic registers aren't RF writes after THREND
Jose Maria Casanova Crespo [Wed, 24 Jul 2019 20:01:00 +0000 (22:01 +0200)]
v3d: writes to magic registers aren't RF writes after THREND

Shaders must not attempt to write to the register files in the last
three instructions, but that doesn't include the magic registers:

nop                  ; nop               ; thrsw; ldtmu.- *** ERROR ***
nop                  ; nop
nop                  ; nop

v2: Simplify validation rules. (Eric Anholt)
v3: Adjust validation even more. (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago intel/dri: finish proper glthread
Sergii Romantsov [Wed, 5 Jun 2019 11:33:58 +0000 (14:33 +0300)]
intel/dri: finish proper glthread

KWin was able to get a NULL context in the call to
intelUnbindContext, but _mesa_glthread_finish
is not resilient to such a case.
The case can be reproduced with these steps:
1. Create both glx and egl contexts
2. Make glx as current
3. Make egl as current
4. Reset glx context
5. Make egl as current

The solution properly finishes the glthread context
(the context is taken from the dri-context requested
for unbinding, not from the saved current context).

Piglit-test: https://gitlab.freedesktop.org/mesa/piglit/merge_requests/87

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110814
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111271
Fixes: dca36d5516d0 (i965: Implement threaded GL support)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago radv: Call nir_propagate_invariant()
Connor Abbott [Thu, 5 Sep 2019 11:57:11 +0000 (13:57 +0200)]
radv: Call nir_propagate_invariant()

Without this, invariant qualifiers don't do anything. Together with a
fix to the game, this fixes flickering in No Man's Sky.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radeonsi/nir: Don't lower constant arrays to uniforms
Connor Abbott [Mon, 2 Sep 2019 10:00:44 +0000 (12:00 +0200)]
radeonsi/nir: Don't lower constant arrays to uniforms

shader-db results:

Totals:
SGPRS: 3955968 -> 3954960 (-0.03 %)
VGPRS: 2220220 -> 2220092 (-0.01 %)
Spilled SGPRs: 11387 -> 11325 (-0.54 %)
Spilled VGPRs: 97 -> 97 (0.00 %)
Private memory VGPRs: 2528 -> 2528 (0.00 %)
Scratch size: 2656 -> 2656 (0.00 %) dwords per thread
Code Size: 76002204 -> 75994988 (-0.01 %) bytes
LDS: 740 -> 740 (0.00 %) blocks
Max Waves: 772776 -> 772787 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 16840 -> 15832 (-5.99 %)
VGPRS: 16452 -> 16324 (-0.78 %)
Spilled SGPRs: 1416 -> 1354 (-4.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 2016 -> 2016 (0.00 %)
Scratch size: 2040 -> 2040 (0.00 %) dwords per thread
Code Size: 953624 -> 946408 (-0.76 %) bytes
LDS: 303 -> 303 (0.00 %) blocks
Max Waves: 1622 -> 1633 (0.68 %)
Wait states: 0 -> 0 (0.00 %)

There were a large number of regressions in code size, but they seem to
be because NIR unrolls some loop, which results in the table being
replaced by a bunch of immediates on multiplies etc. -- this bloats code
size since the table size is now included, but means that there are fewer
loads so it's still a net positive.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago gallium: Plumb through a way to disable GLSL const lowering
Connor Abbott [Fri, 30 Aug 2019 15:57:18 +0000 (17:57 +0200)]
gallium: Plumb through a way to disable GLSL const lowering

For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago st/nir: Don't lower indirects when linking
Connor Abbott [Mon, 2 Sep 2019 09:57:34 +0000 (11:57 +0200)]
st/nir: Don't lower indirects when linking

I believe this was stuck here early because otherwise
nir_opt_copy_prop_vars could undo what lower_io_to_temporaries does.
However that has since been fixed. Also, we now use scratch for large
variables so the comment is stale.

On radeonsi these are the shader-db results:

Totals:
SGPRS: 3955968 -> 3955968 (0.00 %)
VGPRS: 2220208 -> 2220220 (0.00 %)
Spilled SGPRs: 11387 -> 11387 (0.00 %)
Spilled VGPRs: 97 -> 97 (0.00 %)
Private memory VGPRs: 2528 -> 2528 (0.00 %)
Scratch size: 2656 -> 2656 (0.00 %) dwords per thread
Code Size: 76002108 -> 76002204 (0.00 %) bytes
LDS: 740 -> 740 (0.00 %) blocks
Max Waves: 772779 -> 772776 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 176 -> 176 (0.00 %)
VGPRS: 144 -> 156 (8.33 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 12104 -> 12200 (0.79 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 28 -> 25 (-10.71 %)
Wait states: 0 -> 0 (0.00 %)

The few small regressions are due to nir_opt_large_constants kicking in
when indirect lowering happens to result in smaller code after
optimization since the array is very simple.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago st/nir: Call nir_remove_unused_variables() in the opt loop
Connor Abbott [Wed, 4 Sep 2019 11:54:13 +0000 (13:54 +0200)]
st/nir: Call nir_remove_unused_variables() in the opt loop

This prevents regressions when disabling indirect lowering. Sometimes
the only use of an input array was copying it to the array created by
nir_lower_io_to_temporaries, and without lowering indirects we wouldn't
have eliminated the temporary array until after linking, which was too
late to remove unused code in the producer.

No shader-db changes with radeonsi NIR.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago ac/nir: Enable nir_opt_large_constants
Connor Abbott [Fri, 30 Aug 2019 14:08:47 +0000 (16:08 +0200)]
ac/nir: Enable nir_opt_large_constants

vkpipeline-db numbers:

Totals:
SGPRS: 1740306 -> 1741322 (0.06 %)
VGPRS: 1331124 -> 1331712 (0.04 %)
Spilled SGPRs: 21201 -> 21316 (0.54 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size: 79022628 -> 78694788 (-0.41 %) bytes
LDS: 6500 -> 6500 (0.00 %) blocks
Max Waves: 301413 -> 301302 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 53633 -> 54649 (1.89 %)
VGPRS: 53000 -> 53588 (1.11 %)
Spilled SGPRs: 3454 -> 3569 (3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5284232 -> 4956392 (-6.20 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 4239 -> 4128 (-2.62 %)
Wait states: 0 -> 0 (0.00 %)

(The biggest VGPR and max wave regression is due to unrolling a loop,
which made the scheduler more aggressive, but in this case it's able to
effectively hide latency so it's actually probably a win.)

shader-db numbers with radeonsi NIR:

Totals:
SGPRS: 3526496 -> 3526512 (0.00 %)
VGPRS: 2198576 -> 2198576 (0.00 %)
Spilled SGPRs: 10463 -> 10463 (0.00 %)
Spilled VGPRs: 86 -> 86 (0.00 %)
Private memory VGPRs: 3182 -> 2528 (-20.55 %)
Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread
Code Size: 74117280 -> 74106140 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 775846 -> 775844 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 856 -> 872 (1.87 %)
VGPRS: 680 -> 680 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 654 -> 0 (-100.00 %)
Scratch size: 668 -> 0 (-100.00 %) dwords per thread
Code Size: 49652 -> 38512 (-22.44 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 182 -> 180 (-1.10 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
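The percentages in the tables above are relative changes from the old value to the new one. As a rough sketch (the helper names are hypothetical and not taken from the shader-db or vkpipeline-db scripts, and treating "0 -> 0" as 0.00 % is an assumption), each line could be reproduced like this:

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * Assumption: a zero baseline is reported as 0.00 %. */
double pct_change(long before, long after)
{
    if (before == 0)
        return 0.0;
    return 100.0 * (double)(after - before) / (double)before;
}

/* Print one statistic in the same shape as the tables above. */
void print_stat(const char *name, long before, long after)
{
    printf("%s: %ld -> %ld (%.2f %%)\n", name, before, after,
           pct_change(before, after));
}
```

For example, print_stat("Code Size", 5284232, 4956392) corresponds to the affected-shader code size line in the vkpipeline-db table.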
4 years agoac/nir: Support load_constant intrinsics
Connor Abbott [Thu, 29 Aug 2019 15:28:01 +0000 (17:28 +0200)]
ac/nir: Support load_constant intrinsics

Set up a constant global variable that LLVM will place in a .rodata
section and generate PC-relative loads for.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradv/radeonsi: Don't count read-only data when reporting code size
Connor Abbott [Thu, 29 Aug 2019 15:15:46 +0000 (17:15 +0200)]
radv/radeonsi: Don't count read-only data when reporting code size

We usually use these counts as a simple way to figure out if a change
reduces the number of instructions or shrinks an instruction. However,
since .rodata sections aren't executed, we shouldn't be counting their
size for this analysis. Make the linker return the total executable
size, and use it to report the more useful size in both drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
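The idea of reporting only executable bytes can be sketched as follows. This is a minimal illustration, not the actual linker code: the section struct and the function name are hypothetical, and the sketch assumes each section is simply tagged as executable or not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one section in a linked shader binary. */
struct section {
    const char *name;
    size_t size;       /* size in bytes */
    bool executable;   /* true for code, false for data such as .rodata */
};

/* Sum only the sections that hold instructions, so that read-only data
 * does not inflate the reported code size. */
size_t total_exec_size(const struct section *sections, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        if (sections[i].executable)
            total += sections[i].size;
    }
    return total;
}
```

With this split, a change that moves a constant array into .rodata shows up as a code size reduction rather than being masked by the size of the data.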
4 years agoheaders: remove redundant GL token from GL wrapper
Heinrich Fink [Tue, 30 Jul 2019 12:59:41 +0000 (14:59 +0200)]
headers: remove redundant GL token from GL wrapper

Removing GL_FRAMEBUFFER_FLIP_Y_MESA token from glheader.h as it is now
provided by glext.h

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agospecs: Sync framebuffer_flip_y text with GL registry
Heinrich Fink [Mon, 29 Jul 2019 13:35:19 +0000 (15:35 +0200)]
specs: Sync framebuffer_flip_y text with GL registry

Sync extension spec of MESA_framebuffer_flip_y to what has been merged
upstream in the GL registry. The update now carries the accepted GL
extension number.

v2: split GL headers update off to separate commit

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agoinclude: sync GL headers with registry
Heinrich Fink [Tue, 30 Jul 2019 11:26:46 +0000 (13:26 +0200)]
include: sync GL headers with registry

Integrating headers from upstream registry [0] master branch. Effective
GL registry commit integrated:

9d534f9312e56c72df763207e449c6719576fd54

Keeping the following quirks local to Mesa:

- glext.h: BUILDING_MESA guard (see !1492)

- glxext.h: glXQueryGLXPbufferSGIX: 'int' return type (Mesa) vs.
'void' (GL registry)

- glxext.h: GLX_RENDERER_ID_MESA is still expected by some mesa tests,
even though its token has been removed from the spec (see
docs/specs/MESA_query_renderer.spec)

- glxext.h: glXGetTransparentIndexSUN / PFNGLXGETTRANSPARENTINDEXSUNPROC
argument pTransparentIndex has type 'unsigned long *' (Mesa) vs. 'long
*' (GL registry)

[0] https://github.com/KhronosGroup/OpenGL-Registry

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agoclover: Fix build after clang r370122.
Hal Gentz [Sun, 1 Sep 2019 23:31:04 +0000 (17:31 -0600)]
clover: Fix build after clang r370122.

../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’:
../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:203:81: error: no matching function for call to ‘clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, const char* const*, const char* const*, clang::DiagnosticsEngine&)’
  203 |              c->getInvocation(), copts.data(), copts.data() + copts.size(), diag))
      |                                                                                 ^
In file included from /opt/llvm64/include/clang/Frontend/CompilerInstance.h:15,
                 from ../mesa/src/gallium/state_trackers/clover/llvm/codegen.hpp:37,
                 from ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:49:
/opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate: ‘static bool clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, llvm::ArrayRef<const char*>, clang::DiagnosticsEngine&)’
  157 |   static bool CreateFromArgs(CompilerInvocation &Res,
      |               ^~~~~~~~~~~~~~
/opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note:   candidate expects 3 arguments, 4 provided

Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
4 years agoscons: Add coroutines component to build.
Vinson Lee [Wed, 4 Sep 2019 07:44:22 +0000 (00:44 -0700)]
scons: Add coroutines component to build.

Fixes: d32690b43c91 ("gallivm: add coroutine pass manager support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agogallium/osmesa: Move 565 format selection checks where the rest are.
Eric Anholt [Wed, 3 Jul 2019 19:04:26 +0000 (12:04 -0700)]
gallium/osmesa: Move 565 format selection checks where the rest are.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agogallium/osmesa: Fix a race in creating the stmgr.
Eric Anholt [Wed, 3 Jul 2019 18:28:49 +0000 (11:28 -0700)]
gallium/osmesa: Fix a race in creating the stmgr.

Noticed while looking at other OSMesa bugs.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agogallium/osmesa: Introduce a test.
Eric Anholt [Wed, 3 Jul 2019 18:34:37 +0000 (11:34 -0700)]
gallium/osmesa: Introduce a test.

Given that we occasionally touch this code and probably nobody really
wants to think about it, introduce a minimal test so that we know we
haven't completely broken OSMesa.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agodocs: Mark 19.2.0-rc2 as done and push back rc3 and rc4/final
Dylan Baker [Wed, 4 Sep 2019 23:00:02 +0000 (16:00 -0700)]
docs: Mark 19.2.0-rc2 as done and push back rc3 and rc4/final

4 years agoglx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.
Hal Gentz [Thu, 25 Jul 2019 21:40:50 +0000 (15:40 -0600)]
glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.

When run in optirun, applications that linked to `libGLX.so` and then
proceeded to query Mesa for extension strings caused a SEGV in Mesa.

`glXQueryExtensionsString` was calling a chain of functions that
eventually led to `__glXQueryServerString`. This function would call
`xcb_glx_query_server_string` then `xcb_glx_query_server_string_reply`.
The latter for some unknown reason returned `NULL`. Passing this `NULL`
to `xcb_glx_query_server_string_string_length` would cause a SEGV as the
function tried to dereference it.

The reason behind the function returning `NULL` is yet to be determined,
however, simply checking that the ptr is not `NULL` resolves this. A
similar check has been added to `__glXGetString` for completeness' sake,
although it is not immediately necessary.

In addition to that, we stumbled into a similar problem in
`AllocAndFetchScreenConfigs` which tries to access the configs to free
them if `__glXQueryServerString` fails. This, of course, SEGVs, because the
configs have not been allocated yet. Simply continuing past the configs
if their config ptrs are `NULL` resolves this. We also switch to `calloc`
to make sure that the config ptrs are `NULL` by default, and not some
uninitialized value.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 24b8a8cfe821 "glx: implement __glXGetString, hide __glXGetStringFromServer"
Fixes: cb3610e37c4c "Import the GLX client side library, formerly from xc/lib/GL/glx. Build it "
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
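The two defensive patterns described above can be sketched in isolation. This is not the GLX code itself: `struct reply`, `struct screen` and all function names below are hypothetical stand-ins, chosen only to show the NULL check before dereferencing a reply and the use of `calloc` so that unallocated entries are NULL and can be skipped during cleanup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a reply object returned by a query call. */
struct reply {
    const char *str;
    size_t len;
};

/* Copy the string out of a reply, or return NULL if no reply arrived.
 * The caller owns the returned buffer. */
char *copy_reply_string(const struct reply *reply)
{
    if (reply == NULL)
        return NULL;            /* never dereference a missing reply */

    char *buf = calloc(reply->len + 1, 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, reply->str, reply->len);
    return buf;                 /* calloc already zeroed the terminator */
}

/* Hypothetical per-screen record that owns a config array. */
struct screen {
    int *configs;
};

/* calloc zero-fills the array, so every entry starts out as NULL. */
struct screen **alloc_screens(size_t count)
{
    return calloc(count, sizeof(struct screen *));
}

/* Cleanup that is safe to call after a partial setup. */
void free_screens(struct screen **screens, size_t count)
{
    if (screens == NULL)
        return;
    for (size_t i = 0; i < count; i++) {
        if (screens[i] == NULL)
            continue;           /* this entry was never allocated */
        free(screens[i]->configs);
        free(screens[i]);
    }
    free(screens);
}
```

Because the cleanup path tolerates NULL entries, a failure partway through setup can jump straight to `free_screens` without tracking how far the allocation got.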
4 years agoegl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}
Adam Jackson [Tue, 3 Sep 2019 20:43:16 +0000 (16:43 -0400)]
egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}

It's somewhat annoying that these are so similar for so little benefit.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agoglsl: Store the precision for a function return type
Neil Roberts [Fri, 23 Aug 2019 12:24:27 +0000 (14:24 +0200)]
glsl: Store the precision for a function return type

The precision for a function return type is now stored in
ir_function_signature. This will later be useful to implement mediump
to float16 lowering. In the meantime it is also useful to catch errors
where a function is redeclared with a different precision.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agodocs: add llvmpipe features for fb_no_attach and compute shaders
Dave Airlie [Tue, 27 Aug 2019 07:12:01 +0000 (17:12 +1000)]
docs: add llvmpipe features for fb_no_attach and compute shaders

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: enable compute shaders if LLVM has coroutines
Dave Airlie [Tue, 27 Aug 2019 05:30:28 +0000 (15:30 +1000)]
llvmpipe: enable compute shaders if LLVM has coroutines

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add local memory allocation path
Dave Airlie [Tue, 27 Aug 2019 05:30:15 +0000 (15:30 +1000)]
llvmpipe: add local memory allocation path

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add compute shader parameter fetching support
Dave Airlie [Tue, 27 Aug 2019 05:28:26 +0000 (15:28 +1000)]
llvmpipe: add compute shader parameter fetching support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add compute shader images support
Dave Airlie [Tue, 27 Aug 2019 05:28:13 +0000 (15:28 +1000)]
llvmpipe: add compute shader images support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add ssbo support to compute shaders
Dave Airlie [Tue, 27 Aug 2019 05:21:48 +0000 (15:21 +1000)]
llvmpipe: add ssbo support to compute shaders

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add compute sampler + sampler view support.
Dave Airlie [Tue, 27 Aug 2019 05:17:29 +0000 (15:17 +1000)]
llvmpipe: add compute sampler + sampler view support.

This is ported from the fragment shader code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add support for compute constant buffers.
Dave Airlie [Tue, 27 Aug 2019 05:08:19 +0000 (15:08 +1000)]
llvmpipe: add support for compute constant buffers.

This is mostly ported from the fragment shader code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add compute pipeline statistics support.
Dave Airlie [Tue, 27 Aug 2019 05:04:28 +0000 (15:04 +1000)]
llvmpipe: add compute pipeline statistics support.

This just adds the CS invocations counter.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add grid launch
Dave Airlie [Tue, 27 Aug 2019 05:02:32 +0000 (15:02 +1000)]
llvmpipe: add grid launch

This adds the dispatch code. It creates a job for the number
of blocks in the grid, and dispatches them to the threadpool
implementation. The threadpool then calls the JIT code to
execute the coroutines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
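The dispatch described above, one job per block in the grid, can be sketched as below. The names are hypothetical and the x-fastest ordering is an assumption; a plain loop stands in for the threadpool, and a callback stands in for the JIT-compiled code.

```c
#include <stddef.h>

/* Coordinates of one block inside the dispatch grid. */
struct block_id {
    unsigned x, y, z;
};

/* One job per block: the job count is the product of the grid sizes. */
size_t grid_num_jobs(const unsigned grid[3])
{
    return (size_t)grid[0] * grid[1] * grid[2];
}

/* Map a sequential job id to block coordinates, with x varying fastest
 * (an assumption made for this sketch). */
struct block_id job_to_block(size_t job, const unsigned grid[3])
{
    struct block_id id;

    id.x = (unsigned)(job % grid[0]);
    id.y = (unsigned)((job / grid[0]) % grid[1]);
    id.z = (unsigned)(job / ((size_t)grid[0] * grid[1]));
    return id;
}

/* Stand-in for the generated code that executes one block. */
typedef void (*block_fn)(struct block_id id, void *data);

/* Serial stand-in for the threadpool: a real pool would hand each job id
 * to a worker thread instead of running the jobs in a loop. */
void launch_grid(const unsigned grid[3], block_fn fn, void *data)
{
    size_t jobs = grid_num_jobs(grid);

    for (size_t job = 0; job < jobs; job++)
        fn(job_to_block(job, grid), data);
}
```

Keeping the job id a single sequential integer means the pool only needs to hand out the next unclaimed id; the worker recovers the 3D block position on its own.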
4 years agollvmpipe: add compute shader generation.
Dave Airlie [Tue, 27 Aug 2019 04:57:54 +0000 (14:57 +1000)]
llvmpipe: add compute shader generation.

This creates the coroutine execution environment and the
main compute shaders that get executed inside it.

Each compute shader block is executed in its own coroutine
execution shader, with each "thread" being a coroutine executed
inside it in sequence.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: introduce variant building infrastructure.
Dave Airlie [Tue, 27 Aug 2019 04:50:27 +0000 (14:50 +1000)]
llvmpipe: introduce variant building infrastructure.

This doesn't actually build any of the shaders yet, but just
builds up the framework necessary to start building the shaders
and variants.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: introduce new state dirty tracking for compute.
Dave Airlie [Tue, 27 Aug 2019 04:43:33 +0000 (14:43 +1000)]
llvmpipe: introduce new state dirty tracking for compute.

Compute doesn't share dirty state with the fragment pipeline
so create a separate path for it.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add initial shader create/bind/destroy variants framework.
Dave Airlie [Tue, 27 Aug 2019 04:42:34 +0000 (14:42 +1000)]
llvmpipe: add initial shader create/bind/destroy variants framework.

This is mostly a port of the fragment shader framework

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add compute debug option
Dave Airlie [Tue, 27 Aug 2019 04:35:56 +0000 (14:35 +1000)]
llvmpipe: add compute debug option

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add compute jit interface.
Dave Airlie [Tue, 27 Aug 2019 04:32:46 +0000 (14:32 +1000)]
gallivm: add compute jit interface.

This adds the jit interface for compute shaders; it's based
on the fragment shader one.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: add initial compute state structs
Dave Airlie [Tue, 27 Aug 2019 04:28:37 +0000 (14:28 +1000)]
llvmpipe: add initial compute state structs

These mirror the fragment shader structs; this is just a framework.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: introduce compute shader context
Dave Airlie [Tue, 27 Aug 2019 03:19:00 +0000 (13:19 +1000)]
llvmpipe: introduce compute shader context

The compute shader will need its own context like the frag shader
has; this just introduces the framework struct and allocates/frees
it in the right places.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add barrier support for compute shaders.
Dave Airlie [Tue, 27 Aug 2019 02:50:35 +0000 (12:50 +1000)]
gallivm: add barrier support for compute shaders.

When the code is executing and hits a barrier, it will suspend
the coroutine and return control to the coroutine dispatcher.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
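The suspend-at-barrier behaviour can be illustrated without LLVM. The sketch below is not the gallivm implementation, which uses LLVM coroutine intrinsics; it swaps in a switch-based state machine in plain C, with hypothetical names throughout. Each "thread" runs until it reaches the barrier, returns to the dispatcher, and is resumed only after every other thread in the block has reached the same point.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_THREADS 4

/* Saved state of one "thread": where to resume and which thread it is. */
struct thread_state {
    int resume_point;   /* 0 = start, 1 = after the barrier, 2 = done */
    unsigned tid;
};

/* Memory shared by all threads of one block. */
struct block_shared {
    int values[NUM_THREADS];
    int sums[NUM_THREADS];
};

/* Run one thread until it suspends at the barrier or finishes.
 * Returns true if the thread suspended, false if it is done. */
bool run_thread(struct thread_state *t, struct block_shared *sh)
{
    switch (t->resume_point) {
    case 0:
        /* Phase 1: each thread writes its own slot. */
        sh->values[t->tid] = (int)t->tid + 1;
        t->resume_point = 1;
        return true;            /* barrier: hand control back */
    case 1: {
        /* Phase 2: read every slot, which is safe only after the barrier. */
        int sum = 0;
        for (size_t i = 0; i < NUM_THREADS; i++)
            sum += sh->values[i];
        sh->sums[t->tid] = sum;
        t->resume_point = 2;
        return false;
    }
    default:
        return false;           /* already finished */
    }
}

/* Dispatcher: resume the threads in sequence and repeat for as long as
 * any of them suspended at a barrier during the previous pass. */
void run_block(struct block_shared *sh)
{
    struct thread_state threads[NUM_THREADS];
    bool any_suspended;

    for (unsigned i = 0; i < NUM_THREADS; i++) {
        threads[i].resume_point = 0;
        threads[i].tid = i;
    }

    do {
        any_suspended = false;
        for (unsigned i = 0; i < NUM_THREADS; i++) {
            if (run_thread(&threads[i], sh))
                any_suspended = true;
        }
    } while (any_suspended);
}
```

Running the threads to the barrier one after another is what lets a single CPU thread execute a whole block while still giving every shader thread a consistent view of shared memory after the barrier.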
4 years agollvmpipe: add compute threadpool + mutex
Dave Airlie [Tue, 27 Aug 2019 02:45:39 +0000 (12:45 +1000)]
llvmpipe: add compute threadpool + mutex

In order to efficiently run a number of compute blocks, use
a threadpool that just allows for jobs with unique sequential
ids to be dispatched.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add support for compute shared memory
Dave Airlie [Sun, 21 Jul 2019 22:29:42 +0000 (08:29 +1000)]
gallivm: add support for compute shared memory

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add new compute related intrinsics
Dave Airlie [Sun, 21 Jul 2019 22:27:27 +0000 (08:27 +1000)]
gallivm: add new compute related intrinsics

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe: reorganise jit pointer ordering
Dave Airlie [Wed, 26 Jun 2019 00:12:28 +0000 (10:12 +1000)]
llvmpipe: reorganise jit pointer ordering

In order to share the texture/image/sampler code with compute
shaders we need to move them to the front of the context,
as draw does for vs/gs sharing.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add coroutine pass manager support
Dave Airlie [Tue, 25 Jun 2019 21:37:20 +0000 (07:37 +1000)]
gallivm: add coroutine pass manager support

Coroutines require a proper pass manager, so add the passes
to the correct places.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add coroutine support files to gallivm.
Dave Airlie [Tue, 25 Jun 2019 21:36:40 +0000 (07:36 +1000)]
gallivm: add coroutine support files to gallivm.

These wrap the coroutine intrinsics and also add some higher
level wrappers around coroutine begin, end and suspend procedures

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm/flow: add counter reset for loops
Dave Airlie [Tue, 25 Jun 2019 21:35:36 +0000 (07:35 +1000)]
gallivm/flow: add counter reset for loops

This allows the counter value to be forced to a certain value

Reviewed-by: Roland Scheidegger <sroland@vmware.com>