mesa.git
8 years agonir/lower-tex: add srgb->linear lowering
Rob Clark [Tue, 19 Apr 2016 12:28:22 +0000 (08:28 -0400)]
nir/lower-tex: add srgb->linear lowering

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir/builder: const'ify swiz param
Rob Clark [Tue, 19 Apr 2016 19:44:25 +0000 (15:44 -0400)]
nir/builder: const'ify swiz param

No need for it not to be const, and lets caller declare it const if
desired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
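
To illustrate why a const swizzle parameter is convenient for callers, here is a minimal sketch. `apply_swizzle`, `reverse_vec4` and `swiz_wzyx` are hypothetical names, not the real nir_builder API; only the const qualifier on the swizzle parameter reflects the commit.

```c
#include <stddef.h>

/* Hypothetical helper, not the real nir_builder function: copies
 * num_comps components of src into dst, picking component swiz[i]
 * for output i.  Taking swiz as a pointer to const lets callers pass
 * tables that are themselves declared const. */
static void
apply_swizzle(const float *src, const unsigned *swiz,
              size_t num_comps, float *dst)
{
   for (size_t i = 0; i < num_comps; i++)
      dst[i] = src[swiz[i]];
}

/* The caller can keep its swizzle table in read-only storage. */
static const unsigned swiz_wzyx[4] = { 3, 2, 1, 0 };

static void
reverse_vec4(const float src[4], float dst[4])
{
   apply_swizzle(src, swiz_wzyx, 4, dst);
}
```

Without the const qualifier, passing `swiz_wzyx` would discard its qualifier and draw a compiler diagnostic.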
8 years agonir/lower-tex: make options a local var
Rob Clark [Tue, 19 Apr 2016 11:46:50 +0000 (07:46 -0400)]
nir/lower-tex: make options a local var

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: cleanup fd_set_sampler_views
Rob Clark [Tue, 19 Apr 2016 19:52:18 +0000 (15:52 -0400)]
freedreno: cleanup fd_set_sampler_views

The separate FS/VS entrypoints are no longer used since a3ed98f.  So
just inline them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agotgsi/lowering: improved lowering for LRP
Russell King [Wed, 13 Apr 2016 22:42:42 +0000 (18:42 -0400)]
tgsi/lowering: improved lowering for LRP

Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
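
The rearrangement described above can be sketched in scalar C. This assumes the usual LRP definition, dst = a*x + (1-a)*y; the function names and the exact operand order of the two MADs are illustrative and not taken from the lowering pass.

```c
/* Models a multiply-add instruction: a * b + c. */
static float
mad(float a, float b, float c)
{
   return a * b + c;
}

/* Literal form: two multiplies, a subtract and an add. */
static float
lrp_literal(float a, float x, float y)
{
   return a * x + (1.0f - a) * y;
}

/* Rearranged form: a*x + (1-a)*y == a*x + (y - a*y), which is two MADs. */
static float
lrp_two_mad(float a, float x, float y)
{
   float tmp = mad(-a, y, y);   /* tmp = y - a*y   */
   return mad(a, x, tmp);       /* dst = a*x + tmp */
}
```

Both forms agree mathematically; results may differ in the last bit for arbitrary inputs because the rounding steps are not the same.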
8 years agotgsi/lowering: improved lowering for XPD
Russell King [Wed, 13 Apr 2016 22:42:41 +0000 (18:42 -0400)]
tgsi/lowering: improved lowering for XPD

Improve XPD lowering to consume fewer instructions by using the
MAD instruction to perform the multiply and subtraction together.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
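
A scalar sketch of the idea: each component of the cross product is one MUL followed by one MAD that folds in the subtraction. The helper names and the index tables are illustrative assumptions, not code from the lowering pass.

```c
/* Models a multiply-add instruction: a * b + c. */
static float
mad(float a, float b, float c)
{
   return a * b + c;
}

/* dst = a x b, computed per component as
 *    tmp = a.zxy * b.yzx           (MUL)
 *    dst = a.yzx * b.zxy - tmp     (MAD with a negated addend)
 * dst must not alias a or b. */
static void
xpd_mul_mad(const float a[3], const float b[3], float dst[3])
{
   static const int yzx[3] = { 1, 2, 0 };
   static const int zxy[3] = { 2, 0, 1 };

   for (int i = 0; i < 3; i++) {
      float tmp = a[zxy[i]] * b[yzx[i]];
      dst[i] = mad(a[yzx[i]], b[zxy[i]], -tmp);
   }
}
```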
8 years agotgsi/lowering: add support for lowering TRUNC
Russell King [Wed, 13 Apr 2016 22:42:40 +0000 (18:42 -0400)]
tgsi/lowering: add support for lowering TRUNC

Add support for lowering TRUNC using the following sequence:

FRC tmpA, |src|
SUB tmpA, |src|, tmpA
CMP dst, -tmpA, tmpA

Note that this is incompatible with FRC lowering.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
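
The three-instruction sequence can be modelled in scalar C as follows. Two things here are assumptions of the sketch: `frc` is a stand-in for the hardware FRC instruction (valid only for finite values that fit in an int), and the CMP is taken to select on the sign of the original src, which the sequence above does not spell out.

```c
/* Stand-in for the GPU FRC instruction, x - floor(x).  The integer cast
 * is an assumption of this sketch and only works for finite values that
 * fit in an int. */
static float
frc(float x)
{
   float f = x - (float)(int)x;
   return f < 0.0f ? f + 1.0f : f;
}

/* CMP-style select: returns a if cond < 0, otherwise b. */
static float
cmp(float cond, float a, float b)
{
   return cond < 0.0f ? a : b;
}

static float
trunc_lowered(float src)
{
   float abs_src = src < 0.0f ? -src : src;
   float tmp = frc(abs_src);     /* FRC tmpA, |src|       */
   tmp = abs_src - tmp;          /* SUB tmpA, |src|, tmpA */
   return cmp(src, -tmp, tmp);   /* CMP dst, -tmpA, tmpA (condition: src, assumed) */
}
```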
8 years agotgsi/lowering: add support for lowering FLR and CEIL
Russell King [Wed, 13 Apr 2016 22:42:39 +0000 (18:42 -0400)]
tgsi/lowering: add support for lowering FLR and CEIL

Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL.  Since
these use FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.

We also need to deal with FLR instructions emitted by the lowering
code.  Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
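
A scalar sketch of the FRC/SUB and FRC/ADD forms. It assumes FRC computes x - floor(x); `frc` below is a stand-in for the hardware instruction (valid only for finite values that fit in an int), and the operand order is illustrative rather than copied from the pass.

```c
/* Stand-in for the GPU FRC instruction, x - floor(x).  Only valid for
 * finite values that fit in an int (assumption of this sketch). */
static float
frc(float x)
{
   float f = x - (float)(int)x;
   return f < 0.0f ? f + 1.0f : f;
}

/* FLR as FRC + SUB: floor(x) = x - frc(x). */
static float
flr_lowered(float x)
{
   return x - frc(x);
}

/* CEIL as FRC + ADD: ceil(x) = x + frc(-x). */
static float
ceil_lowered(float x)
{
   return x + frc(-x);
}
```

Because both forms depend on FRC, lowering FRC in the same pass would be circular, which is the configuration the assert rejects.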
8 years agoradeonsi: enable TGSI support cap for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 14:16:50 +0000 (15:16 +0100)]
radeonsi: enable TGSI support cap for compute shaders

v2: Use chip_class instead of family.

v3: Check kernel version for SI.

v4: Preemptively allow amdgpu winsys for SI.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Consider input SGPR count for compute shader SGPR count.
Bas Nieuwenhuizen [Tue, 19 Apr 2016 12:08:13 +0000 (14:08 +0200)]
radeonsi: Consider input SGPR count for compute shader SGPR count.

si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add CE synchronization for compute dispatches.
Bas Nieuwenhuizen [Tue, 19 Apr 2016 11:52:32 +0000 (13:52 +0200)]
radeonsi: Add CE synchronization for compute dispatches.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agomesa/st: enable compute shaders if images are also supported
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:39:54 +0000 (13:39 +0200)]
mesa/st: enable compute shaders if images are also supported

v2: Also depend on atomic counters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: clean up compute flush
Bas Nieuwenhuizen [Sat, 2 Apr 2016 09:37:06 +0000 (11:37 +0200)]
radeonsi: clean up compute flush

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: do not do two full flushes on every compute dispatch
Bas Nieuwenhuizen [Sun, 27 Mar 2016 09:14:34 +0000 (11:14 +0200)]
radeonsi: do not do two full flushes on every compute dispatch

v2: Add more CS_PARTIAL_FLUSH events.

Essentially every place that waits for pixel shaders to finish
also has a write-after-read hazard with compute shaders.

Invalidating L2 waits implicitly on pixel and compute shaders,
so, we don't need a CS_PARTIAL_FLUSH for switching FBO.

v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.

According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: split setting graphics and compute descriptors
Bas Nieuwenhuizen [Sat, 19 Mar 2016 12:56:29 +0000 (13:56 +0100)]
radeonsi: split setting graphics and compute descriptors

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: split texture decompression for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 17:41:20 +0000 (18:41 +0100)]
radeonsi: split texture decompression for compute shaders

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: update predicate condition for compute dispatches
Bas Nieuwenhuizen [Tue, 5 Apr 2016 15:38:38 +0000 (17:38 +0200)]
radeonsi: update predicate condition for compute dispatches

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement TGSI compute dispatch
Bas Nieuwenhuizen [Sat, 19 Mar 2016 14:15:20 +0000 (15:15 +0100)]
radeonsi: implement TGSI compute dispatch

v2: - Use radeon_set_sh_reg_seq.
    - Set predicate bit for conditional rendering.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: only emit compute shader state when switching shaders
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:19:42 +0000 (13:19 +0200)]
radeonsi: only emit compute shader state when switching shaders

v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: rework compute scratch buffer
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:04:18 +0000 (13:04 +0200)]
radeonsi: rework compute scratch buffer

Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: do per cs setup for compute shaders once per cs
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:35:36 +0000 (12:35 +0200)]
radeonsi: do per cs setup for compute shaders once per cs

Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: don't pass scratch buffer to user SGPRs
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:48:05 +0000 (12:48 +0200)]
radeonsi: don't pass scratch buffer to user SGPRs

As far as I can see we use relocations for clover too.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: split input upload off from si_launch_grid
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:04:15 +0000 (12:04 +0200)]
radeonsi: split input upload off from si_launch_grid

Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.

v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement TGSI compute shader creation
Bas Nieuwenhuizen [Sat, 19 Mar 2016 13:07:33 +0000 (14:07 +0100)]
radeonsi: implement TGSI compute shader creation

v2: Moved scratch_enabled initialization after compile.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: update shader count for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 12:54:55 +0000 (13:54 +0100)]
radeonsi: update shader count for compute shaders

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: set maximum work group size based on block size
Bas Nieuwenhuizen [Mon, 28 Mar 2016 01:01:56 +0000 (03:01 +0200)]
radeonsi: set maximum work group size based on block size

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: implement shared atomics
Bas Nieuwenhuizen [Tue, 29 Mar 2016 11:20:26 +0000 (13:20 +0200)]
radeonsi: implement shared atomics

v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement shared memory load/store
Bas Nieuwenhuizen [Tue, 29 Mar 2016 11:17:40 +0000 (13:17 +0200)]
radeonsi: implement shared memory load/store

v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: add shared memory
Bas Nieuwenhuizen [Tue, 29 Mar 2016 15:51:49 +0000 (17:51 +0200)]
radeonsi: add shared memory

Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: lower compute shader arguments
Bas Nieuwenhuizen [Thu, 17 Mar 2016 13:12:21 +0000 (14:12 +0100)]
radeonsi: lower compute shader arguments

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: Use CE for all descriptors.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 20:39:20 +0000 (21:39 +0100)]
radeonsi: Use CE for all descriptors.

v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/util: Add u_bit_scan_consecutive_range64.
Bas Nieuwenhuizen [Wed, 13 Apr 2016 21:30:55 +0000 (23:30 +0200)]
gallium/util: Add u_bit_scan_consecutive_range64.

For use by radeonsi.

v2: Make sure that it works for all 64 bits set.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Replace list_dirty with a mask.
Bas Nieuwenhuizen [Thu, 14 Apr 2016 23:00:41 +0000 (01:00 +0200)]
radeonsi: Replace list_dirty with a mask.

We can then upload only the dirty ones with the constant engine.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
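
A minimal sketch of how a dirty mask allows uploading only what changed, one run of consecutive dirty slots at a time. All names here (`desc_list`, `scan_range64`, `upload_dirty`) are hypothetical; `scan_range64` is a portable stand-in for u_bit_scan_consecutive_range64, and the counters replace the actual constant engine writes.

```c
#include <stdint.h>

/* Index of the lowest set bit; mask must be non-zero. */
static int
lowest_bit64(uint64_t mask)
{
   int i = 0;
   while (!(mask & 1)) {
      mask >>= 1;
      i++;
   }
   return i;
}

/* Pops the lowest run of consecutive set bits from *mask and reports it
 * as [start, start + count).  *mask must be non-zero. */
static void
scan_range64(uint64_t *mask, int *start, int *count)
{
   if (*mask == UINT64_MAX) {   /* all 64 bits set needs its own path */
      *start = 0;
      *count = 64;
      *mask = 0;
      return;
   }
   *start = lowest_bit64(*mask);
   *count = lowest_bit64(~(*mask >> *start));
   *mask &= ~(((UINT64_C(1) << *count) - 1) << *start);
}

struct desc_list {
   uint64_t dirty_mask;   /* one bit per descriptor slot */
   int uploads;           /* number of upload commands issued */
   int slots_uploaded;    /* total slots covered by those commands */
};

static void
upload_dirty(struct desc_list *list)
{
   while (list->dirty_mask) {
      int start, count;
      scan_range64(&list->dirty_mask, &start, &count);
      /* A real driver would emit one write for slots [start, start + count). */
      list->uploads++;
      list->slots_uploaded += count;
   }
}
```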
8 years agoradeonsi: Add CE uploader.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 20:23:49 +0000 (21:23 +0100)]
radeonsi: Add CE uploader.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Allocate chunks of CE ram.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 20:19:37 +0000 (21:19 +0100)]
radeonsi: Allocate chunks of CE ram.

v2: Use 32 byte alignment.

v3: Don't allocate CE space for vertex buffer descriptors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
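
For illustration only, chunk allocation with 32 byte alignment could look like the bump allocator below. The struct, the function names and the idea of a simple bump pointer are assumptions of this sketch; only the 32 byte alignment comes from the commit message.

```c
#include <stdbool.h>

#define CE_ALIGNMENT 32u   /* from the v2 note above */

/* Hypothetical bookkeeping for a fixed-size block of CE RAM. */
struct ce_ram {
   unsigned size;          /* total bytes available */
   unsigned next_offset;   /* first byte not yet handed out */
};

/* Rounds value up to a multiple of alignment (a power of two). */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hands out an aligned chunk; returns false when it does not fit. */
static bool
ce_ram_alloc(struct ce_ram *ram, unsigned size, unsigned *offset)
{
   unsigned start = align_up(ram->next_offset, CE_ALIGNMENT);

   if (start > ram->size || size > ram->size - start)
      return false;

   *offset = start;
   ram->next_offset = start + size;
   return true;
}
```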
8 years agoradeonsi: Add CE synchronization.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 20:01:39 +0000 (21:01 +0100)]
radeonsi: Add CE synchronization.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add CE packet definitions.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 19:59:16 +0000 (20:59 +0100)]
radeonsi: Add CE packet definitions.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Create CE IB.
Bas Nieuwenhuizen [Wed, 13 Apr 2016 20:31:17 +0000 (22:31 +0200)]
radeonsi: Create CE IB.

Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a preamble.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: Enlarge const IB size.
Bas Nieuwenhuizen [Thu, 14 Apr 2016 00:11:07 +0000 (02:11 +0200)]
winsys/amdgpu: Enlarge const IB size.

Necessary to prevent performance regressions due to extra flushing.

Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.

v2: Add preamble IB.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: Add support for const IB.
Marek Olšák [Sat, 8 Aug 2015 12:02:02 +0000 (14:02 +0200)]
winsys/amdgpu: Add support for const IB.

v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/amdgpu: split IB data into a new structure in preparation for CE
Marek Olšák [Sat, 8 Aug 2015 11:27:38 +0000 (13:27 +0200)]
winsys/amdgpu: split IB data into a new structure in preparation for CE

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: move ring_type into winsyses
Marek Olšák [Sat, 8 Aug 2015 12:12:10 +0000 (14:12 +0200)]
gallium/radeon: move ring_type into winsyses

Not used by drivers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agollvmpipe: Call LLVMShutdown before exiting.
Jose Fonseca [Tue, 19 Apr 2016 11:08:01 +0000 (12:08 +0100)]
llvmpipe: Call LLVMShutdown before exiting.

So that LLVM frees its globals.

Trivial.

8 years agollvmpipe: Avoid LLVMGetGlobalContext in tests.
Jose Fonseca [Tue, 19 Apr 2016 11:07:16 +0000 (12:07 +0100)]
llvmpipe: Avoid LLVMGetGlobalContext in tests.

Trivial.

8 years agollvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.
Jose Fonseca [Fri, 15 Apr 2016 10:02:06 +0000 (11:02 +0100)]
llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.

64-bit MSVCRT's exp2f(-inf) returns -inf instead of 0.  Tested with
MSVC 2013's CRT.  (I haven't tried 2015 yet.)

Also this does not happen with MinGW.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agollvmpipe: Test more vector lengths.
Jose Fonseca [Fri, 15 Apr 2016 13:41:19 +0000 (14:41 +0100)]
llvmpipe: Test more vector lengths.

All powers of two up to the native vector length.

There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest.  Fixing is left to the future, but the test is now
able to expect it to fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Avoid llvm::sys::getProcessTriple().
Jose Fonseca [Fri, 15 Apr 2016 11:05:09 +0000 (12:05 +0100)]
gallivm: Avoid llvm::sys::getProcessTriple().

Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Remove lp_get_module_id.
Jose Fonseca [Fri, 15 Apr 2016 10:27:02 +0000 (11:27 +0100)]
gallivm: Remove lp_get_module_id.

Just keep a copy of the module_name in gallivm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Fix MCJIT with LLVM 3.3.
Jose Fonseca [Thu, 14 Apr 2016 15:47:14 +0000 (16:47 +0100)]
gallivm: Fix MCJIT with LLVM 3.3.

One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.

This regressed in commits 065256df/75ad4fe7 when trying to make the
code build with LLVM 3.6.

Tested MCJIT with LLVM 3.3 to 3.6.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Make MCJIT a runtime option.
Jose Fonseca [Thu, 14 Apr 2016 12:42:37 +0000 (13:42 +0100)]
gallivm: Make MCJIT a runtime option.

On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.

The new option is GALLIVM_MCJIT.

Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes a
segfault, both on Linux and Windows.  I'm almost certain this used to
work, so there probably is a regression somewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoscons: Show the unit test full path.
Jose Fonseca [Thu, 14 Apr 2016 12:41:33 +0000 (13:41 +0100)]
scons: Show the unit test full path.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Use LLVMSetTarget.
Jose Fonseca [Thu, 14 Apr 2016 11:32:32 +0000 (12:32 +0100)]
gallivm: Use LLVMSetTarget.

Instead of LLVM C++ interfaces.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallivm: Use LLVMPrintValueToString where available.
Jose Fonseca [Thu, 14 Apr 2016 10:13:55 +0000 (11:13 +0100)]
gallivm: Use LLVMPrintValueToString where available.

And llvm::raw_string_ostream where not (LLVM 3.3).

Thereby eliminating yet another dependency on unstable LLVM interfaces.

As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)

Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallium/tests: Update UTIL_FORMAT_MAX_* defines.
Jose Fonseca [Fri, 15 Apr 2016 14:02:02 +0000 (15:02 +0100)]
gallium/tests: Update UTIL_FORMAT_MAX_* defines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoRevert "nv50/ra: `isinf()` is in namespace `std` since C++11."
Jose Fonseca [Tue, 19 Apr 2016 10:22:45 +0000 (11:22 +0100)]
Revert "nv50/ra: `isinf()` is in namespace `std` since C++11."

This reverts commit f525db6358fbaa7b4296d2e6484e0b1ae703ac78.

It was superseded by commit 649704f1f7c9e1d0990d34a76154b2eb656bee42.

8 years agovc4: Fix fbo-generatemipmap-formats for NPOT.
Eric Anholt [Mon, 18 Apr 2016 21:03:39 +0000 (14:03 -0700)]
vc4: Fix fbo-generatemipmap-formats for NPOT.

Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but
we only get one value to control the stride of the src and dst for single
sampled buffers.  A RCL tile blit from level != 1 to level == 0 would
therefore load from the wrong stride.

8 years agovc4: Remove unused "immediates" field
Eric Anholt [Tue, 19 Jan 2016 22:18:21 +0000 (14:18 -0800)]
vc4: Remove unused "immediates" field

This was for TGSI, which we no longer have to deal with.

8 years agoi965: Define miptree map functions static (trivial)
Ben Widawsky [Tue, 2 Feb 2016 22:51:09 +0000 (14:51 -0800)]
i965: Define miptree map functions static (trivial)

They were already declared as such. It was changed here:
commit 31f0967fb50101437d2568e9ab9640ffbcbf7ef9
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 2 14:43:18 2015 -0700

    i965: Make intel_miptree_map_raw static

Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
8 years agoglsl: Properly handle ldexp(0.0f, non-zero-exp).
Matt Turner [Wed, 13 Apr 2016 01:24:06 +0000 (18:24 -0700)]
glsl: Properly handle ldexp(0.0f, non-zero-exp).

8 years agogallivm: convert size query to using a set of parameters.
Dave Airlie [Mon, 18 Apr 2016 01:45:12 +0000 (11:45 +1000)]
gallivm: convert size query to using a set of parameters.

This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.

[airlied: use some local variables (Roland)]

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoswr: dereference cbuf/zbuf/views on context destroy
Tim Rowley [Mon, 18 Apr 2016 20:09:17 +0000 (15:09 -0500)]
swr: dereference cbuf/zbuf/views on context destroy

Fixes resource memory leaks.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agofreedreno/ir3: fix grouping issue w/ reverse swizzles
Rob Clark [Mon, 18 Apr 2016 15:32:40 +0000 (11:32 -0400)]
freedreno/ir3: fix grouping issue w/ reverse swizzles

When we have something like:

   MOV OUT[n], IN[m].wzyx

the existing grouping code was missing a potential conflict.  Due to
input needing to be sequential scalar regs, we have:

 IN:  x <-> y <-> z <-> w

which would be grouped to:

 OUT: w <-> z2 <-> y2 <-> x  (where the 2 denotes a copy/mov)

but that can't actually work.  We need to realize that x and w are
already in the same chain, not just that they aren't both already in
the new chain being built.

With this fixed, we probably no longer need the hack from f68f6c0.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoradeonsi: use enums in si_shader.h
Marek Olšák [Sat, 16 Apr 2016 12:30:46 +0000 (14:30 +0200)]
radeonsi: use enums in si_shader.h

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/radeon: use enums in r600_query.h
Marek Olšák [Sat, 16 Apr 2016 11:35:08 +0000 (13:35 +0200)]
gallium/radeon: use enums in r600_query.h

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: always use PFP_SYNC_ME when doing flushes and waits
Marek Olšák [Sun, 17 Apr 2016 15:28:25 +0000 (17:28 +0200)]
radeonsi: always use PFP_SYNC_ME when doing flushes and waits

This is typically used by the closed driver before SURFACE_SYNC.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too
Marek Olšák [Sun, 17 Apr 2016 14:18:54 +0000 (16:18 +0200)]
radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: add safety assertions for meta cache flushes
Marek Olšák [Sun, 17 Apr 2016 14:14:32 +0000 (16:14 +0200)]
radeonsi: add safety assertions for meta cache flushes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: don't use ACQUIRE_MEM on the graphics ring
Marek Olšák [Sun, 17 Apr 2016 13:52:55 +0000 (15:52 +0200)]
radeonsi: don't use ACQUIRE_MEM on the graphics ring

It's only required on the compute ring. This matches the closed driver.

The compute flag is removed to prevent confusion and Bas's compute shader
patches remove it in the whole function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: remove TODO and correct a comment in si_emit_cache_flush
Marek Olšák [Sun, 17 Apr 2016 13:34:24 +0000 (15:34 +0200)]
radeonsi: remove TODO and correct a comment in si_emit_cache_flush

Yes, that flag is really needed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: don't flush CB/DB caches for performance counters
Marek Olšák [Sun, 17 Apr 2016 13:18:31 +0000 (15:18 +0200)]
radeonsi: don't flush CB/DB caches for performance counters

I'm not sure about this. This will make the engines go idle, but the caches
will be unflushed. This should match app behavior without performance
counters, which can be a good thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/radeon: don't flush CB/DB caches for timestamp queries
Marek Olšák [Sun, 17 Apr 2016 13:17:31 +0000 (15:17 +0200)]
gallium/radeon: don't flush CB/DB caches for timestamp queries

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/util: fix undefined shift to the last bit in u_bit_scan
Marek Olšák [Sat, 16 Apr 2016 00:09:55 +0000 (02:09 +0200)]
gallium/util: fix undefined shift to the last bit in u_bit_scan

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/util: fix u_bit_scan_consecutive_range for mask == 0xffffffff
Marek Olšák [Fri, 15 Apr 2016 20:08:57 +0000 (22:08 +0200)]
gallium/util: fix u_bit_scan_consecutive_range for mask == 0xffffffff

The second ffs returns 0, yielding count == -1.

v2: change 1 to 1u

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
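
The failure is easy to reproduce in isolation. In the sketch below, `ffs32` mirrors the contract of POSIX ffs() (0 when no bit is set), `run_length_naive` is the unguarded count computation, and `scan_range32` shows one way to guard the fully-set mask. The function names are illustrative, and the special case is an assumption about the shape of the fix, not a copy of it.

```c
#include <stdint.h>

/* Same contract as POSIX ffs(): 1-based index of the lowest set bit,
 * or 0 when no bit is set. */
static int
ffs32(uint32_t v)
{
   int i = 1;
   if (!v)
      return 0;
   while (!(v & 1)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* Length of the lowest run of set bits, without any special case.
 * mask must be non-zero. */
static int
run_length_naive(uint32_t mask)
{
   int start = ffs32(mask) - 1;
   /* For mask == 0xffffffff the complement is 0, so ffs32 returns 0. */
   return ffs32(~(mask >> start)) - 1;
}

/* Pops the lowest run of set bits from *mask; *mask must be non-zero. */
static void
scan_range32(uint32_t *mask, int *start, int *count)
{
   if (*mask == 0xffffffffu) {
      *start = 0;
      *count = 32;
      *mask = 0;
      return;
   }
   *start = ffs32(*mask) - 1;
   *count = ffs32(~(*mask >> *start)) - 1;
   /* 1u rather than 1 keeps the shift in unsigned arithmetic. */
   *mask &= ~(((1u << *count) - 1) << *start);
}
```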
8 years agogallium/radeon: fix Nine with its slightly shifted viewports
Marek Olšák [Fri, 15 Apr 2016 22:29:05 +0000 (00:29 +0200)]
gallium/radeon: fix Nine with its slightly shifted viewports

We just need to do the calculation in floating-point and then round things
properly.

Reviewed-by: Axel Davy <axel.davy@ens.fr>
8 years agodocs: correct name for GL_OES_primitive_bounding_box
Erik Faye-Lund [Mon, 18 Apr 2016 15:26:33 +0000 (17:26 +0200)]
docs: correct name for GL_OES_primitive_bounding_box

When this extension was added, an underscore was mistakenly replaced
by a space. Let's correct this, so it's a tad easier to grep for this
extension.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
8 years agometa: Don't botch color masks when changing drawbuffers.
Kenneth Graunke [Tue, 12 Apr 2016 17:19:09 +0000 (10:19 -0700)]
meta: Don't botch color masks when changing drawbuffers.

Color clears should respect each drawbuffer's color mask state.

Previously, we tried to leave the color mask untouched.  However,
_mesa_meta_drawbuffers_from_bitfield() ended up rebinding all the
color drawbuffers in a different order, so we ended up pairing
drawbuffers with the wrong color mask state.

The new _mesa_meta_drawbuffers_and_colormask() function does the
same job as the old _mesa_meta_drawbuffers_from_bitfield(), but
also rearranges the color mask state to match the new drawbuffer
configuration.

This code was largely ripped off from Gallium's st_Clear code.

This fixes ES31-CTS.draw_buffers_indexed.color_masks, which binds
up to 8 drawbuffers, sets color masks for each, and then calls
glClearBufferfv to clear each buffer individually.  ClearBuffer
causes us to rebind only one drawbuffer, at which point we used
ctx->Color.ColorMask[0] (draw buffer 0's state) for everything.

We could probably delete _mesa_meta_drawbuffers_from_bitfield(),
but I'd rather not think about the i965 fast clear code.  Topi is
rewriting a bunch of that soon anyway, so let's delete it then.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94847
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
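
The pairing problem can be shown with a simplified model. Everything below is hypothetical (`draw_state`, `select_drawbuffers`, the attachment bitmask); it is not the code of _mesa_meta_drawbuffers_and_colormask(), only a sketch of moving each color mask together with the drawbuffer it belongs to.

```c
#include <string.h>

#define MAX_DRAW_BUFFERS 8

/* Hypothetical, simplified draw state: slot i renders to attachment
 * buffer_index[i] using color_mask[i]. */
struct draw_state {
   int num_bufs;
   int buffer_index[MAX_DRAW_BUFFERS];
   unsigned char color_mask[MAX_DRAW_BUFFERS][4];
};

/* Keeps only the slots whose attachment bit is set in keep and compacts
 * them to the front.  The color mask moves with its slot, so slot 0
 * never inherits another buffer's mask. */
static void
select_drawbuffers(struct draw_state *st, unsigned keep)
{
   int n = 0;

   for (int i = 0; i < st->num_bufs; i++) {
      int b = st->buffer_index[i];

      if (b < 0 || !(keep & (1u << b)))
         continue;

      st->buffer_index[n] = b;
      memmove(st->color_mask[n], st->color_mask[i],
              sizeof st->color_mask[n]);
      n++;
   }
   st->num_bufs = n;
}
```

Clearing only attachment 2 then uses attachment 2's mask rather than whatever was stored for draw buffer 0.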
8 years agometa: Don't smash ColorMask when using MESA_META_COLOR_MASK save bit.
Kenneth Graunke [Tue, 12 Apr 2016 19:09:41 +0000 (12:09 -0700)]
meta: Don't smash ColorMask when using MESA_META_COLOR_MASK save bit.

This allows meta operations to inspect the existing color mask, and
then do their own smashing.

BlitFramebuffer and Clear already override the color mask, so this
was also redundant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agovc4: Add support for rendering to cube map surfaces.
Eric Anholt [Fri, 15 Apr 2016 22:07:49 +0000 (15:07 -0700)]
vc4: Add support for rendering to cube map surfaces.

We need to fix up the offset to point at the face of the cube.  Fixes
piglit fbo-cubemap, copyteximage CUBE, and glean's fbo test.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
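
As a rough sketch of the offset fix-up, assuming each cube face occupies a fixed-size block and mip levels are laid out at known offsets inside a face. The struct layout and all names are hypothetical, not the vc4 resource structures.

```c
#include <stdint.h>

#define MAX_LEVELS 12

/* Hypothetical layout description for a cube map resource. */
struct tex_layout {
   uint32_t cube_map_stride;            /* bytes from one face to the next */
   uint32_t level_offset[MAX_LEVELS];   /* offset of each level within a face */
};

/* Byte offset of the surface for a given mip level and cube face. */
static uint32_t
surface_offset(const struct tex_layout *tex, unsigned level, unsigned face)
{
   return tex->level_offset[level] + face * tex->cube_map_stride;
}
```

Without the face term, every face would render to the memory of face 0.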
8 years agovc4: Don't flush on read-only access of buffers read by the CL.
Eric Anholt [Fri, 15 Apr 2016 21:27:34 +0000 (14:27 -0700)]
vc4: Don't flush on read-only access of buffers read by the CL.

Fixes piglit mixed-immediate-and-vbo, and may significantly improve
performance of applications that store a 4-byte IB in the same VBO as
vertex data.

8 years agovc4: Sanity check that flushes don't happen between state emit and draw.
Eric Anholt [Fri, 15 Apr 2016 20:43:14 +0000 (13:43 -0700)]
vc4: Sanity check that flushes don't happen between state emit and draw.

Catches the cause of failure in
arb_vertex_buffer_object-mixed-immediate-and-vbo.  I've had this class of
failure before, and it probably won't be the last time.

8 years agovc4: Sanity check strides for imported BOs.
Eric Anholt [Fri, 15 Apr 2016 20:17:26 +0000 (13:17 -0700)]
vc4: Sanity check strides for imported BOs.

If we're going to sample from or render to them at some particular size,
we'd better make sure that they actually are that size.  Causes some tests
under simulation to generate appropriate error messages instead of
failures.

8 years agomath: Import isinf and others to global namespace
Pierre Moreau [Thu, 14 Apr 2016 18:43:00 +0000 (20:43 +0200)]
math: Import isinf and others to global namespace

Starting from C++11, several math functions, like isinf, moved into the std
namespace. Since cmath undefines those functions before redefining them inside
the namespace, and glibc 2.23 defines the C variants as macros, the C variants
in global namespace are not accessible any longer.

v2: Move the fix outside of Nouveau, as suggested by Jose Fonseca, since anyone
    might need it when GCC switches to C++14 by default with GCC 6.0.

v3:
*   Put the code directly inside c99_math.h rather than creating a new header
    file, as asked by Jose Fonseca;
*   Guard the code behind glibc version checks, as only glibc >= 2.23 defines
    isinf & co. as functions, as suggested by Jose Fonseca.
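
A minimal sketch of the approach described above: pull the std:: variants
back into the global namespace so that unqualified C-style calls keep
compiling.  The guard condition and the helper is_inf_or_nan are
assumptions made for illustration; the exact check in c99_math.h may
differ, and the sketch assumes a glibc-based toolchain.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumption: this guard mirrors the glibc version check mentioned above;
// the exact condition used in c99_math.h may differ.
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 23))
using std::isinf;
using std::isnan;
#endif

// Hypothetical caller that uses the unqualified C-style names.
inline bool is_inf_or_nan(float x)
{
    return isinf(x) || isnan(x);
}

// Usage check: infinities and NaNs are flagged, ordinary values are not.
inline void check_is_inf_or_nan()
{
    assert(is_inf_or_nan(std::numeric_limits<float>::infinity()));
    assert(is_inf_or_nan(std::numeric_limits<float>::quiet_NaN()));
    assert(!is_inf_or_nan(1.0f));
}
```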

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
8 years agor600g: Move R600_BIG_ENDIAN to r600_pipe_common.h
Oded Gabbay [Sun, 6 Mar 2016 15:58:59 +0000 (17:58 +0200)]
r600g: Move R600_BIG_ENDIAN to r600_pipe_common.h

I need to do this so that I can use R600_BIG_ENDIAN in files which include
r600_pipe_common.h but not r600_pipe.h.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agor600g: fix code indentation
Oded Gabbay [Mon, 7 Mar 2016 13:27:26 +0000 (15:27 +0200)]
r600g: fix code indentation

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agodocs: add news item and link release notes for 11.1.3
Emil Velikov [Sun, 17 Apr 2016 22:24:41 +0000 (23:24 +0100)]
docs: add news item and link release notes for 11.1.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
8 years agodocs: add sha256 checksums for 11.1.3
Emil Velikov [Sun, 17 Apr 2016 22:18:04 +0000 (23:18 +0100)]
docs: add sha256 checksums for 11.1.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 596c6504b3dcab318dc93ec42517c9a0fde1b255)

8 years agodocs: add release notes for 11.1.3
Emil Velikov [Sun, 17 Apr 2016 17:43:30 +0000 (18:43 +0100)]
docs: add release notes for 11.1.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ca2fbf6f8fe5e1853064c81fd3334a8172d65689)

8 years agogallivm: don't use vector selects with llvm 3.7
Roland Scheidegger [Sat, 16 Apr 2016 21:26:46 +0000 (23:26 +0200)]
gallivm: don't use vector selects with llvm 3.7

llvm 3.7 sometimes simply miscompiles vector selects.
See https://bugs.freedesktop.org/show_bug.cgi?id=94972

This was fixed in llvm r249669
(https://llvm.org/bugs/show_bug.cgi?id=24532).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
8 years agonir: only dereference undef after NULL check. (v2)
Dave Airlie [Sun, 17 Apr 2016 20:56:06 +0000 (06:56 +1000)]
nir: only dereference undef after NULL check. (v2)

Pointed out by coverity.

v2: nuke line, Jason pointed out the constructor does it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agodocs: update the sha256 checksums for 11.2.1
Emil Velikov [Sun, 17 Apr 2016 18:29:49 +0000 (19:29 +0100)]
docs: update the sha256 checksums for 11.2.1

Turns out the previous tarballs got corrupted during upload which I
carelessly forgot to check prior to deleting the local ones.
Lesson learned - double check before removing the local ones.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 79b0e13913b5189bb8629e80439fea746f99fe79)

8 years agodocs: add news item and link release notes for 11.2.1
Emil Velikov [Sun, 17 Apr 2016 17:35:21 +0000 (18:35 +0100)]
docs: add news item and link release notes for 11.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
8 years agodocs: add sha256 checksums for 11.2.1
Emil Velikov [Sun, 17 Apr 2016 17:32:11 +0000 (18:32 +0100)]
docs: add sha256 checksums for 11.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c65835d81230fbdc1544600c0a24f90647a4e75a)

8 years agodocs: add release notes for 11.2.1
Emil Velikov [Sun, 17 Apr 2016 15:03:34 +0000 (16:03 +0100)]
docs: add release notes for 11.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 21e6440e82808364a6c2cc38ea92651c99b69aad)

8 years agoi965/fs: Don't allow OOB array access of images
Jason Ekstrand [Fri, 15 Apr 2016 01:36:05 +0000 (18:36 -0700)]
i965/fs: Don't allow OOB array access of images

We have had a guard against OOB array access of images on IVB for a long
time, but it can actually cause hangs on any GPU generation.  This can
happen due to getting an untyped SURFACE_STATE for a typed message.  We
didn't use to hit this with the piglit test on anything other than IVB
because the OOB in the test would cause us to go past the top of the pull
constant UBO and we would get a surface index of 0 which was always a
valid surface.  Now that we're pushing small arrays, we can end up grabbing
garbage from the GRF and going to some random index which causes a hang.
The solution is to just do the bounds check on all hardware.
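
The fix amounts to checking the index against the array length before
using it, on every hardware generation.  A CPU-side sketch of that idea
follows; lookup_surface, kNoSurface and the table layout are hypothetical
and only illustrate the check, not the code the i965 backend emits.

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "skip the image access".
constexpr std::uint32_t kNoSurface = 0xffffffffu;

// Return the surface index for `index`, or kNoSurface when the index is
// out of bounds, so the caller never reads past the end of the table.
constexpr std::uint32_t lookup_surface(const std::uint32_t *table,
                                       std::uint32_t length,
                                       std::uint32_t index)
{
    return index < length ? table[index] : kNoSurface;
}

constexpr std::uint32_t kExampleTable[] = {7, 8, 9};
static_assert(lookup_surface(kExampleTable, 3, 1) == 8, "in bounds");
static_assert(lookup_surface(kExampleTable, 3, 3) == kNoSurface,
              "out of bounds");
```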

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94944
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
8 years agoanv/device: Images are only enabled in scalar stages
Jason Ekstrand [Fri, 15 Apr 2016 23:39:17 +0000 (16:39 -0700)]
anv/device: Images are only enabled in scalar stages

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agogallium/radeon: handle vertex shaders that disable clipping & viewport
Marek Olšák [Wed, 13 Apr 2016 15:28:30 +0000 (17:28 +0200)]
gallium/radeon: handle vertex shaders that disable clipping & viewport

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agomesa/texstore: Use Driver.CompressedTexSubImage in the default CompressedTexImage
Nanley Chery [Tue, 12 Apr 2016 21:27:42 +0000 (14:27 -0700)]
mesa/texstore: Use Driver.CompressedTexSubImage in the default CompressedTexImage

Enable drivers to use their own implementation of this method instead of
the mesa default. Since the drivers that currently override
dd_function_table::CompressedTexSubImage also override
::CompressedTexImage, there should be no behavioral change.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoanv: Advertise vertexPipelineStoresAndAtomics based on scalar stages
Jason Ekstrand [Fri, 15 Apr 2016 21:53:16 +0000 (14:53 -0700)]
anv: Advertise vertexPipelineStoresAndAtomics based on scalar stages

Previously, we just looked at the hardware generation but this meant that
if you did INTEL_DEBUG=vec4 on BDW or SKL, you would have advertised but
non-working features.

8 years agoi965/vec4: Support full std140 layout for push constants
Jason Ekstrand [Tue, 5 Apr 2016 22:55:35 +0000 (15:55 -0700)]
i965/vec4: Support full std140 layout for push constants

Up until now, we have been able to assume that all push constants are
vec4-aligned because this is what the GL driver gives us.  In Vulkan, we
need to be able to support full std140 because we get the layout from the
client.
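
To show why vec4 alignment can no longer be assumed, the sketch below
applies the std140 base alignments for float scalars and vectors to a
small block.  The helper names are hypothetical, and arrays, matrices and
nested structs (which have additional std140 rules) are deliberately left
out.

```cpp
#include <cstdint>

// std140 base alignment in bytes for a float scalar or vector with
// `components` components (1..4): float 4, vec2 8, vec3 and vec4 16.
constexpr std::uint32_t std140_alignment(std::uint32_t components)
{
    return components == 1 ? 4u : components == 2 ? 8u : 16u;
}

// Round `offset` up to the member's std140 alignment.
constexpr std::uint32_t std140_offset(std::uint32_t offset,
                                      std::uint32_t components)
{
    return (offset + std140_alignment(components) - 1) /
           std140_alignment(components) * std140_alignment(components);
}

// Layout of { float a; vec2 b; vec3 c; float d; }:
static_assert(std140_offset(0, 1) == 0, "a");    // a occupies bytes 0..3
static_assert(std140_offset(4, 2) == 8, "b");    // b occupies bytes 8..15
static_assert(std140_offset(16, 3) == 16, "c");  // c occupies bytes 16..27
static_assert(std140_offset(28, 1) == 28, "d");  // d fills c's unused slot
```

Here b starts at byte 8 and d at byte 28, neither of which is a multiple
of 16, so a backend that assumes vec4-aligned push constants would read
the wrong components.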

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/vec4: Handle MOV_INDIRECT in pack_uniform_registers
Jason Ekstrand [Tue, 5 Apr 2016 22:43:48 +0000 (15:43 -0700)]
i965/vec4: Handle MOV_INDIRECT in pack_uniform_registers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>