mesa.git
5 years agointel/fs: make scan/reduce work with SIMD32 when it fits 2 registers
Paulo Zanoni [Fri, 9 Aug 2019 22:40:33 +0000 (15:40 -0700)]
intel/fs: make scan/reduce work with SIMD32 when it fits 2 registers

When dealing with uint16_t and uint8_t on SIMD32 we can do all the
operations using just 2 registers, so we don't hit the recursion at
the beginning of emit_scan(). Because of that, we need to actually
compute scan/reduce for channels 31:16.

v2: Still missed instructions (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
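
As a rough illustration of why channels 31:16 need their own step, here is a minimal sketch of an inclusive add-scan over 32 lanes of uint16_t. It is not the emit_scan() code; the function name and the plain-array model of the channels are assumptions made for this example. The final pass (stride 16) is the one that carries results from channels 15:0 into channels 31:16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 32

/* Inclusive add-scan with doubling strides (1, 2, 4, 8, 16).
 * Each pass reads a snapshot of the previous pass, mimicking how all
 * channels of a SIMD instruction read their sources before any write. */
static void inclusive_scan_add_u16(uint16_t v[LANES])
{
    assert(v != NULL);
    for (unsigned stride = 1; stride < LANES; stride *= 2) {
        uint16_t prev[LANES];
        memcpy(prev, v, sizeof(prev));
        for (unsigned i = stride; i < LANES; i++)
            v[i] = (uint16_t)(prev[i] + prev[i - stride]);
    }
}
```

Dropping the stride-16 pass would leave channels 31:16 holding only the scan of their own half.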
5 years agofreedreno/regs: A couple of tess updates
Kristian H. Kristensen [Wed, 18 Sep 2019 20:09:50 +0000 (13:09 -0700)]
freedreno/regs: A couple of tess updates

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/regs: Fix CP_DRAW_INDX_OFFSET command
Kristian H. Kristensen [Wed, 18 Sep 2019 20:08:55 +0000 (13:08 -0700)]
freedreno/regs: Fix CP_DRAW_INDX_OFFSET command

On A5xx+ the INDX_BASE pointer is 64 bit.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Write multiple regs for SP_VS_OUT_REG and SP_VS_VPC_DST_REG
Kristian H. Kristensen [Mon, 10 Jun 2019 19:04:21 +0000 (12:04 -0700)]
freedreno/a6xx: Write multiple regs for SP_VS_OUT_REG and SP_VS_VPC_DST_REG

Compute the number of writes up front.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Turn on vectorize_io
Kristian H. Kristensen [Fri, 12 Jul 2019 19:36:45 +0000 (12:36 -0700)]
freedreno/a6xx: Turn on vectorize_io

We want this for tessellation eventually, but we can turn it on now.

Shader-db results:

total instructions in shared programs: 8612905 -> 8611387 (-0.02%)
instructions in affected programs: 164952 -> 163434 (-0.92%)

total dwords in shared programs: 11952000 -> 11950560 (-0.01%)
dwords in affected programs: 68096 -> 66656 (-2.11%)

total full in shared programs: 315019 -> 315009 (<.01%)
full in affected programs: 1642 -> 1632 (-0.61%)

total constlen in shared programs: 2463654 -> 2463654 (0.00%)
constlen in affected programs: 0 -> 0

total (ss) in shared programs: 152379 -> 152409 (0.02%)
(ss) in affected programs: 1503 -> 1533 (2.00%)

total (sy) in shared programs: 96473 -> 96525 (0.05%)
(sy) in affected programs: 654 -> 706 (7.95%)

total max_sun in shared programs: 1172454 -> 1172472 (<.01%)
max_sun in affected programs: 104 -> 122 (17.31%)

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Share shader state constructor and destructor
Kristian H. Kristensen [Mon, 10 Jun 2019 19:04:21 +0000 (12:04 -0700)]
freedreno/a6xx: Share shader state constructor and destructor

Also, swap the vs and fs constructors so fs comes first.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/a6xx: Track location of gl_Position out as we link it
Kristian H. Kristensen [Fri, 13 Sep 2019 22:20:05 +0000 (15:20 -0700)]
freedreno/a6xx: Track location of gl_Position out as we link it

When using xfb and rasterizing, the fragment shader may have fewer
inputs than the vertex shader outputs. We can't rely on gl_Position to
be placed at fs->total_in, but have to instead remember where we add
it in the link map and use that location.

Fixes 100+ tessellation dEQPs under

  dEQP-GLES31.functional.tessellation.primitive_discard.*
  dEQP-GLES31.functional.tessellation.user_defined_io.*

Reviewed-by: Eric Anholt <eric@anholt.net>
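
The idea of remembering where gl_Position lands can be sketched as follows. This is a simplified model, not the ir3 link map: the struct, the slot numbers, and the order in which entries are added are all assumptions for the example. The point is only that the position's location is recorded when it is added, instead of being derived from the fragment shader's input count.

```c
#include <assert.h>

#define MAX_LINKED 32
#define SLOT_POS 0 /* hypothetical slot id for gl_Position */

struct linkage {
    int slot[MAX_LINKED];
    int count;
    int pos_loc; /* location of gl_Position, -1 if not linked yet */
};

static void linkage_init(struct linkage *l)
{
    l->count = 0;
    l->pos_loc = -1;
}

/* Append a varying and return the location it was given. */
static int linkage_add(struct linkage *l, int slot)
{
    assert(l->count < MAX_LINKED);
    int loc = l->count++;
    l->slot[loc] = slot;
    return loc;
}

/* Record the location at the moment gl_Position is added. */
static void linkage_add_position(struct linkage *l)
{
    l->pos_loc = linkage_add(l, SLOT_POS);
}
```

With one fragment shader input plus two extra outputs captured only for xfb, the position ends up at location 3, not at the fragment shader's input count of 1.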
5 years agospirv: Add missing break for capability handling
Caio Marcelo de Oliveira Filho [Wed, 18 Sep 2019 15:57:15 +0000 (08:57 -0700)]
spirv: Add missing break for capability handling

Newly added cases "stole" the previous break.

Fixes: 420ad0a1a3d ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoiris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible
Kenneth Graunke [Sun, 15 Sep 2019 06:18:20 +0000 (23:18 -0700)]
iris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible

If we can entirely push uniform data, we don't need a SURFACE_STATE
descriptor for pulling data.  Since constant uploads are a very common
operation, and being able to push all data is also very common, we would
like to avoid the overhead in this case.

This patch defers uploading new descriptors.  Instead of handling that
at iris_set_constant_buffer, we do it at iris_update_compiled_shaders,
where we can see the currently bound shader variants.  If any need pull
descriptors, and descriptors are missing, we update them and flag that
the binding table also needs to be refreshed.

Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by
31.9774% +/- 1.12947% (n=15).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
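
The deferral described above can be modelled roughly like this. It is a sketch under assumptions, not the iris code: the structs, field names, and the single-stage model are invented for the example. Binding a buffer only records it; the descriptor is created later, and only if the bound shader actually pulls.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UBOS 4

struct ubo_binding {
    bool bound;
    bool has_surface_state; /* descriptor already uploaded? */
};

struct shader_variant {
    bool uses_pull_constants;
};

struct stage_state {
    struct ubo_binding ubo[MAX_UBOS];
    const struct shader_variant *shader;
    bool bindings_dirty; /* binding table must be refreshed */
    int uploads;         /* number of descriptors uploaded so far */
};

/* Bind time: remember the buffer, but do not upload a descriptor yet. */
static void set_constant_buffer(struct stage_state *st, int slot)
{
    assert(slot >= 0 && slot < MAX_UBOS);
    st->ubo[slot].bound = true;
    st->ubo[slot].has_surface_state = false;
}

/* Shader update time: upload only what a pulling shader is missing. */
static void update_compiled_shaders(struct stage_state *st)
{
    if (!st->shader || !st->shader->uses_pull_constants)
        return;

    for (int i = 0; i < MAX_UBOS; i++) {
        if (st->ubo[i].bound && !st->ubo[i].has_surface_state) {
            st->ubo[i].has_surface_state = true;
            st->uploads++;
            st->bindings_dirty = true;
        }
    }
}
```

In the common case where everything is pushed, update_compiled_shaders() returns early and no descriptor work happens at all.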
5 years agointel/compiler: Record whether any pull constant loads occur
Kenneth Graunke [Tue, 10 Sep 2019 05:21:17 +0000 (22:21 -0700)]
intel/compiler: Record whether any pull constant loads occur

I would like for iris to be able to avoid setting up SURFACE_STATE
for UBOs in the common case where all constants are pushed.

Unfortunately, we don't know up front whether everything will be
pushed: the backend is allowed to demote pushed UBOs to pull loads
fairly late in the process.  This is probably desirable though, as
we'd like the backend to be able to re-pull pushed data to break up
long live ranges in response to register pressure.

Here we simply add an "are there any pull loads at all" boolean to
prog_data, which is a bit crude but at least allows us to skip work
in the common "everything pushed" case.  We could skip more work by
tracking exactly which UBO surfaces are pulled in a bitmask, but I
wanted to avoid bringing back the old mark_surface_used() mechanism.

Finer-grained tracking could allow us to skip a bit more work when
multiple UBOs are in use and /some/ are 100% pushed, but others are
accessed via pulls.  However, I'm not sure how common this is and
it would save at most 4 pull descriptors, so we defer that for now.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
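
A minimal sketch of the coarse flag, assuming a flat list of constant loads that are already classified as push or pull once the backend has finished demoting. The type and field names here are illustrative only, not the compiler's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum load_kind { LOAD_PUSH, LOAD_PULL };

struct prog_data_sketch {
    bool has_ubo_pull; /* true if any constant load is a pull */
};

/* Scan the final list of loads and record whether any of them pulls. */
static void record_pull_loads(struct prog_data_sketch *pd,
                              const enum load_kind *loads, size_t n)
{
    assert(pd != NULL);
    pd->has_ubo_pull = false;
    for (size_t i = 0; i < n; i++) {
        if (loads[i] == LOAD_PULL) {
            pd->has_ubo_pull = true;
            break;
        }
    }
}
```

A per-surface bitmask would replace the single bool with one bit per UBO, which is the finer-grained tracking the message chooses to defer.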
5 years agoiris: Track per-stage bind history, reduce work accordingly
Kenneth Graunke [Tue, 10 Sep 2019 18:14:57 +0000 (11:14 -0700)]
iris: Track per-stage bind history, reduce work accordingly

We now track per-stage bind history for constant and shader buffers,
shader images, and sampler views by adding an extra res->bind_stages
field to go with res->bind_history.

This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages
involved, and also skip some CPU overhead in iris_rebind_buffer.

Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace
on Icelake.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history
Kenneth Graunke [Tue, 10 Sep 2019 10:28:59 +0000 (03:28 -0700)]
iris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history

The underlying buffer isn't changing - so we don't need to update any
SURFACE_STATE descriptors - we just might have new constants, meaning
we need to re-emit 3DSTATE_CONSTANT_XS.  On Gen9, this means we need
to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled
by the explicit check in the previous patch.

On Gen9, this should cause us to re-emit the binding table /pointer/ on
writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting
a whole new /table/.

On Gen8 and Gen11, this avoids binding table churn altogether.

Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of
Mordor trace on Icelake.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS
Kenneth Graunke [Tue, 10 Sep 2019 10:08:46 +0000 (03:08 -0700)]
iris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS

Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS,
because we have SURFACE_STATE for constant buffers in case the shaders
access them via pull mode.

But this flagging is overkill in many cases.  Gen8 and Gen11 don't need
it at all.  Gen9 doesn't need that large of a hammer in all cases.

Just handle it explicitly so the right thing happens.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds
Kenneth Graunke [Tue, 10 Sep 2019 19:10:26 +0000 (12:10 -0700)]
iris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds

We upload a new SURFACE_STATE for the UBO/SSBO in question, which
means that we need new binding tables as well.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradv: Add DFSM support.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 12:39:42 +0000 (14:39 +0200)]
radv: Add DFSM support.

Apparently we already enabled it without having support ...

Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.

Doubles fillrate in my dual_quad_bench from ~16 pixels/cycle to
~32 pixels/cycle on a Raven.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Disable dfsm by default even on Raven.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 13:57:52 +0000 (15:57 +0200)]
radv: Disable dfsm by default even on Raven.

When actually implementing it, Talos on low is still 3% slower.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Only break batch on framebuffer change with dfsm.
Bas Nieuwenhuizen [Sun, 15 Sep 2019 11:36:58 +0000 (13:36 +0200)]
radv: Only break batch on framebuffer change with dfsm.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir/opt_if: Fix undef handling in opt_split_alu_of_phi()
Connor Abbott [Wed, 28 Aug 2019 14:56:57 +0000 (16:56 +0200)]
nir/opt_if: Fix undef handling in opt_split_alu_of_phi()

The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.

We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.

The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.

There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:

total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1

total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0

Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
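
To see why an ior with a true source cannot be folded to undef, it is enough to check that the result does not depend on the other source. The sketch below assumes booleans are represented as all-ones for true (an assumption for this example) and uses 16-bit values so that every possible "undef" value can be tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if (x | true) equals true for every possible x, i.e. the
 * result is fully defined even when x is an undefined value. */
static bool ior_true_ignores_other_source(void)
{
    const uint16_t all_ones = 0xffff; /* assumed encoding of "true" */

    for (uint32_t x = 0; x <= 0xffff; x++) {
        uint16_t result = (uint16_t)((uint16_t)x | all_ones);
        if (result != all_ones)
            return false;
    }
    return true;
}
```

Since the result is the same for every value the undef source could take, replacing the whole expression with undef changes behaviour.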

5 years agogl: drop incorrect pkg-config file for glvnd
Eric Engestrom [Wed, 18 Sep 2019 20:48:49 +0000 (21:48 +0100)]
gl: drop incorrect pkg-config file for glvnd

Akin to 1a25980c469b38d2c645 ("egl: drop incorrect pkg-config file for
glvnd") and b01524fff05eef66e8cd ("meson: don't build libGLES*.so with
GLVND"), this removes a pkg-config file that shouldn't have been there in
the first place, but was needed because of that GLVND bug.

Now that the glvnd bug has been fixed, it became apparent that this gl.pc
pkg-config file had not been removed, so let's do just that :)

Suggested-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agodocs: Add the maximum implemented Vulkan API version in 19.3 rel notes
Andres Gomez [Wed, 18 Sep 2019 09:44:47 +0000 (12:44 +0300)]
docs: Add the maximum implemented Vulkan API version in 19.3 rel notes

Currently, Vulkan 1.1.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agodocs: Add the maximum implemented Vulkan API version in 19.2 rel notes
Andres Gomez [Wed, 18 Sep 2019 09:44:13 +0000 (12:44 +0300)]
docs: Add the maximum implemented Vulkan API version in 19.2 rel notes

Currently, Vulkan 1.1.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agodocs: Add the maximum implemented Vulkan API version in 19.1 rel notes
Andres Gomez [Wed, 18 Sep 2019 09:42:13 +0000 (12:42 +0300)]
docs: Add the maximum implemented Vulkan API version in 19.1 rel notes

Currently, Vulkan 1.1.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agonir/opcodes: Clear variable names confusion
Andres Gomez [Wed, 18 Sep 2019 12:48:36 +0000 (15:48 +0300)]
nir/opcodes: Clear variable names confusion

Having Python and C variables share a name in the same block of code
makes the code confusing to read. Make it explicit that the
Python bit_size variable refers to the destination bit size.

Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradv: never kill a NGG GS shader
Rhys Perry [Fri, 13 Sep 2019 18:38:28 +0000 (19:38 +0100)]
radv: never kill a NGG GS shader

Seems to fix a hang with excessive vertex emissions when NGG is used for
GS.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv/gfx10: fix VK_KHR_pipeline_executable_properties with NGG GS
Samuel Pitoiset [Wed, 18 Sep 2019 14:58:06 +0000 (16:58 +0200)]
radv/gfx10: fix VK_KHR_pipeline_executable_properties with NGG GS

No GS copy shader if a pipeline enables NGG GS.

This fixes
dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.

Fixes: 86864eedd2d ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: include drm_fourcc.h to fix the build
Marek Olšák [Wed, 18 Sep 2019 18:52:25 +0000 (14:52 -0400)]
radeonsi: include drm_fourcc.h to fix the build

5 years agoradeonsi: implement pipe_screen::resource_get_param
Marek Olšák [Fri, 30 Aug 2019 00:45:29 +0000 (20:45 -0400)]
radeonsi: implement pipe_screen::resource_get_param

v2: return DRM_FORMAT_MOD_INVALID from the function

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
5 years agogallium: extend resource_get_param to be as capable as resource_get_handle
Marek Olšák [Fri, 30 Aug 2019 00:35:54 +0000 (20:35 -0400)]
gallium: extend resource_get_param to be as capable as resource_get_handle

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoac: move ac_get_num_physical_vgprs into radeon_info
Marek Olšák [Thu, 12 Sep 2019 23:51:13 +0000 (19:51 -0400)]
ac: move ac_get_num_physical_vgprs into radeon_info

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: move ac_get_num_physical_sgprs into radeon_info
Marek Olšák [Thu, 12 Sep 2019 23:46:02 +0000 (19:46 -0400)]
ac: move ac_get_num_physical_sgprs into radeon_info

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: move ac_get_max_wave64_per_simd into radeon_info
Marek Olšák [Thu, 12 Sep 2019 23:39:02 +0000 (19:39 -0400)]
ac: move ac_get_max_wave64_per_simd into radeon_info

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: move num_sdp_interfaces into radeon_info
Marek Olšák [Thu, 12 Sep 2019 23:00:23 +0000 (19:00 -0400)]
ac: move num_sdp_interfaces into radeon_info

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: move PBB MAX_ALLOC_COUNT into radeon_info
Marek Olšák [Thu, 12 Sep 2019 23:00:23 +0000 (19:00 -0400)]
ac: move PBB MAX_ALLOC_COUNT into radeon_info

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoetnaviv: fix two-sided stencil
Jonathan Marek [Mon, 2 Sep 2019 18:44:51 +0000 (14:44 -0400)]
etnaviv: fix two-sided stencil

* Set missing STENCIL_CONFIG_EXT2 bits
* Swap stencil sides when rendering CCW

Fixes following deqp tests (which were 99% failing):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*

Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoradv: fix loading 64-bit GS inputs
Samuel Pitoiset [Wed, 18 Sep 2019 14:21:57 +0000 (16:21 +0200)]
radv: fix loading 64-bit GS inputs

We have to load two 32-bit integers and cast them correctly.

This fixes crashes with gs-double-interpolator.vk_shader_test.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111734
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
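
On the CPU side, the same "load two 32-bit integers, then cast" idea looks like the sketch below. It assumes IEEE-754 doubles and that the low half is loaded first; both are assumptions for the example, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reassemble a 64-bit float from two 32-bit halves.
 * memcpy reinterprets the bit pattern without converting the value. */
static double combine_u32_pair(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;

    assert(sizeof(d) == sizeof(bits));
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

A value conversion such as `(double)bits` would produce a different number, which is the kind of incorrect cast the fix is about.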
5 years agotu: Set up glsl types.
Bas Nieuwenhuizen [Wed, 18 Sep 2019 12:11:47 +0000 (14:11 +0200)]
tu: Set up glsl types.

Addresses this assert:

deqp-vk: ../mesa-freedreno-9999/src/compiler/glsl_types.cpp:1244: static const glsl_type *glsl_type::get_interface_instance(const glsl_struct_field *, unsigned int, enum glsl_interface_packing, bool, const char *): Assertion `glsl_type_users > 0' failed.

running dEQP-VK.api.smoke.triangle .

Fixes: 624789e3708 "compiler/glsl: handle case where we have multiple users for types"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agodocs: Update to OpenGL 4.6 in the release notes
Andres Gomez [Wed, 18 Sep 2019 08:39:10 +0000 (11:39 +0300)]
docs: Update to OpenGL 4.6 in the release notes

After 41549a18e6c ("i965: Enable OpenGL 4.6 for Gen8+"), Mesa
implements the OpenGL 4.6 API.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
5 years ago.mailmap: add an alias for Eric Engestrom
Erik Faye-Lund [Mon, 16 Sep 2019 17:14:22 +0000 (19:14 +0200)]
.mailmap: add an alias for Eric Engestrom

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years ago.mailmap: add an alias for Michel Dänzer
Erik Faye-Lund [Mon, 16 Sep 2019 17:35:35 +0000 (19:35 +0200)]
.mailmap: add an alias for Michel Dänzer

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
5 years agoradv: fix writing depth/stencil clear values to image
Samuel Pitoiset [Wed, 18 Sep 2019 08:58:04 +0000 (10:58 +0200)]
radv: fix writing depth/stencil clear values to image

Use the fastest way only if both aspects are used. Oops.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111728
Fixes: 218ce34962c ("radv: add mipmap support for the clear depth/stencil values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agogitlab-ci: Merge scons-nollvm and scons-llvm jobs
Michel Dänzer [Thu, 12 Sep 2019 09:45:13 +0000 (11:45 +0200)]
gitlab-ci: Merge scons-nollvm and scons-llvm jobs

The new job tests scons without LLVM and with all LLVM versions >= 6.0.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Test scons with all LLVM versions
Michel Dänzer [Thu, 12 Sep 2019 09:38:06 +0000 (11:38 +0200)]
gitlab-ci: Test scons with all LLVM versions

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Move scons build/test commands to a separate shell script
Michel Dänzer [Thu, 12 Sep 2019 09:34:43 +0000 (11:34 +0200)]
gitlab-ci: Move scons build/test commands to a separate shell script

Preparatory, no functional change intended.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Use crossbuild-essential-* packages
Michel Dänzer [Fri, 6 Sep 2019 15:33:13 +0000 (17:33 +0200)]
gitlab-ci: Use crossbuild-essential-* packages

They are convenience packages which pull in everything needed for
cross-building via dependencies.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Use newer packages from backports by default
Michel Dänzer [Wed, 11 Sep 2019 16:35:08 +0000 (18:35 +0200)]
gitlab-ci: Use newer packages from backports by default

This is needed in particular to get a recent enough version of meson in
the stretch image, but should be generally beneficial.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Create separate docker images for Debian stretch & buster
Michel Dänzer [Fri, 6 Sep 2019 15:04:47 +0000 (17:04 +0200)]
gitlab-ci: Create separate docker images for Debian stretch & buster

Pros:
* Less fragile due to not mixing packages from stretch and buster
* No longer need to use third-party LLVM packages
* The buster image now uses GCC 8 for C++ as well (previously 6 for C++,
  8 for C), allowing us to drop some hacks

Cons:
* The stretch image now only uses GCC 6 for C as well as C++
* Need separate jobs for testing old LLVM versions

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Pass --no-remove to apt-get where possible
Michel Dänzer [Fri, 6 Sep 2019 14:15:40 +0000 (16:15 +0200)]
gitlab-ci: Pass --no-remove to apt-get where possible

If installing new packages would require removing previously installed
ones, this flag causes apt-get to abort with an error instead,
preventing later obscure failures due to the missing packages.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Reference full ci-templates commit hash
Michel Dänzer [Fri, 6 Sep 2019 15:01:50 +0000 (17:01 +0200)]
gitlab-ci: Reference full ci-templates commit hash

8 digits might become ambiguous at some point.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoi965: support AYUV/XYUV for external import only
Haihao Xiang [Mon, 16 Sep 2019 06:52:56 +0000 (14:52 +0800)]
i965: support AYUV/XYUV for external import only

Fixes: 89785e2d56e7fa ("i965: add support for sampling from AYUV")
Fixes: 7cab8d3661f243 ("i965: Add support for sampling from XYUV images")
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agopanfrost: Allocate tiler and scratchpad BOs per-batch
Boris Brezillon [Sat, 14 Sep 2019 17:18:51 +0000 (19:18 +0200)]
panfrost: Allocate tiler and scratchpad BOs per-batch

If we want to execute several batches in parallel they need to have
their own tiler and scratchpad BOs. Let's move those objects to
panfrost_batch and allocate them on a per-batch basis.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Add FBO BOs to batch->bos earlier
Boris Brezillon [Sat, 14 Sep 2019 16:40:23 +0000 (18:40 +0200)]
panfrost: Add FBO BOs to batch->bos earlier

If we want the batch dependency tracking to work correctly we must
make sure all BOs are added to the batch->bos set early enough. Adding
FBO BOs when generating the fragment job is clearly too late. Add a
panfrost_batch_add_fbo_bos helper and call it in the clear/draw path.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Add the panfrost_batch_create_bo() helper
Boris Brezillon [Sat, 14 Sep 2019 15:57:06 +0000 (17:57 +0200)]
panfrost: Add the panfrost_batch_create_bo() helper

This helper automates the panfrost_bo_create()+panfrost_batch_add_bo()+
panfrost_bo_unreference() sequence that's done for all per-batch BOs.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
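
The create+add+unreference sequence can be sketched with a toy reference-counted BO. None of this is the panfrost code: the struct layouts, the fixed-size array standing in for the batch's BO set, and the function names are assumptions for the example. After the helper returns, the batch holds the only reference, so the BO lives exactly as long as the batch.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BATCH_BOS 8

struct bo {
    int refcnt;
    size_t size;
};

struct batch {
    struct bo *bos[MAX_BATCH_BOS]; /* stand-in for the batch's BO set */
    int num_bos;
};

static struct bo *bo_create(size_t size)
{
    struct bo *bo = calloc(1, sizeof(*bo));
    if (!bo)
        return NULL;
    bo->refcnt = 1; /* reference owned by the caller */
    bo->size = size;
    return bo;
}

static void bo_unreference(struct bo *bo)
{
    if (bo && --bo->refcnt == 0)
        free(bo);
}

static void batch_add_bo(struct batch *batch, struct bo *bo)
{
    assert(batch->num_bos < MAX_BATCH_BOS);
    bo->refcnt++; /* the batch takes its own reference */
    batch->bos[batch->num_bos++] = bo;
}

/* Create a BO, hand it to the batch, and drop the creator's reference. */
static struct bo *batch_create_bo(struct batch *batch, size_t size)
{
    struct bo *bo = bo_create(size);
    if (!bo)
        return NULL;
    batch_add_bo(batch, bo);
    bo_unreference(bo);
    return bo;
}

/* Release every reference held by the batch. */
static void batch_free_bos(struct batch *batch)
{
    for (int i = 0; i < batch->num_bos; i++)
        bo_unreference(batch->bos[i]);
    batch->num_bos = 0;
}
```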
5 years agopanfrost: Don't return imported/exported BOs to the cache
Boris Brezillon [Sat, 14 Sep 2019 09:46:44 +0000 (11:46 +0200)]
panfrost: Don't return imported/exported BOs to the cache

We don't know who else is using the BO in that case, and thus shouldn't
re-use it for something else.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Add panfrost_bo_{alloc,free}()
Boris Brezillon [Sat, 14 Sep 2019 09:42:38 +0000 (11:42 +0200)]
panfrost: Add panfrost_bo_{alloc,free}()

Thanks to that we avoid the recursive call into panfrost_bo_create()
and we can get rid of panfrost_bo_release() by inlining the code in
panfrost_bo_unreference().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Stop using panfrost_bo_release() outside of pan_bo.c
Boris Brezillon [Sat, 14 Sep 2019 09:23:51 +0000 (11:23 +0200)]
panfrost: Stop using panfrost_bo_release() outside of pan_bo.c

panfrost_bo_unreference() should be used instead.

The only difference caused by this change is that the scratchpad,
tiler_heap and tiler_dummy BOs are now returned to the cache instead
of being freed when a context is destroyed. This is only a problem if
we care about context isolation, which apparently is not the case since
transient BOs are already returned to the per-FD cache (and all contexts
share the same address space anyway, so enforcing context isolation
is almost impossible).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Stop passing screen around for BO operations
Boris Brezillon [Sat, 14 Sep 2019 08:35:47 +0000 (10:35 +0200)]
panfrost: Stop passing screen around for BO operations

Store a screen pointer in panfrost_bo so we don't have to pass a screen
object to all functions manipulating the BO.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Don't check if BO is mmaped before calling panfrost_bo_mmap()
Boris Brezillon [Sat, 14 Sep 2019 08:26:38 +0000 (10:26 +0200)]
panfrost: Don't check if BO is mmaped before calling panfrost_bo_mmap()

panfrost_bo_mmap() already takes care of that.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Stop exposing panfrost_bo_cache_{fetch,put}()
Boris Brezillon [Sat, 14 Sep 2019 08:22:36 +0000 (10:22 +0200)]
panfrost: Stop exposing panfrost_bo_cache_{fetch,put}()

They are not expected to be called directly; users should use
panfrost_bo_{create,release}() instead.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Move the BO API to its own header
Boris Brezillon [Sat, 14 Sep 2019 07:58:55 +0000 (09:58 +0200)]
panfrost: Move the BO API to its own header

Right now, the BO API is spread over pan_{allocate,resource,screen}.h.
Let's move all BO related definitions to a separate header file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: s/PAN_ALLOCATE_/PAN_BO_/
Boris Brezillon [Sat, 14 Sep 2019 11:24:47 +0000 (13:24 +0200)]
panfrost: s/PAN_ALLOCATE_/PAN_BO_/

Change the prefix for BO allocation flags to make it consistent with
the rest of the BO API.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Move panfrost_bo_{reference,unreference}() to pan_bo.c
Boris Brezillon [Sat, 14 Sep 2019 07:45:37 +0000 (09:45 +0200)]
panfrost: Move panfrost_bo_{reference,unreference}() to pan_bo.c

This way we have all BO related functions placed in the same source
file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Get rid of pan_drm.c
Boris Brezillon [Sat, 14 Sep 2019 06:00:27 +0000 (08:00 +0200)]
panfrost: Get rid of pan_drm.c

pan_drm.c was only meaningful when we were supporting 2 kernel drivers
(mali_kbase, and the drm one). Now that there's no kernel-driver
abstraction we're better off moving those functions where they belong:

* BO related functions in pan_bo.c
* fence related functions + query_gpu_version() in pan_screen.c
* submit related functions in pan_job.c

While at it, we rename the functions according to the place they're
being moved to.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Stop passing has_draws to panfrost_drm_submit_vs_fs_batch()
Boris Brezillon [Sat, 14 Sep 2019 07:11:09 +0000 (09:11 +0200)]
panfrost: Stop passing has_draws to panfrost_drm_submit_vs_fs_batch()

has_draws can be inferred directly from the batch->last_job value, no
need to pass it around.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Kill a useless memset(0) in panfrost_create_context()
Boris Brezillon [Sat, 14 Sep 2019 06:05:46 +0000 (08:05 +0200)]
panfrost: Kill a useless memset(0) in panfrost_create_context()

ctx is allocated with rzalloc() which takes care of zero-ing the memory
region. No need to call memset(0) on top.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Add polygon_list to the batch BO set at allocation time
Boris Brezillon [Sat, 14 Sep 2019 16:15:26 +0000 (18:15 +0200)]
panfrost: Add polygon_list to the batch BO set at allocation time

That's what we do for other per-batch BOs, and we'll soon add a helper
to automate this create_bo()+add_bo()+bo_unreference() sequence, so
let's prepare the code to ease this transition.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Add missing panfrost_batch_add_bo() calls
Boris Brezillon [Sat, 14 Sep 2019 15:32:02 +0000 (17:32 +0200)]
panfrost: Add missing panfrost_batch_add_bo() calls

Some BOs are used by batches but never explicitly added to the BO set.
This is currently not a problem because we wait for the execution of
a batch to be finished before releasing a BO, but we will soon relax
this rule.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Use the correct type for the bo_handle array
Boris Brezillon [Sun, 15 Sep 2019 19:06:58 +0000 (21:06 +0200)]
panfrost: Use the correct type for the bo_handle array

The DRM driver expects an array of u32, let's use the correct type, even
if using an int works in practice because it's still a 32-bit integer.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Stop exposing internal panfrost_*_batch() functions
Boris Brezillon [Fri, 13 Sep 2019 16:32:42 +0000 (18:32 +0200)]
panfrost: Stop exposing internal panfrost_*_batch() functions

panfrost_{create,free,get}_batch() are only called inside pan_job.c.
Let's make them static.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agoetnaviv: disable ARB_shadow
Christian Gmeiner [Mon, 17 Jun 2019 22:52:19 +0000 (00:52 +0200)]
etnaviv: disable ARB_shadow

Looks like only HALTI2 GPUs have support for it, but that is not yet
implemented, so disable ARB_shadow for now.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoRevert "gallium: remove PIPE_CAP_TEXTURE_SHADOW_MAP"
Christian Gmeiner [Mon, 17 Jun 2019 22:54:47 +0000 (00:54 +0200)]
Revert "gallium: remove PIPE_CAP_TEXTURE_SHADOW_MAP"

There are GPUs that do not support this feature.

This reverts commit e871abe452ad40efcccb0bab6b88fc31d0551e29

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agovirgl: Remove wrong EAGAIN handling for drmIoctl
Lepton Wu [Wed, 4 Sep 2019 18:53:37 +0000 (11:53 -0700)]
virgl: Remove wrong EAGAIN handling for drmIoctl

drmIoctl handles EAGAIN itself, and it always returns -1 on errors.
Remove the wrong handling of its return value. Also, print a warning when
it fails.
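
A minimal sketch of the retry behaviour described above, assuming a loop in the style of libdrm's drmIoctl(). The function-pointer indirection and the fake ioctls are hypothetical, added only so the example runs without a DRM device.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ioctl(2). */
typedef int (*ioctl_fn)(int fd, unsigned long request, void *arg);

/* EINTR and EAGAIN are retried inside the wrapper, so a caller only
 * ever sees 0 or -1 (with errno set) and must not retry on its own. */
static int retrying_ioctl(ioctl_fn do_ioctl, int fd,
                          unsigned long request, void *arg)
{
    int ret;

    do {
        ret = do_ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Fake ioctl: fails with EAGAIN twice, then succeeds. */
static int fake_calls;
static int fake_ioctl(int fd, unsigned long request, void *arg)
{
    (void)fd; (void)request; (void)arg;
    if (fake_calls++ < 2) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Fake ioctl that always fails with a non-retryable error. */
static int failing_ioctl(int fd, unsigned long request, void *arg)
{
    (void)fd; (void)request; (void)arg;
    errno = EINVAL;
    return -1;
}
```

With such a wrapper, checking the return value against -EAGAIN in the caller can never match, which is why that handling is removed.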

v2: - use _debug_printf instead of fprintf (Gurchetan Singh)

Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
5 years agoiris: Skip allocating a null surface when there are 0 color regions.
Kenneth Graunke [Tue, 27 Aug 2019 18:32:24 +0000 (11:32 -0700)]
iris: Skip allocating a null surface when there are 0 color regions.

The compiler now sets the "Null Render Target" bit in the RT write
extended message descriptor, causing it to write to an implicit null
surface without us needing to set one up in the binding table.

Together with the last patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/compiler: Set "Null Render Target" ex_desc bit on Gen11
Kenneth Graunke [Tue, 27 Aug 2019 18:22:33 +0000 (11:22 -0700)]
intel/compiler: Set "Null Render Target" ex_desc bit on Gen11

When there are no color regions (i.e. a depth only pass), we can set
the "Null Render Target" bit in the Gen11 RT write extended message
descriptor to indicate that it should behave as if it's writing to a
null render target, without the need for a binding table entry.

This lets drivers avoid setting up that null RT binding table entry,
but more importantly means the HW doesn't actually have to bother
looking up the surface state.

Together with the next patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agodocs/relnotes: add support for VK_KHR_shader_float_controls on Intel
Samuel Iglesias Gonsálvez [Wed, 13 Feb 2019 12:50:01 +0000 (13:50 +0100)]
docs/relnotes: add support for VK_KHR_shader_float_controls on Intel

v2:
- Move to 19.2.0 release notes (Andres).

v3:
- Move to 19.3.0 release notes (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls
Samuel Iglesias Gonsálvez [Thu, 31 May 2018 09:44:21 +0000 (11:44 +0200)]
anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls

This adds support for
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and
enables the Vulkan and SPIR-V extensions.

Also, notice that this includes the updates applied to the
VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension
VK_KHR_shader_float_controls v4 and Vulkan 1.1.116.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs: add support for shader float control to remove_extra_rounding_modes()
Samuel Iglesias Gonsálvez [Mon, 19 Nov 2018 11:38:10 +0000 (12:38 +0100)]
i965/fs: add support for shader float control to remove_extra_rounding_modes()

The remove_extra_rounding_modes() optimization will remove duplicated
rounding mode changes.
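
The idea of dropping duplicated rounding mode changes can be sketched as follows. This is a simplified illustration over one straight-line instruction array; the types, enum names and the function signature are hypothetical and do not match the real pass, which works on the backend IR.

```c
#include <stddef.h>

enum rnd_mode { RND_RTNE, RND_RTZ };
enum opcode { OP_RND_MODE, OP_OTHER };

struct inst {
    enum opcode op;
    enum rnd_mode mode;   /* only meaningful for OP_RND_MODE */
};

/* Compacts the array in place, dropping every OP_RND_MODE that selects
 * the mode already in effect. 'initial' is the mode active on entry,
 * e.g. the default chosen by the shader's float controls.
 * Returns the new instruction count. */
static size_t remove_extra_rounding_modes(struct inst *insts, size_t n,
                                          enum rnd_mode initial)
{
    enum rnd_mode current = initial;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (insts[i].op == OP_RND_MODE) {
            if (insts[i].mode == current)
                continue;             /* redundant, drop it */
            current = insts[i].mode;
        }
        insts[out++] = insts[i];
    }
    return out;
}
```

Seeding the tracked mode from the float controls default is what lets the first mode change be removed when it only restates that default.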

v2:
- Fix bug in the rounding mode change (Alejandro).

v3:
- Fix rounding modes.

v4:
- Updated to renamed shader info member and enum values (Andres).

v5:
- Simplify flags logic operations (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16
Samuel Iglesias Gonsálvez [Wed, 13 Feb 2019 09:42:05 +0000 (10:42 +0100)]
i965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16

v2:
- Consider nir_op_f2f16 case too (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs: set rounding mode when emitting fadd, fmul and ffma instructions
Samuel Iglesias Gonsálvez [Tue, 12 Feb 2019 15:13:59 +0000 (16:13 +0100)]
i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions

v2:
- Updated to renamed shader info member (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs: add emit_shader_float_controls_execution_mode() and aux functions
Samuel Iglesias Gonsálvez [Fri, 1 Jun 2018 10:36:47 +0000 (12:36 +0200)]
i965/fs: add emit_shader_float_controls_execution_mode() and aux functions

We need this function to emit code that sets up the control register
later with the defined execution mode for the shader. Therefore, we
emit it as the first instruction.

v2:
- Fix bug in setting the default mode mask in brw_rnd_mode_from_nir().
- Fix support for rounding modes in brw_rnd_mode_from_nir().

v3:
- Updated to renamed shader info member and enum values (Andres).

v4:
- Add actual emission as first instruction of emit_nir_code (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs/generator: add new opcode to set float controls modes in control register
Samuel Iglesias Gonsálvez [Thu, 12 Sep 2019 22:38:06 +0000 (01:38 +0300)]
i965/fs/generator: add new opcode to set float controls modes in control register

Before this commit, we had only the FPRoundingMode decoration (the
per-instruction one), which is applied during the SPIR-V handling. In
vtn_alu we find out the rounding mode and generate the code
accordingly, which is later used to look for the respective
nir_op_f2f16_{rtz,rtne}.

Per-instruction gets prioritized because we make them explicit
conversions (with RTZ or RTNE nir opcodes) and they will override the
default execution mode defined with float controls. However, we need
to come back to the mode defined by float controls after the execution
of the FP Rounding instruction.

Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be
used to set the default rounding mode and denorm treatment for the
whole shader, while the pre-existing SHADER_OPCODE_RND_MODE will be
used as the prioritized rounding mode on a per-instruction basis.
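
The priority between the two opcodes can be modelled with a toy control register. Everything below (the struct, the enum and both functions) is a hypothetical model written for illustration, not generator code.

```c
enum rnd_mode { RND_RTNE, RND_RTZ };

/* Hypothetical model of the rounding bits in the control register. */
struct cr0_model { enum rnd_mode rounding; };

/* Role of the float-control opcode: set the shader-wide default once. */
static void set_float_control_mode(struct cr0_model *cr0,
                                   enum rnd_mode shader_default)
{
    cr0->rounding = shader_default;
}

/* Role of the per-instruction opcode: override the mode for a single
 * conversion, then come back to the shader default. Returns the mode
 * the conversion actually ran with. */
static enum rnd_mode convert_with_override(struct cr0_model *cr0,
                                           enum rnd_mode override_mode,
                                           enum rnd_mode shader_default)
{
    cr0->rounding = override_mode;
    enum rnd_mode used = cr0->rounding;   /* the conversion executes here */
    cr0->rounding = shader_default;       /* restore the default */
    return used;
}
```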

v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.

v3:
- Update comment (Caio).

v4:
- Split the patch into the helper and the new opcode (this
  one) (Caio).

v5:
- Add an explanation on the actual purpose and priority of the newly
  introduced opcode in the commit log (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs/generator: refactor rounding mode helper in preparation for float controls
Samuel Iglesias Gonsálvez [Thu, 12 Sep 2019 22:34:35 +0000 (01:34 +0300)]
i965/fs/generator: refactor rounding mode helper in preparation for float controls

v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.

v3:
- Update comment (Caio).

v4:
- Split the patch into the helper (this one) and the new
  opcode (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoi965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zero
Samuel Iglesias Gonsálvez [Mon, 9 Jul 2018 08:32:10 +0000 (10:32 +0200)]
i965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zero

The denorm mode is set in the control register, so there is no need to
do anything else.

v2:
- Add an assert to make sure that we realize if this assumption is
  broken in the future (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/nir: do not apply the fsin and fcos trig workarounds for consts
Samuel Iglesias Gonsálvez [Tue, 4 Dec 2018 15:41:36 +0000 (16:41 +0100)]
intel/nir: do not apply the fsin and fcos trig workarounds for consts

If we have fsin or fcos trigonometric operations with constant values
as inputs, we will multiply the result by 0.99997 in
brw_nir_apply_trig_workarounds, making the result wrong.

By adjusting the rules so they do not apply to constant values, we let
a later constant-folding pass deal with them.

v2:
- Do not early constant fold but only apply the trig workaround for
  non constants (Caio).
- Add fixes tag to commit log (Caio).

Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: fix fmin/fmax support for doubles
Samuel Iglesias Gonsálvez [Tue, 10 Jul 2018 11:17:05 +0000 (13:17 +0200)]
nir: fix fmin/fmax support for doubles

Until now, it was using the single-precision float version of
fmin/fmax instead of the double version.
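
A small sketch of why this matters. The float path is modelled here by casting the operands to float, which is an assumption made to keep the example free of libm calls; both helpers are hypothetical and ignore NaN handling.

```c
/* Models evaluating a 64-bit fmin through a single-precision helper:
 * operands and result are squeezed through float. */
static double fmin_via_float(double a, double b)
{
    float fa = (float)a;
    float fb = (float)b;
    return (double)(fa < fb ? fa : fb);
}

/* The same operation carried out in double precision. */
static double fmin_in_double(double a, double b)
{
    return a < b ? a : b;
}
```

For an operand such as 1.0 + 2^-40, the float path returns 1.0 because the low bits do not fit in a float mantissa, while the double path returns the operand unchanged.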

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir: fix denorm flush-to-zero in sqrt's lowering at nir_lower_double_ops
Samuel Iglesias Gonsálvez [Tue, 10 Jul 2018 10:04:38 +0000 (12:04 +0200)]
nir: fix denorm flush-to-zero in sqrt's lowering at nir_lower_double_ops

v2:
- Replace hard coded value with DBL_MIN (Connor).

v3:
- Take into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64
  flag (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
5 years agonir: fix denorms in unpack_half_1x16()
Samuel Iglesias Gonsálvez [Mon, 9 Jul 2018 07:46:59 +0000 (09:46 +0200)]
nir: fix denorms in unpack_half_1x16()

According to VK_KHR_shader_float_controls:

"Denormalized values obtained via unpacking an integer into a vector
 of values with smaller bit width and interpreting those values as
 floating-point numbers must: be flushed to zero, unless the entry
 point is declared with the code:DenormPreserve execution mode."
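
The quoted rule can be sketched with a bit-level half to float unpack. This is an illustrative helper only; the name and the flush_denorms parameter are assumptions, not the NIR opcode implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 binary16 bit pattern to float. When
 * flush_denorms is true, half denorms become (signed) zero. */
static float unpack_half_1x16(uint16_t h, bool flush_denorms)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man != 0 && !flush_denorms) {
            /* A half denorm is man * 2^-24, exact in single precision. */
            f = (float)man * (1.0f / 16777216.0f);
            return sign ? -f : f;
        }
        bits = sign;                                   /* zero or flushed denorm */
    } else if (exp == 0x1fu) {
        bits = sign | 0x7f800000u | (man << 13);       /* inf or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Only inputs with a zero exponent field and a non-zero mantissa are affected by the flag; every other value unpacks identically in both modes.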

v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).

v3:
- Adapt to use the new NIR lowering framework (Andres).

v4:
- Updated to renamed shader info member and enum values (Andres).

v5:
- Simplify flags logic operations (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
5 years agonir/algebraic: disable inexact optimizations depending on float controls execution...
Samuel Iglesias Gonsálvez [Wed, 12 Dec 2018 15:29:13 +0000 (16:29 +0100)]
nir/algebraic: disable inexact optimizations depending on float controls execution mode

If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or
FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the
inexact optimizations so the VK_KHR_shader_float_controls execution
mode is respected.

v2:
- Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is
  enabled (Andres).

v3:
- Updated to renamed shader info member (Andres).

v4:
- Directly access execution mode instead of dragging it by parameter (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
5 years agonir/algebraic: mark float optimizations returning one parameter as inexact
Andres Gomez [Tue, 23 Apr 2019 13:54:24 +0000 (15:54 +0200)]
nir/algebraic: mark float optimizations returning one parameter as inexact

With the arrival of VK_KHR_shader_float_controls, algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.

For example, if SHADER_DENORM_FLUSH_TO_ZERO is active and the "a"
parameter is a denorm, we cannot return it unchanged as a denorm; it
needs to be flushed to zero. Therefore, we now mark all those
operations as inexact.
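
A minimal sketch of the mismatch, using an add with zero as a hypothetical instance of the (('fop', a, b), a) pattern. The helper names are made up and the flush is done in software so the result does not depend on the host FPU state.

```c
#include <stdint.h>
#include <string.h>

/* Flushes a single-precision denorm to zero, keeping the sign. */
static float flush_denorm_f32(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0) {
        bits &= 0x80000000u;
        memcpy(&x, &bits, sizeof x);
    }
    return x;
}

/* What an add produces under a flush-to-zero execution mode. */
static float fadd_ftz(float a, float b)
{
    return flush_denorm_f32(flush_denorm_f32(a) + flush_denorm_f32(b));
}

/* What the algebraic rewrite of "a + 0.0" produces: it forwards the
 * operand untouched, so a denorm input stays a denorm. */
static float fadd_zero_optimized(float a)
{
    return a;
}
```

For normal inputs both paths agree; they diverge only when "a" is a denorm, which is exactly the case the inexact marking guards against.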

Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/constant_expressions: mind rounding mode converting from float to float16 destina...
Samuel Iglesias Gonsálvez [Mon, 4 Feb 2019 14:10:35 +0000 (15:10 +0100)]
nir/constant_expressions: mind rounding mode converting from float to float16 destinations

v2:
- Move the op-code specific knowledge to nir_opcodes.py even if it
  means a round trip conversion (Connor).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overridden by the...
Samuel Iglesias Gonsálvez [Sun, 21 Apr 2019 10:35:17 +0000 (12:35 +0200)]
nir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overridden by the float controls execution mode

Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir: mind rounding mode on fadd, fsub, fmul and fma opcodes
Samuel Iglesias Gonsálvez [Tue, 12 Feb 2019 14:43:10 +0000 (15:43 +0100)]
nir: mind rounding mode on fadd, fsub, fmul and fma opcodes

According to the Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.

v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
  auxiliary functions to calculate the proper value because Mesa uses
  round-to-nearest-even rounding mode by default (Connor).

v3:
- Do an actual fused multiply-add at ffma (Connor).

v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
5 years agonir: add support for round to zero rounding mode to nir_op_f2f32
Samuel Iglesias Gonsálvez [Wed, 13 Feb 2019 09:31:37 +0000 (10:31 +0100)]
nir: add support for round to zero rounding mode to nir_op_f2f32

f2f16's rounding modes are already handled, and f2f64 doesn't need one
because there is currently no floating-point type wider than 64 bits.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agoutil: add fp64 -> fp32 conversion support for RTNE and RTZ rounding modes
Samuel Iglesias Gonsálvez [Thu, 6 Sep 2018 14:01:34 +0000 (16:01 +0200)]
util: add fp64 -> fp32 conversion support for RTNE and RTZ rounding modes

To be consistent with the pre-existing API for half floats, this new
API for doubles is the one meant to be used for double to float
conversions. It is no more than a wrapper for the softfloat.h API, but
we mean to keep that one private.
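
For illustration, round-to-zero narrowing can be sketched without softfloat as below. This is not how the wrapper is implemented: it assumes IEEE 754 types, a host cast that rounds to nearest even, and inputs within the finite float range. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Narrow with the host's default rounding, then undo the rounding if
 * it moved the value away from zero. */
static float double_to_float_rtz_sketch(double d)
{
    float f = (float)d;
    double back = (double)f;

    if ((d > 0.0 && back > d) || (d < 0.0 && back < d)) {
        uint32_t bits;

        memcpy(&bits, &f, sizeof bits);
        bits -= 1;            /* one ULP toward zero; the sign bit is untouched */
        memcpy(&f, &bits, sizeof f);
    }
    return f;
}
```

A value just above the midpoint between 1.0 and the next float narrows to the next float under RTNE but to 1.0 under RTZ.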

v2:
- Fix bug in _mesa_double_to_float_rtz() in the inf/nan detection
  using the exponent value.

v3:
- Replace custom f64 -> f32 implementations with the softfloat
  one (Andres).

v4:
- Added API usage clarifying comments (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoutil: add float to float16 conversions with RTZ and RTNE
Samuel Iglesias Gonsálvez [Wed, 4 Jul 2018 10:02:30 +0000 (12:02 +0200)]
util: add float to float16 conversions with RTZ and RTNE

To be consistent with the pre-existing functions, this new API is the
one meant to be used for float to half float conversions. It is no
more than a wrapper for the softfloat.h API, but we mean to keep that
one private.
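
The difference between the two rounding modes can be sketched with plain bit manipulation. This is an independent illustration, not the softfloat-based code; the function name and the rtz flag are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Converts a float to an IEEE 754 binary16 bit pattern, rounding to
 * zero when rtz is true and to nearest even otherwise. */
static uint16_t float_to_half_sketch(float f, bool rtz)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);

    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t abs_bits = bits & 0x7fffffffu;

    if (abs_bits >= 0x7f800000u)             /* inf or NaN */
        return (uint16_t)(sign | 0x7c00u |
                          (abs_bits > 0x7f800000u ? 0x0200u : 0u));

    int32_t e = (int32_t)(abs_bits >> 23) - 127 + 15;  /* half-biased exponent */

    if (e >= 31)                             /* beyond the finite half range */
        return (uint16_t)(sign | (rtz ? 0x7bffu : 0x7c00u));
    if (e < -10)                             /* below half the smallest denorm */
        return (uint16_t)sign;

    uint32_t man = abs_bits & 0x007fffffu;
    uint32_t shift = 13;
    uint32_t half;

    if (e <= 0) {                            /* result is a half denorm */
        man |= 0x00800000u;                  /* make the implicit 1 explicit */
        shift = (uint32_t)(14 - e);          /* 14..24 */
        half = man >> shift;
    } else {
        half = ((uint32_t)e << 10) | (man >> shift);
    }

    uint32_t rem = man & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);

    if (!rtz && (rem > halfway || (rem == halfway && (half & 1u))))
        half++;                              /* a carry may bump the exponent */

    return (uint16_t)(sign | half);
}
```

The modes differ on inexact inputs and on overflow: 65520.0f becomes infinity with RTNE but the largest finite half with RTZ.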

v2:
- Replace custom f32 -> f16 RTZ implementation with the softfloat
  one (Andres).

v3:
- Added API usage clarifying comments (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoutil: add softfloat functions to operate with doubles and floats
Samuel Iglesias Gonsálvez [Tue, 12 Feb 2019 08:51:31 +0000 (09:51 +0100)]
util: add softfloat functions to operate with doubles and floats

Implemented fadd, fsub, fmul and ffma for doubles and ffma for floats,
rounding to zero, using a modified implementation from the Berkeley
SoftFloat 3e library.

Their implementation correctness has been checked with the Berkeley
TestFloat Release 3e tool for x86_64.

v2:
- Reuse util_last_bit64() in _mesa_count_leading_zeros64()
  implementation (Connor).

v3:
- Add a specific ffma for floats version (Connor).
- Implement the ffma for doubles version (Andres).
- Lots of fixes in fadd, fsub and fmul (Andres).
- Improved documentation (Andres).

v4:
- Added f64 -> f32 conversion function (Andres).
- Added f32 -> f16 RTZ conversion function (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Tested-by: Andres Gomez <agomez@igalia.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: add support for flushing to zero denorm constants
Samuel Iglesias Gonsálvez [Wed, 20 Jun 2018 07:11:14 +0000 (09:11 +0200)]
nir: add support for flushing to zero denorm constants

v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
  From Vulkan spec, VK_KHR_shader_float_controls section:

  "3) Do denorm and rounding mode controls apply to OpSpecConstantOp?

  RESOLVED: Yes, except when the opcode is OpQuantizeToF16."

v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analyze (Connor).

v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).

v5:
- Simplify constant_denorm_flush_to_zero (Caio).

v6:
- Adapt after API changes and to use the new constant
  constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
  away (Andres).

v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).
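
Flushing a denorm constant, with the fquantize2f16 exception from the v2 note, can be sketched on raw bit patterns as follows. Both functions and their parameters are hypothetical names for illustration, not the constant-folding code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns 'bits' with denorms replaced by a zero of the same sign for
 * a float of the given bit size (16, 32 or 64). */
static uint64_t flush_denorm_bits(uint64_t bits, unsigned bit_size)
{
    uint64_t sign_mask, exp_mask, man_mask;

    switch (bit_size) {
    case 16:
        sign_mask = 0x8000u;
        exp_mask = 0x7c00u;
        man_mask = 0x03ffu;
        break;
    case 32:
        sign_mask = 0x80000000u;
        exp_mask = 0x7f800000u;
        man_mask = 0x007fffffu;
        break;
    case 64:
        sign_mask = 0x8000000000000000ull;
        exp_mask = 0x7ff0000000000000ull;
        man_mask = 0x000fffffffffffffull;
        break;
    default:
        return bits;                 /* unknown size: leave untouched */
    }

    if ((bits & exp_mask) == 0 && (bits & man_mask) != 0)
        return bits & sign_mask;
    return bits;
}

/* Applies the flush to a folded constant unless flush-to-zero is off
 * or the opcode is the quantize-to-f16 exception. */
static uint64_t finish_constant(uint64_t bits, unsigned bit_size,
                                bool ftz_enabled, bool is_fquantize2f16)
{
    if (!ftz_enabled || is_fquantize2f16)
        return bits;
    return flush_denorm_bits(bits, bit_size);
}
```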

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: add auxiliary functions to detect if a mode is enabled
Samuel Iglesias Gonsálvez [Fri, 1 Feb 2019 10:23:28 +0000 (11:23 +0100)]
nir: add auxiliary functions to detect if a mode is enabled

v2:
- Added more functions.

v3:
- Simplify most of the functions (Caio).

v4:
- Updated to renamed enum values (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
5 years agospirv/nir: keep track of SPV_KHR_float_controls execution modes
Samuel Iglesias Gonsálvez [Thu, 31 May 2018 10:20:30 +0000 (12:20 +0200)]
spirv/nir: keep track of SPV_KHR_float_controls execution modes

v2:
- Add support for rounding modes for each floating point bit size.

v3:
- Commit e68871f6a44 ("spirv: Handle constants and types before
  execution modes") changed when the execution modes are handled,
  which affects the result of the floating point constants when the
  rounding mode is set in the execution mode. Moved the handling of
  the rounding modes before we handle the constants.

v4:
- Rename vtn_decoration "literals" to "operands" (Andres).
- Simplify execution mode parsing util function (Caio).
- Extend the comment about the timing of the handling of the rounding
  modes (Caio).

v5:
- Correct extension name (Caio).
- Rename shader info member (Andres).
- Rename float controls enum (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agospirv: check support for SPV_KHR_float_controls capabilities
Samuel Iglesias Gonsálvez [Thu, 31 May 2018 09:50:54 +0000 (11:50 +0200)]
spirv: check support for SPV_KHR_float_controls capabilities

v2:
- Correct extension name (Caio).
- Rename supported capabilities member (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agogallium/xlib: Fix glXMakeCurrent(dpy, None, None, ctx)
Adam Jackson [Tue, 10 Sep 2019 19:11:19 +0000 (15:11 -0400)]
gallium/xlib: Fix glXMakeCurrent(dpy, None, None, ctx)

This is entirely legal in GL 3.0+. I wonder how many more times I'll
need to fix this specific bug.