mesa.git
7 years ago st/mesa: destroy pipe_context before destroying st_context (v2)
Marek Olšák [Fri, 20 Jan 2017 01:26:42 +0000 (02:26 +0100)]
st/mesa: destroy pipe_context before destroying st_context (v2)

If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.

Firefox with WebGL2 enabled hits this bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456

v2: protect against a double destroy in st_create_context_priv and callers.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago nir: bump loop max unroll limit
Timothy Arceri [Wed, 18 Jan 2017 02:12:37 +0000 (13:12 +1100)]
nir: bump loop max unroll limit

The original number was chosen in an attempt to match the limits applied to
GLSL IR.

A look at the git history of why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.

No shader-db change; however, a future change that disables the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years ago glsl: lower constant arrays to uniform arrays before optimisation loop
Timothy Arceri [Tue, 24 Jan 2017 03:07:04 +0000 (14:07 +1100)]
glsl: lower constant arrays to uniform arrays before optimisation loop

Previously the constant array would not get copy propagated until the backend
did its GLSL IR opt loop. I plan on removing that from i965 shortly, which
would cause huge regressions in Deus Ex and Tomb Raider, which have large
constant arrays. Moving lowering before the opt loop in the GLSL linker
fixes this and unexpectedly improves some compute shaders also.

shader-db results BDW:

instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 204 -> 194 (-4.90%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1010 -> 741 (-26.63%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 542 -> 385 (-28.97%)

cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1831382 -> 1818492 (-0.70%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 216238 -> 206180 (-4.65%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 18484 -> 16644 (-9.95%)

total instructions in shared programs: 13060313 -> 13059877 (-0.00%)
instructions in affected programs: 1756 -> 1320 (-24.83%)
helped: 3
HURT: 0

total cycles in shared programs: 256586698 -> 256561910 (-0.01%)
cycles in affected programs: 2066104 -> 2041316 (-1.20%)
helped: 3
HURT: 0

V3: only call the opt loop if lowering progressed (Suggested by Eric)

V2: call opts before and after lowering (Suggested by Ken)

Reviewed-by: Eric Anholt <eric@anholt.net>
7 years ago mesa: Don't advertise GL_OES_read_format in core profile
Ian Romanick [Mon, 23 Jan 2017 17:57:15 +0000 (09:57 -0800)]
mesa: Don't advertise GL_OES_read_format in core profile

OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.

The functionality is also included in GL_ARB_ES2_compatibility.  Every
OpenGL core-profile driver currently exposes both extensions.  I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems.  No functionality
is removed.

I have left this extension in place for compatibility profile.  There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.

Three other alternatives considered:

 - Remove the string from compatibility profile drivers but leave the
   functionality in place.

 - Add a flag to expose the extension string, and set it in every OpenGL
   driver that does not expose GL_ARB_ES2_compatibility (and those
   drivers only).  I tried this.  You can't have two instances of an
   extension in the extension table (one dummy_true for ES1 and one with
   a flag for compatibility profile), so the implementation requires a
   bit of effort.

 - Only expose the extension in compatibility if the version is less
   than 2.0.  I didn't see an easy way to do this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
7 years ago docs: fix incorrect link to 12.0.6 release notes
Brian Paul [Tue, 24 Jan 2017 21:30:07 +0000 (14:30 -0700)]
docs: fix incorrect link to 12.0.6 release notes

Trivial.

7 years ago anv: Expose VK_KHR_maintenance1
Jason Ekstrand [Sat, 21 Jan 2017 01:46:33 +0000 (17:46 -0800)]
anv: Expose VK_KHR_maintenance1

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years ago anv: Return better errors from AllocateDescriptorSets
Jason Ekstrand [Sat, 21 Jan 2017 15:40:22 +0000 (07:40 -0800)]
anv: Return better errors from AllocateDescriptorSets

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years ago anv: Allow selecting the slice of a 3D image
Jason Ekstrand [Sat, 21 Jan 2017 03:47:18 +0000 (19:47 -0800)]
anv: Allow selecting the slice of a 3D image

As per VK_KHR_maintenance1, clients can render to a slice of a 3D image
by creating a VK_IMAGE_VIEW_TYPE_2D view of it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years ago anv: Report FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR
Jason Ekstrand [Sat, 21 Jan 2017 03:39:03 +0000 (19:39 -0800)]
anv: Report FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR

As of VK_KHR_maintenance1, these are supposed to be reported for any
formats on which we support transfer operations.  For us, this is
anything that we can texture from.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years ago anv: Add trivial support for TrimCommandPoolKHR
Jason Ekstrand [Sat, 21 Jan 2017 01:46:21 +0000 (17:46 -0800)]
anv: Add trivial support for TrimCommandPoolKHR

Our command buffers already efficiently use a global pool so trimming
doesn't really need to do anything.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years ago anv: Set viewport extents correctly when height is negative
Jason Ekstrand [Sat, 21 Jan 2017 01:30:51 +0000 (17:30 -0800)]
anv: Set viewport extents correctly when height is negative

As per VK_KHR_maintenance1, setting a negative height in the viewport
can be used to get flipped coordinates.  This is, apparently, very useful
when porting D3D apps to Vulkan.  All we need to do to support this is
to make sure we actually set the min and max correctly.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
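The min/max handling described in this commit can be sketched roughly as
follows. This is only an illustration and not the anv code: struct y_extents
and viewport_y_extents() are hypothetical names, and the real driver programs
hardware viewport state instead of returning a struct.

```c
#include <assert.h>

/* Hypothetical helper type: the vertical extents of a viewport. */
struct y_extents {
   float min;
   float max;
};

/* With VK_KHR_maintenance1 the height may be negative, so y + height can be
 * smaller than y.  Sort the two edges instead of assuming y is the minimum.
 */
static struct y_extents
viewport_y_extents(float y, float height)
{
   float y0 = y;
   float y1 = y + height;
   struct y_extents e;

   e.min = y0 < y1 ? y0 : y1;
   e.max = y0 < y1 ? y1 : y0;

   /* Sanity check: the extents must be ordered after sorting. */
   assert(e.min <= e.max);
   return e;
}
```

A flipped viewport (y = 600, height = -600) then yields the same extents as
the unflipped one (y = 0, height = 600).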
7 years ago vulkan: Don't install vk_platform.h or vulkan.h.
Matt Turner [Tue, 24 Jan 2017 00:48:01 +0000 (16:48 -0800)]
vulkan: Don't install vk_platform.h or vulkan.h.

These files belong to the vulkan loader.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years ago glsl: fix compile errors with mingw due to missing PRIx64 definitions
Roland Scheidegger [Mon, 23 Jan 2017 19:21:00 +0000 (20:21 +0100)]
glsl: fix compile errors with mingw due to missing PRIx64 definitions

define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).

Otherwise, some mingw builds error out (since
8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and
bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
   case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;

(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
                fprintf(f, "%a", ir->value.f[i]);
                                               ^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
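For reference, the pattern this fix relies on looks roughly like the sketch
below. It is not the ir_print_visitor.cpp code: format_u64() is a hypothetical
helper, and the example is plain C, where the define is harmless. The define
only matters for older C++ toolchains that hide the PRI* macros without it.

```c
/* Must come before the first include of <inttypes.h>; some older C++
 * toolchains only expose the PRI* macros when it is defined.
 */
#define __STDC_FORMAT_MACROS
#include <inttypes.h>

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a 64-bit unsigned value portably.  Returns the
 * number of characters snprintf would have written.
 */
static int
format_u64(char *buf, size_t size, uint64_t value)
{
   assert(buf && size > 0);
   /* "%" PRIu64 expands to the right conversion for uint64_t on this
    * platform, e.g. "%lu" or "%llu".
    */
   return snprintf(buf, size, "%" PRIu64, value);
}
```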
7 years ago gallivm: don't try to use fast rcp for fdiv
Roland Scheidegger [Mon, 23 Jan 2017 17:06:03 +0000 (18:06 +0100)]
gallivm: don't try to use fast rcp for fdiv

The use of fast rcp instruction is disabled, and will always fall back
to use a division instead (1 / x). Hence, if we get a division opcode,
it doesn't make much sense trying to split that into rcp/mul.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years ago gallivm: (trivial) fix ddiv cpu implementation
Roland Scheidegger [Mon, 23 Jan 2017 17:04:12 +0000 (18:04 +0100)]
gallivm: (trivial) fix ddiv cpu implementation

we can't use the cpu implementation of fdiv, as this one uses a different
lp_build_context, which causes an assertion failure.
Just use default fdiv action (there is no fast rcp for doubles which we
could potentially use anyway).

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years ago tgsi: implement ddiv opcode
Roland Scheidegger [Mon, 23 Jan 2017 17:10:44 +0000 (18:10 +0100)]
tgsi: implement ddiv opcode

softpipe (along with llvmpipe) claims to support arb_gpu_shader_fp64,
so we really need to support that opcode.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years ago i965/blorp: Use the correct ISL format for combined depth/stencil
Jason Ekstrand [Mon, 23 Jan 2017 18:53:13 +0000 (10:53 -0800)]
i965/blorp: Use the correct ISL format for combined depth/stencil

In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format.  In this case, we handle
stencil properly but we give blorp the wrong ISL format.  Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT, which is the wrong
size and was causing GPU hangs.

Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years ago st/glsl_to_tgsi: fix compilation warnings since int64 types
Samuel Pitoiset [Tue, 24 Jan 2017 10:50:08 +0000 (11:50 +0100)]
st/glsl_to_tgsi: fix compilation warnings since int64 types

state_tracker/st_glsl_to_tgsi.cpp:302:28: warning: ‘glsl_to_tgsi_instruction::tex_type’
is too small to hold all values of ‘enum glsl_base_type’
    glsl_base_type tex_type:4;

Fixes: 8ce53d4a2f3 ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago gallium/radeon: undef the very specific UPDATE_COUNTER macro
Samuel Pitoiset [Tue, 24 Jan 2017 10:11:59 +0000 (11:11 +0100)]
gallium/radeon: undef the very specific UPDATE_COUNTER macro

Also, wrap this into a do { ... } while (0). Suggested by Nicolai.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
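The two changes named in this commit (the do { ... } while (0) wrapper and
the #undef) follow a common C idiom, sketched below. The macro body, struct
counters and sample() are made up for illustration; the real UPDATE_COUNTER
in gallium/radeon looks different.

```c
#include <assert.h>

/* Hypothetical counters, for illustration only. */
struct counters {
   unsigned busy;
   unsigned idle;
};

/* Wrapping the body in do { ... } while (0) makes the macro expand to a
 * single statement, so it is safe inside an unbraced if/else.
 */
#define UPDATE_COUNTER(c, is_busy) \
   do {                            \
      if (is_busy)                 \
         (c)->busy++;              \
      else                         \
         (c)->idle++;              \
   } while (0)

static void
sample(struct counters *c, int enabled, int is_busy)
{
   assert(c);

   /* Without the wrapper, the trailing ';' here would end the if statement
    * and the following else would not compile.
    */
   if (enabled)
      UPDATE_COUNTER(c, is_busy);
   else
      c->idle = 0;
}

/* Keep the very specific macro name from leaking into later code. */
#undef UPDATE_COUNTER
```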
7 years ago i965/blorp: Add also depth and stencil buffers to render cache
Topi Pohjolainen [Thu, 19 Jan 2017 08:11:42 +0000 (10:11 +0200)]
i965/blorp: Add also depth and stencil buffers to render cache

v2 (Jason, Curro): Add stencil also even though it is not
                   enabled yet.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years ago gbm: Fix width height getters return type (trivial)
Ben Widawsky [Wed, 26 Oct 2016 19:23:32 +0000 (12:23 -0700)]
gbm: Fix width height getters return type (trivial)

v2: Other way round... to make consistent, make both return types have
the fixed width - uint32_t.

Cc: Daniel Stone <daniel@fooishbar.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
7 years ago gbm: Move getters to match order in header file (trivial)
Ben Widawsky [Wed, 19 Oct 2016 21:52:12 +0000 (14:52 -0700)]
gbm: Move getters to match order in header file (trivial)

Other things are out of order, but I need to add a getter so I'm just
fixing those.

This helps people adding to GBM know where the right place to put things
is.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
7 years ago docs: add news item and link release notes for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 02:13:42 +0000 (02:13 +0000)]
docs: add news item and link release notes for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years ago docs: use correct year for the 12.0.6 release notes
Emil Velikov [Tue, 24 Jan 2017 02:05:20 +0000 (02:05 +0000)]
docs: use correct year for the 12.0.6 release notes

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 13953f012dfc7f89dbb07f1eda856aa5353347cc)

7 years ago docs: add sha256 checksums for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 02:02:48 +0000 (02:02 +0000)]
docs: add sha256 checksums for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 36e3f2542d3cde1fe4f7ca0be83dc49d941cb988)

7 years ago docs: add release notes for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 01:32:02 +0000 (01:32 +0000)]
docs: add release notes for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 555885a0bf64d49bc6c31c0aaeb636c24ef61102)

7 years ago docs/releasing: remove stray "cd"
Emil Velikov [Tue, 24 Jan 2017 00:58:54 +0000 (00:58 +0000)]
docs/releasing: remove stray "cd"

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years ago nv50: add support for MUL_ZERO_WINS property
Ilia Mirkin [Sat, 14 Jan 2017 23:55:25 +0000 (18:55 -0500)]
nv50: add support for MUL_ZERO_WINS property

This is simply keyed off the vertex shader, as that's guaranteed to be
present in any pipeline.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years ago nvc0: add support for MUL_ZERO_WINS property
Ilia Mirkin [Sat, 14 Jan 2017 23:55:25 +0000 (18:55 -0500)]
nvc0: add support for MUL_ZERO_WINS property

This sets the dnz flag on all the relevant multiplication operations. At
emission time, this will only be supported by nvc0+, so nv50 will need a
different solution.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years ago st/nine: set the MUL_ZERO_WINS flag when supported
Ilia Mirkin [Sun, 15 Jan 2017 17:03:55 +0000 (12:03 -0500)]
st/nine: set the MUL_ZERO_WINS flag when supported

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years ago gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS
Ilia Mirkin [Tue, 17 Jan 2017 03:14:38 +0000 (22:14 -0500)]
gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years ago gallium: add TGSI_PROPERTY_MUL_ZERO_WINS
Ilia Mirkin [Sat, 14 Jan 2017 23:39:41 +0000 (18:39 -0500)]
gallium: add TGSI_PROPERTY_MUL_ZERO_WINS

This will be useful for proper D3D9 emulation, where this behavior is
expected by some shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
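As a rough illustration of the behavior this property asks for: as I
understand it, an FP32 multiply returns 0 whenever either operand is 0, so
0 * Inf gives 0 instead of NaN. The helper below is a hypothetical CPU-side
model of that rule and not code from any driver.

```c
#include <math.h>

/* Hypothetical model of a multiply where zero wins: if either operand is
 * zero (of either sign), the result is +0.0f, even when the other operand is
 * Inf or NaN.  Otherwise it is an ordinary IEEE multiply.
 */
static float
mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}
```

A plain IEEE multiply would return NaN for mul(0.0f, INFINITY), which is the
case D3D9-era shaders reportedly depend on being 0.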
7 years ago radeonsi: always set the TCL1_ACTION_ENA when invalidating L2
Marek Olšák [Fri, 20 Jan 2017 00:13:39 +0000 (01:13 +0100)]
radeonsi: always set the TCL1_ACTION_ENA when invalidating L2

Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago radeonsi: don't declare LDS in TES
Marek Olšák [Thu, 19 Jan 2017 23:08:35 +0000 (00:08 +0100)]
radeonsi: don't declare LDS in TES

not used since we started using the offchip tess ring

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago radeonsi: preload PS inputs only if KILL is used
Marek Olšák [Thu, 19 Jan 2017 12:58:50 +0000 (13:58 +0100)]
radeonsi: preload PS inputs only if KILL is used

so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.

Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout
Marek Olšák [Fri, 20 Jan 2017 16:57:38 +0000 (17:57 +0100)]
gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago winsys/amdgpu: drop all IBs if at least one was rejected within the context
Marek Olšák [Thu, 19 Jan 2017 19:44:49 +0000 (20:44 +0100)]
winsys/amdgpu: drop all IBs if at least one was rejected within the context

The corruption is inevitable and hangs are possible too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago winsys/amdgpu: report a rejected IB as a lost context
Marek Olšák [Thu, 19 Jan 2017 19:32:28 +0000 (20:32 +0100)]
winsys/amdgpu: report a rejected IB as a lost context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago vulkan: import latest registry for 1.0.39 extensions.
Dave Airlie [Mon, 23 Jan 2017 22:05:39 +0000 (08:05 +1000)]
vulkan: import latest registry for 1.0.39 extensions.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years ago vulkan: bump vulkan.h to 1.0.39 version
Dave Airlie [Mon, 23 Jan 2017 21:55:51 +0000 (07:55 +1000)]
vulkan: bump vulkan.h to 1.0.39 version

This introduces a bunch of new extension defines.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years ago radv: don't resubmit the same cs over and over while tracing
Grazvydas Ignotas [Mon, 23 Jan 2017 21:16:42 +0000 (23:16 +0200)]
radv: don't resubmit the same cs over and over while tracing

Fixes: 97dfff54 ("radv: Dump command buffer on hang.")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
7 years ago gallium/radeon: add HUD queries for monitoring some hw blocks
Samuel Pitoiset [Fri, 20 Jan 2017 18:21:12 +0000 (19:21 +0100)]
gallium/radeon: add HUD queries for monitoring some hw blocks

It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years ago gallium/radeon: refactor the GRBM counters path
Samuel Pitoiset [Fri, 20 Jan 2017 17:15:50 +0000 (18:15 +0100)]
gallium/radeon: refactor the GRBM counters path

This will allow exposing more queries in order to know which
blocks are busy/idle.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years ago swr: Align query results allocation
George Kyriazis [Wed, 18 Jan 2017 23:09:08 +0000 (17:09 -0600)]
swr: Align query results allocation

Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.

Fixes crash when compiling with clang.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
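A minimal sketch of the idea, assuming a 64-byte cache line and using C11
aligned_alloc() in place of whatever aligned-malloc wrapper swr actually
uses. struct query_result and query_result_create() are hypothetical
stand-ins for the real swr types.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical query result with a cache-line aligned member.  The struct
 * inherits the strictest alignment of its members, so plain malloc(), which
 * only guarantees fundamental alignment, is not sufficient for it.
 */
struct query_result {
   alignas(CACHE_LINE) uint64_t counters[8];
   uint64_t timestamp;
};

/* Allocate the whole struct with its required alignment.  Release the
 * result with free().
 */
static struct query_result *
query_result_create(void)
{
   /* aligned_alloc() expects the size to be a multiple of the alignment;
    * sizeof already is, because of the struct's own alignment.
    */
   static_assert(sizeof(struct query_result) % CACHE_LINE == 0,
                 "size must be a multiple of the alignment");
   return aligned_alloc(alignof(struct query_result),
                        sizeof(struct query_result));
}
```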
7 years ago swr: Prune empty nodes in CalculateProcessorTopology.
Bruce Cherniak [Thu, 19 Jan 2017 21:44:52 +0000 (15:44 -0600)]
swr: Prune empty nodes in CalculateProcessorTopology.

CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes.  There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating an empty "0" node and causing a
crash in CreateThreadPool.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
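The pruning step can be pictured as a simple in-place compaction. The sketch
below is an assumption about the general shape of such a fix, not the swr
implementation: struct node and prune_empty_nodes() are hypothetical, and a
node is reduced to just its core count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified NUMA node: only the number of cores matters. */
struct node {
   unsigned num_cores;
};

/* Compact the array in place, dropping nodes that have no cores, and return
 * the new node count.  The order of the remaining nodes is preserved.
 */
static size_t
prune_empty_nodes(struct node *nodes, size_t count)
{
   assert(nodes != NULL || count == 0);

   size_t kept = 0;
   for (size_t i = 0; i < count; i++) {
      if (nodes[i].num_cores > 0)
         nodes[kept++] = nodes[i];
   }
   return kept;
}
```

With "physical id" values starting at 1, node 0 is empty and is dropped, so
later code never sees a node without cores.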
7 years ago i965: Use UNUSED to silence unused variable (used in assert).
Matt Turner [Mon, 23 Jan 2017 18:50:20 +0000 (10:50 -0800)]
i965: Use UNUSED to silence unused variable (used in assert).

7 years ago dri: allow 16bit R/GR images to be exported via drm buffers
Rainer Hochecker [Thu, 5 Jan 2017 15:58:56 +0000 (16:58 +0100)]
dri: allow 16bit R/GR images to be exported via drm buffers

This allows eglCreateImageKHR to access P010 surfaces created by vaapi

Signed-off-by: Rainer Hochecker <fernetmenta@online.de>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
7 years ago st/va: make sure that we call begin_frame() only once v2
Christian König [Thu, 19 Jan 2017 12:44:34 +0000 (13:44 +0100)]
st/va: make sure that we call begin_frame() only once v2

This fixes "st/va: delay calling begin_frame until we have all parameters".

v2: call begin frame after decoder (re)creation as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
7 years ago drirc: remove spurious tabs
Eric Engestrom [Thu, 5 Jan 2017 21:06:35 +0000 (21:06 +0000)]
drirc: remove spurious tabs

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago st/glsl_to_tgsi: use DDIV instead of DRCP + DMUL
Nicolai Hähnle [Mon, 16 Jan 2017 15:43:54 +0000 (16:43 +0100)]
st/glsl_to_tgsi: use DDIV instead of DRCP + DMUL

Fixes GL45-CTS.gpu_shader_fp64.built_in_functions.

v2: use DDIV unconditionally (Roland)

Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
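One plausible reason a precision test would object to DRCP + DMUL is that a
reciprocal followed by a multiply rounds twice, while a true division rounds
once. The sketch below shows the effect on the CPU with IEEE 754 doubles. It
is only an analogy for the shader opcodes, and both function names are
hypothetical.

```c
/* Single correctly rounded division, analogous to DDIV. */
static double
div_exact(double a, double b)
{
   return a / b;
}

/* Reciprocal followed by multiply, analogous to DRCP + DMUL.  Both steps
 * round, so the result can be off by one ulp.  The volatile forces the
 * reciprocal to be rounded to double before the multiply.
 */
static double
div_via_rcp(double a, double b)
{
   volatile double rcp = 1.0 / b;
   return a * rcp;
}
```

For example, 49.0 / 49.0 is exactly 1.0, while 49.0 * (1.0 / 49.0) is not,
because 1.0 / 49.0 is not representable and the rounding error survives the
multiply.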
7 years ago glsl: split DIV_TO_MUL_RCP into single- and double-precision flags
Nicolai Hähnle [Mon, 16 Jan 2017 15:39:06 +0000 (16:39 +0100)]
glsl: split DIV_TO_MUL_RCP into single- and double-precision flags

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years ago r600: implement DDIV
Nicolai Hähnle [Thu, 19 Jan 2017 13:44:57 +0000 (14:44 +0100)]
r600: implement DDIV

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years ago r600: factor out cayman_emit_unary_double_raw
Nicolai Hähnle [Thu, 19 Jan 2017 13:44:24 +0000 (14:44 +0100)]
r600: factor out cayman_emit_unary_double_raw

We will use it for DDIV.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years ago r600: double multiply can handle only one multiply at a time
Nicolai Hähnle [Thu, 19 Jan 2017 13:38:54 +0000 (14:38 +0100)]
r600: double multiply can handle only one multiply at a time

It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years ago glsl: fix tes linking regression
Timothy Arceri [Mon, 23 Jan 2017 07:06:37 +0000 (18:06 +1100)]
glsl: fix tes linking regression

Fixes regression caused by cbeba6bd48da2c. I accidentally pushed the
wrong version of the patch.

7 years ago mesa: remove unused gl_shader_info field from gl_linked_shader
Timothy Arceri [Tue, 22 Nov 2016 13:05:01 +0000 (00:05 +1100)]
mesa: remove unused gl_shader_info field from gl_linked_shader

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago mesa/glsl: set and get cs layouts to and from shader_info
Timothy Arceri [Tue, 22 Nov 2016 12:31:08 +0000 (23:31 +1100)]
mesa/glsl: set and get cs layouts to and from shader_info

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago mesa/glsl: set and get gs layouts directly to and from shader_info
Timothy Arceri [Tue, 22 Nov 2016 10:45:16 +0000 (21:45 +1100)]
mesa/glsl: set and get gs layouts directly to and from shader_info

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago mesa/glsl/i965: set and get tes layouts directly to and from shader_info
Timothy Arceri [Tue, 22 Nov 2016 10:14:14 +0000 (21:14 +1100)]
mesa/glsl/i965: set and get tes layouts directly to and from shader_info

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago glsl: use last_vert_prog to get last {clip,cull}_distance_array_size
Timothy Arceri [Sun, 20 Nov 2016 11:23:17 +0000 (22:23 +1100)]
glsl: use last_vert_prog to get last {clip,cull}_distance_array_size

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago mesa/glsl: set {clip,cull}_distance_array_size directly in gl_program
Timothy Arceri [Sun, 20 Nov 2016 12:05:42 +0000 (23:05 +1100)]
mesa/glsl: set {clip,cull}_distance_array_size directly in gl_program

There are some line wrapping violations here but those lines will get
deleted in the following patch.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago st/mesa/glsl: change xfb_program field to last_vert_prog
Timothy Arceri [Sun, 20 Nov 2016 10:44:29 +0000 (21:44 +1100)]
st/mesa/glsl: change xfb_program field to last_vert_prog

Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.

The new field will be used in a following patch for another
clean-up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years ago mesa: use gl_program for CurrentProgram rather than gl_shader_program
Timothy Arceri [Mon, 31 Oct 2016 11:39:17 +0000 (22:39 +1100)]
mesa: use gl_program for CurrentProgram rather than gl_shader_program

This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.

Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.

Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the information about the current program in use which is still
valid.

V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>

To avoid build regressions the following 2 patches were squashed into
this commit:

mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()

These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>

mesa: update active relinked program

This likely fixes a subroutine bug where
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years ago freedreno/a5xx: set frag shader threadsize
Rob Clark [Sun, 22 Jan 2017 18:38:43 +0000 (13:38 -0500)]
freedreno/a5xx: set frag shader threadsize

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/a5xx: set fragcoordxy properly
Rob Clark [Sun, 22 Jan 2017 17:23:27 +0000 (12:23 -0500)]
freedreno/a5xx: set fragcoordxy properly

What a3xx docs call IJPERSPCENTERREGID.. the xy coord passed into
bary.f.  We were incorrectly setting both this and gl_FragCoord.xy to
the same register resulting in all sorts of hilarity.

Fixes stk, vdrift, 0ad, probably a bunch of others.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/ir3: setup var locations in standalone compiler
Rob Clark [Thu, 19 Jan 2017 18:27:10 +0000 (13:27 -0500)]
freedreno/ir3: setup var locations in standalone compiler

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years ago freedreno/a5xx: fix psize
Rob Clark [Mon, 16 Jan 2017 19:02:54 +0000 (14:02 -0500)]
freedreno/a5xx: fix psize

Note spritelist (POINTLIST_PSIZE) seems not to be a thing anymore on
a5xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/a5xx: srgb fix
Rob Clark [Sun, 15 Jan 2017 18:19:47 +0000 (13:19 -0500)]
freedreno/a5xx: srgb fix

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/a5xx: fix int vbos
Rob Clark [Sun, 15 Jan 2017 13:43:44 +0000 (08:43 -0500)]
freedreno/a5xx: fix int vbos

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/a5xx: fix clear for uint/sint formats
Rob Clark [Sat, 14 Jan 2017 12:59:42 +0000 (07:59 -0500)]
freedreno/a5xx: fix clear for uint/sint formats

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno/a5xx: fix cull state
Rob Clark [Wed, 11 Jan 2017 16:31:40 +0000 (11:31 -0500)]
freedreno/a5xx: fix cull state

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago freedreno: update generated headers
Rob Clark [Wed, 11 Jan 2017 16:30:21 +0000 (11:30 -0500)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years ago anv: descriptors: don't update immutable samplers with anything but their immutable...
Lionel Landwerlin [Wed, 18 Jan 2017 12:00:49 +0000 (12:00 +0000)]
anv: descriptors: don't update immutable samplers with anything but their immutable value

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years ago nir/search: Use the correct bit size for integer comparisons
Jason Ekstrand [Thu, 19 Jan 2017 19:28:31 +0000 (11:28 -0800)]
nir/search: Use the correct bit size for integer comparisons

The previous code always compared integers as 64-bit.  Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search didn't always match the constants it should have.
Instead, 32-bit values should be matched as 32-bit and 64-bit values
should be matched as 64-bit.  While we're here we unify the unsigned and
signed paths.  Now that we're using the right bit size, they should be
the same since the only difference we had before was sign extension.

This gets the UE4 bitfield_extract optimization working again.  It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
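
To make the failure mode above concrete, here is a minimal Python sketch.
It is not nir_search's code: the helper names are hypothetical, and it
assumes one side of the comparison was sign-extended to 64 bits while the
other was zero-extended, which is one way the mismatch described above
can arise.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def matches_as_64bit(pattern_const, shader_const):
    # Hypothetical "old" behaviour: the pattern constant has been
    # sign-extended from 32 bits, the shader constant zero-extended,
    # and the two are compared as full 64-bit values.
    return (sign_extend(pattern_const, 32) & MASK64) == (shader_const & MASK64)


def matches_at_bit_size(pattern_const, shader_const, bits):
    # Compare only the bits that belong to the value's real bit size.
    mask = (1 << bits) - 1
    return (pattern_const & mask) == (shader_const & mask)


# The constant from the commit message has bit 31 set, so sign extension
# changes its upper 32 bits.
widened = sign_extend(0xFF00FF00, 32) & MASK64
```

With these helpers, matches_as_64bit(0xff00ff00, 0xff00ff00) is false,
while matches_at_bit_size(0xff00ff00, 0xff00ff00, 32) is true.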
7 years agointel/blorp/copy: Properly handle clear colors for CCS_E images
Jason Ekstrand [Fri, 20 Jan 2017 20:27:34 +0000 (12:27 -0800)]
intel/blorp/copy: Properly handle clear colors for CCS_E images

In order to handle CCS_E, we stomp the image format to a UINT format and
then do some bitcasting logic in the shader.  This works fine since SKL
render compression only considers the channel layout of the format and
not the format itself.  In order for this to work on images that have
been fast-cleared, we need to also convert the clear color so that, when
interpreted as UINT, it provides the same bit value as it would have in
the original format.  This fixes a bunch of OpenGL ES CTS tests for
copy_image when we start using CCS more aggressively.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
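
The clear-color conversion described above can be pictured with a small
Python sketch. It only covers two cases chosen for illustration (a 32-bit
float channel and an 8-bit UNORM channel), the function names are
hypothetical, and it is not blorp's implementation. The point is that the
value written for the UINT view must have the same bits the original
format would have stored.

```python
import struct


def float32_clear_as_uint(value):
    """Bit pattern of a 32-bit float channel, read back as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def unorm8_clear_as_uint(value):
    """Integer stored for an 8-bit UNORM channel holding `value` in [0, 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


# A clear color of 1.0 needs a different UINT value depending on the
# format the image was originally created with.
as_float_bits = float32_clear_as_uint(1.0)
as_unorm_bits = unorm8_clear_as_uint(1.0)
```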
7 years agoglsl: Rename [u]int64_t tokens.
Kenneth Graunke [Sat, 21 Jan 2017 03:14:45 +0000 (19:14 -0800)]
glsl: Rename [u]int64_t tokens.

basetsd.h on Windows defines INT64 and UINT64 typedefs which conflict
with these.  Append "_TOK" to avoid conflicts.

Should fix the Windows build.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoRevert "i965: Really don't emit Q or UQ moves on Gen < 8"
Matt Turner [Sat, 21 Jan 2017 03:07:04 +0000 (19:07 -0800)]
Revert "i965: Really don't emit Q or UQ moves on Gen < 8"

This reverts commit c95380c4044237d73fb537511667c3c8f658fcee.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Select DF type for 64-bit integers on Gen < 8.
Matt Turner [Sat, 21 Jan 2017 03:03:21 +0000 (19:03 -0800)]
i965: Select DF type for 64-bit integers on Gen < 8.

Gen8 adds Q/UQ types. We attempted to change the types back to DF in the
generator (commit c95380c40), but an assertion added in the FP64 series
(commit e481dcc3) triggers before that code has a chance to execute.

In fact, using Q/UQ in the IR and then changing to DF in the generator
would not work in the presence of source modifiers, etc.

Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
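
The selection rule in the subject line can be written down as a short
Python sketch. The function name and the string type names are
hypothetical stand-ins for the driver's C code and register type enums;
only the rule itself (Q/UQ on Gen8+, DF before that) comes from the
commit message.

```python
def reg_type_for_int64(gen, is_signed):
    """Pick a register type for a 64-bit integer value.

    Gen8 and later have native Q (signed) and UQ (unsigned) types.
    Earlier generations do not, so the 64-bit value is carried in the
    DF type instead.
    """
    if gen >= 8:
        return "Q" if is_signed else "UQ"
    return "DF"
```

Choosing the type when the IR is built, rather than rewriting it later in
the generator, means every later pass already sees the type that will be
emitted.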
7 years agoi965: Enable ARB_gpu_shader_int64 on Gen8+
Ian Romanick [Thu, 1 Sep 2016 19:06:09 +0000 (12:06 -0700)]
i965: Enable ARB_gpu_shader_int64 on Gen8+

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Split SIMD16 CMP of Q and UQ instructions
Ian Romanick [Tue, 25 Oct 2016 03:24:56 +0000 (20:24 -0700)]
i965: Split SIMD16 CMP of Q and UQ instructions

This is basically the same as happens for doubles.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Enable 64-bit integer support for almost all unary and binary operations
Ian Romanick [Sat, 3 Sep 2016 01:51:26 +0000 (18:51 -0700)]
i965: Enable 64-bit integer support for almost all unary and binary operations

Integer comparison functions (e.g., nir_op_ilt) are handled in the next
commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Enable uploading 64-bit integer uniforms
Ian Romanick [Sat, 3 Sep 2016 01:50:49 +0000 (18:50 -0700)]
i965: Enable uploading 64-bit integer uniforms

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Add 64-bit integer support for conversions and bitcasts
Ian Romanick [Sat, 3 Sep 2016 01:49:20 +0000 (18:49 -0700)]
i965: Add 64-bit integer support for conversions and bitcasts

v2 (idr): Make the "from" type in a cast unsized.  This reduces the
number of required cast operations at the expense of slightly more
complex code.  However, this will be a dramatic improvement when other
sized integer types are added.  Suggested by Connor.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Enable emitting Q and UQ instructions in the fs backend
Ian Romanick [Thu, 1 Sep 2016 19:00:10 +0000 (12:00 -0700)]
i965: Enable emitting Q and UQ instructions in the fs backend

v2: Fixup assertion in brw_reg_type_to_hw_type to allow
BRW_REGISTER_TYPE_{UQ,Q} on Gen8+.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Add support for constant evaluation on Q and UQ types
Ian Romanick [Thu, 1 Sep 2016 18:48:26 +0000 (11:48 -0700)]
i965: Add support for constant evaluation on Q and UQ types

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Return Q and UQ types for int64 and uint64
Ian Romanick [Thu, 1 Sep 2016 18:45:22 +0000 (11:45 -0700)]
i965: Return Q and UQ types for int64 and uint64

It seems like maybe this should return a different type based on Gen.  Q
and UQ only exist on Gen8+, but, based on the old comment, I believe
previous Gens can generate 64-bit moves.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Really don't emit Q or UQ moves on Gen < 8
Ian Romanick [Mon, 24 Oct 2016 06:47:14 +0000 (23:47 -0700)]
i965: Really don't emit Q or UQ moves on Gen < 8

It's much easier to do this in the generator rather than while coming
out of NIR.  brw_type_for_nir_type doesn't know the Gen, so we'd have to
add a bunch of plumbing.  The alternate fix is to not emit int64 moves
for doubles in the first place... but that seems even more difficult.

This change won't catch non-MOV instructions that try to use 64-bit
integer types on Gen < 8.  This may convert certain kinds of bugs into
different kinds of bugs that are more difficult to detect (since the
assertions in the function won't catch them).

NOTE: I don't think anything can emit mixed-type 64-bit moves until the
same platform supports both ARB_gpu_shader_fp64 and
ARB_gpu_shader_int64.  When we enable int64 on Gen < 8, we can solve
this problem other ways.

This prevents regressions on HSW in the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agonir: Add support for 64-bit integer types to split_var_copies_block
Ian Romanick [Wed, 7 Sep 2016 21:43:30 +0000 (14:43 -0700)]
nir: Add support for 64-bit integer types to split_var_copies_block

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir: Enable 64-bit integer support for almost all unary and binary operations
Ian Romanick [Sat, 3 Sep 2016 01:46:55 +0000 (18:46 -0700)]
nir: Enable 64-bit integer support for almost all unary and binary operations

v2: Don't up-convert the shift count parameter of shift instructions.
Suggested by Connor.  Add type_is_singed() function.  This will make
adding 8- and 16-bit types easier.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
7 years agonir: Shift count for shift opcodes is always 32-bits
Ian Romanick [Thu, 27 Oct 2016 10:02:58 +0000 (03:02 -0700)]
nir: Shift count for shift opcodes is always 32-bits

Previously both sources were unsized.  This caused problems when the
thing being shifted was 64-bit but the shift count was 32-bit.  The
expectation in NIR is that all unsized sources (and destination) will
ultimately have the same size.

The changes in nir_opt_algebraic.py are to prevent errors like:

    Failed to parse transformation:
      (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
    Traceback (most recent call last):
      File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
        xform = SearchAndReplace(xform)
      File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
        BitSizeValidator(varset).validate(self.search, self.replace)
      File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
        validate_dst_class = self._validate_bit_class_up(replace)
      File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
        src_class = self._validate_bit_class_up(val.sources[i])
      File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
        assert src_class == src_type_bits
    AssertionError

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
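
The sizing rule discussed above (all unsized sources must end up with the
same bit size, while a sized source must match its declared size) can be
sketched in Python. This is a simplified model for illustration, not the
BitSizeValidator from nir_algebraic.py: it ignores destinations, and the
names UNSIZED, OLD_ISHL, NEW_ISHL and sources_are_consistent are
hypothetical.

```python
UNSIZED = None  # marker for a source whose size is inferred from context


def sources_are_consistent(src_types, src_bits):
    """Check actual source bit sizes against an opcode's source types.

    src_types: per source, a required bit size or UNSIZED.
    src_bits:  per source, the bit size actually supplied.
    """
    unsized_sizes = set()
    for required, actual in zip(src_types, src_bits):
        if required is UNSIZED:
            unsized_sizes.add(actual)
        elif required != actual:
            return False
    # Every unsized source has to agree on a single bit size.
    return len(unsized_sizes) <= 1


OLD_ISHL = (UNSIZED, UNSIZED)  # both sources unsized, as before this commit
NEW_ISHL = (UNSIZED, 32)       # shift count is always 32 bits
```

Under the old signature a 64-bit value shifted by a 32-bit count is
rejected, because the two unsized sources disagree. Under the new one it
is accepted, and a 64-bit shift count is what gets rejected.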
7 years agonir: Lower packing and unpacking of 64-bit integer types
Ian Romanick [Fri, 2 Sep 2016 15:09:53 +0000 (08:09 -0700)]
nir: Lower packing and unpacking of 64-bit integer types

This change makes me wonder whether double packing should be
reimplemented as int64BitsToDouble(packInt2x32(v)).  I'm a little on the
fence since not all platforms that support fp64 natively support int64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
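
For readers unfamiliar with the operations being lowered, here is a small
Python model of packing two 32-bit words into a 64-bit integer and back,
plus the int64BitsToDouble(packInt2x32(v)) idea floated above. It follows
the GLSL convention that the first component holds the 32 least
significant bits. The Python function names are hypothetical and this is
not the NIR lowering code.

```python
import struct

MASK32 = 0xFFFFFFFF


def pack_2x32(lo, hi):
    """Combine two 32-bit words; `lo` supplies the least significant bits."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack_2x32(value):
    """Split a 64-bit integer into its (lo, hi) 32-bit words."""
    return value & MASK32, (value >> 32) & MASK32


def bits_to_double(value):
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", value))[0]


# Packing a double from two words expressed as pack + bitcast:
one = bits_to_double(pack_2x32(0x00000000, 0x3FF00000))
```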
7 years agonir: Add 64-bit integer support for conversions and bitcasts
Ian Romanick [Thu, 1 Sep 2016 22:21:04 +0000 (15:21 -0700)]
nir: Add 64-bit integer support for conversions and bitcasts

v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

v3 (idr): Make the "from" type in a cast unsized.  This reduces the
number of required cast operations at the expense of slightly more
complex code.  However, this will be a dramatic improvement when other
sized integer types are added.  Suggested by Connor.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir: Add 64-bit integer constant support
Ian Romanick [Thu, 1 Sep 2016 21:17:49 +0000 (14:17 -0700)]
nir: Add 64-bit integer constant support

v2: Rebase on 19a541f (nir: Get rid of nir_constant_data)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
7 years agonir: Add GLSL_TYPE_INT64 and GLSL_TYPE_UINT64 to glsl_get_bit_size
Ian Romanick [Thu, 1 Sep 2016 21:11:32 +0000 (14:11 -0700)]
nir: Add GLSL_TYPE_INT64 and GLSL_TYPE_UINT64 to glsl_get_bit_size

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agoglsl: Optimize redundant pack(unpack()) and unpack(pack()) combinations
Ian Romanick [Wed, 12 Oct 2016 22:31:22 +0000 (15:31 -0700)]
glsl: Optimize redundant pack(unpack()) and unpack(pack()) combinations

The lowering passes for 64-bit integer operations will generate a lot of
these.

v2: Modify the HANDLE_PACK_UNPACK_INVERSE so that the breaks apply to
the switch instead of the 'do { } while(true)' loop.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
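
The kind of simplification named in the subject line can be sketched on a
toy expression tree in Python. Expressions are modelled as (opcode,
operand) tuples, the opcode strings are illustrative, and simplify() is a
hypothetical helper. The real pass works on GLSL IR and is not shown in
this log.

```python
# Each opcode maps to the opcode that undoes it.
INVERSE = {
    "pack_int_2x32": "unpack_int_2x32",
    "unpack_int_2x32": "pack_int_2x32",
    "pack_uint_2x32": "unpack_uint_2x32",
    "unpack_uint_2x32": "pack_uint_2x32",
}


def simplify(expr):
    """Remove pack(unpack(x)) and unpack(pack(x)) pairs from a unary tree."""
    if not isinstance(expr, tuple):
        return expr  # a leaf, e.g. a variable name
    opcode, operand = expr[0], simplify(expr[1])
    if isinstance(operand, tuple) and INVERSE.get(opcode) == operand[0]:
        # The outer operation exactly undoes the inner one.
        return operand[1]
    return (opcode, operand)


redundant = ("pack_int_2x32", ("unpack_int_2x32", "x"))
```

Because the operand is simplified first, chains such as
pack(unpack(pack(unpack(x)))) collapse all the way down to x.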
7 years agoglsl: Add a lowering pass for 64-bit integer modulus
Ian Romanick [Tue, 18 Oct 2016 23:47:14 +0000 (16:47 -0700)]
glsl: Add a lowering pass for 64-bit integer modulus

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoglsl: Add "built-in" functions to do 64%64 => 64 modulus
Ian Romanick [Tue, 18 Oct 2016 23:46:35 +0000 (16:46 -0700)]
glsl: Add "built-in" functions to do 64%64 => 64 modulus

These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

v2: Use function inlining.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoglsl: Add a lowering pass for 64-bit integer division
Ian Romanick [Tue, 18 Oct 2016 00:55:18 +0000 (17:55 -0700)]
glsl: Add a lowering pass for 64-bit integer division

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoglsl: Add "built-in" functions to do 64/64 => 64 division
Ian Romanick [Tue, 18 Oct 2016 00:54:40 +0000 (17:54 -0700)]
glsl: Add "built-in" functions to do 64/64 => 64 division

These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

v2: Use function inlining.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
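
The commit message does not show the functions themselves. As a rough
picture of what a 64/64 => 64 division has to do when no native divide is
available, here is a shift-and-subtract (restoring) division in Python.
This is a textbook algorithm used purely for illustration; it is an
assumption, not necessarily the method these built-ins use, and
udivmod64 is a hypothetical name. It handles unsigned operands only.

```python
MASK64 = (1 << 64) - 1


def udivmod64(numerator, denominator):
    """Unsigned 64-bit division, one quotient bit per iteration."""
    numerator &= MASK64
    denominator &= MASK64
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(63, -1, -1):
        # Bring down the next numerator bit, most significant first.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder
```

The same loop yields the remainder, so a 64%64 => 64 modulus can reuse it
by returning the second element instead of the first.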
7 years agoglsl: Add a lowering pass for 64-bit integer sign()
Ian Romanick [Mon, 17 Oct 2016 20:55:54 +0000 (13:55 -0700)]
glsl: Add a lowering pass for 64-bit integer sign()

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>