mesa.git
6 years agost/mesa, gallium: add a workaround for No Mans Sky
Timothy Arceri [Wed, 29 Aug 2018 05:48:47 +0000 (15:48 +1000)]
st/mesa, gallium: add a workaround for No Mans Sky

The spec seems clear this is not allowed but the Nvidia binary
forces apps to add layout qualifiers so this works around the
issue for No Mans Sky until the CTS can be sorted out.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoglsl: add a mechanism to allow layout qualifiers on function params
Timothy Arceri [Wed, 29 Aug 2018 05:48:46 +0000 (15:48 +1000)]
glsl: add a mechanism to allow layout qualifiers on function params

The spec is quite clear this is not allowed:

    From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:

       "Layout qualifiers can appear in several forms of declaration.
       They can appear as part of an interface block definition or
       block member, as shown in the grammar in the previous section.
       They can also appear with just an interface-qualifier to establish
       layouts of other declarations made with that qualifier:

          layout-qualifier interface-qualifier ;

       Or, they can appear with an individual variable declared with
       an interface qualifier:

          layout-qualifier interface-qualifier declaration ;"

    From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:

       "Layout qualifiers cannot be used on formal function parameters,
       and layout qualification is not included in parameter matching."

However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.

I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoglsl: skip stringification in preprocessor if in unreachable branch
Timothy Arceri [Wed, 29 Aug 2018 01:36:51 +0000 (11:36 +1000)]
glsl: skip stringification in preprocessor if in unreachable branch

This fixes compilation of some "No Mans Sky" shaders where the stringification
happens in branches intended for DX12.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoradv: Add missing checks in radv_get_image_format_properties.
Bas Nieuwenhuizen [Wed, 29 Aug 2018 15:04:25 +0000 (17:04 +0200)]
radv: Add missing checks in radv_get_image_format_properties.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agogallivm: allow to pass two swizzles into fetches.
Dave Airlie [Mon, 27 Aug 2018 01:03:41 +0000 (02:03 +0100)]
gallivm: allow to pass two swizzles into fetches.

This hijacks the top 16-bits of swizzle, to pass in the swizzle
for the second channel.

This fixes handling .yx swizzles of 64-bit values.

This should fixup radeonsi and llvmpipe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: enable radeonsi_zerovram for No Mans Sky
Timothy Arceri [Fri, 24 Aug 2018 11:06:19 +0000 (21:06 +1000)]
radeonsi: enable radeonsi_zerovram for No Mans Sky

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: add radeonsi_zerovram driconfig option
Timothy Arceri [Fri, 24 Aug 2018 11:06:18 +0000 (21:06 +1000)]
radeonsi: add radeonsi_zerovram driconfig option

More and more games seem to require this so lets make it a config
option.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: enable GL 4.5 in compat profile
Timothy Arceri [Fri, 24 Aug 2018 11:06:17 +0000 (21:06 +1000)]
radeonsi: enable GL 4.5 in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: enable ARB_direct_state_access in compat for GL3.1+
Timothy Arceri [Wed, 29 Aug 2018 02:40:12 +0000 (12:40 +1000)]
mesa: enable ARB_direct_state_access in compat for GL3.1+

We could enable it for lower versions of GL but this allows us
to just use the existing version/extension checks that are already
used by the core profile.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: add a thorough clear/copy_buffer benchmark
Marek Olšák [Thu, 2 Aug 2018 22:15:48 +0000 (18:15 -0400)]
radeonsi: add a thorough clear/copy_buffer benchmark

6 years agoradeonsi: let internal compute dispatches tune WAVES_PER_SH
Marek Olšák [Thu, 2 Aug 2018 20:37:17 +0000 (16:37 -0400)]
radeonsi: let internal compute dispatches tune WAVES_PER_SH

6 years agoradeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI
Marek Olšák [Wed, 25 Jul 2018 05:37:21 +0000 (01:37 -0400)]
radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI

6 years agoradeonsi: add SI_QUERY_TIME_ELAPSED_SDMA_SI for measuring DMA on SI
Marek Olšák [Tue, 21 Aug 2018 04:46:53 +0000 (00:46 -0400)]
radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA_SI for measuring DMA on SI

DMA on SI doesn't support the timestamp packet, so it's emulated.

6 years agoradeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance
Marek Olšák [Tue, 24 Jul 2018 17:14:29 +0000 (13:14 -0400)]
radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance

6 years agoradeonsi: add flag L2_STREAM for minimal cache usage
Marek Olšák [Fri, 3 Aug 2018 00:33:06 +0000 (20:33 -0400)]
radeonsi: add flag L2_STREAM for minimal cache usage

6 years agogallium: add TGSI_MEMORY_STREAM_CACHE_POLICY
Marek Olšák [Wed, 25 Jul 2018 04:41:48 +0000 (00:41 -0400)]
gallium: add TGSI_MEMORY_STREAM_CACHE_POLICY

For internal radeonsi shaders.

6 years agointel/compiler: Remove surface_idx from brw_image_param
Jason Ekstrand [Fri, 17 Aug 2018 14:15:56 +0000 (09:15 -0500)]
intel/compiler: Remove surface_idx from brw_image_param

Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: Use TXS for image_size when we have a typed surface
Jason Ekstrand [Thu, 16 Aug 2018 16:01:24 +0000 (11:01 -0500)]
intel: Use TXS for image_size when we have a typed surface

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoanv,i965: Lower away image derefs in the driver
Jason Ekstrand [Thu, 16 Aug 2018 21:23:10 +0000 (16:23 -0500)]
anv,i965: Lower away image derefs in the driver

Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params.  As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index.  There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.

This also has the side-effect of making most image use compile-time
binding table indices.  Previously, all image access pulled the binding
table index from a uniform.  Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart.  Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
    instructions in affected programs: 115834 -> 113255 (-2.23%)
    helped: 191
    HURT: 0

    total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
    cycles in affected programs: 4757115 -> 4642085 (-2.42%)
    helped: 73
    HURT: 67

    total spills in shared programs: 10951 -> 10926 (-0.23%)
    spills in affected programs: 742 -> 717 (-3.37%)
    helped: 7
    HURT: 0

    total fills in shared programs: 22226 -> 22201 (-0.11%)
    fills in affected programs: 1146 -> 1121 (-2.18%)
    helped: 7
    HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir: Add handle/index-based image intrinsics
Jason Ekstrand [Thu, 16 Aug 2018 20:11:44 +0000 (15:11 -0500)]
nir: Add handle/index-based image intrinsics

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir: Use a bitfield for image access qualifiers
Jason Ekstrand [Thu, 16 Aug 2018 20:11:12 +0000 (15:11 -0500)]
nir: Use a bitfield for image access qualifiers

This commit expands the current memory access enum to contain the extra
two bits provided for images.  We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoglsl/link,i965: Make ImageAccess four-state
Jason Ekstrand [Thu, 16 Aug 2018 19:31:28 +0000 (14:31 -0500)]
glsl/link,i965: Make ImageAccess four-state

The GLSL spec allows you to set both the "readonly" and "writeonly"
qualifiers on images to indicate that it can only be used with
imageSize.  However, we had no way of representing this int he linked
shader and flagged it as GL_READ_ONLY.  This is good from a "does it use
this buffer?" perspective but not from a format and access lowering
perspective.  By using GL_NONE for if "readonly" and "writeonly" are
both set, we can detect this case in the driver and handle it correctly.

Nothing currently relies on the type of surface in the "readonly" +
"writeonly" case but that's about to change.  i965 is the only drier
which uses the ImageAccess field and gl_bindless_image::access is
currently unused.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Use two components for 1D array image sizes
Jason Ekstrand [Thu, 16 Aug 2018 15:16:41 +0000 (10:16 -0500)]
intel/compiler: Use two components for 1D array image sizes

Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code.  The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoisl: Use the view array length for the image size
Jason Ekstrand [Thu, 16 Aug 2018 15:12:16 +0000 (10:12 -0500)]
isl: Use the view array length for the image size

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler: Do image load/store lowering to NIR
Jason Ekstrand [Sat, 27 Jan 2018 21:19:57 +0000 (13:19 -0800)]
intel/compiler: Do image load/store lowering to NIR

This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/types: Add a wrapper for coordinate_components
Jason Ekstrand [Thu, 16 Aug 2018 15:22:32 +0000 (10:22 -0500)]
nir/types: Add a wrapper for coordinate_components

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoanv/pipeline: Remove dead image loads in lower_input_attacnments
Jason Ekstrand [Wed, 15 Aug 2018 19:04:25 +0000 (14:04 -0500)]
anv/pipeline: Remove dead image loads in lower_input_attacnments

Dead code will get rid of them eventually but it's better if they're
just gone so we guarantee they won't trip up later passes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir: Make image load/store intrinsics variable-width
Jason Ekstrand [Tue, 14 Aug 2018 19:03:05 +0000 (14:03 -0500)]
nir: Make image load/store intrinsics variable-width

Instead of requiring 4 components, this allows them to potentially use
fewer.  Both the SPIR-V and GLSL paths still generate vec4 intrinsics so
drivers which assume 4 components should be safe.  However, we want to
be able to shrink them for i965.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Fix a bitmask in unpack_11f11f10f
Jason Ekstrand [Thu, 16 Aug 2018 14:21:10 +0000 (09:21 -0500)]
nir/format_convert: Fix a bitmask in unpack_11f11f10f

Fixes: 4e337b42f9a2 "nir/format_convert: Add pack/unpack for R11F_G11F_B10F"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f
Jason Ekstrand [Mon, 13 Aug 2018 22:31:19 +0000 (17:31 -0500)]
nir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f

This matches the unpack function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Add [us]norm conversion helpers
Jason Ekstrand [Mon, 13 Aug 2018 21:13:50 +0000 (16:13 -0500)]
nir/format_convert: Add [us]norm conversion helpers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Rename nir_format_bitcast_uint_vec
Jason Ekstrand [Mon, 13 Aug 2018 19:57:22 +0000 (14:57 -0500)]
nir/format_convert: Rename nir_format_bitcast_uint_vec

We have a name for that, it's called a uvec.  This just makes the
function name a bit shorter.  While we're here, we also add an assert
for one of the assumptions this function makes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Add vec mask and sign-extend helpers
Jason Ekstrand [Mon, 13 Aug 2018 17:04:25 +0000 (12:04 -0500)]
nir/format_convert: Add vec mask and sign-extend helpers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/format_convert: Add support for unpacking signed integers
Jason Ekstrand [Mon, 13 Aug 2018 16:41:41 +0000 (11:41 -0500)]
nir/format_convert: Add support for unpacking signed integers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/opcodes: Make unpack_half_2x16_split_* variable-width
Jason Ekstrand [Wed, 15 Aug 2018 16:58:50 +0000 (11:58 -0500)]
nir/opcodes: Make unpack_half_2x16_split_* variable-width

There is nothing inherent about these opcodes that requires them to only
take scalars.  It's very convenient if we let them take vectors as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/algebraic: Add some max/min optimizations
Jason Ekstrand [Wed, 15 Aug 2018 09:29:42 +0000 (04:29 -0500)]
nir/algebraic: Add some max/min optimizations

Found by inspection.  This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166916 -> 15166910 (<.01%)
    instructions in affected programs: 761 -> 755 (-0.79%)
    helped: 6
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/algebraic: Add more extract_[iu](8|16) optimizations
Jason Ekstrand [Tue, 14 Aug 2018 19:37:39 +0000 (14:37 -0500)]
nir/algebraic: Add more extract_[iu](8|16) optimizations

This adds the "(a << N) >> M" family of mask or sign-extensions.  Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166918 -> 15166916 (<.01%)
    instructions in affected programs: 36 -> 34 (-5.56%)
    helped: 2
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonir/algebraic: Be more careful converting ushr to extract_u8/16
Jason Ekstrand [Tue, 14 Aug 2018 20:10:22 +0000 (15:10 -0500)]
nir/algebraic: Be more careful converting ushr to extract_u8/16

If it's not the right bit-size, it may not actually be the correct
extraction.  For now, we'll only worry about 32-bit versions.

Fixes: 905ff8619824 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa84a "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/tools: new i965_disasm tool
Sagar Ghuge [Wed, 29 Aug 2018 17:52:41 +0000 (10:52 -0700)]
intel/tools: new i965_disasm tool

Adds a new i965 instruction disassemble tool

v2: 1) fix a few nits (Matt Turner)
    2) Remove i965_disasm header (Matt Turner)

v3: 1) Redirect output to correct file descriptors (Matt Turner)
    2) Refactor code (Matt Turner)
    3) Use better formatting style (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
6 years agost/mesa: Disable blending for integer formats.
Kenneth Graunke [Sat, 25 Aug 2018 06:45:27 +0000 (23:45 -0700)]
st/mesa: Disable blending for integer formats.

Blending isn't valid for integer formats.  Rather than having drivers
worry about this, just disable blending in this case.  This hopefully
will increase hits in the CSO cache as well, by eliminating most of the
meaningless fields in this case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agosvga: add missing switch cases for shadow textures
Brian Paul [Fri, 24 Aug 2018 02:48:04 +0000 (20:48 -0600)]
svga: add missing switch cases for shadow textures

This doesn't seem to make any difference in testing, but it fixes a
failed assertion when dumping sm3 shaders.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agosvga: fix vgpu9 sprite coordinate bug
Brian Paul [Thu, 23 Aug 2018 15:25:15 +0000 (09:25 -0600)]
svga: fix vgpu9 sprite coordinate bug

Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for
vgpu9.  We can use the rasterizer sprite_coord_enable bitfield as-is.
We need to index into it using the TGSI semantic index, not the
register index.

This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests.

Testing done: Piglit, Mesa sprite demos

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agosvga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue
Brian Paul [Wed, 22 Aug 2018 17:11:22 +0000 (11:11 -0600)]
svga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue

The flag_rect and flag_buffer fields didn't sufficiently capture
the state changes needed for those resource types.  For example,
if a texture binding was changed from a 500x500 rect texture to a
400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS.  But
we need to do that to emit the new texcoord scale factors to the
constant buffers.  Rather than track the sizes of all bound
resources, just set the flag if the resource is a rect.  Same
story with texture buffers.

Also, since rect/buffer textures are usable with VS/GS shaders,
add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting
VS/GS constants.

This seems to help with XFCE / xfwm4 desktop scaling.
VMware issue 2156696.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agosvga: minor improvements in svga_state_constants.c
Brian Paul [Fri, 24 Aug 2018 21:17:20 +0000 (15:17 -0600)]
svga: minor improvements in svga_state_constants.c

Add const qualifiers.  Add 'f' suffix on floats to avoid double
promotion.

Remove unneeded shader type assertion since the switch statement
handled it already.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agoanv: Free the app and engine name
Jason Ekstrand [Wed, 29 Aug 2018 15:06:56 +0000 (10:06 -0500)]
anv: Free the app and engine name

Fixes: 8c048af5890d4 "anv: Copy the appliation info into the instance"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agonv50/ir: silence partitionLoadStore() unused function warning
Rhys Kidd [Fri, 10 Aug 2018 16:44:37 +0000 (12:44 -0400)]
nv50/ir: silence partitionLoadStore() unused function warning

Move this now-unused function into the existing comment block, which was its only prior use.

../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
      unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)

Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years agoglsl/linker: Link all out vars from a shader objects on a single stage
vadym.shovkoplias [Tue, 28 Aug 2018 07:32:18 +0000 (10:32 +0300)]
glsl/linker: Link all out vars from a shader objects on a single stage

During intra stage linking some out variables can be dropped because
it is not used in a shader with the main function. But these out vars
can be referenced on later stages which can lead to further linking
errors.

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731

6 years agoanv: blorp: support multiple aspect blits
Lionel Landwerlin [Tue, 28 Aug 2018 10:16:33 +0000 (11:16 +0100)]
anv: blorp: support multiple aspect blits

Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca0e41 ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agomesa: allow GL_UNSIGNED_BYTE type for SNORM reads
Tapani Pälli [Mon, 27 Aug 2018 11:40:41 +0000 (14:40 +0300)]
mesa: allow GL_UNSIGNED_BYTE type for SNORM reads

OpenGL ES spec states:
   "For normalized fixed-point rendering surfaces, the combination format
    RGBA and type UNSIGNED_BYTE is accepted."

This fixes following failing VK-GL-CTS tests:

   KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=107658
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andres Gomez <agomez@igalia.com>
6 years agonir: add loop unroll support for wrapper loops
Timothy Arceri [Sat, 7 Jul 2018 07:56:26 +0000 (17:56 +1000)]
nir: add loop unroll support for wrapper loops

This adds support for unrolling the classic

    do {
        // ...
    } while (false)

that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.

shader-db results IVB:

total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agonir/opt_loop_unroll: Remove unneeded phis if we make progress
Timothy Arceri [Wed, 11 Jul 2018 00:50:16 +0000 (10:50 +1000)]
nir/opt_loop_unroll: Remove unneeded phis if we make progress

Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis.  In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agonir: add complex_loop bool to loop info
Timothy Arceri [Sat, 7 Jul 2018 02:09:26 +0000 (12:09 +1000)]
nir: add complex_loop bool to loop info

In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agonir: always attempt to find loop terminators
Timothy Arceri [Sat, 7 Jul 2018 02:02:08 +0000 (12:02 +1000)]
nir: always attempt to find loop terminators

This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI
Marek Olšák [Tue, 28 Aug 2018 18:39:09 +0000 (14:39 -0400)]
ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI

This fixes VM faults and corruption.

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoi965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Ian Romanick [Sat, 25 Aug 2018 00:24:36 +0000 (17:24 -0700)]
i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agoi965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
Ian Romanick [Sat, 25 Aug 2018 00:41:01 +0000 (17:41 -0700)]
i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/fs: Refactor image atomics to be a bit more like other atomics
Ian Romanick [Sat, 25 Aug 2018 00:23:26 +0000 (17:23 -0700)]
i965/fs: Refactor image atomics to be a bit more like other atomics

This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agoi965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Ian Romanick [Thu, 23 Aug 2018 03:31:11 +0000 (20:31 -0700)]
i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel/compiler: Silence unused parameter warnings in brw_eu.h
Ian Romanick [Sat, 25 Aug 2018 00:24:59 +0000 (17:24 -0700)]
intel/compiler: Silence unused parameter warnings in brw_eu.h

All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agoi965: enable AMD_depth_clamp_separate
Sagar Ghuge [Tue, 21 Aug 2018 21:28:17 +0000 (14:28 -0700)]
i965: enable AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoi965: add functional changes for AMD_depth_clamp_separate
Sagar Ghuge [Tue, 21 Aug 2018 21:25:17 +0000 (14:25 -0700)]
i965: add functional changes for AMD_depth_clamp_separate

Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.

z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.

v2: 1) Use better coding style (Ian Romanick)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: add EXTRA_EXT for AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 22:03:54 +0000 (15:03 -0700)]
mesa: add EXTRA_EXT for AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add support for GL_AMD_depth_clamp_separate tokens
Sagar Ghuge [Tue, 21 Aug 2018 20:40:55 +0000 (13:40 -0700)]
mesa: add support for GL_AMD_depth_clamp_separate tokens

_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.

v2: Remove unnecessary parentheses (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: Add support for AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 21:55:57 +0000 (14:55 -0700)]
mesa: Add support for AMD_depth_clamp_separate

Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.

Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.

Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.

v2: 1) Remove unnecessary parentheses (Marek Olsak)
    2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
       GL_DEPTH_CLAMP only (Marek Olsak)
    3) Clamp against near and far plane separately (Marek Olsak)
    4) Clip point separately for near and far Z clipping plane (Marek
       Olsak)

v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
    and [0, max(n,f)] for far plane (Marek Olsak)

v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: Add types for AMD_depth_clamp_separate.
Sagar Ghuge [Fri, 27 Jul 2018 03:08:44 +0000 (20:08 -0700)]
mesa: Add types for AMD_depth_clamp_separate.

Add some basic types and storage for the AMD_depth_clamp_separate
extension.

v2: 1) Drop unnecessary definition (Marek Olsak)
    2) Expose extension in compatibility profile (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoglapi: define AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 21:42:36 +0000 (14:42 -0700)]
glapi: define AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoanv: Claim to support depthBounds for ID games
Jason Ekstrand [Tue, 30 Jan 2018 02:41:15 +0000 (18:41 -0800)]
anv: Claim to support depthBounds for ID games

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: Copy the appliation info into the instance
Jason Ekstrand [Tue, 30 Jan 2018 02:12:04 +0000 (18:12 -0800)]
anv: Copy the appliation info into the instance

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agovulkan/alloc: Add a vk_strdup helper
Jason Ekstrand [Tue, 30 Jan 2018 02:11:38 +0000 (18:11 -0800)]
vulkan/alloc: Add a vk_strdup helper

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agomeson: Actually load translation files
Dylan Baker [Fri, 24 Aug 2018 14:05:36 +0000 (07:05 -0700)]
meson: Actually load translation files

Currently we run the script but don't actually load any files, even in a
tarball where they exist.

Fixes: 3218056e0eb375eeda470058d06add1532acd6d4
       ("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
6 years agonir: Remove outdated comment
Caio Marcelo de Oliveira Filho [Mon, 27 Aug 2018 22:18:10 +0000 (15:18 -0700)]
nir: Remove outdated comment

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: Add INTEL_fragment_shader_ordering support.
Kevin Rogovin [Mon, 27 Aug 2018 06:54:24 +0000 (09:54 +0300)]
i965: Add INTEL_fragment_shader_ordering support.

Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
6 years agomesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering
Kevin Rogovin [Mon, 27 Aug 2018 06:54:23 +0000 (09:54 +0300)]
mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering

This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().

One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
6 years agoi965/gen6/xfb: handle case where transform feedback is not active
Andrii Simiklit [Wed, 15 Aug 2018 15:20:32 +0000 (18:20 +0300)]
i965/gen6/xfb: handle case where transform feedback is not active

When the SVBI Payload Enable is false I guess the register R1.4
which contains the Maximum Streamed Vertex Buffer Index is filled by zero
and GS stops to write transform feedback when the transform feedback
is not active.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agodocs: add forgotten features to 18.2.0 release notes
Rhys Perry [Tue, 21 Aug 2018 10:08:17 +0000 (11:08 +0100)]
docs: add forgotten features to 18.2.0 release notes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewied-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 18.2: <mesa-stable@lists.freedesktop.org>
6 years agovirgl: add debug-switch to output TGSI
Erik Faye-Lund [Mon, 20 Aug 2018 11:46:32 +0000 (12:46 +0100)]
virgl: add debug-switch to output TGSI

This is quite useful for debugging shader-transpiling issues in
virglrenderer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agovirgl: introduce $VIRGL_DEBUG=verbose
Erik Faye-Lund [Mon, 20 Aug 2018 11:08:55 +0000 (13:08 +0200)]
virgl: introduce $VIRGL_DEBUG=verbose

This adds an environment-varaible that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.

While we're at it, quiet some overly chatty debug-output by default.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agovirgl: replace fprintf-call with debug_printf
Erik Faye-Lund [Mon, 20 Aug 2018 10:48:51 +0000 (12:48 +0200)]
virgl: replace fprintf-call with debug_printf

This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agovirgl: delete commented out fprintf-call
Erik Faye-Lund [Mon, 20 Aug 2018 10:45:14 +0000 (12:45 +0200)]
virgl: delete commented out fprintf-call

This is just debug-cruft left over. Let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agomeson: Don't enable any vulkan drivers on arm, aarch64
Guido Günther [Sun, 26 Aug 2018 20:24:00 +0000 (22:24 +0200)]
meson: Don't enable any vulkan drivers on arm, aarch64

There's no Vulkan support for arm atm.

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agomeson: Be a bit more helpful when arch or OS is unknown
Guido Günther [Sun, 26 Aug 2018 20:23:59 +0000 (22:23 +0200)]
meson: Be a bit more helpful when arch or OS is unknown

V2: Add one missing @0@

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agointel/eu: print bytes instead of 32 bit hex value
Sagar Ghuge [Mon, 27 Aug 2018 17:23:19 +0000 (10:23 -0700)]
intel/eu: print bytes instead of 32 bit hex value

INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agointel: decoder: handle 0 sized structs
Lionel Landwerlin [Sat, 25 Aug 2018 17:22:00 +0000 (18:22 +0100)]
intel: decoder: handle 0 sized structs

Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agonv50/ir,nvc0: use constant buffers for compute when possible on Kepler+
Rhys Perry [Fri, 3 Aug 2018 21:11:28 +0000 (22:11 +0100)]
nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+

Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.

total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)

                local     shared        gpr       inst      bytes
    helped           1           0         152         274         274
      hurt           0           0           0           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonv50/ir: optimize multiplication by 16-bit immediates into two xmads
Rhys Perry [Sat, 18 Aug 2018 14:06:50 +0000 (15:06 +0100)]
nv50/ir: optimize multiplication by 16-bit immediates into two xmads

Rather than the usual three that would be created.

total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

                local     shared        gpr       inst      bytes
    helped           1           0          64        1040        1040
      hurt           0           0          27           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonv50/ir: optimize near power-of-twos into shladd
Rhys Perry [Sat, 18 Aug 2018 14:06:01 +0000 (15:06 +0100)]
nv50/ir: optimize near power-of-twos into shladd

total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0         318        1758        1758
      hurt           0           0          63           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonv50/ir: move a * b -> a << log2(b) code into createMul()
Rhys Perry [Wed, 13 Jun 2018 15:30:01 +0000 (16:30 +0100)]
nv50/ir: move a * b -> a << log2(b) code into createMul()

With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.

Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.

total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0          18         230         230
      hurt           0           0           8         263         263

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonv50/ir: optimize imul/imad to xmads
Rhys Perry [Wed, 13 Jun 2018 15:25:23 +0000 (16:25 +0100)]
nv50/ir: optimize imul/imad to xmads

This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.

total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs    : 669919 -> 670595 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

                local     shared        gpr       inst      bytes
    helped           0           0          38           0           0
      hurt           1           0         365        3076        3076

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agogm107/ir: add support for OP_XMAD on GM107+
Rhys Perry [Wed, 13 Jun 2018 15:21:56 +0000 (16:21 +0100)]
gm107/ir: add support for OP_XMAD on GM107+

v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agonv50/ir: add preliminary support for OP_XMAD
Rhys Perry [Wed, 13 Jun 2018 15:21:20 +0000 (16:21 +0100)]
nv50/ir: add preliminary support for OP_XMAD

v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agoglsl/linker: Allow unused in blocks which are not declated on previous stage
vadym.shovkoplias [Thu, 23 Aug 2018 10:12:16 +0000 (13:12 +0300)]
glsl/linker: Allow unused in blocks which are not declated on previous stage

>From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:

    "Only the input variables that are actually read need to be written
     by the previous stage; it is allowed to have superfluous
     declarations of input variables."

Fixes:
    * interstage-multiple-shader-objects.shader_test

v2:
  Update comment in ir.h since the usage of "used" field
  has been extended.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agonir: Pull block_ends_in_jump into nir.h
Jason Ekstrand [Fri, 24 Aug 2018 14:34:05 +0000 (09:34 -0500)]
nir: Pull block_ends_in_jump into nir.h

We had two different implementations in different files.  May as well
have one and put it in nir.h.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoanv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()
Samuel Iglesias Gonsálvez [Fri, 24 Aug 2018 10:11:49 +0000 (12:11 +0200)]
anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()

VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.

Fixes Vulkan CTS CL#2849.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/tools: Add 0x in front of a couple of hex values
Jason Ekstrand [Sat, 25 Aug 2018 22:34:17 +0000 (17:34 -0500)]
intel/tools: Add 0x in front of a couple of hex values

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: Fill holes in the VF VUE to zero
Jason Ekstrand [Sat, 25 Aug 2018 22:08:04 +0000 (17:08 -0500)]
anv: Fill holes in the VF VUE to zero

This fixes a GPU hang in DOOM 2016 running under wine.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel: tools: Fix aubinator_error's fprintf call (format-security)
Kai Wasserbäch [Sat, 25 Aug 2018 10:00:30 +0000 (12:00 +0200)]
intel: tools: Fix aubinator_error's fprintf call (format-security)

The recent commit 4616639b49b4bbc91e503c1c27632dccc1c2b5be introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
  fprintf(stderr, msg);

This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
  ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
  ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
      fprintf(stderr, msg);
      ^~~~~~~

This patch fixes this trivially by introducing a catch-all "%s" format
argument.

Fixes: 4616639b49b ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch_decoder: Print blend states properly
Jason Ekstrand [Fri, 24 Aug 2018 21:05:08 +0000 (16:05 -0500)]
intel/batch_decoder: Print blend states properly

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch_decoder: Fix dynamic state printing
Jason Ekstrand [Fri, 24 Aug 2018 21:04:03 +0000 (16:04 -0500)]
intel/batch_decoder: Fix dynamic state printing

Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address.  Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Print ISL formats for vertex elements
Jason Ekstrand [Fri, 24 Aug 2018 20:27:38 +0000 (15:27 -0500)]
intel/decoder: Print ISL formats for vertex elements

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Clean up field iteration and fix sub-dword fields
Jason Ekstrand [Fri, 24 Aug 2018 20:23:04 +0000 (15:23 -0500)]
intel/decoder: Clean up field iteration and fix sub-dword fields

First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
gen_decode_field in the one call-site.  Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit.  This is fixed by adding a
iter_start_field helper which sets the field and also sets up the other
bits we need.  This fixes decoding of 3DSTATE_SBE_SWIZ.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>