mesa.git
5 years agospirv: Add vtn_variable_mode_image
Caio Marcelo de Oliveira Filho [Fri, 10 May 2019 02:33:51 +0000 (19:33 -0700)]
spirv: Add vtn_variable_mode_image

Corresponding to SpvStorageClassImage.  We see pointers for that
storage class in tests, but don't use the storage class any further.
Adding this so that we can call vtn_mode_to_address_format() for all
supported pointers.

v2: Fail when trying to create a SpvStorageClassImage
    variable.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agospirv: Add vtn_mode_to_address_format()
Caio Marcelo de Oliveira Filho [Fri, 3 May 2019 19:42:39 +0000 (12:42 -0700)]
spirv: Add vtn_mode_to_address_format()

Handles all the modes and we can use it in combination with
nir_address_format_to_glsl_type() to replace the
vtn_ptr_type_for_mode() helper.  Since the new helper is more generic,
moved the assertions from the old one to the call sites.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agospirv: Add vtn_mode_uses_ssa_offset()
Caio Marcelo de Oliveira Filho [Fri, 3 May 2019 05:11:31 +0000 (22:11 -0700)]
spirv: Add vtn_mode_uses_ssa_offset()

Just the mode is needed to decide whether SSA offsets are needed, so
make a function that takes that and reuse it for
vtn_pointer_uses_ssa_offset().

This will be used for constant null pointers, which won't have a
vtn_pointer handy.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agospirv: Add and use vtn_type_without_array() helper
Caio Marcelo de Oliveira Filho [Thu, 2 May 2019 23:12:07 +0000 (16:12 -0700)]
spirv: Add and use vtn_type_without_array() helper

v2: Renamed from vtn_interface_type. (Jason)
    Accept any type not only pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agospirv: Change vtn_null_constant() to use vtn_type
Caio Marcelo de Oliveira Filho [Thu, 2 May 2019 22:57:45 +0000 (15:57 -0700)]
spirv: Change vtn_null_constant() to use vtn_type

This is a preparation to handle OpConstantNull for pointers, we'll use
the vtn_type to get to the address format and then the appropriate
representation of NULL pointer.

v2: Move rest of body to use vtn_type. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agospirv: Export vtn_storage_class_to_mode()
Caio Marcelo de Oliveira Filho [Thu, 2 May 2019 22:53:22 +0000 (15:53 -0700)]
spirv: Export vtn_storage_class_to_mode()

So we can reuse in spirv_to_nir.c.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add nir_address_format_null_value()
Caio Marcelo de Oliveira Filho [Wed, 1 May 2019 21:44:15 +0000 (14:44 -0700)]
nir: Add nir_address_format_null_value()

Returns the nir_const_value * with the representation of the NULL
pointer for each address format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
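
A minimal sketch of the idea of a per-format NULL representation. This is
not the NIR code: the enum, the struct, and the specific bit patterns are
hypothetical stand-ins chosen for illustration. The only point taken from
the log is that each address format has its own NULL value and that it is
not always 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address formats; not the real nir_address_format enum. */
enum addr_format {
   ADDR_FORMAT_GLOBAL_64, /* flat 64-bit address, assumed NULL = 0 */
   ADDR_FORMAT_OFFSET_32, /* 32-bit offset, assumed NULL = all ones */
   ADDR_FORMAT_COUNT
};

/* Hypothetical constant holder; the real helper returns nir_const_value *. */
struct null_value {
   unsigned num_components;
   uint64_t comp[4];
};

/* Look up the NULL bit pattern for a format in a static table. */
static const struct null_value *
addr_format_null_value(enum addr_format format)
{
   static const struct null_value table[ADDR_FORMAT_COUNT] = {
      [ADDR_FORMAT_GLOBAL_64] = { 1, { 0 } },
      [ADDR_FORMAT_OFFSET_32] = { 1, { UINT32_MAX } },
   };

   assert((size_t)format < ADDR_FORMAT_COUNT);
   return &table[format];
}
```

Returning a pointer into a static table keeps the caller free of any
allocation, which matches the "returns the nir_const_value *" wording above.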
5 years agospirv, radv, anv: Replace ptr_type with addr_format
Caio Marcelo de Oliveira Filho [Wed, 1 May 2019 21:15:32 +0000 (14:15 -0700)]
spirv, radv, anv: Replace ptr_type with addr_format

Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add nir_address_format_32bit_offset
Caio Marcelo de Oliveira Filho [Fri, 3 May 2019 21:34:55 +0000 (14:34 -0700)]
nir: Add nir_address_format_32bit_offset

This is a simple 32-bit address which is not a global address.  Gives
us a format that doesn't use 0 as its null pointer value.  We will need
this in anv to represent nir_var_mem_shared addresses.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: Add nir_address_format_logical
Caio Marcelo de Oliveira Filho [Wed, 1 May 2019 20:24:45 +0000 (13:24 -0700)]
nir: Add nir_address_format_logical

An address format representing a purely logical addressing model.  In
this model, all deref chains must be complete from the dereference
operation to the variable.  Cast derefs are not allowed.  These
addresses will be 32-bit scalars but the format is immaterial because
you can always chase the chain.  E.g. push constants in anv.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agofreedreno/a6xx: WFI in program stateobj too
Rob Clark [Fri, 17 May 2019 04:04:29 +0000 (21:04 -0700)]
freedreno/a6xx: WFI in program stateobj too

This "fixes" hangs seen w/ various android games.  I think this is a
similar issue to the one with constant state: we need to avoid
CP_LOAD_STATE until the previous draw completes.

It isn't entirely clear why blob doesn't need to do this, but it might
have a different way to accomplish the same thing.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: make sure binning pass constlen is large enough
Rob Clark [Thu, 16 May 2019 18:34:13 +0000 (11:34 -0700)]
freedreno/a6xx: make sure binning pass constlen is large enough

Since we use same constant state for both binning pass program state and
draw pass state, and it is possible for binning pass shader to use fewer
consts, we need to make sure we program a large enough constlen.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: limit IBO state to draw pass
Rob Clark [Thu, 16 May 2019 18:33:36 +0000 (11:33 -0700)]
freedreno/a6xx: limit IBO state to draw pass

Currently we are only supporting images in FS (and CS) so limit this
stateobj to draw pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: don't evaluate FS tex state in binning pass
Rob Clark [Thu, 16 May 2019 17:58:48 +0000 (10:58 -0700)]
freedreno/a6xx: don't evaluate FS tex state in binning pass

It is unneeded since FS doesn't run in binning pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agoradv: decompress FMASK before performing a MSAA decompress using FMASK
Samuel Pitoiset [Thu, 16 May 2019 07:24:58 +0000 (09:24 +0200)]
radv: decompress FMASK before performing a MSAA decompress using FMASK

This fixes some CTS failures related to VK_EXT_sample_locations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/validate: fix crash if entry is null.
Dave Airlie [Mon, 20 May 2019 01:05:15 +0000 (11:05 +1000)]
nir/validate: fix crash if entry is null.

We check entry with a validation assert just before this, but since
that doesn't stop execution, we need to check entry again before the
next validation assert dereferences it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
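
The pattern can be sketched as follows. The macro, the struct, and the
function names are hypothetical, not the ones in nir_validate.c; the sketch
only assumes what the message states, namely that a validation assert
records a failure without stopping execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of validation failures recorded so far. */
static unsigned validate_errors;

/* Records a failure but, unlike assert(), lets execution continue. */
#define validate_check(cond) \
   do { if (!(cond)) validate_errors++; } while (0)

/* Hypothetical table entry. */
struct entry {
   bool written;
};

static void
validate_entry(const struct entry *entry)
{
   validate_check(entry != NULL);

   /* The check above does not return, so guard the dereference below. */
   if (entry == NULL)
      return;

   validate_check(entry->written);
}
```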
5 years agolima/gpir: switch to use nir_lower_viewport_transform
Qiang Yu [Wed, 15 May 2019 02:52:39 +0000 (10:52 +0800)]
lima/gpir: switch to use nir_lower_viewport_transform

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gpir: support vector ssa load
Qiang Yu [Wed, 15 May 2019 02:39:57 +0000 (10:39 +0800)]
lima/gpir: support vector ssa load

Some vector sysvals can't be lowered to scalar, so we need to break
them into scalars in the nir to gpir conversion.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gpir: add helper function for emit load node
Qiang Yu [Sun, 21 Apr 2019 01:54:46 +0000 (09:54 +0800)]
lima/gpir: add helper function for emit load node

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
5 years agoutil: add missing include to build_id.h
Timothy Arceri [Fri, 17 May 2019 05:23:11 +0000 (15:23 +1000)]
util: add missing include to build_id.h

Required to use uint8_t

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agopanfrost/midgard: Split up midgard_compile.c (RA)
Alyssa Rosenzweig [Sun, 19 May 2019 23:20:34 +0000 (23:20 +0000)]
panfrost/midgard: Split up midgard_compile.c (RA)

This commit moves the register allocator out of midgard_compile.c and
into its own midgard_ra.c file. In doing so, a number of dependencies
are identified and moved into their own files in turn. midgard_compile.c
is still fairly monolithic, but this should help.

Code churn, but no functional changes should be introduced by this
commit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Improve fixed-function blending
Alyssa Rosenzweig [Mon, 6 May 2019 02:15:38 +0000 (02:15 +0000)]
panfrost: Improve fixed-function blending

This fixes a few miscellaneous issues with the fixed-function blending
programming, though it is far from complete. For cases known to be
buggy, we force a fallback to blend shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Wire up nir_lower_blend
Alyssa Rosenzweig [Mon, 6 May 2019 02:13:55 +0000 (02:13 +0000)]
panfrost: Wire up nir_lower_blend

This implements blend shaders via nir_lower_blend, by creating dummy
fragment shaders simply passing through the source color and using the
new lowering pass to inject blendability.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Route new blending intrinsics
Alyssa Rosenzweig [Mon, 6 May 2019 02:12:41 +0000 (02:12 +0000)]
panfrost/midgard: Route new blending intrinsics

To prepare for the new nir_lower_blend pass, we wire up the intrinsics
for tilebuffer reads and constant colour loading.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/nir: Add nir_lower_blend pass
Alyssa Rosenzweig [Mon, 6 May 2019 02:06:02 +0000 (02:06 +0000)]
panfrost/nir: Add nir_lower_blend pass

This new lowering pass implements the OpenGL ES blend pipeline in
shaders, applicable to hardware lacking full-featured blending hardware
(including Midgard/Bifrost and vc4). This pass is run on a fragment
shader, rewriting the store to a blended version, loading in the
framebuffer destination color and constant color via intrinsics as
necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
pass dEQP's blend tests. MIN/MAX modes are included and tested as well.
That said, at present it has the following limitations:

 - MRT is not supported (ES3).
 - sRGB support is missing (ES3).
 - Extended blending is not yet ported from GLSL IR lowering (ES3.2)
 - Dual-source blending is not supported. (N/A)
 - Logic ops are not supported. (N/A)

v2: Fix code conventions (per Ian Romanick's feedback). Implement color
masks.

This pass should be in common nir/ space, but due to non-technical
reasons, for now it's in Panfrost space. In the future, depending if
other drivers need some of the functionality, we can move this back to
src/compiler/nir space.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
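
For orientation, the per-channel arithmetic that such a pass has to emit
can be sketched in plain C. This is not the pass itself: the enum and
function names are hypothetical, the factors are passed in already
evaluated, and the clamp assumes a normalized fixed-point render target.

```c
#include <assert.h>

/* Hypothetical blend equations, mirroring the ES modes named above. */
enum blend_func {
   BLEND_ADD,
   BLEND_SUBTRACT,
   BLEND_REVERSE_SUBTRACT,
   BLEND_MIN,
   BLEND_MAX
};

static float
clamp01(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Blend one channel of the source color against the framebuffer value. */
static float
blend_channel(enum blend_func func, float src, float src_factor,
              float dst, float dst_factor)
{
   switch (func) {
   case BLEND_ADD:
      return clamp01(src * src_factor + dst * dst_factor);
   case BLEND_SUBTRACT:
      return clamp01(src * src_factor - dst * dst_factor);
   case BLEND_REVERSE_SUBTRACT:
      return clamp01(dst * dst_factor - src * src_factor);
   case BLEND_MIN:
      /* MIN/MAX ignore the blend factors. */
      return src < dst ? src : dst;
   case BLEND_MAX:
      return src > dst ? src : dst;
   }
   return src;
}
```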
5 years agopanfrost: Fix Bifrost-specific padding
Alyssa Rosenzweig [Sat, 18 May 2019 21:04:33 +0000 (21:04 +0000)]
panfrost: Fix Bifrost-specific padding

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
5 years agopanfrost: Cleanup panfrost_job comments
Alyssa Rosenzweig [Sat, 18 May 2019 21:01:03 +0000 (21:01 +0000)]
panfrost: Cleanup panfrost_job comments

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
5 years agopanfrost/decode: Decode blend constant
Alyssa Rosenzweig [Sat, 18 May 2019 20:48:43 +0000 (20:48 +0000)]
panfrost/decode: Decode blend constant

This adds a forgotten decode line on Midgard and adds the field of a
blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas
Midgard is just a regular 32-bit float, Bifrost uses a fancy
fixed-point-esque encoding. The decode logic here is experimentally
correct. The encode logic is a sort of "guesstimate", assuming that the
high byte is just int(f / 255.0) and then solving algebraically for the
low byte. This might be slightly off in some cases.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
5 years agopanfrost: Hoist blend constant into Midgard-specific struct
Alyssa Rosenzweig [Sat, 18 May 2019 20:36:00 +0000 (20:36 +0000)]
panfrost: Hoist blend constant into Midgard-specific struct

This eliminates one major source of #ifdef parity between Midgard and
Bifrost, better representing how the struct acts on Midgard and allowing
proper decodes on Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
5 years agopanfrost/decode: Disassemble Bifrost shaders
Alyssa Rosenzweig [Sat, 18 May 2019 18:58:56 +0000 (18:58 +0000)]
panfrost/decode: Disassemble Bifrost shaders

We already have the Bifrost disassembler in-tree, so now that panwrap is
able to dump Bifrost command streams, hook up the disassembler to
pandecode.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
5 years agovulkan/wsi: Set X11 minImageCount to 3.
Bas Nieuwenhuizen [Sat, 27 Apr 2019 23:50:36 +0000 (01:50 +0200)]
vulkan/wsi: Set X11 minImageCount to 3.

For IMMEDIATE and FIFO, most games work in a pipelined manner where they
can produce frames at a rate of 1/MAX(CPU duration, GPU duration), but
the render latency is CPU duration + GPU duration.

This means that with scanout from pageflipping we need 3 frames to run
full speed:
1) CPU rendering work
2) GPU rendering work
3) scanout

Once we have a nonblocking acquire that returns a semaphore we can merge
1 and 3. Hence the ideal implementation needs only 2 images, but games
cannot tell that we currently do not have an ideal implementation and
that hence they need to allocate 3 images. So let us do it for them.

This is a tradeoff as it uses more memory than needed for non-fullscreen
and non-performance intensive applications.

Since this is pretty much a TODO that can use the context I added this as
a comment.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
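
The arithmetic behind this can be worked through with a small sketch. The
function names and the millisecond inputs are illustrative assumptions;
only the two formulas and the image counts come from the message above.

```c
#include <assert.h>
#include <stdbool.h>

static double
max_d(double a, double b)
{
   return a > b ? a : b;
}

/* Frame rate when CPU and GPU work on different frames in parallel:
 * 1 / MAX(CPU duration, GPU duration), here with durations in ms. */
static double
pipelined_fps(double cpu_ms, double gpu_ms)
{
   return 1000.0 / max_d(cpu_ms, gpu_ms);
}

/* Time from the start of CPU work on a frame until the GPU finishes it. */
static double
render_latency_ms(double cpu_ms, double gpu_ms)
{
   return cpu_ms + gpu_ms;
}

/* One image per stage in flight: CPU work, GPU work, scanout.  If CPU
 * work and scanout can share an image, two are enough. */
static unsigned
images_needed(bool cpu_and_scanout_merged)
{
   return cpu_and_scanout_merged ? 2 : 3;
}
```

With 8 ms of CPU work and 10 ms of GPU work per frame, the pipeline
sustains 100 frames per second while each frame still takes 18 ms to
render, which is why a third image is needed to keep every stage busy.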
5 years agomeson: expose glapi through osmesa
Eric Engestrom [Thu, 2 May 2019 11:42:48 +0000 (12:42 +0100)]
meson: expose glapi through osmesa

Suggested-by: Pierre Guillou <pierre.guillou@lip6.fr>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109659
Fixes: f121a669c7d94d2ff672 "meson: build gallium based osmesa"
Fixes: cbbd5bb889a2c271a504 "meson: build classic osmesa"
Cc: Brian Paul <brianp@vmware.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
5 years agoegl: Allow EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY in ES and GL
Kenneth Graunke [Fri, 17 May 2019 05:05:51 +0000 (22:05 -0700)]
egl: Allow EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY in ES and GL

EGL annoyingly defines a few variants of this token:

   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT - 0x3138
   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR - 0x31BD
   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY     - 0x31BD

The EGL_EXT_create_context_robustness extension specifies that the EXT
token is only valid for ES contexts, not GL.  The EGL_KHR_create_context
extension defines the KHR version, and says it is only allowed for GL
contexts, and specifically calls out that it's an error for ES contexts.

But EGL 1.5 includes the new suffixless token, which has the same value
as the KHR version, and specifically calls out that it's now valid to
use with both GL and ES contexts.  So we should allow this.

Fixes KHR-NoContext.es32.robustness.no_reset_notification and
KHR-NoContext.es32.robustness.lose_context_on_reset on iris, which
apparently is exposing EGL 1.5.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
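
The rules described above can be sketched as a single check. This is a
simplified illustration, not the code in Mesa's EGL layer: the macro, enum,
and function names are hypothetical, and the sketch ignores whether the
relevant extensions are actually exposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Token values as listed in the commit message. */
#define RESET_STRATEGY_EXT  0x3138 /* ..._EXT */
#define RESET_STRATEGY_CORE 0x31BD /* ..._KHR and the suffixless token */

enum client_api {
   API_OPENGL,
   API_OPENGL_ES
};

/* Decide whether a reset-notification-strategy attribute is accepted. */
static bool
reset_strategy_attrib_allowed(int attrib, enum client_api api, bool egl_1_5)
{
   switch (attrib) {
   case RESET_STRATEGY_EXT:
      /* EXT token: ES contexts only. */
      return api == API_OPENGL_ES;
   case RESET_STRATEGY_CORE:
      /* KHR rules allow GL only; EGL 1.5 allows both GL and ES. */
      return api == API_OPENGL || egl_1_5;
   default:
      return false;
   }
}
```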
5 years agoanv: Only consider minSampleShading when sampleShadingEnable is set
Jason Ekstrand [Fri, 17 May 2019 18:04:24 +0000 (13:04 -0500)]
anv: Only consider minSampleShading when sampleShadingEnable is set

From the Vulkan 1.1.107 spec:

    Sample shading is enabled for a graphics pipeline:

      - If the interface of the fragment shader entry point of the
        graphics pipeline includes an input variable decorated with
        SampleId or SamplePosition. In this case minSampleShadingFactor
        takes the value 1.0.

      - Else if the sampleShadingEnable member of the
        VkPipelineMultisampleStateCreateInfo structure specified when
        creating the graphics pipeline is set to VK_TRUE. In this case
        minSampleShadingFactor takes the value of
        VkPipelineMultisampleStateCreateInfo::minSampleShading.

    Otherwise, sample shading is considered disabled.

In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
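
The quoted spec text maps onto a small decision function. The sketch below
uses hypothetical names and a plain struct rather than anv's pipeline
types; it only encodes the three cases from the spec excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: whether sample shading is on, and the factor. */
struct sample_shading {
   bool enabled;
   float min_factor;
};

static struct sample_shading
resolve_sample_shading(bool fs_reads_sample_id_or_pos,
                       bool sample_shading_enable,
                       float min_sample_shading)
{
   struct sample_shading result = { false, 0.0f };

   if (fs_reads_sample_id_or_pos) {
      /* SampleId/SamplePosition input forces a factor of 1.0. */
      result.enabled = true;
      result.min_factor = 1.0f;
   } else if (sample_shading_enable) {
      /* Only now is the application's minSampleShading consulted. */
      result.enabled = true;
      result.min_factor = min_sample_shading;
   }

   /* Otherwise disabled; min_sample_shading is ignored. */
   return result;
}
```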
5 years agoanv: Stop forcing bindless for images
Jason Ekstrand [Fri, 17 May 2019 17:08:46 +0000 (12:08 -0500)]
anv: Stop forcing bindless for images

This was an unintended artifact of my testing of bindless images.  We
should be choosing bindless or not dynamically.

Fixes: c0d9926df7d "anv: Use bindless handles for images"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agodraw: fix memory leak introduced 7720ce32a
Neha Bhende [Thu, 16 May 2019 21:46:00 +0000 (15:46 -0600)]
draw: fix memory leak introduced 7720ce32a

We need to free the PrimitiveOffsets allocation in draw_gs_destroy().
This fixes a memory leak found while running piglit on Windows.

Fixes: 7720ce32a ("draw: add support to tgsi paths for geometry streams. (v2)")
Tested with piglit

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoanv: Emulate texture swizzle in the shader when needed
Jason Ekstrand [Fri, 17 May 2019 15:04:58 +0000 (10:04 -0500)]
anv: Emulate texture swizzle in the shader when needed

Now that we have the descriptor buffer mechanism, emulated texture
swizzle can be implemented in a very non-invasive way.  Previous
attempts all tried to extend the push constant based image param
mechanism which was gross.  This could, in theory, be done much faster
with a magic back-end instruction which does indirect MOVs but Vulkan on
IVB is already so slow this isn't going to matter much.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355
Cc: "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agopanfrost/midgard: Typofix
Alyssa Rosenzweig [Fri, 17 May 2019 14:56:23 +0000 (14:56 +0000)]
panfrost/midgard: Typofix

Reported-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agogitlab-ci: build-test the tools as well
Eric Engestrom [Tue, 12 Mar 2019 10:25:54 +0000 (10:25 +0000)]
gitlab-ci: build-test the tools as well

Suggested-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoradv: add a workaround for Monster Hunter World and LLVM 7&8
Samuel Pitoiset [Tue, 7 May 2019 14:09:46 +0000 (16:09 +0200)]
radv: add a workaround for Monster Hunter World and LLVM 7&8

The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that is affected by this
issue.

This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agosvga: Add an environment variable to force coherent surface memory
Thomas Hellstrom [Thu, 4 Apr 2019 08:58:19 +0000 (10:58 +0200)]
svga: Add an environment variable to force coherent surface memory

The vmwgfx driver supports emulated coherent surface memory as of version
2.16. Add an environment variable to enable this functionality for
texture- and buffer maps: SVGA_FORCE_COHERENT.
This environment variable should be used for testing only.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
5 years agopipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags
Thomas Hellstrom [Fri, 10 May 2019 11:45:19 +0000 (13:45 +0200)]
pipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags

In order to be able to add access modes to a pb_validate_entry, update
the pb_validate_add_buffer function to take a pointer hash table and also
to return whether the buffer was already on the validate list.

Update the svga winsys accordingly.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
5 years agosvga: Set the rendered-to flag for dma transfers to surfaces
Thomas Hellstrom [Wed, 8 May 2019 14:26:27 +0000 (16:26 +0200)]
svga: Set the rendered-to flag for dma transfers to surfaces

The rendered-to flag indicates that the HW surface content is more recent
than the content of the mob. That's the case after a SurfaceDMA transfer
to the surface.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
5 years agowinsys/svga: Fix RELOC_INTERNAL mob GPU access
Thomas Hellstrom [Wed, 8 May 2019 13:57:09 +0000 (15:57 +0200)]
winsys/svga: Fix RELOC_INTERNAL mob GPU access

SVGA_RELOC_INTERNAL indicates a transfer between surface and backing mob.
This means that if the GPU for example reads from the surface it writes
to the backing mob. But since the buffer mapping code allows for
simultaneous gpu- and cpu read access, a read from the surface to the mob
will not synchronize a subsequent map to the readback.

Fix this by inverting the mob access mode in a surface relocation with
SVGA_RELOC_INTERNAL set.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
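
The inversion can be illustrated with a short sketch. The flag names and
values are hypothetical stand-ins, not the winsys definitions; the sketch
assumes only what the message says, that an internal relocation moves data
between the surface and its backing mob, so a surface read is a mob write
and vice versa.

```c
#include <assert.h>

/* Hypothetical relocation flags for illustration. */
#define RELOC_WRITE    0x1u
#define RELOC_READ     0x2u
#define RELOC_INTERNAL 0x4u

/* Derive the mob's access mode from a surface relocation's flags. */
static unsigned
mob_access_flags(unsigned surface_flags)
{
   unsigned access = surface_flags & (RELOC_READ | RELOC_WRITE);
   unsigned inverted = 0;

   if (!(surface_flags & RELOC_INTERNAL))
      return access;

   /* Surface <-> mob transfer: swap the read and write bits. */
   if (access & RELOC_READ)
      inverted |= RELOC_WRITE;
   if (access & RELOC_WRITE)
      inverted |= RELOC_READ;

   return inverted;
}
```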
5 years agosvga: Remove the surface_invalidate winsys function
Thomas Hellstrom [Wed, 8 May 2019 13:50:18 +0000 (15:50 +0200)]
svga: Remove the surface_invalidate winsys function

Instead unconditionally call SVGA3D_InvalidateGBSurface() since it's needed
also for Linux for dirty buffers and operation without SurfaceDMA.
For non-guest-backed operation, remove the surface cache surface invalidation
altogether.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
5 years agoRevert "softpipe/buffer: load only as many components as the the buffer resource...
Gert Wollny [Thu, 16 May 2019 12:48:50 +0000 (14:48 +0200)]
Revert "softpipe/buffer: load only as many components as the the buffer resource type provides"

This reverts commit 865b9ddae4874186182e529b5fd154ab04a61f79.

The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only
one component would be supported. The original issue is still relevant, but
the fix should be different.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoglsl/nir: init non-static class member.
Dave Airlie [Fri, 17 May 2019 02:20:19 +0000 (12:20 +1000)]
glsl/nir: init non-static class member.

glsl_to_nir.cpp:276: uninit_member: Non-static class member "sig" is not initialized in this constructor nor in any functions that it calls.

Reported by coverity

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agoimgui: fix undefined behaviour bitshift.
Dave Airlie [Fri, 17 May 2019 01:26:57 +0000 (11:26 +1000)]
imgui: fix undefined behaviour bitshift.

imgui_draw.cpp:1781: error[shiftTooManyBitsSigned]: Shifting signed 32-bit value by 31 bits is undefined behaviour

Reported by coverity

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
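
As an illustration of this class of bug (not the actual imgui change, which
is not shown here): in C and C++ the literal 1 has type int, so shifting it
left by 31 bits overflows a signed 32-bit value. A common remedy, assumed
here, is to do the shift on an unsigned operand. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Undefined behaviour on platforms with 32-bit int:
 *    uint32_t bad = 1 << 31;
 * The shift happens in signed int before the conversion. */

/* Well defined: the operand is unsigned before the shift. */
static uint32_t
bit_u32(unsigned n)
{
   assert(n < 32);
   return (uint32_t)1 << n;
}
```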
5 years agoglsl: init non-static class member in link uniforms. (v2)
Dave Airlie [Fri, 17 May 2019 01:25:48 +0000 (11:25 +1000)]
glsl: init non-static class member in link uniforms. (v2)

link_uniforms.cpp:477: uninit_member: Non-static class member "shader_storage_blocks_write_access" is not initialized in this constructor nor in any functions that it calls.

Reported by coverity.

v2: fix 9->0 typo (Ilia)

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agoglsl: init packed in more constructors.
Dave Airlie [Fri, 17 May 2019 01:23:36 +0000 (11:23 +1000)]
glsl: init packed in more constructors.

src/compiler/glsl_types.cpp:577: uninit_member: Non-static class member "packed" is not initialized in this constructor nor in any functions that it calls.

from Coverity.

Fixes: 659f333b3a4 (glsl: add packed for struct types)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agopanfrost: Cleanup leak todos
Alyssa Rosenzweig [Fri, 17 May 2019 00:14:49 +0000 (00:14 +0000)]
panfrost: Cleanup leak todos

Many of these are now patched; one of them we patch here. Regardless,
this is one less thing to worry about in the code, I suppose.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: assert(0) -> unreachable for some switch
Alyssa Rosenzweig [Thu, 16 May 2019 23:42:33 +0000 (23:42 +0000)]
panfrost: assert(0) -> unreachable for some switch

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agoanv: Fix some depth buffer sampling cases on ICL+
Nanley Chery [Tue, 30 Apr 2019 21:49:10 +0000 (14:49 -0700)]
anv: Fix some depth buffer sampling cases on ICL+

Don't attempt sampling with HiZ if the sampler lacks support for it. On
ICL, the HW docs state that sampling with HiZ is not supported and that
instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be
interpreted as AUX_NONE.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agonir: Only convert SSA values to regs when needed
Caio Marcelo de Oliveira Filho [Tue, 7 May 2019 08:49:42 +0000 (01:49 -0700)]
nir: Only convert SSA values to regs when needed

If the SSA def produced by this instruction is only in the block in
which it is defined and is not used by ifs or phis, then we don't have
a reason to convert it to a register in
nir_lower_ssa_defs_to_regs_block().

The special case for derefs is covered by the general case, so can be
removed: at this point all derefs in the block are
materialized (i.e. the whole deref chain is in the block) and derefs
are not used in phis.

v2: Fix wrong check for if_uses.  If there's such a use, the def is
    not "local_to_block".  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agost/mesa: Record samplers for extra planes in info->textures_used.
Kenneth Graunke [Wed, 15 May 2019 20:58:33 +0000 (13:58 -0700)]
st/mesa: Record samplers for extra planes in info->textures_used.

Normally gl_nir_lower_samplers_as_deref records info->textures_used
for us, but this pass runs after that, attempting to assign samplers
in the same order as st_atom_texture's external_samplers_used loop
so the stars align and we get the same locations.

Since we're adding textures late, we need to amend info->textures_used.

iris uses info->textures_used to set up texture bindings; this fixes
Piglit's ext_image_dma_buf_import-sample-{nv12,yuv420,yvu420} there.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agonir: Fix nir_opt_idiv_const when negatives are involved
Caio Marcelo de Oliveira Filho [Sat, 11 May 2019 07:15:41 +0000 (00:15 -0700)]
nir: Fix nir_opt_idiv_const when negatives are involved

First, allow the case for negative powers of two.  Then ensure that we
use the absolute value of the non-constant value to calculate the
quotient -- this was hinted in the code by the name 'uq'.

This fixes an issue when 'd' is positive and 'n' is negative.  The
ishr will propagate the negative sign and we'll use nir_ineg() again,
incorrectly.

v2: First version used only ishr, but that isn't sufficient, since it
    never can produce a zero as a result.  (Jason)
    Allow negative powers of two.  (Caio)

Fixes: 74492ebad94 "nir: Add a pass for lowering integer division by constants"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
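
The approach can be sketched in scalar C. This is an illustration of
dividing by a constant power of two via absolute values, not the NIR
builder code; the function names are hypothetical and INT32_MIN / -1 is
assumed to be excluded by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value as unsigned, safe for INT32_MIN. */
static uint32_t
abs_u32(int32_t v)
{
   return v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
}

/* Truncating signed division n / d where |d| is a power of two. */
static int32_t
idiv_by_pow2(int32_t n, int32_t d)
{
   uint32_t abs_d = abs_u32(d);
   unsigned shift = 0;
   uint32_t uq;
   int64_t q;

   assert(abs_d != 0 && (abs_d & (abs_d - 1)) == 0);
   while ((abs_d >> shift) != 1)
      shift++;

   /* Unsigned quotient of the absolute values; can reach zero, which an
    * arithmetic shift of a negative n never does. */
   uq = abs_u32(n) >> shift;

   /* Negate exactly once, when the operand signs differ. */
   q = ((n < 0) != (d < 0)) ? -(int64_t)uq : (int64_t)uq;
   return (int32_t)q;
}
```

Shifting the absolute value rather than n itself is what lets -1 / 2 come
out as 0, matching C's truncating division.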
5 years agofreedreno: Log the number of loops in the shader for shader-db.
Eric Anholt [Tue, 14 May 2019 23:24:33 +0000 (16:24 -0700)]
freedreno: Log the number of loops in the shader for shader-db.

shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and exclude other stats like instruction
count for that shader, since they won't be useful as a proxy for real
world performance in that case.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agofreedreno: Output the same shader-db format as v3d and intel.
Eric Anholt [Tue, 14 May 2019 23:02:17 +0000 (16:02 -0700)]
freedreno: Output the same shader-db format as v3d and intel.

This lets us reuse their report.py, at the expense of fd-report.py no
longer working.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agofreedreno: Remove the ir3_tgsi_to_nir() helper function.
Eric Anholt [Tue, 14 May 2019 00:06:47 +0000 (17:06 -0700)]
freedreno: Remove the ir3_tgsi_to_nir() helper function.

It was more of a hindrance, as it pretended that we could compile in the
driver with a missing screen.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agofreedreno: Fix assertion failures in context setup in shader-db mode.
Eric Anholt [Mon, 13 May 2019 23:58:51 +0000 (16:58 -0700)]
freedreno: Fix assertion failures in context setup in shader-db mode.

The TTN path needs access to the screen to make the right decisions about
lowering, but we didn't have pctx->screen set up at fdN_prog_init time.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agoac: match radeonsi code in ac_shader_binary_read_config
Marek Olšák [Thu, 9 May 2019 00:13:17 +0000 (20:13 -0400)]
ac: match radeonsi code in ac_shader_binary_read_config

5 years agor600+radeonsi: use ctx_query_reset_status on radeon
Marek Olšák [Thu, 9 May 2019 00:49:58 +0000 (20:49 -0400)]
r600+radeonsi: use ctx_query_reset_status on radeon

This allows a nice cleanup, because the winsys always handles it.

5 years agowinsys/radeon: implement ctx_query_reset_status by copying radeonsi
Marek Olšák [Thu, 9 May 2019 00:45:26 +0000 (20:45 -0400)]
winsys/radeon: implement ctx_query_reset_status by copying radeonsi

To make it behave like amdgpu. I'm just trying to move this out of
radeonsi. The radeonsi code will be removed in the next commit.

5 years agowinsys/amdgpu: report a CS rejection as a reset only if there's no GPU reset
Marek Olšák [Thu, 9 May 2019 00:13:46 +0000 (20:13 -0400)]
winsys/amdgpu: report a CS rejection as a reset only if there's no GPU reset

5 years agoradeonsi: update buffer descriptors in all contexts after buffer invalidation
Marek Olšák [Fri, 10 May 2019 05:14:07 +0000 (01:14 -0400)]
radeonsi: update buffer descriptors in all contexts after buffer invalidation

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108824

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
5 years agoradeonsi: remove old_va parameter from si_rebind_buffer by remembering offsets
Marek Olšák [Fri, 10 May 2019 04:40:19 +0000 (00:40 -0400)]
radeonsi: remove old_va parameter from si_rebind_buffer by remembering offsets

This is a prerequisite for the next commit.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
5 years agoradeonsi: compute culling - flush CS to remove write references to buffers
Marek Olšák [Tue, 26 Feb 2019 21:13:08 +0000 (16:13 -0500)]
radeonsi: compute culling - flush CS to remove write references to buffers

Only read-only buffers can use compute culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: invalidate caches at the beginning of the prim discard compute IB
Marek Olšák [Tue, 26 Feb 2019 03:53:37 +0000 (22:53 -0500)]
radeonsi: invalidate caches at the beginning of the prim discard compute IB

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: disable primitive restart for triangles for DiRT Rally
Marek Olšák [Wed, 20 Feb 2019 16:42:05 +0000 (11:42 -0500)]
radeonsi: disable primitive restart for triangles for DiRT Rally

It may decrease performance and it prevents compute-based primitive culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: add primitive culling stats to the HUD
Marek Olšák [Wed, 20 Feb 2019 04:27:16 +0000 (23:27 -0500)]
radeonsi: add primitive culling stats to the HUD

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: cull primitives with async compute for large draw calls
Marek Olšák [Tue, 14 Aug 2018 06:01:18 +0000 (02:01 -0400)]
radeonsi: cull primitives with async compute for large draw calls

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agowinsys/amdgpu: add REWIND emulation via INDIRECT_BUFFER into cs_check_space
Marek Olšák [Thu, 4 Apr 2019 14:02:27 +0000 (10:02 -0400)]
winsys/amdgpu: add REWIND emulation via INDIRECT_BUFFER into cs_check_space

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1
Marek Olšák [Tue, 12 Feb 2019 20:26:41 +0000 (15:26 -0500)]
radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1

The prim discard compute shader bakes InstanceID into the output index buffer.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: make some functions non-static
Marek Olšák [Tue, 12 Feb 2019 19:49:55 +0000 (14:49 -0500)]
radeonsi: make some functions non-static

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: allow si_shader_select_with_key to return an optimized shader or fail
Marek Olšák [Tue, 12 Feb 2019 19:38:31 +0000 (14:38 -0500)]
radeonsi: allow si_shader_select_with_key to return an optimized shader or fail

If a prim discard compute shader hasn't finished compilation, we don't want
to return any shader.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: use pipe_draw_info::instance_count indirectly
Marek Olšák [Sat, 22 Sep 2018 05:32:20 +0000 (01:32 -0400)]
radeonsi: use pipe_draw_info::instance_count indirectly

It will be modified by compute shader culling.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: use pipe_draw_info::prim and primitive_restart indirectly
Marek Olšák [Fri, 31 Aug 2018 02:15:13 +0000 (22:15 -0400)]
radeonsi: use pipe_draw_info::prim and primitive_restart indirectly

so that the fields can be changed by the driver.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: make functions for creating LLVM functions non-static
Marek Olšák [Thu, 16 Aug 2018 01:41:52 +0000 (21:41 -0400)]
radeonsi: make functions for creating LLVM functions non-static

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agowinsys/amdgpu: add a parallel compute IB coupled with a gfx IB
Marek Olšák [Mon, 4 Feb 2019 22:48:04 +0000 (17:48 -0500)]
winsys/amdgpu: add a parallel compute IB coupled with a gfx IB

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoac: add LLVM code for triangle culling
Marek Olšák [Wed, 13 Feb 2019 02:02:04 +0000 (21:02 -0500)]
ac: add LLVM code for triangle culling

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: add a cs parameter into si_cp_copy_data
Marek Olšák [Tue, 12 Feb 2019 20:03:13 +0000 (15:03 -0500)]
radeonsi: add a cs parameter into si_cp_copy_data

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: add a cs parameter into si_cp_release_mem
Marek Olšák [Tue, 22 Jan 2019 22:22:18 +0000 (17:22 -0500)]
radeonsi: add a cs parameter into si_cp_release_mem

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits
Marek Olšák [Tue, 22 Jan 2019 22:18:01 +0000 (17:18 -0500)]
radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: move si_*_descriptors_idx functions into si_state.h
Marek Olšák [Thu, 16 Aug 2018 01:39:52 +0000 (21:39 -0400)]
radeonsi: move si_*_descriptors_idx functions into si_state.h

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: make si_initialize_compute reusable
Marek Olšák [Thu, 16 Aug 2018 01:36:14 +0000 (21:36 -0400)]
radeonsi: make si_initialize_compute reusable

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper
Marek Olšák [Thu, 16 Aug 2018 01:29:31 +0000 (21:29 -0400)]
radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoradeonsi: return the last part's return value from @wrapper
Marek Olšák [Mon, 13 Aug 2018 23:11:55 +0000 (19:11 -0400)]
radeonsi: return the last part's return value from @wrapper

The primitive discard compute shader will get the position output this way.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agowinsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources
Marek Olšák [Tue, 23 Apr 2019 19:24:33 +0000 (15:24 -0400)]
winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years agoswr: clean up supported OGL4.0/4.1 extensions list
Jan Zielinski [Wed, 15 May 2019 15:04:15 +0000 (17:04 +0200)]
swr: clean up supported OGL4.0/4.1 extensions list

This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:

GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.

Reviewed-by: Alok Hota <alok.hota@intel.com>
5 years agovl/dri3: set back buffer from output to NULL with front buffer case
Leo Liu [Thu, 16 May 2019 14:24:01 +0000 (10:24 -0400)]
vl/dri3: set back buffer from output to NULL with front buffer case

Since the output optimization is only used for the back buffer case.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodocs: advice to resolve discussion on gitlab MR doc
Alejandro Piñeiro [Thu, 16 May 2019 09:32:23 +0000 (11:32 +0200)]
docs: advice to resolve discussion on gitlab MR doc

For newcomers to gitlab, it is not evident that it is better to press
the "Resolve Discussion" button when you update your branch handling
feedback.

v2:
   * Fix several grammar nits, reorder, use new corrected text (Connor
     Abbott)
   * Use "reviewers", instead of reviewer (Eric Engestrom)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoauxiliary/draw: fix crash with zero-stride draw auto
Roland Scheidegger [Wed, 15 May 2019 18:35:21 +0000 (20:35 +0200)]
auxiliary/draw: fix crash with zero-stride draw auto

Transform feedback draws get the number of vertices from the transform
feedback object. In draw, we compute this as the number of bytes
written divided by the stride. However, it is apparently possible to end
up with a stride of 0 there (not entirely sure it could happen with GL),
probably when nothing was ever actually written (so no stride has been
set). Just avoid the division by zero by setting the count to 0.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
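
The guard described in "auxiliary/draw: fix crash with zero-stride draw auto"
can be sketched roughly as below. This is only an illustration of the idea,
not the actual draw module code: the function name and parameters are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not the real draw module code): derive the vertex
 * count for a "draw auto" from the bytes written by transform feedback. */
static inline uint32_t
vertex_count_from_stream_output(uint32_t bytes_written, uint32_t stride)
{
   /* A stride of 0 presumably means nothing was ever written, so there
    * are no vertices to draw; this also avoids the division by zero. */
   if (stride == 0)
      return 0;

   return bytes_written / stride;
}
```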
5 years agoutil/os_file: always use the 'grow' mechanism
Eric Engestrom [Wed, 1 May 2019 10:44:16 +0000 (11:44 +0100)]
util/os_file: always use the 'grow' mechanism

Use fstat() only to pre-allocate a big enough buffer.

This fixes a race: if the file grows between fstat() and read(), we
would miss the end of the file, and if the file shrinks, read() would
just fail.

Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
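
A minimal sketch of the approach described in "util/os_file: always use the
'grow' mechanism", assuming a POSIX system. It is not Mesa's os_read_file();
read_whole_fd() and its initial capacity are made up for illustration. The
fstat() result is used only as a size hint, and the loop keeps reading (and
growing the buffer) until read() reports end of file.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of fd into a NUL-terminated heap buffer.
 * Returns NULL on error; the caller frees the result. */
static char *
read_whole_fd(int fd, size_t *out_len)
{
   size_t cap = 4096;   /* fallback initial capacity (arbitrary) */
   struct stat st;

   /* fstat() is only a hint for the initial allocation. */
   if (fstat(fd, &st) == 0 && st.st_size > 0)
      cap = (size_t)st.st_size + 1;

   char *buf = malloc(cap);
   if (!buf)
      return NULL;

   size_t len = 0;
   for (;;) {
      /* Grow when full, always keeping one byte free for the NUL. */
      if (len + 1 >= cap) {
         size_t new_cap = cap * 2;
         char *tmp = realloc(buf, new_cap);
         if (!tmp) {
            free(buf);
            return NULL;
         }
         buf = tmp;
         cap = new_cap;
      }

      ssize_t n = read(fd, buf + len, cap - len - 1);
      if (n < 0) {
         if (errno == EINTR)
            continue;
         free(buf);
         return NULL;
      }
      if (n == 0)
         break;   /* end of file, regardless of what fstat() said */
      len += (size_t)n;
   }

   buf[len] = '\0';
   if (out_len)
      *out_len = len;
   return buf;
}
```

Because the loop ends only when read() returns 0, a file that grows after
fstat() is still read to its end, and a file that shrinks simply yields
fewer bytes.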
5 years agonir: lower_non_uniform_access: iterate over instructions safely
Lionel Landwerlin [Wed, 15 May 2019 22:02:51 +0000 (23:02 +0100)]
nir: lower_non_uniform_access: iterate over instructions safely

This pass moves instructions around and adds control-flow in the
middle of blocks. We need to use nir_foreach_instr_safe to ensure that
we iterate over instructions correctly anyway.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
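
The idea behind a "safe" iterator, as used in "nir: lower_non_uniform_access:
iterate over instructions safely", can be illustrated with a generic linked
list. The sketch below does not use NIR; struct node, list_for_each_safe()
and remove_odd() are hypothetical names. The point is that the successor is
cached before the loop body runs, so the body may unlink the current element.

```c
#include <stdlib.h>

struct node {
   int value;
   struct node *next;
};

/* Safe iteration: cache the successor in tmp before the body runs, so the
 * body may unlink or free the current node. */
#define list_for_each_safe(pos, tmp, head) \
   for ((pos) = (head), (tmp) = (pos) ? (pos)->next : NULL; \
        (pos) != NULL; \
        (pos) = (tmp), (tmp) = (pos) ? (pos)->next : NULL)

/* Remove every node with an odd value; returns the new head. */
static struct node *
remove_odd(struct node *head)
{
   struct node *pos, *tmp;
   struct node **link = &head;   /* pointer to the link that reaches pos */

   list_for_each_safe(pos, tmp, head) {
      if (pos->value % 2 != 0) {
         *link = pos->next;
         free(pos);   /* pos is gone; tmp still points at the successor */
      } else {
         link = &pos->next;
      }
   }
   return head;
}
```

A plain loop that advanced with pos = pos->next after the body would read
freed memory here.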
5 years agoiris: Dodge more GLSL IR lowering
Kenneth Graunke [Sun, 5 May 2019 09:39:23 +0000 (02:39 -0700)]
iris: Dodge more GLSL IR lowering

This avoids some lower_instructions bits in st.

5 years agointel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks
Jason Ekstrand [Wed, 15 May 2019 17:06:38 +0000 (12:06 -0500)]
intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks

For a block with a contiguous chunk of 32 vars that don't need updating,
this lets us skip 32 vars at a time. Also, by using bitscan, we only
iterate for each set bit rather than testing them all one at a time.
Looking at perf (with -O0 which is unfortunately necessary to get
reasonable back-traces), this seems to cut about 50-60% of the time
spent in compute_start_end() which is itself about 4-6% of the
run-time. In the real world, with a release driver build, this cuts
1.34% off a full shader-db run. (I ran shader-db 5 times in each
configuration).

Reviewed-by: Matt Turner <mattst88@gmail.com>
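
The word-at-a-time technique from "intel/fs/live_variables: Do
compute_start_end in BITSET_WORD chunks" might look roughly like the sketch
below. It is not the actual compute_start_end() code: collect_set_bits() is
a hypothetical function, the word size is assumed to be 32 bits, and
__builtin_ctz assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32u

/* Visit every set bit in a bitset stored as 32-bit words, writing the bit
 * indices to out[] (which must have room for them). Returns the count. */
static size_t
collect_set_bits(const uint32_t *words, size_t num_words, unsigned *out)
{
   size_t count = 0;

   for (size_t w = 0; w < num_words; w++) {
      uint32_t bits = words[w];

      /* An all-zero word costs one comparison instead of 32 bit tests:
       * the loop below simply does not execute. */
      while (bits) {
         unsigned b = (unsigned)__builtin_ctz(bits);   /* lowest set bit */
         out[count++] = (unsigned)(w * BITS_PER_WORD) + b;
         bits &= bits - 1;                             /* clear that bit */
      }
   }
   return count;
}
```

The inner loop runs once per set bit rather than once per bit position,
which is the saving the commit message describes.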
5 years agointel/fs/ra: Choose a spill reg before throwing away the graph
Jason Ekstrand [Wed, 15 May 2019 03:51:20 +0000 (22:51 -0500)]
intel/fs/ra: Choose a spill reg before throwing away the graph

Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d4 "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agointel/fs/ra: Add spill costs to the graph on-demand
Jason Ekstrand [Wed, 15 May 2019 04:03:29 +0000 (23:03 -0500)]
intel/fs/ra: Add spill costs to the graph on-demand

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agointel/fs/ra: Add a helper for discarding the interference graph
Jason Ekstrand [Wed, 15 May 2019 04:02:42 +0000 (23:02 -0500)]
intel/fs/ra: Add a helper for discarding the interference graph

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir/algebraic: Remove problematic "optimization"
Alyssa Rosenzweig [Wed, 15 May 2019 05:03:19 +0000 (05:03 +0000)]
nir/algebraic: Remove problematic "optimization"

This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.

No shader-db changes on Intel platforms (Jason).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>