mesa.git
4 years ago radv: enable VK_EXT_sample_locations
Samuel Pitoiset [Thu, 30 May 2019 07:58:01 +0000 (09:58 +0200)]
radv: enable VK_EXT_sample_locations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: enable HTILE for images that might need variable sample locations
Samuel Pitoiset [Thu, 30 May 2019 08:26:43 +0000 (10:26 +0200)]
radv: enable HTILE for images that might need variable sample locations

This is now supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: handle sample locations during automatic layout transitions
Samuel Pitoiset [Thu, 30 May 2019 10:27:29 +0000 (12:27 +0200)]
radv: handle sample locations during automatic layout transitions

From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. [...] and
    VkRenderPassSampleLocationsBeginInfoEXT can be chained from
    VkRenderPassBeginInfo to provide sample locations for layout
    transitions performed implicitly by a render pass instance."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: determine the first subpass id for every attachment
Samuel Pitoiset [Thu, 30 May 2019 12:10:42 +0000 (14:10 +0200)]
radv: determine the first subpass id for every attachment

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: handle sample locations during explicit depth/stencil transitions
Samuel Pitoiset [Thu, 30 May 2019 10:23:21 +0000 (12:23 +0200)]
radv: handle sample locations during explicit depth/stencil transitions

From the Vulkan spec 1.1.109,

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. VkSampleLocationsInfoEXT
    can be chained from VkImageMemoryBarrier structures to provide
    sample locations for layout transitions performed by
    vkCmdWaitEvents and vkCmdPipelineBarrier calls."

This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: allow the depth decompress pass to emit dynamic sample locations
Samuel Pitoiset [Thu, 30 May 2019 10:20:12 +0000 (12:20 +0200)]
radv: allow the depth decompress pass to emit dynamic sample locations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: allow to set dynamic sample locations to the depth decompress pass
Samuel Pitoiset [Thu, 30 May 2019 09:52:56 +0000 (11:52 +0200)]
radv: allow to set dynamic sample locations to the depth decompress pass

If VK_EXT_sample_locations is used, the driver might need to emit
the sample locations specified during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: allow to save/restore sample locations during meta operations
Samuel Pitoiset [Thu, 30 May 2019 09:50:22 +0000 (11:50 +0200)]
radv: allow to save/restore sample locations during meta operations

This will be used for the depth decompress pass that might need
to emit variable sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago iris: Sweep the NIR in iris_create_uncompiled_shader().
Kenneth Graunke [Fri, 7 Jun 2019 07:57:25 +0000 (00:57 -0700)]
iris: Sweep the NIR in iris_create_uncompiled_shader().

We run a ton of backend-specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.

4 years ago ir3: Use the new NIR lowering pass for integer multiplication
Eduardo Lima Mitev [Sun, 12 May 2019 22:33:57 +0000 (00:33 +0200)]
ir3: Use the new NIR lowering pass for integer multiplication

Shader-db stats courtesy of Eric Anholt:

total instructions in shared programs: 6480215 -> 6475457 (-0.07%)
instructions in affected programs: 662105 -> 657347 (-0.72%)
helped: 1209
HURT: 13
total constlen in shared programs: 1432704 -> 1427769 (-0.34%)
constlen in affected programs: 100063 -> 95128 (-4.93%)
helped: 512
HURT: 0
total max_sun in shared programs: 875561 -> 873387 (-0.25%)
max_sun in affected programs: 46179 -> 44005 (-4.71%)
helped: 1087
HURT: 0
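
The percentages in these shader-db reports are relative changes between the
"before" and "after" totals. A minimal sketch of that arithmetic, assuming the
usual (after - before) / before definition; percent_change() is a hypothetical
helper and not part of shader-db:

```c
#include <assert.h>
#include <stdint.h>

/* Relative change in percent between two shader-db totals.
 * Negative values mean the "after" count is smaller (an improvement). */
static double percent_change(uint64_t before, uint64_t after)
{
   assert(before != 0);
   return ((double)after - (double)before) / (double)before * 100.0;
}
```

For example, percent_change(662105, 657347) is roughly -0.72, matching the
"instructions in affected programs" line above.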

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago ir3/nir: Add new NIR AlgebraicPass for lowering imul
Eduardo Lima Mitev [Sun, 12 May 2019 22:23:58 +0000 (00:23 +0200)]
ir3/nir: Add new NIR AlgebraicPass for lowering imul

Currently, the ir3 backend compiler lowers integer multiplication from:

dst = a * b

to:

dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

mull.u tmp0, a, b           ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to perform this
lowering during NIR algebraic optimization passes, giving it a better
chance of optimizing the resulting code.
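
The decomposition can be checked with a small C model. This is only a sketch:
the function names mirror the opcode names used in this series, but the bodies
are an assumed unsigned 32-bit model of mull.u and madsh.m16, not the NIR or
ir3 implementation. The ah * bh term never appears because it is shifted out
of the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* al * bl: product of the low 16-bit halves (models mull.u). */
static uint32_t umul_low(uint32_t a, uint32_t b)
{
   return (a & 0xffff) * (b & 0xffff);
}

/* (ah * bl << 16) + c: high half of a times low half of b, shifted,
 * plus an accumulator (models madsh.m16). */
static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
   return (((a >> 16) * (b & 0xffff)) << 16) + c;
}

/* dst = (al * bl) + (ah * bl << 16) + (al * bh << 16) */
static uint32_t lowered_imul(uint32_t a, uint32_t b)
{
   uint32_t tmp0 = umul_low(a, b);
   uint32_t tmp1 = imadsh_mix16(a, b, tmp0);
   return imadsh_mix16(b, a, tmp1);
}
```

For any 32-bit a and b, lowered_imul(a, b) should equal a * b truncated to
32 bits, which is what makes the three-instruction sequence a valid lowering.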

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
Eduardo Lima Mitev [Sun, 12 May 2019 22:09:38 +0000 (00:09 +0200)]
nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16

For umul_low (al * bl), zero is returned if the low 16-bit word of either
source is zero.

For imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the highest word of every component of
an integer NIR alu src is zero.

is_lower_half_zero() returns true if the lowest word of every component of
an integer NIR alu src is zero.
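
As a rough illustration of the two rules, here is a scalar C sketch. The names
upper_half_is_zero(), lower_half_is_zero(), fold_umul_low() and
fold_imadsh_mix16() are hypothetical; the real helpers operate on every
component of a NIR alu source rather than on a single 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool upper_half_is_zero(uint32_t v) { return (v >> 16) == 0; }
static bool lower_half_is_zero(uint32_t v) { return (v & 0xffff) == 0; }

/* umul_low(a, b) = al * bl, so it is 0 when either low half is 0. */
static uint32_t fold_umul_low(uint32_t a, uint32_t b)
{
   if (lower_half_is_zero(a) || lower_half_is_zero(b))
      return 0;
   return (a & 0xffff) * (b & 0xffff);
}

/* imadsh_mix16(a, b, c) = (ah * bl << 16) + c, so it is just c when
 * ah or bl is 0. */
static uint32_t fold_imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
   if (upper_half_is_zero(a) || lower_half_is_zero(b))
      return c;
   return (((a >> 16) * (b & 0xffff)) << 16) + c;
}
```

The early returns correspond to the algebraic rewrites; the fall-through paths
are the unoptimized opcode semantics under the same assumed model.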

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'
Eduardo Lima Mitev [Sun, 12 May 2019 19:12:59 +0000 (21:12 +0200)]
ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'

They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes
Eduardo Lima Mitev [Fri, 29 Mar 2019 09:49:12 +0000 (10:49 +0100)]
nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes

'umul_low' is the low 32-bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.

'imadsh_mix16' is multiply add with shift and mix, an ir3-specific
instruction that maps directly to ir3's MADSH_M16.

Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago v3d: don't emit point coordinates varyings if the FS doesn't read them
Iago Toral Quiroga [Thu, 6 Jun 2019 08:04:27 +0000 (10:04 +0200)]
v3d: don't emit point coordinates varyings if the FS doesn't read them

We still need to emit them in V3D 3.x since there is no mechanism to
disable them.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago v3d: add a helper to track variables that need point coordinates
Iago Toral Quiroga [Thu, 6 Jun 2019 07:41:33 +0000 (09:41 +0200)]
v3d: add a helper to track variables that need point coordinates

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago egl/x11: calloc dri2_surf so it's properly zeroed
Kenneth Graunke [Fri, 7 Jun 2019 05:17:06 +0000 (22:17 -0700)]
egl/x11: calloc dri2_surf so it's properly zeroed

Commit 2282ec0a refactored drawable creation across various platforms
into a new dri2_create_drawable helper function.

The GBM code in platform_drm.c passed in dri2_surf->gbm_surf as the
loaderPrivate, while most other backends passed in dri2_surf directly.

To try and handle this, the patch checked if dri2_surf->gbm_surf was
non-NULL, and if so, presumed that the caller is the DRM platform and
we should use the dri2_surf->gbm_surf pointer.

This worked for most platforms, which calloc their dri2_surf structure,
zeroing the data.  Unfortunately, platform_x11.c used malloc, leaving
most of the dri2_surf as garbage.  In particular, dri2_surf->gbm_surf
was often non-NULL, causing dri2_create_drawable to try and use it,
passing a garbage pointer to the createNewDrawable hook, usually leading
to a SIGBUS or SIGSEGV when trying to dereference that bad pointer.

Since most callers calloc the data, make platform_x11.c follow suit.
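
The failure mode can be sketched in plain C. Everything here is hypothetical
(struct demo_surf, demo_surf_create(), demo_loader_private()); it only mirrors
the shape of the check described above and is not the dri2 code. It also
assumes, as on all platforms Mesa targets, that all-bits-zero is a null
pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_surf {
   int width, height;
   void *gbm_surf;   /* only ever set by the (hypothetical) DRM platform */
};

/* calloc zeroes the allocation, so gbm_surf starts out NULL.  With
 * malloc the field would hold whatever bytes were already there. */
static struct demo_surf *demo_surf_create(void)
{
   return calloc(1, sizeof(struct demo_surf));
}

/* Mirrors the helper's decision: use gbm_surf if set, else the surface. */
static void *demo_loader_private(struct demo_surf *surf)
{
   return surf->gbm_surf ? surf->gbm_surf : (void *)surf;
}
```

With a zeroed structure the helper falls back to the surface itself; with an
uninitialized one, the non-NULL check can pass on garbage and hand a bad
pointer to the caller.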

Fixes crashes with i915_dri.so when running dEQP-GLES2.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
4 years ago tests/graw: use C99 print conversion specifier for 32 bit builds
Mark Janes [Thu, 6 Jun 2019 05:48:41 +0000 (22:48 -0700)]
tests/graw: use C99 print conversion specifier for 32 bit builds

Fixes formatting errors for 32 bit compilations, e.g.:

  error: format specifies type 'unsigned long' but the argument has
  type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]
  printf("result1 = %lu result2 = %lu\n", res1.u64, res2.u64);
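
A portable way to print a uint64_t is the PRIu64 macro from <inttypes.h>,
which expands to the right length modifier on both 32 and 64 bit builds. The
sketch below assumes that is the kind of specifier meant here; format_results()
is a hypothetical wrapper used only to make the output checkable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats two 64-bit results without assuming the width of 'long'. */
static int format_results(char *buf, size_t size, uint64_t res1, uint64_t res2)
{
   return snprintf(buf, size,
                   "result1 = %" PRIu64 " result2 = %" PRIu64 "\n",
                   res1, res2);
}
```

Because the macro is a string literal, it is concatenated with the rest of the
format string at compile time, so -Wformat can still check the arguments.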

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago panfrost/midgard: Fix crash with unused SSA values
Alyssa Rosenzweig [Thu, 6 Jun 2019 15:15:23 +0000 (08:15 -0700)]
panfrost/midgard: Fix crash with unused SSA values

Crash introduced in "b38dab101ca7e0896255dccbd85fd510c47d84d1" but not
adding a Fixes tag since it's our bug anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago panfrost: Report sRGB colorspace as not supported
Boris Brezillon [Thu, 6 Jun 2019 16:44:09 +0000 (18:44 +0200)]
panfrost: Report sRGB colorspace as not supported

The driver does not support sRGB yet, so let's report it as unsupported.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago docs: do not use div for line-breaking
Erik Faye-Lund [Thu, 6 Jun 2019 09:19:08 +0000 (11:19 +0200)]
docs: do not use div for line-breaking

HTML has the <p>-tag for this purpose. It adds some margins, but that
just makes this read better, IMO.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years ago docs: fixup code-tag positioning
Erik Faye-Lund [Thu, 6 Jun 2019 08:11:31 +0000 (10:11 +0200)]
docs: fixup code-tag positioning

This reads better if we include the asterisk in the code-block, as it's
part of the function-reference, even though it's not technically
speaking code. But as the <code>-tag isn't purely for code, this should
be fine.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years ago docs: add missing code-tags
Erik Faye-Lund [Thu, 6 Jun 2019 08:16:36 +0000 (10:16 +0200)]
docs: add missing code-tags

Looks like I missed a few cases when I recently added more code-tags
here. So let's add these cases as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years ago docs: add accidentally dropped "at"
Erik Faye-Lund [Thu, 6 Jun 2019 09:01:54 +0000 (11:01 +0200)]
docs: add accidentally dropped "at"

When rewriting 20c56e18c21 after review, I accidentally dropped the "at"
here. Sorry for that, and let's fix it up!

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 20c56e18c21 ("docs: use proper links instead of code-tags")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years ago anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op
Gurchetan Singh [Wed, 5 Jun 2019 16:51:07 +0000 (09:51 -0700)]
anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op

AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation-defined
flexible YUV format.  Most of the time, it's NV12 or YV12.
On Intel, NV12 is preferred since it can be used by the display
engine.

This API adds a dependency between gralloc and buffer consumers,
unfortunately.  Right now, the code seems to work for i915 gralloc,
but not cros_gralloc.  Add a preprocessor flag to fix this.

TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
4 years ago ac/nir: Remove stale TODO
Connor Abbott [Wed, 5 Jun 2019 14:54:24 +0000 (16:54 +0200)]
ac/nir: Remove stale TODO

While we're here, copy the comment explaining this from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radeonsi: Don't force dcc disable for loads
Connor Abbott [Wed, 5 Jun 2019 10:37:46 +0000 (12:37 +0200)]
radeonsi: Don't force dcc disable for loads

When e9d935ed0e2 added force_dcc_off(), we forced it off for any
preloaded image descriptor which had stores associated with them, since
the same preloaded descriptors were used for loads and stores. However,
when the preloading was removed in 16be87c9042, the existing logic was
kept despite it not being necessary anymore. The comment above
force_dcc_off() only mentions stores, so only force DCC off for stores.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago mesa/main: Expose EXT_clip_control and related enums and the function
Gert Wollny [Sat, 11 May 2019 15:48:18 +0000 (17:48 +0200)]
mesa/main: Expose EXT_clip_control and related enums and the function

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
4 years ago mapi/glapi/registry: Update gl.xml to latest upstream version
Gert Wollny [Sat, 11 May 2019 15:44:17 +0000 (17:44 +0200)]
mapi/glapi/registry: Update gl.xml to latest upstream version

The old copy didn't include EXT_clip_control, so update it.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
4 years ago virgl: Enable CAP_CLIP_HALFZ if host supports it
Gert Wollny [Tue, 7 May 2019 17:50:46 +0000 (19:50 +0200)]
virgl: Enable CAP_CLIP_HALFZ if host supports it

On hosts that support it, this makes the following piglit tests pass:
  arb_clip_control-*

v2: sync flag with host

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
4 years ago svga: Remove unnecessary check for the pre flush bit for setting vertex buffers
Charmaine Lee [Tue, 7 May 2019 21:07:50 +0000 (14:07 -0700)]
svga: Remove unnecessary check for the pre flush bit for setting vertex buffers

This fixes the missing rebind when the can_pre_flush bit
is not set and the vertex buffers are the same as those already sent.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
4 years ago winsys/svga/drm: Fix 32-bit RPCI send message
Deepak Rawat [Wed, 9 May 2018 22:50:39 +0000 (15:50 -0700)]
winsys/svga/drm: Fix 32-bit RPCI send message

Depending on whether the code is compiled with a frame pointer or not, the
temporary memory location used for the bp parameter in these macros is
referenced relative to the stack pointer or the frame pointer.
Hence we can never reference that parameter when we've modified either
the stack pointer or the frame pointer, because then the compiler would
generate an incorrect stack reference.

Fix this by pushing the temporary memory parameter on a known location on
the stack before modifying the stack- and frame pointers.

Also, in case of failure the RPCI channel was not closed, which led to vmx
running out of channels.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
4 years ago radv: set the subpass before any initial subpass transitions
Samuel Pitoiset [Thu, 30 May 2019 13:13:59 +0000 (15:13 +0200)]
radv: set the subpass before any initial subpass transitions

This might fix initial subpass transitions when multiview is used.
Noticed while implementing sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago anv: Fix check for isl_fmt in assert
Nataraj Deshpande [Wed, 5 Jun 2019 19:32:01 +0000 (12:32 -0700)]
anv: Fix check for isl_fmt in assert

The assert should check the returned isl_fmt value instead of the
format variable.

Fixes: f1654fa7e31 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
4 years ago v3d: fix scheduling dependency tracking for ALU with small immediates
Iago Toral Quiroga [Wed, 5 Jun 2019 06:53:10 +0000 (08:53 +0200)]
v3d: fix scheduling dependency tracking for ALU with small immediates

We were not accounting for small immediates in the B mux, so the scheduler
was interpreting these as regular register file accesses, which could
lead to additional (incorrect) write-read dependencies.

Shader-db changes:

total instructions in shared programs: 9163664 -> 9137263 (-0.29%)
instructions in affected programs: 3931035 -> 3904634 (-0.67%)
helped: 12457
HURT: 2563

total max-temps in shared programs: 1325787 -> 1325597 (-0.01%)
max-temps in affected programs: 5746 -> 5556 (-3.31%)
helped: 186
HURT: 16
helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1
helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88%
95% mean confidence interval for max-temps value: -1.04 -0.84
95% mean confidence interval for max-temps %-change: -4.16% -3.07%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago lima/ppir: add missing handling of min/max ops for vec4 add slot
Vasily Khoruzhick [Tue, 4 Jun 2019 15:56:38 +0000 (08:56 -0700)]
lima/ppir: add missing handling of min/max ops for vec4 add slot

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
4 years ago lima/ppir: fix crash when program uses no registers at all
Vasily Khoruzhick [Sat, 1 Jun 2019 05:30:54 +0000 (22:30 -0700)]
lima/ppir: fix crash when program uses no registers at all

A program may need no register allocation at all, e.g. when it consists
of a single discard op.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
4 years ago util/hash_table: Assert that keys are not reserved pointers
Jason Ekstrand [Wed, 5 Jun 2019 22:30:47 +0000 (17:30 -0500)]
util/hash_table: Assert that keys are not reserved pointers

If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The latter is highly unlikely but technically
possible so we should guard against it too.
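
A minimal sketch of such a guard, assuming an open-addressing table in which
NULL marks an empty slot and the address of a private variable marks a deleted
one. struct tiny_set, tiny_set_add() and key_is_reserved() are hypothetical
names for illustration, and removal is left out to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved pointers: NULL means "empty slot"; the address of this private
 * variable means "deleted slot".  Neither may be used as a real key. */
static const int deleted_key_value = 0;
static const void *const DELETED_KEY = &deleted_key_value;

#define TINY_SET_SIZE 8

struct tiny_set {
   const void *slots[TINY_SET_SIZE];
   unsigned entries;
};

static bool key_is_reserved(const void *key)
{
   return key == NULL || key == DELETED_KEY;
}

/* Adds key using linear probing; returns false only when the set is full. */
static bool tiny_set_add(struct tiny_set *set, const void *key)
{
   /* The guard: a reserved key would land in a slot that still looks
    * free, so the entry count would stop matching the contents. */
   assert(!key_is_reserved(key));

   size_t start = ((uintptr_t)key >> 2) % TINY_SET_SIZE;
   for (size_t n = 0; n < TINY_SET_SIZE; n++) {
      size_t i = (start + n) % TINY_SET_SIZE;
      if (set->slots[i] == key)
         return true;                 /* already present */
      if (key_is_reserved(set->slots[i])) {
         set->slots[i] = key;         /* claim the free slot */
         set->entries++;
         return true;
      }
   }
   return false;
}
```

Without the assert, adding NULL would bump entries while leaving the slot
indistinguishable from an empty one, which is the miscounting described above.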

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago util/set: Assert that keys are not reserved pointers
Jason Ekstrand [Wed, 5 Jun 2019 21:56:20 +0000 (16:56 -0500)]
util/set: Assert that keys are not reserved pointers

If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The latter is highly unlikely but technically
possible so we should guard against it too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago glsl/loop_analysis: Don't search for NULL variables in the hash table
Jason Ekstrand [Wed, 5 Jun 2019 23:35:14 +0000 (18:35 -0500)]
glsl/loop_analysis: Don't search for NULL variables in the hash table

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago nir/propagate_invariant: Don't add NULL vars to the hash table
Jason Ekstrand [Wed, 5 Jun 2019 21:54:40 +0000 (16:54 -0500)]
nir/propagate_invariant: Don't add NULL vars to the hash table

Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve...
Ian Romanick [Tue, 4 Jun 2019 19:16:55 +0000 (12:16 -0700)]
intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis

If the 2nd and 3rd source are both Boolean values, we can potentially
avoid a resolve by only resolving the result of the b32csel.

No changes on any Gen6+ Intel platform.

v2: Use ?: instead of cast from bool to unsigned.  Suggested by Caio.

Iron Lake
total instructions in shared programs: 8142729 -> 8142677 (<.01%)
instructions in affected programs: 12890 -> 12838 (-0.40%)
helped: 26
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.39%
Instructions are helped.

total cycles in shared programs: 188549632 -> 188549394 (<.01%)
cycles in affected programs: 60754 -> 60516 (-0.39%)
helped: 25
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8
helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for cycles value: -12.91 -5.40
95% mean confidence interval for cycles %-change: -0.84% -0.23%
Cycles are helped.

GM45
total instructions in shared programs: 5013119 -> 5013093 (<.01%)
instructions in affected programs: 6764 -> 6738 (-0.38%)
helped: 13
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.34%
Instructions are helped.

total cycles in shared programs: 128977804 -> 128977700 (<.01%)
cycles in affected programs: 37738 -> 37634 (-0.28%)
helped: 13
HURT: 0
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -8.00 -8.00
95% mean confidence interval for cycles %-change: -0.36% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years ago intel/fs: Improve discard_if code generation
Ian Romanick [Tue, 21 May 2019 00:25:01 +0000 (17:25 -0700)]
intel/fs: Improve discard_if code generation

Previously we would blindly emit a sequence like:

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.z.f0.1(16)  null<1>D        g7<8,8,1>D      0D

The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels.  Often the only user of the result from
the first compare is the second compare.  Instead, generate a sequence
like

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.ge.f0.1(8)  null<1>F        g5<8,8,1>F      0x41700000F  /* 15F */

If the results stored in g7 and f0.0 are not used, the comparison will
be eliminated.  This removes an instruction and potentially reduces
register pressure.

v2: Major re-write of the commit message (including fixing the assembly
code).  Suggested by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs)   min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel)   min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.

total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs)   min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel)   min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.

total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5

total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5

Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs)   min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel)   min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.

total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs)   min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel)   min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.

total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10

total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10

LOST:   5
GAINED: 10

Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.

total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs)   min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel)   min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.

LOST:   8
GAINED: 10

Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.

total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs)   min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel)   min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.

total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs)   min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel)   min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.

GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.

total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs)   min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel)   min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years ago intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu
Ian Romanick [Tue, 21 May 2019 19:09:42 +0000 (12:09 -0700)]
intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu

This is the same as the need_dest parameter to
prepare_alu_destination_and_sources.  This allows us to not change the
register that is expected to hold a result if an instruction is
re-emitted.  This is particularly a problem if the re-emitted
instruction is a partial write.  A later patch will use this feature.

No shader-db changes on any Intel platform.

v2: Don't do the Boolean resolve when there is no destination.  If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agointel/fs: Allow cmod propagation across reads and writes of different flags
Ian Romanick [Wed, 22 May 2019 19:32:03 +0000 (12:32 -0700)]
intel/fs: Allow cmod propagation across reads and writes of different flags

This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.

v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
by Caio.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agointel/fs: Fix flag_subreg handling in cmod propagation
Ian Romanick [Wed, 22 May 2019 17:18:06 +0000 (10:18 -0700)]
intel/fs: Fix flag_subreg handling in cmod propagation

There were two errors.  First, the pass could propagate conditional
modifiers from an instruction that writes one flag register to an
instruction that writes a different flag register.  For example,

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
    cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F

could become

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F

Second, if an instruction that writes f0.1 has its condition propagated, the
modified instruction will incorrectly write flag f0.0.  For example,

    linterp(16) vgrf6:F, g2:F, attr0:F
    cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
    (-f0.1) discard_jump(16) (null):UD

could become

    linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
    (-f0.1) discard_jump(16) (null):UD

None of these cases will occur currently.  The only time we use f0.1 is
for generating discard intrinsics.  In all those cases, we generate a
sequence like:

    cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
    (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
    (-f0.1) discard_jump(16) (null):UD

Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation.  The next patch will change this.
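
As a rough illustration of the two rules above, here is a minimal sketch in C. The `toy_inst` record and both helper names are hypothetical and far simpler than the real fs_inst and cmod propagation pass; the sketch only assumes that each instruction carries a conditional modifier and a flag subregister.

```c
#include <stdbool.h>

/* Hypothetical, much-reduced instruction record (not the real fs_inst). */
struct toy_inst {
    int cmod;         /* conditional modifier, 0 = none */
    int flag_subreg;  /* 0 for f0.0, 1 for f0.1 */
};

/* Rule 1: a later compare is only redundant if the earlier instruction
 * produces the same condition in the same flag subregister. */
static inline bool can_delete_redundant_cmp(const struct toy_inst *earlier,
                                            const struct toy_inst *cmp)
{
    return earlier->cmod == cmp->cmod &&
           earlier->flag_subreg == cmp->flag_subreg;
}

/* Rule 2: when the condition moves onto the earlier instruction, the flag
 * subregister has to move with it, otherwise f0.0 is written by mistake. */
static inline void propagate_cmod(struct toy_inst *earlier,
                                  const struct toy_inst *cmp)
{
    earlier->cmod = cmp->cmod;
    earlier->flag_subreg = cmp->flag_subreg;
}
```

With these helpers, the first example keeps both compares because their flag subregisters differ, and the second example ends up writing f0.1 rather than f0.0.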

No shader-db changes on any Intel platform.

v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agointel/fs: Add missing tests for cmod_propagate_not
Ian Romanick [Wed, 22 May 2019 18:06:19 +0000 (11:06 -0700)]
intel/fs: Add missing tests for cmod_propagate_not

Tests like this should have been added in 4467040cb65 ("i965/fs:
Propagate conditional modifiers from not instructions").

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agoi965: Allow signed/unsigned integer conversions in miptree up/download
Kenneth Graunke [Wed, 5 Jun 2019 06:19:22 +0000 (23:19 -0700)]
i965: Allow signed/unsigned integer conversions in miptree up/download

BLORP now handles this so there's no reason to fall back.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agointel/blorp: Handle SINT/UINT clamping on blits.
Kenneth Graunke [Wed, 5 Jun 2019 06:18:45 +0000 (23:18 -0700)]
intel/blorp: Handle SINT/UINT clamping on blits.

This patch makes blorp_blit handle SINT<->UINT blit value clamping.
After reading the source's integer data (which is expanded to 32-bit),
we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or
UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers
outside of the representable range).
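
The two clamps can be sketched on the CPU as follows. This is only an illustration of the arithmetic, not the BLORP shader code itself; the function names are made up for this example.

```c
#include <stdint.h>

/* SINT -> UINT: negative values are not representable, so take the signed
 * maximum with 0 (the IMAX step). */
static inline uint32_t clamp_sint_to_uint(int32_t v)
{
    return (uint32_t)(v > 0 ? v : 0);
}

/* UINT -> SINT: values above (1 << 31) - 1 are not representable, so take
 * the unsigned minimum with that limit (the UMIN step). */
static inline int32_t clamp_uint_to_sint(uint32_t v)
{
    const uint32_t max_sint = (1u << 31) - 1;
    return (int32_t)(v < max_sint ? v : max_sint);
}
```

Values already inside the destination range pass through unchanged in both directions.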

Such blits are not allowed by the OpenGL or Vulkan APIs directly:

   The Vulkan 1.1 spec for vkCmdBlitImage says:

   "Integer formats can only be converted to other integer formats with
    the same signedness."

   The GL 4.5 spec for glBlitFramebuffer says:

   "An INVALID_OPERATION error is generated if format conversions are
    not supported, which occurs under any of the following conditions:
    [...]
    * The read buffer contains unsigned integer values and any draw
      buffer does not contain unsigned integer values.
    * The read buffer contains signed integer values and any draw buffer
      does not contain signed integer values."

However, they are useful for other operations, such as texture upload
and download, which typically are implemented via blorp_blit().  i965
has code to fall back in this case (which the next commit will delete),
and Gallium expects blit() to handle this case for texture upload.

Fixes the following tests on iris:
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv/pipeline: Move lowering of nir_var_mem_global later
Caio Marcelo de Oliveira Filho [Thu, 30 May 2019 23:55:19 +0000 (16:55 -0700)]
anv/pipeline: Move lowering of nir_var_mem_global later

This lets deref optimizations apply to globals before lowering them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agost/nir: Don't use GLSL IR's MOD_TO_FLOOR lowering when using NIR.
Kenneth Graunke [Fri, 17 May 2019 05:41:13 +0000 (22:41 -0700)]
st/nir: Don't use GLSL IR's MOD_TO_FLOOR lowering when using NIR.

Both GLSL IR and NIR perform the same mod -> floor lowering for 32-bit
types.  But nir_lower_double_ops is slightly more defensive against
lowered drcp precision loss, and handles mod(x, x) = 0 directly.  This
works well...assuming nir_lower_double_ops actually gets an fmod op to
lower in the first place.
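
For reference, the lowering computes mod(x, y) as x - y * floor(x / y). The sketch below shows that formula in plain C. The names `toy_floor` and `toy_mod` are hypothetical, and the early return for x == y is only an assumption about what "handles mod(x, x) = 0 directly" might look like, not the actual nir_lower_double_ops code.

```c
/* Floor for values that fit in a long long. It stands in for the GPU's
 * floor operation so this sketch does not need the math library. */
static inline double toy_floor(double v)
{
    long long t = (long long)v;                  /* truncates toward zero */
    return (double)((double)t > v ? t - 1 : t);  /* step down if negative fraction */
}

/* mod lowered to floor: mod(x, y) = x - y * floor(x / y). */
static inline double toy_mod(double x, double y)
{
    if (x == y)
        return 0.0;  /* assumed special case: avoids any division error */
    return x - y * toy_floor(x / y);
}
```

For example, mod(-1, 3) becomes -1 - 3 * floor(-1/3) = -1 - 3 * (-1) = 2.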

The previous patches enabled NIR-based lowering for the remaining
drivers, so we can stop using the GLSL IR lowering when using NIR.

Fixes KHR-GL45.gpu_shader_fp64.builtin.mod_dvec[234] on iris.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradeonsi: Enable NIR's lower_fmod option.
Kenneth Graunke [Mon, 3 Jun 2019 20:40:05 +0000 (13:40 -0700)]
radeonsi: Enable NIR's lower_fmod option.

Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

The AMD NIR backend also has code to handle fmod, so we could
potentially skip this and still be fine.  I don't have an opinion
on that.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agovc4: Enable NIR's lower_fmod option.
Kenneth Graunke [Tue, 4 Jun 2019 06:06:49 +0000 (23:06 -0700)]
vc4: Enable NIR's lower_fmod option.

Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
4 years agov3d: Enable NIR's lower_fmod option.
Kenneth Graunke [Mon, 3 Jun 2019 20:36:56 +0000 (13:36 -0700)]
v3d: Enable NIR's lower_fmod option.

Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
4 years agonir: Combine lower_fmod16/32 back into a single lower_fmod.
Kenneth Graunke [Mon, 3 Jun 2019 20:18:55 +0000 (13:18 -0700)]
nir: Combine lower_fmod16/32 back into a single lower_fmod.

We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
needs lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agonir: Drop lower_fmod64 option.
Kenneth Graunke [Mon, 3 Jun 2019 20:15:49 +0000 (13:15 -0700)]
nir: Drop lower_fmod64 option.

nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agopanfrost: Switch to nir_lower_doubles instead of lower_fmod64.
Kenneth Graunke [Mon, 3 Jun 2019 18:54:21 +0000 (11:54 -0700)]
panfrost: Switch to nir_lower_doubles instead of lower_fmod64.

I don't think panfrost actually does doubles yet, but it at least
claims to support PIPE_CAP_DOUBLES, so at least pretend to switch
to the new lowering.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agonouveau: Use nir_lower_doubles instead of lower_fmod64 on nvc0.
Kenneth Graunke [Mon, 3 Jun 2019 18:43:38 +0000 (11:43 -0700)]
nouveau: Use nir_lower_doubles instead of lower_fmod64 on nvc0.

We currently have two duplicate mechanisms for lowering fmod@64.
One is a nir_opt_algebraic rule keyed off of options->lower_fmod64,
and the other is nir_lower_doubles, which offers a full gamut of
fp64 lowering.  The latter works slightly better in some corner cases,
so I'm trying to eliminate lower_fmod64 and drop the redundancy.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agogallium: Drop lower_fmod64 from drivers that don't support doubles.
Kenneth Graunke [Mon, 3 Jun 2019 18:41:37 +0000 (11:41 -0700)]
gallium: Drop lower_fmod64 from drivers that don't support doubles.

Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no
fmod64 to be lowered.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agodocs: update calendar, and news item and link release notes for 19.0.6
Dylan Baker [Wed, 5 Jun 2019 23:42:36 +0000 (16:42 -0700)]
docs: update calendar, and news item and link release notes for 19.0.6

4 years agodocs: Add SHA256 sums for 19.0.6
Dylan Baker [Wed, 5 Jun 2019 23:37:20 +0000 (16:37 -0700)]
docs: Add SHA256 sums for 19.0.6

4 years agodocs: Add relnotes for 19.0.6
Dylan Baker [Wed, 5 Jun 2019 23:32:35 +0000 (16:32 -0700)]
docs: Add relnotes for 19.0.6

4 years agodocs: add day of month to all news-entries
Erik Faye-Lund [Mon, 3 Jun 2019 15:53:13 +0000 (17:53 +0200)]
docs: add day of month to all news-entries

This makes it easier to batch-convert them to other structured
markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: add MD5 checksums for 9.2.2 files
Erik Faye-Lund [Tue, 4 Jun 2019 14:15:45 +0000 (16:15 +0200)]
docs: add MD5 checksums for 9.2.2 files

These checksums were obtained by downloading the releases from
ftp://ftp.freedesktop.org/pub/mesa/older-versions/9.x/9.2.2/ and
running md5sum on them.

Hopefully the server wasn't compromised since release.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use pre-block for showing commit-note
Erik Faye-Lund [Tue, 4 Jun 2019 14:08:27 +0000 (16:08 +0200)]
docs: use pre-block for showing commit-note

Having a single-item list for this seems odd. Let's just use a pre-block
instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: switch to definition list and code-tags
Erik Faye-Lund [Tue, 4 Jun 2019 13:42:46 +0000 (15:42 +0200)]
docs: switch to definition list and code-tags

A definition list is a better semantic match for what this list is
supposed to convey, so let's use that instead. And while we're at it,
let's add some code-tags around filenames, as they stand out a bit more
that way.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: combine headings
Erik Faye-Lund [Tue, 4 Jun 2019 11:03:57 +0000 (13:03 +0200)]
docs: combine headings

This is more in line with how we mark-up other definition lists, and
avoids portability issues with other markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: more code-tags in llvmpipe.html
Erik Faye-Lund [Tue, 4 Jun 2019 11:11:59 +0000 (13:11 +0200)]
docs: more code-tags in llvmpipe.html

This makes the article a bit easier to read.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use more code-tags in envvars.html
Erik Faye-Lund [Tue, 4 Jun 2019 10:50:23 +0000 (12:50 +0200)]
docs: use more code-tags in envvars.html

This wraps code, identifiers, values and paths in code-tags, which makes
them appear in a monospace-font for readability.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use code-tags for envvars and options
Erik Faye-Lund [Tue, 4 Jun 2019 10:19:51 +0000 (12:19 +0200)]
docs: use code-tags for envvars and options

This makes it a bit easier to tell what's what.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use dl instead of ul
Erik Faye-Lund [Tue, 4 Jun 2019 09:26:40 +0000 (11:26 +0200)]
docs: use dl instead of ul

A HTML definition-list is more semantically strong than just some
unordered list, and renders a bit cleaner by default. So let's use that
instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove pointlessly repeated list
Erik Faye-Lund [Tue, 4 Jun 2019 10:26:34 +0000 (12:26 +0200)]
docs: remove pointlessly repeated list

The examples listed above are exactly the same ones as we're about to
list, so let's just keep the list that defines what they do.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove stray whitespace
Erik Faye-Lund [Wed, 8 May 2019 09:29:31 +0000 (11:29 +0200)]
docs: remove stray whitespace

There's some stray whitespace in these files that doesn't do anything
useful. Let's get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use proper links instead of code-tags
Erik Faye-Lund [Tue, 4 Jun 2019 07:38:25 +0000 (09:38 +0200)]
docs: use proper links instead of code-tags

These links are a bit odd in that the URLs are simply placed in
code-tags. This makes them harder to work with. Let's use proper
links instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: update doxygen-links
Erik Faye-Lund [Mon, 3 Jun 2019 17:23:07 +0000 (19:23 +0200)]
docs: update doxygen-links

One of these URLs is dead these days, and the other one forwards to the
current one, doxygen.nl. Let's get these links up to date.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove some noisy spacing in pre-blocks
Erik Faye-Lund [Mon, 3 Jun 2019 16:54:05 +0000 (18:54 +0200)]
docs: remove some noisy spacing in pre-blocks

These newlines caused the blocks to have trailing newlines in them,
which renders a bit noisily.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: improve quoting slightly
Erik Faye-Lund [Mon, 3 Jun 2019 16:50:41 +0000 (18:50 +0200)]
docs: improve quoting slightly

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: do not use br-tag for non-significant breaks
Erik Faye-Lund [Mon, 3 Jun 2019 16:30:23 +0000 (18:30 +0200)]
docs: do not use br-tag for non-significant breaks

According to the W3C, we shouldn't use the br-tag unless the line-break
is part of the content:

https://www.w3.org/TR/2011/WD-html5-author-20110809/the-br-element.html

All of these instances are for non-content usage, and are as such technically
out-of-spec. So let's either remove them, or split paragraphs, based on
how related the content is.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove pointless line-break
Erik Faye-Lund [Mon, 3 Jun 2019 16:26:38 +0000 (18:26 +0200)]
docs: remove pointless line-break

A line-break at the end of a paragraph doesn't do anything useful,
so let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove pointless trailing hard-breaks
Erik Faye-Lund [Mon, 3 Jun 2019 16:24:26 +0000 (18:24 +0200)]
docs: remove pointless trailing hard-breaks

Line-breaks at the end of an article are quite pointless, and don't do
much to increase the readability. Let's get rid of them.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: rewrite paragraph to be free-form
Erik Faye-Lund [Tue, 28 May 2019 12:07:48 +0000 (14:07 +0200)]
docs: rewrite paragraph to be free-form

These half-way structured sections are needlessly problematic to
translate cleanly to other markup-languages, so let's just make this
into a free-form paragraph instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use h4 instead of free-standing paragraphs and br-tags
Erik Faye-Lund [Tue, 28 May 2019 12:03:08 +0000 (14:03 +0200)]
docs: use h4 instead of free-standing paragraphs and br-tags

This makes this document a bit more structured, which is generally
considered a good thing for HTML. It will also translate a bit better
into other markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: slightly reword paragraph and tweak markup
Erik Faye-Lund [Tue, 28 May 2019 11:57:42 +0000 (13:57 +0200)]
docs: slightly reword paragraph and tweak markup

This makes this paragraph a bit easier to digest.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove stray space in code-block
Erik Faye-Lund [Tue, 28 May 2019 11:53:03 +0000 (13:53 +0200)]
docs: remove stray space in code-block

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: remove some pointless spacing
Erik Faye-Lund [Tue, 28 May 2019 11:48:15 +0000 (13:48 +0200)]
docs: remove some pointless spacing

The different headers and header-sizes already convey the hierarchical
structure of this document, the unusual spacing arguably just looks a
bit inconsistent with the rest of the site. Let's remove it; it looks
fine without it, and will translate better to other markup languages.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: add more more code-tags
Erik Faye-Lund [Tue, 28 May 2019 11:14:03 +0000 (13:14 +0200)]
docs: add more more code-tags

It's easier to read function-names, file-names and other
"machine"-related strings if they are formatted in a monospace font. So
let's mark these up with code-tags.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use code instead of tt-tag
Erik Faye-Lund [Tue, 28 May 2019 11:34:34 +0000 (13:34 +0200)]
docs: use code instead of tt-tag

The tt-tag has been removed from HTML5, so let's normalize this to
code-tags instead. This just makes things a bit more consistent, as we've
mixed these left and right so far anyway.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use paragraph instead of double newlines
Erik Faye-Lund [Tue, 28 May 2019 11:20:29 +0000 (13:20 +0200)]
docs: use paragraph instead of double newlines

This is a bit more semantically clean in HTML, and makes us keep
content and presentation a bit more separated.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use verbatim .plan quote
Erik Faye-Lund [Mon, 3 Jun 2019 17:07:50 +0000 (19:07 +0200)]
docs: use verbatim .plan quote

This quote is now verbatim, as archived here:

https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_year/johnc_plan_1999.txt

This makes it look a bit more consistent with the following news-entry,
and makes things IMO a bit more clear.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agopanfrost/midgard: Verify SSA claims when pipelining
Alyssa Rosenzweig [Wed, 5 Jun 2019 18:51:16 +0000 (18:51 +0000)]
panfrost/midgard: Verify SSA claims when pipelining

The pipeline register creation algorithm is only valid for SSA indices;
NIR registers and such cannot be pipelined without more complex
analysis. However, there are the occasional class of "liars" -- indices
that claim to be SSA but are not. This occurs in the blend shader
prologue, for example. Detect this and just bail quietly for now.
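
The commit message does not say how the check is done. One plausible way, shown here purely as an assumption, is to count how often an index is written: a genuine SSA index is written at most once. The function name and the flat `dests` array are hypothetical and do not reflect the Midgard compiler's real data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `index` is written at most once, i.e. its SSA claim
 * holds. `dests` lists the destination index of each instruction in order. */
static inline bool is_really_ssa(const int *dests, size_t count, int index)
{
    size_t writes = 0;
    for (size_t i = 0; i < count; i++) {
        if (dests[i] == index && ++writes > 1)
            return false;  /* second write found: the index is a "liar" */
    }
    return true;
}
```

A pipelining pass built on this idea would simply skip any index for which the check returns false.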

Eventually we need to rewrite the blend shader prologue to occur in NIR
anyway (which would mitigate the issue), but that's more involved and
depends on a better understanding of pixel formats in blend shaders (for
non-RGBA8888/UNORM cases).

Fixes some blend shader regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost/midgard: Don't assign var locations ourselves
Alyssa Rosenzweig [Wed, 5 Jun 2019 18:26:29 +0000 (18:26 +0000)]
panfrost/midgard: Don't assign var locations ourselves

This piece of code was cargo-culted from the ir3 standalone compiler and
made sense when we were a standalone compiler ourselves. Unfortunately,
for the online compiler, mesa/st *already handles this for us* and if we
duplicate it here, we're duplicating it *incorrectly*. So just delete
these lines and fix a heck of a lot of tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Reload framebuffer contents if there's no clear
Tomeu Vizoso [Tue, 14 May 2019 15:28:17 +0000 (17:28 +0200)]
panfrost: Reload framebuffer contents if there's no clear

If by flush time the client hasn't submitted a clear, add jobs for
reloading the framebuffer contents as the first draw in the frame.

This is required by programs such as Weston which don't do clears and
rely on the previous contents of the framebuffer being there.

Reloading the whole framebuffer on every frame without regard to what
is needed or what is going to be covered is very inefficient, but future
work will introduce support for damage regions and partial updates so we
know what needs to be actually reloaded.

Fixes quite a few tests in dEQP-EGL.functional.buffer_age.*.

[Alyssa: The context is that tilers do an implicit glClear() on every
frame, whether you asked them to or not. If you want a clear, this is
very efficient. But if you don't, you have to explicitly blit the
backbuffer back into tile memory, accomplished by a dummy texturing
draw. This patch generates that draw via u_blitter, although we could do
a bit better ourselves by eliding the vertex job. This fixes "black
rectangles in Weston/sway" as well as "video not displaying when UI
visible in mpv"]

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Don't flip scanout
Alyssa Rosenzweig [Thu, 23 May 2019 03:01:32 +0000 (03:01 +0000)]
panfrost: Don't flip scanout

The mesa/st flips the viewport, so we respect that rather than
trying to flip the framebuffer itself and ignoring the viewport and
using a messy heuristic.

However, this brings an underlying disagreement about the interpretation
of winding order to light. The blob uses a different strategy than Mesa
for handling viewport Y flipping, so the meanings of the winding order
bit are flipped for it. To keep things clean on our end, we rename to
explicitly use Gallium (rather than flipped OpenGL) conventions.

Fixes upside-down Xwayland/egl windows.

v2: Adjust lowering configuration to correctly flip gl_PointCoord.y and
gl_FragCoord.y. v1 was R-b'd by Tomeu, but then retracted due to these
regressions which are now fixed.

Suggested-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Sort-of-reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agost/nine: Use tgsi_to_nir when preferred IR is NIR.
Timur Kristóf [Fri, 31 May 2019 16:43:20 +0000 (18:43 +0200)]
st/nine: Use tgsi_to_nir when preferred IR is NIR.

This patch allows nine to read the preferred IR from pipe caps and use
NIR when that is preferred by the driver, by calling tgsi_to_nir. Also
adds some debug options that allow overriding it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
4 years agointel/perf: improve dynamic loading config detection
Lionel Landwerlin [Wed, 5 Jun 2019 08:20:23 +0000 (11:20 +0300)]
intel/perf: improve dynamic loading config detection

We're currently trying to detect dynamic loading config support by
trying to remove the test config (hard coded in the i915 driver) and
checking we get ENOENT.

This can fail if the test config was updated in Mesa but not yet in
i915.

A better way to do this is to pick an invalid ID and check for ENOENT.
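
The idea can be sketched without any kernel headers. In the example below, `remove_config_fn` is a hypothetical callback standing in for the i915 "remove perf config" request, and the two fake kernels exist only for illustration. The behaviour of a kernel without the feature (failing with EINVAL) is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: returns 0 on success, or -1 with errno set. */
typedef int (*remove_config_fn)(uint64_t config_id);

/* Ask to remove an ID that cannot exist. A kernel with dynamic config
 * loading looks the ID up and fails with ENOENT; any other outcome is
 * treated as "not supported". */
static inline bool supports_dynamic_configs(remove_config_fn remove_config)
{
    const uint64_t invalid_id = UINT64_MAX;
    return remove_config(invalid_id) < 0 && errno == ENOENT;
}

/* Stand-in kernels, for illustration only. */
static inline int fake_remove_supported(uint64_t id)
{
    (void)id;
    errno = ENOENT;  /* the ID was looked up and not found */
    return -1;
}

static inline int fake_remove_unsupported(uint64_t id)
{
    (void)id;
    errno = EINVAL;  /* assumed error when the request is unknown */
    return -1;
}
```

Because the probe never names a real config, it does not depend on the test config shipped in Mesa matching the one in i915.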

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/nir: Take nir_shader*s in brw_nir_link_shaders
Jason Ekstrand [Tue, 4 Jun 2019 23:23:17 +0000 (18:23 -0500)]
intel/nir: Take nir_shader*s in brw_nir_link_shaders

Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/nir: Stop returning the shader from helpers
Jason Ekstrand [Tue, 4 Jun 2019 23:19:06 +0000 (18:19 -0500)]
intel/nir: Stop returning the shader from helpers

Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agonir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1
Jason Ekstrand [Tue, 4 Jun 2019 22:50:22 +0000 (17:50 -0500)]
nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agonir: Don't replace the nir_shader when NIR_TEST_CLONE=1
Jason Ekstrand [Tue, 4 Jun 2019 22:48:33 +0000 (17:48 -0500)]
nir: Don't replace the nir_shader when NIR_TEST_CLONE=1

Instead, we add a new helper which stomps one nir_shader and replaces it
with another.  The new helper effectively just changes which pointer
gets used for the base nir_shader.  It should be 99% as good at testing
cloning but without requiring that everything handle having the shader
swapped out from under it constantly.
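
A toy version of the idea, assuming a shader is just a plain struct: the clone's contents are copied over the original object, so any pointer to the original stays valid. `toy_shader`, `toy_shader_replace` and `toy_test_clone` are hypothetical names, and the real helper has to deal with memory ownership that this sketch ignores.

```c
#include <stdlib.h>

/* Hypothetical stand-in for nir_shader; only the idea matters here. */
struct toy_shader {
    int num_instructions;
};

/* Stomp `dst` with the contents of `src` and free `src`, so every caller
 * holding `dst` still has a valid pointer afterwards. */
static inline void toy_shader_replace(struct toy_shader *dst,
                                      struct toy_shader *src)
{
    *dst = *src;
    free(src);
}

/* Clone-test step: clone the shader, then stomp the original with it. */
static inline void toy_test_clone(struct toy_shader *shader)
{
    struct toy_shader *clone = malloc(sizeof(*clone));
    if (!clone)
        return;        /* out of memory: leave the shader untouched */
    *clone = *shader;  /* stands in for a deep clone */
    toy_shader_replace(shader, clone);
}
```

Callers can keep passing a single pointer around, which is what lets later patches drop the pointer-to-pointer parameters.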

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agoiris: Only recompile CS when needed
Caio Marcelo de Oliveira Filho [Wed, 5 Jun 2019 05:55:13 +0000 (22:55 -0700)]
iris: Only recompile CS when needed

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>