mesa.git
5 years ago docs: spell out "and" in sidebar
Erik Faye-Lund [Mon, 6 May 2019 10:32:59 +0000 (12:32 +0200)]
docs: spell out "and" in sidebar

There's no need to keep this short, we can just spell out "and" here.
Besides, a slash kind of implies "or", but these articles are about
both of these, not either.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago docs: remove pointless list-entry
Erik Faye-Lund [Thu, 2 May 2019 18:47:55 +0000 (20:47 +0200)]
docs: remove pointless list-entry

It's quite visible that there's more docs below, we don't need to spell
it out for the reader.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago docs: spell out faq in sidebar
Erik Faye-Lund [Mon, 6 May 2019 10:28:54 +0000 (12:28 +0200)]
docs: spell out faq in sidebar

We're not short on space here, so there's little point in abbreviating
this. This also matches the heading in the article.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago docs: spell out "and" in sidebar
Erik Faye-Lund [Thu, 2 May 2019 18:43:25 +0000 (20:43 +0200)]
docs: spell out "and" in sidebar

We're not short on space here, so let's just spell out "and" instead of
using the ampersand. This is more consistent with the entry above in the
sidebar.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago glsl_to_nir: remove unused type_is_int()
Timothy Arceri [Wed, 8 May 2019 03:55:53 +0000 (13:55 +1000)]
glsl_to_nir: remove unused type_is_int()

This was missed in e00fa99b08b3.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago Revert "glx: Fix synthetic error generation in __glXSendError"
Timothy Arceri [Tue, 7 May 2019 03:55:32 +0000 (13:55 +1000)]
Revert "glx: Fix synthetic error generation in __glXSendError"

This reverts commit e91ee763c378d03883eb88cf0eadd8aa916f7878.

This seems to have broken a number of Wine games. Let's revert
everything for now and try again later.

Acked-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110632
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110590

5 years ago radeonsi: add an AMD_TEX_ANISO environment variable
Timothy Arceri [Tue, 7 May 2019 00:18:54 +0000 (10:18 +1000)]
radeonsi: add an AMD_TEX_ANISO environment variable

This brings it in line with the recently added AMD_DEBUG.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619

5 years ago i965: leave the top 4Gb of the high heap VMA unused
Kenneth Graunke [Fri, 3 May 2019 19:02:41 +0000 (12:02 -0700)]
i965: leave the top 4Gb of the high heap VMA unused

This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv
to i965.  Thanks to Lionel for noticing that it was missing!

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago i965: Force VMA alignment to be a multiple of the page size.
Kenneth Graunke [Sat, 27 Apr 2019 01:52:45 +0000 (18:52 -0700)]
i965: Force VMA alignment to be a multiple of the page size.

This should happen regardless, but let's be paranoid.
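
As a rough illustration of what forcing the alignment means, here is a
minimal C sketch. PAGE_SIZE, align_up() and vma_alignment() are
hypothetical names chosen for this example and are not taken from
brw_bufmgr; the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch. */
#define PAGE_SIZE 4096u

/* Round value up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
    /* The bit trick below is only valid for powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: whatever alignment the caller asked for,
 * hand back one that is a whole number of pages. */
static uint64_t vma_alignment(uint64_t requested)
{
    assert(requested > 0);
    return align_up(requested, PAGE_SIZE);
}
```

With this helper a request for 1 byte or 4096 bytes both yield a one-page
alignment, while 4097 is bumped to two pages.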

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago i965: Fix BRW_MEMZONE_LOW_4G heap size.
Kenneth Graunke [Sat, 27 Apr 2019 00:09:11 +0000 (17:09 -0700)]
i965: Fix BRW_MEMZONE_LOW_4G heap size.

The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.

So we can't use the top page.
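
The arithmetic above can be checked with a small C sketch. The macro and
function names are illustrative only and are not the identifiers used in
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message. */
#define SIZE_FIELD_MAX_PAGES 0xfffffull
#define PAGE_SIZE_BYTES      4096ull

/* Largest heap size, in bytes, that the "Size" field can describe. */
static uint64_t max_low_4g_heap_size(void)
{
    uint64_t size = SIZE_FIELD_MAX_PAGES * PAGE_SIZE_BYTES;
    /* The result must stay below 4GB, otherwise the field overflowed. */
    assert(size < (1ull << 32));
    return size;
}
```

The result is exactly one 4096-byte page short of 1 << 32, which is why
the top page of the zone has to stay unused.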

Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago intel/compiler: Unset flag reg when FB write is not predicated
Matt Turner [Mon, 29 Apr 2019 23:01:08 +0000 (16:01 -0700)]
intel/compiler: Unset flag reg when FB write is not predicated

In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes. Since the instruction
doesn't support predication before Haswell, we unset the predicate so we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago intel/disasm: Disassemble immediate value properly for dim
Sagar Ghuge [Fri, 29 Mar 2019 21:04:03 +0000 (14:04 -0700)]
intel/disasm: Disassemble immediate value properly for dim

On Haswell, for the dim instruction we encode the immediate float
operand as a double float.

v2: Fix comment (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/disasm: Disassemble JIP offset for while
Sagar Ghuge [Thu, 28 Mar 2019 00:07:01 +0000 (17:07 -0700)]
intel/disasm: Disassemble JIP offset for while

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/compiler: Replicate 16 bit immediate value correctly
Sagar Ghuge [Tue, 26 Mar 2019 04:17:08 +0000 (21:17 -0700)]
intel/compiler: Replicate 16 bit immediate value correctly

For the W or UW (signed or unsigned word) source types, the 16-bit value
must be replicated in both the low and high words of the 32-bit
immediate value.
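
The replication rule can be sketched in C as follows. replicate_word() is
a hypothetical helper written for this note, not the function the
compiler actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit immediate from a 16-bit W/UW value by copying the
 * value into both the low and the high word. */
static uint32_t replicate_word(uint16_t value)
{
    uint32_t imm = ((uint32_t)value << 16) | value;
    /* Both halves of the immediate must now be identical. */
    assert((imm >> 16) == (imm & 0xffffu));
    return imm;
}
```

A signed word is handled the same way once it is reinterpreted as its
16-bit pattern, so -2 (0xfffe) becomes 0xfffefffe.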

v2: Fix replication in other places as well
v3: fix a few nits (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/compiler: Print quad value in hex format
Sagar Ghuge [Sun, 24 Mar 2019 03:02:54 +0000 (20:02 -0700)]
intel/compiler: Print quad value in hex format

Print quad values the same way as unsigned quad values so that we can
distinguish between disassembled quarter control values, e.g. 1/2/3[Q],
and immediate quad values, e.g. 1Q. This allows round-tripping through
the assembler/disassembler.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/tools: Add unit tests for assembler
Sagar Ghuge [Sat, 23 Mar 2019 02:13:54 +0000 (19:13 -0700)]
intel/tools: Add unit tests for assembler

v1: Pass executable object from meson to test (Dylan Baker)
v2: Ignore generated output files from git status (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years ago intel/tools: Initialize offset correctly for i965_asm
Mika Kuoppala [Thu, 21 Feb 2019 00:47:01 +0000 (16:47 -0800)]
intel/tools: Initialize offset correctly for i965_asm

If we leave offset uninitialized, access to store
will be random depending on stack value and can
segfault.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/tools: Add meson pthread dependancy for i965_asm
Mika Kuoppala [Mon, 18 Feb 2019 13:50:03 +0000 (15:50 +0200)]
intel/tools: Add meson pthread dependancy for i965_asm

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/tools: New i965 instruction assembler tool
Sagar Ghuge [Tue, 11 Dec 2018 00:12:07 +0000 (16:12 -0800)]
intel/tools: New i965 instruction assembler tool

The tool is inspired by igt's assembler tool. Thanks to Matt Turner, who
mentored me throughout this project.

v2: Fix memory leaks and naming convention (Caio)
v3: Fix meson changes (Dylan Baker)
v4: Fix usage options (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141
5 years ago iris: Also handle res->offset for buffer sampler/image views
Kenneth Graunke [Tue, 7 May 2019 17:31:55 +0000 (10:31 -0700)]
iris: Also handle res->offset for buffer sampler/image views

5 years ago iris: support dmabuf imports with offsets
Mike Blumenkrantz [Tue, 30 Apr 2019 18:51:52 +0000 (14:51 -0400)]
iris: support dmabuf imports with offsets

This adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264.

Fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years ago gallivm: fix broken 8-wide s3tc decoding
Roland Scheidegger [Tue, 7 May 2019 00:11:08 +0000 (02:11 +0200)]
gallivm: fix broken 8-wide s3tc decoding

Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, leading to decode artifacts.
Fix these all up, which makes the code actually work 8-wide. Note that
it's still not used - I've verified it works, and the generated assembly
does look quite a bit simpler actually (20-30% fewer instructions for the
s3tc decode part with avx2), however in practice it still seems to be
slightly slower for some unknown reason (tested with openarena) on my
haswell box, so for now continue to split things into 4-wide vectors
before decoding.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
5 years ago docs: Add relnotes stub for 19.2
Juan A. Suarez Romero [Tue, 7 May 2019 16:07:16 +0000 (16:07 +0000)]
docs: Add relnotes stub for 19.2

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years ago Bump version for 19.1 branch
Juan A. Suarez Romero [Tue, 7 May 2019 16:02:34 +0000 (16:02 +0000)]
Bump version for 19.1 branch

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years ago lima: enable sin and cos lowering for GP
Vasily Khoruzhick [Mon, 8 Apr 2019 03:56:24 +0000 (20:56 -0700)]
lima: enable sin and cos lowering for GP

GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago nir: implement lowering for fsin and fcos
Vasily Khoruzhick [Sun, 7 Apr 2019 20:24:45 +0000 (13:24 -0700)]
nir: implement lowering for fsin and fcos

Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.
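
For readers unfamiliar with the approximation, a minimal C sketch of the
parabola-based fast sine as it is commonly described in the linked thread
follows. The constants (4/pi, -4/pi^2 and the 0.225 correction weight)
and the function names are assumptions of this sketch; it is not the NIR
lowering code, and it only handles inputs already reduced to [-pi, pi].

```c
#include <assert.h>

#define SKETCH_PI 3.14159265358979f

static float absf_sketch(float x)
{
    return x < 0.0f ? -x : x;
}

/* Approximate sin(x) for x in [-pi, pi]. */
static float fast_sin(float x)
{
    const float B = 4.0f / SKETCH_PI;
    const float C = -4.0f / (SKETCH_PI * SKETCH_PI);
    const float P = 0.225f;

    /* Range reduction is left to the caller in this sketch. */
    assert(x >= -SKETCH_PI && x <= SKETCH_PI);

    /* First pass: a parabola through (-pi, 0), (0, 0) and (pi, 0). */
    float y = B * x + C * x * absf_sketch(x);
    /* Second pass: blend in the squared parabola to reduce the error. */
    return P * (y * absf_sketch(y) - y) + y;
}

/* cos(x) = sin(x + pi/2), wrapped back into [-pi, pi]. */
static float fast_cos(float x)
{
    assert(x >= -SKETCH_PI && x <= SKETCH_PI);
    x += 0.5f * SKETCH_PI;
    if (x > SKETCH_PI)
        x -= 2.0f * SKETCH_PI;
    return fast_sin(x);
}
```

The error of this form is small but not zero across the range, which is
consistent with it being acceptable for GLES2 while tripping stricter
precision checks.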

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago freedreno/ir3: move const_state to ir3_shader
Rob Clark [Tue, 7 May 2019 13:38:01 +0000 (06:38 -0700)]
freedreno/ir3: move const_state to ir3_shader

For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess are added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years ago freedreno/ir3: split out const_state setup
Rob Clark [Tue, 7 May 2019 13:05:58 +0000 (06:05 -0700)]
freedreno/ir3: split out const_state setup

Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years ago freedreno/ir3: move immediates to const_state
Rob Clark [Mon, 6 May 2019 23:02:19 +0000 (16:02 -0700)]
freedreno/ir3: move immediates to const_state

They are really part of the constant state, and combining them will
make it easier to move things from ir3_shader_variant to ir3_shader.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years ago freedreno/ir3: consolidate const state
Rob Clark [Mon, 6 May 2019 21:52:27 +0000 (14:52 -0700)]
freedreno/ir3: consolidate const state

Combine the offsets of different parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years ago freedreno/ir3: move ir3_pointer_size()
Rob Clark [Mon, 6 May 2019 18:58:07 +0000 (11:58 -0700)]
freedreno/ir3: move ir3_pointer_size()

Move to ir3_compiler so it doesn't depend on the compile context.  Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years ago vulkan/overlay-layer: fix cast errors
Lionel Landwerlin [Fri, 3 May 2019 15:42:55 +0000 (16:42 +0100)]
vulkan/overlay-layer: fix cast errors

Not quite sure what version of GCC/Clang produces errors (8.3.0
locally was fine).

v2: also fix an integer literal issue (Karol)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago anv: fix alphaToCoverage when there is no color attachment
Samuel Iglesias Gonsálvez [Tue, 30 Apr 2019 06:38:16 +0000 (08:38 +0200)]
anv: fix alphaToCoverage when there is no color attachment

There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering, since we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were not handling correctly either, and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago intel/compiler: Don't always require precise lowering of flrp
Ian Romanick [Sun, 19 Aug 2018 00:11:12 +0000 (17:11 -0700)]
intel/compiler: Don't always require precise lowering of flrp

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/algebraic: Reassociate open-coded flrp(1, b, c)
Ian Romanick [Sun, 19 Aug 2018 19:42:05 +0000 (12:42 -0700)]
nir/algebraic: Reassociate open-coded flrp(1, b, c)

In a previous version of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.
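
To make the precision concern concrete, here is a small C sketch that
evaluates the two forms discussed in this message, a+c(b-a) and
a(1-c)+bc, in single precision. The function names and the sample inputs
(a = 1, b = 2**24 + 2, c = 0.375) are chosen for this illustration and do
not come from the patch; intermediates are stored in volatile floats only
to keep each step rounded to 32 bits.

```c
#include <assert.h>
#include <stdio.h>

/* a + c(b - a): the subtraction can round when |b| is large. */
static float flrp_factored(float a, float b, float c)
{
    volatile float diff = b - a;
    volatile float scaled = c * diff;
    return a + scaled;
}

/* a(1 - c) + bc: each input is weighted separately. */
static float flrp_weighted(float a, float b, float c)
{
    volatile float wa = a * (1.0f - c);
    volatile float wb = b * c;
    return wa + wb;
}

/* Print both forms next to a double-precision reference. */
static void compare_flrp_forms(void)
{
    const float a = 1.0f, b = 16777218.0f, c = 0.375f;
    const double exact = (double)a + (double)c * ((double)b - (double)a);
    const float f = flrp_factored(a, b, c);
    const float w = flrp_weighted(a, b, c);

    /* Both are approximations, but neither is off by a whole unit. */
    assert(f - exact < 1.0 && exact - f < 1.0);
    assert(w - exact < 1.0 && exact - w < 1.0);

    printf("factored: %.3f\nweighted: %.3f\nexact:    %.3f\n",
           (double)f, (double)w, exact);
}
```

With these inputs b - a is not representable as a 32-bit float, so the
two forms can land on different neighbouring values even though both stay
close to the exact result.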

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would literally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Lower flrp(a, b, #c) differently
Ian Romanick [Sat, 18 Aug 2018 23:53:55 +0000 (16:53 -0700)]
nir/flrp: Lower flrp(a, b, #c) differently

This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists
Ian Romanick [Thu, 23 Aug 2018 04:21:04 +0000 (21:21 -0700)]
nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists

There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists
Ian Romanick [Sun, 19 Aug 2018 00:07:22 +0000 (17:07 -0700)]
nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists

This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform.  Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently
Ian Romanick [Wed, 22 Aug 2018 00:17:24 +0000 (17:17 -0700)]
nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
instructions in affected programs: 1199037 -> 1163061 (-3.00%)
helped: 4124
HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: -8.84 -8.56
95% mean confidence interval for instructions %-change: -5.12% -4.77%
Instructions are helped.

total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
cycles in affected programs: 27505596 -> 27325850 (-0.65%)
helped: 4026
HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: -44.75 -42.87
95% mean confidence interval for cycles %-change: -2.44% -2.17%
Cycles are helped.

LOST:   3
GAINED: 35

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Lower flrp(#a, #b, c) differently
Ian Romanick [Sat, 18 Aug 2018 23:49:48 +0000 (16:49 -0700)]
nir/flrp: Lower flrp(#a, #b, c) differently

If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5
Ian Romanick [Sat, 18 Aug 2018 23:42:04 +0000 (16:42 -0700)]
intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5

Previously lower_flrp32 was only set for vertex shaders.  Fragment
shaders performed a(1-c)+bc lowering during code generation.

The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
text-identical fragment shader.

v2: Rebase on 26391cceaa1 ("intel/compiler: Lower ffma on Gen4 and
Gen5").

v3: Rebase on a004e95dd73 ("radeonsi/nir: create si_nir_opts() helper")

Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
instructions in affected programs: 2503898 -> 2478487 (-1.01%)
helped: 9936
HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: -2.43 -2.25
95% mean confidence interval for instructions %-change: -1.41% -1.33%
Instructions are helped.

total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
cycles in affected programs: 71541604 -> 71419616 (-0.17%)
helped: 11649
HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: -9.42 -8.63
95% mean confidence interval for cycles %-change: -0.54% -0.50%
Cycles are helped.

total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.

LOST:   3
GAINED: 12

GM45
total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
instructions in affected programs: 1303584 -> 1290871 (-0.98%)
helped: 5010
HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: -2.45 -2.20
95% mean confidence interval for instructions %-change: -1.40% -1.28%
Instructions are helped.

total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
cycles in affected programs: 44845402 -> 44768292 (-0.17%)
helped: 6079
HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: -11.63 -10.34
95% mean confidence interval for cycles %-change: -0.58% -0.52%
Cycles are helped.

total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2

total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0
HURT: 1

total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0
HURT: 1

LOST:   15
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
5 years ago nir: Use the flrp lowering pass instead of nir_opt_algebraic
Ian Romanick [Sat, 18 Aug 2018 23:42:04 +0000 (16:42 -0700)]
nir: Use the flrp lowering pass instead of nir_opt_algebraic

I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/flrp: Add new lowering pass for flrp instructions
Ian Romanick [Sat, 18 Aug 2018 18:46:46 +0000 (11:46 -0700)]
nir/flrp: Add new lowering pass for flrp instructions

This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also includes the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923e ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/algebraic: Pull common multiplication out of flrp arguments
Ian Romanick [Thu, 23 Aug 2018 04:55:55 +0000 (21:55 -0700)]
nir/algebraic: Pull common multiplication out of flrp arguments

All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago nir/algebraic: Pull common addition out of flrp arguments
Ian Romanick [Thu, 23 Aug 2018 02:15:15 +0000 (19:15 -0700)]
nir/algebraic: Pull common addition out of flrp arguments

v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.
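
Illustration (not part of the patch): a minimal Python sketch of the identity behind pulling a common addend out of flrp, assuming flrp(a, b, t) = a * (1 - t) + b * t. The helper names are made up for this note, and exact rationals avoid floating-point rounding differences between the two forms.

```python
from fractions import Fraction


def flrp(a, b, t):
    """Linear interpolation, assumed here to be a * (1 - t) + b * t."""
    return a * (1 - t) + b * t


def flrp_add_outside(a, b, c, t):
    """Same value as flrp(a + c, b + c, t), with c added once after the lerp."""
    return flrp(a, b, t) + c


# (a + c) * (1 - t) + (b + c) * t == a * (1 - t) + b * t + c
a, b, c, t = Fraction(3), Fraction(7), Fraction(5, 2), Fraction(1, 4)
assert flrp(a + c, b + c, t) == flrp_add_outside(a, b, c, t)
```

The weights (1 - t) and t sum to one, so the shared addend passes through the interpolation unchanged.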

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agoglsl_to_nir: drop supports_ints
Christian Gmeiner [Sun, 5 May 2019 09:39:08 +0000 (11:39 +0200)]
glsl_to_nir: drop supports_ints

At the initial NIR level, all drivers support ints.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: nir_shader_compiler_options: drop native_integers
Christian Gmeiner [Sun, 5 May 2019 09:35:41 +0000 (11:35 +0200)]
nir: nir_shader_compiler_options: drop native_integers

Drivers that do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agopanfrost: Refactor blend descriptors
Alyssa Rosenzweig [Sat, 4 May 2019 21:57:01 +0000 (21:57 +0000)]
panfrost: Refactor blend descriptors

This commit does a fairly large cleanup of blend descriptors, although
there should not be any functional changes. In particular, we split
apart the Midgard and Bifrost blend descriptors, since they are
radically different. From there, we can identify that the Midgard
descriptor as previously written was really two render targets'
descriptors stuck together. From this observation, we split the Midgard
descriptor into what a single RT actually needs. This enables us to
correctly dump blending configuration for MRT samples on Midgard. It
also allows the Midgard and Bifrost blend code to peacefully coexist,
with runtime selection rather than a #ifdef. So, as a bonus, this will
help the future Bifrost effort, eliminating one major source of
compile-time architectural divergence.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agolima/gpir: enable lowering for ftrunc
Vasily Khoruzhick [Sat, 4 May 2019 14:51:27 +0000 (07:51 -0700)]
lima/gpir: enable lowering for ftrunc

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agolima/gpir: implement nir_op_fmov
Vasily Khoruzhick [Sat, 4 May 2019 14:51:00 +0000 (07:51 -0700)]
lima/gpir: implement nir_op_fmov

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agolima: use int_to_float lowering pass
Vasily Khoruzhick [Wed, 1 May 2019 05:25:23 +0000 (22:25 -0700)]
lima: use int_to_float lowering pass

Neither GP nor PP in Mali4x0 supports integers, so use the new pass
and set native_integers to true for now until this flag is dropped.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agonir: add int_to_float lowering pass
Vasily Khoruzhick [Wed, 1 May 2019 05:25:05 +0000 (22:25 -0700)]
nir: add int_to_float lowering pass

This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) to use the same
code paths as modern hardware.

It uses the newly introduced pass to gather SSA types and should be
used as late as possible.
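
Illustration (not part of the patch): a toy Python sketch of what lowering ints and bools to floats can look like. The tuple-based IR and the opcode table are assumptions made for this note; the actual pass operates on NIR instructions and uses the gathered SSA types.

```python
# Hypothetical opcode mapping for a toy IR, not the real NIR opcode list.
INT_TO_FLOAT_OPS = {"iadd": "fadd", "imul": "fmul", "ineg": "fneg", "imov": "fmov"}


def lower_const(value):
    """Turn bool and int constants into floats; leave anything else alone."""
    if isinstance(value, bool):  # check bool first, bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, int):
        return float(value)
    return value


def lower_instr(instr):
    """Rewrite one (opcode, *sources) tuple to its float equivalent."""
    op, *srcs = instr
    return (INT_TO_FLOAT_OPS.get(op, op), *[lower_const(s) for s in srcs])


print(lower_instr(("iadd", "ssa_1", 2)))
print(lower_instr(("imov", True)))
```

Sources that are SSA names (strings here) pass through untouched; only the opcode and the constants change.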

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agoradeonsi: add config entry for Counter-Strike Global Offensive
Timothy Arceri [Mon, 6 May 2019 04:39:44 +0000 (14:39 +1000)]
radeonsi: add config entry for Counter-Strike Global Offensive

This fixes rendering issues with gun scopes which is rather
important.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239

5 years agolima/gpir: fix float uniform alignment issue
Vasily Khoruzhick [Wed, 1 May 2019 02:53:01 +0000 (19:53 -0700)]
lima/gpir: fix float uniform alignment issue

If PIPE_CAP_PACKED_UNIFORMS is not set, uniforms are vec4 aligned,
so lima_nir_lower_uniform_to_scalar should use the first channel of
the vec4 for float uniforms.
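
Illustration (not part of the patch): a minimal Python sketch of addressing scalars inside vec4-aligned uniform storage. The function name and the flat slot numbering are assumptions made for this note.

```python
def scalar_slot(vec4_index, component=0):
    """Flat scalar slot of one channel in vec4-aligned uniform storage.

    A lone float uniform occupies a whole vec4, so its value lives in
    channel 0 of that vec4.
    """
    if not 0 <= component < 4:
        raise ValueError("component must be 0..3")
    return vec4_index * 4 + component


# The float uniform stored in vec4 number 3 is read from slot 12, not slot 3.
print(scalar_slot(3))
```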

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agodraw: flush when setting stream-out targets
Erik Faye-Lund [Wed, 1 May 2019 13:37:45 +0000 (15:37 +0200)]
draw: flush when setting stream-out targets

We need to re-prepare the middle-end state to pick up changes to this
state to react correctly to pausing/resuming stream-out. So let's add a
flush here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: ec8cbd79ac4 "draw/softpipe: EXT_transform_feedback support (v2)"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agollvmpipe: pass stream-out targets to draw-module early
Erik Faye-Lund [Mon, 6 May 2019 13:35:04 +0000 (15:35 +0200)]
llvmpipe: pass stream-out targets to draw-module early

We currently set this state in the draw-module twice on each draw,
which trashes this state. So far that's not a problem, because we don't
really do much from that function.

But it turns out, we're going to have to do more; namely flush when the
state changes. This will incur a large performance penalty due to the
excessive setting.

Instead, let's rely on the CSO caching making sure that
llvmpipe_set_so_targets doesn't get called needlessly, and setup the
state directly there instead.
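
Illustration (not part of the patch): a toy Python model of why filtering redundant binds matters once setting stream-out targets implies a flush. All class and method names are stand-ins invented for this note, not the Gallium API.

```python
class DrawModule:
    """Stand-in for the draw module: every set of stream-out targets flushes."""

    def __init__(self):
        self.so_targets = ()
        self.flushes = 0

    def set_so_targets(self, targets):
        self.flushes += 1  # models re-preparing the middle-end state
        self.so_targets = tuple(targets)


class CachingFrontend:
    """Stand-in for CSO-style caching: forwards only real state changes."""

    def __init__(self, draw):
        self.draw = draw
        self.bound = None

    def set_so_targets(self, targets):
        targets = tuple(targets)
        if targets == self.bound:
            return  # redundant bind, nothing reaches the draw module
        self.bound = targets
        self.draw.set_so_targets(targets)


draw = DrawModule()
frontend = CachingFrontend(draw)
frontend.set_so_targets(["buf0"])
frontend.set_so_targets(["buf0"])  # filtered out, no second flush
print(draw.flushes)
```

Setting the state unconditionally on every draw would instead pay for a flush per draw in this model.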

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agodoc: Update GL_KHR_robustness in features.txt for r600
Uros Bizjak [Mon, 6 May 2019 20:21:14 +0000 (06:21 +1000)]
doc: Update GL_KHR_robustness in features.txt for r600

glxinfo for Cypress XT [Radeon HD 5870] lists GL_KHR_robustness
as a supported extension.  This was the last missing extension
for GL 4.5, so mark GL 4.5 as all DONE for r600.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agovirgl: do not use inline writes for subdata
Chia-I Wu [Fri, 3 May 2019 17:40:38 +0000 (10:40 -0700)]
virgl: do not use inline writes for subdata

Inline writes skip transfer map/unmap at the cost of an extra copy
on the data during execbuffer.  That is generally a win for small
transfers.  But the heuristic to use inline writes based on buffer
sizes rather than transfer sizes makes little sense.  More
importantly, inline writes miss optimizations that are done for
buffer transfers.

Let's just use transfers.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
5 years agovirgl: rework queries
Chia-I Wu [Mon, 18 Mar 2019 22:56:35 +0000 (15:56 -0700)]
virgl: rework queries

virglrenderer has been changed such that

 - VIRGL_CCMD_GET_QUERY_RESULT is fenced
 - query buffers (PIPE_BIND_CUSTOM) are coherent

We can check if a query is ready using DRM_IOCTL_VIRTGPU_WAIT, and also
avoid a synchronized transfer to retrieve the query result.  When
running against an older virglrenderer, it falls back to the old
behavior automatically.

TF2 @ 640x480 for pts4.dem went from 17fps to 40fps on my testing
machine.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
5 years agovirgl: export resource_is_busy from winsys
Chia-I Wu [Tue, 19 Mar 2019 18:13:40 +0000 (11:13 -0700)]
virgl: export resource_is_busy from winsys

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
5 years agoradv: fix rowPitch for R32G32B32 formats on GFX9
Samuel Pitoiset [Mon, 6 May 2019 14:17:26 +0000 (16:17 +0200)]
radv: fix rowPitch for R32G32B32 formats on GFX9

The pitch is actually the number of components per row. We found
the problem when we implemented some meta operations for these
formats and the wrong pitch has been confirmed with a small test case.
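
Illustration (not part of the patch): a minimal Python sketch of the arithmetic implied above, assuming a 12-byte R32G32B32 texel with three 4-byte components. The function and parameter names are made up for this note.

```python
def row_pitch_bytes(pitch, bytes_per_texel, pitch_in_components=False, components=3):
    """Bytes between consecutive rows.

    If the reported pitch counts components rather than texels, multiplying
    it by the full texel size overestimates the row pitch by `components`.
    """
    if pitch_in_components:
        return pitch * bytes_per_texel // components
    return pitch * bytes_per_texel


# 100 texels per row reported as 300 components, 12 bytes per texel.
print(row_pitch_bytes(300, 12, pitch_in_components=True))
print(row_pitch_bytes(300, 12))  # treating the pitch as texels is 3x too large
```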

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoiris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
Kenneth Graunke [Wed, 1 May 2019 21:34:00 +0000 (14:34 -0700)]
iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS

This makes CompressedTexSubImage from a PBO source do proper GPU
rendering to upload instead of stalling to map the PBO source on
the CPU (then copying it on the CPU).

Thanks to Bas Nieuwenhuizen for pointing out that Vulkan includes this
functionality, and to Jason Ekstrand for writing the code I adapted.
Vulkan only supports a single layer, however, and this code tries to
support multiple layers as long as it's miplevel 0.

Improves performance in Sid Meier's Civilization VI:

   Average frame time (ms):         -3.67423% +/- 1.46201% (n=5)
   99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)

5 years agoradv: Use given stride for images imported from Android.
Bas Nieuwenhuizen [Mon, 6 May 2019 13:44:04 +0000 (15:44 +0200)]
radv: Use given stride for images imported from Android.

Handled similarly to radeonsi. I checked the offsets are actually used.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agolima/ppir: abort compilation in case of unsupported intrinsic
Erico Nunes [Sun, 5 May 2019 08:53:33 +0000 (10:53 +0200)]
lima/ppir: abort compilation in case of unsupported intrinsic

Currently ppir continues compilation when there is an unsupported
intrinsic, resulting in a shader that will surely not work as intended.

This is a problem during piglit runs as some tests don't compile
properly due to this but actually still get submitted to the gpu and
leave the system in an unstable state after executing, causing further
tests to fail.
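
Illustration (not part of the patch): a toy Python sketch of failing compilation on the first unsupported intrinsic, so that no half-built shader is ever handed on. The supported set and the function name are assumptions made for this note.

```python
# Hypothetical set of intrinsics the toy backend can emit.
SUPPORTED = {"load_input", "load_uniform", "store_output"}


def compile_shader(intrinsics):
    """Return (program, None) on success or (None, error) on failure."""
    for name in intrinsics:
        if name not in SUPPORTED:
            # Abort right away and name the offending intrinsic.
            return None, f"unsupported intrinsic: {name}"
    return list(intrinsics), None


program, error = compile_shader(["load_input", "discard"])
print(program, error)
```

A caller that checks for a missing program can then skip submission instead of running a shader that is known to be incomplete.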

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/ir: print names of unsupported intrinsics
Erico Nunes [Sun, 5 May 2019 08:51:43 +0000 (10:51 +0200)]
lima/ir: print names of unsupported intrinsics

While lima still doesn't support some kinds of intrinsics, it is more
helpful to display the name of the unsupported instr->intrinsic to make
debugging easier.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agomesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list
John Stultz [Thu, 2 May 2019 20:36:11 +0000 (13:36 -0700)]
mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list

In commit a99c360a4630 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build systems.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: a99c360a463 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
5 years agomesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefi...
John Stultz [Fri, 3 May 2019 16:17:57 +0000 (09:17 -0700)]
mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefile.sources

In commit 2f0b9d22495 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by the Android and
SCons build systems.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 2f0b9d22495 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
5 years agomesa: android: freedreno: Fix build failure due to path change
John Stultz [Thu, 2 May 2019 16:25:57 +0000 (09:25 -0700)]
mesa: android: freedreno: Fix build failure due to path change

The ir3_nir_trig.py file was moved in a previous commit,
aa0fed10d3574 (freedreno: move ir3 to common location),
so update the Android.gen.mk file to match.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: aa0fed10d35 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
5 years agomesa: android: freedreno: build libfreedreno_{drm,ir3} static libs
Amit Pundir [Tue, 30 Apr 2019 07:36:19 +0000 (13:06 +0530)]
mesa: android: freedreno: build libfreedreno_{drm,ir3} static libs

Add libfreedreno_drm/ir3 to the build

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: b4476138d5a ("freedreno: move drm to common location")
Fixes: aa0fed10d35 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <john.stultz@linaro.org>
5 years agomesa: android: Remove unnecessary dependency tracking rules
Alistair Strachan [Wed, 19 Sep 2018 22:05:33 +0000 (15:05 -0700)]
mesa: android: Remove unnecessary dependency tracking rules

The current AOSP master build system breaks building mesa due to the
following error:

external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
  writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"

This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.

(This change fixes a similar problem affecting nir.h too.)

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
5 years agoradv: Implement cosited_even sampling.
Bas Nieuwenhuizen [Mon, 6 May 2019 00:15:55 +0000 (02:15 +0200)]
radv: Implement cosited_even sampling.

Apparently cosited_even was the required one instead of midpoint.

This adds a slight offset of 0.5 pixels to the coordinates (we also need
the image size to convert to normalized coords).
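
Illustration (not part of the patch): a minimal Python sketch of why the image size is needed. Half a pixel in normalized coordinates is 0.5 divided by the size along that axis. The function names are made up for this note, and the direction of the shift is left as a parameter because it is not stated here.

```python
def half_texel(width, height):
    """Half a pixel expressed in normalized [0, 1] coordinates."""
    return (0.5 / width, 0.5 / height)


def offset_coords(u, v, width, height, sign=1.0):
    """Shift normalized coordinates by half a pixel in the given direction."""
    du, dv = half_texel(width, height)
    return (u + sign * du, v + sign * dv)


print(half_texel(4, 2))
print(offset_coords(0.5, 0.5, 4, 2))
```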

Fixes: 91702374d5d "radv: Add ycbcr lowering pass."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoRestore erroneously removed .gitignore entry for "build" directory
Michel Dänzer [Fri, 3 May 2019 08:12:56 +0000 (10:12 +0200)]
Restore erroneously removed .gitignore entry for "build" directory

It was removed in "delete autotools .gitignore files", but the build
directory is created by scons.

[Skip CI]

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoradv: Disable subsampled formats.
Bas Nieuwenhuizen [Sun, 5 May 2019 23:42:21 +0000 (01:42 +0200)]
radv: Disable subsampled formats.

Broken on Polaris, and since I discovered NV12 is not subsampled but
a 2-plane format, I decided I don't really care.

Work to do to re-enable:

1) Figure out which devices support it natively.
2) Write some software emulation for the others.

Fixes: 52c1adda21b "radv: Add ycbcr format features."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoutil/drirc: add workarounds for bugs in Doom 3: BFG
Timothy Arceri [Fri, 3 May 2019 03:59:05 +0000 (13:59 +1000)]
util/drirc: add workarounds for bugs in Doom 3: BFG

This makes the game playable on radeonsi.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110143

5 years agofreedreno: remove unused forward struct declaration
Rob Clark [Sat, 4 May 2019 20:59:20 +0000 (13:59 -0700)]
freedreno: remove unused forward struct declaration

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agopanfrost/midgard: iabs cannot run on mul
Alyssa Rosenzweig [Fri, 3 May 2019 03:27:18 +0000 (03:27 +0000)]
panfrost/midgard: iabs cannot run on mul

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Lower mixed csel (NIR)
Alyssa Rosenzweig [Fri, 3 May 2019 03:16:14 +0000 (03:16 +0000)]
panfrost/midgard: Lower mixed csel (NIR)

Basically, when the conditions of a csel diverge, we scalarize to avoid
going into weird code paths during emit. We could be doing better, but
this case can't occur organically from GLSL as far as I can tell, though
it does fix lowered atan2.
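
Illustration (not part of the patch): a toy Python sketch of scalarizing a select only when its per-component conditions differ. The tuple-based IR and the op names "vcsel" and "csel" are assumptions made for this note.

```python
def lower_csel(conds, a, b):
    """Lower a component-wise select cond ? a : b.

    conds holds one condition name per component. If they are all the same
    value, one vector select is enough; otherwise emit one scalar select
    per component.
    """
    if len(set(conds)) == 1:
        return [("vcsel", conds[0], tuple(a), tuple(b))]
    return [("csel", c, x, y) for c, x, y in zip(conds, a, b)]


print(lower_csel(["c0", "c0"], ["a.x", "a.y"], ["b.x", "b.y"]))
print(lower_csel(["c0", "c1"], ["a.x", "a.y"], ["b.x", "b.y"]))
```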

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Fix RA when temp_count = 0
Alyssa Rosenzweig [Fri, 3 May 2019 02:50:16 +0000 (02:50 +0000)]
panfrost/midgard: Fix RA when temp_count = 0

A previous commit by Tomeu aborted RA early, which solves the memory
corruption issue, but then generates an incorrect compile. This fixes
that.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Fix integer selection
Alyssa Rosenzweig [Fri, 3 May 2019 01:54:16 +0000 (01:54 +0000)]
panfrost/midgard: Fix integer selection

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Support RGB565 FBOs
Alyssa Rosenzweig [Thu, 2 May 2019 02:27:04 +0000 (02:27 +0000)]
panfrost: Support RGB565 FBOs

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Handle dest_override generalized
Alyssa Rosenzweig [Wed, 1 May 2019 02:00:08 +0000 (02:00 +0000)]
panfrost/midgard/disasm: Handle dest_override generalized

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Stub out 64-bit
Alyssa Rosenzweig [Tue, 30 Apr 2019 23:19:41 +0000 (23:19 +0000)]
panfrost/midgard/disasm: Stub out 64-bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Print 8-bit sources
Alyssa Rosenzweig [Tue, 30 Apr 2019 19:16:22 +0000 (19:16 +0000)]
panfrost/midgard/disasm: Print 8-bit sources

This handles the usual case. 8-bit register access parallels 16-bit
access, but with one major caveat: in 8-bit mode, only half of the
register file is actually (directly) accessible as sources. In
particular, for each 16-bit integer register (hrN), we can only index a
*single* 8-bit integer (qrN), corresponding to the lower 8-bits. To get
the upper 8-bits, it is required to do an explicit shift. For example,
to add the bytes of a 16-bit integer hr0.x and get the result as an
8-bit qr0, you'd need to do something like:

   ilsr hr1.x, hr0.x, #8
   iadd qr0.x, qr0.x, qr1.x

This scheme diverges from 32-bit registers, in that both the upper and
lower halves of a 32-bit register are individually accessible as a pair
of half registers. For contrast, to add the lower and upper 16-bits of a
32-bit integer r0.x, you can just:

   iadd hr0.x, hr0.x, hr1.x

Since hr1.x = upper 16-bit of r0.x.
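
Illustration (not part of the patch): a small Python emulation of the two sequences above, modelling registers as plain integers. The function names are made up for this note, and wrapping the sum to the destination width is an assumption.

```python
def add_bytes_of_u16(hr0):
    """Emulate: ilsr hr1.x, hr0.x, #8 ; iadd qr0.x, qr0.x, qr1.x"""
    hr0 &= 0xFFFF
    hr1 = hr0 >> 8             # explicit shift brings the upper byte down
    qr0 = hr0 & 0xFF           # qr0 is the lower 8 bits of hr0
    qr1 = hr1 & 0xFF           # qr1 is the lower 8 bits of hr1
    return (qr0 + qr1) & 0xFF  # assumed to wrap to 8 bits


def add_halves_of_u32(r0):
    """Emulate: iadd hr0.x, hr0.x, hr1.x (both halves directly addressable)."""
    r0 &= 0xFFFFFFFF
    hr0 = r0 & 0xFFFF          # lower 16 bits of r0
    hr1 = r0 >> 16             # upper 16 bits of r0
    return (hr0 + hr1) & 0xFFFF


print(hex(add_bytes_of_u16(0x1234)))
print(add_halves_of_u32(0x00010002))
```

The 8-bit case needs the extra shift because only the low byte of each 16-bit register can be named as a source.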

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Support 8-bit destination
Alyssa Rosenzweig [Tue, 30 Apr 2019 19:05:49 +0000 (19:05 +0000)]
panfrost/midgard/disasm: Support 8-bit destination

Meanwhile, we're forced to disable dest_override, since it's not yet
clear how this interacts with other bitnesses (it'll likely need to be
overhauled in any case).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Rename ilzcnt8 -> iclz
Alyssa Rosenzweig [Tue, 30 Apr 2019 06:19:33 +0000 (06:19 +0000)]
panfrost/midgard: Rename ilzcnt8 -> iclz

Per OpenCL.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Fix crash on unknown op
Alyssa Rosenzweig [Tue, 30 Apr 2019 05:06:18 +0000 (05:06 +0000)]
panfrost/midgard: Fix crash on unknown op

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Fill in .int mod
Alyssa Rosenzweig [Tue, 30 Apr 2019 04:59:28 +0000 (04:59 +0000)]
panfrost/midgard/disasm: Fill in .int mod

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Extend print_reg to 8-bit
Alyssa Rosenzweig [Tue, 30 Apr 2019 04:58:52 +0000 (04:58 +0000)]
panfrost/midgard/disasm: Extend print_reg to 8-bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard/disasm: Catch mask errors
Alyssa Rosenzweig [Tue, 30 Apr 2019 04:52:36 +0000 (04:52 +0000)]
panfrost/midgard/disasm: Catch mask errors

We silently ignored certain bits of the mask, which causes issues when
disassembling 8/64-bit ops.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: reg_mode_full -> reg_mode_32, etc
Alyssa Rosenzweig [Tue, 30 Apr 2019 02:19:26 +0000 (02:19 +0000)]
panfrost/midgard: reg_mode_full -> reg_mode_32, etc

In preparation for 8-bit and 64-bit operands, let's not reinforce the
32-bit-centric biases in the ISA.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agofreedreno/a6xx: deduplicate a few lines
Rob Clark [Sat, 4 May 2019 16:16:58 +0000 (09:16 -0700)]
freedreno/a6xx: deduplicate a few lines

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno: add ubwc_enabled helper
Rob Clark [Sat, 4 May 2019 16:02:54 +0000 (09:02 -0700)]
freedreno: add ubwc_enabled helper

Since it is dependent on the tile mode (ie. disabled for smaller mipmap
levels), we should handle it a similar way to fd_resource_level_linear().
The code previously mostly did the right thing because the old helper
took the tile mode.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno: move UBWC color offset to fd_resource_offset()
Rob Clark [Sat, 4 May 2019 15:04:59 +0000 (08:04 -0700)]
freedreno: move UBWC color offset to fd_resource_offset()

Best to keep it encapsulated in the helper which returns layer/level
offset (and actually use that helper everywhere) rather than spreading
the logic around the code.

Also add a helper to find UBWC offset, to complete the encapsulation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: buffer resources cannot be compressed
Rob Clark [Sat, 4 May 2019 14:56:12 +0000 (07:56 -0700)]
freedreno/a6xx: buffer resources cannot be compressed

Small cleanup.  They are just an array of data and only ever linear/
uncompressed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno: mark imported resources as valid
Rob Clark [Sat, 4 May 2019 12:06:50 +0000 (05:06 -0700)]
freedreno: mark imported resources as valid

If someone is importing a buffer, we can't really know the state of its
contents, so assume it is valid.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: UBWC support for images
Rob Clark [Fri, 3 May 2019 20:39:45 +0000 (13:39 -0700)]
freedreno/a6xx: UBWC support for images

There are still some fallbacks we'll need to handle before we can enable
UBWC by default.  I think we may need to fallback to uncompressed if
image atomic operations are used.  And we still need to sort out how to
handle image and sampler views of compressed resources if the image/
sampler view is using a format that does not support compression.  (I
think the latter should hopefully be uncommon outside of deqp/piglit.)

But at least this gets us to the point where supertuxkart works properly
with UBWC enabled ;-)

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: UBWC fixes
Rob Clark [Fri, 3 May 2019 20:10:22 +0000 (13:10 -0700)]
freedreno/a6xx: UBWC fixes

A few fixes that get UBWC working for the games/benchmarks where I
noticed problems before (in particular and manhattan, and stk, modulo
image support for UBWC when compute shaders are used for post-process
effects):

  + fix the size of the UBWC meta buffer (ie, the offset to color
    pixel data) that is returned by ->fill_ubwc_buffer_sizes()
  + correct size/layout for 8 and 16 byte per pixel formats
  + limit the supported formats. Not all formats that can be
    tiled can be compressed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno: update generated headers
Rob Clark [Fri, 3 May 2019 17:23:00 +0000 (10:23 -0700)]
freedreno: update generated headers

Corrects tex state ubwc pitch/size

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: OUT_RELOC vs OUT_RELOCW fixes
Rob Clark [Fri, 3 May 2019 13:22:08 +0000 (06:22 -0700)]
freedreno/a6xx: OUT_RELOC vs OUT_RELOCW fixes

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: remove assert
Rob Clark [Fri, 3 May 2019 16:33:34 +0000 (09:33 -0700)]
freedreno/ir3: remove assert

Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20

ca3eb5db665cbcc2de5a5d3158e3dc68f86e5822 went from silently truncating
the constant state, which was also the wrong thing to do, to an assert,
which then showed up in a couple of dEQPs.  Actually there is nothing
wrong with a larger constant file, so just drop the assert.

Signed-off-by: Rob Clark <robdclark@chromium.org>