Alyssa Rosenzweig [Sat, 19 Oct 2019 19:07:27 +0000 (15:07 -0400)]
panfrost/ci: Update expectations list
A bunch of blend tests are fixed on T760. A single blend test regressed on
both T760 and T860, but I am unable to reproduce it locally, so I am just
documenting the regression and moving on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 17:19:49 +0000 (13:19 -0400)]
pan/midgard: Implement SIMD-aware dead code elimination
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
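To illustrate removing dead components rather than only whole instructions, here is a minimal sketch. It assumes a hypothetical instruction record with a 4-bit per-component write mask and a mask of components that are still live after it. None of these names come from the Midgard compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: one destination with a
 * per-component write mask (bit i = component i is written). */
struct toy_instr {
        uint8_t write_mask;
        bool    has_side_effects;
};

/* Trim components that nobody reads afterwards. Returns true if the
 * whole instruction became dead and can be removed. */
static bool
toy_dce_components(struct toy_instr *ins, uint8_t live_after)
{
        /* Instructions with side effects are never touched. */
        if (ins->has_side_effects)
                return false;

        /* Keep only components that are still live after this instruction. */
        ins->write_mask &= live_after;

        /* No components left: the instruction is entirely dead. */
        return ins->write_mask == 0;
}
```

A vec4 write whose mask shrinks to one component can then be scheduled like a scalar op, which is the extra scheduler flexibility described above.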
Results are meh.
total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0
total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 17 Oct 2019 20:37:11 +0000 (16:37 -0400)]
pan/midgard: Create dependency graph bytewise
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).
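A sketch of what a bytewise dependency test could look like, assuming a hypothetical per-instruction summary of which bytes of a 16-byte register are read and written. Two instructions only need ordering if the bytes involved actually overlap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access summary for one 128-bit register:
 * bit b of each mask covers byte b (16 bytes per register). */
struct toy_access {
        unsigned reg;
        uint16_t read_bytes;
        uint16_t write_bytes;
};

/* Must `later` stay after `earlier`? Only if they touch the same
 * register and the byte ranges involved overlap. */
static bool
toy_depends_on(const struct toy_access *later, const struct toy_access *earlier)
{
        if (later->reg != earlier->reg)
                return false;

        bool raw = (later->read_bytes  & earlier->write_bytes) != 0; /* read after write */
        bool war = (later->write_bytes & earlier->read_bytes)  != 0; /* write after read */
        bool waw = (later->write_bytes & earlier->write_bytes) != 0; /* write after write */

        return raw || war || waw;
}
```

A 16-bit byte mask is wide enough to describe every lane of a vec16 of 8-bit values, which component masks sized for vec4 cannot.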
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 17:01:41 +0000 (13:01 -0400)]
pan/midgard: Handle nontrivial masks in texture RA
The texture instruction has a mask we need to take into account.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:30:13 +0000 (12:30 -0400)]
pan/midgard: Implement per-byte liveness tracking
Now that we have a notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
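The backward liveness transfer function stays the same at byte granularity; only the unit changes. A minimal sketch for a single 16-byte register, with a hypothetical helper name:

```c
#include <stdint.h>

/* Backward liveness step at byte granularity for one 16-byte register.
 * live_out: bytes live after the instruction.
 * write/read: bytes the instruction writes/reads in that register. */
static uint16_t
toy_liveness_step(uint16_t live_out, uint16_t write, uint16_t read)
{
        /* Bytes overwritten here are dead before it, unless also read. */
        return (uint16_t)((live_out & (uint16_t)~write) | read);
}
```

With only component masks, a write to the low half of a 32-bit component has to be treated conservatively; at byte granularity the untouched bytes simply stay live.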
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 18 Oct 2019 02:18:36 +0000 (22:18 -0400)]
pan/midgard: Simplify mir_bytemask_of_read_components
There are easy ways to iterate sources!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:25:32 +0000 (12:25 -0400)]
pan/midgard: Report byte masks for read components
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.
Preparation for mixed types.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:24:28 +0000 (12:24 -0400)]
pan/midgard: Add helpers for manipulating byte masks
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).
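To make the relationship between the two mask formats concrete, here is a sketch of expanding a per-channel mask into a per-byte mask for a 16-byte register. The helper name is hypothetical and is not the Midgard helper itself.

```c
#include <stdint.h>

/* Expand a per-component mask into a per-byte mask for a 16-byte
 * register, given the component size in bytes (1, 2, 4 or 8). */
static uint16_t
toy_to_bytemask(unsigned comp_mask, unsigned comp_bytes)
{
        uint16_t bytes = 0;
        unsigned comps = 16 / comp_bytes;
        uint16_t one_comp = (uint16_t)((1u << comp_bytes) - 1);

        for (unsigned c = 0; c < comps; ++c) {
                if (comp_mask & (1u << c))
                        bytes |= (uint16_t)(one_comp << (c * comp_bytes));
        }

        return bytes;
}
```

The same byte mask, for example 0x000F, describes one 32-bit component or two 16-bit components, which is what makes it typesize-independent.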
Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.
Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
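A sketch of one way to "round down" a byte mask, assuming that a component survives only if all of its bytes are set. This reading of the semantics is our assumption; the real mir_round_bytemask_down may differ in detail.

```c
#include <stdint.h>

/* Round a byte mask down to whole components of `comp_bytes` bytes:
 * a component survives only if every one of its bytes is set. */
static uint16_t
toy_round_bytemask_down(uint16_t mask, unsigned comp_bytes)
{
        uint16_t one_comp = (uint16_t)((1u << comp_bytes) - 1);
        unsigned comps = 16 / comp_bytes;

        for (unsigned c = 0; c < comps; ++c) {
                unsigned shift = c * comp_bytes;

                /* Partially covered component: drop all of its bytes. */
                if (((mask >> shift) & one_comp) != one_comp)
                        mask &= (uint16_t)~(one_comp << shift);
        }

        return mask;
}
```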
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 19 Oct 2019 18:04:39 +0000 (14:04 -0400)]
pan/midgard: Implement OP_IS_STORE with table
...rather than open-coding.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 18 Oct 2019 12:22:40 +0000 (08:22 -0400)]
pan/midgard: Tableize load/store ops
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We now include properties for whether the op is a store
and whether it has special cases. We also tag each instruction by its
natural size... this is probably not totally right, but it's a start.
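A sketch of the table-driven pattern, using hypothetical opcodes and flags rather than the real Midgard encodings. A property lookup such as an OP_IS_STORE-style macro then becomes a table read instead of an open-coded switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes; the real Midgard encodings are not used here. */
enum toy_ldst_op {
        TOY_LD_VEC4 = 0,
        TOY_ST_VEC4 = 1,
        TOY_LD_U16  = 2,
        TOY_ST_U8   = 3,
        TOY_NUM_OPS
};

#define TOY_PROP_STORE   (1u << 0) /* op writes memory */
#define TOY_PROP_SPECIAL (1u << 1) /* op needs special-case handling */

struct toy_ldst_info {
        const char *name;
        unsigned    props;        /* TOY_PROP_* flags */
        unsigned    natural_size; /* in bits */
};

static const struct toy_ldst_info toy_ldst_table[TOY_NUM_OPS] = {
        [TOY_LD_VEC4] = { "ld_vec4", 0,              128 },
        [TOY_ST_VEC4] = { "st_vec4", TOY_PROP_STORE, 128 },
        [TOY_LD_U16]  = { "ld_u16",  0,              16 },
        [TOY_ST_U8]   = { "st_u8",   TOY_PROP_STORE, 8 },
};

/* A table lookup replaces an open-coded switch over every store opcode. */
#define TOY_OP_IS_STORE(op) ((toy_ldst_table[(op)].props & TOY_PROP_STORE) != 0)
```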
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 17 Oct 2019 20:37:48 +0000 (16:37 -0400)]
pan/midgard: Factor out mir_get_alu_src
This helper is used in a bunch of places ... might as well make that
common.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 21:34:28 +0000 (17:34 -0400)]
pan/midgard/disasm: Fix printing 8-bit/16-bit masks
The trick is realizing that even with a destination override, the masks
are encoded in the same mode as the instruction itself, rather than
stepping down. The override means that the smaller type is used, but the
mask is parsed as if it were the higher type. Overriding down is handled
by printing the mask blindly in this way. Overriding up can be thought of
as printing in the upper size, but shifting the alphabet to use the upper
half, i.e. shifting xyzw to become abcd.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 18 Oct 2019 12:18:52 +0000 (08:18 -0400)]
pan/midgard: Identify 64-bit atomic opcodes
They are symmetric to their 32-bit counterparts, just shifted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:18:51 +0000 (12:18 -0400)]
pan/midgard: Debug mir_insert_instruction_after_scheduled
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe818 ("pan/midgard: Fix memory corruption in register spilling")
Christian Gmeiner [Fri, 6 Sep 2019 13:13:51 +0000 (15:13 +0200)]
etnaviv: keep track of buffer valid ranges for PIPE_BUFFER
This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.
Such a situation can be triggered with the following API usage example:
glBufferSubData(..., offset, size, data1);
glDrawArrays(...);
// append new vertex data
glBufferSubData(..., offset+size, size, data2);
glDrawArrays(...);
The same is done for freedreno, nouveau and radeon.
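A minimal sketch of valid-range tracking, using a hypothetical single-interval tracker (this is not the etnaviv code). In the API example above, the first upload marks [offset, offset+size) as valid, and the second upload to [offset+size, offset+2*size) does not overlap it, so it need not wait for the GPU.

```c
#include <stdbool.h>

/* Hypothetical tracker for the part of a buffer that holds defined data,
 * kept as a single conservative [start, end) interval. */
struct toy_valid_range {
        unsigned start;
        unsigned end; /* start == end means nothing is valid yet */
};

static void
toy_range_add(struct toy_valid_range *r, unsigned start, unsigned end)
{
        if (r->start == r->end) {
                r->start = start;
                r->end = end;
                return;
        }
        if (start < r->start)
                r->start = start;
        if (end > r->end)
                r->end = end;
}

/* A write to [start, end) that does not touch previously initialized
 * data cannot affect what the GPU is reading, so it need not wait. */
static bool
toy_write_needs_sync(const struct toy_valid_range *r, unsigned start, unsigned end)
{
        return start < r->end && r->start < end;
}
```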
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Fri, 6 Sep 2019 19:21:26 +0000 (21:21 +0200)]
etnaviv: store updated usage in pipe_transfer object
Store the changed usage in the newly created transfer object.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Christian Gmeiner [Sun, 20 Oct 2019 06:02:11 +0000 (08:02 +0200)]
etnaviv: fix code style
Fixes: 1194afdfe35 ("etnaviv: rework the stream flush to always go through the context flush")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lionel Landwerlin [Fri, 18 Oct 2019 12:28:30 +0000 (15:28 +0300)]
anv: fix memory leak on device destroy
v2: handle vma destruction if vkCreateDevice fails (Jordan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1959
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Christian Gmeiner [Sun, 20 Oct 2019 05:38:03 +0000 (07:38 +0200)]
etnaviv: fix compile warnings
Fixes: e5cc66dfad0 ("etnaviv: Rework locking")
Fixes: 1456aa61cc5 ("etnaviv: Rework resource status tracking")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Eric Anholt [Thu, 29 Aug 2019 23:05:20 +0000 (16:05 -0700)]
mesa: Redefine the RG formats as array formats.
This is the layout used in the GL API, and maps directly to PIPE
formats with no endianness trickery. As with the LA change, this
fixes big-endian fetching from texbos. Also cleans up some endian
shenanigans in shader images.
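A small illustration of why an array format needs no endianness trickery. For an RG8 texel, the array layout always puts R at byte 0, while a packed 16-bit layout with R in the low bits puts R first only on little-endian hosts. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Array format: R is always byte 0 and G byte 1, on any host. */
static uint8_t
toy_array_rg8_first_byte(uint8_t r, uint8_t g)
{
        uint8_t texel[2] = { r, g };
        return texel[0];
}

/* Packed format: R in the low 8 bits of a uint16_t. Which byte comes
 * first in memory depends on host endianness. */
static uint8_t
toy_packed_rg8_first_byte(uint8_t r, uint8_t g)
{
        uint16_t packed = (uint16_t)(r | (g << 8));
        uint8_t bytes[2];
        memcpy(bytes, &packed, sizeof(bytes));
        return bytes[0];
}

static int
toy_host_is_little_endian(void)
{
        uint16_t one = 1;
        uint8_t first;
        memcpy(&first, &one, 1);
        return first == 1;
}
```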
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 29 Aug 2019 22:56:19 +0000 (15:56 -0700)]
gallium: Drop the unused PIPE_FORMAT_A*L* formats.
Now that Mesa is also using an array format for LA, nothing was using
these. (And, clearly, no HW driver had exposed them.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 5 Sep 2019 21:42:15 +0000 (14:42 -0700)]
mesa: Replace MESA_FORMAT_L8A8/A8L8 UNORM/SNORM/SRGB with an array format.
The array format is what the GL API wants (fixing texbos on
big-endian), and matches directly to gallium's corresponding array
format. The only driver exposing A8L8 was radeon/r200 in big-endian,
where the HW's underlying format was trying to read as array and we
needed to flip things around to make our packed format come out right
(note that while the radeon format tables had both AL and LA,
ChooseTextureFormat would only pick one of them based on endianness).
v2: Don't make r200/radeon use endian swaps.
v3: Rebase on dropping the r200 _be/_le format table removal patch
v4: reword commit message to explain why we can drop both formats
from radeon.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Eric Anholt [Thu, 29 Aug 2019 22:45:18 +0000 (15:45 -0700)]
mesa: Replace the LA16_UNORM packed formats with one array format.
The array format is what the GL API wants (and we made a mistake in
the format returned for texbos on big-endian!), and it's exactly what
the gallium-side PIPE_FORMAT_L16A16 is. The only downside is that
dri_util tries to fall back to sampling RG16 using LA16, which doesn't
have a match for big-endian any more. No HW drivers supported A16L16
anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 5 Sep 2019 23:06:34 +0000 (16:06 -0700)]
radeon: Drop the unused first arg of OUT_BATCH_RELOC.
This was a trap when trying to figure out how to fit data bits into
the reloc.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 5 Sep 2019 23:04:01 +0000 (16:04 -0700)]
radeon: Fill in the TXOFFSET field containing the tile bits in our relocs.
The first arg to OUT_BATCH_RELOC is ignored; we actually wanted these
in the third arg. They're always 0 so far, so it didn't matter.
v2: Reword commit message to note that I don't end up using the tile
bits, but keep the commit as a cleanup anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Eric Anholt [Thu, 5 Sep 2019 22:48:58 +0000 (15:48 -0700)]
r100/r200: factor out txformat/txfilter setup from the TFP path.
No matter what, we deref the texFormat from the table, except for a
mistake in cpp=4 where we pulled a 0 out of the table either way.
v2: Rebase on dropping r200 table deduplication patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Vasily Khoruzhick [Sat, 19 Oct 2019 01:06:56 +0000 (18:06 -0700)]
lima: fix PP stack size
The PP stack size should be set to the maximum PP stack size, not to the
stack size of the last shader.
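A minimal sketch of the fix's idea, assuming a hypothetical list of PP shaders that may run with the shared stack. The stack is sized for the largest of them instead of the last one.

```c
#include <stddef.h>

/* Hypothetical per-shader info; only the spill stack size matters here. */
struct toy_pp_shader {
        unsigned stack_size;
};

/* The shared PP stack must fit the largest shader that may run,
 * not just whichever shader was handled last. */
static unsigned
toy_pp_stack_size(const struct toy_pp_shader *shaders, size_t count)
{
        unsigned max = 0;
        for (size_t i = 0; i < count; ++i) {
                if (shaders[i].stack_size > max)
                        max = shaders[i].stack_size;
        }
        return max;
}
```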
Fixes: 27e7603c344a ("lima: fix ppir spill stack allocation")
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Marijn Suijten [Sat, 19 Oct 2019 14:43:49 +0000 (16:43 +0200)]
freedreno/a5xx: enable a510
Kernel support for this GPU is added by the following series:
https://patchwork.kernel.org/project/linux-arm-msm/list/?series=187609
In particular https://patchwork.kernel.org/patch/11189953/
Tested on Sony Xperia X and X Compact.
Signed-off-by: Marijn Suijten <marijns95@gmail.com>
Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
Prodea Alexandru-Liviu [Sat, 19 Oct 2019 14:44:44 +0000 (14:44 +0000)]
Appveyor/Meson: Add build test of osmesa gallium
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Lionel Landwerlin [Fri, 18 Oct 2019 11:50:02 +0000 (14:50 +0300)]
anv: fix vkUpdateDescriptorSets with inline uniform blocks
For inline uniform block descriptors, descriptorCount is the number of
bytes to copy into the descriptor. Don't try to use that size as an
index into the descriptor table.
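In VK_EXT_inline_uniform_block, a write's dstArrayElement is a byte offset and descriptorCount a byte count. A minimal sketch of applying such a write, with a hypothetical helper over a plain byte buffer standing in for the block's storage:

```c
#include <stdint.h>
#include <string.h>

/* Copy inline data into the block's backing storage. byte_offset and
 * byte_count correspond to dstArrayElement and descriptorCount for an
 * inline uniform block. Returns 0 if the write would run past the end. */
static int
toy_write_inline_block(uint8_t *storage, uint32_t block_size,
                       uint32_t byte_offset, uint32_t byte_count,
                       const void *data)
{
        /* Reject writes that would run past the end of the block. */
        if (byte_offset > block_size || byte_count > block_size - byte_offset)
                return 0;

        memcpy(storage + byte_offset, data, byte_count);
        return 1;
}
```

Treating byte_count as a number of descriptor slots instead would walk far past the descriptor table for any non-trivial block, which is the kind of misuse described above.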
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 43f40dc7cb ("anv: Implement VK_EXT_inline_uniform_block")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1195
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Tue, 8 Oct 2019 20:37:10 +0000 (13:37 -0700)]
freedreno/ir3: handle imad24_ir3 case in UBO lowering
Similar to iadd, we can fold an added constant value from an imad24_ir3
into the load_uniform's constant offset. This avoids some cases where
the addition of imad24_ir3 could otherwise be a regression in instr
count.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Tue, 8 Oct 2019 20:36:14 +0000 (13:36 -0700)]
freedreno/ir3: add imul24 opcode
This maps to mul.s24
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Mon, 30 Sep 2019 18:44:16 +0000 (11:44 -0700)]
freedreno/ir3: optimize immed 2nd src to mad
We can't encode immed sources for cat3 (mad) instructions, but we can
use const in first or third src. We handled this case already, but we
weren't considering that we could lower immed to const.
For manhattan:
total instructions in shared programs: 35202 -> 34718 (-1.37%)
instructions in affected programs: 14931 -> 14447 (-3.24%)
helped: 90
HURT: 0
total full in shared programs: 2451 -> 2359 (-3.75%)
full in affected programs: 653 -> 561 (-14.09%)
helped: 69
HURT: 2
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 27 Sep 2019 18:36:43 +0000 (11:36 -0700)]
freedreno/ir3: add rule to generate imad24
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Fri, 27 Sep 2019 17:15:02 +0000 (10:15 -0700)]
nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
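A sketch of the kind of decision such a pass might make, assuming the size of the dereferenced object is known. The threshold below is our assumption, not the pass's actual rule: if every in-bounds byte offset fits in a signed 24-bit value, then the index and the element size fit too, so the sign extension done by imul24 loses nothing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can an array-offset multiply into an object of
 * object_size_bytes bytes be lowered to a 24-bit multiply? */
static bool
toy_amul_fits_imul24(uint64_t object_size_bytes)
{
        /* Largest positive value representable in signed 24 bits. */
        const uint64_t max_s24 = (1u << 23) - 1;

        /* In-bounds offsets range over [0, object_size_bytes - 1]. */
        return object_size_bytes == 0 || object_size_bytes - 1 <= max_s24;
}
```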
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 26 Sep 2019 17:34:51 +0000 (10:34 -0700)]
nir: add address calc related opt rules
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Thu, 26 Sep 2019 17:32:00 +0000 (10:32 -0700)]
nir: add amul instruction
Used for address/offset calculation (i.e. array derefs), where we can
potentially use fewer than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Wed, 25 Sep 2019 17:10:39 +0000 (10:10 -0700)]
nir: Add a new ALU nir_op_imul24
Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.
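As we understand NIR's definition, imul24 sign-extends the low 24 bits of each source and keeps the low 32 bits of the product. A C sketch of those semantics, written to avoid signed-overflow undefined behaviour (helper names are hypothetical):

```c
#include <stdint.h>

/* Sign-extend the low 24 bits of x. */
static int32_t
toy_sext24(uint32_t x)
{
        x &= 0xFFFFFFu;
        return (x & 0x800000u) ? (int32_t)x - 0x1000000 : (int32_t)x;
}

/* 24-bit signed multiply: bits 24..31 of each source are ignored and the
 * low 32 bits of the product are returned (unsigned, to avoid UB). */
static uint32_t
toy_imul24(uint32_t a, uint32_t b)
{
        int64_t product = (int64_t)toy_sext24(a) * toy_sext24(b);
        return (uint32_t)product;
}
```

For an array deref such as idx * 16, both sources comfortably fit in 24 bits, so the result matches a full 32-bit imul.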
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Eduardo Lima Mitev [Fri, 28 Jun 2019 07:43:03 +0000 (09:43 +0200)]
freedreno/ir3: Handle newly added opcode nir_op_imad24_ir3
Simply emit an ir3_MAD_S24 instruction in the backend.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Eduardo Lima Mitev [Fri, 28 Jun 2019 07:39:38 +0000 (09:39 +0200)]
nir: Add a new ALU nir_op_imad24_ir3
The ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Wed, 25 Sep 2019 17:21:24 +0000 (10:21 -0700)]
freedreno/ir3: rename mul.s/mul.u
to mul.s24/mul.u24, to better reflect that these are 24b multiplies.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Wed, 25 Sep 2019 18:59:49 +0000 (11:59 -0700)]
nir/search: fix the PoT helpers
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.
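One plausible reading of this bug is that the helper compared a sized type such as uint32 against unsized base types, so nothing matched. A sketch with a hypothetical type encoding (base type in the high bits, bit size in the low 7 bits; not NIR's actual values) shows stripping the bit size before dispatching on the base type:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: base type in high bits, bit size in low bits. */
enum toy_type {
        TOY_TYPE_INT  = 0x80,
        TOY_TYPE_UINT = 0x100,
};
#define TOY_SIZE_MASK 0x7Fu

static unsigned
toy_base_type(unsigned t)
{
        return t & ~TOY_SIZE_MASK;
}

static bool
toy_is_pos_pot(unsigned type, int64_t value)
{
        /* Switching on `type` directly would miss sized types such as
         * TOY_TYPE_UINT | 32, so strip the bit size first. */
        switch (toy_base_type(type)) {
        case TOY_TYPE_INT:
                return value > 0 && (value & (value - 1)) == 0;
        case TOY_TYPE_UINT: {
                uint64_t u = (uint64_t)value;
                return u != 0 && (u & (u - 1)) == 0;
        }
        default:
                return false;
        }
}
```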
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Rob Clark [Fri, 18 Oct 2019 18:30:48 +0000 (11:30 -0700)]
freedreno/ir3: enable pre-fs texture fetch for a6xx
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 18 Oct 2019 18:52:35 +0000 (11:52 -0700)]
turnip: add support for pre-fs texture fetch
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 23:43:03 +0000 (16:43 -0700)]
freedreno/a6xx: add support for pre-fs texture fetch
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Hyunjun Ko [Mon, 5 Aug 2019 06:38:57 +0000 (08:38 +0200)]
freedreno/ir3: Add support for texture sampling pre-dispatch
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Eduardo Lima Mitev [Mon, 5 Aug 2019 06:09:23 +0000 (08:09 +0200)]
freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates over texture sampling instructions and marks those
eligible for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligible if:
* The coordinate is a vector where all its components come from a
shader input.
* The order of the components matches exactly that of the input (no
swizzles).
* The instruction is in the 'main' function, and in the outermost
block.
The first two restrictions were arrived at empirically, so more
testing could tighten or loosen them.
The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.
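The three eligibility rules can be sketched as a single predicate. The struct below is a hypothetical summary of one texture sample, not NIR's data structures.

```c
#include <stdbool.h>

/* Hypothetical summary of one texture sample and its coordinate. */
struct toy_tex {
        unsigned num_coord_comps;
        bool     coord_from_input[4];  /* component i comes from a shader input */
        unsigned coord_input_comp[4];  /* which input component feeds it */
        unsigned coord_input_index[4]; /* which input variable feeds it */
        bool     in_main_outer_block;
};

static bool
toy_tex_prefetch_eligible(const struct toy_tex *t)
{
        /* 3rd rule: in main(), in the outermost block. */
        if (!t->in_main_outer_block)
                return false;
        if (t->num_coord_comps == 0 || t->num_coord_comps > 4)
                return false;

        for (unsigned i = 0; i < t->num_coord_comps; ++i) {
                /* 1st rule: every component comes from the same shader input. */
                if (!t->coord_from_input[i] ||
                    t->coord_input_index[i] != t->coord_input_index[0])
                        return false;
                /* 2nd rule: component order matches the input (no swizzle). */
                if (t->coord_input_comp[i] != i)
                        return false;
        }
        return true;
}
```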
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 02:36:30 +0000 (19:36 -0700)]
freedreno/ir3: force i/j pixel to r0.x
It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x.
I've tried unknown zero bits, to no avail, and blob also seems to force
r0.x when this feature is used.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Thu, 10 Oct 2019 19:09:15 +0000 (12:09 -0700)]
freedreno/ir3: add pre-dispatch tex fetch to disasm
Useful to see in disassembly listing texture fetches that were moved to
pre-dispatch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Wed, 9 Oct 2019 22:51:01 +0000 (15:51 -0700)]
freedreno/ir3: add dummy bary.f(ei) for pre-fs-fetch
If the only use of varyings is a pre-shader texture-fetch, we still need
to issue a bary.f with the end-input flag, otherwise we'll block further
VS invocations, as the hw will think varying storage is still busy.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 23:15:44 +0000 (16:15 -0700)]
freedreno/ir3: fixup register footprint to account for prefetch
It is possible that the result of a pre-fs texture fetch is an output
(or partially an output) of the FS. Since the meta:tex_prefetch
instructions are dropped before the assembler, we need to account for
this when we fixup the register footprint.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 22:57:22 +0000 (15:57 -0700)]
freedreno/ir3: add meta instruction for pre-fs texture fetch
Add a placeholder instruction to track texture fetches made prior to FS
shader dispatch. These, like meta:input instructions, are scheduled
before any real instructions, so that RA realizes their result values
are live before the first real instruction. This also gives legalize a
way to track uses of fetched samples that require (sy) sync flags.
There is some related special handling for varying texcoord inputs used
for pre-fs-fetch, so that they are not DCE'd and remain in linkage
between FS and previous stage. Note that we could almost avoid this
special handling by giving meta:tex_prefetch real src arguments, except
that in the FS stage, inputs are actual bary.f/ldlv instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 18:50:22 +0000 (11:50 -0700)]
freedreno/ir3: don't DCE ij_pix if used for pre-fs-texture-fetch
When we enable pre-dispatch texture fetch, we could have a scenario
where the barycentric i/j coord sysval is not used in the shader, but
only used for the varying fetch for the pre-dispatch texture fetch.
In this case we need to take care not to DCE this sysval.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 18:35:53 +0000 (11:35 -0700)]
freedreno/ir3: track sysval slot for inputs
Will be needed for special handling of SYSTEM_VALUE_BARYCENTRIC_PIXEL
(ij_pix) when pre-fs texture fetch is enabled.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 11 Oct 2019 18:26:08 +0000 (11:26 -0700)]
freedreno/ir3: remove unused ir3_instruction::inout
Not sure I remember how long this has been unused for. But it's unused
now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Hyunjun Ko [Fri, 2 Aug 2019 19:12:22 +0000 (21:12 +0200)]
freedreno/ir3: Add data structures to support texture pre-fetch
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Wed, 9 Oct 2019 19:16:03 +0000 (12:16 -0700)]
freedreno: update registers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Eduardo Lima Mitev [Wed, 10 Jul 2019 07:48:21 +0000 (09:48 +0200)]
nir: Add new texop nir_texop_tex_prefetch
This is like nir_texop_tex, but signals that the sampling coordinates
are immutable during the shader stage, in a way that allows the HW
that supports pre-dispatching sampling operations to pre-fetch
the result prior to scheduling the shader stage.
This is introduced to support the feature in Freedreno. Adreno HW
from a4xx onwards supports it.
A NIR pass introduced later in this series will detect sampling
operations that are eligible for pre-dispatch, and replace
nir_texop_tex by this new op, to tell the backend to enable
pre-fetch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Eric Engestrom [Wed, 16 Oct 2019 10:54:59 +0000 (11:54 +0100)]
osmesa: add missing #include <stdint.h>
Fixes: 281466332ba81a4277a1 ("gallium/osmesa: Introduce a test.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1947
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Dylan Baker [Thu, 17 Oct 2019 17:09:59 +0000 (10:09 -0700)]
docs: Add new feature for compiling for windows with meson
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Tue, 15 Oct 2019 18:17:54 +0000 (11:17 -0700)]
appveyor: Move appveyor script into .appveyor directory
This clears out the scripts directory completely
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Tue, 15 Oct 2019 18:16:01 +0000 (11:16 -0700)]
appveyor: Add support for building llvmpipe with meson
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Thu, 17 Oct 2019 17:07:44 +0000 (10:07 -0700)]
docs: update meson docs for windows
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Thu, 25 Jul 2019 21:27:43 +0000 (14:27 -0700)]
meson: Use cmake to find LLVM when building for windows
We don't use cmake normally because it always results in static linking.
This is very problematic for *nix OSes which expect shared linking by
default, but for windows this isn't a problem as LLVM doesn't support
shared linking on windows anyway.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Tue, 24 Apr 2018 20:48:25 +0000 (13:48 -0700)]
meson: Add support for wrapping llvm
For building on Windows (when not using cygwin), users may want to use a
binary wrap of LLVM. This provides a fallback to the LLVM dependency
which may be used in this case.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Dylan Baker [Tue, 15 Oct 2019 20:06:58 +0000 (13:06 -0700)]
meson/llvmpipe: Add dep_llvm to driver_swrast
This fixes build errors in gl-gdi on windows when using llvmpipe
Reviewed-by: Adam Jackson <ajax@redhat.com>
Hal Gentz [Fri, 18 Oct 2019 07:03:37 +0000 (01:03 -0600)]
Revert "egl: Add EGL_CONFIG_SELECT_GROUP_MESA ext."
This reverts commit 173bc9d6842efdec54ea3fd415a6946dcee7b02a.
Hal Gentz [Fri, 18 Oct 2019 07:03:36 +0000 (01:03 -0600)]
Revert "egl: Fixes transparency with EGL and X11."
This reverts commit 90a19074b4e1d4d8f8ababaade8170c05aeecffe.
Hal Gentz [Fri, 18 Oct 2019 07:03:33 +0000 (01:03 -0600)]
Revert "egl: Puts RGBA visuals in the second config selection group."
This reverts commit a800d16e4f1589e41e53edf8e8a771a33bb46a6a.
Hal Gentz [Fri, 18 Oct 2019 07:03:30 +0000 (01:03 -0600)]
Revert "egl: Configs w/o double buffering support have no `EGL_WINDOW_BIT`."
This reverts commit 075a96aa926e6e89795f95a6a59693f44d9ac970.
Jonathan Marek [Fri, 9 Aug 2019 15:44:07 +0000 (11:44 -0400)]
etnaviv: check NO_ASTC feature bit
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Jonathan Marek [Mon, 2 Sep 2019 20:23:21 +0000 (16:23 -0400)]
etnaviv: fix TS samplers on GC7000L
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Jonathan Marek [Fri, 21 Jun 2019 00:01:28 +0000 (20:01 -0400)]
etnaviv: fix linear_nearest / nearest_linear filters on GC7000Lite
MIN filter is only used when LOD MAX is at least 4 (I guess the 2 LSB don't
actually exist).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Lucas Stach [Thu, 22 Feb 2018 10:54:21 +0000 (11:54 +0100)]
etnaviv: GC7000: flush TX descriptor and instruction cache
The etnaviv kernel driver will only ever flush write caches. As both
the TX descriptor and instruction cache are read caches they must be
flushed from the user cmdstream at an appropriate time.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lucas Stach [Thu, 28 Mar 2019 09:14:23 +0000 (10:14 +0100)]
etnaviv: add linear texture support on GC7000
It's just a matter of writing the addressing mode into the
texture descriptor.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Wladimir J. van der Laan [Sun, 29 Oct 2017 14:59:43 +0000 (14:59 +0000)]
etnaviv: GC7000: Texture descriptors
Create a separate implementation file with texture-descriptor-based
sampler views and sampler states. Initialize the one or the other
based on the GPU. There is so little in common that this seemed more
appropriate; keeping them as one type of state object would
only be confusing.
This commit is actually a combination of the original commit by
Wladimir, fixes and TS implementation from Jonathan and changes to
use softpin by Lucas.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lucas Stach [Fri, 2 Aug 2019 12:53:08 +0000 (14:53 +0200)]
etnaviv: check for softpin availability on Halti5 devices
Halti5 uses texture descriptors to control the samplers, and thus needs to
know the GPU virtual address for the texture buffers to fill into the
descriptor buffer. Without softpin userspace has no control over the GPU
VM and also no way to fix up the texture descriptor buffer, so there is
no point in creating a screen on a Halti5 device without softpin being
available.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lucas Stach [Fri, 2 Aug 2019 12:48:09 +0000 (14:48 +0200)]
etnaviv: drm: add softpin interface
If softpin is available on the kernel side, we transparently replace the
relocs with self-managed GPU virtual addresses. This allows skipping some
work on the kernel side, as it doesn't need to touch the command stream
anymore before submitting it to the hardware.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Marek Vasut [Thu, 5 Sep 2019 18:02:58 +0000 (20:02 +0200)]
etnaviv: Rework locking
Replace the per-screen locking of flushing with per-context one and
add per-context lock around command stream buffer accesses, to prevent
cross-context flushing from corrupting these command stream buffers.
Signed-off-by: Marek Vasut <marex@denx.de>
Marek Vasut [Thu, 5 Sep 2019 17:57:39 +0000 (19:57 +0200)]
etnaviv: Command buffer realloc
Reallocate the command stream buffer in case it is too small.
Older kernel versions are limited to a 64 KiB buffer, so cap the size
to avoid oversized buffers.
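A minimal sketch of growing a command buffer on demand while respecting the 64 KiB limit mentioned above. The struct and function are hypothetical and do not mirror the etnaviv code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Cap taken from the text: older kernels accept at most 64 KiB. */
#define TOY_CMDBUF_MAX_BYTES (64u * 1024u)

struct toy_cmdbuf {
        uint32_t *words;
        unsigned  size; /* capacity in 32-bit words */
        unsigned  used; /* words already emitted */
};

/* Make room for `extra` more words, doubling the buffer as needed but
 * never beyond the cap. Returns 0 if the request cannot be satisfied
 * (a caller could then flush and start over with an empty buffer). */
static int
toy_cmdbuf_reserve(struct toy_cmdbuf *buf, unsigned extra)
{
        const unsigned max_words = TOY_CMDBUF_MAX_BYTES / sizeof(uint32_t);
        unsigned needed = buf->used + extra;

        if (needed <= buf->size)
                return 1;
        if (needed > max_words)
                return 0;

        unsigned new_size = buf->size ? buf->size : 1;
        while (new_size < needed)
                new_size *= 2;
        if (new_size > max_words)
                new_size = max_words;

        uint32_t *words = realloc(buf->words, new_size * sizeof(uint32_t));
        if (!words)
                return 0;

        buf->words = words;
        buf->size = new_size;
        return 1;
}
```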
Signed-off-by: Marek Vasut <marex@denx.de>
Marek Vasut [Wed, 4 Sep 2019 23:23:52 +0000 (01:23 +0200)]
etnaviv: Rework resource status tracking
Have each context track which resources it marked as pending read and
pending write. Have each resource track in which context it is pending.
This way, it is possible to identify when a resource is both pending
read and pending write at the same time. Moreover, the status field
can be correctly calculated and updated when necessary.
Signed-off-by: Marek Vasut <marex@denx.de>
Lucas Stach [Fri, 9 Aug 2019 15:11:23 +0000 (17:11 +0200)]
etnaviv: rework the stream flush to always go through the context flush
This way we can ensure that the pipe driver tracking of pending resources
stays in sync with the actual command buffer state, even if a space
reservation triggers a forced flush.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lucas Stach [Fri, 9 Aug 2019 14:46:01 +0000 (16:46 +0200)]
etnaviv: drm: remove unused etna_cmd_stream_finish
It's not used by anything and gets in the way of refactoring the
flush handling.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Lucas Stach [Fri, 9 Aug 2019 13:34:31 +0000 (15:34 +0200)]
etnaviv: keep references to pending resources
As long as a resource is pending in any context we must not destroy
it, otherwise we'll hit a classic use-after-free with fireworks.
To avoid this, take a reference when the resource is first added to
the pending set and put the reference when it is no longer pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
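The reference scheme described above reduces to a few lines. This sketch assumes a plain integer refcount and a single "pending" flag. The `sketch_*` names are hypothetical and are not the gallium `pipe_resource_reference()` machinery. The pending set holds exactly one extra reference, so the application dropping its own reference cannot free a resource the GPU may still be using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted resource. */
struct sketch_res {
   int refcount;
   bool pending;
};

static struct sketch_res *sketch_res_create(void)
{
   struct sketch_res *rsc = calloc(1, sizeof(*rsc));
   if (rsc)
      rsc->refcount = 1;  /* the creator's reference */
   return rsc;
}

/* Drop one reference and free on the last put. Returns true if freed. */
static bool sketch_res_put(struct sketch_res *rsc)
{
   assert(rsc->refcount > 0);
   if (--rsc->refcount == 0) {
      free(rsc);
      return true;
   }
   return false;
}

/* Only the first insertion into the pending set takes a reference. */
static void sketch_add_pending(struct sketch_res *rsc)
{
   if (!rsc->pending) {
      rsc->pending = true;
      rsc->refcount++;
   }
}

/* Once no longer pending, put the reference the pending set held. */
static bool sketch_remove_pending(struct sketch_res *rsc)
{
   if (!rsc->pending)
      return false;
   rsc->pending = false;
   return sketch_res_put(rsc);
}
```

In this model, if the application calls `sketch_res_put()` while the resource is still pending, the resource survives, and it is freed only when `sketch_remove_pending()` drops the last reference.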
Marek Vasut [Sat, 8 Jun 2019 17:52:55 +0000 (19:52 +0200)]
etnaviv: Make contexts track resources
Currently, the screen tracks all resources for all contexts, but this
is not correct. Each context should track the resources it uses. This
also allows a context to detect whether a resource is used by another
context and to notify another context using a resource that the current
context is done using the resource.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Guido Günther <guido.gunther@puri.sm>
Cc: Lucas Stach <l.stach@pengutronix.de>
Brian Paul [Wed, 7 Aug 2019 20:22:33 +0000 (14:22 -0600)]
REVIEWERS: add VMware reviewers
Samuel Pitoiset [Mon, 14 Oct 2019 09:27:32 +0000 (11:27 +0200)]
radv: implement VK_KHR_shader_float_controls
This exposes what's required for DX and this is what we already
configure. The driver flushes denorms for FP32 and preserves them
for FP16/FP64. Note that we can't allow both preserving and
flushing denorms because this won't work for merged shaders. This
will require LLVM to update the float mode register to make it work.
Only enabled on GFX8+ with the LLVM path because it's untested on
previous chips and ACO doesn't support it.
This extension is required for SPIR-V 1.4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 14 Oct 2019 13:39:06 +0000 (15:39 +0200)]
ac/llvm: force fneg/fabs to flush denorms to zero if requested
LLVM optimizes these instructions with XOR/AND and it loses
the sign bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
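The snippet below illustrates in plain C, not LLVM IR, why lowering fneg to an integer XOR on the sign bit does not honour a flush-denorms mode. The XOR never inspects the exponent, so a denormal input stays denormal. An explicit flush, similar in spirit to canonicalizing the result, is needed afterwards. The `sketch_*` helpers are hypothetical illustrations only. The actual change operates on LLVM IR in ac/llvm.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* fneg as an integer XOR of the sign bit, which is what a backend may
 * lower it to. Only bit 31 changes, so denormals pass straight through. */
static float sketch_fneg_xor(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof(bits));
   bits ^= 0x80000000u;
   memcpy(&x, &bits, sizeof(x));
   return x;
}

/* Explicit flush-to-zero: replace a denormal with a zero of the same sign. */
static float sketch_flush_denorm(float x)
{
   if (fpclassify(x) == FP_SUBNORMAL)
      return signbit(x) ? -0.0f : 0.0f;
   return x;
}

/* fneg that honours "flush denorms" by flushing the XOR result. */
static float sketch_fneg_ftz(float x)
{
   return sketch_flush_denorm(sketch_fneg_xor(x));
}
```

For example, `1e-40f` is below `FLT_MIN` and therefore subnormal. `sketch_fneg_xor()` returns another subnormal, while `sketch_fneg_ftz()` returns `-0.0f`.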
Samuel Pitoiset [Mon, 14 Oct 2019 13:36:37 +0000 (15:36 +0200)]
ac/llvm: add AC_FLOAT_MODE_ROUND_TO_ZERO
Because some instructions will be optimized by the backend compiler,
the driver has to manually flush to zero to keep the result exact.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 14 Oct 2019 12:23:35 +0000 (14:23 +0200)]
ac/llvm: add ac_build_canonicalize() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Fri, 18 Oct 2019 14:05:21 +0000 (15:05 +0100)]
travis: test meson install as well
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Eric Engestrom [Fri, 18 Oct 2019 14:03:43 +0000 (15:03 +0100)]
travis: don't (re)install python
The new Mac OS X images apparently already have python2 and python3,
and `brew` considers asking to install something already installed
as a fatal error...
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Lepton Wu [Thu, 17 Oct 2019 08:53:49 +0000 (01:53 -0700)]
gbm: Add GBM_MAX_PLANES definition
This removes the hard-coded "4".
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Jose Maria Casanova Crespo [Fri, 11 Oct 2019 11:53:32 +0000 (13:53 +0200)]
v3d: Explicitly expose OpenGL ES Shading Language 3.1
Once previous commits expose OpenGL ES 3.1 (when Compute Shaders are
available), this will also expose GL_EXT_primitive_bounding_box and
GL_OES_primitive_bounding_box.
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Tue, 3 Sep 2019 08:31:42 +0000 (10:31 +0200)]
v3d: request the kernel to flush caches when TMU is dirty
This adapts the v3d driver to the new CL submit ioctl interface that
allows the driver to request a flush of the caches after the render
job has completed. This seems to eliminate the kernel write violation
errors reported during CTS and Piglit executions, fixing some CTS tests
and GPU resets along the way.
v2:
- Adapt to changes in the kernel side.
- Disable shader storage and shader images if the kernel doesn't
implement cache flushing.
Fixes CTS tests:
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-float
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-int
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-uint
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-float
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-int
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-uint
KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2
KHR-GLES31.core.shader_atomic_counters.advanced-usage-draw-update-draw
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-int
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-matR
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-struct
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-matC-pad
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-vec
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Wed, 5 Dec 2018 23:41:35 +0000 (15:41 -0800)]
v3d: Add Compute Shader support
Now that the UAPI has landed, add the pipe_context function for
dispatching compute shaders. This is the last major feature for GLES 3.1,
though it's not enabled quite yet.
Iago Toral Quiroga [Thu, 5 Sep 2019 06:35:01 +0000 (08:35 +0200)]
broadcom: document known hardware issues for L2T flush command
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 14 Aug 2019 07:27:13 +0000 (09:27 +0200)]
v3d: add new flag dirty TMU cache at v3d_compiler
The flag is set for any TMU write, both on spills and on general TMU
access. It is then used later as part of v3d_emit_gl_shader_state.
v2: add a new flag at v3d_compiler instead of dirtying the flag
at v3dx if there is any spill (change suggested by Eric, added by
Alejandro)
v3: set this for anything that is not a load and do it also in
v3d40_vir_emit_image_load_store (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
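The rule from the v3 note ("set this for anything that is not a load") can be sketched as a compiler-side flag that later state emission reads to decide whether to request a cache flush. The enum values, the struct, and the field name `tmu_dirty` are hypothetical and are not the real v3d_compiler definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TMU operation kinds seen by the compiler. */
enum sketch_tmu_op {
   SKETCH_TMU_LOAD,
   SKETCH_TMU_STORE,
   SKETCH_TMU_ATOMIC,
   SKETCH_TMU_SPILL,
};

struct sketch_compiler {
   bool tmu_dirty;  /* set when the shader may leave dirty TMU cache lines */
};

/* Anything that is not a plain load (stores, atomics, spills) marks the
 * TMU cache dirty. */
static void sketch_emit_tmu(struct sketch_compiler *c, enum sketch_tmu_op op)
{
   if (op != SKETCH_TMU_LOAD)
      c->tmu_dirty = true;
   /* ... emit the actual TMU instructions here ... */
}

/* Read later during shader state emission: a dirty flag means the driver
 * should ask the kernel to flush caches after the job. */
static bool sketch_needs_cache_flush(const struct sketch_compiler *c)
{
   return c->tmu_dirty;
}
```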
Iago Toral Quiroga [Wed, 14 Aug 2019 07:28:15 +0000 (09:28 +0200)]
v3d: trivial update to obsolete comment
Reviewed-by: Eric Anholt <eric@anholt.net>
Bas Nieuwenhuizen [Thu, 17 Oct 2019 23:21:29 +0000 (01:21 +0200)]
radv: Fix single stage constant flush with merged shaders.
E.g. a VERTEX-only flush with tessellation enabled on Vega should look
at the TCS to see which bits are needed.
CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1953
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>