Iago Toral Quiroga [Tue, 12 Feb 2019 11:43:30 +0000 (12:43 +0100)]
intel/compiler: remove inexact algebraic optimizations from the backend
NIR already has these and correctly considers exact/inexact qualification,
whereas the backend doesn't and can apply the optimizations where it
shouldn't. This happened to be the case in a handful of Tomb Raider shaders,
where NIR would skip the optimizations because of a precise qualification
but the backend would then (incorrectly) apply them anyway.
Besides this, considering that we are not emitting much math in the backend
these days it is unlikely that these optimizations are useful in general. A
shader-db run confirms that MAD and LRP optimizations, for example, were only
being triggered in cases where NIR would skip them due to precise
requirements, so in the near future we might want to remove more of these,
but for now we just remove the ones that are not completely correct.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Mon, 19 Nov 2018 12:08:07 +0000 (13:08 +0100)]
intel/compiler: fix cmod propagation for non 32-bit types
v2:
- Do not propagate if the bit-size changes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Tue, 20 Nov 2018 13:04:26 +0000 (14:04 +0100)]
intel/compiler: add a brw_reg_type_is_integer helper
v2:
- Fixed typo: meant BRW_REGISTER_TYPE_UB instead BRW_REGISTER_TYPE_UV
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Iago Toral Quiroga [Fri, 26 Oct 2018 11:40:27 +0000 (13:40 +0200)]
intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit
There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16-bits.
v2:
- Fix is_zero with half-float to consider -0 as well (Jason).
- Fix is_negative_one for word type.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Tue, 29 Jan 2019 09:58:49 +0000 (10:58 +0100)]
intel/compiler: generalize the combine constants pass
At the very least we need it to handle HF too, since we are doing
constant propagation for MAD and LRP, which relies on this pass
to promote the immediates to GRF in the end, but ideally
we want it to support even more types so we can take advantage
of it to improve register pressure in some scenarios.
v2 (Jason):
- Support 64-bit types too.
- Check if we need to set the half-float flag if the immediate already
existed.
- Multiply the size of the immediate by the width of the copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Wed, 7 Nov 2018 11:08:02 +0000 (12:08 +0100)]
intel/eu: force stride of 2 on NULL register for Byte instructions
The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.
Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Fri, 4 Jan 2019 09:15:39 +0000 (10:15 +0100)]
intel/compiler: ask for an integer type if requesting an 8-bit type
v2:
- Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Tue, 17 Jul 2018 07:02:27 +0000 (09:02 +0200)]
intel/compiler: rework conversion opcodes
Now that we have the regioning lowering pass we can just put all of these
opcodes together in a single block and we can just assert on the few cases
of conversion instructions that are not supported in hardware and that should
be lowered in brw_nir_lower_conversions.
The only cases what we still handle separately are the conversions from float
to half-float since the rounding variants would need to fallthrough and we
are already doing this for boolean opcodes (since they need to negate), plus
there is also a large comment about these opcodes that we probably want to
keep so it is just easier to keep these separate.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Fri, 13 Jul 2018 08:03:14 +0000 (10:03 +0200)]
intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
Particularly, we need the same lowewrings we use for 16-bit
integers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Tue, 10 Jul 2018 07:52:46 +0000 (09:52 +0200)]
intel/compiler: split is_partial_write() into two variants
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.
One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.
The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.
For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).
This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.
This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.
Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.
Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.
Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:
Patched | Master | % |
================================================
SIMD8 | 621,900 | 706,721 | -12.00% |
================================================
SIMD16 | 93,252 | 93,252 | 0.00% |
================================================
As expected, the change only affects SIMD8 dispatches.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Iago Toral Quiroga [Mon, 21 Jan 2019 11:11:44 +0000 (12:11 +0100)]
intel/compiler: workaround for SIMD8 half-float MAD in gen8
Empirical testing shows that gen8 has a bug where MAD instructions with
a half-float source starting at a non-zero offset fail to execute
properly.
This scenario usually happened in SIMD8 executions, where we used to
pack vector components Y and W in the second half of SIMD registers
(therefore, with a 16B offset). It looks like we are not currently doing
this any more but this would handle the situation properly if we ever
happen to produce code like this again.
v2 (Jason):
- Move this workaround to the lower_regioning pass as an additional case
to has_invalid_src_region()
- Do not apply the workaround if the stride of the source operand is 0,
testing suggests the problem doesn't exist in that case.
v3 (Jason):
- We want offset % REG_SIZE > 0, not just offset > 0
- Use a helper to compute the offset
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Iago Toral Quiroga [Wed, 30 May 2018 10:14:14 +0000 (12:14 +0200)]
intel/compiler: fix ddy for half-float in Broadwell
Broadwell has restrictions that apply to Align16 half-float that
make the Align16 implementation of this invalid for this platform.
Use the gen11 path for this instead, which uses Align1 mode.
The restriction is not present in cherryview, gen9 or gen10, where
the Align16 implementation seems to work just fine.
v2:
- Rework the comment in the code, move the PRM citation from the
commit message to the comment in the code (Matt)
- Cherryview isn't affected, only Broadwell (Matt)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Mon, 28 May 2018 10:32:08 +0000 (12:32 +0200)]
intel/compiler: fix ddx and ddy for 16-bit float
We were assuming 32-bit elements. Also, In SIMD8 we pack 2 vector components
in a single SIMD register, so for example, component Y of a 16-bit vec2
starts is at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated we should not stomp whatever base offset we
have, but instead add to it.
v2
- Use byte_offset() helper (Jason)
- Merge the fix for SIMD8: using byte_offset() fixes that too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Tue, 22 May 2018 06:17:38 +0000 (08:17 +0200)]
intel/compiler: set correct precision fields for 3-source float instructions
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.
v2:
- Set the bit separately for each source based on its type so we can
do mixed floating-point mode in the future (Topi).
v3:
- Use regular citation style for the comment referencing the PRM (Matt).
- Decided not to add asserts in the emission code to check that only
mixed HF/F types are used since such checks would break negative tests
for brw_eu_validate.c (Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Tue, 22 May 2018 06:17:17 +0000 (08:17 +0200)]
intel/compiler: allow half-float on 3-source instructions since gen8
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Tue, 22 May 2018 08:21:29 +0000 (10:21 +0200)]
intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits
We are now using these bits, so don't assert that they are not set. In gen8,
if these bits are set compaction is not possible. On gen9 and CHV platforms
set_3src_control_index() checks these bits (and others) against a table to
validate if the particular bit combination is eligible for compaction or not.
v2
- Add more detail in the commit message explaining the situation for SKL+
and CHV (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Mon, 21 May 2018 12:42:42 +0000 (14:42 +0200)]
intel/compiler: add new half-float register type for 3-src instructions
This is available since gen8.
v2: restore previously existing assertion.
v3: don't use separate tables for gen7 and gen8, just assert that we
don't use half-float before gen8 (Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Mon, 21 May 2018 12:34:01 +0000 (14:34 +0200)]
intel/compiler: add instruction setters for Src1Type and Src2Type.
The original SrcType is a 3-bit field that takes a subset of the types
supported for the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating point operations
can use use mixed precision mode, where not all the operands have the
same floating-point precision. While the precision for the first operand
is taken from the type in SrcType, the bits in Src1Type (bit 36) and
Src2Type (bit 35) define the precision for the other operands
(0: normal precision, 1: half precision).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Mon, 21 Jan 2019 08:47:59 +0000 (09:47 +0100)]
intel/compiler: drop unnecessary temporary from 32-bit fsign implementation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Mon, 21 Jan 2019 08:47:15 +0000 (09:47 +0100)]
intel/compiler: implement 16-bit fsign
v2:
- make 16-bit be its own separate case (Jason)
v3:
- Drop the result_int temporary (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Thu, 26 Apr 2018 08:26:22 +0000 (10:26 +0200)]
intel/compiler: handle extended math restrictions for half-float
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.
v2: quashed together the following patches (Jason):
- intel/compiler: allow extended math functions with HF operands
- intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
- intel/compiler: extended Math is limited to SIMD8 on half-float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(allow extended math functions with HF operands,
extended Math is limited to SIMD8 on half-float)
Iago Toral Quiroga [Thu, 26 Apr 2018 08:12:12 +0000 (10:12 +0200)]
intel/compiler: lower some 16-bit float operations to 32-bit
The hardware doesn't support half-float for these.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Tue, 18 Dec 2018 08:27:21 +0000 (09:27 +0100)]
intel/compiler: assert restrictions on conversions to half-float
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.
v2:
- rebased on top of regioning lowering pass
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Thu, 22 Nov 2018 09:59:59 +0000 (10:59 +0100)]
intel/compiler: handle b2i/b2f with other integer conversion opcodes
Since we handle booleans as integers this makes more sense.
v2:
- rebased to incorporate new boolean conversion opcodes
v3:
- rebased on top regioning lowering pass
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Iago Toral Quiroga [Fri, 2 Mar 2018 12:37:59 +0000 (13:37 +0100)]
intel/compiler: split float to 64-bit opcodes from int to 64-bit
Going forward having these split is a bit more convenient since these two
groups have different restrictions.
v2:
- Rebased on top of new regioning lowering pass.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Mon, 17 Dec 2018 08:17:06 +0000 (09:17 +0100)]
intel/compiler: add a NIR pass to lower conversions
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.
v3
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bit-size (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)
v4:
- Be careful about the intermediate types we use so we don't lose
range and avoid incorrect rounding semantics (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Dominik Drees [Mon, 15 Apr 2019 09:05:46 +0000 (11:05 +0200)]
Add no_aos_sampling GALLIVM_PERF option
This forces using general sampling and should improve precision and
performance in some cases.
Samuel Pitoiset [Tue, 26 Mar 2019 11:37:39 +0000 (12:37 +0100)]
ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+
This changes requires LLVM r356465.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 26 Mar 2019 11:24:52 +0000 (12:24 +0100)]
ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+
This changes requires LLVM r356465.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Mon, 15 Apr 2019 13:23:58 +0000 (15:23 +0200)]
ac: add support for more types with struct/raw LLVM intrinsics
LLVM 9+ now supports 8-bit and 16-bit types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 16 Apr 2019 08:38:24 +0000 (10:38 +0200)]
radv: add VK_KHR_shader_atomic_int64 but disable it for now
No support for 64-bit compare&swap atomic operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 16 Apr 2019 08:38:23 +0000 (10:38 +0200)]
ac/nir: add 64-bit SSBO atomic operations support
Except compare&swap which is still buggy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 16 Apr 2019 08:38:22 +0000 (10:38 +0200)]
ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap
Use the raw version (ie. IDXEN=0) because vindex is unused.
Use the old intrinsic for compare&swap because the new one
hangs the GPU for some reasons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Roland Scheidegger [Wed, 17 Apr 2019 00:34:01 +0000 (02:34 +0200)]
gallivm: fix saturated signed add / sub with llvm 9
llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and
now llvm 9 removed the signed versions as well - they were proposed for
removal earlier, but the pattern to recognize those was very complex,
so it wasn't done then. However, instead of these arch-specific
intrinsics, there's now arch-independent intrinsics for saturated
add / sub, both for signed and unsigned, so use these.
They should have only advantages (work with arbitrary vector sizes,
optimal code for all archs), although I don't know how well they work
in practice for other archs (at least for x86 they do the right thing).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454
Reviewed-by: Brian Paul <brianp@vmware.com>
Juan A. Suarez Romero [Wed, 17 Apr 2019 09:38:00 +0000 (09:38 +0000)]
meson: Add dependency on genxml to anvil genfiles
This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure
v2: add dependency on idep_genxml (Lionel)
Fixes: d1992255bb29054fa51763376d125183a9f602f
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Fri, 5 Oct 2018 16:29:17 +0000 (17:29 +0100)]
intel/perf: constify accumlator parameter
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Tue, 2 Oct 2018 14:41:41 +0000 (15:41 +0100)]
intel/perf: drop counter size field
We can deduct the size from another field, let's just save some space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Mon, 18 Jun 2018 10:40:24 +0000 (11:40 +0100)]
i965: perf: add mdapi pipeline statistics queries on gen10/11
The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Fri, 8 Jun 2018 16:26:49 +0000 (17:26 +0100)]
intel/perf: stub gen10/11 missing definitions
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Fri, 8 Jun 2018 21:18:46 +0000 (22:18 +0100)]
i965: move mdapi guid into intel/perf
One more thing we want to share between the different APIs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Fri, 8 Jun 2018 16:53:08 +0000 (17:53 +0100)]
i965: move mdapi result data format to intel/perf
We want to reuse this in Anv.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Fri, 8 Jun 2018 16:51:33 +0000 (17:51 +0100)]
i965: move brw_timebase_scale to device info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Fri, 8 Jun 2018 14:29:51 +0000 (15:29 +0100)]
i965: move OA accumulation code to intel/perf
We'll want to reuse this in our Vulkan extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Thu, 7 Jun 2018 17:18:43 +0000 (18:18 +0100)]
i965: move mdapi data structure to intel/perf
We'll want to reuse those structures later on.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Lionel Landwerlin [Sun, 27 May 2018 19:33:25 +0000 (20:33 +0100)]
i965: extract performance query metrics
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Sun, 27 May 2018 19:36:49 +0000 (20:36 +0100)]
i965: store device revision in gen_device_info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Wed, 27 Mar 2019 16:38:15 +0000 (09:38 -0700)]
intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27
Similarly to
1cc17fb731466c68586915acbb916586457b19bc
Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Erik Faye-Lund [Tue, 9 Apr 2019 12:25:51 +0000 (14:25 +0200)]
virgl: document potentially failing blit
This blit can fail, but this is not new; in the old version we
didn't even try to blit in this case. So let's just document the
limitation for now, and leave this for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 09:55:24 +0000 (11:55 +0200)]
virgl: do color-conversion during when mapping transfer
When running on OpenGL ES, we can't just map any format for reading,
because of limitations on glReadPixels. So let's fall back to the
blit code-path, and translate the pixels to the correct format in the
end.
This fixes the remaining failures of KHR-GL32.packed_pixels.* apart
from the sRGB tests.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Fri, 5 Apr 2019 06:27:14 +0000 (08:27 +0200)]
virgl: only blit if resource is read
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Fri, 5 Apr 2019 06:13:37 +0000 (08:13 +0200)]
virgl: get readback-formats from host
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Tue, 12 Mar 2019 20:17:59 +0000 (21:17 +0100)]
gallium/util: support translating between uint and sint formats
Without this, we can't for instance convert between r8_sint and
r8g8b8a8_sint. But that's pretty useful, so let's support it as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Tue, 12 Mar 2019 12:57:14 +0000 (13:57 +0100)]
virgl: make sure bind is set for non-buffers
Otherwise, virglrenderer will reject the resource.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 15:27:52 +0000 (17:27 +0200)]
virgl: support write-back with staged transfers
We currently don't support writing to resources that uses a temporary
staging-resource to resolve the pixels. If a write-bit was set, we
forgot to perform a blit back to the old resource, followed by trying to
update the wrong resource, which lacks backing-storage. The end-result
would be that nothing useful happened.
This approach also fixes a few smaller bugs, like using the wrong box
(without x y and z zeroed out), which means a partial update of a
multisampled texture could result in the wrong part of the texture being
updated.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 14:58:46 +0000 (16:58 +0200)]
virgl: use pipe_box for blit dst-rect
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 13:36:42 +0000 (15:36 +0200)]
virgl: rewrite core of virgl_texture_transfer_map
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 10:36:08 +0000 (12:36 +0200)]
virgl: return error if allocating resolve_tmp fails
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Wed, 13 Mar 2019 15:03:39 +0000 (16:03 +0100)]
virgl: wait for the right resource
In case we're resolving, we need to wait for the resolved resource
instead of the original one.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 10:22:07 +0000 (12:22 +0200)]
virgl: check for readback on correct resource
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 14:52:21 +0000 (16:52 +0200)]
virgl: make unmap queuing a bit more straight-forward
It's hard to read the code that decides if we want to queue up an unmap
or destroy the transfer right away. So let's make it a bit simpler, by
setting a bool in case we want to queue it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 14:50:18 +0000 (16:50 +0200)]
virgl: simplify virgl_texture_transfer_unmap logic
There's no reason to keep an extra indentation level here, let's merge
the two if-conditions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 10:19:44 +0000 (12:19 +0200)]
virgl: track full virgl_resource instead of just virgl_hw_res
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Thu, 4 Apr 2019 09:53:42 +0000 (11:53 +0200)]
virgl: tmp_resource -> templ
This isn't the temporary resource itself, it's the template that we'll
create the resource from. So let's name it appropriately.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Erik Faye-Lund [Wed, 13 Mar 2019 16:25:41 +0000 (17:25 +0100)]
virgl: remove pointless transfer-counter
This is only written to, never read. Let's just get rid of it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Timothy Arceri [Tue, 16 Apr 2019 04:33:31 +0000 (14:33 +1000)]
radeonsi/nir: fix scanning of bindless images
Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support")
Kenneth Graunke [Fri, 5 Apr 2019 23:47:39 +0000 (16:47 -0700)]
iris: Add texture cache flushing hacks for blit and resource_copy_region
This is a port of Jason's
8379bff6c4456f8a77041eee225dcd44e5e00a76
from i965 to iris. We can't find anything relevant in the documentation
and no one we've talked to has been able to help us pin down a solution.
Unfortunately, we have to put the hack in both iris_blit() and
iris_copy_region(). st/mesa's CopyImage() implementation sometimes
chooses to use pipe->blit() instead of pipe->resource_copy_region().
For blits, we only do the hack if the blit source format doesn't match
the underlying resource (i.e. it's reinterpreting the bits). Hopefully
this should not be too common.
Eric Anholt [Mon, 15 Apr 2019 23:36:17 +0000 (16:36 -0700)]
v3d: Always set up the qregs for CSD payload.
We were failing to set up payload[1] for use by LocalInvocationIndex/ID
and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID
wasn't used (possibly because you only have one workgroup). You're always
going to use payload[1], and payload[0] is common enough and we have DCE
in the backend to clean it up if it happens to not be used.
Eric Anholt [Fri, 12 Apr 2019 16:38:03 +0000 (09:38 -0700)]
v3d: Only look up the 3rd texture gather offset for non-arrays.
Fixes assertion failures in the CTS since Karol's cleanup when NIR started
noticing that we were reading an invalid component.
Fixes: 5450f1c9fb09 ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")
Caio Marcelo de Oliveira Filho [Wed, 10 Apr 2019 18:13:40 +0000 (11:13 -0700)]
spirv: Tell which opcode or value is unhandled when failing
v2: When available, include the opcode name too. (Karol)
v3: Use more to_string helpers. (Karol)
Include the wrong bit_size in those failures.
Include the capability number in spv_check_supported.
Provide vtn_fail_with_* macros to avoid noise in the call sites.
v4: Provide macros only for opcode and decoration, which have enough
usages to justify them. (Jason)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Caio Marcelo de Oliveira Filho [Wed, 10 Apr 2019 17:04:05 +0000 (10:04 -0700)]
spirv: Add more to_string helpers
Also, use a set to identify repeated values. The previous arrangement
worked when the repetitions were one after another, but in some of the
new cases they are not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Jason Ekstrand [Tue, 16 Apr 2019 17:37:12 +0000 (12:37 -0500)]
intel/mi_builder: Disable mem_mem tests on IVB
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Kenneth Graunke [Tue, 16 Apr 2019 07:27:33 +0000 (00:27 -0700)]
iris: Change vendor and renderer strings
This patch changes the GL_VENDOR string from "Mesa Project" to "Intel".
This makes GLX_MESA_query_renderer report "Vendor: Intel (0x8086)"
instead of "Vendor: Mesa Project (0x8086)" which is arguably wrong.
We now also use a consistent vendor string across Windows and Linux.
It also prepends "Mesa" to the GL_RENDERER string, both to credit the
community and have a distinguishing mark between the two drivers. We
drop "DRI" compared to i965, as it's not really that important.
Improves performance in Portal by 1.8x. Iris is now 3.86% faster
than i965 at the portal-d1.dem timedemo on my Kabylake laptop. One
change is that Portal selects the MapBufferRange path based on the
vendor string, and iris's BufferSubData path is still missing the
storage invalidation optimization.
Jason Ekstrand [Mon, 15 Apr 2019 20:39:22 +0000 (15:39 -0500)]
intel/mi_builder: Re-order an initializer
The order doesn't matter in C99 but some C++ compilers seem to care.
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Jason Ekstrand [Sat, 13 Apr 2019 15:35:07 +0000 (10:35 -0500)]
nir/algebraic: Use a cache to avoid re-emitting structs
This takes the stupid simplest and most reliable approach to reducing
redundancy that I could come up with: Just use the struct declaration
as the cach key. This cuts the size of the generated C file to about
half and takes about 50 KiB off the .data section.
size before (release build):
text data bss dec hex filename
5363833 336880 13584
5714297 573179 _install/lib64/libvulkan_intel.so
size after (release build):
text data bss dec hex filename
5229017 285264 13584
5527865 545939 _install/lib64/libvulkan_intel.so
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 13 Apr 2019 15:32:55 +0000 (10:32 -0500)]
nir/algebraic: Move the template closer to the render function
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Kenneth Graunke [Tue, 16 Apr 2019 05:58:17 +0000 (22:58 -0700)]
iris: Move iris_debug_recompile calls before uploading.
Order of operations is important, otherwise we'll find the program we
just uploaded as the "old" compile and get confused why nothing is
different between the two keys.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Tue, 16 Apr 2019 05:17:49 +0000 (22:17 -0700)]
iris: Print the reason for shader recompiles.
I was lazy earlier and hadn't bothered typing / refactoring this.
Now I'm hitting some extra recompiles and would like to see why.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Tue, 16 Apr 2019 04:59:50 +0000 (21:59 -0700)]
i965: Move program key debugging to the compiler.
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences. This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies. anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.
This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.
We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity. I also tidied the code a
bit while moving it, now that it all lives in one file.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Marek Olšák [Mon, 15 Apr 2019 16:49:33 +0000 (12:49 -0400)]
winsys/amdgpu: don't set GTT with GDS & OA placements on APUs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 10 Apr 2019 01:40:33 +0000 (21:40 -0400)]
nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
suresh guttula [Thu, 11 Apr 2019 04:51:56 +0000 (10:21 +0530)]
st/va/enc: Add support for frame_cropping_flag of VAEncSequenceParameterBufferH264
This patch will add support for frame_cropping when the input size is not
matched with aligned size. Currently vaapi driver ignores frame cropping
values provided by client. This change will update SPS nalu with proper
cropping values.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
suresh guttula [Thu, 11 Apr 2019 04:49:33 +0000 (10:19 +0530)]
radeon/vce:Add support for frame_cropping_flag of VAEncSequenceParameterBufferH264
This patch will add support for frame_cropping when the input size is not
matched with aligned size. Currently vaapi driver ignores frame cropping
values provided by client. This change will update SPS nalu with proper
cropping values.
v2: Moving default crop setting to else when enc_frame_cropping_flag is not set.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
suresh guttula [Thu, 11 Apr 2019 04:39:10 +0000 (10:09 +0530)]
vl: Add cropping flags for H264
This patch adds cropping flags for H264 in pipe_h264_enc_pic_control.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Tapani Pälli [Fri, 15 Mar 2019 07:47:49 +0000 (09:47 +0200)]
compiler/glsl: handle case where we have multiple users for types
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().
This change fixes glsl_type memory leaks we have with anv driver.
v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
rename helper functions (Jason)
v3: move init, destroy to happen on GL context init and destroy
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Danylo Piliaiev [Mon, 25 Mar 2019 12:15:27 +0000 (14:15 +0200)]
intel/compiler: Do not reswizzle dst if instruction writes to flag register
If we write to the flag register changing the swizzle would change
what channels are written to the flag register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201
Fixes: 4cd1a0be
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: <ian.d.romanick@intel.com>
Michel Dänzer [Thu, 11 Apr 2019 16:38:30 +0000 (18:38 +0200)]
gitlab-ci: Use LLVM 3.4 from Debian jessie for scons-llvm job
This gets us closer to the officially supported minimum version of LLVM,
which is 3.3.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Fri, 5 Apr 2019 16:32:25 +0000 (18:32 +0200)]
gitlab-ci: Do not use subshells for compiling dependencies
bash subshells don't inherit the -e option by default, so failures in
the subshell commands wouldn't cause the CI job to fail.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Fri, 5 Apr 2019 08:38:05 +0000 (10:38 +0200)]
gitlab-ci: Drop unused clang 5/6 packages
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Fri, 5 Apr 2019 08:36:29 +0000 (10:36 +0200)]
gitlab-ci: Use clang 8 instead of 7
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Wed, 3 Apr 2019 13:48:51 +0000 (15:48 +0200)]
gitlab-ci: Remove unused Debian packages from Docker image
v2:
* Also remove autotools, now that the Mesa autotools build system has
been dropped.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Michel Dänzer [Wed, 3 Apr 2019 10:21:48 +0000 (12:21 +0200)]
gitlab-ci: Remove unneded (stuff from) APT command lines
We either compile these locally, or they are dependencies of other
packages we install.
v2:
* Adapt to leaving self-compiled packages untouched.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Thu, 4 Apr 2019 16:01:27 +0000 (18:01 +0200)]
gitlab-ci: Install most packages from Debian buster
We now use the C frontend of GCC 8 instead of 6 (required tweaking the
before_script for the clang job). We cannot use the C++ frontend of GCC
7 or newer yet, because upstream GCC 7 changed some C++ name mangling
stuff in backwards incompatible ways, and LLVM < 6.0 packages aren't
available in buster.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Wed, 3 Apr 2019 10:23:51 +0000 (12:23 +0200)]
gitlab-ci: Use Debian packages instead of pip ones for meson and scons
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Thu, 4 Apr 2019 09:25:28 +0000 (11:25 +0200)]
gitlab-ci: Use HTTPS for APT repositories
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Tue, 2 Apr 2019 14:56:54 +0000 (16:56 +0200)]
gitlab-ci: Use Debian stretch instead of Ubuntu bionic
The APT archive used by the Ubuntu docker image can be slow, even timing
out sometimes, causing spurious failures of the containers-build job.
The Debian docker image uses deb.debian.org, which is backed by a
content distribution network.
One downside is that stretch only has GCC 6, whereas bionic had 7.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gert Wollny [Thu, 11 Apr 2019 07:18:37 +0000 (09:18 +0200)]
doc/features: Add a few extensions to the feature matrix
These additions already landed but I forgot to update the feature
matrix.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 16 Apr 2019 07:13:37 +0000 (09:13 +0200)]
radv: sort the shader capabilities alphabetically
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kenneth Graunke [Tue, 16 Apr 2019 05:34:15 +0000 (22:34 -0700)]
iris: Make shader_perf_log print to stderr if INTEL_DEBUG=perf is set
This matches i965's behavior, and makes sure that shader compiler
messages are visible when setting INTEL_DEBUG=perf.
Samuel Pitoiset [Mon, 15 Apr 2019 15:42:20 +0000 (17:42 +0200)]
radv: enable shaderInt8 on SI and CIK
No CTS failures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Chia-I Wu [Tue, 9 Apr 2019 20:46:38 +0000 (20:46 +0000)]
virgl: fix fence fd version check
Fixes: d1a1c21e762 ("virgl: native fence fd support")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>