Francisco Jerez [Fri, 20 May 2016 04:12:32 +0000 (21:12 -0700)]
i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
Prevents an assertion failure in the following commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 26 Apr 2016 00:09:39 +0000 (17:09 -0700)]
i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
The pass is disabled in SIMD16 dispatch mode for the same reason: it
cannot handle instructions that write multiple MRF registers at once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 21:27:20 +0000 (14:27 -0700)]
i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 21:17:48 +0000 (14:17 -0700)]
i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 26 Apr 2016 00:30:54 +0000 (17:30 -0700)]
i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 01:18:43 +0000 (18:18 -0700)]
i965/fs: Return 32 bit mask from fs_builder::sample_mask().
This doesn't actually handle the FS case, just add an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 04:26:51 +0000 (21:26 -0700)]
i965/fs: Emit fixed-width null register regardless of the dispatch width.
brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 00:37:25 +0000 (17:37 -0700)]
i965/fs: Fix half() to handle more exotic register files.
horiz_offset() is able to deal with a superset of the register files
currently special-cased in half(). Just call horiz_offset() in all
cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 00:32:55 +0000 (17:32 -0700)]
i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
We'll hit these in some cases during SIMD lowering in 32-wide
programs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 05:40:40 +0000 (22:40 -0700)]
i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 05:13:52 +0000 (22:13 -0700)]
i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 04:34:27 +0000 (21:34 -0700)]
i965/fs: Track flag register liveness with byte granularity.
This is required for correctness in the presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister. Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.
Additionally this makes sure that we handle 32-wide flag writes and
reads, which may span multiple flag subregisters, so the current
approach of just setting/testing a single bit from the live set
wouldn't have worked.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
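A minimal sketch of the byte-granular flag tracking described in the two
commits above, assuming 16-bit flag subregisters and one mask bit per 8
channels. The names flag_byte_mask() and flag_live_before() are hypothetical
and are not taken from the patches; predication is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns one bit per byte (8 channels) of flag
 * register space covered by an instruction that starts at channel
 * `group` of 16-bit flag subregister `flag_subreg` and executes
 * `exec_size` channels. */
static inline uint32_t
flag_byte_mask(unsigned flag_subreg, unsigned group, unsigned exec_size)
{
   assert(exec_size > 0);
   const unsigned start = flag_subreg * 16 + group;
   const unsigned end = start + exec_size;
   const unsigned first_byte = start / 8;
   const unsigned end_byte = (end + 7) / 8;   /* exclusive bound */
   assert(end_byte <= 8);
   return ((1u << end_byte) - 1) & ~((1u << first_byte) - 1);
}

/* Backwards liveness step: a write only kills the bytes it covers,
 * and a read makes the bytes it covers live. */
static inline uint32_t
flag_live_before(uint32_t live_after, uint32_t written, uint32_t read)
{
   return (live_after & ~written) | read;
}
```

With this representation an 8-wide write to the second half of a subregister
leaves the first half live, and a 32-wide access sets four bits at once
instead of one.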
Francisco Jerez [Thu, 19 May 2016 04:54:35 +0000 (21:54 -0700)]
i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
v2: Codestyle fixes (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 23:14:13 +0000 (16:14 -0700)]
i965/fs: Expose arbitrary channel execution groups to the IR.
This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8. At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 07:10:03 +0000 (00:10 -0700)]
i965/ir: Make BROADCAST emit an unmasked single-channel move.
Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.
The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 23:25:42 +0000 (16:25 -0700)]
i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control. On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals. On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 00:34:14 +0000 (17:34 -0700)]
i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work. Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Sat, 28 May 2016 06:29:02 +0000 (23:29 -0700)]
i965/fs: Lower 32-wide scratch writes in the generator.
The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up. We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly. The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Mon, 16 May 2016 22:47:39 +0000 (15:47 -0700)]
i965/fs: Implement scratch reads and writes of 4 GRFs at a time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Mon, 16 May 2016 23:03:33 +0000 (16:03 -0700)]
i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
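The block size encoding from the Gen7 scratch message fix above can be
sketched as follows. This assumes that hardware other than Gen7 keeps the
log2 encoding the old code used; the function name and the `gen` parameter
are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: value of the block size field in the scratch
 * message descriptor for a transfer of `num_regs` GRFs. */
static inline unsigned
scratch_block_size_field(unsigned gen, unsigned num_regs)
{
   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);

   if (gen == 7)
      return num_regs - 1;      /* Gen7: number of registers minus one */

   /* Assumed for other generations: log2 of the number of registers. */
   unsigned log2 = 0;
   while ((1u << (log2 + 1)) <= num_regs)
      log2++;
   return log2;
}
```

Both encodings agree for 1 and 2 registers and only differ for 4, which is
consistent with the problem becoming visible once 4-GRF scratch accesses are
implemented.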
Francisco Jerez [Tue, 26 Apr 2016 02:20:12 +0000 (19:20 -0700)]
i965/eu: Set execution size explicitly for memory fence send message.
We don't want to emit a 32-wide send message in 32-wide programs. The
memory fence message should have the same effect regardless of the
execution size (as long as it's valid) so just set it to one.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 26 Apr 2016 02:18:30 +0000 (19:18 -0700)]
i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.
In SIMD32 programs the compiler is responsible for providing the
appropriate half of the sample mask in the message header, so the
first and third quarters both map to the first slot group of the
provided 16-bit half, while the second and fourth quarters map to the
second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
two.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
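The quarter mapping described in the typed surface message commit above
amounts to a modulo-two reduction. A minimal sketch, with quarters numbered
0..3 for 1Q..4Q and a hypothetical function name:

```c
#include <assert.h>

/* Hypothetical helper: slot group (0 or 1) within the 16-bit half of
 * the sample mask that a given quarter control value maps to. */
static inline unsigned
slot_group_for_quarter(unsigned qtr)
{
   assert(qtr < 4);
   return qtr % 2;   /* 1Q and 3Q -> first group, 2Q and 4Q -> second */
}
```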
Francisco Jerez [Fri, 20 May 2016 07:13:33 +0000 (00:13 -0700)]
i965/fs: Clean up remaining uses of dispatch_width in the generator.
Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 22:25:28 +0000 (15:25 -0700)]
i965/eu: Remove brw_codegen::compressed and ::compressed_stack.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Sat, 28 May 2016 06:28:46 +0000 (23:28 -0700)]
i965/eu: Use current exec size instead of p->compressed in surface message generation.
This was kind of an abuse of p->compressed, since dataport send message
instructions are always uncompressed. Use the current execution size
instead since p->compressed is on its way out.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 02:47:30 +0000 (19:47 -0700)]
i965/fs: No need to reset predicate control after emitting some instructions.
Trivial clean-up.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 02:36:03 +0000 (19:36 -0700)]
i965/fs: Pass current execution size to brw_IF() and brw_DO().
This gets IF and DO instructions working in SIMD32 programs. brw_IF()
and brw_DO() should probably behave in the same way as other generator
functions that emit control flow instructions and just figure out the
right execution size by themselves from the current execution controls
specified through the brw_codegen argument. Changing that will
require updating lots of Gen4-5 clipper code though, so for the moment
just pass the current value redundantly from the FS generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 02:17:31 +0000 (19:17 -0700)]
i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.
p->compressed won't work for SIMD32. We should just be using the
execution size value specified via p->current instead.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 04:43:48 +0000 (21:43 -0700)]
i965/fs: Extend region width calculation to allow arbitrary execution sizes.
Instead of just halving the execution size when the instruction is
compressed hoping that it will give a legal source region width, we
can calculate the maximum legal width value in closed form from the
component size and stride. This makes sure that brw_reg_from_fs_reg()
always returns a valid hardware region even for virtual 32-wide
instructions (e.g. send-like instructions) that would seem to exceed
the hardware region width limit after halving.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
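One way to read the closed-form width calculation from the region width
commit above is sketched below. It assumes a 32-byte GRF, a hardware width
limit of 16, and that a region row must not cross a register boundary; none
of these constants or names come from the patch itself.

```c
#include <assert.h>

#define SKETCH_REG_SIZE     32   /* assumed bytes per GRF */
#define SKETCH_MAX_HW_WIDTH 16   /* assumed hardware region width limit */

/* Hypothetical helper: largest region width such that one row of
 * `width` elements of `type_size` bytes, `stride` elements apart,
 * fits in a single register and does not exceed the execution size. */
static inline unsigned
max_region_width(unsigned exec_size, unsigned type_size, unsigned stride)
{
   assert(exec_size > 0 && type_size > 0 && stride > 0);

   unsigned width = SKETCH_REG_SIZE / (stride * type_size);
   if (width > exec_size)
      width = exec_size;
   if (width > SKETCH_MAX_HW_WIDTH)
      width = SKETCH_MAX_HW_WIDTH;
   if (width == 0)
      width = 1;
   return width;
}
```

Under these assumptions a 32-wide instruction with packed 32-bit sources
gets a width of 8 instead of the 16 that halving the execution size would
suggest.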
Kenneth Graunke [Thu, 19 May 2016 02:02:45 +0000 (19:02 -0700)]
i965/fs: Pass the compression mode to brw_reg_from_fs_reg().
Curro is planning to eliminate p->compressed, so let's avoid using it
here and just pass in the value directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 01:48:04 +0000 (18:48 -0700)]
i965/fs: Simplify per-instruction compression control setup in generator.
By using the new compression/group control interface. This will allow
easier extension to support arbitrary channel enable groups at the IR
level.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 10:59:20 +0000 (03:59 -0700)]
i965/fs: No need to set compression control at the top of generate_code().
The right value is dependent on the specific IR instruction being
generated so it has to be reset in every iteration of the loop anyway.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 22:29:27 +0000 (15:29 -0700)]
i965/eu: Fix a bunch of compression control bugs in the generator.
Most of these were resetting quarter control to zero incorrectly even
though all they needed to do was disable instruction compression. The
brw_SAMPLE() case was doing the right thing but it can be simplified
slightly by using the new compression control interface.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 22:29:07 +0000 (15:29 -0700)]
i965/eu: Define alternative interface for setting compression and group controls.
This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.
It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs. On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though all we wanted to do was disable
instruction compression. The latter seems especially unfortunate on
Gen6+ hardware, which has no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.
Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
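The difference between the combined and the orthogonal interface described
in the commit above can be sketched like this. The struct and function names
are hypothetical stand-ins and not the helpers added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution controls of one instruction. */
struct exec_controls {
   unsigned group;      /* first channel enable signal, multiple of 4 */
   bool compressed;     /* instruction compression enabled */
};

/* Old style (sketch): one call overwrites both fields, so disabling
 * compression also resets the group to zero. */
static inline void
set_compression_and_group(struct exec_controls *c, bool compressed,
                          unsigned group)
{
   c->compressed = compressed;
   c->group = group;
}

/* New style (sketch): each setter preserves the other field. */
static inline void
set_group(struct exec_controls *c, unsigned group)
{
   assert(group % 4 == 0 && group < 32);
   c->group = group;
}

static inline void
set_compression(struct exec_controls *c, bool compressed)
{
   c->compressed = compressed;
}
```

With the separate setters, code that only wants to turn compression off no
longer has to know (or guess) the group that came from the back-end IR.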
Francisco Jerez [Fri, 20 May 2016 07:13:19 +0000 (00:13 -0700)]
i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.
It's just a byte MOV with strided source.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 01:43:54 +0000 (18:43 -0700)]
i965/fs: Remove extract virtual opcodes.
These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 26 Apr 2016 00:35:52 +0000 (17:35 -0700)]
i965: Define brw_int_type() helper.
Intended as a (partial) inverse of type_sz(). Will be useful in the
next commit and some other SIMD32 generator changes I have queued up.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
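A rough sketch of what a partial inverse of type_sz() looks like, using a
made-up enum instead of the real register type definitions. It is only
partial because several types share a size (a 32-bit float is also 4 bytes),
so the inverse has to pick the integer type of that size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical integer register types, not the driver's enum. */
enum sketch_int_type {
   SKETCH_UB, SKETCH_B,   /* 1 byte  */
   SKETCH_UW, SKETCH_W,   /* 2 bytes */
   SKETCH_UD, SKETCH_D,   /* 4 bytes */
   SKETCH_UQ, SKETCH_Q    /* 8 bytes */
};

/* Size in bytes of a type, playing the role of type_sz(). */
static inline unsigned
sketch_type_sz(enum sketch_int_type t)
{
   switch (t) {
   case SKETCH_UB: case SKETCH_B: return 1;
   case SKETCH_UW: case SKETCH_W: return 2;
   case SKETCH_UD: case SKETCH_D: return 4;
   default:                       return 8;
   }
}

/* Integer type of the given size and signedness. */
static inline enum sketch_int_type
sketch_int_type_for(unsigned sz, bool is_signed)
{
   switch (sz) {
   case 1: return is_signed ? SKETCH_B : SKETCH_UB;
   case 2: return is_signed ? SKETCH_W : SKETCH_UW;
   case 4: return is_signed ? SKETCH_D : SKETCH_UD;
   case 8: return is_signed ? SKETCH_Q : SKETCH_UQ;
   default:
      assert(!"unsupported integer size");
      return SKETCH_UD;
   }
}
```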
Francisco Jerez [Sat, 28 May 2016 06:22:02 +0000 (23:22 -0700)]
i965/fs: Remove manual splitting of DDY ops in the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 03:02:29 +0000 (20:02 -0700)]
i965/fs: Remove manual unrolling of BFI instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 02:59:18 +0000 (19:59 -0700)]
i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 02:51:50 +0000 (19:51 -0700)]
i965/fs: Drop lowering code for a few three-source instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 19 May 2016 01:41:28 +0000 (18:41 -0700)]
i965/fs: Set default access mode to Align1 for all instructions in the generator.
Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 02:10:48 +0000 (19:10 -0700)]
i965/fs: Remove handcrafted math SIMD lowering from the generator.
Most of this wouldn't have worked for SIMD32 and had various
dispatch_width and compression control bugs. It's mostly dead now
with SIMD lowering of math instructions turned on in the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 20:34:46 +0000 (13:34 -0700)]
i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.
Which is 16 or 8 in most cases. This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
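The chopping described in the commit above boils down to simple arithmetic
on execution sizes and channel groups. A minimal sketch with hypothetical
helper names, assuming power-of-two sizes:

```c
#include <assert.h>

/* Number of lowered instructions needed to cover `exec_size` channels
 * when the opcode supports at most `max_width` channels. */
static inline unsigned
num_lowered_chunks(unsigned exec_size, unsigned max_width)
{
   assert(exec_size > 0 && max_width > 0);
   return (exec_size + max_width - 1) / max_width;
}

/* Channel group of the i-th chunk of an instruction that originally
 * started at channel `base_group`. */
static inline unsigned
lowered_chunk_group(unsigned base_group, unsigned max_width, unsigned i)
{
   return base_group + i * max_width;
}
```

For example a 32-wide virtual instruction limited to 8 channels becomes four
chunks with groups 0, 8, 16 and 24.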
Francisco Jerez [Fri, 20 May 2016 06:44:23 +0000 (23:44 -0700)]
i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.
Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.
LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.
An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:27:09 +0000 (16:27 -0700)]
i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
...on hardware lacking compressed Align16 support. Will allow
simplifying the generator code and fixing it for SIMD32 codegen.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:43:05 +0000 (16:43 -0700)]
i965/fs: Apply usual FPU-like execution size restrictions to MULH.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:10:38 +0000 (16:10 -0700)]
i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:01:29 +0000 (16:01 -0700)]
i965/fs: Assert that IF instruction with embedded compare has legal exec_size.
We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:00:19 +0000 (16:00 -0700)]
i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 22:58:04 +0000 (15:58 -0700)]
i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 20:15:49 +0000 (13:15 -0700)]
i965/fs: Enforce common regioning restrictions by SIMD splitting.
This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction. Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 20:14:20 +0000 (13:14 -0700)]
i965/fs: Enforce extended math exec size limits during SIMD lowering.
This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 07:37:37 +0000 (00:37 -0700)]
i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.
Seems like this texturing opcode was missing its logical counterpart,
which would prevent it from taking advantage of the SIMD lowering
infrastructure. Define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 06:54:25 +0000 (23:54 -0700)]
i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.
The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 06:52:15 +0000 (23:52 -0700)]
i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant opcodes.
This was causing the scheduler to be rather optimistic about the
latency of pull constant opcodes on Gen7+. The cycle count estimate
calculated by the scheduler itself may appear to increase for some
shaders, even though the actual cycle count should decrease.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 20:03:31 +0000 (13:03 -0700)]
i965/fs: Rename Gen4 physical varying pull constant load opcode.
For consistency with the Gen7 variant. I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 08:26:03 +0000 (01:26 -0700)]
i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.
Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, so we can deal with pull constant loads in the same way it's
done for texturing during SIMD lowering.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 06:18:38 +0000 (23:18 -0700)]
i965/fs: Hide varying pull constant load message setup behind logical opcode.
This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Fri, 20 May 2016 04:32:14 +0000 (21:32 -0700)]
i965/fs: Avoid constant propagation when the type sizes don't match.
The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 26 Apr 2016 00:25:26 +0000 (17:25 -0700)]
i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.
If the LOAD_PAYLOAD instruction only has header sources it's possible
for the number of registers written to be less than or equal to the
SIMD component size, in which case it would take the single-MOV path
at the bottom which would cause the channel enable masks to be applied
incorrectly to the header contents and/or cause it to write past the
end of the allocated temporary. If the instruction is either
LOAD_PAYLOAD or doesn't write exactly one component the MOV path is
going to mess up the program so just don't use it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:48:32 +0000 (16:48 -0700)]
i965/fs: Handle instruction predication in SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 23:54:16 +0000 (16:54 -0700)]
i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.
If the source value is going to be the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk. As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source. In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly, but this should be
good enough for the moment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 18 May 2016 00:45:41 +0000 (17:45 -0700)]
i965/fs: Generalize is_uniform() to is_periodic().
This will be useful in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 17 May 2016 00:19:17 +0000 (17:19 -0700)]
i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Tue, 24 May 2016 02:32:51 +0000 (19:32 -0700)]
i965/fs: Fix off-by-one region overlap comparison in copy propagation.
This was introduced in cf375a3333e54a01462f192202d609436e5fbec8, but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error. This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
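The kind of off-by-one described in the copy propagation fix above can be
illustrated with a generic interval overlap test. This is only a sketch with
made-up names and units; the exact comparison in the driver is not
reproduced here.

```c
#include <stdbool.h>

/* Correct test for half-open ranges [start, start + size): two
 * ranges that merely touch do not overlap. */
static inline bool
regions_overlap(unsigned a_start, unsigned a_size,
                unsigned b_start, unsigned b_size)
{
   return a_start < b_start + b_size && b_start < a_start + a_size;
}

/* Off-by-one variant: using <= makes adjacent ranges look like they
 * interfere, which is overly strict but never unsafe. */
static inline bool
regions_overlap_off_by_one(unsigned a_start, unsigned a_size,
                           unsigned b_start, unsigned b_size)
{
   return a_start <= b_start + b_size && b_start <= a_start + a_size;
}
```

With the strict variant, two copies writing sequential locations of the same
register would invalidate each other's entries even though neither touches
the other's data.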
Ronie Salgado [Sat, 28 May 2016 00:32:44 +0000 (17:32 -0700)]
anv/cmd_buffer: Don't delete command buffers in ResetCommandPool()
v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Brian Paul [Sat, 28 May 2016 00:32:04 +0000 (18:32 -0600)]
gallium/util: another s/unsigned/enum pipe_prim_type/ for clang
Trivial.
Jason Ekstrand [Tue, 24 May 2016 19:06:35 +0000 (12:06 -0700)]
anv: Try the first 8 render nodes instead of just renderD128
This way, if you have other cards installed, the Vulkan driver will still
work. No guarantees about WSI working correctly but offscreen should at
least work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
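The render node probing from the commit above can be sketched as a small
loop. The probe callback and demo_probe() are hypothetical stand-ins for the
real physical device initialization, which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef bool (*probe_fn)(const char *path);

/* Try /dev/dri/renderD128 .. renderD135 in order and return the index
 * of the first node the probe accepts, or -1 if none works.  On
 * success `path` holds the accepted device path. */
static inline int
find_render_node(probe_fn probe, char *path, size_t len)
{
   for (unsigned i = 0; i < 8; i++) {
      snprintf(path, len, "/dev/dri/renderD%u", 128 + i);
      if (probe(path))
         return (int)i;
   }
   return -1;
}

/* Demo probe: pretends that only renderD129 is a supported device. */
static inline bool
demo_probe(const char *path)
{
   return strcmp(path, "/dev/dri/renderD129") == 0;
}
```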
Jason Ekstrand [Tue, 24 May 2016 18:02:18 +0000 (11:02 -0700)]
anv: strdup the device path into the physical device
This way we don't have to assume that the string coming in is a piece of
constant data that exists forever.
Jason Ekstrand [Sat, 28 May 2016 00:16:09 +0000 (17:16 -0700)]
anv/formats: Exit early for unsupported formats
Jason Ekstrand [Sat, 28 May 2016 00:14:29 +0000 (17:14 -0700)]
anv/formats: Map VK_FORMAT_UNDEFINED to ISL_FORMAT_UNSUPPORTED
At one point in time, we may have used the mapping to ISL_FORMAT_RAW for
certain buffer surfaces but that time has long since passed. This fixes a
bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
Jason Ekstrand [Sat, 28 May 2016 00:13:45 +0000 (17:13 -0700)]
anv/clear: Remove an unused variable
Brian Paul [Fri, 27 May 2016 21:56:07 +0000 (15:56 -0600)]
gallium/util: another unsigned -> enum pipe_prim_type change
gcc didn't warn about the unsigned / enum pipe_prim_type mismatch
between the .c and .h file.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jordan Justen [Wed, 18 May 2016 19:04:03 +0000 (12:04 -0700)]
i965/compute: Fix uniform init issue when SIMD8 is skipped
In d8347f12ead89c5a58f69ce9283a54ac8487159c, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.
This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
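The thresholds quoted in the compute fix above are consistent with a limit
of 8 channels per thread times a maximum thread count. A sketch of that
check follows; the thread counts 64 and 56 are inferred here from 512 / 8
and 448 / 8 and are not stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: SIMD8 is only usable if the whole local
 * workgroup fits in `max_threads` threads of 8 channels each. */
static inline bool
simd8_usable(unsigned max_threads, unsigned local_size)
{
   assert(max_threads > 0);
   return local_size <= max_threads * 8;
}
```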
Bas Nieuwenhuizen [Fri, 27 May 2016 22:57:31 +0000 (00:57 +0200)]
docs: Mention GL4.3 and ES3.1 support for nvc0 and radeonsi
v2: also update the introductory text.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Jason Ekstrand [Fri, 11 Mar 2016 03:15:32 +0000 (19:15 -0800)]
anv: Emit DRAWING_RECTANGLE once at driver initialization
Also, we don't actually need it for clipping because meta always colors
inside the lines and, for all other operations, the user is required to set
a scissor. Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as
little as possible.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 20 May 2016 18:49:12 +0000 (11:49 -0700)]
anv/cmd_buffer: Only emit PIPE_CONTROL on-demand
This is in contrast to emitting it directly in vkCmdPipelineBarrier. This
has a couple of advantages. First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs. Second, it allows us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
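The on-demand scheme from the PIPE_CONTROL commit above can be sketched as
accumulating pending bits and flushing them lazily. All names are
hypothetical, the counter stands in for actually emitting the command, and
the sketch emits a single command where the real driver may need two.

```c
#include <stdint.h>

/* Hypothetical per-command-buffer state. */
struct sketch_cmd_state {
   uint32_t pending_bits;   /* flushes/invalidates requested so far */
   unsigned emitted;        /* number of PIPE_CONTROLs emitted */
};

/* Called from the barrier entry point: only records what is needed. */
static inline void
record_barrier(struct sketch_cmd_state *s, uint32_t bits)
{
   s->pending_bits |= bits;
}

/* Called right before work that depends on the barrier: emits one
 * command covering everything recorded since the last flush. */
static inline void
flush_pending(struct sketch_cmd_state *s)
{
   if (s->pending_bits == 0)
      return;
   s->emitted++;
   s->pending_bits = 0;
}
```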
Jason Ekstrand [Fri, 20 May 2016 19:07:53 +0000 (12:07 -0700)]
genxml: Make PIPE_CONTROL::CommandStreamerStallEnable a boolean
This has been declared as a uint since SNB but it's only one bit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 20 May 2016 07:11:32 +0000 (00:11 -0700)]
anv/clear: Only clear the render area when doing subpass clears
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 9 Mar 2016 02:10:22 +0000 (18:10 -0800)]
anv: Move push constant allocation to the command buffer
Instead of blasting it out as part of the pipeline, we put it in the
command buffer and only blast it out when it's really needed. Since the
PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a
stall which we would like to avoid.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bas Nieuwenhuizen [Mon, 18 Apr 2016 22:47:49 +0000 (00:47 +0200)]
radeonsi: enable OpenGL 4.3
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Fri, 27 May 2016 19:51:12 +0000 (05:51 +1000)]
nouveau: enable GL 4.3 on kepler/fermi
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Fri, 27 May 2016 10:39:30 +0000 (12:39 +0200)]
radeonsi: always reserve output space for tess factors
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 27 May 2016 03:21:57 +0000 (13:21 +1000)]
glsl/linker: call link_uniform_blocks on linked shader.
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.
This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e863a52b610f112884ae12d42513770
glsl: handle implicit sized arrays in ssbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reported-by: Mark James
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 27 May 2016 05:11:33 +0000 (15:11 +1000)]
mesa/get: drop unused extension checks.
These all show up as unused warnings here, so drop them for now.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Fri, 27 May 2016 11:55:56 +0000 (13:55 +0200)]
gallium/ddebug: Add passthrough for query_memory_info.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Wed, 25 May 2016 17:51:33 +0000 (10:51 -0700)]
nir/inline: Also rewrite param derefs for texture instructions
Without this, samplers get left hanging as derefs to variables that don't
actually exist.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 25 May 2016 17:48:05 +0000 (10:48 -0700)]
nir/inline: Break the guts of rewrite_param_derefs into a helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 25 May 2016 17:36:23 +0000 (10:36 -0700)]
nir/inline: Make the rewrite_param_derefs helper work on instructions
Now that we have the better nir_foreach_block macro, there's no reason to
use the archaic block version for everything.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 27 May 2016 16:25:51 +0000 (09:25 -0700)]
nir/inline: Don't use foreach_instr_safe unless we need to
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Roland Scheidegger [Thu, 12 May 2016 23:44:39 +0000 (01:44 +0200)]
gallivm: eliminate an unnecessary AND with unorm lerps
Instead of doing an add and then masking out the upper bits, we can
simply do an add with a half-wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
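The equivalence the gallivm commit above relies on can be checked with a scalar sketch in C. This only illustrates the arithmetic for one 16-bit lane holding an 8-bit value; gallivm itself emits vector LLVM IR, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: add in the 16-bit working type, then AND away the upper bits. */
static uint16_t
add_then_mask(uint16_t a, uint16_t b)
{
   return (uint16_t)((uint16_t)(a + b) & 0x00ff);
}

/* New form: add in the half-wide (8-bit) type. The sum wraps at 8 bits and
 * widening it back to 16 bits zero-fills the upper half, so no AND is needed.
 */
static uint16_t
add_half_wide(uint16_t a, uint16_t b)
{
   uint8_t sum = (uint8_t)((uint8_t)a + (uint8_t)b);
   return sum;
}

/* Spot-check that both forms agree, including for operands whose upper
 * bits are set (e.g. a negative delta that wrapped around in 16 bits).
 */
static void
check_equivalence(void)
{
   for (unsigned a = 0; a < 0x10000; a += 251)
      for (unsigned b = 0; b < 0x10000; b += 251)
         assert(add_then_mask((uint16_t)a, (uint16_t)b) ==
                add_half_wide((uint16_t)a, (uint16_t)b));
}
```

The two forms agree because the low 8 bits of a sum depend only on the low 8 bits of its operands.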
Roland Scheidegger [Fri, 27 May 2016 16:49:44 +0000 (18:49 +0200)]
gallium/util: use enum pipe_prim_type instead of unsigned some more
There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Fri, 27 May 2016 00:58:16 +0000 (18:58 -0600)]
svga: remove unneeded casts in get_query_result_vgpu9() calls
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 27 May 2016 00:57:51 +0000 (18:57 -0600)]
svga: use MAYBE_UNUSED to silence release-build warnings
Signed-off-by: Brian Paul <brianp@vmware.com>
Ben Widawsky [Fri, 27 May 2016 04:59:17 +0000 (21:59 -0700)]
isl: Fix some tautological-compare warnings
Fixes:
isl.c:62:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
assert(ISL_DEV_GEN(dev) == dev->info->gen);
^~
isl.c:63:33: warning: self-comparison always evaluates to true [-Wtautological-compare]
assert(ISL_DEV_USE_SEPARATE_STENCIL(dev) == dev->use_separate_stencil);
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ilia Mirkin [Thu, 26 May 2016 17:58:42 +0000 (13:58 -0400)]
mesa: add support for GLSL ES 3.20 version string
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ilia Mirkin [Thu, 26 May 2016 17:58:41 +0000 (13:58 -0400)]
mapi: expose new functions in GL ES 3.2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ilia Mirkin [Thu, 26 May 2016 02:41:06 +0000 (22:41 -0400)]
nvc0/ir: handle a load's reg result not being used for locked variants
For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Ilia Mirkin [Thu, 26 May 2016 01:54:39 +0000 (21:54 -0400)]
nvc0/ir: avoid generating illegal instructions for compute constbuf loads
For user-supplied constbufs, fileIndex is 0. In that case, when we
subtract 1, we'll end up loading from constbuf offset -16. This is
illegal, and there are asserts to avoid it. Normally we'd just DCE it,
but no point in generating the instructions if they're not going to be
used.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
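The off-by-one in the constbuf commit above can be sketched in C as follows. Everything here is an assumption for illustration: the 16-byte entry size is inferred only from the "-16" in the message, and entry_offset() is a hypothetical helper, not a function in the nvc0 code generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_SIZE 16   /* assumed bytes per entry, inferred from "-16" */

/* Returns false when no load should be generated at all, instead of
 * producing an offset of (0 - 1) * 16 = -16 and relying on DCE later.
 */
static bool
entry_offset(int file_index, int32_t *offset)
{
   if (file_index == 0)
      return false;                 /* user-supplied constbuf: emit nothing */

   *offset = (file_index - 1) * ENTRY_SIZE;
   assert(*offset >= 0);            /* mirrors the kind of assert mentioned */
   return true;
}
```

Returning early keeps the illegal negative offset from ever being materialized, which is the behaviour the message argues for.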