mesa.git
8 years agogallium: push offset down to driver
Stanimir Varbanov [Thu, 26 May 2016 22:10:37 +0000 (01:10 +0300)]
gallium: push offset down to driver

Push offset down to drivers when importing dmabuf. This is needed
to more fully support EGL_EXT_image_dma_buf_import when a non-zero
offset is specified.

Tesing has been done for freedreno, and compile tested following
gallium drivers:
nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agost/dri: cleanup image_from_fd/dma_buf paths
Stanimir Varbanov [Thu, 26 May 2016 22:10:36 +0000 (01:10 +0300)]
st/dri: cleanup image_from_fd/dma_buf paths

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agost/dri: add handling of R8 and GR88 DRI fourcc formats
Stanimir Varbanov [Thu, 26 May 2016 22:10:35 +0000 (01:10 +0300)]
st/dri: add handling of R8 and GR88 DRI fourcc formats

This helps to import dmabuf buffers from DRM_FORMAT_R8 and
DRM_FORMAT_GR88 used for example by GStreamer for YUV to RGB
conversion using shaders.

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agoradeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.
Bas Nieuwenhuizen [Sun, 29 May 2016 16:35:22 +0000 (18:35 +0200)]
radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoi965: Expose GL 4.3 on Gen8+.
Francisco Jerez [Fri, 20 May 2016 07:19:18 +0000 (00:19 -0700)]
i965: Expose GL 4.3 on Gen8+.

ARB_compute_shader was the last feature missing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Skip gen4 pre/post-send dependency workaronds for the first/last block.
Francisco Jerez [Wed, 25 May 2016 21:21:49 +0000 (14:21 -0700)]
i965/fs: Skip gen4 pre/post-send dependency workaronds for the first/last block.

We know that there cannot be any destination dependency race if we
reach the beginning or end of the program without having found any
other instruction the send could possibly race with.  This avoids
emitting a pile of useless moves at the beginning or end of the
program in the most common case in which the program has a single
basic block only.

On the original i965 I get the following shader-db results:

 total instructions in shared programs: 3354165 -> 3215637 (-4.13%)
 instructions in affected programs: 3183065 -> 3044537 (-4.35%)
 helped: 13498
 HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Skip SIMD lowering source unzipping for regular scalar regions.
Francisco Jerez [Sun, 29 May 2016 05:44:13 +0000 (22:44 -0700)]
i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.
Francisco Jerez [Fri, 27 May 2016 06:07:58 +0000 (23:07 -0700)]
i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.

Just to make sure we keep the SIMD lowering pass tidy when we
introduce additional logic to try to optimize out the copy
instructions used to zip and unzip the destination and source regions
into multiple packed regions of the lowered instruction width.
Shouldn't cause any functional changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.
Francisco Jerez [Fri, 27 May 2016 06:20:19 +0000 (23:20 -0700)]
i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.

This will be useful in several places.  The only externally visible
difference (other than non-VGRF files being supported now) is that the
region sizes are now passed in byte units instead of in GRF units
because the loss of precision would have become a problem in the SIMD
lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Refactor offset() into a separate function taking the width as argument.
Francisco Jerez [Fri, 27 May 2016 06:09:46 +0000 (23:09 -0700)]
i965/fs: Refactor offset() into a separate function taking the width as argument.

This will be useful in the SIMD lowering pass to avoid having to
construct a builder object of the known region width just to pass it
as argument to offset(), which doesn't do anything with it other than
taking the builder dispatch_width as region width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Implement opt_sampler_eot() in terms of logical sends.
Francisco Jerez [Fri, 20 May 2016 07:38:17 +0000 (00:38 -0700)]
i965/fs: Implement opt_sampler_eot() in terms of logical sends.

This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies
the code and will allow the optimization to succeed in more cases
independent of whether the LOAD_PAYLOAD instruction can be found or
not.

The following patch is squashed in:

SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().

The sampler EOT optimization pass naively assumes that the texturing
instruction provides all the data used by the FB write just because
they're standing next to each other.  The least we should be checking
is whether the source and destination regions of the FB write and
texturing instructions match.  Without this the previous seemingly
harmless patch would have caused opt_sampler_eot() to misoptimize a
shader from dota-2 causing DCE to eliminate all of its 78 instructions
except for the final sampler EOT message (!).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix UB list sentinel dereference in opt_sampler_eot().
Francisco Jerez [Sat, 30 Apr 2016 21:24:31 +0000 (14:24 -0700)]
i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Take opt_redundant_discard_jumps out of the optimization loop.
Francisco Jerez [Wed, 4 May 2016 01:36:02 +0000 (18:36 -0700)]
i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.

No shader-db regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Run SIMD and logical send lowering after the optimization loop.
Francisco Jerez [Sat, 30 Apr 2016 22:08:29 +0000 (15:08 -0700)]
i965/fs: Run SIMD and logical send lowering after the optimization loop.

There are two reasons why this is useful:

 - It avoids the introduction of an amount of partial writes emitted
   by the SIMD lowering pass to zip and unzip register regions early
   during optimization, which can make subsequent optimization less
   effective.

 - It substantially reduces the burden on the compiler when a large
   fraction of the instructions in the program need to be split (e.g.
   during SIMD32 builds).  Individual halves of split instructions
   will be optimized identically (if they can still be optimized at
   all), so doing it up front can duplicate the amount of instructions
   the optimizer has to deal with which causes the compilation time to
   explode in some cases due to the worse-than-linear runtime
   behaviour of the back-end.

It seems helpful to re-run a few optimization passes in cases where
any of the lowering passes was able to make progress.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().
Francisco Jerez [Sat, 30 Apr 2016 21:57:59 +0000 (14:57 -0700)]
i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Allow constant propagation into logical send sources.
Francisco Jerez [Sat, 30 Apr 2016 06:36:59 +0000 (23:36 -0700)]
i965/fs: Allow constant propagation into logical send sources.

Logical sends are eventually lowered into a series of copies so they
can take almost anything as source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Let CSE handle logical sampler sends as expressions.
Francisco Jerez [Sat, 30 Apr 2016 06:35:01 +0000 (23:35 -0700)]
i965/fs: Let CSE handle logical sampler sends as expressions.

This will prevent some shader-db regressions when we start plumbing
logical sends through the optimizer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.
Francisco Jerez [Sat, 30 Apr 2016 02:47:44 +0000 (19:47 -0700)]
i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.

This will let the optimizer know that the sample mask value is unused
so its definition can be DCE'ed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoglsl: fix xfb_offset unsized array validation
Timothy Arceri [Fri, 27 May 2016 09:07:19 +0000 (19:07 +1000)]
glsl: fix xfb_offset unsized array validation

This partially fixes CTS test:
GL44-CTS.enhanced_layouts.xfb_get_program_resource_api

The test now fails at a tes evaluation shader with unsized output arrays.

The ARB_enhanced_layouts spec says:

   "It is a compile-time error to apply xfb_offset to the declaration of an
   unsized array."

So this seems like a bug in the CTS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl: dont crash when attempting to assign a value to a builtin define
Timothy Arceri [Mon, 30 May 2016 02:16:39 +0000 (12:16 +1000)]
glsl: dont crash when attempting to assign a value to a builtin define

For example GL_ARB_enhanced_layouts = 3;

Fixes:
GL44-CTS.enhanced_layouts.glsl_contant_immutablity

Reviewed-by: Dave Airlie <airlied@redhat.com>
8 years agoegl/dri3: don't crash on no context.
Dave Airlie [Sun, 29 May 2016 22:02:00 +0000 (08:02 +1000)]
egl/dri3: don't crash on no context.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94925

Pointed out by Karol Herbst on irc.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agomesa/program_interface_query: fix transform feedback varyings.
Dave Airlie [Mon, 23 May 2016 21:58:32 +0000 (07:58 +1000)]
mesa/program_interface_query: fix transform feedback varyings.

The spec says gl_NextBuffer and gl_SkipComponents need to be
returned to userspace in the program interface queries.

We currently throw those away, this requires a complete piglit
run to make sure no drivers fallover due to the extra varyings.

This fixes:
GL45-CTS.program_interface_query.transform-feedback-built-in

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl/ast: subroutineTypes can't be returned from functions.
Dave Airlie [Mon, 23 May 2016 02:14:01 +0000 (12:14 +1000)]
glsl/ast: subroutineTypes can't be returned from functions.

These types can't be returned.

This fixes:
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
for the return type case.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl: use has_double() helper
Timothy Arceri [Sat, 28 May 2016 01:56:17 +0000 (11:56 +1000)]
glsl: use has_double() helper

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
8 years agoglsl: fix explicit uniform block alignment
Timothy Arceri [Sat, 28 May 2016 01:40:22 +0000 (11:40 +1000)]
glsl: fix explicit uniform block alignment

This stops the offset being bumped again when and an explicit
alignment has already been applied.

Fixes alignment issues in:
GL44-CTS.enhanced_layouts.uniform_block_alignment

Note the test still fails due to unrelated issues with doubles.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
8 years agoi965: Shrink stage_prog_data param array length
Jordan Justen [Sun, 29 May 2016 00:57:31 +0000 (17:57 -0700)]
i965: Shrink stage_prog_data param array length

It appears we were over-allocating these arrays.

Previously we would use nir->num_uniforms directly for scalar
programs, and multiply it by 4 for vec4 programs.

Instead we should have been dividing by 4 in both cases to convert
from bytes to a gl_constant_value count. The size of gl_constant_value
is 4 bytes.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonv50,nvc0: fix the max_vertices=0 case
Ilia Mirkin [Sun, 29 May 2016 13:26:11 +0000 (09:26 -0400)]
nv50,nvc0: fix the max_vertices=0 case

This is apparently legal. Drop any emit/restarts, and pass a 1 to the
hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agost/mesa: fix setting of point_size_per_vertex in ES contexts
Ilia Mirkin [Sun, 29 May 2016 02:38:24 +0000 (22:38 -0400)]
st/mesa: fix setting of point_size_per_vertex in ES contexts

GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop
GL. So we have to go and check the last pre-rasterizer stage to see
whether it outputs a point size or not.

This fixes a number of dEQP tests that use a geometry or tessellation
shader to emit points primitives.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years agomesa: skip level checking for FramebufferTexture*D if texture is zero
Marek Olšák [Fri, 27 May 2016 19:40:19 +0000 (21:40 +0200)]
mesa: skip level checking for FramebufferTexture*D if texture is zero

From the OpenGL 4.5 core spec:
  "An INVALID_VALUE error is generated if texture is not zero and level is
  not a supported texture level for textarget, as described above."

Other FramebufferTexture functions already do the right thing.

This fixes the main menu in F1 2015.

Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
8 years agost/mesa: expose OES_shader_io_blocks when we have enough for ES 3.1
Ilia Mirkin [Fri, 27 May 2016 03:02:18 +0000 (23:02 -0400)]
st/mesa: expose OES_shader_io_blocks when we have enough for ES 3.1

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoswr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.
Vinson Lee [Thu, 26 May 2016 04:09:10 +0000 (21:09 -0700)]
swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.

Fix build error with icc.

  CXX      libswrAVX_la-swr_clear.lo
icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor'
In file included from ./rasterizer/jitter/jit_api.h(31),
                 from swr_context.h(30),
                 from swr_clear.cpp(24):
./rasterizer/common/os.h(135): error: expected an identifier
  void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a)
       ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agoi965: add missing return in if statement
Thomas Hindoe Paaboel Andersen [Sat, 28 May 2016 11:16:03 +0000 (13:16 +0200)]
i965: add missing return in if statement

Re-add the "return false" that was removed in 0c02d7002d6c005b4c1fe997b5ef5916978dd183

It seems that something went wrong when merging the patch. The patch
sent to the mailing list does not directly match what was committed.
https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agogk110/ir: fix unspilling of predicates from registers
Ilia Mirkin [Sat, 28 May 2016 17:07:12 +0000 (13:07 -0400)]
gk110/ir: fix unspilling of predicates from registers

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96258
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: remove outdated surfaces validation code for GK104
Samuel Pitoiset [Fri, 27 May 2016 08:14:45 +0000 (10:14 +0200)]
nvc0: remove outdated surfaces validation code for GK104

This code was used for validating surfaces with compute but now we use
pipe_image_view instead. Anyway, surfaces support should be
re-introduced properly once OpenCL happens.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: do not always invalidate 3D CBs when using compute
Samuel Pitoiset [Thu, 26 May 2016 22:30:17 +0000 (00:30 +0200)]
nvc0: do not always invalidate 3D CBs when using compute

Constant buffers are aliased between 3D and CP on Fermi, but we should
only invalidate them when a compute shader actually uses CBs and not
all the time after a lauching grid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agoi965: Update compute workgroup size limit calculation for SIMD32.
Francisco Jerez [Tue, 26 Apr 2016 00:02:25 +0000 (17:02 -0700)]
i965: Update compute workgroup size limit calculation for SIMD32.

This should have the side effect of enabling the ARB_compute_shader
extension on Gen8+ hardware and all Gen7 platforms that didn't
previously expose it (VLV and IVB GT1) due to the number of hardware
threads per subslice being insufficient in SIMD16 mode.

v2: Bump workgroup size limit for GLES too (Jordan).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years agoi965: Add do32 debug option.
Francisco Jerez [Fri, 27 May 2016 04:28:45 +0000 (21:28 -0700)]
i965: Add do32 debug option.

The do32 INTEL_DEBUG option causes the back-end to try to generate a
SIMD32 program when compiling a compute shader regardless of the
specified compute shader workgroup size, which will be useful for
testing SIMD32 code generation in the most common case in which the
workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen
wouldn't be automatically enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Build 32-wide compute shader when needed.
Francisco Jerez [Tue, 17 May 2016 01:25:22 +0000 (18:25 -0700)]
i965/fs: Build 32-wide compute shader when needed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Extend back-end interface for limiting the shader dispatch width.
Francisco Jerez [Wed, 18 May 2016 21:39:52 +0000 (14:39 -0700)]
i965/fs: Extend back-end interface for limiting the shader dispatch width.

This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Implement SIMD32 register allocation support.
Francisco Jerez [Wed, 18 May 2016 20:52:25 +0000 (13:52 -0700)]
i965/fs: Implement SIMD32 register allocation support.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove pre-Gen7 register allocation class micro-optimization.
Francisco Jerez [Sun, 1 May 2016 03:47:49 +0000 (20:47 -0700)]
i965/fs: Remove pre-Gen7 register allocation class micro-optimization.

This was trying to save some one-time init on pre-Gen7 hardware under
the assumption that one would only ever need 1, 2, 4 and 8-wide
registers on those platforms.  However nothing guarantees that those
will be the only VGRF sizes used after lowering and optimization.  In
some cases we may end up with a temporary of different size being
allocated (e.g. by SIMD lowering to zip or unzip a multi-component
register region of a logical send instruction), and there is no
guarantee that they will be optimized away before register allocation
(especially since the compute_to_mrf coalescing pass is
rather... lacking...).  Instead just allocate classes for all possible
VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
encounter a variable of any other size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Don't mutate multi-component arguments in sampler payload set-up.
Francisco Jerez [Sun, 1 May 2016 04:54:47 +0000 (21:54 -0700)]
i965/fs: Don't mutate multi-component arguments in sampler payload set-up.

The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction like 'coordinate =
offset(coordinate, bld, 1)', the problem is that while doing that it
may step one past the end of the coordinate vector causing an
assertion failure in offset() if it happens to be a (single component)
immediate.  Right now coordinates and derivatives are typically passed
as actual registers but that will no longer be the case when we start
propagating constants into logical messages.

Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.

v2: Fold a few post-increment operators into the last MOV
    statement. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix multiple ACP interference during copy propagation.
Francisco Jerez [Fri, 27 May 2016 01:51:41 +0000 (18:51 -0700)]
i965/fs: Fix multiple ACP interference during copy propagation.

This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
Francisco Jerez [Mon, 25 Apr 2016 22:39:29 +0000 (15:39 -0700)]
i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).

The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I could reproduce this easily with SIMD32 but I don't see any reason
why the problem would be SIMD32-specific, it was most likely working
by luck.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Estimate number of registers written correctly in opt_register_renaming.
Francisco Jerez [Tue, 26 Apr 2016 00:09:00 +0000 (17:09 -0700)]
i965/fs: Estimate number of registers written correctly in opt_register_renaming.

The current estimate is incorrect for non-32b types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
Francisco Jerez [Tue, 26 Apr 2016 00:25:48 +0000 (17:25 -0700)]
i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.

These are completely ignored by the conversion to brw_reg, so they
better be zero.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
Francisco Jerez [Fri, 20 May 2016 04:12:32 +0000 (21:12 -0700)]
i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().

Prevents an assertion failure in the following commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
Francisco Jerez [Tue, 26 Apr 2016 00:09:39 +0000 (17:09 -0700)]
i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.

The pass is disabled in SIMD16 dispatch mode for the same reason, it
cannot handle instructions that write multiple MRF registers at once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
Francisco Jerez [Wed, 18 May 2016 21:27:20 +0000 (14:27 -0700)]
i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
Francisco Jerez [Wed, 18 May 2016 21:17:48 +0000 (14:17 -0700)]
i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
Francisco Jerez [Tue, 26 Apr 2016 00:30:54 +0000 (17:30 -0700)]
i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Return 32 bit mask from fs_builder::sample_mask().
Francisco Jerez [Tue, 17 May 2016 01:18:43 +0000 (18:18 -0700)]
i965/fs: Return 32 bit mask from fs_builder::sample_mask().

This doesn't actually handle the FS case, just add an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Emit fixed-width null register regardless of the dispatch width.
Francisco Jerez [Fri, 20 May 2016 04:26:51 +0000 (21:26 -0700)]
i965/fs: Emit fixed-width null register regardless of the dispatch width.

brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix half() to handle more exotic register files.
Francisco Jerez [Wed, 18 May 2016 00:37:25 +0000 (17:37 -0700)]
i965/fs: Fix half() to handle more exotic register files.

horiz_offset() is able to deal with a superset of the register files
currently special-cased in half().  Just call horiz_offset() in all
cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
Francisco Jerez [Wed, 18 May 2016 00:32:55 +0000 (17:32 -0700)]
i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.

We'll hit these in some cases during SIMD lowering in 32-wide
programs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
Francisco Jerez [Thu, 19 May 2016 05:40:40 +0000 (22:40 -0700)]
i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Keep track of flag dependencies with byte granularity during scheduling.
Francisco Jerez [Thu, 19 May 2016 05:13:52 +0000 (22:13 -0700)]
i965/fs: Keep track of flag dependencies with byte granularity during scheduling.

This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Track flag register liveness with byte granularity.
Francisco Jerez [Thu, 19 May 2016 04:34:27 +0000 (21:34 -0700)]
i965/fs: Track flag register liveness with byte granularity.

This is required for correctness in presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister.  Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.

Additionally this makes sure that we handle 32-wide flag writes and
reads which may span multiple flag subregisters so the current
approach of just setting/testing a single bit from the live set
wouldn't have worked.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
Francisco Jerez [Thu, 19 May 2016 04:54:35 +0000 (21:54 -0700)]
i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.

v2: Codestyle fixes (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Expose arbitrary channel execution groups to the IR.
Francisco Jerez [Fri, 20 May 2016 23:14:13 +0000 (16:14 -0700)]
i965/fs: Expose arbitrary channel execution groups to the IR.

This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8.  At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/ir: Make BROADCAST emit an unmasked single-channel move.
Francisco Jerez [Thu, 19 May 2016 07:10:03 +0000 (00:10 -0700)]
i965/ir: Make BROADCAST emit an unmasked single-channel move.

Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.

The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
Francisco Jerez [Fri, 20 May 2016 23:25:42 +0000 (16:25 -0700)]
i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.

This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control.  On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals.  On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Francisco Jerez [Thu, 19 May 2016 00:34:14 +0000 (17:34 -0700)]
i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.

Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work.  Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Lower 32-wide scratch writes in the generator.
Francisco Jerez [Sat, 28 May 2016 06:29:02 +0000 (23:29 -0700)]
i965/fs: Lower 32-wide scratch writes in the generator.

The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up.  We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly.  The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Implement scratch reads and writes of 4 GRFs at a time.
Francisco Jerez [Mon, 16 May 2016 22:47:39 +0000 (15:47 -0700)]
i965/fs: Implement scratch reads and writes of 4 GRFs at a time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
Francisco Jerez [Mon, 16 May 2016 23:03:33 +0000 (16:03 -0700)]
i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.

Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Set execution size explicitly for memory fence send message.
Francisco Jerez [Tue, 26 Apr 2016 02:20:12 +0000 (19:20 -0700)]
i965/eu: Set execution size explicitly for memory fence send message.

We don't want to emit a 32-wide send message in 32-wide programs.  The
memory fence message should have the same effect regardless of the
execution size (as long as it's valid) so just set it to one.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.
Francisco Jerez [Tue, 26 Apr 2016 02:18:30 +0000 (19:18 -0700)]
i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.

In SIMD32 programs the compiler is responsible for providing the
appropriate half of the sample mask in the message header, so the
first and third quarters both map to the first slot group of the
provided 16-bit half, while the second and fourth quarters map to the
second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
two.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Clean up remaining uses of dispatch_width in the generator.
Francisco Jerez [Fri, 20 May 2016 07:13:33 +0000 (00:13 -0700)]
i965/fs: Clean up remaining uses of dispatch_width in the generator.

Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Remove brw_codegen::compressed and ::compressed_stack.
Francisco Jerez [Fri, 20 May 2016 22:25:28 +0000 (15:25 -0700)]
i965/eu: Remove brw_codegen::compressed and ::compressed_stack.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Use current exec size instead of p->compressed in surface message generation.
Francisco Jerez [Sat, 28 May 2016 06:28:46 +0000 (23:28 -0700)]
i965/eu: Use current exec size instead of p->compressed in surface message generation.

This was kind of an abuse of p->compressed, dataport send message
instructions are always uncompressed.  Use the current execution size
instead since p->compressed is on its way out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: No need to reset predicate control after emitting some instructions.
Francisco Jerez [Thu, 19 May 2016 02:47:30 +0000 (19:47 -0700)]
i965/fs: No need to reset predicate control after emitting some instructions.

Trivial clean-up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Pass current execution size to brw_IF() and brw_DO().
Francisco Jerez [Thu, 19 May 2016 02:36:03 +0000 (19:36 -0700)]
i965/fs: Pass current execution size to brw_IF() and brw_DO().

This gets IF and DO instructions working in SIMD32 programs.  brw_IF()
and brw_DO() should probably behave in the same way as other generator
functions that emit control flow instructions and just figure out the
right execution size by themselves from the current execution controls
specified through the brw_codegen argument.  Changing that will
require updating lots of Gen4-5 clipper code though, so for the moment
just pass the current value redundantly from the FS generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Stop using p->compressed to specify the exec size of control flow instructions.
Francisco Jerez [Thu, 19 May 2016 02:17:31 +0000 (19:17 -0700)]
i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.

p->compressed won't work for SIMD32, we should just be using the
execution size value specified via p->current instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Extend region width calculation to allow arbitrary execution sizes.
Francisco Jerez [Fri, 20 May 2016 04:43:48 +0000 (21:43 -0700)]
i965/fs: Extend region width calculation to allow arbitrary execution sizes.

Instead of just halving the execution size when the instruction is
compressed hoping that it will give a legal source region width, we
can calculate the maximum legal width value in closed form from the
component size and stride.  This makes sure that brw_reg_from_fs_reg()
always returns a valid hardware region even for virtual 32-wide
instructions (e.g. send-like instructions) that would seem to exceed
the hardware region width limit after halving.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Pass the compression mode to brw_reg_from_fs_reg().
Kenneth Graunke [Thu, 19 May 2016 02:02:45 +0000 (19:02 -0700)]
i965/fs: Pass the compression mode to brw_reg_from_fs_reg().

Curro is planning to eliminate p->compressed, so let's avoid using it
here and just pass in the value directly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Simplify per-instruction compression control setup in generator.
Francisco Jerez [Thu, 19 May 2016 01:48:04 +0000 (18:48 -0700)]
i965/fs: Simplify per-instruction compression control setup in generator.

By using the new compression/group control interface.  This will allow
easier extension to support arbitrary channel enable groups at the IR
level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: No need to set compression control at the top of generate_code().
Francisco Jerez [Wed, 18 May 2016 10:59:20 +0000 (03:59 -0700)]
i965/fs: No need to set compression control at the top of generate_code().

The right value is dependent on the specific IR instruction being
generated so it has to be reset in every iteration of the loop anyway.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Fix a bunch of compression control bugs in the generator.
Francisco Jerez [Wed, 18 May 2016 22:29:27 +0000 (15:29 -0700)]
i965/eu: Fix a bunch of compression control bugs in the generator.

Most of these were resetting quarter control to zero incorrectly even
though everything they needed to do was disable instruction
compression -- The brw_SAMPLE() case was doing the right thing but it
can be simplified slightly by using the new compression control
interface.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/eu: Define alternative interface for setting compression and group controls.
Francisco Jerez [Wed, 18 May 2016 22:29:07 +0000 (15:29 -0700)]
i965/eu: Define alternative interface for setting compression and group controls.

This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.

It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen-4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs.  On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though everything we wanted to do was disable
instruction compression -- The latter seems especially unfortunate on
Gen6+ hardware which have no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.

Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.
Francisco Jerez [Fri, 20 May 2016 07:13:19 +0000 (00:13 -0700)]
i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.

It's just a byte MOV with strided source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove extract virtual opcodes.
Francisco Jerez [Thu, 19 May 2016 01:43:54 +0000 (18:43 -0700)]
i965/fs: Remove extract virtual opcodes.

These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Define brw_int_type() helper.
Francisco Jerez [Tue, 26 Apr 2016 00:35:52 +0000 (17:35 -0700)]
i965: Define brw_int_type() helper.

Intended as a (partial) inverse of type_sz().  Will be useful in the
next commit and some other SIMD32 generator changes I have queued up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove manual splitting of DDY ops in the generator.
Francisco Jerez [Sat, 28 May 2016 06:22:02 +0000 (23:22 -0700)]
i965/fs: Remove manual splitting of DDY ops in the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove manual unrolling of BFI instructions from the generator.
Francisco Jerez [Wed, 18 May 2016 03:02:29 +0000 (20:02 -0700)]
i965/fs: Remove manual unrolling of BFI instructions from the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
Francisco Jerez [Wed, 18 May 2016 02:59:18 +0000 (19:59 -0700)]
i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Drop lowering code for a few three-source instructions from the generator.
Francisco Jerez [Wed, 18 May 2016 02:51:50 +0000 (19:51 -0700)]
i965/fs: Drop lowering code for a few three-source instructions from the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Set default access mode to Align1 for all instructions in the generator.
Francisco Jerez [Thu, 19 May 2016 01:41:28 +0000 (18:41 -0700)]
i965/fs: Set default access mode to Align1 for all instructions in the generator.

Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Remove handcrafted math SIMD lowering from the generator.
Francisco Jerez [Wed, 18 May 2016 02:10:48 +0000 (19:10 -0700)]
i965/fs: Remove handcrafted math SIMD lowering from the generator.

Most of this wouldn't have worked for SIMD32 and had various
dispatch_width and compression control bugs.  It's mostly dead now
with SIMD lowering of math instructions turned on in the compiler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.
Francisco Jerez [Fri, 20 May 2016 20:34:46 +0000 (13:34 -0700)]
i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.

Which is 16 or 8 in most cases.  This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.
Francisco Jerez [Fri, 20 May 2016 06:44:23 +0000 (23:44 -0700)]
i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.

Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.

LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.

An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
Francisco Jerez [Tue, 17 May 2016 23:27:09 +0000 (16:27 -0700)]
i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time

...on hardware lacking compressed Align16 support.  Will allow
simplifying the generator code and fixing it for SIMD32 codegen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Apply usual FPU-like execution size restrictions to MULH.
Francisco Jerez [Tue, 17 May 2016 23:43:05 +0000 (16:43 -0700)]
i965/fs: Apply usual FPU-like execution size restrictions to MULH.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.
Francisco Jerez [Tue, 17 May 2016 23:10:38 +0000 (16:10 -0700)]
i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Assert that IF instruction with embedded compare has legal exec_size.
Francisco Jerez [Tue, 17 May 2016 23:01:29 +0000 (16:01 -0700)]
i965/fs: Assert that IF instruction with embedded compare has legal exec_size.

We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.
Francisco Jerez [Tue, 17 May 2016 23:00:19 +0000 (16:00 -0700)]
i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.
Francisco Jerez [Tue, 17 May 2016 22:58:04 +0000 (15:58 -0700)]
i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Enforce common regioning restrictions by SIMD splitting.
Francisco Jerez [Fri, 20 May 2016 20:15:49 +0000 (13:15 -0700)]
i965/fs: Enforce common regioning restrictions by SIMD splitting.

This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction.  Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Enforce extended math exec size limits during SIMD lowering.
Francisco Jerez [Fri, 20 May 2016 20:14:20 +0000 (13:14 -0700)]
i965/fs: Enforce extended math exec size limits during SIMD lowering.

This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Handle SAMPLEINFO consistently like other texturing instructions.
Francisco Jerez [Fri, 20 May 2016 07:37:37 +0000 (00:37 -0700)]
i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.

Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end.  At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>