mesa.git
7 years agoi965/vec4: split d2x conversion and data gathering from one opcode to two explicit...
Samuel Iglesias Gonsálvez [Wed, 8 Mar 2017 08:27:49 +0000 (09:27 +0100)]
i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones

When converting a 64-bit value to a smaller data type, the destination should
be aligned to 64 bits. Because of that, we need to gather the data after the
actual conversion.

Until now, both operations were done by VEC4_OPCODE_FROM_DOUBLE, but
now we split them explicitly into two different instructions:
VEC4_OPCODE_FROM_DOUBLE only does the conversion and
VEC4_OPCODE_PICK_LOW_32BIT gathers the data.
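
The two steps can be pictured in plain C, outside the i965 IR. This is only a sketch: the function names are hypothetical, and the upper half of each 64-bit slot is zeroed here for determinism, whereas the hardware may leave it undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 1 (stands in for VEC4_OPCODE_FROM_DOUBLE): convert each double to a
 * float and store its bits in the low 32 bits of a 64-bit slot. */
static void
sketch_from_double(const double *src, uint64_t *slots, size_t n)
{
   assert(src != NULL && slots != NULL);
   for (size_t i = 0; i < n; i++) {
      float f = (float)src[i];
      uint32_t bits;
      memcpy(&bits, &f, sizeof(bits));
      slots[i] = bits;          /* high 32 bits are zero in this sketch */
   }
}

/* Step 2 (stands in for VEC4_OPCODE_PICK_LOW_32BIT): gather the low 32 bits
 * of every slot into a tightly packed array of floats. */
static void
sketch_pick_low_32bit(const uint64_t *slots, float *dst, size_t n)
{
   assert(slots != NULL && dst != NULL);
   for (size_t i = 0; i < n; i++) {
      uint32_t bits = (uint32_t)(slots[i] & 0xffffffffu);
      memcpy(&dst[i], &bits, sizeof(bits));
   }
}
```

Keeping the gather as its own step means the conversion can always write to a 64-bit-aligned destination.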

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
Juan A. Suarez Romero [Fri, 23 Sep 2016 09:57:43 +0000 (09:57 +0000)]
i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT

In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.

v2:
- Use stride and don't need to fix dst (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/vec4: keep original type when dealing with null registers
Juan A. Suarez Romero [Mon, 12 Sep 2016 16:06:22 +0000 (16:06 +0000)]
i965/vec4: keep original type when dealing with null registers

Keep the original type when dealing with null registers, especially
because we do not want to introduce an implicit conversion between
types that could affect the conditional flags.

This matters especially when the original type is DF and we are working
on Ivybridge/Baytrail.

v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/vec4: split DF instructions and later double its execsize in IVB/BYT
Samuel Iglesias Gonsálvez [Mon, 29 Aug 2016 08:10:30 +0000 (10:10 +0200)]
i965/vec4: split DF instructions and later double its execsize in IVB/BYT

We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).

v2:
- Rename helper and make it static inline function (Matt).
- Fix indentation and add braces (Matt).

v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)

v4:
- Add get_exec_type() function and use it to calculate the execution
  size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Take
  destination type as execution type where there is no valid source.
  Assert-fail if the deduced execution type is byte.  Clarify comment
  in get_lowered_simd_width().  Move SIMD width workaround outside of
  'if (...inst->size_written > REG_SIZE)' conditional block, since the
  problem should be independent of whether the amount of data written
  by the instruction is greater or lower than a GRF.  Drop redundant
  is_ivb_df definition.  Drop bogus inst->exec_size < 8 check.
  Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
Samuel Iglesias Gonsálvez [Thu, 25 Aug 2016 14:05:24 +0000 (16:05 +0200)]
i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT

The hardware applies the same channel enable signals to both halves of
the compressed instruction, which is wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: Get 64-bit indirect moves working on IVB.
Francisco Jerez [Thu, 9 Feb 2017 18:16:58 +0000 (10:16 -0800)]
i965/fs: Get 64-bit indirect moves working on IVB.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Use source region <1,2,0> when converting to DF.
Matt Turner [Fri, 13 Jan 2017 02:05:58 +0000 (18:05 -0800)]
i965: Use source region <1,2,0> when converting to DF.

Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
Juan A. Suarez Romero [Wed, 3 Aug 2016 11:51:44 +0000 (11:51 +0000)]
i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT

According to the IVB and HSW PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode."

So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

This patch limits the SIMD width to 4 in this case.

v2:
- Fix typo (Matt).
- Fix condition (Curro)

v3:
- Add spec quote (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: fix dst stride in IVB/BYT type conversions
Juan A. Suarez Romero [Fri, 20 Jan 2017 07:50:50 +0000 (08:50 +0100)]
i965/fs: fix dst stride in IVB/BYT type conversions

When converting DF to a 32-bit type, we set the dst stride to 2
to fulfill alignment restrictions, because the upper Dword of every
Qword will be written with an undefined value.

But on IVB/BYT this is not necessary, as each DF conversion already
writes two Dwords: the first one holds the real value and the second one a 0.
That is, IVB/BYT already apply stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modify IR (Curro)

v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: rename lower_d2x to lower_conversions
Samuel Iglesias Gonsálvez [Tue, 14 Mar 2017 07:17:36 +0000 (08:17 +0100)]
i965/fs: rename lower_d2x to lower_conversions

v2:
- Change the name to lower_conversions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoRevert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
Samuel Iglesias Gonsálvez [Tue, 28 Mar 2017 04:25:13 +0000 (06:25 +0200)]
Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."

This reverts commit 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58.

The d2x pass fixes SEL instructions that involve a type conversion
by doing a SEL without type conversion and then converting the result.
This pass also takes non-uniform control flow into account.

Therefore, 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58 is not needed anymore.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965/fs: generalize the legalization d2x pass
Samuel Iglesias Gonsálvez [Fri, 20 Jan 2017 07:47:05 +0000 (08:47 +0100)]
i965/fs: generalize the legalization d2x pass

Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing instruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
Matt Turner [Wed, 11 Jan 2017 03:33:22 +0000 (19:33 -0800)]
i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.

On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
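
To see why <0,2,1> works when the region is counted in floats, it helps to list which element each channel reads. The sketch below uses the usual region addressing rule (row = channel / width, column = channel % width); the struct and function names are hypothetical and not part of the driver.

```c
#include <assert.h>

/* Hypothetical description of a source region <vstride; width, hstride>,
 * with every field counted in elements. */
struct sketch_region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Element offset (from the region origin) read by a given channel. */
static unsigned
sketch_region_offset(struct sketch_region r, unsigned channel)
{
   assert(r.width > 0);
   return (channel / r.width) * r.vstride + (channel % r.width) * r.hstride;
}
```

With <0,2,1> in 32-bit units, channels 0 and 1 read floats 0 and 1 (the two halves of the DF), and because the vertical stride is 0, channels 2 and 3 read the same two floats again. That is the same scalar DF replicated, which <0,1,0> expresses directly when elements are 64 bits wide.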

v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).

v3:
- Added comment explaining the reason (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: clamp exec_size when an instruction has a scalar DF source
Samuel Iglesias Gonsálvez [Wed, 11 Jan 2017 07:17:57 +0000 (08:17 +0100)]
i965/fs: clamp exec_size when an instruction has a scalar DF source

Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: double regioning parameters and execsize for DF in IVB/BYT
Juan A. Suarez Romero [Mon, 18 Jul 2016 07:27:56 +0000 (07:27 +0000)]
i965/fs: double regioning parameters and execsize for DF in IVB/BYT

On IVB and BYT, both regioning parameters and execution sizes are measured in
32-bit elements.

So when we have something like:

mov(8) g2<1>DF g3<4,4,1>DF

We are not actually moving 8 doubles (our intention), but 4 doubles.

We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
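
A minimal sketch of the adjustment, assuming plain element counts rather than the hardware encodings the driver actually manipulates. The struct and function names are hypothetical, and the sketch accepts only unit horizontal stride, since doubling hstride makes no difference according to the paragraph above.

```c
#include <assert.h>

/* Hypothetical, simplified view of an instruction's execution size and one
 * source region <vstride; width, hstride>, all counted in elements. */
struct sketch_region_params {
   unsigned exec_size;
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Re-express DF parameters in 32-bit element units, as IVB/BYT expect.
 * type_size is the operand element size in bytes. */
static struct sketch_region_params
sketch_ivb_adjust_for_df(struct sketch_region_params p, unsigned type_size)
{
   if (type_size != 8)
      return p;               /* only 64-bit operands need the adjustment */

   /* hstride is deliberately left alone; this sketch handles unit stride only. */
   assert(p.hstride == 1);

   p.exec_size *= 2;
   p.width *= 2;
   p.vstride *= 2;
   return p;
}
```

For example, a 4-wide DF move with a <4,4,1> source comes out as execution size 8 with an <8,8,1> region.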

v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)

v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).

v4:
- Fix comment (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965/fs: add helper to retrieve instruction execution type
Juan A. Suarez Romero [Mon, 18 Jul 2016 07:17:39 +0000 (07:17 +0000)]
i965/fs: add helper to retrieve instruction execution type

The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
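
The idea can be sketched as taking the maximum over the operand type sizes. This is a simplification with hypothetical names: the real helper works on register types and handles cases (no valid source, integer vector types) that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Execution data size: the largest operand type size, in bytes. */
static unsigned
sketch_exec_data_size(const unsigned *operand_sizes, size_t count)
{
   assert(operand_sizes != NULL && count > 0);
   unsigned max = 0;
   for (size_t i = 0; i < count; i++) {
      if (operand_sizes[i] > max)
         max = operand_sizes[i];
   }
   return max;
}

/* An instruction "deals with DF" in this sketch if any operand is 8 bytes. */
static int
sketch_is_df_instruction(const unsigned *operand_sizes, size_t count)
{
   return sketch_exec_data_size(operand_sizes, count) == 8;
}
```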

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agoi965: Handle IVB DF differences in the validator.
Matt Turner [Fri, 20 Jan 2017 21:35:31 +0000 (13:35 -0800)]
i965: Handle IVB DF differences in the validator.

On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.

v2 (Sam):
- Add comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965/disasm: also print nibctrl in IVB for execsize=8
Iago Toral Quiroga [Fri, 22 Jul 2016 11:36:25 +0000 (13:36 +0200)]
i965/disasm: also print nibctrl in IVB for execsize=8

4-wide DF operations where NibCtrl applies require an execsize of 8
in IvyBridge/BayTrail.

v2:
- Refactor NibCtrl printing (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agonir: Destination component count of shader_clock intrinsic is 2
Boyan Ding [Wed, 12 Apr 2017 13:14:22 +0000 (21:14 +0800)]
nir: Destination component count of shader_clock intrinsic is 2

This fixes the following error when using ARB_shader_clock on i965:
vec1 32 ssa_0 = intrinsic shader_clock () () ()
intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */
error: src->ssa->num_components == num_components (nir/nir_validate.c:204)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
7 years agoradeonsi: add missing initialization for userptr buffers
Nicolai Hähnle [Wed, 12 Apr 2017 15:05:56 +0000 (17:05 +0200)]
radeonsi: add missing initialization for userptr buffers

Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).

Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.

This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto

Fixes: e077c5fe6579 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradv: remove the temp descriptor set infrastructure
Fredrik Höglund [Thu, 13 Apr 2017 22:27:00 +0000 (00:27 +0200)]
radv: remove the temp descriptor set infrastructure

It is no longer used.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: use push descriptors in meta
Fredrik Höglund [Thu, 13 Apr 2017 22:26:59 +0000 (00:26 +0200)]
radv: use push descriptors in meta

Use push descriptors instead of temp descriptor sets.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: add private push descriptors for meta
Fredrik Höglund [Thu, 13 Apr 2017 22:26:58 +0000 (00:26 +0200)]
radv: add private push descriptors for meta

This allows meta to use push descriptors without disturbing user
push descriptors.

radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoanv/blorp: Properly handle VK_ATTACHMENT_UNUSED
Jason Ekstrand [Thu, 6 Apr 2017 21:15:55 +0000 (14:15 -0700)]
anv/blorp: Properly handle VK_ATTACHMENT_UNUSED

The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description.  The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere.  This commit should
fix all of them.
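
The pattern for walking an attachment list looks roughly like the sketch below. It is not the anv code: the function name is hypothetical, and the constant is redefined locally with the value the Vulkan headers give VK_ATTACHMENT_UNUSED (~0U) so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for VK_ATTACHMENT_UNUSED, which the Vulkan headers
 * define as (~0U). */
#define SKETCH_ATTACHMENT_UNUSED (~0u)

/* Copy the attachment indices a subpass really references into out and
 * return how many were written. Unused slots are skipped instead of being
 * used as an index. */
static size_t
sketch_collect_used_attachments(const uint32_t *refs, size_t ref_count,
                                uint32_t attachment_count, uint32_t *out)
{
   size_t used = 0;
   for (size_t i = 0; i < ref_count; i++) {
      if (refs[i] == SKETCH_ATTACHMENT_UNUSED)
         continue;            /* nothing to clear or resolve for this slot */
      assert(refs[i] < attachment_count);
      out[used++] = refs[i];
   }
   return used;
}
```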

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
7 years agoanv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Jason Ekstrand [Fri, 7 Apr 2017 17:33:25 +0000 (10:33 -0700)]
anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
7 years agoanv/cmd_buffer: Always set up a null surface state
Jason Ekstrand [Fri, 7 Apr 2017 17:31:01 +0000 (10:31 -0700)]
anv/cmd_buffer: Always set up a null surface state

We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
7 years agoradeonsi: cope with missing disassembly
Nicolai Hähnle [Fri, 31 Mar 2017 11:03:03 +0000 (13:03 +0200)]
radeonsi: cope with missing disassembly

For robustness and testing purposes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/ddebug: dump missing members of pipe_draw_info
Nicolai Hähnle [Fri, 7 Apr 2017 14:14:52 +0000 (16:14 +0200)]
gallium/ddebug: dump missing members of pipe_draw_info

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: enable ARB_shader_viewport_layer_array
Nicolai Hähnle [Thu, 13 Apr 2017 20:16:26 +0000 (22:16 +0200)]
radeonsi: enable ARB_shader_viewport_layer_array

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agoradeonsi: handle ignored LAYER and VIEWPORT_INDEX writes
Nicolai Hähnle [Thu, 13 Apr 2017 20:14:20 +0000 (22:14 +0200)]
radeonsi: handle ignored LAYER and VIEWPORT_INDEX writes

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agost/mesa: enable ARB_shader_viewport_layer_array
Nicolai Hähnle [Thu, 13 Apr 2017 19:47:00 +0000 (21:47 +0200)]
st/mesa: enable ARB_shader_viewport_layer_array

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agotgsi: clarify TGSI_SEMANTIC_{LAYER,VIEWPORT_INDEX}
Nicolai Hähnle [Thu, 13 Apr 2017 20:13:55 +0000 (22:13 +0200)]
tgsi: clarify TGSI_SEMANTIC_{LAYER,VIEWPORT_INDEX}

Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agogallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT
Nicolai Hähnle [Thu, 13 Apr 2017 19:54:54 +0000 (21:54 +0200)]
gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agoconfigure.ac: add --enable-sanitize option
Nicolai Hähnle [Mon, 3 Apr 2017 09:17:48 +0000 (11:17 +0200)]
configure.ac: add --enable-sanitize option

Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
and linker.

In addition, this also disables checking for undefined symbols: running
the address sanitizer requires additional symbols which should be provided
by a preloaded libasan.so (preloaded for hooking into malloc & friends
globally), and the undefined symbols check gets tripped up by that.

Running the tests works normally via `make check`, but shows additional
failures with the address sanitizer due to memory leaks that seem to be
mostly leaks in the tests themselves. I believe those failures should
really be fixed. In the meantime, you can set

export ASAN_OPTIONS=detect_leaks=0

to only check for more serious error types.

v2:
- fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)

Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoanv/cmd_buffer: Flush the VF cache at the top of all primaries
Jason Ekstrand [Fri, 31 Mar 2017 22:33:39 +0000 (15:33 -0700)]
anv/cmd_buffer: Flush the VF cache at the top of all primaries

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv/blorp: Flush the texture cache in UpdateBuffer
Jason Ekstrand [Fri, 31 Mar 2017 22:33:51 +0000 (15:33 -0700)]
anv/blorp: Flush the texture cache in UpdateBuffer

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv: Limit VkDeviceMemory objects to 2GB
Jason Ekstrand [Tue, 11 Apr 2017 15:33:19 +0000 (08:33 -0700)]
anv: Limit VkDeviceMemory objects to 2GB

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
7 years agointel/blorp: Add a blorp_emit_dynamic macro
Jason Ekstrand [Sat, 10 Sep 2016 21:15:51 +0000 (14:15 -0700)]
intel/blorp: Add a blorp_emit_dynamic macro

This makes it much easier to throw together a bit of dynamic state.  It
also automatically handles flushing so you don't accidentally forget.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
7 years agoswr: Enable MSAA in OpenSWR software renderer
Bruce Cherniak [Thu, 13 Apr 2017 22:40:11 +0000 (17:40 -0500)]
swr: Enable MSAA in OpenSWR software renderer

This patch enables multisample antialiasing in the OpenSWR software renderer.

MSAA is a proof-of-concept/work-in-progress with bug fixes and performance
on the way.  We wanted to get the changes out now to allow several customers
to begin experimenting with MSAA in a software renderer.  So as not to
impact current customers, MSAA is turned off by default - previous
functionality and performance remain intact.  It is easily enabled via
environment variables, as described below.

It has only been tested with the glx-lib winsys.  The intention is to
enable other state-trackers, both Windows and Linux and more fully support
FBOs.

There are 2 environment variables that affect behavior:

* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed
  for MSAA... Beware, results will vary.  This is mainly for testing.

* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of
  samples (1,2,4,8,16), or 0 to disable MSAA altogether.
  (The default is currently 0.)
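
A rough sketch of how such a variable could be read and validated. This is an assumption about the behavior, not the driver's code: the function names are hypothetical, and treating an unset or invalid value as 0 (MSAA disabled) is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a sample count string. Accepts 1, 2, 4, 8 or 16; anything else,
 * including NULL, maps to 0 (MSAA disabled) in this sketch. */
static unsigned
sketch_parse_max_sample_count(const char *value)
{
   if (value == NULL)
      return 0;

   char *end;
   long n = strtol(value, &end, 10);
   if (end == value || *end != '\0')
      return 0;               /* not a plain decimal number */

   switch (n) {
   case 1: case 2: case 4: case 8: case 16:
      return (unsigned)n;
   default:
      return 0;
   }
}

/* Read the limit from the environment variable named in the text above. */
static unsigned
sketch_max_sample_count_from_env(void)
{
   unsigned n = sketch_parse_max_sample_count(getenv("SWR_MSAA_MAX_SAMPLE_COUNT"));
   assert(n <= 16);
   return n;
}
```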

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
7 years agoswr: Removed unnecessary PIPE_BIND flags from swr_is_format_supported
Bruce Cherniak [Wed, 12 Apr 2017 23:53:01 +0000 (18:53 -0500)]
swr: Removed unnecessary PIPE_BIND flags from swr_is_format_supported

Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of a check on the single PIPE_BIND_DISPLAY_TARGET flag.

Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd>

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: Align swr_context allocation to SIMD alignment.
Bruce Cherniak [Wed, 12 Apr 2017 23:43:25 +0000 (18:43 -0500)]
swr: Align swr_context allocation to SIMD alignment.

The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state).  Failure to align
can result in segv crash on unaligned memory access in vector
instructions.
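
As a generic illustration of the requirement, here is an aligned, zeroed allocation using C11 aligned_alloc. It is a sketch only: the helper name and the 32-byte alignment in the test are assumptions, and the driver uses its own allocation helpers rather than this function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate zeroed memory whose address is a multiple of alignment.
 * alignment must be a power of two. Release the result with free(). */
static void *
sketch_alloc_aligned_zeroed(size_t size, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

   /* aligned_alloc expects the size to be a multiple of the alignment. */
   size_t rounded = (size + alignment - 1) / alignment * alignment;

   void *ptr = aligned_alloc(alignment, rounded);
   if (ptr != NULL)
      memset(ptr, 0, rounded);
   return ptr;
}
```

A plain malloc or calloc only guarantees alignment suitable for the fundamental types, which can be less than what aligned vector loads and stores need.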

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: update gallium driver docs
Tim Rowley [Thu, 13 Apr 2017 18:10:18 +0000 (13:10 -0500)]
swr: update gallium driver docs

v2: add back scons section, mention additional built swr libraries

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoradv: remove irrelevant comment
Grazvydas Ignotas [Fri, 14 Apr 2017 16:54:35 +0000 (19:54 +0300)]
radv: remove irrelevant comment

A leftover from anv.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: report timestampPeriod correctly
Grazvydas Ignotas [Fri, 14 Apr 2017 17:00:26 +0000 (20:00 +0300)]
radv: report timestampPeriod correctly

The kernel returns the frequency in kHz, so to convert it to the nanosecond
interval that Vulkan uses, the dividend should be 1000000.0 and not
100000.0.
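
The arithmetic, spelled out: a frequency of f kHz is f * 1000 ticks per second, so one tick lasts 1e9 / (f * 1000) = 1e6 / f nanoseconds. A small sketch with a hypothetical function name:

```c
#include <assert.h>

/* Nanoseconds per timestamp tick for a clock frequency given in kHz. */
static double
sketch_timestamp_period_ns(unsigned freq_khz)
{
   assert(freq_khz > 0);
   return 1000000.0 / (double)freq_khz;
}
```

For example, a 100 MHz clock (100000 kHz) gives a period of 10 ns; with the old dividend it would have been reported as 1 ns.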

This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.

Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agonir/print: add compute shader info
Rob Clark [Thu, 6 Apr 2017 15:56:23 +0000 (11:56 -0400)]
nir/print: add compute shader info

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
7 years agogallium/docs: small correction about register files for atomics
Rob Clark [Wed, 12 Apr 2017 15:47:22 +0000 (11:47 -0400)]
gallium/docs: small correction about register files for atomics

These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agofreedreno: enable draw/batch reordering by default
Rob Clark [Fri, 7 Apr 2017 14:02:53 +0000 (10:02 -0400)]
freedreno: enable draw/batch reordering by default

Probably should have flipped the switch a long time ago, since it
doesn't seem to cause any problems and is a nice perf boost in a number
of cases.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/ir3: small re-order
Rob Clark [Wed, 5 Apr 2017 20:02:36 +0000 (16:02 -0400)]
freedreno/ir3: small re-order

Small re-order of the switch statement to handle op-code categories in
order.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/ir3: move 'keeps' to block level
Rob Clark [Wed, 5 Apr 2017 00:29:53 +0000 (20:29 -0400)]
freedreno/ir3: move 'keeps' to block level

For things like SSBOs and atomics we'll want to track this at a block
level.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/ir3: convert dynamic arrays to ralloc
Rob Clark [Wed, 5 Apr 2017 00:22:57 +0000 (20:22 -0400)]
freedreno/ir3: convert dynamic arrays to ralloc

Want to move one of these under ir3_block, so that gives a reason to
migrate the remaining malloc/realloc to ralloc.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agoswr: add linux to scons build
George Kyriazis [Thu, 13 Apr 2017 13:44:08 +0000 (08:44 -0500)]
swr: add linux to scons build

Make swr compile for both linux and windows.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoradv: make sizes & offsets 32 bit in radv_descriptor_update_template_entry.
Bas Nieuwenhuizen [Thu, 13 Apr 2017 21:49:00 +0000 (23:49 +0200)]
radv: make sizes & offsets 32 bit in radv_descriptor_update_template_entry.

v2: Also convert the calculations.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
7 years agodocs: Update MESA_shader_integer_functions spec to version 3.
Kenneth Graunke [Thu, 13 Apr 2017 16:28:10 +0000 (09:28 -0700)]
docs: Update MESA_shader_integer_functions spec to version 3.

When publishing this spec on the OpenGL ES registry, Jon Leech noticed
that it didn't actually mention what the ES dependencies and
interactions were.  I looked at extensions_table.h and noted that we
expose it in ES 3.0 contexts, and he added the obvious spec texts.

The updated copy also contains our official extension number.

https://github.com/KhronosGroup/OpenGL-Registry/issues/3

Acked-by: Matt Turner <mattst88@gmail.com>
7 years agoradv: Set descriptor set limits.
Bas Nieuwenhuizen [Thu, 13 Apr 2017 20:34:33 +0000 (22:34 +0200)]
radv: Set descriptor set limits.

Properly and with comments this time.

Signed-off-by: Bas Nieuwenhuizen <bansi@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Increase integer sizes in descriptor sets.
Bas Nieuwenhuizen [Thu, 13 Apr 2017 20:18:35 +0000 (22:18 +0200)]
radv: Increase integer sizes in descriptor sets.

Needed if we want to allow them taking more than 64 KiB. The calculations
of these already used 32 bits.
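
The limit comes from plain integer truncation, assuming the fields were previously 16 bits wide. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* What happens when a 32-bit size or offset is stored in a 16-bit field:
 * everything above 65535 wraps around. */
static uint16_t
sketch_truncate_to_u16(uint32_t value)
{
   return (uint16_t)value;
}

/* A checked variant that refuses values that do not fit. */
static uint16_t
sketch_narrow_checked(uint32_t value)
{
   assert(value <= UINT16_MAX);
   return (uint16_t)value;
}
```

An offset of 70000 bytes, for instance, would silently become 4464 (70000 - 65536).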

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: support S8_UINT as a depth/stencil format.
Dave Airlie [Thu, 13 Apr 2017 19:34:26 +0000 (05:34 +1000)]
radv: support S8_UINT as a depth/stencil format.

This enables a bunch of NotSupported CTS tests.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: bump maxGeometryShaderInvocations.
Dave Airlie [Thu, 13 Apr 2017 19:28:52 +0000 (05:28 +1000)]
radv: bump maxGeometryShaderInvocations.

This bumps it to the same level as amdgpu-pro. It also
moves a bunch of dEQP-VK.geometry.instanced.* from
NotSupported to Pass.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agost/nine: Fix support for ps 1.4 dw and dz modifiers
Axel Davy [Sun, 26 Mar 2017 20:57:15 +0000 (22:57 +0200)]
st/nine: Fix support for ps 1.4 dw and dz modifiers

RCP was used incorrectly to support NINED3DSPSM_DW and
NINED3DSPSM_DZ. src.x was used as input instead of src.w
or src.z.
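
In scalar terms the fix amounts to taking the reciprocal of the right component. The sketch below assumes the modifiers divide x and y by the named component and leaves z and w untouched; it uses hypothetical names and is not the st/nine code.

```c
#include <assert.h>

struct sketch_vec4 {
   float x, y, z, w;
};

/* _dw: scale x and y by the reciprocal of w (not of x, which was the bug). */
static struct sketch_vec4
sketch_apply_dw(struct sketch_vec4 src)
{
   assert(src.w != 0.0f);
   float rcp = 1.0f / src.w;
   src.x *= rcp;
   src.y *= rcp;
   return src;
}

/* _dz: scale x and y by the reciprocal of z. */
static struct sketch_vec4
sketch_apply_dz(struct sketch_vec4 src)
{
   assert(src.z != 0.0f);
   float rcp = 1.0f / src.z;
   src.x *= rcp;
   src.y *= rcp;
   return src;
}
```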

Fixes: https://github.com/iXit/Mesa-3D/issues/271
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
7 years agoclover: Add missing include to compat header
Jan Vesely [Thu, 13 Apr 2017 16:20:21 +0000 (12:20 -0400)]
clover: Add missing include to compat header

Fixes build failure with LLVM 4

Fixes: a981e68c26dc4079a335101da0033185030207f6
(clover: Fix build against clang SVN >= r299965)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agogallium/radeon: never use staging buffers with AMD_pinned_memory
Nicolai Hähnle [Wed, 12 Apr 2017 10:41:05 +0000 (12:41 +0200)]
gallium/radeon: never use staging buffers with AMD_pinned_memory

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agoradeonsi: fix gl_BaseVertex in non-indexed draws
Nicolai Hähnle [Wed, 12 Apr 2017 09:01:19 +0000 (11:01 +0200)]
radeonsi: fix gl_BaseVertex in non-indexed draws

gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
way they're implemented, the VGT always generates indices starting at 0,
and the VS prolog adds the start index.

There's a VGT_INDX_OFFSET register which causes the VGT to start at a
driver-defined index. However, this register cannot be written from
indirect draws.

So fix this unlikely case by setting a bit to tell the VS whether the
draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly
when used.
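
In C terms, the shader-side adjustment amounts to something like the following. The bit position, macro and function names are hypothetical; only the idea of a per-draw "indexed" bit comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit in the VS state word telling the shader whether the
 * current draw is indexed. */
#define SKETCH_VS_STATE_INDEXED (1u << 0)

/* The value added to the VGT-generated index is the start index for
 * non-indexed draws and the base vertex for indexed draws. gl_BaseVertex
 * must only expose it in the indexed case. */
static int32_t
sketch_gl_base_vertex(uint32_t vs_state, int32_t start_or_base_vertex)
{
   return (vs_state & SKETCH_VS_STATE_INDEXED) ? start_or_base_vertex : 0;
}

/* Small usage example. */
void
sketch_base_vertex_example(void)
{
   assert(sketch_gl_base_vertex(SKETCH_VS_STATE_INDEXED, 5) == 5);
   assert(sketch_gl_base_vertex(0, 5) == 0);
}
```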

Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: provide VS_STATE input to all VS variants
Nicolai Hähnle [Wed, 12 Apr 2017 08:46:22 +0000 (10:46 +0200)]
radeonsi: provide VS_STATE input to all VS variants

v2: fix incorrect change in get_tcs_out_patch_stride

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: change the bit-packing of LS out/TCS in data
Nicolai Hähnle [Wed, 12 Apr 2017 08:16:07 +0000 (10:16 +0200)]
radeonsi: change the bit-packing of LS out/TCS in data

Avoid conflicts when merging various VS state bits.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: emit VS_STATE register explicitly from si_draw_vbo
Nicolai Hähnle [Wed, 12 Apr 2017 08:00:18 +0000 (10:00 +0200)]
radeonsi: emit VS_STATE register explicitly from si_draw_vbo

We will merge other derived state information into this register.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: extract derived tess state emit to higher level
Nicolai Hähnle [Wed, 12 Apr 2017 07:40:28 +0000 (09:40 +0200)]
radeonsi: extract derived tess state emit to higher level

Especially with subsequent changes, this makes it easier to see the
sequence of state emits at the higher level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: drop support for TGSI_SEMANTIC_VERTEXID_NOBASE
Nicolai Hähnle [Wed, 12 Apr 2017 08:58:37 +0000 (10:58 +0200)]
radeonsi: drop support for TGSI_SEMANTIC_VERTEXID_NOBASE

It is unused.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradv: Add more trace points.
Bas Nieuwenhuizen [Wed, 12 Apr 2017 22:06:48 +0000 (00:06 +0200)]
radv: Add more trace points.

Most trace points happen after an operation, so add a trace point
at the start of the command buffer.

Furthermore, add one after a CmdUpdateBuffer using CP_DMA as that
didn't emit one yet.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Ignore CmdUpdateBuffer with size 0.
Bas Nieuwenhuizen [Wed, 12 Apr 2017 22:04:23 +0000 (00:04 +0200)]
radv: Ignore CmdUpdateBuffer with size 0.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Enable query inheritance.
Bas Nieuwenhuizen [Wed, 12 Apr 2017 21:17:14 +0000 (23:17 +0200)]
radv: Enable query inheritance.

timestamp and pipeline_statistics only do something on begin & end,
so they don't need any action.

Occlusion queries only do something to enable/disable and that
register is set nowhere else so that doesn't need extra support either.
(We technically should fix it to update the reg with the number of
 samples, but that hasn't happened yet, so we only change it to
 enable/disable counting)

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: enable variableMultisampleRate.
Bas Nieuwenhuizen [Wed, 12 Apr 2017 21:29:58 +0000 (23:29 +0200)]
radv: enable variableMultisampleRate.

This is only relevant with 0 attachments. In that case we do nothing
on subpass switch already, and the pipeline is the authoritative
source of the number of samples, so this shouldn't change anything.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agogallium/hud: set the dump file streams to line buffered
Edmondo Tommasina [Wed, 5 Apr 2017 19:03:55 +0000 (21:03 +0200)]
gallium/hud: set the dump file streams to line buffered

Flush the HUD value streams to the dump files after every newline.

v2: check that fopen succeeded  (Julien)
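
A minimal sketch of line buffering with the standard C library, assuming a
hypothetical helper open_line_buffered (this is not the HUD code itself):

```c
#include <stdio.h>

/* Hypothetical helper: open a dump file and make its stream line buffered. */
static FILE *open_line_buffered(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return NULL; /* only touch the stream if fopen succeeded */

    /* _IOLBF: the stream is flushed whenever a newline is written.
     * A NULL buffer lets the library allocate its own. */
    setvbuf(f, NULL, _IOLBF, 0);
    return f;
}
```

With line buffering, each value written with a trailing newline reaches the
file promptly, so the dump can be followed while the application runs.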

Reviewed-and-Tested-by: Julien Isorce <jisorce@oblong.com>
7 years agoradv: fix stencil regression since new addrlib import
Dave Airlie [Thu, 13 Apr 2017 04:36:26 +0000 (14:36 +1000)]
radv: fix stencil regression since new addrlib import

The addrlib import meant we'd return after we attempted
to set up the no-stencil bits for an S8_UINT. Now we break
and use the stencil level info when creating stencil DB
info.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: allocate thin textures as linear.
Dave Airlie [Thu, 13 Apr 2017 04:12:28 +0000 (14:12 +1000)]
radv: allocate thin textures as linear.

This is ported from radeonsi, and avoids the bug in the
addrlib code. This should probably be something addrlib
does for us, but for now this fixes the regression without
changing addrlib and aligns us with radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoi965: add missing ir_unop_*/ir_binop_* in visit_leave()
Samuel Pitoiset [Tue, 11 Apr 2017 12:50:39 +0000 (14:50 +0200)]
i965: add missing ir_unop_*/ir_binop_* in visit_leave()

Fixes the following Clang warnings.

brw_fs_channel_expressions.cpp:219:12: warning: enumeration values 'ir_unop_ballot', 'ir_unop_read_first_invocation', and 'ir_binop_read_invocation' not handled in switch [-Wswitch]
   switch (expr->operation) {
           ^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agost/mesa: fix wrong comparison in update_framebuffer_state()
Samuel Pitoiset [Tue, 11 Apr 2017 12:19:19 +0000 (14:19 +0200)]
st/mesa: fix wrong comparison in update_framebuffer_state()

state_tracker/st_atom_framebuffer.c:208:27: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
   if (framebuffer->width == UINT_MAX)
       ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~
state_tracker/st_atom_framebuffer.c:210:28: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
   if (framebuffer->height == UINT_MAX)
       ~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~
2 warnings generated.
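
The warnings above arise because a uint16_t can never hold 4294967295. A
minimal sketch of a comparison that can succeed, assuming the intended
sentinel is the largest 16-bit value (the helper name dimension_is_unset is
hypothetical and not taken from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* "width == UINT_MAX" is always false for a uint16_t: the field is
 * promoted before the comparison, but its value is at most 65535.
 * Comparing against the maximum of the field's own type can match. */
static bool dimension_is_unset(uint16_t value)
{
    return value == UINT16_MAX;
}
```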

Fixes: eb0fd0e5f86 ("gallium: decrease the size of pipe_framebuffer_state - 96 -> 80 bytes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agoradeon: fix duplicate 'const' specifier
Samuel Pitoiset [Tue, 11 Apr 2017 12:55:12 +0000 (14:55 +0200)]
radeon: fix duplicate 'const' specifier

Fixes the following Clang warning.

In file included from radeon_debug.c:32:
./radeon_common_context.h:500:19: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
extern const char const *radeonVendorString;

v2: - do not remove the duplicate 'const' qualifier, fix it
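
A short sketch of the difference, assuming the intended declaration is a
constant pointer to constant characters. The variable and function names are
hypothetical.

```c
/* Pointer to const char, and the pointer itself is const as well. */
static const char *const vendor_string = "example vendor";

/* "const char const *p" only repeats the qualifier on char: the
 * characters are const, but the pointer can still be reassigned,
 * exactly as with plain "const char *p". */
static const char *reassignable = "first";

static const char *swap_reassignable(void)
{
    reassignable = "second"; /* allowed: only the pointee is const */
    /* vendor_string = "x";     would not compile: the pointer is const */
    return reassignable;
}

static char vendor_initial(void)
{
    return vendor_string[0];
}
```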

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
7 years agosvga: remove unused vmw_dri1_intersect_src_bbox()
Samuel Pitoiset [Tue, 11 Apr 2017 12:33:13 +0000 (14:33 +0200)]
svga: remove unused vmw_dri1_intersect_src_bbox()

Fixes the following Clang warning.

vmw_screen_dri.c:130:1: warning: unused function 'vmw_dri1_intersect_src_bbox' [-Wunused-function]
vmw_dri1_intersect_src_bbox(struct drm_clip_rect *dst,
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agollvmpipe: remove unused subpixel_snap() and fixed_to_float()
Samuel Pitoiset [Tue, 11 Apr 2017 12:30:42 +0000 (14:30 +0200)]
llvmpipe: remove unused subpixel_snap() and fixed_to_float()

Fixes the following Clang warnings.

lp_setup_tri.c:55:1: warning: unused function 'subpixel_snap' [-Wunused-function]
subpixel_snap(float a)
^
lp_setup_tri.c:61:1: warning: unused function 'fixed_to_float' [-Wunused-function]
fixed_to_float(int a)
^

v2: - do not remove subpixel_snap() (use !PIPE_ARCH_SSE instead)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agosoftpipe: remove unused sp_exec_fragment_shader()
Samuel Pitoiset [Tue, 11 Apr 2017 12:42:39 +0000 (14:42 +0200)]
softpipe: remove unused sp_exec_fragment_shader()

Fixes the following Clang warning.

sp_fs_exec.c:56:1: warning: unused function 'sp_exec_fragment_shader' [-Wunused-function]
sp_exec_fragment_shader(const struct sp_fragment_shader_variant *var)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agosoftpipe: remove unused quad_shade_stage()
Samuel Pitoiset [Tue, 11 Apr 2017 12:32:19 +0000 (14:32 +0200)]
softpipe: remove unused quad_shade_stage()

Fixes the following Clang warning.

sp_quad_fs.c:60:1: warning: unused function 'quad_shade_stage' [-Wunused-function]
quad_shade_stage(struct quad_stage *qs)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agosoftpipe: remove unused get_texel_quad_2d()
Samuel Pitoiset [Tue, 11 Apr 2017 12:29:35 +0000 (14:29 +0200)]
softpipe: remove unused get_texel_quad_2d()

Fixes the following Clang warning.

sp_tex_sample.c:802:1: warning: unused function 'get_texel_quad_2d' [-Wunused-function]
get_texel_quad_2d(const struct sp_sampler_view *sp_sview,
^
  CC       sp_tile_cache.lo
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agotrace: remove some unused trace_dump_tag*() functions
Samuel Pitoiset [Tue, 11 Apr 2017 12:13:12 +0000 (14:13 +0200)]
trace: remove some unused trace_dump_tag*() functions

Fixes the following Clang warnings.

tr_dump.c:137:1: warning: unused function 'trace_dump_tag' [-Wunused-function]
trace_dump_tag(const char *name)
^
tr_dump.c:168:1: warning: unused function 'trace_dump_tag_begin2' [-Wunused-function]
trace_dump_tag_begin2(const char *name,
^
tr_dump.c:187:1: warning: unused function 'trace_dump_tag_begin3' [-Wunused-function]
trace_dump_tag_begin3(const char *name,
^
  CC       tr_texture.lo
3 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agodraw: remove unused wideline_stage()
Samuel Pitoiset [Tue, 11 Apr 2017 12:34:22 +0000 (14:34 +0200)]
draw: remove unused wideline_stage()

Fixes the following Clang warning.

draw/draw_pipe_wide_line.c:48:38: warning: unused function 'wideline_stage' [-Wunused-function]
static inline struct wideline_stage *wideline_stage( struct draw_stage *stage )
                                     ^
1 warning generated.

v2: - remove commented code (Roland Scheidegger)
v3: - remove half_line_width in the struct

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agodraw: remove unused overflow()
Samuel Pitoiset [Tue, 11 Apr 2017 12:10:09 +0000 (14:10 +0200)]
draw: remove unused overflow()

Fixes the following Clang warning.

draw/draw_pipe_vbuf.c:102:1: warning: unused function 'overflow' [-Wunused-function]
overflow( void *map, void *ptr, unsigned bytes, unsigned bufsz )
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agomesa: remove some unused functions in the perf monitor area
Samuel Pitoiset [Tue, 11 Apr 2017 12:05:17 +0000 (14:05 +0200)]
mesa: remove some unused functions in the perf monitor area

Fixes the following Clang warnings.

main/performance_monitor.c:157:1: warning: unused function 'index_to_queryid' [-Wunused-function]
index_to_queryid(GLuint index)
^
main/performance_monitor.c:163:1: warning: unused function 'queryid_valid' [-Wunused-function]
queryid_valid(const struct gl_context *ctx, GLuint queryid)
^
main/performance_monitor.c:169:1: warning: unused function 'counterid_to_index' [-Wunused-function]
counterid_to_index(GLuint counterid)
^
3 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agomesa: remove unused clamp_float_to_uint() and clamp_half_to_uint()
Samuel Pitoiset [Tue, 11 Apr 2017 12:03:00 +0000 (14:03 +0200)]
mesa: remove unused clamp_float_to_uint() and clamp_half_to_uint()

Fixes the following Clang warnings.

main/pack.c:470:1: warning: unused function 'clamp_float_to_uint' [-Wunused-function]
clamp_float_to_uint(GLfloat f)
^
main/pack.c:477:1: warning: unused function 'clamp_half_to_uint' [-Wunused-function]
clamp_half_to_uint(GLhalfARB h)
^
2 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agomesa: remove unused _mesa_unmarshal_BindBufferBase()
Samuel Pitoiset [Tue, 11 Apr 2017 12:01:51 +0000 (14:01 +0200)]
mesa: remove unused _mesa_unmarshal_BindBufferBase()

Fixes the following Clang warning.

main/marshal.c:209:1: warning: unused function '_mesa_unmarshal_BindBufferBase' [-Wunused-function]
_mesa_unmarshal_BindBufferBase(struct gl_context *ctx, const struct marshal_cmd_BindBufferBase *cmd)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agovirgl: add missing PIPE_CAP_DOUBLES
Samuel Pitoiset [Tue, 11 Apr 2017 11:54:44 +0000 (13:54 +0200)]
virgl: add missing PIPE_CAP_DOUBLES

Fixes the following Clang warning.

virgl_screen.c:60:12: warning: enumeration value 'PIPE_CAP_DOUBLES' not handled in switch [-Wswitch]
   switch (param) {
           ^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agoglsl: simplify apply_image_qualifier_to_variable()
Samuel Pitoiset [Wed, 12 Apr 2017 12:36:32 +0000 (14:36 +0200)]
glsl: simplify apply_image_qualifier_to_variable()

This removes one level of indentation and will improve readability
for bindless images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoglsl: add validate_fragment_flat_interpolation_input()
Samuel Pitoiset [Wed, 12 Apr 2017 10:47:47 +0000 (12:47 +0200)]
glsl: add validate_fragment_flat_interpolation_input()

Requested by Timothy Arceri.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agonvc0: Enable ARB_shader_ballot on Kepler+
Boyan Ding [Mon, 10 Apr 2017 14:56:05 +0000 (22:56 +0800)]
nvc0: Enable ARB_shader_ballot on Kepler+

readInvocationARB() and readFirstInvocationARB() need the SHFL.IDX
instruction, which was introduced in Kepler.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
Boyan Ding [Mon, 10 Apr 2017 14:56:04 +0000 (22:56 +0800)]
nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*

v2: Check if each channel is masked in TGSI_OPCODE_BALLOT (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
Boyan Ding [Mon, 10 Apr 2017 14:56:03 +0000 (22:56 +0800)]
nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Add SV_LANEMASK_* system values.
Boyan Ding [Mon, 10 Apr 2017 14:56:02 +0000 (22:56 +0800)]
nvc0/ir: Add SV_LANEMASK_* system values.

v2: Add name strings in nv50_ir_print.cpp (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
Boyan Ding [Mon, 10 Apr 2017 14:56:01 +0000 (22:56 +0800)]
nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE

The implementation of readFirstInvocationARB() on NVIDIA hardware needs a
ballotARB(true) to determine the first active thread. This is expressed
in gm107 asm as (supposing output is $r0):
vote any $r0 0x1 0x1

To model the always true input, which corresponds to the second 0x1
above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
"not 0x1" in the src field respectively.

v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: (Ilia Mirkin)
Make the handling more symmetric with predicate version in gm107
Use i->getSrc(s)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agogk110/ir: Emit OP_SHFL
Boyan Ding [Mon, 10 Apr 2017 14:56:00 +0000 (22:56 +0800)]
gk110/ir: Emit OP_SHFL

v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: Check the range of immediate in OP_SHFL (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Emit OP_SHFL
Boyan Ding [Mon, 10 Apr 2017 14:55:59 +0000 (22:55 +0800)]
nvc0/ir: Emit OP_SHFL

v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL

v3: (Ilia Mirkin)
Check the range of immediate value of OP_SHFL
Use the new setPDSTL API

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: Properly handle a "split form" of predicate destination
Boyan Ding [Mon, 10 Apr 2017 14:55:58 +0000 (22:55 +0800)]
nvc0/ir: Properly handle a "split form" of predicate destination

GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across the whole instruction. Use a dedicated setPDSTL
function instead of the original defId, which is incorrect in this case.
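
A minimal sketch of writing a 3-bit field whose bits are not contiguous. The
bit positions used here (bits 8-9 of the first word, bit 26 of the second) and
the name set_split_pred_dst are assumptions for illustration only; the real
positions are defined by the emitter.

```c
#include <stdint.h>

/* Hypothetical layout: the low two bits of the predicate index go to
 * bits 8-9 of word 0, the high bit goes to bit 26 of word 1. */
static void set_split_pred_dst(uint32_t code[2], unsigned pred)
{
    code[0] |= (uint32_t)(pred & 0x3) << 8;
    code[1] |= (uint32_t)((pred >> 2) & 0x1) << 26;
}
```

A helper that assumes one contiguous field would place the high bit next to
the low two, which is why a dedicated function is needed for this form.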

v2: (Ilia Mirkin)
Change API of setPDSTL() to handle cases of no output
Fix setting of the highest bit in setPDSTL()

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agogm107/ir: Emit third src 'bound' and optional predicate output of SHFL
Boyan Ding [Mon, 10 Apr 2017 14:55:57 +0000 (22:55 +0800)]
gm107/ir: Emit third src 'bound' and optional predicate output of SHFL

v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
    lowering (Samuel Pitoiset)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agoclover: Fix build against clang SVN >= r299965
Michel Dänzer [Wed, 12 Apr 2017 08:17:34 +0000 (17:17 +0900)]
clover: Fix build against clang SVN >= r299965

clang::LangAS::Offset is gone; the behaviour is as if it were 0.

v2: Introduce and use clover::llvm::compat::lang_as_offset (Francisco
    Jerez)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>