mesa.git
6 years agofreedreno: add non-draw batches for compute/blit
Rob Clark [Fri, 24 Nov 2017 15:37:22 +0000 (10:37 -0500)]
freedreno: add non-draw batches for compute/blit

Get rid of "gmem" (ie. tiling) ringbuffer, and just emit setup commands
directly to "draw" ringbuffer for compute (and in future for blits not
using the 3d pipe).  This way we can have a simple flat cmdstream buffer
and bypass setup related to 3d pipe.

Signed-off-by: Rob Clark <robdclark@gmail.com>
6 years agofreedreno: track staging and shadow perf ctrs for the HUD
Rob Clark [Tue, 21 Nov 2017 18:20:53 +0000 (13:20 -0500)]
freedreno: track staging and shadow perf ctrs for the HUD

Signed-off-by: Rob Clark <robdclark@gmail.com>
6 years agofreedreno: staging upload transfers
Rob Clark [Mon, 20 Nov 2017 20:34:40 +0000 (15:34 -0500)]
freedreno: staging upload transfers

In the busy && !needs_flush case, we can support a DISCARD_RANGE upload
using a staging buffer.  This is a bit different from the case of mid-
batch uploads which require us to shadow the whole resource (because
later draws in an earlier tile happen before earlier draws in a later
tile).

Signed-off-by: Rob Clark <robdclark@gmail.com>
6 years agofreedreno: update generated headers
Rob Clark [Sat, 25 Nov 2017 19:10:34 +0000 (14:10 -0500)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@gmail.com>
6 years agoanv: Remove unused variable.
Bas Nieuwenhuizen [Sat, 16 Dec 2017 21:02:11 +0000 (22:02 +0100)]
anv: Remove unused variable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoradeonsi: don't call force_dcc_off for buffers
Marek Olšák [Tue, 12 Dec 2017 21:21:13 +0000 (22:21 +0100)]
radeonsi: don't call force_dcc_off for buffers

This was undefined yet harmless behavior in LLVM.
Not anymore - it causes a hang now.

Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
6 years agoisl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.
Kenneth Graunke [Fri, 15 Dec 2017 00:17:45 +0000 (16:17 -0800)]
isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.

According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only.  We choose to apply
it to Ivybridge and Baytrail, but not Haswell.

Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.

Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoradeon/uvd: add and manage render picture list
Boyuan Zhang [Fri, 15 Dec 2017 16:23:25 +0000 (11:23 -0500)]
radeon/uvd: add and manage render picture list

Create a list in the decoder to store all render picture buffer pointers that
are currently being used in reference picture lists.

During the get message buffer call, check each pointer in render_pic_list[]
against the given pic->ref[] list, and remove any pointer that is no longer
used by pic->ref[]. Then add the current render surface pointer to
render_pic_list[] and assign the associated index to result.curr_idx.

As a result, result.curr_idx will have the correct index to represent the
current render picture, instead of the previous incrementing values.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
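
The list management described in this commit message can be sketched roughly
as follows. This is only an illustration under stated assumptions, not the
driver code: MAX_RENDER_PICS, in_ref_list() and update_render_pic_list() are
hypothetical names, and the real decoder may size and search its list
differently. The returned slot plays the role of result.curr_idx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RENDER_PICS 16 /* hypothetical capacity, not taken from the driver */

/* Return true if ptr appears in refs[0..num_refs). */
static bool in_ref_list(const void *ptr, const void *const *refs, size_t num_refs)
{
    for (size_t i = 0; i < num_refs; i++) {
        if (refs[i] == ptr)
            return true;
    }
    return false;
}

/*
 * Drop every stored pointer that is no longer in the reference list, then
 * store the current render picture and return its slot index (-1 if the
 * list is full).
 */
static int update_render_pic_list(const void *render_pic_list[MAX_RENDER_PICS],
                                  const void *const *refs, size_t num_refs,
                                  const void *cur)
{
    assert(cur != NULL);

    /* Pass 1: release slots whose picture is no longer referenced. */
    for (int i = 0; i < MAX_RENDER_PICS; i++) {
        if (render_pic_list[i] != NULL &&
            !in_ref_list(render_pic_list[i], refs, num_refs))
            render_pic_list[i] = NULL;
    }

    /* Pass 2: reuse the slot if the current picture is already tracked. */
    for (int i = 0; i < MAX_RENDER_PICS; i++) {
        if (render_pic_list[i] == cur)
            return i;
    }

    /* Pass 3: otherwise take the first free slot. */
    for (int i = 0; i < MAX_RENDER_PICS; i++) {
        if (render_pic_list[i] == NULL) {
            render_pic_list[i] = cur;
            return i;
        }
    }
    return -1;
}
```

Because freed slots are reused, the index stays bounded by the number of live
reference pictures instead of growing with every decoded frame.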
6 years agoradeon/vcn: add and manage render picture list
Boyuan Zhang [Fri, 15 Dec 2017 16:17:32 +0000 (11:17 -0500)]
radeon/vcn: add and manage render picture list

Create a list in the decoder to store all render picture buffer pointers that
are currently being used in reference picture lists.

During the get message buffer call, check each pointer in render_pic_list[]
against the given pic->ref[] list, and remove any pointer that is no longer
used by pic->ref[]. Then add the current render surface pointer to
render_pic_list[] and assign the associated index to result.curr_idx.

As a result, result.curr_idx will have the correct index to represent the
current render picture, instead of the previous incrementing values.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agovl: remove is idr flag
Boyuan Zhang [Thu, 7 Dec 2017 21:13:51 +0000 (16:13 -0500)]
vl: remove is idr flag

Remove the is_idr flag since it is no longer used.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agost/va: directly use idr pic flag
Boyuan Zhang [Fri, 8 Dec 2017 23:22:25 +0000 (18:22 -0500)]
st/va: directly use idr pic flag

Remove the is_idr flag and use the idr_pic_flag provided by VAAPI directly.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agoradeon/vce: determine idr by pic type
Boyuan Zhang [Thu, 7 Dec 2017 21:10:13 +0000 (16:10 -0500)]
radeon/vce: determine idr by pic type

The VAAPI encode interface provides IDR frame flags, whereas the OMX
interface doesn't. Therefore, use the picture type to determine IDR frames,
which works for both interfaces.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
6 years agoradeon/vcn: determine idr by pic type
Boyuan Zhang [Thu, 30 Nov 2017 16:58:32 +0000 (11:58 -0500)]
radeon/vcn: determine idr by pic type

The VAAPI encode interface provides IDR frame flags, whereas the OMX
interface doesn't. Therefore, use the picture type to determine IDR frames,
which works for both interfaces.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agoutil: scons: wire up the sha1 test
Emil Velikov [Thu, 14 Dec 2017 17:20:30 +0000 (17:20 +0000)]
util: scons: wire up the sha1 test

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
6 years agoswr/rast: Move more RTAI handling out of binner
Tim Rowley [Thu, 14 Dec 2017 19:49:56 +0000 (13:49 -0600)]
swr/rast: Move more RTAI handling out of binner

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle
Tim Rowley [Thu, 14 Dec 2017 19:39:29 +0000 (13:39 -0600)]
swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix cache of API thread event manager
Tim Rowley [Wed, 13 Dec 2017 23:52:52 +0000 (17:52 -0600)]
swr/rast: Fix cache of API thread event manager

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Replace VPSRL with LSHR
Tim Rowley [Tue, 12 Dec 2017 20:23:50 +0000 (14:23 -0600)]
swr/rast: Replace VPSRL with LSHR

Replace use of x86 intrinsic with general llvm IR instruction.

Generates the same final assembly.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Rework thread binding parameters for machine partitioning
Tim Rowley [Mon, 11 Dec 2017 23:45:58 +0000 (17:45 -0600)]
swr/rast: Rework thread binding parameters for machine partitioning

Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to
SwrCreateContext.

Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to
control reservation of API threads.

Add SwrBindApiThread() function to allow binding of API threads to
reserved HW threads.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Pull of RTAI gather & offset out of clip/bin code
Tim Rowley [Mon, 11 Dec 2017 21:51:46 +0000 (15:51 -0600)]
swr/rast: Pull of RTAI gather & offset out of clip/bin code

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Remove no-op VBROADCAST of vID
Tim Rowley [Mon, 11 Dec 2017 14:38:46 +0000 (08:38 -0600)]
swr/rast: Remove no-op VBROADCAST of vID

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components
Tim Rowley [Mon, 11 Dec 2017 05:54:30 +0000 (23:54 -0600)]
swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components

Also widen the 16-bit and 8-bit integer vertex component gathers to SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle
Tim Rowley [Fri, 8 Dec 2017 23:33:23 +0000 (17:33 -0600)]
swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components
Tim Rowley [Fri, 8 Dec 2017 19:59:19 +0000 (13:59 -0600)]
swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components
Tim Rowley [Fri, 8 Dec 2017 00:37:07 +0000 (18:37 -0600)]
swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Pass prim to ClipSimd
Tim Rowley [Thu, 7 Dec 2017 23:54:40 +0000 (17:54 -0600)]
swr/rast: Pass prim to ClipSimd

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Pull most of the VPAI manipulation out of the binner/clipper
Tim Rowley [Thu, 7 Dec 2017 17:59:45 +0000 (11:59 -0600)]
swr/rast: Pull most of the VPAI manipulation out of the binner/clipper

Move out of binner/clipper; hand them down from the frontend code instead.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Move GatherScissors to header
Tim Rowley [Wed, 6 Dec 2017 18:07:59 +0000 (12:07 -0600)]
swr/rast: Move GatherScissors to header

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Rewrite Shuffle8bpcGatherd using shuffle
Tim Rowley [Wed, 6 Dec 2017 16:37:41 +0000 (10:37 -0600)]
swr/rast: Rewrite Shuffle8bpcGatherd using shuffle

Ease future code maintenance, prepare for folding simd8 and simd16 versions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Convert gather masks to Nx1bit
Tim Rowley [Mon, 4 Dec 2017 21:16:13 +0000 (15:16 -0600)]
swr/rast: Convert gather masks to Nx1bit

Simplifies calling code, gets gather function interface closer to llvm's
masked_gather.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: WIP - Widen fetch shader to SIMD16
Tim Rowley [Mon, 4 Dec 2017 00:49:29 +0000 (18:49 -0600)]
swr/rast: WIP - Widen fetch shader to SIMD16

Widen vertex gather/storage to SIMD16 for all component types.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Corrections to multi-scissor handling
Tim Rowley [Wed, 29 Nov 2017 21:14:20 +0000 (15:14 -0600)]
swr/rast: Corrections to multi-scissor handling

binner's GatherScissors() will be turned into a real gather in the not
too distant future.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Binner fixes for viewport index offset handling
Tim Rowley [Wed, 29 Nov 2017 16:46:49 +0000 (10:46 -0600)]
swr/rast: Binner fixes for viewport index offset handling

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Remove unneeded copy of gather mask
Tim Rowley [Tue, 21 Nov 2017 17:05:08 +0000 (11:05 -0600)]
swr/rast: Remove unneeded copy of gather mask

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoi965: Allow old begin/end queryobj for gen4/5 with HW contexts
Chris Wilson [Thu, 23 Nov 2017 09:57:08 +0000 (09:57 +0000)]
i965: Allow old begin/end queryobj for gen4/5 with HW contexts

Since we have HW contexts on gen4/5, we could take advantage of them, as
done for gen6+ in commit e32cd5ffbb72 ("i965: Rely on hardware contexts
for query objects on Gen6+."), to only emit a pair of counters at
begin/end queryobj, rather than around every primitive. However, to keep
queryobj working in the meantime as we bring up support for HW ctx on
gen4/5, we can keep using the existing code.

References: e32cd5ffbb72 ("i965: Rely on hardware contexts for query objects on Gen6+.")
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agofreedreno: use u_transfer_helper
Rob Clark [Mon, 4 Dec 2017 14:15:27 +0000 (09:15 -0500)]
freedreno: use u_transfer_helper

Signed-off-by: Rob Clark <robdclark@gmail.com>
6 years agogallium/util: add u_transfer_helper
Rob Clark [Tue, 28 Nov 2017 15:47:06 +0000 (10:47 -0500)]
gallium/util: add u_transfer_helper

Add a new helper that drivers can use to emulate various things that
need special handling in particular in transfer_map:

 1) z32_s8x24.. gl/gallium treats this as a single buffer with depth
    and stencil interleaved but hardware frequently treats this as
    separate z32 and s8 buffers.  Special pack/unpack handling is
    needed in transfer_map/unmap to pack/unpack the exposed buffer

 2) fake RGTC.. on GPUs designed with GLES in mind but which can
    otherwise do GL3, RGTC can be emulated when it is not natively
    supported by converting to uncompressed internally, but this needs
    pack/unpack in transfer_map/unmap

 3) MSAA resolves in the transfer_map() case

v2: add MSAA resolve based on Eric's "gallium: Add helpers for MSAA
    resolves in pipe_transfer_map()/unmap()." patch; avoid wrapping
    pipe_resource, to make it possible for drivers to use both this
    and threaded_context.

Signed-off-by: Rob Clark <robdclark@gmail.com>
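
As an illustration of case 1 above, the pack/unpack step between separate
z32 and s8 buffers and the interleaved buffer exposed through transfer_map
might look roughly like this. It is a sketch under assumptions, not the
helper's actual code: the struct and function names are hypothetical, and
the stencil value is assumed to sit in the low 8 bits of the second dword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One texel of the interleaved layout seen by the API (assumed layout). */
struct z32_s8x24 {
    float    z;     /* 32-bit float depth */
    uint32_t s8x24; /* stencil in the low 8 bits, upper 24 bits unused */
};

/* Interleave separate depth and stencil arrays into the exposed layout. */
static void pack_z32_s8x24(struct z32_s8x24 *dst, const float *z,
                           const uint8_t *s, size_t count)
{
    assert(dst != NULL && z != NULL && s != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i].z = z[i];
        dst[i].s8x24 = s[i];
    }
}

/* Split the exposed layout back into separate depth and stencil arrays. */
static void unpack_z32_s8x24(float *z, uint8_t *s,
                             const struct z32_s8x24 *src, size_t count)
{
    assert(z != NULL && s != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        z[i] = src[i].z;
        s[i] = (uint8_t)(src[i].s8x24 & 0xff);
    }
}
```

In this sketch, pack would run when the application maps the resource for
reading and unpack when a written mapping is unmapped.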
6 years agoi965: enable EXT_disjoint_timer_query extension
Tapani Pälli [Thu, 14 Dec 2017 11:53:10 +0000 (13:53 +0200)]
i965: enable EXT_disjoint_timer_query extension

Following dEQP cases pass:
   dEQP-EGL.functional.get_proc_address.extension.gl_ext_disjoint_timer_query
   dEQP-EGL.functional.client_extensions.disjoint

Piglit test 'ext_disjoint_timer_query-simple' passes with these changes.

No changes/regression observed in Intel CI.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: GL_EXT_disjoint_timer_query extension API bits
Tapani Pälli [Tue, 12 Dec 2017 12:46:13 +0000 (14:46 +0200)]
mesa: GL_EXT_disjoint_timer_query extension API bits

The patch adds GL_GPU_DISJOINT_EXT and enables the use of timer queries when
EXT_disjoint_timer_query is enabled.

v2: enable extension only when EXT_disjoint_timer_query set

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoglapi: add GL_EXT_disjoint_timer_query
Tapani Pälli [Mon, 20 Nov 2017 06:36:52 +0000 (08:36 +0200)]
glapi: add GL_EXT_disjoint_timer_query

Most entrypoints already available via other extensions like
GL_EXT_occlusion_query_boolean, GL_EXT_timer_query.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: add DisjointOperation to gl_shared_state
Tapani Pälli [Mon, 20 Nov 2017 06:31:40 +0000 (08:31 +0200)]
mesa: add DisjointOperation to gl_shared_state

This state will be used by EXT_disjoint_timer_query. As a first
use, the patch sets DisjointOperation to true when a GPU reset happens.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agobroadcom/vc5: Fix a typo in memcmp for sig unpack checking.
Eric Anholt [Thu, 14 Dec 2017 17:41:16 +0000 (09:41 -0800)]
broadcom/vc5: Fix a typo in memcmp for sig unpack checking.

This shockingly ended up working out, because only the first byte of *sig
is used and (sizeof(*sig) != 0) == 1.  Fixes a compiler warning.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=104183
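
The class of typo described here can be reproduced in a few lines. The
sketch below is an assumption about the shape of the bug (a misplaced
parenthesis that moves "!= 0" into the length argument); struct sig and both
helpers are hypothetical and are not the vc5 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signal descriptor; only the first byte carries data here. */
struct sig {
    uint8_t first;
    uint8_t rest[3];
};

/*
 * Buggy form, equivalent to memcmp(a, b, sizeof(*a) != 0): the comparison
 * evaluates to 1, so only one byte is compared.
 */
static bool sig_equal_buggy(const struct sig *a, const struct sig *b)
{
    assert(a != NULL && b != NULL);
    size_t len = (sizeof(*a) != 0); /* always 1 */
    return memcmp(a, b, len) == 0;
}

/* Intended form: compare the whole struct. */
static bool sig_equal_fixed(const struct sig *a, const struct sig *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

The two versions agree whenever all meaningful data lives in the first byte,
which is why the typo went unnoticed.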
6 years agobroadcom/vc5: Enable NIR txd lowering on all txd instructions.
Eric Anholt [Wed, 22 Nov 2017 00:33:29 +0000 (16:33 -0800)]
broadcom/vc5: Enable NIR txd lowering on all txd instructions.

Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agonir: Add a new lowering option to lower all txd to txl.
Eric Anholt [Wed, 22 Nov 2017 00:21:36 +0000 (16:21 -0800)]
nir: Add a new lowering option to lower all txd to txl.

VC5 requires that all txd are lowered in the shader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agonir: Fix interaction of GL_CLAMP lowering with texture offsets.
Eric Anholt [Tue, 21 Nov 2017 21:42:08 +0000 (13:42 -0800)]
nir: Fix interaction of GL_CLAMP lowering with texture offsets.

We want the clamping of the coordinate to apply after the offset, so we
need to do math to lower the offset out of the instruction.  Fixes texwrap
offset cases for GL_CLAMP with GL_NEAREST on vc5.

Note: I moved the get_texture_size() verbatim, so that it was defined
before use.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
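
The ordering described in this message can be illustrated with plain
arithmetic. This is a sketch for normalized coordinates only, with a
hypothetical function name; it is not the NIR pass itself, which emits the
equivalent instructions into the shader.

```c
#include <assert.h>

/*
 * Fold a texel offset into a normalized coordinate first, then apply the
 * [0, 1] clamp that stands in for GL_CLAMP. "size" is the texture dimension
 * in texels along this axis.
 */
static float clamp_coord_after_offset(float coord, int offset, int size)
{
    assert(size > 0);

    /* One texel covers 1/size of the normalized range. */
    float shifted = coord + (float)offset / (float)size;

    if (shifted < 0.0f)
        return 0.0f;
    if (shifted > 1.0f)
        return 1.0f;
    return shifted;
}
```

Clamping before adding the offset would let the final coordinate leave the
[0, 1] range again, which is the interaction the fix avoids.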
6 years agobroadcom/vc5: Fix shader input/outputs for gallium's new NIR linking.
Eric Anholt [Wed, 6 Dec 2017 19:30:02 +0000 (11:30 -0800)]
broadcom/vc5: Fix shader input/outputs for gallium's new NIR linking.

6 years agogallivm: implement accurate corner behavior for textureGather with cube maps
Roland Scheidegger [Wed, 13 Dec 2017 02:33:07 +0000 (03:33 +0100)]
gallivm: implement accurate corner behavior for textureGather with cube maps

The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times as many color values
to fix up than weights). Obviously this could not work for gather (hence
accurate corner filtering was disabled with gather).
Implement this by just doing it as the spec implies - calculate the 4th
texel as the average of the other 3. With gather of course there's only
one color to worry about, so it's not all that many instructions either
(albeit surely the whole cube map filtering is hilariously complex).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
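
The corner rule quoted from the spec reduces to a small computation per
gathered component. The sketch below is scalar C with a hypothetical
function name; the real code builds vectorized LLVM IR instead.

```c
#include <assert.h>

/*
 * texels[] holds the four gathered values of one component. The entry at
 * index "missing" has no backing texel (the footprint wrapped around both
 * axes at a cube corner), so it is replaced by the average of the other
 * three.
 */
static void fill_missing_corner_texel(float texels[4], int missing)
{
    assert(missing >= 0 && missing < 4);

    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        if (i != missing)
            sum += texels[i];
    }
    texels[missing] = sum / 3.0f;
}
```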
6 years agogallivm: fix an issue with NaNs with seamless cube filtering
Roland Scheidegger [Wed, 13 Dec 2017 02:33:21 +0000 (03:33 +0100)]
gallivm: fix an issue with NaNs with seamless cube filtering

Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them), and in fact we didn't handle them
quite safely. I've seen -INT_MAX + 1 being propagated through as the final int
coord value, albeit I didn't observe a crash. (Not quite a coincidence, since
any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive
number - nevertheless, I'd rather not try my luck, I'm not entirely sure it
can't really turn up negative either due to seamless coord swapping, plus
ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And
we kill off NaNs similarly with ordinary texture wrapping too.)
So kill off the NaNs by using the common max against zero method.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
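
A scalar sketch of the "max against zero" idea follows. It relies on the
fact that any ordered comparison with a NaN is false, so the zero operand is
selected. The function names are hypothetical, and the upper clamp is an
addition that keeps the C float-to-int conversion well defined; the commit
itself only describes the max against zero.

```c
#include <assert.h>

/* Return x if it is a positive number, otherwise 0. NaN maps to 0 because
 * the comparison "x > 0" is false for NaN. */
static float max_against_zero(float x)
{
    return x > 0.0f ? x : 0.0f;
}

/* Convert a [0, 1] face coordinate to an integer texel index in
 * [0, size - 1], tolerating NaN and out-of-range input. */
static int cube_coord_to_texel(float coord, int size)
{
    assert(size > 0);

    float c = max_against_zero(coord * (float)size);

    /* Extra upper clamp so the cast below never sees a huge value. */
    if (c > (float)(size - 1))
        c = (float)(size - 1);

    return (int)c;
}
```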
6 years agointel/tools: Convert aubinator over to the common framework
Jason Ekstrand [Wed, 13 Dec 2017 19:51:01 +0000 (11:51 -0800)]
intel/tools: Convert aubinator over to the common framework

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode registers
Jason Ekstrand [Wed, 13 Dec 2017 18:17:40 +0000 (10:17 -0800)]
intel/batch-decoder: Decode registers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode dynamic state
Jason Ekstrand [Wed, 13 Dec 2017 18:16:46 +0000 (10:16 -0800)]
intel/batch-decoder: Decode dynamic state

Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess.  One day,
we'll come up with a way to annotate the batch to solve this problem.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode constants, binding tables, and samplers
Jason Ekstrand [Wed, 13 Dec 2017 17:58:27 +0000 (09:58 -0800)]
intel/batch-decoder: Decode constants, binding tables, and samplers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/tools: Switch aubinator_error_decode over to the gen_print_batch
Jason Ekstrand [Wed, 13 Dec 2017 19:03:32 +0000 (11:03 -0800)]
intel/tools: Switch aubinator_error_decode over to the gen_print_batch

The shared framework can now do everything that aubinator_error_decode
ever did and more.  It's time to make the switch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode graphics shaders
Jason Ekstrand [Wed, 13 Dec 2017 17:46:39 +0000 (09:46 -0800)]
intel/batch-decoder: Decode graphics shaders

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode vertex and index buffers
Jason Ekstrand [Wed, 13 Dec 2017 17:19:57 +0000 (09:19 -0800)]
intel/batch-decoder: Decode vertex and index buffers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD
Jason Ekstrand [Wed, 13 Dec 2017 16:01:03 +0000 (08:01 -0800)]
intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/tools: Add the start of a generic batch decoder
Jason Ekstrand [Wed, 13 Dec 2017 08:10:12 +0000 (00:10 -0800)]
intel/tools: Add the start of a generic batch decoder

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Expose the raw field value in the iterator
Jason Ekstrand [Wed, 13 Dec 2017 16:23:50 +0000 (08:23 -0800)]
intel/decoder: Expose the raw field value in the iterator

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/disasm: Take a devinfo in gen_disasm_create
Jason Ekstrand [Wed, 13 Dec 2017 07:26:51 +0000 (23:26 -0800)]
intel/disasm: Take a devinfo in gen_disasm_create

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Take a bit offset in gen_print_group
Jason Ekstrand [Wed, 13 Dec 2017 01:36:47 +0000 (17:36 -0800)]
intel/decoder: Take a bit offset in gen_print_group

Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword.  This changes things to work even more in
terms of bits so that we can properly decode these structs.  This
affects MOCS, attribute swizzles, and several other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
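
The difference between decoding a nested group at its true bit offset and
at the start of its first dword can be shown with a small extractor. This is
a sketch with hypothetical helper names, not the gen_print_group code, and
it limits fields to 32 bits for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Read num_bits (1..32) starting at absolute bit start_bit from a dword
 * array. The field may straddle a dword boundary. */
static uint32_t get_bits(const uint32_t *dw, unsigned start_bit, unsigned num_bits)
{
    assert(num_bits >= 1 && num_bits <= 32);

    unsigned idx = start_bit / 32;
    unsigned shift = start_bit % 32;

    uint64_t v = dw[idx];
    if (shift + num_bits > 32)
        v |= (uint64_t)dw[idx + 1] << 32;

    uint64_t mask = (1ull << num_bits) - 1;
    return (uint32_t)((v >> shift) & mask);
}

/* Decode a field of a nested group using the group's exact bit offset. */
static uint32_t group_field(const uint32_t *dw, unsigned group_bit_offset,
                            unsigned field_start, unsigned field_bits)
{
    return get_bits(dw, group_bit_offset + field_start, field_bits);
}

/* Old behavior for comparison: the group offset is rounded down to the
 * start of its dword before the field is read. */
static uint32_t group_field_rounded(const uint32_t *dw, unsigned group_bit_offset,
                                    unsigned field_start, unsigned field_bits)
{
    return get_bits(dw, (group_bit_offset / 32) * 32 + field_start, field_bits);
}
```

For a group starting at bit 16 of a dword, the rounded variant reads bits
0..7 where the exact variant reads bits 16..23, so the two return different
bytes of the same dword.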
6 years agointel/decoder: Stop rounding down to the nearest dword
Jason Ekstrand [Wed, 13 Dec 2017 01:05:38 +0000 (17:05 -0800)]
intel/decoder: Stop rounding down to the nearest dword

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Convert the iterator to work entirely in bits
Jason Ekstrand [Wed, 13 Dec 2017 00:51:54 +0000 (16:51 -0800)]
intel/decoder: Convert the iterator to work entirely in bits

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel/decoder: Drop gen_field_decode helper
Jason Ekstrand [Wed, 13 Dec 2017 00:12:16 +0000 (16:12 -0800)]
intel/decoder: Drop gen_field_decode helper

It's unused.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoamd/common: add ac_build_waitcnt()
Samuel Pitoiset [Tue, 12 Dec 2017 17:10:23 +0000 (18:10 +0100)]
amd/common: add ac_build_waitcnt()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoamd/common: more use of i32_1
Samuel Pitoiset [Tue, 12 Dec 2017 17:10:22 +0000 (18:10 +0100)]
amd/common: more use of i32_1

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoamd/common: more use of i32_0
Samuel Pitoiset [Tue, 12 Dec 2017 17:10:21 +0000 (18:10 +0100)]
amd/common: more use of i32_0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradeonsi: make use of ac_build_fdiv()
Samuel Pitoiset [Tue, 12 Dec 2017 17:10:20 +0000 (18:10 +0100)]
radeonsi: make use of ac_build_fdiv()

And move the comment to amd/common.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: export SampleMask from pixel shaders at full rate
Samuel Pitoiset [Thu, 14 Dec 2017 12:51:47 +0000 (13:51 +0100)]
radv: export SampleMask from pixel shaders at full rate

Use 16_ABGR instead of 32_ABGR if Z isn't written.

Ported from RadeonSI.

No CTS regressions on Polaris.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradeonsi: make use of ac_get_spi_shader_z_format()
Samuel Pitoiset [Thu, 14 Dec 2017 12:51:46 +0000 (13:51 +0100)]
radeonsi: make use of ac_get_spi_shader_z_format()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoamd/common: add ac_get_spi_shader_z_format()
Samuel Pitoiset [Thu, 14 Dec 2017 12:51:45 +0000 (13:51 +0100)]
amd/common: add ac_get_spi_shader_z_format()

ac_shader_util.c will contain shader helpers for RadeonSI
and RADV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: do not load the local invocation index when it's unused
Samuel Pitoiset [Thu, 14 Dec 2017 16:32:41 +0000 (17:32 +0100)]
radv: do not load the local invocation index when it's unused

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components
Samuel Pitoiset [Thu, 14 Dec 2017 15:48:03 +0000 (16:48 +0100)]
radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components

We should also not load the input SGPRs and VGPRs, but
let's start with this for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoamd/common: scan which components of gl_LocalInvocationID are used
Samuel Pitoiset [Thu, 14 Dec 2017 15:48:02 +0000 (16:48 +0100)]
amd/common: scan which components of gl_LocalInvocationID are used

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoamd/common: scan which components of gl_WorkGroupID are used
Samuel Pitoiset [Thu, 14 Dec 2017 15:48:01 +0000 (16:48 +0100)]
amd/common: scan which components of gl_WorkGroupID are used

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: set FORCE_SIMD_DIST(1) for compute when profitable
Samuel Pitoiset [Thu, 14 Dec 2017 14:51:20 +0000 (15:51 +0100)]
radv: set FORCE_SIMD_DIST(1) for compute when profitable

Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: calculate best compute resource limits
Samuel Pitoiset [Thu, 14 Dec 2017 14:51:19 +0000 (15:51 +0100)]
radv: calculate best compute resource limits

Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: store the dispatch initiator into the device
Samuel Pitoiset [Thu, 14 Dec 2017 14:51:18 +0000 (15:51 +0100)]
radv: store the dispatch initiator into the device

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: replace grid_components_used by uses_grid_size
Samuel Pitoiset [Thu, 14 Dec 2017 11:51:07 +0000 (12:51 +0100)]
radv: replace grid_components_used by uses_grid_size

Use a boolean instead because the number of needed SGPRs
is always 3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: always emit all compute block components
Samuel Pitoiset [Thu, 14 Dec 2017 11:51:06 +0000 (12:51 +0100)]
radv: always emit all compute block components

The number of grid components is always 3 when gl_NumWorkGroups
is declared, because it relies on the number of components of
nir_intrinsic_load_num_work_groups.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agodocs: update calendar, add news item and link release notes for 17.2.7
Emil Velikov [Thu, 14 Dec 2017 13:52:11 +0000 (13:52 +0000)]
docs: update calendar, add news item and link release notes for 17.2.7

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: add sha256 checksums for 17.2.7
Emil Velikov [Thu, 14 Dec 2017 13:49:09 +0000 (13:49 +0000)]
docs: add sha256 checksums for 17.2.7

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: add release notes for 17.2.7
Emil Velikov [Thu, 14 Dec 2017 13:27:23 +0000 (13:27 +0000)]
docs: add release notes for 17.2.7

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoegl/android: Provide an option for the backend to expose KHR_image
Harish Krupo [Fri, 8 Dec 2017 15:59:39 +0000 (21:29 +0530)]
egl/android: Provide an option for the backend to expose KHR_image

From android cts 8.0_r4, a new test case checks if all the required egl
extensions are exposed. In the current implementation we expose KHR_image
if KHR_image_base and KHR_image_pixmap are supported but KHR_image spec
does not mandate the existence of both the extensions.
This patch preserves the current check and also provides the backend
with an option to expose the KHR_image extension.

Test: run cts -m CtsOpenGLTestCases -t \
android.opengl.cts.OpenGlEsVersionTest#testRequiredEglExtensions

Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: Don't advertise VK_EXT_debug_report.
Bas Nieuwenhuizen [Tue, 12 Dec 2017 21:16:55 +0000 (22:16 +0100)]
radv: Don't advertise VK_EXT_debug_report.

We never supported it. Missed during copy and pasting.

Fixes: 17201a2eb0b "radv: port to using updated anv entrypoint/extension generator."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoi965: Don't allocate an MCS for 16x MSAA and width > 8192.
Kenneth Graunke [Wed, 13 Dec 2017 17:45:49 +0000 (09:45 -0800)]
i965: Don't allocate an MCS for 16x MSAA and width > 8192.

The hardware doesn't support this, and isl_surf_get_mcs_surf will fail.

I feel a bit bad replicating this logic, but we want to decide up front.

This fixes the following test when run with --deqp-surface-width=16384:
- GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_error_blitframebuffer_multisampled_framebuffers_different_sample_count

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoAndroid: fix missing generation of vtn_gather_types.c
Rob Herring [Wed, 13 Dec 2017 21:06:08 +0000 (15:06 -0600)]
Android: fix missing generation of vtn_gather_types.c

Commit bb1e6ff161c9 ("spirv: Add a prepass to set types on vtn_values")
added generation of vtn_gather_types.c, but forgot to add it to the
Android build files.

Fixes: bb1e6ff161c9 ("spirv: Add a prepass to set types on vtn_values")
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
6 years agomesa: Add glSpecializeShaderARB to common_desktop_functions
Dylan Baker [Tue, 12 Dec 2017 19:48:31 +0000 (11:48 -0800)]
mesa: Add glSpecializeShaderARB to common_desktop_functions

CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104231
Fixes: 46b21b8f906 ("mesa: add GL_ARB_gl_spirv boilerplate")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoegl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED
Tomasz Figa [Mon, 4 Dec 2017 18:22:39 +0000 (19:22 +0100)]
egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

There is no API available to properly query the IMPLEMENTATION_DEFINED
format. As a workaround we rely here on gralloc allocating either
an arbitrary YCbCr 4:2:0 or RGBX_8888, with the latter being recognized
by lock_ycbcr failing.

Reviewed-on: https://chromium-review.googlesource.com/566793

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
6 years agoswr: Correct texture allocation and limit max size to 2GB
Bruce Cherniak [Mon, 20 Nov 2017 17:32:55 +0000 (11:32 -0600)]
swr: Correct texture allocation and limit max size to 2GB

This patch fixes piglit tex3d-maxsize by correcting 4 things:

The total_size calculation was using 32-bit math, therefore a >4GB
allocation request overflowed and was not returning false (unsupported).

Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle
>4GB allocations.

Added error checking on texture allocations to fail gracefully.

Finally, temporarily decreased supported max texture size from 4GB to 2GB.
The gallivm texture-sampler needs some additional work to correctly handle
larger than 2GB textures (offsets to LLVMBuildGEP are signed).
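
The 32-bit overflow can be illustrated with a small sketch. The function
names and MAX_TEXTURE_BYTES are hypothetical and do not come from the swr
sources. The sketch also assumes the dimensions are small enough that the
64-bit product itself cannot wrap, and it treats the 2GB limit as inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap mirroring the temporary 2GB limit described above. */
#define MAX_TEXTURE_BYTES ((uint64_t)2 * 1024 * 1024 * 1024)

/* Buggy variant: the product is computed in 32 bits and wraps modulo 2^32. */
static uint32_t total_size_32(uint32_t w, uint32_t h, uint32_t d, uint32_t bpp)
{
    return w * h * d * bpp;
}

/* Fixed variant: widen the first operand so every multiply is 64-bit. */
static uint64_t total_size_64(uint32_t w, uint32_t h, uint32_t d, uint32_t bpp)
{
    return (uint64_t)w * h * d * bpp;
}

/* Reject requests above the cap instead of silently truncating them. */
static bool texture_size_supported(uint32_t w, uint32_t h, uint32_t d,
                                   uint32_t bpp)
{
    return total_size_64(w, h, d, bpp) <= MAX_TEXTURE_BYTES;
}
```

For a 2048x2048x2048 texture at 4 bytes per texel, the true size is 2^35
bytes. The 32-bit product wraps to 0, which is why the oversized request was
not rejected.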

I'm working on a follow-on patch to allow up to 4GB textures, as this is
useful in HPC visualization applications.

Fixes piglit tex3d-maxsize.

v2: Updated patch description to clarify ">4GB".

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
6 years agoswr: Fix KNOB_MAX_WORKER_THREADS thread creation override.
Bruce Cherniak [Tue, 12 Dec 2017 23:18:23 +0000 (17:18 -0600)]
swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.

The environment variable KNOB_MAX_WORKER_THREADS allows the user to override
default thread creation and thread binding.  A previous commit adjusting
Linux CPU topology caused setting this KNOB to bind all threads to a single
core.

This patch restores the correct behavior of the override.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
6 years agomeson: fix glx-test race
Dylan Baker [Tue, 12 Dec 2017 18:23:48 +0000 (10:23 -0800)]
meson: fix glx-test race

This test should rely on dispatch.h being generated, but it doesn't.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agogallium/docs: document behavior of set_sample_mask()
Brian Paul [Wed, 13 Dec 2017 03:32:06 +0000 (20:32 -0700)]
gallium/docs: document behavior of set_sample_mask()

The sample mask is used even if msaa is not explicitly enabled when we
have a framebuffer with multisampled surfaces.  That's DX behavior and
what the Radeon drivers do.  Not sure about other drivers at this point.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agoglsl: trivial whitespace fixes in link_varyings.cpp
Brian Paul [Tue, 12 Dec 2017 22:11:21 +0000 (15:11 -0700)]
glsl: trivial whitespace fixes in link_varyings.cpp

6 years agoprogram: Don't reset SamplersValidated when restoring from shader cache
Jordan Justen [Tue, 12 Dec 2017 19:44:01 +0000 (11:44 -0800)]
program: Don't reset SamplersValidated when restoring from shader cache

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103988
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agomesa: remove second include of errors.h in src/mesa/main/glspirv.c
Kai Wasserbäch [Tue, 12 Dec 2017 15:20:06 +0000 (16:20 +0100)]
mesa: remove second include of errors.h in src/mesa/main/glspirv.c

Fixes: 5bc03d2508 ("mesa: implement SPIR-V loading in glShaderBinary")
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoradeonsi: create get_tcs_tes_buffer_address helper
Timothy Arceri [Thu, 23 Nov 2017 01:59:01 +0000 (12:59 +1100)]
radeonsi: create get_tcs_tes_buffer_address helper

This will be shared between the NIR and TGSI backends.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoac: fix nir_op_f2f64
Timothy Arceri [Tue, 12 Dec 2017 05:10:24 +0000 (16:10 +1100)]
ac: fix nir_op_f2f64

Without this we get the error "FPExt only operates on FP" when
converting the following:

   vec1 32 ssa_5 = b2f ssa_4
   vec1 64 ssa_6 = f2f64 ssa_5

Which results in:

   %44 = and i32 %43, 1065353216
   %45 = fpext i32 %44 to double

With this patch we now get:

   %44 = and i32 %43, 1065353216
   %45 = bitcast i32 %44 to float
   %46 = fpext float %45 to double
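
The same sequence can be mirrored in C to show why the bitcast matters. This
is an illustrative sketch, not the ac backend code: the function names are
hypothetical, and it assumes a true boolean is represented as all ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits as an IEEE float (the "bitcast" step). */
static float bitcast_u32_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* b2f followed by f2f64: mask, bitcast, then widen to double. */
static double b2f_then_f2f64(uint32_t bool_mask)
{
    /* 1065353216 == 0x3f800000, the bit pattern of 1.0f. */
    uint32_t bits = bool_mask & 1065353216u;

    /* The integer must be viewed as a float before it is extended. */
    float f = bitcast_u32_to_float(bits);

    /* This widening corresponds to the fpext from float to double. */
    return (double)f;
}
```

Converting the masked integer to double directly would yield 1065353216.0
rather than 1.0. In LLVM IR the equivalent mistake is rejected outright,
since fpext does not accept an integer operand.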

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir: fix shift for uint64_t
Timothy Arceri [Tue, 12 Dec 2017 02:52:50 +0000 (13:52 +1100)]
nir: fix shift for uint64_t

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agost/glsl_to_nir: skip forced array splitting for tcs
Timothy Arceri [Tue, 12 Dec 2017 02:49:41 +0000 (13:49 +1100)]
st/glsl_to_nir: skip forced array splitting for tcs

nir_lower_io_to_temporaries() does not support tcs so we cannot
assume there are no indirects here. Also the radeonsi backend
(the only backend to support tess) has support for tcs indirects
so there is no need to lower them anyway.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agointel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.
Francisco Jerez [Tue, 12 Dec 2017 04:24:53 +0000 (20:24 -0800)]
intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.

Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199
Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>