git.libre-soc.org Git - mesa.git/log

Nicolai Hähnle [Fri, 15 Sep 2017 14:51:14 +0000 (16:51 +0200)]

gallium: add LDEXP TGSI instruction and corresponding cap

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 16:47:52 +0000 (18:47 +0200)]

tgsi: infer that dst[1] of DFRACEXP is an integer

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Sat, 16 Sep 2017 10:50:42 +0000 (12:50 +0200)]

gallivm: add support for TGSI instructions with two outputs

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 16:45:32 +0000 (18:45 +0200)]

gallivm: add dst register index to lp_build_tgsi_context::emit_store

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 16:34:48 +0000 (18:34 +0200)]

tgsi: clarify the semantics of DFRACEXP

The status quo is quite the mess:

1. tgsi_exec will do a per-channel computation, and store the dst[0]
   result (significand) correctly for each channel. The dst[1] result
   (exponent) will be written to the first bit set in the writemask.
   So per-component calculation only works partially.

2. r600 will only do a single computation. It will replicate the
   exponent but not the significand.

3. The docs pretend that there's per-component calculation, but even
   get dst[0] and dst[1] confused.

4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions,
   and kind-of assumes that everything is replicated, generating this for
   the dvec4 case:

     DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
     DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
     DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
     DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw

Settle on the simplest behavior, which is single-component calculation
with replication, document it, and adjust tgsi_exec and r600.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 15:47:27 +0000 (17:47 +0200)]

tgsi: fix the documentation of DLDEXP

Sourcing the exponent for the zw destination pair from Z is consistent
with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always
generates per-channel instructions anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 15:40:05 +0000 (17:40 +0200)]

tgsi: infer that DLDEXP's second source has an integer type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 15 Sep 2017 14:39:31 +0000 (16:39 +0200)]

glsl/lower_instruction: handle denorms and overflow in ldexp correctly

GLSL ES requires both, and while GLSL explicitly doesn't require correct
overflow handling, it does appear to require handling input inf/denorms
correctly.

Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.*

Cc: mesa-stable@lists.freedesktop.org
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 28 Sep 2017 15:52:42 +0000 (17:52 +0200)]

util/queue: fix a race condition in the fence code

A tempting alternative fix would be adding a lock/unlock pair in
util_queue_fence_is_signalled. However, that wouldn't actually
improve anything in the semantics of util_queue_fence_is_signalled,
while making that test much more heavy-weight. So this lock/unlock
pair in util_queue_fence_destroy for "flushing out" other threads
that may still be in util_queue_fence_signal looks like the better
fix.

v2: rephrase the comment

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 28 Sep 2017 19:46:30 +0000 (21:46 +0200)]

r600: cleanup set_occlusion_query_state

This fixes a warning caused by the fork (note the change in the function
signature):

../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c: In function ‘r600_init_common_state_functions’:
../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c:2974:36: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
rctx->b.set_occlusion_query_state = r600_set_occlusion_query_state;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 28 Sep 2017 19:44:35 +0000 (21:44 +0200)]

r300: add missing case PIPE_SHADER_CAP_INT64_ATOMICS

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Nicolai Hähnle [Sat, 23 Sep 2017 12:19:59 +0000 (14:19 +0200)]

radeonsi: fix border color translation for integer textures

This fixes the extremely unlikely case that an application uses
0x80000000 or 0x3f800000 as border color for an integer texture and
helps in the also, but perhaps slightly less, unlikely case that 1 is
used as a border color.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Sat, 23 Sep 2017 08:29:51 +0000 (10:29 +0200)]

radeonsi: clamp border colors for upgraded depth textures

The hardware does this automatically for unorm formats, but we need to
do it manually for unorm depth formats that have been upgraded to
Z32_FLOAT.

Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
and others.

Fixes: d4d9ec55c589 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Sat, 23 Sep 2017 11:20:25 +0000 (13:20 +0200)]

radeonsi: clamp depth comparison value only for fixed point formats

The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.

The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.

Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*

Fixes: d4d9ec55c589 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 21 Sep 2017 14:50:08 +0000 (16:50 +0200)]

radeonsi/gfx9: fix geometry shaders without output vertices

Not that those are super common or useful, but hey! Fun corner cases
of the API...

Fixes dEQP-GLES31.functional.geometry_shading.emit.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 22 Sep 2017 17:14:16 +0000 (19:14 +0200)]

amd/common: save an instruction in the build_cube_select sequence

Avoid a v_cndmask: the absolute value is free due to input modifiers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 22 Sep 2017 17:05:52 +0000 (19:05 +0200)]

amd/common: fix build_cube_select

Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.

Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Fri, 22 Sep 2017 14:59:08 +0000 (16:59 +0200)]

st/glsl_to_tgsi: fix conditional assignments to packed shader outputs

Overriding the default (no-op) swizzle is clearly counter-productive,
since the whole point is putting the destination register as one of
the source operands so that it remains unmodified when the assignment
condition is false.

Fragment depth and stencil outputs are a special case due to how their
source swizzles are manipulated in translate_src when compiling to
TGSI.

Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 21 Sep 2017 14:55:35 +0000 (16:55 +0200)]

st/glsl_to_tgsi: fix a use-after-free in merge_two_dsts

Found by address sanitizer.

The loop here tries to be safe, but in doing so, it ends up doing
exactly the wrong thing: the safe foreach is for when the loop
variable (inst) could be deleted and nothing else. However, this
particular can delete inst's successor, but not inst itself.

Fixes: 8c6a0ebaad72 ("st/mesa: add st fp64 support (v7.1)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Sat, 23 Sep 2017 20:34:10 +0000 (22:34 +0200)]

radeonsi: move descriptor logs to after corresponding draw/compute packet

It has to happen after descriptor uploads since otherwise we'll print out
the wrong GPU list / incorrectly claim descriptor corruption.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Nicolai Hähnle [Mon, 18 Sep 2017 13:44:50 +0000 (15:44 +0200)]

amd/common: remove ac_shader_abi::chip_class

Redundant with the recently added ac_llvm_context::chip_class.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 14 Sep 2017 14:17:31 +0000 (16:17 +0200)]

gallium/radeon: fix a comment

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Iago Toral Quiroga [Wed, 27 Sep 2017 09:36:31 +0000 (11:36 +0200)]

i965/fs: force pull model for 64-bit GS inputs

Triggering the push model when 64-bit inputs are involved is not easy due to
the constrains on the maximum number of registers that we allow for this mode,
however, for GS with 'points' primitive type and just a couple of double
varyings we can trigger this and it just doesn't work because the
implementation is not 64-bit aware at all. For now, let's make sure that we
don't attempt this model whith 64-bit inputs and we always fall back to pull
model for them.

Also, don't enable the VUE handles in the thread payload on the fly when we
find an input for which we need the pull model, this is not safe: if we need
to resort to the pull model we need to account for that when we setup the
thread payload so we compute the first non-payload register properly. If we
didn't do that correctly and we enable it on-the-fly here then we will end up
VUE handles on the first non-payload register which will probably lead to
GPU hangs. Instead, always enable the VUE handles for the pull model so we
can safely use them when needed. The GS is going to resort to pull model
almost in every situation anyway, so this shouldn't make a significant
difference and it makes things easier and safer.

v2: Always enable the VUE handles for pull model, this is easier and safer
and the GS is going to fallback to pull model almost always anyway (Ken)

v3: Only clamp the URB read length if we are over the maximum reserved for
push inputs as we were doing in the original code (Ken).

v4: No need to clamp the urb read length if invocations > 1

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Jason Ekstrand [Thu, 28 Sep 2017 16:58:59 +0000 (09:58 -0700)]

i965/link: Use prog->nir instead of creating a temporary

This way, when NIR_PASS_V makes a clone of the shader (for testing
nir_clone), the new and lowered version gets re-assigned to prog->nir.

[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Thu, 28 Sep 2017 16:58:38 +0000 (09:58 -0700)]

i965/link: Make more use of NIR_PASS

[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Thu, 28 Sep 2017 16:55:15 +0000 (09:55 -0700)]

i965/link: Make better use of temporary variables

The way NIR_PASS works (and, by extension, nir_optimize) is that they
may clone the shader and throw the old one away. (We use this for
testing nir_clone.) It's better if we just make a temporary variable,
use it for everything, and re-assign to the gl_program at the end.

[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

commit | commitdiff | tree

Thomas Helland [Wed, 27 Sep 2017 19:24:06 +0000 (21:24 +0200)]

util: fix in-class initialization of static member

Fix a compile error with G++ 4.4

string_buffer_test.cpp:43: error: ISO C++ forbids initialization of
member ‘str1’
string_buffer_test.cpp:43: error: making ‘str1’ static
string_buffer_test.cpp:43: error: invalid in-class initialization of
static data member of non-integral type ‘const char*’

Tested-by: Vinson Lee <vlee at freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103002

commit | commitdiff | tree

Eric Engestrom [Thu, 28 Sep 2017 17:08:59 +0000 (18:08 +0100)]

REVIEWERS: add myself as a Meson reviewer

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>

commit | commitdiff | tree

Eric Engestrom [Thu, 28 Sep 2017 12:36:09 +0000 (13:36 +0100)]

REVIEWERS: add Meson

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

commit | commitdiff | tree

Dylan Baker [Wed, 27 Sep 2017 23:24:38 +0000 (16:24 -0700)]

meson: remove duplicate libisl dependency in anv

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

commit | commitdiff | tree

Brian Paul [Tue, 26 Sep 2017 15:57:02 +0000 (09:57 -0600)]

svga: add missing PIPE_SHADER_CAP_INT64_ATOMICS switch cases

Silences a compiler warning.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit | commitdiff | tree

Brian Paul [Tue, 26 Sep 2017 15:58:55 +0000 (09:58 -0600)]

svga: trivial whitespace clean-ups in svga_screen.c

commit | commitdiff | tree

Brian Paul [Fri, 8 Sep 2017 22:42:12 +0000 (16:42 -0600)]

gallium/util: use new util_vasprintf() function

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Brian Paul [Fri, 8 Sep 2017 22:41:16 +0000 (16:41 -0600)]

util: add util_vasprintf() for Windows (v2)

We don't have vasprintf() on Windows so we need to implement it ourselves.

v2: compute actual length of output string, per Nicolai Hähnle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Brian Paul [Fri, 8 Sep 2017 22:40:52 +0000 (16:40 -0600)]

st/mesa: don't call close() on Windows

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Neha Bhende [Fri, 14 Jul 2017 17:59:46 +0000 (10:59 -0700)]

svga: start advertising PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION

Since our driver support arb_provoking_vertex, we can start
advertising PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION
Fixes ./clipflat & ./arb-provoking-vertex-render piglit tests

Tested piglit, glretrace on Hw 11 and Hw 13

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

commit | commitdiff | tree

Marek Olšák [Wed, 27 Sep 2017 11:17:21 +0000 (13:17 +0200)]

mesa: fix texture updates for ATI_fragment_shader

Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Lucas Stach [Wed, 6 Sep 2017 12:30:40 +0000 (14:30 +0200)]

etnaviv: optimize RS transfers

Currently we are blitting the whole resource when the RS is used to
de-/tile a resource. This can be very inefficient for large resources
where the transfer is only changing a small part of the resource
(happens a lot with glTexSubImage2D).

Optimize this by only blitting the tile aligned subregion of the
resource, which the transfer is going to change.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>

commit | commitdiff | tree

Lucas Stach [Wed, 6 Sep 2017 12:28:21 +0000 (14:28 +0200)]

etnaviv: add resource subregion copy

This is useful if we only need to copy part of a larger resource, mostly
when using the RS engine to de-/tile on pipe transfers.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>

commit | commitdiff | tree

Lucas Stach [Wed, 6 Sep 2017 12:24:30 +0000 (14:24 +0200)]

etnaviv: support tile aligned RS blits

The RS can blit abitrary tile aligned subregions of a resource by
adjusting the buffer offset.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>

commit | commitdiff | tree

Leo Liu [Tue, 26 Sep 2017 13:11:52 +0000 (09:11 -0400)]

st/va: use pipe transfer_map to map upload buffer

The function pipe_buffer_map() is only for linear pipe buffer,
with height as 0, and it's not for any 2D textures.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Gwan-gyeong Mun [Tue, 26 Sep 2017 08:14:17 +0000 (01:14 -0700)]

anv: add an assertion in genX(BeginCommandBuffer)

To check a valid usage requirement.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Gwan-gyeong Mun [Tue, 26 Sep 2017 08:14:16 +0000 (01:14 -0700)]

radv: add an assertion in radv_BeginCommandBuffer()

To check a valid usage requirement.

CID: 1401616

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

commit | commitdiff | tree

Gwan-gyeong Mun [Thu, 24 Aug 2017 14:17:33 +0000 (23:17 +0900)]

gallium/docs: add reference links for resource_create method

It adds reference links for arguments usage and bind of resource_create().

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Gwan-gyeong Mun [Thu, 24 Aug 2017 13:53:04 +0000 (22:53 +0900)]

gallium/docs: fix a reference link for get_paramf

Previous get_paramf links same as get_param. It changes the reference link to
PIPE_CAPF_*

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

commit | commitdiff | tree

Iago Toral Quiroga [Wed, 27 Sep 2017 11:05:39 +0000 (13:05 +0200)]

i965: enable up to 32 inputs for geometry shaders in gen8+

We have been exposing only 16 since 1e3e72e3054de with arguments
based on register pressure and the number of available GRFs, however,
our scalar backend will always limit the number of push registers
for GS threads to 24 and fallback to pull model for anything else,
so there is really no reason to lower the number under those arguments.

By bumping this up to 32 we make it the same as all the other stages,
which is a nice feature to have that can help applications in some
cases (I recently fixed a bug in CTS that assumed that the number
of input locations in a stage matches the number of output locations
in the previous stage for example).

Pre-gen8, we use the vector backend and push model, so in that case
the arguments in 1e3e72e3054de are still valid.

v2: check if we have scalar GS instead of the hw gen to enable this (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Samuel Pitoiset [Wed, 27 Sep 2017 21:09:08 +0000 (23:09 +0200)]

radv: set image view type when decompressing depth surfaces

This was missing.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Eric Anholt [Wed, 27 Sep 2017 20:01:04 +0000 (13:01 -0700)]

broadcom/vc4: Fix release build

I remember thinking "gosh, it would be nice if I could do a kernel-style
'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was
compiled on both builds", and then forgot to test a release build anyway.

Fixes: a8fd58eae596 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.")
Reported-by: Derek Foreman <derekf@osg.samsung.com>

commit | commitdiff | tree

Eric Anholt [Fri, 12 May 2017 23:05:44 +0000 (16:05 -0700)]

vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.

This has proven to be incredibly useful for debugging CMA allocation
failures and driving memory management improvements. However, we don't
want to burden entry and exit from the BO cache with the labeling ioctl's
overhead on release builds.

commit | commitdiff | tree

Dylan Baker [Wed, 20 Sep 2017 18:53:29 +0000 (11:53 -0700)]

meson: build "radv" vulkan driver for radeon hardware

This builds, installs, and has been tested on a r290x (Hawaii) with the Vulkan
CTS. It dies horribly in a fire at the same point for the meson build as the
autotools build.

v2: - enable radv by default
    - add shader cache support and enforce that it's built for radv
v3: - Fix typo in meson_options (Nicholas)
    - strip trailing 'svn' from llvm version before setting the version
      preprocessor flag (Bas)
    - Check for LLVM module requirements

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Dylan Baker [Fri, 15 Sep 2017 00:57:17 +0000 (17:57 -0700)]

meson: Add build Intel "anv" vulkan driver

This allows building and installing the Intel "anv" Vulkan driver using
meson and ninja, the driver has been tested against the CTS and has
seems to pass the same series of tests (they both segfault when the CTS
tries to run wayland wsi tests).

There are still a mess of TODO, XXX, and FIXME comments in here. Those
are mostly for meson bugs I'm trying to fix, or for additional things to
implement for other drivers/features.

I have configured all intermediate libraries and optional tools to not
build by default, meaning they will only be built if they're pulled in
as a dependency of a target that will actually be installed) this allows
us to avoid massive if chains, while ensuring that only the bits that
need to be built are.

v2: - enable anv, x11, and wayland by default
    - add configure option to disable valgrind
v3: - fix typo in meson_options (Nicholas)
v4: - Remove dead code (Eric)
    - Remove change to generator that was from v0 (Eric)
    - replace if chain with loop (Eric)
    - Fix typos (Eric)
    - define HAVE_DLOPEN for both libdl and builtin dl cases (Eric)
v5: - rebase on util string buffer implementation

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v4)

commit | commitdiff | tree

Dylan Baker [Wed, 20 Sep 2017 17:52:40 +0000 (10:52 -0700)]

util/ralloc: Don't define assert with magic member without DEBUG

It is possible to have DEBUG disabled but asserts on (NDEBUG), which
cannot build because these asserts work on members that are only present
when DEBUG is on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>

commit | commitdiff | tree

Dylan Baker [Tue, 19 Sep 2017 18:46:16 +0000 (11:46 -0700)]

intel: use a flag instead of setting PYTHONPATH

Meson doesn't allow setting environment variables for custom targets, so
we either need to not pass this as an environment variable or use a
shell script to wrap the invocation. The chosen solution has the
advantage of working for both autotools and meson.

v2: - put rules back in top scope (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>

commit | commitdiff | tree

Marek Olšák [Wed, 27 Sep 2017 14:53:26 +0000 (16:53 +0200)]

st/dri: don't expose modifiers in EGL if the driver doesn't implement them

This unbreaks waffle/gbm (piglit/gbm) which fails initialization.

v2: also don't set queryDmaBufFormats

Reviewed-by: Daniel Stone <daniel@fooishbar.org>

commit | commitdiff | tree

Jason Ekstrand [Tue, 26 Sep 2017 16:42:56 +0000 (09:42 -0700)]

vulkan/wsi/wayland: Return better error messages

Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jason Ekstrand [Tue, 26 Sep 2017 16:20:47 +0000 (09:20 -0700)]

vulkan/wsi/wayland: Copy wl_proxy objects from oldSwapchain if available

This should save us some round trips while resizing.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jason Ekstrand [Tue, 26 Sep 2017 15:30:22 +0000 (08:30 -0700)]

vulkan/wsi/wayland: Stop caching Wayland displays

We originally implemented caching to avoid unneeded round-trips to the
compositor when querying surface capabilities etc. to set up the
swapchain.  Unfortunately, this doesn't work if vkDestroyInstance is
called after the Wayland connection has been dropped.  In this case, we
end up trying to clean up already destroyed wl_proxy objects which leads
to crashes.  In particular most of dEQP-VK.wsi.wayland is crashing
thanks to this problem.

This commit gets rid of the cache and simply embeds the wsi_wl_display
struct in the swapchain.  While we're at it, we can get rid of the
wl_event_queue that we were storing in the swapchain because we can just
use the one in the embedded wsi_wl_display.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Bugzilla: https://bugs.freedesktop.org/102578
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jason Ekstrand [Fri, 22 Sep 2017 20:08:15 +0000 (13:08 -0700)]

vulkan/wsi/wayland: Refactor wsi_wl_display code

We convert it over to an inti/finish model and make create/destroy
wrappers for the former.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jan Vesely [Wed, 20 Sep 2017 20:06:10 +0000 (16:06 -0400)]

clover: Query and export int64 atomics

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

commit | commitdiff | tree

Adam Jackson [Tue, 26 Sep 2017 20:38:31 +0000 (16:38 -0400)]

glx: Be more tolerant in glXImportContext (v2)

Ugh the GLX code. __GLX_MAX_CONTEXT_PROPS is 3 because glxproto.h is
just a pile of ancient runes, so when the server begins sending more
than 3 context properties this code refuses to work _at all_. Which is
all just silly. If _XReply succeeds, it will have buffered the whole
reply, we can just walk through each property one at a time.

v2: Now with no arbitrary limits. (Eric Anholt)

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Tomasz Figa [Thu, 10 Aug 2017 13:59:45 +0000 (22:59 +0900)]

egl/dri2: Implement swapInterval fallback in a conformant way (v2)

dri2_fallback_swap_interval() currently used to stub out swap interval
support in Android backend does nothing besides returning EGL_FALSE.
This causes at least one known application (Android Snapchat) to fail
due to an unexpected error and my loose interpretation of the EGL 1.5
specification justifies it. Relevant quote below:

    The function

        EGLBoolean eglSwapInterval(EGLDisplay dpy, EGLint interval);

    specifies the minimum number of video frame periods per buffer swap
    for the draw surface of the current context, for the current rendering
    API. [...]

    The parameter interval specifies the minimum number of video frames
    that are displayed before a buffer swap will occur. The interval
    specified by the function applies to the draw surface bound to the
    context that is current on the calling thread. [...] interval is
    silently clamped to minimum and maximum implementation dependent
    values before being stored; these values are defined by EGLConfig
    attributes EGL_MIN_SWAP_INTERVAL and EGL_MAX_SWAP_INTERVAL
    respectively.

    The default swap interval is 1.

Even though it does not specify the exact behavior if the platform does
not support changing the swap interval, the default assumed state is the
swap interval of 1, which I interpret as a value that eglSwapInterval()
should succeed if called with, even if there is no ability to change the
interval (but there is no change requested). Moreover, since the
behavior is defined to clamp the requested value to minimum and maximum
and at least the default value of 1 must be present in the range, the
implementation might be expected to have a valid range, which in case of
the feature being unsupported, would correspond to {1} and any request
might be expected to be clamped to this value.

Fix this by defaulting dri2_dpy's min_swap_interval, max_swap_interval
and default_swap_interval to 1 in dri2_setup_screen() and let platforms,
which support this functionality set their own values after this
function returns. Thanks to patches merged earlier, we can also remove
the dri2_fallback_swap_interval() completely, as with a singular range
it would not be called anyway.

v2: Remove dri2_fallback_swap_interval() completely thanks to higher
    layer already clamping the requested interval and not calling the
    driver layer if the clamped value is the same as current.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

commit | commitdiff | tree

Marek Olšák [Mon, 18 Sep 2017 16:04:25 +0000 (18:04 +0200)]

gallium/radeon: consolidate PIPE_BIND_SHARED/SCANOUT handling

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Wed, 27 Sep 2017 07:30:46 +0000 (09:30 +0200)]

radeonsi: remove useless check in si_blit_decompress_color()

That's unnecessary to double-check that dcc_offset is not 0
because all callers already check that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Wed, 27 Sep 2017 07:29:27 +0000 (09:29 +0200)]

gallium/radeon: more use of vi_dcc_formats_are_incompatible()

Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 26 Sep 2017 21:26:20 +0000 (23:26 +0200)]

radv: store the amount of saved constants in the compute state

It's safer and more elegant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 26 Sep 2017 21:26:19 +0000 (23:26 +0200)]

radv: remove useless radv_meta_{begin,end}_XXX() helpers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

George Kyriazis [Mon, 25 Sep 2017 17:58:18 +0000 (12:58 -0500)]

swr: Remove unneeeded comparison

No need to check if screen->pipe != pipe, so we can just assign it. Just do it.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

commit | commitdiff | tree

George Kyriazis [Thu, 14 Sep 2017 02:06:44 +0000 (21:06 -0500)]

swr: Handle resource across context changes

Swr caches fb contents in tiles.  Those tiles are stored on a per-context
basis.

When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).

The context does not get any dirty bits to identify this case.  This has
to be, then, coordinated by the resources that are being shared between
the contexts.

Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived().  At that time,
we invalidate the tiles of the new context.  The old context, will need to
have already store its tiles by that time, which happens during glFlush().
glFlush() is being called at the beginning of MakeCurrent.

So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
  of all bound surfaces of the old context.
- After the store, a fence will guarantee that the all tile store make
  it to the surface
- During swr_update_derived(), when we validate the new context, we check
  all resources to see what changed, and if so, we invalidate the
  current tiles.

Fixes rendering problems with CEI/Ensight.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Fri, 22 Sep 2017 19:44:36 +0000 (12:44 -0700)]

vulkan/wsi/wayland: Stop printing out the DRM device

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Kenneth Graunke [Sun, 24 Sep 2017 19:42:52 +0000 (12:42 -0700)]

i965: Support copy propagating of untyped atomic surface indexes.

In the vec4 backend, SHADER_OPCODE_UNTYPED_ATOMIC's src[1] is the
surface index. We want to copy propagate so we can use an immediate
message descriptor, rather than an indirect send.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Kenneth Graunke [Sun, 24 Sep 2017 21:24:53 +0000 (14:24 -0700)]

i965/vec4: Fix swizzles on atomic sources.

Atomic operation sources are scalar values, but we were failing to
select the .x component of the second operand.  For example,

   atomicCounterCompSwapARB(counter, 5u, 10u)

would generate

   mov(8) vgrf4.x:D, 5D
   mov(8) vgrf5.x:D, 10D

   mov(8) vgrf9.x:UD, vgrf4.xyzw:D
   mov(8) vgrf9.y:UD, vgrf5.xyzw:D

which wrongly selects the .y component of vgrf5, so the actual 10u value
would get dead code eliminated.  The swizzle works for the other source,
but both of them ought to be .xxxx.

Fixes the compare and swap CTS tests in:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase

Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Kenneth Graunke [Sun, 24 Sep 2017 19:23:34 +0000 (12:23 -0700)]

i965/vec4: Actually handle atomic op intrinsics.

Embarassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.

Fixes: 40dd45d0c6aa4a9d (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Kenneth Graunke [Mon, 25 Sep 2017 02:59:41 +0000 (19:59 -0700)]

i965: Convert brw->*_program into a brw->programs[i] array.

This makes it easier to loop over programs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

commit | commitdiff | tree

Eric Anholt [Sun, 17 Sep 2017 19:14:20 +0000 (12:14 -0700)]

anv: Fix some comment typos.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eric Anholt [Sat, 21 Jan 2017 05:56:27 +0000 (16:56 +1100)]

gallium: Weaken assertion about u_mm's align2 field.

vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb
here was too small for that purpose. Allowing any valid align2 value that
u_mm's 32-bit addressing can represent will still catch most cases of
people passing in a byte alignment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Eric Anholt [Fri, 8 Sep 2017 22:30:00 +0000 (15:30 -0700)]

intel/genxml: Convert a not-present-or-"1" dict to a set.

I was implementing the same enum support in broadcom's gen_pack_header.py,
and did this same simplification there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Boris Brezillon [Tue, 26 Sep 2017 07:48:37 +0000 (09:48 +0200)]

broadcom/vc4: Fix infinite retry in vc4_bo_alloc()

cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.

Fix that by moving the cleared_and_retried variable definitions at the
beginning of the function. While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: 78087676c98aa8884ba92 "vc4: Restructure the simulator mode."

commit | commitdiff | tree

Eric Anholt [Fri, 28 Jul 2017 01:11:19 +0000 (18:11 -0700)]

broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.

I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).

To avoid that, move the shadow resource to the vc4_sampler_view struct.

Fixes: f0ecd36ef8e1 ("st/mesa: add an entirely separate codepath for setting up buffer views")

commit | commitdiff | tree

Samuel Pitoiset [Tue, 26 Sep 2017 14:52:06 +0000 (16:52 +0200)]

radv: fix saved compute state when doing statistics/occlusion queries

We are pushing 16-bytes of constants, so we have to save/restore
the same amount of data to avoid data corruption.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Daniel Stone [Fri, 22 Sep 2017 22:40:59 +0000 (15:40 -0700)]

Revert "wayland-drm: constify the callbacks struct"

The wayland-drm callback struct is referenced, rather than duplicated,
inside wayland-drm. Constifying this struct involved moving it on to the
stack; as a result, starting any EGL client on Wayland called into
random stack memory, and killed the compositor.

This reverts commit 1d0be5b3fe548ee33d4520092f583c76d42510a6 and
39d539e321c6c97433a15660c9d9a20ad8657ff0.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Krzysztof Sobiecki <sobkas@gmail.com>
Fixes: 1d0be5b3fe54 ("wayland-drm: constify the callbacks struct")

commit | commitdiff | tree

Brian Paul [Mon, 25 Sep 2017 20:10:53 +0000 (14:10 -0600)]

svga: silence unused var warning in optimized build with MAYBE_UNUSED

Trivial

commit | commitdiff | tree

Thomas Helland [Sat, 20 May 2017 22:21:54 +0000 (00:21 +0200)]

glcpp: Avoid unnecessary call to strlen

Length of the token was already calculated by flex and stored in yyleng,
no need to implicitly call strlen() via linear_strdup().

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
V2: Also convert this pattern in glsl_lexer.ll

V3: Remove a misplaced comment

V4: Use a temporary char to avoid type change
Remove bogus +1 on length check of identifier

commit | commitdiff | tree

Thomas Helland [Sat, 20 May 2017 20:50:09 +0000 (22:50 +0200)]

glcpp: Use string_buffer for line continuation removal

Migrate removal of line continuations to string_buffer. Before this
it used ralloc_strncat() to append strings, which internally
each time calculates strlen() of its argument. Its argument is
entire shader, so it multiple time scans the whole shader text.

Signed-off-by: Vladislav Egorov <vegorov180@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
V2: Adapt to different API of string buffer (Thomas Helland)

commit | commitdiff | tree

Thomas Helland [Fri, 19 May 2017 22:14:52 +0000 (00:14 +0200)]

glsl: Change the parser to use the string buffer

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
V2: Pointed out by Timothy
   - Fix pp.c reralloc size issue and comment

V3 - Use vprintf instead of printf where we should
   - Fixes failing make-check tests

V4 - Use buffer_append_char in a couple places
   - Use append_char in even more places

commit | commitdiff | tree

Thomas Helland [Fri, 19 May 2017 20:07:17 +0000 (22:07 +0200)]

util: Add tests for the string buffer

More tests could probably be added, but this should cover
concatenation, resizing, clearing, formatted printing,
and checking the length, so it should be quite complete.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
V2: Address review feedback from Timothy, plus fixes
   - Use a large enough char array
   - Actually test the formatted appending
   - Test that clear function resets string length

V3: Port to gtest

V4: Fix test makefile
    Fix copyright header
    Fix missing extern C
    Use more appropriate name for C-file
    Add tests for append_char

commit | commitdiff | tree

Thomas Helland [Mon, 15 May 2017 19:36:52 +0000 (21:36 +0200)]

util: Add a string buffer implementation

Based on Vladislav Egorovs work on the preprocessor, but split
out to a util functionality that should be universal. Setup, teardown,
memory handling and general layout is modeled around the hash_table
and the set, to make it familiar for everyone.

A notable change is that this implementation is always null terminated.
The rationale is that it will be less error-prone, as one might
access the buffer directly, thereby reading a non-terminated string.
Also, vsnprintf and friends prints the null-terminator.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: Address review feedback from Timothy and Grazvydas
   - Fix MINGW preprocessor check
   - Changed len from uint to int
   - Make string argument const in append function
   - Move to header and inline append function
   - Add crimp_to_fit function for resizing buffer

V3: Move include of ralloc to string_buffer.h

V4: Use u_string.h for a cross-platform working vsnprintf

V5: Remember to cast to char * in crimp function

V6: Address review feedback from Nicolai
   - Handle !str->buf in buffer_create
   - Ensure va_end is always called in buffer_append_all
   - Add overflow check in buffer_append_len
   - Do not expose buffer_space_left, just remove it
   - Clarify why a loop is used in vprintf, change to for-loop
   - Add a va_copy to buffer_vprintf to fix failure to append arguments
     when having to resize the buffer for vsnprintf.

V7: Address more review feedback from Nicolai
   - Add missing va_end corresponding to va_copy
   - Error check failure to allocate in crimp_to_fit

commit | commitdiff | tree

Timothy Arceri [Thu, 7 Sep 2017 13:29:25 +0000 (23:29 +1000)]

i965: make use of nir linking

For now linking is just removing unused varyings between stages.

shader-db results BDW:

total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
instructions in affected programs: 48325 -> 41730 (-13.65%)
helped: 473
HURT: 0

total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
cycles in affected programs: 213238 -> 187572 (-12.04%)
helped: 435
HURT: 8

V2:
- lower indirects on demoted inputs as well as outputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Timothy Arceri [Thu, 7 Sep 2017 03:42:17 +0000 (13:42 +1000)]

i965/nir: export nir_optimize

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

commit | commitdiff | tree

Timothy Arceri [Tue, 12 Sep 2017 07:45:47 +0000 (17:45 +1000)]

i965: call brw_shader_gather_info() from the callers of brw_create_nir()

This will allow us to insert a nir linking step in brw_link_shader().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

commit | commitdiff | tree

Timothy Arceri [Tue, 12 Sep 2017 07:30:53 +0000 (17:30 +1000)]

i965: create a brw_shader_gather_info() helper

This will help us call gather info at a later point and allow us
to do some linking in nir.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 7 Sep 2017 13:27:59 +0000 (23:27 +1000)]

nir: add some helpers for doing linking

The initial helpers add support for removing unused varyings between
stages.

V2:
- Moved the io mask helper function into this file rather than
nir.h so it's not used elsewhere considering it doesn't handle
all corner cases.
- Use bitmask rather than hash table to handle tcs outputs (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Timothy Arceri [Tue, 12 Sep 2017 03:18:29 +0000 (13:18 +1000)]

glsl: mark xfb varyings as always active

This will be used by the nir linking pass so that we don't remove
otherwise unused varyings.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

commit | commitdiff | tree

Timothy Arceri [Mon, 11 Sep 2017 06:19:22 +0000 (16:19 +1000)]

nir: add always_active_io to nir variable

Will be used in nir link pass to decided if we can remove a varying
or not.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

commit | commitdiff | tree

Marek Olšák [Wed, 13 Sep 2017 00:26:26 +0000 (02:26 +0200)]

r600: fork and import gallium/radeon

This marks the end of code sharing between r600 and radeonsi.
It's getting difficult to work on radeonsi without breaking r600.

A lot of functions had to be renamed to prevent linker conflicts.
There are also minor cleanups.

Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Kenneth Graunke [Thu, 21 Sep 2017 20:30:47 +0000 (13:30 -0700)]

i965: Rename do_flush_locked to submit_batch().

do_flush_locked isn't a great name - especially given that there's no
locking going on in our code relating to execbuf.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

commit | commitdiff | tree

Kenneth Graunke [Thu, 21 Sep 2017 20:45:27 +0000 (13:45 -0700)]

i965: Use atomic ops in get_new_program_id().

We have a nice utility function for this, which eliminates the need for
locking stuff. This isn't really performance critical, but it's less
code to use the atomic.

p_atomic_inc_return does pre-increment rather than post-increment, so we
change screen->program_id to be initialized to 0 instead of 1. At which
point, we can just delete the initialization because intel_screen is
rzalloc'd.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

commit | commitdiff | tree

Kenneth Graunke [Thu, 21 Sep 2017 20:43:30 +0000 (13:43 -0700)]

i965: Convert brw_bufmgr to use C11 mutexes instead of pthreads.

There's no real advantage or disadvantage here, it's just for stylistic
consistency with the rest of the codebase.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

commit | commitdiff | tree

Kenneth Graunke [Mon, 25 Sep 2017 02:20:43 +0000 (19:20 -0700)]

i965: Delete dead meta stencil blit program fields from brw_context.

These have been unused for a while now.

commit | commitdiff | tree

Tim Rowley [Wed, 20 Sep 2017 16:50:32 +0000 (11:50 -0500)]

swr/rast: Handle instanceID offset / Instance Stride enable

Supported in JitGatherVertices(); FetchJit::JitLoadVertices() may require
similar changes, will need address this if it is determined that this
path is still in use.

Handle Force Sequential Access in FetchJit::Create.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

commit | commitdiff | tree

Tim Rowley [Tue, 19 Sep 2017 23:19:53 +0000 (18:19 -0500)]

swr/rast: Remove code supporting legacy llvm (<3.9)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

RSS Atom