Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: avoid emitting redundant SetVertexBuffers() commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 5 Jan 2016 20:03:04 +0000 (13:03 -0700)]
svga: check for no-ops in svga_bind_sampler_states()
and svga_set_sampler_views(). If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Ilia Mirkin [Tue, 5 Jan 2016 04:28:52 +0000 (23:28 -0500)]
i965: quieten compiler warning about out-of-bounds access
gcc 4.9.3 shows the following error:
brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
return brw_names[slot - VARYING_SLOT_MAX];
This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Julien Isorce [Thu, 9 Apr 2015 12:45:17 +0000 (13:45 +0100)]
build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch define the macro the same it is done for dri and
vdpau targets.
Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11
LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG4Simple : VAEntrypointVLD
VAProfileMPEG4AdvancedSimple : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Baseline : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Wed, 23 Dec 2015 09:25:53 +0000 (09:25 +0000)]
nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Wed, 23 Dec 2015 09:25:52 +0000 (09:25 +0000)]
nouveau: split nouveau_vp3_bsp in begin/next/end
It allows to call nouveau_vp3_bsp_next multiple times
between one begin/end.
It is required to support st/va.
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Julien Isorce [Tue, 5 Jan 2016 15:02:47 +0000 (15:02 +0000)]
st/va: count number of slices
The counter was not set but used by the nouveau driver.
It is required otherwise visual output is garbage.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Ilia Mirkin [Tue, 5 Jan 2016 00:57:11 +0000 (19:57 -0500)]
i965/wm: use binding size for ubo/ssbo when automatic size is unset
This fixes the same tests that commit
8cf2e892f was attempting to fix:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
as confirmed by Samuel.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Ilia Mirkin [Tue, 5 Jan 2016 00:48:08 +0000 (19:48 -0500)]
Revert "i965/wm: use proper API buffer size for the surfaces."
This reverts commit
8cf2e892fca20c4776b4a07c39918343cb2d4e0e. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Nicolai Hähnle [Mon, 4 Jan 2016 22:31:05 +0000 (17:31 -0500)]
st/mesa: make KHR_debug output independent of context creation flags (v2)
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.
v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 4 Jan 2016 16:26:27 +0000 (11:26 -0500)]
nvc0: scale up inter_bo size so that it's 16M for a 4K video
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 4 Jan 2016 16:16:45 +0000 (11:16 -0500)]
nv50,nvc0: fix crash when increasing bsp bo size for h264
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Samuel Iglesias Gonsálvez [Tue, 15 Dec 2015 11:51:48 +0000 (12:51 +0100)]
i965/wm: use proper API buffer size for the surfaces.
Commit
5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. Hovewer it picked the wrong value.
This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Marek Olšák [Mon, 28 Dec 2015 00:39:20 +0000 (01:39 +0100)]
radeonsi: remove unused parameter from si_shader_binary_read_config
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 22:22:14 +0000 (23:22 +0100)]
radeonsi: move si_shader_binary_upload out of si_shader_binary_read
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:16:05 +0000 (22:16 +0100)]
gallium/radeon: dump LLVM module outside of radeon_llvm_compile
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 20:57:40 +0000 (21:57 +0100)]
gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5
It's the same behavior that we use for later LLVM.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 20:24:47 +0000 (21:24 +0100)]
gallium/radeon: r600_can_dump_shader should get TGSI processor type directly
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:24:41 +0000 (22:24 +0100)]
radeonsi: pass TGSI processor type to si_shader_binary_read for dumping
the parameter will be used later
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 27 Dec 2015 21:22:24 +0000 (22:22 +0100)]
radeonsi: pass TGSI processor type to si_compile_llvm for dumping
the parameter will be used later
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 30 Dec 2015 14:04:26 +0000 (15:04 +0100)]
radeonsi: rename shader parameter definitions and variables for more clarity
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:56 +0000 (02:52 -0400)]
nvc0/ir: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:57 +0000 (02:52 -0400)]
st/mesa: use PK2H/UP2H when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Sat, 2 Jan 2016 23:55:48 +0000 (18:55 -0500)]
gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Thu, 29 Oct 2015 21:52:46 +0000 (17:52 -0400)]
tgsi: update PK2H/UP2H channel behavior info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Thu, 29 Oct 2015 06:52:55 +0000 (02:52 -0400)]
gallium: document PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Samuel Pitoiset [Sun, 3 Jan 2016 17:40:39 +0000 (18:40 +0100)]
st/mesa: fix parameter names for tesseval/tessctrl prototypes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 3 Jan 2016 16:29:09 +0000 (11:29 -0500)]
nouveau: fix double-const qualifier
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Rob Clark [Sat, 24 Oct 2015 18:54:56 +0000 (14:54 -0400)]
freedreno/ir3: use NIR_PASS helper macros
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 18 Nov 2015 21:33:41 +0000 (16:33 -0500)]
nir: extract out helper macros for running passes
Note these are a bit uglier, due to avoidance of GNU C extensions. But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Mon, 26 Oct 2015 14:50:35 +0000 (10:50 -0400)]
freedreno/ir3: we require block_index metadata
Found during NIR_TEST_CLONE=1 piglit run. We were using block->index
but forgetting to require it. Causing things to not work with a cloned
shader which didn't preserve block_index.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 24 Oct 2015 18:30:31 +0000 (14:30 -0400)]
freedreno/ir3: refactor NIR IR handling
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 21 Dec 2015 15:21:29 +0000 (10:21 -0500)]
freedreno/ir3: drop unnecessary unreachable() case
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Samuel Pitoiset [Wed, 25 Nov 2015 00:19:16 +0000 (01:19 +0100)]
gallium/tests: fix build with clang compiler
Nested functions are supported as an extension in GNU C, but Clang
don't support them.
This fixes compilation errors when (manually) building compute.c,
or by setting --enable-gallium-tests to the configure script.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Samuel Pitoiset [Wed, 9 Dec 2015 18:53:18 +0000 (19:53 +0100)]
nv50,nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% case because usually coherent
buffers are not so used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Sat, 2 Jan 2016 06:27:22 +0000 (22:27 -0800)]
i965: Make TCS precompile use the TES primitive mode when available.
If there's a linked TES program, we should just use the actual
primitive mode. If not, just guess triangles (as we did before).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Mon, 28 Dec 2015 01:26:30 +0000 (17:26 -0800)]
i965: Push most TES inputs in SIMD8 mode.
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.
However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.
Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.
Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).
Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 28 Dec 2015 00:14:11 +0000 (16:14 -0800)]
i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.
We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.
All input loads can use the same header - we don't need to re-expand
g0 every time. CSE accomplishes this, saving instructions.
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0
total cycles in shared programs:
1009850 ->
1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 31 Dec 2015 20:47:19 +0000 (12:47 -0800)]
i965: Move 3-src subnr swizzle handling into the vec4 backend.
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums. When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.
In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum. This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].
This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.
This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend. I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Sun, 3 Jan 2016 01:33:19 +0000 (17:33 -0800)]
vc4: Fix build from upload changes.
Nicolai Hähnle [Sat, 2 Jan 2016 21:40:47 +0000 (16:40 -0500)]
gallium/radeon: send LLVM diagnostics as debug messages
Diagnostics sent during code generation and the every error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.
Removed an fprintf that could never be triggered.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 21:00:56 +0000 (16:00 -0500)]
gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)
This will allow us to send shader debug info via the context's debug callback.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 2 Jan 2016 21:30:57 +0000 (16:30 -0500)]
radeonsi: send shader info as debug messages in addition to stderr output
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.
Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 20:02:57 +0000 (15:02 -0500)]
radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 30 Dec 2015 19:55:34 +0000 (14:55 -0500)]
gallium/radeon: implement set_debug_callback
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:54:31 +0000 (17:54 +0100)]
u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:43:48 +0000 (17:43 +0100)]
u_upload_mgr: remove alignment parameter from u_upload_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_buffer manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_data manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 16:15:02 +0000 (17:15 +0100)]
u_upload_mgr: pass alignment to u_upload_alloc manually
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 19 Dec 2015 15:44:52 +0000 (16:44 +0100)]
u_upload_mgr: rework the application of alignment
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.
Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.
This a prerequisite for allowing a variable alignment for suballocations.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 6 Dec 2015 12:36:57 +0000 (13:36 +0100)]
st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.
v2: don't remove st_upload_constants calls, clarify why they're needed
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Sun, 6 Dec 2015 12:31:25 +0000 (13:31 +0100)]
program: add _mesa_reserve_parameter_storage
The next commit will use this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Jordan Justen [Sat, 2 Jan 2016 00:58:49 +0000 (16:58 -0800)]
mesa: Fix warning with MESA_VERBOSE=api for BindBufferRange
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Ilia Mirkin [Fri, 1 Jan 2016 01:33:15 +0000 (20:33 -0500)]
nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Thu, 31 Dec 2015 21:22:40 +0000 (16:22 -0500)]
st/mesa: sort extensions enablement array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Rob Clark [Fri, 1 Jan 2016 17:52:22 +0000 (12:52 -0500)]
nir/lower_clip: add missing writemask on store
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jordan Justen [Wed, 30 Dec 2015 21:08:07 +0000 (13:08 -0800)]
mesa: Add MESA_VERBOSE=api for GL_ARB_program_interface_query
v2:
* Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Tue, 29 Dec 2015 22:24:09 +0000 (14:24 -0800)]
mesa: Add MESA_VERBOSE=api for several indexed BindBuffer variants
v2:
* Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:19 +0000 (14:43 +1000)]
st/glsl_to_tgsi: fix block movs for doubles
While playing with fp64, I disable varying packing to debug
something else, and noticed we never emitted half the output
movs for double matrix arrays.
We should be moving the left index two slots for dual
source doubles, and the right index two slots for non-vs
input doubles.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:18 +0000 (14:43 +1000)]
st/glsl_to_tgsi: handle different attrib size
vertex inputs are counted differently in some cases, with
vertex inputs we need to make sure we don't double count them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:17 +0000 (14:43 +1000)]
st/glsl_to_tgsi: readd the double_reg2 for input index mapping
Otherwise we end up emitting the wrong index for the second
double.
This fixes dmat-vs-gs-tcs-tes.shader_test and dvec3-vs-gs-tcs-tes.shader_test
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:16 +0000 (14:43 +1000)]
st/glsl_to_tgsi: when doing reladdr get vec4 of correct type
This fixes fp64 relative addressing, in the upcoming
dmat-vs-gs-tcs-tes.shader_test.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:15 +0000 (14:43 +1000)]
st/glsl_to_tgsi: handle double immediates in matrices properly.
This handles matrix initialisation properly.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:14 +0000 (14:43 +1000)]
st/glsl_to_tgsi: setup writemask for double arrays and matricies.
It's important for the double instruction emission code that
the writemasks are correct going in for double so it know
which channels to replicate.
This fixes it for the array and matrix cases.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:13 +0000 (14:43 +1000)]
st/glsl_to_tgsi: handle doubles in array shrinking code.
This code takes into account double inputs in the array
shrinking code. This fixes some issues with doubles
and geom/tess inputs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:12 +0000 (14:43 +1000)]
st/glsl_to_tgsi: handle doubles outputs in arrays.
This handles the case where a double output is stored
in an array, and tracks it for use in the double
instruction emit code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 19 Dec 2015 04:43:11 +0000 (14:43 +1000)]
st/glsl_to_tgsi: store if dst is double in array
This is just a precursor patch to a fix for doubles with
tessellation that I've written.
We need to descend into output arrays in that case and
mark dst's as double.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Wed, 30 Dec 2015 10:53:08 +0000 (02:53 -0800)]
nvc0: Set winding order regardless of domain.
Quads need to respect winding order, too - not just triangles.
Fixes rendering in GFXBench 4.0's tessellation benchmark.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Wed, 30 Dec 2015 10:33:00 +0000 (02:33 -0800)]
glsl: Fix varying struct locations when varying packing is disabled.
varying_matches::record tries to compute the number of components in
each varying, which varying_matches::assign_locations uses to assign
locations. With varying packing, it uses glsl_type::component_slots()
to come up with a reasonable value.
Without varying packing, it fell back to an open-coded computation
that didn't bother to handle structs at all. I believe we can simply
use 4 * glsl_type::count_attribute_slots(false), which already handles
these cases correctly.
Partially fixes rendering in GFXBench 4.0's tessellation benchmark.
(NVE0 is almost right after this, but i965 is still mostly garbage.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Mon, 28 Dec 2015 22:20:28 +0000 (14:20 -0800)]
drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't
specify which fragment shader output is which, so there's at best a
50/50 chance of us guessing it correctly. This is invalid.
Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago,
but hasn't actually released them for whatever reason. So, add the
workaround back so that it works for most people.
Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge. For whatever
reason, Broadwell worked. 4.1 and 1.1 have always worked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Ilia Mirkin [Wed, 30 Dec 2015 23:47:18 +0000 (18:47 -0500)]
glsl: add GL_ARB_shader_draw_parameters define
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Ilia Mirkin [Wed, 30 Dec 2015 19:50:02 +0000 (14:50 -0500)]
nvc0: add ARB_shader_draw_parameters support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 29 Dec 2015 21:39:16 +0000 (16:39 -0500)]
st/mesa: add GL_ARB_shader_draw_parameters support
Hooks up the new system values, passes the drawid in.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Tue, 29 Dec 2015 22:00:05 +0000 (17:00 -0500)]
gallium: add a drawid to pipe_draw_info
This will allow the state tracker to inform the driver where in a
broken-up multidraw we currently are. This can then be passed into the
vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Tue, 29 Dec 2015 21:49:32 +0000 (16:49 -0500)]
gallium: add PIPE_CAP_DRAW_PARAMETERS
This allows the state tracker to know that the various draw parameters
are available in vertex shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Tue, 29 Dec 2015 21:37:19 +0000 (16:37 -0500)]
gallium: add baseinstance/drawid semantics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Mon, 7 Dec 2015 04:49:48 +0000 (23:49 -0500)]
nv50/ir: attempt to do more constant folding on mad -> add conversion
The add might actually have a 0 as an argument, which would convert it
into a mov. Make sure to detect that. Also avoid the hack of putting the
immediate directly into the instruction, instead use a mov to put it
into place and let the later LoadPropagation pass place it if possible.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marta Lofstedt [Tue, 29 Dec 2015 15:15:45 +0000 (16:15 +0100)]
i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+
The imulExtended tests of the shader bitfield tests of the
OpenGL ES 3.1 CTS, fail on gen8+, when BRW_REGISTER_TYPE_W
is used for SHADER_OPECODE_MULH.
Also, remove unused helper function:
static inline bool type_is_signed(unsigned type)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Timothy Arceri [Tue, 29 Dec 2015 10:02:56 +0000 (21:02 +1100)]
glsl: tidy up struct with a single member
There used to be more members but they now share other fields
in order to keep memory use low.
Also making the naming more generic will allow us to reuse the
field for explicit byte offsets within blocks for
ARB_enhanced_layouts.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Emil Velikov [Tue, 29 Dec 2015 10:02:55 +0000 (21:02 +1100)]
glsl/linker: annotate static functions as such
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Emil Velikov [Tue, 29 Dec 2015 10:02:54 +0000 (21:02 +1100)]
glsl: annotate ast_process_struct_or_iface_block_members() as static
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Jason Ekstrand [Tue, 29 Dec 2015 17:56:44 +0000 (09:56 -0800)]
nir/builder: Add an init function that creates a simple shader for you
A hugely common case when using nir_builder is to have a shader with a
single function called main. This adds a helper that gives you just that.
This commit also makes us use it in the NIR control-flow unit tests as well
as tgsi_to_nir and prog_to_nir.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Kristian Høgsberg Kristensen [Tue, 29 Dec 2015 19:14:07 +0000 (11:14 -0800)]
mesa/st: Pad out _mesa_sysval_to_semantic for new SYSTEM_VALUE_* enums
GL_ARB_shader_draw_parameters added two new system values. This gets us
back to mapping mesa system values to the right TGSI semantics.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 29 Dec 2015 20:05:34 +0000 (15:05 -0500)]
nv50/ir: float(s32 & 0xff) = float(u8), not s8
Make sure to make conversion unsigned when we're ANDing the high bits
away. Fixes corruption in dolphin.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Kristian Høgsberg Kristensen [Tue, 15 Dec 2015 07:36:06 +0000 (23:36 -0800)]
i965: Reemit vertex state between indirect multi draws
If we're doing an indirect draw, prims[i].basevertex is always 0 and the
real base vertex value is in the indirect parameter buffer. We try to
avoid flagging BRW_NEW_VERTICES if prims[i].basevertex doesn't change,
which then breaks down for indirect draws. Thus, if a program uses base
vertex or base instance, and the draw call is indirect, always flag
BRW_NEW_VERTICES. A new piglit test,
spec/ARB_shader_draw_parameters/drawid-indirect-vertexid tests this.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kristian Høgsberg Kristensen [Tue, 15 Dec 2015 01:44:23 +0000 (17:44 -0800)]
nir: Teach nir_opt_algebraic about adding and subtracting the same thing
This optimizes a + b - b to just a. Modest shader-db results (BDW):
total instructions in shared programs:
7842452 ->
7841862 (-0.01%)
instructions in affected programs: 61938 -> 61348 (-0.95%)
total loops in shared programs: 2131 -> 2131 (0.00%)
helped: 263
HURT: 0
GAINED: 0
LOST: 0
but the optimization turns
gl_VertexID - gl_BaseVertexARB
into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the
i965 hardware supports natively. That means we can avoid using the
internal vertex buffer for gl_BaseVertexARB in this case.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kristian Høgsberg Kristensen [Thu, 10 Dec 2015 20:27:38 +0000 (12:27 -0800)]
i965: Add support for gl_DrawIDARB and enable extension
We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kristian Høgsberg Kristensen [Thu, 10 Dec 2015 20:24:50 +0000 (12:24 -0800)]
i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB
We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kristian Høgsberg Kristensen [Thu, 10 Dec 2015 20:10:28 +0000 (12:10 -0800)]
i965: Assert that SYSTEM_VALUE_VERTEX_ID gets lowered
fs_visitor::emit_vs_system_value() looks like it's trying to handle
SYSTEM_VALUE_VERTEX_ID, but we should never see that value in the
backend.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kristian Høgsberg Kristensen [Thu, 10 Dec 2015 20:07:43 +0000 (12:07 -0800)]
mesa: Add core mesa support for GL_ARB_shader_draw_parameters
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kristian Høgsberg Kristensen [Thu, 10 Dec 2015 20:06:17 +0000 (12:06 -0800)]
mesa/vbo: Add draw_id field to struct _mesa_prim
The drivers will need this for passing in gl_DrawIDARB. For indirect
multidraw calls, we get the prim array and prim[i].draw_id == i and is
redundant. But for non-indirect calls, we get one primitive at a time
and need the draw_id field.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aaron Watry [Tue, 29 Dec 2015 16:51:54 +0000 (10:51 -0600)]
nir: Remove function overload in control flow test
Fixes make check.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nicolai Hähnle [Tue, 15 Dec 2015 01:41:15 +0000 (20:41 -0500)]
radeonsi: add RADEON_REPLACE_SHADERS debug option
This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occuring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).
v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 15 Dec 2015 00:34:45 +0000 (19:34 -0500)]
radeonsi: count compilations in si_compile_llvm
This changes the count slightly (because of si_generate_gs_copy_shader), but
this is only relevant for the driver-specific num-compilations query. It sets
the stage for the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 21 Dec 2015 21:11:37 +0000 (16:11 -0500)]
gallium/util: add DEBUG_GET_ONCE_OPTION
This is analogous to the alreading existing macros for BOOL, NUM, and FLAGS.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Grazvydas Ignotas [Tue, 22 Dec 2015 02:12:07 +0000 (04:12 +0200)]
r600: fix constant buffer size programming
When buffer size is less than 16, zero ends up being programmed as
size, which prevents the hardware from fetching the correct values.
Fix it by combining shift and align so that the value is always
rounded up.
Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92229
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Kenneth Graunke [Fri, 25 Dec 2015 01:19:14 +0000 (17:19 -0800)]
docs: Mark ARB_tessellation_shader as done on all i965 platforms.
We now support all Intel GPUs which can do tessellation.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Thu, 26 Nov 2015 01:56:33 +0000 (17:56 -0800)]
i965: Enable ARB_tessellation_shader on Gen7-7.5.
We've resolved all the GPU hangs, and everything seems to be working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Thu, 24 Dec 2015 21:09:26 +0000 (13:09 -0800)]
i965: Don't set interleave or complete on TCS EOT message.
Setting interleave on the TCS EOT message causes Ivybridge hardware to
GPU hang like crazy. Individual tests would pass, but running even a
simple test like nop.shader_test in a loop would hang within 1-3 runs.
Adding sleep delays worked around the problem, somehow.
Interleave doesn't make much sense given that we only have one patch
URB handle, not two. Complete doesn't seem useful either.
There's no reason to actually set those bits. We were just being lazy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>