Sergii Romantsov [Wed, 5 Jun 2019 11:33:58 +0000 (14:33 +0300)]
intel/dri: finish proper glthread
KWin was able to get NULL-context in the call
intelUnbindContext. But a call _mesa_glthread_finish
is not resistent to such case.
Case can be catched with steps:
1. Create both glx and egl contexts
2. Make glx as current
3. Make egl as current
4. Reset glx context
5. Make egl as current
Solution adds proper finishing of glthread-context
(context will be taken from the requested dri-context
for unbinding, but not from the saved current context).
Piglit-test: https://gitlab.freedesktop.org/mesa/piglit/merge_requests/87
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110814
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111271
Fixes: dca36d5516d0 (i965: Implement threaded GL support)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Thu, 5 Sep 2019 11:57:11 +0000 (13:57 +0200)]
radv: Call nir_propagate_invariant()
Without this, invariant qualifiers don't do anything. Together with a
fix to the game, this fixes flickering in No Man's Sky.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Mon, 2 Sep 2019 10:00:44 +0000 (12:00 +0200)]
radeonsi/nir: Don't lower constant arrays to uniforms
shader-db results:
Totals:
SGPRS:
3955968 ->
3954960 (-0.03 %)
VGPRS:
2220220 ->
2220092 (-0.01 %)
Spilled SGPRs: 11387 -> 11325 (-0.54 %)
Spilled VGPRs: 97 -> 97 (0.00 %)
Private memory VGPRs: 2528 -> 2528 (0.00 %)
Scratch size: 2656 -> 2656 (0.00 %) dwords per thread
Code Size:
76002204 ->
75994988 (-0.01 %) bytes
LDS: 740 -> 740 (0.00 %) blocks
Max Waves: 772776 -> 772787 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 16840 -> 15832 (-5.99 %)
VGPRS: 16452 -> 16324 (-0.78 %)
Spilled SGPRs: 1416 -> 1354 (-4.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 2016 -> 2016 (0.00 %)
Scratch size: 2040 -> 2040 (0.00 %) dwords per thread
Code Size: 953624 -> 946408 (-0.76 %) bytes
LDS: 303 -> 303 (0.00 %) blocks
Max Waves: 1622 -> 1633 (0.68 %)
Wait states: 0 -> 0 (0.00 %)
There were a large number of regressions in code size, but they seem to
be because NIR unrolls some loop which results in the table being
replaced by a bunch of immediates on multiplies etc. -- this bloats code
size since the table size is now included, but means that there are less
loads so it's still a net positive.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Connor Abbott [Fri, 30 Aug 2019 15:57:18 +0000 (17:57 +0200)]
gallium: Plumb through a way to disable GLSL const lowering
For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Connor Abbott [Mon, 2 Sep 2019 09:57:34 +0000 (11:57 +0200)]
st/nir: Don't lower indirects when linking
I believe this was stuck here early because otherwise
nir_opt_copy_prop_vars could undo what lower_io_to_temporaries does.
However that has since been fixed. Also, we now use scratch for large
variables so the comment is stale.
On radeonsi these are the shader-db results:
Totals:
SGPRS:
3955968 ->
3955968 (0.00 %)
VGPRS:
2220208 ->
2220220 (0.00 %)
Spilled SGPRs: 11387 -> 11387 (0.00 %)
Spilled VGPRs: 97 -> 97 (0.00 %)
Private memory VGPRs: 2528 -> 2528 (0.00 %)
Scratch size: 2656 -> 2656 (0.00 %) dwords per thread
Code Size:
76002108 ->
76002204 (0.00 %) bytes
LDS: 740 -> 740 (0.00 %) blocks
Max Waves: 772779 -> 772776 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 176 -> 176 (0.00 %)
VGPRS: 144 -> 156 (8.33 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 12104 -> 12200 (0.79 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 28 -> 25 (-10.71 %)
Wait states: 0 -> 0 (0.00 %)
The few small regressions are due to nir_opt_large_constants kicking in
when indirect lowering happens to result in smaller code after
optimization since the array is very simple.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Connor Abbott [Wed, 4 Sep 2019 11:54:13 +0000 (13:54 +0200)]
st/nir: Call nir_remove_unused_variables() in the opt loop
This prevents regressions when disabling indirect lowering. Sometimes
the only use of an input array was copying it to the array created by
nir_lower_io_to_temporaries, and without lowering indirects we wouldn't
have eliminated the temporary array until after linking, which was too
late to remove unused code in the producer.
No shader-db changes with radeonsi NIR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Connor Abbott [Fri, 30 Aug 2019 14:08:47 +0000 (16:08 +0200)]
ac/nir: Enable nir_opt_large_constants
vkpipeline-db numbers:
Totals:
SGPRS:
1740306 ->
1741322 (0.06 %)
VGPRS:
1331124 ->
1331712 (0.04 %)
Spilled SGPRs: 21201 -> 21316 (0.54 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size:
79022628 ->
78694788 (-0.41 %) bytes
LDS: 6500 -> 6500 (0.00 %) blocks
Max Waves: 301413 -> 301302 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 53633 -> 54649 (1.89 %)
VGPRS: 53000 -> 53588 (1.11 %)
Spilled SGPRs: 3454 -> 3569 (3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size:
5284232 ->
4956392 (-6.20 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 4239 -> 4128 (-2.62 %)
Wait states: 0 -> 0 (0.00 %)
(The biggest VGPR and max wave regression is due to unrolling a loop,
which made the scheduler more aggressive, but in this case it's able to
effectively hide latency so it's actually probably a win.)
shader-db numbers with radeonsi NIR:
Totals:
SGPRS:
3526496 ->
3526512 (0.00 %)
VGPRS:
2198576 ->
2198576 (0.00 %)
Spilled SGPRs: 10463 -> 10463 (0.00 %)
Spilled VGPRs: 86 -> 86 (0.00 %)
Private memory VGPRs: 3182 -> 2528 (-20.55 %)
Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread
Code Size:
74117280 ->
74106140 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 775846 -> 775844 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 856 -> 872 (1.87 %)
VGPRS: 680 -> 680 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 654 -> 0 (-100.00 %)
Scratch size: 668 -> 0 (-100.00 %) dwords per thread
Code Size: 49652 -> 38512 (-22.44 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 182 -> 180 (-1.10 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Thu, 29 Aug 2019 15:28:01 +0000 (17:28 +0200)]
ac/nir: Support load_constant intrinsics
Setup a constant global variable that LLVM will stick in a .rodata
section and generate PC-relative loads for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Thu, 29 Aug 2019 15:15:46 +0000 (17:15 +0200)]
radv/radeonsi: Don't count read-only data when reporting code size
We usually use these counts as a simple way to figure out if a change
reduces the number of instructions or shrinks an instruction. However,
since .rodata sections aren't executed, we shouldn't be counting their
size for this analysis. Make the linker return the total executable
size, and use it to report the more useful size in both drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Heinrich Fink [Tue, 30 Jul 2019 12:59:41 +0000 (14:59 +0200)]
headers: remove redundant GL token from GL wrapper
Removing GL_FRAMEBUFFER_FLIP_Y_MESA token from glheader.h as it is now
provided by glext.h
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Heinrich Fink [Mon, 29 Jul 2019 13:35:19 +0000 (15:35 +0200)]
specs: Sync framebuffer_flip_y text with GL registry
Sync extension spec of MESA_framebuffer_flip_y to what has been merged
upstream in the GL registry. Update now carries the accepted GL
extension no.
v2: split GL headers update off to separate commit
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Heinrich Fink [Tue, 30 Jul 2019 11:26:46 +0000 (13:26 +0200)]
include: sync GL headers with registry
Integrating headers from upstream registry [0] master branch. Effective
GL registry commit integrated:
9d534f9312e56c72df763207e449c6719576fd54
Keeping the following quirks local to Mesa:
- glext.h: BUILDING_MESA guard (see !1492)
- glxext.h: glXQueryGLXPbufferSGIX: 'int' return type (Mesa) vs while
'void' (GL registry)
- glxext.h: GLX_RENDERER_ID_MESA is still expected by some mesa tests,
even though its token has been removed from the spec (see
docs/specs/MESA_query_renderer.spec)
- glxext.h: glXGetTransparentIndexSUN / PFNGLXGETTRANSPARENTINDEXSUNPROC
argument pTransparentIndex has type 'unsigned long *' (Mesa) vs. 'long
*' (GL registry)
[0] https://github.com/KhronosGroup/OpenGL-Registry
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Hal Gentz [Sun, 1 Sep 2019 23:31:04 +0000 (17:31 -0600)]
clover: Fix build after clang r370122.
../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’:
../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:203:81: error: no matching function for call to ‘clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, const char* const*, const char* const*, clang::DiagnosticsEngine&)’
203 | c->getInvocation(), copts.data(), copts.data() + copts.size(), diag))
| ^
In file included from /opt/llvm64/include/clang/Frontend/CompilerInstance.h:15,
from ../mesa/src/gallium/state_trackers/clover/llvm/codegen.hpp:37,
from ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:49:
/opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate: ‘static bool clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, llvm::ArrayRef<const char*>, clang::DiagnosticsEngine&)’
157 | static bool CreateFromArgs(CompilerInvocation &Res,
| ^~~~~~~~~~~~~~
/opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate expects 3 arguments, 4 provided
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Vinson Lee [Wed, 4 Sep 2019 07:44:22 +0000 (00:44 -0700)]
scons: Add coroutines component to build.
Fixes: d32690b43c91 ("gallivm: add coroutine pass manager support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Eric Anholt [Wed, 3 Jul 2019 19:04:26 +0000 (12:04 -0700)]
gallium/osmesa: Move 565 format selection checks where the rest are.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Wed, 3 Jul 2019 18:28:49 +0000 (11:28 -0700)]
gallium/osmesa: Fix a race in creating the stmgr.
Noticed while looking at other OSMesa bugs.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Wed, 3 Jul 2019 18:34:37 +0000 (11:34 -0700)]
gallium/osmesa: Introduce a test.
Given that we occasionally touch this code and probably nobody really
wants to think about it, introduce a minimal test so that we know we
haven't completely broken OSMesa.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dylan Baker [Wed, 4 Sep 2019 23:00:02 +0000 (16:00 -0700)]
docs: Mark 19.2.0-rc2 as done and push back rc3 and rc4/final
Hal Gentz [Thu, 25 Jul 2019 21:40:50 +0000 (15:40 -0600)]
glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.
When run in optirun, applications that linked to `libGLX.so` and then
proceeded to querying Mesa for extension strings caused a SEGV in Mesa.
`glXQueryExtensionsString` was calling a chain of functions that
eventually led to `__glXQueryServerString`. This function would call
`xcb_glx_query_server_string` then `xcb_glx_query_server_string_reply`.
The latter for some unknown reason returned `NULL`. Passing this `NULL`
to `xcb_glx_query_server_string_string_length` would cause a SEGV as the
function tried to dereference it.
The reason behind the function returning `NULL` is yet to be determined,
however, simply checking that the ptr is not `NULL` resolves this. A
similar check has been added to `__glXGetString` for completeness sake,
although not immediately necessary.
In addition to that, we stumbled into a similar problem in
`AllocAndFetchScreenConfigs` which tries to access the configs to free
them if `__glXQueryServerString` fails. This, of course, SEGVs, because the
configs are yet to have been allocated. Simply continuing past the configs
if their config ptrs are `NULL` resolves this. We also switch to `calloc`
to make sure that the config ptrs are `NULL` by default, and not some
uninitialized value.
Cc: mesa-stable@lists.freedesktop.org
Fixes: 24b8a8cfe821 "glx: implement __glXGetString, hide __glXGetStringFromServer"
Fixes: cb3610e37c4c "Import the GLX client side library, formerly from xc/lib/GL/glx. Build it "
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Adam Jackson [Tue, 3 Sep 2019 20:43:16 +0000 (16:43 -0400)]
egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}
It's somewhat annoying that these are so similar for so little benefit.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Neil Roberts [Fri, 23 Aug 2019 12:24:27 +0000 (14:24 +0200)]
glsl: Store the precision for a function return type
The precision for a function return type is now stored in
ir_function_signature. This will later be useful to implement mediump
to float16 lowering. In the meantime it is also useful to catch errors
where a function is redeclared with a different precision.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dave Airlie [Tue, 27 Aug 2019 07:12:01 +0000 (17:12 +1000)]
docs: add llvmpipe features for fb_no_attach and compute shaders
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:30:28 +0000 (15:30 +1000)]
llvmpipe: enable compute shaders if LLVM has coroutines
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:30:15 +0000 (15:30 +1000)]
llvmpipe: add local memory allocation path
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:28:26 +0000 (15:28 +1000)]
llvmpipe: add compute shader parameter fetching support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:28:13 +0000 (15:28 +1000)]
llvmpipe: add compute shader images support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:21:48 +0000 (15:21 +1000)]
llvmpipe: add ssbo support to compute shaders
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:17:29 +0000 (15:17 +1000)]
llvmpipe: add compute sampler + sampler view support.
This is ported from the fragment shader code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:08:19 +0000 (15:08 +1000)]
llvmpipe: add support for compute constant buffers.
This is mostly ported from the fragment shader code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:04:28 +0000 (15:04 +1000)]
llvmpipe: add compute pipeline statistics support.
This just adds the CS invocations counter.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 05:02:32 +0000 (15:02 +1000)]
llvmpipe: add grid launch
This adds the dispatch code. It creates a job for the number
of blocks in the grid, and dispatches them to the threadpool
implementation. The threadpool then calls the JIT code to
execute the coroutines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:57:54 +0000 (14:57 +1000)]
llvmpipe: add compute shader generation.
This creates the coroutine execution environment and the
main compute shaders that get executed inside it.
Each compute shader block is executed in it's own coroutine
execution shader, which each "thread" being a coroutine executed
inside it in sequence.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:50:27 +0000 (14:50 +1000)]
llvmpipe: introduce variant building infrastrucutre.
This doesn't actually build any of the shaders yet, but just
builds up the framework necessary to start building the shaders
and variants.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:43:33 +0000 (14:43 +1000)]
llvmpipe: introduce new state dirty tracking for compute.
Compute doesn't share dirty state with the fragment pipeline
so create a separate path for it.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:42:34 +0000 (14:42 +1000)]
llvmpipe: add initial shader create/bind/destroy variants framework.
This is mostly a port of the fragment shader framework
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:35:56 +0000 (14:35 +1000)]
llvmpipe: add compute debug option
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:32:46 +0000 (14:32 +1000)]
gallivm: add compute jit interface.
This adds the jit interface for compute shaders, it's based
on the fragment shader one.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 04:28:37 +0000 (14:28 +1000)]
llvmpipe: add initial compute state structs
These mirror the fragment shader structs, this is just a framework.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 03:19:00 +0000 (13:19 +1000)]
llvmpipe: introduce compute shader context
The compute shader will need it's own context like the frag shader
has, this just introduces the framework struct and allocates/frees
for it in the right places.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 02:50:35 +0000 (12:50 +1000)]
gallivm: add barrier support for compute shaders.
When the code is executing an hits a barrier, it will suspend
the coroutine and return control to the coroutine dispatcher.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 27 Aug 2019 02:45:39 +0000 (12:45 +1000)]
llvmpipe: add compute threadpool + mutex
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In order to efficiently run a number of compute blocks, use
a threadpool that just allows for jobs with unique sequential
ids to be dispatched.
Dave Airlie [Sun, 21 Jul 2019 22:29:42 +0000 (08:29 +1000)]
gallivm: add support for compute shared memory
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sun, 21 Jul 2019 22:27:27 +0000 (08:27 +1000)]
gallivm: add new compute related intrinsics
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 26 Jun 2019 00:12:28 +0000 (10:12 +1000)]
llvmpipe: reogranise jit pointer ordering
In order to share the texture/image/sampler code with compute
shaders we need to reorg them to be at the front of context
same as draw does for vs/gs sharing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 25 Jun 2019 21:37:20 +0000 (07:37 +1000)]
gallivm: add coroutine pass manager support
coroutines require a proper pass manager, so add the passes
to the correct places
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 25 Jun 2019 21:36:40 +0000 (07:36 +1000)]
gallivm: add coroutine support files to gallivm.
These wrap the coroutine intrinsics and also add some higher
level wrappers around coroutine begin, end and suspend procedures
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 25 Jun 2019 21:35:36 +0000 (07:35 +1000)]
gallivm/flow: add counter reset for loops
This allows the counter value to be forced to a certain value
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 19 Jun 2019 19:38:19 +0000 (05:38 +1000)]
llvmpipe: enable fb no attach
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Wed, 28 Aug 2019 19:52:23 +0000 (12:52 -0700)]
iris: Report correct number of planes for planar images
We were only handling the modifiers case and not counting the number of
planes in actual planar images.
Fixes Piglit's ext_image_dma_buf_import-export.
Fixes: fc12fd05f56 ("iris: Implement pipe_screen::resource_get_param")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111509
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ilia Mirkin [Mon, 2 Sep 2019 22:50:01 +0000 (18:50 -0400)]
teximage: ensure that Tex*SubImage* checks format
We were previously not doing at least some of the checks. This uses the
same logic that is used in glTexImage*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jan Beich [Sat, 31 Aug 2019 18:32:16 +0000 (18:32 +0000)]
gallium/hud: add CPU usage support for DragonFly/NetBSD/OpenBSD
Each BSD has slightly different sysctl for retrieving per-CPU times.
FreeBSD returns long while NetBSD returns uint64_t. On OpenBSD return
type differs between summation and per-CPU times. DragonFly is
compatible with FreeBSD.
Signed-off-by: Jan Beich <jbeich@FreeBSD.org>
Roman Stratiienko [Mon, 2 Sep 2019 14:46:22 +0000 (17:46 +0300)]
lima: Return fence unconditionally
Based on the vc4 implementation.
Fixes Android RenderEngine::flush() routine:
android.googlesource.com/platform/frameworks/native/+/refs/tags/android-o-mr1-iot-release-smart-clock-fcs/services/surfaceflinger/RenderEngine/RenderEngine.cpp#225
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Vasily Khoruzhick [Wed, 28 Aug 2019 06:02:12 +0000 (23:02 -0700)]
lima/ppir: clone uniforms and load_coords into each successor
Try more aggressive approach with cloning uniform and coord loads.
Uniform load can be inserted into any instruction, so let's do that. ARM site
claim that penalty for cache miss is one clock, so we don't lose anything if
we merge it into instruction that uses the result. As side effect we can also
pipeline it and thus decrease reg pressure.
Do the same for varyings that hold texture coords, but for different reason:
looks like there's a special path for coords that increases precision if
varying that holds it is pipelined. If we don't pipeline it and load coords
from a register its precision is fp16 and thus only 10 bits which is not enough
to accurately sample textures of size 1024 or larger.
Since instruction can hold only one uniform load and one varying load,
node_to_instr now creates a move using helper introduced in previous commit if
slot is already taken. As side effect of this change we can also try to
pipeline texture loads and create a move if attempt fails.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Sun, 1 Sep 2019 17:21:32 +0000 (10:21 -0700)]
lima/ppir: don't assume that load coords gets value from register
It can load value from varying directly as well. Also load_regs is the
only op that has a source, so add src_num field to load node and set it
accordingly.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Wed, 28 Aug 2019 05:22:01 +0000 (22:22 -0700)]
lima/ppir: add common helper for creating movs
Introduce common helper for creating movs to avoid code duplication
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Eric Engestrom [Mon, 26 Aug 2019 14:33:31 +0000 (15:33 +0100)]
nir: fix memleak in error path
Fixes: 2cf59861a8128a91bfdd ("nir: Add partial redundancy elimination for compares")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Engestrom [Mon, 26 Aug 2019 14:42:32 +0000 (15:42 +0100)]
freedreno/drm-shim: fix mem leak
Fixes: 494ecef6b42198ab6c3e ("freedreno: Add support for drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Mon, 26 Aug 2019 14:32:36 +0000 (15:32 +0100)]
anv: fix format string in error message
Fixes: 9775894f102535a79186 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Mon, 26 Aug 2019 14:30:54 +0000 (15:30 +0100)]
util/os_file: fix double-close()
Fixes: 955c63d3643f30d7db0c ("util/os_file: resize buffer to what was actually needed")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Engestrom [Mon, 26 Aug 2019 14:29:49 +0000 (15:29 +0100)]
egl: fix deadlock in malloc error path
Fixes: cb0980e69aa921af7086 ("egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Engestrom [Mon, 26 Aug 2019 14:52:33 +0000 (15:52 +0100)]
ttn: fix 64-bit shift on 32-bit `1`
Fixes: 4d0b2c7aaac3cf3de5af ("ttn: Update shader->info as we generate code.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 8 Aug 2019 21:31:50 +0000 (14:31 -0700)]
freedreno/ir3: use uniform base
When lowering from ubo, use the constant base field in the load_uniform
instruction for the constant part of the offset. Doesn't change much
for constant indexing, but this will help for indirect indexing because
constant-folding can't completely clean up the result.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 29 Aug 2019 18:35:17 +0000 (11:35 -0700)]
freedreno/drm: fix 64b iova shifts
Should shift before splitting 64b iova into dwords
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 8 Aug 2019 20:37:49 +0000 (13:37 -0700)]
nir: remove unused constant_fold_state
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Wed, 28 Aug 2019 20:31:07 +0000 (13:31 -0700)]
freedreno: Fix the type of single-component scaled vertex attrs.
This looks like clear copy-and-pasteos, and fixes:
dEQP-GLES2.functional.draw.random.40
(on A307 and A630, both tested in the new CI farm)
Reviewed-by: Rob Clark <robdclark@chromium.org>
Connor Abbott [Tue, 27 Aug 2019 11:36:11 +0000 (13:36 +0200)]
radeonsi/nir: Remove uniform variable scanning
We can get all the information we need from NIR. It's slightly less
accurate, but radeonsi doesn't use the extra information. The old code
also overcounted atomic counters, which led to problems when everything
was used at once.
Fixes KHR-GL45.compute_shader.resources-max.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 27 Aug 2019 09:34:35 +0000 (11:34 +0200)]
ttn: Fill out more info fields
We'll use these in radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 27 Aug 2019 08:54:12 +0000 (10:54 +0200)]
nir: Fix num_ssbos when lowering atomic counters
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Thu, 6 Jun 2019 14:28:48 +0000 (16:28 +0200)]
ac/nir: Fix gather4 integer wa with unnormalized coordinates
This adds a bit of unneccesary code on radeonsi, since whether
unnormalized coordinates are used is known at compile time with GL, but
I wasn't sure if it was worth the few instructions to plumb everything
through, especially for something so rare -- my shader-db doesn't have
any instances where this changes anything.
Fixes CTS tests I created at
https://github.com/cwabbott0/VK-GL-CTS/tree/unnorm-gather-tests
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Thu, 6 Jun 2019 10:16:18 +0000 (12:16 +0200)]
ac/nir: Rewrite gather4 integer workaround based on radeonsi
The workaround was originally written based on amdgpu-pro traces, but
since then radeonsi has got its own slightly different version. Use the
radeonsi version instead, to be consistent and because it'll be slightly
more convenient for handling unnormalized coordinates.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Mon, 2 Sep 2019 09:09:58 +0000 (10:09 +0100)]
egl: warn user if they set an invalid EGL_PLATFORM
Technically, the user might have set EGL_DISPLAY instead of
EGL_PLATFORM, but since the former is deprecated let's just mention the
latter in the warning message.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:38:27 +0000 (17:38 -0700)]
panfrost: Remove panfrost_upload
This routine was made obsolete over a series of reworks of memory
allocation; Tomeu's changes to shader memory allocation finally made
this unused as cppcheck noted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:37:22 +0000 (17:37 -0700)]
panfrost: Fix misc. issues flagged by cppcheck
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:34:13 +0000 (17:34 -0700)]
panfrost: Mark (1 << 31) as unsigned
I was not aware this incurred undefined behaviour; thank you cppcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:32:30 +0000 (17:32 -0700)]
pan/midgard: Remove mir_rewrite_index_*_tag
These helpers are unused, as flagged by cppcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:31:36 +0000 (17:31 -0700)]
pan/midgard: Remove mir_print_bundle
In practice, the new post-schedule print is just as useful.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:30:31 +0000 (17:30 -0700)]
pan/midgard: Remove cppwrap.cpp
It has not been used in a long time; I forgot this file even existed.
Flagged by cppcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:29:17 +0000 (17:29 -0700)]
pan/midgard: Fix cppcheck issues
Miscellaneous minor issues flagged by cppcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:16:17 +0000 (17:16 -0700)]
pan/midgard: Correct issues in disassemble.c
cppcheck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:08:20 +0000 (17:08 -0700)]
pan/decode: Add missing format specifier
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:03:25 +0000 (17:03 -0700)]
pan/decode: Use portable format specifier for 64-bit
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:02:43 +0000 (17:02 -0700)]
pan/decode: Use %zu instead of %d
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 00:00:09 +0000 (17:00 -0700)]
pan/decode: Fix uninitialized variables
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Juan A. Suarez Romero [Tue, 3 Sep 2019 11:06:56 +0000 (13:06 +0200)]
docs: update calendar, add news item and link release notes for 19.1.6
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Tue, 3 Sep 2019 11:04:25 +0000 (13:04 +0200)]
docs: add sha256 checksums for 19.1.6
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
4ec2325dd07a768f2b52ea788ee76085586b2469)
Juan A. Suarez Romero [Tue, 3 Sep 2019 10:02:19 +0000 (12:02 +0200)]
docs: add release notes for 19.1.6
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
85c8f88a49aa7c8aa866faed90a4a63330c15b8b)
Lionel Landwerlin [Wed, 21 Aug 2019 11:47:25 +0000 (13:47 +0200)]
vulkan/overlay: bounce image back to present layout
Once we write the overlay to an image to be presented, we must not
forget to put it back into present layout.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111401
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Zhaowei Yuan [Tue, 3 Sep 2019 02:58:59 +0000 (10:58 +0800)]
broadcom/vc4: Expand width of dst surface
Four bytes of src_surf will be compressed into a 32-bits data and
stored into dst_surf, and dst_surf is read as z-order, so its width
must be aligned to multiples of 8(4x2) before divided by 2.
Signed-off-by: Zhaowei Yuan <zhaowei.yuan@samsung.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111266
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Vinson Lee [Thu, 29 Aug 2019 23:44:09 +0000 (16:44 -0700)]
swr: Fix make_unique build error.
swr_shader.cpp: In function ‘void (* swr_compile_gs(swr_context*, swr_jit_gs_key&))(HANDLE, HANDLE, SWR_GS_CONTEXT*)’:
swr_shader.cpp:732:44: error: ‘make_unique’ was not declared in this scope
ctx->gs->map.insert(std::make_pair(key, make_unique<VariantGS>(builder.gallivm, func)));
^~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
nia [Sat, 31 Aug 2019 17:10:07 +0000 (18:10 +0100)]
loader: include limits.h for PATH_MAX
This is needed to build on illumos.
The location of the PATH_MAX definition in limits.h seems to be fairly standard:
https://pubs.opengroup.org/onlinepubs/
009695399/basedefs/limits.h.html
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Erik Faye-Lund [Wed, 14 Aug 2019 20:29:24 +0000 (22:29 +0200)]
util: only allow _BitScanReverse64 on 64-bit cpus
While the documentation for _BitScanReverse64 on MSDN says that it's
available on ARM, this isn't true. It's only available on ARM64. So
let's match reality.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Erik Faye-Lund [Thu, 15 Aug 2019 19:53:36 +0000 (21:53 +0200)]
mesa/x86: improve SSE-checks for MSVC
This enables some more SSE optimizations on MSVC builds.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Erik Faye-Lund [Wed, 14 Aug 2019 20:28:12 +0000 (22:28 +0200)]
util: do not assume MSVC implies SSE
This is not true for MSVC on ARM.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Erik Faye-Lund [Sun, 1 Sep 2019 08:05:12 +0000 (10:05 +0200)]
util: fix SSE-version needed for double opcodes
This code generates CVTSD2SI, which requires SSE2. So let's fix the
required SSE-version.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5de29ae (util: try to use SSE instructions with MSVC and 32-bit gcc)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Erik Faye-Lund [Thu, 15 Aug 2019 19:08:59 +0000 (21:08 +0200)]
mesa/main: remove unused include
This has been unused since
183db3a6455 ("glsl: move half<->float
convertion to util"), Oct 10 2015. Let's drop needlessly including it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Samuel Pitoiset [Tue, 27 Aug 2019 09:35:00 +0000 (11:35 +0200)]
nir: do not assume that the result of fexp2(a) is always an integral
It's only correct when 'a' is an integral greater or equal to 0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd2 ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Lionel Landwerlin [Sun, 1 Sep 2019 14:22:24 +0000 (17:22 +0300)]
egl: fix platform selection
Add missing "device" platform
v2: Add the missing platform (Eric)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jean Hertel <jean.hertel@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111529
Fixes: d6edccee8d ("egl: add EGL_platform_device support")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Kenneth Graunke [Fri, 21 Jun 2019 03:18:11 +0000 (20:18 -0700)]
iris: Lessen texture cache hack flush for blits/copies on Icelake.
Lionel found actual documentation for this at long last. Apparently
it actually is a sampler cache limitation that was mostly fixed on
Icelake. Unfortunately, it seems there are still issues with ASTC
and non-ASTC sampler views. Still, we can lessen the flush condition
from "format mismatch" to "ASTC mismatch", which eliminates most of
the flushing here.
We also update the documentation to refer to the workaround name.
Vinson Lee [Fri, 30 Aug 2019 06:56:17 +0000 (23:56 -0700)]
util: Define strchrnul on macOS.
strchrnul is not available on macOS.
pipe_loader.c:141:14: error: implicit declaration of function 'strchrnul' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
next = strchrnul(library_paths, ':');
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Erik Faye-Lund [Wed, 17 Jul 2019 08:21:08 +0000 (10:21 +0200)]
gallium/auxiliary/indices: consistently apply start only to input
The majority of these only apply the start argument to the input, but a
few of them also does for the output-array. util_primconvert, the only
user of this argument expects this pass a non-zero start-argument does
not expect this to be applied to the output; if it is, it will write
outside of allocated memory, leading to VRAM corruption.
The reason this doesn't seem to have been noticed before, is that no
driver currently use util_primconvert to convert a primitive-type to
itself, which is the cases where this was broken. But for Zink, this
will no longer be true, because we need to eliminate the use of 8-bit
index-buffers.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 28f3f8d413f ("gallium/auxiliary/indices: add start param")
Reviewed-by: Rob Clark <robdclark@chromium.org>