Mathieu Bridon [Thu, 9 Aug 2018 08:27:19 +0000 (10:27 +0200)]
python: Use the right function for the job
The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.
This does change the order of the results slightly, but it should be ok.
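To illustrate the kind of simplification described, here is a minimal sketch. `pairs_by_hand()` is a hypothetical stand-in for a hand-written loop, not the code this commit removed. The point is that `itertools.combinations_with_replacement(items, 2)` yields every unordered pair, including an element paired with itself, without visiting all n² ordered pairs.

```python
import itertools


def pairs_by_hand(items):
    """Hypothetical hand-rolled pairing loop (not the original Mesa code).

    Visits every ordered pair and keeps one representative of each
    unordered pair, including pairs of an element with itself.
    """
    seen = set()
    result = []
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            key = (min(i, j), max(i, j))  # unordered index pair
            if key not in seen:
                seen.add(key)
                result.append((a, b))
    return result


def pairs_with_itertools(items):
    # The standard-library function does the same job directly.
    return list(itertools.combinations_with_replacement(items, 2))


types = ["float", "int", "uint"]
print(pairs_with_itertools(types))
```

Because the commit notes that the order of the results may change, compare the two approaches as sorted lists rather than element by element.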
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Anholt [Tue, 7 Aug 2018 18:37:28 +0000 (11:37 -0700)]
egl: Fix leak of X11 pixmaps backing pbuffers in DRI3.
This is basically copied from the DRI2 destroy path. Without this,
Raspberry Pi would quickly run out of CMA during the EGL tests in the CTS
due to all the pixmaps lying around.
Fixes: f35198badeb9 ("egl/x11: Implement dri3 support with loader's dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Kenneth Graunke [Thu, 2 Aug 2018 22:02:18 +0000 (15:02 -0700)]
intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.
We were generating this instruction in the RT write payload setup:
mov(16) m14<1>F g5<8,8,1>F { align1 compr };
which is illegal: instructions with a source region spanning more than
one register need to be aligned to even registers. This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.
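The register arithmetic behind the bug can be checked in a few lines. This is a toy model of the register-number computation only, not of the EU or the i965 backend, and the function names are hypothetical.

```python
def half_registers_hw(nr):
    """GRFs read by the two mov(8) halves of a compressed SIMD16 source.

    Toy model of the behaviour described above: the second half uses
    (nr | 1), not (nr + 1).
    """
    return nr, nr | 1


def half_registers_intended(nr):
    return nr, nr + 1


# Payload from the example: source depth g2-3, dst stencil g4, dst depth g5-6.
dst_depth = 5
print(half_registers_hw(dst_depth))        # second half re-reads g5
print(half_registers_intended(dst_depth))  # what the shader wanted: g5, g6
```

For any even register number, `nr | 1 == nr + 1`, which is why even-aligned sources are unaffected and only the odd (unaligned) payload slot goes wrong.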
I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
Normally, we rely on the register allocator to even-align our virtual
GRFs. But we don't control the payload, so we need to lower SIMD widths
to make it work. To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).
Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
Kenneth Graunke [Thu, 2 Aug 2018 18:31:27 +0000 (11:31 -0700)]
i965: Only enable depth IZ signals if there's an actual depthbuffer.
According to the G45 PRM Volume 2 Page 265 we're supposed to only set
these signals when there is an actual depth buffer. Note that we
already do this for the stencil buffer by virtue of brw->stencil_enabled
invoking _mesa_is_stencil_enabled(ctx) which checks whether the current
drawbuffer's visual has stencil bits (which is updated based on what
buffers are bound). We just need to do it for depth as well.
Not observed to fix anything.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Adam Jackson [Tue, 5 Dec 2017 16:10:09 +0000 (11:10 -0500)]
glx: GLX_MESA_multithread_makecurrent is direct-only
This extension is not defined for indirect contexts. Marking it as
"client only", as the old code did here, would make the extension
available in indirect contexts, even though the server would certainly
not have it in its extension list.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Engestrom [Wed, 8 Aug 2018 14:42:49 +0000 (15:42 +0100)]
anv: set error in all failure paths
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 5b196f39bddc689742d3 "anv/pipeline: Compile to NIR in compile_graphics"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Engestrom [Wed, 8 Aug 2018 14:26:32 +0000 (15:26 +0100)]
intel/tools: add missing variable initialisation
Fixes: 6a60beba4089315685b8 "intel/tools: Add an error state to aub translator"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
vadym.shovkoplias [Mon, 6 Aug 2018 12:52:13 +0000 (15:52 +0300)]
drirc: Allow extension midshader for Metro Redux
This fixes both Metro 2033 Redux and Metro Last Light Redux
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99730
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tapani Pälli [Tue, 7 Aug 2018 05:20:29 +0000 (08:20 +0300)]
glsl: handle error case with ast_post_inc, ast_post_dec
Return ir_rvalue::error_value with ast_post_inc, ast_post_dec if a
parser error was emitted previously. This way process_array_size
won't see bogus IR like the IR generated with commit 9c676a64273.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98699
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Eric Anholt [Thu, 9 Aug 2018 00:34:42 +0000 (17:34 -0700)]
vc4: Implement texture_subdata() to directly upload tiled data.
This avoids a memcpy into a temporary in the upload path.
Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)
Eric Anholt [Wed, 4 Jan 2017 22:08:10 +0000 (14:08 -0800)]
vc4: Handle partial loads/stores of tiled textures.
Previously, we would load out the tile-aligned area, update the raster
copy, and store it back. This was a huge cost for XPutImage calls to the
screen under glamor.
Instead, implement a general load/store path that walks over the source
x/y writing into the corresponding pixel of the destination (using clever
math from
https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/).
If things are aligned, we go through the previous utile-at-a-time loop.
Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
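The "clever math" referenced above lets a loop advance through swizzled offsets directly, instead of re-interleaving the x/y bits for every pixel. Below is a minimal sketch of that idea, assuming a plain Morton (Z-order) layout with x in the even bits and y in the odd bits. vc4's real tiling formats are not simply Morton order, and the helper names are hypothetical, so treat this as an illustration of the technique rather than of the driver code.

```python
def spread_bits(v):
    """Move bit k of v to bit 2k (used to build a Morton/Z-order offset)."""
    out = 0
    k = 0
    while v:
        out |= (v & 1) << (2 * k)
        v >>= 1
        k += 1
    return out


def morton(x, y):
    """Interleave x into the even bits and y into the odd bits."""
    return spread_bits(x) | (spread_bits(y) << 1)


# Masks for a 16x16 block (4 bits per coordinate).
X_MASK = 0b01010101
Y_MASK = 0b10101010


def increment(offset, mask):
    """Add 1 to the coordinate selected by mask without de-interleaving.

    Subtracting the mask and re-masking propagates the carry across the
    other coordinate's bits; the coordinate wraps to 0 after its maximum.
    """
    bumped = ((offset & mask) - mask) & mask
    return bumped | (offset & ~mask)


# Walk one row (y = 3) of the block in x order using only the increment.
offset = morton(0, 3)
row = [offset]
for _ in range(15):
    offset = increment(offset, X_MASK)
    row.append(offset)
```

The same `increment()` with `Y_MASK` steps down a column, so a copy loop needs only one interleave per starting position.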
Eric Anholt [Wed, 8 Aug 2018 00:53:24 +0000 (17:53 -0700)]
vc4: Compile the LT image helper per cpp we might load/store.
For the partial load/store support I'm about to add, we want the memcpy to
be compiled out to a single load/store. This should also eliminate the
calls to vc4_utile_width/height().
Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
Eric Anholt [Wed, 1 Mar 2017 01:39:23 +0000 (17:39 -0800)]
vc4: Refactor to reuse the LT tile walking code.
Juan A. Suarez Romero [Wed, 6 Jun 2018 10:13:05 +0000 (10:13 +0000)]
wayland/egl: update surface size on window resize
According to EGL 1.5 spec, section 3.10.1.1 ("Native Window Resizing"):
"If the native window corresponding to _surface_ has been resized
prior to the swap, _surface_ must be resized to match. _surface_ will
normally be resized by the EGL implementation at the time the native
window is resized. If the implementation cannot do this transparently
to the client, then *eglSwapBuffers* must detect the change and
resize surface prior to copying its pixels to the native window."
So far, resizing a native window in Wayland/EGL was interpreted in Mesa
as a request to resize, which is not executed until the first draw call.
Hence, the surface size is not updated until then, and querying the
surface size with eglQuerySurface() after a window resize still returns
the old values.
This commit updates the surface size values as soon as the resize is
done, even though the real resize happens in the draw call. This gives
the semantics that any native window resize request takes effect
immediately, and if the user calls eglQuerySurface() it will return the
new resized values.
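The intended behaviour can be modelled as below. `ToySurface` and its methods are hypothetical, not Mesa's Wayland EGL structures, and the sketch does not model the v2 condition about a missing back surface. It only shows the split between reporting the new size immediately and reallocating buffers lazily.

```python
class ToySurface:
    """Hypothetical model of a window surface, not Mesa's own structure."""

    def __init__(self, window_width, window_height):
        # Initial size equals the native window size (EGL 1.5, 3.5.6).
        self.width = window_width
        self.height = window_height
        self.back_buffer = None      # allocated lazily on first draw
        self.resize_pending = False

    def on_native_resize(self, width, height):
        # Report the new size right away...
        self.width, self.height = width, height
        # ...but leave buffer reallocation to the next draw call.
        self.resize_pending = True

    def query_size(self):
        # Stands in for eglQuerySurface(EGL_WIDTH / EGL_HEIGHT).
        return self.width, self.height

    def draw(self):
        if self.back_buffer is None or self.resize_pending:
            # Placeholder "buffer": just record the size it was made for.
            self.back_buffer = (self.width, self.height)
            self.resize_pending = False


surface = ToySurface(640, 480)
surface.on_native_resize(800, 600)
print(surface.query_size())
```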
v2: update surface size if there isn't a back surface (Daniel)
CC: Daniel Stone <daniel@fooishbar.org>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
Juan A. Suarez Romero [Mon, 4 Jun 2018 10:22:49 +0000 (10:22 +0000)]
wayland/egl: initialize window surface size to window size
When creating a window surface with eglCreateWindowSurface(), the
width and height returned by eglQuerySurface(EGL_{WIDTH,HEIGHT}) are
invalid until buffers are updated (e.g. by calling glClear()).
But according to EGL 1.5 spec, section 3.5.6 ("Surface Attributes"):
"Querying EGL_WIDTH and EGL_HEIGHT returns respectively the width and
height, in pixels, of the surface. For a window or pixmap surface,
these values are initially equal to the width and height of the
native window or pixmap with respect to which the surface was
created"
This fixes dEQP-EGL.functional.color_clears.* CTS tests
v2:
- Do not modify attached_{width,height} (Daniel)
- Do not update size on resizing window (Brendan)
CC: Daniel Stone <daniel@fooishbar.org>
CC: Brendan King <brendan.king@imgtec.com>
CC: mesa-stable@lists.freedesktop.org
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Juan A. Suarez Romero [Wed, 8 Aug 2018 11:27:32 +0000 (13:27 +0200)]
travis: make drivers explicit in Meson targets
Like in the autotools target, make the list of drivers to be built in
each of the Meson targets explicit.
This will help to identify missing dependencies and other issues more
easily.
CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Brian Paul [Fri, 3 Aug 2018 12:28:34 +0000 (06:28 -0600)]
svga: use pipe_sampler_view::target in svga_set_sampler_views()
instead of the underlying texture's target. This fixes an issue
where the TGSI sampler type was not agreeing with the sampler view
target/type. In particular, this fixes a Mint 19 XFCE desktop
scaling issue because the TGSI code was using a RECT sampler but
the sampler view's underlying texture was PIPE_TEXTURE_2D.
We want to use the sampler view's type rather than the underlying
resource, as we do for the view's surface format.
No piglit regressions.
VMware issue 2156696.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 31 Jul 2018 16:12:47 +0000 (10:12 -0600)]
svga: use SVGA3D_RS_FILLMODE for vgpu9
I'm not sure why we didn't support this in the past, but fillmode
is supported by all renderers nowadays.
Also fix the logic in svga_create_rasterizer_state() to avoid a few
swtnl cases.
No piglit regressions
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 27 Jul 2018 17:53:32 +0000 (11:53 -0600)]
svga: add TGSI_SEMANTIC_FACE switch case in svga_swtnl_update_vdecl()
Fixes a failed assertion when running the Piglit polygon-mode-face test,
though the test still does not pass.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Mon, 6 Aug 2018 15:34:33 +0000 (09:34 -0600)]
xlib: remove unused Fake_glXGetAGPOffsetMESA() function
To silence compiler warning.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Brian Paul [Mon, 6 Aug 2018 15:32:10 +0000 (09:32 -0600)]
gl.h: define GLeglImageOES depending on GL_EXT_EGL_image_storage
To avoid duplicate typedef with the definition in glext.h
V2: test for both GL_OES_EGL_image and GL_EXT_EGL_image_storage in
case both the GL and GLES headers are included. Per Emil.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107488
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Emil Velikov [Fri, 24 Nov 2017 14:25:06 +0000 (14:25 +0000)]
Android: copy -fno*math* options from the autotools build
Add -fno-math-errno and -fno-trapping-math to the build.
Mesa does not depend on the functionality provided, thus this should
result in slightly faster code and smaller binaries.
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:08 +0000 (19:32 +0000)]
autotools: use correct gl.pc LIBS when using glvnd
This is more of a hack, since glvnd itself should be providing the file.
Until that happens, ensure the Libs field is correctly set to -lGL.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:07 +0000 (19:32 +0000)]
glx: automake: add egl.pc/headers TODO when using glvnd
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:06 +0000 (19:32 +0000)]
egl: automake: add egl.pc/headers TODO when using glvnd
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:05 +0000 (19:32 +0000)]
autotools: error out when building with mangling and glvnd
It's not a thing that can work, nor is it a wise idea to attempt.
v2: Tweak error message (Dylan)
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Emil Velikov [Fri, 23 Feb 2018 19:32:04 +0000 (19:32 +0000)]
autotools: error out when using the broken --with-{gl, osmesa}-lib-name
The toggles were broken with the introduction of --enable-mangling.
Fixing that up might be possible, but it's not worth the complexity
since one can rename the libraries at any point.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:01 +0000 (19:32 +0000)]
meson: recommend building the surfaceless platform
It has no special requirements, and its size and build time are effectively zero.
v2: Rebase
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Fri, 23 Feb 2018 19:32:00 +0000 (19:32 +0000)]
automake: require shared glapi when using DRI based libGL
This has been a requirement for ages, yet it seems like we never
explicitly errored out during configure.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Emil Velikov [Wed, 28 Feb 2018 14:30:06 +0000 (14:30 +0000)]
ttn: remove {varying_slot, frag_result}_to_tgsi_semantic helpers
The respective drivers have been updated and the helpers are no longer
needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Juan A. Suarez Romero [Wed, 8 Aug 2018 10:16:59 +0000 (12:16 +0200)]
travis: remove libedit-dev dependency in LLVM 6.0 targets
In LLVM <6.0 we explicitly added libedit-dev, as it was required to
satisfy apt dependencies.
In LLVM 6.0, this is not required anymore, so let's remove it.
CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Erik Faye-Lund [Mon, 6 Aug 2018 16:59:06 +0000 (18:59 +0200)]
glsl_to_tgsi: plumb image writable through to driver
The virgl driver cares about the writable-flag on image definitions,
because it re-emits GLSL from the TGSI. However, so far it was hardcoded
to true in glsl_to_tgsi, which caused problems when virglrenderer is
running on top of GLES 3.1, where not all formats are supported for
writable images.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Tue, 7 Aug 2018 19:15:03 +0000 (12:15 -0700)]
vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels.
We won't have an FD if we're just having the server wait on a fence
created by eglCreateSyncKHR(). Our seqno fences will happen in order, so
server-side waits are no-ops in that case. Fixes
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete
Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
Eric Anholt [Tue, 7 Aug 2018 20:47:08 +0000 (13:47 -0700)]
vc4: Ignore samplers for finding uniform offsets.
Fixes:
dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex
Cc: mesa-stable@lists.freedesktop.org
Eric Anholt [Tue, 7 Aug 2018 20:38:36 +0000 (13:38 -0700)]
vc4: Extend dumping of uniforms in QIR and in the command stream.
Similar to what I did for V3D, provide some description of the uniforms.
Eric Anholt [Tue, 7 Aug 2018 20:31:09 +0000 (13:31 -0700)]
vc4: Pull uinfo->data[i] dereference out to the top of the loop.
Reduces the size of vc4_uniforms.o by about 10%. We would basically
always end up loading the cacheline of uinfo->data[i] anyway, so it should
be good for performance as well as making the code a bit cleaner.
Eric Anholt [Tue, 7 Aug 2018 20:08:15 +0000 (13:08 -0700)]
vc4: Make sure to emit a tile coordinates packet between two MSAA loads.
The HW only executes a load once the tile coordinates packet happens, and
only tracks one at a time, so by emitting our two MSAA loads back to back
we would end up with an undefined color or Z buffer. The simulator
doesn't seem to care, but sync up the RCL generation with the kernel
anyway.
Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
Eric Anholt [Tue, 7 Aug 2018 19:59:14 +0000 (12:59 -0700)]
vc4: Respect a sampler view's first_layer field.
Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
Cc: mesa-stable@lists.freedesktop.org
Dave Airlie [Mon, 30 Jul 2018 22:02:59 +0000 (08:02 +1000)]
virgl: add ARB_shader_clock support
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Mathieu Bridon [Thu, 5 Jul 2018 13:17:46 +0000 (15:17 +0200)]
python: Specify the template output encoding
We're trying to write a unicode string (i.e. decoded) to a file opened
in binary (i.e. encoded) mode.
In Python 2 this works, because of the automatic conversion between
byte and unicode strings.
In Python 3 this fails though, as no automatic conversion is attempted.
This change makes the scripts compatible with both versions of Python.
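A minimal standard-library sketch of the failure and of the fix. `io.BytesIO` stands in for a file opened with `open(path, "wb")`; how the real scripts pass the encoding to their template engine is not shown here.

```python
import io

# Text as returned by a template renderer: a decoded (unicode) string.
rendered = u"/* generated file: caf\u00e9 */\n"

# On Python 3, writing str to a binary stream is a TypeError.
buf = io.BytesIO()
try:
    buf.write(rendered)
    wrote_text_directly = True
except TypeError:
    wrote_text_directly = False

# Fix: encode explicitly so the same code works on Python 2 and 3.
buf = io.BytesIO()
buf.write(rendered.encode("utf-8"))
data = buf.getvalue()
```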
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Mathieu Bridon [Tue, 17 Jul 2018 20:57:39 +0000 (22:57 +0200)]
python: Fix rich comparisons
Python 3 doesn't call objects' __cmp__() methods any more to compare
them. Instead, it requires implementing the rich comparison methods
explicitly: __eq__(), __ne__(), __lt__(), __le__(), __gt__() and __ge__().
Fortunately Python 2 also supports those.
This commit only implements the comparison methods which are actually
used by the build scripts.
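A minimal sketch of the pattern, using a hypothetical `Version` class rather than the actual classes in the build scripts. `__ne__` is written out because Python 2 does not derive it from `__eq__`, and `__hash__` is defined because Python 3 makes a class unhashable when it defines `__eq__` without it.

```python
class Version(object):
    """Hypothetical value type; only the comparisons a caller uses exist."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive __ne__ from __eq__, so define it.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        # Enough for sorted() and min()/max().
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Keep instances usable in sets and as dict keys on Python 3.
        return hash(self._key())
```

If more ordering operators are ever needed, `functools.total_ordering` can fill in the rest from `__eq__` and `__lt__`.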
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Mathieu Bridon [Wed, 25 Jul 2018 09:53:54 +0000 (11:53 +0200)]
python: Use explicit integer divisions
In Python 2, divisions of integers return an integer:
>>> 32 / 4
8
In Python 3 though, they return floats:
>>> 32 / 4
8.0
However, Python 3 has an explicit integer division operator:
>>> 32 // 4
8
That operator exists on Python >= 2.2, so let's use it everywhere to
make the scripts compatible with both Python 2 and 3.
In addition, using __future__.division tells Python 2 to behave the same
way as Python 3, which helps ensure the scripts produce the same output
in both versions of Python.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Chad Versace [Tue, 1 May 2018 05:32:25 +0000 (22:32 -0700)]
egl/main: Add bits for EGL_KHR_mutable_render_buffer
A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
This patch is separate from the Android patch because I think it's
easier to review the platform-independent bits separately.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Chad Versace [Sat, 31 Mar 2018 08:15:09 +0000 (01:15 -0700)]
dri: Add param driCreateConfigs(mutable_render_buffer)
If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.
Not used yet.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Chad Versace [Sat, 31 Mar 2018 08:16:14 +0000 (01:16 -0700)]
dri: Define DRI_MutableRenderBuffer extensions
Define extensions DRI_MutableRenderBufferDriver and
DRI_MutableRenderBufferLoader. These are the two halves for
EGL_KHR_mutable_render_buffer.
Outside the DRI code there is one additional change. Add
gl_config::mutableRenderBuffer to match
__DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither are used yet.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Chad Versace [Sat, 7 Apr 2018 20:01:15 +0000 (13:01 -0700)]
egl/dri2: In dri2_make_current, return early on failure
This pulls an 'else' block into the function's main body, making the
code easier to follow.
Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
transforms dri2_make_current() into spaghetti.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Chad Versace [Sat, 7 Apr 2018 21:23:48 +0000 (14:23 -0700)]
egl: Simplify queries for EGL_RENDER_BUFFER
There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
eglQuerySurface(EGL_RENDER_BUFFER) and
eglQueryContext(EGL_RENDER_BUFFER).
These changes eliminate potentially very fragile code in the upcoming
EGL_KHR_mutable_render_buffer implementation.
* eglQuerySurface(EGL_RENDER_BUFFER)
The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
abstruse logic which required comprehending the specification
complexities of how the two EGL_RENDER_BUFFER states interact. The
function sometimes returned _EGLContext::WindowRenderBuffer, sometimes
_EGLSurface::RenderBuffer. Why? The function tried to encode the
actual logic from the EGL spec. When did the function return which
variable? Go study the EGL spec, hope you understand it, then hope
Mesa mutated the EGL_RENDER_BUFFER state in all the correct places.
Have fun.
To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
confidence in its correctness, flatten its indirect logic. For pixmap
and pbuffer surfaces, simply return a hard-coded literal value, as the
spec suggests. For window surfaces, simply return
_EGLSurface::RequestedRenderBuffer. Nothing difficult here.
* eglQueryContext(EGL_RENDER_BUFFER)
The implementation of this suffered from the same issues as
eglQuerySurface, and the solution is the same. To improve confidence in
its correctness, flatten its indirect logic. For pixmap and pbuffer
surfaces, simply return a hard-coded literal value, as the spec
suggests. For window surfaces, simply return
_EGLSurface::ActiveRenderBuffer.
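The flattened logic can be sketched as follows. This is a Python model with hypothetical names, not Mesa's C code. It assumes the hard-coded values are EGL_BACK_BUFFER for pbuffers and EGL_SINGLE_BUFFER for pixmaps, which is how we read the EGL 1.5 specification; the string constants stand in for the real EGL tokens.

```python
from dataclasses import dataclass

# Stand-ins for the EGL tokens (not their real numeric values).
EGL_BACK_BUFFER = "EGL_BACK_BUFFER"
EGL_SINGLE_BUFFER = "EGL_SINGLE_BUFFER"


@dataclass
class Surface:
    kind: str                     # "window", "pbuffer" or "pixmap"
    requested_render_buffer: str  # chosen at surface creation
    active_render_buffer: str     # what rendering currently targets


def _fixed_render_buffer(surf):
    # Pbuffer and pixmap surfaces have a fixed answer; windows do not.
    if surf.kind == "pbuffer":
        return EGL_BACK_BUFFER
    if surf.kind == "pixmap":
        return EGL_SINGLE_BUFFER
    return None


def query_surface_render_buffer(surf):
    # Models eglQuerySurface(EGL_RENDER_BUFFER).
    fixed = _fixed_render_buffer(surf)
    return fixed if fixed is not None else surf.requested_render_buffer


def query_context_render_buffer(draw_surf):
    # Models eglQueryContext(EGL_RENDER_BUFFER) for the bound draw surface.
    fixed = _fixed_render_buffer(draw_surf)
    return fixed if fixed is not None else draw_surf.active_render_buffer
```

Keeping the requested and active values in separate fields is what lets the two queries disagree for a window surface, which is the situation EGL_KHR_mutable_render_buffer introduces.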
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Marek Olšák [Thu, 26 Jul 2018 22:26:56 +0000 (18:26 -0400)]
radeonsi: set GLC=1 for all write-only shader resources
Marek Olšák [Wed, 25 Jul 2018 05:35:11 +0000 (01:35 -0400)]
radeonsi: don't load block dimensions into SGPRs if they are not variable
Juan A. Suarez Romero [Mon, 6 Aug 2018 08:19:40 +0000 (10:19 +0200)]
travis: meson/Vulkan requires LLVM 6.0
RADV now requires LLVM 6.0.
Fixes: fd1121e8399 ("amd: remove support for LLVM 5.0")
CC: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Juan A. Suarez Romero [Mon, 6 Aug 2018 10:12:09 +0000 (12:12 +0200)]
travis: add ubuntu-toolchain-r-test
LLVM 6.0 requires libstdc++ 4.9, which is not available in the main Travis
repository.
v2: LLVM 6.0 requires libstdc++ 4.9, rather than GCC 4.9 (Jan Vesely)
Fixes: fd1121e8399 ("amd: remove support for LLVM 5.0")
CC: Marek Olšák <marek.olsak@amd.com>
CC: Emil Velikov <emil.velikov@collabora.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 27 Jun 2018 19:07:20 +0000 (20:07 +0100)]
egl: set EGL_BAD_NATIVE_PIXMAP in the copy_buffers fallback
As the spec says:
EGL_BAD_NATIVE_PIXMAP is generated if the implementation
does not support native pixmaps.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Wed, 27 Jun 2018 17:17:37 +0000 (18:17 +0100)]
egl/x11: use the no-op dri2_fallback_copy_buffers for swrast
Currently dri2_copy_buffers is used for swrast, which depends on the
DRI2_FLUSH extension. Since that's not a thing on software based
drivers we crash out.
Do the slightly more graceful thing of returning EGL_FALSE.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Mon, 13 Nov 2017 14:04:59 +0000 (14:04 +0000)]
egl: remove unneeded _eglGetNativePlatform check
There's little point in calling _eglGetNativePlatform() in
eglCopyBuffers. The platform returned should be identical to the one
already stored in our _EGLDisplay.
In the following corner case, the check is incorrect.
The function _eglGetNativePlatform effectively invokes the old-style
eglGetDisplay platform selection. Thus if the EGL_PLATFORM platform does
not match the EGL_EXT_platform_* used to create the display, we'll
error out.
Addresses the egl-copy-buffers piglit test.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Emil Velikov [Fri, 22 Jun 2018 15:05:31 +0000 (16:05 +0100)]
travis: use https for all the links
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Thu, 28 Jun 2018 14:06:09 +0000 (15:06 +0100)]
autoconf: stop exporting internal wayland details
With wayland-scanner v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".
Previously, the generated interface symbol was exported, which is a bad
idea since it's an internal implementation detail that others may misuse.
That was the case with libva approx. 1 year ago. Since then libva has been
fixed, so we can finally hide it by using "private-code".
Inspired by similar xserver patch by Adam Jackson.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Thu, 28 Jun 2018 13:42:08 +0000 (14:42 +0100)]
meson: stop exporting internal wayland details
With wayland-scanner v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".
Previously, the generated interface symbol was exported, which is a bad
idea since it's an internal implementation detail that others may misuse.
That was the case with libva approx. 1 year ago. Since then libva has been
fixed, so we can finally hide it by using "private-code".
Inspired by similar xserver patch by Adam Jackson.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Thu, 28 Jun 2018 13:34:18 +0000 (14:34 +0100)]
meson: use dependency()+find_program() for wayland-scanner
Helps when the native wayland-scanner is located outside of PATH.
Inspired by the xserver code ;-)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Emil Velikov [Mon, 22 Jan 2018 17:52:49 +0000 (17:52 +0000)]
swr: don't export swr_create_screen_internal
After earlier rework, the user and provider of the symbol are within the
same binary. Thus there's no point in exporting the function.
Spotted while reviewing a patch from Chuck that nearly added another
unneeded PUBLIC function.
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Fixes: f50aa21456d ("swr: build driver proper separate from rasterizer")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Eric Engestrom [Tue, 7 Aug 2018 13:59:36 +0000 (14:59 +0100)]
meson: install KHR/khrplatform.h when needed
Fixes: f7d42ee7d319256608ad "include: update GL & GLES headers (v2)"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Engestrom [Tue, 7 Aug 2018 10:43:50 +0000 (11:43 +0100)]
i965: gen_shader_sha1() doesn't use the brw_context
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Engestrom [Tue, 7 Aug 2018 11:56:25 +0000 (12:56 +0100)]
configure: install KHR/khrplatform.h when needed
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107511
Fixes: f7d42ee7d319256608ad "include: update GL & GLES headers (v2)"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Brad King <brad.king@kitware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Lionel Landwerlin [Tue, 7 Aug 2018 10:38:59 +0000 (11:38 +0100)]
intel: don't build tools without -Dtools=intel
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107487
Fixes: 4334196ab325c6w ("intel: tools: simplify meson build")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Erik Faye-Lund [Thu, 12 Jul 2018 10:43:13 +0000 (12:43 +0200)]
virgl: update virgl_hw.h from virglrenderer
This just makes sure we're currently up-to-date with what
virglrenderer has.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Thu, 12 Jul 2018 10:40:09 +0000 (12:40 +0200)]
virgl: rename msaa_sample_positions -> sample_locations
This matches what this field is called in virglrenderer's copy of
this header.
This reduces the diff between the two different versions of
virgl_hw.h, and should make it easier to upgrade the file in
the future.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Eric Anholt [Tue, 7 Aug 2018 01:53:57 +0000 (18:53 -0700)]
vc4: Fix a leak of the no-vertex-elements workaround BO.
Fixes: bd1925562ad1 ("vc4: Convert the driver to emitting the shader record using pack macros.")
Eric Anholt [Mon, 6 Aug 2018 22:28:56 +0000 (15:28 -0700)]
vc4: Fix context creation when syncobjs aren't supported.
Noticed when trying to run current Mesa on rpi's downstream kernel.
Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
Eric Anholt [Thu, 2 Aug 2018 19:23:02 +0000 (12:23 -0700)]
v3d: Emit the VCM_CACHE_SIZE packet.
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Thu, 2 Aug 2018 19:15:20 +0000 (12:15 -0700)]
v3d: Drop "VC5" from the renderer string.
VC5 isn't a useful name any more, just stick to v3d.
Eric Anholt [Thu, 2 Aug 2018 18:12:37 +0000 (11:12 -0700)]
v3d: Avoid spilling that breaks the r5 usage after a ldvary.
Fixes bad rendering when forcing 2 spills in glxgears.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Thu, 2 Aug 2018 00:47:13 +0000 (17:47 -0700)]
v3d: Make sure that QPU instruction-has-a-dest matches VIR.
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment. In
fact, we were likely to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Wed, 1 Aug 2018 23:56:38 +0000 (16:56 -0700)]
v3d: Wait for TMU writes to complete before continuing after a spill.
The simulator complained that we had write responses outstanding at shader
end. It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Wed, 1 Aug 2018 23:37:08 +0000 (16:37 -0700)]
v3d: Make sure we don't emit a thrsw before the last one finished.
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Thu, 2 Aug 2018 00:38:25 +0000 (17:38 -0700)]
v3d: Add some debug code for forcing register spilling.
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
Chad Versace [Thu, 19 Jul 2018 00:43:35 +0000 (17:43 -0700)]
drisw: Fix build on Android Nougat, which lacks shm (v2)
In commit cf54bd5e8, dri_sw_winsys.c began using <sys/shm.h> to support
the new functions putImageShm, getImageShm in DRI_SWRastLoader. But
Android began supporting System V shared memory only in Oreo. Nougat has
no shm headers.
Fix the build by ifdef'ing out the shm code on Nougat.
Fixes: cf54bd5e8 "drisw: use shared memory when possible"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@gmail.com>
Ian Romanick [Sun, 5 Aug 2018 19:37:08 +0000 (12:37 -0700)]
mesa: fix make check for AMD_framebuffer_multisample_advanced
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107483
Fixes: 3d6900d76ef ("glapi: define AMD_framebuffer_multisample_advanced and add its functions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Ian Romanick [Sun, 5 Aug 2018 19:35:42 +0000 (12:35 -0700)]
glapi: Fix GLES versioning for AMD_framebuffer_multisample_advanced functions
The GL_AMD_framebuffer_multisample_advanced spec says:
OpenGL ES dependencies:
Requires OpenGL ES 3.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107483
Fixes: 3d6900d76ef ("glapi: define AMD_framebuffer_multisample_advanced and add its functions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Gert Wollny [Fri, 3 Aug 2018 09:47:28 +0000 (11:47 +0200)]
meson, install_megadrivers: Also remove stale symlinks
os.path.exists doesn't return True for stale symlinks, but they get in
the way later, when a link/file with the same name is to be created.
For instance, it is conceivable that the pointed-to file is replaced by
a file with a new name, leaving the symlink dangling.
To handle this, check specifically for existing symlinks and remove
them as well. (This bugged me for some time: a link libXvMCr600.so was
always in the way of installing this file.)
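A minimal sketch of the check described above, assuming a POSIX system where symlinks can be created. remove_stale is a hypothetical helper written for illustration, not the actual install_megadrivers code; it only shows why os.path.lexists catches a dangling link that os.path.exists misses.

```python
import os
import tempfile

def remove_stale(path: str) -> None:
    """Remove a file or symlink at path, including a dangling symlink."""
    # os.path.exists() follows the link and reports False when the target is gone;
    # os.path.lexists() looks at the link itself.
    if os.path.lexists(path):
        os.unlink(path)  # sketch only: directories are not handled here

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "libXvMCr600.so")
    os.symlink(os.path.join(tmp, "renamed-away.so"), link)  # target never exists
    print(os.path.exists(link), os.path.lexists(link))  # False True
    remove_stale(link)
    print(os.path.lexists(link))  # False
```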
v2: use only os.path.lexists and replace all instances of os.path.exists (Dylan Baker)
v3: handle directory check correctly (Eric Engestrom)
Fixes: f7f1b30f81e842db6057591470ce3cb6d4fb2795 ("meson: extend install_megadrivers script to handle symmlinking")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2 minus dir check)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Tapani Pälli [Wed, 25 Jul 2018 11:26:33 +0000 (14:26 +0300)]
anv: add more swapchain formats
This change helps with some of the dEQP-VK.wsi.android.* tests that
try to create a swapchain using such formats.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Karol Herbst [Sat, 23 Jun 2018 17:01:34 +0000 (19:01 +0200)]
nvc0/ir: return 0 in imageLoad on incomplete textures
We already guarded all OP_SULDP against out-of-bounds accesses, but we
ended up just reusing whatever value was stored in the dest registers.
Fixes CTS test shader_image_load_store.incomplete_textures
v2: fix for loads not ending up with predicates (bindless_texture)
v3: fix replacing the def
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Karol Herbst [Sat, 4 Aug 2018 02:19:49 +0000 (04:19 +0200)]
gm200/ir: optimize rcp(sqrt) to rsq
Mitigates the shaders hurt by adding native sqrt support:
total instructions in shared programs : 5456166 -> 5454825 (-0.02%)
total gprs used in shared programs : 647522 -> 647551 (0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58288696 -> 58274448 (-0.02%)
       local  shared  gpr  inst  bytes
helped     0       0    0   516    516
hurt       0       0   27     2      2
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Karol Herbst [Sat, 4 Aug 2018 01:13:11 +0000 (03:13 +0200)]
gm200/ir: add native OP_SQRT support
./GpuTest /test=pixmark_piano 1024x640 30sec:
301 -> 327 points
shader-db:
total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
total gprs used in shared programs : 647530 -> 647522 (-0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58459304 -> 58288696 (-0.29%)
       local  shared  gpr  inst  bytes
helped     0       0   27  8281   8281
hurt       0       0   21   431    431
v2: use NVISA_GM200_CHIPSET
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Lionel Landwerlin [Sat, 28 Jul 2018 13:51:56 +0000 (14:51 +0100)]
intel: tools: simplify meson build
Remove the if-tools condition and just pass it through the install:
parameter instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Sat, 28 Jul 2018 13:27:49 +0000 (14:27 +0100)]
intel: aubinator: simplify decoding
Since we don't support streaming an aub file, we can drop the decoding
status enum.
v2: include stdbool (Eric)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Sat, 28 Jul 2018 18:11:56 +0000 (19:11 +0100)]
intel: common: add missing stdint include
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Mon, 30 Jul 2018 01:55:54 +0000 (02:55 +0100)]
intel: decoder: remove unused variable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Tue, 31 Jul 2018 09:48:37 +0000 (10:48 +0100)]
intel: tools: aubwrite: reuse canonical address helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
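The log doesn't show the helper itself. Assuming it performs the usual canonicalization of 48-bit GPU virtual addresses (sign-extending bit 47 into bits 48-63), a sketch with the hypothetical names canonical_address and decanonical_address could look like this. Sharing one helper keeps the writer and the decoder agreeing on the address form.

```python
MASK48 = (1 << 48) - 1
HIGH16 = ((1 << 64) - 1) ^ MASK48   # 0xFFFF_0000_0000_0000

def canonical_address(addr: int) -> int:
    """Sign-extend bit 47 of a 48-bit GPU address into bits 48-63."""
    low = addr & MASK48
    return low | HIGH16 if low & (1 << 47) else low

def decanonical_address(addr: int) -> int:
    """Drop the sign-extension bits again, keeping the low 48 bits."""
    return addr & MASK48

print(hex(canonical_address(0x8000_0000_1000)))  # 0xffff800000001000
```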
Lionel Landwerlin [Tue, 31 Jul 2018 06:12:56 +0000 (07:12 +0100)]
intel: aubinator: fix reading the context/ring
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.
Fixes: 144b40db5411 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
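The log only says that the returned buffer was not guaranteed to start exactly at the requested address. Assuming a lookup that returns any buffer containing that address, the fix amounts to indexing by the offset from the buffer's start. Buffer and read_at below are hypothetical names for illustration, not aubinator's code.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    gpu_addr: int   # GPU address of data[0]
    data: bytes

def read_at(buf: Buffer, addr: int, size: int) -> bytes:
    """Read size bytes at GPU address addr from a buffer that contains it."""
    offset = addr - buf.gpu_addr      # nonzero whenever the buffer starts earlier
    if offset < 0 or offset + size > len(buf.data):
        raise ValueError("requested range is not inside this buffer")
    return buf.data[offset:offset + size]

buf = Buffer(gpu_addr=0x1000, data=bytes(range(16)))
print(read_at(buf, 0x1004, 4).hex())  # 04050607
# Reading from data[0] instead is only right when addr == buf.gpu_addr.
```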
Ian Romanick [Thu, 2 Aug 2018 02:34:02 +0000 (19:34 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to a == b
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276886 -> 14276838 (<.01%)
instructions in affected programs: 312 -> 264 (-15.38%)
helped: 2
HURT: 0
total cycles in shared programs: 532578395 -> 532570985 (<.01%)
cycles in affected programs: 682562 -> 675152 (-1.09%)
helped: 374
HURT: 4
helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18
helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28%
HURT stats (abs) min: 2 max: 114 x̄: 53.50 x̃: 49
HURT stats (rel) min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15%
95% mean confidence interval for cycles value: -21.30 -17.91
95% mean confidence interval for cycles %-change: -1.30% -1.06%
Cycles are helped.
Sandy Bridge
total instructions in shared programs: 10488123 -> 10488075 (<.01%)
instructions in affected programs: 336 -> 288 (-14.29%)
helped: 2
HURT: 0
total cycles in shared programs: 150260379 -> 150260439 (<.01%)
cycles in affected programs: 4726 -> 4786 (1.27%)
helped: 0
HURT: 2
No changes on Iron Lake or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Thu, 2 Aug 2018 02:33:24 +0000 (19:33 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276892 -> 14276886 (<.01%)
instructions in affected programs: 484 -> 478 (-1.24%)
helped: 2
HURT: 0
total cycles in shared programs: 532578397 -> 532578395 (<.01%)
cycles in affected programs: 3522 -> 3520 (-0.06%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
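The a == b and a ^^ b commits don't list their exact patterns. This sketch assumes the float side is the difference b2f(a) - b2f(b) compared with 0.0, and checks all four boolean inputs. Because b2f only produces 1.0 or 0.0, there is no NaN or rounding case to worry about, so both rewrites are exact under that assumption.

```python
from itertools import product

def b2f(x: bool) -> float:
    """Bool-to-float as NIR defines it: 1.0 for true, 0.0 for false."""
    return 1.0 if x else 0.0

for a, b in product([False, True], repeat=2):
    diff = b2f(a) + -b2f(b)            # assumed form: fadd(b2f(a), fneg(b2f(b)))
    assert (diff == 0.0) == (a == b)   # feq(diff, 0.0)  ->  a == b
    assert (diff != 0.0) == (a != b)   # fne(diff, 0.0)  ->  a ^^ b
print("both rewrites hold for all four inputs")
```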
Ian Romanick [Thu, 2 Aug 2018 02:32:18 +0000 (19:32 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to !(a && b)
All Gen platforms had pretty similar results. (Skylake shown)
total cycles in shared programs: 532578400 -> 532578397 (<.01%)
cycles in affected programs: 2784 -> 2781 (-0.11%)
helped: 1
HURT: 1
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
v2: s/fmax/fmin/. Noticed by Thomas Helland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Thu, 2 Aug 2018 02:31:22 +0000 (19:31 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to a && b
No changes on any Gen platform.
v2: s/fmax/fmin/. Noticed by Thomas Helland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
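The v2 notes on this group of commits record fmin and fmax being swapped. Assuming the float side of each rewrite is fmax or fmin of the two b2f values compared with 0.0 (the exact patterns aren't listed here), an exhaustive check shows which pairing is correct and why the swap mattered.

```python
from itertools import product

def b2f(x: bool) -> float:
    """Bool-to-float as NIR defines it: 1.0 for true, 0.0 for false."""
    return 1.0 if x else 0.0

# Assumed float-side forms, each compared against 0.0, and their boolean rewrites.
REWRITES = {
    "a || b":    (lambda a, b: max(b2f(a), b2f(b)) != 0.0, lambda a, b: a or b),
    "!(a || b)": (lambda a, b: max(b2f(a), b2f(b)) == 0.0, lambda a, b: not (a or b)),
    "a && b":    (lambda a, b: min(b2f(a), b2f(b)) != 0.0, lambda a, b: a and b),
    "!(a && b)": (lambda a, b: min(b2f(a), b2f(b)) == 0.0, lambda a, b: not (a and b)),
}

def holds(float_form, bool_form) -> bool:
    """Exhaustively compare the two forms over all four boolean inputs."""
    return all(float_form(a, b) == bool_form(a, b)
               for a, b in product([False, True], repeat=2))

for name, (float_form, bool_form) in REWRITES.items():
    assert holds(float_form, bool_form), name

# Swapping fmin and fmax breaks the rewrite: fmin(...) != 0.0 is not a || b.
print(holds(lambda a, b: min(b2f(a), b2f(b)) != 0.0, lambda a, b: a or b))  # False
```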
Ian Romanick [Thu, 2 Aug 2018 02:27:01 +0000 (19:27 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to !(a || b)
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276961 -> 14276892 (<.01%)
instructions in affected programs: 3215 -> 3146 (-2.15%)
helped: 28
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2
helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92%
95% mean confidence interval for instructions value: -2.87 -2.06
95% mean confidence interval for instructions %-change: -5.73% -2.95%
Instructions are helped.
total cycles in shared programs: 532577068 -> 532578400 (<.01%)
cycles in affected programs: 121864 -> 123196 (1.09%)
helped: 35
HURT: 30
helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22
helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86%
HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36
HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58%
95% mean confidence interval for cycles value: -5.02 46.01
95% mean confidence interval for cycles %-change: -0.99% 1.65%
Inconclusive result (value mean confidence interval includes 0).
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781299 -> 7781342 (<.01%)
instructions in affected programs: 22300 -> 22343 (0.19%)
helped: 13
HURT: 40
helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3
helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43%
95% mean confidence interval for instructions value: 0.23 1.39
95% mean confidence interval for instructions %-change: -1.18% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 177878928 -> 177879332 (<.01%)
cycles in affected programs: 383298 -> 383702 (0.11%)
helped: 7
HURT: 43
helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10
helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40%
HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12
HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09%
95% mean confidence interval for cycles value: 5.21 10.95
95% mean confidence interval for cycles %-change: -0.51% 0.21%
Inconclusive result (%-change mean confidence interval includes 0).
v2: s/fmin/fmax/. Noticed by Thomas Helland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
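These reports give each total as a relative change and decide "helped", "HURT", or "Inconclusive result" from a 95% confidence interval on the mean per-shader delta. As a rough sketch only, assuming a normal approximation for the interval (the actual report script may well use Student's t or another method), the percentage and the verdict could be computed like this. Negative deltas mean fewer instructions or cycles.

```python
import statistics
from statistics import NormalDist

def format_pct(before: int, after: int) -> str:
    """Relative change as printed on the totals lines, e.g. '(<.01%)'."""
    pct = (after - before) / before * 100.0
    if pct != 0.0 and abs(pct) < 0.01:
        return "(<.01%)"
    return f"({pct:.2f}%)"

def mean_ci95(deltas: list[float]) -> tuple[float, float]:
    """Approximate 95% interval for the mean of the per-shader deltas."""
    mean = statistics.mean(deltas)
    z = NormalDist().inv_cdf(0.975)  # about 1.96; normal approximation (assumption)
    half = z * statistics.stdev(deltas) / len(deltas) ** 0.5
    return mean - half, mean + half

def verdict(lo: float, hi: float) -> str:
    """An interval that straddles zero cannot call the change either way."""
    if lo <= 0.0 <= hi:
        return "inconclusive"
    return "helped" if hi < 0.0 else "hurt"

print(format_pct(14276886, 14276838))         # (<.01%)
print(format_pct(312, 264))                   # (-15.38%)
print(verdict(*mean_ci95([-10, -8, -9, -11])))  # helped
print(verdict(*mean_ci95([-10, 10, -5, 5])))    # inconclusive
```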
Ian Romanick [Thu, 2 Aug 2018 02:46:26 +0000 (19:46 -0700)]
nir: Transform -fabs(a) >= 0 to a == 0
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276964 -> 14276961 (<.01%)
instructions in affected programs: 411 -> 408 (-0.73%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68%
total cycles in shared programs: 532577062 -> 532577068 (<.01%)
cycles in affected programs: 1093 -> 1099 (0.55%)
helped: 1
HURT: 1
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77%
HURT stats (abs) min: 22 max: 22 x̄: 22.00 x̃: 22
HURT stats (rel) min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48%
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Thu, 2 Aug 2018 00:18:07 +0000 (17:18 -0700)]
nir: Transform expressions of b2f(a) and b2f(b) to a || b
All Gen6+ platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277184 -> 14276964 (<.01%)
instructions in affected programs: 10082 -> 9862 (-2.18%)
helped: 37
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4
helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for instructions value: -7.87 -3.71
95% mean confidence interval for instructions %-change: -6.98% -3.16%
Instructions are helped.
total cycles in shared programs: 532577990 -> 532577062 (<.01%)
cycles in affected programs: 170959 -> 170031 (-0.54%)
helped: 33
HURT: 9
helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30
helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13%
HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8
HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22%
95% mean confidence interval for cycles value: -31.23 -12.96
95% mean confidence interval for cycles %-change: -2.90% -1.02%
Cycles are helped.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781539 -> 7781301 (<.01%)
instructions in affected programs: 10169 -> 9931 (-2.34%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6
helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88%
95% mean confidence interval for instructions value: -9.53 -5.34
95% mean confidence interval for instructions %-change: -5.94% -2.12%
Instructions are helped.
total cycles in shared programs: 177878590 -> 177878932 (<.01%)
cycles in affected programs: 78706 -> 79048 (0.43%)
helped: 7
HURT: 21
helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28
helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37%
HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22
HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70%
95% mean confidence interval for cycles value: 0.30 24.13
95% mean confidence interval for cycles %-change: -1.52% 1.01%
Inconclusive result (%-change mean confidence interval includes 0).
v2: s/fmin/fmax/. Noticed by Thomas Helland.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Fri, 27 Jul 2018 01:26:18 +0000 (18:26 -0700)]
nir: Transform -fabs(a) < 0 to a != 0
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise. The behavior changes if a is NaN.
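The NaN caveat can be checked directly. Python's float comparisons follow IEEE 754: <, >= and == are false when an operand is NaN, and != is true. Assuming NIR's comparisons behave the same way, this sketch shows the older -fabs(a) >= 0 rewrite agreeing on every input, while the new -fabs(a) < 0 rewrite diverges only at NaN.

```python
import math

def old_lhs(a: float) -> bool:
    return -abs(a) >= 0.0   # older rule rewrites this to a == 0.0

def new_lhs(a: float) -> bool:
    return -abs(a) < 0.0    # this commit rewrites it to a != 0.0

for a in [0.0, -0.0, 1.5, -2.0, math.inf, -math.inf, math.nan]:
    assert old_lhs(a) == (a == 0.0)           # exact, NaN included
    if not math.isnan(a):
        assert new_lhs(a) == (a != 0.0)       # exact for every non-NaN input

print(new_lhs(math.nan), math.nan != 0.0)     # False True
```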
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277216 -> 14277184 (<.01%)
instructions in affected programs: 2300 -> 2268 (-1.39%)
helped: 8
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3
helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01%
95% mean confidence interval for instructions value: -6.45 -1.55
95% mean confidence interval for instructions %-change: -9.96% 1.13%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 532577848 -> 532577990 (<.01%)
cycles in affected programs: 17486 -> 17628 (0.81%)
helped: 2
HURT: 5
helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93%
HURT stats (abs) min: 6 max: 50 x̄: 30.00 x̃: 26
HURT stats (rel) min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02%
95% mean confidence interval for cycles value: -1.06 41.63
95% mean confidence interval for cycles %-change: -0.58% 1.74%
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Aug 2018 23:51:35 +0000 (16:51 -0700)]
nir: Rearrange bcsel with two bcsel sources
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277220 -> 14277216 (<.01%)
instructions in affected programs: 422 -> 418 (-0.95%)
helped: 2
HURT: 0
total cycles in shared programs: 532577908 -> 532577848 (<.01%)
cycles in affected programs: 2800 -> 2740 (-2.14%)
helped: 2
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Aug 2018 16:58:19 +0000 (09:58 -0700)]
nir: Collapse more repeated bcsels on the same argument
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277230 -> 14277220 (<.01%)
instructions in affected programs: 751 -> 741 (-1.33%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32%
95% mean confidence interval for instructions value: -3.42 -1.58
95% mean confidence interval for instructions %-change: -1.47% -1.17%
Instructions are helped.
total cycles in shared programs: 532577947 -> 532577908 (<.01%)
cycles in affected programs: 10641 -> 10602 (-0.37%)
helped: 4
HURT: 3
helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7
helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60%
HURT stats (abs) min: 2 max: 8 x̄: 5.33 x̃: 6
HURT stats (rel) min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23%
95% mean confidence interval for cycles value: -20.69 9.55
95% mean confidence interval for cycles %-change: -1.63% 0.63%
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
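Neither of the two bcsel commits lists its exact patterns. Assuming rewrites of the shapes below (hypothetical, chosen to match the subject lines), an exhaustive check over the conditions confirms that each one preserves the selected value. The rearrangement turns three selects into two when both inner selects share a condition and an operand.

```python
from itertools import product

def bcsel(cond: bool, x, y):
    """NIR-style select: x when cond is true, otherwise y."""
    return x if cond else y

def rearranged_ok(a, b, c, d, e) -> bool:
    # bcsel(a, bcsel(b, c, d), bcsel(b, c, e)) -> bcsel(b, c, bcsel(a, d, e))
    return bcsel(a, bcsel(b, c, d), bcsel(b, c, e)) == bcsel(b, c, bcsel(a, d, e))

def collapsed_ok(a, b, c, d) -> bool:
    # A repeated condition on the same argument, in either arm, collapses.
    inner_then = bcsel(a, bcsel(a, b, c), d) == bcsel(a, b, d)
    inner_else = bcsel(a, b, bcsel(a, c, d)) == bcsel(a, b, d)
    return inner_then and inner_else

for a, b in product([False, True], repeat=2):
    assert rearranged_ok(a, b, "c", "d", "e")
for a in (False, True):
    assert collapsed_ok(a, "b", "c", "d")
print("all assumed bcsel rewrites hold")
```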
Ian Romanick [Tue, 3 Jul 2018 19:39:54 +0000 (12:39 -0700)]
nir: Don't compare i2f or u2f with zero
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277620 -> 14277230 (<.01%)
instructions in affected programs: 36905 -> 36515 (-1.06%)
helped: 101
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6
helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51%
HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1
HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47%
95% mean confidence interval for instructions value: -4.59 -2.70
95% mean confidence interval for instructions %-change: -1.90% -1.41%
Instructions are helped.
total cycles in shared programs: 532580716 -> 532577947 (<.01%)
cycles in affected programs: 940575 -> 937806 (-0.29%)
helped: 92
HURT: 12
helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62
helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41%
HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63
HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20%
95% mean confidence interval for cycles value: -50.66 -2.59
95% mean confidence interval for cycles %-change: -2.09% -0.73%
Cycles are helped.
total spills in shared programs: 8116 -> 8124 (0.10%)
spills in affected programs: 200 -> 208 (4.00%)
helped: 0
HURT: 2
total fills in shared programs: 11086 -> 11094 (0.07%)
fills in affected programs: 436 -> 444 (1.83%)
helped: 0
HURT: 2
Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12979054 -> 12978067 (<.01%)
instructions in affected programs: 33633 -> 32646 (-2.93%)
helped: 120
HURT: 2
helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13
helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17%
HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00%
95% mean confidence interval for instructions value: -9.19 -6.99
95% mean confidence interval for instructions %-change: -5.27% -3.62%
Instructions are helped.
total cycles in shared programs: 411212880 -> 411199636 (<.01%)
cycles in affected programs: 696441 -> 683197 (-1.90%)
helped: 107
HURT: 5
helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146
helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88%
HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22
HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25%
95% mean confidence interval for cycles value: -136.96 -99.54
95% mean confidence interval for cycles %-change: -9.75% -6.53%
Cycles are helped.
total spills in shared programs: 78623 -> 78631 (0.01%)
spills in affected programs: 66 -> 74 (12.12%)
helped: 0
HURT: 2
total fills in shared programs: 80104 -> 80108 (<.01%)
fills in affected programs: 133 -> 137 (3.01%)
helped: 0
HURT: 2
No changes on Sandy Bridge, Iron Lake, or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Tue, 3 Jul 2018 20:57:06 +0000 (13:57 -0700)]
nir: Remove f2i(i2f(x)) conversions
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277978 -> 14277620 (<.01%)
instructions in affected programs: 36957 -> 36599 (-0.97%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4
helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87%
HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36%
95% mean confidence interval for instructions value: -7.06 -2.24
95% mean confidence interval for instructions %-change: -1.28% -0.77%
Instructions are helped.
total cycles in shared programs: 532584581 -> 532580716 (<.01%)
cycles in affected programs: 973591 -> 969726 (-0.40%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32
helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19%
HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280
HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10%
95% mean confidence interval for cycles value: -386.98 286.59
95% mean confidence interval for cycles %-change: -1.41% -0.81%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 8127 -> 8116 (-0.14%)
spills in affected programs: 108 -> 97 (-10.19%)
helped: 1
HURT: 0
total fills in shared programs: 11090 -> 11086 (-0.04%)
fills in affected programs: 440 -> 436 (-0.91%)
helped: 1
HURT: 1
Haswell
total instructions in shared programs: 12979174 -> 12979054 (<.01%)
instructions in affected programs: 9040 -> 8920 (-1.33%)
helped: 14
HURT: 1
helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6
helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14%
HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19%
95% mean confidence interval for instructions value: -13.58 -2.42
95% mean confidence interval for instructions %-change: -3.94% -1.01%
Instructions are helped.
total cycles in shared programs: 411227148 -> 411212880 (<.01%)
cycles in affected programs: 630506 -> 616238 (-2.26%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38
helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17%
95% mean confidence interval for cycles value: -2544.28 641.88
95% mean confidence interval for cycles %-change: -6.89% -0.94%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 78626 -> 78623 (<.01%)
spills in affected programs: 42 -> 39 (-7.14%)
helped: 1
HURT: 0
total fills in shared programs: 80111 -> 80104 (<.01%)
fills in affected programs: 140 -> 133 (-5.00%)
helped: 1
HURT: 1
Ivy Bridge
total instructions in shared programs: 11684101 -> 11684030 (<.01%)
instructions in affected programs: 3080 -> 3009 (-2.31%)
helped: 4
HURT: 1
helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5
helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99%
HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for instructions value: -45.59 17.19
95% mean confidence interval for instructions %-change: -9.38% -1.56%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 258407697 -> 258389653 (<.01%)
cycles in affected programs: 328323 -> 310279 (-5.50%)
helped: 5
HURT: 0
helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32
helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60%
95% mean confidence interval for cycles value: -11616.71 4399.11
95% mean confidence interval for cycles %-change: -16.56% -2.03%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 4537 -> 4528 (-0.20%)
spills in affected programs: 64 -> 55 (-14.06%)
helped: 1
HURT: 0
total fills in shared programs: 4823 -> 4815 (-0.17%)
fills in affected programs: 189 -> 181 (-4.23%)
helped: 1
HURT: 1
Sandy Bridge
total instructions in shared programs: 10488464 -> 10488449 (<.01%)
instructions in affected programs: 272 -> 257 (-5.51%)
helped: 3
HURT: 0
helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49%
total cycles in shared programs: 150263359 -> 150263263 (<.01%)
cycles in affected programs: 7978 -> 7882 (-1.20%)
helped: 3
HURT: 0
helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32
helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23%
No changes on Iron Lake or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
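The last two commits both depend on how an integer-to-float conversion rounds. This sketch assumes 32-bit integers and binary32 floats, which it emulates with the struct module. Under those assumptions, comparing i2f(a) with zero gives the same answer as comparing a with zero, because rounding never moves a nonzero integer onto or across zero, and an unsigned conversion is never negative. A float-to-int round trip, however, returns the original integer only when binary32 can represent it exactly, which holds for every integer of magnitude up to 2^24. The sketch does not show how the actual NIR patterns treat larger values.

```python
import struct

def to_f32(x: int) -> float:
    """Round an integer to the nearest binary32 value (round-to-nearest-even)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

# Comparisons with zero keep their answer through i2f.
for a in [-2**31, -16777217, -1, 0, 1, 16777217, 2**31 - 1]:
    f = to_f32(a)                      # i2f(a)
    assert (f < 0.0) == (a < 0)
    assert (f == 0.0) == (a == 0)
for a in [0, 1, 16777217, 2**32 - 1]:
    assert to_f32(a) >= 0.0            # u2f(a) is never negative

def round_trips(x: int) -> bool:
    """True when truncating i2f(x) back to an integer gives x again."""
    return int(to_f32(x)) == x

print(round_trips(2**24), round_trips(2**24 + 1))  # True False
```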