Rob Clark [Sun, 16 Apr 2017 16:39:59 +0000 (12:39 -0400)]
freedreno/ir3: split out per-stage emit_consts fxns
This makes it easier to deal with adding additional stages which have
their own driver-params. The duplicated code this introduces can be
refactored out after a later patch moves to per-shader-stage dirty
flags.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 16 Apr 2017 18:52:16 +0000 (14:52 -0400)]
freedreno: add helper to mark all state clean
Note that this involves juggling around a bit when we emit and clear
texture state. So split out from the patch that adds the helper to set
all state dirty.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 16 Apr 2017 16:21:49 +0000 (12:21 -0400)]
freedreno: add helper to mark all state dirty
This will simplify things when we break out per-shader-stage dirty bits.
Signed-off-by: Rob Clark <robdclark@gmail.com>
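A minimal sketch of what such a helper might look like once per-shader-stage dirty bits exist. The struct, enum, and function names are hypothetical and not taken from the freedreno source; the point is only that callers go through one function instead of writing the flag fields directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shader stages and context layout, for illustration only. */
enum stage_sketch { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COUNT };

struct ctx_sketch {
   uint32_t dirty;                     /* global dirty bits */
   uint32_t dirty_shader[STAGE_COUNT]; /* per-stage dirty bits */
};

/* Mark every piece of state dirty, global and per-stage alike. */
static void
ctx_all_dirty_sketch(struct ctx_sketch *ctx)
{
   assert(ctx != NULL);
   ctx->dirty = UINT32_MAX;
   for (int s = 0; s < STAGE_COUNT; s++)
      ctx->dirty_shader[s] = UINT32_MAX;
}
```

With a helper like this, adding a new per-stage field later only touches the helper, not every call site.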
Rob Clark [Sun, 16 Apr 2017 15:57:04 +0000 (11:57 -0400)]
freedreno: move a2xx specific hack out of core
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 16 Apr 2017 15:49:54 +0000 (11:49 -0400)]
freedreno: make texture state an array
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state. This will simplify things as more shader
stages are eventually added.
Signed-off-by: Rob Clark <robdclark@gmail.com>
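A rough sketch of the idea, assuming a simplified context with only two stages. All names here are hypothetical stand-ins rather than the driver's real types; the array form is contrasted with, say, one named member per stage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage enum; more stages can be added before the COUNT entry. */
enum tex_stage_sketch { TEX_STAGE_VERTEX, TEX_STAGE_FRAGMENT, TEX_STAGE_COUNT };

struct texture_state_sketch {
   unsigned num_textures;
   unsigned num_samplers;
};

struct tex_ctx_sketch {
   /* One entry per shader stage, indexed by the enum above. */
   struct texture_state_sketch tex[TEX_STAGE_COUNT];
};

/* Total bound textures across all stages. Adding a stage to the enum
 * requires no change to this loop. */
static unsigned
total_textures_sketch(const struct tex_ctx_sketch *ctx)
{
   assert(ctx != NULL);
   unsigned total = 0;
   for (size_t s = 0; s < TEX_STAGE_COUNT; s++)
      total += ctx->tex[s].num_textures;
   return total;
}
```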
Rob Clark [Tue, 18 Apr 2017 14:24:38 +0000 (10:24 -0400)]
freedreno/ir3: refactor out helpers for comparing shader keys
Each of the ir3 users has *basically* the same logic for comparing the
previous and current shader key, to see which, if any, shader state
needs to be marked dirty due to shader variant change.
The difference between generations was just that some lowering flags never get
set on certain generations. But it doesn't really hurt to include the
extra checks (because both keys would have false).
Signed-off-by: Rob Clark <robdclark@gmail.com>
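A small sketch of the kind of shared helper described above, under the assumption that a key is a set of boolean lowering flags. The struct fields, dirty bits, and function name are hypothetical; the real shader key has more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a shader key. */
struct key_sketch {
   bool vs_lowering_flag; /* affects only the vertex shader variant */
   bool fs_lowering_flag; /* affects only the fragment shader variant */
   bool shared_flag;      /* affects both variants */
};

enum { DIRTY_VS_SKETCH = 1 << 0, DIRTY_FS_SKETCH = 1 << 1 };

/* Compare the previous and current key and report which shader state
 * needs to be marked dirty. A flag that a generation never sets is false
 * in both keys, so checking it is harmless. */
static unsigned
key_diff_dirty_sketch(const struct key_sketch *last, const struct key_sketch *key)
{
   assert(last != NULL && key != NULL);
   unsigned dirty = 0;

   if (last->vs_lowering_flag != key->vs_lowering_flag)
      dirty |= DIRTY_VS_SKETCH;
   if (last->fs_lowering_flag != key->fs_lowering_flag)
      dirty |= DIRTY_FS_SKETCH;
   if (last->shared_flag != key->shared_flag)
      dirty |= DIRTY_VS_SKETCH | DIRTY_FS_SKETCH;

   return dirty;
}
```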
Rob Clark [Tue, 18 Apr 2017 01:20:37 +0000 (21:20 -0400)]
util/queue: don't hang at exit
So atexit() is horrible and 4aea8fe7 is probably not a good idea. But
add an extra layer of duct-tape to the problem. Otherwise we hit a
situation where an app using an atexit() handler that runs later than
ours hangs when trying to tear down a context.
(gdb) bt
#0 util_queue_killall_and_wait (queue=queue@entry=0x52bc80) at ../../../src/util/u_queue.c:264
#1 0x0000007fb6c380c0 in atexit_handler () at ../../../src/util/u_queue.c:51
#2 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#3 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#4 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#5 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#6 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#7 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#8 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#9 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) c
Continuing.
[Thread 0x7fb67580c0 (LWP 3471) exited]
^C
Thread 1 "drawbuffer-mode" received signal SIGINT, Interrupt.
0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000007fb6c38304 in cnd_wait (mtx=0x5bdc90, cond=0x5bdcc0) at ../../../include/c11/threads_posix.h:159
#2 util_queue_fence_wait (fence=0x5bdc90) at ../../../src/util/u_queue.c:106
#3 0x0000007fb6daac70 in fd_batch_sync (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:233
#4 batch_reset (batch=batch@entry=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:183
#5 0x0000007fb6daa5e0 in batch_flush (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:290
#6 fd_batch_flush (batch=0x5bdc70, sync=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:308
#7 0x0000007fb6daba2c in fd_bc_flush (cache=0x461220, ctx=0x52b920) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch_cache.c:141
#8 0x0000007fb6dac954 in fd_context_flush (pctx=0x52b920, fence=0x0, flags=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_context.c:54
#9 0x0000007fb6b43294 in st_glFlush (ctx=<optimized out>) at ../../../src/mesa/state_tracker/st_cb_flush.c:121
#10 0x0000007fb69a84e8 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../../src/mesa/main/context.c:1654
#11 0x0000007fb6b7ca58 in st_api_make_current (stapi=<optimized out>, stctxi=0x0, stdrawi=0x0, streadi=0x0) at ../../../src/mesa/state_tracker/st_manager.c:827
#12 0x0000007fb6cc87e8 in dri_unbind_context (cPriv=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:217
#13 0x0000007fb6cc80b0 in driUnbindContext (pcp=0x5271e0) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:591
#14 0x0000007fb7d1da08 in MakeContextCurrent (dpy=0x433380, draw=0, read=0, gc_user=0x0) at ../../../src/glx/glxcurrent.c:214
#15 0x0000007fb7a8d5e0 in glx_platform_make_current () from /lib64/libwaffle-1.so.0
#16 0x0000007fb7a894e4 in waffle_make_current () from /lib64/libwaffle-1.so.0
#17 0x0000007fb7ef8c60 in piglit_wfl_framework_teardown (wfl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:628
#18 0x0000007fb7ef939c in piglit_winsys_framework_teardown (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
#19 0x0000007fb7ef9c30 in destroy (gl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
#20 0x0000007fb7edb7c4 in destroy () at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:184
#21 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#22 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#23 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#24 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#25 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#26 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#27 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#28 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) r
Fixes: 4aea8fe7 ("gallium/u_queue: fix random crashes when the app calls exit()")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eric Anholt [Mon, 6 Mar 2017 23:45:32 +0000 (15:45 -0800)]
vc4: Enable V3D 2.6.
This version of the chip is present on the Cygnus-based 911360 enterprise
phone platform. It appears to be completely backwards compatible.
Samuel Pitoiset [Thu, 13 Apr 2017 22:44:00 +0000 (00:44 +0200)]
st/mesa: add st_convert_sampler()
Similar to st_convert_image(), will be useful for bindless. While
we are at it, rename convert_sampler() to convert_sampler_from_unit()
and make 'st' a const argument.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bartosz Tomczyk [Thu, 13 Apr 2017 18:10:24 +0000 (20:10 +0200)]
mesa/glthread: add async support to ARB_viewport_array functions
v2: fix attribute name, it is count_scale not scale_count
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Timothy Arceri [Fri, 14 Apr 2017 03:33:32 +0000 (13:33 +1000)]
mesa: rename _mesa_add_renderbuffer* functions
These names make it easier to understand what is going on in
regards to references.
Reviewed-by: Brian Paul <brianp@vmware.com>
Nanley Chery [Thu, 13 Apr 2017 16:52:31 +0000 (09:52 -0700)]
anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,
For Sampling Engine Multisampled Surfaces and Render Targets:
Specifies the clear value for the red channel.
For Other Surfaces:
This field is ignored.
This means that the sampler on BDW doesn't support CCS.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Lionel Landwerlin [Mon, 17 Apr 2017 21:45:08 +0000 (14:45 -0700)]
anv: blorp: flush memory after copy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Grazvydas Ignotas [Fri, 14 Apr 2017 20:59:28 +0000 (23:59 +0300)]
radv: enable timestampComputeAndGraphics
Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
added WriteTimestamp handling for compute queues but forgot to flip
the flag.
Tested with DOOM (by me) and CTS (by Bas), but without verification
that these tests actually use timestamps on compute queues.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Rob Clark [Sat, 15 Apr 2017 16:32:17 +0000 (12:32 -0400)]
freedreno: fix crash if ctx torn down with no rendering
In this case, ctx->flush_queue would not have been initialized.
Fixes: 0b613c20 ("freedreno: enable draw/batch reordering by default")
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 12 Apr 2017 13:45:16 +0000 (09:45 -0400)]
freedreno/ir3: add 'high' register class
For compute shaders, we need to be able to allocate some "high"
registers (r48.x to r55.w). (Possibly these are global to all threads
in a warp?) Add a new register class to handle this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 5 Apr 2017 23:43:31 +0000 (19:43 -0400)]
freedreno: extract helper for stage->sb for a4xx+
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 3 Apr 2017 17:38:02 +0000 (13:38 -0400)]
freedreno/{a4xx,a5xx}: switch to CP_LOAD_STATE4
The layout of CP_LOAD_STATE packet is slightly different on a4xx+.
Switch to the a4xx+ specific CP_LOAD_STATE4 to get the correct encoding.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 30 Mar 2017 03:18:36 +0000 (23:18 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
Emil Velikov [Mon, 17 Apr 2017 14:07:44 +0000 (15:07 +0100)]
configure.ac: print deprecation warning as needed
The warning should be printed only when one explicitly uses the
deprecated configure toggle.
Fixes: 7748c3f5eb1 ("configure.ac: deprecate --with-egl-platforms over --with-platforms")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Mon, 17 Apr 2017 13:44:35 +0000 (14:44 +0100)]
docs: add news item and link release notes for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Mon, 17 Apr 2017 13:42:37 +0000 (14:42 +0100)]
docs: add sha256 checksums for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 12434966ebed20cea322b8a6bd4671c7f42e3e49)
Emil Velikov [Mon, 17 Apr 2017 13:38:04 +0000 (14:38 +0100)]
docs: add release notes for 17.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 367bafc7c153611b39bb41145a9601e5f1cb4934)
Emil Velikov [Mon, 17 Apr 2017 13:27:10 +0000 (14:27 +0100)]
docs: add 17.2.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Mon, 17 Apr 2017 12:29:41 +0000 (13:29 +0100)]
configure.ac: deprecate --with-egl-platforms over --with-platforms
Currently the former controls more than just EGL. With follow-up commits
we'll unwind and fix things so that one can build the different drivers
with said platform support.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 8 Dec 2016 19:21:44 +0000 (19:21 +0000)]
configure: remove egl platforms check
The configure option is used by more than just EGL and with next commit
we'll rename it accordingly. Thus having the check will be (and already
is) incorrect.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 8 Dec 2016 19:21:42 +0000 (19:21 +0000)]
travis: remove unneeded dri3/present proto requirement
Signed-off-by: Emil Velikov <emil.lvelikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Emil Velikov [Thu, 8 Dec 2016 19:21:41 +0000 (19:21 +0000)]
configure: remove unneeded dri3/present proto requirements
We are not using either of these. The respective xcb packages are used
instead.
v2: Rebase, reword commit message.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kyle Brenneman [Wed, 4 Jan 2017 18:31:58 +0000 (11:31 -0700)]
EGL: Implement the libglvnd interface for EGL (v3)
The new interface mostly just sits on top of the existing library.
The only change to the existing EGL code is to split the client
extension string into platform extensions and everything else. On
non-glvnd builds, eglQueryString will just concatenate the two strings.
The EGL dispatch stubs are all generated. The script is based on the one
used to generate entrypoints in libglvnd itself.
v2: [Kyle]
- Rebased against master.
- Reworked the EGL makefile to use separate libraries
- Made the EGL code generation scripts work with Python 2 and 3.
- Change gen_egl_dispatch.py to use argparse for the command line arguments.
- Assorted formatting and style cleanup in the Python scripts.
v3: [Emil Velikov]
- Rebase
- Remove separate glvnd glx/egl configure toggles
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
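A minimal sketch of the non-glvnd concatenation mentioned above. The function name, the order of the two lists, and the example extension strings are assumptions for illustration, not the actual eglQueryString code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Join the platform extension list and the remaining client extension
 * list with a single space, omitting the separator if either is empty.
 * Returns the number of characters that would be written, like snprintf(). */
static int
concat_client_extensions_sketch(char *out, size_t out_size,
                                const char *platform_exts,
                                const char *other_exts)
{
   assert(out != NULL && out_size > 0);
   assert(platform_exts != NULL && other_exts != NULL);

   const char *sep = (platform_exts[0] && other_exts[0]) ? " " : "";
   return snprintf(out, out_size, "%s%s%s", platform_exts, sep, other_exts);
}
```

A glvnd build would instead hand the two lists to the dispatch library separately, which is why the split is needed in the first place.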
Tapani Pälli [Thu, 16 Mar 2017 09:24:38 +0000 (11:24 +0200)]
android: add marshal_generated c and h files to generated sources
Fixes: efd63e2 ("mesa: Connect the generated GL command marshalling code to the build.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 13 Apr 2017 18:47:17 +0000 (19:47 +0100)]
configure.ac: honour --disable-libunwind if the .pc file is present
We should check the presence in order to determine if we should
[implicitly] set the CFLAGS/LIBS
v2: Drop spurious OMX hunk (Eric)
Cc: Eric Anholt <eric@anholt.net>
Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 12 Apr 2017 14:11:14 +0000 (15:11 +0100)]
docs: document the C++14 SWR requirement
Earlier commit bumped the requirement for the SWR driver.
v2: Fold the note with the LLVM 3.9 one (Tim)
Fixes: 3c52a7316a1 ("swr: [configure.ac/scons] require c++14")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Samuel Pitoiset [Fri, 14 Apr 2017 16:32:25 +0000 (18:32 +0200)]
winsys/amdgpu: init buffer_indices_hashlist with memset()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 14 Apr 2017 16:32:24 +0000 (18:32 +0200)]
winsys/amdgpu: simplify amdgpu_cs_add_buffer() a bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kenneth Graunke [Wed, 12 Apr 2017 16:30:48 +0000 (09:30 -0700)]
i965/drm: Delete NULL check in brw_bo_unmap().
I accidentally moved the bo->bufmgr dereference above the NULL check
when cleaning up this code.
While passing NULL to free() is a common pattern...passing NULL to
unmap seems pretty bad. You really ought to know whether you have
a buffer or not. We don't want to paper over bugs like that. So,
just drop the NULL check altogether.
CID: 1405006
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Wed, 12 Apr 2017 16:53:44 +0000 (09:53 -0700)]
intel/decoder: Fix is_header_field starting condition.
Starting positions >= 32 are not part of the header, rather than >.
Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.
CID: 1404968
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
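To illustrate why the comparison has to be >= rather than >, here is a sketch with a hypothetical field struct and mask helper (not the decoder's real code). Shifting a 32-bit value by 32 is undefined in C, so a field starting at bit 32 must be rejected before the shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder's field description. */
struct field_sketch {
   int start; /* first bit of the field */
   int end;   /* last bit of the field, inclusive */
};

/* True if the field overlaps the bits of the first dword selected by
 * header_mask. */
static bool
is_header_field_sketch(const struct field_sketch *field, uint32_t header_mask)
{
   assert(field->start >= 0 && field->end >= field->start);

   /* Bit 32 and up live outside the first dword. Using >= here also
    * guarantees the shift count below is at most 31. */
   if (field->start >= 32)
      return false;

   /* Clamp the width to the part of the field inside the first dword. */
   int width = field->end - field->start + 1;
   if (width > 32 - field->start)
      width = 32 - field->start;

   uint32_t bits = (width >= 32) ? UINT32_MAX
                                 : ((UINT32_C(1) << width) - 1);
   bits <<= field->start;

   return (header_mask & bits) != 0;
}
```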
Kenneth Graunke [Wed, 12 Apr 2017 16:33:39 +0000 (09:33 -0700)]
i965/drm: Remove dead return in brw_bo_busy()
If ret is 0, we return. If ret is not 0, we return. This is dead.
CID: 1405013 (Structurally dead code (UNREACHABLE))
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Mauro Rossi [Sat, 1 Apr 2017 10:48:44 +0000 (12:48 +0200)]
android: amd/addrlib: trivial fix for gfx9 support
Fixes the following build error:
external/mesa/src/amd/addrlib/gfx9/gfx9addrlib.cpp:36:10: fatal error: 'gfx9_gb_reg.h' file not found
^
1 error generated.
Fixes: 7f160ef "amd/addrlib: import gfx9 support"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Jason Ekstrand [Fri, 14 Apr 2017 21:41:43 +0000 (14:41 -0700)]
nir: Add GLSL_TYPE_[U]INT64 to some switch statements
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Marek Olšák [Thu, 13 Apr 2017 21:46:59 +0000 (23:46 +0200)]
gallium/radeon: always flush asynchronously and wait after begin_new_cs
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.
+2% FPS for glxgears.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 15 Mar 2017 19:50:35 +0000 (20:50 +0100)]
radeonsi: remove local variable 'mod' from si_compile_tgsi_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 21 Feb 2017 19:32:51 +0000 (20:32 +0100)]
radeonsi: add si_shader_selector::vs_needs_prolog
cleanup
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 12 Apr 2017 15:48:16 +0000 (17:48 +0200)]
radeonsi: don't set VGT_GS_MODE as part of the GS state
The VS state sets it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 2 Apr 2017 14:22:54 +0000 (16:22 +0200)]
radeonsi: don't allow user indices with indirect draws
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 2 Apr 2017 13:27:02 +0000 (15:27 +0200)]
radeonsi: merge two if (indirect) statements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 7 Apr 2017 10:36:59 +0000 (12:36 +0200)]
radeonsi: don't mark non-dirty textures with CMASK as compressed
because the compression is skipped with non-dirty textures.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bas Nieuwenhuizen [Fri, 14 Apr 2017 21:39:15 +0000 (23:39 +0200)]
docs: Document interaction Fixes tag and stable branches.
For the next time I forget.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Mon, 10 Apr 2017 01:48:49 +0000 (11:48 +1000)]
glsl: don't run the GLSL pre-processor when we are skipping compilation
This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.
Also fixes the leaking of state.
V2: fix indentation
v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.
Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
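The ordering described above can be sketched as follows. Everything here is a simplified assumption: FNV-1a stands in for the real cache hash, the cache is a tiny fixed array, and a counter stands in for the preprocessor and compiler. The key is built from the raw source plus the extension override string, and the preprocessor is only reached on a miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a NUL-terminated string; a stand-in hash for illustration. */
static uint64_t
fnv1a_sketch(uint64_t h, const char *s)
{
   for (; *s; s++) {
      h ^= (unsigned char)*s;
      h *= UINT64_C(0x100000001b3);
   }
   /* Fold in a terminator so that the boundary between strings matters. */
   h ^= 0xff;
   h *= UINT64_C(0x100000001b3);
   return h;
}

/* Key = raw (un-preprocessed) source + extension override value. */
static uint64_t
source_key_sketch(const char *source, const char *ext_override)
{
   uint64_t h = UINT64_C(0xcbf29ce484222325);
   h = fnv1a_sketch(h, source);
   return fnv1a_sketch(h, ext_override ? ext_override : "");
}

#define CACHE_SLOTS_SKETCH 16

struct cache_sketch {
   uint64_t keys[CACHE_SLOTS_SKETCH];
   size_t count;
   unsigned preprocess_runs; /* how often the slow path was taken */
};

static bool
cache_lookup_sketch(const struct cache_sketch *c, uint64_t key)
{
   for (size_t i = 0; i < c->count; i++)
      if (c->keys[i] == key)
         return true;
   return false;
}

/* Returns true if compilation was skipped because of a cache hit. */
static bool
compile_shader_sketch(struct cache_sketch *c, const char *source,
                      const char *ext_override)
{
   uint64_t key = source_key_sketch(source, ext_override);
   if (cache_lookup_sketch(c, key))
      return true;            /* hit: the preprocessor never runs */

   c->preprocess_runs++;      /* stand-in for preprocessing + compiling */
   assert(c->count < CACHE_SLOTS_SKETCH);
   c->keys[c->count++] = key;
   return false;
}
```

Including the override string in the key matters because the same source text can compile differently when the set of enabled extensions changes.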
Timothy Arceri [Mon, 10 Apr 2017 01:48:48 +0000 (11:48 +1000)]
glsl: delay optimisations on individual shaders when cache is available
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.
To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.
Improves cold cache start-up times on Deus Ex by ~20 seconds.
When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.
V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jason Ekstrand [Sat, 25 Feb 2017 00:36:00 +0000 (16:36 -0800)]
anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware
generation on a different one. Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Philipp Zabel [Wed, 12 Apr 2017 10:31:01 +0000 (12:31 +0200)]
etnaviv: native fence fd support
This adds native fence fd support to etnaviv, similarly to commit
0b98e84e9ba0 ("freedreno: native fence fd"), enabled for kernel
driver version 1.1 or later.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Francisco Jerez [Fri, 14 Apr 2017 22:59:52 +0000 (15:59 -0700)]
docs: mark GL_ARB_vertex_attrib_64bit and OpenGL 4.2 as supported by i965/gen7+
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Wed, 29 Mar 2017 09:41:35 +0000 (11:41 +0200)]
i965: enable OpenGL 4.2 in Ivybridge
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Mon, 17 Oct 2016 14:40:06 +0000 (14:40 +0000)]
i965: enable ARB_shader_precision in gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Fri, 21 Oct 2016 14:57:25 +0000 (16:57 +0200)]
i965: enable ARB_vertex_attrib_64bit for gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
George Kyriazis [Fri, 14 Apr 2017 18:56:09 +0000 (13:56 -0500)]
swr: Fix swr osmesa build
Use GALLIUM_SWR to standardize
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Wladimir J. van der Laan [Fri, 14 Apr 2017 07:44:27 +0000 (09:44 +0200)]
etnaviv: SINGLE_BUFFER support on GC3000
This patch adds support for the SINGLE_BUFFER feature on GC3000
GPUs, which allows rendering to a single buffer using multiple pixel
pipes.
This feature is always used when it is available, which means that
multi-tiled formats are no longer being used in that case, and all
buffers will be normal (super)tiled. This mimics the behavior of the
blob on GC3000.
- Because the same format can be used to render to and texture from,
this avoids an extra resolve pass when rendering to texture.
- i.MX6qp includes a PRE which can scan-out directly from tiled formats,
avoiding untiling overhead.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Wladimir J. van der Laan [Fri, 14 Apr 2017 07:41:03 +0000 (09:41 +0200)]
etnaviv: Update includes from rnndb
Update to etna_viv commit 8486a97.
austriancoder: changed patch to include isa redefinition fix.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Wladimir J. van der Laan [Fri, 14 Apr 2017 07:39:52 +0000 (09:39 +0200)]
etnaviv: Add chipMinorFeatures4 and 5
Request chipMinorFeatures bitfields 4 and 5 from the
drm driver.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Philipp Zabel [Wed, 12 Apr 2017 14:13:37 +0000 (16:13 +0200)]
etnaviv: resolve tile status when flushing resource
When passing render buffers from EGL clients to a wayland compositor,
the resource tile status must be resolved because otherwise the tile
status is lost in the transfer and cleared parts of the buffer will
contain old contents.
The same applies when sampling directly from a renderable resource.
lst: Add seqno tracking, to skip flush when not needed.
Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Philipp Zabel [Wed, 12 Apr 2017 14:13:36 +0000 (16:13 +0200)]
etnaviv: stop repeatedly resolving an unchanged resource into its scanout prime buffer
Before resolving a resource into its scanout prime buffer, check that
the prime resource is actually older. If it is not, the resolve is an
expensive no-op, and we had better skip it.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
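A sketch of the "is the prime resource actually older" check, using a per-resource sequence number. The struct, the helper names, and the wrap-around-tolerant comparison are assumptions for illustration; a counter increment stands in for the real resolve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource: seqno is bumped whenever the contents change. */
struct res_sketch {
   uint32_t seqno;
};

/* True if a is older than b, tolerating counter wrap-around. */
static bool
seqno_older_sketch(uint32_t a, uint32_t b)
{
   return a != b && (uint32_t)(b - a) < UINT32_C(0x80000000);
}

/* Resolve src into its scanout buffer only if the scanout copy is stale.
 * Returns true if a resolve was performed. */
static bool
resolve_if_stale_sketch(const struct res_sketch *src,
                        struct res_sketch *scanout,
                        unsigned *resolve_count)
{
   if (!seqno_older_sketch(scanout->seqno, src->seqno))
      return false;          /* already up to date: skip the no-op */

   (*resolve_count)++;       /* stand-in for the actual resolve */
   scanout->seqno = src->seqno;
   return true;
}
```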
George Kyriazis [Sat, 1 Apr 2017 01:09:57 +0000 (20:09 -0500)]
swr: Add polygon stipple support
Add polygon stipple functionality to the fragment shader.
Explicitly turn off polygon stipple for lines and points, since we
do them using tris.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Samuel Iglesias Gonsálvez [Wed, 5 Apr 2017 04:23:43 +0000 (06:23 +0200)]
docs/relnotes: add GL_ARB_gpu_shader_fp64 support on i965/ivybridge
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Tue, 11 Oct 2016 08:59:52 +0000 (10:59 +0200)]
docs: mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as supported by i965/gen7+
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Fri, 26 Aug 2016 05:39:04 +0000 (07:39 +0200)]
i965: enable OpenGL 4.0 to Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Fri, 26 Aug 2016 05:37:42 +0000 (07:37 +0200)]
i965: enable ARB_gpu_shader_fp64 for Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Matt Turner [Fri, 20 Jan 2017 21:35:33 +0000 (13:35 -0800)]
i965: Use correct VertStride on align16 instructions.
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.
See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:
cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Fri, 17 Mar 2017 10:57:25 +0000 (11:57 +0100)]
i965/vec4/dce: improve track of partial flag register writes
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.
Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
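The failure mode can be reproduced with a toy backward liveness pass. This is a simplified model, not the vec4 DCE code: each instruction carries hypothetical per-channel read and write masks for one 8-bit flag subregister, and the `coarse` switch mimics the old behaviour where any write is treated as killing all 8 channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-channel flag masks for one instruction (bit i = channel i). */
struct flag_inst_sketch {
   uint8_t reads;
   uint8_t writes;
};

/* Walk the instructions backwards and count flag writes that the pass
 * would consider dead. With coarse == true every write is widened to the
 * whole subregister, as in the 8-bit-granularity tracking. */
static unsigned
count_dead_flag_writes_sketch(const struct flag_inst_sketch *insts, size_t n,
                              bool coarse)
{
   uint8_t live = 0;
   unsigned dead = 0;

   for (size_t i = n; i-- > 0;) {
      uint8_t w = insts[i].writes;
      if (w) {
         if (coarse)
            w = 0xff;
         if ((live & w) == 0)
            dead++;               /* nothing later reads what this wrote */
         live &= (uint8_t)~w;     /* this write kills earlier definitions */
      }
      live |= insts[i].reads;
   }
   return dead;
}
```

For two 4-wide writes to channels 0-3 and 4-7 followed by an 8-wide read, the coarse model reports the first write as dead even though the read depends on it, while the per-channel model keeps both.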
Samuel Iglesias Gonsálvez [Fri, 17 Mar 2017 10:55:49 +0000 (11:55 +0100)]
i965/vec4: don't do horizontal stride on some register file types
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
simplicity. Pass argument by const reference. Clarify commit
message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Matt Turner [Fri, 20 Jan 2017 21:35:32 +0000 (13:35 -0800)]
i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.
Otherwise for a pack_double_2x32_split opcode, we emit:
vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted };
mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted };
mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
The intention was to emit mov(4)s for the instructions that have ERROR
annotations.
See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.
v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Tue, 7 Mar 2017 09:29:53 +0000 (10:29 +0100)]
i965/vec4: use vec4_builder to emit instructions in setup_imm_df()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Fri, 23 Sep 2016 15:57:39 +0000 (15:57 +0000)]
i965/vec4: consider subregister offset in live variables
Take into account offset values less than a full register (32 bytes)
when getting the var from register.
This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).
v2:
- Take into account this offset < 32 in liveness analysis too (Curro)
v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Francisco Jerez [Wed, 12 Apr 2017 23:54:49 +0000 (16:54 -0700)]
i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4
== 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Fri, 24 Mar 2017 07:46:13 +0000 (08:46 +0100)]
i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Wed, 8 Mar 2017 08:27:49 +0000 (09:27 +0100)]
i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitly into two different instructions:
VEC4_OPCODE_FROM_DOUBLE just does the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Fri, 23 Sep 2016 09:57:43 +0000 (09:57 +0000)]
i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.
v2:
- Use stride and don't need to fix dst (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Mon, 12 Sep 2016 16:06:22 +0000 (16:06 +0000)]
i965/vec4: keep original type when dealing with null registers
Keep the original type when dealing with null registers. Especially
because we do not want to introduce an implicit conversion between
types that could affect the conditional flags.
This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.
v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Mon, 29 Aug 2016 08:10:30 +0000 (10:10 +0200)]
i965/vec4: split DF instructions and later double its execsize in IVB/BYT
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Thu, 25 Aug 2016 14:05:24 +0000 (16:05 +0200)]
i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
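Illustrative note on the commit above: the width clamp it describes can
be sketched roughly as follows. This is a minimal sketch and not the
driver's get_lowered_simd_width(); the struct sketch_inst, the function
lowered_df_width() and the is_ivb_or_byt flag are hypothetical names, and
only the condition stated in the commit message is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down view of an instruction; not Mesa's fs_inst. */
struct sketch_inst {
   unsigned exec_size;        /* requested SIMD width */
   unsigned exec_type_size;   /* bytes per channel; 8 means DF */
   bool force_writemask_all;  /* true if channel enables are ignored */
};

/* Return the SIMD width to lower to.  On IVB/BYT (assumption: the caller
 * tells us via is_ivb_or_byt), DF instructions that honor channel enables
 * are split to SIMD4 so that the two halves of a compressed instruction
 * do not share the same channel enable signals. */
static unsigned
lowered_df_width(const struct sketch_inst *inst, bool is_ivb_or_byt)
{
   assert(inst->exec_size > 0);

   const bool is_df = inst->exec_type_size == 8;

   if (is_ivb_or_byt && is_df && !inst->force_writemask_all &&
       inst->exec_size > 4)
      return 4;

   return inst->exec_size;
}
```

With this sketch, a SIMD8 DF instruction without force_writemask_all is
lowered to width 4 on IVB/BYT and left at width 8 elsewhere.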
Francisco Jerez [Thu, 9 Feb 2017 18:16:58 +0000 (10:16 -0800)]
i965/fs: Get 64-bit indirect moves working on IVB.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Matt Turner [Fri, 13 Jan 2017 02:05:58 +0000 (18:05 -0800)]
i965: Use source region <1,2,0> when converting to DF.
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Juan A. Suarez Romero [Wed, 3 Aug 2016 11:51:44 +0000 (11:51 +0000)]
i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Fri, 20 Jan 2017 07:50:50 +0000 (08:50 +0100)]
i965/fs: fix dst stride in IVB/BYT type conversions
When converting a DF to a 32-bit type, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modify IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
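Illustrative note on the commit above: the stride arithmetic in the
message can be written out as a small sketch. The helper names
effective_dst_stride() and df_to_32bit_dst_stride() are hypothetical and
do not exist in the driver; the sketch only models the "implicit
stride 2" behaviour that the message attributes to IVB/BYT.

```c
#include <assert.h>
#include <stdbool.h>

/* Dword stride actually observed in the destination.  Assumption taken
 * from the commit message: on IVB/BYT each DF conversion already writes
 * two dwords, which acts like an implicit factor of 2 on the stride. */
static unsigned
effective_dst_stride(unsigned programmed_stride, bool implicit_doubling)
{
   assert(programmed_stride > 0);
   return implicit_doubling ? programmed_stride * 2 : programmed_stride;
}

/* Stride to program for a DF -> 32-bit conversion so that the result
 * ends up in every other dword (effective stride 2). */
static unsigned
df_to_32bit_dst_stride(bool is_ivb_or_byt)
{
   return is_ivb_or_byt ? 1 : 2;
}
```

Programming stride 2 on IVB/BYT would give an effective stride of 4 in
this model, which is the situation the commit avoids.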
Samuel Iglesias Gonsálvez [Tue, 14 Mar 2017 07:17:36 +0000 (08:17 +0100)]
i965/fs: rename lower_d2x to lower_conversions
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Samuel Iglesias Gonsálvez [Tue, 28 Mar 2017 04:25:13 +0000 (06:25 +0200)]
Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit
7dccd38b400d3a65da20ddefe282a7bb0b7ccb58.
The d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then converting the result.
This pass also takes into account the non-uniform control flow.
Then,
7dccd38b400d3a65da20ddefe282a7bb0b7ccb58 is not needed anymore.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Samuel Iglesias Gonsálvez [Fri, 20 Jan 2017 07:47:05 +0000 (08:47 +0100)]
i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.
v2 (Curro):
- Add supports_type_conversion()
- Reuse existing instruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.
v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
when fixing the conversion.
- Add get_exec_type() function.
v4 (Curro):
- Use get_exec_type() function to get sources' type.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Matt Turner [Wed, 11 Jan 2017 03:33:22 +0000 (19:33 -0800)]
i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).
v3:
- Added comment explaining the reason (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
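Illustrative note on the commit above: the choice of region for a scalar
DF source can be sketched as below. The struct scalar_region and the
function scalar_df_region() are hypothetical and use plain integer
values for <vstride,width,hstride> rather than the hardware encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical plain-value region description: <vstride,width,hstride>. */
struct scalar_region {
   unsigned vstride, width, hstride;
};

/* Region used to read one DF value repeatedly.  On HSW+ the usual scalar
 * region <0,1,0> works.  On IVB/BYT the region is expressed in 32-bit
 * elements, so <0,2,1> reads the two consecutive dwords of the double
 * and repeats that pair for every channel. */
static struct scalar_region
scalar_df_region(bool is_ivb_or_byt)
{
   struct scalar_region r;

   if (is_ivb_or_byt) {
      r.vstride = 0; r.width = 2; r.hstride = 1;
   } else {
      r.vstride = 0; r.width = 1; r.hstride = 0;
   }

   assert(r.vstride == 0); /* a scalar region never advances between rows */
   return r;
}
```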
Samuel Iglesias Gonsálvez [Wed, 11 Jan 2017 07:17:57 +0000 (08:17 +0100)]
i965/fs: clamp exec_size when an instruction has a scalar DF source
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Juan A. Suarez Romero [Mon, 18 Jul 2016 07:27:56 +0000 (07:27 +0000)]
i965/fs: double regioning parameters and execsize for DF in IVB/BYT
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.
So when we have something like:
mov(8) g2<1>DF g3<4,4,1>DF
We are not actually moving 8 doubles (our intention), but 4 doubles.
We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions: they cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)
v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).
v4:
- Fix comment (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
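Illustrative note on the commit above: the doubling it describes can be
sketched as follows. This is a simplified model with plain integer
values instead of hardware encodings; struct df_region,
ivb_df_exec_size() and ivb_df_region() are hypothetical names. The
sketch assumes a unit horizontal stride, following the note in v3 that
hstride doubling was replaced by an assert; the exact condition asserted
in the driver is not shown here.

```c
#include <assert.h>

/* Hypothetical plain-value region description: <vstride,width,hstride>. */
struct df_region {
   unsigned vstride, width, hstride;
};

/* Execution size to program on IVB/BYT in order to process the given
 * number of doubles, since the hardware counts 32-bit elements. */
static unsigned
ivb_df_exec_size(unsigned num_doubles)
{
   return num_doubles * 2;
}

/* Re-express a region counted in 64-bit elements in 32-bit elements.
 * Only vstride and width are doubled; hstride is left alone because
 * doubling it would not change how the 32-bit halves are strided. */
static struct df_region
ivb_df_region(struct df_region r)
{
   assert(r.hstride == 1); /* assumption of this sketch */
   r.vstride *= 2;
   r.width *= 2;
   return r;
}
```

In this model, moving 4 doubles with a <4,4,1> region becomes an
execution size of 8 with an <8,8,1> region.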
Juan A. Suarez Romero [Mon, 18 Jul 2016 07:17:39 +0000 (07:17 +0000)]
i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.
We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).
v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
calculate its size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced
execution type for integer vector types. Take destination type as
execution type where there is no valid source. Assert-fail if the
deduced execution type is byte. Move into brw_ir_fs.h header for
consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
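Illustrative note on the commit above: the rule "the execution data size
is the biggest type size of any instruction operand" can be sketched as
below. This is not the get_exec_type() helper from brw_ir_fs.h; the
function exec_type_size() and the convention that a size of 0 marks an
invalid source are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Return the execution type size in bytes.
 *
 * src_sizes holds the type size of each source in bytes, with 0 standing
 * in for a source that is not valid (hypothetical convention).  When no
 * source is valid the destination size is used instead, and a deduced
 * byte-sized execution type is rejected with an assertion. */
static unsigned
exec_type_size(const unsigned *src_sizes, size_t num_srcs, unsigned dst_size)
{
   unsigned size = 0;

   for (size_t i = 0; i < num_srcs; i++) {
      if (src_sizes[i] > size)
         size = src_sizes[i];
   }

   if (size == 0)
      size = dst_size;

   assert(size > 1);
   return size;
}
```

A result of 8 then identifies an instruction that deals with DF data.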
Matt Turner [Fri, 20 Jan 2017 21:35:31 +0000 (13:35 -0800)]
i965: Handle IVB DF differences in the validator.
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.
v2 (Sam):
- Add comments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Iago Toral Quiroga [Fri, 22 Jul 2016 11:36:25 +0000 (13:36 +0200)]
i965/disasm: also print nibctrl in IVB for execsize=8
4-wide DF operations where NibCtrl applies require an execsize of 8
in IvyBridge/BayTrail.
v2:
- Refactor NibCtrl printing (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Boyan Ding [Wed, 12 Apr 2017 13:14:22 +0000 (21:14 +0800)]
nir: Destination component count of shader_clock intrinsic is 2
This fixes the following error when using ARB_shader_clock on i965:
vec1 32 ssa_0 = intrinsic shader_clock () () ()
intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */
error: src->ssa->num_components == num_components (nir/nir_validate.c:204)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Nicolai Hähnle [Wed, 12 Apr 2017 15:05:56 +0000 (17:05 +0200)]
radeonsi: add missing initialization for userptr buffers
Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).
Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.
This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto
Fixes: e077c5fe6579 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fredrik Höglund [Thu, 13 Apr 2017 22:27:00 +0000 (00:27 +0200)]
radv: remove the temp descriptor set infrastructure
It is no longer used.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fredrik Höglund [Thu, 13 Apr 2017 22:26:59 +0000 (00:26 +0200)]
radv: use push descriptors in meta
Use push descriptors instead of temp descriptor sets.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fredrik Höglund [Thu, 13 Apr 2017 22:26:58 +0000 (00:26 +0200)]
radv: add private push descriptors for meta
This allows meta to use push descriptors without disturbing user
push descriptors.
radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Thu, 6 Apr 2017 21:15:55 +0000 (14:15 -0700)]
anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description. The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
fix all of them.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
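Illustrative note on the commit above: the kind of check it adds when
walking attachment lists can be sketched as follows. This is not code
from anv; count_used_attachments() is a hypothetical helper, and
SKETCH_ATTACHMENT_UNUSED is a local stand-in with the same value as
VK_ATTACHMENT_UNUSED so that the example builds without the Vulkan
headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for VK_ATTACHMENT_UNUSED, which is defined as (~0U). */
#define SKETCH_ATTACHMENT_UNUSED (~0u)

/* Walk a list of attachment indices, as found in a subpass description,
 * and count the entries that refer to a real attachment.  Unused slots
 * are skipped instead of being used as an array index. */
static size_t
count_used_attachments(const uint32_t *attachments, size_t count)
{
   assert(attachments != NULL || count == 0);

   size_t used = 0;
   for (size_t i = 0; i < count; i++) {
      if (attachments[i] == SKETCH_ATTACHMENT_UNUSED)
         continue;
      used++;
   }
   return used;
}
```

The same skip applies to any per-attachment work, such as a clear or a
resolve, done inside the loop.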
Jason Ekstrand [Fri, 7 Apr 2017 17:33:25 +0000 (10:33 -0700)]
anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 7 Apr 2017 17:31:01 +0000 (10:31 -0700)]
anv/cmd_buffer: Always set up a null surface state
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>