Lars Hamre [Wed, 23 Mar 2016 14:14:23 +0000 (10:14 -0400)]
compiler/glsl: allow sequence op as a const expr in gles 1.0
Allow the sequence operator to be a constant expression in GLSL ES
versions prior to GLSL ES 3.0
Fixes the following piglit test:
/all/spec/glsl-es-1.0/compiler/array-sized-by-sequence-in-parenthesis.vert
This is similar to the logic from process_initializer() which performs
the same check for constant variable initialization with sequence
operators.
v2: Fixed regression pointed out by Eduardo Lima Mitev
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Nicolai Hähnle [Mon, 21 Mar 2016 19:50:50 +0000 (14:50 -0500)]
radeonsi: fix out-of-bounds indexing of shader images
Results are undefined but may not crash. Without this change, out-of-bounds
indexing can lead to VM faults and GPU hangs.
Constant buffers, samplers, and possibly others will eventually need similar
treatment to support GL_ARB_robust_buffer_access_behavior.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Nicolai Hähnle [Thu, 17 Mar 2016 01:47:47 +0000 (20:47 -0500)]
radeonsi: cache flush/invalidation for missing PIPE_BARRIER_*_BUFFER bits (v2)
This fixes arb_shader_image_load_store-host-mem-barrier.
v2: flush TC L2 for index buffers on <= CIK (Marek)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 18 Mar 2016 00:49:26 +0000 (19:49 -0500)]
st/mesa: add missing MemoryBarrier bits and some explanations
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 18 Mar 2016 00:49:03 +0000 (19:49 -0500)]
gallium: add PIPE_BARRIER_STREAMOUT_BUFFER
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 22 Mar 2016 17:26:53 +0000 (18:26 +0100)]
radeonsi: fix 2D array MSAA failures since image support landed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Jason Ekstrand [Thu, 17 Mar 2016 02:31:02 +0000 (19:31 -0700)]
i965/fs: Don't constant-fold RCP
No shader-db changes on Broadwell
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 16 Mar 2016 23:06:10 +0000 (16:06 -0700)]
i965: Remove the RCP+RSQ algebraic optimizations
NIR already has this optimization and it can do much better than the little
peephole in the backend.
No shader-db change on Haswell or Broadwell.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 2 Mar 2016 21:47:56 +0000 (13:47 -0800)]
nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 2 Mar 2016 21:46:50 +0000 (13:46 -0800)]
nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.
G4X
total instructions in shared programs:
4012279 ->
4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0
total cycles in shared programs:
84315862 ->
84313584 (-0.00%)
cycles in affected programs:
1767232 ->
1764954 (-0.13%)
helped: 274
HURT: 81
Ironlake
total instructions in shared programs:
6399073 ->
6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0
total cycles in shared programs:
128892088 ->
128888810 (-0.00%)
cycles in affected programs:
2867452 ->
2864174 (-0.11%)
helped: 422
HURT: 137
Sandy Bridge
total instructions in shared programs:
8462174 ->
8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0
total cycles in shared programs:
117542276 ->
117534098 (-0.01%)
cycles in affected programs:
1239166 ->
1230988 (-0.66%)
helped: 369
HURT: 150
Ivy Bridge
total instructions in shared programs:
7775131 ->
7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0
total cycles in shared programs:
65759882 ->
65747268 (-0.02%)
cycles in affected programs:
1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141
Haswell
total instructions in shared programs:
7107786 ->
7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0
total cycles in shared programs:
64668028 ->
64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149
LOST: 2
GAINED: 0
Broadwell
total instructions in shared programs:
8980029 ->
8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0
total cycles in shared programs:
70070448 ->
70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111
LOST: 2
GAINED: 0
Skylake
total instructions in shared programs:
9115178 ->
9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0
total cycles in shared programs:
68848660 ->
68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116
LOST: 2
GAINED: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 2 Mar 2016 23:36:14 +0000 (15:36 -0800)]
nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs:
8462180 ->
8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0
total cycles in shared programs:
117542462 ->
117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0
Broadwell / Skylake
total instructions in shared programs:
8980833 ->
8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0
total cycles in shared programs:
70077900 ->
70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0
G45 and Ironlake showed no change.
v2: Modify the comments to look more like a proof.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 2 Mar 2016 23:18:34 +0000 (15:18 -0800)]
nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.
Broadwell / Skylake
total instructions in shared programs:
8980835 ->
8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0
total cycles in shared programs:
70077904 ->
70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0
No changes on earlier platforms.
v2: Modify the comments to look more like a proof.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 2 Mar 2016 02:59:57 +0000 (18:59 -0800)]
nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:
vec1 ssa_200 = flt ssa_199, ssa_194
vec1 ssa_201 = b2i ssa_200
vec1 ssa_202 = i2b -ssa_201
shader-db results:
Sandy Bridge
total instructions in shared programs:
8462257 ->
8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0
total cycles in shared programs:
117542934 ->
117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1
Ivy Bridge
total instructions in shared programs:
7775252 ->
7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0
total cycles in shared programs:
65760522 ->
65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2
Haswell
total instructions in shared programs:
7108666 ->
7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0
total cycles in shared programs:
64675726 ->
64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1
Broadwell / Skylake
total instructions in shared programs:
8980912 ->
8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0
total cycles in shared programs:
70077926 ->
70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6
G45 and Ironlake showed no change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Mon, 7 Mar 2016 21:09:30 +0000 (13:09 -0800)]
nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse. On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.
shader-db results:
G4X / Ironlake
total instructions in shared programs:
4016538 ->
4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1
total cycles in shared programs:
84328296 ->
84315862 (-0.01%)
cycles in affected programs:
4174570 ->
4162136 (-0.30%)
helped: 926
HURT: 53
Unsurprisingly, no changes on later platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Mon, 7 Mar 2016 18:55:21 +0000 (10:55 -0800)]
i965: Have NIR lower flrp on pre-GEN6 vec4 backend
Previously we were doing the lowering by hand in vec4_visitor::emit_lrp.
By doing it in NIR, we have the opportunity for NIR to do additional
optimization of the expanded code.
This also enables optimizations added by the next commit.
shader-db results:
G4X / Ironlake
total instructions in shared programs:
4024401 ->
4016538 (-0.20%)
instructions in affected programs: 447686 -> 439823 (-1.76%)
helped: 2623
HURT: 0
total cycles in shared programs:
84375846 ->
84328296 (-0.06%)
cycles in affected programs:
16964960 ->
16917410 (-0.28%)
helped: 2556
HURT: 41
Unsurprisingly, no changes on later platforms.
v2: Formatting and comment changes suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Tue, 22 Mar 2016 14:35:25 +0000 (08:35 -0600)]
swrast: fix discarded const warning in s_texture.c
Signed-off-by: Brian Paul <brianp@vmware.com>
Marc-André Lureau [Fri, 18 Mar 2016 19:01:07 +0000 (20:01 +0100)]
i965: fix invalid memory write
I noticed some heap corruption running virgl tests, and valgrind
helped me to track it down to the following error:
==29272== Invalid write of size 4
==29272== at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307)
==29272== by 0x9029A7D: brw_DO (brw_eu_emit.c:1750)
==29272== by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999)
==29272== by 0x904491F: brw_compile_fs (brw_fs.cpp:5685)
==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)
==29272== by 0x8C84325: _mesa_link_program (shaderapi.c:1042)
==29272== by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515)
==29272== by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880)
==29272== Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd
==29272== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==29272== by 0x8ED11F7: ralloc_size (ralloc.c:113)
==29272== by 0x8ED1282: rzalloc_size (ralloc.c:134)
==29272== by 0x8ED14C0: rzalloc_array_size (ralloc.c:196)
==29272== by 0x9019C7B: brw_init_codegen (brw_eu.c:291)
==29272== by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124)
==29272== by 0x9044883: brw_compile_fs (brw_fs.cpp:5675)
==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)
if_depth_in_loop is an array of size p->loop_stack_array_size, and
push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1],
thus the condition to grow the array should be
p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently
off by 2...)
This can be reproduced by running the following test with virgl test
server:
LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner
./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dave Airlie [Tue, 22 Mar 2016 00:28:44 +0000 (10:28 +1000)]
tgsi: drop unused set_exec/kill_mask interfaces.
These don't get used and haven't been in git history from what I can
see, so drop them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 21 Mar 2016 23:53:47 +0000 (09:53 +1000)]
docs/relnotes: update ARB_internalformat_query2 status.
Signed-off-by: Dave Airlie <Airlied@redhat.com>
Dave Airlie [Fri, 4 Mar 2016 02:33:46 +0000 (12:33 +1000)]
st/mesa: add support for internalformat query2.
Add code to handle GL_INTERNALFORMAT_PREFERRED.
Add code to deal with GL_RENDERBUFFER being passes into ChooseTextureFormat.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Anuj Phogat [Fri, 11 Mar 2016 23:24:36 +0000 (15:24 -0800)]
i965: Fix assert conditions for src/dst x/y offsets
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anuj Phogat [Mon, 14 Mar 2016 17:25:50 +0000 (10:25 -0700)]
swrast: Move assert for 'slice' in to check_map_teximage
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
xavier [Wed, 9 Mar 2016 08:58:48 +0000 (09:58 +0100)]
r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.
Previously it was doing this transformation for a Trine 3 shader:
MUL R6.x.12, R13.x.23, 0.5|
3f000000
- MULADD R4.x.12, -R6.x.12, 2|
40000000, 1|
3f800000
+ MULADD R4.x.12, -R13.x.23, -1|
bf800000, 1|
3f800000
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Mon, 21 Mar 2016 12:15:44 +0000 (13:15 +0100)]
nvc0: make sure to delete samplers used by compute shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Thu, 17 Mar 2016 03:19:50 +0000 (20:19 -0700)]
i965/blorp: Make BlitFramebuffer() do sRGB encoding in ES 3.x.
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use. The ES 3.0 specification is completely clear, and has
always stated this.
However, the GL specification has changed behavior in 4.1, 4.2, and
4.4. The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording. However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.
This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Thu, 17 Mar 2016 03:15:52 +0000 (20:15 -0700)]
i965/blorp: Refactor sRGB encoding/decoding.
Because the rules for sRGB are so insane, we change brw_blorp_miptrees
to take decode_srgb and encode_srgb flags, which control linearization
of the source and destination separately.
This should make it easy to implement whatever crazy combination of
rules people throw at us. For now, it should be equivalent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 8 Mar 2016 08:34:14 +0000 (00:34 -0800)]
meta: Make BlitFramebuffer() do sRGB encoding in ES 3.x.
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use. The ES 3.0 specification is completely clear, and has
always stated this.
However, the GL specification has changed behavior in 4.1, 4.2, and
4.4. The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording. However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.
This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.
Meta implements several other functions in terms of BlitFramebuffer,
and many of those explicitly do not perform sRGB encoding. So, this
patch explicitly disables sRGB encoding in those other functions,
preserving the existing (correct) behavior.
If you're from the future and are reading this, hi! Welcome to
the "fun" of debugging sRGB problems! Best of luck!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nicolai Hähnle [Tue, 15 Mar 2016 18:08:21 +0000 (13:08 -0500)]
docs: mark GL_ARB_shader_image_load_store/_size as done for radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Edward O'Callaghan [Sun, 10 Jan 2016 13:50:32 +0000 (00:50 +1100)]
radeonsi: Set PIPE_SHADER_CAP_MAX_SHADER_IMAGES
This enables ARB_shader_image_load_store and ARB_shader_image_size.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
[allow the same number of images for all shader stages and require LLVM 3.9]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 16 Mar 2016 01:58:12 +0000 (20:58 -0500)]
radeonsi: disable early Z if the fragment shader writes to memory
Empirically, both the EXEC_ON_* flags and LATE_Z are necessary.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 16 Mar 2016 01:54:30 +0000 (20:54 -0500)]
tgsi/scan: add writes_memory to flag presence of stores or atomics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 19:44:46 +0000 (14:44 -0500)]
radeonsi: force the DCC enable bit off in image descriptors for writing (v2)
This avoids a lockup at least on Tonga.
v2: only force DCC off on VI+ (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 16:37:27 +0000 (11:37 -0500)]
radeonsi: implement MemoryBarrier (v2)
v2: invalidate both constant and VMEM/TC L1 for constant buffers (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 14 Mar 2016 15:22:21 +0000 (10:22 -0500)]
radeonsi: implement volatile memory access
Prevent loads from being re-ordered or coalesced.
Atomics don't need special handling by definition, and stores don't need
special handling because LLVM is unable to detect dead image or buffer
stores.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 02:32:34 +0000 (21:32 -0500)]
radeonsi: implement coherent memory access (v2)
v2: set glc=1 for volatile also on buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 10 Mar 2016 23:12:44 +0000 (18:12 -0500)]
radeonsi: Lower TGSI_OPCODE_MEMBAR down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 12 Feb 2016 01:54:25 +0000 (20:54 -0500)]
radeonsi: Lower TGSI_OPCODE_ATOM* down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 9 Feb 2016 16:51:31 +0000 (11:51 -0500)]
radeonsi: Lower TGSI_OPCODE_STORE down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 9 Feb 2016 15:59:14 +0000 (10:59 -0500)]
radeonsi: Lower TGSI_OPCODE_LOAD down to LLVM op (v3)
v2: new signature style for buffer intrinsics (offsets)
v3: new signature style for llvm.amdgcn.buffer.load.format (overloaded return)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Nicolai Hähnle [Tue, 9 Feb 2016 20:01:35 +0000 (15:01 -0500)]
radeonsi: extract the LLVM type name construction into its own function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 7 Feb 2016 23:41:09 +0000 (18:41 -0500)]
radeonsi: Lower TGSI_OPCODE_RESQ down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 7 Feb 2016 22:34:57 +0000 (17:34 -0500)]
radeonsi: extract TXQ buffer size computation into its own function
This will allow it to be reused for RESQ.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 8 Feb 2016 03:30:46 +0000 (22:30 -0500)]
radeonsi: decompress shader images
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 16 Mar 2016 03:01:39 +0000 (22:01 -0500)]
radeonsi: update shader image descriptor for invalidated buffer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 6 Feb 2016 23:32:13 +0000 (18:32 -0500)]
radeonsi: implement set_shader_images (v2)
Whether DCC is disabled depends on the access flags with which the image
is bound: image_load supports DCC, but store and atomic don't.
v2: remove an unnecessary masking of images->desc.enabled_mask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 12 Mar 2016 00:39:18 +0000 (19:39 -0500)]
gallium/radeon: make r600_texture_disable_dcc externally accessible
We will need it in radeonsi for shader images.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 20:06:15 +0000 (15:06 -0500)]
tgsi/scan: track which shader images are really buffers
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 20:00:40 +0000 (15:00 -0500)]
tgsi/scan: add images_writemask
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 16:37:10 +0000 (11:37 -0500)]
st/mesa: translate additional flags in MemoryBarrier
Re-order flags in the order in which they appear in the OpenGL spec in the
description of MemoryBarrier().
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 13 Mar 2016 16:36:53 +0000 (11:36 -0500)]
gallium: add additional PIPE_BARRIER_* bits
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Brian Paul [Wed, 24 Feb 2016 01:02:11 +0000 (18:02 -0700)]
svga: add svga_winsys_context::pipe_debug_callback pointer
The svga winsys modules can use this to send debug messages to the
state tracker and Mesa.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Charmaine Lee [Tue, 26 Jan 2016 19:12:09 +0000 (11:12 -0800)]
svga: Fix the index buffer rebind regression
The index buffer handle saved in the hw_state structure could
be invalid after the index buffer is destroyed. Instead of
rebinding the index buffer using the saved index buffer handle,
we will reset the index buffer handle in the hw_state structure
to force resending of the index buffer.
Fixes bug
1593320
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Wed, 20 Jan 2016 18:35:56 +0000 (10:35 -0800)]
svga: rebind stream output targets
To ensure stream output target surfaces are available for the draw commands,
we need to rebind the current stream output targets at the first draw in the
command buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Wed, 20 Jan 2016 04:25:39 +0000 (20:25 -0800)]
svga: rebind index buffer
Similar to other resources, current index buffer needs to be
rebound at the first draw of the current command buffer to make
sure the buffer is available for the draw command.
Fixes bug
1587263.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 21 Mar 2016 19:23:04 +0000 (13:23 -0600)]
svga: minor formatting fix, comment addition
To sync with our internal tree.
Signed-off-by: Brian Paul <brianp@vmware.com>
Charmaine Lee [Fri, 11 Mar 2016 22:54:36 +0000 (14:54 -0800)]
svga: optimize constant buffer uploads
When a constant buffer slot is allocated in the upload buffer,
the allocated slot size is always in multiple of 256. But the actual buffer
size might not be in multiple of 256. This causes a gap between
the ending offset of a slot and the starting offset of the next slot.
The gap will prevent the two slots to be updated in a single update command.
In order to maximize the chance of merging the contiguous dirty ranges,
when a slot is to be allocated in the constant upload buffer,
specify a buffer size in multiple of 256.
There is about 10% performance improvement with Lightsmark2008 and
30% with Cinebench R11.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Charmaine Lee [Fri, 11 Mar 2016 22:33:39 +0000 (14:33 -0800)]
svga: add a few more resource updates HUD query
This patch adds the following HUD queries:
.num-resource-updates -- number of resource update. Commands include
UPDATE_SUBRESOURCE, UPDATE_GB_IMAGE.
.num-buffer-uploads -- number of buffer uploads.
.num-const-buf-updates -- number of set constant buffer.
.num-const-updates -- number of set shader constant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Charmaine Lee [Thu, 10 Mar 2016 18:57:24 +0000 (10:57 -0800)]
svga: add new num-readbacks HUD query
To find out how many image readback command is issued.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Brian Paul [Fri, 18 Mar 2016 15:55:57 +0000 (09:55 -0600)]
svga: use shader sampler view declarations
Previously, we looked at the bound textures (via the pipe_sampler_views)
to determine texture dimensions (1D/2D/3D/etc) and datatype (float vs.
int). But this could fail in out of memory conditions. If we failed to
allocate a texture and didn't create a pipe_sampler_view, we'd default
to using 0 (PIPE_BUFFER) as the texture type. This led to device errors
because of inconsistent shader code.
This change relies on all TGSI shaders having an SVIEW declaration for
each SAMP declaration. The previous patch series does that.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 20:20:44 +0000 (14:20 -0600)]
gallium/tests: declare sampler views in shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 20:20:27 +0000 (14:20 -0600)]
gallium/util: declare sampler view in util_make_fs_blit_msaa_depthstencil()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 20:20:01 +0000 (14:20 -0600)]
postprocess: declare sampler views in shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 20:07:47 +0000 (14:07 -0600)]
hud: add sampler view declaration in text fragment shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 17 Mar 2016 00:43:00 +0000 (18:43 -0600)]
st/mesa: emit sampler view decls in drawpixels code
v2: support both TGSI_TEXTURE_2D and _RECT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 17 Mar 2016 00:43:00 +0000 (18:43 -0600)]
st/mesa: emit sampler view declaration in bitmap shader
In June 2015, Rob Clark started updating the tgsi utility code to emit
SVIEW declarations in various shaders (for polygon stipple, blitting,
etc). These patches do the same for the Mesa state tracker.
The VMware driver will use this.
v2: support both TGSI_TEXTURE_2D and _RECT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Wed, 16 Mar 2016 21:58:37 +0000 (15:58 -0600)]
st/mesa: emit sampler view declarations for ARB vert/frag programs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 18:20:10 +0000 (12:20 -0600)]
st/mesa: use correct TGSI texture target in drawpix fragment shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 18:16:50 +0000 (12:16 -0600)]
st/mesa: use correct TGSI texture target in bitmap fragment shader
Depending on the driver's support for NPOT textures, we might use
a RECT texture instead of 2D texture. We should propogate that info
to the fragment shader's TEX instruction.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Fri, 18 Mar 2016 18:11:39 +0000 (12:11 -0600)]
gallium/tgsi: pass TGSI tex target to tgsi_transform_tex_inst()
Instead of hard-coded 2D tex target in tgsi_transform_tex_2d_inst()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Nicolai Hähnle [Fri, 18 Mar 2016 22:16:39 +0000 (17:16 -0500)]
st/mesa: use the texture view's format for render-to-texture
Aside from the bug below, it fixes a simplistic test I've written locally,
and I see no regression in Piglit for radeonsi.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94595
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Hans de Goede [Thu, 17 Mar 2016 09:04:15 +0000 (10:04 +0100)]
gallium: Remove unused TGSI_RESOURCE_ defines
These magic file-index defines where only ever used in the nouveau code
and that no longer uses them.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Hans de Goede [Thu, 17 Mar 2016 09:00:59 +0000 (10:00 +0100)]
nouveau: codegen: Do not silently fail in handeLOAD / handleSTORE / handleATOM
handeLOAD / handleSTORE / handleATOM can only handle TGSI_FILE_BUFFER
and TGSI_FILE_MEMORY. Make things fail explictly when another
register-file is used in these functions.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Hans de Goede [Wed, 16 Mar 2016 09:10:47 +0000 (10:10 +0100)]
nouveau: codegen: Disable more old resource handling code
Commit
c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.
Effectively all of it is dead already, if we ever enter the old code
paths in handeLOAD / handleSTORE / handleATOM we will get an exception
due to trying to access the now always zero-sized resources vector.
Disable all the dead code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Hans de Goede [Tue, 15 Mar 2016 13:37:27 +0000 (14:37 +0100)]
nouveau: codegen: gk110: Make emitSTORE offset handling identical to emitLOAD
Make the store offset handling in CodeEmitterGK110::emitSTORE identical
to the one in CodeEmitterGK110::emitLOAD handling.
This is just a cleanup, it does not cause any functional changes.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Hans de Goede [Tue, 15 Mar 2016 12:48:30 +0000 (13:48 +0100)]
nouveau: codegen: Slightly refactor Source::scanInstruction() dst handling
Use the dst temp variable which was used in the TGSI_FILE_OUTPUT
case everywhere. This makes the code somewhat easier to reads
and helps avoiding going over 80 chars with upcoming changes.
This also brings the dst handling more in line with the src
handling.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Hans de Goede [Thu, 10 Mar 2016 15:02:06 +0000 (16:02 +0100)]
nouveau: codegen: Add support for clover / OpenCL kernel input parameters
Add support for clover / OpenCL kernel input parameters.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Hans de Goede [Thu, 10 Mar 2016 12:52:00 +0000 (13:52 +0100)]
tgsi: Add support for global / private / input MEMORY
Extend the MEMORY file support to differentiate between global, private
and shared memory, as well as "input" memory.
"MEMORY[x], INPUT" is intended to access OpenCL kernel parameters, a
special memory type is added for this, since the actual storage of these
(e.g. UBO-s) may differ per implementation. The uploading of kernel
parameters is handled by launch_grid, "MEMORY[x], INPUT" allows drivers
to use an access mechanism for parameter reads which matches with the
upload method.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Hans de Goede [Thu, 10 Mar 2016 14:26:21 +0000 (15:26 +0100)]
tgsi: Fix decl.Atomic and .Shared not propagating when parsing tgsi text
When support for decl.Atomic and .Shared was added, tgsi_build_declaration
was not updated to propagate these properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Iago Toral Quiroga [Fri, 18 Mar 2016 07:39:23 +0000 (08:39 +0100)]
doc: document spilling options accepted by INTEL_DEBUG
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Hans de Goede [Wed, 16 Mar 2016 08:46:05 +0000 (09:46 +0100)]
tgsi: Fix return of uninitialized memory in tgsi_*_instruction_memory
tgsi_default_instruction_memory / tgsi_build_instruction_memory were
returning uninitialized memory for tgsi_instruction_memory.Texture and
tgsi_instruction_memory.Format. Note 0 means not set, and thus is a
correct default initializer for these.
Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory")
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ilia Mirkin [Sun, 20 Mar 2016 03:27:56 +0000 (23:27 -0400)]
st/mesa: report correct precision information for low/medium/high ints
When we have native integers, these have full precision. Whether they're
low/medium/high isn't piped through the TGSI yet, but eventually those
might have differing precisions. For now they're just 32-bit ints.
Fixes the following dEQP tests:
dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
dEQP-GLES3.functional.state_query.shader.precision_fragment_highp_int
which expected highp ints to have full 32-bit precision, not the default
23-bit float precision.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Nishanth Peethambaran [Fri, 11 Mar 2016 06:23:00 +0000 (01:23 -0500)]
st/omx/dec: Correct the timestamping
Attach the timestamp to the dpb buffer and use that timestamp
while pushing buffer from dpb list to the omx client.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nishanth Peethambaran <nishanth.peethambaran@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Nishanth Peethambaran [Tue, 15 Mar 2016 05:56:18 +0000 (01:56 -0400)]
st/omx: Remove trailing spaces
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nishanth Peethambaran <nishanth.peethambaran@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sun, 20 Mar 2016 17:43:43 +0000 (13:43 -0400)]
nv50/ir: fix indirect texturing for non-array textures on nvc0
If a layer parameter is provided, we want to flip it to position 0 (and
combine it with any indirect params). However if the target is not an
array, there is no layer, so we have to shift all of the arguments down
by one to make room for it.
This fixes situations where there were non-coordinate parameters, such
as bias, lod, depth compare, explicit derivatives. Instead of adding a
new parameter at the front for the indirect reference, we would swap one
of those in its place.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.*shadow
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sun, 20 Mar 2016 01:25:36 +0000 (21:25 -0400)]
st/mesa: only minify depth for 3d targets
We make sure that that image depth matches the level's depth before
copying it into place. However we should only be minifying the first
level's depth for 3d textures - array textures have the same depth for
all levels.
This fixes tests such as
dEQP-GLES3.functional.texture.specification.texsubimage3d_depth.* and I
suspect account for a number of other odd situations I've run into where
level > 0 of array textures was messed up.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sat, 19 Mar 2016 15:58:25 +0000 (11:58 -0400)]
nv50/ir: normalize cube coordinates after derivatives have been computed
In "manual" derivative mode (always used on nv50 and sometimes on nvc0
but always for cube), the idea is that using the quadop instruction, we
set up the "other" quads to have values such that the derivatives work
out, and then run the texture instruction as if nothing were strange. It
pulls values from the other lanes, and does its magic.
However cube coordinates have to be normalized - one of the 3 coords has
to be 1, to determine which is the major axis, to say which face is
being sampled. We were normalizing the coordinates first, and then
adding the derivatives. This is wrong for two reasons:
- the coordinates got normalized by a scaling factor but the
derivatives didn't
- the result of the addition didn't end up normalized
To resolve this, we flip the logic around to normalize *after* the
per-lane coordinates are set up.
This fixes a bunch of textureGrad cube dEQP tests.
NOTE: nv50 cube arrays with explicit derivatives are still broken, to be
resolved at a later date.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Marek Olšák [Fri, 11 Mar 2016 14:59:28 +0000 (15:59 +0100)]
gallium/radeon: remove remnants of R600 TGSI->LLVM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Mar 2016 14:53:55 +0000 (15:53 +0100)]
r600g: flatten if (1) statement after removal of TGSI->LLVM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Mar 2016 14:49:21 +0000 (15:49 +0100)]
r600g: remove TGSI->LLVM translation
It was useful for testing and as a prototype for radeonsi bringup,
but it's not used anymore and doesn't support OpenGL 3.3 even.
v2: try to fix OpenCL build
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Marek Olšák [Fri, 11 Mar 2016 14:24:05 +0000 (15:24 +0100)]
gallium/radeon: remove old CS tracing
Cons:
- it was only integrated in r600g
- it doesn't work with GPUVM
- it records buffer contents at the end of IBs instead of at the beginning,
so the replay isn't exact
- it lacks an IB parser and user-friendliness
A better solution is apitrace in combination with gallium/ddebug, which
has a complete IB parser and can pinpoint hanging CP packets.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 10 Mar 2016 12:29:12 +0000 (13:29 +0100)]
radeonsi: process TGSI property NEXT_SHADER
This allows compiling the main shader part as ES or LS.
If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.
The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 10 Mar 2016 12:28:08 +0000 (13:28 +0100)]
st/mesa: set TGSI property NEXT_SHADER
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 10 Mar 2016 12:20:36 +0000 (13:20 +0100)]
gallium: add TGSI property NEXT_SHADER
Radeonsi needs to know which shader stage will execute after a shader
in order to make the best decision about which shader variant to compile
first.
This is only set for VS and TES, because we don't need it elsewhere.
VS has 3 variants:
- next shader is FS
- next shader is GS
- next shader is TCS
TES has 2 variants:
- next shader is FS
- next shader is GS
Currently, radeonsi always assumes the next shader is FS, which is suboptimal,
since st/mesa always knows which shader is next if the GLSL program is not
a "separate shader".
By default, ureg always sets "next shader is FS".
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Pierre Moreau [Sat, 19 Mar 2016 16:56:03 +0000 (17:56 +0100)]
nvc0/ir: Use double constant in handleSQRT
Fixes: a100d89d0998 (nv50,nvc0: Fix invalid constant.)
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Wed, 9 Mar 2016 07:59:37 +0000 (23:59 -0800)]
mesa: Disallow GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME on winsys FBO.
Fixes:
dEQP-GLES3.functional.negative_api.state.get_framebuffer_attachment_parameteriv
Apparently, GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is not allowed when
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is GL_FRAMEBUFFER_DEFAULT, and
is expected to result in a GL_INVALID_ENUM error.
No GL specification actually defines what GL_FRAMEBUFFER_DEFAULT means.
It probably means the window system FBO. It also doesn't mention the
behavior of any queries for that type. Various ARB folks seem fairly
confused about it too. For now, just do something vaguely like what
dEQP expects.
I think we probably need to check the visual bits against 0 for the
attachment, but we haven't been doing that thusfar, and given how
confusingly this is specified, I can't imagine anyone relying on it.
v2: Improve comments, move error condition above the
_mesa_get_fb0_attachment call, add forgotten "return"
(all suggested/caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ilia Mirkin [Sat, 19 Mar 2016 15:46:11 +0000 (11:46 -0400)]
nv50/ir: force-enable derivatives on TXD ops
This matters especially in vertex shaders, where derivatives are
disabled by default. This fixes textureGrad in vertex shaders on nv50.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sat, 19 Mar 2016 15:43:37 +0000 (11:43 -0400)]
nv50: reset TFB bufctx when we no longer hold a reference to the buffers
This fix is analogous to commit
ff085d014.
This fixes some use-after-free situations in dEQP when an xfb state is
removed, and then a clear is triggered, which only does a partial
validation. It would attempt to read the no-longer-valid buffers,
resulting in crashes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Samuel Pitoiset [Wed, 24 Feb 2016 20:35:25 +0000 (21:35 +0100)]
nvc0: avoid using magic numbers for the uniform_bo offsets
Instead make use of constants to improve readability.
The first 32 bytes of the driver constant buffer are unknown... This
doesn't seem to be used in the codegen part, but if the texBindBase
offset is shifted from 0x20 to 0x00, this breaks the universe for
really weird reasons. This sounds like to be related to textures.
Anyway, name this NVC0_CB_AUX_UNK_INFO and add a todo should be
enough for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Tue, 15 Mar 2016 12:06:44 +0000 (13:06 +0100)]
nv50/ir: make use of auxCBSlot instead of magic numbers
This avoids using magic numbers for the driver constbuf slot which
is always 15 except for compute shaders on gk104+ where the slot 0
is used.
For gk104+, some special compute-related values like the thread
index are uploaded to screen->parm which is currently bound on c0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Tue, 15 Mar 2016 12:16:44 +0000 (13:16 +0100)]
nv50,nvc0: replace resInfoCBSlot by auxCBSlot
Having two different variables for the driver constant buffer slot
is confusing and really useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>