Jason Ekstrand [Sat, 26 Aug 2017 18:26:40 +0000 (11:26 -0700)]
i965/fs/nir: Simplify 64-bit store_output
The swizzles weren't doing any good because swiz is just XYZW. Also, we
were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit
already does a MOV for us. Finally, the temporary was only ever used
inside the inner loop so there's no need for it to actually be an array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 18 Oct 2017 01:59:26 +0000 (18:59 -0700)]
intel/fs: Use the original destination region for int MUL lowering
Some hardware (CHV, BXT) has special restrictions on register regions
when doing integer multiplication. We want to respect those when we
lower to DxW multiplication.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Wed, 18 Oct 2017 01:56:29 +0000 (18:56 -0700)]
intel/fs: Fix integer multiplication lowering for src/dst hazards
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:45:43 +0000 (14:45 -0700)]
intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:45:12 +0000 (14:45 -0700)]
intel/eu: Fix broadcast instruction for 64-bit values on little-core
We're not using broadcast for any 64-bit types right now since we mostly
use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V
subgroups are going to need it for 64-bit so let's make it work.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 18 Oct 2017 02:50:36 +0000 (19:50 -0700)]
intel/eu/reg: Add a subscript() helper
This is similar to the identically named fs_reg helper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:16:31 +0000 (14:16 -0700)]
intel/eu: Just modify the offset in brw_broadcast
This means we have to drop const from a variable but it also means that
100% of the code which deals with the offset limit is in one place.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 17 Oct 2017 18:57:48 +0000 (11:57 -0700)]
intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST
These restrictions effectively already existed due to the way we use
indirect sources but weren't being directly enforced.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 12 Oct 2017 23:17:03 +0000 (16:17 -0700)]
intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half. Work around this by using a pair of 1-wide MOVs and
scattering the result. This fixes the any/all instructions for SIMD32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 03:32:30 +0000 (20:32 -0700)]
intel/fs: Use an explicit D type for vote any/all/eq intrinsics
The any/all intrinsics return a boolean value so D or UD is the correct
type. Unfortunately, get_nir_dest has the annoying behavior of
returning a float type by default. This causes format conversion which
gives us -1.0f or 0.0f in the register. If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
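The difference between the two register contents described in the commit above can be reproduced on the CPU. This is only a sketch, assuming IEEE 754 single-precision floats; the helper names are made up for illustration and are not Mesa functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof(f) == sizeof(u));
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Hypothetical model of a float-typed destination: the integer boolean
 * (0 or ~0) is converted to float, so ~0 (-1) becomes -1.0f. */
static uint32_t bool_through_float_dest(int32_t b)
{
    return float_bits((float)b);
}

/* Hypothetical model of a D-typed destination: the bits are kept as-is. */
static uint32_t bool_through_int_dest(int32_t b)
{
    return (uint32_t)b;
}
```

With these helpers, "true" through the float destination is 0xBF800000: still non-zero, so a comparison against zero works, but a consumer that relies on all bits being set (for example masking with 1) sees the wrong value.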
Jason Ekstrand [Thu, 7 Sep 2017 01:37:34 +0000 (18:37 -0700)]
intel/fs: Don't stomp f0.1 in SIMD16 ballot
In fragment shaders f0.1 is used for discards so doing ballot after a
discard can potentially cause the discard to not happen. However, we
don't support SIMD32 fragment shaders yet so this isn't a problem.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Sat, 2 Sep 2017 06:24:15 +0000 (23:24 -0700)]
intel/fs: Use ANY/ALL32 predicates in SIMD32
We have ANY/ALL32 predicates and, for the most part, they work just
fine. (See the next commit for more details.) Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly. Specifically, the hardware looks at
the execution group and knows to shift its flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 01:31:11 +0000 (18:31 -0700)]
intel/fs: Be more explicit about our placement of [un]zip
Before, we were careful to place the zip after the last of the split
instructions but did unzip on-demand. This changes things so that the
unzips go before all of the split instructions and the zip comes
explicitly after all the split instructions. As a side-effect of this
change, we now emit the split instruction from highest SIMD group to
lowest instead of low to high. We could have kept the old behavior, but
it shouldn't matter and this made the code easier.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 01:24:17 +0000 (18:24 -0700)]
intel/fs: Pass builders instead of blocks into emit_[un]zip
This makes it far more explicit where we're inserting the instructions
rather than the magic "before and after" stuff that the emit_[un]zip
helpers did based on block and inst.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 2 Nov 2017 21:52:49 +0000 (14:52 -0700)]
intel/fs: Use a pure vertical stride for large register strides
Register strides higher than 4 are uncommon but they can happen. For
instance, if you have a 64-bit extract_u8 operation, we turn that into a
UB -> UQ MOV with a source stride of 8. Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4. To solve this problem, we instead use a
stride of <8;1,0>. As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Eric Anholt [Fri, 3 Nov 2017 02:04:12 +0000 (19:04 -0700)]
broadcom/vc5: Skip emitting textures that aren't used.
Fixes crashes when ARB_fp uses texture[1] but not 0, as in piglit's
fp-fragment-position.
Eric Anholt [Fri, 3 Nov 2017 01:49:58 +0000 (18:49 -0700)]
broadcom/vc5: Add missing SRGBA8 ETC2 support.
Fixes piglit oes_compressed_etc2_texture-miptree srgb8-alpha8.
Eric Anholt [Fri, 3 Nov 2017 01:45:07 +0000 (18:45 -0700)]
broadcom/vc5: Disable early Z test when the FS writes Z.
Fixes piglit early-z.
Eric Anholt [Thu, 2 Nov 2017 19:49:46 +0000 (12:49 -0700)]
broadcom/vc5: Shift the min/max lod fields by the BASE_LEVEL.
The lod clamping is what limits you between base and last level, and the
base level field is just there to help decide where the min/mag change
happens.
Fixes tex-miplevel-selection GL2:texture()
Eric Anholt [Thu, 2 Nov 2017 19:24:17 +0000 (12:24 -0700)]
broadcom/vc5: Add support for anisotropic filtering.
Eric Anholt [Thu, 2 Nov 2017 19:19:10 +0000 (12:19 -0700)]
broadcom/vc5: Fix mipmap filtering enums.
The ordering of the values was even less obvious than I thought, with both
the mip filter and the min filter being in different bits depending on
whether the mip filter is none.
Fixes piglit fs-textureLod-miplevels.shader_test
Eric Anholt [Thu, 2 Nov 2017 18:47:30 +0000 (11:47 -0700)]
broadcom/vc5: Fix height padding of small UIF slices.
The HW doesn't pad the slice's height to make a full 4x4 group of UIF
blocks. We just need to pad to columns, and the start of the next column
appears in the bottom of the previous column's last block.
Fixes piglit fs-textureOffset-2D.
Eric Anholt [Thu, 2 Nov 2017 00:55:52 +0000 (17:55 -0700)]
broadcom/vc5: Print the actual offsets in HW for our resource layout debug.
The alignment of level 0 is non-obvious, so it's hard to turn a faulting
address into a slice without this.
Eric Anholt [Thu, 2 Nov 2017 00:22:17 +0000 (17:22 -0700)]
broadcom/vc5: Set the available VS outputs to match the FS inputs.
Fixes piglit glsl-es-3.00/minimum-maximums.txt.
Eric Anholt [Wed, 1 Nov 2017 22:29:58 +0000 (15:29 -0700)]
broadcom/vc5: Set the max texture LOD bias.
The field is signed 8.8, so the usual 16.0f fits. Fixes piglit
gl-2.1-minmax.
Eric Anholt [Wed, 1 Nov 2017 22:28:04 +0000 (15:28 -0700)]
broadcom/vc5: Fix translation of stencil ops.
They aren't quite in the same order as the gallium defines. Fixes piglit
gl-2.0-two-sided-stencil.
Eric Anholt [Wed, 1 Nov 2017 22:18:34 +0000 (15:18 -0700)]
broadcom/vc5: Move stencil state packing to the CSO.
Only the stencil ref comes in as dynamic state at emit time.
Eric Anholt [Wed, 1 Nov 2017 21:39:47 +0000 (14:39 -0700)]
broadcom/vc5: Introduce a helper for pre-packing our V3DXX structs.
This is so much more pleasant to write than the manual
V3D33_whatever_pack() calls, and will be useful for when we start doing
actual per-V3D compiles.
Eric Anholt [Wed, 1 Nov 2017 22:16:59 +0000 (15:16 -0700)]
broadcom/vc5: Add a cl_emit() variant for merging with a pre-packed struct.
Cleans up the hand-written code, at the cost of another ugly macro.
Eric Anholt [Wed, 1 Nov 2017 21:04:45 +0000 (14:04 -0700)]
broadcom/vc5: Skip emitting depth offset while disabled.
The enable flag is also in the rasterizer state, so it will be emitted
once it's needed.
Eric Anholt [Wed, 1 Nov 2017 20:56:57 +0000 (13:56 -0700)]
broadcom/vc5: Don't emit stencil config if not doing stencil test.
As with blending, we'll have the bit flagged again when it gets reenabled
in CONFIGURATION_BITS, so there's no need to emit test state if we're not
testing.
Eric Anholt [Wed, 1 Nov 2017 20:56:25 +0000 (13:56 -0700)]
broadcom/vc5: Don't emit updated blend factors/funcs while disabled.
The dirty bit will be flagged again when re-enabled. Keeps us from
emitting blend state in CLs that never do blending.
Eric Anholt [Wed, 1 Nov 2017 18:51:41 +0000 (11:51 -0700)]
broadcom/vc5: Fix missing enum decode for indexed primitives.
Eric Anholt [Wed, 1 Nov 2017 18:48:44 +0000 (11:48 -0700)]
broadcom/vc5: Drop padding bits from the bottom of the TSDA address.
Fixes misaligned-looking addresses in decode.
Eric Anholt [Wed, 1 Nov 2017 17:28:01 +0000 (10:28 -0700)]
broadcom/vc5: Make sure the TMU indirect struct is appropriately aligned.
I was hoping that this would help with fbo-generatemipmap hangs, but no
luck.
Kenneth Graunke [Thu, 26 Oct 2017 04:17:14 +0000 (21:17 -0700)]
broadcom/genxml: Fix decoding of groups with small fields.
Groups containing fields smaller than a byte were probably not being decoded
correctly. For example:
<group count="32" start="32" size="4">
<field name="Vertex Element Enables" start="0" end="3" type="uint"/>
</group>
gen_field_iterator_next would properly walk over each element of the
array, incrementing group_iter. However, the code to print the actual
values only considered iter->field->start/end, which are 0 and 3 in the
above example. So it would always fetch bits 3:0 of the current byte,
printing the same value over and over.
Cc: Eric Anholt <eric@anholt.net>
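The decoding problem in the commit above can be illustrated with a small sketch. It assumes the whole group fits in one 64-bit word, which is a simplification; the function names are hypothetical and do not match the genxml decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [start, end] (inclusive) from a 64-bit word. */
static uint32_t extract_bits(uint64_t word, unsigned start, unsigned end)
{
    assert(start <= end && end - start < 32 && end < 64);
    return (uint32_t)((word >> start) & ((1ull << (end - start + 1)) - 1));
}

/* Behaviour described as broken: the element index is ignored, so every
 * element of the group reads the same bits (3:0 in the example). */
static uint32_t decode_element_ignoring_index(uint64_t word,
                                              unsigned field_start,
                                              unsigned field_end,
                                              unsigned index,
                                              unsigned group_size)
{
    (void)index;
    (void)group_size;
    return extract_bits(word, field_start, field_end);
}

/* Intended behaviour: move the field window by index * group_size bits. */
static uint32_t decode_element(uint64_t word,
                               unsigned field_start,
                               unsigned field_end,
                               unsigned index,
                               unsigned group_size)
{
    unsigned shift = index * group_size;
    return extract_bits(word, field_start + shift, field_end + shift);
}
```

For a group with size="4" and a field at bits 0..3, element 2 should come from bits 11:8 of the word rather than from bits 3:0.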
Eric Anholt [Mon, 30 Oct 2017 22:12:33 +0000 (15:12 -0700)]
broadcom/vc5: Use DEPTH24_STENCIL8 for rendering to depth-only textures.
The HW puts the pad bits at the top for DEPTH_COMPONENT24, but we need them
at the bottom for texturing. Using the format with stencil probably means
we won't be able to do Z24 and separate S8, but I wasn't planning on
supporting that anyway.
Fixes hiz-depth-read-fbo-d24-s0
Chad Versace [Thu, 2 Nov 2017 23:05:45 +0000 (16:05 -0700)]
anv: Suffix anv-private 'VK' tokens with 'ANV'
I saw VK_IMAGE_ASPECT_ANY_COLOR_BIT while hacking anv_formats.c and got
confused. "Huh? What extension added that?". No extension defines it;
anv_private.h defines it.
To remove confusion, rename the anv-private VK tokens as if they were
extension tokens with the ANV vendor suffix.
I found only two such tokens:
VK_IMAGE_ASPECT_ANY_COLOR_BIT
VK_IMAGE_ASPECT_PLANES_BITS
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Chad Versace [Thu, 2 Nov 2017 22:34:04 +0000 (15:34 -0700)]
anv: Remove unused variable 'gen'
In anv_physical_device_get_format_properties().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Marek Olšák [Tue, 7 Nov 2017 02:52:34 +0000 (03:52 +0100)]
radeonsi: add si_screen::has_ls_vgpr_init_bug
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 02:50:19 +0000 (03:50 +0100)]
radeonsi: use ac_create_target_machine
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 02:43:38 +0000 (03:43 +0100)]
radeonsi: use ac_get_llvm_processor_name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 02:29:36 +0000 (03:29 +0100)]
radeonsi/gfx9: don't set gs_table_depth
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 15:12:56 +0000 (16:12 +0100)]
radeonsi/gfx9: limit the scissor bug workaround to Vega10 and Raven only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 14:27:43 +0000 (15:27 +0100)]
radeonsi: remove unused field in the PCI ID table
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Miklós Máté [Fri, 3 Nov 2017 01:01:42 +0000 (02:01 +0100)]
mesa: fix deleting the dummy ATI_fs
The DummyShader is used by GenFragmentShadersATI() as a placeholder to
mark IDs as allocated. Context cleanup wants to delete everything in
ctx->Shared->ATIShaders, and crashes on these placeholders with this
backtrace:
==15060== Invalid free() / delete / delete[] / realloc()
==15060== at 0x482F478: free (vg_replace_malloc.c:530)
==15060== by 0x57694F4: _mesa_delete_ati_fragment_shader (atifragshader.c:68)
==15060== by 0x58B33AB: delete_fragshader_cb (shared.c:208)
==15060== by 0x5838836: _mesa_HashDeleteAll (hash.c:295)
==15060== by 0x58B365F: free_shared_state (shared.c:377)
==15060== by 0x58B3BC2: _mesa_reference_shared_state (shared.c:469)
==15060== by 0x578687F: _mesa_free_context_data (context.c:1366)
==15060== by 0x595E9EC: st_destroy_context (st_context.c:642)
==15060== by 0x5987057: st_context_destroy (st_manager.c:772)
==15060== by 0x5B018B6: dri_destroy_context (dri_context.c:217)
==15060== by 0x5B006D3: driDestroyContext (dri_util.c:511)
==15060== by 0x4A1CBE6: dri3_destroy_context (dri3_glx.c:170)
==15060== Address 0x7b5dae0 is 0 bytes inside data symbol "DummyShader"
Also, DeleteFragmentShadersATI() should not assert on DummyShader, just
remove the hash entry.
Normally one would define a shader after GenFragmentShadersATI(), and
BindFragmentShaderATI() replaces the placeholder with a real object.
However, the specification doesn't say that one has to define a shader
for each allocated ID.
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Tue, 7 Nov 2017 09:48:12 +0000 (10:48 +0100)]
gallium: Guard assertions by NDEBUG instead of DEBUG
This matches the standard assert.h header.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Engestrom [Mon, 6 Nov 2017 17:18:06 +0000 (17:18 +0000)]
meson: only turn on Mesa's DEBUG for buildtype==debug
As discussed in this thread:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/175104.html
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Chad Versace <chadversary@chromium.org>
Eric Engestrom [Mon, 6 Nov 2017 16:49:27 +0000 (16:49 +0000)]
meson: switch default build type to debugoptimized
As discussed in this thread:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/175104.html
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Christian Schmidbauer <ch.schmidbauer@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Rodriguez <andresx7@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Eric Engestrom [Thu, 2 Nov 2017 23:38:09 +0000 (23:38 +0000)]
meson: drop GLESv1 .so version back to 1.0.0
autotools generates libGLESv1_CM.so.1.0.0, so let's make sure meson
does the same.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 2 Nov 2017 23:24:00 +0000 (23:24 +0000)]
meson: standardize .so version to major.minor.patch
This `version` field defines the filename for the .so.
The plain .so as well as .so.$major are always symlinks to this.
Unless I'm mistaken, only the major is ever used, so this shouldn't
matter, but for consistency with autotools (and in case it does matter),
let's always have all 3 major.minor.patch components.
(The soname isn't affected, and is always .so.$major)
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Dave Airlie [Tue, 31 Oct 2017 01:29:54 +0000 (11:29 +1000)]
ac/nir: for ubo load use correct num_components
I was hacking something stupid in doom, and hit an assert for the bitcast
following this, it definitely looks like this should be the number of 32-bit
components, not the instr level ones.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gwan-gyeong Mun [Mon, 6 Nov 2017 23:28:25 +0000 (08:28 +0900)]
nir: fix a typo
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tomasz Figa [Tue, 26 Sep 2017 08:35:56 +0000 (17:35 +0900)]
glsl: Allow precision mismatch on dead data with GLSL ES 1.00
Commit 259fc505454ea6a67aeacf6cdebf1398d9947759 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a large number of applications using the Cocos2D
engine. Since other GLES implementations accept this, it poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
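The policy described in the commit above can be written down as a small decision function. This is a sketch of the rule as stated in the message, not the linker's actual code; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum example_diag {
    EXAMPLE_DIAG_NONE,
    EXAMPLE_DIAG_WARNING,
    EXAMPLE_DIAG_ERROR
};

/* Decide how to report a uniform declared in both stages.
 * glsl_es_version uses the usual 100/300 numbering. */
static enum example_diag
example_precision_mismatch_diag(unsigned glsl_es_version,
                                bool precision_matches,
                                bool dead_in_vs,
                                bool dead_in_fs)
{
    assert(glsl_es_version >= 100);

    if (precision_matches)
        return EXAMPLE_DIAG_NONE;

    /* Below GLSL ES 3.0, a mismatch on data that is dead in at least one
     * of the shaders is only worth a warning. */
    if (glsl_es_version < 300 && (dead_in_vs || dead_in_fs))
        return EXAMPLE_DIAG_WARNING;

    return EXAMPLE_DIAG_ERROR;
}
```

A GLSL ES 1.00 program with a mismatched uniform that is live in both stages still gets the error, as does any mismatch from GLSL ES 3.0 onwards.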
Timothy Arceri [Sun, 5 Nov 2017 23:31:30 +0000 (10:31 +1100)]
i965: disable NIR linking on HSW and below
Fixes: 379b24a40d3d "i965: make use of nir linking"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103537
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Dave Airlie [Mon, 6 Nov 2017 04:06:35 +0000 (04:06 +0000)]
radv: move is_local up to the winsys level.
We can avoid adding the buffer in the local case, this will
avoid all the overhead of the indirect call.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 6 Nov 2017 04:05:59 +0000 (04:05 +0000)]
radv: wrap cs_add_buffer in an inline. (v2)
The next patch will try and avoid calling the indirect function.
v2: add a missing conversion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 6 Nov 2017 02:17:09 +0000 (02:17 +0000)]
radv: when loading regs no need to add buffer
The function that calls us has just added the buffer to the
list already, no need to try and add it again.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 6 Nov 2017 06:49:55 +0000 (06:49 +0000)]
radv: pre-calculate user_data_0 registers and store in pipeline
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adam Jackson [Mon, 6 Nov 2017 21:10:22 +0000 (16:10 -0500)]
docs: Mark GLX_ARB_context_flush_control done
Requires an unreleased X server, but from the client GLX side this is as
done as it gets.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Neil Roberts [Wed, 1 Oct 2014 19:00:50 +0000 (20:00 +0100)]
i965: Enable flush control
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Adam Jackson [Wed, 4 Feb 2015 18:04:26 +0000 (13:04 -0500)]
drisw: Enable flush control for llvmpipe and softpipe
Hilariously this is a fairly big win. Neil's multi-context-test
improves from ~24 to ~36 fps with llvmpipe on a Core i5-3317U. softpipe
also improves, from about 2.25 to 3.09 fps (when it's that slow, you're
allowed to be that precise).
I'd have added it to swrast classic, but the testcase wants GL 3.0 and
shaders, and that's not a thing classic has, so I figured making it work
on softpipe was crime enough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Adam Jackson [Wed, 4 Feb 2015 18:05:36 +0000 (13:05 -0500)]
gallium: Wire up flush control
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Adam Jackson [Thu, 22 Sep 2016 07:47:55 +0000 (03:47 -0400)]
egl: Implement EGL_KHR_context_flush_control
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Neil Roberts [Wed, 1 Oct 2014 19:00:48 +0000 (20:00 +0100)]
glx: Implement GLX_ARB_context_flush_control
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Neil Roberts [Wed, 1 Oct 2014 19:00:47 +0000 (20:00 +0100)]
dri: Add a flush control extension
This advertises that the driver can accept a new context attribute
__DRI_CTX_ATTRIB_RELEASE_BEHAVIOR.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Neil Roberts [Wed, 1 Oct 2014 19:00:46 +0000 (20:00 +0100)]
dri: Change __DriverApiRec::CreateContext to take a struct for attribs
Previously the CreateContext method of __DriverApiRec took a set of
arguments to describe the attribute values from the window system API's
CreateContextAttribs function. As more attributes get added this could
quickly get unworkable and every new attribute needs a modification for
every driver.
To fix that, pass the attribute values in a struct instead. The struct
has a bitmask to specify which members are used. The first three members
(two for the GL version and one for the flags) are always set. If the
bit is not set in the attribute mask then it can be assumed the
attribute has the default value. Drivers will error if unknown bits in
the mask are set.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
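The struct-plus-bitmask scheme from the commit above might look roughly like the following. All names, bit values and the default are hypothetical and chosen for illustration; they are not the identifiers used in the DRI interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits. */
#define EXAMPLE_CTX_ATTRIB_RESET_STRATEGY   (1u << 0)
#define EXAMPLE_CTX_ATTRIB_RELEASE_BEHAVIOR (1u << 1)

/* Hypothetical default used when the release behavior bit is not set. */
#define EXAMPLE_DEFAULT_RELEASE_BEHAVIOR 1

struct example_ctx_config {
    /* The first three members are always set. */
    unsigned major_version;
    unsigned minor_version;
    uint32_t flags;

    /* Says which of the optional members below carry a value. */
    uint32_t attribute_mask;

    int reset_strategy;
    int release_behavior;
};

/* Driver-side handling: reject unknown bits, fall back to the default
 * for attributes whose bit is not set. */
static bool example_create_context(const struct example_ctx_config *cfg,
                                   uint32_t supported_mask,
                                   int *release_behavior)
{
    assert(cfg && release_behavior);

    if (cfg->attribute_mask & ~supported_mask)
        return false;

    *release_behavior =
        (cfg->attribute_mask & EXAMPLE_CTX_ATTRIB_RELEASE_BEHAVIOR)
            ? cfg->release_behavior
            : EXAMPLE_DEFAULT_RELEASE_BEHAVIOR;
    return true;
}
```

Adding a new attribute then means adding one member and one mask bit; drivers that do not know the bit reject it without any change to their function signature.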
Neil Roberts [Wed, 4 Feb 2015 15:20:12 +0000 (10:20 -0500)]
intel: Don't flush the old context in intelMakeCurrent
It shouldn't be necessary to flush the context within the driver
implementation because the old context is explicitly flushed in
_mesa_make_current which is called a little further on. It is useful to
only have a single place that flushes when switching contexts to make it
easier to later implement the GL_KHR_context_flush_control extension.
The flush in intelMakeCurrent was added in commit 5505865 to implement
the GLX semantics that the context should be flushed when it is
released. When the commit was made there was no flush in
_mesa_make_current because it was only added later in 93102b4c. I think
that later commit effectively makes the first commit redundant.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Adam Jackson [Thu, 22 Sep 2016 07:38:01 +0000 (03:38 -0400)]
egl/dri2: Factor out context attribute initialization
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Wladimir J. van der Laan [Thu, 2 Nov 2017 15:08:42 +0000 (16:08 +0100)]
etnaviv: Don't over-pad compressed textures
HALIGN_FOUR/SIXTEEN has no meaning for compressed textures, and we can't
render to them anyway. So use the tightest possible packing. This
avoids bugs with non-power-of-two block sizes.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Wladimir J. van der Laan [Wed, 1 Nov 2017 17:19:02 +0000 (18:19 +0100)]
etnaviv: ASTC texture support
Add ASTC texture support for hardware that supports this
(currently only GC3000 on i.MX6qp is known to have this).
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Wladimir J. van der Laan [Wed, 1 Nov 2017 17:19:01 +0000 (18:19 +0100)]
etnaviv: Update from rnndb
Updated as of etna_viv commit 3b4a8ec.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Dave Airlie [Fri, 3 Nov 2017 04:06:35 +0000 (04:06 +0000)]
radv: add initial copy descriptor support. (v2)
It appears the latest dota2 vulkan uses this,
and we get a hang in VR mode without it.
v2: remove finishme I left in after finishing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 26 Oct 2017 01:17:29 +0000 (03:17 +0200)]
gallium/u_vbuf: use signed vertex buffers offsets for optimal uploads
Uploaded data must start at (stride * start), because we can't modify
start in all cases. If it's the first allocation, it's also the amount
of memory wasted. If the starting offset is larger than the size of
the upload buffer, the buffer is re-created, used for 1 upload, and then
thrown away. If the upload is small, most of the buffer space is unused
and wasted. Keep doing that and the OOM killer comes. It's actually
pretty quick.
With signed VB offsets, we can set min_out_offset = 0
in u_upload_alloc/u_upload_data.
This fixes OOM situations with SPECviewperf.
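The arithmetic behind the commit above can be sketched as follows, assuming vertex i is fetched from buffer_offset + stride * i and that only the vertices from `start` onwards are uploaded. The function names are illustrative and are not u_vbuf or u_upload_mgr functions.

```c
#include <assert.h>
#include <stdint.h>

/* Address the hardware fetches vertex `index` from. */
static int64_t vertex_address(int64_t buffer_offset, uint32_t stride,
                              uint32_t index)
{
    return buffer_offset + (int64_t)stride * index;
}

/* Unsigned offsets: the upload must sit at or beyond stride * start,
 * otherwise the buffer offset would underflow. */
static uint32_t buffer_offset_unsigned(uint32_t upload_offset,
                                       uint32_t stride, uint32_t start)
{
    uint64_t first = (uint64_t)stride * start;
    assert(upload_offset >= first);
    return (uint32_t)(upload_offset - first);
}

/* Signed offsets: the upload can sit anywhere, including offset 0. */
static int64_t buffer_offset_signed(uint32_t upload_offset,
                                    uint32_t stride, uint32_t start)
{
    return (int64_t)upload_offset - (int64_t)stride * start;
}
```

With stride 16 and start 1000000, the unsigned variant needs the data at least 16000000 bytes into the upload buffer, while the signed variant can place it at offset 0 and program an offset of -16000000; both make vertex 1000000 resolve to the uploaded data.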
Marek Olšák [Wed, 25 Oct 2017 23:51:29 +0000 (01:51 +0200)]
radeonsi: enable signed vertex buffer offsets
Marek Olšák [Wed, 25 Oct 2017 23:50:44 +0000 (01:50 +0200)]
gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET
Juan A. Suarez Romero [Fri, 3 Nov 2017 17:54:21 +0000 (18:54 +0100)]
automake: include git_sha1.h.in in release tarball
Fixes:
make[2]: Leaving directory '/home/local/mesa/mesa-17.4.0-devel/_build/sub/src'
make[2]: *** No rule to make target '../../../src/git_sha1.h.in', needed by 'git_sha1.h'. Stop.
Makefile:660: recipe for target 'all-recursive' failed
Fixes: 16be271c6ee618e79c7d "git_sha1_gen: use git_sha1.h.in on all build systems"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Marek Olšák [Wed, 1 Nov 2017 23:05:15 +0000 (00:05 +0100)]
radeonsi: don't map big VRAM buffers for the first upload directly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 1 Nov 2017 23:00:53 +0000 (00:00 +0100)]
gallium/u_threaded: don't map big VRAM buffers for the first upload directly
This improves Paraview "many spheres" performance 4x along with the radeonsi
commit.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 2 Nov 2017 00:06:43 +0000 (01:06 +0100)]
gallium/u_threaded: clean up tc_improve_map_buffer_flags and prevent reentry
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dave Airlie [Sun, 5 Nov 2017 23:37:47 +0000 (23:37 +0000)]
radv: move descriptor sets out of cmd_state.
Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
the CmdBindPipeline path down the cpu usage path for the
multithreading demo as it no longer has to traverse MAX_SETS
to find the active descriptor sets.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
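The valid-bitmask idea from the commit above can be sketched like this. The struct layout, the value of EXAMPLE_MAX_SETS and the function names are assumptions made for the example and do not mirror radv's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX_SETS 8

struct example_descriptor_state {
    /* Entries are only meaningful where the matching valid bit is set. */
    const void *sets[EXAMPLE_MAX_SETS];
    uint32_t valid;
};

static void example_bind_set(struct example_descriptor_state *s,
                             unsigned idx, const void *set)
{
    assert(idx < EXAMPLE_MAX_SETS);
    s->sets[idx] = set;
    s->valid |= 1u << idx;
}

/* Resetting clears one word instead of zeroing every pointer. */
static void example_reset(struct example_descriptor_state *s)
{
    s->valid = 0;
}

/* Collect the indices of bound sets by walking only the set bits.
 * Returns the number of indices written to out. */
static unsigned example_collect_bound(const struct example_descriptor_state *s,
                                      unsigned out[EXAMPLE_MAX_SETS])
{
    unsigned n = 0;
    uint32_t mask = s->valid;

    while (mask) {
        unsigned idx = 0;
        while (!(mask & (1u << idx)))
            idx++;
        out[n++] = idx;
        mask &= mask - 1; /* drop the lowest set bit */
    }
    return n;
}
```

The inner search loop is written portably here; a real implementation would more likely use a count-trailing-zeros helper.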
Dave Airlie [Sun, 5 Nov 2017 23:15:52 +0000 (23:15 +0000)]
radv: add helper for setting a descriptor.
This is just a simple refactor.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 5 Nov 2017 23:17:09 +0000 (23:17 +0000)]
radv: move vertex binding out of cmd state.
This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 5 Nov 2017 23:40:05 +0000 (23:40 +0000)]
radv: reorder cmd_state to remove a hole.
This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 6 Nov 2017 00:35:17 +0000 (00:35 +0000)]
radv: free attachments on end command buffer.
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
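The shape of this fix can be sketched as below. The types and function
names are hypothetical stand-ins, not the radv entry points; the point is
only that an unconditional free at end is safe because free(NULL) is a
no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical command buffer holding an optional allocation. */
struct cmd_buffer {
    void *attachments;
};

/* Allocates attachment state only when the continue bit is set. */
static bool begin_cmd_buffer(struct cmd_buffer *cb,
                             bool render_pass_continue, size_t count)
{
    cb->attachments = NULL;
    if (render_pass_continue) {
        cb->attachments = calloc(count, sizeof(int));
        if (!cb->attachments)
            return false;
    }
    return true;
}

/* Frees whatever begin left behind. Paths that already freed the
 * pointer and set it to NULL are unaffected. */
static void end_cmd_buffer(struct cmd_buffer *cb)
{
    free(cb->attachments);
    cb->attachments = NULL;
}
```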
Bas Nieuwenhuizen [Fri, 3 Nov 2017 23:14:55 +0000 (00:14 +0100)]
radv: Optimize calling radv_save_descriptors.
uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
Reviewed-by: Dave Airlie <airlied@redhat.com>
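A minimal sketch of the pattern, assuming a hypothetical enable flag and
a MAX_SETS of 32: when the early-exit check lives inside the function,
the array initializer has already run by the time the check is reached.
The counter exists only to make that visible in the example, and the
sketch uses `= {0}` where the commit quotes `= {}`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SETS 32 /* assumed value for this sketch */

static unsigned clears_executed; /* instrumentation for the sketch only */

/* Before: the zero-fill happens even when the function exits early. */
static void save_descriptors_check_inside(bool enabled)
{
    uint32_t data[MAX_SETS * 2] = {0};
    clears_executed++;
    if (!enabled)
        return;
    (void)data; /* the real work would fill and emit 'data' here */
}

/* After: the function assumes the caller has already checked. */
static void save_descriptors(void)
{
    uint32_t data[MAX_SETS * 2] = {0};
    clears_executed++;
    (void)data;
}

/* The check is hoisted to the call site, so the clear is skipped. */
static void maybe_save_descriptors(bool enabled)
{
    if (enabled)
        save_descriptors();
}
```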
Bas Nieuwenhuizen [Sat, 4 Nov 2017 14:19:02 +0000 (15:19 +0100)]
radv: Use an array to store descriptor sets.
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
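One way an array-backed pool could look is sketched below. This is not
the radv implementation: the entry layout, the first-fit policy, the
fixed capacity and every name here are assumptions chosen to show why a
contiguous, offset-sorted array avoids the pointer chasing of a linked
list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_MAX_ENTRIES 8 /* small fixed capacity for the sketch */

struct pool_entry {
    uint32_t offset;
    uint32_t size;
};

struct pool {
    uint32_t size;  /* total bytes managed by the pool */
    uint32_t count; /* number of live entries */
    struct pool_entry entries[POOL_MAX_ENTRIES]; /* sorted by offset */
};

/* First-fit allocation; returns the offset, or UINT32_MAX on failure. */
static uint32_t pool_alloc(struct pool *p, uint32_t size)
{
    uint32_t offset = 0, i;

    if (p->count == POOL_MAX_ENTRIES)
        return UINT32_MAX;
    /* Linear scan over contiguous memory, looking for a large enough gap. */
    for (i = 0; i < p->count; i++) {
        if (p->entries[i].offset - offset >= size)
            break;
        offset = p->entries[i].offset + p->entries[i].size;
    }
    if (i == p->count && p->size - offset < size)
        return UINT32_MAX;
    /* Shift the tail up by one slot to keep the array sorted. */
    memmove(&p->entries[i + 1], &p->entries[i],
            (p->count - i) * sizeof(p->entries[0]));
    p->entries[i].offset = offset;
    p->entries[i].size = size;
    p->count++;
    return offset;
}

/* Removes the entry starting at 'offset', if there is one. */
static void pool_free(struct pool *p, uint32_t offset)
{
    for (uint32_t i = 0; i < p->count; i++) {
        if (p->entries[i].offset == offset) {
            memmove(&p->entries[i], &p->entries[i + 1],
                    (p->count - i - 1) * sizeof(p->entries[0]));
            p->count--;
            return;
        }
    }
}
```

Both operations are still linear in the number of live sets, but each
step touches adjacent memory instead of following a next pointer.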
Pierre Moreau [Mon, 2 Oct 2017 18:57:11 +0000 (20:57 +0200)]
nv50,nvc0: Display shared memory usage in pipe_debug_message
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Pierre Moreau [Mon, 2 Oct 2017 18:57:10 +0000 (20:57 +0200)]
nv50,nvc0: Copy shared memory per block to the program info structure and back
In OpenCL/CUDA kernels, shared memory usage can be defined within the
kernel code. That usage will only be picked up while parsing the
SPIR-V, during the translation phase of the program.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Pierre Moreau [Mon, 2 Oct 2017 18:57:09 +0000 (20:57 +0200)]
nv50/ir: Store shared memory per block in nv50_ir_prog_info
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Anuj Phogat [Tue, 12 Sep 2017 23:05:06 +0000 (16:05 -0700)]
i965/gen10: Implement Wa3DStateMode
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Remove the bits enabling Float blend optimization. It is
enabled through CACHE_MODE_SS register.
Update the comment.
Move gen10 if block on top of gen9 if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Anuj Phogat [Tue, 31 Oct 2017 16:28:09 +0000 (09:28 -0700)]
i965/gen10: Enable float blend optimization
This optimization is enabled for previous generations too.
See Mesa commit c17e214a6b.
On CNL this bit has been moved to CACHE_MODE_SS register.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Anuj Phogat [Mon, 11 Sep 2017 20:03:31 +0000 (13:03 -0700)]
i965/gen10: Implement WaForceRCPFEHangWorkaround
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Add the check for Post Sync Operation.
Update the workaround comment.
Use braces around if-else.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Anuj Phogat [Sat, 9 Sep 2017 00:23:28 +0000 (17:23 -0700)]
i965/gen10: Implement WaSampleOffsetIZ workaround
There are a few other (duplicate) workarounds which have similar recommendations:
WaFlushHangWhenNonPipelineStateAndMarkerStalled
WaCSStallBefore3DSamplePattern
WaPipeControlBefore3DStateSamplePattern
WaPipeControlBefore3DStateSamplePattern has some extra recommendations if
the driver is using mid-batch context restore. Ignoring it for now because we're
not doing mid-batch context restore in Mesa.
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Use brw_load_register_imm32() to program CACHE_MODE_0.
Get rid of brw_flush_gpu_caches().
V3: Make the workaround helper functions static.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Anuj Phogat [Thu, 26 Oct 2017 18:03:13 +0000 (11:03 -0700)]
i965/gen10: Don't set Antialiasing Enable in 3DSTATE_RASTER if num_samples > 1
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anuj Phogat [Thu, 26 Oct 2017 18:02:36 +0000 (11:02 -0700)]
i965/gen10: Don't set Smooth Point Enable in 3DSTATE_SF if num_samples > 1
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Andrey Grodzovsky [Thu, 2 Nov 2017 14:50:39 +0000 (10:50 -0400)]
winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.
Fixes reverted patch f03b7c9 by doing VMID reservation per
process and not per context.
Also updates required amdgpu libdrm version since the change
involved interface updates in amdgpu libdrm.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Lionel Landwerlin [Tue, 25 Jul 2017 16:21:22 +0000 (17:21 +0100)]
i965: perf: list registers to program for queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 25 Jul 2017 16:19:08 +0000 (17:19 +0100)]
i965: perf: factorize code for availability
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 25 Jul 2017 16:17:48 +0000 (17:17 +0100)]
i965: perf: make revision variable available
This will be used in the next commit to build up register programming.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>