nir/load_store_vectorize: rework alignment calculation It now also updates align_offset and creates better alignment information with a constant 0 offset. shader-db (Navi): Totals from 63 (0.05% of 127638) affected shaders: SGPRs: 3072 -> 3064 (-0.26%) VGPRs: 2736 -> 2740 (+0.15%) CodeSize: 325180 -> 324336 (-0.26%); split: -0.27%, +0.01% Instrs: 63555 -> 63413 (-0.22%); split: -0.24%, +0.02% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4710>
nir: Handle all array stride cases in nir_deref_instr_array_stride This renames it to drop the ptr_as and makes it handle all of the stride cases. There's a bit of a tricky bit in here around Booleans but we currently use 32-bit for those always. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
nir: add and use nir_intrinsic_has_ helpers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6402>
nir/load_store_vectorize: fix indentation Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5980>
nir: Filter modes of scoped memory barrier in nir_opt_load_store_vectorize Otherwise a scoped memory barrier containing nir_var_mem_ubo (which memoryBarrier() does lower to) would incorrectly prevent the optimization to happen in UBOs. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5980>
nir: Split nir_index_vars into two functions We also very slightly change the semantics. It no longer is one index per list for global variables and is a single index over-all. Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
nir: Replace the scoped_memory barrier by a scoped_barrier SPIRV OpControlBarrier can have both a memory and a control barrier which some hardware can handle with a single instruction. Let's turn the scoped_memory_barrier into a scoped barrier which can embed both barrier types. Note that control-only or memory-only barriers can be supported through this new intrinsic by passing NIR_SCOPE_NONE to the unused barrier type. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
nir: replace fnv1a hash function with xxhash xxhash is faster than fnv1a in almost all circumstances, so we're switching to it globally. Signed-off-by: Dmytro Nester <dmytro.nester@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4020>
nir: do not vectorize load/store if offset can overflow and robustness enabled This prevents vectorization for loads/stores that can overflow if the low offset is negative and the range greater or equal than 0. The caller can pass the list of variable modes that matter for robust access. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
nir: Actually do load/store vectorization beyond vec2 nir_opt_load_store_vectorize has an is_strided_vector() function that looks for types with weird explicit strides. It does so by comparing the explicit stride against the type-size-derived typical stride. This had a subtle bug. Simple vector types (vec2/3/4) have no explicit stride, so glsl_get_explicit_stride() returns 0. This never matches the typical stride for a vector, so is_strided_vector() would return true for basically any vector type, causing the vectorizer to bail. I found this by looking at a compute shader with scalar SSBO loads at offsets 0x220, 0x224, 0x228, 0x22c. nir_opt_load_store_vectorize would properly vectorize the first two into a vec2 load, but would refuse to extend it to a vec3 and ultimately vec4 load because is_strided_vector() saw a vec2 and freaked out. Neither ACO nor ANV do load/store vectorization before lowering derefs, so this shouldn't affect them. However, I'd like to fix this bug to avoid the trap for anyone who decides to in the future. In a branch where anv used this lowering, this cut an additional 38% of the send messages in the shader by properly vectorizing more things. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255>
nir/load_store_vectorize: Add support for nir_var_mem_global Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
nir/load_store_vectorize: Use nir_iadd_imm for offsets This makes it capable of handling 64-bit offsets Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
nir/load_store_vectorize: Fix shared atomic info These were clearly copied and pasted from SSBOs. The shared atomics don't have an SSBO index so their offset is src0 and data is src1. Fixes: ce9205c03bd20 "nir: add a load/store vectorization pass" Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
nir: fix nir_const_value_as_uint bit size in load/store vectorizer tests Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3690>
nir+vtn: vec8+vec16 support This introduces new vec8 and vec16 instructions (which are the only instructions taking more than 4 sources), in order to construct 8 and 16 component vectors. In order to avoid fixing up the non-autogenerated nir_build_alu() sites and making them pass 16 src args for the benefit of the two instructions that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is used for the > 4 src args case). v2 (Karol Herbst): use nir_build_alu2 for vec8 and vec16 use python's array multiplication syntax add nir_op_vec helper simplify nir_vec nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert use nir_build_alu for opcodes with <= 4 sources v3 (Karol Herbst): fix nir_serialize v4 (Dave Airlie): fix serialization of glsl_type handle vec8/16 in lowering of bools v5 (Karol Herbst): fix load store vectorizer Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com>
nir/load_store_vectorize: fix combining stores with aliasing loads between v2: add test Fixes: ce9205c03bd ('nir: add a load/store vectorization pass') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1) Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v2)
nir: add a load/store vectorization pass This pass combines intersecting, adjacent and identical loads/stores into potentially larger ones and will be used by ACO to greatly reduce the number of memory operations. v2: handle nir_deref_type_ptr_as_array v3: assume explicitly laid out types for derefs v4: create less deref casts v4: fix shared boolean vectorization v4: fix copy+paste error in resources_different v4: fix extract_subvector() to pass nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64 v4: rebase v5: subtract from deref/offset instead of scheduling offset calculations v5: various non-functional changes/cleanups v5: require less metadata and preserve more v5: rebase v6: cleanup and improve dependency handling v6: emit less deref casts v6: pass undef to components not set in the write_mask for new stores v7: fix 8-bit extract_vector() with 64-bit input v7: cleanup creation of store write data v7: update align correctly for when the bit size of load/store increases v7: rename extract_vector to extract_component and update comment v8: prevent combining of row-major matrix column acceses v9: rework process_block() to be able to vectorize more v9: rework the callback function v9: update alignment on all loads/stores, even if they're not vectorized v9: remove entry::store_value, since it will not be updated if it's was from a vectorized load v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2 v9: handle nir_intrinsic_scoped_memory_barrier v10: use nir_ssa_scalar v10: handle non-32-bit offsets v10: use signed offsets for comparison v10: improve create_entry_key_from_offset() v10: support load_shared/store_shared v10: remove strip_deref_casts() v10: don't ever pass NULL to memcmp v10: remove recursion in gcd() v10: fix outdated comment v11: use the new nir_extract_bits() v12: remove use of nir_src_as_const_value in resources_different v13: make entry key hash function deterministic v13: simplify mask_sign_extend() v14: add comment in hash_entry_key() about hashing pointers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)