git.libre-soc.org Git - mesa.git/log

mesa/program: Link SPIR-V shaders using the SPIR-V code-path

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa/glspirv: Add _mesa_spirv_link_shaders() function

This is the equivalent to link_shaders() from
src/compiler/glsl/linker.cpp, but for SPIR-V programs. It just
creates the program and its gl_linked_shader objects, giving drivers
the opportunity to implement any linking of SPIR-V shaders they choose,
at a later stage.

v2: Bail out if we see more that one shader for the same stage, and
    add a corresponding comment. (Timothy Arceri)

v3:
  * Adds also a linker error log to the condition above, with a
    reference to the specification issue. (Timothy Arceri)
  * Squash with the patch adding the function boilerplate (Timothy
    Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa: Add a reference to gl_shader_spirv_data to gl_linked_shader

This is a reference to the spirv_data object stored in gl_shader, which
stores shader SPIR-V data that is needed during linking too.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa: Implement glSpecializeShaderARB

v2:
  * Use gl_spirv_validation instead of spirv_to_nir.  This method just
    validates the shader. The conversion to NIR will happen later,
    during linking. (Alejandro Piñeiro)
  * Use gl_shader_spirv_data struct to store the SPIR-V data.
    (Eduardo Lima)
  * Use the 'spirv_data' member to tell if the gl_shader is a SPIR-V
    shader, instead of a dedicated flag. (Timothy Arceri)

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/spirv: add gl_spirv_validation method

ARB_gl_spirv adds the ability to use SPIR-V binaries, and a new
method, glSpecializeShader. Here we add a new function to do the
validation for this function:

From OpenGL 4.6 spec, section 7.2.1"

   "Shader Specialization", error table:

    INVALID_VALUE is generated if <pEntryPoint> does not name a valid
    entry point for <shader>.

    INVALID_VALUE is generated if any element of <pConstantIndex>
    refers to a specialization constant that does not exist in the
    shader module contained in <shader>.""

v2: rebase update (spirv_to_nir options added, changes on the warning
    logging, and others)

v3: include passing options on common initialization, doesn't call
    setjmp on common_initialization

v4: (after Jason comments):
  * Rename common_initialization to vtn_builder_create
  * Move validation method and their helpers to own source file.
  * Create own handle_constant_decoration_cb instead of reuse existing one

v5: put vtn_build_create refactoring to their own patch (Jason)

v6: update after vtn_builder_create method renamed, add explanatory
    comment, tweak existing comment and commit message (Timothy)

spirv: add vtn_create_builder

Refactored from spirv_to_nir, in order to be reused later.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: renamed method (from vtn_builder_create), add explanatory comment
(Timothy)

i965: initialize SPIR-V capabilities

Needed for ARB_gl_spirv. Those are not the same that the Intel vulkan
driver. From the ARB_spirv_extensions spec:

   "3. If a new GL extension is added that includes SPIR-V support via
   a new SPIR-V extension does it's SPIR-V extension also get
   enumerated by the SPIR_V_EXTENSIONS_ARB query?.

   RESOLVED. Yes. It's good to include it for consistency. Any SPIR-V
   functionality supported beyond the SPIR-V version that is required
   for the GL API version should be enumerated."

So in addition to the core SPIR-V support, there is the possibility of
specific GL extensions enabling specific SPIR-V extensions (so
capabilities). That would mean that it is possible that OpenGL and
Vulkan not having the same capabilities supported, even for the same
driver. For this reason it is better to keep them separated.

As an example: at the time of this patch writing Intel vulkan driver
support multiview, but there isn't any OpenGL multiview GL extension
supported.

Note: we initialize SPIR-V capabilities at brwCreateContext instead of
the usual brw_initialize_context_constants because we want to do that
only if the extension is enabled.

v2:
   * Rebase update (SpirVCapabilities not a pointer anymore)
   * Fill spirv capabilities for OpenGL >= 3.3 (Ian Romanick)

v3:
   * Drop multiview support, as i965 doesn't support any multiview GL
     extension (Jason)
   * Fill spirv capabilities only if the extension is enabled (Jason)

v4: Capabilities are supported only on gen7+. Added comment and assert
    (Jason)

mesa: add gl_constants::SpirVCapabilities

For drivers to declare which SPIR-V features they support.

v2: Don't use a pointer (Ian Romanick)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

i965: Don't request GLSL IR lowering of gl_VertexID

Let the lowering in NIR handle it instead.

This hurts one shader that occurs twice in shader-db (SynMark GSCloth)
on IVB and HSW. No other shaders or platforms were affected.

total cycles in shared programs: 253438422 -> 253438426 (0.00%)
cycles in affected programs: 412 -> 416 (0.97%)
helped: 0
HURT: 2

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>

i965: Silence unused parameter warning

src/mesa/drivers/dri/i965/brw_draw_upload.c: In function ‘double_types’:
src/mesa/drivers/dri/i965/brw_draw_upload.c:225:34: warning: unused parameter ‘brw’ [-Wunused-parameter]
double_types(struct brw_context *brw,
^~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

spirv: Move SPIR-V building to Makefile.spirv.am and spirv/meson.build

Future changes will add generated files used only from
src/compiler/glsl. These can't be built from Makefile.nir.am, and we
can't move all the rules from Makefile.nir.am to Makefile.spirv.am (and
it would be silly anyway).

v2: Do it for meson too.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (the meson bits)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (the automake bits)

compiler: All leaf Makefile.am should use +=

This slightly simplifies later changes that add more Makefile.*.am
files.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

util: Include bitscan.h directly

Previously bitset.h would include u_math.h to get bitscan.h. u_math.h
lives in src/gallium/auxiliary/util while both bitset.h and bitscan.h
live in src/util. Having the one file directly include another file
that lives in the same directory makes much more sense.

As a side-effect, several files need to directly include standard header
files that were previously indirectly included.

v2: Fix build break in src/amd/common/ac_nir_to_llvm.c.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

util: Optimize util_is_power_of_two_nonzero

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

util: Use util_is_power_of_two_nonzero in u_vector

Previously size=0, element_size=0 would have been allowed. That
combination can only lead to despair.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

util: Add and use util_is_power_of_two_nonzero

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero

The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

meson: use dep_libdrm version for pkg-config

This corrects pkg-config to use the libdrm version (as computed by the
previous patch) instead of using a hardcoded value that may or may not
(probably not) be right.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

meson: Use the same version for all libdrm checks

Currently each driver specifies it's own version, and core libdrm
specifies a version. In the most common case this is fine, since there
will be exactly one libdrm installed on a system, but if there are more
than one it's possible that mesa will be linked against different
versions of libdrm. There is also the possibility that the current
approach makes the pkg-config files we generate incorrect, since there
could be #defines that use newer features if they're available.

This patch corrects all of that. All of the versions are still set by
driver (along with a default core version). Then all of the drivers that
are enabled have their versions compared and the highest version is
selected, then all libdrm checks are made with that version.

v2: - Reorder the list to have the name first and whether the dependency
is needed second (Eric)

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

meson: group libdrm dependencies

The reason libdrm is after libdrm_* will be made clear in later patches.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

gl.h: remove stale comment, trailing whitespace

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glapi: add glBlendBarrier(), glPrimitiveBoundingBox() prototypes

in glapi_dispatch.c, as we have for many other GLES functions.
Fixes a cross-compile issue (missing prototype) when GLES support
is disabled.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>

st/mesa: silence unhandled switch case warning

And improve the unreachable() error message.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>

mesa: Inherit texture view multi-sample information from the original texture images.

Found running "The Witness" in Wine. Without this patch, texture views created
on multi-sample textures would have a GL_TEXTURE_SAMPLES of 0. All things
considered such views actually work surprisingly well, but when combined with
(plain) multi-sample textures in a framebuffer object, the resulting FBO is
incomplete because the sample counts don't match.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

radv: fix scanning output_usage_mask with structs

To fix a regression in:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.struct

And the following regressions (Polaris only):
dEQP-VK.glsl.indexing.varying_array.*

Fixes: f3275ca01c ("ac/nir: only enable used channels when exporting parameters")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

nvc0/ir: fix emiting NOTs with predicates

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

broadcom/vc4: Fix out-of-tree build with automake.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

broadcom/vc5: Start using nir_opt_move_load_ubo().

In the absence of a general NIR or VIR-level scheduler, this at least
avoids spilling in
GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts

broadcom/vc5: Fix setup of integer surface clear values.

I'm disappointed that the compiler didn't warn me about use of
uninitialized uc in these paths. Just use the incoming clear color
instead of the packing temporary if we're doing our own packing.

Fixes GTF-GLES3.gtf.GL3Tests.color_buffer_float.color_buffer_float_clamp_*

broadcom/vc5: Stop trying to swizzle around RGBA4 clear color.

We always want A in the A slot in the tile buffer, and any other swapping
should happen elsewhere.

Fixes RGBA4-using cases in fbo-clear-formats and
GTF-GLES3.gtf.GL3Tests.color_buffer_float.color_buffer_float_clamp_fixed.

broadcom/vc5: Work around scissor w/h==0 bug same as rasterizer discard.

The 7268 HW apparently lets some rendering through in this case. Fixes
GTF-GLES2.gtf.GL2FixedTests.scissor.scissor

st: Don't try to finalize the texture in st_render_texture().

We can't necessarily finalize the texture at this point if we're rendering
to a texture image whose format is different from the baselevel's format.
This was introduced as a fix for fbo-incomplete-texture-03 in
de414f491526610bb260c73805c81ba413388e20, but the later fix for vmware on
that testcase in 95d5c48f68b598cfa6db25f44aac52b3e11403cc made it
unnecessary.

Fixes assertion failures in util_resource_copy_region() in
KHR-GLES3.copy_tex_image_conversions.forbidden.* when trying to finalize
an R8 texture image to the RG8 texture object's pt.

Reviewed-by: Brian Paul <brianp@vmware.com>

drirc: whitelist glthread for Medieval II: TW, Carnivores: DHR, Far Cry 2

radv: enable VK_AMD_shader_trinary_minmax extension

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

ac: add support for trinary_minmax instructions

v2: Add missing break (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

spirv: add support for SPV_AMD_shader_trinary_minmax

Co-authored-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

nir: add support for min/max/median of 3 srcs

These are needed for SPV_AMD_shader_trinary_minmax,
the AMD HW supports these.

Co-authored-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radeonsi: simplify DCC format categories

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radeonsi: don't use the SPI barrier management bug workaround

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radeonsi: use maximum OFFCHIP_BUFFERING on Vega12

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

ac/nir: Add workaround for GFX9 buffer views.

On GFX9 whether the buffer size is interpreted as elements or bytes
depends on whether IDXEN is enabled in the instruction. If the index
is a constant zero, LLVM optimizes IDXEN to 0.

Now the size in elements is interpreted in bytes which of course
results in out of bounds accesses.

The correct fix is most likely to disable the LLVM optimization,
but we need something to work with LLVM <= 6.0.

radeonsi does the max between stride and element count on the CPU
but that results in the size intrinsics returning the wrong size
for the buffer. This would cause CTS errors for radv.

v2: Also include the store changes.

Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

ac/surface: set AddrSurfInfoIn.format = ADDR_FMT_8 for stencil, add assertions

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105738

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: enable VK_EXT_sampler_filter_minmax

Only enable for CIK+ because it's buggy on SI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add support for VK_EXT_sampler_filter_minmax

The driver only supports the required formats for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: rename VEGA10 device name

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add support for Vega12

Based on RadeonSI. Untested.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

build: Fix up nir_intrinsics.Plo

nir_intrinsics.c existed as a static file until commit 76dfed8ae2d5 began
generating it as part of the build process. autotools is incapable of
coping, and so a build-tree from before this commit would then fail with
it:

[4]: *** No rule to make target '../../../mesa/src/compiler/nir/nir_intrinsics.c', needed by 'nir/nir_intrinsics.lo'. Stop.

Add a few lines to configure.ac to update the broken build files.

Fixes: 76dfed8ae2d5 ("nir: mako all the intrinsics")

autotools: Include intel/dev/meson.build in tarball

Fixes: 272bef0601a1bdb5292771aefc8d62fcbdf4c47f
("intel: Split gen_device_info out into libintel_dev")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

autotools: include meson_get_version

Otherwise meson won't read the VERSION file and won't set a version.
That means that pkg-config files will have version unset as well.

Fixes: 3e9533d9b88d75d99632fa40e38cfed842d10842
("meson: Add script to use VERSION file for getting version")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

docs: fix 18.0 release note version

Fixes: 839fb3a696679bfe975c2 "docs: Update 18.0.0 release notes"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

radeonsi: add support for Vega12

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

amd/addrlib: update to the latest version for Vega12

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

gbm: remove never-implemented function

I assume this was implemented in a previous version of that commit, but
was removed in the version that actually landed.

Fixes: 8430af5ebe1ee8119e14 "Add support for swrast to the DRM EGL platform"
Cc: Giovanni Campagna <gcampagna@src.gnome.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

android: Use new nir intrinsics python scripts

Fixes: 76dfed8ae2d5 ("nir: mako all the intrinsics")
Signed-off-by: Stefan Schake <stschake@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

broadcom/vc5: Fix padding of NPOT miplevels >= 2.

The power-of-two padded size that gets minified is based on level 1's
dimensions, not level 0's, which starts to differ at a width of 9.

Fixes all failures on texelFetch fs sampler2D 1x1x1-64x64x1

ac/radeonsi: pass bindless bool to load_sampler_desc()

We also fix the base_index for bindless by using the driver
location.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/glsl_to_nir: set driver location for bindless images and samplers

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi/nir: set uses_bindless_samplers for samplers

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nir: add bindless to nir data

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

i965: Drop unnecessary bo->align field.

bo->align is always 0; there's no need to waste 8 bytes storing it.
Thanks to C99 initializers zeroing fields, we can completely drop the
only read of the field altogether.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Drop unused alignment parameter from brw_bo_alloc().

brw_bo_alloc no longer uses this parameter, so there's no point.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Drop alignment parameter from bo_alloc_internal().

Buffers are always page aligned on 965+ hardware; I believe this extra
parameter is a vestige from the Gen2-3 era.

All callers pass 0, and in fact we assert that the alignment is 0 unless
BO_ALLOC_BUSY is set (for some reason). We can just drop the parameter
and set the value to 0 explicitly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Drop BO_ALLOC_BUSY in intel_miptree_create_for_bo().

intel_miptree_create_for_bo does not actually allocate a BO, so
specifying allocation flags accomplishes nothing and is confusing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Drop PIPE_CONTROL_NO_WRITE from various calls.

This is just zero - passing nothing already gives us a post-sync
operation of "nothing".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

nir/intrinsics: Don't report negative dest_components

I have no idea why but having dest_components == -1 was causing a memory
leak somewhere. Without this, you can't get through a full shader-db
run without running out of memory.

Reviewed-by: Rob Clark <robdclark@gmail.com>

intel/fs: Don't emit a des copy for image ops with has_dest == false

This was causing us to walk dest_components times over a thing with no
destination. This happened to work because all of the image intrinsics
without a destination also happened to have dest_components == 0. We
shouldn't be reading dest_components if has_dest == false.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

nvc0/ir: fix INTERP_* with indirect inputs

There were two problems, both of which are fixed now:
- The indirect address was not being shifted by 4
- The indirect address was being placed as an argument in the offset case

This fixes some of the new interpolateAt* piglits which now test for
these situations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

nir: fix crash in loop unroll corner case

When an if nesting inside anouther if is optimised away we can
end up with a loop terminator and following block that looks like
this:

        if ssa_596 {
                block block_5:
                /* preds: block_4 */
                vec1 32 ssa_601 = load_const (0xffffffff /* -nan */)
                break
                /* succs: block_8 */
        } else {
                block block_6:
                /* preds: block_4 */
                /* succs: block_7 */
        }
        block block_7:
        /* preds: block_6 */
        vec1 32 ssa_602 = phi block_6: ssa_552
        vec1 32 ssa_603 = phi block_6: ssa_553
        vec1 32 ssa_604 = iadd ssa_551, ssa_66

The problem is the phis. Loop unrolling expects the last block in
the loop to be empty once we splice the instructions in the last
block into the continue branch. The problem is we cant move phis
so here we lower the phis to regs when preparing the loop for
unrolling. As it could be possible to have multiple additional
blocks/ifs following the terminator we just convert all phis at
the top level of the loop body for simplicity.

We also add some comments to loop_prepare_for_unroll() while we
are here.

Fixes: 51daccb289eb "nir: add a loop unrolling pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670

st/glsl_to_nir: correctly handle arrays packed across multiple vars

Fixes piglit test:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi/nir: fix input processing for packed varyings

The location was only being incremented the first time we processed a
location. This meant we would incorrectly skip some elements of
an array if the first element was packed and proccessed previously
but other elements were not.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac/nir_to_llvm: fix component packing for double outputs

We need to wait until after the writemask is widened before we
adjust it for component packing.

Together with the previous patch this fixes a number of
arb_enhanced_layouts component layout piglit tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/glsl_to_nir: fix driver location for dual-slot packed doubles

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi/nir: fix scanning of multi-slot output varyings

This fixes tcs/tes varying arrays where we dont lower indirects and
therefore don't split arrays. Here we also fix useagemask for dual
slot doubles.

Fixes a number of arb_tessellation_shader piglit tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

broadcom/vc5: Fix RG16I/UI texture sampling.

How many times did I look at this table without noticing the missing 'G'
in the texture column?

Fixes KHR-GLES3.copy_tex_image_conversions.required.* on 7268.

nir: fix generated nir_intrinsics.c for MSVC

Apparently it is not happy about things like: .foo = {}

So skip over initializers for empty lists.

Fixes: 76dfed8ae2d5c6c509eb2661389be3c6a25077df
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>

docs: update calendar 18.0.0 is out

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

docs: add news item and link release notes for 18.0.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

docs: add sha256 checksums for 18.0.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fb64913d195112462786c0459d12f4bc8e7adee7)

docs: Update 18.0.0 release notes

Note: the file was originally 17.4.0, yet git stuggles to detect the
move :-\

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit dceb1ce807a8b0ab32dc16b38040969bdbcc0d1b)

nir: mako all the intrinsics

I threatened to do this a long time ago.. I probably *should* have done
it a long time ago when there where many fewer intrinsics.  But the
system of macro/#include magic for dealing with intrinsics is a bit
annoying, and python has the nice property of optional fxn params,
making it possible to define new intrinsics while ignoring parameters
that are not applicable (and naming optional params).  And not having to
specify various array lengths explicitly is nice too.

I think the end result makes it easier to add new intrinsics.

v2: couple small fixes found with a test program to compare the old and
    new tables
v3: misc comments, don't rely on capture=true for meson.build, get rid
    of system_values table to avoid return value of intrinsic() and
    *mostly* remove side-effects, add autotools build support
v4: scons build

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

nir: fix per_vertex_output intrinsic

This is supposed to have both BASE and COMPONENT but num_indices was
inadvertantly set to 1.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

glsl_types: fix build break with intel/msvc compiler

The VECN() macro was taking advantage of a GCC specific feature that is
not available on lesser compilers, mostly for the purposes of avoiding a
macro that encoded a return statement.

But as suggested by Ian, we could just have the macro produce the entire
method body and avoid the need for this. So let's do that instead.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105740
Fixes: f407edf3407396379e16b0be74b8d3b85d2ad7f0
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: add GL_HALF_FLOAT as supported type to readpixels

EXT_color_buffer_float spec states:

  "An INVALID_OPERATION error is generated ... if the color buffer is
   a floating-point format and type is not FLOAT, HALF FLOAT, or
   UNSIGNED_INT_10F_11F_11F_REV."

This means that GL_HALF_FLOAT type should be supported when color
buffer has floating-point format.

Fixes Android CTS test android.view.cts.PixelCopyTest.

v2: remove comments of EXT_color_buffer_half_float as
    EXT_color_buffer_float can use type GL_HALF_FLOAT

Signed-off-by: Lin Johnson <johnson.lin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

broadcom/vc5: Fix swizzling of RGB10_A2UI render targets.

This is the actual hardware layout, and we were only swizzling R/B back
around in texturing. Fixes part of
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx in
simulation.

broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.

Just like TLB without a config uniform, we don't have a register index.

broadcom/vc5: Implement workaround for GFXH-1431.

This should fix some blending errors, but doesn't impact any testcases in
the CTS.

broadcom/vc5: Fix EZ disabling and allow using GT/GE direction as well.

Once we've disabled EZ for some draws, we need to not use EZ on future
draws. Implementing that made implementing the GT/GE direction trivial.

Fixes KHR-GLES3.shaders.fragdepth.compare.no_write on V3D 4.1 simulation.

broadcom/vc5: Disable TF on V3D 4.x when drawing with queries disabled.

On 3.x, we just don't flag the primitive as needing TF, but those
primitive bits are now allocated to the new primitive types. Now we need
to actually update the enable flag at draw time.

broadcom/vc5: Disable transform feedback on V3D 4.x at the end of the job.

The next job from this client will turn it back on unless TF gets
disabled, but we don't want the state to leak from this client to another
(which causes GPU hangs).

broadcom/vc5: Move the BCL epilogue code to a per-version compile.

I need to do some new packets for transform feedback on 4.1.

broadcom/vc5: Fix transform feedback in the presence of point size.

I had this note to myself, and it turns out that a lot of CTS tests use
XFB with points to get data out without using a fragment shader. Keep
track of two sets of precomputed TF specs (point size in VPM prologue or
not), and switch between them when we enable/disable point size.

broadcom/vc5: Split transform feedback specs update from buffers.

The specs update will be changing based on additional state flags in the
next commit, and this unindents the buffer update code.

broadcom/vc5: Limit each transform feedback data spec to 16 dwords.

The length-1 field only has 4 bits, so we need to generate separate specs
when there's too much TF output per buffer.

Fixes
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_builtin_type
and transform_feedback_max_interleaved.

gallium/u_vbuf: Protect against overflow with large instance divisors.

GTF-GLES3.gtf.GL3Tests.instanced_arrays.instanced_arrays_divisor uses -1
as a divisor, so we would overflow to count=0 and upload no data,
triggering the assert below. We want to upload 1 element in this case,
fixing the test on VC5.

v2: Use some more obvious logic, and explain why we don't use the normal
round_up().

Reviewed-by: Brian Paul <brianp@vmware.com>

st: Allow accelerated CopyTexImage from RGBA to RGB.

There's nothing to worry about here -- the A channel just gets dropped by
the blit. This avoids a segfault in the fallback path when copying from a
RGBA16_SINT renderbuffer to a RGB16_SINT destination represented by an
RGBA16_SINT texture (the fallback path tries to get/fetch to float
buffers, but the float pack/unpack functions are NULL for SINT/UINT).

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba16i on VC5.

v2: Extract the logic to a helper function and explain what's going on
better.
v3: const-qualify args

Reviewed-by: Brian Paul <brianp@vmware.com>

winsys/amdgpu: always allow GTT placements on APUs

Reviewed-by: Christian König <christian.koenig@amd.com>

radeonsi: don't reallocate on DMABUF export if local BOs are disabled

glsl: fix infinite loop caused by bug in loop unrolling pass

Just checking for 2 jumps is not enough to be sure we can do a
complex loop unroll. We need to make sure we also have also found
2 loop terminators.

Without this we were attempting to unroll a loop where the second
jump was nested inside multiple ifs which loop analysis is unable
to detect as a terminator. We ended up splicing out the first
terminator but failed to actually unroll the loop, this resulted
in the creation of a possible infinite loop.

Fixes: 646621c66da9 "glsl: make loop unrolling more like the nir unrolling path"
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670

gallium: Do not add -Wframe-address option for gcc <= 4.4.

This patch fixes these build errors with GCC 4.4.

Compiling src/gallium/auxiliary/util/u_debug_stack.c ...
src/gallium/auxiliary/util/u_debug_stack.c: In function ‘debug_backtrace_capture’:
src/gallium/auxiliary/util/u_debug_stack.c:268: error: #pragma GCC diagnostic not allowed inside functions
src/gallium/auxiliary/util/u_debug_stack.c:269: error: #pragma GCC diagnostic not allowed inside functions
src/gallium/auxiliary/util/u_debug_stack.c:271: error: #pragma GCC diagnostic not allowed inside functions

Fixes: 370e356ebab4 ("gallium: silence __builtin_frame_address nonzero argument is unsafe warning")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105529
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium: Correct minor typo in header comments

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>