git.libre-soc.org Git - mesa.git/log

svga: Fix index buffer uploads

In the case of SWTNL and index translation we were uploading index buffers
and then reading out from them using the CPU. Furthermore, when translating
indices we often cached the results with an upload_mgr buffer, causing the
cached indexes to be immediately discarded on the next write to that
upload_mgr buffer.

Fix this by only uploading when we know the index buffer is going to be
used by hardware. If translating, only cache translated indices if the
original buffer was not a user buffer. In the latter case when we're not
caching, use an upload_mgr buffer for the hardware indices.

This means we can also remove the SWTNL hand-crafted index buffer upload
mechanism in favour of the upload_mgr.

Finally avoid using util_upload_index_buffer(). It wastes index buffer
space by trying to make sure that the offset of the indices in the
upload_mgr buffer is larger or equal to the position of the indices in
the source buffer. From what I can tell, the SVGA device does not
require that.

Testing done: Piglit quick. No regressions.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

winsys/svga: Make it possible to specify coherent resources

Add a flag in the surface cache key and a winsys usage flag to
specify coherent memory.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

gallium/util: Make u_debug_flush support persistent maps

Previously unsynchronized maps have been assumed to also be persistent,
Now destinguish between persistent and unsynchronized map and also support
PIPE_TRANSFER_PERSISTENT from ARB_buffer_storage.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

virgl: Add debug flag to bypass driconf to enable the BGRA tweaks

This useful for testing, also because with vtest the dri configuration
is not read.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add a tweak to set the value for emulated queries of GL_SAMPLES_PASSED

On GLES hosts GL_SAMPLES_PASSED is emulated by GL_ANY_SAMPLES_PASSED which returns a boolen.
With this tweak the value that is returned if any sample passed can be set. This
may be of iterest when an application decides whether some geometry is rendered based
on an amount of visibility and not just a binary desicion. virgelrenderer sets a default
of 1024 on th host.

v2: Remove reference from virgl and correct description (Emil)
v3: Send the tweak binary encoded instead of using strings (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add tweak to apply a swizzle when drawing/blitting to a emulated BGRA texture

With Qemu this final swizzle is not needed, but with vtest it is, i.e. it depends on
how a program using virglrenderer uses the surface that is rendered to, hence
a tweak is added.

v2: Update description and fix spelling (Emil)
v3: Send tweak as binary value instead of using strings (Gurchetan)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add driconf tweak for emulating BGRA surfaces on GLES

These tweaks are used to fix rendering issues with Valve games and
at least also "The Raven Remastered" when run on a GLES host.

v2: Fix type in define and remove virgl from driconf option (Emil)
v3: Encode tweak binary instead of using strings (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add override for BGRA format to use swizzled SRGB format

Tie in the check whether the host supports tweaks and whether this tweak
is enabled.

v2: Add comment about the emulated formats not being used directly in the
guest (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add code to accept BGRx_SRGB as RGBx_SRGB

This will be enabled in later patches by the emulation tweak.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add skeleton to evaluate cap and send tweaks

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: factor out format host bits check

This will make it a single location when we want to replace a format.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

gallium/virgl: Add code path for virgl to read driconf

This works only for the drm variant of virgl and not for the vtest
variant.

v2: Rebase, replace the configuration query function by a pointer to
the configuration data.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

virgl: Add driinfo file and tie it into the build

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>

glspirv: Call pass to lower frexp instructions

These were previously handled by the spirv_to_nir, but that changed to
be an explict pass in 23d30f4099f "spirv,nir: lower
frexp_exp/frexp_sig inside a new NIR pass"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: Restrict use of descriptor intrinsics to Vulkan

In ARB_gl_spirv we'll be able to use variables for uniform buffers, so
don't use the descriptor intrinsics to lower the block access.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

ac/rtld: report better error messages for LDS overallocation

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

ac/rtld: check correct LDS max size

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

radeonsi: add s_sethalt to shaders for debugging

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

ac/rtld: fix sorting of LDS symbols by alignment

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

meson: Allow building radeonsi with just the android platform.

Just as was allowed by autotools.

Fixes: 108d257a168 "meson: build libEGL"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: Fix vulkan build in meson.

Apparently the android part was never ported to meson.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv: Fix vulkan build in meson.

Apparently the android part was never ported to meson.

CC: <mesa-stable@lists.freedesktop.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

anv/image: Set different usage flags for shadow surfaces

For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted. However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT
so we were getting W-tiled which isn't what we want for the shadow. By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.

Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"

anv: Flush caches in anv_image_copy_to_shadow

Copies to a shadow image happen during a VkCmdPipelineBarrier or at
subpass transitions. We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.

Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"

nir: Make nir_constant a vector rather than a matrix

Most places in NIR, we treat matrices like arrays.  The one annoying
exception to this has been nir_constant where a matrix is a first-class
thing.  This commit changes that so a matrix nir_constant is the same as
an array nir_constant.  This makes matrix nir_constants a tiny bit more
expensive but shrinks all others by 96B.

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/nir: Fix handling of 64-bit values in uniform storage

Reviewed-by: Karol Herbst <kherbst@redhat.com>

spirv: Only copy needed components for OpSpecConstantOp

Reviewed-by: Karol Herbst <kherbst@redhat.com>

spirv: Use a single path for OpSpecConstantOp of OpVectorShuffle

Now that nir_const_value is a scalar, there's no reason why we need
multiple paths here and it's just extra paths to keep working. While
we're here, we also add a vtn_fail_if check that component indices are
in-bounds.

Reviewed-by: Karol Herbst <kherbst@redhat.com>

spirv: Use vtn_constan_uint() for array lengths and gather components

Reviewed-by: Karol Herbst <kherbst@redhat.com>

spirv: Add a vtn_constant_int helper

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/types: Add a real is_integer helper

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/types: Rename is_integer to is_integer_32

It only accepts 32-bit integers so it should have a more descriptive
name. This patch should not be a functional change.

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/types: Ignore bit sizes in contains_integer()

All of the callers for this function are looking at interpolation
qualifiers and want to make sure they're declared flat. Any 64-bit
integer inputs need to be flat. It's also makes the function make more
sense since "integer" is fairly generic.

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/types: Handle all bit sizes in glsl_type_is_integer

All of the callers of this function really just want to know if the type
is an integer and don't care about bit size.

Reviewed-by: Karol Herbst <kherbst@redhat.com>

glsl/nir_opt_access: Update uniforms correctly when only vars change

Even if only variables access flags are changed, the existing NIR
infrastructure expects metadata to be explicitly preserved, so do
that. Don't care about avoiding preserve to be called twice since the
cost is negligible.

This scenario can be triggered by dead variables, and also by other
intrinsics that read the variables -- but not cause progress to be
made when processing the intrinsics.

Fixes: f2d0e48ddc7 "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/nir: Fix getting the sampler dim when arrays are involved

Unwrap any array in the variable type so we can get the sampler dim.

This fixes piglit test
spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-const-uniform-index.shader_test.

Fixes: f2d0e48ddc7 "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

meson: Search for execinfo.h

Rather than checking __GLIBC__/__UCLIBC__ macros as a proxy for
execinfo.h presence, just check directly. This allows the build to work
on musl.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

util: Heap-allocate 256K zlib buffer

The disk cache code tries to allocate a 256 Kbyte buffer on the stack.
Since musl only gives 80 Kbyte of stack space per thread, this causes a
trap.

See https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

(In musl-1.1.21 the default stack size has increased to 128K)

[mattst88]: Original author unknown, but I think this is small enough
that it is not copyrightable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: Fix wrong printf formatter

%lu is for unsigned long, %zu is for size_t. Just cast the data.

iris: Bail on queries for INTEL_NO_HW=1.

We don't execute any of the commands to record snapshots, so we can't
actually produce a real result. We do however need to avoid waiting
on a syncpt which will never be signalled. So, just return 0.

virgl: Support VIRGL_BIND_SHARED

Support a new virgl bind type for shared buffers.

Signed-off-by: David Riley <davidriley@chormium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

anv: write spirv-nir logs back to the application

Using the existing VK_EXT_debug_report extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

ac/nir: Set speculatable for buffer loads where allowed

This brings the nir path in line with the TGSI path.

Totals from affected shaders:
SGPRS: 2984 -> 2984 (0.00 %)
VGPRS: 2792 -> 2652 (-5.01 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 247380 -> 248072 (0.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 121 -> 132 (9.09 %)
Wait states: 0 -> 0 (0.00 %)

Most of the change came from DiRT: Showdown, and came from sinking SSBO
loads.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: Use reorderable access flag

No changes with radeonsi shader-db.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: Add a helper to determine if an intrinsic can be reordered

This is simple now, but we're going to be adding a few more conditions
to this later.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

st/nir: Use gl_nir_opt_access

Nothing uses its results yet, that will come with the following commits.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

glsl/nir: Add optimization pass for access flags

Right now, this just deduces when we can arbitrarily reorder SSBO and
image loads, matching the existing logic in radeonsi's TGSI->LLVM pass.
This approach can't handle some things that nir_opt_copy_prop_vars can,
but it can handle images, and with GCM it lets us hoist reads outside of
loops. We can also pass this information to LLVM which lets it do its
own optimizations on it.

This is GLSL only as I haven't tested it on Vulkan yet, and it would
probably need a few changes to work there.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: Add reorderable memory access enum

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/copy_prop_vars: Ignore volatile accesses

The spec explicitly says that volatile writes can't be removed and
volatile reads do not guarantee that the same value will still be around
after the read, as if there were a barrier after each read/write. Just
ignore them.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

glsl/nir: Propagate access qualifiers

We were completely ignoring these before, except for putting them on
variables. While we're here, don't set access qualifiers when converting
to bindless since glsl_to_nir will already have set a more accurate
qualifier that includes any qualifiers on struct members that are
dereferenced.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: Allow qualifiers on copy_deref and image instructions

In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

ac,radeonsi: Always mark buffer stores as inaccessiblememonly

inaccessiblememonly means that it doesn't modify memory accesible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but this just means that now nothing is inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.

Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.

statistics with nir enabled:

Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

The only difference was a manhattan31 shader.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

egl: add missing #include

close() is in <unistd.h>

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

radv: disable viewport clamping even if FS doesn't write Z

This fixes new CTS dEQP-VK.pipeline.depth_range_unrestricted.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: implement compressed FMASK texture reads with RADV_PERFTEST=tccompatcmask

This allows us to disable the FMASK decompress pass when
transitioning from CB writes to shader reads.

This will likely be improved and enabled by default in the future.

No CTS regressions on GFX8 but a few number of multisample CTS
failures on GFX9 (they look related to the small hint).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: fix FMASK expand with SRGB formats

Found while working on DCC for MSAA.

Fixes: 6b976024a87 ("radv: add support for FMASK expand")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

panfrost: Move to use ralloc for some allocations

We have some serious leaks, so plug some and also move to ralloc to
limit the lifetime of some objects to that of their parent.

Lots more such work to do.

For some reason, this fixes:

dEQP-GLES2.functional.lifetime.attach.deleted_output.texture_framebuffer

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

egl: Don't add hardware device if there is no render node v2.

Do not offer a hardware drm backed egl device if no render node
is available. The current implementation will fail on this
egl device. On top it issues a warning that is actually missleading.
There are finally more error paths that can fail on the way to a
hardware backed egl device. Fixing all of them would kind of require
opening the drm device and see if there is a usable driver associated
with the device. The taken approach avoids a full probe and fixes at
least this kind of problem on kvm virtualization hosts I observe here.

Fixes: dbb4457d985 ("egl: add EGL_EXT_device_drm support")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

etnaviv: support GL_ARB_seamless_cubemap_per_texture

Passes spec@amd_seamless_cubemap_per_texture@amd_seamless_cubemap_per_texture

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Guido Günther <agx@sigxcpu.org>

etnaviv: update headers from rnndb

Update to etna_viv commit a3bf0da.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>

radeonsi: fix undefined shift in macro definition

Pointed out by coverity

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nouveau: fix frees in unsupported IR error paths.

This is pointless in that we won't ever hit those paths in real life,
but coverity complains.

Fixes: f014ae3c7cce ("nouveau: add support for nir")
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

panfrost: Move clearing logic into pan_job

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

virgl: fix sync issue regarding discard/unsync transfers

GL_MAP_INVALIDATE_BUFFER_BIT cannot be treated as
GL_MAP_INVALIDATE_RANGE_BIT naively.  When we run into

  ptr = glMapBufferRange(buf, 0, size,
          GL_WRITE_BIT|GL_MAP_INVALIDATE_BUFFER_BIT);
  memcpy(ptr, data1, size);
  glUnmapBuffer(buf);
  ptr = glMapBufferRange(buf, size, size,
          GL_WRITE_BIT|GL_MAP_UNSYNCHRONIZED_BIT);
  memcpy(ptr, data2, size);
  glUnmapBuffer(buf);

we never want data1 to be copy_transfer'ed.  Because that would mean
that data2 might overwrite valid data.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis alexandros.frantzis@collabora.com
Fixes: a22c5df0794 ("virgl: Use buffer copy transfers to avoid waiting when mapping")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

panfrost: Enable sRGB

Now that sRGB formats are supported for both rendering and sampling,
advertise support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Disable AFBC on sRGB buffers

The performance impact is slightly mitigated by tiling the render
target, but it's undeniably still slow compared to AFBC. Unfortunately,
it doesn't look like AFBC and sRGB play nice...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Enable sRGB fixed-function blending

For fixed-function, we have hardware to handle sRGB so we just set a
flag. For blend shaders, it's rather more involved; this is currently
unimplemented. Assert it out for now; we don't need it quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Specify sRGB in the render target

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Implement sRGB texturing

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Add sRGB render target flag

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Implement tiled rendering

We already can sample from Mali's linear/tiled encoding (the one from
Utgard -- AFBC is mostly unrelated); let's be able to render to it as
well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Decode rendering block type

A mode for rendering tiled/uncompressed was noticed, so we reshuffle the
MFBD render target definitions to explicitly include block type.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Refactor texture targets

This combines the two cmdstream bits "is_3d" and "is_not_cubemap" into a
single 2-bit texture target selection, noticing it's the same as the
2-bit selection in Midgard and Bifrost texturing ops. Accordingly, we
share this definition and add the missing entry for 1D/buffer textures.

This requires a nontrivial (but functionally similar) refactor of all
parts of the driver to use the new definitions appropriately.
Theoretically, this should add support for buffer textures, but that's
obviously not tested and probably wouldn't work.

While doing so, we notice the sRGB enable bit, which we document and
decode as well here so we don't forget about it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Figure out job requirements in pan_job.c

Requirements for a job should be figured out in pan_job.c

v2: [Alyssa] Fix early return

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Reset job counters once the job is submitted

Move the reset out of frame invalidation into job submission

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Initial implementation of panfrost_job_submit

Start fleshing out panfrost_job

v2: [Alyssa: Remove unused variable, warning introduced]

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

virgl_hw: add YUV support

Add corresponding entries from p_format.h

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

virgl: sync to virglrenderer virgl_hw.h

It's nice to keep these two files in sync, as they define
guest userspace <---> host userspace communcation.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

anv: Make border colors the right size and alignment on HSW

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

imgui: bump imgui memory editor copy

Getting rid of a compiler warning :

In file included from ../src/intel/tools/aubinator_viewer.cpp:225:
../src/imgui/imgui_memory_editor.h: In member function ‘void MemoryEditor::DisplayPreviewData(size_t, const u8*, size_t, MemoryEditor::DataType, MemoryEditor::DataFormat, char*, size_t) const’:
../src/imgui/imgui_memory_editor.h:637:16: warning: enumeration value ‘DataType_COUNT’ not handled in switch [-Wswitch]
switch (data_type)
^

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

panfrost/midgard: Enable autovectorization

Enable nir_opt_vectorize.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

nir: add a vectorization pass

This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model as CSE, where we do a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:

1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.

The pass isn't comprehensive by far; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.

[Alyssa: Rebase on latest master, updating with respect to typeless
moves]

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

panfrost: Add support for TXS instructions

This patch adds support for nir_texop_txs instructions which are needed
to support the OpenGL textureSize() function. This is also needed to
support RECT texture sampling which is currently lowered to 2D sampling +
a TXS() instruction by the nir_lower_tex() helper.

Changes in v2:
* Split options for the 1st and 2nd tex lowering passes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Prepare things to support non-native texture ops

We are about to add support for the TXS (texture size) op which is not
implemented using a midgard texture instruction. Let's rename emit_tex()
into emit_texop_native() and repurpose emit_tex() as a dispatcher.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Move sysval upload logic out of panfrost_emit_for_draw()

We're about to add more sysval types, and panfrost_emit_for_draw()
is big enough, so let's move the sysval upload logic in a separate
function.

We also add one sub-function per sysval type to keep the
panfrost_upload_sysvals() small/readable.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Make the sysval logic more generic

We are about to add support for nir_texop_txs which requires adding a
sysval/uniform containing the texture size. Let's change the
emit_sysval_read() prototype to take a nir_instr object instead of
a nir_intrinsic_instr one so we can re-use this function when emitting
a sysval for a txs instruction.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

nir/lower_tex: Add a way to lower TXS(non-0-LOD) instructions

The V3D driver has an open-coded solution for this, and we need the
same thing for Panfrost, so let's add a generic way to lower TXS(LOD)
into max(TXS(0) >> LOD, 1).

Changes in v2:
* Use == 0 instead of !
* Rework the minification logic as suggested by Jason
* Assign cursor pos at the beginning of the function
* Patch the LOD just after retrieving the old value

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

nir/lower_tex: Update ->sampler_dim value before calling get_texture_size()

get_texture_size() will create a txs instruction with ->sampler_dim set
to the original tex->sampler_dim. The condition to call lower_rect()
only checks the value of ->sampler_dim and whether lower_rect is
requested or not. This leads to an infinite loop when calling
nir_lower_tex() with the same options until it returns false.

In order to avoid that, let's move the tex->sampler_dim patching before
get_texture_size() is called. This way the txs instruction will have
->sampler_dim set to GLSL_SAMPLER_DIM_2D and nir_lower_tex() won't try
to lower it on the subsequent passes.

Changes in v2:
* Add Jason R-b
* Add a comment explaining why we patch ->sampler_dim at the beginning
of the lower_rect() func

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

nir/lower_tex: Actually report when projector lowering happened

The code considers that projector lowering was done even if it's not
really the case. Change the project_src() prototype to return a bool
encoding whether projector lowering happened or not and update the
progress var accordingly in nir_lower_tex_block().

---
Changes in v2:
* Add Jason R-b
* Drop the part suggesting that nir_lower_rect() could be called in
a do-while(progress) loop.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Adapt to constant name change in UABI

We hadn't updated the kernel header after the driver got into mainline.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: ci: Update results

Alyssa fixed some failing tests last night.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

radv: adjust the DCC base VA for mipmapped color attachments

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: fix color decompressions for FMASK/CMASK

Only skip levels without DCC when it's a DCC decompression.
Whoops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not decompress levels without DCC with the graphics path

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not decompress levels without DCC with the compute path

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: check if DCC is enabled per mip not for the whole image

In other words, make use of radv_dcc_enabled() instead of
radv_image_has_dcc() all over the places.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

v3d: implement simultaneous peripheral access exceptions for V3D 4.1+

Shader-db results:

total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
instructions in affected programs: 1752873 -> 1738042 (-0.85%)
helped: 7076
HURT: 478
helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
HURT stats (abs)   min: 1 max: 7 x̄: 1.41 x̃: 1
HURT stats (rel)   min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
95% mean confidence interval for instructions value: -2.00 -1.92
95% mean confidence interval for instructions %-change: -1.58% -1.50%
Instructions are helped.

total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
max-temps in affected programs: 1025 -> 979 (-4.49%)
helped: 47
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
95% mean confidence interval for max-temps value: -1.06 -0.82
95% mean confidence interval for max-temps %-change: -8.89% -5.49%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>

v3d: only flush jobs accessing the query BO when reading query results

Reviewed-by: Eric Anholt <eric@anholt.net>

v3d: add a helper function to flush jobs using a BO

v2: use _mesa_set_search() (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>

iris: Support more RGBX pipe formats.

Without them, the state tracker falls back to an RGBA format, but it
doesn't always manage to override the swizzle for us. So we lose the
information that the API expects an X channel, where alpha is garbage
and reads back as 1. We have no equivalent ISL RGBX format for these,
so we just use RGBA directly and override the swizzle in all cases.