mesa.git
5 years ago llvmpipe: enable ARB_shader_image_load_store
Dave Airlie [Sat, 20 Jul 2019 04:29:00 +0000 (14:29 +1000)]
llvmpipe: enable ARB_shader_image_load_store

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: flush on api memorybarrier.
Dave Airlie [Thu, 22 Aug 2019 01:28:40 +0000 (11:28 +1000)]
llvmpipe: flush on api memorybarrier.

Until we have somewhere we can do better, just hit it with a hammer.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: add memory barrier support
Dave Airlie [Wed, 21 Aug 2019 00:19:16 +0000 (10:19 +1000)]
gallivm: add memory barrier support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: add support for fences api on older llvm
Dave Airlie [Wed, 21 Aug 2019 06:43:55 +0000 (16:43 +1000)]
gallivm: add support for fences api on older llvm

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: bind vertex/geometry shader images
Dave Airlie [Sat, 20 Jul 2019 04:28:45 +0000 (14:28 +1000)]
llvmpipe: bind vertex/geometry shader images

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: add fragment shader image support
Dave Airlie [Sat, 20 Jul 2019 04:28:23 +0000 (14:28 +1000)]
llvmpipe: add fragment shader image support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago draw: add vs/gs images support
Dave Airlie [Sat, 20 Jul 2019 04:26:48 +0000 (14:26 +1000)]
draw: add vs/gs images support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: add image load/store/atomic support
Dave Airlie [Fri, 19 Jul 2019 09:06:48 +0000 (19:06 +1000)]
gallivm: add image load/store/atomic support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm/tgsi: add image interface to tgsi builder
Dave Airlie [Fri, 19 Jul 2019 06:33:03 +0000 (16:33 +1000)]
gallivm/tgsi: add image interface to tgsi builder

This adds the callbacks for the driver/gallium binding for
image operations.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: introduce image jit type to fragment shader jit.
Dave Airlie [Fri, 19 Jul 2019 06:29:10 +0000 (16:29 +1000)]
llvmpipe: introduce image jit type to fragment shader jit.

This adds the image type to the fragment shader jit context

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago draw: add jit image type for vs/gs images.
Dave Airlie [Fri, 19 Jul 2019 06:28:12 +0000 (16:28 +1000)]
draw: add jit image type for vs/gs images.

This introduces the jit image type into the jit interface
for vertex/geom shaders

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: move the fragment shader variant key to dynamic length.
Dave Airlie [Fri, 19 Jul 2019 06:07:47 +0000 (16:07 +1000)]
llvmpipe: move the fragment shader variant key to dynamic length.

This mirrors the vs/gs keys, and will be needed when adding images
support.

The const changes also mirror how the draw code works (as is needed
when we add images)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: add a basic image limit
Dave Airlie [Fri, 19 Jul 2019 05:52:14 +0000 (15:52 +1000)]
gallivm: add a basic image limit

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: handle early test property.
Dave Airlie [Fri, 19 Jul 2019 05:45:22 +0000 (15:45 +1000)]
llvmpipe: handle early test property.

Also handle setting late for shaders that use stores

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: move first/last level jit texture members.
Dave Airlie [Fri, 19 Jul 2019 05:04:55 +0000 (15:04 +1000)]
gallivm: move first/last level jit texture members.

This lets us create an image structure with the same basic
types as the texture one.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: handle helper invocation (v2)
Dave Airlie [Thu, 4 Jul 2019 01:33:22 +0000 (11:33 +1000)]
gallivm: handle helper invocation (v2)

Just invert the exec_mask to get if this is a helper or not.

v2: get the bld mask (Roland)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
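
Illustrative sketch (not taken from the commit): the inversion described above can be pictured with the execution mask stored as one 32-bit word per lane, all ones for a live lane and zero otherwise. The lane layout and the name helper_invocation_mask() are assumptions made for this example, not the gallivm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each lane of the execution mask is either 0xffffffff
 * (a real invocation) or 0 (a lane that only runs so derivatives work,
 * i.e. a helper invocation). */
static void
helper_invocation_mask(const uint32_t *exec_mask, uint32_t *is_helper,
                       size_t num_lanes)
{
   assert(exec_mask != NULL && is_helper != NULL);

   for (size_t i = 0; i < num_lanes; i++)
      is_helper[i] = ~exec_mask[i];   /* invert: live -> 0, helper -> ~0 */
}
```

A caller would pass the current per-lane mask and read back a mask that is all ones exactly in the helper lanes.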
5 years ago gallivm: make lp_build_float_to_r11g11b10 take a const src
Dave Airlie [Sun, 30 Jun 2019 20:49:59 +0000 (06:49 +1000)]
gallivm: make lp_build_float_to_r11g11b10 take a const src

This allows using it with a const src later.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago llvmpipe: refactor jit type creation
Dave Airlie [Fri, 28 Jun 2019 21:34:18 +0000 (07:34 +1000)]
llvmpipe: refactor jit type creation

This just cleans the code up so the texture/sampler type
creation can be reused.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago gallivm: fix atomic compare-and-swap
Dave Airlie [Tue, 20 Aug 2019 05:44:50 +0000 (15:44 +1000)]
gallivm: fix atomic compare-and-swap

Not sure how I missed this before, but compswap was hitting an
assert here as it is its own special case.

Fixes: b5ac381d8f ("gallivm: add buffer operations to the tgsi->llvm conversion.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years ago intel/fs: grab fail_msg from v32 instead of v16 when v32->run_cs fails
Paulo Zanoni [Wed, 14 Aug 2019 00:02:13 +0000 (17:02 -0700)]
intel/fs: grab fail_msg from v32 instead of v16 when v32->run_cs fails

Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:

    INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
        dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4

For the curious, the message we're getting is:

    CS compile failed: Failure to register allocate.  Reduce number
    of live scalar values to avoid this.

Fixes: 864737ce6cd5 ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
5 years ago pan/midgard: Fix invert fusing with r26
Alyssa Rosenzweig [Mon, 26 Aug 2019 19:48:14 +0000 (12:48 -0700)]
pan/midgard: Fix invert fusing with r26

The invert wasn't applying (correctly) due to the issues addressed here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago pan/midgard: Fold ssa_args into midgard_instruction
Alyssa Rosenzweig [Mon, 26 Aug 2019 18:58:27 +0000 (11:58 -0700)]
pan/midgard: Fold ssa_args into midgard_instruction

This is just a bit of refactoring to simplify MIR.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago gallium: Add the ASTC 3D formats.
Eric Anholt [Wed, 14 Aug 2019 19:23:46 +0000 (12:23 -0700)]
gallium: Add the ASTC 3D formats.

No driver implements them yet, but this is a long way toward gallium
having matching format enums for Mesa formats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years ago gallium: Add block depth to the format utils.
Eric Anholt [Wed, 14 Aug 2019 19:16:46 +0000 (12:16 -0700)]
gallium: Add block depth to the format utils.

I decided not to update nblocks() with a depth arg as the callers
wouldn't be doing ASTC 3D.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years ago gallium: Add a block depth field to the u_formats table.
Eric Anholt [Wed, 14 Aug 2019 19:09:04 +0000 (12:09 -0700)]
gallium: Add a block depth field to the u_formats table.

To add ASTC 3D compression formats, we need to be able to express the
block depth.  While I'm touching every line, line up the columns of
the CSV again as they've drifted over time.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years ago pan/midgard: Add imov->fmov optimization
Alyssa Rosenzweig [Fri, 23 Aug 2019 23:14:13 +0000 (16:14 -0700)]
pan/midgard: Add imov->fmov optimization

When moving constants, if switching to a floating-point representation
doesn't break anything, we'd rather have an fmov than an imov,
permitting inlining the constant in many circumstances.

total quadwords in shared programs: 3408 -> 3366 (-1.23%)
quadwords in affected programs: 1188 -> 1146 (-3.54%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 9.65% x̃: 11.11%
95% mean confidence interval for quadwords value: -1.07 -0.98
95% mean confidence interval for quadwords %-change: -11.38% -7.93%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago pan/midgard: Switch constants to uint32
Alyssa Rosenzweig [Fri, 23 Aug 2019 23:02:49 +0000 (16:02 -0700)]
pan/midgard: Switch constants to uint32

Storing constants as float doesn't make sense when we have integer
instructions; better to switch to be integer natively and coerce to/from
float rather than the opposite.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
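
Illustrative sketch (not taken from the commit): storing constants as uint32 and coercing to or from float only when needed amounts to reinterpreting the same 32 bits. The helper names float_to_bits() and bits_to_float() are hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 value on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the bits of a float as an integer constant.  memcpy is
 * used so the conversion is well defined in C. */
static inline uint32_t
float_to_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Reinterpret an integer constant as a float, bit for bit. */
static inline float
bits_to_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}
```

With helpers like these, the integer form is the stored form and a float view is produced on demand, so integer constants never pass through a lossy float conversion.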
5 years ago isl: Don't set UnormPathInColorPipe for integer surfaces.
Kenneth Graunke [Sat, 24 Aug 2019 00:32:06 +0000 (17:32 -0700)]
isl: Don't set UnormPathInColorPipe for integer surfaces.

This fixes dEQP-GLES3.functional.texture.specification subtests on iris:

- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array

Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.

AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing.  So it should
be harmless to disable it.

The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to.  Perhaps they simply haven't run
into this issue.

Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago isl: Drop UnormPathInColorPipe for buffer surfaces.
Kenneth Graunke [Sat, 24 Aug 2019 00:32:24 +0000 (17:32 -0700)]
isl: Drop UnormPathInColorPipe for buffer surfaces.

Jason suggested I remove this in review, and he's right.  AFAICT this
affects blending, and that just isn't going to happen on buffers.

Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago pan/midgard, bifrost: Set lower_fdph = true
Alyssa Rosenzweig [Mon, 26 Aug 2019 14:46:43 +0000 (07:46 -0700)]
pan/midgard, bifrost: Set lower_fdph = true

fdph instructions show up in some desktop GL shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago radv: add mipmap support for the clear depth/stencil values
Samuel Pitoiset [Thu, 6 Jun 2019 15:30:17 +0000 (17:30 +0200)]
radv: add mipmap support for the clear depth/stencil values

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago radv: add mipmap support for the TC-compat zrange bug
Samuel Pitoiset [Thu, 6 Jun 2019 15:23:17 +0000 (17:23 +0200)]
radv: add mipmap support for the TC-compat zrange bug

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago radv: allocate metadata space for mipmapped depth/stencil images
Samuel Pitoiset [Fri, 23 Aug 2019 08:48:20 +0000 (10:48 +0200)]
radv: allocate metadata space for mipmapped depth/stencil images

For each mipmap, the driver will store the clear values (8 bytes)
and the TC-compat zrange value (4 bytes).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
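
Illustrative sketch (not taken from the commit): the per-level sizes quoted above give a simple upper bound on the extra metadata. The function name is hypothetical, and the sketch ignores alignment and any case where the zrange value is not needed.

```c
#include <assert.h>
#include <stdint.h>

#define CLEAR_VALUE_BYTES       8u  /* per mip level, from the commit message */
#define TC_COMPAT_ZRANGE_BYTES  4u  /* per mip level, from the commit message */

/* Bytes of clear-value and zrange metadata for an image with the given
 * number of mip levels (assumption: both values are stored per level). */
static uint32_t
depth_stencil_metadata_size(uint32_t level_count)
{
   assert(level_count >= 1);
   return level_count * (CLEAR_VALUE_BYTES + TC_COMPAT_ZRANGE_BYTES);
}
```

Under these assumptions a single-level image needs 12 bytes and each additional level adds another 12.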
5 years ago radv: decompress mipmapped depth/stencil images during transitions
Samuel Pitoiset [Thu, 6 Jun 2019 10:03:10 +0000 (12:03 +0200)]
radv: decompress mipmapped depth/stencil images during transitions

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago radv: add mipmaps support for decompress/resummarize
Samuel Pitoiset [Thu, 6 Jun 2019 09:57:19 +0000 (11:57 +0200)]
radv: add mipmaps support for decompress/resummarize

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago radv: add radv_process_depth_image_layer() helper
Samuel Pitoiset [Thu, 6 Jun 2019 09:46:21 +0000 (11:46 +0200)]
radv: add radv_process_depth_image_layer() helper

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago ac/nir: Remove gfx9_stride_size_workaround_for_atomic
Connor Abbott [Fri, 23 Aug 2019 08:46:53 +0000 (10:46 +0200)]
ac/nir: Remove gfx9_stride_size_workaround_for_atomic

The workaround was entirely in common code, and it's needed in radeonsi
too so just always do it when necessary. Fixes
KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9
with LLVM 8.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years ago ac/nir: add a workaround for viewing a slice of 3D as a 2D image
Connor Abbott [Thu, 22 Aug 2019 15:09:21 +0000 (17:09 +0200)]
ac/nir: add a workaround for viewing a slice of 3D as a 2D image

GL and Vulkan allow you to bind a single layer of a 3D texture to a 2D
image, and we weren't implementing a workaround for that on gfx9 that
TGSI was. Copy it over.

Fixes KHR-GL45.shader_image_load_store.non-layered_binding with radeonsi
NIR.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago radv: fix getting the index type size for uint8_t
Samuel Pitoiset [Fri, 23 Aug 2019 07:23:21 +0000 (09:23 +0200)]
radv: fix getting the index type size for uint8_t

16-bit and 32-bit values match hardware values but 8-bit doesn't.

This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.

Fixes: 372c3dcfdb8 ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
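
Illustrative sketch (not taken from the commit): the problem described above is that the API value for 8-bit indices cannot be used directly as a hardware value, so an explicit translation is needed before computing the size. The API-side numbers match VkIndexType; the hardware encoding and all names here are assumptions made for the example.

```c
#include <assert.h>

/* API-side values, matching VkIndexType. */
enum api_index_type {
   API_INDEX_TYPE_UINT16 = 0,
   API_INDEX_TYPE_UINT32 = 1,
   API_INDEX_TYPE_UINT8  = 1000265000, /* VK_INDEX_TYPE_UINT8_EXT */
};

/* Hypothetical hardware encoding, assumed for this sketch. */
enum hw_index_type {
   HW_INDEX_16 = 0,
   HW_INDEX_32 = 1,
   HW_INDEX_8  = 2,
};

/* Translate explicitly instead of reusing the API value. */
static enum hw_index_type
to_hw_index_type(enum api_index_type type)
{
   switch (type) {
   case API_INDEX_TYPE_UINT8:  return HW_INDEX_8;
   case API_INDEX_TYPE_UINT16: return HW_INDEX_16;
   case API_INDEX_TYPE_UINT32: return HW_INDEX_32;
   }
   assert(!"unknown API index type");
   return HW_INDEX_16;
}

/* Size in bytes of one index, derived from the hardware value. */
static unsigned
hw_index_size(enum hw_index_type type)
{
   switch (type) {
   case HW_INDEX_8:  return 1;
   case HW_INDEX_16: return 2;
   case HW_INDEX_32: return 4;
   }
   assert(!"unknown hardware index type");
   return 0;
}
```

The 16-bit and 32-bit cases happen to be identity mappings, which is how a missing 8-bit case can go unnoticed until 8-bit indices are exercised.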
5 years ago virgl: fix format conversion for recent gallium changes.
Dave Airlie [Thu, 22 Aug 2019 06:30:11 +0000 (16:30 +1000)]
virgl: fix format conversion for recent gallium changes.

The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.

This fixes a regression since Eric renumbered the gallium table.

Fixes: c45c33a5a2 ("gallium: Remove manual defining of PIPE_FORMAT enum values.")
Bugzilla: https://bugs.freedesktop.org/111454

v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
    attributes
v3: cover some more missing formats from a piglit run

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
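
Illustrative sketch (not taken from the commit): a translation table between an enum that may be renumbered and a frozen one can be written with designated initializers, so the mapping survives any reordering of the first enum. All enum names and numeric values below are hypothetical.

```c
#include <assert.h>

/* Hypothetical in-driver enum: values may be renumbered at any time. */
enum local_format {
   LOCAL_FORMAT_NONE,
   LOCAL_FORMAT_R8_UNORM,
   LOCAL_FORMAT_B8G8R8A8_UNORM,
   LOCAL_FORMAT_COUNT
};

/* Hypothetical protocol enum: values are frozen once published. */
enum wire_format {
   WIRE_FORMAT_NONE           = 0,
   WIRE_FORMAT_B8G8R8A8_UNORM = 1,
   WIRE_FORMAT_R8_UNORM       = 64,
};

/* Entries not listed stay zero, which is WIRE_FORMAT_NONE. */
static const enum wire_format wire_format_table[LOCAL_FORMAT_COUNT] = {
   [LOCAL_FORMAT_R8_UNORM]       = WIRE_FORMAT_R8_UNORM,
   [LOCAL_FORMAT_B8G8R8A8_UNORM] = WIRE_FORMAT_B8G8R8A8_UNORM,
};

/* Translate at the boundary; unknown formats map to NONE. */
static enum wire_format
local_to_wire_format(enum local_format format)
{
   if ((unsigned)format >= LOCAL_FORMAT_COUNT)
      return WIRE_FORMAT_NONE;
   return wire_format_table[format];
}
```

Because the table is indexed by name rather than by position, renumbering the local enum changes nothing on the wire, and a format missing from the table fails visibly as NONE.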
5 years ago virgl: drop unused format field
Dave Airlie [Thu, 22 Aug 2019 06:20:39 +0000 (16:20 +1000)]
virgl: drop unused format field

5 years ago lima/ppir: enable vectorize optimization
Erico Nunes [Sat, 10 Aug 2019 20:46:02 +0000 (22:46 +0200)]
lima/ppir: enable vectorize optimization

pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years ago lima/ppir: lower selects to scalars
Erico Nunes [Sat, 10 Aug 2019 20:44:22 +0000 (22:44 +0200)]
lima/ppir: lower selects to scalars

nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
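
Illustrative sketch (not taken from the commit): lowering a per-component vec4 select to scalars means issuing one single-condition select per component. The function names are hypothetical, and the sketch assumes the "condition != 0 ? a : b" convention for the select.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar select: a single condition picks a single value. */
static float
select_scalar(float cond, float a, float b)
{
   return cond != 0.0f ? a : b;
}

/* A vec4 select where every component has its own condition, expressed
 * as four scalar selects (what a unit with only one condition
 * component can execute). */
static void
select_vec4_lowered(const float cond[4], const float a[4],
                    const float b[4], float out[4])
{
   assert(cond != NULL && a != NULL && b != NULL && out != NULL);

   for (int i = 0; i < 4; i++)
      out[i] = select_scalar(cond[i], a[i], b[i]);
}
```

Each iteration of the loop corresponds to one scalar select after lowering.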
5 years ago lima: fix ppir spill stack allocation
Erico Nunes [Fri, 23 Aug 2019 04:34:36 +0000 (06:34 +0200)]
lima: fix ppir spill stack allocation

The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years ago intel/fs: Drop the gl_program from fs_visitor
Jason Ekstrand [Fri, 23 Aug 2019 20:33:24 +0000 (15:33 -0500)]
intel/fs: Drop the gl_program from fs_visitor

It's not used by anything anymore now that so much lowering has been
moved into NIR.  Sadly, we still need one in brw_compile_gs() for
geometry shaders on Sandy Bridge.  Short of a lot of pointless work,
that one's probably not going away.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years ago lima: move format handling to unified place
Qiang Yu [Sat, 17 Aug 2019 08:40:49 +0000 (16:40 +0800)]
lima: move format handling to unified place

Create a unified table to handle pipe format to texture
and render target format lookup.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
5 years ago radv: Change memory type order for GPUs without dedicated VRAM
Alex Smith [Sun, 2 Jun 2019 10:32:06 +0000 (11:32 +0100)]
radv: Change memory type order for GPUs without dedicated VRAM

Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(Bas: CCing this to 19.2-rc due to high impact and limited complexity)

5 years ago lima/ppir: print register index and components number for spilled register
Vasily Khoruzhick [Tue, 20 Aug 2019 14:30:46 +0000 (07:30 -0700)]
lima/ppir: print register index and components number for spilled register

It can be useful for debugging purposes

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: add control flow support
Vasily Khoruzhick [Mon, 19 Aug 2019 06:37:23 +0000 (23:37 -0700)]
lima/ppir: add control flow support

This commit adds support for nir_jump_instr, if and loop
nir_cf_nodes.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: add better liveness analysis
Vasily Khoruzhick [Mon, 19 Aug 2019 06:34:22 +0000 (23:34 -0700)]
lima/ppir: add better liveness analysis

Add better liveness analysis, modelled after the one in vc4.
It uses live ranges and is aware of multiple blocks, which is a
prerequisite for adding CF support.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: validate shader outputs
Vasily Khoruzhick [Fri, 23 Aug 2019 04:17:23 +0000 (21:17 -0700)]
lima/ppir: validate shader outputs

Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
Check that we don't get anything but gl_FragColor in shader outputs.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: turn store_color into ALU node
Vasily Khoruzhick [Thu, 22 Aug 2019 14:42:56 +0000 (07:42 -0700)]
lima/ppir: turn store_color into ALU node

We don't have a special OP to store color in PP, all we need to do is to
store gl_FragColor into reg0, thus it's just a mov and therefore ALU node.

Yet we still need to indicate that it's store_color op so regalloc ignores
its destination.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: create ppir block for each corresponding NIR block
Vasily Khoruzhick [Mon, 19 Aug 2019 06:30:32 +0000 (23:30 -0700)]
lima/ppir: create ppir block for each corresponding NIR block

Create ppir block for each corresponding NIR block and populate
its successors. It will be used later in liveness analysis and
in CF support

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: add dummy op
Vasily Khoruzhick [Mon, 19 Aug 2019 06:03:57 +0000 (23:03 -0700)]
lima/ppir: add dummy op

We can get the following from NIR:

(1) r1 = r2
(2) r2 = ssa1

Note that r2 is read before it's assigned, so there's no node for
it in comp->var_nodes. We need to create a dummy node in this case
whose sole purpose is to hold ppir_dest with reg in it.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: add write after read deps for registers
Vasily Khoruzhick [Mon, 19 Aug 2019 05:57:54 +0000 (22:57 -0700)]
lima/ppir: add write after read deps for registers

For cases like:

(1) r1 = r2
(2) r2 = ssa1

We need to add (1) as a dependency of (2), otherwise the scheduler may
reorder them.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: fix ordering deps
Vasily Khoruzhick [Mon, 19 Aug 2019 05:51:34 +0000 (22:51 -0700)]
lima/ppir: fix ordering deps

There can be several root nodes, i.e.:

(1) r0 = r1
(2) r2 = r3
(3) branch if (ssa1)

We need to make (3) depend on (1) and (2). The old code added a
dependency only for (2), and (1) was kept as a root node since there
is no branch/discard or store color between the two movs.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: set write mask for texture loads if dest is reg
Vasily Khoruzhick [Mon, 19 Aug 2019 05:48:22 +0000 (22:48 -0700)]
lima/ppir: set write mask for texture loads if dest is reg

Destination for texture load can be a reg, so we need to
set write mask in this case

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: add support for unconditional branches and condition negation
Vasily Khoruzhick [Tue, 28 May 2019 03:02:55 +0000 (20:02 -0700)]
lima/ppir: add support for unconditional branches and condition negation

We need a 'negate' modifier for the branch condition to minimize
branching. The idea is to generate the following:

current_block: { ...; if (!statement) branch else_block; }
then_block: { ...; branch after_block; }
else_block: { ... }
after_block: { ... }

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: clone ld_{uni,tex,var} into each block
Vasily Khoruzhick [Tue, 20 Aug 2019 03:20:12 +0000 (20:20 -0700)]
lima/ppir: clone ld_{uni,tex,var} into each block

ppir_lower_load() and ppir_lower_load_texture() assume that a node
is in the same block as its successors. Fix it by cloning each
ld_uni and ld_tex to every block.

It also reduces register pressure since values never cross block
boundaries and thus never appear in live_in or live_out of any block,
so do it for varyings as well.

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago lima/ppir: refactor const lowering
Vasily Khoruzhick [Wed, 24 Jul 2019 22:33:47 +0000 (15:33 -0700)]
lima/ppir: refactor const lowering

Const nodes are now cloned for each user, i.e. const is guaranteed to have
exactly one successor, so we can use ppir_do_one_node_to_instr() and
drop insert_to_each_succ_instr()

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years ago anv: Only re-emit non-dynamic state that has changed.
Rafael Antognolli [Tue, 20 Aug 2019 00:01:10 +0000 (17:01 -0700)]
anv: Only re-emit non-dynamic state that has changed.

On commit f6e7de41d7b, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.

This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.

v2: Improve commit message and add documentation about skipped checks
(Jason)

Fixes: f6e7de41d7b ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
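
Illustrative sketch (not taken from the commit): a copy routine can return a mask of only the state that actually changed by comparing before it copies. The struct fields, bit names, and function name are hypothetical and far smaller than the real dynamic state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_LINE_WIDTH   (1u << 0)
#define STATE_LINE_STIPPLE (1u << 1)

/* Hypothetical, heavily reduced dynamic state. */
struct dynamic_state {
   float    line_width;
   uint32_t stipple_factor;
   uint16_t stipple_pattern;
};

/* Copy the pieces selected by copy_mask from src to dst and return the
 * subset of copy_mask whose values were different before the copy. */
static uint32_t
dynamic_state_copy(struct dynamic_state *dst,
                   const struct dynamic_state *src, uint32_t copy_mask)
{
   uint32_t changed = 0;

   assert(dst != NULL && src != NULL);

   if ((copy_mask & STATE_LINE_WIDTH) &&
       dst->line_width != src->line_width) {
      dst->line_width = src->line_width;
      changed |= STATE_LINE_WIDTH;
   }

   if ((copy_mask & STATE_LINE_STIPPLE) &&
       (dst->stipple_factor != src->stipple_factor ||
        dst->stipple_pattern != src->stipple_pattern)) {
      dst->stipple_factor = src->stipple_factor;
      dst->stipple_pattern = src->stipple_pattern;
      changed |= STATE_LINE_STIPPLE;
   }

   return changed;
}
```

A caller would OR the returned mask into its dirty bits, so binding a pipeline with an unchanged line stipple sets no stipple bit and nothing is re-emitted for it.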
5 years ago pan/decode: Validate and quiet helper invocation flag
Alyssa Rosenzweig [Thu, 22 Aug 2019 23:36:10 +0000 (16:36 -0700)]
pan/decode: Validate and quiet helper invocation flag

We can statically determine from the disassembly if helper invocations
will be needed, so we can validate the corresponding bit in the
cmdstream and thus avoid printing the bit itself in the decode.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago pan/midgard: Analyze helper invocations
Alyssa Rosenzweig [Thu, 22 Aug 2019 23:31:03 +0000 (16:31 -0700)]
pan/midgard: Analyze helper invocations

We check for texture ops which calculate derivatives (either explicitly
via dFd* or implicitly) and mark the shader as requiring helper
invocations.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years ago util: fix compilation on macos
Lionel Landwerlin [Fri, 23 Aug 2019 08:35:13 +0000 (10:35 +0200)]
util: fix compilation on macos

timespec_get() is not available on macos, we need to pull in the
include/c11/threads_posix.h helper.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103674
Fixes: e2d761de03 ("util: drop final reference to p_compiler.h")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years ago i965: Silence brw_blorp uninitialized warning
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 16:12:02 +0000 (09:12 -0700)]
i965: Silence brw_blorp uninitialized warning

The variables level and start_layer are not initialized, then
initialized if we have a BUFFER_BIT_DEPTH set.  We assert on them
later using the same check.  This should be enough but GCC 9.1.1 is
not convinced, so let's initialize the variables.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago tgsi: Remove unused local
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 15:22:02 +0000 (08:22 -0700)]
tgsi: Remove unused local

Code that used it was removed in 4ebe6b2e72e ("tgsi: Drop the SSE2
constants setup that's been dead code since 2011.")

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago iris: Guard GEN9-only function in Iris state to avoid warning
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 15:12:37 +0000 (08:12 -0700)]
iris: Guard GEN9-only function in Iris state to avoid warning

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago intel/decoders: Avoid uninitialized variable warnings
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 14:59:21 +0000 (07:59 -0700)]
intel/decoders: Avoid uninitialized variable warnings

Initialize `next_batch_addr` and `second_level`.  If the batch is well
formed, those values will be overridden; if not, they are as good as
uninitialized garbage.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago compiler/glsl: Fix warning about unused function
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 14:57:13 +0000 (07:57 -0700)]
compiler/glsl: Fix warning about unused function

The helper check_node_type() is only used when DEBUG is set (in the
function below), but ASSERTED macro uses NDEBUG.  So just guard the
helper with #ifdef.  If we see more such cases we might consider a
ASSERTED-like macro for the DEBUG case.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago anv: Drop unused local variable
Caio Marcelo de Oliveira Filho [Thu, 22 Aug 2019 14:59:47 +0000 (07:59 -0700)]
anv: Drop unused local variable

Leftover from 021fa28163a ("intel/nir: Add a helper for getting
BRW_AOP from an intrinsic").

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago intel/compiler: Silence maybe-uninitialized warning in GCC 9.1.1
Caio Marcelo de Oliveira Filho [Fri, 23 Aug 2019 14:41:18 +0000 (07:41 -0700)]
intel/compiler: Silence maybe-uninitialized warning in GCC 9.1.1

Compiler can't see that d is initialized.

    ../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
    ../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      984 |          d = MAX2(-d, d);

Assert that we expect at least one component -- hence d is going to be
set.  That by itself is not enough, so also zero initialize the
variable.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago radv: additional query fixes
Andres Rodriguez [Wed, 14 Aug 2019 03:52:23 +0000 (23:52 -0400)]
radv: additional query fixes

Make sure we read the updated data from the gpu in cases where WAIT_BIT
is not set.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago iris: Fix large timeout handling in rel2abs()
Kenneth Graunke [Thu, 22 Aug 2019 23:08:16 +0000 (16:08 -0700)]
iris: Fix large timeout handling in rel2abs()

...by copying the implementation of anv_get_absolute_timeout().

Appears to fix a CTS test with 32-bit builds:
GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush
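
A rough sketch of the clamp-before-add idea behind converting a relative
timeout to an absolute one. This is not the code of rel2abs() or
anv_get_absolute_timeout(): rel_to_abs_timeout() is a hypothetical
helper, the current time is passed in as a parameter to keep the
arithmetic testable, and returning 0 for a zero timeout is an
assumption.

```c
#include <stdint.h>

/* Hypothetical helper. Assumes now_ns <= INT64_MAX. */
static uint64_t
rel_to_abs_timeout(uint64_t now_ns, uint64_t timeout_ns)
{
   /* Assumption: a zero timeout means "poll", so pass it through. */
   if (timeout_ns == 0)
      return 0;

   /* Largest relative timeout that keeps the sum within INT64_MAX. */
   uint64_t max_timeout = (uint64_t)INT64_MAX - now_ns;

   /* Clamp first, then add, so the addition can never wrap around. */
   if (timeout_ns > max_timeout)
      timeout_ns = max_timeout;

   return now_ns + timeout_ns;
}
```

With this shape, a caller passing UINT64_MAX as "wait forever" gets
INT64_MAX back instead of a small, wrapped-around deadline.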

Fixes: f459c56be6b ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agoiris: Set MOCS in all STATE_BASE_ADDRESS commands
Kenneth Graunke [Thu, 22 Aug 2019 23:52:52 +0000 (16:52 -0700)]
iris: Set MOCS in all STATE_BASE_ADDRESS commands

Rafael Antognolli tracked down a performance gap between i965 and iris
in Synmark2's OglCSDof microbenchmark, noting that iris was performing
substantially more memory reads and writes, with substantially fewer
L3 hits.  He suggested that something might be wrong with MOCS, or L3
configs, at which point I came up with a theory...

It would appear that the STATE_BASE_ADDRESS command updates the MOCS
settings for various base addresses even if you don't specify the
"Modify Enable" bit for that address.  Until now, we had been setting
only the MOCS for bases we intended to change, leaving the others
"blank", i.e. MOCS table entry 0, which is uncached.

Most data access has a more specific MOCS (e.g. in SURFACE_STATE),
but scratch access uses the Stateless Data Port Access MOCS from
STATE_BASE_ADDRESS.  So this meant all scratch access was uncached.

Improves performance in Synmark2's OglCSDof by 2x, bringing iris
on par with the existing i965 driver.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoglx: Fix up glXQueryGLXPbufferSGIX on macOS.
Vinson Lee [Fri, 23 Aug 2019 05:26:26 +0000 (22:26 -0700)]
glx: Fix up glXQueryGLXPbufferSGIX on macOS.

Fix this build error on macOS.

../src/glx/apple/glx_empty.c:158:4: error: void function 'glXQueryGLXPbufferSGIX' should not return a value [-Wreturn-type]
   return 0;
   ^      ~

Fixes: 3dd299c3d5b8 ("glx: Sync <GL/glxext.h> with Khronos")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
5 years agodocs: update calendar, add news item and link release notes for 19.1.5
Juan A. Suarez Romero [Fri, 23 Aug 2019 10:40:40 +0000 (12:40 +0200)]
docs: update calendar, add news item and link release notes for 19.1.5

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years agodocs: add sha256 checksums for 19.1.5
Juan A. Suarez Romero [Fri, 23 Aug 2019 10:38:02 +0000 (12:38 +0200)]
docs: add sha256 checksums for 19.1.5

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit ae2a676cd1748c850f579863003c92f2b137f44a)

5 years agodocs: add release notes for 19.1.5
Juan A. Suarez Romero [Fri, 23 Aug 2019 10:24:21 +0000 (12:24 +0200)]
docs: add release notes for 19.1.5

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit a384fe0cebf1fcd6671c51c749fcc981e01b5505)

5 years agoradeonsi/nir: Rewrite output scanning
Connor Abbott [Wed, 21 Aug 2019 15:08:03 +0000 (17:08 +0200)]
radeonsi/nir: Rewrite output scanning

Similarly to before, this didn't properly handle varying structs with
doubles in them.

This doesn't fix any tests, but was noticed while looking at the code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi/nir: Rewrite store intrinsic gathering
Connor Abbott [Wed, 21 Aug 2019 11:28:21 +0000 (13:28 +0200)]
radeonsi/nir: Rewrite store intrinsic gathering

The old version wasn't as accurate as it could be, and didn't handle
double variables inside structs correctly. Walk the path to compute the
actual components affected.

In combination with the previous commit, this fixes
KHR-GL45.enhanced_layouts.varying_structure_locations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi/nir: Add const_index when loading GS inputs
Connor Abbott [Tue, 20 Aug 2019 10:47:39 +0000 (12:47 +0200)]
radeonsi/nir: Add const_index when loading GS inputs

This fixes loading GS inputs in structures or arrays.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi/nir: Don't add const offset to indirect
Connor Abbott [Tue, 20 Aug 2019 10:45:32 +0000 (12:45 +0200)]
radeonsi/nir: Don't add const offset to indirect

This is already done in get_deref_offset() in the common code. We were
adding it twice accidentally.

Fixes KHR-GL45.enhanced_layouts.varying_array_locations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoac/nir: Assert GS input index is constant
Connor Abbott [Tue, 20 Aug 2019 10:43:33 +0000 (12:43 +0200)]
ac/nir: Assert GS input index is constant

If it's not, we silently ignore indir_index, which is definitely a bug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoac/nir: Handle const array offsets in get_deref_offset()
Connor Abbott [Tue, 20 Aug 2019 10:31:55 +0000 (12:31 +0200)]
ac/nir: Handle const array offsets in get_deref_offset()

Some users of this function (e.g. GS inputs) currently only work with
constant offsets. We got lucky since all the tests used an array index
of 0, so the non-constant part was always 0. But we still need to handle
this.

This doesn't fix any CTS test, but was noticed while debugging one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi/nir: Don't recompute num_inputs and num_outputs
Connor Abbott [Thu, 22 Aug 2019 11:21:17 +0000 (13:21 +0200)]
radeonsi/nir: Don't recompute num_inputs and num_outputs

Don't repeat what mesa/st already does.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agost/nir: Fix num_inputs for VS inputs
Connor Abbott [Thu, 22 Aug 2019 11:19:07 +0000 (13:19 +0200)]
st/nir: Fix num_inputs for VS inputs

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradv/gfx10: do not use NGG with NAVI14
Samuel Pitoiset [Wed, 21 Aug 2019 08:53:57 +0000 (10:53 +0200)]
radv/gfx10: do not use NGG with NAVI14

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Samuel Pitoiset [Wed, 21 Aug 2019 08:50:48 +0000 (10:50 +0200)]
radv/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0

Only gfx9 and older use it to get InstanceID in VGPR1.
Ported from RadeonSI.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agogitlab-ci: bump LLVM to 8 for meson-vulkan and meson-clover
Samuel Pitoiset [Wed, 21 Aug 2019 09:45:25 +0000 (11:45 +0200)]
gitlab-ci: bump LLVM to 8 for meson-vulkan and meson-clover

To fix pipeline builds.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoac,radv,radeonsi: remove LLVM 7 support
Samuel Pitoiset [Thu, 1 Aug 2019 09:18:43 +0000 (11:18 +0200)]
ac,radv,radeonsi: remove LLVM 7 support

Now that LLVM 9 will be released soon, we will only support
LLVM 8, 9 and master (10).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoegl: reset blob cache set/get functions on terminate
Tapani Pälli [Thu, 22 Aug 2019 07:49:36 +0000 (10:49 +0300)]
egl: reset blob cache set/get functions on terminate

Fixes errors seen with eglSetBlobCacheFuncsANDROID on Android when
running dEQP that terminates and reinitializes a display.

Fixes: 6f5b57093b3 "egl: add support for EGL_ANDROID_blob_cache"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoiris: Avoid unnecessary resolves on transfer maps
Kenneth Graunke [Fri, 26 Apr 2019 17:44:18 +0000 (10:44 -0700)]
iris: Avoid unnecessary resolves on transfer maps

We were always resolving the buffer as if we were accessing it via
CPU maps, which don't understand any auxiliary surfaces.  But we often
copy to a temporary using BLORP, which understands compression just
fine.  So we can avoid the resolve, and accelerate the copy as well.

Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years agoiris: Drop copy format hacks from copy region based transfer path.
Kenneth Graunke [Wed, 24 Apr 2019 03:19:37 +0000 (20:19 -0700)]
iris: Drop copy format hacks from copy region based transfer path.

This doesn't work for compressed formats, as the source texture and
temporary texture would have different block sizes.  (Forcing the driver
to always take the GPU path would expose the bug.)  Instead, just use
the source format for the temporary, and let blorp_copy deal with
overrides.

The one case where we can't do this is ASTC, because isl won't let us
create a linear ASTC surface.  Fall back to the CPU paths there for now.

Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years agoiris: Update fast clear colors on Gen9 with direct immediate writes.
Kenneth Graunke [Mon, 19 Aug 2019 20:57:46 +0000 (13:57 -0700)]
iris: Update fast clear colors on Gen9 with direct immediate writes.

Gen11 stores the fast clear color in an "indirect clear buffer", as
a packed pixel value.  Gen9 hardware stores it as a float or integer
value, which is interpreted via the format.  We were trying to store
that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM
it from there to the actual SURFACE_STATE bytes where it's stored.

This unfortunately doesn't work for blorp_copy(), which does bit-for-bit
copies, and overrides the format to a CCS-compatible UINT format.  This
causes the clear color to be interpreted in the overridden format.

Normally, we provide the clear color on the CPU, and blorp_blit.c:2611
converts it to a packed pixel value in the original format, then unpacks
it in the overridden format, so the clear color we use expands to the
bits we originally desired.

However, BLORP doesn't support this pack/unpack with an indirect clear
buffer, as it would need to do the math on the GPU.  On Gen11+, it isn't
necessary, as the hardware does the right thing.

This patch changes Gen9 to stop using an indirect clear buffer and
simply do PIPE_CONTROLs with post-sync write immediate operations
to store the new color over the surface states for regular drawing.
BLORP continues streaming out surface states, and handles fast clear
colors on the CPU.

Fixes: 53c484ba8ac ("iris: blorp using resolve hooks")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years agoiris: Fix broken aux.possible/sampler_usages bitmask handling
Kenneth Graunke [Tue, 20 Aug 2019 05:36:36 +0000 (22:36 -0700)]
iris: Fix broken aux.possible/sampler_usages bitmask handling

For renderable surfaces, we allocate SURFACE_STATEs for each bit in
res->aux.possible_usages.  Sampler views use res->aux.sampler_usages.

When pinning buffers, we call surf_state_offset_for_aux() to calculate
the offset to the desired surface state.  surf_state_offset_for_aux()
took an aux_modes parameter, which should be one of those two fields.
However...it was not using that parameter.  It always used the broader
res->aux.possible_usages field directly.

One of the callers, update_clear_value(), was passing incorrect masks
for this parameter.  It iterated through the bits in order, using
u_bit_scan(), which destructively modifies the mask.  So each time we
called it, the count of bits before our selected mode was 0, which would
cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE,
rather than updating each in turn.  This was hidden by the earlier bug
where surf_state_offset_for_aux() ignored the parameter.
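
The second bug can be reproduced with a small standalone sketch. The
helpers below are hypothetical stand-ins, not the iris or util code:
scan_lowest_bit() mimics a destructive bit scan, and index_in_mask()
counts the set bits below the selected one. __builtin_ctz and
__builtin_popcount are GCC/Clang builtins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a destructive scan: return the index of the lowest set
 * bit and clear it in *mask. *mask must be nonzero. */
static unsigned
scan_lowest_bit(uint32_t *mask)
{
   unsigned bit = (unsigned)__builtin_ctz(*mask);
   *mask &= *mask - 1;
   return bit;
}

/* Position of `bit` among the set bits of `mask`. */
static unsigned
index_in_mask(uint32_t mask, unsigned bit)
{
   return (unsigned)__builtin_popcount(mask & ((1u << bit) - 1));
}

/* Walk every set bit of `usages` and record the index computed for it.
 * With use_saved_mask == false the partially consumed scan mask is
 * passed on, which is the mistake described above. */
static void
collect_indices(uint32_t usages, bool use_saved_mask, unsigned out[32])
{
   uint32_t scan = usages;
   unsigned n = 0;

   while (scan) {
      unsigned bit = scan_lowest_bit(&scan);
      out[n++] = index_in_mask(use_saved_mask ? usages : scan, bit);
   }
}
```

For a mask with bits 0, 1 and 3 set, the saved mask yields indices 0, 1
and 2, while the consumed mask yields 0 every time, because all lower
bits have already been cleared by the scan.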

Fixes: 7339660e803 ("iris: Add aux.sampler_usages.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years agoiris: Replace devinfo->gen with GEN_GEN
Kenneth Graunke [Mon, 19 Aug 2019 20:52:37 +0000 (13:52 -0700)]
iris: Replace devinfo->gen with GEN_GEN

This is genxml, so we can compile out this code.

Fixes: 26606672847 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years agopan/midgard: Fix writeout combining
Alyssa Rosenzweig [Thu, 22 Aug 2019 20:59:54 +0000 (13:59 -0700)]
pan/midgard: Fix writeout combining

shader-db regression in the scheduler.

Fixes: dff4986b1aa ("pan/midgard: Emit store_output branch just-in-time")
total bundles in shared programs: 2055 -> 2019 (-1.75%)
bundles in affected programs: 1055 -> 1019 (-3.41%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 20.00% x̄: 6.71% x̃: 5.16%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -8.45% -4.97%
Bundles are helped.

total quadwords in shared programs: 3444 -> 3408 (-1.05%)
quadwords in affected programs: 1897 -> 1861 (-1.90%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 14.29% x̄: 3.97% x̃: 2.99%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -5.08% -2.86%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Implement gl_FragCoord correctly
Alyssa Rosenzweig [Thu, 22 Aug 2019 18:29:23 +0000 (11:29 -0700)]
panfrost: Implement gl_FragCoord correctly

Rather than passing through the transformed gl_Position, we can use the
hardware-level varying for this, which will correctly handle
gl_FragCoord.w.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Remove vertex buffer offset from its size
Alyssa Rosenzweig [Thu, 22 Aug 2019 15:02:52 +0000 (08:02 -0700)]
panfrost: Remove vertex buffer offset from its size

The offset is added to the base address, so we need to subtract it from
the size to maintain the same end address and thus prevent a buffer
overflow:

   end_address = start_address + size

   start_address' = start_address + offset
   size' = size - offset

   end_address' = start_address' + size'
                = (start_address + offset) + (size - offset)
                = (start_address + size) + (offset - offset)
                = start_address + size
                = end_address

   QED.
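
The same adjustment as a minimal sketch. struct vb_range and
apply_offset() are hypothetical names, not panfrost code, and clamping
an offset larger than the size is an added assumption to avoid unsigned
underflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the range handed to the hardware. */
struct vb_range {
   uint64_t address;
   size_t size;
};

static struct vb_range
apply_offset(uint64_t start_address, size_t size, size_t offset)
{
   /* Assumption: an offset past the end leaves an empty range. */
   if (offset > size)
      offset = size;

   struct vb_range range = {
      .address = start_address + offset, /* start_address' */
      .size = size - offset,             /* size' */
   };

   /* range.address + range.size still equals start_address + size. */
   return range;
}
```

Forgetting the subtraction would move the end address forward by
`offset` bytes, which is the overflow described above.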

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/decode: Handle special varyings
Alyssa Rosenzweig [Thu, 22 Aug 2019 20:27:38 +0000 (13:27 -0700)]
pan/decode: Handle special varyings

We need a special path for special varyings so we parse them correctly
instead of throwing an error when they inevitably point to bad memory.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>