git.libre-soc.org Git - mesa.git/log

nir/search: Support 8 and 16-bit constants in match_value

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>

travis: make Meson find the proper llvm-config

Travis CI has moved to LLVM 5.0, and meson is detecting automatically
the available version in /usr/local/bin based on the PATH env variable
order preference.

As for 0.44.x, Meson cannot receive the path to the llvm-config binary
as a configuration parameter. See
https://github.com/mesonbuild/meson/issues/2887 and
https://github.com/dcbaker/meson/commit/7c8b6ee3fa42f43c9ac7dcacc61a77eca3f1bcef

We want to use the custom (APT) installed version. Therefore, let's
make Meson find our wanted version sooner than the one at
/usr/local/bin

Once this is corrected, we would still need a patch similar to:
https://lists.freedesktop.org/archives/mesa-dev/2017-December/180217.html

v2: Create the link only to the specificly wanted LLVM version (Gert).

Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Gert Wollny <gw.fossdev@gmail.com>
Cc: Jon Turney <jon.turney@dronecode.org.uk>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: fix LLVM version detection when <= 3.4

3 digits versions in LLVM only started from 3.4.1 on.

Hence, even if you can perfectly build with an old LLVM (< 3.4.1) in
the system while not needing LLVM at all (auto), when passing through
the LLVM version detection code, meson will fail when accessing
"_llvm_version[2]" due to:

"Index 2 out of bounds of array of size 2."

v2: Properly compare LLVM version and set patch version to 0
if < 3.4.1 (Eric).

v3: Improve the commit log explanation (Eric).

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

i965/sbe: fix number of inputs for active components

In 16631ca30ea6 we fixed gen9 active components to account for padded
inputs in the URB, which we can have with SSO programs. To do that,
instead of going through the bitfield of inputs (which doesn't include
padding information), we compute the number of inputs from the size
of the URB entry.

Unfortunately, there are some special inputs that are not stored in
the URB and that we also need to account for. These special inputs
are identified and handled during calculate_attr_overrides().

Instead of keeping track of the exact number of inputs, we just
program active components for all possible inputs like we do in
anvil.

This fixes a regression in a WebGL program that uses Point Sprite
functionality (specifically, VARYING_SLOT_PNTC).

v2:
- Add 'Fixes' tag (Mark Janes)
- make no_vue_inputs int instead of uint32_t, and add const qualifier
to num_inputs variable (Ian)

v3:
- Do not try to count inputs correctly, just program all input
slots like we do in anvil (Ken)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs with over 16 inputs)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

radv: only emit cache flushes when the pool size is large enough

This is an optimization which reduces the number of flushes for
small pool buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: keep track of the query pool size

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: make sure to emit cache flushes before starting a query

If the query pool has been previously resetted using the compute
shader path.

Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nir/serialize: handle var->name being NULL

var->name could be NULL under ARB_gl_spirv for example. And in any
case, the code is already handing var name being NULL when reading a
variable, so it is consistent to do it writing a variable too.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

anv: Enable VK_KHR_16bit_storage for PushConstant

Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

The introduction of 16-bit types with VK_KHR_16bit_storages implies that
push constant offsets could be multiple of 2-bytes. Some assertions are
updated so offsets should be just multiple of size of the base type but
in some cases we can not assume it as doubles aren't aligned to 8 bytes
in some cases.

For 16-bit types, the push constant offset takes into account the
internal offset in the 32-bit uniform bucket adding 2-bytes when we access
not 32-bit aligned elements. In all 32-bit aligned cases it just becomes 0.

v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: Calculate properly 16-bit vector sizes

Range in 16-bit push constants load was being calculated
wrongly using 4-bytes per element instead of 2-bytes as it
should be.

v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: Enable VK_KHR_16bit_storage for SSBO and UBO

Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccesss
features of VK_KHR_16bit_storage for Gen8+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.

Taking into account that VK_KHR_relaxed_block_layout is available
in ANV we can only guarantee that when we have a constant offset
that is multiple of 4. For non constant offsets we will always use
byte_scattered_write.

v2: (Jason Ekstrand)
- Assert offset_reg to be multiple of 4 if it is immediate.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiple of 4-bytes as required by untyped_read
message. This happens for example in the case of f16mat3x3 when then
VK_KHR_relaxed_block_layout is enabled.

Vectors reads when we have non-constant offsets are implemented with
multiple byte_scattered_read messages that not require 32-bit aligned offsets.

Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32-bits, we calculate a
start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.

v2: (Jason Ekstrand)
    Use untyped_read_surface messages always we have constant offsets.

v3: (Jason Ekstrand)
    Simplify loop for reads with non constant offsets.
    Use end - start to calculate the number of 32-bit components to read with
    constant offsets.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components

This helper used to load 16bit components from 32-bits read now allows
skipping components with the new parameter first_component. The semantics
now skip components until we reach the first_component, and then reads the
number of components passed to the function.

All previous uses of the helper are updated to use 0 as first_component.
This will allow read 16-bit components when the first one is not aligned
32-bit. Enabling more usages of untyped_reads with 16-bit types.

v2: (Jason Ektrand)
Change parameters order to first_component, num_components

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

The surfaces that backup the GPU buffers have a boundary check that
considers that access to partial dwords are considered out-of-bounds.
For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
last two bytes would always be read as 0 or its writting ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-bytew multiples so that partial dwords could be read/written.
Adding an inconditional +2 size to buffers not being multiple of 2
solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = isl_align(buffer_size, 4) +
               (isl_align(buffer_size, 4) - buffer_size)

So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

It is also exposed this buffer requirements when robust buffer access
is enabled so these buffer sizes recommend being multiple of 4.

v2: (Jason Ekstrand)
    Move padding logic fron anv to isl_surface_state.
    Move calculus of original size from spirv to driver backend.
v3: (Jason Ekstrand)
    Rename some variables and use a similar expresion when calculating.
    padding than when obtaining the original buffer size.
    Avoid use of unnecesary component call at brw_fs_nir.
v4: (Jason Ekstrand)
    Complete comment with buffer size calculus explanation in brw_fs_nir.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

vbo: Remove vbo_save_vertex_list::vertex_size.

Like before use local variables from compile_vertex_list instead.
Remove vertex_size from struct vbo_save_vertex_list.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove vbo_save_vertex_list::buffer_offset.

The buffer_offset is used in aligned_vertex_buffer_offset.
But now that most of these decisions are done in compile_vertex_list
we can work on local variables instead of struct members in the
display list code. Clean that up and remove buffer_offset.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove vbo_save_vertex_list::start_vertex.

Replace last use on replay with _vbo_save_get_{min,max}_index. Appart from
that it is not used anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove vbo_save_vertex_list::attrsz.

Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove vbo_save_vertex_list::attrtype.

Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove vbo_save_vertex_list::enabled.

Is not used anymore on replay.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove reference to the vertex_store from the dlist node.

Since we now store a set of VAOs in the display list, use these object
to get the reference to the VBO in several places.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Implement current values update in terms of the VAO.

Use the information already present in the VAO to update the current values
after display list replay. Set GL_OUT_OF_MEMORY on allocation failure
for the current value update storage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Implement vbo_loopback_vertex_list in terms of the VAO.

Use the information already present in the VAO to replay a display list
node using immediate mode draw commands. Use a hand full of helper methods
that will be useful for the next patches also.

v2: Insert asserts, constify local variables.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Use a local variable for the dlist offsets.

The master value is now stored inside the VAO already present in
struct vbo_save_vertex_list. Remove the unneeded copy from dlist storage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove unused vbo_save_context::wrap_count.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

vbo: Remove unused vbo_save_vertex_list::dangling_attr_ref.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

anv: Always set has_context_priority

We don't zalloc the physical device so we need to unconditionally set
everything. Crucible helpfully initializes all allocations to 139 so it
was getting true regardless of whether or not the kernel actually
supports context priorities.

Fixes: 6d8ab53303331 "anv: implement VK_EXT_global_priority extension"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Revert "i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+"

This reverts commit a2c1e48f15995a826dc759e064c2603882a37e0c.

On BDWGT3e and KBLGT3e systems, this commit regressed the following
tests:

  piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil
  piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil

radeonsi/nir: increase values to 8 for gs fetch.

This stops a crash when running (still fails):
tests/spec/arb_gpu_shader_fp64/execution/explicit-location-gs-fs-vs.shader_test

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: Use the syncobj wait ioctl to wait on fences if possible.

Handles the !waitAll and signal after the start of the wait cases correctly.

Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: Implement more efficient !waitAll fence waiting.

Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: Implement waiting on non-submitted fences.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: Implement WaitForFences with !waitAll.

Nothing to do except using a busy wait loop. At least for old kernels.

A better implementation for newer kernels to come later.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>

ac/nir: fix shared atomic operations.

The nir->llvm conversion was using the wrong srcs.

Fixes:
tests/spec/arb_compute_shader/execution/shared-atomics.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ac/nir: don't apply slice rounding on txf_ms

This matches the tgsi code.

Fixes arb_texture_multisample texelFetch piglit tests.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7914 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>

radeonsi: set some context vars for nir path

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: remove llvm from ir struct

This was added in 425dc4c4b366 but never used. Also since
100796c15c3a native has superseded llvm.

Acked-by: Dave Airlie <airlied@redhat.com>

i965: Don't emit MOVs with undefined registers for Gen4 point clipping.

Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0,
which means that c->reg.vertex[] isn't initialized. It then emits MOVs
to stomp components of those uninitialized registers to 0.

This started causing assertions after Matt's recent series, when those
uninitialized registers started getting BRW_REGISTER_TYPE_NF, which
definitely doesn't exist on Gen4-5.

Reviewed-by: Matt Turner <mattst88@gmail.com>

broadcom/vc5: Fix regression in the page-cache slice size alignment.

We need to align the size of the slice, not the offset of the next slice.
Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge.

Fixes: b4b4ada7616d ("broadcom/vc5: Fix layout of 3D textures.")

i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Be more clever about setting up our viewport clip

Before, we were trusting in the hardware to take the intersection
of the viewport clip with the drawing rectangle. Unfortunately,
3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly
does a full pipeline stall. If we're a bit more careful with our
viewport clipping, we can just re-emit it once at context creation
time.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Re-add .vs_inputs_dual_locations = true

Looks like a rebase mistake.

Fixes: 89fe5190a256 ("intel/compiler: Lower flrp32 on Gen11+")

r600/shader: when using images always load thread id gpr at start (v2)

The delayed loading code was fail if we had control flow.

This fixes:
tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test

v2: don't use temp_reg before setting temp_reg up.

Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: fix whitespace in recent 1d texture commit.

trivial fix.

intel/compiler: Add ICL to test_eu_validate.cpp

With the Align16 tests now disabled, we can run the rest of the tests in
ICL mode (and see them pass!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Disable Align16 tests on Gen11+

Align16 is no more.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Add instruction compaction support on Gen11

Gen11 only differs from SKL+ in that it uses a new datatype index table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Mark line, pln, and lrp as removed on Gen11+

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Lower flrp32 on Gen11+

The LRP instruction is no more.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Implement ddy without using align16 for Gen11+

Align16 is no more. We previously generated an align16 ADD instruction
to calculate DDY:

   add(16) g25<1>F  -g23<4>.xyxyF   g23<4>.zwzwF   { align16 1H };

Without align16, we now implement it as:

   add(4) g25<1>F   -g23<0,2,1>F    g23.2<0,2,1>F  { align1 1N };
   add(4) g25.4<1>F -g23.4<0,2,1>F  g23.6<0,2,1>F  { align1 1N };
   add(4) g26<1>F   -g24<0,2,1>F    g24.2<0,2,1>F  { align1 1N };
   add(4) g26.4<1>F -g24.4<0,2,1>F  g24.6<0,2,1>F  { align1 1N };

where only the first two instructions are needed in SIMD8 mode.

Note: an earlier version of the patch implemented this in two
instructions in SIMD16:

   add(8) g25<2>F   -g23<4,2,0>F    g23.2<4,2,0>F  { align1 1N };
   add(8) g25.1<2>F -g23.1<4,2,0>F  g23.3<4,2,0>F  { align1 1N };

but I realized that the channel enable bits will not be correct. If we
knew we were under uniform control flow, we could emit only those two
instructions however.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Simplify ddx/ddy code generation

The brw_reg() constructor just obfuscates things here, in my opinion.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode

In a future patch, generate_ddy will want to inspect inst->exec_size.
Change generate_ddx as well for consistency.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Don't generate integer DWord multiply on Gen11

Like CHV et al., Gen11 does not support 32x32 -> 32/64-bit integer
multiplies.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+

The PLN instruction is no more. Its functionality is now implemented
using two MAD instructions with the new native-float type. Instead of

   pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F

we now have

   mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F
   mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F

... and in the case of SIMD8 only the first pair of MAD instructions is
used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp

If multiple instructions are emitted, special handling of things like
conditional mod and NoDDClr/NoDDChk need to be performed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair

This isn't technically broken, but the next patch will make this
function report whether it generated multiple instructions, and that
information will be used to disable the application of conditional mod
by the generic code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Add Gen11+ native float type

This new type exposes the additional precision offered by the
accumulator register and will be used in the next patch to implement the
functionality of the PLN instruction using a pair of MAD instructions.

One weird thing to note: align1 ternary instructions may only have an
accumulator in the dst or src1 normally, but when src0's type is :NF
the accumulator is read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Add Gen11 register types

The hardware register types' encodings have changed on Gen11. Good thing
we have that superfluous looking brw_reg_type abstraction lying around!

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel: Disable 64-bit extensions on platforms without 64-bit types

Gen11 does not support DF, Q, UQ types in hardware. As a result, we have
to disable some GL extensions until they can be reimplemented.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

intel: Add icl pci id for INTEL_DEVID_OVERRIDE

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>

i965: Warn about preliminary support for Gen11

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel: Add a preliminary device for Ice Lake

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Anuj Phogat <anuj.phogat@intel.com>

anv: remove anv_gem_set_context_priority helper

anv_gem_set_context_param is to be used directly instead!

Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

swr/rast: revert clip distance precision

Fixes piglit tests that broke with 8a64593bde

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: Faster frustum prim culling

Fix clipper validMask setting. We don't need to run frustum rejected
primitives through the clipper. Perform frustum culling with only
frustum clip codes. Guardband clip codes cannot be used because they
overlap frustum codes.

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: Consolidate TRANSLATE_ADDRESS

Translate is now part of an overloaded LOAD call which required a change to
the code gen to skip the load functions in order to handle them manually
to make them virtual.

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: Code generation cleanup

Generate more compact code from gen_llvm.hpp.

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: Remove draw type from event definitions

- Have the draw type sent to DrawInfoEvent in handlers created in
  archrast.cpp.  The draw type no longer needs to be sent during during
  AR_API_EVENT() call in api.cpp.

- Remove draw type from event defintions in events_private.proto, no
  longer needed

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: whitespace change

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rast: Fix index buffer overfetch issue for non-indexed draws

Populate pLastIndex, even for the non-indexed case. An zero pLastIndex
can cause the index offsets inside the fetcher to have non-sensical values
that can be either very large positive or very large negative numbers.

Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>

softpipe: don't iterate through PIPE_MAX_SHADER_SAMPLER_VIEWS

We were setting view to NULL if the iteration was larger than i.
But in fact if the view is NULL the code did nothing anyway...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

cso: don't cycle through PIPE_MAX_SHADER_SAMPLER_VIEWS on context destroy

There's no point, we know the highest non-null one.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

draw: don't needlessly iterate through all sampler view slots

We already stored the highest (potentially) used number.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

anv: implement VK_EXT_global_priority extension

v2: add ANV_CONTEXT_REALTIME_PRIORITY (Chris)
use unreachable with unknown priority (Samuel)

v3: add stubs in gem_stubs.c (Emil)
use priority defines from gen_defines.h

v4: cleanup, add anv_gem_set_context_param (Jason)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: use context priority definitions from gen_defines.h

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

intel: add new common header gen_defines.h

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

winsys/amdgpu: request high addresses

We now have hopefully fixed all bugs regarding high addresses on Vega10 and
Raven. Start to use the high range to make room for SVM in the low
range.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac/shader: move scanning some info about input PS declarations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

glsl/linker: fix bug when checking precision qualifier

According to GLSL ES 3.2 spec, see table in 9.2.1 "Linked Shaders"
section, the precision qualifier should match for uniform variables.
This also applies to previous GLSL ES 3.x specs.

This 'if' checks the condition for uniform variables, while for UBOs
it is checked in link_interface_blocks.cpp.

Fixes: b50b82b8a553
("glsl/es31: precision qualifier doesn't need to match in shader interface block members")

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

anv: set maxResourceSize to the respective value for each generation

v2:
- Add the proper values to gen9+ (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

r600: partly revert disabling tiling for 1d texture.

Previously we had a check for 1d of narrow 2D textures, however
narrow 2d textures caused gpu hangs, but it was correct for 1d
textures.

This fixes a bunch of 1D image piglits for me.

Fixes: 7b8e1c089d (r600/texture: drop lowering 1d/2d images to linear.)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

nir: fix interger divide by zero crash during constant folding

From the GLSL 4.60 spec Section 5.9 (Expressions):

"Dividing by zero does not cause an exception but does result in
an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271

st/mesa: ensure that images don't try to reference non-existent levels

Ideally the st_finalize_texture call would take care of that, but it
doesn't seem to with KHR-GL45.shader_image_size.advanced-nonMS-*. This
assertion makes sure that no such values are passed to the driver.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac/radv: move load base vertex abi setup to vertex shader.

This was segfaulting:
dEQP-VK.memory.pipeline_barrier.host_write_index_buffer.1024

Fixes: 8de6f797070 (ac/radeonsi: add load_base_vertex() to the abi)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ac/shader: fix vertex input with components.

This fixes:
dEQP-VK.glsl.440.linkage.varying.component.*

Fixes: 1c57a6da5e3 (ac/shader: scan vertex inputs usage mask)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: remove device pointer from buffer.

This is never used.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

nir: add lower_ldexp to nir compiler options

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac: implement nir_op_ldexp

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac: fix nir_op_fdd{x,y} handling

radeonsi, i965 and anv all treat fdd{x,y} opcodes the same as
fdd{x,y}_coarse by default. The SPIR-V spec lets the implementation
decide how it should be handled and radv was previously going
for the higher quality option. Here we change the shared amd
code to match how nir_op_fdd{x,y} is expected to be handled
by the other NIR drivers.

Fixes piglit test:
./bin/arb_shader_texture_lod-texgrad -auto

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac/radeonsi: add load_base_vertex() to the abi

Fixes the following piglit tests:

./bin/arb_shader_draw_parameters-basevertex basevertex -auto -fbo
./bin/arb_shader_draw_parameters-basevertex basevertex-baseinstance -auto -fbo

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: create get_base_vertex() helper

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi/nir: disable vertex_id_zero_based lowering

The lowering is incompatible with how the radeonsi backend works.

Fixes piglit test:
./bin/arb_shader_draw_parameters-basevertex vertexid-zerobased -auto

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac: add support for handling nir_intrinsic_load_vertex_id

This will be used by radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ac: fix f2b and i2b for doubles

Without this llvm was asserting in debug builds.

V2: use LLVMConstNull()

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

intel/ir: Fix invalid type aliasing with undefined behavior in test_eu_compact.

test_fuzz_compact_instruction() was attempting to modify the uint64_t
data array of a brw_inst through a pointer to uint32_t, which has
undefined behavior. This was causing the test_eu_compact unit test to
fail mysteriously for me on GCC 7 with some additional
harmless-looking changes I had applied to my tree, which happened to
affect the order instructions are emitted by GCC causing the bit
twiddling to be done after the clear_pad_bits() call which is supposed
to overwrite the same data through a pointer of different type,
leading to data corruption. A similar failure has been reported by
Vinson Lee on the master branch built with GCC 8.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105052
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

util/bitset: Make C++ wrapper trivially constructible.

In order to fix a build failure on compilers not implementing
unrestricted unions, which is a C++11 feature.

v2: Provide signed integer comparison and assignment operators instead
of BITSET_WORD ones to avoid spurious ambiguity warnings on
comparisons with a signed integer literal.

Fixes: ba79a90fb52e1e81fb "glsl: Switch ast_type_qualifier to a 128-bit bitset."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105238
Tested-by: Roland Scheidegger <sroland@vmware.com>
Tested-By: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

intel/tools: Use gen_device_name_to_pci_device_id in aubinator

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel/common: Add gen_device_name_to_pci_device_id

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>