Ian Romanick [Tue, 11 Jun 2019 18:08:49 +0000 (11:08 -0700)]
intel/vec4: Try to emit a VF source in try_immediate_source
This commit is also a prerequisite for the next commit.
No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.
v2: Massive rebase on
eeebeb211f1 ("intel/vec4: Try emitting non-scalar
immediates"). This change is a lot less helpful since that commit
landed (previously helped 1934 shaders on HSW) because, apparently, a
lot of the cases helped by that commit were things like vector loads of
{ 1.0, 1.0, 1.0 } that were also helped by this commit.
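A VF ("vector float") immediate packs four restricted 8-bit floats into one 32-bit immediate, which is why a vector load such as { 1.0, 1.0, 1.0 } can collapse into a single MOV. The sketch below assumes the 8-bit layout is 1 sign bit, a 3-bit exponent biased by 3, and a 4-bit mantissa, with the all-zero encoding reserved for ±0.0. It also assumes channel x sits in the low byte. The names float_to_vf, vf_to_float and pack_vf4 are hypothetical and are not the brw helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits without violating strict aliasing. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof(u)); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof(f)); return f; }

/* Decode one (assumed) VF byte: value = +/-2^(e - 3) * (1 + m / 16). */
static float vf_to_float(uint8_t vf)
{
   uint32_t sign = (uint32_t)(vf & 0x80) << 24;

   if ((vf & 0x7f) == 0)                 /* 0x00 / 0x80 encode +0.0 / -0.0 */
      return bits_to_float(sign);

   uint32_t exponent = (vf >> 4) & 0x7;
   uint32_t mantissa = vf & 0xf;
   return bits_to_float(sign | ((exponent + 124) << 23) | (mantissa << 19));
}

/* Returns the VF encoding of f (0..255), or -1 if f is not exactly representable. */
static int float_to_vf(float f)
{
   uint32_t u = float_bits(f);

   if (f == 0.0f)
      return (int)((u >> 24) & 0x80);

   int exponent = (int)((u >> 23) & 0xff) - 127;
   if (exponent < -3 || exponent > 4)    /* also rejects NaN, Inf, denormals */
      return -1;

   int vf = (int)(((u >> 24) & 0x80) |
                  ((uint32_t)(exponent + 3) << 4) |
                  ((u >> 19) & 0xf));

   /* Round-trip check: rejects values whose low mantissa bits would be
    * dropped, and 0.125, whose encoding collides with zero. */
   return vf_to_float((uint8_t)vf) == f ? vf : -1;
}

/* Packs four floats into one 32-bit VF immediate; returns 0 on success. */
static int pack_vf4(const float v[4], uint32_t *out)
{
   uint32_t packed = 0;

   assert(out != NULL);
   for (int i = 0; i < 4; i++) {
      int vf = float_to_vf(v[i]);
      if (vf < 0)
         return -1;                      /* caller falls back to another path */
      packed |= (uint32_t)vf << (8 * i);
   }
   *out = packed;
   return 0;
}
```

Under these assumptions, { 1.0, 1.0, 1.0, 1.0 } packs to 0x30303030. A value such as 0.1 cannot be encoded, so the whole vector must use some other load.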
Haswell
total instructions in shared programs: 13480095 -> 13478598 (-0.01%)
instructions in affected programs: 229534 -> 228037 (-0.65%)
helped: 1006
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09%
95% mean confidence interval for instructions value: -1.54 -1.43
95% mean confidence interval for instructions %-change: -1.15% -1.07%
Instructions are helped.
total cycles in shared programs: 376385734 -> 376386916 (<.01%)
cycles in affected programs: 14101380 -> 14102562 (<.01%)
helped: 941
HURT: 56
helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2
helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42%
HURT stats (abs) min: 2 max: 618 x̄: 115.50 x̃: 32
HURT stats (rel) min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44%
95% mean confidence interval for cycles value: -2.06 4.43
95% mean confidence interval for cycles %-change: -0.47% -0.39%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total instructions in shared programs: 12048004 -> 12046589 (-0.01%)
instructions in affected programs: 217072 -> 215657 (-0.65%)
helped: 934
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11%
95% mean confidence interval for instructions value: -1.57 -1.46
95% mean confidence interval for instructions %-change: -1.18% -1.10%
Instructions are helped.
total cycles in shared programs: 180285854 -> 180287608 (<.01%)
cycles in affected programs: 14103824 -> 14105578 (0.01%)
helped: 871
HURT: 53
helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2
helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42%
HURT stats (abs) min: 2 max: 618 x̄: 123.66 x̃: 32
HURT stats (rel) min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46%
95% mean confidence interval for cycles value: -1.60 5.39
95% mean confidence interval for cycles %-change: -0.46% -0.37%
Inconclusive result (value mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10861227 -> 10860328 (<.01%)
instructions in affected programs: 92969 -> 92070 (-0.97%)
helped: 624
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95%
95% mean confidence interval for instructions value: -1.52 -1.36
95% mean confidence interval for instructions %-change: -1.09% -1.01%
Instructions are helped.
total cycles in shared programs: 153944316 -> 153942720 (<.01%)
cycles in affected programs: 1640956 -> 1639360 (-0.10%)
helped: 601
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2
helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08%
HURT stats (abs) min: 2 max: 72 x̄: 36.13 x̃: 36
HURT stats (rel) min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00%
95% mean confidence interval for cycles value: -3.44 -1.74
95% mean confidence interval for cycles %-change: -0.18% -0.09%
Cycles are helped.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139924 -> 8139378 (<.01%)
instructions in affected programs: 69776 -> 69230 (-0.78%)
helped: 322
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1
helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54%
95% mean confidence interval for instructions value: -1.88 -1.51
95% mean confidence interval for instructions %-change: -0.85% -0.72%
Instructions are helped.
total cycles in shared programs: 188542864 -> 188541756 (<.01%)
cycles in affected programs: 3031532 -> 3030424 (-0.04%)
helped: 320
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2
helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -3.85 -3.07
95% mean confidence interval for cycles %-change: -0.06% -0.05%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 6 Jun 2019 18:21:15 +0000 (11:21 -0700)]
intel/vec4: Try to emit a single load for multiple 3-src instruction operands
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register. Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"
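This sketch shows the "one load, negate the other" idea. It assumes the 3-source instruction cannot take immediate operands directly, so each immediate needs a MOV into a register. An immediate whose bits match an already-loaded value reuses that register. One that differs only in the sign bit reuses it with the negate source modifier. The operand model and fix_3src_immediates() are hypothetical and are not the vec4 backend's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operand: an immediate float or a register, plus a negate
 * modifier (immediates are assumed to carry no modifier of their own). */
struct operand {
   bool is_imm;
   float imm;
   int reg;
   bool negate;
};

static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof(u)); return u; }

/* Replaces every immediate in ops[0..2] with a register and returns the
 * number of MOV loads that would be emitted. Comparing bit patterns keeps
 * +0.0 and -0.0 distinct, so the negate modifier is always exact. */
static int fix_3src_immediates(struct operand ops[3], int *next_reg)
{
   float loaded_val[3];
   int loaded_reg[3];
   int num_loaded = 0;

   assert(ops != NULL && next_reg != NULL);
   for (int i = 0; i < 3; i++) {
      if (!ops[i].is_imm)
         continue;

      uint32_t bits = bits_of(ops[i].imm);
      int j;
      for (j = 0; j < num_loaded; j++) {
         uint32_t other = bits_of(loaded_val[j]);
         if (bits == other || bits == (other ^ 0x80000000u))
            break;                       /* same value or its negation */
      }

      if (j == num_loaded) {             /* nothing reusable: new MOV */
         loaded_val[j] = ops[i].imm;
         loaded_reg[j] = (*next_reg)++;
         num_loaded++;
      }

      ops[i].negate = bits != bits_of(loaded_val[j]);
      ops[i].reg = loaded_reg[j];
      ops[i].is_imm = false;
   }
   return num_loaded;
}
```

For operands (1.0, -1.0, r3), only one MOV is needed: both immediates read the same register, and the second one sets negate.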
All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.
total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 6 Jun 2019 18:12:14 +0000 (11:12 -0700)]
intel/vec4: Refactor operand fixing for ffma and flrp
Reviewed-by: Matt Turner <mattst88@gmail.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 14:02:26 +0000 (07:02 -0700)]
panfrost: Wire up GLES2-class polygon offset
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 14:01:56 +0000 (07:01 -0700)]
pan/decode: Depth units/factor are identical to GL
I'm not sure why I thought there was an off-by-one, other than maybe bad
data collection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Christian Gmeiner [Fri, 5 Jul 2019 06:07:36 +0000 (08:07 +0200)]
etnaviv: remove dead translate_ts_sampler_format(..) declaration
Fixes: 66411521ea9 ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Caio Marcelo de Oliveira Filho [Wed, 10 Jul 2019 19:02:23 +0000 (12:02 -0700)]
intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences. Add a way to control which fence types to use.
At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall. With more information later
we'll be able to reduce those.
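This sketch shows the "decide which fences, then emit" split described in v3. It assumes, from the description above, that before Gen11 a single fence also covers SLM because SLM lives in L3. The enum, struct and function names are hypothetical. The barrier kinds are a simplified stand-in for the NIR intrinsics, and the conservative stall is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what a barrier must order. */
enum barrier_kind {
   BARRIER_SHARED_ONLY,   /* shared local memory (SLM) only */
   BARRIER_GLOBAL_ONLY,   /* buffers/images only */
   BARRIER_ALL,           /* both */
};

struct fence_plan {
   bool l3_fence;
   bool slm_fence;
};

/* Decision step only; emitting the fence instructions happens elsewhere. */
static struct fence_plan plan_fences(int gen, enum barrier_kind kind)
{
   struct fence_plan plan = { false, false };

   if (gen >= 11) {
      /* SLM is no longer in L3, so each memory type has its own fence. */
      plan.l3_fence = kind != BARRIER_SHARED_ONLY;
      plan.slm_fence = kind != BARRIER_GLOBAL_ONLY;
   } else {
      /* One fence covers everything, SLM included. */
      plan.l3_fence = true;
   }
   return plan;
}

/* Number of fence instructions the emission step would produce. */
static int count_fences(struct fence_plan plan)
{
   return (int)plan.l3_fence + (int)plan.slm_fence;
}
```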
Fixes Vulkan CTS tests in ICL:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp
The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.
v2: Pass BTI around instead of having an enum. (Jason)
Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
transformed into two. (Jason)
List tests fixed. (Lionel)
v3: For clarity, split the decision of which fences to emit from the
emission code. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tomeu Vizoso [Thu, 11 Jul 2019 14:34:01 +0000 (16:34 +0200)]
Revert "panfrost/midgard: Use _safe iterator"
This reverts commit
812ce2ce9e5655613eae740926176509897122fa.
We massively regress with the reverted patch. So in the meantime, take
it out.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 21:50:48 +0000 (14:50 -0700)]
panfrost: Don't lie about Z/S formats
Only Z24S8 is properly supported right now, so let's be careful. Fixes a
number of issues relating to improper Z/S handling. The most obvious is
depth buffers with incorrect strides, which manifests in truly bizarre
ways and can happen commonly with FBOs.
Fixes WebGL (Aquarium runs, etc).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Samuel Pitoiset [Thu, 11 Jul 2019 06:44:20 +0000 (08:44 +0200)]
radv/gfx10: enable geometry shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Thu, 11 Jul 2019 06:44:19 +0000 (08:44 +0200)]
radv/gfx10: Fix NGG GS output mask handlings for LDS indexing.
In emit_vertex we optimize storage if the output mask does not
have all bits set. Do the same in the epilogue so the indices
actually match up.
Fixes dEQP-VK.geometry.input.basic_primitive.points because it
outputs PSIZE with an output mask of 1, which caused the generic
attribute for the color to be loaded from the wrong indices.
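The mismatch comes from two different ways of mapping (output slot, component) to an LDS dword index. One packs only the components set in each slot's output mask. The other assumes four dwords per slot. This hypothetical sketch assumes one 4-bit usage mask per output slot and ignores per-vertex strides. It shows how the two indexings diverge once PSIZE uses a mask of 0x1.

```c
#include <assert.h>
#include <stdint.h>

static unsigned popcount4(uint8_t mask)
{
   unsigned n = 0;
   for (int i = 0; i < 4; i++)
      n += (mask >> i) & 1;
   return n;
}

/* Packed index: only enabled components of earlier slots, plus enabled
 * components below `comp` in this slot, occupy space. */
static unsigned packed_index(const uint8_t *masks, unsigned slot, unsigned comp)
{
   unsigned index = 0;

   assert(comp < 4 && (masks[slot] & (1u << comp)));
   for (unsigned s = 0; s < slot; s++)
      index += popcount4(masks[s]);
   return index + popcount4(masks[slot] & ((1u << comp) - 1));
}

/* Unpacked index: every slot is assumed to use all four dwords. */
static unsigned unpacked_index(unsigned slot, unsigned comp)
{
   return slot * 4 + comp;
}
```

With masks { 0x1, 0xf } (PSIZE, then a vec4 color), the color's x component is at packed index 1 but unpacked index 4. If the store and load sides disagree, the color reads the wrong dwords.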
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 11 Jul 2019 06:44:18 +0000 (08:44 +0200)]
radv/gfx10: Simplify output mask handling for NGG GS.
We only ever get into this function for an NGG GS proper.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 11 Jul 2019 06:44:17 +0000 (08:44 +0200)]
radv/gfx10: Do GS prologue outside of gs_threads if.
Mirror radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Thu, 11 Jul 2019 06:44:16 +0000 (08:44 +0200)]
radv/gfx10: implement support for GS as NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Thu, 11 Jul 2019 06:44:15 +0000 (08:44 +0200)]
radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 11 Jul 2019 06:44:14 +0000 (08:44 +0200)]
radv/gfx10: Do not allocate a gs_copy_shader on gfx10.
Will use ngg for any gs anyway.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Thu, 11 Jul 2019 06:44:13 +0000 (08:44 +0200)]
radv/gfx10: fix VGT_SHADER_STAGES_EN for GS as NGG
The driver shouldn't set the copy shader bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 11 Jul 2019 06:44:12 +0000 (08:44 +0200)]
radv/gfx10: fix number of GS invocations for NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tomeu Vizoso [Thu, 11 Jul 2019 10:18:58 +0000 (12:18 +0200)]
panfrost/midgard: Use _safe iterator
Fixes this assertion:
../mesa/src/panfrost/midgard/midgard_schedule.c:507:schedule_block: Assertion `ins == __next && "use _safe iterator"' failed.
Trace/breakpoint trap
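The assertion fires because the loop body relinks the instruction it is visiting. A "_safe" iterator reads the next pointer before running the body, so moving or freeing the current node does not derail the walk. This is a minimal sketch with a hypothetical singly linked list. Mesa's own list macros are doubly linked and differ in detail.

```c
#include <assert.h>
#include <stddef.h>

struct node {
   int value;
   struct node *next;
};

/* Moves every odd-valued node from *list to the front of *odds. Pushing a
 * node onto *odds overwrites n->next, so `next` must be read first. */
static void move_odd(struct node **list, struct node **odds)
{
   struct node **link = list;

   for (struct node *n = *list, *next; n != NULL; n = next) {
      next = n->next;          /* capture before n is relinked */
      if (n->value & 1) {
         *link = next;         /* unlink n from the source list */
         n->next = *odds;      /* push n onto the odd list */
         *odds = n;
      } else {
         link = &n->next;
      }
   }
}
```

Had the loop advanced with `n = n->next` after the body, it would follow the pointer into the odd list instead of continuing down the original one.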
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tomeu Vizoso [Thu, 11 Jul 2019 10:12:16 +0000 (12:12 +0200)]
panfrost: Place the height value in the height field
In the mali_single_framebuffer descriptor.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
v2: Remove unwanted chunks
Samuel Pitoiset [Thu, 11 Jul 2019 12:24:37 +0000 (14:24 +0200)]
radv/gfx10: fix maximum number of mip levels for 3D images
The dimensions also have to be adjusted if the number of supported
mip levels is changed.
This fixes dEQP-VK.api.info.image_format_properties.3d.*.
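Vulkan expects the reported maxMipLevels to match the length of a complete mip chain for the reported maxExtent. A chain for an image whose largest dimension is N has floor(log2 N) + 1 levels. This sketch shows that relationship with hypothetical helper names. It does not restate the actual GFX10 limits.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(n)) + 1 for n >= 1: levels in a complete mip chain. */
static uint32_t mip_levels_for_extent(uint32_t n)
{
   uint32_t levels = 1;

   assert(n >= 1);
   while (n > 1) {
      n >>= 1;
      levels++;
   }
   return levels;
}

/* A power-of-two extent consistent with a given maximum level count. */
static uint32_t pot_extent_for_levels(uint32_t levels)
{
   assert(levels >= 1 && levels <= 32);
   return (uint32_t)1 << (levels - 1);
}
```

If the level count is lowered, the extent must shrink with it, and vice versa. Otherwise the two reported values contradict each other.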
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 11 Jul 2019 09:54:24 +0000 (11:54 +0200)]
radv/gfx10: disable TC-compat HTILE for multisampled D32_SFLOAT format
For some reason, D32_SFLOAT is also affected on GFX10; it works
fine on previous generations.
This fixes some dEQP-VK.renderpass2.depth_stencil_resolve.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Sun, 7 Jul 2019 23:48:10 +0000 (16:48 -0700)]
iris: Fix key->input_vertices for 8_PATCH TCS mode.
We were failing to flag the program dirty when it changed. Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits. Only set it if needed.
Kenneth Graunke [Sun, 7 Jul 2019 23:32:09 +0000 (16:32 -0700)]
iris: Only set key->flat_shade if COL0/COL1 are written.
This was just laziness on my part; we already added similar checks in
the VS key handling and just need to do it here too. Should improve cache
hits.
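A program key should only record state that changes the generated code. Otherwise, draws that would produce identical shaders still miss the cache. In this hypothetical sketch, simple bit flags stand in for the COL0/COL1 varying slots. The rasterizer's flat-shade bit only reaches the key when one of those colors is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot bits standing in for the real varying slots. */
#define SLOT_POS  (1u << 0)
#define SLOT_COL0 (1u << 1)
#define SLOT_COL1 (1u << 2)

struct fs_key {
   bool flat_shade;
};

/* `slots` is the set of varying slots relevant to this shader. */
static struct fs_key make_fs_key(uint32_t slots, bool rast_flatshade)
{
   struct fs_key key = { false };

   if (slots & (SLOT_COL0 | SLOT_COL1))
      key.flat_shade = rast_flatshade;
   return key;
}
```

Without colors, toggling flat shading leaves the key unchanged, so the same cached program is reused.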
Kenneth Graunke [Sun, 7 Jul 2019 23:58:14 +0000 (16:58 -0700)]
iris: Drop comment about var->data.binding not being set.
I refactored the sampler lowering passes a long time ago to ensure
that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.
Kenneth Graunke [Sun, 7 Jul 2019 23:53:17 +0000 (16:53 -0700)]
iris: Drop comments about missing NOS
These stages don't need NOS. If they do, we can add it - the
infrastructure is there if we need it someday.
Kenneth Graunke [Sun, 7 Jul 2019 23:29:24 +0000 (16:29 -0700)]
iris: Drop a TODO comment
This is literally implemented two lines above.
Neil Roberts [Wed, 10 Jul 2019 11:04:01 +0000 (13:04 +0200)]
glsl/builtin types: Set the precision on the depth range params
The members of gl_DepthRangeParameters are declared to be highp in
GLSL ES specs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Neil Roberts [Wed, 10 Jul 2019 11:01:38 +0000 (13:01 +0200)]
glsl: Add a constructor for glsl_struct_field to specify the precision
Adds a third constructor to glsl_struct_field which has an extra
parameter to specify the precision.
Reviewed-by: Eric Anholt <eric@anholt.net>
Neil Roberts [Wed, 10 Jul 2019 10:56:22 +0000 (12:56 +0200)]
glsl: Add a macro for the default values for glsl_struct_field
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Neil Roberts [Fri, 5 Jul 2019 18:54:32 +0000 (20:54 +0200)]
glsl/builtin_variables: Add a precision to the builtins
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.
v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
gl_MaxViewPorts to always be highp. (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 9 Jul 2019 07:32:42 +0000 (00:32 -0700)]
compiler: Save a single copy of the softfp64 shader in the context.
We were recompiling the softfp64 library of functions from GLSL to NIR
every time we compiled a shader that used fp64. Worse, we were ralloc
stealing it to the GL context. This meant that we'd accumulate lots of
copies for the lifetime of the context, which was a big space leak.
Instead, we can simply stash a single copy in the GL context, and use
it for subsequent compiles. Having a single copy should be fine from
a memory context point of view: nir_inline_function_impl already clones
the necessary nir_function_impl's as it inlines.
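The fix swaps "compile on every use" for "compile once, cache in the context." This is a minimal sketch of that lazy-initialization pattern. The context struct, the library type and compile_softfp64() are hypothetical stand-ins. The compile counter exists only to show that the library is built a single time. Callers treat the cached copy as read-only and rely on the inliner cloning what it needs, as described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the compiled NIR library. */
struct lib {
   int id;
};

struct context {
   struct lib *softfp64;   /* owned by the context, built on first use */
   int compiles;           /* illustration only: number of compiles */
};

static struct lib *compile_softfp64(struct context *ctx)
{
   struct lib *lib = malloc(sizeof(*lib));
   if (lib == NULL)
      abort();
   lib->id = ++ctx->compiles;
   return lib;
}

/* Returns the cached library, compiling it only the first time. */
static struct lib *get_softfp64(struct context *ctx)
{
   if (ctx->softfp64 == NULL)
      ctx->softfp64 = compile_softfp64(ctx);
   return ctx->softfp64;
}

/* The single copy lives exactly as long as the context. */
static void context_destroy(struct context *ctx)
{
   free(ctx->softfp64);
   ctx->softfp64 = NULL;
}
```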
KHR-GL45.enhanced_layouts.ssb_member_align_non_power_of_2 was previously
OOM'ing a system with 16GB of RAM when using softfp64. Now it finishes
much more quickly and uses only ~200MB of RAM.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Timothy Arceri [Wed, 10 Jul 2019 04:11:23 +0000 (14:11 +1000)]
radv: fix memory leak when restoring from cache
Fixes: 726a31df705b ("radv: Add the concept of radv shader binaries.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kristian H. Kristensen [Tue, 11 Jun 2019 18:27:36 +0000 (11:27 -0700)]
freedreno: Generate headers from xml files
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:58 +0000 (15:18 +0200)]
radv: switch to the new VS exports path
It will help for GS as NGG on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:57 +0000 (15:18 +0200)]
radv: set the slot_index correctly for VARYING_SLOT_CLIP_DIST1
For selecting a different SQ_EXP_POS target.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:56 +0000 (15:18 +0200)]
radv: add a new function for exporting VS outputs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:55 +0000 (15:18 +0200)]
radv: implement new path for exporting generic varyings
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:54 +0000 (15:18 +0200)]
radv: use the generic export path for clip/cull distances
When they are exported to the next stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 13:18:53 +0000 (15:18 +0200)]
radv: remove an extra memcpy when exporting clip/cull distances
Cleanup.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Thu, 21 Feb 2019 23:20:39 +0000 (17:20 -0600)]
intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
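C has no inheritance, so the usual idiom is to embed the common struct as the first member. A pointer to any stage key can then be converted to a pointer to its base. This sketch uses simplified, hypothetical types. It assumes the base holds a program string ID and sampler key data, as the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sampler key data. */
struct sampler_key {
   uint32_t swizzles[4];
};

struct base_prog_key {
   unsigned program_string_id;
   struct sampler_key tex;
};

/* Each stage key starts with the base, so it can be used via a base pointer. */
struct vs_prog_key {
   struct base_prog_key base;
   uint8_t nr_userclip_plane_consts;
};

struct wm_prog_key {
   struct base_prog_key base;
   uint8_t nr_color_regions;
};

/* Code shared by all stages only needs to see the base. */
static unsigned key_program_id(const struct base_prog_key *key)
{
   return key->program_string_id;
}
```

Adding a new shared field then means touching only the base struct and the shared code that reads it.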
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Sat, 23 Feb 2019 17:53:43 +0000 (11:53 -0600)]
i965/program_cache: Cast the key to char * before adding key_size
We're about to change the type of key to brw_base_prog_key, which
means blindly adding the key size without a cast would lead to the
wrong calculation. It's safer to cast to char * first anyway.
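Pointer arithmetic is scaled by the size of the pointed-to type. With a `const void *` key, GCC treats `key + key_size` as byte arithmetic, which is a GNU extension. With a struct pointer, the same expression advances `key_size` whole structs. This small illustration uses a hypothetical 16-byte key type and a hypothetical helper that locates data stored right after the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct base_key {
   uint32_t words[4];   /* 16 bytes, for illustration */
};

/* Extra data is assumed to be stored immediately after key_size bytes of key. */
static const void *data_after_key(const struct base_key *key, size_t key_size)
{
   /* Byte arithmetic. `key + key_size` would skip key_size whole structs. */
   return (const char *)key + key_size;
}
```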
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Sun, 23 Jun 2019 14:26:59 +0000 (09:26 -0500)]
anv: Make the workaround BO a whole page
I'm not 100% sure how this ever worked because gem_create usually shoots
you if the BO size isn't page-aligned.
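Rounding a size up to a whole page uses the usual power-of-two alignment idiom. This sketch assumes a 4096-byte page in the example values. Real code should use the system's page size.

```c
#include <assert.h>
#include <stdint.h>

/* Round `size` up to a multiple of `align`, which must be a power of two. */
static uint64_t align_u64(uint64_t size, uint64_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (size + align - 1) & ~(align - 1);
}
```

For example, any workaround BO smaller than a page becomes exactly 4096 bytes.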
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Sat, 22 Jun 2019 15:57:39 +0000 (10:57 -0500)]
anv: Set Stateless Data Port Access MOCS
This is the MOCS setting used for the A64 stateless messages which we
sometimes use for SSBO operations.
Fixes: 48ed2a7bb009 "anv: Implement VK_EXT_buffer_device_address"
Fixes: 79fb0d27f3ab "anv: Implement SSBOs bindings with GPU addr..."
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 18:30:00 +0000 (11:30 -0700)]
panfrost: Clamp point size
It's not clear that the hardware really has a maximum, which confuses dEQP;
clamp to whatever we report as our maximum.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 17:36:16 +0000 (10:36 -0700)]
pan/decode: Auto style
$ astyle *.c *.h --style=linux -s8
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 17:33:24 +0000 (10:33 -0700)]
panfrost: Move non-Gallium files outside of Gallium
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 17:10:31 +0000 (10:10 -0700)]
panfrost: Style main Gallium driver
$ astyle *.c *.h --style=linux -s8
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 17:00:50 +0000 (10:00 -0700)]
panfrost/midgard: Apply code styling
$ astyle *.c *.h --style=linux -s8
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 16:58:21 +0000 (09:58 -0700)]
panfrost/nir: Apply NIR style
$ astyle *.c *.h --style=linux -s3
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 16:51:32 +0000 (09:51 -0700)]
panfrost: Move midgard/nir* to nir folder
The reason for doing this is two-fold:
1. These passes are likely to be shared with the Bifrost compiler.
   Therefore, we don't want to restrict them to Midgard.
2. The coding style is different (NIR-style vs Panfrost-style).
   The NIR passes are candidates for moving upstream into
   compiler/nir, so don't block that off for stylistic reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 15:06:36 +0000 (08:06 -0700)]
panfrost: Typofix
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 14:22:19 +0000 (07:22 -0700)]
panfrost: Identify shared tiler structure
This is identical across SFBD/MFBD so pull it out to allow for better
code sharing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 16:37:08 +0000 (09:37 -0700)]
panfrost/midgard: Drop unnecessary assert
Just use the #define instead.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 16:36:03 +0000 (09:36 -0700)]
panfrost: Don't expose OES_standard_derivatives
This has not been implemented quite yet.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Erik Faye-Lund [Fri, 5 Jul 2019 14:36:41 +0000 (16:36 +0200)]
gallium: get rid of PIPE_CAP_SM3
PIPE_CAP_SM3 has always been the odd one out of all our caps. While most
other caps are fine-grained and single-purpose, this cap encodes several
features in one. And since OpenGL cares more about single features, it'd
be nice to get rid of this one.
As it turns out, this is now relatively simple. We only really care about
three features using this cap, and those already got their own caps. So
we can remove it, and make sure all current drivers just give the same
response to all of them.
The only place we *really* care about SM3 is in nine, and there we can
instead just re-construct the information based on the finer-grained
caps. This avoids DX9 semantics needlessly leaking into all of the
drivers, most of which don't care a whole lot about DX9 specifically.
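This sketch shows how nine might rebuild the old SM3 answer from per-feature caps. It presumably uses the three features given their own caps in the accompanying commits: vertex-shader saturate, fragment-shader derivatives and fragment-shader texture LOD. The caps struct is hypothetical, and the exact set of caps nine consults may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the per-feature caps. */
struct shader_caps {
   bool vs_saturate;
   bool fs_derivatives;
   bool fs_texture_lod;
};

/* SM3 is only advertised when every feature it implies is available. */
static bool supports_sm3(const struct shader_caps *caps)
{
   return caps->vs_saturate && caps->fs_derivatives && caps->fs_texture_lod;
}
```

A driver can now expose, say, texture LOD without promising the rest of SM3. Only the DX9 frontend combines the answers.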
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Erik Faye-Lund [Fri, 5 Jul 2019 14:21:45 +0000 (16:21 +0200)]
gallium: give vertex-shader saturate its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support vertex-shader saturate but
not some of the other features of SM3. So let's give this its own cap
for simplicity.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Erik Faye-Lund [Fri, 5 Jul 2019 14:08:19 +0000 (16:08 +0200)]
gallium: give fragment-shader derivatives its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support fragment-shader derivatives
but not some of the other features of SM3. So let's give this its own
cap for simplicity.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Erik Faye-Lund [Fri, 5 Jul 2019 13:46:38 +0000 (15:46 +0200)]
gallium: give fragment-shader texture-lod its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support texture lod but not some
of the other features of SM3. So let's give this its own cap for
simplicity.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Erik Faye-Lund [Fri, 5 Jul 2019 14:10:49 +0000 (16:10 +0200)]
mesa/st: drop needless has_shader_model3 boolean
This boolean is only consulted once during init, so there's nothing
much saved by storing this in the context. So let's just check directly
when we need it instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Alyssa Rosenzweig [Wed, 10 Jul 2019 13:23:13 +0000 (06:23 -0700)]
panfrost: Fix copyright identifier in a few places
Oops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 16:25:08 +0000 (09:25 -0700)]
panfrost: Bikeshed pan_screen.c comment
The asterisks were inherited from... softpipe, maybe?
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 16:14:59 +0000 (09:14 -0700)]
panfrost: Check GPU version before loading
Panfrost is known to only work on a select few CPU/GPU combinations at
the moment (tested system-on-chips: RK3288, RK3399, and S912). Whitelist
the combinations known to work and refuse to load on others where
nothing works yet to avoid user confusion.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 15:44:49 +0000 (08:44 -0700)]
panfrost: Be more honest about PIPE_CAPs
A lot of the pan_screen.c code was cargoculted from other drivers. The
upshot is that we return true for a lot of PIPE_CAPs that we don't
actually support, resulting in us exposing way too many extensions that
we don't actually support. Be more careful.
Some CAPs we do need to fake to access higher dEQP versions (i.e. in
order to debug the features we're hiding behind the CAP). For these, we
hide the CAP behind a special PAN_MESA_DEBUG=deqp option to avoid
apps randomly using these in-development features.
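This sketch gates a faked cap behind a debug option. It parses a comma-separated variable such as PAN_MESA_DEBUG and fakes the cap only when "deqp" is present. The flag bit, helper names and parsing rules are assumptions for illustration, not Panfrost's actual code. Mesa has its own debug-option helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DBG_DEQP (1u << 0)   /* hypothetical flag bit */

/* True if `name` appears as a whole comma-separated token in `list`. */
static bool has_token(const char *list, const char *name)
{
   size_t len = strlen(name);

   for (const char *p = list; p != NULL && *p != '\0'; ) {
      const char *end = strchr(p, ',');
      size_t tok_len = end ? (size_t)(end - p) : strlen(p);

      if (tok_len == len && strncmp(p, name, len) == 0)
         return true;
      p = end ? end + 1 : NULL;
   }
   return false;
}

static unsigned parse_debug(const char *value)
{
   return (value != NULL && has_token(value, "deqp")) ? DBG_DEQP : 0;
}

static unsigned debug_from_env(void)
{
   return parse_debug(getenv("PAN_MESA_DEBUG"));
}

/* An in-development cap that is only reported when deqp mode is on. */
static int fake_cap(unsigned debug_flags)
{
   return (debug_flags & DBG_DEQP) ? 1 : 0;
}
```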
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Tue, 9 Jul 2019 17:59:34 +0000 (10:59 -0700)]
panfrost/midgard: Hit missed scheduling opportunity
Don't try to schedule to vmul when that can't possibly work (forcing a
bundle break). glmark:
total bundles in shared programs: 2700 -> 2683 (-0.63%)
bundles in affected programs: 695 -> 678 (-2.45%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.21 x̃: 1
helped stats (rel) min: 1.27% max: 7.69% x̄: 4.30% x̃: 4.77%
95% mean confidence interval for bundles value: -1.68 -0.75
95% mean confidence interval for bundles %-change: -5.63% -2.97%
Bundles are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 9 Jul 2019 18:16:57 +0000 (11:16 -0700)]
panfrost/midgard: Include shader size for shader-db
It's easy to forget about, but shader size does matter for things like
i-cache, so let's include it in the analysis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 9 Jul 2019 18:10:49 +0000 (11:10 -0700)]
panfrost/midgard: Include loop count for shader-db
We have to emit it anyway for the report to be happy (with respect to
unrolling), so return an actual count rather than dummy numbers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 23:42:29 +0000 (16:42 -0700)]
panfrost/midgard: Dump shader-db stats
All the kool kids are doing it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 19:40:34 +0000 (12:40 -0700)]
panfrost/midgard: Flush undefineds to zero
Fixes a buggy dEQP test.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 19:02:33 +0000 (12:02 -0700)]
panfrost/midgard: Specify channel count for broadcasting ops
bany/ball type ops read from all 4 channels even though they only write
to 1; specify this in the opcode table like we do for dot products.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 8 Jul 2019 18:51:14 +0000 (11:51 -0700)]
panfrost/midgard: Don't try to "alias" texture registers
It won't work. Just, stop it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Samuel Pitoiset [Tue, 9 Jul 2019 07:18:25 +0000 (09:18 +0200)]
radv: compute correct number of input vertices for NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 11:03:50 +0000 (13:03 +0200)]
radv: remove extra code for exporting LayerID to the next stage
Now that the output usage mask is set to 0x1 the LayerID is
correctly exported in the loop above.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 11:03:49 +0000 (13:03 +0200)]
radv: set the LayerId output usage mask if FS needs it
When the stage preceding the FS doesn't export it, the fragment shader
might still read it, even if it's 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Alyssa Rosenzweig [Mon, 1 Jul 2019 18:53:38 +0000 (11:53 -0700)]
panfrost: Update supported formats
Much of the format selection code was inherited from softpipe (!) of all
places, and a lot of it is accordingly cruft. Later if-elses were added
in random places to work around missing formats at various points in
history. Clean up some of this.
Theoretically, any format we can texture from we can also render to. In
practice, there are a few corner cases that we need to disable
explicitly.
For one, we do have to restrict SCANOUT formats to work around
buggy apps (in particular dEQP, which with --deqp-surface-type=window
under Weston will end up with RGB10_A2 and complain about low alpha
precision). Just be clearer about how/why.
Also, RGB5_A1 support is still broken; let's not worry about that quite
yet.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 5 Jul 2019 13:26:48 +0000 (06:26 -0700)]
panfrost/mfbd: Cleanup format code selection
Rather than have random variables flying around and a long if-else
chain, use a switch. They're literally *designed* for this.
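A minimal sketch of format-code selection as a single switch. The enum values and hardware codes are hypothetical placeholders; the real MFBD encodings are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical formats and hardware codes, for illustration only. */
enum fmt { FMT_RGBA8_UNORM, FMT_RGB565_UNORM, FMT_RGB10_A2_UNORM, FMT_R32_FLOAT };

#define HW_FMT_INVALID 0xFFFFFFFFu

/* One switch maps each format to its code; no flag variables needed. */
static uint32_t mfbd_format_code(enum fmt f)
{
   switch (f) {
   case FMT_RGBA8_UNORM:    return 0x1;
   case FMT_RGB565_UNORM:   return 0x2;
   case FMT_RGB10_A2_UNORM: return 0x3;
   default:                 return HW_FMT_INVALID; /* unhandled format */
   }
}
```

A side benefit of switching on an enum: if the `default` case is dropped, GCC and Clang's `-Wswitch` (enabled by `-Wall`) warn about any enumerator that is not handled.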
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 3 Jul 2019 22:31:24 +0000 (15:31 -0700)]
panfrost/midgard: Cleanup blend switch
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 5 Jul 2019 22:59:22 +0000 (15:59 -0700)]
panfrost/mfbd: Handle PIPE_FORMAT_B10G10R10A2_UNORM
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 5 Jul 2019 22:58:54 +0000 (15:58 -0700)]
panfrost/midgard: Handle PIPE_FORMAT_B10G10R10A2_UNORM
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 3 Jul 2019 19:34:32 +0000 (12:34 -0700)]
panfrost: Implement ES3-format writeout
We add support for writing out (via a blend shader):
- RGBA4
- RGB10_A2_UNORM
- RGB10_A2_UINT
- RGB5_A1_UNORM
- R11G11B10_FLOAT
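As an illustration of what such a writeout has to compute, here is a minimal C sketch of packing an RGBA float color into RGB10_A2_UNORM. It assumes the common layout with R in bits 0-9, G in 10-19, B in 20-29 and A in 30-31. The helper names are hypothetical, and the real blend shader does this in Midgard code rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Float -> n-bit UNORM: clamp to [0, 1], scale by 2^n - 1, round. */
static uint32_t float_to_unorm(float x, unsigned bits)
{
   assert(bits > 0 && bits < 32);
   const float max = (float)((1u << bits) - 1);
   if (!(x > 0.0f))
      x = 0.0f; /* also maps NaN to 0 */
   if (x > 1.0f)
      x = 1.0f;
   return (uint32_t)(x * max + 0.5f);
}

/* Assumed layout: R bits 0-9, G 10-19, B 20-29, A 30-31. */
static uint32_t pack_rgb10_a2_unorm(const float c[4])
{
   return float_to_unorm(c[0], 10) |
          (float_to_unorm(c[1], 10) << 10) |
          (float_to_unorm(c[2], 10) << 20) |
          (float_to_unorm(c[3], 2) << 30);
}
```

For example, opaque white packs to 0xFFFFFFFF and pure red with zero alpha to 0x3FF.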
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 5 Jul 2019 22:40:08 +0000 (15:40 -0700)]
panfrost: Refactor blend infrastructure
We would like to permit keying blend shaders against the framebuffer
format, which requires some new blending abstractions.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 5 Jul 2019 23:51:30 +0000 (16:51 -0700)]
panfrost/midgard: Use unsigned blend patch offset
We would like the offset field to be unsigned, letting 0 represent "no
offset" and positive values represent an actual offset.
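A minimal sketch of an unsigned offset with 0 as the "no offset" sentinel. `struct blend_patch` and `apply_blend_patch` are hypothetical, and the scheme only works if offset 0 is never a valid patch site, which is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical patch descriptor: 0 means "nothing to patch". */
struct blend_patch {
   uint32_t offset; /* word offset into the blend shader binary */
};

static bool blend_needs_patch(const struct blend_patch *p)
{
   return p->offset != 0;
}

/* Writes `value` at the patch offset of `code` (`size` words long).
 * Returns false when there is nothing to patch. */
static bool apply_blend_patch(uint32_t *code, uint32_t size,
                              const struct blend_patch *p, uint32_t value)
{
   if (!blend_needs_patch(p))
      return false;
   assert(p->offset < size);
   code[p->offset] = value;
   return true;
}
```

A zero-initialized descriptor then naturally means "no patch", with no negative sentinel to check for.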
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 17:54:23 +0000 (10:54 -0700)]
panfrost/midgard: Handle pure int formats
I'm not sure I'm totally comfortable with this, but conceptually neither
float nor pure-int formats require any format conversion, except size
conversion. Going from a shaderable format (fp32 or i16, for instance)
into a blendable format (fp16) is a separate question, one we can defer
momentarily while we're not interested in actually blending.
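To make the distinction concrete, here is a minimal C sketch contrasting a pure-int target, which only needs a size conversion, with a normalized target, which needs a real float-to-fixed conversion. The function names are hypothetical. Truncating to the low bits in the integer case is an assumed policy; the point is only that no scaling happens.

```c
#include <assert.h>
#include <stdint.h>

/* Pure-int target (e.g. R8UI): only a size conversion, here keeping
 * the low 8 bits (assumed policy; no scaling either way). */
static uint8_t write_r8ui(uint32_t shader_value)
{
   return (uint8_t)shader_value;
}

/* Normalized target (e.g. R8_UNORM): a real float -> fixed-point
 * format conversion (clamp, scale by 255, round). */
static uint8_t write_r8_unorm(float shader_value)
{
   if (!(shader_value > 0.0f))
      shader_value = 0.0f;
   if (shader_value > 1.0f)
      shader_value = 1.0f;
   return (uint8_t)(shader_value * 255.0f + 0.5f);
}
```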
As an aside, I'd be fascinated by an integer-based blending
implementation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 17:50:00 +0000 (10:50 -0700)]
panfrost/mfbd: Handle pure int formats
We see that the render target itself turns out to be typeless
(surprise!)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 16:54:23 +0000 (09:54 -0700)]
panfrost: Set rt_count_2 for bpp>4 formats
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 16:23:31 +0000 (09:23 -0700)]
panfrost/midgard: Implement preliminary float converters
We'll need some careful handling, but for now, get some baseline code
out for handling float formats in a blend shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 16:48:19 +0000 (09:48 -0700)]
panfrost/midgard: Skip blend for REPLACE (shader)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 16:08:18 +0000 (09:08 -0700)]
panfrost: Handle "blend disabled" blend shaders
Normally, disabled blend can definitely be fixed-function'd away, but
if a blend shader is used merely for format conversion rather than
blending, this code path can nevertheless be hit.
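A minimal sketch of why "blend disabled" can still reach the blend-shader path. The format list, the set accepted by the fixed-function blender, and the `ff_can_express_equation` input are all hypothetical assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical formats; which ones fixed function accepts is assumed. */
enum fmt { FMT_RGBA8_UNORM, FMT_RGB565_UNORM, FMT_RGB10_A2_UNORM, FMT_R11G11B10_FLOAT };

static bool ff_blend_supports_format(enum fmt f)
{
   return f == FMT_RGBA8_UNORM || f == FMT_RGB565_UNORM;
}

/* A format the fixed-function blender can't write forces a blend shader
 * even with blending disabled; that shader then only converts/packs the
 * color. Otherwise a shader is needed only for equations fixed function
 * can't express (assumed input). */
static bool needs_blend_shader(enum fmt f, bool blend_enabled,
                               bool ff_can_express_equation)
{
   if (!ff_blend_supports_format(f))
      return true;
   return blend_enabled && !ff_can_express_equation;
}
```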
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 13:47:13 +0000 (06:47 -0700)]
panfrost: Route format through fixed-function blending
Not all framebuffer formats are supported by the fixed-function blender.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 13:34:52 +0000 (06:34 -0700)]
panfrost: Pipe framebuffer format around
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 13:28:47 +0000 (06:28 -0700)]
panfrost/midgard: Use Gallium framebuffer formats
Ideally, we would keep Galliumisms far away from the compiler;
unfortunately, Mesa hasn't standardized on a system of format codes to be
shared across APIs and across drivers, so using Gallium formats is our
best bet in the short run.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 02:55:00 +0000 (19:55 -0700)]
panfrost/midgard: Use fp16 exclusively while blending
We now have some preliminary fp16 support available. We're not able to
expose this for GLSL quite yet, but for internal blend shaders, we're
able to control bitness ourselves just fine. So let's fp16 that
stuff!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 13:05:18 +0000 (06:05 -0700)]
panfrost/midgard: Remove opt_copy_prop_tex
Eventually this should be replaced by proper tex RA / not emitting so
many silly moves to begin with / better general copy prop. For now
remove it since it breaks things.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 03:23:50 +0000 (20:23 -0700)]
panfrost/midgard: Fix scalarification
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 03:02:57 +0000 (20:02 -0700)]
panfrost/midgard: Handle fp16 in embedded_to_inline_constants
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 02:57:49 +0000 (19:57 -0700)]
panfrost/midgard: Eliminate redundant type convert
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 02:49:51 +0000 (19:49 -0700)]
panfrost/midgard: Fix fp16 embedded constants
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 01:51:48 +0000 (18:51 -0700)]
panfrost/midgard: Hoist mask field
Share a single mask field in midgard_instruction with a unified format,
rather than using separate masks for each instruction tag with
hardware-specific formats. Eliminates quite a bit of duplicated code and
will enable vec8/vec16 masks as well (which don't map as cleanly to the
hardware as we might like).
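A minimal sketch of a unified mask (bit i = component i) being expanded to a hardware encoding at emit time. The encoding used here is an assumption for illustration: a fixed 8-bit mask in which each bit covers 16 bits of a 128-bit register, so a 32-bit component takes 2 bits. `mask_to_hw` is a hypothetical name. Under this scheme 8-bit components (vec16) would need half a bit each, which is the kind of case that doesn't map cleanly.

```c
#include <assert.h>
#include <stdint.h>

/* Unified mask: bit i set => component i written, any vector width.
 * Assumed hardware mask: 8 bits, each covering 16 bits of a 128-bit
 * register, so a component of comp_bits uses comp_bits / 16 bits. */
static uint8_t mask_to_hw(unsigned mask, unsigned comp_bits)
{
   assert(comp_bits == 16 || comp_bits == 32 || comp_bits == 64);
   const unsigned bits_per_comp = comp_bits / 16;
   const unsigned ncomps = 128 / comp_bits;
   uint8_t hw = 0;
   for (unsigned i = 0; i < ncomps; i++) {
      if (mask & (1u << i))
         hw |= (uint8_t)(((1u << bits_per_comp) - 1) << (i * bits_per_comp));
   }
   return hw;
}
```

For example, a vec4 fp32 write to .xz (unified mask 0x5) becomes 0x33, while a vec8 fp16 mask passes through unchanged.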
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 01:35:28 +0000 (18:35 -0700)]
panfrost/midgard: Allow fp16 in scalar ALU
The packing is a little different, so implement that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 2 Jul 2019 00:41:20 +0000 (17:41 -0700)]
panfrost/midgard: Implement f2u16 and friends
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>