mesa.git
6 years agoi965/icl: Disable binding table prefetching
Topi Pohjolainen [Wed, 30 May 2018 11:46:08 +0000 (07:46 -0400)]
i965/icl: Disable binding table prefetching

Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests
disabling prefetching of binding tables for ICLLP A0 and B0
steppings. This fixes multiple GPU hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.

Anuj: Add comments and commit message.
      Add gen 11 checks in the code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agoglsl: use only copy_propagation_elements
Caio Marcelo de Oliveira Filho [Fri, 15 Jun 2018 21:06:57 +0000 (14:06 -0700)]
glsl: use only copy_propagation_elements

Now that the elements version handles both cases, remove the
non-elements version.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
6 years agoglsl: teach copy_propagation_elements to deal with whole variables
Caio Marcelo de Oliveira Filho [Wed, 27 Jun 2018 21:39:54 +0000 (14:39 -0700)]
glsl: teach copy_propagation_elements to deal with whole variables

Keep information in acp_entry whether the entry is full or not, and
use the ACP in more nodes when visiting the instructions:

- add_copy: write whole variables to the ACP state (regardless of the
  type).

- visit(ir_dereference_variable *): perform the propagation here if we have a
  full candidate. Element-wise here doesn't apply because the mask
  isn't available at this point.

- visit_leave(ir_assignment *): process beyond scalar and vector, as
  the full variables might have other types.

Also import an improvement from opt_copy_propagation.cpp: if ir_call
is an intrinsic, we know the variables affected, so keep going.

v2: (all from Eric Anholt)
    Describe how acp_entry attributes are used.
    Don't do book-keeping to avoid adding repeated element to
    the dsts in write_elements().

v3: Use _mesa_set_remove_key. (Thomas Helland)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
6 years agoi965: Disable guardband clipping on SandyBridge for odd dimensions
vadym.shovkoplias [Thu, 24 May 2018 11:16:46 +0000 (14:16 +0300)]
i965: Disable guardband clipping on SandyBridge for odd dimensions

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104388
Signed-off-by: Andriy Khulap <andriy.khulap@globallogic.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agodocs: Update release calendar, add news item, and add release notes for 18.1.5
Dylan Baker [Fri, 27 Jul 2018 14:08:59 +0000 (07:08 -0700)]
docs: Update release calendar, add news item, and add release notes for 18.1.5

6 years agodocs: Add sha-256 sums for 18.1.5
Dylan Baker [Fri, 27 Jul 2018 14:06:08 +0000 (07:06 -0700)]
docs: Add sha-256 sums for 18.1.5

6 years agodocs: add 18.1.5 release notes
Dylan Baker [Thu, 26 Jul 2018 17:48:51 +0000 (10:48 -0700)]
docs: add 18.1.5 release notes

6 years agointel/compiler: fix lower conversions to account for predication
Iago Toral Quiroga [Tue, 17 Jul 2018 09:10:34 +0000 (11:10 +0200)]
intel/compiler: fix lower conversions to account for predication

The pass can create a temporary result for the instruction and then
move from it to the original destination. However, if the original
instruction was predicated, the mov has to be predicated as well.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
6 years agoradv: allocate enough space in radv_cmd_buffer_after_draw()
Samuel Pitoiset [Wed, 25 Jul 2018 15:01:46 +0000 (17:01 +0200)]
radv: allocate enough space in radv_cmd_buffer_after_draw()

The driver might emit up to 4 dwords when RADV_TRACE_FILE is
used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: check CS space in radv_emit_write_data_packet()
Samuel Pitoiset [Wed, 25 Jul 2018 14:56:06 +0000 (16:56 +0200)]
radv: check CS space in radv_emit_write_data_packet()

This wasn't wrong but it looks better to me like this. It's
only used for debugging purposes (i.e. RADV_TRACE_FILE).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: do not emit pipeline stats flushes on compute queue
Samuel Pitoiset [Fri, 27 Jul 2018 09:50:27 +0000 (11:50 +0200)]
radv: do not emit pipeline stats flushes on compute queue

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: reduce CB/DB meta flushes in radv_dst_access_flush()
Samuel Pitoiset [Fri, 27 Jul 2018 09:50:03 +0000 (11:50 +0200)]
radv: reduce CB/DB meta flushes in radv_dst_access_flush()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: Fix build
Kenneth Graunke [Fri, 27 Jul 2018 06:56:30 +0000 (23:56 -0700)]
radv: Fix build

I renamed this pass and forgot to update radv.

Fixes: 488972222c6454551ab1559f753c13a493dc513f ("i965: Combine both gl_PatchVerticesIn lowering passes.")
6 years agoi965: Combine both gl_PatchVerticesIn lowering passes.
Kenneth Graunke [Wed, 18 Jul 2018 23:42:03 +0000 (16:42 -0700)]
i965: Combine both gl_PatchVerticesIn lowering passes.

Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases.  Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward.  This simplified the
passes, but made life painful for the callers.

This patch combines both into a single pass.  If you give it a non-zero
static count, it uses that.  If you give it Mesa state slots, it turns
it back into a built-in uniform.  Otherwise, it does nothing.

This also moves the i965 uniform lowering out to shared code.

v2: Make token arrays const.

Reviewed-by: Eric Anholt <eric@anholt.net>
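The three-way behaviour of the combined pass described above can be sketched as follows. This is only an illustration in Python, not the NIR pass itself: the function name, the tuple return values, and the use of None for "no state slots" are assumptions made for the example.

```python
def lower_patch_vertices(static_count, state_slots):
    """Decide how gl_PatchVerticesIn would be lowered (hypothetical model).

    static_count: statically known vertex count, 0 if unknown.
    state_slots:  Mesa state slot tokens for a built-in uniform, or None.
    """
    if static_count != 0:
        # A non-zero static count wins: lower to a constant.
        return ("constant", static_count)
    if state_slots is not None:
        # Otherwise turn it back into a built-in uniform.
        return ("uniform", tuple(state_slots))
    # Neither was provided: the pass does nothing.
    return None
```

A caller that knows the TCS output count would pass it as static_count; a caller without that knowledge would pass the state slots instead.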
6 years agoi965: Expose EXT_base_instance extension in OpenGLES 3.0
Sagar Ghuge [Wed, 25 Jul 2018 17:48:31 +0000 (10:48 -0700)]
i965: Expose EXT_base_instance extension in OpenGLES 3.0

The extension requires at least OpenGL 3.0 and
OpenGL ES 3.0.

Fixes two ext_base_instance tests:

arb_base_instance-baseinstance-doesnt-affect-gl-instance-id_gles3
arb_base_instance-drawarrays_gles3

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
6 years agoradv: Add support for ETC2 textures.
Bas Nieuwenhuizen [Thu, 4 Jan 2018 00:32:04 +0000 (01:32 +0100)]
radv: Add support for ETC2 textures.

I was surprised that this is even supported by Vega.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoclover: Reduce wait_count in abort path.
Jan Vesely [Wed, 25 Jul 2018 02:17:28 +0000 (22:17 -0400)]
clover: Reduce wait_count in abort path.

Trigger waiter condition variable.
Passes 'events' CTS on carrizo and turks.
v2: reduce to 0

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agoclover: Don't extend illegal integer types.
Jan Vesely [Sun, 22 Jul 2018 18:14:21 +0000 (14:14 -0400)]
clover: Don't extend illegal integer types.

It's OK to pass them in memory, which is what kernel invocation needs.
Fixes regressions since llvm r337535 ("Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering""):
scalar-arithmetic-char
scalar-arithmetic-uchar
scalar-arithmetic-short
scalar-arithmetic-ushort
scalar-comparison-char
scalar-comparison-uchar
scalar-comparison-short
scalar-comparison-ushort

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agointel/compiler: Delete dead VS intrinsic handling.
Kenneth Graunke [Wed, 18 Jul 2018 22:45:46 +0000 (15:45 -0700)]
intel/compiler: Delete dead VS intrinsic handling.

These are lowered by brw_nir_lower_vs_inputs().  If they weren't, we
would have already hit the unreachable() in emit_system_values_block().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agov3d: Avoid the GFXH-1461 workaround if we have only Z or only S.
Eric Anholt [Mon, 23 Jul 2018 21:08:54 +0000 (14:08 -0700)]
v3d: Avoid the GFXH-1461 workaround if we have only Z or only S.

This seems like a sensible precaution to avoid extra draws.  It doesn't
deal with the case of a Z24S8 buffer created by the window system for an
application that happens to never use S.

6 years agov3d: Rework the ordering of how we clear things.
Eric Anholt [Mon, 23 Jul 2018 20:56:40 +0000 (13:56 -0700)]
v3d: Rework the ordering of how we clear things.

First, figure out if we can just sneak the clear into the TLB clear, even
if drawing has already happened (since we have job->load and job->clear to
tell us), taking into account GFXH-1461.  For any pieces we can't TLB
clear, fall back to drawing a quad without flushing the scene.

Fixes extra scene flushes in glmark2 due to GFXH-1461.

6 years agov3d: Only store buffers that have been written to.
Eric Anholt [Mon, 23 Jul 2018 20:43:25 +0000 (13:43 -0700)]
v3d: Only store buffers that have been written to.

I've seen cases where a color buffer is bound, but only Z is written, and
we end up storing color.

6 years agov3d: Track the buffers being loaded separately.
Eric Anholt [Mon, 23 Jul 2018 20:30:58 +0000 (13:30 -0700)]
v3d: Track the buffers being loaded separately.

We were computing this at RCL generation time, but that means you can't
unflag the store for an invalidate_resource, or not flag the store if
writemasking is disabled.

6 years agov3d: Rename cleared/resolve to clear/store.
Eric Anholt [Mon, 23 Jul 2018 20:23:07 +0000 (13:23 -0700)]
v3d: Rename cleared/resolve to clear/store.

These describe what the fields mean in RCL generation.  "resolve" is left
over from VC4, and sounds like MSAA resolves (which may or may not be
involved in the store we generate).

6 years agonir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
Eric Anholt [Fri, 6 Jul 2018 20:43:06 +0000 (13:43 -0700)]
nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.

This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
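As a rough model of what flipping gl_PointCoord.y means, the sketch below mirrors the y component within the [0, 1] point sprite range when a flag is set. The function and parameter names are hypothetical and stand in for the new compiler option mentioned above; the real pass operates on NIR, not on Python tuples.

```python
def lower_point_coord(coord, flip_y):
    """Return the point coordinate, with y mirrored when flip_y is set.

    coord is an (x, y) pair in the [0, 1] range; flip_y models the
    (hypothetical here) per-driver option that enables the flip.
    """
    x, y = coord
    if flip_y:
        return (x, 1.0 - y)
    return (x, y)
```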
6 years agodocs: fix incorrect placement of the ARB_sample_locations release notes
Rhys Perry [Wed, 25 Jul 2018 12:49:36 +0000 (13:49 +0100)]
docs: fix incorrect placement of the ARB_sample_locations release notes

Seems something went wrong somehow when it was pushed.

v2: combine into one list

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoanv: drop unused local vars
Eric Engestrom [Wed, 18 Jul 2018 13:44:59 +0000 (14:44 +0100)]
anv: drop unused local vars

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: remove incorrect `UNUSED` flag
Eric Engestrom [Tue, 24 Jul 2018 08:54:33 +0000 (09:54 +0100)]
anv: remove incorrect `UNUSED` flag

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agogallium: initialize ureg_dst::Invariant bit
Erik Faye-Lund [Wed, 25 Jul 2018 15:20:34 +0000 (17:20 +0200)]
gallium: initialize ureg_dst::Invariant bit

When this bit was added, it seems that some initialization code
was omitted by mistake.

Since stack-variables have kinda random contents, and we don't
zero initialize the whole struct in these code-paths, we end up
getting random-ish values for this bit.

Spotted by Coverity in the following CIDs:
1438115
1438123
1438130

Fixes: 70425bcfe63c4e9191809659d019ec4af923595d ("gallium: plumb
invariant output attrib thru TGSI")

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: fix adjusting vertex fetches since 16bit support
Samuel Pitoiset [Wed, 25 Jul 2018 12:55:31 +0000 (14:55 +0200)]
radv: fix adjusting vertex fetches since 16bit support

Move the integer conversion after the fixup.

This fixes some regressions with
dEQP-VK.pipeline.vertex_input.single_attribute.mat4.as_a2r10g10b10*

Fixes: b722b29f10 ("radv: add support for 16bit input/output")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir: remove wrong assertion in print_var_decl()
Samuel Pitoiset [Wed, 25 Jul 2018 12:30:47 +0000 (14:30 +0200)]
nir: remove wrong assertion in print_var_decl()

This breaks printing input/output variables with more than
4 components like mat4.

Fixes: 1beef89ad8 ("nir: prepare for bumping up max components to 16")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: fix typo DSL_SEL -> DST_SEL
Marek Olšák [Thu, 26 Jul 2018 03:14:28 +0000 (23:14 -0400)]
ac: fix typo DSL_SEL -> DST_SEL

6 years agoradeonsi: update a comment about cache behavior
Marek Olšák [Thu, 26 Jul 2018 00:21:04 +0000 (20:21 -0400)]
radeonsi: update a comment about cache behavior

6 years agointel: Make the decoder just store addresses for bases, not buffers.
Kenneth Graunke [Wed, 25 Jul 2018 17:23:04 +0000 (10:23 -0700)]
intel: Make the decoder just store addresses for bases, not buffers.

The various base addresses are simply addresses.  There may or may not
be a buffer located at those addresses.  So, it doesn't make much sense
to request one.  Just save the raw address so we can add it later, when
asking about BOs at the final <base + offset> address.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agointel: Make the decoder handle STATE_BASE_ADDRESS not being a buffer.
Kenneth Graunke [Wed, 11 Jul 2018 17:50:16 +0000 (10:50 -0700)]
intel: Make the decoder handle STATE_BASE_ADDRESS not being a buffer.

Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.

I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.

To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data.  (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)

v2: Fix things caught in review by Lionel:
 - Drop bogus bind_bo.size check.
 - Drop "get the BOs again" code - we just get the BOs as needed
 - Add a message about interface descriptor data being unavailable

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
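The lookup scheme described in the two decoder commits above can be sketched like this: the decoder keeps only the raw base address across batches and asks the driver for a buffer at the final base + offset address. Everything here (class name, callback shape, the example addresses and buffer name) is an assumption for illustration, not the actual decoder interface.

```python
class BatchDecoder(object):
    """Hypothetical decoder that stores raw base addresses, not buffers."""

    def __init__(self, get_bo):
        self.get_bo = get_bo      # driver callback: address -> BO or None
        self.dynamic_base = 0     # raw address, remembered across batches

    def handle_state_base_address(self, dynamic_base):
        # Only record the address; there may be no buffer located there.
        self.dynamic_base = dynamic_base

    def lookup_state(self, offset):
        # Ask for the BO at the final <base + offset> address.
        return self.get_bo(self.dynamic_base + offset)


def make_get_bo(bos):
    """Build a lookup callback from (start, size, name) tuples."""
    def get_bo(addr):
        for start, size, name in bos:
            if start <= addr < start + size:
                return name
        return None
    return get_bo


# One buffer inside the region, none at the base address itself.
decoder = BatchDecoder(make_get_bo([(0x100002000, 0x1000, "sampler_state")]))
decoder.handle_state_base_address(0x100000000)
```

With this setup, decoder.lookup_state(0x2000) finds the buffer, while decoder.lookup_state(0) returns None because nothing is mapped at the base address.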
6 years agoanv: don't crash on vkDestroyDevice(NULL)
Eric Engestrom [Wed, 25 Jul 2018 18:43:24 +0000 (19:43 +0100)]
anv: don't crash on vkDestroyDevice(NULL)

CovID: 1438132
Fixes: a99c9e63a07477634ab73 "anv: finish the binding_table_pool on
                              destroyDevice when use_softpin"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
6 years agovulkan/wsi: fix incorrect assignment in assert()
Eric Engestrom [Wed, 25 Jul 2018 18:51:51 +0000 (19:51 +0100)]
vulkan/wsi: fix incorrect assignment in assert()

CovID: 1438113, 1438118, 1438119, 1438121
Fixes: dc1d10b396179766227df "anv,radv: Add support for VK_KHR_get_display_properties2"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoanv: fix python whitespace warning
Eric Engestrom [Wed, 18 Jul 2018 13:40:23 +0000 (14:40 +0100)]
anv: fix python whitespace warning

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agoanv: cleanup python imports
Eric Engestrom [Wed, 18 Jul 2018 13:39:36 +0000 (14:39 +0100)]
anv: cleanup python imports

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agoanv: remove unnecessary semicolons in python
Eric Engestrom [Wed, 18 Jul 2018 13:38:11 +0000 (14:38 +0100)]
anv: remove unnecessary semicolons in python

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agost/nir: Fix st_nir_opts() prototype.
Kenneth Graunke [Tue, 24 Jul 2018 18:36:06 +0000 (11:36 -0700)]
st/nir: Fix st_nir_opts() prototype.

This wasn't updated for the new scalar ISA parameter.  It worked anyway
because all the function's callers live in the same file, so it found
the correct function.  Tim made this external for the new st prog_to_nir
translator, which got reverted, but which I'd like to land eventually.

So, fix the prototype.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
6 years agointel: tools: dump: only store device id on success
Lionel Landwerlin [Mon, 23 Jul 2018 14:39:12 +0000 (15:39 +0100)]
intel: tools: dump: only store device id on success

We might fail on master node drm fd because we won't have the right
permissions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agor600: Scale integer valued texture border colors to float (v2)
Gert Wollny [Fri, 29 Jun 2018 19:44:53 +0000 (21:44 +0200)]
r600: Scale integer valued texture border colors to float (v2)

It seems the hardware always expects floating point border color values
[0,1] for unsigned, and [-1,1] for signed texture component, regardless
of pixel type, but the border colors are passed according to texture
component type. Hence, before submitting the border color, convert and
scale it to these ranges accordingly.

This doesn't seem to work for textures with 32 bit integer components
though, here, it seems that the border color is always set to zero,
regardless of the BORDER_COLOR_TYPE state set in Q_TEX_SAMPLER_WORD0_0.

v2: Simplify logic as suggested by Roland Scheidegger

Fixes:
  dEQP-GLES31.functional.texture.border_clamp.formats.compressed*
  dEQP-GLES31.functional.texture.border_clamp.formats.r* (non 32 bit integer)
  dEQP-GLES31.functional.texture.border_clamp.per_axis_wrap_mode.texture_2d*
 and a number of piglits out of
  piglit run gpu -t texture -t gather -t formats

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
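A minimal sketch of the scaling idea from the commit above, assuming the integer border color is normalized by the largest representable value of the component and then clamped to the range the hardware expects. The divisor choice and the function name are assumptions for illustration; the commit message does not spell out the exact formula.

```python
def scale_border_color(value, bits, signed):
    """Map an integer border color component to a float.

    Unsigned components end up in [0, 1], signed ones in [-1, 1].
    Assumes normalization by the maximum positive value for `bits`.
    """
    if signed:
        max_value = (1 << (bits - 1)) - 1
        low = -1.0
    else:
        max_value = (1 << bits) - 1
        low = 0.0
    scaled = value / float(max_value)
    # Clamp so that e.g. -128 on an 8-bit signed component maps to -1.0.
    return max(low, min(1.0, scaled))
```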
6 years agonir: Add a couple of iand/ior optimizations
Jason Ekstrand [Mon, 23 Jul 2018 07:35:02 +0000 (00:35 -0700)]
nir: Add a couple of iand/ior optimizations

Spotted in a shader in Batman: Arkham City.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoi965, anv: Use INTEL_DEBUG for disk_cache driver flags
Jordan Justen [Sat, 21 Jul 2018 06:52:59 +0000 (23:52 -0700)]
i965, anv: Use INTEL_DEBUG for disk_cache driver flags

Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.

An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.

The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.

v2:
 * Use driver_flags (Tim)
 * Also update Anvil (Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
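The masking step described above can be modelled as below. DEBUG_DISK_CACHE_MASK is the name used in the commit message; the individual flag names and bit positions are placeholders chosen for this example and are not taken from the driver headers.

```python
# Hypothetical bit assignments, for illustration only.
DEBUG_NO_COMPACTION = 1 << 0   # models INTEL_DEBUG=nocompact
DEBUG_PERF = 1 << 1            # models a flag that does not change codegen

# Only flags that can affect generated programs belong in the mask.
DEBUG_DISK_CACHE_MASK = DEBUG_NO_COMPACTION


def disk_cache_driver_flags(intel_debug):
    """Return the part of INTEL_DEBUG that must feed the cache key."""
    return intel_debug & DEBUG_DISK_CACHE_MASK
```

Two runs that differ only in flags outside the mask get the same driver flags and can share cache entries; toggling a masked flag yields a different value and therefore a different cache key.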
6 years agoi965, anv: Add extra unused character in disk_cache renderer temp string
Jordan Justen [Sat, 21 Jul 2018 06:41:23 +0000 (23:41 -0700)]
i965, anv: Add extra unused character in disk_cache renderer temp string

This extra character should not be used by snprintf, but we make it
available to verify that we printed the exact number we wanted, and
didn't overflow.

v2:
 * Also update Anvil

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agomesa: allow indirect draws with the default VAO and compatibility profile
Marek Olšák [Tue, 24 Jul 2018 04:11:47 +0000 (00:11 -0400)]
mesa: allow indirect draws with the default VAO and compatibility profile

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomesa: Fix copy-paste error in ConservativeRasterDilateRange initialization
Danylo Piliaiev [Wed, 18 Jul 2018 08:58:04 +0000 (11:58 +0300)]
mesa: Fix copy-paste error in ConservativeRasterDilateRange initialization

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 4580617509d ("mesa: add support for nvidia conservative
rasterization extensions")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agonir/serialize: Alloc constants off the variable
Jason Ekstrand [Tue, 24 Jul 2018 18:01:20 +0000 (11:01 -0700)]
nir/serialize: Alloc constants off the variable

nir_sweep assumes that constants are always allocated off the variable
to which they belong.  Violating this assumption causes them to get
freed early and leads to use-after-free bugs.

Fixes: 120da00975541 "nir: add serialization and deserialization"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107366
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
6 years agonir: rename f2f16_undef to f2f16
Karol Herbst [Thu, 26 Apr 2018 19:06:08 +0000 (21:06 +0200)]
nir: rename f2f16_undef to f2f16

We need rounding modes on other conversions involving floats, and it is easier
to rename f2f16_undef than to rename all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
6 years agonir: add builtin builder
Karol Herbst [Wed, 25 Apr 2018 16:19:23 +0000 (18:19 +0200)]
nir: add builtin builder

Also move over some of the GLSL builtins that we will need for implementing
some OpenCL builtins.

v2: replace NIR_IMM_FP by nir_imm_floatN_t in ported code
    fix up changes caused by swizzle rework

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
6 years agonir/spirv: import OpenCL.std.h
Rob Clark [Mon, 22 Jan 2018 16:05:07 +0000 (11:05 -0500)]
nir/spirv: import OpenCL.std.h

Lightly edited to be valid 'C' code.

Is there a bug open to fix this upstream?

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
6 years agoradeonsi: handle SI_FORCE_FAMILY early
Marek Olšák [Fri, 20 Jul 2018 02:44:27 +0000 (22:44 -0400)]
radeonsi: handle SI_FORCE_FAMILY early

before LLVM target machines are created

6 years agopython: Use range() instead of xrange()
Mathieu Bridon [Fri, 6 Jul 2018 10:22:18 +0000 (12:22 +0200)]
python: Use range() instead of xrange()

Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.

Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().

As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
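For illustration, a loop written with range() as described above behaves the same on both interpreters. The helper name and values are made up for this sketch and do not come from the Mesa scripts.

```python
def slot_offsets(count, stride):
    """Return byte offsets for `count` slots spaced `stride` bytes apart.

    range() is a list on Python 2 and a lazy sequence on Python 3;
    iterating over it works identically on both.
    """
    return [i * stride for i in range(count)]


offsets = slot_offsets(4, 16)
```

On Python 2 this builds a temporary list that xrange() would have avoided, which is the price paid for a single spelling that runs everywhere.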
6 years agopython: Better use iterators
Mathieu Bridon [Thu, 5 Jul 2018 13:17:39 +0000 (15:17 +0200)]
python: Better use iterators

In Python 2, iterators had a .next() method.

In Python 3, instead they have a .__next__() method, which is
automatically called by the next() builtin.

In addition, it is better to use the iter() builtin to create an
iterator, rather than calling its __iter__() method.

These were also introduced in Python 2.6, so using them makes the script
compatible with Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
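A small sketch of the iterator idiom described above, using only the iter() and next() builtins. The helper is hypothetical and only demonstrates the spelling that works on Python 2.6+ and Python 3.

```python
def first_or_default(values, default=None):
    """Return the first item of an iterable, or `default` if it is empty."""
    it = iter(values)          # rather than values.__iter__()
    return next(it, default)   # rather than it.next()
```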
6 years agopython: Better sort dictionary keys/values
Mathieu Bridon [Fri, 6 Jul 2018 10:17:50 +0000 (12:17 +0200)]
python: Better sort dictionary keys/values

In Python 2, dict.keys() and dict.values() both return a list, which can
be sorted in two ways:

* l.sort() modifies the list in-place;
* sorted(l) returns a new, sorted list;

In Python 3, dict.keys() and dict.values() do not return lists any more,
but view objects. Views do not have a .sort() method.

This commit moves the build scripts to using sorted() on dict keys and
values, which makes them compatible with both Python 2 and Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
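The sorted() form described above can be sketched as follows. The dictionary contents are sample data for the example, not taken from the build scripts.

```python
# Sample data for illustration.
enums = {"GL_TRIANGLES": 4, "GL_POINTS": 0, "GL_LINES": 1}

# sorted() accepts any iterable and always returns a new list, so this
# works whether keys()/values() return lists (Python 2) or views (Python 3).
names = sorted(enums.keys())
values = sorted(enums.values())
```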
6 years agopython: Better iterate over dictionaries
Mathieu Bridon [Fri, 6 Jul 2018 10:20:26 +0000 (12:20 +0200)]
python: Better iterate over dictionaries

In Python 2, dictionaries have 2 sets of methods to iterate over their
keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems().

The former return lists while the latter return iterators.

Python 3 dropped the methods which return lists, and renamed the methods
returning iterators to keys()/values()/items().

Using those names makes the scripts compatible with both Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
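As a sketch of the portable spelling described above, the helper below iterates with items() instead of iteritems(). The function name and output format are invented for the example.

```python
def format_table(table):
    """Render a name -> integer mapping as sorted 'name = value' lines."""
    # items() exists on both Python 2 (list) and Python 3 (view).
    return ["%s = %d" % (name, value) for name, value in sorted(table.items())]
```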
6 years agopython: Stop using the string module
Mathieu Bridon [Thu, 5 Jul 2018 13:17:36 +0000 (15:17 +0200)]
python: Stop using the string module

Most functions in the builtin string module also exist as methods of
string objects.

Since the functions were removed from the string module in Python 3,
using the instance methods directly makes the code compatible with both
Python 2 and Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
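A minimal sketch of the change described above: the Python 2 only calls string.join() and string.upper() are replaced by the equivalent str methods. The helper name is hypothetical.

```python
def c_macro_name(words):
    """Join words into an upper-case identifier, e.g. for a C macro."""
    # Python 2 only: string.upper(string.join(words, "_"))
    return "_".join(words).upper()
```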
6 years agopython: Better check for keys in dicts
Mathieu Bridon [Thu, 5 Jul 2018 13:17:35 +0000 (15:17 +0200)]
python: Better check for keys in dicts

Python 3 lost the dict.has_key() method. Instead it requires using the
"in" operator.

This is also compatible with Python 2.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
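The membership test described above, shown on a hypothetical alias lookup (the function and data are illustrative only):

```python
def resolve_alias(aliases, name):
    """Return the aliased name if one is registered, else the name itself."""
    if name in aliases:        # rather than aliases.has_key(name)
        return aliases[name]
    return name
```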
6 years agointel: Make the disassembler take a const pointer to the assembly.
Kenneth Graunke [Wed, 11 Jul 2018 17:30:12 +0000 (10:30 -0700)]
intel: Make the disassembler take a const pointer to the assembly.

Disassembling doesn't modify the assembly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agotravis: manually generate sys/syscall.h
Andres Gomez [Thu, 19 Jul 2018 12:33:33 +0000 (15:33 +0300)]
travis: manually generate sys/syscall.h

Until now, the needed bits were wrongly included in linux/memfd.h

Since Travis' sys/syscall.h doesn't provide the SYS_memfd_create, we
generate that header manually, including the needed bits to avoid
compilation problems, such as the ones observed after:
3228335b55c ("intel: aubinator: handle GGTT mappings")

v2: replace fixes commit with the first direct user of
    syscall.h (Emil).

Fixes: 3228335b55c ("intel: aubinator: handle GGTT mappings")
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
6 years agodocs: update calendar to match the 18.2 plan with the one announced
Andres Gomez [Thu, 19 Jul 2018 13:03:11 +0000 (16:03 +0300)]
docs: update calendar to match the 18.2 plan with the one announced

Additionally, I've extended the 18.1 cycle by one more release,
tentatively assigned to Dylan, due to the ~2 weeks delay for 18.2.

Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Juan A. Suarez <jasuarez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: move releases from Fridays to Wednesdays
Andres Gomez [Thu, 19 Jul 2018 13:00:07 +0000 (16:00 +0300)]
docs: move releases from Fridays to Wednesdays

As discussed at:
https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html

Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Carl Worth <cworth@cworth.org>
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: correct typo in the submitting patches instructions
Andres Gomez [Thu, 19 Jul 2018 13:02:19 +0000 (16:02 +0300)]
docs: correct typo in the submitting patches instructions

Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: Still enable inmemory & API level caching if disk cache is not enabled.
Bas Nieuwenhuizen [Tue, 24 Jul 2018 12:57:42 +0000 (14:57 +0200)]
radv: Still enable inmemory & API level caching if disk cache is not enabled.

That we don't have a background disk cache does not mean we should
prevent the app caching anything.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agogallium/tests: Don't ignore S3TC errors.
Jose Fonseca [Tue, 24 Jul 2018 12:57:05 +0000 (13:57 +0100)]
gallium/tests: Don't ignore S3TC errors.

Now that we do full S3TC decompression, they should no longer fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agoegl: Fix missing clamping in eglSetDamageRegionKHR
Harish Krupo [Sun, 8 Jul 2018 07:23:00 +0000 (12:53 +0530)]
egl: Fix missing clamping in eglSetDamageRegionKHR

Clamp the x and y co-ordinates of the rectangles.

v2: Clamp width/height after converting to co-ordinates
    (Ilia Mirkin)

Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
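The clamping order noted in v2 above can be sketched as follows: convert each (x, y, width, height) rectangle to corner coordinates first, clamp the corners to the surface, and only then derive the clamped width and height. The function name and tuple layout are assumptions for this example and do not mirror the EGL implementation.

```python
def clamp_damage_rect(rect, surf_width, surf_height):
    """Clamp an (x, y, width, height) rectangle to the surface bounds."""
    x, y, width, height = rect

    def clamp(value, low, high):
        return max(low, min(high, value))

    # Convert to corner coordinates before clamping.
    x1 = clamp(x, 0, surf_width)
    y1 = clamp(y, 0, surf_height)
    x2 = clamp(x + width, 0, surf_width)
    y2 = clamp(y + height, 0, surf_height)

    # Width and height are derived from the clamped corners.
    return (x1, y1, x2 - x1, y2 - y1)
```

Clamping width and height directly, before the conversion, would leave a rectangle that starts at a negative x too wide once its origin is moved to 0.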
6 years agoforward precise-flag if supported
Erik Faye-Lund [Wed, 11 Jul 2018 14:28:30 +0000 (15:28 +0100)]
forward precise-flag if supported

New versions of virglrenderer support the precise-flag, so let's
forward it from TGSI if that's the case.

This fixes a few dEQP-GLES31 tests:
- dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradeonsi: fix pk2h breakage
Marek Olšák [Tue, 24 Jul 2018 02:11:12 +0000 (22:11 -0400)]
radeonsi: fix pk2h breakage

6 years agoradeonsi: reduce LDS stalls by 40% for tessellation
Marek Olšák [Fri, 13 Jul 2018 04:23:36 +0000 (00:23 -0400)]
radeonsi: reduce LDS stalls by 40% for tessellation

40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.

This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.

6 years agoradeonsi: Add debug option to enable LLVM GlobalISel (v2)
Tom Stellard [Fri, 20 Jul 2018 17:54:56 +0000 (19:54 +0200)]
radeonsi: Add debug option to enable LLVM GlobalISel (v2)

R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.

v2: mareko: move the helper to src/amd/common

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
6 years agointel/compiler: Account for built-in uniforms in analyze_ubo_ranges
Jason Ekstrand [Mon, 23 Jul 2018 16:41:26 +0000 (09:41 -0700)]
intel/compiler: Account for built-in uniforms in analyze_ubo_ranges

The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images, which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.  This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.

Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
6 years agoradv: enable VK_KHR_16bit_storage extension / 16bit storage features
Daniel Schürmann [Tue, 15 May 2018 15:10:12 +0000 (17:10 +0200)]
radv: enable VK_KHR_16bit_storage extension / 16bit storage features

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit load_push_constant
Daniel Schürmann [Mon, 16 Jul 2018 18:45:24 +0000 (20:45 +0200)]
ac: add support for 16bit load_push_constant

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add support for 16bit input/output
Daniel Schürmann [Tue, 15 May 2018 15:09:03 +0000 (17:09 +0200)]
radv: add support for 16bit input/output

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir: add 16bit type information to glsl types
Daniel Schürmann [Tue, 6 Feb 2018 17:53:33 +0000 (18:53 +0100)]
nir: add 16bit type information to glsl types

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit buffer loads
Daniel Schürmann [Tue, 15 May 2018 14:01:25 +0000 (16:01 +0200)]
ac: add support for 16bit buffer loads

v2: Fixed dvec3 loads (bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit UBO loads
Daniel Schürmann [Wed, 7 Feb 2018 18:40:43 +0000 (19:40 +0100)]
ac: add support for 16bit UBO loads

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit ssbo stores
Daniel Schürmann [Tue, 15 May 2018 09:27:25 +0000 (11:27 +0200)]
ac: add support for 16bit ssbo stores

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add 16bit conversion operations
Daniel Schürmann [Sat, 3 Feb 2018 13:37:26 +0000 (14:37 +0100)]
ac: add 16bit conversion operations

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agor600: enable tess_input_info for TES
Dave Airlie [Thu, 19 Jul 2018 04:39:15 +0000 (05:39 +0100)]
r600: enable tess_input_info for TES

There might be a nicer way to do this, but this is at least correct.

This fixes:
KHR-GL44.tessellation_shader.single.max_patch_vertices
KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
6 years agodocs/features: fix virgl gles3.1 entries
Dave Airlie [Mon, 23 Jul 2018 20:10:06 +0000 (06:10 +1000)]
docs/features: fix virgl gles3.1 entries

6 years agodraw: force draw pipeline if there's more than 65535 vertices
Roland Scheidegger [Sat, 21 Jul 2018 23:05:39 +0000 (01:05 +0200)]
draw: force draw pipeline if there's more than 65535 vertices

The pt emit path can only handle 65535 vertices: the vertex count is
truncated to a ushort, which results in a too-small buffer allocation
and a crash.

Forcing the pipeline path looks suboptimal, but then again this bug has
probably been there ever since GS has been supported, so it does not
seem to happen often. (Note that the vertex_id in the vertex header is
16 bit too; however, it is only used by the draw pipeline, where it
denotes the emit vertex nr, and that uses vbuf code, which only emits
smaller chunks, so it should be fine I think.)

Other solutions would be to simply allow 32bit counts for vertex
allocation, although 65535 is already larger than this was intended for
(the idea being it should be more cache friendly), or to teach the pt
emit path to split the emit into smaller chunks (only the non-index
path can be affected, since gs output is always linear), but that is a
bit tricky (we don't know the primitive boundaries up-front).
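
The truncation described above can be illustrated with a small sketch.
The constant and function names here are hypothetical and only model
the 16-bit limit; they are not the draw module's code:

```cpp
#include <cassert>
#include <stdint.h>

// Assumed limit of a 16-bit vertex count.
const unsigned MAX_EMIT_VERTICES = 65535;

// What survives when a larger count is stored in a 16-bit field:
// only the low 16 bits, so the buffer sized from it is too small.
uint16_t truncate_to_ushort(unsigned vertex_count)
{
    return static_cast<uint16_t>(vertex_count);
}

// Hypothetical check: true when the count does not fit in 16 bits and
// a path without that limit has to be used instead.
bool needs_pipeline_path(unsigned vertex_count)
{
    return vertex_count > MAX_EMIT_VERTICES;
}
```

For example, a count of 70000 would be stored as 4464, far fewer
vertices than are actually written.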

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agodocs/features: note ARB_copy_image is working on virgl
Dave Airlie [Mon, 23 Jul 2018 20:05:50 +0000 (06:05 +1000)]
docs/features: note ARB_copy_image is working on virgl

6 years agoRevert "virgl: remove unused stride-arguments"
Dave Airlie [Mon, 23 Jul 2018 20:03:03 +0000 (06:03 +1000)]
Revert "virgl: remove unused stride-arguments"

This reverts commit dc938b8398c0dafb60507e41685f7518b681c24d.

This adds warnings in vtest, and possibly breaks it.

6 years agodocs/features: note ssbo and atomic counters done for virgl
Dave Airlie [Wed, 18 Jul 2018 02:36:04 +0000 (12:36 +1000)]
docs/features: note ssbo and atomic counters done for virgl

6 years agovirgl: add initial shader_storage_buffer_object support. (v2)
Dave Airlie [Tue, 17 Jul 2018 07:24:29 +0000 (17:24 +1000)]
virgl: add initial shader_storage_buffer_object support. (v2)

This adds the guest side support for ARB_shader_storage_buffer_object.

Co-authors: Gurchetan Singh <gurchetansingh@chromium.org>

v2: move to using separate maximums
(fixup macros)

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agonir: Add a couple trivial abs optimizations
Jason Ekstrand [Mon, 23 Jul 2018 06:57:07 +0000 (23:57 -0700)]
nir: Add a couple trivial abs optimizations

Spotted in a shader in Batman: Arkham City.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoglsl: remove delegating constructors to allow build with C++98
Caio Marcelo de Oliveira Filho [Fri, 20 Jul 2018 20:21:33 +0000 (13:21 -0700)]
glsl: remove delegating constructors to allow build with C++98

Delegating constructors are a C++11 feature, so this was breaking when
compiling with C++98. Change the copy_propagation_state() calls that
used the convenience constructor to use a static member function
instead.

Since copy_propagation_state is expected to be heap allocated, this
change is a good fit.
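
A rough sketch of the pattern, assuming a heap-allocated class with one
real constructor. The class below is a hypothetical stand-in and does
not reproduce copy_propagation_state or its members:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap-allocated state object; this is not
// Mesa's copy_propagation_state, only the same shape of problem.
class state {
public:
    // With C++11 one could add a convenience constructor that delegates:
    //     state() : state(NULL) {}
    // C++98 has no delegating constructors, so a static member function
    // forwards to the one real constructor instead.
    static state *create()
    {
        return new state(NULL);
    }

    // Create a state that remembers a parent to fall back to.
    static state *create_child(state *parent)
    {
        assert(parent != NULL);
        return new state(parent);
    }

    state *parent() const { return parent_; }

private:
    explicit state(state *parent) : parent_(parent) {}

    state *parent_;
};
```

Because the constructor is private, callers can only obtain instances
from the static functions, which suits objects that always live on the
heap.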

Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305

6 years agov3d: Implement a small immediates optimization, based on VC4's.
Eric Anholt [Fri, 20 Jul 2018 21:27:09 +0000 (14:27 -0700)]
v3d: Implement a small immediates optimization, based on VC4's.

We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
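
For reference, the percentages above are plain relative changes of the
instruction counts. A small sketch, with percent_change being a
hypothetical helper rather than part of shader-db:

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent.
// A negative result means the count went down.
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

With the totals quoted above, percent_change(90768, 88220) rounds to
-2.81 and percent_change(82711, 80163) rounds to -3.08.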

6 years agov3d: Return an invalid src number if asked for a missing implicit uniform.
Eric Anholt [Fri, 20 Jul 2018 21:06:57 +0000 (14:06 -0700)]
v3d: Return an invalid src number if asked for a missing implicit uniform.

Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.

6 years agov3d: Skip emitting texture config parameter 2 if it's just the defaults.
Eric Anholt [Fri, 20 Jul 2018 20:31:49 +0000 (13:31 -0700)]
v3d: Skip emitting texture config parameter 2 if it's just the defaults.

shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)

6 years agov3d: Update an XXX comment for a path we handled in HW on V3D 4.x.
Eric Anholt [Fri, 20 Jul 2018 20:24:53 +0000 (13:24 -0700)]
v3d: Update an XXX comment for a path we handled in HW on V3D 4.x.

6 years agov3d: Switch to using the new SFU instructions on V3D 4.x.
Eric Anholt [Fri, 20 Jul 2018 20:06:50 +0000 (13:06 -0700)]
v3d: Switch to using the new SFU instructions on V3D 4.x.

These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)

6 years agov3d: Add QPU pack/unpack for the new SFU instructions.
Eric Anholt [Fri, 20 Jul 2018 19:19:36 +0000 (12:19 -0700)]
v3d: Add QPU pack/unpack for the new SFU instructions.

These instructions allow writing the result to any register, instead of a
special writeback to r4.

6 years agov3d: Fix the name of the "flpop" operation.
Eric Anholt [Fri, 20 Jul 2018 19:43:37 +0000 (12:43 -0700)]
v3d: Fix the name of the "flpop" operation.

Noticed while trying to sort a new op into the appropriate place to match
the documentation.

6 years agov3d: Print the instruction we're testing in the QPU disasm/pack round-trip.
Eric Anholt [Fri, 20 Jul 2018 19:29:39 +0000 (12:29 -0700)]
v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.

If we fail initial disassembly, it's good to know what instruction it was
that failed.

6 years agov3d: Drop unused vir_SAT() operation.
Eric Anholt [Fri, 20 Jul 2018 19:10:08 +0000 (12:10 -0700)]
v3d: Drop unused vir_SAT() operation.

We lower saturates in NIR.

6 years agov3d: Rotate through registers to improve post-RA scheduling options.
Eric Anholt [Fri, 20 Jul 2018 19:05:57 +0000 (12:05 -0700)]
v3d: Rotate through registers to improve post-RA scheduling options.

Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
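
The idea of rotating through registers can be sketched as a round-robin
search that starts after the last register handed out. This is a
simplified model with hypothetical names (reg_picker, pick); it ignores
register classes and the phys/accumulator distinction mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical round-robin picker over a flat set of registers.
struct reg_picker {
    std::size_t next; // index where the next search starts

    reg_picker() : next(0) {}

    // Returns the index of the chosen free register, or -1 if none is
    // free. The search starts after the previously chosen register, so
    // a register that was just freed is not reused immediately.
    int pick(const std::vector<bool> &is_free)
    {
        const std::size_t n = is_free.size();
        for (std::size_t i = 0; i < n; i++) {
            const std::size_t r = (next + i) % n;
            if (is_free[r]) {
                next = (r + 1) % n;
                return static_cast<int>(r);
            }
        }
        return -1;
    }
};
```

A picker that always started at index 0 would keep choosing r0 as soon
as it became free, which creates a dependency on the previous user of
r0 and leaves the scheduler less room to move later writes earlier.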

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)

6 years agov3d: Allow reading from physical regs written in the previous instruction.
Eric Anholt [Fri, 20 Jul 2018 18:53:25 +0000 (11:53 -0700)]
v3d: Allow reading from physical regs written in the previous instruction.

This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)