git.libre-soc.org Git - mesa.git/log

Ben Crocker [Mon, 27 Nov 2017 19:44:58 +0000 (14:44 -0500)]

docs/llvmpipe: document ppc64le as alternative architecture to x86.

Power8, Power8NV, and Power9 are supported on an equal footing
with X86.

Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: changed formatting, reworded a bit (with Ben's ack)]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>

Emil Velikov [Fri, 8 Dec 2017 13:59:27 +0000 (13:59 +0000)]

docs/release-calendar: drop 17.3.0 from the table

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Emil Velikov [Fri, 8 Dec 2017 13:56:01 +0000 (13:56 +0000)]

docs: add news item and link release notes for 17.3.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Emil Velikov [Fri, 8 Dec 2017 13:53:30 +0000 (13:53 +0000)]

docs: add sha256 checksums for 17.3.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 49a612d1580b3316392273a069d20d93967126a8)

Emil Velikov [Fri, 8 Dec 2017 13:47:33 +0000 (13:47 +0000)]

docs: Update 17.3.0 release notes

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8d55da9f579463038f4305ed7d505aa7fffa0f37)

Samuel Pitoiset [Fri, 1 Dec 2017 15:15:40 +0000 (16:15 +0100)]

radv: do not print ASM to stderr when dumping shaders

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 6 Dec 2017 11:06:43 +0000 (12:06 +0100)]

radv/winsys: implement query_value()

Might be useful to know the VRAM/GTT usage, the number of VRAM
CPU page faults, etc. Nothing is currently using that new
interface, but it's a first step.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 6 Dec 2017 16:49:37 +0000 (17:49 +0100)]

radv: remove useless check radv_set_dcc_need_cmask_elim_pred()

emit_fast_color_clear() already checks that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 6 Dec 2017 16:49:36 +0000 (17:49 +0100)]

radv: remove useless checks in radv_set_{color,depth}_clear_regs()

Already checked by the respective callers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 6 Dec 2017 16:49:20 +0000 (17:49 +0100)]

radv: only re-emit the index type when it changes

dota2 binds a ton of index buffers but the type is always 16-bit.
Note that we have to invalidate the type when switching from
indexed draws to normal draws.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
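
The caching described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the RADV code: `draw_state`, `draw_indexed()`, `draw_non_indexed()` and the `INDEX_TYPE_*` values are hypothetical names, and the counter stands in for the real packet emission.

```c
/* Hypothetical names for illustration; these are not the RADV structures. */
#define INDEX_TYPE_INVALID (-1)
#define INDEX_TYPE_UINT16  0
#define INDEX_TYPE_UINT32  1

struct draw_state {
    int last_index_type;  /* last type written to the command stream */
    unsigned emit_count;  /* how many times the type was actually emitted */
};

void draw_state_init(struct draw_state *s)
{
    s->last_index_type = INDEX_TYPE_INVALID;
    s->emit_count = 0;
}

/* Indexed draw: emit the index type only if it differs from the cached one. */
void draw_indexed(struct draw_state *s, int index_type)
{
    if (s->last_index_type != index_type) {
        s->emit_count++;  /* stands in for the real packet emission */
        s->last_index_type = index_type;
    }
}

/* Non-indexed draw: invalidate the cache so the next indexed draw re-emits. */
void draw_non_indexed(struct draw_state *s)
{
    s->last_index_type = INDEX_TYPE_INVALID;
}
```

With this shape, a run of indexed draws that all use 16-bit indices emits the type once, and a non-indexed draw in between forces one more emission.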

Samuel Pitoiset [Wed, 6 Dec 2017 16:48:41 +0000 (17:48 +0100)]

radv: only reset command buffers that are not in the initial state

dota2 always calls vkResetCommandBuffer() before
vkBeginCommandBuffer() which is quite useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
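
A rough sketch of skipping redundant resets by tracking a per-buffer status. All names here (`cmd_buffer`, `cmd_status`, `cmd_reset_if_needed()`) are hypothetical and the set of states is an assumption; only the idea of resetting when the buffer is not in its initial state comes from the commit above.

```c
/* Hypothetical status values; the real driver may track more states. */
enum cmd_status {
    CMD_STATUS_INITIAL,
    CMD_STATUS_RECORDING,
    CMD_STATUS_EXECUTABLE
};

struct cmd_buffer {
    enum cmd_status status;
    unsigned reset_count;  /* number of real (expensive) resets performed */
};

/* The expensive path: release resources and return to the initial state. */
static void cmd_do_reset(struct cmd_buffer *cb)
{
    cb->reset_count++;
    cb->status = CMD_STATUS_INITIAL;
}

/* Reset requested by the application: a no-op for a fresh buffer. */
void cmd_reset_if_needed(struct cmd_buffer *cb)
{
    if (cb->status != CMD_STATUS_INITIAL)
        cmd_do_reset(cb);
}

void cmd_begin(struct cmd_buffer *cb)
{
    cmd_reset_if_needed(cb);  /* beginning a buffer implies a reset */
    cb->status = CMD_STATUS_RECORDING;
}

void cmd_end(struct cmd_buffer *cb)
{
    cb->status = CMD_STATUS_EXECUTABLE;
}
```

An application that calls reset and then begin on every frame pays for at most one real reset per recording in this model, instead of two.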

Samuel Pitoiset [Wed, 6 Dec 2017 16:48:40 +0000 (17:48 +0100)]

radv: track different status of a command buffer

RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think
it makes sense to declare it. Could be used later with better
command buffer error handling.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Thu, 7 Dec 2017 10:39:46 +0000 (11:39 +0100)]

radv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega

Copied from RadeonSI.

This fixes all CTS
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.clear.*

And some other ones which use the same format.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Jordan Justen [Mon, 20 Nov 2017 21:42:33 +0000 (13:42 -0800)]

docs: Update GL_ARB_get_program_binary docs to support 1 format

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>

Jordan Justen [Sat, 4 Nov 2017 23:53:15 +0000 (16:53 -0700)]

i965: Add ARB_get_program_binary support using nir_serialization

This resolves an apparent game bug described in 85564. The game
doesn't properly handle ARB_get_program_binary with 0 supported
formats.

V2 (Timothy Arceri):
- less driver code as more has been moved into the common helpers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Jordan Justen [Tue, 7 Nov 2017 10:11:28 +0000 (02:11 -0800)]

main: Clear shader program data whenever ProgramBinary is called

The GL_ARB_get_program_binary extension spec says:

"If ProgramBinary fails to load a binary, no error is generated, but
  any information about a previous link or load of that program object
  is lost."

v2:
* Re-initialize shProg->data after clear. (Jordan)
   (Required after 6a72eba755fea15a0d97abb913a6315d9d32e274)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
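
The spec behaviour quoted above (a failed load raises no error but discards earlier link state) could look roughly like this. It is a sketch under stated assumptions: `struct program`, `program_binary()` and the one-byte `BLOB_MAGIC` check are hypothetical stand-ins, not Mesa's data structures or binary format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_MAGIC 0x4d  /* hypothetical one-byte header check */

struct program {
    bool link_ok;
    unsigned num_uniforms;
};

void program_binary(struct program *prog, const uint8_t *blob, size_t len)
{
    /* Drop the results of any earlier link or load before trying the blob. */
    memset(prog, 0, sizeof(*prog));

    if (len < 2 || blob[0] != BLOB_MAGIC)
        return;  /* load failed: no error raised, program left unlinked */

    prog->num_uniforms = blob[1];
    prog->link_ok = true;
}
```

Clearing first, rather than on the failure path only, keeps the two outcomes symmetric: after the call the program reflects either the new binary or nothing.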

Jordan Justen [Sat, 4 Nov 2017 23:47:54 +0000 (16:47 -0700)]

main: add binary support to ProgramBinary

V2: call generic _mesa_program_binary() helper rather than driver
function directly to allow greater code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Jordan Justen [Sat, 4 Nov 2017 23:47:25 +0000 (16:47 -0700)]

main: add binary support to GetProgramBinary

V2: call generic _mesa_get_program_binary() helper rather than driver
function directly to allow greater code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Jordan Justen [Sat, 4 Nov 2017 23:43:21 +0000 (16:43 -0700)]

main: Support getting GL_PROGRAM_BINARY_LENGTH

V2: call generic _mesa_get_program_binary_length() helper
rather than driver function directly to allow greater
code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Jordan Justen [Sat, 4 Nov 2017 23:52:14 +0000 (16:52 -0700)]

mesa: Add Mesa ARB_get_program_binary helper functions

V2 (Timothy Arceri):
- add extra code comment
- stop passing around void *binary and just pass
program_binary_header *hdr instead.
- move to src/mesa/main rather than src/util

V3 (Timothy Arceri):
- Move more code out of the backend and into the common
helpers.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Timothy Arceri [Tue, 28 Nov 2017 03:27:51 +0000 (14:27 +1100)]

mesa: add driver callbacks for serialising ProgramBinary blobs

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

Jordan Justen [Tue, 7 Nov 2017 08:21:33 +0000 (00:21 -0800)]

main: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS

Mesa supports either 0 or 1 formats. If 1 format is supported, it is
GL_PROGRAM_BINARY_FORMAT_MESA as defined in the
GL_MESA_program_binary_formats extension spec.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Jordan Justen [Sat, 4 Nov 2017 23:39:08 +0000 (16:39 -0700)]

main: Allow non-zero NUM_PROGRAM_BINARY_FORMATS

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Jordan Justen [Sat, 4 Nov 2017 00:18:32 +0000 (17:18 -0700)]

i965: Fix memory leak when serializing nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Jordan Justen [Fri, 3 Nov 2017 23:57:42 +0000 (16:57 -0700)]

i965: Add brw_program_serialize_nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Jordan Justen [Fri, 3 Nov 2017 23:45:46 +0000 (16:45 -0700)]

i965: Free serialized nir after deserializing

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Jordan Justen [Fri, 3 Nov 2017 23:40:17 +0000 (16:40 -0700)]

i965: Add brw_program_deserialize_nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Jordan Justen [Mon, 30 Oct 2017 18:16:48 +0000 (11:16 -0700)]

main, glsl: Add UniformDataDefaults which stores uniform defaults

The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.

This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.

V2 (Timothy Arceri):
- Store UniformDataDefaults only when serializing GLSL as this
   is what we want for both disk cache and ARB_get_program_binary.
   This saves us having to come back later and reset the Uniforms
   on program binary restores.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
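
The save-and-restore idea from the first paragraphs of this message can be sketched as below. The struct layout, the fixed `NUM_SLOTS` and both function names are hypothetical; the sketch follows the v1 description (snapshot after linking) and does not model the V2 change that stores the defaults only when serializing.

```c
#include <string.h>

#define NUM_SLOTS 4  /* hypothetical fixed slot count */

struct program_uniforms {
    float slots[NUM_SLOTS];     /* live values, changed by the application */
    float defaults[NUM_SLOTS];  /* snapshot of the post-link initial values */
};

/* Called once, right after linking, while the slots still hold the
 * initial values. */
void save_uniform_defaults(struct program_uniforms *u)
{
    memcpy(u->defaults, u->slots, sizeof(u->defaults));
}

/* Called when a program binary is restored: undo any later updates. */
void restore_uniform_defaults(struct program_uniforms *u)
{
    memcpy(u->slots, u->defaults, sizeof(u->slots));
}
```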

Jordan Justen [Fri, 27 Oct 2017 08:04:53 +0000 (01:04 -0700)]

glsl: Split out shader program serialization

This will allow us to use the program serialization to implement
ARB_get_program_binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Jordan Justen [Tue, 7 Nov 2017 08:16:47 +0000 (00:16 -0800)]

include: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files

This was merged into the OpenGL Registry in version
667c5a253781834b40a6ae9eb19d05af4542cfe1.

Ref: https://github.com/KhronosGroup/OpenGL-Registry/pull/127
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Jordan Justen [Tue, 28 Nov 2017 00:15:07 +0000 (11:15 +1100)]

mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Francisco Jerez [Sat, 14 Oct 2017 00:52:00 +0000 (17:52 -0700)]

intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.

This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly. In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.

This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads. This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.

No changes in shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Mon, 23 Oct 2017 20:47:10 +0000 (13:47 -0700)]

intel/fs: Don't let undefined values prevent copy propagation.

This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch.  Helps a few
programs, and avoids a handful of shader-db regressions from the next
patch.

shader-db results on ILK:

  total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
  instructions in affected programs: 360 -> 336 (-6.67%)
  helped: 8
  HURT: 0

  LOST:   0
  GAINED: 10

shader-db results on BDW:

  total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
  instructions in affected programs: 7730 -> 7289 (-5.71%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 4

shader-db results on SKL:

  total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
  instructions in affected programs: 10364 -> 9293 (-10.33%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Francisco Jerez [Thu, 7 Sep 2017 07:26:03 +0000 (00:26 -0700)]

intel/fs: Restrict live intervals to the subset possibly reachable from any definition.

Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.

This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations.  The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.

This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable.  Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.
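
A much simplified sketch of the intersection described above, at basic-block granularity rather than per instruction. Everything here is an assumption made for illustration: blocks are bits in a 32-bit mask, `succ[b]` holds the successors of block `b`, and the function names are hypothetical rather than taken from the i965 back-end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Forward reachability: the set of blocks that can be reached from any
 * block containing a definition of the variable (the defining blocks
 * themselves included). */
uint32_t reachable_from_defs(const uint32_t *succ, unsigned num_blocks,
                             uint32_t def_blocks)
{
    uint32_t reach = def_blocks;
    bool progress = true;

    /* Iterate to a fixed point so that loops in the CFG are handled. */
    while (progress) {
        progress = false;
        for (unsigned b = 0; b < num_blocks; b++) {
            if ((reach & (1u << b)) && (succ[b] & ~reach)) {
                reach |= succ[b];
                progress = true;
            }
        }
    }
    return reach;
}

/* Clip an existing live set to the blocks where the value can be defined. */
uint32_t restrict_live_blocks(uint32_t live, const uint32_t *succ,
                              unsigned num_blocks, uint32_t def_blocks)
{
    return live & reachable_from_defs(succ, num_blocks, def_blocks);
}
```

For a straight-line CFG 0 -> 1 -> 2 -> 3 with the only definition in block 2, a live set that had been extended to the top of the program (all four blocks) shrinks to blocks 2 and 3.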

According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).

No significant change in the running time of shader-db (with 5%
statistical significance).

shader-db results on IVB:

  total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
  cycles in affected programs: 16335482 -> 16159558 (-1.08%)
  helped: 5121
  HURT: 4347

  total spills in shared programs: 1309 -> 1288 (-1.60%)
  spills in affected programs: 249 -> 228 (-8.43%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1652 -> 1597 (-3.33%)
  fills in affected programs: 262 -> 207 (-20.99%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 209

shader-db results on BDW:

  total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
  cycles in affected programs: 23397142 -> 23141100 (-1.09%)
  helped: 8045
  HURT: 6488

  total spills in shared programs: 1456 -> 1252 (-14.01%)
  spills in affected programs: 465 -> 261 (-43.87%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1720 -> 1465 (-14.83%)
  fills in affected programs: 471 -> 216 (-54.14%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 162

shader-db results on SKL:

  total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
  cycles in affected programs: 22560936 -> 22369874 (-0.85%)
  helped: 8457
  HURT: 6247

  total spills in shared programs: 437 -> 437 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 870 -> 854 (-1.84%)
  fills in affected programs: 16 -> 0
  helped: 1
  HURT: 0

  LOST:   0
  GAINED: 107

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Francisco Jerez [Wed, 6 Dec 2017 19:42:54 +0000 (11:42 -0800)]

intel/fs: Teach instruction scheduler about GRF bank conflict cycles.

This should allow the post-RA scheduler to do a slightly better job at
hiding latency in the presence of instructions incurring bank conflicts.
The main purpose of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.

Acked-by: Matt Turner <mattst88@gmail.com>

Francisco Jerez [Thu, 15 Jun 2017 22:23:57 +0000 (15:23 -0700)]

intel/fs: Implement GRF bank conflict mitigation pass.

Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>

Dylan Baker [Tue, 5 Dec 2017 17:40:03 +0000 (09:40 -0800)]

meson: Fix building gallium media targets with gallium-xlib glx

To demonstrate this bug run meson with the options:
-Ddri-drivers= -Dglx=gallium-xlib

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

Dylan Baker [Mon, 4 Dec 2017 22:03:25 +0000 (14:03 -0800)]

meson: Add lmsensors to gallium libgl-xlib target.

Fixes: 5e71efef44b992b5d70b ("meson: Add lmsensors support")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

Eric Engestrom [Thu, 7 Dec 2017 14:47:46 +0000 (14:47 +0000)]

meson: add dep_thread to every lib that includes threads.h

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Eric Engestrom [Mon, 4 Dec 2017 15:06:03 +0000 (15:06 +0000)]

meson: fix pl111 dependency on vc4

src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Samuel Pitoiset [Tue, 5 Dec 2017 17:02:08 +0000 (18:02 +0100)]

radv: use a faster version for nir_op_pack_half_2x16

This patch is ported from RadeonSI and it has two effects.

It fixes a rendering issue which affects F1 2017 and Dawn
of War 3 (Vega only) because LLVM ended up generating
the new v_mad_mix_{hi,lo} instructions which appear to be
buggy in some way. Not sure if Mesa is generating something
wrong or if the issue is in LLVM only. Anyway, that explains why
the DOW3 issue can't be reproduced with GL on Vega.

It also improves performance because v_cvt_pkrtz_f16 is faster,
and because I guess the rounding mode behaviour is similar between
GL and VK, we can use it. About performance, it improves Talos
by +3/4% but I don't see any other impacts.

No CTS regressions on Polaris.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Alejandro Piñeiro [Thu, 7 Dec 2017 08:38:41 +0000 (09:38 +0100)]

mesa/spirv: move and rename nir_spirv_supported_capabilities

To avoid any Vulkan driver having to include the GL mtypes.h. Renamed as
eventually this could be used by drivers not using nir.

v2: remove compiler/spirv/spirv.h from mtypes (Alejandro)
v3: added the definition at compiler/shader_info.h (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Vadym Shovkoplias [Mon, 4 Dec 2017 09:47:33 +0000 (11:47 +0200)]

util/disk_cache: Remove unneeded free() on always null string

At this point dc_job->cache_item_metadata.keys always equals
NULL, so the call to free() is useless.

Fixes: b86ecea3446 ("util/disk_cache: write cache item metadata to disk")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

Samuel Iglesias Gonsálvez [Mon, 20 Nov 2017 12:12:12 +0000 (13:12 +0100)]

spirv: fix bug when OpSpecConstantOp calls a conversion

In that case, nir_eval_const_opcode() will evaluate the conversion
but as it was using destination's bit_size, the resulting
value was just a cast of the source constant value. By passing the
source's bit size, it does the conversion properly.

Fixes:

dEQP-VK.spirv_assembly.instruction.*.opspecconstantop.*convert*

v2:
- Remove invalid conversion op cases.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
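
Why the source bit size matters when folding a conversion can be illustrated with a small sketch. The `const_value` union and `convert_to_f32()` are hypothetical and are not the NIR constant-folding interface; they only show that the source constant has to be read at its own width before it is converted.

```c
#include <stdint.h>

/* Hypothetical constant storage: one value viewed at different widths. */
union const_value {
    float    f32;
    double   f64;
    uint32_t u32;
    uint64_t u64;
};

/* Convert a floating-point constant to 32 bits. The source is read using
 * the source's bit size; reading it with the destination size would just
 * reinterpret part of the 64-bit pattern instead of converting the value. */
float convert_to_f32(union const_value src, unsigned src_bit_size)
{
    switch (src_bit_size) {
    case 64:
        return (float)src.f64;  /* real double -> float conversion */
    default:
        return src.f32;         /* already 32-bit, nothing to convert */
    }
}
```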

Samuel Iglesias Gonsálvez [Mon, 20 Nov 2017 11:05:31 +0000 (12:05 +0100)]

spirv: allow specialization constants with bitsize different than 32 bits

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

James Legg [Wed, 6 Dec 2017 11:55:14 +0000 (11:55 +0000)]

nir/opcodes: Fix constant-folding of bitfield_insert

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
CC: <mesa-stable@lists.freedesktop.org>
CC: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

Alex Smith [Wed, 6 Dec 2017 10:28:14 +0000 (10:28 +0000)]

radv: Add LLVM version to the device name string

Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

Alejandro Piñeiro [Wed, 6 Dec 2017 10:38:59 +0000 (11:38 +0100)]

mesa: remove set_entry from forward type declarations

This type was used at gl_sync_object, but it is not used anymore.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Kenneth Graunke [Tue, 21 Mar 2017 07:30:06 +0000 (00:30 -0700)]

meta: Fix ClearTexture with GL_DEPTH_COMPONENT.

We only handled unpacking for GL_DEPTH_STENCIL formats.

Cemu was hitting _mesa_problem() for an unsupported format in
_mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the
format was depth-only, rather than depth-stencil.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Kenneth Graunke [Tue, 5 Dec 2017 19:09:13 +0000 (11:09 -0800)]

meta: Initialize depth/clear values on declaration.

This helps avoid compiler warnings in the next commit - everything
was initialized, but it wasn't obvious to static analysis.

Suggested-by: Tapani Pälli <tapani.palli@intel.com>

Timothy Arceri [Wed, 6 Dec 2017 23:16:55 +0000 (10:16 +1100)]

glsl: get correct member type when processing xfb ifc arrays

This fixes a crash in:

KHR-GL45.enhanced_layouts.xfb_block_stride

Fixes: 0822517936d4 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Gert Wollny [Wed, 6 Dec 2017 16:42:02 +0000 (17:42 +0100)]

r600/sb: do not convert if-blocks that contain indirect array access

If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.

Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
  vs-output-array-float-index-wr
  vs-output-array-vec3-index-wr
  vs-output-array-vec4-index-wr

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Tue, 28 Nov 2017 02:53:02 +0000 (12:53 +1000)]

r600: add support for compute grid/block sizes. (v2)

We just pass these in from outside in a constant buffer.

The shader side stores them the first time they are accessed.

v2: fix to not use a temp_reg.

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Mon, 27 Nov 2017 06:12:18 +0000 (16:12 +1000)]

r600: handle image/buffer sizes correctly.

This adds support to compute for the resq workarounds (buffer/cube sizes)

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 3 Nov 2017 01:47:55 +0000 (11:47 +1000)]

r600/compute: add support for emitting compute image/buffer atoms

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 3 Nov 2017 01:47:31 +0000 (11:47 +1000)]

r600/compute: handle atomic counters in compute state.

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 3 Nov 2017 01:44:06 +0000 (11:44 +1000)]

r600/compute: add support for TGSI compute shaders. (v1.1)

This adds paths to handle TGSI compute shaders and shader selection.

It also avoids emitting certain things on tgsi paths,
CBs, vertex buffers, config reg init (not required).

v1.1: fix rat mask calc

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 3 Nov 2017 01:15:26 +0000 (11:15 +1000)]

r600/shader: add compute support to shader assembler

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 3 Nov 2017 01:35:03 +0000 (11:35 +1000)]

r600/texture: drop lowering 1d/2d images to linear.

This appears to cause hangs with compute images. Unless
we can find more specifics, just don't do this for now.

Signed-off-by: Dave Airlie <airlied@redhat.com>

Alejandro Piñeiro [Wed, 6 Dec 2017 08:57:18 +0000 (09:57 +0100)]

mesa: define nir_spirv_supported_capabilities

Until now it was part of spirv_to_nir_options. But it will be used on
the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added
to the OpenGL context, as a way to save what SPIR-V capabilities the
current OpenGL implementation supports.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Fredrik Höglund [Tue, 5 Dec 2017 20:19:51 +0000 (21:19 +0100)]

anv: fix a case statement in GetMemoryFdPropertiesKHR

The handle type in the case statement is supposed to be
VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Fixes: ab18e8e59b6 ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Fredrik Höglund [Tue, 5 Dec 2017 20:15:25 +0000 (21:15 +0100)]

radv: fix a case statement in GetMemoryFdPropertiesKHR

The handle type in the case statement is supposed to be
VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Fixes: 546e747867c ("radv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Eric Engestrom [Wed, 6 Dec 2017 13:27:52 +0000 (13:27 +0000)]

meson: fix keyword argument in declare_dependency()

`declare_dependency()` takes `compile_args`, not `c_args`.
It was correct in all the other `declare_dependency()` from that commit.

Fixes: 0bbecc5a8548883f76a71 "meson: define driver dependencies"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Emil Velikov [Wed, 6 Dec 2017 17:33:00 +0000 (17:33 +0000)]

i965: include brw_pipe_control.h in the tarball

Fixes: bfe0f3a7027 ("i965: Move PIPE_CONTROL defines and prototypes to brw_pipe_control.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Emil Velikov [Thu, 16 Nov 2017 14:22:18 +0000 (14:22 +0000)]

mesa: document _mesa_extension_override_* variables

Currently there are no users of these outside of extensions.c.
Provide some information why they exist and how to use them.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>

Emil Velikov [Thu, 23 Nov 2017 16:56:44 +0000 (16:56 +0000)]

docs: annotate MESA_program_debug as obsolete

It has been obsolete for years - state it explicitly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

George Kyriazis [Tue, 5 Dec 2017 16:47:12 +0000 (10:47 -0600)]

swr/scons: Fix another intermittent build failure

gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp.
Fix missing dependency.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

commit | commitdiff | tree

Marek Olšák [Fri, 1 Dec 2017 02:08:16 +0000 (03:08 +0100)]

radeonsi: make const and stream uploaders allocate read-only memory

and anything that clones these uploaders, like u_threaded_context.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Marek Olšák [Tue, 5 Dec 2017 19:04:11 +0000 (20:04 +0100)]

radeonsi: use a separate allocator for fine fences

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Marek Olšák [Tue, 5 Dec 2017 12:32:47 +0000 (13:32 +0100)]

radeonsi/gfx9: make shader binaries use read-only memory

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Marek Olšák [Tue, 5 Dec 2017 12:32:33 +0000 (13:32 +0100)]

winsys/amdgpu: make IBs use read-only memory

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Marek Olšák [Mon, 4 Dec 2017 22:02:54 +0000 (23:02 +0100)]

radeonsi: print the buffer list for CHECK_VM

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Marek Olšák [Thu, 30 Nov 2017 21:49:10 +0000 (22:49 +0100)]

radeonsi: allow DMABUF exports for local buffers

Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 23 Nov 2017 09:29:49 +0000 (10:29 +0100)]

radeonsi: always place sparse buffers in VRAM

Together with "radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check",
this ensures that sparse buffers are placed in VRAM.

Noticed by an assertion that started triggering with commit d4fac1e1d7
("gallium/radeon: enable suballocations for VRAM with no CPU access")

Fixes KHR-GL45.sparse_buffer_tests.BufferStorageTest in debug builds.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Nicolai Hähnle [Thu, 23 Nov 2017 09:25:34 +0000 (10:25 +0100)]

radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check

The flag is on the pipe_resource, not the r600_resource.

I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.

Fixes: a41587433c4d ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Mon, 24 Jul 2017 20:42:59 +0000 (22:42 +0200)]

i965/fs: Use untyped_surface_read for 16-bit load_ssbo

SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.

But a 16-bit vec2 or vec4 is a multiple of 32 bits, so we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read, it is unshuffled to return the proper
16-bit components. The vec3 case is handled like vec4, but the 4th component
is ignored.

16-bit scalars are read using one byte_scattered_read message.
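
The unshuffle step described above can be illustrated with a minimal
sketch in plain C. This is not the i965 implementation: the function
names are hypothetical, and it assumes the even component of each pair
sits in the low half of the 32-bit dword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: split the 32-bit dwords returned by
 * a single read into 16-bit components. Assumption: the even component of
 * each pair lives in the low 16 bits of its dword. */
static void
unshuffle_16bit_pairs(const uint32_t *dwords, size_t num_components,
                      uint16_t *out)
{
   assert(num_components >= 2 && num_components <= 4);

   for (size_t i = 0; i < num_components; i++) {
      uint32_t dword = dwords[i / 2];
      out[i] = (uint16_t)(dword >> (16 * (i % 2)));
   }
}

/* Number of dwords to read: a vec3 is rounded up to a vec4 read, and the
 * 4th component is never copied out. */
static size_t
dwords_for_16bit_vector(size_t num_components)
{
   return (num_components + 1) / 2;
}
```

With this layout a vec3 read fetches two dwords and copies out only the
first three halves.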

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types instead of HF and F in shuffle to avoid rounding
    errors (Jason Ekstrand)
    Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript instead of changing type and stride (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Wed, 12 Jul 2017 12:49:41 +0000 (14:49 +0200)]

i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg

Currently, we use byte-scattered write messages for storing 16-bit
values into an SSBO. This is because untyped surface messages have a
fixed 32-bit size.

This patch optimizes these 16-bit writes by combining 2 values (e.g.,
two consecutive components aligned to 32 bits) into a 32-bit register,
packing the two 16-bit words.

16-bit single component values will continue to use byte-scattered
write messages. The same happens when the first consecutive
component is not aligned to 32 bits.

This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write one component per message.
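
As a rough illustration of the packing described above, here is a small
sketch in plain C. It is not the code from this patch: the function
names are made up, and it assumes the first component of each pair goes
in the low 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: pack two 16-bit words into one
 * 32-bit value. Assumption: the first word goes in the low half. */
static uint32_t
pack_16bit_pair(uint16_t lo, uint16_t hi)
{
   return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Pack an even number of 16-bit components into dwords so that they can
 * be written as 32-bit data. Returns the number of dwords produced. */
static size_t
shuffle_16bit_for_32bit_write(const uint16_t *src, size_t num_components,
                              uint32_t *dst)
{
   assert(num_components % 2 == 0);

   for (size_t i = 0; i < num_components / 2; i++)
      dst[i] = pack_16bit_pair(src[2 * i], src[2 * i + 1]);

   return num_components / 2;
}
```

A 16-bit vec4 then becomes two dwords of payload instead of four
separate single-component writes.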

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using shuffle 16 write and enable writes
    of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
    - Reorganize code to avoid duplication. (Jason Ekstrand)
    - Include new comments to explain the length calculations to
      fix alignment issues of components. (Jason Ekstrand)
    - Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand)
v4: (Jason Ekstrand)
    - Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
    - Comment about why shuffle is needed when using byte_scattered_write.

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:32:17 +0000 (08:32 +0200)]

anv: Enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage for SSBO/UBO

Enables SPV_KHR_16bit_storage on gen 8+.

VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
if the extension is supported or not.

v2: update due to rebase against master (Alejandro)
v3: (Jason Ekstrand)
    - Move this patch up in VK_KHR_16bit_storage series enabling only
      storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
    - Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
    - Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
      enablement for SSBO/UBO.

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jason Ekstrand [Mon, 20 Nov 2017 23:03:46 +0000 (00:03 +0100)]

i965/fs: Enables 16-bit load_ubo with sampler

load_ubo is using 32-bit loads as uniform surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.

Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.

v2: (Jason Ekstrand)
    - Simplify component selection and unshuffling for different bitsizes
    - Remove SKL optimization of reading only two 32-bit components when
      reading 16-bits types.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Mon, 20 Nov 2017 22:10:51 +0000 (23:10 +0100)]

i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components

These helpers are used to load/store 16-bit types from/to 32-bit
components.

The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way to the analogous functions for handling 64-bit types.

v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:20:07 +0000 (08:20 +0200)]

i965/fs: Use byte scattered read for 16-bit load_ssbo

This enables 16-bit reads at do_untyped_vector_read, which is used by
the following intrinsics:

   * nir_intrinsic_load_shared
   * nir_intrinsic_load_ssbo

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: - Add bitsize to scattered read operation (Jason Ekstrand)
    - Remove implementation of 16-bit UBO read from this patch.
    - Avoid assertion at opt_algebraic caused by ADD of two IMM with
      offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
      (Jose Maria Casanova)
v4: (Jason Ekstrand)
    - Put if case for 16-bits at the beginning of the if ladder.
    - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:19:17 +0000 (08:19 +0200)]

i965/fs: Add byte scattered read message and fs support

v2: Fix alignment style (Topi Pohjolainen)
    (Jason Ekstrand)
    - Enable bit_size parameter to scattered messages to enable different
      bitsizes byte/word/dword.
    - Remove use of brw_send_indirect_scattered_message in favor of
      brw_send_indirect_surface_message.
    - Move scattered messages to surface messages namespace.
    - Assert align1 for scattered messages and assume Gen8+.
    - Inline brw_set_dp_byte_scattered_read.

v3: (Jason Ekstrand)
    - Use renamed brw_byte_scattered_data_element_from_bit_size method
    - Assert scattered read for Gen8+ and Haswell.
    - Use conditional expression at components_read.
    - Include comment about params for scattered opcodes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 29 Jul 2017 14:10:02 +0000 (16:10 +0200)]

i965/fs: Predicate byte scattered writes if needed

While on Untyped Surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That masking is needed to avoid
SSBO stores writing on helper invocations. So when that can have an
effect, we load the sample mask and predicate the send message.
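
The effect of the predication can be sketched in plain C as a simulated
16-channel store. This is only an illustration under assumptions, not
the generated code: the function and its parameters are hypothetical,
and a channel is treated as a helper invocation when it is enabled in
the execution mask but not in the sample mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulation, not Mesa code: perform a 16-channel store in
 * which a channel writes only if it is enabled in both the execution
 * mask and the sample mask. Returns the number of channels written. */
static unsigned
predicated_scattered_write(uint16_t exec_mask, uint16_t sample_mask,
                           const uint16_t *values, uint16_t *buffer)
{
   assert(values != 0 && buffer != 0);

   /* The AND that untyped surface messages get implicitly. */
   uint16_t enabled = (uint16_t)(exec_mask & sample_mask);
   unsigned written = 0;

   for (unsigned ch = 0; ch < 16; ch++) {
      if (enabled & (1u << ch)) {
         buffer[ch] = values[ch];
         written++;
      }
   }

   return written;
}
```

Channels that are set only in the execution mask leave the buffer
untouched, which is the behaviour the predicate is meant to provide.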

Note: the need for this patch was tested with a custom test. Right now
the 16-bit storage CTS tests don't need this path in order to get a
full pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:17:09 +0000 (08:17 +0200)]

i965/fs: Use byte_scattered_write on 16-bit store_ssbo

We need to rely on byte scattered writes as untyped writes are 32-bit
size. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity's sake, we use the same message
for any component number. We revisit this approach in the following
patches.

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: (Jason Ekstrand)
    - Include bit_size to scattered write message and remove namespace
      specific for scattered messages.
    - Move comment to proper place.
    - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
    (Jose Maria Casanova)
    - Take into account that get_nir_src returns now WORD types for
      16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
    - Rename length variable to num_components.
    - Include assertions before emit_untyped_write.
    - Remove type_slot in favor of num_slot and first_slot.

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:16:01 +0000 (08:16 +0200)]

i965/fs: Add byte scattered write message and fs support

v2: (Jason Ekstrand)
    - Enable bit_size parameter to scattered messages to enable different
      bitsizes byte/word/dword.
    - Remove use of brw_send_indirect_scattered_message in favor of
      brw_send_indirect_surface_message.
    - Move scattered messages to surface messages namespace.
    - Assert align1 for scattered messages and assume Gen8+.
    - Inline brw_set_dp_byte_scattered_write.
v3: - Remove leftover newline (Topi Pohjolainen)
    - Rename brw_data_size to brw_scattered_data_element and use
      defines instead of an enum (Jason Ekstrand)
    - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:14:56 +0000 (08:14 +0200)]

i965/fs: Add remove_extra_rounding_modes optimization

Although from SPIR-V point of view, rounding modes are attached to the
operation/destination, on i965 it is a status, so we don't need to
explicitly set the rounding mode if the one we want is already set.

Taking into account that the default mode is RTE, one possible
optimization would be to optimize out the first RTE set for each
block. For this to work, we would need to take into account block
interrelationships. At this point, it is not worth complicating the
optimization for such a small gain.
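
The idea behind the pass can be sketched in plain C as follows. This is
a simplified model, not the backend code: the instruction struct, the
opcode enum and the function are hypothetical stand-ins. It assumes that
only the rounding-mode instruction changes the mode, and that the mode
is unknown at the start of every block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for backend IR, not Mesa types. */
enum sketch_opcode { SKETCH_OP_OTHER, SKETCH_OP_RND_MODE };

struct sketch_inst {
   enum sketch_opcode opcode;
   int rnd_mode;   /* meaningful only for SKETCH_OP_RND_MODE, must be >= 0 */
   int block;      /* index of the basic block that holds the instruction */
   bool removed;
};

/* Mark rounding-mode sets that repeat the mode already in effect within
 * the same block. The tracked mode is reset at every block boundary, so
 * the first set in each block is always kept. */
static unsigned
remove_extra_rounding_modes(struct sketch_inst *insts, size_t count)
{
   unsigned removed = 0;
   int current_block = -1;
   int current_mode = -1;   /* -1 means "unknown" */

   for (size_t i = 0; i < count; i++) {
      if (insts[i].block != current_block) {
         current_block = insts[i].block;
         current_mode = -1;
      }

      if (insts[i].opcode != SKETCH_OP_RND_MODE)
         continue;

      assert(insts[i].rnd_mode >= 0);

      if (insts[i].rnd_mode == current_mode) {
         insts[i].removed = true;
         removed++;
      } else {
         current_mode = insts[i].rnd_mode;
      }
   }

   return removed;
}
```

Because the state is reset per block, the sketch never needs to reason
about how blocks relate to each other, at the cost of keeping one
possibly redundant set per block.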

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:14:09 +0000 (08:14 +0200)]

i965/fs: Enable rounding mode on f2f16 ops

By default we don't set the rounding mode. We only set
round-to-nearest-even or round-to-zero mode if explicitly set from NIR.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)

v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:12:59 +0000 (08:12 +0200)]

i965/fs: Define new shader opcode to set rounding modes

Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.

v2: (Curro)
  - Set thread control to 'switch' when using the control register
  - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
    with the rounding mode.
  - Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
  - Remove redundant and add missing whitespace lines.
  - Match printing instruction to IR opcode "rnd_mode"

v4: (Topi Pohjolainen)
  - Fix code style.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:11:58 +0000 (08:11 +0200)]

i965: Add support for control register

Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.

From the Intel Skylake PRM, vol 07, section "Register and Register Regions",
subsection "Control Register" (page 754):

"Subregister cr0.0:ud contains normal operation control fields such as the
floating-point mode ... "

Floating-point Rounding mode is changed at bits 5:4 of cr0.0:

"Rounding Mode. This field specifies the FPU rounding mode. It is
initialized by Thread Dispatch.
  00b = Round to Nearest or Even (RTNE)
  01b = Round Up, toward +inf (RU)
  10b = Round Down, toward -inf (RD)
  11b = Round Toward Zero (RTZ)"
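
The bit layout quoted above can be exercised with a short sketch in
plain C. The macro and function names are hypothetical and this operates
on an ordinary integer standing in for cr0.0, not on the hardware
register; the field position (bits 5:4) and the encodings are the ones
listed in the quote.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the field position and encodings follow the PRM
 * text quoted above (rounding mode in bits 5:4 of cr0.0). */
#define SKETCH_CR0_RND_MODE_SHIFT 4
#define SKETCH_CR0_RND_MODE_MASK  (0x3u << SKETCH_CR0_RND_MODE_SHIFT)

enum sketch_rnd_mode {
   SKETCH_RND_MODE_RTNE = 0,   /* 00b: round to nearest or even */
   SKETCH_RND_MODE_RU   = 1,   /* 01b: round up, toward +inf */
   SKETCH_RND_MODE_RD   = 2,   /* 10b: round down, toward -inf */
   SKETCH_RND_MODE_RTZ  = 3    /* 11b: round toward zero */
};

/* Return a copy of the cr0.0 value with only the rounding mode changed. */
static uint32_t
sketch_set_rnd_mode(uint32_t cr0, enum sketch_rnd_mode mode)
{
   assert((unsigned)mode <= 3u);
   return (cr0 & ~SKETCH_CR0_RND_MODE_MASK) |
          ((uint32_t)mode << SKETCH_CR0_RND_MODE_SHIFT);
}

/* Extract the rounding mode field from a cr0.0 value. */
static enum sketch_rnd_mode
sketch_get_rnd_mode(uint32_t cr0)
{
   return (enum sketch_rnd_mode)((cr0 & SKETCH_CR0_RND_MODE_MASK) >>
                                 SKETCH_CR0_RND_MODE_SHIFT);
}
```

Clearing the field with the mask before ORing in the new value keeps all
other control bits unchanged.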

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:11:05 +0000 (08:11 +0200)]

i965/fs: Handle 32-bit to 16-bit conversions

Conversions to 16-bit need alignment between the 16-bit
and 32-bit types. So the conversion operations unpack 16-bit types
with a stride of 2 and then apply a MOV with the conversion.

v2 (Jason Ekstrand):
  - Avoid the general use of stride=2 for 16-bit register types.

v3 (Topi Pohjolainen)
  - Code style fix
   (Jason Ekstrand)
  - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
    because conversion to f16 with undefined rounding is explicit

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:08:20 +0000 (08:08 +0200)]

i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type

Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:06:45 +0000 (08:06 +0200)]

i965: Support for 16-bit base types in helper functions

v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand)
v3: Fix coding style (Topi Pohjolainen)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Alejandro Piñeiro [Sat, 1 Jul 2017 06:06:17 +0000 (08:06 +0200)]

i965/vec4: Handle 16-bit types at type_size_xvec4

These types have similar vec4 sizes as their 32-bit counterparts.

The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eduardo Lima Mitev [Sat, 1 Jul 2017 06:02:45 +0000 (08:02 +0200)]

spirv/nir: Add support for SPV_KHR_16bit_storage

v2: Minor changes after rebase against recent master (Alejandro
Pinheiro)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 06:05:55 +0000 (08:05 +0200)]

spirv: Enable FPRoundingMode decorator to nir operations

SpvOpFConvert now manages the FPRoundingMode decorator for the
returning values enabling the nir_rounding_mode in the conversion
operation to fp16 values.

v2: Fixed breaking of specialization constants. (Jason Ekstrand)

v3: Avoid nir_rounding_mode * casting. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eduardo Lima Mitev [Sat, 1 Jul 2017 06:04:40 +0000 (08:04 +0200)]

spirv/nir: Handle 16-bit types

v2: Added more missing implementations of 16-bit types. (Jason Ekstrand)

v3: Store values in values[0].u16[i] (Jason Ekstrand)
    Include switches based on bitsize for 16-bit types
    (Chema Casanova)
v4: Coding style fixes (Jason Ekstrand)
    Use vtn_u64_literal and u64[0] at 64-bit SpvOpConstant (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 05:58:26 +0000 (07:58 +0200)]

nir: Handle fp16 rounding modes at nir_type_conversion_op

nir_type_conversion enables new operations to handle rounding modes to
convert to fp16 values. Two new opcodes are enabled: nir_op_f2f16_rtne
and nir_op_f2f16_rtz.

The undefined behaviour doesn't have any effect and uses the original
nir_op_f2f16 operation.
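
The selection described above amounts to a small switch, sketched here
in plain C. The enums and the function are hypothetical stand-ins rather
than the NIR definitions; only the mapping itself (undefined rounding
keeps the original opcode, RTNE and RTZ pick the new ones) is taken from
this message.

```c
#include <assert.h>

/* Hypothetical stand-ins for the NIR rounding mode and opcode enums. */
enum sketch_rounding_mode {
   SKETCH_ROUNDING_UNDEF,
   SKETCH_ROUNDING_RTNE,
   SKETCH_ROUNDING_RTZ
};

enum sketch_f2f16_op {
   SKETCH_OP_F2F16,        /* stands in for nir_op_f2f16 */
   SKETCH_OP_F2F16_RTNE,   /* stands in for nir_op_f2f16_rtne */
   SKETCH_OP_F2F16_RTZ     /* stands in for nir_op_f2f16_rtz */
};

/* Pick the float-to-fp16 conversion opcode for a rounding mode. */
static enum sketch_f2f16_op
sketch_f2f16_op_for_rounding(enum sketch_rounding_mode mode)
{
   switch (mode) {
   case SKETCH_ROUNDING_UNDEF:
      return SKETCH_OP_F2F16;
   case SKETCH_ROUNDING_RTNE:
      return SKETCH_OP_F2F16_RTNE;
   case SKETCH_ROUNDING_RTZ:
      return SKETCH_OP_F2F16_RTZ;
   }

   assert(!"unknown rounding mode");
   return SKETCH_OP_F2F16;
}
```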

v2: Indentation fixed (Jason Ekstrand)

v3: Use explicit case for undefined rounding and assert if
rounding mode is used for non 16-bit float conversions
(Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eduardo Lima Mitev [Sat, 1 Jul 2017 06:01:21 +0000 (08:01 +0200)]

nir: Populate conversion opcodes to 16-bit types

This will include the following NIR ALU opcodes:
* nir_op_i2i16
* nir_op_i2f16
* nir_op_u2u16
* nir_op_u2f16
* nir_op_f2i16
* nir_op_f2u16
* nir_op_f2f16

v2: Remove "from" 16-bit in commit subject (Topi Pohjolainen)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Sat, 1 Jul 2017 05:56:51 +0000 (07:56 +0200)]

nir: Add rounding modes enum

v2: Added comments describing each of the rounding modes. (Jason
Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eduardo Lima Mitev [Sat, 1 Jul 2017 05:54:50 +0000 (07:54 +0200)]

nir: Add support for 16-bit types (half float, int16 and uint16)

v2: Renamed glsl_half_float_type() to glsl_float16_t_type().
(Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>