git.libre-soc.org Git - mesa.git/log

i965/fs: Make emit_shader_time_end() insert before EOT.

Previously, we emitted the shader-time epilogue from emit_fb_writes(),
during the middle of looping through color regions (or emit_urb_writes
for the VS).  This is duplicated several times and rather awkward.

I need to fix a bug in our FB write handling, and it will be a lot
easier if we move emit_shader_time_end() out of there.

Now, we simply emit FB writes/URB writes, and subsequently have
emit_shader_time_end() insert instructions before the final SEND with
EOT.  Not only is this simpler, it's actually a slight improvement:
we now include the MOVs to set up the final FB write payload in our
shader-time measurements.

Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses
send-from-GRF.  (In the past, we might have hit trouble where both
attempt to use MRFs for messages; that's not a problem now.)

v2: Rebase on v3 of the previous patch and other shader_time fixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

i965/fs: Make get_timestamp() pass back the MOV rather than emitting it.

This makes another part of the INTEL_DEBUG=shader_time code emittable
at arbitrary locations, rather than just at the end of the instruction
stream.

v2: Don't lose smear! Caught by Topi Pohjolainen.
v3: Don't set smear on the destination of the MOV. Thanks Topi!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

i965/fs: Make emit_shader_time_write return rather than emit.

Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)).
The advantage is that we can also insert a shader time write at an
arbitrary location in the instruction stream, rather than being
restricted to emitting at the end.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

i965/fs: Set smear on shader_time diff register.

The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a
width 1 register. We need to read it as <0,1,0> with a subreg of 0,
which is what smear accomplishes.

Fixes assertion:
brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

i965/fs: Set force_writemask_all on shader_time instructions.

These computations don't have anything to do with the currently
executing channels, so they should use force_writemask_all.

This fixes assert failures.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

r600g: Use R600_MAX_VIEWPORTS instead of 16

Lets define R600_MAX_VIEWPORTS instead of using 16 here and there
in the code when looping through viewports and scissors. It is
easier to understand what this number represents.

v2: Missed a case where R600_MAX_VIEWPORTS should have been used.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

i915: Remove unused IS_GEN2 macro

Inspired by Damien's recent libdrm changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i915: Remove (mostly) unused IS_915 macro

Inspired by Damien's recent libdrm changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i915: Remove (mostly) unused IS_PNV, IS_PNVG, and IS_PNVGM macros

Inspired by Damien's recent libdrm changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i915: Remove IS_9XX macro

Since the i915 / i965 split, IS_9XX just means IS_GEN3. Inspired by
Damien's recent libdrm changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i915: Remove unused IS_MOBILE macro

Inspired by Damien's recent libdrm changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965: Don't write past the end of the application supplied buffer

Both the AMD and Intel APIs provide a dataSize parameter, and this
function would merrily ignore it.  Neither API specifies what to do when
the buffer isn't big enough.  I take the easy route of writing all the
complete bits of data that will fit.  With more complete specs, we could
probably do something different.

I noticed this while looking into an unused parameter warning.  The
warning was actually useful!

brw_performance_monitor.c: In function 'brw_get_perf_monitor_result':
brw_performance_monitor.c:1261:37: warning: unused parameter 'data_size' [-Wunused-parameter]
                             GLsizei data_size,
                                     ^

v2: Fix checks to include offset in the calculation.  Noticed by Jan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>

i965: Silence unused parameter warning

All dd functions take a gl_context as the first parameter. Instead of
removing it, just silence the warning.

brw_performance_monitor.c: In function 'brw_new_perf_monitor':
brw_performance_monitor.c:1354:41: warning: unused parameter 'ctx' [-Wunused-parameter]
brw_new_perf_monitor(struct gl_context *ctx)
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965: Silence many 'static' is not at beginning of declaration warnings

What a useful warning. #ThanksGCC

brw_performance_monitor.c:153:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen5_raw_chaps_counters[] = {
^
brw_performance_monitor.c:185:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen5_oa_snapshot_layout[] =
^
brw_performance_monitor.c:221:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen5_groups[] = {
^
brw_performance_monitor.c:240:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_raw_oa_counters[] = {
^
brw_performance_monitor.c:281:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_oa_snapshot_layout[] =
^
brw_performance_monitor.c:317:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_statistics_counters[] = {
^
brw_performance_monitor.c:332:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_statistics_register_addresses[] = {
^
brw_performance_monitor.c:346:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen6_groups[] = {
^
brw_performance_monitor.c:356:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_raw_oa_counters[] = {
^
brw_performance_monitor.c:402:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_oa_snapshot_layout[] =
^
brw_performance_monitor.c:470:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_statistics_counters[] = {
^
brw_performance_monitor.c:493:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_statistics_register_addresses[] = {
^
brw_performance_monitor.c:515:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen7_groups[] = {
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965/fs: Silence unused parameter warning

I don't this opt_cmod_propagation_local ever used the fs_visitor.

brw_fs_cmod_propagation.cpp:52:40: warning: unused parameter 'v' [-Wunused-parameter]
opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965/fs: Silence unused parameter warning

Unused since b18fd23.

brw_fs.cpp:2878:44: warning: unused parameter 'dispatch_width' [-Wunused-parameter]
clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps,
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965/fs: Silence unused parameter warning

brw_fs_visitor.cpp:2162:56: warning: unused parameter 'offset_components' [-Wunused-parameter]
fs_reg offset_value, unsigned offset_components,
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

main: Add entry point for TextureBufferRange.

v2: Review by Martin Peres
- Get rid of difficult-to-follow code copied and pasted from
the original TexBufferRange

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Add check_texture_buffer_target.

Creates a shared function to ensure that texture buffer target is
GL_TEXTURE_BUFFER. Helps to clean up the Tex[ture]Buffer[Range] functions.

v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Add check_texture_buffer_range.

Creates a shared function that TexBufferRange and TextureBufferRange can use
to check the buffer range. This cleans up TexBufferRange considerably.

v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Cosmetic changes for Texture Buffers.

Adds a useful comment and some whitespace. Fixes an error message.

v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Refactor _mesa_texture_buffer_range.

Changes how the caller is identified in error messages, moves a check for
ARB_texture_buffer_object from the entry points to the shared code in
_mesa_texture_buffer_range, and removes an unused argument (GLenum target).

v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Use _mesa_lookup_bufferobj_err to simplify Tex[ture]Buffer[Range].

v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
- Closing curly brace on the same line as else

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Add utility function _mesa_lookup_bufferobj_err.

This function is exposed to mesa driver internals so that texture buffer
objects and array objects can use it.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Checking for cube completeness in GetCompressedTextureImage.

v2: Review from Anuj Phogat
   - Remove redundant copies of the cube map block comment
   - Replace redundant "if (!texImage) return;" statements with
     assert(texImage)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Add TEXTURE_CUBE_MAP support for glCompressedTextureSubImage3D.

v2: Review from Anuj Phogat
   - Remove redundant copies of the cube map block comment
   - Replace redundant "if (!texImage) return;" statements with
     assert(texImage)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: assert(texImage) in ARB_DSA texture cube map functions.

ARB_direct_state_access functions that deal with texture cube
maps need to make sure that texture images are not NULL before operating on
them. In the following cases, the error check functions already throw an
error if texImage == NULL, so an assert can be raised instead.

v2: Review from Anuj Phogat
- Replace redundant "if (!texImage) return;" statements with
assert(texImage)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Remove redundant copy of cube map block comment in GetTextureImage.

The comment describing why ARB_direct_state_access texture cube map functions
use _mesa_cube_level_complete is very long. To save room in the files,
readers are now referred to one central comment on texturesubimage in
teximage.c.

v2: Review from Anuj Phogat
- Remove redundant copies of the cube map block comment

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: Remove redundant NumLayers checks.

ARB_direct_state_access texture functions that operate on cube maps no longer
need to verify that cube map texture objects contain six texture images
because _mesa_cube_level_complete now does that for them.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

main: _mesa_cube_level_complete checks NumLayers.

_mesa_cube_level_complete now verifies that a cube map texture object actually
has six texture images before proceeding.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

r300g: fix sRGB->sRGB blits

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

r300g: fix a crash when resolving into an sRGB texture

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

r300g: use memset for clearing the shader key

r300g: remove the broken SNORM->UNORM shader lowering pass

Not used anymore.

r300g: fix RGTC1 and LATC1 SNORM formats

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

r300g: Fix the ATI1N swizzle (RGTC1 and LATC1)

This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01
test as well as the precision part of Wine's 3dc format test (fd.o bug
89156).

The Z component seems to contain a lower precision version of the
result, probably a temporary value from the decompression computation.
The Y and W component contain different data that depends on the input
values as well, but I could not make sense of them (Not that I tried
very hard).

GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in
piglit, and both formats are affected by a compiler bug if they're
sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx,
which returns random garbage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

radeonsi: Add additional information to shader dumps

This adds SGPR count, VGPR count, shader size, LDS size, and scratch
usage to shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG v2

This means dropping CL_FP_DENORM from the current return value.

v2:
- Add comments about minimum values for OpenCL 1.2.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>

freedreno/ir3: get the # of miplevels from getinfo

This fixes ARB_texture_query_levels to actually return the desired
value.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

freedreno/ir3: fix array count returned by TXQ

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

freedreno: move fb state copy after checking for size change

Fixes: 1f3ca56b ("freedreno: use util_copy_framebuffer_state()")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

nir: Make the printer include nir_variable::location too.

Being able to see both location and driver_location can be useful when
debugging IO mistakes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

i965/fs: Implement SIMD16 dual source blending.

From the SNB PRM, volume 4, part 1, page 193:

"The dual source render target messages only have SIMD8 forms due to
maximum message length limitations. SIMD16 pixel shaders must send two of
these messages to cover all of the pixels. Each message contains two colors
(4 channels each) for each pixel in the message payload."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Only do gl_FrontFacing workaround in glsl_to_nir for the FS.

Vertex shaders can have shader inputs where location happens to be
VARYING_SLOT_FACE. Without predicating this on the shader stage,
we suddenly end up with load_front_face intrinsics in vertex shaders,
which is nonsensical.

Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Plumb the shader stage into glsl_to_nir().

The next commit needs to know the shader stage in glsl_to_nir().
To facilitate that, we pass the gl_shader rather than the raw exec_list
of instructions. This has both the exec_list and the stage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Add native_integers to nir_shader_compiler_options.

glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers. Presumably other passes may as well.

Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Try to make sense of the nir_shader_compiler_options code.

The code in glsl_to_nir is entirely dead, as we translate from GLSL to
NIR at link time, when there isn't a _mesa_glsl_parse_state to pass,
so every caller passes NULL.

glsl_to_nir seems like the wrong place to try and create the shader
compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other
translators all would have to duplicate that code.  The driver should
set this up once with whatever settings it wants, and pass it in.

Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[]
and left a comment saying: "The memory for the options is expected to be
kept in a single static copy by the driver."  This suggests the plan was
to do exactly that.  That pointer was not marked const, however, and the
dead code used a mix of static structures and ralloced ones.

This patch deletes the dead code in glsl_to_nir, instead making it take
the shader compiler options as a mandatory argument.  It creates an
(empty) options struct in the i965 driver, and makes NirOptions point
to that.  It marks the pointer const so that we can actually do so
without generating "discards const qualifier" compiler warnings.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir: Delete nir_shader::user_structures and num_user_structures.

Nothing actually uses these, and the only caller of glsl_to_nir()
(brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state
pointer, meaning they'll always be NULL and 0, respectively.

Just delete them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

glsl: Mark array access when copying to a temporary for the ?: operator.

Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/
array-selection.vert test contains the following code:

   gl_Position = (pick_from_a_or_b ? a : b)[i];

where "a" and "b" are uniform vec4[2] variables.

ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and
generates an if-block to copy one or the other:

   (declare (temporary) (array vec4 2) conditional_tmp)
   (if (var_ref pick_from_a_or_b)
     ((assign () (var_ref conditional_tmp) (var_ref a)))
     ((assign () (var_ref conditional_tmp) (var_ref b))))

However, we failed to update max_array_access for "a" and "b", so it
remained 0 - here, the whole array is being accessed.  At link time,
update_array_sizes() used this bogus information to change the types
of "a" and "b" to vec4[1].  We then had assignments from a vec4[1] to
a vec4[2], which is highly illegal.

This tripped assertions in nir_split_var_copies with scalar VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org

i965/nir: Resolve source modifiers on Gen8+ logic operations.

On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
negate changes meaning to bitwise-not (~, not -). This isn't what NIR
expects, so we should resolve the source modifers via a MOV.

+30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

st/mesa: drop unused texture function

This has no users.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>

mesa/st: remove unused TexData

this isn't hooked up to anything at all from what I can see.

Seems like a left over from commit 5d67d4fbebb(st/mesa: remove
st_TexImage(), use core Mesa code instead).

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

freedreno: replace glsl130 debug flag with glsl120

Now that relative-dst works, we should never fall back to the old
compiler. (Which is almost true, other than a couple edge case sched
fails in piglit).

So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with
a glsl120 flag to force GLSL 120 and !integers.

If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a
workaround and please let me know.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

gallium/docs: add some freedreno compiler docs

Enable the 'sphinx.ext.graphviz' extension, and add in a section for
driver specific docs, with freedreno compiler docs beneath.  The
goal is for more complete compiler docs, and hopefully some docs about
other parts of the driver (such as how tiling works, etc).

Note that there is also a Distribution -> Drivers section.  Although
that appears to be simply just a list of drivers.  Not sure if that
should move under the 'Drivers' section or left alone.  I did add a
one-line section for freedreno in the existing Distribution -> Drivers
section.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: relative dst

To simplify RA, assign arrays that are written to first. Since enough
dependency information is in the graph to preserve order of reads and
writes of array, so all SSA names for the array collapse into one, just
assign the entire thing by array-id.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: split out array_fanin() helper

We'll need this too for relative dst..

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: drop deref nodes

The meta-deref instruction doesn't really do what we need for relative
destination.  Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address.  This lets us express the dependency regardless of
whether it is used for dst and/or src.

The foreach_ssa_src{_n} iterator macros now also iterates the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction.  Which is pretty much what
we want, as far as scheduling/etc.

TODO:
For now, the foreach_src{_n} iterators are unchanged.  We could wrap
the address in an ir3_register and make the foreach_src_{_n} iterators
behave the same way.  But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: helpful iterator macros

I remembered that we are using c99.. which makes some sugary iterator
macros easier. So introduce iterator macros to iterate all src
registers and all SSA src instructions. The _n variants also return
the src #, since there are a handful of places that need this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: fix register usage calculations

For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage. Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: couple tweaks for cmdline compiler

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: split up ssa_dst

And a couple other trivial renames, to prepare for relative dst.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: fix failed assert in grouping

Turns out there are scenarios where we need to insert mov's in "front"
of an input.  Triggered by shaders like:

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], GENERIC[9]
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
    0: MOV TEMP[0].xy, IN[1].xyyy
    1: MOV TEMP[0].w, IN[1].wwww
    2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
    3: MOV OUT[1], TEMP[0]
    4: MOV OUT[0], IN[0]
    5: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>

c99_alloca.h: Also use <alloca.h> for cygwin

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>

i915: Fix GCC unused-variable warning in release build.

i915_debug_fp.c: In function ‘i915_disassemble_program’:
i915_debug_fp.c:302:11: warning: unused variable ‘size’ [-Wunused-variable]
GLuint size = program[0] & 0x1ff;
^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>

r300g: Fix build, invalid extern "C" around header inclusion.

A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in r300 files.

When the helper to detect this issue was pushed to master, it broke
the build for the r300 driver. This patch fixes the r300 build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

nouveau: Fix build, invalid extern "C" around header inclusion.

A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in nouveau files.

When the helper to detect this issue was pushed to master, it broke
the build for the nouveau driver. This patch fixes the nouveau build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

nv50,nvc0: remove bogus 64_FLOAT formats

There is no HW support for these and the VBO pusher doesn't know about
them. No need to, either, since the st will be lowering them to 2x32.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

docs: add news item and link release notes for mesa 10.4.6/10.5.0

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

docs: Add sha256 sums for the 10.5.0 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 0d3e4ed1349565dea8e6c5139400d7441b8ffdca)

docs: Update 10.5.0 release notes

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 97357d475fc8cbb5dbe7bf17ca41f535827fb253)

docs: Add sha256 sums for the 10.4.6 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit fc9dd495b2adbd329d6b58cd611d2acd8ac3070a)

Add release notes for the 10.4.6 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 542a754524a2b149c178a2f70c05b292c7228fc2)

ilo: clarify valid and preferred tilings

We did it right until the switch to gen_surface_tiling, which has
GEN8_TILING_W. Generally, GEN8_TILING_W may be valid but not preferred.

ilo: clean up Gen6 WAs

Add a help function for each WA and make PIPE_CONTROL flags match the WA
descriptions. Call gen6_wa_pre_pipe_contro() only before PIPE_CONTROLs.
Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.

ilo: add generic ilo_render_3dprimitive()

It replaces gen[6-8]_3dprimitive().

ilo: add generic ilo_render_pipe_control()

It replaces gen[6-8]_pipe_control() and a direct gen6_PIPE_CONTROL() call in
ilo_render_emit_flush().

ilo: fix padding of linear sampler views

Should use the temporary variable in the loop instead of layout->bo_height.

ilo: do not check for interleaved_samples

interleaved_samples is only zero-initialized when layout_want_mcs() is called.
We should not check for it. There is also no need to.

Revert "egl/main: use c11/threads' mutex directly"

This reverts commit 6cee785c69a5c8d2d32b6807f9c502117f5a74b0.

Not meant to go in yet. Lacking review.

Revert "egl/main: convert thread management to use c11 threads"

This reverts commit 33eff853363d7eba5e61b00431b95f7aa0d7b0a5.

Not meant to go in yet. Lacking review.

Revert "configure: require pthreads for POSIX builds"

This reverts commit 50714cec2b50c7836841c09f04bfe875de00ae1d.

Not meant to go in yet. Lacking review.

Revert "glx: remove final reference to THREADS"

This reverts commit 8b15a883e0ba72c9156d7192a798bb272e0bc528.

Not meant to go in yet. Lacking review.

Revert "glx: remove support for non-multithreaded platforms"

This reverts commit 38591295cd4b68f89f257b20f476f98de3772a47.

Not meant to go in yet. Lacking review.

glx: remove unneeded ifdef _WIN32 guard

The C99 header exists on other platforms as well.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

util: rework _MSC_VER >= 1200 checks

Replace the _MSC_VER >= 1200 with defined (_MSC_VER) and compact if/else
statements. We require MSVC 2008 or later with commit 46110c5d564.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

glx: remove support for non-multithreaded platforms

Implicitly required for a while, although commit 9385c592c68 (mapi:
remove u_thread.h) was the one that put the final nail on the
coffin.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

glx: remove final reference to THREADS

Left over from commit 18db13f5865(mapi: THREADS was always defined,
remove it)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

configure: require pthreads for POSIX builds

This has been an implicit rule for building mesa for a long time. Let's
make it official and just bail out at configure time. This way we can
cleaning up some of our glx code.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

egl/main: convert thread management to use c11 threads

Convert the code to use the C11 threads implementation, and nuke the
Windows non-pthreads code-path. The c11/threads_win32.h abstraction
should be better than the current code.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

egl/main: use c11/threads' mutex directly

Remove the inline wrappers/abstraction layer.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

include: Add helper header to help trap includes inside extern C.

This is just to help repro and fixing these issues with any C++ compiler --

Commiting this will of course wait until all issues are addressed.

$ scons src/glsl/
scons: Reading SConscript files ...
Checking for GCC ...  yes
Checking for Clang ...  no
Checking for X11 (x11 xext xdamage xfixes glproto >= 1.4.13)... yes
Checking for XCB (x11-xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8)... yes
Checking for XF86VIDMODE (xxf86vm)... yes
Checking for DRM (libdrm >= 2.4.38)... yes
Checking for UDEV (libudev >= 151)... yes
warning: LLVM disabled: not building llvmpipe
scons: done reading SConscript files.
scons: Building targets ...
scons: building associated VariantDir targets: build/linux-x86_64-debug/glsl
  Compiling src/glsl/ast_array_index.cpp ...
  Compiling src/glsl/ast_expr.cpp ...
  Compiling src/glsl/ast_function.cpp ...
  Compiling src/glsl/ast_to_hir.cpp ...
  Compiling src/glsl/ast_type.cpp ...
  Compiling src/glsl/builtin_functions.cpp ...
In file included from include/c99_compat.h:28:0,
                 from src/mapi/u_compiler.h:4,
                 from src/mapi/u_thread.h:47,
                 from src/mapi/glapi/glapi.h:47,
                 from src/mesa/main/mtypes.h:42,
                 from src/mesa/main/errors.h:47,
                 from src/mesa/main/imports.h:41,
                 from src/mesa/main/core.h:44,
                 from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
In file included from include/c99_compat.h:28:0,
                 from include/c11/threads.h:38,
                 from src/mapi/u_thread.h:49,
                 from src/mapi/glapi/glapi.h:47,
                 from src/mesa/main/mtypes.h:42,
                 from src/mesa/main/errors.h:47,
                 from src/mesa/main/imports.h:41,
                 from src/mesa/main/core.h:44,
                 from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
  Compiling src/glsl/builtin_types.cpp ...
  Compiling src/glsl/builtin_variables.cpp ...
scons: *** [build/linux-x86_64-debug/glsl/builtin_functions.os] Error 1
scons: building terminated because of errors.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>

i965: free scratch buffers when destroying the context

If scratch space is needed for a shader stage we try to reuse the last scratch
buffer bound to that stage. If we can't, we free the old scratch buffer and
allocate a new one. This means we always keep the last scratch buffer for a
particular shader stage around for the entire life span of the context.

These buffers are being reported by Valgrind as definitely lost after
destroying the OpenGL context. For example, for the geometry shader stage:

==18350== 248 bytes in 1 blocks are definitely lost in loss record 85 of 150
==18350==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18350==    by 0xA1B35D6: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:724)
==18350==    by 0xA1B383F: drm_intel_gem_bo_alloc (intel_bufmgr_gem.c:794)
==18350==    by 0xA1AEFA3: drm_intel_bo_alloc (intel_bufmgr.c:52)
==18350==    by 0x9D08E31: brw_get_scratch_bo (brw_program.c:226)
==18350==    by 0x9D2A0F2: do_gs_prog (brw_vec4_gs.c:280)
==18350==    by 0x9D2A635: brw_gs_precompile (brw_vec4_gs.c:401)
==18350==    by 0x9D14F68: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_shader.cpp:76)
==18350==    by 0x9D157B8: brw_link_shader (brw_shader.cpp:269)
==18350==    by 0x9B0941E: _mesa_glsl_link_shader (ir_to_mesa.cpp:3038)
==18350==    by 0x99AE4ED: link_program (shaderapi.c:917)
==18350==    by 0x99AF365: _mesa_LinkProgram (shaderapi.c:1385)

So make sure that by the time we destroy the context we check if we have live
scratch buffers for the various stages and release them if that is the case.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965: Fix URB size for CHV

Increase the device info .urb.size for CHV to match the default URB
size (192kB).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

configure: Introduce new output variable to ax_check_python_mako_module.m4

This output variables gives more flexibility for future changes
in autoconf to detect if it is needed to auto-generate files and
check for the auto-generation dependencies.

It is still returning error when Python is not installed.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>

i965/vec4: Don't lose the saturate modifier in copy propagation.

Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89224
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/vec4: Handle saturate in dump_instruction().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

ilo: enable L3 cache in MOCS

This enables L3 cache in MOCS almost everywhere.

ilo: track if a ilo_view_surface is a scanout

Scanouts require a different cache type.

ilo: clean up SURFACE_STATE and BINDING_TABLE_STATE

Add ilo_builder_surface_pointer() to replace ilo_builder_surface_write().
Make Gen8+ take a different path in gen6_SURFACE_STATE().