mesa.git
6 years agomeson: fix builds against LLVM built without rtti
Dylan Baker [Mon, 16 Apr 2018 21:47:58 +0000 (14:47 -0700)]
meson: fix builds against LLVM built without rtti

Building without rtti is fraught with peril, but it's something that
autotools supports so we need to support it too.

Since we've moved to version 0.44 as a whole, we can use the meson
functionality for accessing arbitrary llvm-config options to check for
rtti and add -fno-rtti to all C++ code accordingly.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agomeson: remove dummy_cpp
Dylan Baker [Mon, 16 Apr 2018 21:40:51 +0000 (14:40 -0700)]
meson: remove dummy_cpp

meson has gotten pretty smart about tracking C and C++ dependencies
(internal and external), and using the right linker. This wasn't always
the case and we created empty c++ files to force the use of the c++
linker. We don't need that any more.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: allow empty sources when using link_whole
Dylan Baker [Mon, 16 Apr 2018 21:39:59 +0000 (14:39 -0700)]
meson: allow empty sources when using link_whole

meson used to get grumpy if the sources list was empty, even when using
--whole-archive (link_whole). In more recent versions that's not true,
so remove the workaround.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: remove workaround for custom target creating .h and .c files
Dylan Baker [Mon, 16 Apr 2018 21:34:35 +0000 (14:34 -0700)]
meson: remove workaround for custom target creating .h and .c files

In more modern versions of meson a custom_target returns an index-able
object. This allows us to create accurate dependency models for targets
that rely only on the header and not on the code from anv_entrypoints.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: raise required version to 0.44.1
Dylan Baker [Fri, 13 Apr 2018 22:05:55 +0000 (15:05 -0700)]
meson: raise required version to 0.44.1

We have already required 0.44 for building clover and swr, so it was
already partially required. This just makes it required across the board
instead of just for clover and swr.

There is a bug in 0.44 which makes it impossible to build mesa in some
configurations, so require 0.44.1 which fixes this.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: fix graw-xlib after auxiliary consolidation
Dylan Baker [Wed, 18 Apr 2018 18:31:31 +0000 (11:31 -0700)]
meson: fix graw-xlib after auxiliary consolidation

This one's completely my fault, I didn't do good enough testing after
rebasing and this got missed.

Fixes: d28c24650110c130008be3d3fe584520ff00ceb1
       ("meson: build graw tests")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: only build mesa_st tests when build-tests is true
Dylan Baker [Wed, 18 Apr 2018 16:29:35 +0000 (09:29 -0700)]
meson: only build mesa_st tests when build-tests is true

Since we have an option to turn test building on and off, we should
honor that.

Fixes: 34cb4d0ebc14663113705beae63dd52b9d1b2d87
       ("meson: build tests for gallium mesa state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: don't build classic mesa tests without dri_drivers
Dylan Baker [Wed, 18 Apr 2018 17:53:27 +0000 (10:53 -0700)]
meson: don't build classic mesa tests without dri_drivers

Since mesa_classic is build-on-demand the tests will create a demand and
add a bunch of extra compilation.

Fixes: 43a6e84927e3b1290f6f211f5dfb184dfe5a719e
       ("meson: build mesa test.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoi965/meta_util: Re-enable sRGB-encoded fast-clears on CNL
Nanley Chery [Fri, 23 Mar 2018 00:05:34 +0000 (17:05 -0700)]
i965/meta_util: Re-enable sRGB-encoded fast-clears on CNL

The paths which sample with the clear color are now using a getter which
performs the sRGB decode needed to enable this fast clear.

This path can be exercised by fast-clearing a texture, then performing
an operation which requires sRGB decoding. Test coverage for this
feature is provided with the following tests:

* Shader texture calls:
  - spec@ext_texture_srgb@tex-srgb

* Shader texelfetch calls:
  - spec@arb_framebuffer_srgb@fbo-fast-clear
  - spec@arb_framebuffer_srgb@msaa-fast-clear

* Blending:
  - spec@arb_framebuffer_srgb@arb_framebuffer_srgb-fast-clear-blend

* Blitting:
  - spec@arb_framebuffer_srgb@blit texture srgb msaa enabled clear

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/miptree: Extend the sRGB-blending WA to future platforms
Nanley Chery [Fri, 30 Mar 2018 05:14:09 +0000 (22:14 -0700)]
i965/miptree: Extend the sRGB-blending WA to future platforms

The blending issue seems to be present on CNL as well.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: Add and use a getter for the clear color
Nanley Chery [Mon, 26 Mar 2018 21:32:18 +0000 (14:32 -0700)]
i965: Add and use a getter for the clear color

It returns both the inline clear color and a clear address which points
to the indirect clear color buffer (or NULL if unused/non-existent).
This getter allows CNL to sample from fast-cleared sRGB textures
correctly by doing the needed sRGB-decode on the clear color (inline)
and making the indirect clear color buffer unused.
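
The sRGB decode mentioned above can be sketched as follows. This is a
minimal illustration using the standard sRGB-to-linear transfer function;
the names srgb_to_linear and decode_clear_color are hypothetical and are
not the helpers added to Mesa.

```python
def srgb_to_linear(c):
    """Standard sRGB decode for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def decode_clear_color(rgba):
    """Decode the colour channels; alpha is stored linearly and is left alone."""
    r, g, b, a = rgba
    return (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b), a)


# Hypothetical clear color, given as sRGB-encoded floats.
linear_clear = decode_clear_color((0.5, 0.0, 1.0, 0.5))
```

Mid-gray (0.5) in sRGB decodes to roughly 0.214 in linear space, which is
why sampling a fast-cleared sRGB texture without the decode gives visibly
wrong results.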

v2 (Rafael):
* Have a more detailed commit message.
* Add a comment on the sRGB conversion process.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoutil/srgb: Add a float sRGB -> linear helper
Jason Ekstrand [Fri, 23 Jun 2017 03:00:47 +0000 (20:00 -0700)]
util/srgb: Add a float sRGB -> linear helper

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/wm_surface_state: Use the clear address if clear_bo is non-NULL
Nanley Chery [Tue, 10 Apr 2018 20:56:18 +0000 (13:56 -0700)]
i965/wm_surface_state: Use the clear address if clear_bo is non-NULL

We want to add and use a getter that turns off the indirect path by
returning zero for the clear color bo and offset.

v2: Fix usage of "clear address" in commit message (Jason).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: Add and use a single miptree aux_buf field
Nanley Chery [Fri, 6 Apr 2018 16:54:31 +0000 (09:54 -0700)]
i965: Add and use a single miptree aux_buf field

We want to add and use a function that accesses the auxiliary buffer's
clear_color_bo and doesn't care if it has an MCS or HiZ buffer
specifically.

v2 (Jason Ekstrand):
* Drop intel_miptree_get_aux_buffer().
* Mention CCS in the aux_buf field.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: Add and use a getter for the miptree aux buffer
Nanley Chery [Mon, 9 Apr 2018 18:11:46 +0000 (11:11 -0700)]
i965: Add and use a getter for the miptree aux buffer

Make the next patch easier to read by eliminating most of the would-be
duplicate field accesses now.

v2: Update the HiZ comment instead of deleting it (Rafael).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agogm107/ir/lib: fix sched in div u32 builtin
Karol Herbst [Sun, 22 Apr 2018 20:23:13 +0000 (22:23 +0200)]
gm107/ir/lib: fix sched in div u32 builtin

Imad needs to set a read barrier.

With significantly big work groups I was getting wrong results for div u32. It
turns out the issue was with the sched opcodes.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agointel/compiler: Add scheduler deps for instructions that implicitly read g0
Ian Romanick [Mon, 16 Apr 2018 23:32:41 +0000 (16:32 -0700)]
intel/compiler: Add scheduler deps for instructions that implicitly read g0

Otherwise the scheduler can move the writes after the reads.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95009
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95012
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: Clayton A Craft <clayton.a.craft@intel.com>
Cc: mesa-stable@lists.freedesktop.org
6 years agointel/compiler: Silence unused parameter warnings in empty vec4_instruction_scheduler...
Ian Romanick [Wed, 28 Mar 2018 23:45:01 +0000 (16:45 -0700)]
intel/compiler: Silence unused parameter warnings in empty vec4_instruction_scheduler methods

src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::count_reads_remaining(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:764:72: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::count_reads_remaining(backend_instruction *be)
                                                                        ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::setup_liveness(cfg_t*)’:
src/intel/compiler/brw_schedule_instructions.cpp:769:51: warning: unused parameter ‘cfg’ [-Wunused-parameter]
 vec4_instruction_scheduler::setup_liveness(cfg_t *cfg)
                                                   ^~~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::update_register_pressure(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:774:75: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::update_register_pressure(backend_instruction *be)
                                                                           ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:779:80: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be)
                                                                                ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::issue_time(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:1550:61: warning: unused parameter ‘inst’ [-Wunused-parameter]
 vec4_instruction_scheduler::issue_time(backend_instruction *inst)
                                                             ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel/compiler: Silence unused parameter warning in compile_cs_to_nir
Ian Romanick [Wed, 28 Mar 2018 23:35:10 +0000 (16:35 -0700)]
intel/compiler: Silence unused parameter warning in compile_cs_to_nir

src/intel/compiler/brw_fs.cpp: In function ‘nir_shader* compile_cs_to_nir(const brw_compiler*, void*, const brw_cs_prog_key*, brw_cs_prog_data*, const nir_shader*, unsigned int)’:
src/intel/compiler/brw_fs.cpp:7205:44: warning: unused parameter ‘prog_data’ [-Wunused-parameter]
                   struct brw_cs_prog_data *prog_data,
                                            ^~~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel/compiler: Silence unused parameter warnings in generate_foo methods
Ian Romanick [Wed, 28 Mar 2018 23:29:45 +0000 (16:29 -0700)]
intel/compiler: Silence unused parameter warnings in generate_foo methods

Since all of the fs_generator::generate_foo methods take a fs_inst * as
the first parameter, just remove the name to quiet the compiler.

src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst*, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
                                         ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst*)’:
src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_discard_jump(fs_inst *inst)
                                              ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
                                                      ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_shader_time_add(fs_inst *inst,
                                                 ^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen*, brw::vec4_instruction*, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter]
                                  vec4_instruction *inst,
                                                    ^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen*, brw::vec4_instruction*, brw_reg, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
                       vec4_instruction *inst,
                                         ^~~~
src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter]
                       struct brw_reg indirect, struct brw_reg length)
                                                               ^~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agobroadcom/vc5: Set up internal_format for imported resources.
Eric Anholt [Thu, 12 Apr 2018 23:29:19 +0000 (16:29 -0700)]
broadcom/vc5: Set up internal_format for imported resources.

Without this, we'd assertion fail in u_transfer_helper when mapping an
imported resource.

6 years agobroadcom/vc5: Assert that created BOs have offset != 0.
Eric Anholt [Thu, 12 Apr 2018 22:20:17 +0000 (15:20 -0700)]
broadcom/vc5: Assert that created BOs have offset != 0.

The kernel shouldn't return a bo at NULL, and the HW special-cases NULL
address values for things like OQs.

6 years agobroadcom/vc5: Don't allocate simulator BOs at offset 0.
Eric Anholt [Thu, 12 Apr 2018 22:19:42 +0000 (15:19 -0700)]
broadcom/vc5: Don't allocate simulator BOs at offset 0.

The kernel won't return us BOs at offset 0 (because things like OQs
wouldn't work there), so we shouldn't in the simulator either.

6 years agobroadcom/vc5: Add sim support for the GET_BO_OFFSET ioctl.
Eric Anholt [Thu, 12 Apr 2018 20:47:52 +0000 (13:47 -0700)]
broadcom/vc5: Add sim support for the GET_BO_OFFSET ioctl.

Otherwise we'd crash immediately upon importing a BO through EGL
interfaces.

6 years agobroadcom/vc5: Treat imports of DRM_FORMAT_MOD_INVALID BOs as linear.
Eric Anholt [Thu, 12 Apr 2018 20:46:24 +0000 (13:46 -0700)]
broadcom/vc5: Treat imports of DRM_FORMAT_MOD_INVALID BOs as linear.

We don't have any kernel metadata about BO tiling, so this probably is all
we should do for the moment.

6 years agoi965: expose MESA_FORMAT_R8G8B8A8_SRGB visual
Tapani Pälli [Mon, 19 Mar 2018 11:41:45 +0000 (13:41 +0200)]
i965: expose MESA_FORMAT_R8G8B8A8_SRGB visual

Exposing the visual makes the following dEQP tests pass on Android:

   dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
   dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb

Visual is exposed only when DRI_LOADER_CAP_RGBA_ORDERING is set.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agodri: Add __DRI_IMAGE_FORMAT_SABGR8
Tapani Pälli [Mon, 19 Mar 2018 11:41:44 +0000 (13:41 +0200)]
dri: Add __DRI_IMAGE_FORMAT_SABGR8

Add format definition and required plumbing to create images.
Note that there is no matching drm_fourcc definition, just like
with the existing __DRI_IMAGE_FOURCC_SARGB8888.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoRevert "st/dri: Fix dangling pointer to a destroyed dri_drawable"
Marek Olšák [Tue, 24 Apr 2018 04:00:20 +0000 (00:00 -0400)]
Revert "st/dri: Fix dangling pointer to a destroyed dri_drawable"

This reverts commit dab02dea3411d325a5aee6cda5b581e61396ecc6.

It causes crashes of qtcreator and firefox.

Fixes: dab02de "st/dri: Fix dangling pointer to a destroyed dri_drawable"
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
6 years agogallivm: dump bitcode before optimization
Roland Scheidegger [Mon, 23 Apr 2018 04:22:45 +0000 (06:22 +0200)]
gallivm: dump bitcode before optimization

If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc) that also
gives very useful information in which passes all the time was spent,
and which passes are really run along with the order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agogallivm: (trivial) do division by 1000 with int64
Roland Scheidegger [Mon, 23 Apr 2018 02:52:48 +0000 (04:52 +0200)]
gallivm: (trivial) do division by 1000 with int64

Conversion to int can otherwise overflow if compile times are over
~71min. (Yes this can happen...)
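
The ~71min figure can be checked with a little arithmetic. This is a
sketch that assumes the elapsed time is measured in microseconds and was
being truncated to a 32-bit int before the division; to_int32 is a
hypothetical helper that models such a cast.

```python
def to_int32(value):
    """Wrap an integer to signed 32 bits, modelling a C (int) cast."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


USEC_PER_MIN = 60 * 1_000_000

# A 72-minute compile, in microseconds (assumed unit).
elapsed_usec = 72 * USEC_PER_MIN

# Cast first, then divide: the microsecond count no longer fits in 32 bits.
truncated_first = to_int32(elapsed_usec) // 1000

# Divide in 64-bit first, then cast: the millisecond count fits easily.
divided_first = to_int32(elapsed_usec // 1000)

# Number of minutes after which a 32-bit microsecond counter wraps.
wrap_minutes = 2**32 / USEC_PER_MIN
```

Under these assumptions the cast-first result is about 25 seconds instead
of 72 minutes, while dividing first gives the expected 4320000 ms. A
32-bit microsecond count wraps after about 71.6 minutes.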

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agogallivm: remove LICM pass
Roland Scheidegger [Mon, 23 Apr 2018 02:39:00 +0000 (04:39 +0200)]
gallivm: remove LICM pass

LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to move things when it previously couldn't,
which causes noticeable compile time increases.
There are more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agogallivm: add early cse pass
Roland Scheidegger [Mon, 23 Apr 2018 02:32:56 +0000 (04:32 +0200)]
gallivm: add early cse pass

This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
It also in particular helps some crazy shader used for driver
verification (don't ask...) a lot (about factor of 6 faster in compile
time) (due to simplifying the IR before LICM is run).
While here, also move licm behind simplifycfg. For some shaders there
seems to be very significant compile time gains (we've seen a factor
of 10000 albeit that was a really crazy shader you'd certainly never
see in a real app), because LICM is quite expensive and there are cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agoglsl/glcpp: Handle hex constants with 0X prefix
Vlad Golovkin [Thu, 19 Apr 2018 20:08:01 +0000 (23:08 +0300)]
glsl/glcpp: Handle hex constants with 0X prefix

GLSL 4.6 spec describes hex constant as:

hexadecimal-constant:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-constant hexadecimal-digit

Right now if you have a shader with the following structure:

    #if 0X1 // or any hex number with the 0X prefix
    // some code
    #endif

the code between #if and #endif gets removed because only the "0x" prefix is
checked. As a result strtoll is called with base 8, stops at the 'X' char,
and returns 0. Letting strtoll detect the base makes this limitation go away
and also makes the code easier to read.

From the strtoll Linux man page:

"If base is zero or 16, the string may then include a "0x" prefix, and the
number will be read in base 16; otherwise, a zero base is taken as 10 (decimal)
unless the next character is '0', in which case it is taken as 8 (octal)."

This matches the behaviour in the GLSL spec.
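
The difference between the two approaches can be modelled in a few lines.
This is only a sketch of the behaviour described above, not the glcpp
code: parse_old and parse_auto_base are hypothetical names, and
parse_auto_base imitates what strtoll does when it is given base 0.

```python
def parse_old(text):
    """Model of the old check: only a lowercase "0x" selects base 16."""
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.startswith("0"):
        # Base 8 parsing stops at the first non-octal character.
        digits = ""
        for ch in text:
            if ch not in "01234567":
                break
            digits += ch
        return int(digits, 8)
    return int(text, 10)


def parse_auto_base(text):
    """Model of base detection: 0x/0X is hex, a leading 0 is octal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text[1:], 8)
    return int(text, 10)


old_value = parse_old("0X1")        # parsing stops at 'X'
new_value = parse_auto_base("0X1")  # the whole constant is read as hex
```

With the old model "#if 0X1" evaluates to 0 and the block is dropped; with
base detection it evaluates to 1, while octal and decimal constants parse
the same way in both.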

This patch also adds a test for uppercase hex prefix.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agomesa: rename api_validate.{c,h} -> draw_validate.{c,h}
Timothy Arceri [Mon, 23 Apr 2018 03:46:15 +0000 (13:46 +1000)]
mesa: rename api_validate.{c,h} -> draw_validate.{c,h}

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65422

6 years agoac/radv/radeonsi: refactor harvest config register getters.
Dave Airlie [Mon, 23 Apr 2018 00:42:21 +0000 (10:42 +1000)]
ac/radv/radeonsi: refactor harvest config register getters.

This refactors the code out to share it between radv and radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: only set raster_config_1 outside the index registers.
Dave Airlie [Mon, 23 Apr 2018 00:39:33 +0000 (10:39 +1000)]
radv: only set raster_config_1 outside the index registers.

This follows what radeonsi does.

Ported from radeonsi:
    radeonsi: emit PA_SC_RASTER_CONFIG_1 only once

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoac/radv/radeonsi: refactor max simd waves into common code.
Dave Airlie [Mon, 23 Apr 2018 00:16:07 +0000 (10:16 +1000)]
ac/radv/radeonsi: refactor max simd waves into common code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac/radv/radeonsi: refactor raster_config default values getters.
Dave Airlie [Mon, 23 Apr 2018 00:09:36 +0000 (10:09 +1000)]
ac/radv/radeonsi: refactor raster_config default values getters.

This just makes this common code between the two drivers.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradeonsi: use common gs_table_depth code
Dave Airlie [Sun, 22 Apr 2018 23:57:20 +0000 (09:57 +1000)]
radeonsi: use common gs_table_depth code

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: use common gs_table_depth code.
Dave Airlie [Sun, 22 Apr 2018 23:57:10 +0000 (09:57 +1000)]
radv: use common gs_table_depth code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac/info: move gs table depth to common code.
Dave Airlie [Sun, 22 Apr 2018 23:56:43 +0000 (09:56 +1000)]
ac/info: move gs table depth to common code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradeonsi: don't runtime check gs table info
Dave Airlie [Sun, 22 Apr 2018 23:52:28 +0000 (09:52 +1000)]
radeonsi: don't runtime check gs table info

We can just use unreachable() here; this aligns with the radv code and
makes it easier to move to common code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv/gfx9: don't use gs_table_depth on gfx9.
Dave Airlie [Sun, 22 Apr 2018 23:50:28 +0000 (09:50 +1000)]
radv/gfx9: don't use gs_table_depth on gfx9.

Missed this in the initial radeonsi port: we shouldn't use this value
on gfx9, and on gfx8 only when we have a geom shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoi965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*
Jason Ekstrand [Fri, 20 Apr 2018 03:48:42 +0000 (20:48 -0700)]
i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*

They are send messages and this makes size_read() and mlen agree.  For
both of these opcodes, the payload is just a dummy so mlen == 1 and this
should decrease register pressure a bit.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
6 years agoac: fix the number of coordinates for ac_image_get_lod and arrays
Samuel Pitoiset [Mon, 23 Apr 2018 15:05:10 +0000 (17:05 +0200)]
ac: fix the number of coordinates for ac_image_get_lod and arrays

This fixes crashes for the following CTS:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*

Cubemaps are the same as 2D arrays.

Fixes: 625dcbbc456 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoi965: perf: enable GPA query statistics
Lionel Landwerlin [Fri, 9 Feb 2018 10:56:42 +0000 (10:56 +0000)]
i965: perf: enable GPA query statistics

The combination of GPA/MDAPI components expects a particular name &
layout for their pipeline statistics query.

v2: Limit the query GPA/MDAPI statistics to gen7->9 (Lionel)

v3: Add curly braces (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: add support for raw queries
Lionel Landwerlin [Wed, 7 Mar 2018 14:28:41 +0000 (14:28 +0000)]
i965: perf: add support for raw queries

The INTEL_performance_query extension provides a list of queries that
a user can select to monitor a particular workload. Each query reports
different sets of counters (roughly looking at different parts of the
hardware, i.e. caches/fixed functions/etc...).

Each query has an associated configuration that we need to program
into the hardware before using the query. Up to now, we provided
predefined queries. This change allows the user to build their own query
(and associated configuration) externally, and have the i965 driver
use that configuration through a new query named:

   Intel_Raw_Hardware_Counters_Set_0_Query

When this query is selected, the i965 driver will report raw counter
deltas (meaning their values need to be interpreted by the user, as
opposed to existing queries that provide human readable values).

This change is also useful for debug purposes for building new
pre-defined queries and verifying the underlying numbers make sense
before writing equations for user readable output.

This change's purpose is also to enable GPA. GPA uses a library called
MDAPI that processes raw counter data. MDAPI expects raw data to have
a certain layout (per generation which is a bit unfortunate...). This
change also embeds the expected data layouts.

v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel)

v3: Don't assert on cherryview for gen7... (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: read slice/unslice frequencies from OA reports
Lionel Landwerlin [Wed, 7 Mar 2018 16:02:40 +0000 (16:02 +0000)]
i965: perf: read slice/unslice frequencies from OA reports

v2: Add comment breaking down where the frequency values come from (Ken)

v3: More documentation (Ken/Lionel)
    Adjust clock ratio multiplier to reflect the divider's behavior (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: snapshot RPSTAT register
Lionel Landwerlin [Wed, 7 Mar 2018 10:46:58 +0000 (10:46 +0000)]
i965: perf: snapshot RPSTAT register

This register contains the current/previous frequency of the GT; it's
one of the values GPA would like to have as part of their queries.

v2: Don't use this register on baytrail/cherryview (Ken)
    Use GET_FIELD() macro (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: extract utility functions
Lionel Landwerlin [Tue, 6 Mar 2018 17:09:21 +0000 (17:09 +0000)]
i965: perf: extract utility functions

We would like to reuse a number of the functions and structures in
another file in a future commit.

We also move the previous content of brw_performance_query.h into
brw_performance_query_metrics.h to be included by generated metrics
files.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoac: teach get_ac_sampler_dim() about subpass attachments
Samuel Pitoiset [Mon, 23 Apr 2018 14:55:39 +0000 (16:55 +0200)]
ac: teach get_ac_sampler_dim() about subpass attachments

Suggested by Nicolai.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoac/nir: add missing round_slice for 1D arrays
Samuel Pitoiset [Mon, 23 Apr 2018 12:46:26 +0000 (14:46 +0200)]
ac/nir: add missing round_slice for 1D arrays

This fixes a bunch of CTS fails with 1D arrays:

dEQP-VK.glsl.texture_functions.texture*.sampler1darray_*

Fixes: 625dcbbc456 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agobin/install_megadrivers: rename a few variables to make things clearer
Dylan Baker [Mon, 9 Apr 2018 20:59:55 +0000 (13:59 -0700)]
bin/install_megadrivers: rename a few variables to make things clearer

Originally the "each" variable was just a part of the "drivers"
variable. It's not anymore so it's a bit ambiguous.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agobin/install_megadrivers: fix DESTDIR and -D*-path
Dylan Baker [Mon, 9 Apr 2018 20:53:09 +0000 (13:53 -0700)]
bin/install_megadrivers: fix DESTDIR and -D*-path

This fixes -Ddri-drivers-path, -Dvdpau-libs-path, etc. with DESTDIR when
those paths are absolute. Currently due to the way python's os.path.join
handles absolute paths these will ignore DESTDIR, which is bad. This
fixes them to be relative to DESTDIR if that is set.
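
The join behaviour described above can be reproduced in a few lines. This is a minimal sketch, not the code from bin/install_megadrivers: `under_destdir` is a hypothetical helper, the paths are made up, and `posixpath` is used so the result is the same on any host.

```python
import posixpath


def under_destdir(destdir, path):
    """Hypothetical helper: relocate `path` beneath `destdir` when one is set."""
    if not destdir:
        return path
    if posixpath.isabs(path):
        # Drop the leading '/' so join() cannot discard destdir.
        path = path.lstrip('/')
    return posixpath.join(destdir, path)


# join() throws away earlier components once it meets an absolute one,
# which is why an absolute -D*-path ignored DESTDIR.
naive = posixpath.join('/tmp/stage', '/usr/lib/dri')
fixed = under_destdir('/tmp/stage', '/usr/lib/dri')
```

Here `naive` ends up outside the staging directory, while `fixed` stays beneath it.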

Fixes: 3218056e0eb375eeda470058d06add1532acd6d4
       ("meson: Build i965 and dri stack")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agocompiler/glsl: close fd's in glcpp_test.py
Dylan Baker [Thu, 19 Apr 2018 18:02:32 +0000 (11:02 -0700)]
compiler/glsl: close fd's in glcpp_test.py

I would have thought falling out of scope would allow the gc to collect
these, but apparently it doesn't, and this hits an fd limit on macos.
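
One common way to close descriptors deterministically, rather than relying on the garbage collector, is a `with` block. The sketch below is an assumption about the general technique, not the change made to glcpp_test.py; `demo` and the file contents are illustrative.

```python
import os
import tempfile


def demo():
    """Write and read a temporary file, closing every handle explicitly."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # mkstemp() hands back an open descriptor; close it too
    try:
        with open(path, 'w') as f:
            f.write('#define X 1\n')
        handle = open(path)
        with handle:
            # The file is closed as soon as this block exits.
            text = handle.read()
        return text, handle.closed
    finally:
        os.remove(path)


text, closed = demo()
```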

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106133
Fixes: db8cd8e36771eed98eb638fd0593c978c3da52a9
       ("glcpp/tests: Convert shell scripts to a python script")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
6 years agonir: Do not use progress for unreachable code in return lowering.
Bas Nieuwenhuizen [Sun, 22 Apr 2018 17:05:19 +0000 (19:05 +0200)]
nir: Do not use progress for unreachable code in return lowering.

We seem to use progress for two cases:
1) When we lowered some returns.
2) When we remove unreachable code.

If only case 2 happens, we assert because state->return_flag has not
been allocated yet, but we are still trying to insert all
predicates based on it.

This splits the concerns. We only use progress internally for case 1
and then keep track of 2 in a separate variable to indicate progress
in the return value of the pass.

This is slightly better than transforming the assert into
if (!state->return_flag) return, as the solution in this patch avoids
inserting predicates even if some other part of the might need them.

Fixes: 6e22ad6edc "nir: return early when lowering a return at the end of a function"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106174
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradv: advertise 8 bits of subpixel precision for viewports
Józef Kucia [Tue, 10 Apr 2018 22:11:57 +0000 (00:11 +0200)]
radv: advertise 8 bits of subpixel precision for viewports

This is what radeonsi does.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agost/dri: Fix dangling pointer to a destroyed dri_drawable
Johan Klokkhammer Helsing [Fri, 20 Apr 2018 10:29:16 +0000 (12:29 +0200)]
st/dri: Fix dangling pointer to a destroyed dri_drawable

If an EGLSurface is created, made current and destroyed, and then a second
EGLSurface is created, the second malloc in driCreateNewDrawable may
return the same pointer address the first surface's drawable had.
Consequently, when dri_make_current later tries to determine if it should
update the texture_stamp it compares the surface's drawable pointer against
the drawable in the last call to dri_make_current and assumes it's the same
surface (which it isn't).

When texture_stamp is left unset, dri_st_framebuffer_validate thinks
it has already called update_drawable_info for that drawable, leaving it
unvalidated, and this is when bad things start to happen. In my case it
manifested itself by the width and height of the surface being unset.

This is fixed by setting the pointer to NULL before freeing the
surface.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
Signed-off-by: Johan Klokkhammer Helsing <johan.helsing@qt.io>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
6 years agonv50/ir: make a copy of tex src if it's referenced multiple times
Ilia Mirkin [Tue, 10 Apr 2018 02:19:35 +0000 (22:19 -0400)]
nv50/ir: make a copy of tex src if it's referenced multiple times

For nv50 we coalesce the srcs and defs into a single node. As such, we
can end up with impossible constraints if the source is referenced
after the tex operation (which, due to the coalescing of values, will
have overwritten it).

This logic already exists for inserting moves for MERGE/UNION sources.
It's the exact same idea here, so leverage that code, which also
includes a few optimizations around not extending live ranges
unnecessarily.

Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years agovirgl: disable virgl when no 3D for virtio gpu.
Lepton Wu [Thu, 5 Apr 2018 19:38:48 +0000 (12:38 -0700)]
virgl: disable virgl when no 3D for virtio gpu.

If users are running mesa under an old version of qemu or have turned off
GL at runtime, the virtio gpu driver doesn't actually work. Add detection
here so mesa can fall back to software rendering.

v2:
 - move detection from loader to virgl (Ilia, Emil)

Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: mark const structs as extern in header file to avoid lto damage
Dave Airlie [Fri, 13 Apr 2018 02:40:55 +0000 (12:40 +1000)]
radv: mark const structs as extern in header file to avoid lto damage

The copr repo from che was using LTO and he reported radv broke
recently with it. When testing with lto builds here I noticed
that we weren't seeing any instance extensions reported.

It appears LTO was treating the const without extern as an empty
struct. This is possibly a gcc bug, but we can work around it
just by marking these with extern.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agoBump version after 18.1
Dylan Baker [Sat, 21 Apr 2018 03:31:26 +0000 (20:31 -0700)]
Bump version after 18.1

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agogallium/tests/trivial: fix viewport depth transform
Ilia Mirkin [Thu, 1 Mar 2018 00:40:48 +0000 (19:40 -0500)]
gallium/tests/trivial: fix viewport depth transform

These were getting mapped off into outer space, which would cause nv50
and nvc0 to clip the primitives (as depth_clip was enabled).

These drivers are configured to clip everything outside the [0, 1]
range, even though the hardware supports other view settings.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agotrace: allow image resource to be null
Ilia Mirkin [Tue, 27 Feb 2018 00:26:36 +0000 (19:26 -0500)]
trace: allow image resource to be null

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agonv50/ir/ra: prefer def == src2 for fma with immediates on nvc0
Karol Herbst [Tue, 27 Mar 2018 17:10:34 +0000 (19:10 +0200)]
nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0

This helps with the PostRALoadPropagation pass moving long immediates into
FMA/MAD instructions.

changes in shader-db:
total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
total gprs used in shared programs    : 666558 -> 666563 (0.00%)
total shared used in shared programs  : 520416 -> 520416 (0.00%)
total local used in shared programs   : 53524 -> 53524 (0.00%)
total bytes used in shared programs   : 54006744 -> 53932472 (-0.14%)

                local     shared        gpr       inst      bytes
    helped           0           0           2        4192        4192
      hurt           0           0           7           9           9
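
The percentages in the totals above follow from the before/after counts. A small sketch of that arithmetic, where `pct_change` is a hypothetical helper and the figures are copied from the table:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


inst = pct_change(5894114, 5886074)      # total instructions
size = pct_change(54006744, 53932472)    # total bytes
gprs = pct_change(666558, 666563)        # total gprs
```

Rounded to two decimals, these match the instruction, byte and gpr percentages reported by shader-db.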

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: minor edits to separate nv50 and nvc0+ cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years agodocs/features: mark GL_ARB_post_depth_coverage as DONE for nvc0
Rhys Perry [Sat, 21 Apr 2018 10:43:16 +0000 (11:43 +0100)]
docs/features: mark GL_ARB_post_depth_coverage as DONE for nvc0

This was done a while ago but never marked on features.txt. Note that
this is only supported on GM200+.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years agoautotools: Include new meson files
Dylan Baker [Sat, 21 Apr 2018 01:52:55 +0000 (18:52 -0700)]
autotools: Include new meson files

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoautotools: Add passes.h to sources so it will be included in the tarball
Dylan Baker [Sat, 21 Apr 2018 02:04:01 +0000 (19:04 -0700)]
autotools: Add passes.h to sources so it will be included in the tarball

This was introduced in commit 8f848ada8a42d9aaa8136afa1bafe32281a0fb48
but not added to the sources list, which is necessary for it to be
included in release tarballs.

Fixes: 8f848ada8a42d9aaa8136afa1bafe32281a0fb48
       ("swr/rast: Start refactoring of builder/packetizer.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoautotools: include include/vulkan headers
Dylan Baker [Sat, 21 Apr 2018 01:28:57 +0000 (18:28 -0700)]
autotools: include include/vulkan headers

This is needed to provide vk_android_native_buffer.h for vk_enum_to_str.

v2: - remove accidentally included changes

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agonvc0: fix line width on GM20x+
Rhys Perry [Fri, 20 Apr 2018 22:32:05 +0000 (23:32 +0100)]
nvc0: fix line width on GM20x+

This has the side-effect of fixing polygon-offset piglit test failures.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years agoi965/miptree: Delete an unused function
Nanley Chery [Mon, 9 Apr 2018 18:20:27 +0000 (11:20 -0700)]
i965/miptree: Delete an unused function

We're going to combine ::mcs_buf and ::hiz_buf in later commits. Once
that happens, this function will no longer make sense.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/miptree: Don't leak the clear_color_bo
Nanley Chery [Mon, 9 Apr 2018 18:27:08 +0000 (11:27 -0700)]
i965/miptree: Don't leak the clear_color_bo

Free the clear_color_bo in addition to freeing the
intel_miptree_aux_buffer which holds the reference to it.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965/blorp: Do the gen11 BTI flush
Jason Ekstrand [Tue, 17 Apr 2018 22:07:13 +0000 (15:07 -0700)]
i965/blorp: Do the gen11 BTI flush

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
6 years agoanv/blorp: Do the gen11 BTI flush
Jason Ekstrand [Tue, 17 Apr 2018 22:06:46 +0000 (15:06 -0700)]
anv/blorp: Do the gen11 BTI flush

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
6 years agoetnaviv: fix texture_format_needs_swiz
Lucas Stach [Fri, 20 Apr 2018 12:34:45 +0000 (14:34 +0200)]
etnaviv: fix texture_format_needs_swiz

memcmp returns 0 when both swizzles are the same, which means we don't
need any hardware swizzling. texture_format_needs_swiz should return
true when the return value of the memcmp is non-zero.
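
The inverted test is easy to see in a small model. This is a sketch only: `memcmp` here is a Python stand-in for the C function, and `IDENTITY_SWIZZLE` and `needs_swiz` are hypothetical names with an assumed swizzle encoding, not the etnaviv code.

```python
def memcmp(a, b):
    """C-like memcmp over equal-length byte strings: 0 means identical."""
    for x, y in zip(a, b):
        if x != y:
            return x - y
    return 0


IDENTITY_SWIZZLE = bytes([0, 1, 2, 3])  # assumed encoding of X, Y, Z, W


def needs_swiz(format_swizzle):
    # Swizzling is needed only when the swizzles differ, i.e. when
    # memcmp() returns a non-zero value.
    return memcmp(IDENTITY_SWIZZLE, bytes(format_swizzle)) != 0
```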

Fixes: 751ae6afbefd ("etnaviv: add support for swizzled texture formats")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
6 years agoac/nir: fix image dimension for subpass attachments
Samuel Pitoiset [Fri, 20 Apr 2018 16:06:43 +0000 (18:06 +0200)]
ac/nir: fix image dimension for subpass attachments

For subpass attachments we need one more coordinate with
the layer, so make them array types.

This fixes a bunch of CTS fails with RADV.

Fixes: 24fb3e6aa1 ("ac/nir: use ac_build_image_opcode for image intrinsics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: Mark GTT memory as device local for APUs.
Bas Nieuwenhuizen [Fri, 20 Apr 2018 16:16:02 +0000 (18:16 +0200)]
radv: Mark GTT memory as device local for APUs.

Otherwise a lot of games complain about not having enough memory,
and it is sort of local so this seems reasonable to me.

CC: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv/winsys: allow to submit up to 4 IBs for chips without chaining
Samuel Pitoiset [Fri, 20 Apr 2018 11:42:36 +0000 (13:42 +0200)]
radv/winsys: allow to submit up to 4 IBs for chips without chaining

The SI family doesn't support chaining which means the maximum
size in dwords per CS is limited. When that limit was reached
we failed to submit the CS and the application crashed.

This patch allows submitting up to 4 IBs, which is currently the
limit, but recent amdgpu supports more than that.

Please note that we can reach the limit of 4 IBs per submit
but currently we can't improve that. The only solution is to
upgrade libdrm. That will be improved later but for now this
should fix crashes on SI or when using RADV_DEBUG=noibs.
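
As a rough model of the idea, a command stream that does not fit in one IB can be cut into several, with the submit failing only when more than 4 are needed. The function name, the list-of-dwords representation and the per-IB size are assumptions made for illustration; the winsys code is structured differently.

```python
MAX_IBS_PER_SUBMIT = 4  # limit stated in the commit message


def split_into_ibs(dwords, max_ib_dwords):
    """Cut a command stream into IBs of at most `max_ib_dwords` each."""
    ibs = [dwords[i:i + max_ib_dwords]
           for i in range(0, len(dwords), max_ib_dwords)]
    if len(ibs) > MAX_IBS_PER_SUBMIT:
        raise ValueError("command stream needs more than 4 IBs")
    return ibs


ibs = split_into_ibs(list(range(10)), 4)
```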

Fixes: 36cb5508e89 ("radv/winsys: Fail early on overgrown cs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agogallium/util: Android backtrace support
Stefan Schake [Sun, 15 Apr 2018 22:45:17 +0000 (00:45 +0200)]
gallium/util: Android backtrace support

We can't use any of the existing implementations in u_debug_stack.
Android technically has libunwind, but it's been modified to the point
where it no longer compiles with the Mesa usage. The library is also
not meant to be referenced by vendor libraries. The officially sanctioned
way of obtaining backtraces is through Android's own libbacktrace, a
C++ library. Access it through a separate C++ source file on Android only.

Signed-off-by: Stefan Schake <stschake@gmail.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agogallium/util: Don't stub u_debug_stack on Android
Stefan Schake [Sun, 15 Apr 2018 22:45:16 +0000 (00:45 +0200)]
gallium/util: Don't stub u_debug_stack on Android

The fallback path for no libunwind ends up being stubs for Android.
Don't compile them in so we can provide our own implementation.

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agoac/nir: handle nir_intrinsic_load_first_vertex like base_vertex
Samuel Pitoiset [Fri, 20 Apr 2018 14:58:24 +0000 (16:58 +0200)]
ac/nir: handle nir_intrinsic_load_first_vertex like base_vertex

This fixes a ton of CTS crashes.

Fixes: c366f422f0 ("nir: Offset vertex_id by first_vertex instead of base_vertex")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv/winsys: allow local BOs on APUs
Samuel Pitoiset [Fri, 20 Apr 2018 13:11:24 +0000 (15:11 +0200)]
radv/winsys: allow local BOs on APUs

Ported from RadeonSI.

Local BOs ignore BO priorities, and we don't need those on APUs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: use a global BO list only for VK_EXT_descriptor_indexing
Samuel Pitoiset [Thu, 19 Apr 2018 11:48:33 +0000 (13:48 +0200)]
radv: use a global BO list only for VK_EXT_descriptor_indexing

Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.

We might find a better solution in the future, but for now
just keep two paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoRevert "radv: Don't store buffer references in the descriptor set."
Samuel Pitoiset [Thu, 19 Apr 2018 11:39:17 +0000 (13:39 +0200)]
Revert "radv: Don't store buffer references in the descriptor set."

In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.

One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.

With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.

This reverts commit ab6cadd3ecc7fbdd9079808b407674e0b19c52f0.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoi965/fs: retype offset_reg to UD at load_ssbo
Jose Maria Casanova Crespo [Mon, 19 Mar 2018 14:03:17 +0000 (15:03 +0100)]
i965/fs: retype offset_reg to UD at load_ssbo

All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:

mov(8) vgrf9:UD, vgrf7:D

This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoac/nir: use ac_build_image_opcode for image intrinsics
Nicolai Hähnle [Fri, 20 Apr 2018 07:30:07 +0000 (09:30 +0200)]
ac/nir: use ac_build_image_opcode for image intrinsics

So that we'll use the dimension-aware intrinsics in the future.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: generate image load/store/atomic ops using ac_build_image_opcode
Nicolai Hähnle [Fri, 20 Apr 2018 07:29:57 +0000 (09:29 +0200)]
radeonsi: generate image load/store/atomic ops using ac_build_image_opcode

In preparation of dimension-aware LLVM image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: pass address components individually to ac_build_image_intrinsic
Nicolai Hähnle [Fri, 23 Mar 2018 10:20:24 +0000 (11:20 +0100)]
amd/common: pass address components individually to ac_build_image_intrinsic

This is in preparation for the new image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: pass new enum ac_image_dim to ac_build_image_opcode
Nicolai Hähnle [Fri, 16 Feb 2018 13:21:56 +0000 (14:21 +0100)]
amd/common: pass new enum ac_image_dim to ac_build_image_opcode

This is in preparation for the new, dimension-aware LLVM image
intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: fix crash in test involving the sample mask
Nicolai Hähnle [Wed, 4 Apr 2018 19:14:13 +0000 (21:14 +0200)]
radeonsi/nir: fix crash in test involving the sample mask

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradeonsi/nir: set FS properties only when scanning a fragment shader
Nicolai Hähnle [Mon, 2 Apr 2018 11:20:02 +0000 (13:20 +0200)]
radeonsi/nir: set FS properties only when scanning a fragment shader

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoac/nir: fix atomic compare-and-swap
Nicolai Hähnle [Mon, 2 Apr 2018 12:12:50 +0000 (14:12 +0200)]
ac/nir: fix atomic compare-and-swap

The LLVM instruction returns { i32, i1 }, where the i1 indicates success.
We're only interested in the first part, which is the loaded value.
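
A small model of that return shape, assuming a plain Python list stands in for shared memory; `cmpxchg` and `atomic_comp_swap` are hypothetical names, and nothing here is actually atomic.

```python
def cmpxchg(memory, index, expected, new):
    """Model of a compare-exchange returning (loaded value, success flag)."""
    loaded = memory[index]
    success = loaded == expected
    if success:
        memory[index] = new
    return loaded, success


def atomic_comp_swap(memory, index, expected, new):
    # The shader-level result is only the loaded value, so the
    # success flag is dropped.
    return cmpxchg(memory, index, expected, new)[0]


mem = [7]
first = atomic_comp_swap(mem, 0, 7, 9)   # swap succeeds
second = atomic_comp_swap(mem, 0, 7, 1)  # swap fails, memory unchanged
```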

Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradeonsi: fix error paths of si_texture_transfer_map
Nicolai Hähnle [Tue, 16 Jan 2018 13:38:00 +0000 (14:38 +0100)]
radeonsi: fix error paths of si_texture_transfer_map

trans is zero-initialized, but trans->resource is set up immediately, so
it needs to be dereferenced.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoglsl: prevent spurious Valgrind errors when serializing NIR
Nicolai Hähnle [Fri, 23 Mar 2018 14:43:58 +0000 (15:43 +0100)]
glsl: prevent spurious Valgrind errors when serializing NIR

It looks as if the structure fields array is fully initialized below,
but in fact at least gcc in debug builds will not actually overwrite
the unused bits of bit fields.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoclover: Fix host access validation for sub-buffer creation
Aaron Watry [Sat, 7 Apr 2018 18:44:53 +0000 (13:44 -0500)]
clover: Fix host access validation for sub-buffer creation

  From CL 1.2 Section 5.2.1:
    CL_INVALID_VALUE if buffer was created with CL_MEM_HOST_WRITE_ONLY and
    flags specify CL_MEM_HOST_READ_ONLY, or if buffer was created with
    CL_MEM_HOST_READ_ONLY and flags specify CL_MEM_HOST_WRITE_ONLY, or if
    buffer was created with CL_MEM_HOST_NO_ACCESS and flags specify
    CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.
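
The rule quoted above can be written as a predicate. This is a sketch of the check, not clover's implementation: `sub_buffer_host_flags_valid` is a hypothetical name, and the flag values are the ones I believe cl.h defines.

```python
CL_MEM_HOST_WRITE_ONLY = 1 << 7
CL_MEM_HOST_READ_ONLY = 1 << 8
CL_MEM_HOST_NO_ACCESS = 1 << 9


def sub_buffer_host_flags_valid(parent_flags, flags):
    """Return False where the quoted text requires CL_INVALID_VALUE."""
    if parent_flags & CL_MEM_HOST_WRITE_ONLY and flags & CL_MEM_HOST_READ_ONLY:
        return False
    if parent_flags & CL_MEM_HOST_READ_ONLY and flags & CL_MEM_HOST_WRITE_ONLY:
        return False
    if parent_flags & CL_MEM_HOST_NO_ACCESS and flags & (
            CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_WRITE_ONLY):
        return False
    return True
```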

Fixes CL 1.2 CTS test/api get_buffer_info

v2: Correct host_access_flags check (Francisco)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agonir: Offset vertex_id by first_vertex instead of base_vertex
Neil Roberts [Thu, 25 Jan 2018 18:15:43 +0000 (19:15 +0100)]
nir: Offset vertex_id by first_vertex instead of base_vertex

base_vertex will be zero for non-indexed calls and in that case we
need vertex_id to be offset by the ‘first’ parameter instead. That is
what we get with first_vertex. This is true for both GL and Vulkan.

The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.

v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agospirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX
Neil Roberts [Thu, 25 Jan 2018 18:15:41 +0000 (19:15 +0100)]
spirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX

The base vertex in Vulkan is different from GL in that for non-indexed
primitives the value is taken from the firstVertex parameter instead
of being set to zero. This matches the new SYSTEM_VALUE_FIRST_VERTEX
rather than BASE_VERTEX.

v2 (idr): Add comment describing why SYSTEM_VALUE_FIRST_VERTEX is used
for SpvBuiltInBaseVertex.  Suggested by Jason.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel: Handle firstvertex in an identical way to BaseVertex
Antia Puentes [Thu, 25 Jan 2018 18:15:40 +0000 (19:15 +0100)]
intel: Handle firstvertex in an identical way to BaseVertex

Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.

The Vertex Elements are kept as follows:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>

v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.

6 years agointel/compiler: Add a uses_firstvertex flag
Neil Roberts [Thu, 25 Jan 2018 18:15:39 +0000 (19:15 +0100)]
intel/compiler: Add a uses_firstvertex flag

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agocompiler: Add SYSTEM_VALUE_FIRST_VERTEX and intrinsics
Antia Puentes [Thu, 25 Jan 2018 18:15:38 +0000 (19:15 +0100)]
compiler: Add SYSTEM_VALUE_FIRST_VERTEX and intrinsics

This VS system value will contain the value passed as <basevertex> for
indexed draw calls or the value passed as <first> for non-indexed draw
calls. It can be used to calculate the gl_VertexID as
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX.

From the OpenGL 4.6 spec, 10.4 "Drawing Commands Using Vertex Arrays":

-  Page 352:
"The index of any element transferred to the GL by DrawArraysOneInstance
is referred to as its vertex ID, and may be read by a vertex shader as
gl_VertexID.  The vertex ID of the ith element transferred is first +
i."

- Page 355:
"The index of any element transferred to the GL by
DrawElementsOneInstance is referred to as its vertex ID, and may be read
by a vertex shader as gl_VertexID.  The vertex ID of the ith element
transferred is the sum of basevertex and the value stored in the
currently bound element array buffer at offset indices + i."

Currently the gl_VertexID calculation uses SYSTEM_VALUE_BASE_VERTEX but
this will have to change when the value of gl_BaseVertex is
fixed. Currently its value is broken for non-indexed draw calls because
it must be zero but we are setting it to <first>.
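
The relationship between these values can be sketched with plain integers. The function names are hypothetical and only mirror the wording of the quoted spec text; they are not compiler code.

```python
def first_vertex(indexed, first=0, basevertex=0):
    """Value of the FIRST_VERTEX system value for a draw call."""
    return basevertex if indexed else first


def vertex_ids_draw_arrays(first, count):
    # Non-indexed: the zero-based ID is i, so gl_VertexID is first + i.
    return [i + first_vertex(False, first=first) for i in range(count)]


def vertex_ids_draw_elements(indices, basevertex):
    # Indexed: the zero-based ID is the stored index, offset by basevertex.
    return [idx + first_vertex(True, basevertex=basevertex)
            for idx in indices]


def gl_base_vertex(indexed, basevertex=0):
    # gl_BaseVertex must be zero for non-indexed draws.
    return basevertex if indexed else 0
```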

v2: use SYSTEM_VALUE_FIRST_VERTEX as name for the value, instead of
SYSTEM_VALUE_BASE_VERTEX_ID (Kenneth).

v3 (idr): Rebase on Rob Clark converting nir_intrinsics.h to be
generated.  Reformat commit message to 72 columns.

Reviewed-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>