Ryan Houdek [Mon, 22 Apr 2019 03:41:09 +0000 (20:41 -0700)]
panfrost: Adds Bifrost shader disassembler utility
This code is stable and can live upstream independently while the rest
of the Bifrost stack comes up.
v2: Added a verbose flag to hide away some of the more verbose features
that nobody really needs
[The Bifrost disassembler is written by Connor Abbott, Lyude Paul, and
Ryan Houdek.]
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 24 Apr 2019 02:18:28 +0000 (02:18 +0000)]
panfrost/midgard: Add "op commutes?" property
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 24 Apr 2019 01:15:15 +0000 (01:15 +0000)]
panfrost/midgard: Refactor opcode tables
We create an all-encompassing opcode table for handling name and
properties, removing a number of ad hoc opcode tables which became
brittle and quickly out of date. While we're at it, we fix some
incorrect opcodes relating to ball/bany, and move a small function out
to midgard_compile.c. Together these changes should allow compilation
without warnings, along with helping the codebase health considerably.
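As a rough illustration of the single-table idea (a sketch only: the
opcode numbers, names, and the OP_COMMUTES flag below are hypothetical
and not taken from the real table), one array indexed by opcode can
carry both the name and the property bits:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical property flag; the real table's flags may differ. */
#define OP_COMMUTES (1u << 0)

struct op_info {
    const char *name;   /* NULL marks an opcode with no table entry */
    unsigned props;     /* bitmask of OP_* properties */
};

/* Hypothetical opcode numbering, for illustration only. */
enum { OP_FADD = 0x10, OP_FMUL = 0x14, OP_FSUB = 0x18, OP_COUNT = 0x20 };

/* One table for names and properties, filled with designated initializers. */
static const struct op_info op_table[OP_COUNT] = {
    [OP_FADD] = { "fadd", OP_COMMUTES },
    [OP_FMUL] = { "fmul", OP_COMMUTES },
    [OP_FSUB] = { "fsub", 0 },
};

/* Name lookup with a fallback for out-of-range or unlisted opcodes. */
static const char *op_name(unsigned op)
{
    if (op >= OP_COUNT || op_table[op].name == NULL)
        return "unknown";
    return op_table[op].name;
}

/* Property query reading the same table entry as the name lookup. */
static bool op_commutes(unsigned op)
{
    return op < OP_COUNT && (op_table[op].props & OP_COMMUTES) != 0;
}
```

Since the disassembler and the compiler would both read the same
entries, a name and its properties cannot drift apart the way separate
ad hoc tables can.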
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 22 Apr 2019 04:58:53 +0000 (04:58 +0000)]
panfrost/midgard: Optimize MIR in progress loop
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 22 Apr 2019 04:56:25 +0000 (04:56 +0000)]
panfrost/midgard: Implement copy propagation
Most copy prop should occur at the NIR level, but we generate a fair
number of moves implicitly ourselves, etc... long story short, it's a
net win to also do simple copy prop + DCE on the MIR. As a bonus, this
fixes the weird imov precision bug once and for good, I think.
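For readers unfamiliar with the two passes, here is a minimal sketch on
a toy straight-line SSA program. It is not the MIR implementation: the
toy_ins type and its fields are made up for illustration, and loops
are deliberately out of scope.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_MOV, TOY_ADD, TOY_STORE };

struct toy_ins {
    enum toy_op op;
    int dest;     /* SSA index written, or -1 for none */
    int src[2];   /* SSA indices read, or -1 if unused */
    bool dead;    /* set by toy_dce() */
};

/* Copy propagation: rewrite later uses of a move's destination to read
 * the move's source directly. Valid here because every value is defined
 * exactly once (SSA) and the program is a single basic block. */
static void toy_copy_prop(struct toy_ins *ins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ins[i].op != TOY_MOV)
            continue;
        for (size_t j = i + 1; j < n; j++)
            for (int s = 0; s < 2; s++)
                if (ins[j].src[s] == ins[i].dest)
                    ins[j].src[s] = ins[i].src[0];
    }
}

/* Dead code elimination: walk backwards and mark any instruction whose
 * result is not read by a later live instruction. Stores have side
 * effects and are always kept. Returns the number of dead instructions. */
static size_t toy_dce(struct toy_ins *ins, size_t n)
{
    size_t removed = 0;
    for (size_t i = n; i-- > 0;) {
        if (ins[i].op == TOY_STORE)
            continue;
        bool used = false;
        for (size_t j = i + 1; j < n && !used; j++) {
            if (ins[j].dead)
                continue;
            used = ins[j].src[0] == ins[i].dest ||
                   ins[j].src[1] == ins[i].dest;
        }
        if (!used) {
            ins[i].dead = true;
            removed++;
        }
    }
    return removed;
}
```

After copy propagation the move no longer has any readers, so the DCE
pass is what actually removes it.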
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 22 Apr 2019 03:25:42 +0000 (03:25 +0000)]
panfrost/midgard: Set integer mods
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 22 Apr 2019 03:08:25 +0000 (03:08 +0000)]
panfrost/midgard: Document sign-extension/zero-extension bits (vector)
For floating point ops, these bits determine the "negate?" and "abs?"
modifiers. For integer ops, it turns out they control how sign/zero
extension work, useful for mixing types.
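To show why the choice matters when mixing types, here is a small
sketch of widening a 16-bit source to 32 bits both ways. It does not
model the hardware encoding; toy_extend16 is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Widen a raw 16-bit value to 32 bits.
 * sign_extend = true  : treat bit 15 as a sign bit (int16 -> int32).
 * sign_extend = false : pad with zeros (uint16 -> uint32 range). */
static int32_t toy_extend16(uint16_t raw, bool sign_extend)
{
    if (sign_extend && (raw & 0x8000u))
        return (int32_t)raw - 0x10000;
    return (int32_t)raw;
}
```

The same bit pattern 0xFFFF reads as -1 with sign extension and as
65535 with zero extension.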
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Mon, 22 Apr 2019 02:56:53 +0000 (02:56 +0000)]
panfrost/midgard: Update integer op list
In the future, we might want to switch to a table-based approach, but
for now, at least have it current.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 19:13:27 +0000 (19:13 +0000)]
panfrost/midgard: Remove unused mir_next_block
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 19:12:10 +0000 (19:12 +0000)]
panfrost/midgard: Fix off-by-one in successor analysis
This reduces register pressure substantially since we get smaller
liveness ranges.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 16:22:44 +0000 (16:22 +0000)]
panfrost/midgard: Track loop depth
This fixes nested loops.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 16:11:11 +0000 (16:11 +0000)]
panfrost/midgard: Dead code eliminate MIR
We reshuffle the existing "dead move elimination" pass into a generic
dead code elimination layer, fixing bugs incurred with looping in the
process.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 05:11:37 +0000 (05:11 +0000)]
panfrost: Use actual imov instruction
The bug this worked around is no longer applicable, it seems -- remove
the hack that breaks more than it fixes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 05:11:02 +0000 (05:11 +0000)]
panfrost: Disable indirect outputs for now
The hardware needs this lowered anyway; for now, might as well use
mesa's default lowering for pure conformance reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 04:09:10 +0000 (04:09 +0000)]
panfrost/midgard: imul can only run on *mul
This restriction makes sense logically. Not sure why it wasn't obeyed
before. In conjunction with previous commit's disclaimer, fixes
dEQP-GLES2.functional.shaders.loops.for_dynamic_iterations.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 03:59:05 +0000 (03:59 +0000)]
panfrost/midgard: Don't try to inline constants on branches
Along with a corresponding fix to the move elimination pass (not
included here yet -- I just have it disabled for now), this will fix
dEQP-GLES2.functional.shaders.loops.for_uniform_iterations.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 03:29:47 +0000 (03:29 +0000)]
panfrost: Respect backwards branches in RA
Fixes a bunch of issues with looping. Honestly, I'm not sure why loops
worked at all before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 01:43:08 +0000 (01:43 +0000)]
panfrost/midgard: Remove useless MIR dump
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 21 Apr 2019 00:09:13 +0000 (00:09 +0000)]
panfrost/midgard: Respect component of bcsel condition
Fixes a bunch of non-vec4 indexing.varying_array tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sat, 20 Apr 2019 23:52:42 +0000 (23:52 +0000)]
panfrost/midgard: Implement indirect loads of varyings/UBOs
This adds preliminary support for indirect loads of varying arrays and
uniform arrays, bringing a few new tests in shader.indexing.* to
passing, although there remain a number of cases still missing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sat, 20 Apr 2019 23:39:29 +0000 (23:39 +0000)]
panfrost/midgard: Pipe through varying arrays
Varying arrays sometimes are lowered to a series of directly accessed
varyings (which we handled okay), but when indirectly accessed, they
appear as a single array; we need to handle this as well.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sat, 20 Apr 2019 23:37:14 +0000 (23:37 +0000)]
panfrost/mdg/disasm: Print raw varying_parameters
The semantics of this field are not well understood; it is better to
print it unconditionally along with the other unknown state, rather than
silently eat the value. Without this change, some critical state was
being lost in some shaders (notably, the offset for load/store
scratchpad instructions found in shaders that spill registers.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Kenneth Graunke [Wed, 24 Apr 2019 00:40:09 +0000 (17:40 -0700)]
iris: Prefer staging blits when destination supports CCS_E.
Otherwise our textures don't get color compression. Thanks to
Eero Tamminen for noticing this was missing!
Improves performance of GLB27_FillTestC24Z16 on my Apollolake
laptop with single channel RAM by 2.3x.
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Marek Olšák [Tue, 23 Apr 2019 00:00:10 +0000 (20:00 -0400)]
gallium: replace drm_driver_descriptor::configuration with driconf_xml
PIPE_CAPs are better.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Mon, 22 Apr 2019 21:50:00 +0000 (17:50 -0400)]
gallium: replace DRM_CONF_SHARE_FD with PIPE_CAP_DMABUF
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Mon, 22 Apr 2019 21:35:27 +0000 (17:35 -0400)]
gallium: replace DRM_CONF_THROTTLE with PIPE_CAP_MAX_FRAMES_IN_FLIGHT
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Mon, 22 Apr 2019 21:05:18 +0000 (17:05 -0400)]
st/dri: simplify throttling code
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Fri, 19 Apr 2019 23:11:34 +0000 (19:11 -0400)]
gallium: document conservative rasterization flags
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 19 Apr 2019 00:48:15 +0000 (17:48 -0700)]
intel/compiler: Lower ffma on Gen4 and Gen5
flrp32 is also a 3-source instruction, but there is another pending
series that handles that for Gen4 and Gen5.
v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen
nir_options"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Fri, 19 Apr 2019 20:11:34 +0000 (13:11 -0700)]
intel/compiler: Don't have sepearate, per-Gen nir_options
Instead, just have separate scalar vs. vector nir_options and do
per-Gen "fix ups".
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 18 Apr 2019 19:33:39 +0000 (12:33 -0700)]
glsl: Silence may unused parameter warnings in glsl/ir.h
Every file that included glsl/ir.h had a warning like:
src/compiler/glsl/ir.h: In member function ‘virtual bool ir_rvalue::is_lvalue(const _mesa_glsl_parse_state*) const’:
src/compiler/glsl/ir.h:236:64: warning: unused parameter ‘state’ [-Wunused-parameter]
virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) const
^
Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: fa4ebf6b8d9 ("glsl: add _mesa_glsl_parse_state object to is_lvalue()")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Timothy Arceri [Tue, 23 Apr 2019 02:54:38 +0000 (12:54 +1000)]
st/mesa/radeonsi: fix race between destruction of types and shader compilation
Commit 624789e3708c moved the destruction of types out of atexit() and
made use of a ref count instead. This is useful for avoiding a crash
where drivers such as radeonsi are still compiling in a thread when the app
exits and has not called MakeCurrent to change from the current context.
While the above scenario is technically an app bug we shouldn't crash.
However that change caused another race condition between the shader
compilation thread in radeonsi and context teardown functions.
This patch makes two changes to fix this new problem:
First we explicitly call _mesa_destroy_shader_compiler_types() when destroying
the st context rather than calling it indirectly via _mesa_free_context_data().
We do this as we must call it after st_destroy_context_priv() so that we don't
destroy the glsl types before the compilation threads finish.
Next, we wait for the shader threads to finish in si_destroy_context(). This
also means we need to call context destroy before destroying the queues
in si_destroy_screen().
Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bas Nieuwenhuizen [Tue, 16 Apr 2019 23:51:10 +0000 (01:51 +0200)]
radv: Add adaptive_sync driconfig option and enable it by default.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 16 Apr 2019 23:30:49 +0000 (01:30 +0200)]
vulkan/wsi: Add X11 adaptive sync support based on dri options.
The dri options are optional. When the dri options are not provided
the WSI will not use adaptive sync.
FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.
So then the remaining question is: why do this in the WSI?
It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.
The next question is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthermore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.
Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Bas Nieuwenhuizen [Sun, 14 Apr 2019 22:32:27 +0000 (00:32 +0200)]
radv: Add support for driconf.
This includes 0 options.
The cache parsing is located at a position where we can easily add
config filtering by VkApplicationInfo.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Mike Blumenkrantz [Thu, 18 Apr 2019 17:21:56 +0000 (13:21 -0400)]
iris: add support for INTEL_conservative_rasterization
this hooks up the iris gallium driver to existing mesa bits which handle
the implementation
resolves kwg/mesa#8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mike Blumenkrantz [Thu, 18 Apr 2019 17:18:43 +0000 (13:18 -0400)]
st/mesa: indicate intel extension support for inner_coverage based on cap
if the driver (iris) indicates support for the inner_coverage pipe cap, this
will set the necessary states in the driver flags and rasterizer structs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mike Blumenkrantz [Thu, 18 Apr 2019 17:18:03 +0000 (13:18 -0400)]
gallium: add pipe cap for inner_coverage conservative raster mode
this can be used by drivers which support the extension to indicate support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Fri, 22 Mar 2019 07:42:56 +0000 (00:42 -0700)]
iris: Fix DrawTransformFeedback math when there's a buffer offset
We need to subtract the starting offset from the final offset before
dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142.
Not known to fix anything.
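A minimal sketch of the corrected arithmetic, assuming byte offsets and
a byte stride (xfb_vertex_count and its parameter names are
illustrative, not the driver's):

```c
#include <stdint.h>

/* Number of vertices written by transform feedback: the bytes written
 * (final offset minus the buffer's starting offset) divided by the
 * per-vertex stride. Dividing final_offset alone would overcount
 * whenever the buffer was bound at a non-zero offset. */
static uint32_t xfb_vertex_count(uint32_t final_offset,
                                 uint32_t start_offset,
                                 uint32_t stride)
{
    if (stride == 0 || final_offset < start_offset)
        return 0;
    return (final_offset - start_offset) / stride;
}
```

For example, with a start offset of 256, a stride of 16 and a final
offset of 304, three vertices were written; dividing 304 by 16 without
the subtraction would give 19.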
Kenneth Graunke [Thu, 14 Mar 2019 08:19:59 +0000 (01:19 -0700)]
iris: Make some offset math helpers take a const isl_surf pointer
Caio Marcelo de Oliveira Filho [Mon, 22 Apr 2019 23:09:56 +0000 (16:09 -0700)]
spirv: Handle SpvOpDecorateId
This operation decorates with an Id instead of a Literal or String.
It is used by HlslCounterBufferGOOGLE (provided by
SPV_GOOGLE_hlsl_functionality1). Even if we don't do anything with
that decoration, we must be able to parse SPIR-V that uses it.
Fixes: 891886da2f9 "spirv: Add no-op support for VK_GOOGLE_hlsl_functionality1"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Mon, 22 Apr 2019 23:17:58 +0000 (16:17 -0700)]
spirv: Rename vtn_decoration literals to operands
Decorations (and ExecutionModes) can have not only literals, but also
Ids associated with them. So rename the field to the more general
name "Operand" used by the spec.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Mon, 22 Apr 2019 21:09:11 +0000 (22:09 +0100)]
anv: fix argument name for vkCmdEndQuery
Doesn't fix anything but it's not the right function prototype.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 673f33c77dd765 ("anv: Implement CmdBegin/EndQueryIndexed")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Chia-I Wu [Thu, 18 Apr 2019 22:34:46 +0000 (15:34 -0700)]
virgl: skip empty cmdbufs
Several empty cmdbufs are submitted by app/xserver per frame, from
glamor_block_handler for example. Let's skip them.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Eric Anholt [Fri, 19 Apr 2019 22:19:58 +0000 (15:19 -0700)]
gallium: Remove the malloc pipebuffer manager.
This has been unused since r600 stopped using it in 2010.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>
Eric Anholt [Fri, 19 Apr 2019 22:10:30 +0000 (15:10 -0700)]
gallium: Remove the "alt" pipebuffer manager interface.
This one would allocate from two underlying pools, but has never been
used.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>
Eric Anholt [Fri, 19 Apr 2019 22:08:39 +0000 (15:08 -0700)]
gallium: Remove the ondemand pipebuffer manager.
I couldn't find any uses in the tree since its introduction.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>
Eric Anholt [Fri, 19 Apr 2019 22:02:22 +0000 (15:02 -0700)]
gallium: Remove the pool pipebuffer manager.
Noticed while trying to decide if pipebuffer was of any use to me, and
found that nothing has used it in the last 10 years at least.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>
Jonathan Marek [Mon, 18 Feb 2019 10:15:01 +0000 (11:15 +0100)]
freedreno: a2xx: same gmem2mem sequence for all tiles
Set REG_A2XX_RB_COPY_DEST_OFFSET in the tile init as it won't get touched
by the draw batch. Then gmem2mem is the same for all tiles.
Similar to what is done in a6xx, but only for gmem2mem.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jonathan Marek [Wed, 23 Jan 2019 22:28:20 +0000 (17:28 -0500)]
freedreno: a2xx: enable batch reordering
Batch reordering on a2xx is now tested and functional.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jonathan Marek [Wed, 10 Apr 2019 18:11:26 +0000 (14:11 -0400)]
freedreno: a2xx: use nir_lower_io for TGSI shaders
Allows removing the load_deref/store_deref code in the compiler.
tgsi_to_nir now uses screen instead of options so we can simplify that too.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jonathan Marek [Wed, 10 Apr 2019 17:59:10 +0000 (13:59 -0400)]
freedreno: a2xx: disable PIPE_CAP_PACKED_UNIFORMS
a2xx driver is currently broken when PIPE_CAP_PACKED_UNIFORMS is enabled,
disable it for now.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jonathan Marek [Wed, 10 Apr 2019 17:52:55 +0000 (13:52 -0400)]
freedreno: a2xx: fix builtin blit program compilation
tgsi_to_nir now requires a screen pointer and is used by fd2_prog_init.
fd2_prog_init is used before fd_context_init so set the pointer manually.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jonathan Marek [Wed, 10 Apr 2019 23:24:09 +0000 (19:24 -0400)]
svga: add new ATC formats to the format conversion table
Fixes the static assertion error.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 16:10:46 +0000 (11:10 -0500)]
freedreno: a2xx: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 16:08:33 +0000 (11:08 -0500)]
freedreno: a3xx: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 16:04:52 +0000 (11:04 -0500)]
st/mesa: add ATC support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 16:00:14 +0000 (11:00 -0500)]
llvmpipe, softpipe: no support for ATC textures
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 15:59:42 +0000 (10:59 -0500)]
gallium: add ATC format support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Jonathan Marek [Tue, 5 Feb 2019 16:36:59 +0000 (11:36 -0500)]
mesa: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Thu, 28 Feb 2019 02:13:15 +0000 (21:13 -0500)]
radeonsi: delay adding BOs at the beginning of IBs until the first draw
so that bound compute shader resources won't be added when they are not
needed and same for graphics.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 20:23:01 +0000 (15:23 -0500)]
radeonsi: add helper si_get_minimum_num_gfx_cs_dwords
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 20:03:13 +0000 (15:03 -0500)]
radeonsi: add si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 14 Feb 2019 05:38:37 +0000 (00:38 -0500)]
winsys/amdgpu: clean up and remove nonsensical assertion
The assertion considers max_dw from the current IB in the chain, but
big_ib_buffer is a buffer for the next IB, which can be smaller.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 14 Feb 2019 05:38:05 +0000 (00:38 -0500)]
winsys/amdgpu: enable chaining for compute IBs
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 4 Feb 2019 22:31:11 +0000 (17:31 -0500)]
winsys/amdgpu: reorder chunks, make BO_HANDLES first, IB and FENCE last
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 16 Aug 2018 01:17:06 +0000 (21:17 -0400)]
winsys/amdgpu: make IBs writable and expose their address
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 20:01:18 +0000 (15:01 -0500)]
ac: add REWIND and GDS registers to register headers
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 20:00:53 +0000 (15:00 -0500)]
ac: add ac_get_i1_sgpr_mask
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 17:19:33 +0000 (12:19 -0500)]
ac: add radeon_info::is_pro_graphics
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 12 Feb 2019 17:14:15 +0000 (12:14 -0500)]
ac: add radeon_info::marketing_name, replacing the winsys callback
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 4 Feb 2019 19:31:59 +0000 (14:31 -0500)]
tgsi/scan: add uses_drawid
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Kenneth Graunke [Fri, 5 Apr 2019 18:54:10 +0000 (11:54 -0700)]
iris: Track valid data range and infer unsynchronized mappings.
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data. If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization. At
the very least, it would let us measure the benefit of threading
independently from this optimization. And it's not a lot of code.
Removes most stall avoidance blits in "Total War: WARHAMMER."
On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):
----------------------------------------------
| DiRT Rally | +2% (avg) | + 2% (max) |
| Bioshock Infinite | +3% (avg) | + 9% (max) |
| Shadow of Mordor | +7% (avg) | +20% (max) |
----------------------------------------------
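The valid-range idea can be sketched as follows. This is a simplified
illustration with made-up names (valid_range, write_can_be_unsynchronized),
not the iris code: a write that does not overlap any previously
written bytes can skip synchronization, and every write then grows the
tracked range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) known to hold defined data.
 * The range is empty while start >= end. */
struct valid_range {
    uint32_t start;
    uint32_t end;
};

static void range_init(struct valid_range *r)
{
    r->start = UINT32_MAX;
    r->end = 0;
}

static bool range_intersects(const struct valid_range *r,
                             uint32_t start, uint32_t end)
{
    return start < r->end && end > r->start;
}

static void range_add(struct valid_range *r, uint32_t start, uint32_t end)
{
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}

/* Returns true if a write of `size` bytes at `offset` touches only
 * undefined content, so it may be done without waiting for the GPU.
 * The written bytes become part of the valid range either way. */
static bool write_can_be_unsynchronized(struct valid_range *r,
                                        uint32_t offset, uint32_t size)
{
    bool unsync = !range_intersects(r, offset, offset + size);
    range_add(r, offset, offset + size);
    return unsync;
}
```

Appending at offsets 0, 64, 128, ... never intersects the existing
range, so each append qualifies; rewriting bytes inside the range does
not.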
Kenneth Graunke [Tue, 16 Apr 2019 20:23:06 +0000 (13:23 -0700)]
iris: Make a resource_is_busy() helper
This checks both "is it busy?" and "do we have work queued up for it?"
Kenneth Graunke [Tue, 12 Mar 2019 21:51:22 +0000 (14:51 -0700)]
iris: Replace buffer backing storage and rebind to update addresses.
This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either
of these happen, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.
On my Skylake GT4e at 1920x1080, this improves performance in games:
-----------------------------------------------
| DiRT Rally | +25% (avg) | +17% (max) |
| Bioshock Infinite | +22% (avg) | +11% (max) |
| Shadow of Mordor | +27% (avg) | +83% (max) |
-----------------------------------------------
Kenneth Graunke [Mon, 22 Apr 2019 22:16:49 +0000 (15:16 -0700)]
iris: Make memzone_for_address non-static
I want to use this in iris_resource.c.
Kenneth Graunke [Wed, 17 Apr 2019 06:54:37 +0000 (23:54 -0700)]
iris: Make a gl_shader_stage -> pipe_shader_stage helper function
This is probably not the best place for it, but I don't feel like moving
the one out of the TGSI translator today, and we already have the other
direction here, so...*shrug*
Kenneth Graunke [Mon, 22 Apr 2019 18:27:37 +0000 (11:27 -0700)]
iris: Rework image views to store pipe_image_view.
This will be useful when rebinding images.
Kenneth Graunke [Wed, 17 Apr 2019 06:44:15 +0000 (23:44 -0700)]
iris: Rework UBOs and SSBOs to use pipe_shader_buffer
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
Kenneth Graunke [Wed, 17 Apr 2019 06:01:41 +0000 (23:01 -0700)]
iris: Track bound constant buffers
This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS)
looking to see if any resources are bound.
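One common way to do this kind of tracking is a bitmask with one bit
per slot. The sketch below assumes that approach; the struct, the
16-slot limit, and the helper names are hypothetical, not the driver's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CBUFS 16   /* stand-in for PIPE_MAX_CONSTANT_BUFFERS */

struct toy_shader_state {
    uint32_t bound_cbufs;               /* bit i set => slot i is bound */
    const void *cbuf[TOY_MAX_CBUFS];
};

/* Bind (or unbind, with buf == NULL) a slot and keep the mask in sync. */
static void toy_bind_cbuf(struct toy_shader_state *s, unsigned slot,
                          const void *buf)
{
    s->cbuf[slot] = buf;
    if (buf)
        s->bound_cbufs |= 1u << slot;
    else
        s->bound_cbufs &= ~(1u << slot);
}

/* Index of the lowest set bit; mask must be non-zero. */
static int toy_lowest_set_bit(uint32_t mask)
{
    int i = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Visit only the bound slots instead of scanning every slot. */
static bool toy_is_buffer_bound(const struct toy_shader_state *s,
                                const void *buf)
{
    for (uint32_t mask = s->bound_cbufs; mask; mask &= mask - 1) {
        int slot = toy_lowest_set_bit(mask);
        if (s->cbuf[slot] == buf)
            return true;
    }
    return false;
}
```

When nothing is bound the mask is zero and the loop body never runs.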
Kenneth Graunke [Tue, 23 Apr 2019 02:11:44 +0000 (19:11 -0700)]
iris: Mark constants dirty on transfer unmap even if no flushes occur
I have various conditions in place to try and avoid unnecessary
PIPE_CONTROL flushes, especially to batches which may have never
used the buffer being mapped. But if we do a CPU map to a bound
constant buffer, we still need to mark push constants dirty, even
if there's nothing happening in batches that would warrant a flush.
Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
(lots of rainbow colored triangles). Fixes lots of blinking elements
in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".
Lionel Landwerlin [Wed, 20 Feb 2019 12:50:56 +0000 (12:50 +0000)]
intel: workaround VS fixed function issue on Gen9 GT1 parts
The issue is noticeable in the
dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d
test where a triangle goes missing when we use the maximum number of
URB entries as specified by the documentation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 18 Apr 2019 21:29:03 +0000 (14:29 -0700)]
intel/compiler: Improve fix_3src_operand()
Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will
be handled by opt_combine_constants()). Both are already allowed by
opt_copy_propagation().
Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other
stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think
those occur. This is sufficient to allow a pass added in a future commit
(fs_visitor::lower_linterp) to avoid emitting extra MOV instructions.
I removed the 'src.stride > 1' case because it seems wrong: 3-src
instructions on Gen6-9 are align16-only and can only do stride=1 or
stride=0. A run through Jenkins with an assert(src.stride <= 1) never
triggers, so it seems that it was dead code.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Matt Turner [Thu, 18 Apr 2019 17:11:54 +0000 (10:11 -0700)]
intel/compiler: Add unit tests for sat prop for different exec sizes
The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Matt Turner [Thu, 18 Apr 2019 17:09:08 +0000 (10:09 -0700)]
intel/compiler: Use SIMD16 instructions in fs saturate prop unit test
Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).
The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Rafael Antognolli [Tue, 23 Oct 2018 21:06:33 +0000 (14:06 -0700)]
intel/fs: Remove fs_generator::generate_linterp from gen11+.
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.
v2: Reduce size of the "i" array from 4 to 2 (Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rafael Antognolli [Tue, 23 Oct 2018 16:03:32 +0000 (09:03 -0700)]
intel/fs: Add a lowering pass for linear interpolation.
On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.
This patch adds a lowering pass that does the same thing at the
fs_visitor. It also drops the usage of NF types, since we don't need the
extra precision and it lets us skip the accumulator. With all that, some
optimizations will still be run on the generated code, and we should get
better scheduling.
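As a scalar illustration of what a pair of multiply-adds computes for
one pixel, assuming the usual plane-equation form dx * x + dy * y + c
(the toy_plane layout and helper names are hypothetical, and toy_mad
is an unfused stand-in for the hardware MAD):

```c
/* Stand-in for a multiply-add: a * b + c. */
static float toy_mad(float a, float b, float c)
{
    return a * b + c;
}

/* Hypothetical plane coefficients for one interpolated attribute. */
struct toy_plane {
    float dx;   /* change per unit x */
    float dy;   /* change per unit y */
    float c;    /* value at the origin */
};

/* Linear interpolation expressed as two chained multiply-adds. */
static float toy_linterp(struct toy_plane p, float x, float y)
{
    float t = toy_mad(p.dx, x, p.c);   /* first MAD:  dx * x + c */
    return toy_mad(p.dy, y, t);        /* second MAD: dy * y + t */
}
```

Expressed as ordinary multiply-adds like this, the sequence is visible
to the optimizer and scheduler, which is the benefit the lowering pass
is after.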
v2: Update comment about saturation and conditional mod (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rafael Antognolli [Fri, 19 Oct 2018 22:44:15 +0000 (15:44 -0700)]
intel/fs: Move the scalar-region conversion to the generator.
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.
v2: Better commit message (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rafael Antognolli [Fri, 19 Oct 2018 22:33:50 +0000 (15:33 -0700)]
intel/fs: Only propagate saturation if exec_size is the same.
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.
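The added condition can be sketched like this. The toy_inst type is a
heavy simplification with made-up fields, and the real pass checks
more than is shown here:

```c
#include <stdbool.h>

struct toy_inst {
    bool is_mov;
    bool saturate;
    unsigned exec_size;   /* 8 for SIMD8, 16 for SIMD16 */
    int dst;              /* register written */
    int src;              /* register read */
};

/* Try to fold the saturate of `mov` into the instruction `def` that
 * produced its source. Refuse when the exec sizes differ: a narrower
 * `def` would only saturate part of what `mov` writes. */
static bool toy_propagate_saturate(struct toy_inst *def,
                                   struct toy_inst *mov)
{
    if (!mov->is_mov || !mov->saturate)
        return false;
    if (mov->src != def->dst)
        return false;
    if (def->exec_size != mov->exec_size)
        return false;

    def->saturate = true;
    mov->saturate = false;
    return true;
}
```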
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 22 Apr 2019 16:52:24 +0000 (09:52 -0700)]
i965: Tidy bogus indentation left by previous commit
I left code indented one level too far in the previous commit to make
the diff easier to review. Drop that extra level now.
Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 18 Apr 2019 00:25:29 +0000 (17:25 -0700)]
i965: Ignore uniform storage for samplers or images, use binding info
gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms. i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.
With samplers and images split out of structures, it becomes impossible
to assign meaningful locations. Consider:
struct S {
sampler2D a;
sampler2D b;
} s[2];
The gl_uniform_storage locations for these follow this map:
0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1].
But the new split variables look like:
sampler2D lowered_a[2];
sampler2D lowered_b[2];
and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[]. So, working with
location becomes effectively impossible.
Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields. gl_nir_lower_samplers_as_deref
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.
We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_deref has the opportunity to set proper image
bindings. Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.
Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.
Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 17 Apr 2019 21:48:10 +0000 (14:48 -0700)]
Revert "glsl: Set location on structure-split sampler uniform variables"
This reverts commit
9e0c744f07a21fc7bb018a77cf83b057436d0d1b, which
regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.
Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.
See the next commit for a longer description of the problem.
This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the reverted commit. The next
commit will fix it again.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Fri, 12 Apr 2019 15:12:34 +0000 (11:12 -0400)]
radeonsi: use CP DMA for the null const buffer clear on CIK
This is a workaround for a thread deadlock whose cause I have
not identified.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Danylo Piliaiev [Wed, 17 Apr 2019 11:32:47 +0000 (14:32 +0300)]
drirc: Add workaround for Epic Games Launcher
Epic Games Launcher can be launched in OpenGL mode with the
"-opengl" option. It creates an OpenGL 4.4 core context, but
it still uses deprecated functionality, e.g. the default
vertex buffer object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Kenneth Graunke [Wed, 17 Apr 2019 05:54:40 +0000 (22:54 -0700)]
iris: Track bound and writable SSBOs
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only). We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
Chia-I Wu [Thu, 18 Apr 2019 22:10:22 +0000 (15:10 -0700)]
virgl: clear vertex_array_dirty
Clear vertex_array_dirty after the state is emitted.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Lubomir Rintel [Mon, 11 Mar 2019 20:18:48 +0000 (21:18 +0100)]
gallivm: disable NEON instructions if they are not supported
The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).
On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emission
of NEON instructions.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Lubomir Rintel [Mon, 11 Mar 2019 18:16:40 +0000 (19:16 +0100)]
gallivm: guess CPU features also on ARM
getHostCPUFeatures() is also available on ARM, and has been for even
longer than on x86. Use it -- it potentially enables instructions that
may speed things up.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 19 Apr 2019 05:29:27 +0000 (22:29 -0700)]
iris: Enable the dual_color_blend_by_location driconf option.
This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.
Kenneth Graunke [Fri, 19 Apr 2019 05:13:41 +0000 (22:13 -0700)]
iris: Add mechanism for iris-specific driconf options
Based on Nicolai's
0f8c5de8690e7c87aa2e24383065efaca7e6fe78.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>