Matt Turner [Fri, 26 Sep 2014 00:28:20 +0000 (17:28 -0700)]
glapi: Inline x86_64_current_tls().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Edward O'Callaghan [Tue, 1 Sep 2015 08:38:34 +0000 (18:38 +1000)]
r600g: Simplify out a couple of unnecessary branches
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Marek Olšák [Sun, 30 Aug 2015 16:46:06 +0000 (18:46 +0200)]
radeonsi: use an indirect buffer for init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 16:39:19 +0000 (18:39 +0200)]
radeonsi: add IB2 indirect buffer support for pm4 states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 15:41:23 +0000 (17:41 +0200)]
winsys/radeon: add a flag telling how gfx IBs should be padded
This is always false on amdgpu (set by calloc).
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 15:39:03 +0000 (17:39 +0200)]
winsys/amdgpu: remove IB padding for SI
SI is unsupported by amdgpu
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 12:43:59 +0000 (14:43 +0200)]
radeonsi: remove unused macro si_pm4_set_state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 12:39:54 +0000 (14:39 +0200)]
radeonsi: remove si_pm4_cleanup
All remaining pm4 states are created and destroyed by state trackers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 12:13:10 +0000 (14:13 +0200)]
radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.
This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.
It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 11:17:15 +0000 (13:17 +0200)]
radeonsi: use all built-in border colors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 10:39:45 +0000 (12:39 +0200)]
radeonsi: inline si_cmd_context_control
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 10:35:02 +0000 (12:35 +0200)]
radeonsi: remove unused si_pm4_state code
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 10:25:03 +0000 (12:25 +0200)]
radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 01:56:13 +0000 (03:56 +0200)]
radeonsi: don't send IB dword usage to si_need_cs_space
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 01:53:39 +0000 (03:53 +0200)]
radeonsi: don't set number of IB dwords for states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 01:49:15 +0000 (03:49 +0200)]
radeonsi: don't count IB space for states, just use an upper bound
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 01:17:30 +0000 (03:17 +0200)]
radeonsi: convert SPI state to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 00:04:37 +0000 (02:04 +0200)]
gallium/radeon: rename r600_context_bo_reloc -> radeon_add_to_buffer_list
this name should be easy to understand without other knowledge
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 23:54:00 +0000 (01:54 +0200)]
gallium/radeon: rename write_*_reg functions
e.g. radeon_set_context_reg is nicer and looks consistent next to
radeon_emit().
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 23:35:03 +0000 (01:35 +0200)]
radeonsi: rename and precalculate polygon offset states
one less calloc and state construction while drawing
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 22:50:42 +0000 (00:50 +0200)]
radeonsi: convert CB_TARGET_MASK setup to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 22:16:01 +0000 (00:16 +0200)]
radeonsi: don't set VGT_VTX_CNT_EN twice in init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 15:00:11 +0000 (17:00 +0200)]
radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 13:05:53 +0000 (15:05 +0200)]
radeonsi: convert blend color state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 13:05:53 +0000 (15:05 +0200)]
radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 12:54:58 +0000 (14:54 +0200)]
radeonsi: convert clip state into an atom
Reducing calloc overhead.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 00:32:13 +0000 (02:32 +0200)]
radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.
Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 00:02:29 +0000 (02:02 +0200)]
radeonsi: don't rebind GSVS ring buffers every draw call using GS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 23:56:27 +0000 (01:56 +0200)]
radeonsi: don't clear the tessellation factor ring buffer
Leftover from the bring-up.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 23:45:28 +0000 (01:45 +0200)]
radeonsi: remove the tf_ring state, add the registers to init_config
One less state to worry about.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 23:45:28 +0000 (01:45 +0200)]
radeonsi: remove the gs_rings state, add the registers to init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 22:49:40 +0000 (00:49 +0200)]
radeonsi: use a bitmask for tracking dirty atoms
This mainly removes the cache misses when checking the dirty flags.
Not much else though.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
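The dirty-atom bitmask idea can be sketched roughly as follows. This is a minimal illustration, not the radeonsi code: the names demo_context, demo_mark_dirty and demo_emit_dirty are hypothetical, and the 32-atom limit is an assumption. Each atom owns one bit, so checking for dirty state touches a single word instead of one flag per atom.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ATOMS 32 /* assumed limit, one bit per atom */

/* Hypothetical context: a single word tracks every dirty atom. */
struct demo_context {
    uint32_t dirty_atoms;        /* bit i set => atom i must be re-emitted */
    int emitted[DEMO_MAX_ATOMS]; /* IDs in emit order, for illustration */
    int num_emitted;
};

static void demo_mark_dirty(struct demo_context *ctx, unsigned atom_id)
{
    assert(atom_id < DEMO_MAX_ATOMS);
    ctx->dirty_atoms |= UINT32_C(1) << atom_id;
}

/* Index of the lowest set bit; the mask must be non-zero. */
static unsigned demo_lowest_bit(uint32_t mask)
{
    unsigned i = 0;

    assert(mask != 0);
    while (!(mask & 1)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Visits only the dirty atoms, lowest ID first, then clears the mask.
 * Returns how many atoms were "emitted". */
static int demo_emit_dirty(struct demo_context *ctx)
{
    uint32_t mask = ctx->dirty_atoms;

    ctx->num_emitted = 0;
    while (mask) {
        unsigned id = demo_lowest_bit(mask);

        mask &= mask - 1; /* clear the lowest set bit */
        ctx->emitted[ctx->num_emitted++] = (int)id;
    }
    ctx->dirty_atoms = 0;
    return ctx->num_emitted;
}
```

Marking the same atom dirty twice costs nothing extra, and an empty mask skips the loop entirely.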
Marek Olšák [Fri, 28 Aug 2015 22:03:02 +0000 (00:03 +0200)]
radeonsi: initialize atom IDs for external atoms
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 21:52:47 +0000 (23:52 +0200)]
radeonsi: call si_init_atom for remaining radeonsi atoms
I need to initialize more atom IDs.
This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type for the emit functions, though.)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 21:26:50 +0000 (23:26 +0200)]
radeonsi: initialize atom IDs
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 19:59:22 +0000 (21:59 +0200)]
radeonsi: define the state atom array separately
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 19:48:37 +0000 (21:48 +0200)]
radeonsi: optimize viewport states
same as scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 19:08:49 +0000 (21:08 +0200)]
radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
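The "one packet for consecutive scissors" point can be illustrated with a small sketch, assuming a 32-bit dirty mask with one bit per scissor. The helpers demo_scan_consecutive_range and demo_count_packets are hypothetical stand-ins written for this example, not the driver's functions. Each run of adjacent dirty bits is popped as a (start, count) pair, and one packet per run would then cover the whole range.

```c
#include <assert.h>
#include <stdint.h>

/* Pops the lowest run of consecutive set bits from *mask and reports
 * where it starts and how long it is. The mask must be non-zero. */
static void demo_scan_consecutive_range(uint32_t *mask, int *start, int *count)
{
    uint32_t m = *mask;
    int s = 0, c = 0;

    assert(m != 0);
    while (!((m >> s) & 1))
        s++;
    while (s + c < 32 && ((m >> (s + c)) & 1))
        c++;

    /* Clear bits s .. s+c-1; a full 32-bit run is handled separately
     * to avoid shifting by the type width. */
    if (c == 32)
        *mask = 0;
    else
        *mask = m & ~(((UINT32_C(1) << c) - 1) << s);

    *start = s;
    *count = c;
}

/* Number of packets needed if each consecutive run goes out as one packet. */
static int demo_count_packets(uint32_t dirty_mask)
{
    int packets = 0;

    while (dirty_mask) {
        int start, count;

        demo_scan_consecutive_range(&dirty_mask, &start, &count);
        (void)start;
        (void)count;
        packets++;
    }
    return packets;
}
```

With dirty bits 1, 2, 5, 6 and 7 set, this yields two ranges instead of five separate updates.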
Marek Olšák [Fri, 28 Aug 2015 20:33:02 +0000 (22:33 +0200)]
radeonsi: add SI_MAX_ATTRIBS
PIPE_MAX_ATTRIBS is 32, but we currently only support 16.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 01:44:03 +0000 (03:44 +0200)]
radeonsi: fix memory usage checking for big IBs
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 22:12:03 +0000 (00:12 +0200)]
radeonsi: set all 16 viewport Z bounds for GL 4.1
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sat, 29 Aug 2015 20:59:23 +0000 (22:59 +0200)]
radeonsi: fix a Unigine Heaven hang when drirc is missing
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Sun, 30 Aug 2015 09:59:23 +0000 (11:59 +0200)]
winsys/amdgpu: use small IBs for better performance on VI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Fri, 28 Aug 2015 18:53:08 +0000 (20:53 +0200)]
gallium/util: add u_bit_scan_consecutive_range
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Chris Wilson [Sat, 6 Jun 2015 08:33:33 +0000 (09:33 +0100)]
i965: Prevent coordinate overflow in intel_emit_linear_blit
Fixes regression from
commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Apr 15 03:04:33 2015 -0700
i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.
which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.
Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
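The coordinate arithmetic described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and DEMO_MAX_COORD is an assumed limit chosen for the example rather than a value taken from the commit. The point is that treating a well-aligned transfer as 32bpp pixels divides every x coordinate by 4, which leaves room for the cacheline offset of up to 63 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_COORD 32767u /* assumed coordinate limit, illustration only */

/* Use 4 bytes per pixel when both offsets and the size are 4-byte aligned,
 * otherwise fall back to 1 byte per pixel. */
static unsigned demo_choose_cpp(uint32_t src_offset, uint32_t dst_offset,
                                uint32_t size)
{
    return ((src_offset | dst_offset | size) & 3u) == 0 ? 4u : 1u;
}

/* Right-hand x coordinate, in pixels, of a row that starts skip_bytes
 * into a cacheline and covers row_bytes. */
static uint32_t demo_right_edge(uint32_t skip_bytes, uint32_t row_bytes,
                                unsigned cpp)
{
    assert(cpp == 1 || cpp == 4);
    return (skip_bytes + row_bytes) / cpp;
}

static int demo_fits(uint32_t coord)
{
    return coord <= DEMO_MAX_COORD;
}
```

For a row of 32764 bytes starting 60 bytes into a cacheline, the 8bpp right edge is 32824 and exceeds the assumed limit, while the 32bpp right edge is 8206 and fits.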
Connor Abbott [Fri, 1 May 2015 06:51:12 +0000 (02:51 -0400)]
i965/nir: enable the dead control flow optimization
total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
instructions in affected programs: 3054 -> 2884 (-5.57%)
helped: 29
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
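The percentages in shader-db summaries like the one above are plain relative changes. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Relative change from before to after, in percent.
 * Negative values mean fewer instructions. */
static double demo_percent_change(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

For the affected programs, (2884 - 3054) / 3054 is about -5.57%. The same 170 instructions measured against all 7541551 shared instructions come to roughly -0.002%, which is why the total line shows -0.00%.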
Connor Abbott [Fri, 8 May 2015 18:42:14 +0000 (14:42 -0400)]
nir/dead_cf: add support for removing useless loops
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
checking for values live after the loop to catch const_load
instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Connor Abbott [Fri, 8 May 2015 18:40:58 +0000 (14:40 -0400)]
nir: add a helper for iterating over blocks in a cf node
We were already doing this internally for iterating over a function
implementation, so just expose it directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Fri, 8 May 2015 17:17:10 +0000 (13:17 -0400)]
nir: add nir_block_get_following_loop() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Fri, 8 May 2015 05:44:24 +0000 (01:44 -0400)]
nir/dead_cf: delete code that's unreachable due to jumps
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Fri, 1 May 2015 06:38:17 +0000 (02:38 -0400)]
nir: add an optimization for removing dead control flow
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dave Airlie [Tue, 1 Sep 2015 02:29:58 +0000 (12:29 +1000)]
r600g: fix calculation for gpr allocation
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.
This is totally my fault as well, maths fail 101.
This makes geom shaders on r600 not fail heavily.
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marta Lofstedt [Mon, 24 Aug 2015 11:01:53 +0000 (13:01 +0200)]
mesa: Limit Framebuffer Parameter OpenGL ES 3.1 usage
According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:
"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."
Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
Marta Lofstedt [Tue, 1 Sep 2015 05:19:11 +0000 (08:19 +0300)]
mesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1
V2: Conform to new standard for exposing enums for OpenGL ES 3.1.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Jason Ekstrand [Mon, 31 Aug 2015 23:54:02 +0000 (16:54 -0700)]
nir/builder: Use nir_after_instr to advance the cursor
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node. Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node. By just resetting to after_instr, we
avoid all these problems.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
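The ordering problem described above can be reproduced with a toy model. This is not NIR: demo_block, demo_cursor and demo_insert are hypothetical, and a block is just an array of integers. A cursor that stays at the top of the block reverses the order of inserted instructions, while a cursor that is reset to just after each new instruction keeps them in program order.

```c
#include <assert.h>

#define DEMO_MAX 16

/* Toy block: instructions are plain integers stored in order. */
struct demo_block {
    int instrs[DEMO_MAX];
    int count;
};

/* Toy cursor: the next instruction is inserted at index pos. */
struct demo_cursor {
    struct demo_block *block;
    int pos;
};

/* Inserts instr at the cursor. With advance == 0 the cursor stays put,
 * like a cursor fixed "before the block". With advance != 0 it moves to
 * just after the new instruction, like resetting to "after the
 * instruction just inserted". */
static void demo_insert(struct demo_cursor *cur, int instr, int advance)
{
    struct demo_block *b = cur->block;
    int i;

    assert(b->count < DEMO_MAX);
    assert(cur->pos >= 0 && cur->pos <= b->count);

    for (i = b->count; i > cur->pos; i--)
        b->instrs[i] = b->instrs[i - 1];
    b->instrs[cur->pos] = instr;
    b->count++;

    if (advance)
        cur->pos++;
}
```

Inserting 1, 2, 3 into a block that already holds 100 gives 3, 2, 1, 100 without advancing and 1, 2, 3, 100 with advancing.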
Nanley Chery [Tue, 19 May 2015 19:28:20 +0000 (12:28 -0700)]
i965: advertise ASTC support for Skylake
v2: remove OES ASTC extension reference.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Mon, 31 Aug 2015 23:38:09 +0000 (16:38 -0700)]
mesa/glformats: recognize ASTC formats as color formats
ASTC formats contain RGBA components.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Wed, 12 Aug 2015 21:41:50 +0000 (14:41 -0700)]
mesa/texformat: use format conversion function in _mesa_choose_tex_format
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Wed, 26 Aug 2015 19:01:38 +0000 (12:01 -0700)]
mesa/texcompress: correct mapping of S3TC formats in conversion function
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Dave Airlie [Mon, 31 Aug 2015 04:22:23 +0000 (14:22 +1000)]
r600/sb: update last_cf for finalize if.
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier shader going off end
of memory problems we've stopped.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Matt Turner [Sat, 29 Aug 2015 00:10:00 +0000 (17:10 -0700)]
i965/fs: Use greater-equal cmod to implement maximum.
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods is commutative.
See commit 3b7f683f.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ben Widawsky [Thu, 9 Jul 2015 00:04:10 +0000 (17:04 -0700)]
i965/chv|skl: Apply sampler bypass w/a
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.
This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/
v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
surface state open coded because it seems to go better with the existing code.
v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).
Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dave Airlie [Thu, 27 Aug 2015 01:13:14 +0000 (02:13 +0100)]
st/mesa: move to renumbering registers in a group
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Aug 2015 00:46:33 +0000 (01:46 +0100)]
st/mesa: reduce time spent in calculating temp read/writes
The glsl->tgsi converter does some temporary register reduction;
however, in profiling shader-db this shows up quite highly,
so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Aug 2015 00:01:00 +0000 (01:01 +0100)]
st/mesa: cache tgsi opcode info in the instruction
Instead of looking this up lots, let's just cache it in the instruction
translation up front. I just noticed this function was high in a profile
of shader-db on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 30 Aug 2015 10:40:31 +0000 (20:40 +1000)]
r600: move prim convert from geom shader to function.
This should avoid C++ failures when including this header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Sun, 9 Aug 2015 06:25:50 +0000 (16:25 +1000)]
glsl: remove special case subroutine type counting
Unlike samplers we can get the correct value for subroutines from
component_slots()
Reviewed-by: Dave Airlie <airlied@redhat.com>
Edward O'Callaghan [Sat, 29 Aug 2015 08:31:09 +0000 (18:31 +1000)]
r600g: Use TGSI parse results instead of manually exfiltrating
This makes better use of the work that the TGSI API has done for
us.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Edward O'Callaghan [Sat, 29 Aug 2015 08:31:08 +0000 (18:31 +1000)]
r600g: Set geometry properties in r600_create_shader_state()
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said properties in
the selector as read-only within r600_shader_from_tgsi().
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Edward O'Callaghan [Sat, 29 Aug 2015 08:31:07 +0000 (18:31 +1000)]
r600g: Move geometry properties state from shader to selector
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Edward O'Callaghan [Sat, 29 Aug 2015 08:31:06 +0000 (18:31 +1000)]
r600g: Remove dead assignment to 'gs_input_prim' in shader state
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 25 Aug 2015 17:21:38 +0000 (19:21 +0200)]
radeonsi: don't use the emit qt keyword in si_init_atom
It confuses my editor.
Marek Olšák [Sun, 23 Aug 2015 11:05:53 +0000 (13:05 +0200)]
radeonsi: remove no-op 32-bit masking
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Sun, 23 Aug 2015 10:57:09 +0000 (12:57 +0200)]
gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Sat, 22 Aug 2015 16:05:37 +0000 (18:05 +0200)]
winsys/radeon: handle non-zero finite timeout when waiting for buffers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Ilia Mirkin [Wed, 26 Aug 2015 04:11:23 +0000 (00:11 -0400)]
freedreno/a3xx: implement half-z clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 25 Aug 2015 03:31:00 +0000 (23:31 -0400)]
freedreno/a3xx: add basic clip plane support
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Samuel Pitoiset [Sat, 29 Aug 2015 08:58:49 +0000 (10:58 +0200)]
nvc0: change prefix of MP performance counters to HW_SM
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 28 Aug 2015 17:09:33 +0000 (19:09 +0200)]
nvc0: sort performance counter queries by name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 28 Aug 2015 16:41:16 +0000 (18:41 +0200)]
nvc0: make names of performance counter queries consistent
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 28 Aug 2015 16:30:13 +0000 (18:30 +0200)]
nvc0: use enumerations for driver queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 28 Aug 2015 16:15:13 +0000 (18:15 +0200)]
nvc0: remove commented out code related to PCOUNTER queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Dave Airlie [Fri, 28 Aug 2015 00:46:10 +0000 (10:46 +1000)]
r600: port si_conv_prim_to_gs_out from radeonsi
This code was broken by the tess merge, and I totally missed it
until now. I'm not sure this fixes anything but it stops the assert.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Aug 2015 23:58:15 +0000 (09:58 +1000)]
r600g: use PRIi64 for some compute debug printfs
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Aug 2015 23:57:04 +0000 (09:57 +1000)]
gallium/util: fix debug_get_flags_option on 32-bit
On 32-bit we need to use PRIu64 flags for printfs;
otherwise this segfaults in R600_DEBUG=help.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
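A minimal sketch of the portable pattern, assuming a 64-bit flag value printed next to a string. The helper demo_format_flags is hypothetical. The PRIu64 macro from <inttypes.h> expands to the correct length modifier for uint64_t on the target, so the variadic arguments are read with the right width on both 32-bit and 64-bit builds. A mismatched specifier such as "%lu" can make printf misread the following arguments on 32-bit.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats "name = value" into buf using the portable 64-bit specifier.
 * Returns what snprintf returns: the length the full string needs,
 * excluding the terminating NUL. */
static int demo_format_flags(char *buf, size_t size, const char *name,
                             uint64_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s = %" PRIu64, name, value);
}
```

The same applies to signed values with PRIi64 or PRId64 and to hexadecimal output with PRIx64.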
Ilia Mirkin [Fri, 21 Aug 2015 01:55:52 +0000 (21:55 -0400)]
glsl: provide the option of using BFE for unpack builtin lowering
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
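What a bitfield extract buys for the snorm case can be sketched in C. This is a hedged illustration, not the GLSL lowering code: the demo_bfe_u, demo_bfe_i and demo_unpack_snorm8 helpers are hypothetical. A signed extract returns the field already sign-extended, so no separate left-shift/right-shift pair is needed, and each component costs one extract instead of a shift plus a mask.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bitfield extract: bits [offset, offset + bits) of value. */
static uint32_t demo_bfe_u(uint32_t value, unsigned offset, unsigned bits)
{
    assert(bits >= 1 && bits <= 31 && offset + bits <= 32);
    return (value >> offset) & ((UINT32_C(1) << bits) - 1);
}

/* Signed bitfield extract: same field, sign-extended to 32 bits.
 * (field ^ sign) - sign extends the sign without relying on an
 * arithmetic right shift of a negative value. */
static int32_t demo_bfe_i(uint32_t value, unsigned offset, unsigned bits)
{
    uint32_t field = demo_bfe_u(value, offset, bits);
    uint32_t sign = UINT32_C(1) << (bits - 1);

    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* One component of an unpackSnorm4x8-style conversion: component 0 lives
 * in the least significant byte, and the result is clamped to -1.0. */
static float demo_unpack_snorm8(uint32_t packed, unsigned index)
{
    float f;

    assert(index < 4);
    f = (float)demo_bfe_i(packed, 8 * index, 8) / 127.0f;
    return f < -1.0f ? -1.0f : f;
}
```

On hardware with a native signed BFE instruction, demo_bfe_i maps to a single operation.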
Ilia Mirkin [Fri, 21 Aug 2015 00:52:32 +0000 (20:52 -0400)]
glsl: use bitfield_insert instead of and + shift + or for packing
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
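The packing side can be sketched the same way. This is an assumption-laden illustration: demo_bitfield_insert mimics the semantics of a bitfield insert for field widths below 32, and demo_pack_4x8 is a hypothetical stand-in for a packing builtin. Each component is placed with one insert rather than a separate mask, shift and or.

```c
#include <assert.h>
#include <stdint.h>

/* Replaces bits [offset, offset + bits) of base with the low bits of insert. */
static uint32_t demo_bitfield_insert(uint32_t base, uint32_t insert,
                                     unsigned offset, unsigned bits)
{
    uint32_t mask;

    assert(bits >= 1 && bits <= 31 && offset + bits <= 32);
    mask = ((UINT32_C(1) << bits) - 1) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Packs four 8-bit components, component 0 in the least significant byte. */
static uint32_t demo_pack_4x8(const uint8_t c[4])
{
    uint32_t r = c[0];

    r = demo_bitfield_insert(r, c[1], 8, 8);
    r = demo_bitfield_insert(r, c[2], 16, 8);
    r = demo_bitfield_insert(r, c[3], 24, 8);
    return r;
}
```

The result matches the and + shift + or formulation; the difference is only in how many operations a backend with a native insert instruction has to emit.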
Matt Turner [Mon, 17 Aug 2015 21:38:31 +0000 (14:38 -0700)]
i965/fs: Remove fs_visitor::try_replace_with_sel().
No shader-db changes on g4x, snb, hsw, or bdw.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 28 Aug 2015 01:30:34 +0000 (18:30 -0700)]
i965/fs: Replace awful variable names.
start_to -> dst_start
end_to -> dst_end
start_from -> src_start
end_from -> src_end
var_to -> dst_var
var_from -> src_var
reg_to -> dst_reg
reg_to_offset -> dst_reg_offset
reg_from -> src_reg
Not sure how these made sense to me before.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 19 Aug 2015 00:47:00 +0000 (17:47 -0700)]
i965/fs: Skip blocks in register coalescing interference check.
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 17 Aug 2015 23:03:27 +0000 (16:03 -0700)]
i965/fs: Improve register coalescing interference check.
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.
The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.
With that fixed, we can calculate interference more accurately.
total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs: 346282 -> 342880 (-0.98%)
helped: 1552
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 19 Aug 2015 00:10:44 +0000 (17:10 -0700)]
i965/fs: Use overwrites_reg() instead of dst.equals().
equals() returns false for registers with different types, so using it
isn't appropriate to determine whether an instruction is overwriting a register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 18 Aug 2015 21:28:03 +0000 (14:28 -0700)]
i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.
Noticed when debugging things that lead to the next patch.
On G45 (and presumably ILK) this helps register coalescing:
total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marta Lofstedt [Fri, 28 Aug 2015 08:22:41 +0000 (10:22 +0200)]
i965/fs: Do not set the size for zero-size uniforms
Zero sized uniforms can exist in the list, but they don't get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them. This was previously fixed in
commit 781dc7c0e1f41502f18e07c0940af949a78d2792.
However, commit 259f7291de2387aa3ac5f856b39b7b934a1d8e7d
removed the fix.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Daniel Scharrer [Fri, 28 Aug 2015 09:45:36 +0000 (11:45 +0200)]
mesa: return old name for deleted samplers for SAMPLER_BINDING queries
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should be returned. None of the other binding point
queries check for deletion in another context.
Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Daniel Scharrer [Fri, 28 Aug 2015 09:45:35 +0000 (11:45 +0200)]
mesa: add missing queries for ARB_direct_state_access
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as texture queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Neil Roberts [Fri, 28 Aug 2015 13:29:22 +0000 (14:29 +0100)]
docs: Fix a typo in GL3.txt concerning GL_KHR_context_flush_control
Ilia Mirkin [Fri, 28 Aug 2015 06:50:25 +0000 (02:50 -0400)]
mesa: fix dispatch sanity with GL_OES_texture_storage_multisample_2d_array
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91785
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Matt Turner <mattst88@gmail.com>
Vinson Lee [Tue, 21 Jul 2015 21:02:01 +0000 (14:02 -0700)]
ABI-check: Use more portable bash invocation.
Fixes 'make check' on FreeBSD.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Boyan Ding [Fri, 21 Aug 2015 13:42:45 +0000 (21:42 +0800)]
i965/nir: Make use of nir_opt_undef
Shader-db result on Ivy Bridge:
total instructions in shared programs: 145484 -> 145445 (-0.03%)
instructions in affected programs: 225 -> 186 (-17.33%)
helped: 5
HURT: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>