Marek Olšák [Wed, 23 May 2018 04:34:38 +0000 (00:34 -0400)]
radeonsi: fix passing gl_ClipVertex for GS and tess
Also add the fprintf call.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 23 May 2018 04:02:10 +0000 (00:02 -0400)]
radeonsi: fix color inputs/outputs for GS and tess
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 23 May 2018 04:16:25 +0000 (00:16 -0400)]
radeonsi: fix incorrect parentheses around VS-PS varying elimination
I don't know if it caused issues.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 23 May 2018 18:41:25 +0000 (14:41 -0400)]
st/mesa: simplify lastLevel determination in st_finalize_texture
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.
Roughly based on i965 commit
845ad2667ab2466752f06ea30bdb9c837116c308
which does a similar thing but for a different reason.
This fixes GL CTS assertion failures introduced by Ilia.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Scott D Phillips [Mon, 30 Apr 2018 17:25:48 +0000 (10:25 -0700)]
i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Alok Hota [Fri, 25 May 2018 15:19:49 +0000 (10:19 -0500)]
swr/rast: Adjusted avx512 primitive assembly for msvc codegen
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:48 +0000 (10:19 -0500)]
swr/rast: Moved memory init out of core swr init
Added two new files for a wrapper function for initialization
v2: added missing include for single architecture builds
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:47 +0000 (10:19 -0500)]
swr/rast: Removed superfluous JitManager argument from passes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:46 +0000 (10:19 -0500)]
swr/rast: Renamed MetaData calls
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:45 +0000 (10:19 -0500)]
swr/rast: Use metadata to communicate between passes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:44 +0000 (10:19 -0500)]
swr/rast: Check gCoreBuckets/CORE_BUCKETS equal length at compile time
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Fri, 25 May 2018 15:19:43 +0000 (10:19 -0500)]
swr/rast: Added in-place building to SCATTERPS
SCATTERPS previously assumed it was being used with an existing basic
block
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Samuel Pitoiset [Thu, 24 May 2018 20:55:54 +0000 (22:55 +0200)]
radv: run the EarlyCSEMemSSA LLVM pass
It's recommended by the instruction combining pass, and
RadeonSI also runs it.
This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.
Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size:
22560328 ->
22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)
Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size:
22479428 ->
22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 24 May 2018 11:09:14 +0000 (13:09 +0200)]
radv: fix dumping compute shader on the graphics queue
The graphics pipeline can be NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 24 May 2018 11:09:13 +0000 (13:09 +0200)]
radv: add radv_dump_pipeline_state() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 24 May 2018 11:09:12 +0000 (13:09 +0200)]
radv: rework how shaders are dumped when generating a hang report
Use a flag for the active stages instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 24 May 2018 11:09:11 +0000 (13:09 +0200)]
radv: remove unused parameter in radv_dump_annotated_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jose Dapena Paz [Thu, 24 May 2018 17:56:24 +0000 (19:56 +0200)]
mesa: do not leak ctx->Shader.ReferencedProgram references
When glUseProgram is used, references to the included shaders are
added in ctx->Shader.ReferencedProgram. But those references are not
decreased when the shader data is deallocated. Thus, those shaders
are leaked.
Explicitely remove the pending references to these shaders.
Fixes: e6506b3cd23 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Wed, 16 May 2018 02:04:20 +0000 (22:04 -0400)]
radeonsi: set DB_EQAA.MAX_ANCHOR_SAMPLES correctly
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 16 May 2018 02:03:40 +0000 (22:03 -0400)]
radeonsi: round ps_iter_samples in set_min_samples
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 16 May 2018 01:51:07 +0000 (21:51 -0400)]
radeonsi: remove redundant ps_iter_samples clamp
si_get_ps_iter_samples already does this.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 15 May 2018 21:19:33 +0000 (17:19 -0400)]
radeonsi: remove some old gfx 9.x registers
Leftover from bring up.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 3 May 2018 01:03:44 +0000 (21:03 -0400)]
radeonsi: disable primitive binning for all blitter ops
same as amdvlk.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 1 May 2018 18:34:19 +0000 (14:34 -0400)]
ac/surface/gfx6: don't overallocate mipmapped HTILE
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eric Engestrom [Thu, 17 May 2018 15:16:34 +0000 (16:16 +0100)]
egl/x11: deduplicate depth-to-format logic
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tapani Pälli [Wed, 9 May 2018 06:12:33 +0000 (09:12 +0300)]
i965: enable OES_texture_view for gen8+
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Wed, 9 May 2018 06:12:32 +0000 (09:12 +0300)]
mesa: changes to expose OES_texture_view extension
Functionality already covered by ARB_texture_view, patch also
adds missing 'gles guard' for enums (added in
f1563e6392).
Tested via arb_texture_view.*_gles3 tests and individual app
utilizing texture view with ETC2.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Juan A. Suarez Romero [Wed, 23 May 2018 13:57:20 +0000 (15:57 +0200)]
docs: update release calendar for 18.1 series
v2: extend 18.1 series (Andres)
v3: fix copy/paste typo (Engestrom)
CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Samuel Pitoiset [Wed, 23 May 2018 12:31:56 +0000 (14:31 +0200)]
radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS
Do not lower FS inputs because this moves all load_var
instructions at beginning of shaders and because
interp_var_at_sample (and friends) seem broken. That might
be eventually enabled later on if we really want to preload
all FS inputs at beginning.
Polaris10:
Totals from affected shaders:
SGPRS: 54072 -> 54264 (0.36 %)
VGPRS: 38580 -> 38124 (-1.18 %)
Spilled SGPRs: 652 -> 652 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size:
2128116 ->
2127380 (-0.03 %) bytes
Max Waves: 8048 -> 8086 (0.47 %)
Vega10:
Totals from affected shaders:
SGPRS: 52616 -> 52656 (0.08 %)
VGPRS: 37536 -> 37116 (-1.12 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Code Size:
2043756 ->
2042672 (-0.05 %) bytes
Max Waves: 9176 -> 9254 (0.85 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 23 May 2018 12:31:55 +0000 (14:31 +0200)]
radv: call nir_split_var_copies() before nir_lower_var_copies()
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Francisco Jerez [Fri, 16 Mar 2018 21:28:59 +0000 (14:28 -0700)]
i965: Use intel_bufferobj_buffer() wrapper in image surface state setup.
Instead of directly using intel_obj->buffer. Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take. Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <nanleychery@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Francisco Jerez [Fri, 16 Mar 2018 21:35:10 +0000 (14:35 -0700)]
i965: Handle non-zero texture buffer offsets in buffer object range calculation.
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer. Found by
inspection.
v2: Protect against out-of-range BufferOffset (Nanley).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Francisco Jerez [Fri, 16 Mar 2018 20:06:26 +0000 (13:06 -0700)]
i965: Move buffer texture size calculation into a common helper function.
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way. E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly. It's easier to fix it all in one
place.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Francisco Jerez [Fri, 16 Mar 2018 20:43:27 +0000 (13:43 -0700)]
Revert "mesa: simplify _mesa_is_image_unit_valid for buffers"
This reverts commit
c0ed52f6146c7e24e1275451773bd47c1eda3145. It was
preventing the image format validation from being done on buffer
textures, which is required to ensure that the application doesn't
attempt to bind a buffer texture with an internal format incompatible
with the image unit format (e.g. of different texel size), which is
not allowed by the spec (it's not allowed for *any* texture target,
whether or not there is spec wording restricting this behavior
specifically for buffer textures) and will cause the driver to
calculate texel bounds incorrectly and potentially crash instead of
the expected behavior.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106465
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Bas Nieuwenhuizen [Wed, 23 May 2018 09:34:15 +0000 (11:34 +0200)]
ac: Use DPP for build_ddxy where possible.
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.
This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.
v2: Use ac_build_quad_swizzle.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Miguel Casas [Mon, 7 May 2018 15:45:21 +0000 (11:45 -0400)]
i965: add {X,A}BGR2101010 to 'intel_image_formats'
This patch adds {X,A}BGR2101010 entries to the list of supported
'intel_image_formats'.
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Miguel Casas [Mon, 7 May 2018 15:45:20 +0000 (11:45 -0400)]
dri_util: Add R10G10B10{A,X}2 translation between DRI and mesa_format.
Add R10G10B10{A,X}2 translation between mesa_format and DRI format
to driGLFormatToImageFormat() and driImageFormatToGLFormat().
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Dylan Baker [Mon, 21 May 2018 17:30:42 +0000 (10:30 -0700)]
bin/get-pick-listh.sh: force git --pretty=medium
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Dylan Baker [Mon, 21 May 2018 17:28:34 +0000 (10:28 -0700)]
bin/bugzilla_mesa.sh: explicitly set the --pretty argument
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Eric Engestrom [Wed, 23 May 2018 11:47:33 +0000 (12:47 +0100)]
docs: drop unnecessary out-of-frame target
I'm guessing an earlier version of the website used to have the page
contents in <frames>, but this isn't the case anymore so just drop the
unnecessary `target="_main"` :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Engestrom [Wed, 23 May 2018 11:46:44 +0000 (12:46 +0100)]
docs: fix various html tags mistakes
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Engestrom [Wed, 23 May 2018 11:46:00 +0000 (12:46 +0100)]
docs: fix `<` & `>` used in html code
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Juan A. Suarez Romero [Tue, 22 May 2018 07:33:19 +0000 (09:33 +0200)]
docs: add news notes to 18.1.0
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Dave Airlie [Thu, 10 May 2018 00:01:58 +0000 (01:01 +0100)]
tgsi/scan: add hw atomic to the list of memory accessing files
This fixes 4 out of 5 cases in:
arb_framebuffer_no_attachments-atomic on cayman.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
Roland Scheidegger [Tue, 22 May 2018 00:12:38 +0000 (02:12 +0200)]
llvmpipe: improve rasterization discard logic
This unifies the explicit rasterization discard as well as the implicit
rasterization disabled logic (which we need for another state tracker),
which really should do the exact same thing.
We'll now toss out the prims early on in setup with (implicit or
explicit) discard, rather than do setup and binning with them, which
was entirely pointless.
(We should eventually get rid of implicit discard, which should also
enable us to discard stuff already in draw, hence draw would be
able to skip the pointless clip and fallback stages in this case.)
We still need separate logic for only null ps - this is not the same
as rasterization discard. But simplify the logic there and don't count
primitives simply when there's an empty fs, regardless of depth/stencil
tests, which seems perfectly acceptable by d3d10.
While here, also fix statistics for primitives if face culling is
enabled.
No piglit changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Bas Nieuwenhuizen [Mon, 21 May 2018 13:43:19 +0000 (15:43 +0200)]
ac/surface/gfx6: Don't force a tile index for fmask.
The bpe of the fmask often differs from the bpe of the main
surface. On SI that means it has to get a different tile
index.
addrlib is capable of figuring this out itself, so just pass
-1 instead to let it know that it is not preset.
Fixes: 9bf3570fed0 "ac/surface/gfx6: compute FMASK together with the color surface"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Sat, 12 May 2018 01:16:48 +0000 (18:16 -0700)]
i965: Remove ring switching entirely
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 19:34:44 +0000 (12:34 -0700)]
i965/miptree: Move the access_raw call to the individual map functions
The only function that doesn't need to call access_raw is map_blit. If
it takes the blitter path, it will happen as part of intel_miptree_copy.
If map_blit takes the blorp path, brw_blorp_copy_miptrees will handle
doing whatever resolves are needed. This should save us resolves in
quite a few cases and will probably help performance a bit.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 17:09:59 +0000 (10:09 -0700)]
i965: Remove support for the BLT ring
We still support the blitter on gen4-5 but it's on the same ring as 3D.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 19:29:07 +0000 (12:29 -0700)]
i965/miptree: Use blorp for blit maps on gen6+
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 19:21:38 +0000 (12:21 -0700)]
i965/miptree: Use blorp for validation tex copies on gen6+
It's faster than the blitter and can handle things like stencil properly
so it doesn't require software fallbacks.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 18:19:06 +0000 (11:19 -0700)]
i965: Delete the blitter path for CopyTexSubImage
The blorp path (called first) can do anything the blitter path can do so
it's just dead code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 17:30:16 +0000 (10:30 -0700)]
i965: Don't fall back to the blitter in BlitFramebuffer
On gen4-5, we try the blitter before we even try blorp. On newer
platforms, blorp can do everything the blitter can so there's no point
in even having the blitter fall-back path.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 17:29:28 +0000 (10:29 -0700)]
i965: Remove some unused includes of intel_blit.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 18:49:26 +0000 (11:49 -0700)]
i965/blit: Delete intel_emit_linear_blit
This function is no longer used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 11 May 2018 23:22:47 +0000 (16:22 -0700)]
i965: Use meta for pixel ops on gen6+
Using meta for anything is fairly aweful and definitely has more CPU
overhead. However, it also uses the 3D pipe and is therefore likely
faster in terms of GPU time than the blitter. Also, the blitter code
has so many early returns that it's probably not buying us that much.
We may as well just use meta all the time instead of working over-time
to find the tiny case where we can use the blitter. We keep gen4-5
using the old blit paths to avoid perturbing old hardware too much.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Mon, 9 Apr 2018 22:39:56 +0000 (15:39 -0700)]
i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
We'd like to start using soft-pin to assign BO addresses up front, and
never move them again. Our previous plan for dealing with 48-bit VF
cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
have addresses that alias in the low 32 bits. But that requires moving
buffers dynamically.
This patch tracks the last seen BO address for each vertex/index buffer,
and emits a VF cache invalidate if the high bits change. (Ideally, we
won't hit this case very often.) This should work for the soft-pin
case, but unfortunately won't work in the relocation case, as we don't
actually know the addresses. So, we have to use both methods.
v2: Mention that the cache uses a <VertexBufferIndex, Address> tuple
more explicitly (suggested by Scott). Mention "single batch" too
(suggested by Chris).
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Kenneth Graunke [Mon, 9 Apr 2018 23:47:11 +0000 (16:47 -0700)]
i965: Introduce a "memory zone" concept on BO allocation.
We're planning to start managing the PPGTT in userspace in the near
future, rather than relying on the kernel to assign addresses. While
most buffers can go anywhere, some need to be restricted to within 4GB
of a base address.
This commit adds a "memory zone" parameter to the BO allocation
functions, which lets the caller specify which base address the BO will
be associated with, or BRW_MEMZONE_OTHER for the full 48-bit VMA.
Eventually, I hope to create a 4GB memory zone corresponding to each
state base address.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sat, 19 May 2018 03:04:12 +0000 (20:04 -0700)]
intel/eu: Set EXECUTE_1 when setting the rounding mode in cr0
Fixes: d6cd14f2131a5b "i965/fs: Define new shader opcode to..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Michel Dänzer [Tue, 8 May 2018 09:51:09 +0000 (11:51 +0200)]
dri3: Stricter SBC wraparound handling
Prevents corrupting the upper 32 bits of draw->recv_sbc when
draw->send_sbc resets to 0 (which currently happens when the window is
unbound from a context and bound to one again), which in turn caused
loader_dri3_swap_buffers_msc to calculate target_msc with corrupted
upper 32 bits. This resulted in hangs with the Xorg modesetting driver
as of xserver 1.20 (older versions and other drivers ignored the upper
32 bits of the target MSC, which is why this wasn't noticed earlier).
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/106351
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Samuel Pitoiset [Mon, 21 May 2018 14:57:54 +0000 (16:57 +0200)]
radv: fix computation of user sgprs for 32-bit pointers
With 32-bit pointers we only need one user SGPR per desc set.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 21 May 2018 14:57:53 +0000 (16:57 +0200)]
radv: drop user_sgpr_info::sgpr_count
It's only used inside allocate_user_sgprs().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 May 2018 15:40:47 +0000 (17:40 +0200)]
radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.
Vega10:
Totals from affected shaders:
SGPRS:
1008722 ->
1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size:
34819208 ->
35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)
Polaris10:
Totals from affected shaders:
SGPRS:
1029849 ->
1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size:
34915336 ->
35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 18 May 2018 08:57:02 +0000 (10:57 +0200)]
radv: add set_loc_shader_ptr() helper
This helper will hep for switching to 32-bit GPU pointers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 May 2018 15:32:57 +0000 (17:32 +0200)]
radv: allocate descriptor BOs in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 May 2018 15:32:38 +0000 (17:32 +0200)]
radv: allocate the upload BO in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 May 2018 14:02:04 +0000 (16:02 +0200)]
radv: set amdgpu-32bit-address-high-bits LLVM attribute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 May 2018 13:34:52 +0000 (15:34 +0200)]
radv/winsys: allow to allocate BOs in the 32-bit addr space
This introduces a new flag called RADEON_FLAG_32BIT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 18 May 2018 11:59:46 +0000 (13:59 +0200)]
radv/winsys: request high address
This is needed for 32-bit GPU pointers. Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Anuj Phogat [Mon, 21 May 2018 22:21:56 +0000 (15:21 -0700)]
i965/glk: Add l3 banks count for 2x6 configuration
2x6 configuration with pci-id 0x3185 has same number of
banks (2) as 3x6 configuration (pci-id 0x3184).
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eb23be1d97da "i965: Add and initialize l3_banks field for gen7+"
Cc: Francisco Jerez <currojerez@riseup.net>
Vinson Lee [Thu, 17 May 2018 22:39:50 +0000 (22:39 +0000)]
v3d: Include v3d_drm.h path.
Fix build error.
CC v3d_blit.lo
In file included from v3d_blit.c:27:0:
v3d_context.h:39:10: fatal error: v3d_drm.h: No such file or directory
#include "v3d_drm.h"
^~~~~~~~~~~
Fixes: 8a793d42f1cc ("v3d: Switch the vc5 driver to using the finalized V3D UABI.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Samuel Pitoiset [Mon, 21 May 2018 09:15:51 +0000 (11:15 +0200)]
radv: fix centroid interpolation
It's legal to set the centroid and sample interpolation modes
when MSAA disabled. So, we have to initialize the centroid
inputs because the hardware doesn't.
This fixes rendering issues with DXVK and The Witness, World of
Warcraft, Trackmania and probably more games.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Sun, 20 May 2018 23:31:49 +0000 (01:31 +0200)]
radv: Cleanup unused prime blit path.
Since we have the common WSI code, we use vkCmdCopyImageToBuffer
instead.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bas Nieuwenhuizen [Sun, 20 May 2018 23:26:46 +0000 (01:26 +0200)]
radv: Fix SRGB compute copies.
SRGB stores are broken. We had compensation code in the
resolve path but none in the copy path. Since we don't
want any conversion and it does not matter for DCC,
just make everything UNORM instead.
This happened to cause wrong colors for the PRIME path, as
that uses image->buffer copies which always use the compute
path.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106587
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tapani Pälli [Wed, 16 May 2018 05:38:50 +0000 (08:38 +0300)]
android: enable VK_ANDROID_native_buffer
Patch changes entrypoints generator to not skip this extension even
though it is set as disabled in the xml. We also need compilation
flag VK_USE_PLATFORM_ANDROID_KHR to be enabled.
It looks like this extension got disabled in commit
69f447553c.
v2: just remove the whole 'supported' attrib check + remove
vk_icd.h compilation fix (fix in VulkanHeaders instead)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tapani Pälli [Fri, 18 May 2018 05:01:39 +0000 (08:01 +0300)]
vulkan: update vk_icd.h to current upstream
Import from commit
eb0c1fd on branch 'master'
of https://github.com/KhronosGroup/Vulkan-Headers.git.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Dave Airlie [Fri, 18 May 2018 00:44:27 +0000 (10:44 +1000)]
virgl: set texture buffer offset alignment to disable ARB_texture_buffer_range.
The host side hasn't got support for this feature yet, so don't enable it
unless we get the caps from the host.
This makes the texture buffer range piglit tests skip now.
Fixes: fe0647df5a7 (virgl: add offset alignment values to to v2 caps struct)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Timothy Arceri [Fri, 18 May 2018 02:36:05 +0000 (12:36 +1000)]
mesa: stop hiding query parameters from OpenGL compat
Just let the extension detection do its job as we will be adding
compat profile support in future, also we want these to work
with compat profile version overrides.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Christoph Haag [Sun, 20 May 2018 11:21:13 +0000 (13:21 +0200)]
radv: fix VK_EXT_descriptor_indexing
GetPhysicalDeviceProperties2KHR() was crashing because features was null
Fixes: 0e10790558b "radv: Enable VK_EXT_descriptor_indexing."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Fri, 18 May 2018 23:03:57 +0000 (01:03 +0200)]
ac/surface: Only align linear power of two fmt textures.
We're not sharing 32_32_32 formats between different GPUs, so we
do not have to align for vega on pre-vega cards.
Fixes: e361970ed73 "radv: Add support for IMG_DATA_FORMAT_32_32_32."
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bas Nieuwenhuizen [Mon, 14 May 2018 15:38:36 +0000 (17:38 +0200)]
amd/addrlib: Use defines in autotools build.
Otherwise stuff like NDEBUG would not be passed through.
CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106479
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Aaron Watry [Sat, 26 Aug 2017 19:21:23 +0000 (14:21 -0500)]
r600/compute: Mark several functions as static
They're not used anywhere else, so keep them private
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Aaron Watry [Sat, 26 Aug 2017 19:15:17 +0000 (14:15 -0500)]
r600/compute: Remove unused compute_memory_pool functions
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Roland Scheidegger [Thu, 17 May 2018 01:45:02 +0000 (03:45 +0200)]
draw: get rid of special logic to not emit null tris
I've confirmed after
77554d220d6d74b4d913dc37ea3a874e9dc550e4 we no
longer need this to pass some tests from another api (as we no longer
generate the bogus extra null tris in the first place).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Dylan Baker [Fri, 18 May 2018 23:35:59 +0000 (16:35 -0700)]
docs: Add sha sums for release
Dylan Baker [Fri, 18 May 2018 18:29:45 +0000 (11:29 -0700)]
docs: Add release notes for 18.1.0
Alyssa Rosenzweig [Wed, 2 May 2018 02:04:55 +0000 (02:04 +0000)]
nir: Implement optional b2f->iand lowering
This pass is required by the Midgard compiler; our instruction set uses
NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
Normally, this lowering pass would be implemented in a backend-specific
algebraic pass, but this conflicts with the existing iand->b2f pass in
nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
existing pass optional (default on -- all other backends should remain
unaffected), adding an optional pass for lowering the opposite
direction.
v2: Defer lowering until late algebraic optimisations to allow
optimising the b2f instruction itself.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jan Vesely [Fri, 18 May 2018 01:32:56 +0000 (21:32 -0400)]
travis: Adapt to radeonsi dropping support for LLVM 4
meson Vulkan, Clover, and autotools Vulkan need to be switched to llvm 5
Fixes: f9eb1ef870eba9fdacf9a8cbd815ec3bff81db05
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Thu, 17 May 2018 03:47:15 +0000 (23:47 -0400)]
radeonsi: skip ES output stores for undefined output components
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Nanley Chery [Wed, 16 May 2018 18:11:04 +0000 (11:11 -0700)]
i965: isl: Move the MCS gen7+ assertion into ISL
This is useful for every user of ISL. Drop the comment along the way to
match similar functions in ISL.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nanley Chery [Wed, 16 May 2018 18:03:24 +0000 (11:03 -0700)]
i965/miptree: Remove format assertion in alloc_aux
intel_miptree_supports_{ccs,mcs,hiz} ensures the format is valid for the
color or depth miptree before the miptree is assigned an aux_usage.
alloc_aux switches on the aux_usage so don't assert that the format is
valid.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nanley Chery [Wed, 16 May 2018 18:07:41 +0000 (11:07 -0700)]
i965/miptree: Simplify the switch in supports_ccs
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Nanley Chery [Wed, 16 May 2018 17:10:35 +0000 (10:10 -0700)]
i965: Make get_ccs_surf succeed in alloc_aux
Synchronize the requirements listed in isl_surf_get_ccs_surf with
intel_miptree_supports_ccs by importing a restriction from ISL. Some
implications:
* We successfully create every aux_surf in alloc_aux
* We only return false from alloc_aux if we run out of memory
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Brian Paul [Fri, 18 May 2018 01:57:21 +0000 (19:57 -0600)]
llvmpipe: fix check for a no-op shader
The tgsi_info.num_tokens fix broke llvmpipe's detection of no-op shaders.
Fix the code to check for num_instructions <= 1 instead.
Fixes: 8fde9429c36b75 ("tgsi: fix incorrect tgsi_shader_info::num_tokens
computation")
Tested-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Samuel Pitoiset [Fri, 18 May 2018 08:43:06 +0000 (10:43 +0200)]
radv: pass radv_nir_compiler_options directly to create_llvm_function()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Christian Gmeiner [Wed, 16 May 2018 14:02:54 +0000 (16:02 +0200)]
st/mesa: only define GLSL 1.4 for compat if driver supports it
Currently GLSL 1.4 is defined for all gallium drivers even only
GLSL 1.2 is supported as seen on etnaviv.
v1 -> v2:
- use _min(..) as suggested by Lucas Stach and Michel Dänzer
Fixes: 4560aad780b ("mesa: add GLSLVersionCompat constant")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dave Airlie [Tue, 15 May 2018 05:44:04 +0000 (15:44 +1000)]
vbo: remove MaxVertexAttribStride assert check.
Some drivers (virgl) don't support GL4.4 or GLES3.1 yet,
so never fill in this const.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Fri, 11 May 2018 05:33:22 +0000 (15:33 +1000)]
mesa: drop GL_EXT_polygon_offset support
glPolygonOffset() has been part of the GL standard since 1.1. Also
niether AMD or Nvidia support this in their binary drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61761
Brian Paul [Thu, 17 May 2018 19:38:05 +0000 (13:38 -0600)]
tgsi: fix incorrect tgsi_shader_info::num_tokens computation
We were incrementing num_tokens in each loop iteration while parsing
the shader. But each call to tgsi_parse_token() can consume more than
one token (and often does). Instead, just call the tgsi_num_tokens()
function.
Luckily, this issue doesn't seem to effect any current users of this
field (llvmpipe just checks for <= 1, for example).
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Samuel Pitoiset [Thu, 17 May 2018 12:08:43 +0000 (14:08 +0200)]
radv: add radv_emit_shader_pointer() helper
For future work (support for 32-bit GPU pointers).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>