Topi Pohjolainen [Thu, 8 Jun 2017 09:31:18 +0000 (12:31 +0300)]
i965/miptree: Clarify face/level/layer in slice copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jonas Kulla [Mon, 19 Jun 2017 17:46:23 +0000 (19:46 +0200)]
anv: Fix L3 cache programming on Bay Trail
Valid values for URBAllocation start at 32, so subtract that
before programming the register.
This was missed when porting from the GL driver.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Fri, 16 Jun 2017 16:13:14 +0000 (18:13 +0200)]
radeonsi: fix dumping shader descriptors into ddebug logs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 16 Jun 2017 22:44:05 +0000 (00:44 +0200)]
radeonsi: add a workaround for inexact SNORM8 blitting again
GFX9 is affected.
We only have tests for GL_x_SNORM where x is R8, RG8, RGB8, and RGBA8.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 16 Jun 2017 20:54:26 +0000 (22:54 +0200)]
radeonsi/gfx9: fix TC-compatible stencil compression
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 16 Jun 2017 20:33:22 +0000 (22:33 +0200)]
radeonsi/gfx9: fix TXF_LZ with 1D textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 16 Jun 2017 19:07:49 +0000 (21:07 +0200)]
radeonsi/gfx9: disable sparse buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 15 Jun 2017 22:11:50 +0000 (00:11 +0200)]
ac/sid.h: don't use parentheses in PKT3_RELEASE_MEM definition
The parser skips the line if it contains parentheses.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 15 Jun 2017 17:01:56 +0000 (19:01 +0200)]
ac: parse EVENT_WRITE_EOP, RELEASE_MEM, WAIT_REG_MEM, NOWHERE
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 7 Jun 2017 20:04:34 +0000 (22:04 +0200)]
st/mesa: simplify returning GL_VENDOR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 7 Jun 2017 20:00:48 +0000 (22:00 +0200)]
st/mesa: remove the "Gallium 0.4 on" prefix from GL_RENDERER
If you want to keep it for your driver, please raise your hand.
The prefix will probably have to be added into the driver instead of here.
I cringe when I look at my long renderer string:
Gallium 0.4 on AMD Radeon R9 Fury Series (DRM 3.17.0 /
4.11.0-staging-01277-gab25a9e, LLVM 5.0.0)
I'm sincerely sorry for all apps that detect Mesa by expecting "Gallium"
in the string.
Reviewed-by: Eric Anholt <eric@anholt.net>
Marek Olšák [Fri, 9 Jun 2017 18:55:01 +0000 (20:55 +0200)]
st/mesa: don't update MSAA states for GL_FRAMEBUFFER_SRGB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Kenneth Graunke [Mon, 2 May 2016 02:09:14 +0000 (19:09 -0700)]
i965: Ignore anisotropic filtering in nearest mode.
This fixes both Europa Universalis IV and Stellaris rendering on i965.
This was tested on SKL.
This fix was discovered by Jakub Szuppe at Stream HPC
(https://streamhpc.com/).
bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96958
bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95530
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Iago Toral Quiroga [Fri, 16 Jun 2017 10:05:20 +0000 (12:05 +0200)]
glsl: gl_Max{Vertex,Fragment}UniformComponents exist in all desktop GL versions
The current implementation assumed that these were replaced in GLSL >= 4.10
by gl_Max{Vertex,Fragment}UniformVectors, however this is not true: both
built-ins should be produced from GLSL 4.10 onwards.
This was raised by new CTS tests that are in development.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Emil Velikov [Mon, 19 Jun 2017 11:23:07 +0000 (12:23 +0100)]
docs: update calendar, add news item and link release notes for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Mon, 19 Jun 2017 11:20:12 +0000 (12:20 +0100)]
docs: add sha256 checksums for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Mon, 19 Jun 2017 11:13:25 +0000 (12:13 +0100)]
docs: add release notes for 17.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Nicolai Hähnle [Mon, 12 Jun 2017 08:53:07 +0000 (10:53 +0200)]
st/glsl_to_tgsi: use correct writemask when converting generic intrinsics
This fixes a bug when lowering ballotARB: previously, using writemask 0xf,
emit_asm would create TGSI_OPCODE_BALLOT instructions that span two registers
to cover 4 64-bit channels. This could trample over a neighbouring
temporary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101360
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 6 Jun 2017 17:21:26 +0000 (19:21 +0200)]
gallium/radeon/gfx9: fix PBO texture uploads to compressed textures
st/mesa creates a surface that reinterprets the compressed blocks as
RGBA16UI or RGBA32UI. We have to adjust width0 & height0 accordingly to
avoid out-of-bounds memory accesses by CB.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 12 Jun 2017 19:31:43 +0000 (21:31 +0200)]
r600: fix off-by-one in egd_tables.py
Port of the corresponding fix in sid_tables.py.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 6 Jun 2017 17:17:49 +0000 (19:17 +0200)]
amd/common: fix off-by-one in sid_tables.py
The very last entry in the sid_strings_offsets table ended up missing,
leading to out-of-bounds reads and potential crashes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Iago Toral Quiroga [Fri, 16 Jun 2017 07:27:43 +0000 (09:27 +0200)]
i965: update MaxTextureRectSize to match PRMs and comply with OpenGL 4.1+
We were exposing 4096, but we can do up to 8192 in Gen4-6 and up to
16384 in gen7+. OpenGL 4.1+ requires at least 16384.
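
The limits above can be sketched as a small lookup. This is only an
illustration: the helper name max_texture_rect_size_for_gen() is
hypothetical and the real driver sets the limit differently; only the
numbers come from the message above.

```c
#include <assert.h>

/* Hypothetical helper (not the driver's actual code): returns the
 * rectangle texture size limit quoted above for a hardware generation. */
static int
max_texture_rect_size_for_gen(int gen)
{
   assert(gen >= 4);   /* the message only covers gen4 and later */

   /* Gen4-6 can do up to 8192; gen7+ can do up to 16384. */
   return gen >= 7 ? 16384 : 8192;
}
```

Under this sketch only gen7+ reaches the 16384 minimum that OpenGL 4.1+
requires.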
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Samuel Pitoiset [Wed, 14 Jun 2017 09:27:44 +0000 (11:27 +0200)]
mesa: add KHR_no_error support for gl*UniformHandleui64*ARB
Similar to _mesa_uniform() except that we have to call
validate_uniform_parameters() instead of validate_uniform().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Wed, 14 Jun 2017 09:27:43 +0000 (11:27 +0200)]
mesa: add KHR_no_error support for glGetImageHandleARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Wed, 14 Jun 2017 09:27:42 +0000 (11:27 +0200)]
mesa: add KHR_no_error support for glGetTexture*HandleARB()
It would be nice to have a no_error path for
_mesa_test_texobj_completeness() because this function doesn't
only test if the texture is complete.
Anyway, that seems enough for now and a bunch of checks are
skipped with this patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Wed, 14 Jun 2017 09:27:41 +0000 (11:27 +0200)]
mesa: add KHR_no_error support for glMake{Image,Texture}Handle*ResidentARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Wed, 14 Jun 2017 09:27:40 +0000 (11:27 +0200)]
mesa: add KHR_no_error support for glIs{Image,Texture}HandleResidentARB()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Wed, 14 Jun 2017 11:55:12 +0000 (13:55 +0200)]
radeonsi: reduce overhead for resident textures which need color decompression
This is done by introducing a separate list.
si_decompress_textures() is now 5x faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 14 Jun 2017 11:55:11 +0000 (13:55 +0200)]
radeonsi: reduce overhead for resident textures which need depth decompression
This is done by introducing a separate list.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 14 Jun 2017 11:55:10 +0000 (13:55 +0200)]
radeonsi: use util_dynarray_foreach for bindless resources
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 14 Jun 2017 11:55:09 +0000 (13:55 +0200)]
mesa/util: add util_dynarray_clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 14 Jun 2017 09:40:59 +0000 (11:40 +0200)]
gallium/radeon: add a new HUD query for the number of resident handles
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Topi Pohjolainen [Fri, 19 May 2017 12:53:40 +0000 (15:53 +0300)]
i965/gen4: Refactor depth/stencil rebase
Effectively there is the same code twice, once for depth and
again for stencil.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Fri, 19 May 2017 09:26:16 +0000 (12:26 +0300)]
i965: Drop depth/stencil miptree pointers in alignment workaround
In brw_workaround_depthstencil_alignment() corresponding
renderbuffers are always set to refer to the same temp miptrees.
There is no need to carry them in context.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Fri, 19 May 2017 08:04:54 +0000 (11:04 +0300)]
i965/gen4: Simplify depth/stencil invalidate check
There is no separate stencil on gen < 6.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Fri, 19 May 2017 07:39:21 +0000 (10:39 +0300)]
i965/gen4: Remove redundant check for depth when rebasing stencil
In case of gen < 6 stencil (if present) is always combined with
depth. Both stencil and depth attachments point to the same
physical surface.
Alignment workaround starts by considering depth and updates
stencil accordingly. Current logic continues with stencil and
in vain considers the case where depth would refer to a different
surface than stencil.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Fri, 5 May 2017 11:43:20 +0000 (14:43 +0300)]
i965/gen4: Remove non-existing stencil and hiz buffer setup
Separate stencil and hiz are only enabled for gen6+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Mauro Rossi [Sat, 20 May 2017 15:31:36 +0000 (17:31 +0200)]
android: ac: add missing libdrm_amdgpu shared dependency
Fixes building errors in amd/common:
target C: libmesa_amd_common <= external/mesa/src/amd/common/ac_gpu_info.c
...
target C: libmesa_amd_common <= external/mesa/src/amd/common/ac_surface.c
...
external/mesa/src/amd/common/ac_gpu_info.h:31:10: fatal error: 'amdgpu.h' file not found
^
2 errors
Fixes: 98a2492 ("ac_surface: use radeon_info from ac_gpu_info")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sat, 17 Jun 2017 10:40:21 +0000 (11:40 +0100)]
r600: include libelf headers only as needed
Headers are required only when building with OpenCL. As we're building
w/o it libelf may be missing, hence we'll error out as below:
src/gallium/drivers/r600/evergreen_compute.c:27:10:
fatal error: 'gelf.h' file not found
^
1 error generated.
Fixes: d96a210842 ("r600g,compute: provide local copy of functions from
ac_binary.c")
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Fri, 16 Jun 2017 18:53:50 +0000 (19:53 +0100)]
radeonsi: include ac_binary.h for struct ac_shader_binary
The header embeds the struct so it needs the header inclusion instead of
the dummy forward declaration.
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Tom Stellard <tstellar@redhat.com>
Fixes: 32206c5e560 ("radeonsi: Add radeon_shader_binary member to struct
si_shader")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Emil Velikov [Fri, 16 Jun 2017 19:03:41 +0000 (20:03 +0100)]
r600, radeon: move radeon_shader_binary_{init,clean} back to radeon
Those are used by r600 and radeonsi, so moving them within the former
was a bad idea.
Fixes: d96a210842b ("r600g,compute: provide local copy of functions
from ac_binary.c")
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Cc: Aaron Watry <awatry@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Emil Velikov [Fri, 16 Jun 2017 18:10:25 +0000 (19:10 +0100)]
ac: resolve conflicts introduced with "ac: remove amdgpu.h dependency"
The commit did not add the relevant includes - in particular
stdint.h and stdbool.h for the respective standard types.
At the same time, the amdgpu_device_handle typedef redeclaration was
off.
Fixes: 81945ded0dc ("ac: remove amdgpu.h dependency")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101471
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Gregor Münch <gr.muench@gmail.com>
Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reported-by: Gregor Münch <gr.muench@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Topi Pohjolainen [Sun, 21 May 2017 04:39:07 +0000 (07:39 +0300)]
i965/gen4: Set depth offset when there is stencil attachment only
Current version fails to set depthstencil.depth_offset when there
is only stencil attachment (it does set the intra tile offsets
though). Fixes piglits:
g45,g965,ilk: depthstencil-render-miplevels 1024 s=z24_s8
g45,ilk: depthstencil-render-miplevels 273 s=z24_s8
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Tue, 10 Jan 2017 08:52:32 +0000 (10:52 +0200)]
i965/gen6: Remove dead code in hiz surface setup
In intel_hiz_miptree_buf_create() the miptree is unconditionally
created with MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Sun, 14 May 2017 16:02:20 +0000 (19:02 +0300)]
intel/isl/gen6: Allow arrayed stencil
Nothing prevents arrayed stencil surfaces even though hardware
doesn't support mipmapping.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Brian Paul [Fri, 16 Jun 2017 22:36:43 +0000 (16:36 -0600)]
svga: add new num-failed-allocations HUD query
This counter is incremented if we fail to allocate memory for
vertex/index/const buffers, textures, etc.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Fri, 16 Jun 2017 22:35:27 +0000 (16:35 -0600)]
gallium/hud: support GALLIUM_HUD_DUMP_DIR feature on Windows
Use a dummy implementation of the access() function. Use \ path separator.
Add a few comments.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Fri, 16 Jun 2017 22:34:43 +0000 (16:34 -0600)]
svga: add a few minor comments
Trivial.
Brian Paul [Fri, 16 Jun 2017 20:45:02 +0000 (14:45 -0600)]
mesa: whitespace fixes in enable.c
Remove trailing whitespace, replace tabs w/ spaces, etc. Trivial.
Rafael Antognolli [Tue, 6 Jun 2017 16:23:31 +0000 (09:23 -0700)]
i965: Convert SF_STATE to genxml.
This patch finishes the work done by Ken of converting SF_STATE to genxml, and
merges it with gen6+ code for emitting that state.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 6 Jun 2017 16:23:30 +0000 (09:23 -0700)]
genxml: The viewport state offset is actually an address.
This fixes code generation on gen45.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 6 Jun 2017 16:23:29 +0000 (09:23 -0700)]
genxml: Rename fields to match gen6+.
"Anti-aliasing Enable" to "Anti-Aliasing Enable".
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 6 Jun 2017 16:23:28 +0000 (09:23 -0700)]
genxml: Rename SF_STATE field to match gen6+.
Rename "Use Point Width State" to "Point Width Source". It accepts the same
values and has the same meaning as gen6+, so let's keep them with the same name
to simplify the code.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 6 Jun 2017 16:23:27 +0000 (09:23 -0700)]
i965: aa_line_distance_mode should be before the padding.
It seems that it was never set correctly.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tim Rowley [Fri, 9 Jun 2017 23:37:27 +0000 (18:37 -0500)]
swr/rast: Fix read-back of viewport array index
Binner/clipper read viewport array index from the vertex header as needed.
Move viewport state to BACKEND_STATE.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 9 Jun 2017 21:58:59 +0000 (16:58 -0500)]
swr/rast: Refactor includes to limit simdintrin.h usage
Reduces the files rebuilt after modifying simdintrin.h from
84 to 64.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 9 Jun 2017 17:57:39 +0000 (12:57 -0500)]
swr/rast: Fix read-back of render target array index
The last FE stage can emit render target array index. Currently we only
check to see if GS is emitting it. Moved the state to BACKEND_STATE and
plumbed the driver to set it.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 8 Jun 2017 23:37:08 +0000 (18:37 -0500)]
swr/rast: Adjust cast for gcc warning
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 8 Jun 2017 19:44:32 +0000 (14:44 -0500)]
swr/rast: Don't transition hottile resolved->dirty during store tiles
Fixes crash when dumping render targets and RT surface has been deleted.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 8 Jun 2017 19:42:54 +0000 (14:42 -0500)]
swr/rast: gen_llvm_types.py support for SIMD256/SIMD512
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 8 Jun 2017 16:48:37 +0000 (11:48 -0500)]
swr/rast: Properly size GS stage scratch space
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Wed, 7 Jun 2017 18:32:11 +0000 (13:32 -0500)]
swr/rast: Fix early z / query interaction
For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED
query was providing erroneous results because we were counting the number
of samples passed before the fragment shader, which did not work if the
fragment shader contained a discard.
Account properly for discard and early z, by anding the zpass mask with
the post fragment shader active mask, after the fragment shader.
Fixes the following piglit tests:
- occlusion-query-discard
- occlusion_query_meta_fragments
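
A minimal sketch of the mask combination described above, assuming
per-sample coverage is kept in a 32-bit mask. The function and
parameter names are hypothetical and not taken from the rasterizer.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a per-sample coverage mask. */
static unsigned
count_bits(uint32_t mask)
{
   unsigned n = 0;

   while (mask) {
      mask &= mask - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Samples counted by the query: they must have passed early z AND
 * still be active after the fragment shader (not discarded). */
static unsigned
samples_passed(uint32_t zpass_mask, uint32_t post_fs_active_mask)
{
   return count_bits(zpass_mask & post_fs_active_mask);
}
```

Counting zpass_mask alone, before the shader runs, would also count
samples that the shader later discards, which is the erroneous
behaviour described above.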
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Wed, 7 Jun 2017 18:16:15 +0000 (13:16 -0500)]
swr/rast: Share vertex memory between VS input/output
Removes large simdvertex stack allocation.
Vertex shader must ensure reads happen before writes.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 6 Jun 2017 23:41:40 +0000 (18:41 -0500)]
swr/rast: Add support for dynamic vertex size for VS output
Add support for dynamic vertex size for the vertex shader output.
Add new state in SWR_FRONTEND_STATE to specify the size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 6 Jun 2017 20:34:54 +0000 (15:34 -0500)]
swr/rast: SIMD16 FE - improve calcDeterminantIntVertical
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Mon, 5 Jun 2017 21:13:25 +0000 (16:13 -0500)]
swr/rast: Add support to PA for variable sized vertices
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 1 Jun 2017 18:08:04 +0000 (13:08 -0500)]
swr/rast: Rework attribute layout
Move fixed attributes to the top and pack single component SGVs.
WIP to support dynamically allocated vertex size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Wed, 31 May 2017 16:24:08 +0000 (11:24 -0500)]
swr/rast: Remove explicit primitive id slot in the vertex layout
- Remove any special casing in the PS stage when primitive ID is input.
Treat as a normal attribute that must be set up properly in the FE linkage.
- Remove primitive id from the PS_CONTEXT and TRI_FLAGS
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 26 May 2017 06:47:58 +0000 (01:47 -0500)]
swr/rast: Fix invalid 16-bit format traits for A1R5G5B5
Correctly handle formats of <= 16 bits where the component bits don't
add up to the pixel size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 25 May 2017 02:54:43 +0000 (21:54 -0500)]
swr/rast: Implement JIT shader caching to disk
Disabled by default; currently doesn't cache shaders (fs,gs,vs).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Brian Paul [Fri, 26 May 2017 19:56:37 +0000 (13:56 -0600)]
gallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics
For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of position values, swizzling, etc. We basically
follow the DX10.1 conventions.
For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.
For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.
v2: use 'undef' for unused vector components. Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.
v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 16 Jun 2017 19:16:30 +0000 (13:16 -0600)]
svga: add some missing SVGA_STATS_* enum values, prefix strings
To fix the build when VMX86_STATS is defined.
Also, some minor whitespace changes to match upstream code.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Alex Deucher [Fri, 16 Jun 2017 16:12:21 +0000 (12:12 -0400)]
radeonsi: add new polaris12 pci id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Bruce Cherniak [Thu, 15 Jun 2017 16:24:47 +0000 (11:24 -0500)]
swr: Don't crash when encountering a VBO with stride = 0.
The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made it
possible for VBOs to have stride=0. This resulted in a divide by zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.
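
The fallback can be sketched roughly as follows. The helper name and
its parameters are hypothetical; only the idea of substituting the
stream pitch for a zero stride comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of elements in a vertex buffer of
 * `size` bytes. A zero stride falls back to the pre-calculated stream
 * pitch instead of dividing by zero. */
static uint32_t
vbo_num_elements(uint32_t size, uint32_t stride, uint32_t stream_pitch)
{
   uint32_t pitch = stride ? stride : stream_pitch;

   /* Assumption of this sketch: treat a zero pitch as an empty buffer. */
   if (pitch == 0)
      return 0;

   return size / pitch;
}
```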
This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27be266f2.
There are several VTK tests that still crash and need proper handling of
vertex_buffer_index. This will come in a follow-on patch.
v2: Correctly update all parameters for VBO constants (stride = 0).
Also fixes the remaining crashes/regressions that v1 did
not address, without touching vertex_buffer_index.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Anuj Phogat [Fri, 19 May 2017 19:09:22 +0000 (12:09 -0700)]
intel/isl: Add the maximum surface size limit
V2: Use 2^31 bytes (2GB) surface size limit on pre-gen9 and
2^38 bytes for gen9+.
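
A sketch of such a check, assuming the limit itself is still an
allowed size (the message does not say whether the bound is inclusive).
The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does a surface of size_B bytes fit within the
 * limit quoted above for the given hardware generation? */
static bool
surface_size_is_allowed(int gen, uint64_t size_B)
{
   /* 2^31 bytes (2GB) on pre-gen9, 2^38 bytes on gen9+. */
   const uint64_t max_B = gen >= 9 ? (UINT64_C(1) << 38)
                                   : (UINT64_C(1) << 31);

   /* Assumption: a surface of exactly max_B bytes is accepted. */
   return size_B <= max_B;
}
```

A 32-bit size cannot represent 2^38, so the comparison needs a 64-bit
type.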
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Anuj Phogat [Fri, 19 May 2017 20:47:12 +0000 (13:47 -0700)]
intel/isl: Use uint64_t to store total surface size
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Chris Wilson [Thu, 8 Jun 2017 23:35:09 +0000 (00:35 +0100)]
i965: Mark freshly allocated bo as idle
When created, buffers are idle, so mark them as such to avoid an early
ioctl or mistakenly assuming the fresh buffer is busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Christian Gmeiner [Fri, 9 Jun 2017 10:34:49 +0000 (12:34 +0200)]
etnaviv: add rs-operations sw query
It could be useful to get the number of emitted resolve operations when
doing driver optimizations.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Lucas Stach [Sun, 4 Jun 2017 19:06:33 +0000 (21:06 +0200)]
etnaviv: advertise correct max LOD bias
The maximum LOD bias supported is the same as the max texture level
supported.
Fixes piglit: ext_texture_lod_bias
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Lucas Stach [Sun, 4 Jun 2017 19:06:32 +0000 (21:06 +0200)]
etnaviv: mask correct channel for RB swapped rendertargets
Now that we support RB swapped targets by using a shader variant, we
must derive the color mask from both the blend state and the bound
framebuffer.
Fixes piglit: fbo-colormask-formats
Fixes: 7f62ffb68ad ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Lucas Stach [Sun, 4 Jun 2017 19:06:31 +0000 (21:06 +0200)]
etnaviv: replace translate_clear_color with util_pack_color
This replaces the open coded etnaviv version of the color pack with the
common util_pack_color.
Fixes piglits:
arb_color_buffer_float-clear
fcc-front-buffer-distraction
fbo-clearmipmap
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Lucas Stach [Sun, 4 Jun 2017 19:06:30 +0000 (21:06 +0200)]
etnaviv: remove bogus assert
etna_resource_copy_region handles resources with multiple samples
by falling back to the software path. There is no need to kill the
application there.
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Lucas Stach [Sun, 4 Jun 2017 19:06:29 +0000 (21:06 +0200)]
etnaviv: use padded width/height for resource copies
When copying a resource fully we can just blit the whole level. This allows
using the RS even for level sizes not aligned to the RS min alignment. This
is especially useful, as etna_copy_resource is part of the software fallback
paths (used in etna_transfer) that are used for doing unaligned copies.
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Lucas Stach [Sun, 4 Jun 2017 19:06:28 +0000 (21:06 +0200)]
etnaviv: don't try RS blit if blit region is unaligned
If the blit region is not aligned to the RS min alignment don't try
to execute the blit, but fall back to the software path.
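
An alignment test of this kind could look like the sketch below. The
function name and the align_w/align_h parameters are hypothetical; the
real minimum alignment values of the RS are not stated here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: is the blit rectangle (x, y, w, h) aligned to
 * the given minimum alignment in both dimensions? */
static bool
blit_region_is_aligned(unsigned x, unsigned y, unsigned w, unsigned h,
                       unsigned align_w, unsigned align_h)
{
   assert(align_w > 0 && align_h > 0);

   return x % align_w == 0 && w % align_w == 0 &&
          y % align_h == 0 && h % align_h == 0;
}
```

A caller would attempt the hardware blit only when this returns true
and take the software path otherwise.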
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Emil Velikov [Fri, 26 May 2017 15:32:53 +0000 (16:32 +0100)]
Revert "amd/common: add missing libdrm include path"
This reverts commit 44b29dd7b6cdc1a3fde58c367b9de8081ac4167b.
Should no longer be required as of last patch.
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Mon, 29 May 2017 13:50:47 +0000 (14:50 +0100)]
ac: remove amdgpu.h dependency
Add a couple of forward declarations and drop the amdgpu.h requirement.
With this we can build the r300 and r600 drivers without the need for
amdgpu.
v2:
- Add amdgpu.h include in the C file (Marek)
- Add a comment about pre C11 typedef redeclaration warning (Eric)
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101189
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Jan Vesely [Fri, 2 Jun 2017 16:37:07 +0000 (12:37 -0400)]
r600g,compute: provide local copy of functions from ac_binary.c
This is a verbatim copy of the code. The functions can be cleaned up since
r600 does not use all the stuff that gcn does.
The symbol names have been changed since we still use ac_binary.h header
(for struct definition)
v2: Add ifdef guard around r600_binary_clean call (Aaron)
Remove stray comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jan Vesely [Fri, 2 Jun 2017 16:37:06 +0000 (12:37 -0400)]
r600: android: amdgpu_common is only required when building OpenCL
v2: split off Android changes
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Eric Engestrom [Thu, 15 Jun 2017 22:53:55 +0000 (23:53 +0100)]
egl/display: make platform detection thread-safe
Imagine there are 2 threads that both call _eglGetNativePlatform()
simultaneously:
- thread 1 completes the first "if (native_platform ==
_EGL_INVALID_PLATFORM)" check and is preempted to do something else
- thread 2 executes the whole function, does "native_platform =
_EGL_NATIVE_PLATFORM" and just before returning it's preempted
- thread 1 wakes up and calls _eglGetNativePlatformFromEnv() which
returns _EGL_INVALID_PLATFORM because no env vars are set, updates
native_platform and then gets preempted again
- thread 2 wakes up and returns wrong _EGL_INVALID_PLATFORM
Solve this by doing the detection in a local var and only overwriting
the global one at the end, if no other thread has updated it since.
This means the platform detected in the thread might not be the platform
returned by the function, but this is a different issue that will need
to be discussed when this becomes possible.
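
The approach can be sketched with C11 atomics. This is an illustration
under stated assumptions, not the actual patch: the enum, the
detect_platform() stub and the use of <stdatomic.h> are all
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical platform values; 0 plays the role of "invalid". */
enum platform { PLATFORM_INVALID = 0, PLATFORM_X11, PLATFORM_WAYLAND };

/* Shared between threads, like the static native_platform above. */
static _Atomic int native_platform = PLATFORM_INVALID;

/* Stand-in for the real detection logic (env vars, etc.). */
static int
detect_platform(void)
{
   return PLATFORM_X11;
}

static int
get_native_platform(void)
{
   int current = atomic_load(&native_platform);

   /* Already detected, possibly by another thread. */
   if (current != PLATFORM_INVALID)
      return current;

   /* Do the detection in a local variable... */
   int detected = detect_platform();

   /* ...and publish it only if the global is still unset. */
   int expected = PLATFORM_INVALID;
   atomic_compare_exchange_strong(&native_platform, &expected, detected);

   /* Return whatever ended up in the global, which may be another
    * thread's result rather than our own. */
   return atomic_load(&native_platform);
}
```

Because the global is written only through the compare-and-exchange, a
thread that loses the race cannot overwrite a valid value with its own
result.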
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Eric Engestrom [Thu, 15 Jun 2017 22:53:54 +0000 (23:53 +0100)]
egl/display: only detect the platform once
My refactor missed the fact that `native_platform` is static.
Add the proper guard around the detection code, as it might not be
necessary, and only print the debug message when a detection was
actually performed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Fixes: 7adb9b094894a512c019 ("egl/display: remove unnecessary code and
make it easier to read")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Thomas Hellstrom [Wed, 14 Jun 2017 13:53:40 +0000 (15:53 +0200)]
svga: Relax the format checks for copy_region_vgpu10 somewhat
The new generic checks were actually more restrictive than the previous svga-
specific tests and not vice versa. So bypass the common format checks for
copy_region_vgpu10.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Thomas Hellstrom [Wed, 14 Jun 2017 13:39:42 +0000 (15:39 +0200)]
svga: Fix incorrect format conversion blit destination
The blit.dst.resource member that was used as destination was
modified earlier in the function, effectively making us try to blit
the content onto itself. Fix this and also add a debug printout when the
format conversion blits fail.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Thomas Hellstrom [Wed, 3 May 2017 12:26:02 +0000 (05:26 -0700)]
svga: Fix srgb copy_region regression
This fixes a tf2 srgb copy_region regression from
"svga: Rework the blit and resource_copy_region functionality v3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Thu, 27 Apr 2017 06:58:47 +0000 (23:58 -0700)]
svga: Prefer accelerated blits over cpu copy region
This reduces the number of cpu copy_region fallbacks on an Nvidia system
running the piglit command
./publish/bin/piglit run -1 -t copy -t blit tests/quick
from 64789 to 780
Previously this caused a regression in piglit test
spec@!opengl 1.0@gl-1.0-scissor-copypixels, but I'm currently not able to
reproduce that regression.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Wed, 12 Apr 2017 08:38:23 +0000 (10:38 +0200)]
svga: Support accelerated conditional blitting
The blitter has functions to save and restore the conditional rendering state,
but we currently don't save the needed info.
Since the copy_region_vgpu10 path also supports conditional blitting,
we instead use the same function as the clearing routines and move
that function to svga_pipe_query.c
Note that we still haven't implemented conditional blitting with
the software fallbacks.
Fixes piglit nv_conditional_render::copyteximage
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Wed, 12 Apr 2017 07:28:49 +0000 (09:28 +0200)]
svga: Use utility functions to help determine whether we can use copy_region
It seems like the SVGA tests are in general more stringent than the utility
tests, but they also miss some blitter features like filters and window
rectangles, and if new blitter features are added in the future, we might
forget to add tests for those.
So in addition to the SVGA tests, use the utility tests to restrict the
situations where we can use copy_region.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Tue, 11 Apr 2017 13:18:04 +0000 (15:18 +0200)]
svga: Rework the blit and resource_copy_region functionality v3
This work was initially triggered by the fact that imported surfaces may
be backed by other SVGA3D formats than the default. Therefore some fixes were
needed to avoid using the copy_region_vgpu10() functionality for incompatible
SVGA3D formats where the pipe formats were OK. This situation happens when
using dri3.
Also in some situations, for example where a R8G8_UNORM surface is backed by
an SVGA3D_NV12 format, we can't use the copy_region functionality at all and
thus need to fall back to the quad blitter also for the resource_copy_region
function. This situation doesn't happen currently, but will if we start using
video textures.
The patch makes the blit- and copy_region paths similar and the decision whether
to use a certain gpu command should now be easy to locate. Probably the
resource_copy_region path will suffer from a minor additional cpu overhead,
but on the other hand there are more cases now that we accelerate, since
we try harder before falling back to cpu copies / blits.
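As a rough illustration of "one place that decides which path to use", the
selection could look like the sketch below. Every name here is hypothetical,
and comparing formats for plain equality is a simplifying assumption; the
driver's real compatibility rules are more involved.

```c
#include <stdbool.h>

/* Hypothetical copy paths, listed from most to least preferred. */
enum copy_path {
    PATH_COPY_REGION, /* direct GPU copy command */
    PATH_QUAD_BLIT,   /* GPU blit by drawing a textured quad */
    PATH_CPU_COPY     /* software fallback */
};

/* Hypothetical surface description. */
struct surface {
    int  pipe_format;    /* format as seen by the state tracker */
    int  backing_format; /* device format actually backing the surface */
    bool gpu_renderable; /* can be a destination of the quad blitter */
};

enum copy_path pick_copy_path(const struct surface *src,
                              const struct surface *dst)
{
    /* A direct copy needs matching backing formats too, not just
     * matching pipe formats (the imported-surface case above). */
    if (src->pipe_format == dst->pipe_format &&
        src->backing_format == dst->backing_format)
        return PATH_COPY_REGION;

    /* Otherwise try the accelerated blit before giving up on the GPU. */
    if (dst->gpu_renderable)
        return PATH_QUAD_BLIT;

    return PATH_CPU_COPY;
}
```

Keeping the decision in a single function like this is what makes it easy to
see why a given copy ended up on the cpu path.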
v2: Addressed review comments and fixed up piglit failures by sometimes
preferring cpu_copy_region() over blit().
v3: Removed a stray test statement. Updated commit message.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Kenneth Graunke [Thu, 2 Mar 2017 00:41:05 +0000 (16:41 -0800)]
i965: Improve conditional rendering in fallback paths.
We need to fall back in a couple of cases:
- Sandybridge (it just doesn't do this in hardware)
- Occlusion queries on Gen7-7.5 with command parser version < 2
- Transform feedback overflow queries on Gen7, or on Gen7.5 with
command parser version < 7
In these cases, we printed a perf_debug message and fell back to
_mesa_check_conditional_render(), which stalls until the full
query result is available. Additionally, the code to handle this
was a bit of a mess.
We can do better by using our normal conditional rendering code,
and setting a new state, BRW_PREDICATE_STATE_STALL_FOR_QUERY, when
we would have set BRW_PREDICATE_STATE_USE_BIT. Only if that state
is set do we perf_debug and potentially stall. This means we avoid
stalls when we have a partial query result (i.e. we know it's > 0,
but don't have the full value). The perf_debug should trigger less
often as well.
Still, this is primarily intended as a cleanup.
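The state selection described above can be sketched roughly like this. It is
a simplified model, not the driver code: the enum, struct and function names
are hypothetical, inverted conditions are ignored, and the stall is simulated
with a counter.

```c
#include <stdbool.h>

/* Hypothetical mirror of the predicate states discussed above. */
enum predicate_state {
    PRED_RENDER,
    PRED_DONT_RENDER,
    PRED_USE_BIT,        /* hardware predication decides */
    PRED_STALL_FOR_QUERY /* fallback: must wait for the full result */
};

/* Hypothetical query object. */
struct query {
    bool known_nonzero; /* partial result: we already know it is > 0 */
    bool ready;         /* full result is available */
    unsigned long long result;
};

static int stall_count; /* counts how often the slow path was taken */

enum predicate_state choose_state(const struct query *q, bool hw_predication)
{
    if (q->ready)
        return q->result ? PRED_RENDER : PRED_DONT_RENDER;
    if (q->known_nonzero)
        return PRED_RENDER; /* a partial result is enough, no stall */
    /* Where hardware predication would have been used, pick the
     * stalling state on configurations that cannot do it. */
    return hw_predication ? PRED_USE_BIT : PRED_STALL_FOR_QUERY;
}

bool should_draw(struct query *q, bool hw_predication)
{
    switch (choose_state(q, hw_predication)) {
    case PRED_RENDER:
        return true;
    case PRED_DONT_RENDER:
        return false;
    case PRED_USE_BIT:
        return true; /* emit the draw; the hardware may discard it */
    case PRED_STALL_FOR_QUERY:
    default:
        /* Only here would the driver warn and wait for the result. */
        stall_count++;
        q->ready = true; /* pretend the wait has completed */
        return q->result != 0;
    }
}
```

In this model a query that is already known to be non-zero never reaches the
stalling branch, which is the behaviour the message describes.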
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Sun, 4 Jun 2017 23:04:02 +0000 (00:04 +0100)]
configure.ac: remove manual AC_SUBST for pthread-stubs
Unneeded, since the PKG_CHECK_MODULES macro already does the
substitution of the package Cflags/Libs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Sun, 4 Jun 2017 23:03:59 +0000 (00:03 +0100)]
configure.ac: add -pthread to PTHREAD_LIBS
As described inline - follow what's written in the manual and what works
for all platforms that Mesa supports.
We want to untangle things leaving only -pthread, yet that has a
potential of causing regressions. Thus we'll do it as a follow-up patch.
As a nice side-effect this resolves issues, where the system lacks
libpthread.so, yet the linker does not warn about it and we end up with
unresolved symbols.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101071
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>