mesa.git
Brian Paul [Tue, 21 Nov 2017 14:31:57 +0000 (07:31 -0700)]
svga: move svga_is_format_supported() to svga_format.c

where the other format-related functions live.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Tue, 21 Nov 2017 14:27:06 +0000 (07:27 -0700)]
svga: s/unsigned/SVGA3dDevCapIndex/

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Lionel Landwerlin [Thu, 9 Nov 2017 16:40:55 +0000 (16:40 +0000)]
i965: perf: add support for CoffeeLake GT3

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 31 Aug 2017 10:28:30 +0000 (11:28 +0100)]
i965: perf: add support for CoffeeLake GT2

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:51:26 +0000 (16:51 +0000)]
i965: perf: add busyness metric sets on gen8/9 platforms

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:48:45 +0000 (16:48 +0000)]
i965: fix time elapsed counter equations in VME/Media configs

The mistake was only in those metric sets. We probably didn't
notice because they're not really interesting for 3D workloads.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 9 Nov 2017 16:46:47 +0000 (16:46 +0000)]
i965: perf: update counter names on gen8/9 platforms

Just fixing names.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 29 Aug 2017 09:41:27 +0000 (10:41 +0100)]
i965: add a debug option to disable oa config loading

This provides a good way to verify we haven't broken the perf driver
on older kernels (which don't have the OA config loading mechanism).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Tue, 25 Jul 2017 16:22:58 +0000 (17:22 +0100)]
i965: perf: add support for userspace configurations

This allows us to deploy new configurations without touching the
kernel.

v2: Detect loadable configs without creating one (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 31 Aug 2017 10:04:28 +0000 (11:04 +0100)]
i965: perf: update configs for loading from userspace

When making configs loadable from userspace in the kernel, we left
userspace more responsibility for programming some registers. In
particular, one register we used to set directly in the driver has now
been moved into the configs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Mon, 27 Nov 2017 11:33:48 +0000 (11:33 +0000)]
util: add mesa-sha1 test to meson

Fixes: 513d7ffa23d42e96f831 "util: Add a SHA1 unit test program"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Fri, 24 Nov 2017 18:00:57 +0000 (18:00 +0000)]
compiler: fix typo

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Thu, 23 Nov 2017 13:16:43 +0000 (13:16 +0000)]
compiler: use NDEBUG to guard asserts

nir_validate.c's #endif already had the correct NDEBUG comment.

Fixes: dcb1acdea00a8f2c29777 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b4b3808a9e17 "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
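A minimal C sketch of why NDEBUG is the right guard, assuming a debug-only helper that is called only from assert(): assert() compiles away exactly when NDEBUG is defined, so the helper should be compiled under that same condition. The functions indices_in_range() and sum_indices() are hypothetical, not code from NIR or i965.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug-only helper. It is guarded by NDEBUG, the same macro
 * that turns assert() into a no-op, so it exists exactly when the assert
 * that calls it is compiled in. Guarding it with a different macro (such
 * as DEBUG) can leave an unused function in release builds, or a call to
 * a missing function in builds where assert() is active. */
#ifndef NDEBUG
static bool
indices_in_range(const unsigned *idx, size_t n, unsigned limit)
{
   for (size_t i = 0; i < n; i++) {
      if (idx[i] >= limit)
         return false;
   }
   return true;
}
#endif

/* Hypothetical caller: sums the indices, validating them first only in
 * builds where assert() is enabled. */
static unsigned
sum_indices(const unsigned *idx, size_t n, unsigned limit)
{
   (void)limit; /* only read by the assert below */
   assert(indices_in_range(idx, n, limit));
   unsigned sum = 0;
   for (size_t i = 0; i < n; i++)
      sum += idx[i];
   return sum;
}
```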
Eric Engestrom [Fri, 24 Nov 2017 17:59:23 +0000 (17:59 +0000)]
broadcom: use NDEBUG to guard asserts

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Fri, 24 Nov 2017 16:58:43 +0000 (16:58 +0000)]
vc4: check preprocessor token existence using #ifdef instead of #if

(other uses of USE_VC4_SIMULATOR are already correct)

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
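A short C sketch of the difference, using a hypothetical SKETCH_USE_SIMULATOR macro rather than the real USE_VC4_SIMULATOR: #ifdef asks only whether the token is defined, while #if evaluates its value.

```c
#include <assert.h>

/* Hypothetical feature macro, defined with no value. */
#define SKETCH_USE_SIMULATOR

/* #ifdef only tests whether the token is defined; its value (here empty)
 * does not matter. */
#ifdef SKETCH_USE_SIMULATOR
static const int sketch_simulator_enabled = 1;
#else
static const int sketch_simulator_enabled = 0;
#endif

/* By contrast, "#if SKETCH_USE_SIMULATOR" would expand to an #if with no
 * expression and fail to compile, and for an undefined macro #if silently
 * treats the name as 0 (which -Wundef warns about). */
```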
Ben Crocker [Mon, 27 Nov 2017 19:44:59 +0000 (14:44 -0500)]
docs/llvmpipe.html: Minor edits

Language and spelling fixups in three places.

Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: move two fixes from the other patch to this one.]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Eric Engestrom [Fri, 24 Nov 2017 10:49:25 +0000 (10:49 +0000)]
st/dri: replace hard-coded array size with ARRAY_SIZE()

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
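For illustration, a C sketch of the usual sizeof-based idiom. SKETCH_ARRAY_SIZE and the table are hypothetical stand-ins, and Mesa's own ARRAY_SIZE definition may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array: total size divided by element size. */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table: adding an entry updates the count automatically,
 * unlike a hard-coded 3 that can silently go stale. */
static const char *const sketch_names[] = { "alpha", "beta", "gamma" };

static size_t
sketch_count_names(void)
{
   return SKETCH_ARRAY_SIZE(sketch_names);
}
```

The macro only works on real arrays. Applied to a pointer (for example an array parameter, which decays to a pointer), it silently returns the wrong count.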
Nicolai Hähnle [Thu, 16 Nov 2017 16:23:43 +0000 (17:23 +0100)]
radeonsi/gfx9: simplify condition for on-chip ESGS

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 18 Nov 2017 13:33:34 +0000 (14:33 +0100)]
radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES part

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 15:56:21 +0000 (16:56 +0100)]
radeonsi: use si_shader_context instead of lp_build_context in more places

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 06:33:34 +0000 (07:33 +0100)]
radeonsi: cleanup si_initialize_color_surface

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 16:26:45 +0000 (17:26 +0100)]
radeonsi: avoid attempting to create CMASK if the tiling mode doesn't have it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 14 Nov 2017 08:37:38 +0000 (09:37 +0100)]
radeonsi: check that we don't leak fine.buf references

Just as an added precaution.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 15:09:28 +0000 (16:09 +0100)]
ac/surface: fix indentation

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 9 Nov 2017 09:59:22 +0000 (10:59 +0100)]
amd/common: sid.h cleanups

Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 17 Nov 2017 19:01:50 +0000 (20:01 +0100)]
st_glsl_to_tgsi: check for the tail sentinel in merge_two_dsts

This fixes yet another case where DFRACEXP has only one destination. Found
by AddressSanitizer.

Fixes tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4-only-mantissa.shader_test

Fixes: 3b666aa74795 ("st/glsl_to_tgsi: fix DFRACEXP with only one destination")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tapani Pälli [Mon, 20 Nov 2017 13:00:19 +0000 (15:00 +0200)]
mesa/gles: adjust internal format in glTexSubImage2D error checks

When floating point textures are created on OpenGL ES 2.0, the driver
is free to choose the internal format it uses. Mesa makes this decision
in adjust_for_oes_float_texture. Error checking for glTexImage2D properly
checks that sized formats are not used. We use the same error checking
path for glTexSubImage2D (since there is a lot of overlap); however,
since those checks include internalFormat checks, we need to pass the
original internalFormat passed by the client. This patch adds
oes_float_internal_format, which reverses adjust_for_oes_float_texture
to recover that format.

Fixes following test failure:
   ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float

(when running test with MESA_GLES_VERSION_OVERRIDE=2.0)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 28 Nov 2017 02:28:51 +0000 (18:28 -0800)]
radv: Use the suffixed versions of VK_QUEUE_GLOBAL_PRIORITY_*

Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 28 Nov 2017 02:26:21 +0000 (18:26 -0800)]
vulkan: Update the XML and headers to 1.0.66

Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:31:54 +0000 (12:31 -0800)]
intel/blorp: Drop blorp_resolve_ccs_attachment

The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors.  Now that blorp handles them natively, there's no
need for the extra entrypoint.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:22:45 +0000 (12:22 -0800)]
anv: Let blorp handle indirect clear colors for CCS resolves

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 20:12:57 +0000 (12:12 -0800)]
anv: Move get_fast_clear_state_address into anv_private.h

While we're at it, we break it into two nicely named functions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 19:26:23 +0000 (11:26 -0800)]
intel/blorp: Take a range of layers in blorp_ccs_resolve

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 19:10:59 +0000 (11:10 -0800)]
intel/blorp: Add initial support for indirect clear colors

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Mon, 13 Nov 2017 02:31:56 +0000 (18:31 -0800)]
i965/blorp: Use a designated initializer for blorp_surf

This way uninitialized fields get automatically zeroed and it's safe to
add more fields to blorp_surf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
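A minimal C sketch of the property this relies on, using a hypothetical struct rather than the real blorp_surf: members not named in a designated initializer are zero-initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a surface descriptor; field names are
 * illustrative only. */
struct sketch_surf {
   const void *surf;
   unsigned aux_usage;
   unsigned clear_color_addr;   /* imagine this field was added later */
};

static struct sketch_surf
make_surf(const void *s)
{
   /* Members not named below are zero-initialized, so clear_color_addr
    * is 0 even though this initializer never mentions it. */
   return (struct sketch_surf) {
      .surf = s,
      .aux_usage = 1,
   };
}
```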
Jason Ekstrand [Sat, 11 Nov 2017 23:44:23 +0000 (15:44 -0800)]
intel/blorp: Add fast-clear to the special case in MSAA resolves

This doesn't go all the way to avoiding the txf_ms if it's fast-cleared,
but it does at least make us do it only once.  This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%.  With this patch, enabling fast-clears
increases the demo's framerate by 25%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 11 Nov 2017 23:42:51 +0000 (15:42 -0800)]
intel/blorp/blit: Rename blorp_nir_txf_ms_mcs

That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Rob Herring [Mon, 27 Nov 2017 19:32:19 +0000 (13:32 -0600)]
Android: disable warnings causing errors

AOSP master has changed the build default to -Werror making all the
warnings errors. Override that with -Wno-error.

Signed-off-by: Rob Herring <robh@kernel.org>
Timothy Arceri [Mon, 27 Nov 2017 05:25:11 +0000 (16:25 +1100)]
st/glsl_to_tgsi: make use of driver_cache_blob with the disk cache

driver_cache_blob was introduced with the i965 disk cache. It allows
us to simplify the cache a little and possibly offers some minor
speed improvements, since we load the GLSL metadata and TGSI from
disk in one pass.

Using driver_cache_blob should also make it straightforward to
implement binary support for ARB_get_program_binary in gallium.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Gwan-gyeong Mun [Sat, 25 Nov 2017 14:08:23 +0000 (23:08 +0900)]
glsl: Fix typo nagivation -> navigation

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Emil Velikov [Thu, 23 Nov 2017 18:51:14 +0000 (18:51 +0000)]
gl_table.py: add extern C guard for the generated glapitable.h

The header can be included from C++, hence its contents need an
extern "C" guard.

Cc: mesa-stable@lists.freedesktop.org
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
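The guard referred to here usually takes the following shape, sketched in C. sketch_table_size() is a hypothetical placeholder, not part of the generated glapitable.h.

```c
#include <assert.h>

/* __cplusplus is defined only by C++ compilers. Inside the braces,
 * declarations get C linkage, so a C++ caller links against the unmangled
 * C symbol; a C compiler just sees plain declarations. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration standing in for the generated contents. */
int sketch_table_size(void);

#ifdef __cplusplus
} /* extern "C" */
#endif

/* Placeholder definition so the sketch links on its own. */
int
sketch_table_size(void)
{
   return 42;
}
```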
Marek Olšák [Tue, 14 Nov 2017 18:44:33 +0000 (19:44 +0100)]
ac: pack legacy_surf_level better

r600_texture: 1488 -> 1248 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 14 Nov 2017 18:31:39 +0000 (19:31 +0100)]
ac: change legacy_surf_level::slice_size to dword units

The next commit will reduce the size even more.

v2: typecast to uint64_t manually
v3: add more typecasts, add asserts

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
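A C sketch of the dword-units idea and of why the typecasts to uint64_t matter. It assumes a hypothetical one-field struct, not the real legacy_surf_level layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed level descriptor: storing the size in dwords
 * (4-byte units) lets a 32-bit field cover byte sizes up to 16 GiB - 4. */
struct sketch_level {
   uint32_t slice_size_dw;
};

static void
sketch_set_slice_size(struct sketch_level *lvl, uint64_t bytes)
{
   assert(bytes % 4 == 0);            /* must be dword aligned */
   assert(bytes / 4 <= UINT32_MAX);   /* must fit after scaling */
   lvl->slice_size_dw = (uint32_t)(bytes / 4);
}

static uint64_t
sketch_slice_size_bytes(const struct sketch_level *lvl)
{
   /* Widen before multiplying: "lvl->slice_size_dw * 4" would be
    * computed in 32 bits and could wrap around. */
   return (uint64_t)lvl->slice_size_dw * 4;
}

/* Round-trips a byte size through the packed field. */
static uint64_t
sketch_roundtrip(uint64_t bytes)
{
   struct sketch_level lvl;
   sketch_set_slice_size(&lvl, bytes);
   return sketch_slice_size_bytes(&lvl);
}
```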
Marek Olšák [Tue, 14 Nov 2017 18:22:15 +0000 (19:22 +0100)]
ac: pack ac_surface better

r600_texture: 1736 -> 1488 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 24 Nov 2017 21:08:03 +0000 (22:08 +0100)]
radeonsi: always initialize max_forced_staging_uploads

r600_resource is malloc'd, so its fields are not zero-initialized.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808
Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 23 Nov 2017 19:29:27 +0000 (20:29 +0100)]
radeonsi: remove an old hack for evergreen

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 23 Nov 2017 19:22:25 +0000 (20:22 +0100)]
radeonsi: set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST when profitable

Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dave Airlie [Tue, 14 Nov 2017 05:11:39 +0000 (15:11 +1000)]
ac/nir: don't write tcs outputs to LDS that aren't read back.

If the TCS doesn't read back the outputs, there is no need to store them
to LDS in the first place (except for tess factors).

This seems to give about 40fps (3290->3330) with the tessellation demo.

I haven't tested whether it impacts DoW3 at all.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 14 Nov 2017 05:10:44 +0000 (15:10 +1000)]
nir: fill outputs_read field and add patch outputs read (v2)

This is to be used for TCS optimisations on radv.

v2: don't set written on reads (nha)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dave Airlie [Thu, 25 May 2017 21:57:52 +0000 (07:57 +1000)]
r600/eg: dump event type in dumps

This just makes it easier to debug some things.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Tobias Klausmann [Sun, 12 Nov 2017 01:51:55 +0000 (02:51 +0100)]
nouveau/compiler: Allow to omit line numbers when printing instructions

This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!

V2:
 - Use environment variable (Karol Herbst)
V3:
 - Use the already populated nv50_ir_prog_info to forward information to the
   print pass (Pierre Moreau)
V4:
 - get rid of default value in PrintPass constructor

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nicolai Hähnle [Wed, 22 Nov 2017 16:52:43 +0000 (17:52 +0100)]
radeonsi: try flushing unflushed fences in si_fence_finish even when timeout == 0

Under certain conditions, waiting on a GL sync object should act like
a flush, regardless of the timeout.

Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.

Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish")
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904

Ilia Mirkin [Thu, 16 Nov 2017 06:48:20 +0000 (01:48 -0500)]
nv50/ir: move LateAlgebraicOpt to the very end

Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.

This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.

total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs    : 944261 -> 943375 (-0.09%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60339896 -> 60221504 (-0.20%)

                local     shared        gpr       inst      bytes
    helped           0           0         555        2698        2698
      hurt           0           0         138         336         336

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 16 Nov 2017 04:32:16 +0000 (23:32 -0500)]
nv50/ir: when merging immediates/consts, load directly

When a MERGE operation gets its constraint moves added, it
substantially extends live ranges by reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving it into another register).

We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.

With SM35 (255 regs) we get these results:

total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs    : 950818 -> 944261 (-0.69%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60367456 -> 60339896 (-0.05%)

                local     shared        gpr       inst      bytes
    helped           0           0        4584        3186        3186
      hurt           0           0          55         968         968

I suspect they will be better for SM20 and SM30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 11 Nov 2017 03:10:46 +0000 (22:10 -0500)]
nv50/ir: add optimization for modulo by a non-power-of-2 value

We can still use the optimized division methods which make use of
multiplication with overflow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
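One well-known method of this kind is Granlund-Montgomery division by an invariant divisor (also described in Hacker's Delight): precompute a magic multiplier, take the high 32 bits of a 32x32 multiply, then derive the remainder as x - (x / d) * d. The C sketch below assumes unsigned 32-bit operands and d >= 2. It illustrates the technique on the CPU and is not the nv50 code generator's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed constants for dividing 32-bit unsigned values by d >= 2. */
struct sketch_udiv {
   uint32_t d;
   uint32_t m;   /* magic multiplier */
   int l;        /* ceil(log2(d)) */
};

static struct sketch_udiv
sketch_udiv_init(uint32_t d)
{
   struct sketch_udiv s = { .d = d, .l = 0 };
   assert(d >= 2);
   while ((UINT64_C(1) << s.l) < d)
      s.l++;
   /* m = floor(2^32 * (2^l - d) / d) + 1, which always fits in 32 bits. */
   s.m = (uint32_t)(((UINT64_C(1) << 32) * ((UINT64_C(1) << s.l) - d)) / d + 1);
   return s;
}

/* x / d using a multiply-high instead of a divide. */
static uint32_t
sketch_udiv(const struct sketch_udiv *s, uint32_t x)
{
   uint32_t t = (uint32_t)(((uint64_t)s->m * x) >> 32);   /* high half */
   return (t + ((x - t) >> 1)) >> (s->l - 1);
}

/* x % d = x - (x / d) * d */
static uint32_t
sketch_umod(const struct sketch_udiv *s, uint32_t x)
{
   return x - sketch_udiv(s, x) * s->d;
}

/* Convenience wrapper: builds the constants and computes x % d. */
static uint32_t
sketch_umod_by(uint32_t x, uint32_t d)
{
   struct sketch_udiv s = sketch_udiv_init(d);
   return sketch_umod(&s, x);
}
```

In a compiler the constants are computed once at compile time, when d is an immediate, so only the multiply-high, adds and shifts remain in the shader.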
Ilia Mirkin [Sat, 11 Nov 2017 02:47:59 +0000 (21:47 -0500)]
nv50/ir: optimize signed integer modulo by pow-of-2

It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
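A C sketch of one way to compute a signed modulo by a positive power of two without a division. It assumes the desired result follows C's truncating semantics (the result takes the sign of the dividend); nv50 may emit a different instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* x % d for a power-of-two d > 0, using a mask instead of a division.
 * int32_t is guaranteed to be two's complement. */
static int32_t
sketch_smod_pow2(int32_t x, int32_t d)
{
   assert(d > 0 && (d & (d - 1)) == 0);
   int32_t r = x & (d - 1);   /* low bits: already correct for x >= 0 */
   if (x < 0 && r != 0)
      r -= d;                 /* shift into (-d, 0] for negative x */
   return r;
}
```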
Matt Turner [Sun, 26 Nov 2017 00:45:27 +0000 (16:45 -0800)]
util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC

MSVC doesn't support #warning?! Getting really tired of this.

Andres Gomez [Sun, 26 Nov 2017 00:15:43 +0000 (02:15 +0200)]
docs: remove bug 103626 from fix list as per 17.2.6

Bug https://bugs.freedesktop.org/show_bug.cgi?id=103626 was
incorrectly listed as fixed.

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b9b60dbf55a1307a60a333c70c3add3643243c36)

Matt Turner [Sat, 25 Nov 2017 23:56:43 +0000 (15:56 -0800)]
util: Use preprocessor correctly

Fixes: 6a353479a757 ("util: Assume little endian in the absence of
                      platform-specific handling")

Andres Gomez [Sat, 25 Nov 2017 23:46:25 +0000 (01:46 +0200)]
docs: update calendar, add news item and link release notes for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
Andres Gomez [Sat, 25 Nov 2017 23:40:36 +0000 (01:40 +0200)]
docs: add sha256 checksums for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 93c2beafc0a7fa2f210b006d22aba61caa71f773)

Andres Gomez [Sat, 25 Nov 2017 23:32:53 +0000 (01:32 +0200)]
docs: add release notes for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 00b52f8e99653316a090826914509a138a1c78f7)

Ilia Mirkin [Sun, 19 Nov 2017 21:36:08 +0000 (16:36 -0500)]
freedreno/a4xx: add ARB_framebuffer_no_attachments support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 21:32:12 +0000 (16:32 -0500)]
freedreno/a4xx: add indirect draw support

This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 21:31:02 +0000 (16:31 -0500)]
freedreno: regenerate pm4 header, adjust code for new names

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 20:13:41 +0000 (15:13 -0500)]
freedreno/a4xx: add stencil texturing support

Copied from a5xx, should be identical.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 17:28:53 +0000 (12:28 -0500)]
freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ilia Mirkin [Sun, 19 Nov 2017 17:27:12 +0000 (12:27 -0500)]
nir: allow texture offsets with cube maps

GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Matt Turner [Thu, 23 Nov 2017 18:41:34 +0000 (10:41 -0800)]
util: Fix disk_cache index calculation on big endian

The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.

The following program

        #include <stdio.h>

        int main(void)
        {
            unsigned char array[4] = {1, 2, 3, 4};
            int *ptr = (int *)array;  /* view the 4 bytes as one int */

            for (int i = 0; i < 4; i++) {
                printf("%02x", array[i]);
            }
            printf("\n");

            printf("%08x\n", (unsigned int)*ptr);
            return 0;
        }

prints

   01020304
   04030201

on little endian, and

   01020304
   01020304

on big endian.

On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.

To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.

Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an
                      on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
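An endian-independent alternative to byte-swapping is to assemble the index from the bytes explicitly, in the order a little-endian load would use. This C sketch assumes a hypothetical 16-bit mask (SKETCH_INDEX_MASK) in place of CACHE_INDEX_KEY_MASK, whose actual value is not shown here, and it is not the fix that was committed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: keeps the low 16 bits, so the index depends only on
 * the first two key bytes (the first four hex characters of the digest). */
#define SKETCH_INDEX_MASK 0xffffu

/* Builds the value a little-endian load of key[0..3] would produce, on
 * any host byte order, then masks it down to the index. */
static uint32_t
sketch_cache_index(const unsigned char *key)
{
   uint32_t v = (uint32_t)key[0]
              | (uint32_t)key[1] << 8
              | (uint32_t)key[2] << 16
              | (uint32_t)key[3] << 24;
   return v & SKETCH_INDEX_MASK;
}
```

Two keys that share their first two bytes, like key_a and key_a_collide in cache-test, then map to the same index on both little and big endian hosts.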
Matt Turner [Wed, 22 Nov 2017 23:10:47 +0000 (15:10 -0800)]
util: Add a SHA1 unit test program

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Matt Turner [Thu, 23 Nov 2017 06:39:51 +0000 (22:39 -0800)]
util: Fix SHA1 implementation on big endian

The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are not defined, then the comparison becomes 0 == 0 and it evaluates as
true.

Fixes: d1efa09d342b ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
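A C sketch of the pitfall, using hypothetical SKETCH_BYTE_ORDER and SKETCH_LITTLE_ENDIAN names so that system headers cannot interfere: identifiers that are not defined macros evaluate to 0 inside an #if expression.

```c
#include <assert.h>

/* Neither macro is defined, so this becomes "#if 0 == 0" and the
 * little-endian branch is taken on every host. */
#if SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
static const int sketch_naive_says_le = 1;
#else
static const int sketch_naive_says_le = 0;
#endif

/* Safer: only trust the comparison when both macros actually exist. */
#if defined(SKETCH_BYTE_ORDER) && defined(SKETCH_LITTLE_ENDIAN) && \
    SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
static const int sketch_guarded_says_le = 1;
#else
static const int sketch_guarded_says_le = 0;
#endif
```

In real code the guarded form's #else branch still has to pick a byte order or stop with #error; the sketch only shows that the unguarded comparison cannot tell "little endian" apart from "unknown".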
Matt Turner [Sat, 25 Nov 2017 04:25:04 +0000 (20:25 -0800)]
util: Assume little endian in the absence of platform-specific handling

Marek Olšák [Wed, 15 Nov 2017 22:53:04 +0000 (23:53 +0100)]
mesa: shrink VERT_ATTRIB bitfields to 32 bits

There are only 32 vertex attribs now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
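A C sketch of the layout this series moves toward, with a hypothetical four-entry enum in place of Mesa's VERT_ATTRIB list: unnumbered enumerators, a trailing count entry, and a compile-time check that one bit per attribute fits in a uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attribute enum without hand-assigned numbers: removing an
 * entry renumbers the rest automatically, and MAX always equals the count. */
enum sketch_vert_attrib {
   SKETCH_ATTRIB_POS,
   SKETCH_ATTRIB_NORMAL,
   SKETCH_ATTRIB_COLOR0,
   SKETCH_ATTRIB_TEX0,
   SKETCH_ATTRIB_MAX
};

/* One bit per attribute must fit in the 32-bit mask type. */
_Static_assert(SKETCH_ATTRIB_MAX <= 32, "attribute mask must fit in 32 bits");

typedef uint32_t sketch_attrib_mask;

static sketch_attrib_mask
sketch_attrib_bit(enum sketch_vert_attrib a)
{
   /* Unsigned shift: 1u << 31 is well defined, unlike 1 << 31. */
   return (sketch_attrib_mask)1u << a;
}
```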
Marek Olšák [Wed, 15 Nov 2017 22:24:56 +0000 (23:24 +0100)]
mesa: remove unused vertex attrib WEIGHT

We don't support ARB_vertex_blend.

Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.

vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Marek Olšák [Wed, 15 Nov 2017 21:58:58 +0000 (22:58 +0100)]
mesa: don't assign numbers to vertex attrib enums manually

I plan to remove one of them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Marek Olšák [Sun, 19 Nov 2017 20:29:46 +0000 (21:29 +0100)]
gallium/hud: add HUD sharing within a context share group

This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 19 Nov 2017 20:04:07 +0000 (21:04 +0100)]
gallium/hud: update the HUD interface for multiple contexts

This is the boring subset of the following commit.
All new parameters are optional.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 19 Nov 2017 03:36:38 +0000 (04:36 +0100)]
gallium/hud: prevent a crash if the recording context is inactive

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for record context init/release

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for draw context init/release

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:53:34 +0000 (17:53 +0100)]
gallium/hud: don't use hud->pipe in hud_parse_env_var

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:46:51 +0000 (17:46 +0100)]
gallium/hud: use cso_get_pipe_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 16:43:42 +0000 (17:43 +0100)]
cso: add cso_get_pipe_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 15:25:52 +0000 (16:25 +0100)]
gallium/hud: pass pipe_context explicitly to most functions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 14:23:23 +0000 (15:23 +0100)]
gallium/hud: split hud_draw into 3 separate functions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 23:24:40 +0000 (00:24 +0100)]
st/dri: remove dead code and incorrect comment around make_current

Core Mesa already handles flushing based on ContextReleaseBehavior,
so the comment is wrong.

Also, old_st is always NULL, because unbind_context always precedes
make_current.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 18 Nov 2017 23:19:19 +0000 (00:19 +0100)]
st/dri: clean up dri_unbind_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 21 Nov 2017 00:47:30 +0000 (01:47 +0100)]
radeonsi: expose all CB performance counters on Stoney

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 20 Nov 2017 00:51:50 +0000 (01:51 +0100)]
radeonsi: handle imported textures with DCC robustly

With this, the driver can be hacked to enable DCC for displayable textures,
and Glamor, which doesn't enable DCC by default, won't crash anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 17 Nov 2017 16:52:09 +0000 (17:52 +0100)]
radeonsi: fix a typo in creating monolithic ES-GS

This has no effect because both occupy the same memory in a union.
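
For readers wondering why such a typo is harmless: in C, every member of a
union starts at offset 0, so storing through the "wrong" member of two
identically laid-out structs writes the same bytes. The sketch below shows
this with hypothetical stand-in types (es_part_key, gs_part_key, part_key);
it does not reproduce the actual radeonsi shader-key structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two shader-part keys with identical layouts.
 * These are NOT the real radeonsi structures. */
struct es_part_key { uint32_t num_outputs; };
struct gs_part_key { uint32_t num_outputs; };

union part_key {
    struct es_part_key es;
    struct gs_part_key gs;
};

/* Every union member starts at offset 0, so both names alias the same bytes. */
_Static_assert(offsetof(union part_key, es) == 0 &&
               offsetof(union part_key, gs) == 0,
               "union members share storage");

/* Stores through the "wrong" member (the typo) and reads back through the
 * intended one; the value matches because the storage is shared. */
static uint32_t store_via_typo(union part_key *key, uint32_t n)
{
    key->gs.num_outputs = n;    /* typo: meant key->es.num_outputs */
    return key->es.num_outputs; /* same bytes, same value */
}
```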

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 17 Nov 2017 03:56:13 +0000 (04:56 +0100)]
radeonsi: don't write undefined output channels to LDS in LS
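
A minimal sketch of the idea, assuming a simplified model in which each LS
output has four 32-bit channels and a writemask records which channels the
shader actually wrote. The function store_defined_channels and the flat
uint32_t LDS array are hypothetical; the real change works on the LLVM IR
that radeonsi emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: an LS output has four 32-bit channels, and bit c of
 * writemask is set only if the shader actually wrote channel c. */
static unsigned store_defined_channels(uint32_t *lds, unsigned base,
                                       const uint32_t value[4],
                                       unsigned writemask)
{
    unsigned stores = 0;

    for (unsigned chan = 0; chan < 4; chan++) {
        if (!(writemask & (1u << chan)))
            continue; /* undefined channel: emit no LDS store */
        lds[base + chan] = value[chan];
        stores++;
    }
    return stores; /* number of LDS writes actually performed */
}
```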

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 9 Nov 2017 22:34:26 +0000 (23:34 +0100)]
radeonsi: use ac.lds for shared memory

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 9 Nov 2017 22:25:34 +0000 (23:25 +0100)]
radeonsi: do 64-bit LDS loads recursively
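
One way to picture a recursive 64-bit LDS load: the loader handles only
32-bit dwords directly and, when asked for 64 bits, calls itself for the two
consecutive dwords and combines them. This is a sketch with a hypothetical
lds_load over a plain dword array, assuming the low dword sits at the lower
address; radeonsi builds LLVM IR rather than reading from a C array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: LDS is addressed in 32-bit dwords. */
static uint64_t lds_load(const uint32_t *lds, unsigned dw_addr, unsigned bits)
{
    if (bits == 64) {
        /* Recurse: load the two 32-bit halves, then combine them. */
        uint64_t lo = lds_load(lds, dw_addr, 32);
        uint64_t hi = lds_load(lds, dw_addr + 1, 32);
        return lo | (hi << 32);
    }

    assert(bits == 32); /* only dword loads are handled directly */
    return lds[dw_addr];
}
```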

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jon Turney [Sat, 11 Nov 2017 14:48:10 +0000 (14:48 +0000)]
mapi: Teach es{1,2}api/ABI-check shared library names on Cygwin

Ideally we'd be able to get the library filename from libtool, but that
doesn't seem to be a feature...

Use of ${uname} is presumably OK here, as we won't be running 'make check' if
we are cross-compiling.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Samuel Pitoiset [Wed, 22 Nov 2017 15:13:28 +0000 (16:13 +0100)]
Revert "radv: remove unnecessary memset() in radv_AllocateCommandBuffers()"

This fixes two CTS regressions:
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary

These two tests are part of the mustpass lists, so presumably they
are correct and my change was wrong.

This reverts commit 0f68208f1d1d3b7b2963dab40e84c60212518692.
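
The message does not say why the memset() matters. One plausible reading,
offered as an assumption: the Vulkan specification requires that when
vkAllocateCommandBuffers fails partway, the implementation frees the command
buffers it already created and sets every entry of pCommandBuffers to NULL,
which is what alloc_callback_fail_multiple probes. Clearing the output array
up front lets one cleanup loop free every slot safely. The types and
functions below (cmd_buffer, alloc_cmd_buffer, allocate_cmd_buffers) are
hypothetical, not radv's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command-buffer object; not radv's struct radv_cmd_buffer. */
struct cmd_buffer {
    int id;
};

/* Hypothetical allocator that fails once *budget reaches zero, simulating
 * an allocation callback failing partway through. */
static struct cmd_buffer *alloc_cmd_buffer(int *budget, int id)
{
    if (*budget <= 0)
        return NULL;
    struct cmd_buffer *cmd = malloc(sizeof(*cmd));
    if (cmd) {
        (*budget)--;
        cmd->id = id;
    }
    return cmd;
}

/* Returns 0 on success, -1 on failure. On failure every out[] entry is NULL. */
static int allocate_cmd_buffers(struct cmd_buffer **out, unsigned count,
                                int *budget)
{
    /* Without this, slots past the failing index would hold whatever the
     * caller left there, and the cleanup loop below would free garbage. */
    memset(out, 0, sizeof(*out) * count);

    for (unsigned i = 0; i < count; i++) {
        out[i] = alloc_cmd_buffer(budget, (int)i);
        if (!out[i])
            goto fail;
    }
    return 0;

fail:
    for (unsigned j = 0; j < count; j++) {
        free(out[j]); /* free(NULL) is a no-op for untouched slots */
        out[j] = NULL;
    }
    return -1;
}
```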

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:26 +0000 (20:13 +0100)]
radv/winsys: improve error messages when the buffer list creation failed

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:25 +0000 (20:13 +0100)]
radv/winsys: do not try to create a BO list with 0 buffers

This happens when all BOs have the RADEON_FLAG_NO_INTERPROCESS_SHARING
(DRM version >= 3.23) flag set. This flag is mainly used for reducing
overhead on the userspace side because we don't have to put those BOs
inside the list.

However, if the driver tries to create a list with 0 buffers in it,
libdrm returns -EINVAL and the app just crashes.

This fixes a bunch of CTS dEQP-VK.sparse_resources.* failures (~100).
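
A minimal sketch of the guard, under stated assumptions: struct bo, its
no_sharing field (standing in for RADEON_FLAG_NO_INTERPROCESS_SHARING),
build_bo_list and fake_bo_list_create are all hypothetical, and a list
handle of 0 is taken to mean "submit without a BO list". The stand-in
creator rejects an empty list with -EINVAL, as described above for libdrm.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical BO descriptor; no_sharing models RADEON_FLAG_NO_INTERPROCESS_SHARING. */
struct bo {
    uint32_t handle;
    bool no_sharing;
};

/* Hypothetical stand-in for the libdrm list-creation call: it rejects an
 * empty list with -EINVAL. */
static int fake_bo_list_create(const uint32_t *handles, unsigned count,
                               uint32_t *list_out)
{
    (void)handles;
    if (count == 0)
        return -EINVAL;
    *list_out = 1; /* pretend kernel handle */
    return 0;
}

/* Collects the BOs that must be listed and only creates a list when at
 * least one remains. A list handle of 0 means "no list" in this model. */
static int build_bo_list(const struct bo *bos, unsigned num_bos,
                         uint32_t *handles, uint32_t *list_out)
{
    unsigned count = 0;

    for (unsigned i = 0; i < num_bos; i++) {
        if (bos[i].no_sharing)
            continue; /* process-local BOs are left out of the list */
        handles[count++] = bos[i].handle;
    }

    if (count == 0) {
        *list_out = 0; /* skip creation instead of passing 0 buffers */
        return 0;
    }
    return fake_bo_list_create(handles, count, list_out);
}
```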

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Iago Toral Quiroga [Tue, 21 Nov 2017 10:33:53 +0000 (11:33 +0100)]
i965/vec4: fix splitting of interleaved attributes

When we split an instruction that reads a uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).

We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.

Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations
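
The rule can be sketched as follows: when splitting, a source whose region
has vstride 0 must be read unchanged by the second half, whatever register
file it lives in. Keying the check on the uniform file alone would miss
interleaved attributes, which is the gap described above. The types and the
one-register offset for ordinary sources are simplifying assumptions, not
the i965 vec4 backend's actual representation.

```c
#include <assert.h>

/* Hypothetical, simplified model of a vec4 source operand; not i965's src_reg. */
enum reg_file { FILE_GRF, FILE_UNIFORM, FILE_ATTR };

struct src_reg {
    enum reg_file file;
    unsigned nr;      /* register number */
    unsigned vstride; /* 0: every row reads the same region */
};

static struct src_reg second_half_src(struct src_reg src)
{
    /* Checking only src.file == FILE_UNIFORM would miss FILE_ATTR sources
     * with vstride 0 (interleaved inputs). Testing the region covers both. */
    if (src.vstride == 0)
        return src; /* second half re-reads the same region */

    src.nr += 1; /* assumption: second half starts one register later */
    return src;
}
```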

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Wladimir J. van der Laan [Thu, 23 Nov 2017 09:08:34 +0000 (10:08 +0100)]
etnaviv: Emit vertex buffers consecutively

Vertex buffer legacy state is no longer picked up with new drawing
commands. Change to use different cases depending on the number of
vertex streams in the GPU specs.

This also results in slightly more compact state emission on all
Vivante GPUs.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Eric Engestrom [Thu, 9 Nov 2017 17:38:25 +0000 (17:38 +0000)]
REVIEWERS: add Alexander von Gluck IV as a reviewer for Haiku

There's been some Haiku-related activity lately, so let's document who
to cc on these patches.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Alexander von Gluck IV <kallisti5@unixzen.com>