git.libre-soc.org Git - mesa.git/log

nv50: expose two groups of compute-related MP perf counters

This turns on GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>

i965/gen9: Support fast clears for 32b float

SKL can fast clear and resolve 32b RGBA surfaces as both integers and floats.
This patch only enables float color clears because we haven't yet enabled
integer color clears (HW support for that was added in BDW).

v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>

Revert "i965/gen9: Enable rep clears on gen9"

This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b.

It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.

Reviewed-by: Neil Roberts <neil@linux.intel.com>

Revert "i965/gen9: Disable MCS for 1x color surfaces"

This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2.

Reviewed-by: Neil Roberts <neil@linux.intel.com>

i965/meta/gen9: Individually fast clear color attachments

The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).

It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0

v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)

v3: Patch from Neil whose feedback I originally missed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>

i965/skl: skip fast clears for certain surface formats

Some of the information originally in this commit message is now in the patch
before this.

SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".

The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.

v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)

v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.

v4: Use better quote from spec (Topi)

Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Add lossless compression to surface format table

Background: Prior to Skylake, and since Ivybridge, Intel hardware has had the
ability to use an MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth. This
hardware was used either for MSAA compression or for fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".

Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display feature adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.

Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.

v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
- related changes to the code to reflect this.
- remove excess ccs (Chad)

v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)

Requested-by: Chad Versace <chad.versace@intel.com>
Requested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>

i965/skl: Add fast color clear infrastructure

Patch was originally called:
i965/skl: Enable fast color clears on SKL

Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff as separately reviewable work.

Based on a patch originally from Kristian.

Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
   --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
          /* In release builds, fall through */
       case I915_TILING_Y:
          *width_px = 32 / mt->cpp;
   -      *height = 4;
   +      if (brw->gen >= 9)
   +         *height = 2;
   +      else
   +         *height = 4;

v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>

docs: Add GL_EXT_shader_samples_identical to the release notes

Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

radeon/vce: disable two pipe mode for stoney

Only one encoding pipe available for Stoney

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

radeon/vce: add new firmware interface support

Add new interface to create and encode

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

egl: don't forget to ship platform_x11_dri3.h into the tarball

Should have been a part of f35198badeb

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

glsl: move builtin_type_macros.h into the correct list

Commit b9b40ef9b76 moved the file, but forgot to update the reference in
the makefile. Thus the out of tree build was busted :\

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

automake: use static llvm for make distcheck

With llvm 3.7 semi-dropping the autoconf build, we rely on their cmake
build, which annoyingly uses another (busted?) SONAME.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

mesa: remove unused var in _mesa_PushDebugGroup()

Trivial.

mesa: whitespace fixes in _mesa_one_time_init_extension_overrides()

Trivial.

radeon: ensure that timing/profiling queries are suspended on flush

The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: add support for batch driver queries to perfmon

v2 + v3: forgot null-pointer checks (spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallium/hud: add support for batch queries

v2 + v3: be more defensive about allocations

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallium: add the concept of batch queries

Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.

Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.

v3: adjust recently added nv50 queries

v2: documentation for create_batch_query

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

st/mesa: maintain active perfmon counters in an array

It is easy enough to pre-determine the required size, and arrays are
generally better behaved especially when they get large.

v2: make sure init_perf_monitor returns true when no counters are active
(spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

st/mesa: use BITSET_FOREACH_SET to loop through active perfmon counters

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

st/mesa: store mapping from perfmon counter to query type

Previously, when a performance monitor was initialized, an inner loop through
all driver queries with string comparisons for each enabled performance
monitor counter was used. This hurts when a driver exposes lots of queries.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

st/mesa: map semantic driver query types to underlying type

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallium/hud: remove unused field in query_info

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallium: remove pipe_driver_query_group_info field type

This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says

  A performance monitor consists of a number of hardware and software
  counters that can be sampled by the GPU and reported back to the
  application.

I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.

v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

gallivm: use sampler index 0 for texel fetches

texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

freedreno/a4xx: add BPTC support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

xmlconfig: Add support for DragonFly

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

android: export the path of glsl nir headers

The change is necessary to avoid build errors in the glsl and i965
modules due to the missing glsl_types.h header.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

mesa: re-enable KHR_debug for ES contexts

With the earlier issues resolved we can expose the extension.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

main: Don't restrict several KHR_debug enums to desktop GL

In preparation for supporting GL_KHR_debug in OpenGL ES

v2: add a missing hunk in _mesa_IsEnabled (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

mesa: use the correct string for the ES GL_KHR_debug functions

As defined in the spec

when implemented in an OpenGL ES context, all entry points defined
by this extension must have a "KHR" suffix.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

glsl: avoid overlap between linker- and user-assigned varying locations

Current behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR0 by the linker

New behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR1 by the linker
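
The new behavior can be pictured as the linker skipping any slot already
claimed by an explicit location. A minimal sketch in C, assuming a simple
bitmask model; find_free_slot() and reserved_slots are hypothetical names and
this is not the linker's actual code:

```c
#include <stdint.h>

/* Hypothetical model: bit i of reserved_slots is set when generic varying
 * slot i was claimed by an explicit layout(location = i) qualifier. */
static int
find_free_slot(uint64_t reserved_slots, unsigned num_slots)
{
   for (unsigned i = 0; i < num_slots && i < 64; i++) {
      if ((reserved_slots & (UINT64_C(1) << i)) == 0)
         return (int)i;
   }
   return -1; /* every slot is taken */
}
```

With slot 0 reserved by the user, the first free slot is 1, which matches the
out1 assignment shown above.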

v4:
* Fix variable name in assert

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>

auxiliary/vl/dri2: coding style fixes

Rewrap long(ish) lines, add space between struct foo and *.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl/dri2: hide internal functions

Analogous to previous commit. While we're here prefix all functions
identically -> vl_dri2_foo

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl/drm: hide internal functions

As of last commit everyone is using the vl_screen dispatch, thus we can
hide this function from the headers and make it static.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/vdpau: use the vl_screen dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/xvmc: use the vl_screen dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/va: use the vl_screen dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/omx: use the vl_screen dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl/dri2: setup the dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl/drm: use a label for the error path

... just like every other place in gallium.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl/drm: setup the dispatch

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl: add dispatch table

As mentioned previously, it will allow us to use different vl backends in
a generic way from any video state-tracker.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

auxiliary/vl: rename vl_screen_create to vl_dri2_screen_create

In preparation for having proper multi-platform/backend handling in VL.

With follow up commits we'll introduce a dispatch within vl_screen
similar to the one in pipe_screen. This way any VL state-tracker can
operate seamlessly, considering the backend/platform is properly setup.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/va: trivial cleanup

Drop the temporary variable and fold the two conditionals.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

st/omx: straighten get/put_screen

The current code is busted in a number of ways.

- initially checks for omx_display (rather than omx_screen), which may
or may not be around.
- blindly feeds the empty env variable string to loader_open_device()
- reads the env variable every time get_screen is called
- the latter manifests into memory leaks, and other issues as one sets
the variable between two get_screen calls.

Additionally, this patch cleans up a couple of extra bits:
- drops the unneeded set/check of omx_display.
- makes the teardown (put_screen) order symmetrical to the setup
(get_screen)

v2: Drop the "is empty string" check (Leo)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

automake: loader: don't create an empty dri3 helper

Seems that creating an empty one does not fare too well with MacOSX's
ar. Considering that all the users of the helper include it only when
needed, let's reshuffle the makefile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92985
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

automake: loader: honour the XCB_DRI3 cflags

Without this the compilation will fail, as the headers are installed in
a non-default location.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

automake: egl: add symbols test

Should help us catch issues where we expose any extra symbols by
mistake. Just like the ones fixed with the previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>

automake: loader: rework the CPPFLAGS

Rather than duplicating things, just use the generic AM_CPPFLAGS. This
has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
helper. The latter of which was erroneously exposing some internal
symbols.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965: Enable EXT_shader_samples_identical

On the vec4 backend, textureSamplesIdentical() will always return
false.  There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation.  We also don't think
anyone is likely to miss it.

v2: Handle immediate value for MCS smarter.  Rebase on changes to
nir_texop_samples_identical (missing second parameter).  Suggested by
Jason.

v3: Add Neil's code to handle 16x MSAA in the FS.  Also rebase on top of
f9a9ba5e.  Stub out the vec4 implementation.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v2]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]

i965/vec4: Handle nir_tex_src_ms_index more like the scalar

v2: Rebase on top of f9a9ba5e.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

nir: Add nir_texop_samples_identical opcode

This is the NIR analog to GLSL IR ir_samples_identical.

v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: Add textureSamplesIdenticalEXT built-in functions

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: Add ir_samples_identical opcode

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: Extension tracking for EXT_shader_samples_identical

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

mesa: Extension tracking for EXT_shader_samples_identical

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

Import current draft of EXT_shader_samples_identical spec

v2: Add Neil to the list of contributors. I meant to do that before,
but Matt reminded me.

v3: Fix typos noticed by Nicolai.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

nir: add nir_ssa_for_alu_src()

Using something like:

   numer = nir_ssa_for_src(bld, alu->src[0].src,
                           nir_ssa_alu_instr_src_components(alu, 0));

for alu src's with swizzle, like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_2 = udiv ssa_10.xx, ssa_11

ends up turning into something like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10
   ...

because nir_ssa_for_src() ignores the original nir_alu_src's swizzle.
Instead, for alu instructions, nir_ssa_for_alu_src() should be used to
ensure the original alu src's swizzle doesn't get lost in translation:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10.xx
   ...
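
For readers unfamiliar with swizzles, here is a toy model in C of what
reading a source through its swizzle means. The struct and function names are
hypothetical and far simpler than the real nir_alu_src; this only illustrates
why dropping the swizzle changes the value that is read:

```c
#include <stddef.h>

/* Toy model, not NIR: a source is a vector plus a per-component swizzle. */
struct toy_alu_src {
   const float *value;        /* components of the referenced SSA value */
   unsigned char swizzle[4];  /* swizzle[i] picks the component read as i */
};

/* Reads num_components (at most 4) through the swizzle, as ssa_10.xx does. */
static void
toy_read_swizzled(const struct toy_alu_src *src, size_t num_components,
                  float *out)
{
   for (size_t i = 0; i < num_components; i++)
      out[i] = src->value[src->swizzle[i]];
}
```

A one-component value read with swizzle .xx yields two copies of component x;
ignoring the swizzle would instead try to read a second component that does
not exist.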

v2: check for abs/neg, and re-use existing nir_alu_src

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: fix missing increments of num_inputs/num_outputs

Note: not quite perfect, we should use type_size vfunc (in
compiler_options or nir_shader?) to determine how much we
increment num_inputs/outputs/uniforms. But we don't have
that yet, so let's at least fix things for the existing
users of these passes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir/print: show # of uniforms/inputs/outputs

Signed-off-by: Rob Clark <robclark@freedesktop.org>

nir/print: show shader name/label if set

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: add nir_var_all enum

Otherwise, passing -1 gets you:

error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

freedreno/a4xx: fix 5_5_5_1 texture sampler format

This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org

freedreno/a4xx: add depth clamp and halfz clip

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

freedreno/a4xx: allow seamless cubemap filtering to be enabled per-texture

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

freedreno/a4xx: support lod_bias

The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org

nv50: allow using inline vertex data submit when gl_VertexID is used

The hardware can actually generate vertexid when vertices come from
a client-side buffer like when glDrawElements is used.

This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")

The only disadvantage is that it only works on G84+, but we don't really
care about that weird and old NV50 chipset.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv50: add NV84_3D macro

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.

The previous two commits make this unnecessary.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().

Cuts 1.5k of .text.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Use brw_imm_uw().

W/UW immediates are 16 bits, but those 16 bits must be replicated
in the high 16 bits of the 32-bit field.

Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).
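
A minimal sketch of the replication described above, in plain C. The helper
name replicate_uw() is hypothetical and is not the brw_imm_uw() implementation;
it only illustrates copying the low 16 bits into the high 16 bits of a 32-bit
immediate field:

```c
#include <stdint.h>

/* Hypothetical helper, not Mesa code: place a 16-bit value in both halves
 * of a 32-bit immediate field. */
static uint32_t
replicate_uw(uint16_t value)
{
   return (uint32_t)value | ((uint32_t)value << 16);
}
```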

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().

Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

   text    data     bss     dec     hex filename
5204535  214112   27784 5446431  531b1f i965_dri.so before
5193977  214112   27784 5435873  52f1e1 i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Make brw_imm_vf4() take 8-bit restricted floats.

This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f.

I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.

Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.

brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.
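
To make the 0x00 and 0x30 values above concrete, here is a sketch of decoding
an 8-bit restricted float in C. It assumes the layout is 1 sign bit, 3
exponent bits with a bias of 3, and 4 mantissa bits, with 0x00 and 0x80
special-cased as +0.0 and -0.0, and it assumes an IEEE-754 32-bit float on the
host. vf_to_float() is an illustrative name, not the driver's function:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative decoder, not Mesa code. Assumed layout: bit 7 sign,
 * bits 6:4 exponent (bias 3), bits 3:0 mantissa. */
static float
vf_to_float(uint8_t vf)
{
   uint32_t bits;
   float result;

   if (vf == 0x00 || vf == 0x80) {
      /* Zero is special-cased: keep only the sign bit. */
      bits = (uint32_t)vf << 24;
   } else {
      uint32_t sign = (uint32_t)(vf & 0x80) << 24;
      /* Rebias the exponent from 3 to the float bias of 127. */
      uint32_t exponent = ((uint32_t)(vf & 0x70) >> 4) + (127 - 3);
      /* Left-align the 4 mantissa bits in the 23-bit float mantissa. */
      uint32_t mantissa = (uint32_t)(vf & 0x0f) << (23 - 4);
      bits = sign | (exponent << 23) | mantissa;
   }

   memcpy(&result, &bits, sizeof(result));
   return result;
}
```

Under these assumptions 0x00 decodes to 0.0 and 0x30 decodes to 1.0.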

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Add test for sorted extension table

Enable developers to know if the table's alphabetical sorting
is maintained or lost.
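
The check itself amounts to comparing each name with its predecessor. A
minimal sketch in C, assuming the table can be viewed as an array of name
strings; names_are_sorted() is a hypothetical helper and not the test added by
this commit:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when names[0..count-1] is in non-decreasing strcmp() order. */
static bool
names_are_sorted(const char *const *names, size_t count)
{
   for (size_t i = 1; i < count; i++) {
      if (strcmp(names[i - 1], names[i]) > 0)
         return false;
   }
   return true;
}
```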

v2: Move "*" next to pointer name (Matt)
Include extensions_table.h instead of extensions.h (Ian)
Remove extra " *" in comment (Ian)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

mesa/extensions: Sort the extension table alphabetically

Make it easier to determine where to add new extensions.
Performed with the vim sort command.

v2: Insert newline after last #define (Matt)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

docs: GL3.1 for a3xx and a4xx

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

mesa: enable EXT_blend_func_extended if the driver supports the ARB version

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

mesa: allow MAX_DUAL_SOURCE_DRAW_BUFFERS to be available to ES

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

mesa: enable usage of blend_func_extended blend factors in GLES2

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: add a parse check to check for the index layout qualifier

This can only be used if EXT_blend_func_extended is enabled

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: add GL_EXT_blend_func_extended preprocessor define

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: add support for EXT_blend_func_extended builtins

gl_MaxDualSourceDrawBuffersEXT - Maximum dual-source draw buffers supported

For ESSL 1.0, it provides two builtins since you can't have user-defined
color output variables:
gl_SecondaryFragColorEXT
gl_SecondaryFragDataEXT[MaxDSDrawBuffers]

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: add EXT_blend_func_extended parser enables

This adds a state for the maximum dual source draw variables available
and the variable for determining if the extension has been enabled
in the program shaders.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glapi: add EXT_blend_func_extended XML definitions

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

os: check for GALLIUM_PROCESS_NAME to override os_get_process_name()

Useful for debugging and for glretrace.
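
A rough sketch of the override logic in C. Only the GALLIUM_PROCESS_NAME
variable comes from this commit; the function names are hypothetical, and
treating an empty value as "not set" is an assumption made for the example:

```c
#include <stddef.h>
#include <stdlib.h>

/* Prefer a non-empty override; otherwise fall back to the detected name. */
static const char *
choose_process_name(const char *override, const char *detected)
{
   if (override != NULL && override[0] != '\0')
      return override;
   return detected;
}

/* Hypothetical wrapper: the caller passes whatever name the platform
 * specific code detected. */
static const char *
get_process_name(const char *detected)
{
   return choose_process_name(getenv("GALLIUM_PROCESS_NAME"), detected);
}
```

Keeping the choice in a separate function makes the fallback easy to check
without touching the environment.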

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>

glsl: fix ir_constant::equals() for doubles

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

glsl: fix isinf() for doubles

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

nir: fix constant folding of bfi

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

hud: fix Windows build break

Protect signal-related code with PIPE_OS_UNIX test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

glsl: Fix off-by-one error in array size check assertion

Apparently, this has been a bug since 2010 (c30f6e5d).

Also use ARRAY_SIZE instead of open coding it.
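
The commit message does not quote the assertion, so the following is only a
generic C illustration of this class of bug: a valid index must be strictly
less than the element count, and ARRAY_SIZE avoids open coding that count.
The table and lookup() names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Common idiom; Mesa provides its own ARRAY_SIZE macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const int table[4] = { 10, 20, 30, 40 };

static int
lookup(size_t index)
{
   /* Valid indices are 0 .. ARRAY_SIZE(table) - 1. Writing
    * "index <= ARRAY_SIZE(table)" here would be off by one. */
   assert(index < ARRAY_SIZE(table));
   return table[index];
}
```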

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org

mesa: Don't expose GL_EXT_shader_integer_mix in GLES 1.x

There are no shaders, so it doesn't even make sense to expose the
extension.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>

glsl: Silence unused parameter warnings

builtin_functions.cpp:5289:52: warning: unused parameter 'num_arguments' [-Wunused-parameter]
                                           unsigned num_arguments,
                                                    ^
builtin_functions.cpp:5290:52: warning: unused parameter 'flags' [-Wunused-parameter]
                                           unsigned flags)
                                                    ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: Silence ignored qualifier warning

I think the intention was to mark the "this" parameter as const, but
const goes on the other end to do that.

In file included from glsl_symbol_table.cpp:26:0:
ast.h:339:35: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
const bool is_single_dimension()
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>

i965: Allow indirect GS input indexing in the scalar backend.

This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.

All indirects are handled via the pull model.  We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data.  Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly.  We'd need to be careful.

v2: Use updated MOV_INDIRECT opcode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>

gallium/hud: document GALLIUM_HUD_PERIOD in envvars.html.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>

gallium/hud: control visibility at startup and runtime.

- env GALLIUM_HUD_VISIBLE: control default visibility
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal
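
A minimal sketch of toggling a visibility flag from a signal handler, in
standard C. The names are hypothetical and this is not the HUD code; in
particular, how the signal number is read from GALLIUM_HUD_SIGNAL_TOGGLE is
left out, and the caller is assumed to pass it in:

```c
#include <signal.h>
#include <stdbool.h>

/* Only sig_atomic_t objects are safe to modify from a signal handler. */
static volatile sig_atomic_t hud_visible = 1;

static void
toggle_hud(int sig)
{
   (void)sig;
   hud_visible = !hud_visible;
}

/* Hypothetical setup: install the toggle handler for the given signal. */
static bool
install_hud_toggle(int signo)
{
   return signal(signo, toggle_hud) != SIG_ERR;
}
```

The drawing code would then check hud_visible each frame and skip the HUD
when it is zero.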

Signed-off-by: Marek Olšák <marek.olsak@amd.com>

i965/nir: Add hooks for testing nir_shader_clone

This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader. Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>