mesa.git
Oded Gabbay [Tue, 3 Nov 2015 08:36:01 +0000 (10:36 +0200)]
llvmpipe: use simple coeffs calc for 128bit vectors

There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
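
As an illustration only (this is not llvmpipe's code), the sketch below evaluates the same linear attribute a0 + dadx * x two ways. The single-step version evaluates it directly; the two-step version evaluates it at the block origin first and then adds the in-block offset. The helper names are hypothetical. The point is that each extra float rounding can change the result.

```c
/* Hypothetical helpers, not llvmpipe code: evaluate a(x) = a0 + dadx * x
 * at pixel x = bx + px (bx = block origin, px = offset inside the block). */

/* Single-step: evaluate the plane equation directly at the pixel. */
static float interp_single_step(float a0, float dadx, float bx, float px)
{
   float x = bx + px;
   return a0 + dadx * x;          /* one rounding of the final sum */
}

/* Two-step: evaluate at the block origin, then add the pixel offset. */
static float interp_two_step(float a0, float dadx, float bx, float px)
{
   float origin = a0 + dadx * bx; /* first rounding */
   return origin + dadx * px;     /* second rounding */
}
```

With a0 = 1e8f, dadx = 1.0f, bx = 4.0f and px = 3.0f, the single-step version yields 100000008.0f and the two-step version yields 100000000.0f. Near 1e8, floats are spaced 8 apart, and the two-step path rounds twice.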

Which method is used depends on the vector size used by the platform.

For vectors larger than 128 bits, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors of 128 bits or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This makes the resulting coeffs identical across more platforms,
makes the piglit tests pass, and makes debugging and maintenance
a bit easier, as the generated LLVM IR will be the same on more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

  - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
  - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
  - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

  - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
  - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
  - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with the patch: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Tue, 3 Nov 2015 05:43:40 +0000 (21:43 -0800)]
nir: Properly invalidate metadata in nir_opt_remove_phis().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Kenneth Graunke [Tue, 3 Nov 2015 05:38:56 +0000 (21:38 -0800)]
nir: Properly invalidate metadata in nir_lower_vec_to_movs().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Kenneth Graunke [Tue, 3 Nov 2015 05:21:25 +0000 (21:21 -0800)]
nir: Properly invalidate metadata in nir_opt_copy_prop().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Kenneth Graunke [Tue, 3 Nov 2015 05:28:26 +0000 (21:28 -0800)]
nir: Properly invalidate metadata in nir_remove_dead_variables().

v2: Preserve live_variables too (Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Kenneth Graunke [Tue, 3 Nov 2015 05:05:08 +0000 (21:05 -0800)]
nir: Properly invalidate metadata in nir_split_var_copies().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Kenneth Graunke [Tue, 3 Nov 2015 05:02:37 +0000 (21:02 -0800)]
nir: Properly invalidate metadata in nir_lower_global_vars_to_local().

v2: Preserve nir_metadata_live_variables as well (caught by Jason).
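
Below is a minimal sketch of the invalidation pattern these commits apply. It assumes a per-function bitmask of valid analyses; the enum values and helper names are hypothetical and only loosely modeled on NIR's metadata flags. After making progress, a pass keeps only the analyses it declares as preserved. Everything else must be recomputed before use.

```c
#include <stdbool.h>

/* Hypothetical analysis bits, loosely modeled on NIR's metadata flags. */
enum metadata {
   METADATA_BLOCK_INDEX    = 1 << 0,
   METADATA_DOMINANCE      = 1 << 1,
   METADATA_LIVE_VARIABLES = 1 << 2,
};

struct function_impl {
   unsigned valid_metadata;   /* bitmask of analyses that are still valid */
};

/* Called by a pass after it changed the IR: everything not listed in
 * `preserved` is dropped and must be recomputed before its next use. */
static void metadata_preserve(struct function_impl *impl, unsigned preserved)
{
   impl->valid_metadata &= preserved;
}

/* True only if every analysis in `m` is still valid. */
static bool metadata_is_valid(const struct function_impl *impl, unsigned m)
{
   return (impl->valid_metadata & m) == m;
}
```

A pass that leaves the CFG and SSA values untouched can list all three bits. A pass that rewrites SSA uses would typically keep only the first two.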

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Wed, 28 Oct 2015 17:11:11 +0000 (10:11 -0700)]
nir: Unexpose _impl versions of copy_prop and dce

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Fri, 23 Oct 2015 23:10:02 +0000 (16:10 -0700)]
mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Iago Toral <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Matt Turner [Fri, 30 Oct 2015 17:07:23 +0000 (10:07 -0700)]
i965/vec4: Send from GRF in atomic operations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Wed, 28 Oct 2015 12:50:08 +0000 (13:50 +0100)]
gallium/radeon: allow returning SDMA fences from pipe->flush

pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 28 Oct 2015 11:59:38 +0000 (12:59 +0100)]
gallium/radeon: always return the last SDMA fence on SDMA flush if needed

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Kenneth Graunke [Thu, 12 Mar 2015 06:14:31 +0000 (23:14 -0700)]
i965: Add scalar geometry shader support.

This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy.  But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
  tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Kenneth Graunke [Thu, 24 Sep 2015 03:52:19 +0000 (20:52 -0700)]
i965: Add scalar GS input lowering code.

We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it.  We can improve that later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Kenneth Graunke [Tue, 3 Nov 2015 20:51:32 +0000 (12:51 -0800)]
i965: Fix the fs_visitor GS constructor to take shader_time_index.

Jason reworked this so it isn't simply ST_GS anymore...it's either -1
(not enabled) or an actual offset.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Ben Widawsky [Wed, 14 Oct 2015 03:50:19 +0000 (20:50 -0700)]
i965/gen8+: Extract color clear surface state

On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Ben Widawsky [Wed, 14 Oct 2015 03:50:18 +0000 (20:50 -0700)]
i965/gen8+: Remove redundant zeroing of surface state

allocate_surface_state() already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if required.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Samuel Pitoiset [Tue, 3 Nov 2015 18:33:08 +0000 (19:33 +0100)]
nvc0: add missing compute parameters required by clover

This fixes crashes with some piglit OpenCL tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Tue, 3 Nov 2015 18:32:49 +0000 (19:32 +0100)]
nvc0: handle NULL pointer in nvc0_get_compute_param()

To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on the one in nv50.
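
Here is a sketch of that size-query convention under stated assumptions. The RET() macro shape, the parameter enum and the values are hypothetical; they are not copied from nv50 or nvc0. A NULL `data` pointer returns only the size, while a non-NULL one also receives the value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical RET()-style helper: copy the value if the caller passed a
 * buffer, and always report the size of the value in bytes. */
#define RET(x) do {                    \
      if (data)                        \
         memcpy(data, (x), sizeof(x)); \
      return sizeof(x);                \
   } while (0)

enum compute_cap { CAP_GRID_DIMENSION, CAP_MAX_GRID_SIZE };  /* hypothetical */

static int get_compute_param(enum compute_cap param, void *data)
{
   switch (param) {
   case CAP_GRID_DIMENSION:
      RET((uint64_t []) { 3 });
   case CAP_MAX_GRID_SIZE:
      /* Extra parentheses keep the commas out of the macro argument list. */
      RET(((uint64_t []) { 65535, 65535, 65535 }));
   default:
      return 0;
   }
}
```

A caller such as clover would first call get_compute_param(CAP_MAX_GRID_SIZE, NULL) to learn that 24 bytes are needed. It would then allocate that much and call the function again with the buffer.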

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Widawsky [Fri, 23 Oct 2015 18:30:16 +0000 (11:30 -0700)]
i965/skl: PCI ID cleanup and brand strings

A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.

v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Ben Widawsky [Fri, 30 Oct 2015 00:30:35 +0000 (17:30 -0700)]
i965/skl: Add GT4 PCI IDs

Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.

One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.

v2: No longer relevant.

v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices
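
The formula can be written as a tiny helper. The values used in any call are placeholders rather than real SKU data. The per-thread scratch size parameter is an assumption for illustration.

```c
/* Maximum WM thread count: threads_per_psd * subslices_per_slice * slices. */
static unsigned max_wm_threads(unsigned threads_per_psd,
                               unsigned subslices_per_slice,
                               unsigned slices)
{
   return threads_per_psd * subslices_per_slice * slices;
}

/* Scratch space scales linearly with the thread count
 * (per_thread_bytes is a hypothetical parameter). */
static unsigned long scratch_bytes(unsigned threads, unsigned per_thread_bytes)
{
   return (unsigned long)threads * per_thread_bytes;
}
```

Adding a slice (as GT4 does) raises `slices`, and the scratch allocation grows proportionally.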

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Jordan Justen [Tue, 13 Oct 2015 22:04:54 +0000 (15:04 -0700)]
mesa: Add spec citations for DispatchCompute*

Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Jordan Justen [Tue, 13 Oct 2015 22:04:54 +0000 (15:04 -0700)]
mesa: Update DispatchComputeIndirect errors for indirect parameter

There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.

The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.

Here we update the code to match the main specifications, and update
the citations to use the main specification rather than the extension
specification.
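
Below is a hedged sketch of the error split, as we read the OpenGL 4.3 / OpenGL ES 3.1 wording:

- Problems with the `indirect` value itself give INVALID_VALUE.
- A missing or too-small DISPATCH_INDIRECT_BUFFER gives INVALID_OPERATION.

The function name and the `buffer_size < 0` convention for "no buffer bound" are hypothetical. The GL enum values are the standard ones.

```c
#include <stdint.h>

/* Standard GL error enum values. */
#define GL_NO_ERROR          0
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502

/* Hypothetical validator; buffer_size < 0 means no buffer is bound. */
static unsigned validate_dispatch_indirect(long indirect, long buffer_size)
{
   const long group_count_size = 3 * (long)sizeof(uint32_t);  /* x, y, z */

   /* Bad values of the indirect offset itself: INVALID_VALUE. */
   if (indirect < 0 || indirect % (long)sizeof(uint32_t) != 0)
      return GL_INVALID_VALUE;

   /* No buffer bound, or reading past its end: INVALID_OPERATION. */
   if (buffer_size < 0 || indirect + group_count_size > buffer_size)
      return GL_INVALID_OPERATION;

   return GL_NO_ERROR;
}
```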

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Matt Turner [Mon, 26 Oct 2015 18:35:57 +0000 (11:35 -0700)]
i965/fs: Clean up FBH code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Mon, 26 Oct 2015 18:35:57 +0000 (11:35 -0700)]
i965/vec4: Clean up FBH code.

It did a bunch of unnecessary stuff, including emitting an extra MOV.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Mon, 26 Oct 2015 13:58:56 +0000 (06:58 -0700)]
i965: Replace default case with list of enum values.

If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Mon, 26 Oct 2015 03:49:08 +0000 (20:49 -0700)]
i965/vec4: Don't disable channels in any/all comparisons.

We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)

So, if we emit

               cmp.z.f0(8) null.xy<1>D  g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
               mov(8)      g12<1>.xUD   0x00000000UD
   (+f0.all4h) mov(8)      g12<1>.xUD   0xffffffffUD

where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.

We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.
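
A toy C model of the problem follows; the names are hypothetical and real EU semantics are far richer. CMP only writes flag bits for enabled channels, while an all4-style predicate reads all four. The sources are assumed to be swizzled .xyyy, so z and w replicate y.

```c
#include <stdbool.h>

/* Toy flag register: one bit per channel. */
struct flag_reg { bool f[4]; };

/* CMP.Z: flag[i] = (a[i] == b[i]), written only for enabled channels.
 * Disabled channels keep whatever stale value the flag already had. */
static void cmp_z(struct flag_reg *flag, const int a[4], const int b[4],
                  unsigned channel_enable)
{
   for (int i = 0; i < 4; i++)
      if (channel_enable & (1u << i))
         flag->f[i] = (a[i] == b[i]);
}

/* all4-style predicate: true only if all four flag bits are set. */
static bool pred_all4(const struct flag_reg *flag)
{
   return flag->f[0] && flag->f[1] && flag->f[2] && flag->f[3];
}
```

With only .xy enabled, stale false bits in z and w make the predicate fail even though the vectors are equal. Enabling all four channels gives the right answer.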

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tapani Pälli [Fri, 30 Oct 2015 12:30:35 +0000 (14:30 +0200)]
mesa: fix uniforms calculation in glGetProgramiv

Since the introduction of SSBOs, UniformStorage contains not just uniforms
but also buffer variables; this needs to be taken into account when
calculating active uniforms for GL_ACTIVE_UNIFORMS and
GL_ACTIVE_UNIFORM_MAX_LENGTH.
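
Here is a minimal sketch of the corrected counting. It assumes a trimmed-down stand-in for a UniformStorage entry with an is_shader_storage flag marking SSBO buffer variables. The struct and function names are hypothetical. Per the GL spec, GL_ACTIVE_UNIFORM_MAX_LENGTH includes the terminating NUL and is 0 when there are no active uniforms.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a UniformStorage entry. */
struct uniform_storage {
   const char *name;
   bool is_shader_storage;   /* true for SSBO buffer variables */
};

/* GL_ACTIVE_UNIFORMS: count real uniforms, skipping buffer variables. */
static unsigned count_active_uniforms(const struct uniform_storage *u, unsigned n)
{
   unsigned count = 0;
   for (unsigned i = 0; i < n; i++)
      if (!u[i].is_shader_storage)
         count++;
   return count;
}

/* GL_ACTIVE_UNIFORM_MAX_LENGTH: longest uniform name plus the NUL,
 * again ignoring buffer variables. */
static size_t max_uniform_name_length(const struct uniform_storage *u, unsigned n)
{
   size_t max_len = 0;
   for (unsigned i = 0; i < n; i++) {
      if (u[i].is_shader_storage)
         continue;
      size_t len = strlen(u[i].name) + 1;
      if (len > max_len)
         max_len = len;
   }
   return max_len;
}
```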

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tapani Pälli [Fri, 30 Oct 2015 10:02:51 +0000 (12:02 +0200)]
mesa: fix program resource queries for atomic counter buffers

gl_active_atomic_buffer contains an index into UniformStorage; we need to
calculate the resource index for that gl_uniform_storage.

Fixes following CTS tests:
   ES31-CTS.program_interface_query.atomic-counters
   ES31-CTS.program_interface_query.atomic-counters-one-buffer

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Juha-Pekka Heikkila [Wed, 21 Oct 2015 09:09:21 +0000 (12:09 +0300)]
glsl: join calculate_array_size() and calculate_array_stride()

These helpers were run for the same case in the same loop. Join their
operation so that the loop runs just once. Also fix an out-of-memory
condition here.

v2: Make the loop simpler to read as per Tapani's suggestion

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Ryan Houdek [Mon, 2 Nov 2015 03:25:27 +0000 (21:25 -0600)]
mesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES

This has been tested with the piglit tests posted to the mailing list
and with the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 2 Nov 2015 01:13:13 +0000 (20:13 -0500)]
nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Samuel Pitoiset [Sun, 1 Nov 2015 22:28:02 +0000 (23:28 +0100)]
nv50: use correct heaps for FP and GP code segments

This is just a cosmetic change. Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jordan Justen [Sat, 17 Oct 2015 04:19:45 +0000 (21:19 -0700)]
mesa/sso: Add compute shader support

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
[itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jordan Justen [Sat, 17 Oct 2015 04:14:10 +0000 (21:14 -0700)]
mesa/sso: Add MESA_VERBOSE=api trace support

v2:
 * Use %u for unsigned values (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jordan Justen [Thu, 15 Oct 2015 17:27:00 +0000 (10:27 -0700)]
i965: Setup pull constant state for compute programs

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jordan Justen [Wed, 14 Oct 2015 00:19:54 +0000 (17:19 -0700)]
main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Jordan Justen [Thu, 15 Oct 2015 21:47:34 +0000 (14:47 -0700)]
glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules

The OpenGLES GLSL 3.1 specification uses the precision qualifier
ordering rules from ARB_shading_language_420pack.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Jordan Justen [Wed, 14 Oct 2015 00:18:52 +0000 (17:18 -0700)]
glsl: Add compute shader builtin variables for OpenGLES 3.1

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Ilia Mirkin [Sat, 31 Oct 2015 23:54:38 +0000 (19:54 -0400)]
nouveau: get rid of tabs

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Connor Abbott [Fri, 30 Oct 2015 22:19:34 +0000 (18:19 -0400)]
i965/sched: don't calculate live intervals for post-RA scheduling

For some reason, this causes assertion failures on gm965 only. In any case, it's
unnecessary since we don't need liveness information in the post-RA
scheduler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92744
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Dave Airlie [Sat, 31 Oct 2015 08:04:26 +0000 (18:04 +1000)]
virgl/vtest: fix extra malloc

This somehow got added twice; drop the first one.

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 31 Oct 2015 06:07:52 +0000 (16:07 +1000)]
virgl: free sampler view on failure path

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 31 Oct 2015 06:11:29 +0000 (16:11 +1000)]
gallium/swrast: fixup build breakage and warnings

The front buffer rendering changes broke an interface, I didn't
fix up all of them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 9 Oct 2015 00:38:08 +0000 (01:38 +0100)]
gallium/swrast: fix front buffer blitting. (v2)

I've known for a while that this was broken (as far as I know, cogl
has a workaround for it): with the gallium-based swrast drivers,
BlitFramebuffer from back to front, or vice versa, was pretty
broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it.
This pointer is then used on map/unmap to decide whether the
contents should be updated from the real front buffer using
get/put image.
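
Below is a simplified sketch of that map/unmap logic. Every name is hypothetical; this is not the actual DRI interface. A resource created for a front buffer carries a private pointer. Map pulls the real front-buffer contents in with get_image, and unmap pushes changes back with put_image.

```c
#include <string.h>

#define FB_W 4
#define FB_H 2

/* Hypothetical stand-in for the real window-system front buffer. */
struct window { unsigned pixels[FB_W * FB_H]; };

/* Driver-side copy that rendering and blits actually touch. */
struct resource {
   unsigned pixels[FB_W * FB_H];
   struct window *front_private;  /* non-NULL only for front buffers */
};

/* get_image: pull the current window contents into the resource. */
static void get_image(struct resource *res)
{
   memcpy(res->pixels, res->front_private->pixels, sizeof(res->pixels));
}

/* put_image: push the resource contents back to the window. */
static void put_image(struct resource *res)
{
   memcpy(res->front_private->pixels, res->pixels, sizeof(res->pixels));
}

static unsigned *resource_map(struct resource *res)
{
   if (res->front_private)
      get_image(res);   /* refresh from the real front buffer */
   return res->pixels;
}

static void resource_unmap(struct resource *res)
{
   if (res->front_private)
      put_image(res);   /* make the changes visible in the window */
}
```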

This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 15 Oct 2015 23:28:45 +0000 (10:28 +1100)]
glsl: set image access qualifiers for AoA

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Ian Romanick [Tue, 27 Oct 2015 21:50:14 +0000 (14:50 -0700)]
i965: Do legacy userclipping in OpenGL ES 1.x contexts.

Commit fba4823a disabled user clipping for everything except
compatibility profile.  Core profile and OpenGL ES 2.0+ have all removed
the classic, OpenGL 1.0 user clip planes.  ES 1.x, however, still has
them.

Fixes OpenGL ES 1.1 conformance mustpass.c and userclip.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Olivier Berthier <olivierx.berthier@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92639
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92641

Emmanuel Gil Peyrot [Thu, 29 Oct 2015 15:22:19 +0000 (15:22 +0000)]
gbm.h: Add a missing stddef.h include for size_t.

This was causing compilation issues when none of the headers that
provide size_t had been included before gbm.h.
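
The general rule is shown below with a hypothetical header (not gbm.h). A header that mentions size_t should include <stddef.h> itself instead of relying on the includer to have done so.

```c
#ifndef EXAMPLE_BO_H
#define EXAMPLE_BO_H

#include <stddef.h>   /* size_t: included here, not left to the includer */
#include <stdint.h>   /* uint32_t */

/* Hypothetical buffer-object description. */
struct example_bo {
   size_t size;
   uint32_t stride;
};

/* Number of whole rows in the buffer; 0 if the stride is unknown. */
static inline size_t example_bo_rows(const struct example_bo *bo)
{
   return bo->stride ? bo->size / bo->stride : 0;
}

#endif /* EXAMPLE_BO_H */
```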

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 28 Oct 2015 09:54:15 +0000 (09:54 +0000)]
winsys/virgl: rework line wrapping/indent

Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Thu, 29 Oct 2015 10:17:04 +0000 (10:17 +0000)]
virgl: unwrap the includes

Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 12:50:47 +0000 (12:50 +0000)]
winsys/virgl: remove temporary ret variable

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 12:49:08 +0000 (12:49 +0000)]
winsys/virgl: always memset prior to ioctl

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 12:28:35 +0000 (12:28 +0000)]
winsys/virgl: use MALLOC to match FREE

The uppercase versions are wrappers which must be matched.
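
The sketch below shows why the pairing matters, using hypothetical wrappers (this is not Mesa's u_memory.h). If the uppercase allocator hides a header in front of the block, only the matching FREE() knows how to find the pointer that malloc() really returned.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every MALLOC() block; the union
 * with max_align_t keeps the user pointer suitably aligned. */
union alloc_hdr {
   size_t size;
   max_align_t align;
};

/* Returns a pointer just past the header, NOT the pointer malloc() gave
 * us, so it must never be passed to plain free(). */
static void *MALLOC(size_t size)
{
   union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);
   if (!hdr)
      return NULL;
   hdr->size = size;
   return hdr + 1;
}

/* Steps back over the header to recover the real allocation; handing it
 * a pointer from plain malloc() would free the wrong address. */
static void FREE(void *ptr)
{
   if (ptr)
      free((union alloc_hdr *)ptr - 1);
}

/* Hypothetical bookkeeping that only works on MALLOC() pointers. */
static size_t MALLOC_SIZE(const void *ptr)
{
   return ((const union alloc_hdr *)ptr - 1)->size;
}
```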

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 12:27:14 +0000 (12:27 +0000)]
winsys/virgl: remove calloc/malloc casts
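
The idiom is shown below with a hypothetical struct. In C, the void * returned by calloc()/malloc() converts implicitly, so the cast adds nothing. Using sizeof(*ptr) keeps the size tied to the variable's type.

```c
#include <stdlib.h>

/* Hypothetical struct for illustration. */
struct vws_example {
   int fd;
   unsigned num_handles;
};

static struct vws_example *vws_example_create(int fd)
{
   /* No (struct vws_example *) cast: void * converts implicitly in C, and
    * sizeof(*ws) stays correct even if the type of ws changes later. */
   struct vws_example *ws = calloc(1, sizeof(*ws));
   if (!ws)
      return NULL;
   ws->fd = fd;
   return ws;
}
```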

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 12:38:35 +0000 (12:38 +0000)]
winsys/virgl: throw in some inline wrappers

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:36:00 +0000 (11:36 +0000)]
virgl: introduce virgl_query() inline wrapper

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:21:49 +0000 (11:21 +0000)]
virgl: use virgl_screen/surface upcast wrappers

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:14:02 +0000 (11:14 +0000)]
virgl: introduce and use virgl_transfer/texture/resource inline wrappers

The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:48:31 +0000 (10:48 +0000)]
virgl: add virgl_context/sampler_view/so_target() upcast wrappers
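
Here is the upcast-wrapper pattern in miniature. It assumes the driver struct embeds the gallium base struct as its first member. Apart from `base`, the fields are hypothetical.

```c
/* Minimal stand-in for the gallium base struct. */
struct pipe_context { int dummy; };

struct virgl_context {
   struct pipe_context base;   /* must be the first member */
   unsigned num_draws;         /* hypothetical driver field */
};

/* One typed helper instead of open-coded (struct virgl_context *) casts.
 * Valid because `base` is the first member, so both share an address. */
static inline struct virgl_context *virgl_context(struct pipe_context *ctx)
{
   return (struct virgl_context *)ctx;
}
```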

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:57:55 +0000 (11:57 +0000)]
winsys/virgl/drm: drop unneeded forward declaration

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:21:54 +0000 (10:21 +0000)]
virgl: remove sw_winsys pointer from virgl_screen

The screen already has a pointer to the (base) winsys object, which is
implemented/sub-classed as either a drm- or sw-based one, depending on
the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Thu, 29 Oct 2015 10:10:35 +0000 (10:10 +0000)]
virgl: rename virgl.h to virgl_screen.h

Provide a more meaningful name, considering its purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 14:39:45 +0000 (14:39 +0000)]
virgl: move virgl_hw.h into the driver dir

Strictly speaking, virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency:

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:53:36 +0000 (11:53 +0000)]
virgl: straighten the includes confusion

Use the relevant GALLIUM_foo_CFLAGS, which have all the requirements
(not to mention VISIBILITY_CFLAGS), and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:05:27 +0000 (10:05 +0000)]
virgl: remove the _FILE_OFFSET_BITS defines

The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:51:47 +0000 (11:51 +0000)]
winsys/virgl/drm: add all files to the tarball

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:08:25 +0000 (10:08 +0000)]
winsys/virgl/vtest: list all files in Makefile.sources

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:36:50 +0000 (11:36 +0000)]
virgl: move sources list to Makefile.sources

... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:47:18 +0000 (11:47 +0000)]
virgl: fix drm.h include path

The drm/ prefix is required if using the kernel-provided headers. As
most distros don't ship them, and we already depend on libdrm (which
adds the relevant -I flag), just drop the drm/ prefix from the include.

Once a libdrm release with the virtgpu_drm.h header is available, we can
drop our local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Fri, 30 Oct 2015 17:23:18 +0000 (17:23 +0000)]
i965: enable ARB_shader_clock on gen7+

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Wed, 7 Oct 2015 10:50:01 +0000 (11:50 +0100)]
i965: Implement nir_intrinsic_shader_clock

v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
   SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 9 Oct 2015 09:40:35 +0000 (10:40 +0100)]
i965/fs: move the fs_reg::smear() from get_timestamp() to the callers

We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter, the generalisation does not apply, so move the smear()
to where it is needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Wed, 7 Oct 2015 10:59:26 +0000 (11:59 +0100)]
nir: add shader_clock intrinsic

v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 2 Oct 2015 09:25:51 +0000 (10:25 +0100)]
glsl: add support for the clock2x32ARB function

v2: correctly set the return type

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
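For illustration, a host-side C sketch of consuming the uvec2 that clock2x32ARB() returns. It assumes, per our reading of the ARB_shader_clock spec, that the first component holds the low 32 bits of the 64-bit counter and the second the high 32 bits; the `uvec2` typedef and the helper names are hypothetical, not Mesa or GLSL API.

```c
#include <stdint.h>

/* Hypothetical mirror of GLSL's uvec2 as returned by clock2x32ARB().
 * Assumed component order: x = low 32 bits, y = high 32 bits. */
typedef struct {
    uint32_t x;
    uint32_t y;
} uvec2;

/* Recombine the two 32-bit halves into a single 64-bit counter value. */
static uint64_t clock_to_u64(uvec2 c)
{
    return ((uint64_t)c.y << 32) | (uint64_t)c.x;
}

/* Ticks elapsed between two samples. Unsigned 64-bit arithmetic wraps
 * modulo 2^64, so a single counter wraparound between the samples still
 * yields the right difference. */
static uint64_t clock_elapsed(uvec2 start, uvec2 end)
{
    return clock_to_u64(end) - clock_to_u64(start);
}
```

Subtracting the recombined 64-bit values, rather than working per component, avoids propagating a borrow from the low word by hand.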
Emil Velikov [Fri, 2 Oct 2015 08:56:37 +0000 (09:56 +0100)]
glsl: add ARB_shader_clock infrastructure

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 2 Oct 2015 08:49:47 +0000 (09:49 +0100)]
mesa: add infra for ARB_shader_clock

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Samuel Pitoiset [Sat, 17 Oct 2015 09:24:50 +0000 (11:24 +0200)]
nv50: do not create an invalid HW query type

While we are at it, store the rotate offset for occlusion queries in
nv50_hw_query, as on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Fri, 16 Oct 2015 23:04:27 +0000 (01:04 +0200)]
nv50: move HW queries to nv50_query_hw.c/h files

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Sun, 18 Oct 2015 16:33:41 +0000 (18:33 +0200)]
nv50: move nva0_so_target_save_offset() to its correct location

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Fri, 16 Oct 2015 22:14:28 +0000 (00:14 +0200)]
nv50: add a header file for nv50_query

As for nvc0, this will allow splitting different types of queries and
prepare the way for both global performance counters and MP counters.

While we are at it, make use of nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:53 +0000 (11:42 +0000)]
st/va: add support to export a surface as dmabuf

This implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

And applies related changes to:
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer

Implementation inspired by cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with the nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:52 +0000 (11:42 +0000)]
st/va: implement VaDeriveImage

And apply related changes to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper VA buffer type and struct
for this. The only option is VAImageBufferType, which is normally
used for plain user data arrays.
One of the consequences is that VaDeriveImage is only useful on
surfaces backed by contiguous planes.

Implementation inspired by cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
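The contiguity restriction follows from how a VAImage describes its memory: libva's VAImage refers to one buffer and locates each plane inside it through per-plane `offsets` and `pitches`, plus a total `data_size`. A minimal sketch for NV12, using a hypothetical `image_layout` struct that mirrors only those fields (it is not the libva type); a surface whose planes live in separate allocations cannot be described by offsets into a single buffer.

```c
#include <stdint.h>

/* Hypothetical mirror of the plane description a VAImage carries: a single
 * buffer, with each plane located by an offset and a pitch inside it. */
struct image_layout {
    uint32_t num_planes;
    uint32_t pitches[3];
    uint32_t offsets[3];
    uint32_t data_size;
};

/* Layout of an NV12 image whose planes are packed back to back in one
 * allocation: the full-resolution Y plane first, then the half-height
 * plane of interleaved U/V samples. Height is assumed to be even. */
static struct image_layout nv12_contiguous_layout(uint32_t pitch, uint32_t height)
{
    struct image_layout l = {0};

    l.num_planes = 2;
    l.pitches[0] = pitch;          /* Y: one byte per pixel */
    l.pitches[1] = pitch;          /* UV: half width, two bytes per sample pair */
    l.offsets[0] = 0;
    l.offsets[1] = pitch * height; /* UV plane starts right after Y */
    l.data_size = pitch * height + pitch * (height / 2);
    return l;
}
```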
Julien Isorce [Fri, 30 Oct 2015 11:42:51 +0000 (11:42 +0000)]
st/va: add more error checks in vlVaBufferSetNumElements and vlVaMapBuffer

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:50 +0000 (11:42 +0000)]
st/va: add headless support, i.e. VA_DISPLAY_DRM

This patch allows using gallium vaapi without requiring
an X server to be running for your second graphics card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:49 +0000 (11:42 +0000)]
st/va: handle Video Post Processing for configs

Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the following 4 functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:48 +0000 (11:42 +0000)]
st/va: add colorspace conversion through Video Post Processing

Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:47 +0000 (11:42 +0000)]
st/va: implement dmabuf import for VaCreateSurfaces2

For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:46 +0000 (11:42 +0000)]
st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes

Inspired by http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that use
this newer libva API. The first advantage of using VaCreateSurfaces2
over the existing VaCreateSurfaces is that it makes it possible to select
the pixel format for the surface. Indeed, with the simple VaCreateSurfaces
function it is only possible to create an NV12 surface. It can be useful
to create an RGBA surface to use with video post processing.

The available pixel formats can be queried with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:45 +0000 (11:42 +0000)]
st/va: do not destroy old buffer when new one failed

If the formats are not the same, vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails, the surface loses its current buffer.
Let's only destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
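The fix follows a common create-then-swap pattern: build the replacement first and release the old resource only once the new one exists. A minimal C sketch of that pattern, with a hypothetical `video_buffer` type and allocator (not the gallium `pipe_video_buffer` API or the actual vlVaPutImage code):

```c
#include <stdlib.h>

/* Hypothetical stand-in for a video buffer with a pixel format. */
struct video_buffer {
    int format;
};

struct surface {
    struct video_buffer *buffer;
};

/* Hypothetical allocator; returns NULL on failure (simulated here for
 * negative formats). */
static struct video_buffer *create_buffer(int format)
{
    if (format < 0)
        return NULL;
    struct video_buffer *buf = malloc(sizeof(*buf));
    if (buf)
        buf->format = format;
    return buf;
}

/* Re-create the surface's buffer in a new format. The old buffer is freed
 * only after the new one was created, so on failure the surface keeps its
 * current buffer. Returns 0 on success, -1 on failure. */
static int surface_set_format(struct surface *surf, int format)
{
    if (surf->buffer && surf->buffer->format == format)
        return 0; /* already in the requested format */

    struct video_buffer *new_buf = create_buffer(format);
    if (!new_buf)
        return -1; /* surf->buffer is left untouched */

    free(surf->buffer);
    surf->buffer = new_buf;
    return 0;
}
```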
Julien Isorce [Fri, 30 Oct 2015 11:42:44 +0000 (11:42 +0000)]
st/va: properly define VAImageFormat formats and improve VaCreateImage

Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
vlVaDeriveImage.

Note that gstreamer-vaapi checks all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
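A VA fourcc packs four ASCII characters into a 32-bit value with the first character in the lowest byte, which is how libva's VA_FOURCC macro lays it out. A minimal sketch of a two-way fourcc/format mapping that falls back to a "none" value for unknown codes, in the spirit of the default described above; the `MAKE_FOURCC` macro, the `pipe_fmt` enum and both helpers are hypothetical stand-ins, not `VA_FOURCC`, `enum pipe_format` or Mesa's actual functions.

```c
#include <stdint.h>

/* Pack four characters, first one in the lowest byte (same layout as
 * libva's VA_FOURCC). */
#define MAKE_FOURCC(a, b, c, d)                               \
    ((uint32_t)(uint8_t)(a) | ((uint32_t)(uint8_t)(b) << 8) | \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* Hypothetical subset of pipe formats; FMT_NONE is the fallback. */
enum pipe_fmt { FMT_NONE = 0, FMT_NV12, FMT_YV12, FMT_B8G8R8A8, FMT_R8G8B8A8 };

static enum pipe_fmt fourcc_to_pipe_format(uint32_t fourcc)
{
    switch (fourcc) {
    case MAKE_FOURCC('N', 'V', '1', '2'): return FMT_NV12;
    case MAKE_FOURCC('Y', 'V', '1', '2'): return FMT_YV12;
    case MAKE_FOURCC('B', 'G', 'R', 'A'): return FMT_B8G8R8A8;
    case MAKE_FOURCC('R', 'G', 'B', 'A'): return FMT_R8G8B8A8;
    default:                              return FMT_NONE;
    }
}

/* Reverse mapping, e.g. so a derived image can report its fourcc. */
static uint32_t pipe_format_to_fourcc(enum pipe_fmt fmt)
{
    switch (fmt) {
    case FMT_NV12:     return MAKE_FOURCC('N', 'V', '1', '2');
    case FMT_YV12:     return MAKE_FOURCC('Y', 'V', '1', '2');
    case FMT_B8G8R8A8: return MAKE_FOURCC('B', 'G', 'R', 'A');
    case FMT_R8G8B8A8: return MAKE_FOURCC('R', 'G', 'B', 'A');
    default:           return 0; /* FMT_NONE has no fourcc */
    }
}
```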
Samuel Iglesias Gonsalvez [Tue, 27 Oct 2015 13:21:12 +0000 (14:21 +0100)]
main: fix basename match's check if it's an array or struct

Commit 4565b6f did not update the basename match's check for
the case where the string would exactly match the name of the
variable if the suffix "[0]" were appended to it.

Fixes two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
  read. Reported by Tapani Pälli.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
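The matching rule can be sketched as a small C helper: a queried name refers to a variable if it equals the variable's name exactly, or equals it with "[0]" appended. `name_matches` is hypothetical and does not reproduce Mesa's actual resource-lookup code, which handles more cases; it does show one way to order the checks so nothing past the end of either string is read, the concern behind the v2 note.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `query` names `var_name` exactly, or with a single "[0]"
 * suffix (e.g. "block" or "block[0]" for a variable named "block"). */
static bool name_matches(const char *query, const char *var_name)
{
    size_t len = strlen(var_name);

    /* strncmp stops at query's terminator if query is shorter, so after
     * this check query[0..len-1] is known to exist and equal var_name. */
    if (strncmp(query, var_name, len) != 0)
        return false;

    /* Exact match ... */
    if (query[len] == '\0')
        return true;

    /* ... or exactly one trailing "[0]". */
    return strcmp(query + len, "[0]") == 0;
}
```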
Kristian Høgsberg [Wed, 28 Oct 2015 17:58:09 +0000 (10:58 -0700)]
i965: Fix invalid memory accesses after resizing brw_codegen's store table

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Connor Abbott [Tue, 9 Jun 2015 17:26:53 +0000 (10:26 -0700)]
i965/sched: use liveness analysis for computing register pressure

Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.

Shader-db results on bdw:

total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1

total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873

LOST:   8
GAINED: 0

v2: use regs_read() in more places.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
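The idea of deriving register pressure from the same live ranges the allocator uses can be illustrated with a small C sketch: given one interval per virtual register (a hypothetical struct, not Mesa's live-variables data), the pressure at an instruction is the total size of the intervals covering it. This computes a static profile in one pass with a difference array; it does not model how the scheduler updates pressure as it picks instructions.

```c
#include <stdlib.h>

/* Hypothetical live interval of one virtual register, in instruction
 * indices (inclusive), plus its size in hardware registers. */
struct live_range {
    int start;
    int end;
    int size;
};

/* Fill pressure[0..num_instrs-1] with the number of registers live at each
 * instruction, assuming every range lies within [0, num_instrs). Adds
 * +size where a range begins and -size just past its end, then takes a
 * prefix sum. Returns 0 on success, -1 on allocation failure. */
static int compute_pressure(const struct live_range *ranges, int num_ranges,
                            int num_instrs, int *pressure)
{
    int *delta = calloc((size_t)num_instrs + 1, sizeof(*delta));
    if (!delta)
        return -1;

    for (int i = 0; i < num_ranges; i++) {
        if (ranges[i].start > ranges[i].end)
            continue; /* never live */
        delta[ranges[i].start] += ranges[i].size;
        delta[ranges[i].end + 1] -= ranges[i].size;
    }

    int live = 0;
    for (int ip = 0; ip < num_instrs; ip++) {
        live += delta[ip];
        pressure[ip] = live;
    }

    free(delta);
    return 0;
}
```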
Connor Abbott [Fri, 12 Jun 2015 19:01:35 +0000 (12:01 -0700)]
i965/fs: split out calculation of payload live ranges

We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
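Payload registers are already live when the program starts, so modelling their live range mostly comes down to finding the last instruction that reads each one. A minimal C sketch of that computation over a hypothetical instruction model (fixed hardware register number per source, -1 for non-payload sources); `find_payload_last_uses` is illustrative only and is not the code split out in this commit.

```c
#define MAX_SRCS 3

/* Hypothetical instruction model: each source is either a payload
 * register number or -1, plus how many consecutive registers it reads. */
struct instr {
    int src_hw_reg[MAX_SRCS];
    int src_regs_read[MAX_SRCS];
};

/* For each payload register 0..payload_size-1, store the index of the last
 * instruction reading it, or -1 if it is never read. Since payload
 * registers are live from program start, this end point bounds their
 * live range. */
static void find_payload_last_uses(const struct instr *instrs, int num_instrs,
                                   int payload_size, int *payload_last_use)
{
    for (int r = 0; r < payload_size; r++)
        payload_last_use[r] = -1;

    for (int ip = 0; ip < num_instrs; ip++) {
        for (int s = 0; s < MAX_SRCS; s++) {
            int reg = instrs[ip].src_hw_reg[s];
            if (reg < 0)
                continue; /* not a payload source */
            for (int k = 0; k < instrs[ip].src_regs_read[s]; k++) {
                if (reg + k < payload_size)
                    payload_last_use[reg + k] = ip;
            }
        }
    }
}
```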
Connor Abbott [Sat, 6 Jun 2015 14:55:21 +0000 (10:55 -0400)]
i965: dump scheduling cycle estimates

The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
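The heuristic described above can be written down as a weighted sum: every block is counted (treating all control flow as non-uniform, so both sides of each branch run), and each enclosing loop multiplies a block's cost by 10. A minimal C sketch over a hypothetical per-block summary; it is not the i965 implementation.

```c
#include <stdint.h>

/* Hypothetical per-block summary: estimated cycles for one execution of
 * the block and the number of loops enclosing it. */
struct block_estimate {
    uint64_t cycles;
    unsigned loop_depth;
};

/* Whole-shader estimate: sum over all blocks of cycles * 10^loop_depth. */
static uint64_t estimate_shader_cycles(const struct block_estimate *blocks,
                                       int num_blocks)
{
    uint64_t total = 0;

    for (int i = 0; i < num_blocks; i++) {
        uint64_t weight = 1;
        for (unsigned d = 0; d < blocks[i].loop_depth; d++)
            weight *= 10; /* each loop is assumed to run 10 times */
        total += blocks[i].cycles * weight;
    }
    return total;
}
```

For example, blocks of 100, 20 (one loop deep), 5 (two loops deep) and 50 cycles give 100 + 200 + 500 + 50 = 850.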
Connor Abbott [Sat, 6 Jun 2015 17:32:21 +0000 (13:32 -0400)]
i965: always run the post-RA scheduler

Before, we would only do scheduling after register allocation if we
spilled, even though the pre-RA scheduler was only supposed to handle
register pressure and set the latency of every instruction to 1. This
meant that unless we spilled, which we rarely do, we never considered
instruction latencies at all, and we usually never bothered to try to
hide texture fetch latency. Although a later commit removes the part
that sets the latency to 1, we still want to always run the post-RA
scheduler, since it can take into account the false dependencies that
the register allocator creates, and it can be more aggressive than the
pre-RA scheduler since it doesn't have to worry about register
pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Connor Abbott [Sun, 7 Jun 2015 04:37:27 +0000 (00:37 -0400)]
i965/sched: write-after-read dependencies are free

Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
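The latency rule can be sketched as a per-edge function: RAW and WAW edges wait on the register scoreboard and so carry the producer's full latency, while a WAR edge is kept only as an ordering constraint with zero latency. The `dep_kind` enum and `dep_latency` helper are hypothetical, not the i965 scheduler's API.

```c
/* Hypothetical classification of a dependency edge from an earlier
 * instruction to a later one touching the same register. */
enum dep_kind {
    DEP_RAW, /* read after write: true data dependency */
    DEP_WAW, /* write after write */
    DEP_WAR, /* write after read */
};

/* Latency to charge on the edge, given the earlier instruction's latency.
 * RAW and WAW are both enforced by the scoreboard, so they cost the full
 * latency; WAR is not checked by the EU, so the edge only orders the two
 * instructions and costs nothing. */
static unsigned dep_latency(enum dep_kind kind, unsigned producer_latency)
{
    switch (kind) {
    case DEP_RAW:
    case DEP_WAW:
        return producer_latency;
    case DEP_WAR:
    default:
        return 0;
    }
}
```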
Connor Abbott [Fri, 5 Jun 2015 23:20:57 +0000 (19:20 -0400)]
i965: fix cycle estimates when there's a pipeline stall

The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so
whenever we thought there would be a pipeline stall, we would
underestimate the total time by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
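A small C sketch contrasting the two orderings, under the assumption that "backwards" means the issue time was added before waiting for the stall. Both helpers are hypothetical, not the i965 code; each returns the cycle at which the next instruction may start issuing.

```c
/* Corrected ordering: wait until the operands are ready (a stall if
 * ready_cycle is later than current_cycle), then spend issue_time cycles
 * putting the instruction into the pipeline. */
static unsigned advance_cycle(unsigned current_cycle, unsigned ready_cycle,
                              unsigned issue_time)
{
    unsigned start = current_cycle > ready_cycle ? current_cycle : ready_cycle;
    return start + issue_time;
}

/* Backwards ordering, for comparison: adding the issue time first lets a
 * stall absorb it, underestimating by up to issue_time cycles. */
static unsigned advance_cycle_old(unsigned current_cycle, unsigned ready_cycle,
                                  unsigned issue_time)
{
    unsigned issued = current_cycle + issue_time;
    return issued > ready_cycle ? issued : ready_cycle;
}
```

With the current cycle at 10, operands ready at 30 and an issue time of 4, the corrected version gives 34 while the old one gives 30; without a stall (ready at 5) both give 14.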
Eric Anholt [Tue, 28 Jul 2015 18:35:03 +0000 (11:35 -0700)]
vc4: Allow user index buffers, to avoid slow readback for shadow IBs.

Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).

Ilia Mirkin [Fri, 30 Oct 2015 03:25:08 +0000 (23:25 -0400)]
nv50: mark contexts shareable, compile at creation time

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>