git.libre-soc.org Git - mesa.git/log

Eric Anholt [Mon, 4 Jan 2016 21:56:39 +0000 (13:56 -0800)]

vc4: Don't try the SF coalescing unless it's on a def.

If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:23 +0000 (21:07 +1100)]

gallium/drivers/svga: Use unsigned for loop index

Fix a 's/unsigned int/unsigned/' consistency case while here.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:22 +0000 (21:07 +1100)]

gallium/drivers/r600: Use unsigned for loop index

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:21 +0000 (21:07 +1100)]

gallium/drivers/ilo: Use unsigned for loop index

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:20 +0000 (21:07 +1100)]

gallium: Use unsigned for loop index

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:19 +0000 (21:07 +1100)]

gallium/drivers: Remove unnecessary semicolons

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Edward O'Callaghan [Tue, 5 Jan 2016 10:07:18 +0000 (21:07 +1100)]

gallium: Remove unnecessary semicolons

Fix a silly issue with MSVC case fall-through support needing
an extra 'break;'

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

Oded Gabbay [Tue, 29 Dec 2015 16:12:35 +0000 (18:12 +0200)]

llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8

This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
openarena         16.35      16.7     2.14%
xonotic           4.707      4.97     5.57%

glmark2 didn't show a significant (more than 1%) difference.

v2: Make sure the code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Oded Gabbay [Tue, 29 Dec 2015 16:12:34 +0000 (18:12 +0200)]

llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8

This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   139.8      142.7    2.07%

openarena and xonotic didn't show a significant (more than 1%)
difference.

v2: Make sure the code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Oded Gabbay [Sun, 13 Dec 2015 15:49:32 +0000 (17:49 +0200)]

llvmpipe: Optimize do_triangle_ccw for POWER8

This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   136.6      139.8    2.34%
openarena         16.14      16.35    1.30%
xonotic           4.655      4.707    1.11%

v2:

- Convert loads to use aligned loads
- Make sure the code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Oded Gabbay [Thu, 3 Dec 2015 07:11:13 +0000 (09:11 +0200)]

llvmpipe: add POWER8 portability file - u_pwr8.h

This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.

All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting an SSE-based function, one simply needs to
replace the "_mm_" prefix of the SSE function being called with "vec_".

Having said that, not all functions could be converted this way, due to the
differences between the architectures. Where such a conversion hurt
performance, I preferred to implement a more ad-hoc solution. For example,
converting _mm_shuffle_epi32 needed to be done using ad-hoc masks instead
of a generic function.

All the functions in this file support both little-endian and big-endian,
but currently the file is built only on POWER8 LE machines.

All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).
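
The naming convention described above can be illustrated with a small scalar
sketch. This is not the content of u_pwr8.h: the real file works on
Altivec/VMX vector types, whereas the sketch_m128i type and the plain-C
bodies below are hypothetical stand-ins, chosen so the example builds on any
machine. Only the "_mm_" to "vec_" renaming is the point.

```c
/* Illustrative sketch only: a scalar stand-in for the "vec_" naming idea.
 * The type and function bodies are hypothetical, not taken from u_pwr8.h. */
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t lane[4]; } sketch_m128i;   /* hypothetical type */

/* Named after _mm_set_epi32: arguments run from the highest lane down. */
static inline sketch_m128i
vec_set_epi32(int32_t i3, int32_t i2, int32_t i1, int32_t i0)
{
   sketch_m128i v = { { i0, i1, i2, i3 } };
   return v;
}

/* Named after _mm_add_epi32: lane-wise 32-bit addition (wraps on overflow). */
static inline sketch_m128i
vec_add_epi32(sketch_m128i a, sketch_m128i b)
{
   sketch_m128i r;
   for (unsigned i = 0; i < 4; i++)
      r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
   return r;
}

/* Caller code. The SSE version would read
 *    _mm_add_epi32(_mm_set_epi32(4, 3, 2, 1), _mm_set_epi32(40, 30, 20, 10))
 * and the ported version only swaps the prefix. */
static inline int32_t
sketch_sum_lane(unsigned lane)
{
   assert(lane < 4);
   sketch_m128i s = vec_add_epi32(vec_set_epi32(4, 3, 2, 1),
                                  vec_set_epi32(40, 30, 20, 10));
   return s.lane[lane];
}
```

Keeping the SSE argument order and semantics in the wrappers is what lets
call sites be ported by a prefix swap alone.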

v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make the file build only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Oded Gabbay [Thu, 3 Dec 2015 07:11:04 +0000 (09:11 +0200)]

configure.ac: Detect if running on POWER8 arch

To determine if we can use special POWER8 assembly directives, we first
need to detect whether we are running on the POWER8 architecture. This patch
adds this detection to configure.ac and adds the necessary compilation
flags accordingly.

v2:

- Add an option to disable POWER8 instruction generation
- Detect whether building on a BE or LE machine and build with
  -mpower8-vector only on an LE machine
- Make the printed messages more standard

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Kenneth Graunke [Tue, 5 Jan 2016 13:09:46 +0000 (05:09 -0800)]

nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.

The nir_opt_algebraic rule

(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),

can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
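
As a rough illustration of the lowering described above, here is a toy
expression tree in C. It does not use the actual NIR data structures or the
nir_opt_algebraic rule syntax, and every sketch_* name is hypothetical. It
only shows an fdiv node being rewritten as an fmul by the reciprocal.

```c
/* Toy model of "turn fdiv into fmul/frcp"; not NIR code. */
#include <assert.h>
#include <stddef.h>

enum sketch_op { OP_CONST, OP_FMUL, OP_FRCP, OP_FDIV };

struct sketch_expr {
   enum sketch_op op;
   float value;                  /* used by OP_CONST only */
   struct sketch_expr *src[2];   /* operands, NULL when unused */
};

/* Rewrite fdiv(a, b) in place as fmul(a, frcp(b)). 'scratch' provides
 * storage for the new frcp node. Returns 1 if the node was lowered. */
static inline int
sketch_lower_fdiv(struct sketch_expr *e, struct sketch_expr *scratch)
{
   assert(e != NULL && scratch != NULL);
   if (e->op != OP_FDIV)
      return 0;
   scratch->op = OP_FRCP;
   scratch->value = 0.0f;
   scratch->src[0] = e->src[1];
   scratch->src[1] = NULL;
   e->op = OP_FMUL;
   e->src[1] = scratch;
   return 1;
}

/* Evaluate a tree; used here to check the rewrite on a simple case. */
static inline float
sketch_eval(const struct sketch_expr *e)
{
   switch (e->op) {
   case OP_CONST: return e->value;
   case OP_FMUL:  return sketch_eval(e->src[0]) * sketch_eval(e->src[1]);
   case OP_FRCP:  return 1.0f / sketch_eval(e->src[0]);
   case OP_FDIV:  return sketch_eval(e->src[0]) / sketch_eval(e->src[1]);
   }
   return 0.0f;
}
```

For a = 6 and b = 4 both forms evaluate to 1.5. In general, a * rcp(b) can
differ from a / b in the last bits because of the extra rounding step.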

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org

Kenneth Graunke [Tue, 5 Jan 2016 10:54:50 +0000 (02:54 -0800)]

i965: Only turn on ARB_compute_shader if we can write registers.

Compute shaders require reconfiguring the L3 for shared local memory
support. We have to be able to write the L3 registers to do that.

This effectively turns off compute shaders on kernels older than 4.2.

(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly. I've moved it out in this patch.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

Kenneth Graunke [Tue, 5 Jan 2016 12:46:33 +0000 (04:46 -0800)]

i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.

That's what it's for. Plus, we actually implement rcp.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

Timothy Arceri [Wed, 6 Jan 2016 00:27:05 +0000 (11:27 +1100)]

mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders

This fixes some piglit subtests for ARB_program_interface_query.

V3: remove some of the unnecessary parentheses
V2: fix alignment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 23 Dec 2015 03:26:49 +0000 (14:26 +1100)]

glsl: don't change the varying type in validation code

There is a function dedicated to demoting unused varyings, so let's
trust it to do its job.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

Timothy Arceri [Wed, 23 Dec 2015 03:11:04 +0000 (14:11 +1100)]

glsl: move lowering after matching validation

After lowering, the matching flag is_unmatched_generic_inout is lost, so
we need to move this validation before lowering.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

Timothy Arceri [Wed, 23 Dec 2015 22:50:59 +0000 (09:50 +1100)]

glsl: only add outward facing varyings to resource list for SSO

An SSO program can have multiple stages and we only want to add the externally
facing varyings. The current code was adding both the packed inputs and outputs
for the first and last stage of each program.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

Anuj Phogat [Tue, 24 Mar 2015 23:07:40 +0000 (16:07 -0700)]

i965/gen9: Modify the conditions to use blitter on skl+

The modified conditions allow skl+ to use the blitter:
- for all tiling formats
- to write data to YF/YS tiled surfaces

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

Anuj Phogat [Tue, 10 Nov 2015 23:33:53 +0000 (15:33 -0800)]

i965/gen9: Return false in place of assert in intelEmitCopyBlit()

This allows the fallback paths to handle it correctly.
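
The pattern can be sketched as follows. This is a hedged illustration, not
the body of intelEmitCopyBlit(): the pitch check, the SKETCH_MAX_BLIT_PITCH
value and all sketch_* names are assumptions made up for the example.

```c
/* Sketch of "return false in place of assert"; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware blit limit; not an actual i965 value. */
#define SKETCH_MAX_BLIT_PITCH 32768

/* Old style: an unsupported request aborts a debug build. */
static inline bool
sketch_blit_asserting(int pitch)
{
   assert(pitch > 0 && pitch <= SKETCH_MAX_BLIT_PITCH);
   return true;   /* pretend the blit was emitted */
}

/* New style: report failure so that the caller can choose another path. */
static inline bool
sketch_blit_checked(int pitch)
{
   if (pitch <= 0 || pitch > SKETCH_MAX_BLIT_PITCH)
      return false;
   return true;   /* pretend the blit was emitted */
}

enum sketch_path { SKETCH_PATH_BLIT, SKETCH_PATH_FALLBACK };

/* Caller: try the blitter first, then fall back. Returns which path ran. */
static inline enum sketch_path
sketch_copy(int pitch)
{
   if (sketch_blit_checked(pitch))
      return SKETCH_PATH_BLIT;
   return SKETCH_PATH_FALLBACK;   /* e.g. a mapped CPU copy */
}
```

With the checked variant, a request the blitter cannot handle ends up on the
fallback path instead of tripping an assertion.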

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

Anuj Phogat [Tue, 3 Nov 2015 18:31:45 +0000 (10:31 -0800)]

i965/gen9: Remove regions overlap check in fast copy blit

Overlapping blits are undefined in OpenGL anyway, so there is no need
for an overlap check here.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

Anuj Phogat [Tue, 28 Jul 2015 17:47:35 +0000 (10:47 -0700)]

i965/gen9: Don't use fast copy blit in case of non power of 2 cpp

Fast copy blit is currently enabled for use only with Yf/Ys tiling.
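
A minimal sketch of the two conditions mentioned here (power-of-two cpp and
Yf/Ys tiling) could look like this. The function names and the boolean
tiling parameter are hypothetical, and the driver's real check may involve
further conditions that are not shown.

```c
/* Sketch only: gate a fast copy blit on cpp and tiling. Names are made up. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set, i.e. v is 1, 2, 4, 8, 16, ... */
static inline bool
sketch_is_power_of_two(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* cpp is the number of bytes per pixel of the surface format. */
static inline bool
sketch_can_fast_copy_blit(uint32_t cpp, bool yf_or_ys_tiled)
{
   assert(cpp != 0);
   return yf_or_ys_tiled && sketch_is_power_of_two(cpp);
}
```

For example, a 3-byte-per-pixel format is rejected by this check, while a
4-byte-per-pixel format on a Yf/Ys tiled surface passes.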

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

Ian Romanick [Fri, 18 Dec 2015 01:50:34 +0000 (17:50 -0800)]

i915/i965: Fix typo in perf_debug message

Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Brian Paul [Tue, 5 Jan 2016 20:04:46 +0000 (13:04 -0700)]

st/mesa: minor indentation fixes

Brian Paul [Tue, 5 Jan 2016 20:03:05 +0000 (13:03 -0700)]

draw: minor indentation fix

Brian Paul [Tue, 5 Jan 2016 20:03:05 +0000 (13:03 -0700)]

mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.c

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
