mesa.git
Tom Stellard [Fri, 18 Jul 2014 19:10:52 +0000 (15:10 -0400)]
radeonsi: Use util_memcpy_cpu_to_le32()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tom Stellard [Fri, 18 Jul 2014 19:55:08 +0000 (15:55 -0400)]
util: Add util_memcpy_cpu_to_le32() v3

v2:
  - Preserve word boundaries.

v3:
  - Use const and restrict.
  - Fix indentation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
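Below is a minimal sketch of what a CPU-to-little-endian 32-bit copy does. It is written portably with shifts, so it behaves the same on any host, and it works one whole word at a time, which is what keeps word boundaries intact. The function name is hypothetical, and this is not the util_memcpy_cpu_to_le32() added by this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (assumed to be a multiple of 4) of
 * host-order 32-bit words from src to dest, storing every word in
 * little-endian byte order. Any trailing partial word is ignored. */
static void *
memcpy_cpu_to_le32_sketch(void *restrict dest, const void *restrict src,
                          size_t n)
{
   unsigned char *d = dest;
   const unsigned char *s = src;

   for (size_t i = 0; i + 4 <= n; i += 4) {
      uint32_t w;
      memcpy(&w, s + i, 4);                 /* read one host-order word */
      d[i + 0] = (unsigned char)(w);        /* least significant byte first */
      d[i + 1] = (unsigned char)(w >> 8);
      d[i + 2] = (unsigned char)(w >> 16);
      d[i + 3] = (unsigned char)(w >> 24);
   }
   return dest;
}
```

On a little-endian host this is equivalent to a plain memcpy(). On a big-endian host it byte-swaps each word.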
Tom Stellard [Fri, 25 Jul 2014 21:12:28 +0000 (17:12 -0400)]
clover: Add checks for image support to the image functions v2

Most image functions are required to return a CL_INVALID_OPERATION
error when used on devices without image support.

v2:
  - Simplified the code

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Bruno Jiménez [Sun, 27 Jul 2014 11:56:16 +0000 (13:56 +0200)]
r600g/compute: Add debug information to promote and demote functions

v2: Add information about the item's starting point and size
v3: Rebased on top of master

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bruno Jiménez [Sun, 27 Jul 2014 11:56:15 +0000 (13:56 +0200)]
r600g/compute: Add documentation to compute_memory_pool

v2: Rebased on top of master

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Chia-I Wu [Mon, 28 Jul 2014 07:11:42 +0000 (15:11 +0800)]
ilo: unblock an inline write with a staging bo

This should allow a deeper pipeline.

Chia-I Wu [Mon, 28 Jul 2014 01:28:05 +0000 (09:28 +0800)]
ilo: try unblocking a transfer with a staging bo

When mapping a busy resource with PIPE_TRANSFER_DISCARD_RANGE or
PIPE_TRANSFER_FLUSH_EXPLICIT, we can avoid blocking by allocating and mapping
a staging bo, and emitting pipelined copies at the proper places.  Since the
staging bo is never bound to the GPU, we give it a packed layout to save space.

Chia-I Wu [Mon, 28 Jul 2014 01:50:31 +0000 (09:50 +0800)]
ilo: enable persistent and coherent transfers

Enable PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT and reorder caps a bit.

Chia-I Wu [Mon, 28 Jul 2014 05:03:08 +0000 (13:03 +0800)]
ilo: drop ptr from ilo_transfer

With the recent clean-ups, we can pass the mapped pointer around between
functions cleanly.  Drop it to make ilo_transfer smaller.

Chia-I Wu [Mon, 28 Jul 2014 04:56:02 +0000 (12:56 +0800)]
ilo: s/TRANSFER_MAP_UNSYNC/TRANSFER_MAP_GTT_UNSYNC/

It maps to drm_intel_gem_bo_map_unsynchronized(), which results in
unsynchronized GTT mapping.

Chia-I Wu [Mon, 28 Jul 2014 04:04:46 +0000 (12:04 +0800)]
ilo: drop unused context param from transfer functions

Many of the transfer functions do not need an ilo_context.  Drop it.

Chia-I Wu [Mon, 28 Jul 2014 03:00:52 +0000 (11:00 +0800)]
ilo: tidy up transfer mapping/unmapping

Add xfer_map() to replace map_bo_for_transfer().  Add xfer_unmap() and
xfer_alloc_staging_sys() to simplify texture and buffer mapping/unmapping, and
enable more code sharing between them.

Chia-I Wu [Fri, 25 Jul 2014 17:10:21 +0000 (01:10 +0800)]
ilo: tidy up choose_transfer_method()

Add a bunch of helper functions and a big comment for
choose_transfer_method().  This also fixes handling of
PIPE_TRANSFER_MAP_DIRECTLY to not ignore tiling.

Chia-I Wu [Sat, 26 Jul 2014 20:55:24 +0000 (04:55 +0800)]
ilo: free transfers with util_slab_free()

We used FREE() in one of the error paths.

EdB [Sun, 27 Jul 2014 21:07:39 +0000 (23:07 +0200)]
clover: Add clUnloadPlatformCompiler.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
EdB [Sun, 27 Jul 2014 21:07:38 +0000 (23:07 +0200)]
clover: Add clCreateProgramWithBuiltInKernels.

[ Francisco Jerez: Check for devices not associated with the specified
  context.  Style fix. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jordan Justen [Wed, 11 Jun 2014 00:43:25 +0000 (17:43 -0700)]
glsl/cs: Add several GLSL compute shader variables

With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
built-in-constants tests/spec/arb_compute_shader/minimum-maximums.txt

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Jordan Justen [Mon, 9 Jun 2014 20:40:01 +0000 (13:40 -0700)]
main/cs: Add additional compute shader constant values

With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
* arb_compute_shader-minmax

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Sun, 18 May 2014 00:19:04 +0000 (12:19 +1200)]
glsl: No longer require ubo block index to be constant in ir_validate

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Fri, 16 May 2014 10:07:24 +0000 (22:07 +1200)]
glsl: Accept nonconstant array references in lower_ubo_reference

Instead of falling back to just the block name (which we won't find),
look for the first element of the block array. We'll deal with the rest
in the backend by arranging for the blocks to be laid out contiguously.

V2: Squashed together patches 3, 5 of V1, plus a naming tweak.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Sun, 18 May 2014 00:03:54 +0000 (12:03 +1200)]
glsl: Convert uniform_block in lower_ubo_reference to ir_rvalue.

Previously this was a block index with special semantics for -1.
With ARB_gpu_shader5, this need not be a compile-time constant, so
allow any rvalue here and convert the -1 to a NULL pointer.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Fri, 16 May 2014 09:28:09 +0000 (21:28 +1200)]
glsl: Mark entire UBO array active if indexed with non-constant.

Without doing a lot more work, we have no idea which indices may
be used at runtime, so just mark them all.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Fri, 16 May 2014 09:10:18 +0000 (21:10 +1200)]
glsl: Allow non-constant UBO array indexing with GLSL4/ARB_gpu_shader5.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chia-I Wu [Fri, 25 Jul 2014 02:53:05 +0000 (10:53 +0800)]
ilo: simplify ilo_flush()

Move fence creation to the new ilo_fence_create().

Bruno Jiménez [Sat, 19 Jul 2014 17:35:51 +0000 (19:35 +0200)]
r600g/compute: Defrag the pool at the same time as we grow it

This gives us two things: we now need fewer item copies when we have
to defrag and grow the pool (just one copy per item), and, even in the
case where we don't need to defrag the pool, we reduce the data copied
to just the useful data that the items use.

Note: The fallback path is a bit ugly now, but hopefully we won't need
it much.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
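Below is a sketch of the single-copy idea, using host memory and hypothetical names rather than r600g's GPU resources. It allocates the larger buffer, copies each item once to a packed position, and never copies the gaps. It assumes the items are sorted by start offset and that their total size fits in new_size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item descriptor: byte range occupied in the pool. */
struct item { size_t start, size; };

/* Grow the pool to new_size bytes and defragment it at the same time.
 * Returns the new buffer, or NULL if allocation fails. In that case the old
 * buffer and the item offsets are left untouched, so a caller could still
 * take a fallback path. */
static unsigned char *
pool_grow_defrag(unsigned char *old, struct item *items, size_t n,
                 size_t new_size)
{
   unsigned char *fresh = malloc(new_size);
   if (fresh == NULL)
      return NULL;

   size_t end = 0;
   for (size_t i = 0; i < n; i++) {
      /* One copy per item, packed right after the previous one. */
      memcpy(fresh + end, old + items[i].start, items[i].size);
      items[i].start = end;
      end += items[i].size;
   }
   free(old);
   return fresh;
}
```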
Bruno Jiménez [Mon, 7 Jul 2014 15:50:05 +0000 (17:50 +0200)]
r600g/compute: Try to use a temporary resource when growing the pool

Now, before moving everything to host memory, we try to create a
new resource to use as a pool. If we succeed, we just use this resource
and delete the previous one. If we fail, we fall back to using the
shadow.

This should make growing the pool faster, and we can also save
the 64KB of memory that was allocated for the 'shadow', even when it
wasn't used.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Rob Clark [Fri, 25 Jul 2014 18:28:10 +0000 (14:28 -0400)]
freedreno: fix typo in gpu version check

Oops, I should use larger fonts, I guess.

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 25 Jul 2014 15:15:59 +0000 (11:15 -0400)]
freedreno/ir3: split out shader compiler from a3xx

Move the bits we want to share between generations from fd3_program to
ir3_shader.  So the overall structure is:

  fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
                                    |- ...
                                    \- ir3_shader_variant -> ir3

So the ir3_shader becomes the topmost generation-neutral object, which
manages the set of variants, each of which generates, compiles, and
assembles its own IR.

There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.

Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (i.e. anything that is independent of linking).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 25 Jul 2014 14:56:23 +0000 (10:56 -0400)]
freedreno/a3xx/compiler: rename ir3_shader to ir3

First step of the reorganization to split out the compiler (so it can be
shared between a3xx and a4xx).  Rename ir3_shader -> ir3 (since we'll want
the name ir3_shader for a higher level object).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 25 Jul 2014 13:50:34 +0000 (09:50 -0400)]
freedreno/a3xx/compiler: scheduler vs pred reg

The scheduler also needs to be aware of predicate register (p0) in
addition to address register (a0).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 25 Jul 2014 13:49:41 +0000 (09:49 -0400)]
freedreno/a3xx/compiler: little cleanups

Remove some obsolete comments, rename deref->addr.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 18 Jun 2014 14:24:04 +0000 (10:24 -0400)]
freedreno/a3xx: enable/disable wa's based on patch-level

It seems like for the most part, different behaviors, workarounds, etc.,
should be conditional on GPU patch revision (i.e. a320.0 vs a320.2)
rather than GPU id (a320 vs a330).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 23 Jul 2014 21:21:29 +0000 (17:21 -0400)]
freedreno/a3xx/compiler: make IR heap dynamic

The fixed size heap is a remnant of the fdre-a3xx assembler.  Yet it is
convenient for being able to free the entire data structure in one shot
without worrying about leaking nodes.

Change it to dynamically grow the heap size (adding chunks) as needed so
we don't have an artificial upper limit on shader size (other than hw
limits) and don't always have to allocate worst-case size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
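Below is a sketch of a chunked bump allocator with the two properties described above. It grows by adding chunks, so there is no fixed upper limit, and everything can still be freed in one shot. The names and the chunk size are hypothetical, and this is not the freedreno implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define IR_CHUNK_UNITS 256   /* hypothetical: allocation units per chunk */

struct ir_heap_chunk {
   struct ir_heap_chunk *next;       /* previously filled chunks */
   size_t used;                      /* units already handed out */
   max_align_t data[IR_CHUNK_UNITS]; /* suitably aligned storage */
};

struct ir_heap {
   struct ir_heap_chunk *chunks;     /* newest chunk first */
};

/* Bump-allocate sz bytes, adding a new chunk when the current one is full.
 * Returns NULL for requests larger than one chunk or when out of memory. */
static void *
ir_heap_alloc(struct ir_heap *h, size_t sz)
{
   size_t units = (sz + sizeof(max_align_t) - 1) / sizeof(max_align_t);
   struct ir_heap_chunk *c = h->chunks;

   if (units > IR_CHUNK_UNITS)
      return NULL;
   if (c == NULL || c->used + units > IR_CHUNK_UNITS) {
      c = calloc(1, sizeof(*c));
      if (c == NULL)
         return NULL;
      c->next = h->chunks;
      h->chunks = c;
   }
   void *p = &c->data[c->used];
   c->used += units;
   return p;
}

/* Release every node at once by freeing the chunks; no per-node tracking. */
static void
ir_heap_destroy(struct ir_heap *h)
{
   while (h->chunks != NULL) {
      struct ir_heap_chunk *next = h->chunks->next;
      free(h->chunks);
      h->chunks = next;
   }
}
```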
Jan Vesely [Fri, 25 Jul 2014 14:33:42 +0000 (10:33 -0400)]
r600g/compute: Fix signed/unsigned comparison compiler warnings.

The iteration variables go from 0 anyway.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tom Stellard [Thu, 24 Jul 2014 00:37:08 +0000 (20:37 -0400)]
clover: Query the device to see if images are supported

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tom Stellard [Thu, 24 Jul 2014 00:37:07 +0000 (20:37 -0400)]
gallium: Add PIPE_CAP_COMPUTE_IMAGES_SUPPORTED

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Bruno Jiménez [Sat, 19 Jul 2014 17:35:50 +0000 (19:35 +0200)]
r600g/compute: Allow compute_memory_defrag to defragment between resources

This will be used in the following patch to avoid duplicated code.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bruno Jiménez [Thu, 24 Jul 2014 08:28:06 +0000 (10:28 +0200)]
r600g/compute: Allow compute_memory_move_item to move items between resources

v2: Remove unnecessary variables

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Dylan Baker [Tue, 22 Jul 2014 18:43:54 +0000 (11:43 -0700)]
gbm: Search LIBGL_DRIVERS_PATH if GBM_DRIVERS_PATH is not set

The GBM_DRIVERS_PATH environment variable is not documented and is only
used to set the location of gbm drivers, while LIBGL_DRIVERS_PATH is
documented and used for everything else.

Generally this split leads to confusion as to why gbm doesn't work.

This patch will read LIBGL_DRIVERS_PATH as a fallback if
GBM_DRIVERS_PATH is not set.

The comments clearly indicate that using LIBGL_DRIVERS_PATH is
preferred over GBM_DRIVERS_PATH.

v2: - Use GBM_DRIVERS_PATH as a fallback
v3: [jordan.l.justen@intel.com] - Make LIBGL_DRIVERS_PATH the fallback

Signed-off-by: Dylan Baker <baker.dylan.c@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
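Below is a sketch of the lookup order that the final (v3) version describes: GBM_DRIVERS_PATH first, with LIBGL_DRIVERS_PATH as the fallback. The function name and the default directory are hypothetical. It also omits any other checks a real loader might perform before honoring environment variables.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv()/unsetenv() in the usage */
#include <stdlib.h>

/* Hypothetical built-in default, used when neither variable is set. */
static const char default_driver_dir[] = "/usr/lib/dri";

static const char *
gbm_search_path_sketch(void)
{
   const char *path = getenv("GBM_DRIVERS_PATH");   /* still honored first */
   if (path == NULL)
      path = getenv("LIBGL_DRIVERS_PATH");          /* documented fallback */
   if (path == NULL)
      path = default_driver_dir;
   return path;
}
```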
Jerome Glisse [Thu, 24 Jul 2014 21:30:31 +0000 (17:30 -0400)]
winsys/radeon: fix indentation

Can we please keep it clean and avoid ending up in a messy situation
like the ddx.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Jason Ekstrand [Mon, 21 Jul 2014 23:46:39 +0000 (16:46 -0700)]
Add an accelerated version of F_TO_I for x86_64

According to a quick micro-benchmark, this new version is 20% faster on my
Haswell laptop.

v2: Removed the XXX note about x86_64 from the comment
v3: Use an intrinsic instead of an __asm__ block.  This should give us MSVC
    support for free.
v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
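The v3 note says the x86_64 path uses an intrinsic instead of an __asm__ block. The sketch below shows one intrinsic-based float-to-int conversion. It assumes SSE's CVTSS2SI, via _mm_cvtss_si32(), is the kind of instruction meant; the log does not name it. Other targets fall back to lrintf(). Under the default rounding mode, both paths round to nearest with ties to even. The function name is hypothetical.

```c
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#else
#include <math.h>
#endif

static inline int
f_to_i_sketch(float f)
{
#if defined(__x86_64__) || defined(_M_X64)
   /* CVTSS2SI converts using the current MXCSR rounding mode
    * (round-to-nearest-even unless changed). */
   return _mm_cvtss_si32(_mm_load_ss(&f));
#else
   /* Portable fallback; also honors the current rounding mode. */
   return (int)lrintf(f);
#endif
}
```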
Matt Turner [Tue, 11 Feb 2014 21:12:07 +0000 (13:12 -0800)]
i965/fs: Decide predicate/predicate_inverse outside of the for loop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 11 Feb 2014 21:04:55 +0000 (13:04 -0800)]
i965/fs: Swap if/else conditions in SEL peephole.

This will make the next commit easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 15 Jul 2014 22:29:29 +0000 (15:29 -0700)]
i965: Improve dead control flow elimination.

... to eliminate an ELSE instruction followed immediately by an ENDIF.

instructions in affected programs:     704 -> 700 (-0.57%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
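To illustrate the new case, below is a toy pass over a flat opcode array. The enum and function are hypothetical; i965's real pass works on its own instruction list. An ELSE immediately followed by ENDIF guards an empty block, so the ELSE can be dropped and the ENDIF kept.

```c
#include <stddef.h>

enum opcode { OP_IF, OP_ELSE, OP_ENDIF, OP_MOV, OP_ADD };

/* Remove every ELSE that is immediately followed by ENDIF, compacting the
 * array in place. Returns the new instruction count. */
static size_t
remove_empty_else(enum opcode *ops, size_t n)
{
   size_t out = 0;
   for (size_t i = 0; i < n; i++) {
      if (ops[i] == OP_ELSE && i + 1 < n && ops[i + 1] == OP_ENDIF)
         continue;                /* empty else-block: drop the ELSE */
      ops[out++] = ops[i];
   }
   return out;
}
```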
Ilia Mirkin [Sun, 6 Jul 2014 06:06:04 +0000 (02:06 -0400)]
nvc0/ir: support 2d constbuf indexing

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 15 Jul 2014 00:29:04 +0000 (20:29 -0400)]
gm107/ir: emit LDC subops

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 15 Jul 2014 00:20:03 +0000 (20:20 -0400)]
gk110/ir: emit load constant subop

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 6 Jul 2014 03:32:06 +0000 (23:32 -0400)]
mesa/st: add support for interpolate_at_* ops

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Ilia Mirkin [Thu, 17 Jul 2014 04:30:40 +0000 (00:30 -0400)]
nv50/ir: fix phi/union sources when their def has been merged

In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.

This maintains the invariant that a phi node's defs and sources are
allocated the same register.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 17 Jul 2014 03:20:57 +0000 (23:20 -0400)]
nv50/ir: fix hard-coded TYPE_U32 sized register

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 18 Jul 2014 02:31:11 +0000 (22:31 -0400)]
nvc0: mark shader header if fp64 is used

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 18 Jul 2014 02:30:00 +0000 (22:30 -0400)]
nv50/ir: keep track of whether the program uses fp64

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 18 Jul 2014 02:11:56 +0000 (22:11 -0400)]
nvc0: make sure that the local memory allocation is aligned to 0x10

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
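Aligning a size to 0x10 is the usual round-up-to-a-power-of-two trick. Below is a sketch with a hypothetical helper name; it is not the nvc0 code.

```c
#include <stdint.h>

/* Round size up to the next multiple of align (align must be a power of
 * two), e.g. align_up_sketch(size, 0x10). */
static inline uint32_t
align_up_sketch(uint32_t size, uint32_t align)
{
   return (size + align - 1) & ~(align - 1);
}
```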
Ilia Mirkin [Thu, 24 Jul 2014 01:10:51 +0000 (21:10 -0400)]
mesa: add ARB_clear_texture.xml to file list, remove duplicate decls

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Chia-I Wu [Thu, 24 Jul 2014 05:21:41 +0000 (13:21 +0800)]
ilo: check the tilings of imported handles

Just to be cautious.

Chia-I Wu [Thu, 24 Jul 2014 03:10:48 +0000 (11:10 +0800)]
ilo: clean up resource bo renaming

s/alloc_bo/rename_bo/ as that is what the functions do.  Simplify bo
allocation and move the complexity to bo renaming.

Chia-I Wu [Thu, 24 Jul 2014 02:32:31 +0000 (10:32 +0800)]
ilo: share some code between {tex,buf}_create_bo

Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by
both functions.

Chia-I Wu [Thu, 24 Jul 2014 01:39:37 +0000 (09:39 +0800)]
ilo: use native 3-component vertex formats on GEN7.5+

GEN7.5 gains support for those formats natively.

Chia-I Wu [Thu, 24 Jul 2014 01:32:34 +0000 (09:32 +0800)]
ilo: allow for device-dependent format translation

Pass ilo_dev_info to all format translation functions.

Jason Ekstrand [Sat, 19 Jul 2014 01:23:30 +0000 (18:23 -0700)]
i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV textures

Since Intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work.  We might as well
use it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
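Below is a small check of the claim above. It assumes the usual GL packing rule for _REV types, where the first component goes in the least significant byte. Packing RGBA as GL_UNSIGNED_INT_8_8_8_8_REV then produces the same bytes in memory as four GL_UNSIGNED_BYTE values only on a little-endian host. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GL_UNSIGNED_INT_8_8_8_8_REV with GL_RGBA: R in bits 0-7, G in 8-15,
 * B in 16-23, A in 24-31 of a single 32-bit integer. */
static uint32_t
pack_8888_rev(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return (uint32_t)r | (uint32_t)g << 8 | (uint32_t)b << 16 |
          (uint32_t)a << 24;
}

/* Returns 1 if the packed pixel's in-memory bytes equal the
 * GL_UNSIGNED_BYTE layout { r, g, b, a }. */
static int
rev_matches_ubyte_layout(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   uint32_t packed = pack_8888_rev(r, g, b, a);
   const uint8_t ubyte[4] = { r, g, b, a };
   return memcmp(&packed, ubyte, 4) == 0;
}
```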
Ian Romanick [Wed, 16 Jul 2014 17:52:32 +0000 (10:52 -0700)]
mesa: Fix the name in the error message

Obvious copy-and-paste bug.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 16 Jul 2014 20:02:26 +0000 (13:02 -0700)]
glsl: Fix some bad indentation

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 21 Jul 2014 23:17:46 +0000 (16:17 -0700)]
i965/fs: Set LastRT on the final FB write on Broadwell.

In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0.  So we failed to set "Last Render Target Select" on
any FB write message.

We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed.  According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Tue, 22 Jul 2014 03:06:23 +0000 (20:06 -0700)]
i965: Port INTEL_DEBUG=optimizer to the vec4 backend.

Largely via copy and paste.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Tue, 22 Jul 2014 03:05:21 +0000 (20:05 -0700)]
i965: Save the gl_shader_stage enum in backend_visitor.

This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS.  It isn't
worth adding virtual methods for this case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 17 Jul 2014 23:41:44 +0000 (16:41 -0700)]
i965: Don't print WE_normal in disassembly.

Dropping this helps most lines fit in an 80 column terminal.  The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rob Clark [Wed, 23 Jul 2014 19:08:40 +0000 (15:08 -0400)]
freedreno/a3xx/compiler: fix p0 (kill, etc)

Don't assert (in debug builds) or assign a random uninitialized value to the
predicate register (p0); that screws up kill, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tom Stellard [Wed, 23 Jul 2014 15:52:05 +0000 (11:52 -0400)]
Revert "r600g/compute: Fix warnings"

This reverts commit 467f1585e28adba0e94ef593de131bc327f098bb.

This breaks the build on some systems.

Grigori Goronzy [Thu, 17 Jul 2014 16:44:26 +0000 (18:44 +0200)]
radeon/llvm: fix formatting

Use K&R and same indent as most other code. No functional change
intended.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Grigori Goronzy [Thu, 17 Jul 2014 16:44:25 +0000 (18:44 +0200)]
radeon/llvm: enable unsafe math for graphics shaders

Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.

Piglit didn't indicate any regressions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tom Stellard [Wed, 23 Jul 2014 14:26:16 +0000 (10:26 -0400)]
r600g/compute: Fix warnings

Glenn Kennard [Fri, 18 Jul 2014 07:54:37 +0000 (09:54 +0200)]
r600g: Use hardware sqrt instruction

Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Bruno Jiménez [Wed, 16 Jul 2014 21:12:47 +0000 (23:12 +0200)]
r600g/compute: Remove unneeded code from compute_memory_promote_item

Now that we know the pool is defragmented, we know for certain
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.

This allows us to simply add the new items to the end of the
item_list without having to look for a place for each new item.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bruno Jiménez [Wed, 16 Jul 2014 21:12:46 +0000 (23:12 +0200)]
r600g/compute: Quick exit if there's nothing to add to the pool

This way we can avoid defragmenting the pool, even if it needs
defragmenting, and avoid looping again through the list of unallocated
items.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bruno Jiménez [Wed, 16 Jul 2014 21:12:45 +0000 (23:12 +0200)]
r600g/compute: Defrag the pool if it's necessary

This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.

The pool will be considered fragmented if an item that isn't
the last one is freed or demoted.

This 'strategy' has a problem, although it shouldn't cause any bugs.
For example, say we have two items, A and B, and we choose to free A
first; the pool now has the 'fragmented' status. If we then free B,
the pool will retain its 'fragmented' status even though it isn't
fragmented.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
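Below is a simplified model of the rule above and of its stated limitation. The struct layout, the POOL_FRAGMENTED value, and the function name are hypothetical, not r600g's actual compute_memory_pool code. Items are kept in address order in a fixed array.

```c
#include <stddef.h>

#define POOL_FRAGMENTED 0x1u   /* hypothetical status bit */
#define POOL_MAX_ITEMS 16

struct pool_item { size_t start, size; };

struct pool {
   struct pool_item items[POOL_MAX_ITEMS];  /* sorted by start offset */
   size_t count;
   unsigned status;
};

/* Remove items[idx]. Freeing anything other than the last item leaves a
 * gap, so the pool is flagged as fragmented. The flag is never cleared
 * here, which is the limitation the commit message describes. */
static void
pool_free_item(struct pool *p, size_t idx)
{
   if (idx + 1 != p->count)
      p->status |= POOL_FRAGMENTED;
   for (size_t i = idx; i + 1 < p->count; i++)
      p->items[i] = p->items[i + 1];
   p->count--;
}
```

Freeing A and then B leaves an empty pool that is still flagged as fragmented, matching the example above.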
Bruno Jiménez [Wed, 16 Jul 2014 21:12:44 +0000 (23:12 +0200)]
r600g/compute: Add a function for defragmenting the pool

This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.

For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
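Below is a host-memory sketch of the same simple strategy: each item slides forward to where the previous one ends, with no attempt to fit later items into earlier gaps. It assumes the items are sorted by start offset. The types and names are hypothetical, and the real code moves data inside a GPU resource, not a byte array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical item descriptor: byte range occupied in the pool. */
struct item { size_t start, size; };

/* Move every item forward so that there is no gap between items.
 * Returns the first free offset after the last item. */
static size_t
pool_defrag(unsigned char *pool, struct item *items, size_t n)
{
   size_t last_end = 0;
   for (size_t i = 0; i < n; i++) {
      if (items[i].start != last_end) {
         /* Source and destination may overlap, so use memmove(). */
         memmove(pool + last_end, pool + items[i].start, items[i].size);
         items[i].start = last_end;
      }
      last_end += items[i].size;
   }
   return last_end;
}
```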
Bruno Jiménez [Wed, 16 Jul 2014 21:12:43 +0000 (23:12 +0200)]
r600g/compute: Add a function for moving items in the pool

This function will be used in the future by compute_memory_defrag
to move items forward in the pool.

It first checks for overlapping ranges. If the ranges don't overlap,
it copies the contents directly. If they overlap, it first tries to
create a temporary buffer; if that buffer fails to allocate, it
finally falls back to a mapping.

Note that since it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, the other
direction can easily be added by changing the first if.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
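Below is a host-memory analogue of the decision sequence described above: a direct copy when the ranges are disjoint, otherwise a bounce through a temporary buffer, otherwise a last-resort fallback. The names are hypothetical, and memmove() only stands in for the map-and-copy path.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Do [a, a + len) and [b, b + len) overlap? */
static bool
ranges_overlap(size_t a, size_t b, size_t len)
{
   return a < b + len && b < a + len;
}

/* Move len bytes from offset src to offset dst within pool. */
static void
pool_move_item(unsigned char *pool, size_t dst, size_t src, size_t len)
{
   if (!ranges_overlap(dst, src, len)) {
      memcpy(pool + dst, pool + src, len);   /* disjoint: copy directly */
      return;
   }
   unsigned char *tmp = malloc(len);         /* try a temporary buffer */
   if (tmp != NULL) {
      memcpy(tmp, pool + src, len);
      memcpy(pool + dst, tmp, len);
      free(tmp);
   } else {
      memmove(pool + dst, pool + src, len);  /* stand-in for the mapping */
   }
}
```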
Rob Clark [Mon, 21 Jul 2014 14:41:49 +0000 (10:41 -0400)]
freedreno/a3xx: more vtx formats

Actually what we currently handle is just the SCALED versions, and not
the int versions.  The difference probably matters more when we actually
support integer in the compiler.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 21 Jul 2014 19:24:30 +0000 (15:24 -0400)]
freedreno/a3xx/compiler: const file relative addressing

Teach new compiler scheduling and register assignment how to deal with
relative addressing.  This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n].  It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.

NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to the addr reg seems to sometimes cause lockups (and
sometimes work?!).  It seems more reliable to do the address calculation
in s16, like the blob does, which means teaching RA how to deal with
mixed half and full precision allocation.  Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 21 Jul 2014 18:16:44 +0000 (14:16 -0400)]
freedreno/a3xx/compiler: move function

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 20 Jul 2014 15:26:56 +0000 (11:26 -0400)]
freedreno/a3xx: add back a few stalls

Technically we should not need these.  CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like
fbo-depth-GL_DEPTH_COMPONENT24-readpixels.  I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances.  But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 21 Jul 2014 14:43:30 +0000 (10:43 -0400)]
targets/dri: fix freedreno targets

The kernel driver name is either "kgsl" (downstream/android) or "msm"
(upstream).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 19 Jul 2014 17:22:10 +0000 (13:22 -0400)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Neil Roberts [Wed, 23 Jul 2014 11:10:37 +0000 (12:10 +0100)]
docs: Update GL3.txt and relnotes for GL_ARB_clear_texture

Neil Roberts [Tue, 10 Jun 2014 15:21:21 +0000 (16:21 +0100)]
meta: Add a meta implementation of GL_ARB_clear_texture

Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.

When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However, as this has
only been tested with the i965 driver, it is currently only enabled there.

v2: Only enable the extension for the i965 driver instead of all DRI drivers.
    Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
    some more comments.

v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
    Handle sRGB textures. Explicitly disable dithering.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
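The FBO path can be modelled without a GL context. This sketch assumes that a 3D or array texture is cleared one layer at a time (one FBO attachment plus one scissored clear per layer); `plan_meta_clear()` and `struct scissored_clear` are hypothetical names used only for illustration:

```c
#include <stdbool.h>

/* One scissored clear: the layer to attach to the FBO and the scissor box. */
struct scissored_clear {
    int layer;
    int x, y, width, height;
};

/* Fills `out` with one clear per layer and returns how many were written.
 * Returns -1 when no FBO can be used for the texture (or `out` is too
 * small), meaning the caller should take the CPU fallback path. */
static int plan_meta_clear(bool renderable,
                           int xoffset, int yoffset, int zoffset,
                           int width, int height, int depth,
                           struct scissored_clear *out, int max_out)
{
    if (!renderable || depth > max_out)
        return -1;

    for (int z = 0; z < depth; z++) {
        out[z].layer = zoffset + z;   /* attach this layer to the FBO */
        out[z].x = xoffset;           /* scissor limits the clear to */
        out[z].y = yoffset;           /* the requested sub-region    */
        out[z].width = width;
        out[z].height = height;
    }
    return depth;
}
```

A return value of -1 corresponds to the case where the commit falls back to _mesa_store_ClearTexSubImage.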
Neil Roberts [Fri, 4 Jul 2014 14:37:28 +0000 (15:37 +0100)]
meta: Add a state flag for the GL_DITHER

The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
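The save/restore pattern described above can be sketched as follows. `META_DITHER`, `struct gl_state` and `struct meta_save` are hypothetical stand-ins; the only GL fact relied on is that GL_DITHER is enabled by default:

```c
#include <stdbool.h>

/* Hypothetical flag bit requesting that dither state be saved. */
#define META_DITHER (1u << 0)

/* Cut-down context: only the GL_DITHER enable. */
struct gl_state {
    bool dither;
};

/* What meta_begin() saved, so meta_end() can put it back. */
struct meta_save {
    unsigned flags;
    bool dither;
};

static void meta_begin(struct gl_state *st, struct meta_save *save,
                       unsigned flags)
{
    save->flags = flags;
    if (flags & META_DITHER) {
        save->dither = st->dither;
        st->dither = true;  /* reset to GL's default: dithering enabled */
    }
}

static void meta_end(struct gl_state *st, const struct meta_save *save)
{
    if (save->flags & META_DITHER)
        st->dither = save->dither;  /* restore the application's setting */
}
```

Between the two calls, a clear operation can then switch dithering off explicitly, and `meta_end()` puts back whatever the application had set.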
Neil Roberts [Tue, 10 Jun 2014 15:19:58 +0000 (16:19 +0100)]
texstore: Add a generic implementation of GL_ARB_clear_texture

Adds an implementation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures, as they don't have a
simple linear layout.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
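"Map the texture and write the values in" can be sketched for one mapped 2D image. This assumes a linear layout with a byte row stride, which is exactly the assumption that rules out multisample textures; `clear_mapped_subimage()` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one texel value (texel_size bytes, already packed in the
 * texture's format) into every texel of a width x height sub-rectangle
 * of a mapped, linearly laid-out 2D image. row_stride is in bytes. */
static void clear_mapped_subimage(uint8_t *map, size_t row_stride,
                                  size_t texel_size,
                                  int xoffset, int yoffset,
                                  int width, int height,
                                  const void *texel)
{
    for (int y = 0; y < height; y++) {
        uint8_t *row = map + (size_t)(yoffset + y) * row_stride
                           + (size_t)xoffset * texel_size;
        for (int x = 0; x < width; x++)
            memcpy(row + (size_t)x * texel_size, texel, texel_size);
    }
}
```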
Neil Roberts [Tue, 10 Jun 2014 15:11:00 +0000 (16:11 +0100)]
mesa/main: Add generic bits of ARB_clear_texture implementation

This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.

v2: Don't clear only some of the images when just one of them generates an error

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
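The v2 behaviour amounts to validating every image before touching any of them. A sketch, assuming the multiple images are, for example, the six faces of a cube map level; `struct face_image`, `region_ok()` and `clear_all_faces()` are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical per-face image descriptor. */
struct face_image {
    bool exists;
    int width, height;
    bool cleared;
};

static bool region_ok(const struct face_image *img, int x, int y, int w, int h)
{
    return img->exists && x >= 0 && y >= 0 && w >= 0 && h >= 0 &&
           x + w <= img->width && y + h <= img->height;
}

/* Clears the region on every face, but only if it is valid on all of them,
 * so an error on one face never leaves the others partially cleared. */
static bool clear_all_faces(struct face_image *faces, int num_faces,
                            int x, int y, int w, int h)
{
    /* Pass 1: validate everything; the caller would record a GL error. */
    for (int i = 0; i < num_faces; i++)
        if (!region_ok(&faces[i], x, y, w, h))
            return false;

    /* Pass 2: only now call the driver's clear for each face. */
    for (int i = 0; i < num_faces; i++)
        faces[i].cleared = true;
    return true;
}
```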
Neil Roberts [Fri, 13 Jun 2014 16:28:48 +0000 (17:28 +0100)]
teximage: Add utility func for format/internalFormat compatibility check

In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.
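A simplified sketch of what such a compatibility helper checks. It works on hypothetical format classes instead of GLenum values and assumes that a color internal format accepts either color or color-index data, while depth, depth/stencil and YCbCr must match exactly:

```c
#include <stdbool.h>

/* Simplified stand-in for the broad classes of GL formats. */
enum format_class {
    CLASS_COLOR,
    CLASS_INDEX,
    CLASS_DEPTH,
    CLASS_DEPTH_STENCIL,
    CLASS_YCBCR,
};

/* Hypothetical helper: is client data in `format_class` basically
 * compatible with a texture whose internal format is `internal_class`? */
static bool format_matches_internal_format(enum format_class internal_class,
                                           enum format_class format_class)
{
    /* Color textures can be specified from color or color-index data. */
    if (internal_class == CLASS_COLOR)
        return format_class == CLASS_COLOR || format_class == CLASS_INDEX;

    /* Everything else must be in the same class. */
    return internal_class == format_class;
}
```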

Ilia Mirkin [Sat, 1 Mar 2014 21:46:53 +0000 (16:46 -0500)]
mesa/main: add ARB_clear_texture entrypoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Michel Dänzer [Thu, 19 Jun 2014 01:40:38 +0000 (10:40 +0900)]
r600g/radeonsi: Use write-combined CPU mappings of some BOs in GTT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Fri, 13 Jun 2014 08:48:57 +0000 (17:48 +0900)]
winsys/radeon: Use separate caching buffer managers for VRAM and GTT

Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
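The overhead argument can be illustrated with one cache per memory domain: a lookup only walks buffers that already have the right placement. All names here are hypothetical and the cache is reduced to a fixed array:

```c
#include <stddef.h>

enum domain { DOMAIN_VRAM, DOMAIN_GTT, NUM_DOMAINS };

#define CACHE_SLOTS 8

struct cached_buf {
    size_t size;
    int in_use;
};

/* One of these per domain, instead of a single shared cache. */
struct buf_cache {
    struct cached_buf slots[CACHE_SLOTS];
    int count;
};

/* Returns a free cached buffer of at least `size` bytes from the cache for
 * domain `d`, or NULL if the caller has to allocate a new one. Buffers in
 * the other domain are never even looked at. */
static struct cached_buf *
cache_lookup(struct buf_cache caches[NUM_DOMAINS], enum domain d, size_t size)
{
    struct buf_cache *c = &caches[d];

    for (int i = 0; i < c->count; i++) {
        if (!c->slots[i].in_use && c->slots[i].size >= size) {
            c->slots[i].in_use = 1;
            return &c->slots[i];
        }
    }
    return NULL;
}
```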
Dave Airlie [Wed, 23 Jul 2014 01:06:15 +0000 (11:06 +1000)]
docs/GL3.txt: update status for ARB_compute_shader

since some bits are done in tree, but nobody is working on it anymore.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Anuj Phogat [Mon, 21 Jul 2014 23:58:42 +0000 (16:58 -0700)]
mesa: Don't use memcpy() in _mesa_texstore() for float depth texture data

because float depth texture data needs clamping to [0.0, 1.0]. Let
_mesa_texstore() fall back to the slower path.

Fixes Khronos GLES3 CTS tests:
- shadow_execution_vert
- shadow_execution_frag

v2: Move the check into the _mesa_texstore_can_use_memcpy() function.
    Add a check for floating-point data types.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
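Why memcpy() is wrong here: a float source can hold values outside [0.0, 1.0], and a byte copy would keep them. The sketch below shows the clamping copy the slow path must effectively perform, plus a hypothetical predicate capturing only the condition this commit adds (neither function is Mesa's actual code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Copies float depth values, clamping each one to [0.0, 1.0]. */
static void store_float_depth(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float d = src[i];
        dst[i] = d < 0.0f ? 0.0f : (d > 1.0f ? 1.0f : d);
    }
}

/* Hypothetical: the memcpy fast path is refused whenever the destination
 * is a depth format and the source data is floating point. */
static bool can_use_memcpy_for_depth(bool dst_is_depth, bool src_is_float)
{
    return !(dst_is_depth && src_is_float);
}
```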
Kenneth Graunke [Fri, 18 Jul 2014 20:19:46 +0000 (13:19 -0700)]
i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.

We actually want to use mov(16), not mov(8).

Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468]
and ARB_sample_shading/builtin-gl-sample-mask-simple [468].

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
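A toy model (not the i965 EU, and not Mesa code) of why the execution size matters: a MOV with execution size N writes only channels 0..N-1, so in a SIMD16 shader a mov(8) leaves the upper eight channels of the destination stale. `emulate_mov()` is hypothetical:

```c
/* Copies the first exec_size channels of src into dst; the remaining
 * channels of dst are left untouched, like a narrower MOV. */
static void emulate_mov(unsigned *dst, const unsigned *src, int exec_size)
{
    for (int ch = 0; ch < exec_size; ch++)
        dst[ch] = src[ch];
}
```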
Kenneth Graunke [Fri, 18 Jul 2014 20:19:45 +0000 (13:19 -0700)]
i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.

We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.

This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Thu, 17 Jul 2014 18:18:35 +0000 (11:18 -0700)]
i965: Add missing persample_shading field to brw_wm_debug_recompile.

Otherwise, the performance warning for shader recompiles will just say
"something else".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
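The "something else" message comes from a field-by-field comparison of the old and new program keys: a field that is not compared can never be named. A sketch with a hypothetical, cut-down key and helper names:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, cut-down fragment program key. */
struct wm_key {
    bool persample_shading;
    int nr_color_regions;
};

/* Prints one changed field; returns true if it changed. */
static bool key_debug(const char *name, int old_val, int new_val)
{
    if (old_val == new_val)
        return false;
    printf("  %s %d->%d\n", name, old_val, new_val);
    return true;
}

/* Returns true if some compared field explains the recompile. */
static bool wm_debug_recompile(const struct wm_key *old_key,
                               const struct wm_key *key)
{
    bool found = false;

    found |= key_debug("per-sample shading",
                       old_key->persample_shading, key->persample_shading);
    found |= key_debug("color regions",
                       old_key->nr_color_regions, key->nr_color_regions);

    if (!found)
        printf("  something else\n");
    return found;
}
```

Leaving the `persample_shading` comparison out of this function is the situation the commit fixes.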
Kenneth Graunke [Thu, 17 Jul 2014 22:55:05 +0000 (15:55 -0700)]
i965/disasm: Don't disassemble the URB complete field on Broadwell.

It doesn't exist, so attempting to read it will trigger the per-generation
assertions in the brw_inst API.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
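The fix pattern is to guard the read with a generation check so that the checked accessor is never reached on Gen8+. A sketch with hypothetical names (`struct urb_msg`, `inst_urb_complete()`, `disasm_urb()`), not the real brw_inst API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded URB message: only the field this commit is about. */
struct urb_msg {
    bool complete;  /* meaningful only before Gen8 (Broadwell) */
};

/* Generation-checked accessor: asserts if the field does not exist. */
static bool inst_urb_complete(int gen, const struct urb_msg *m)
{
    assert(gen < 8);
    return m->complete;
}

/* Disassembles the message, reading the field only where it exists. */
static void disasm_urb(char *buf, size_t len, int gen, const struct urb_msg *m)
{
    snprintf(buf, len, "urb");
    if (gen < 8 && inst_urb_complete(gen, m))  /* short-circuits on Gen8+ */
        strncat(buf, " complete", len - strlen(buf) - 1);
}
```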
Kenneth Graunke [Thu, 17 Jul 2014 23:29:41 +0000 (16:29 -0700)]
i965: Disable hex offset printing in disassembly.

Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference.  So, every time I want to compare assembly, I have to strip
this out.

The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer.  Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
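"Disabled by default, but easy to re-enable" can be as simple as a file-level toggle checked when each line is formatted. A sketch with a hypothetical `dump_hex_offsets` variable and `format_disasm_line()` helper:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toggle: set to true temporarily when the offsets are needed,
 * e.g. while debugging compaction or inspecting the program cache. */
static bool dump_hex_offsets = false;

/* Formats one disassembled instruction; returns snprintf()'s result. */
static int format_disasm_line(char *buf, size_t len, unsigned offset,
                              const char *asm_text)
{
    if (dump_hex_offsets)
        return snprintf(buf, len, "0x%08x: %s", offset, asm_text);
    return snprintf(buf, len, "%s", asm_text);
}
```

With the offsets off, inserting one instruction changes only one line of the output, so a plain diff of two shaders stays small.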
Matt Turner [Sat, 12 Jul 2014 18:21:21 +0000 (11:21 -0700)]
i965/vec4: Use foreach_inst_in_block a couple more places.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
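A simplified sketch of what a block-scoped instruction iterator does: walk the instruction list from the block's first instruction to its last, inclusive. The IR types and the two-argument macro below are hypothetical; Mesa's real `foreach_inst_in_block` has a different signature:

```c
#include <stddef.h>

/* Hypothetical IR: all instructions in one doubly linked list. */
struct inst {
    struct inst *prev, *next;
    int opcode;
};

/* A basic block records its first and last instruction. */
struct block {
    struct inst *start, *end;
};

/* Iterates `it` over the instructions of `blk`, start to end inclusive. */
#define foreach_inst_in_block(it, blk)               \
    for (struct inst *it = (blk)->start;             \
         it != NULL && it != (blk)->end->next;       \
         it = it->next)

/* Example use: count instructions with a given opcode in one block. */
static int count_opcode(const struct block *b, int opcode)
{
    int n = 0;
    foreach_inst_in_block(i, b) {
        if (i->opcode == opcode)
            n++;
    }
    return n;
}
```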