Leo Liu [Wed, 4 Nov 2015 21:24:26 +0000 (16:24 -0500)]
vl: add drm support for vl_screen
This will allow the state trackers to use render nodes
with screen creation
v2: dup fd for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Leo Liu [Thu, 5 Nov 2015 16:22:22 +0000 (11:22 -0500)]
st/va: fix build fails with pipe loader
There is no dev in drv, and dev should be from vl_screen here
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Samuel Pitoiset [Thu, 5 Nov 2015 23:33:48 +0000 (00:33 +0100)]
nvc0: enable compute support on Fermi
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 23:48:55 +0000 (18:48 -0500)]
nv50/ir: fix emission of s[] args in certain situations
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 23:47:40 +0000 (18:47 -0500)]
nv50/ir: only take abs value when computing high result
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 05:44:10 +0000 (00:44 -0500)]
nouveau: avoid queueing too much work onto a single fence
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Dave Airlie [Sat, 7 Nov 2015 21:55:17 +0000 (07:55 +1000)]
llvmpipe: disable front updates for now
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 31 Oct 2015 06:19:43 +0000 (16:19 +1000)]
virgl: wrap ret assignment with braces to do correct thing
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Sat, 7 Nov 2015 20:01:50 +0000 (12:01 -0800)]
nir: Add a nir_deref_tail helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 1 May 2015 18:26:40 +0000 (11:26 -0700)]
nir/types: Add an is_vector_or_scalar helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 6 Nov 2015 00:37:47 +0000 (16:37 -0800)]
i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In
particular, the PLN instruction reads 2 logical registers in one of the
components. This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 22 Oct 2015 23:53:27 +0000 (16:53 -0700)]
nir/validate: Add better validation of load/store types
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 3 Nov 2015 11:20:18 +0000 (12:20 +0100)]
radeonsi: add register definitions for Stoney
There are a few non-stoney changes too.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Sun, 1 Nov 2015 12:43:26 +0000 (13:43 +0100)]
radeonsi: add workarounds for CP DMA to stay on the fast path
v2: set emit_scratch_reloc, add a NULL check
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:33:42 +0000 (01:33 +0100)]
radeonsi: unify CP DMA preparation logic
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:21:01 +0000 (01:21 +0100)]
radeonsi: unify CP DMA code determining various flags
v2: don't call get_flush_flags twice per function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:03:42 +0000 (01:03 +0100)]
radeonsi: only enable write confirmation on the last CP DMA packet
This should improve performance for big copies that need to be split.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ilia Mirkin [Sat, 7 Nov 2015 05:41:05 +0000 (00:41 -0500)]
nv50/ir: allow emission of immediates in imul/imad ops
Nothing actually uses this yet (due to complications), but the emission
logic is right.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 00:28:29 +0000 (19:28 -0500)]
nv50/ir: properly set the type of the constant folding result
This removes the hack used for merge, which only covers a fraction of
the cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 00:13:35 +0000 (19:13 -0500)]
nv50/ir: add support for const-folding OP_CVT with F64 source/dest
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 23 Feb 2015 00:49:49 +0000 (19:49 -0500)]
nv50/ir: add fp64 opcode emission support for G200 (NVA0)
Need to emulate rcp/rsq before providing full fp64 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:38 +0000 (14:32 +0100)]
nv50/ir: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:37 +0000 (14:32 +0100)]
nvc0/ir: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:36 +0000 (14:32 +0100)]
nv50/ir: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.
If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 22:58:42 +0000 (17:58 -0500)]
nv50/ir: disallow 64-bit immediates on nv50 targets
No instructions are able to load short immediates like nvc0 can.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 22:18:01 +0000 (17:18 -0500)]
nv50/ir: allow movs with TYPE_F64 destinations to be split
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:35 +0000 (14:32 +0100)]
gm107/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:34 +0000 (14:32 +0100)]
nvc0/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Francisco Jerez [Fri, 6 Nov 2015 21:19:56 +0000 (13:19 -0800)]
i965/nir/fs: Add comment for no-op memory barrier functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jordan Justen [Sat, 10 Oct 2015 20:00:04 +0000 (13:00 -0700)]
i965/nir/fs: Implement new barrier functions for compute shaders
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:
* nir_intrinsic_memory_barrier_atomic_counter
* nir_intrinsic_memory_barrier_buffer
* nir_intrinsic_memory_barrier_image
We treat these nir intrinsics as no-ops:
* nir_intrinsic_group_memory_barrier
* nir_intrinsic_memory_barrier_shared
v3:
* Add comment for no-op cases (curro)
v4:
* Moving comment to a separate patch authored by curro
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jordan Justen [Sat, 10 Oct 2015 15:59:42 +0000 (08:59 -0700)]
nir: Add new barrier functions for compute shaders
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jordan Justen [Fri, 9 Oct 2015 21:16:05 +0000 (14:16 -0700)]
glsl: Add new barrier functions for compute shaders
When these functions are called in GLSL code, we create an intrinsic
function call:
* groupMemoryBarrier => __intrinsic_group_memory_barrier
* memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
* memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
* memoryBarrierImage => __intrinsic_memory_barrier_image
* memoryBarrierShared => __intrinsic_memory_barrier_shared
v2:
* Consolidate with memoryBarrier function/intrinsic creation (curro)
v3:
* Instead of add_memory_barrier_function, add an intrinsic_name
parameter to _memory_barrier (curro)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Boyuan Zhang [Wed, 23 Sep 2015 08:11:08 +0000 (10:11 +0200)]
radeon/uvd: fix VC-1 simple/main profile decode v2
We just needed to set the extra width/height fields to get this working.
v2 (chk): rebased, CC stable added, commit message added, fixed coding style
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Boyuan Zhang [Wed, 23 Sep 2015 08:11:07 +0000 (10:11 +0200)]
st/vaapi: fix vaapi VC-1 simple/main corruption v2
Apply the start code fix only to advanced profile.
v2 (chk): add commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Julien Isorce [Fri, 6 Nov 2015 09:45:22 +0000 (09:45 +0000)]
st/va: add support for RGBX and BGRX in VPP
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.
This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:19 +0000 (09:45 +0000)]
vl/buffers: add RGBX and BGRX to the supported formats
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:17 +0000 (09:45 +0000)]
st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:11 +0000 (09:45 +0000)]
st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Rob Clark [Tue, 27 Oct 2015 15:38:34 +0000 (11:38 -0400)]
freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 27 Oct 2015 15:33:32 +0000 (11:33 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Guillaume Charifi [Fri, 6 Nov 2015 16:17:25 +0000 (11:17 -0500)]
freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Boyan Ding [Fri, 16 Oct 2015 07:15:38 +0000 (15:15 +0800)]
freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 21 Oct 2015 14:57:15 +0000 (10:57 -0400)]
nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so lets make this more explicit. Plus couple other corrections/
clarifications.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ilia Mirkin [Fri, 6 Nov 2015 04:12:52 +0000 (23:12 -0500)]
nvc0: reintroduce BGRA4 format support
Commit
342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.
The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Fri, 6 Nov 2015 02:03:39 +0000 (19:03 -0700)]
mesa: report enum name in glClientActiveTexture() error string
As we do for glActiveTexture(). Trivial.
Julien Isorce [Thu, 5 Nov 2015 08:24:45 +0000 (08:24 +0000)]
st/va: fix memory leak on error in vlVaCreateSurfaces2
Found by coverity: CID #
1337953
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Thu, 5 Nov 2015 08:24:44 +0000 (08:24 +0000)]
st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Some lines were using 4 indentation spaces instead of 3.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Kenneth Graunke [Tue, 13 Oct 2015 22:30:03 +0000 (15:30 -0700)]
i965: Fix scalar VS float[] and vec2[] output arrays.
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.
In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.
Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.
To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.
Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.
v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Roland Scheidegger [Thu, 5 Nov 2015 17:00:40 +0000 (18:00 +0100)]
llvmpipe: disable texture cache
There are some weird problems with 8-wide vectors.
Ilia Mirkin [Sat, 31 Oct 2015 00:44:57 +0000 (20:44 -0400)]
nouveau: send back a debug message when waiting for a fence to complete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 30 Oct 2015 22:41:09 +0000 (18:41 -0400)]
nv50,nvc0: provide debug messages with shader compilation stats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 30 Oct 2015 21:23:22 +0000 (17:23 -0400)]
nouveau: add support for sending debug messages via KHR_debug
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 31 Oct 2015 03:25:59 +0000 (23:25 -0400)]
st/clover: provide a path for drivers to call through to pfn_notify
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[ Francisco Jerez: Clean up clover::context interface by passing
around a function object. ]
Ilia Mirkin [Sat, 31 Oct 2015 03:28:01 +0000 (23:28 -0400)]
st/mesa: set debug callback for debug contexts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Ilia Mirkin [Fri, 30 Oct 2015 07:17:35 +0000 (03:17 -0400)]
gallium: expose a debug message callback settable by context owner
This will allow gallium drivers to send messages to KHR_debug endpoints
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ilia Mirkin [Thu, 5 Nov 2015 05:33:22 +0000 (00:33 -0500)]
st/mesa: account for texture views when doing CopyImageSubData
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Iago Toral Quiroga [Fri, 30 Oct 2015 10:10:02 +0000 (11:10 +0100)]
i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Iago Toral Quiroga [Fri, 30 Oct 2015 09:57:47 +0000 (10:57 +0100)]
i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Iago Toral Quiroga [Fri, 30 Oct 2015 09:24:12 +0000 (10:24 +0100)]
i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const, do not add unnecessary temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Iago Toral Quiroga [Fri, 30 Oct 2015 07:48:57 +0000 (08:48 +0100)]
i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Iago Toral Quiroga [Fri, 30 Oct 2015 07:39:11 +0000 (08:39 +0100)]
i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const and remove useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Neil Roberts [Mon, 7 Sep 2015 17:23:14 +0000 (18:23 +0100)]
i965/skl+: Enable support for 16x multisampling
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Mon, 28 Sep 2015 17:22:32 +0000 (18:22 +0100)]
mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.
To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.
This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.
v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
[Ian Romanick]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Thu, 22 Oct 2015 08:55:35 +0000 (10:55 +0200)]
meta/blit: Always try to enable GL_ARB_sample_shading
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Wed, 16 Sep 2015 16:43:33 +0000 (17:43 +0100)]
meta: Support 16x MSAA in the multisample scaled blit shader
v2: Fix the x_scale in the shader. Remove the doubts in the commit
message.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Fri, 11 Sep 2015 17:09:46 +0000 (18:09 +0100)]
i965/meta: Support 16x MSAA in the meta stencil blit
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Wed, 9 Sep 2015 16:44:17 +0000 (17:44 +0100)]
i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
In order to accomodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Wed, 9 Sep 2015 13:38:08 +0000 (14:38 +0100)]
i965: Support allocating the MCS buffer for 16x MSAA
When 16 samples are used the MCS buffer needs 64 bits per pixel.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Wed, 9 Sep 2015 13:36:42 +0000 (14:36 +0100)]
i965: Support calculating the bits needed to set up 16x MSAA
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Tue, 15 Sep 2015 15:34:35 +0000 (16:34 +0100)]
i965/fs: Add a sampler program key for whether the texture is 16x MSAA
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.
Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Wed, 9 Sep 2015 14:59:36 +0000 (15:59 +0100)]
i965/vec4/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Tue, 8 Sep 2015 14:52:09 +0000 (15:52 +0100)]
i965/fs/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.
v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Neil Roberts [Wed, 16 Sep 2015 10:48:42 +0000 (11:48 +0100)]
i965: Program 16x MSAA sample positions.
This is the standard pattern used by the other 3D graphics API.
BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.
The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.
arb_gpu_shader5-interpolateAtSample-different
Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.
(Based on a patch by Kenneth Graunke)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
Kenneth Graunke [Thu, 29 Jan 2015 07:58:43 +0000 (23:58 -0800)]
i965: Handle 16x MSAA in IMS dimension munging code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Kenneth Graunke [Wed, 4 Nov 2015 01:16:49 +0000 (17:16 -0800)]
nir: Rename nir_live_variables.c to nir_liveness.c.
It doesn't actually operate on variables.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Wed, 4 Nov 2015 01:15:24 +0000 (17:15 -0800)]
nir: Rename live_variables to live_ssa_defs.
This computes liveness of SSA values, not nir_variables.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Alejandro Piñeiro [Tue, 20 Oct 2015 11:08:09 +0000 (13:08 +0200)]
i965/vec4: select predicate based on writemask for sel emissions
Equivalent to commit
8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.
This patch helps on cases like this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.
So, with this patch, the previous case becames this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Now only the x channel of the flag is used, allowing dead code
eliminate to update the writemask at the second instruction:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
So now cmod propagation can simplify out #2:
1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Shader-db numbers:
total instructions in shared programs:
6235835 ->
6228008 (-0.13%)
instructions in affected programs: 219850 -> 212023 (-3.56%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 1192
HURT: 0
Ilia Mirkin [Thu, 5 Nov 2015 03:42:41 +0000 (22:42 -0500)]
nouveau: relax fence emit space assert
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Eric Anholt [Wed, 4 Nov 2015 21:27:16 +0000 (13:27 -0800)]
vc4: When the create ioctl fails, free our cache and try again.
This greatly increases the pressure you can put on the driver before
create fails. Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Wed, 4 Nov 2015 21:10:28 +0000 (13:10 -0800)]
vc4: Print the rounded shader size in debug output.
It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be suprising.
Eric Anholt [Wed, 4 Nov 2015 21:13:39 +0000 (13:13 -0800)]
vc4: Fix dumping the size of BOs allocated/cached.
60MB of cached BOs are a lot less scary than 600MB.
Ilia Mirkin [Wed, 4 Nov 2015 19:26:37 +0000 (14:26 -0500)]
mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.
Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Sat, 31 Oct 2015 13:02:36 +0000 (07:02 -0600)]
vbo: fix another GL_LINE_LOOP bug
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.
The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this. Even 'lineloop -count 100000' doesn't
trigger the bug.
For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Tue, 3 Nov 2015 21:34:15 +0000 (14:34 -0700)]
svga: implement 'white_fragments' option for VGPU10 fragment shaders
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white. We had this implemented
for VGPU9, but not VGPU10.
VMware bug
1545492.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 29 Oct 2015 01:05:27 +0000 (19:05 -0600)]
u_vbuf: minor code reformatting / line wrapping
Trivial.
Brian Paul [Thu, 29 Oct 2015 01:02:38 +0000 (19:02 -0600)]
u_vbuf: add some const qualifiers
Trivial.
Brian Paul [Sat, 31 Oct 2015 13:44:49 +0000 (07:44 -0600)]
svga: use new enum indices_mode type
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Sat, 31 Oct 2015 13:44:23 +0000 (07:44 -0600)]
util/indices: replace #define tokens with enum type
To ease debugging in gdb.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Alejandro Piñeiro [Thu, 22 Oct 2015 20:22:14 +0000 (22:22 +0200)]
i965: check inst->predicate when clearing flag_live at dead code eliminate
Detected by Matt Turner while reviewing commit
a59359ecd22154cc2b3f88bb8c599f21af8a3934
Reviewed-by: Matt Turner <mattst88@gmail.com>
Roland Scheidegger [Wed, 4 Nov 2015 13:21:43 +0000 (14:21 +0100)]
gallivm: fix sampling for s3tc srgb formats when using texture cache
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
Ben Widawsky [Wed, 14 Oct 2015 03:50:25 +0000 (20:50 -0700)]
i965/meta: Assert fast clears and rep clears never overlap
There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Ryan Houdek [Tue, 3 Nov 2015 01:30:18 +0000 (19:30 -0600)]
mesa: expose support for GL_EXT_buffer_storage
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Timothy Arceri [Tue, 3 Nov 2015 21:41:29 +0000 (08:41 +1100)]
glsl: make sure to only add subroutines to resource list
Over looked in
763cd8c080353.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Timothy Arceri [Wed, 4 Nov 2015 03:50:49 +0000 (14:50 +1100)]
glsl: remove old TODO
SSBO support now exists as of commits f24e5e and
f408a13dd30.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Timothy Arceri [Thu, 15 Oct 2015 23:28:48 +0000 (10:28 +1100)]
docs: Mark AoA as done for i965
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 15 Oct 2015 23:28:47 +0000 (10:28 +1100)]
i965: enable ARB_arrays_of_arrays
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Timothy Arceri [Fri, 30 Oct 2015 23:31:37 +0000 (10:31 +1100)]
i965: add support for image AoA
V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.
V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Roland Scheidegger [Tue, 27 Oct 2015 04:34:00 +0000 (05:34 +0100)]
llvmpipe: add cache for compressed textures
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Oded Gabbay [Tue, 3 Nov 2015 08:36:01 +0000 (10:36 +0200)]
llvmpipe: use simple coeffs calc for 128bit vectors
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
The decision which method to use is determined by the size of the vector
that is used by the platform.
For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.
For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.
This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).
This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.
The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:
- glxspheres, on ppc64le:
- original code: 4.
892745317 frames/sec 5.
460303857 Mpixels/sec
- with the patch: 4.
932083873 frames/sec 5.
504205571 Mpixels/sec
- Additional 0.8% performance boost
- glxspheres, on x86-64 without AVX:
- original code: 20.
16418809 frames/sec 22.
50323395 Mpixels/sec
- with the patch: 20.
31328989 frames/sec 22.
66963152 Mpixels/sec
- Additional 0.74% performance boost
- glmark2, on ppc64le:
- original code: score of 58
- with my change: score of 57
- glmark2, on x86-64 without AVX:
- original code: score of 175
- with the patch: score of 167
- Impact of of -4.5% on performance
- OpenArena, on ppc64le:
- original code: 3398 frames 1719.0 seconds 2.0 fps
255.0/505.9/2773.0/0.0 ms
- with the patch: 3398 frames 1690.4 seconds 2.0 fps
241.0/497.5/2563.0/0.2 ms
- 29 seconds faster with the patch, which is about 2%
- OpenArena, on x86-64 without AVX:
- original code: 3398 frames 239.6 seconds 14.2 fps
38.0/70.5/719.0/14.6 ms
- with the patch: 3398 frames 244.4 seconds 13.9 fps
38.0/71.9/697.0/14.3 ms
- 0.3 fps slower with the patch (about 2%)
Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Tue, 3 Nov 2015 05:43:40 +0000 (21:43 -0800)]
nir: Properly invalidate metadata in nir_opt_remove_phis().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org