mesa.git
9 years agoi965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
Eduardo Lima Mitev [Thu, 22 Oct 2015 13:32:13 +0000 (15:32 +0200)]
i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const

When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd), because
the constants will be propagated as immediate operands of fmul and fadd.

This patch detects these situations and prevents fusing fmul+fadd into ffma.
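
The instruction-count argument can be checked with a small model. This is only a sketch under stated assumptions: the function names are hypothetical, and the model assumes each single-use constant costs exactly one load_const when it feeds an ffma and nothing when it becomes an immediate of fmul or fadd. It is not the pass's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost model: a fused ffma cannot take immediates here, so
 * every constant operand needs its own load_const instruction. */
static int count_fused(int num_const_operands)
{
   assert(num_const_operands >= 0);
   return 1 + num_const_operands;   /* 1 ffma + N load_const */
}

/* Unfused: fmul and fadd each absorb their constant as an immediate. */
static int count_unfused(void)
{
   return 2;                        /* 1 fmul + 1 fadd */
}

/* Skip fusion only when both instructions have a single-use constant. */
static bool should_bypass_fusion(bool fmul_has_single_use_const,
                                 bool fadd_has_single_use_const)
{
   return fmul_has_single_use_const && fadd_has_single_use_const;
}
```

With two constants the fused form costs 3 instructions and the unfused form costs 2, which is the case the patch targets.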

Shader-db results on i965 Haswell:

total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
instructions in affected programs:     1124094 -> 1114154 (-0.88%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                7612
HURT:                                  843
GAINED:                                4
LOST:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoutil: Add list_is_singular() helper function
Eduardo Lima Mitev [Fri, 23 Oct 2015 14:31:41 +0000 (16:31 +0200)]
util: Add list_is_singular() helper function

Returns whether the list has exactly one element.
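
A minimal sketch of such a check on a circular doubly-linked list follows. The struct and function names are illustrative stand-ins, not the definitions from Mesa's list header, and the sketch assumes the head is a sentinel that points to itself when the list is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative circular doubly-linked list node; the head is a sentinel. */
struct list_node {
   struct list_node *prev;
   struct list_node *next;
};

static void list_init(struct list_node *head)
{
   head->prev = head;
   head->next = head;
}

static void list_add_tail(struct list_node *item, struct list_node *head)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

/* True when exactly one element hangs off the head: the list is not empty
 * and the first element links straight back to the head. */
static bool list_has_one_element(const struct list_node *head)
{
   assert(head != NULL);
   return head->next != head && head->next->next == head;
}
```

The check runs in constant time because it never walks the list.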

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver
Eduardo Lima Mitev [Thu, 22 Oct 2015 13:25:23 +0000 (15:25 +0200)]
nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver

Because the next patch will add an optimization that is specific to i965,
we want to move this lowering pass to that driver altogether.

This is safe because i965 is the only consumer.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoglsl: Use array deref for access to vector components
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:58:54 +0000 (14:58 -0800)]
glsl: Use array deref for access to vector components

We've assumed that we could lower per-component vector access from

  vec[i] = scalar

to

  vec = ir_triop_vector_insert(vec, scalar, i)

but with SSBOs (and compute shader SLM and tessellation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread reads the entire vector, changes one
component, then writes out the entire vector. This is racy.

Instead of generating an ir_binop_vector_extract when we see v[i], we
generate an ir_dereference_array. We then add a separate lowering pass
that lowers the ir_dereference_array to ir_binop_vector_extract for
rvalues and to ir_triop_vector_insert for lvalues.

The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.
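
The lost-update problem with whole-vector writes can be shown with a single-threaded simulation. This is a sketch only: the two "threads" are interleaved by hand, and `vector_insert` is a plain C stand-in for what ir_triop_vector_insert implies, not compiler code.

```c
#include <assert.h>

struct vec4 { float c[4]; };

/* Whole-vector update: returns a copy of `v` with component `i` replaced. */
static struct vec4 vector_insert(struct vec4 v, float scalar, int i)
{
   assert(i >= 0 && i < 4);
   v.c[i] = scalar;
   return v;
}

/* Two "threads" interleaved by hand: both read before either writes. */
static struct vec4 racy_update(struct vec4 shared)
{
   struct vec4 seen_by_a = shared;               /* thread A reads */
   struct vec4 seen_by_b = shared;               /* thread B reads */
   shared = vector_insert(seen_by_a, 1.0f, 0);   /* A writes whole vector */
   shared = vector_insert(seen_by_b, 2.0f, 1);   /* B overwrites A's write */
   return shared;
}

/* Per-component stores touch only their own element, so nothing is lost. */
static struct vec4 component_update(struct vec4 shared)
{
   shared.c[0] = 1.0f;   /* thread A */
   shared.c[1] = 2.0f;   /* thread B */
   return shared;
}
```

In `racy_update` thread A's write to component 0 disappears, while `component_update` keeps both writes.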

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years agoglsl: Lower UBO and SSBO access in glsl linker
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:55:32 +0000 (14:55 -0800)]
glsl: Lower UBO and SSBO access in glsl linker

All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years agoglsl: Drop exec_list argument to lower_ubo_reference
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:50:51 +0000 (14:50 -0800)]
glsl: Drop exec_list argument to lower_ubo_reference

We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just an exec_list or a
shader, so this seems more consistent.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years agonir/glsl: switch to using the builder
Connor Abbott [Sat, 31 Oct 2015 20:31:59 +0000 (16:31 -0400)]
nir/glsl: switch to using the builder

v2: use nir_builder_cf_insert (Ken)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/glsl: make emit() take nir_ssa_def * sources
Connor Abbott [Sat, 31 Oct 2015 03:56:49 +0000 (23:56 -0400)]
nir/glsl: make emit() take nir_ssa_def * sources

Again, this matches what the builder will have to do.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/glsl: convert nir_visitor::result to a nir_ssa_def *
Connor Abbott [Sat, 31 Oct 2015 03:47:46 +0000 (23:47 -0400)]
nir/glsl: convert nir_visitor::result to a nir_ssa_def *

Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/glsl: make evaluate_rvalue() return a nir_ssa_def *
Connor Abbott [Sat, 31 Oct 2015 03:32:50 +0000 (23:32 -0400)]
nir/glsl: make evaluate_rvalue() return a nir_ssa_def *

A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agost/mesa: Destroy buffer object's mutex.
Jose Fonseca [Mon, 9 Nov 2015 22:25:27 +0000 (22:25 +0000)]
st/mesa: Destroy buffer object's mutex.

Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resides in a single
place.

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agonir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.
Kenneth Graunke [Fri, 9 Oct 2015 22:49:49 +0000 (15:49 -0700)]
nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.

These tessellation shader related fields need plumbing through NIR.

v2: Use uint32_t instead of uint64_t to match the source type of
    GLbitfield (caught by Iago Toral).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
9 years agovc4: Avoid loading undefined (newly-allocated) FBO contents.
Eric Anholt [Mon, 9 Nov 2015 17:12:20 +0000 (09:12 -0800)]
vc4: Avoid loading undefined (newly-allocated) FBO contents.

Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear.  For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.

9 years agovc4: Return NULL when we can't make our shadow for a sampler view.
Eric Anholt [Mon, 9 Nov 2015 16:56:01 +0000 (08:56 -0800)]
vc4: Return NULL when we can't make our shadow for a sampler view.

I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
9 years agovc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.
Eric Anholt [Fri, 6 Nov 2015 19:07:25 +0000 (11:07 -0800)]
vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.

I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
9 years agovc4: Add CL dumping for GL_ARRAY_PRIMITIVE.
Eric Anholt [Tue, 27 Oct 2015 23:14:05 +0000 (16:14 -0700)]
vc4: Add CL dumping for GL_ARRAY_PRIMITIVE.

9 years agovc4: Fix a compiler warning.
Eric Anholt [Mon, 9 Nov 2015 16:51:47 +0000 (08:51 -0800)]
vc4: Fix a compiler warning.

9 years agoglsl: Use shared storage variable type for shared variables
Jordan Justen [Tue, 28 Jul 2015 22:00:47 +0000 (15:00 -0700)]
glsl: Use shared storage variable type for shared variables

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoglsl: Add shared variable type
Jordan Justen [Tue, 28 Jul 2015 21:56:49 +0000 (14:56 -0700)]
glsl: Add shared variable type

Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.

These variables are similar to OpenCL's local/__local variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoglsl: Add space to shader_storage in print_visitor
Jordan Justen [Mon, 9 Nov 2015 03:07:43 +0000 (19:07 -0800)]
glsl: Add space to shader_storage in print_visitor

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoglsl: Align comments on variables types
Jordan Justen [Tue, 28 Jul 2015 21:56:49 +0000 (14:56 -0700)]
glsl: Align comments on variables types

v2:
 * Split from patch to add ir_var_shader_shared (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoglsl: Parse shared keyword for compute shader variables
Jordan Justen [Sun, 15 Mar 2015 20:53:06 +0000 (13:53 -0700)]
glsl: Parse shared keyword for compute shader variables

v2:
 * Move shared parsing under storage qualifiers (tarceri)
 * Fail to compile if shared is used in non-compute shader (tarceri)
 * Use separate shared_storage bit for shared variables (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoglsl: simplify interface block stream qualifier validation
Timothy Arceri [Fri, 6 Nov 2015 23:53:53 +0000 (10:53 +1100)]
glsl: simplify interface block stream qualifier validation

Qualifiers on member variables are redundant; all we need to do
is check whether each one matches the stream associated with the block
and throw an error if it does not.

Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
9 years agodocs: note that ARB_copy_image was added to nv50, nvc0 in this release
Ilia Mirkin [Mon, 9 Nov 2015 12:13:29 +0000 (07:13 -0500)]
docs: note that ARB_copy_image was added to nv50, nvc0 in this release

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agost/wgl: add null pointer check for HUD texture
Brian Paul [Mon, 6 Jul 2015 20:53:06 +0000 (14:53 -0600)]
st/wgl: add null pointer check for HUD texture

Fixes crash when using HUD with Nobel Clinician Viewer.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years agost/wgl: fix double-present on swapbuffers bug
Brian Paul [Tue, 16 Jun 2015 01:14:42 +0000 (19:14 -0600)]
st/wgl: fix double-present on swapbuffers bug

The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers.  First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given.  Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.

Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.

Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agost/wgl: reorder pixel formats to put MSAA formats last
Brian Paul [Mon, 1 Jun 2015 14:45:07 +0000 (08:45 -0600)]
st/wgl: reorder pixel formats to put MSAA formats last

And put 8-bit/channel formats before 5/6/5 formats.

The ChoosePixelFormat() function seems to be finicky about format
selection.  Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format.  Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.
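
The ordering described here can be sketched as a sort comparator. The `pixel_format` struct and its fields are hypothetical, chosen only to express the two rules (MSAA formats last, 8-bit/channel before 5/6/5); the real format table carries many more attributes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pixel-format record; field names are illustrative. */
struct pixel_format {
   int samples;         /* 0 or 1 = no MSAA, >1 = MSAA */
   int bits_per_chan;   /* 8 for 8-bit/channel, 5 for 5/6/5-style formats */
};

static int is_msaa(const struct pixel_format *f)
{
   return f->samples > 1;
}

/* Order: non-MSAA before MSAA; within a group, wider channels first. */
static int compare_formats(const void *pa, const void *pb)
{
   const struct pixel_format *a = pa;
   const struct pixel_format *b = pb;

   if (is_msaa(a) != is_msaa(b))
      return is_msaa(a) - is_msaa(b);
   return b->bits_per_chan - a->bits_per_chan;
}

static void sort_formats(struct pixel_format *formats, size_t count)
{
   assert(formats != NULL);
   qsort(formats, count, sizeof(formats[0]), compare_formats);
}
```

Because the non-MSAA formats sort first, they receive the low format numbers.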

VMware bug 1455030

Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agost/wgl: Don't rely on GDI to bookkeep pixelformat for us.
José Fonseca [Thu, 22 Mar 2012 12:16:17 +0000 (12:16 +0000)]
st/wgl: Don't rely on GDI to bookkeep pixelformat for us.

This allows using apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.

See also https://github.com/apitrace/apitrace/commit/e4a4f15f5b92e0abbd24d7d053da25f8278c9f64

9 years agowinsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3
Michel Dänzer [Thu, 21 Aug 2014 09:30:44 +0000 (18:30 +0900)]
winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3

Fixes GPUVM conflicts with non-4K page size.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738

v2: Replace sanitization of VM base address alignment with comment why
    that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
    (Marek)
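
For illustration, querying the page size at run time and aligning to it might look like the sketch below. It assumes a POSIX system where `sysconf(_SC_PAGESIZE)` is available; the helper names are hypothetical and this is not the winsys code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Query the CPU page size at run time instead of assuming 4096 bytes. */
static unsigned cpu_page_size(void)
{
   long size = sysconf(_SC_PAGESIZE);
   assert(size > 0);
   return (unsigned)size;
}

/* Round `value` up to a multiple of `alignment` (must be a power of two). */
static uint64_t align_up(uint64_t value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~((uint64_t)alignment - 1);
}
```

On a system with 64 KiB pages, a 5000-byte request then rounds up to 65536 bytes rather than 8192.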

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoradeon/uvd: add H.265/HEVC to legal notes
Christian König [Fri, 6 Nov 2015 20:15:56 +0000 (15:15 -0500)]
radeon/uvd: add H.265/HEVC to legal notes

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
9 years agost/omx: add headless support
Leo Liu [Wed, 4 Nov 2015 21:38:28 +0000 (16:38 -0500)]
st/omx: add headless support

This will allow dec/enc/transcode without X

v2:  use env override even with X,
     use loader_open_device instead of open
v3:  clean up

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
9 years agost/va: use vl screen drm support from vl_wys_drm
Leo Liu [Thu, 5 Nov 2015 16:56:37 +0000 (11:56 -0500)]
st/va: use vl screen drm support from vl_wys_drm

v2: move the dup to vl_wys_drm for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
9 years agovl: add drm support for vl_screen
Leo Liu [Wed, 4 Nov 2015 21:24:26 +0000 (16:24 -0500)]
vl: add drm support for vl_screen

This will allow the state trackers to use render nodes
with screen creation

v2: dup fd for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
9 years agost/va: fix build fails with pipe loader
Leo Liu [Thu, 5 Nov 2015 16:22:22 +0000 (11:22 -0500)]
st/va: fix build fails with pipe loader

There is no dev in drv, and dev should be from vl_screen here

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
9 years agonvc0: enable compute support on Fermi
Samuel Pitoiset [Thu, 5 Nov 2015 23:33:48 +0000 (00:33 +0100)]
nvc0: enable compute support on Fermi

Although compute support is still incomplete because textures and
surfaces need to be implemented, it allows launching very simple compute
kernels, such as one that reads MP performance counters.

This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: fix emission of s[] args in certain situations
Ilia Mirkin [Sat, 7 Nov 2015 23:48:55 +0000 (18:48 -0500)]
nv50/ir: fix emission of s[] args in certain situations

There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: only take abs value when computing high result
Ilia Mirkin [Sat, 7 Nov 2015 23:47:40 +0000 (18:47 -0500)]
nv50/ir: only take abs value when computing high result

Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonouveau: avoid queueing too much work onto a single fence
Ilia Mirkin [Fri, 6 Nov 2015 05:44:10 +0000 (00:44 -0500)]
nouveau: avoid queueing too much work onto a single fence

Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.

This fixes crashes in teximage-colors --benchmark with too many active
maps.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agollvmpipe: disable front updates for now
Dave Airlie [Sat, 7 Nov 2015 21:55:17 +0000 (07:55 +1000)]
llvmpipe: disable front updates for now

As pointed out by Emil, this sometimes hangs, apparently due to threading.

We need to rethink how this stuff works for llvmpipe.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agovirgl: wrap ret assignment with braces to do correct thing
Dave Airlie [Sat, 31 Oct 2015 06:19:43 +0000 (16:19 +1000)]
virgl: wrap ret assignment with braces to do correct thing

Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
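
The precedence issue can be reproduced in isolation. In this sketch `fake_read` is a hypothetical stand-in for the real call; the extra parentheses in the buggy form only silence compiler warnings and do not change how the expression parses.

```c
#include <assert.h>

/* Stand-in for a call that returns a positive count on success. */
static int fake_read(void)
{
   return 42;
}

/* Buggy form: `>` binds tighter than `=`, so ret receives the comparison
 * result (0 or 1) rather than the function's return value. */
static int buggy(void)
{
   int ret;
   if ((ret = fake_read() > 0))   /* parsed as ret = (fake_read() > 0) */
      return ret;
   return -1;
}

/* Fixed form: assign first, then compare. */
static int fixed(void)
{
   int ret;
   if ((ret = fake_read()) > 0)
      return ret;
   return -1;
}
```

Here `buggy()` yields 1 while `fixed()` yields the actual value returned by the call.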

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agonir: Add a nir_deref_tail helper
Jason Ekstrand [Sat, 7 Nov 2015 20:01:50 +0000 (12:01 -0800)]
nir: Add a nir_deref_tail helper

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir/types: Add an is_vector_or_scalar helper
Jason Ekstrand [Fri, 1 May 2015 18:26:40 +0000 (11:26 -0700)]
nir/types: Add an is_vector_or_scalar helper

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoi965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Jason Ekstrand [Fri, 6 Nov 2015 00:37:47 +0000 (16:37 -0800)]
i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps

Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16).  This isn't actually true.  In
particular, the PLN instruction reads 2 logical registers in one of the
components.  This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir/validate: Add better validation of load/store types
Jason Ekstrand [Thu, 22 Oct 2015 23:53:27 +0000 (16:53 -0700)]
nir/validate: Add better validation of load/store types

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoradeonsi: add register definitions for Stoney
Marek Olšák [Tue, 3 Nov 2015 11:20:18 +0000 (12:20 +0100)]
radeonsi: add register definitions for Stoney

There are a few non-stoney changes too.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
9 years agoradeonsi: add workarounds for CP DMA to stay on the fast path
Marek Olšák [Sun, 1 Nov 2015 12:43:26 +0000 (13:43 +0100)]
radeonsi: add workarounds for CP DMA to stay on the fast path

v2: set emit_scratch_reloc, add a NULL check

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: unify CP DMA preparation logic
Marek Olšák [Sat, 31 Oct 2015 00:33:42 +0000 (01:33 +0100)]
radeonsi: unify CP DMA preparation logic

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: unify CP DMA code determining various flags
Marek Olšák [Sat, 31 Oct 2015 00:21:01 +0000 (01:21 +0100)]
radeonsi: unify CP DMA code determining various flags

v2: don't call get_flush_flags twice per function

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: only enable write confirmation on the last CP DMA packet
Marek Olšák [Sat, 31 Oct 2015 00:03:42 +0000 (01:03 +0100)]
radeonsi: only enable write confirmation on the last CP DMA packet

This should improve performance for big copies that need to be split.
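
The idea can be sketched as a loop that splits a copy into packets and sets the confirmation flag only on the final one. The packet size limit and the `packet_log` bookkeeping are hypothetical; the real hardware limit and packet layout are not given in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet limit, for illustration only. */
#define MAX_BYTES_PER_PACKET 0x1000u

/* Stand-in for the command stream: records what would be emitted. */
struct packet_log {
   unsigned packets;
   unsigned confirmed;   /* packets emitted with write confirmation */
};

static void emit_packet(struct packet_log *log, uint32_t bytes, bool confirm)
{
   assert(bytes > 0 && bytes <= MAX_BYTES_PER_PACKET);
   log->packets++;
   if (confirm)
      log->confirmed++;
}

static void copy_split(struct packet_log *log, uint64_t size)
{
   while (size > 0) {
      uint32_t bytes = size > MAX_BYTES_PER_PACKET ? MAX_BYTES_PER_PACKET
                                                   : (uint32_t)size;
      size -= bytes;
      emit_packet(log, bytes, size == 0);   /* confirm only the last one */
   }
}
```

A 0x2800-byte copy under this limit produces three packets, and only the third waits for confirmation.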

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agonv50/ir: allow emission of immediates in imul/imad ops
Ilia Mirkin [Sat, 7 Nov 2015 05:41:05 +0000 (00:41 -0500)]
nv50/ir: allow emission of immediates in imul/imad ops

Nothing actually uses this yet (due to complications), but the emission
logic is right.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: properly set the type of the constant folding result
Ilia Mirkin [Sat, 7 Nov 2015 00:28:29 +0000 (19:28 -0500)]
nv50/ir: properly set the type of the constant folding result

This removes the hack used for merge, which only covers a fraction of
the cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: add support for const-folding OP_CVT with F64 source/dest
Ilia Mirkin [Sat, 7 Nov 2015 00:13:35 +0000 (19:13 -0500)]
nv50/ir: add support for const-folding OP_CVT with F64 source/dest

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: add fp64 opcode emission support for G200 (NVA0)
Ilia Mirkin [Mon, 23 Feb 2015 00:49:49 +0000 (19:49 -0500)]
nv50/ir: add fp64 opcode emission support for G200 (NVA0)

Need to emulate rcp/rsq before providing full fp64 support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: Add support for 64bit immediates to checkSwapSrc01
Hans de Goede [Thu, 5 Nov 2015 13:32:38 +0000 (14:32 +0100)]
nv50/ir: Add support for 64bit immediates to checkSwapSrc01

Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonvc0/ir: Teach insnCanLoad about double immediates
Hans de Goede [Thu, 5 Nov 2015 13:32:37 +0000 (14:32 +0100)]
nvc0/ir: Teach insnCanLoad about double immediates

Teach insnCanLoad about double immediates. Together with the
"Add support for merge-s to the ConstantFolding pass" patch, this turns
the following (nvc0) code:
  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:
  1: add f64 $r0d $r0d 0.500000 (8)
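
To see why the two 32-bit moves amount to the constant 0.5, the halves can be merged into an IEEE-754 binary64 value. This is a standalone sketch with a hypothetical helper name; it assumes $r2 holds the low word and $r3 the high word of the $r2d register pair.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Combine low and high 32-bit halves into an IEEE-754 binary64 value. */
static double merge_f64(uint32_t lo, uint32_t hi)
{
   uint64_t bits = ((uint64_t)hi << 32) | lo;
   double value;

   assert(sizeof(value) == sizeof(bits));
   memcpy(&value, &bits, sizeof(value));   /* type-pun without aliasing UB */
   return value;
}
```

Calling `merge_f64(0x00000000, 0x3fe00000)` gives 0.5, matching the immediate in the folded instruction.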

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: Add support for merge-s to the ConstantFolding pass
Hans de Goede [Thu, 5 Nov 2015 13:32:36 +0000 (14:32 +0100)]
nv50/ir: Add support for merge-s to the ConstantFolding pass

This allows later passes like LoadPropagation to properly deal with 64
bit immediates.

If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: disallow 64-bit immediates on nv50 targets
Ilia Mirkin [Fri, 6 Nov 2015 22:58:42 +0000 (17:58 -0500)]
nv50/ir: disallow 64-bit immediates on nv50 targets

No instructions are able to load short immediates like nvc0 can.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: allow movs with TYPE_F64 destinations to be split
Ilia Mirkin [Fri, 6 Nov 2015 22:18:01 +0000 (17:18 -0500)]
nv50/ir: allow movs with TYPE_F64 destinations to be split

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agogm107/ir: Add support for double immediates
Hans de Goede [Thu, 5 Nov 2015 13:32:35 +0000 (14:32 +0100)]
gm107/ir: Add support for double immediates

Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonvc0/ir: Add support for double immediates
Hans de Goede [Thu, 5 Nov 2015 13:32:34 +0000 (14:32 +0100)]
nvc0/ir: Add support for double immediates

Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
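
A check for whether a double fits such a short immediate might look like the sketch below. It rests on an assumption not stated in this message: that the 20 encodable bits are the most significant bits of the binary64 pattern (sign, 11 exponent bits, top 8 mantissa bits), so the low 44 bits must be zero. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: only the top 20 bits of the 64-bit pattern can be encoded,
 * so a value fits only if its low 44 bits are all zero. */
static bool f64_fits_short_immediate(double value)
{
   uint64_t bits;

   assert(sizeof(bits) == sizeof(value));
   memcpy(&bits, &value, sizeof(bits));
   return (bits & 0x00000fffffffffffULL) == 0;
}
```

Under that assumption values such as 0.5, 1.0 and 1.5 fit, while 0.1 does not because its mantissa uses the low bits.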

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoi965/nir/fs: Add comment for no-op memory barrier functions
Francisco Jerez [Fri, 6 Nov 2015 21:19:56 +0000 (13:19 -0800)]
i965/nir/fs: Add comment for no-op memory barrier functions

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
9 years agoi965/nir/fs: Implement new barrier functions for compute shaders
Jordan Justen [Sat, 10 Oct 2015 20:00:04 +0000 (13:00 -0700)]
i965/nir/fs: Implement new barrier functions for compute shaders

For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:

 * nir_intrinsic_memory_barrier_atomic_counter
 * nir_intrinsic_memory_barrier_buffer
 * nir_intrinsic_memory_barrier_image

We treat these nir intrinsics as no-ops:
 * nir_intrinsic_group_memory_barrier
 * nir_intrinsic_memory_barrier_shared
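
The two groups can be summarized as a single switch. The enum below is illustrative and only mirrors the intrinsic names listed in this message; it is not the backend's actual dispatch code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative enum mirroring the listed nir intrinsics. */
enum barrier_kind {
   BARRIER_MEMORY,
   BARRIER_ATOMIC_COUNTER,
   BARRIER_BUFFER,
   BARRIER_IMAGE,
   BARRIER_GROUP_MEMORY,
   BARRIER_SHARED,
   BARRIER_KIND_COUNT
};

/* Returns true when the barrier emits the memory-barrier code sequence,
 * false when it is treated as a no-op. */
static bool barrier_emits_code(enum barrier_kind kind)
{
   assert((int)kind >= 0 && (int)kind < BARRIER_KIND_COUNT);

   switch (kind) {
   case BARRIER_MEMORY:
   case BARRIER_ATOMIC_COUNTER:
   case BARRIER_BUFFER:
   case BARRIER_IMAGE:
      return true;
   case BARRIER_GROUP_MEMORY:
   case BARRIER_SHARED:
   default:
      return false;
   }
}
```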

v3:
 * Add comment for no-op cases (curro)

v4:
 * Moving comment to a separate patch authored by curro

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agonir: Add new barrier functions for compute shaders
Jordan Justen [Sat, 10 Oct 2015 15:59:42 +0000 (08:59 -0700)]
nir: Add new barrier functions for compute shaders

When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoglsl: Add new barrier functions for compute shaders
Jordan Justen [Fri, 9 Oct 2015 21:16:05 +0000 (14:16 -0700)]
glsl: Add new barrier functions for compute shaders

When these functions are called in GLSL code, we create an intrinsic
function call:

 * groupMemoryBarrier => __intrinsic_group_memory_barrier
 * memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
 * memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
 * memoryBarrierImage => __intrinsic_memory_barrier_image
 * memoryBarrierShared => __intrinsic_memory_barrier_shared
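
The mapping above can be expressed as a small lookup table. The table entries come from the list in this message; the struct and the lookup helper are hypothetical and do not reflect how the GLSL builtin machinery is actually organized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct barrier_mapping {
   const char *glsl_name;
   const char *intrinsic_name;
};

static const struct barrier_mapping barrier_table[] = {
   { "groupMemoryBarrier",         "__intrinsic_group_memory_barrier" },
   { "memoryBarrierAtomicCounter", "__intrinsic_memory_barrier_atomic_counter" },
   { "memoryBarrierBuffer",        "__intrinsic_memory_barrier_buffer" },
   { "memoryBarrierImage",         "__intrinsic_memory_barrier_image" },
   { "memoryBarrierShared",        "__intrinsic_memory_barrier_shared" },
};

/* Returns the intrinsic name for a GLSL barrier function, or NULL. */
static const char *lookup_barrier_intrinsic(const char *glsl_name)
{
   size_t count = sizeof(barrier_table) / sizeof(barrier_table[0]);

   assert(glsl_name != NULL);
   for (size_t i = 0; i < count; i++) {
      if (strcmp(barrier_table[i].glsl_name, glsl_name) == 0)
         return barrier_table[i].intrinsic_name;
   }
   return NULL;
}
```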

v2:
 * Consolidate with memoryBarrier function/intrinsic creation (curro)

v3:
 * Instead of add_memory_barrier_function, add an intrinsic_name
   parameter to _memory_barrier (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoradeon/uvd: fix VC-1 simple/main profile decode v2
Boyuan Zhang [Wed, 23 Sep 2015 08:11:08 +0000 (10:11 +0200)]
radeon/uvd: fix VC-1 simple/main profile decode v2

We just needed to set the extra width/height fields to get this working.

v2 (chk): rebased, CC stable added, commit message added, fixed coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
9 years agost/vaapi: fix vaapi VC-1 simple/main corruption v2
Boyuan Zhang [Wed, 23 Sep 2015 08:11:07 +0000 (10:11 +0200)]
st/vaapi: fix vaapi VC-1 simple/main corruption v2

Apply the start code fix only to advanced profile.

v2 (chk): add commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
9 years agost/va: add support for RGBX and BGRX in VPP
Julien Isorce [Fri, 6 Nov 2015 09:45:22 +0000 (09:45 +0000)]
st/va: add support for RGBX and BGRX in VPP

Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
adds definitions for RGBX and BGRX.

This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agovl/buffers: add RGBX and BGRX to the supported formats
Julien Isorce [Fri, 6 Nov 2015 09:45:19 +0000 (09:45 +0000)]
vl/buffers: add RGBX and BGRX to the supported formats

Useful if one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agost/va: properly use brackets in vlVaAcquireBufferHandle's switch
Julien Isorce [Fri, 6 Nov 2015 09:45:17 +0000 (09:45 +0000)]
st/va: properly use brackets in vlVaAcquireBufferHandle's switch

In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agost/va: properly indent buffer.c, config.c, image.c and picture.c
Julien Isorce [Fri, 6 Nov 2015 09:45:11 +0000 (09:45 +0000)]
st/va: properly indent buffer.c, config.c, image.c and picture.c

Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agofreedreno/a4xx: fix blend color
Rob Clark [Tue, 27 Oct 2015 15:38:34 +0000 (11:38 -0400)]
freedreno/a4xx: fix blend color

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: update generated headers
Rob Clark [Tue, 27 Oct 2015 15:33:32 +0000 (11:33 -0400)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: add a305 support
Guillaume Charifi [Fri, 6 Nov 2015 16:17:25 +0000 (11:17 -0500)]
freedreno: add a305 support

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/ir3: Use nir_foreach_variable
Boyan Ding [Fri, 16 Oct 2015 07:15:38 +0000 (15:15 +0800)]
freedreno/ir3: Use nir_foreach_variable

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agonir: some small cleanups
Rob Clark [Wed, 21 Oct 2015 14:57:15 +0000 (10:57 -0400)]
nir: some small cleanups

The various cf nodes all get allocated w/ shader as their ralloc_parent,
so let's make this more explicit.  Plus a couple of other corrections/
clarifications.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonvc0: reintroduce BGRA4 format support
Ilia Mirkin [Fri, 6 Nov 2015 04:12:52 +0000 (23:12 -0500)]
nvc0: reintroduce BGRA4 format support

Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.

The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agomesa: report enum name in glClientActiveTexture() error string
Brian Paul [Fri, 6 Nov 2015 02:03:39 +0000 (19:03 -0700)]
mesa: report enum name in glClientActiveTexture() error string

As we do for glActiveTexture().  Trivial.

9 years agost/va: fix memory leak on error in vlVaCreateSurfaces2
Julien Isorce [Thu, 5 Nov 2015 08:24:45 +0000 (08:24 +0000)]
st/va: fix memory leak on error in vlVaCreateSurfaces2

Found by coverity: CID #1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agost/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Julien Isorce [Thu, 5 Nov 2015 08:24:44 +0000 (08:24 +0000)]
st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2

Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoi965: Fix scalar VS float[] and vec2[] output arrays.
Kenneth Graunke [Tue, 13 Oct 2015 22:30:03 +0000 (15:30 -0700)]
i965: Fix scalar VS float[] and vec2[] output arrays.

The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken).  Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.  However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4.  This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them.  Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.
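
The slot arithmetic can be illustrated with a small sketch. The toy_*
names and the simplified type model (an array of float/vecN elements) are
assumptions made for illustration only; they are not the real glsl_type
helpers.

```c
#include <assert.h>

#define ALIGN4(x) (((x) + 3u) & ~3u)

/* Hypothetical toy type: an array of float/vecN elements. */
struct toy_type {
   unsigned components; /* 1 = float, 2 = vec2, ... 4 = vec4 */
   unsigned length;     /* number of array elements */
};

/* Tightly packed size, counted in scalar components. */
static unsigned
toy_size_scalar(struct toy_type t)
{
   return t.components * t.length;
}

/* Size in vec4 slots: every array element occupies a whole slot. */
static unsigned
toy_size_vec4(struct toy_type t)
{
   assert(t.components >= 1 && t.components <= 4);
   return t.length;
}

/* Padded size, still counted in scalar components. */
static unsigned
toy_size_vec4_times_4(struct toy_type t)
{
   return 4 * toy_size_vec4(t);
}

/* Slot count as the old loop bound computed it. */
static unsigned
toy_slots_old(struct toy_type t)
{
   return ALIGN4(toy_size_scalar(t)) / 4;
}
```

For float[2] ({1, 2}) and vec2[2] ({2, 2}), toy_slots_old() yields 1 in
both cases, whereas toy_size_vec4() yields 2 and toy_size_vec4_times_4()
yields 8.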

Normally, varying packing avoids this problem by turning varyings into
vec4s.  So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
    by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4
    rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agollvmpipe: disable texture cache
Roland Scheidegger [Thu, 5 Nov 2015 17:00:40 +0000 (18:00 +0100)]
llvmpipe: disable texture cache

There are some weird problems with 8-wide vectors.

9 years agonouveau: send back a debug message when waiting for a fence to complete
Ilia Mirkin [Sat, 31 Oct 2015 00:44:57 +0000 (20:44 -0400)]
nouveau: send back a debug message when waiting for a fence to complete

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50,nvc0: provide debug messages with shader compilation stats
Ilia Mirkin [Fri, 30 Oct 2015 22:41:09 +0000 (18:41 -0400)]
nv50,nvc0: provide debug messages with shader compilation stats

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonouveau: add support for sending debug messages via KHR_debug
Ilia Mirkin [Fri, 30 Oct 2015 21:23:22 +0000 (17:23 -0400)]
nouveau: add support for sending debug messages via KHR_debug

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agost/clover: provide a path for drivers to call through to pfn_notify
Ilia Mirkin [Sat, 31 Oct 2015 03:25:59 +0000 (23:25 -0400)]
st/clover: provide a path for drivers to call through to pfn_notify

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]

9 years agost/mesa: set debug callback for debug contexts
Ilia Mirkin [Sat, 31 Oct 2015 03:28:01 +0000 (23:28 -0400)]
st/mesa: set debug callback for debug contexts

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agogallium: expose a debug message callback settable by context owner
Ilia Mirkin [Fri, 30 Oct 2015 07:17:35 +0000 (03:17 -0400)]
gallium: expose a debug message callback settable by context owner

This will allow gallium drivers to send messages to KHR_debug endpoints.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agost/mesa: account for texture views when doing CopyImageSubData
Ilia Mirkin [Thu, 5 Nov 2015 05:33:22 +0000 (00:33 -0500)]
st/mesa: account for texture views when doing CopyImageSubData

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoi965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Iago Toral Quiroga [Fri, 30 Oct 2015 10:10:02 +0000 (11:10 +0100)]
i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE

Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Iago Toral Quiroga [Fri, 30 Oct 2015 09:57:47 +0000 (10:57 +0100)]
i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE

Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 09:24:12 +0000 (10:24 +0100)]
i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 07:48:57 +0000 (08:48 +0100)]
i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 07:39:11 +0000 (08:39 +0100)]
i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/skl+: Enable support for 16x multisampling
Neil Roberts [Mon, 7 Sep 2015 17:23:14 +0000 (18:23 +0100)]
i965/skl+: Enable support for 16x multisampling

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agomesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Neil Roberts [Mon, 28 Sep 2015 17:22:32 +0000 (18:22 +0100)]
mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that, this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
    [Ian Romanick]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agometa/blit: Always try to enable GL_ARB_sample_shading
Neil Roberts [Thu, 22 Oct 2015 08:55:35 +0000 (10:55 +0200)]
meta/blit: Always try to enable GL_ARB_sample_shading

Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.
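
As a rough sketch, a shader preamble using 'enable' could look like the C
string below. The variable name and the exact preamble contents are
hypothetical; only the #extension directive and its 'enable' behaviour
are the point here.

```c
/* Hypothetical shader preamble, written as a C string constant.
 *
 * With ": enable" the GLSL compiler only warns when the extension is
 * unsupported, so the shader still compiles.  With ": require" the
 * compile would fail in that case. */
static const char *blit_preamble =
   "#version 130\n"
   "#extension GL_ARB_sample_shading : enable\n";
```

A further optional extension can then be added as one more ': enable'
line without any extra conditions around it.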

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agometa: Support 16x MSAA in the multisample scaled blit shader
Neil Roberts [Wed, 16 Sep 2015 16:43:33 +0000 (17:43 +0100)]
meta: Support 16x MSAA in the multisample scaled blit shader

v2: Fix the x_scale in the shader. Remove the doubts in the commit
    message.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agoi965/meta: Support 16x MSAA in the meta stencil blit
Neil Roberts [Fri, 11 Sep 2015 17:09:46 +0000 (18:09 +0100)]
i965/meta: Support 16x MSAA in the meta stencil blit

The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
Neil Roberts [Wed, 9 Sep 2015 16:44:17 +0000 (17:44 +0100)]
i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA

In order to accommodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
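
The field width follows from simple arithmetic, sketched below under the
assumption that each pair index value names two consecutive samples. The
helper name is hypothetical.

```c
/* Number of samples that can be addressed when the starting sample pair
 * index is pair_index_bits wide and each pair covers two samples. */
static unsigned
samples_addressable(unsigned pair_index_bits)
{
   return 2u << pair_index_bits;
}
```

Two bits cover 4 pairs (8 samples), while three bits cover 8 pairs
(16 samples).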

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agoi965: Support allocating the MCS buffer for 16x MSAA
Neil Roberts [Wed, 9 Sep 2015 13:38:08 +0000 (14:38 +0100)]
i965: Support allocating the MCS buffer for 16x MSAA

When 16 samples are used the MCS buffer needs 64 bits per pixel.
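
One way to arrive at that figure, assuming the MCS stores for every
sample an index wide enough to select any of the samples, is sketched
below. The helper name is hypothetical, and the result is the raw bit
count before any padding to a storage format.

```c
/* Raw MCS bits per pixel under the assumption stated above:
 * num_samples entries of ceil(log2(num_samples)) bits each. */
static unsigned
mcs_bits_per_pixel(unsigned num_samples)
{
   unsigned index_bits = 0;

   /* Smallest width that can hold the values 0 .. num_samples - 1. */
   while ((1u << index_bits) < num_samples)
      index_bits++;

   return num_samples * index_bits;
}
```

For 16 samples this gives 16 entries of 4 bits each, i.e. 64 bits.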

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>