Ben Widawsky [Sat, 7 Nov 2015 02:12:27 +0000 (18:12 -0800)]
i965/skl/gt4: Fix URB programming restriction.
The comment in the code details the restriction. Thanks to Ken for having a very
helpful conversation with me, and spotting the blurb in the link I sent him :P.
There are still stability problems for me on GT4, but this definitely helps with
some of the failures.
v2: Comment fixes
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia Mirkin [Mon, 9 Nov 2015 17:39:05 +0000 (12:39 -0500)]
nv50,nvc0: add ARB_clear_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 5 Mar 2014 02:51:55 +0000 (21:51 -0500)]
st/mesa: implement ARB_clear_texture
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ilia Mirkin [Mon, 9 Nov 2015 18:27:07 +0000 (13:27 -0500)]
gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 27 Oct 2015 20:42:49 +0000 (07:42 +1100)]
glsl: add helper to check for enhanced layouts support
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Timothy Arceri [Sun, 4 Oct 2015 13:01:45 +0000 (00:01 +1100)]
mesa: add ARB_enhanced_layouts
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Dave Airlie [Wed, 11 Nov 2015 22:34:18 +0000 (08:34 +1000)]
r600: initialised PGM_RESOURCES_2 for ES/GS
This fixes the corruption on rendering that we are seeing in
certain geometry shaders.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91780
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested / Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Thu, 5 Nov 2015 07:05:07 +0000 (23:05 -0800)]
i965: Split nir_emit_intrinsic by stage with a general fallback.
Many intrinsics only apply to a particular stage (such as discard).
In other cases, we may want to interpret them differently based on
the stage (such as load_primitive_id or load_input).
The current method isn't that pretty - we handle all intrinsics in
one giant function. Sometimes we assert on stage, sometimes we forget.
Different behaviors are handled via if-ladders based on stage.
This commit introduces new nir_emit_<stage>_intrinsic() functions,
and makes nir_emit_instr() call those. In turn, those fall back to
the generic nir_emit_intrinsic() function for cases they don't want
to handle specially.
This makes it clear which intrinsics only exist in one stage, and makes
it easy to handle inputs/outputs differently for various stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Ilia Mirkin [Sun, 8 Nov 2015 09:46:38 +0000 (04:46 -0500)]
mesa/copyimage: allow width/height to not be multiples of block
For compressed textures, the image size is not necessarily a multiple of
the block size (e.g. the last mip levels). Section 18.3.2 (Copying
Between Images) of the OpenGL 4.5 Core Profile spec says:
An INVALID_VALUE error is generated if the dimensions of either
subregion exceeds the boundaries of the corresponding image
object, or if the image format is compressed and the dimensions of
the subregion fail to meet the alignment constraints of the
format.
and Section 8.7 (Compressed Texture Images) says:
An INVALID_OPERATION error is generated if any of the following
conditions occurs:
* width is not a multiple of four, and width + xoffset is not
equal to the value of TEXTURE_WIDTH.
* height is not a multiple of four, and height + yoffset is not
equal to the value of TEXTURE_HEIGHT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 20 Aug 2015 05:15:33 +0000 (22:15 -0700)]
i965/brw_reg: Add a brw_VxH_indirect helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian Paul [Wed, 11 Nov 2015 00:03:37 +0000 (17:03 -0700)]
mesa: remove old comments in arrayobj.c
Brian Paul [Tue, 10 Nov 2015 00:25:22 +0000 (17:25 -0700)]
st/wgl: clarify code in stw_framebuffer_from_hwnd_locked()
Just a minor code change to make it obvious that NULL is returned when
we don't find the given HWND.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Tue, 10 Nov 2015 00:35:55 +0000 (17:35 -0700)]
st/wgl: improve some function comments
In particular, explain when stw_framebuffer objects are
locked/unlocked/etc.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Tue, 10 Nov 2015 00:19:35 +0000 (17:19 -0700)]
st/wgl: whitespace/formatting fixes
Brian Paul [Mon, 9 Nov 2015 21:51:56 +0000 (14:51 -0700)]
st/wgl: fix locking issue in stw_st_framebuffer_present_locked()
When stw_st_framebuffer_present_locked() is called, the
stw_framebuffer's mutex will already be locked. Normally, the
stw_framebuffer_present_locked() function calls
stw_framebuffer_release() to unlock the mutex when it's done. But if
for some reason the 'resource' pointer in
stw_st_framebuffer_present_locked() is null, we'd return without
unlocking the stw_framebuffer. This fixes that to avoid potential
deadlocks.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Kenneth Graunke [Tue, 10 Nov 2015 07:55:58 +0000 (23:55 -0800)]
i965: Print force_writemask_all in dump_instructions().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Tue, 25 Nov 2014 10:59:28 +0000 (02:59 -0800)]
i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits.
A while back, we moved to directly emitting the Gen7+ state when
constructing the binding tables. These flags are only used on
Gen4-6, which emit all the binding table pointers at once.
We gain nothing by having separate flags, so combine them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Sat, 25 Jul 2015 04:15:35 +0000 (21:15 -0700)]
i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n.
Inspired by a patch by Fabian Bieler.
Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid
topology type); I instead chose to make a macro that takes an argument.
He also took the number of patch vertices from _mesa_prim (which was set
to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to
avoid the need for the VBO patch.
v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match
the documentation (suggested by Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emil Velikov [Wed, 11 Nov 2015 11:18:27 +0000 (11:18 +0000)]
docs: add news item and link release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 11 Nov 2015 11:10:30 +0000 (11:10 +0000)]
docs: add sha256 checksums for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
66c949d0a19b1e601243be22b6506528b866388b)
Emil Velikov [Wed, 11 Nov 2015 10:05:57 +0000 (10:05 +0000)]
docs: add release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
ee57c22141c42d9b511a7dfa5971c4428cd1c6e7)
Glenn Kennard [Sat, 17 Oct 2015 14:53:28 +0000 (16:53 +0200)]
r600g: Pass conservative depth parameters to hw
Supported on R700 and up.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 10 Nov 2015 23:05:50 +0000 (09:05 +1000)]
Revert "r600g: Pass conservative depth parameters to hw"
This reverts commit
a1fc78911e9a6439db94d6ae91d5672c76e5fb1c.
I pushed the wrong patch.
Glenn Kennard [Thu, 15 Oct 2015 23:53:47 +0000 (01:53 +0200)]
r600g: Implement ARB_texture_view
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Glenn Kennard [Fri, 16 Oct 2015 22:52:39 +0000 (00:52 +0200)]
r600g: Pass conservative depth parameters to hw
Supported on R700 and up.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eduardo Lima Mitev [Thu, 22 Oct 2015 13:32:13 +0000 (15:32 +0200)]
i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
the constants will be progagated as immediate operands of fmul and fadd.
This patch detects these situations and prevents fusing fmul+fadd into ffma.
Shader-db results on i965 Haswell:
total instructions in shared programs:
6235835 ->
6225895 (-0.16%)
instructions in affected programs:
1124094 ->
1114154 (-0.88%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 7612
HURT: 843
GAINED: 4
LOST: 0
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eduardo Lima Mitev [Fri, 23 Oct 2015 14:31:41 +0000 (16:31 +0200)]
util: Add list_is_singular() helper function
Returns whether the list has exactly one element.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eduardo Lima Mitev [Thu, 22 Oct 2015 13:25:23 +0000 (15:25 +0200)]
nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver
Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.
This is safe because i965 is the only consumer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:58:54 +0000 (14:58 -0800)]
glsl: Use array deref for access to vector components
We've assumed that we could lower per-component vector access from
vec[i] = scalar
to
vec = ir_triop_vector_insert(vec, scalar, i)
but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.
Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.
The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:55:32 +0000 (14:55 -0800)]
glsl: Lower UBO and SSBO access in glsl linker
All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Kristian Høgsberg Kristensen [Wed, 4 Nov 2015 22:50:51 +0000 (14:50 -0800)]
glsl: Drop exec_list argument to lower_ubo_reference
We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just a exec_list or a
shader, so this seems more consistent.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Connor Abbott [Sat, 31 Oct 2015 20:31:59 +0000 (16:31 -0400)]
nir/glsl: switch to using the builder
v2: use nir_bulder_cf_insert (Ken)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Sat, 31 Oct 2015 03:56:49 +0000 (23:56 -0400)]
nir/glsl: make emit() take nir_ssa_def * sources
Again, this matches what the builder will have to do.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Sat, 31 Oct 2015 03:47:46 +0000 (23:47 -0400)]
nir/glsl: convert nir_visitor::result to a nir_ssa_def *
Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Sat, 31 Oct 2015 03:32:50 +0000 (23:32 -0400)]
nir/glsl: make evaluate_rvalue() return a nir_ssa_def *
A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jose Fonseca [Mon, 9 Nov 2015 22:25:27 +0000 (22:25 +0000)]
st/mesa: Destroy buffer object's mutex.
Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resided in a single
place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Fri, 9 Oct 2015 22:49:49 +0000 (15:49 -0700)]
nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.
These tessellation shader related fields need plumbing through NIR.
v2: Use uint32_t instead of uint64_t to match the source type of
GLbitfield (caught by Iago Toral).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Eric Anholt [Mon, 9 Nov 2015 17:12:20 +0000 (09:12 -0800)]
vc4: Avoid loading undefined (newly-allocated) FBO contents.
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear. For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
Eric Anholt [Mon, 9 Nov 2015 16:56:01 +0000 (08:56 -0800)]
vc4: Return NULL when we can't make our shadow for a sampler view.
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Fri, 6 Nov 2015 19:07:25 +0000 (11:07 -0800)]
vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Tue, 27 Oct 2015 23:14:05 +0000 (16:14 -0700)]
vc4: Add CL dumping for GL_ARRAY_PRIMITIVE.
Eric Anholt [Mon, 9 Nov 2015 16:51:47 +0000 (08:51 -0800)]
vc4: Fix a compiler warning.
Jordan Justen [Tue, 28 Jul 2015 22:00:47 +0000 (15:00 -0700)]
glsl: Use shared storage variable type for shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Jordan Justen [Tue, 28 Jul 2015 21:56:49 +0000 (14:56 -0700)]
glsl: Add shared variable type
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.
These variables are similar to OpenCL's local/__local variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Jordan Justen [Mon, 9 Nov 2015 03:07:43 +0000 (19:07 -0800)]
glsl: Add space to shader_storage in print_visitor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Jordan Justen [Tue, 28 Jul 2015 21:56:49 +0000 (14:56 -0700)]
glsl: Align comments on variables types
v2:
* Split from patch to add ir_var_shader_shared (tarceri)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Jordan Justen [Sun, 15 Mar 2015 20:53:06 +0000 (13:53 -0700)]
glsl: Parse shared keyword for compute shader variables
v2:
* Move shared parsing under storage qualifiers (tarceri)
* Fail to compile if shared is used in non-compute shader (tarceri)
* Use separate shared_storage bit for shared variables (tarceri)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Timothy Arceri [Fri, 6 Nov 2015 23:53:53 +0000 (10:53 +1100)]
glsl: simplify interface block stream qualifier validation
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Ilia Mirkin [Mon, 9 Nov 2015 12:13:29 +0000 (07:13 -0500)]
docs: note that ARB_copy_image was added to nv50, nvc0 in this release
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Mon, 6 Jul 2015 20:53:06 +0000 (14:53 -0600)]
st/wgl: add null pointer check for HUD texture
Fixes crash when using HUD with Nobel Clinician Viewer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Tue, 16 Jun 2015 01:14:42 +0000 (19:14 -0600)]
st/wgl: fix double-present on swapbuffers bug
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers. First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given. Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.
Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.
Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 1 Jun 2015 14:45:07 +0000 (08:45 -0600)]
st/wgl: reorder pixel formats to put MSAA formats last
And put 8-bit/channel formats before 5/6/5 formats.
The ChoosePixelFormat() function seems to be finicky about format
selection. Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format. Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.
VMware bug
1455030
Reviewed-by: José Fonseca <jfonseca@vmware.com>
José Fonseca [Thu, 22 Mar 2012 12:16:17 +0000 (12:16 +0000)]
st/wgl: Don't rely on GDI to bookkeep pixelformat for us.
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.
See also https://github.com/apitrace/apitrace/commit/
e4a4f15f5b92e0abbd24d7d053da25f8278c9f64
Michel Dänzer [Thu, 21 Aug 2014 09:30:44 +0000 (18:30 +0900)]
winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3
Fixes GPUVM conflicts with non-4K page size.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738
v2: Replace sanitization of VM base address alignment with comment why
that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
(Marek)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Christian König [Fri, 6 Nov 2015 20:15:56 +0000 (15:15 -0500)]
radeon/uvd: add H.265/HEVC to legal notes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 4 Nov 2015 21:38:28 +0000 (16:38 -0500)]
st/omx: add headless support
This will allow dec/enc/transcode without X
v2: use env override even with X,
use loader_open_device instead of open
v3: clean up
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Leo Liu [Thu, 5 Nov 2015 16:56:37 +0000 (11:56 -0500)]
st/va: use vl screen drm support from vl_wys_drm
v2: move the dup to vl_wys_drm for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Leo Liu [Wed, 4 Nov 2015 21:24:26 +0000 (16:24 -0500)]
vl: add drm support for vl_screen
This will allow the state trackers to use render nodes
with screen creation
v2: dup fd for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Leo Liu [Thu, 5 Nov 2015 16:22:22 +0000 (11:22 -0500)]
st/va: fix build fails with pipe loader
There is no dev in drv, and dev should be from vl_screen here
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Samuel Pitoiset [Thu, 5 Nov 2015 23:33:48 +0000 (00:33 +0100)]
nvc0: enable compute support on Fermi
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 23:48:55 +0000 (18:48 -0500)]
nv50/ir: fix emission of s[] args in certain situations
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 23:47:40 +0000 (18:47 -0500)]
nv50/ir: only take abs value when computing high result
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 05:44:10 +0000 (00:44 -0500)]
nouveau: avoid queueing too much work onto a single fence
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Dave Airlie [Sat, 7 Nov 2015 21:55:17 +0000 (07:55 +1000)]
llvmpipe: disable front updates for now
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sat, 31 Oct 2015 06:19:43 +0000 (16:19 +1000)]
virgl: wrap ret assignment with braces to do correct thing
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Sat, 7 Nov 2015 20:01:50 +0000 (12:01 -0800)]
nir: Add a nir_deref_tail helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 1 May 2015 18:26:40 +0000 (11:26 -0700)]
nir/types: Add an is_vector_or_scalar helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 6 Nov 2015 00:37:47 +0000 (16:37 -0800)]
i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In
particular, the PLN instruction reads 2 logical registers in one of the
components. This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 22 Oct 2015 23:53:27 +0000 (16:53 -0700)]
nir/validate: Add better validation of load/store types
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 3 Nov 2015 11:20:18 +0000 (12:20 +0100)]
radeonsi: add register definitions for Stoney
There are a few non-stoney changes too.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Sun, 1 Nov 2015 12:43:26 +0000 (13:43 +0100)]
radeonsi: add workarounds for CP DMA to stay on the fast path
v2: set emit_scratch_reloc, add a NULL check
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:33:42 +0000 (01:33 +0100)]
radeonsi: unify CP DMA preparation logic
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:21:01 +0000 (01:21 +0100)]
radeonsi: unify CP DMA code determining various flags
v2: don't call get_flush_flags twice per function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Oct 2015 00:03:42 +0000 (01:03 +0100)]
radeonsi: only enable write confirmation on the last CP DMA packet
This should improve performance for big copies that need to be split.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ilia Mirkin [Sat, 7 Nov 2015 05:41:05 +0000 (00:41 -0500)]
nv50/ir: allow emission of immediates in imul/imad ops
Nothing actually uses this yet (due to complications), but the emission
logic is right.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 00:28:29 +0000 (19:28 -0500)]
nv50/ir: properly set the type of the constant folding result
This removes the hack used for merge, which only covers a fraction of
the cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 7 Nov 2015 00:13:35 +0000 (19:13 -0500)]
nv50/ir: add support for const-folding OP_CVT with F64 source/dest
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 23 Feb 2015 00:49:49 +0000 (19:49 -0500)]
nv50/ir: add fp64 opcode emission support for G200 (NVA0)
Need to emulate rcp/rsq before providing full fp64 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:38 +0000 (14:32 +0100)]
nv50/ir: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:37 +0000 (14:32 +0100)]
nvc0/ir: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:36 +0000 (14:32 +0100)]
nv50/ir: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.
If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 22:58:42 +0000 (17:58 -0500)]
nv50/ir: disallow 64-bit immediates on nv50 targets
No instructions are able to load short immediates like nvc0 can.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 6 Nov 2015 22:18:01 +0000 (17:18 -0500)]
nv50/ir: allow movs with TYPE_F64 destinations to be split
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:35 +0000 (14:32 +0100)]
gm107/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hans de Goede [Thu, 5 Nov 2015 13:32:34 +0000 (14:32 +0100)]
nvc0/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Francisco Jerez [Fri, 6 Nov 2015 21:19:56 +0000 (13:19 -0800)]
i965/nir/fs: Add comment for no-op memory barrier functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jordan Justen [Sat, 10 Oct 2015 20:00:04 +0000 (13:00 -0700)]
i965/nir/fs: Implement new barrier functions for compute shaders
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:
* nir_intrinsic_memory_barrier_atomic_counter
* nir_intrinsic_memory_barrier_buffer
* nir_intrinsic_memory_barrier_image
We treat these nir intrinsics as no-ops:
* nir_intrinsic_group_memory_barrier
* nir_intrinsic_memory_barrier_shared
v3:
* Add comment for no-op cases (curro)
v4:
* Moving comment to a separate patch authored by curro
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jordan Justen [Sat, 10 Oct 2015 15:59:42 +0000 (08:59 -0700)]
nir: Add new barrier functions for compute shaders
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jordan Justen [Fri, 9 Oct 2015 21:16:05 +0000 (14:16 -0700)]
glsl: Add new barrier functions for compute shaders
When these functions are called in GLSL code, we create an intrinsic
function call:
* groupMemoryBarrier => __intrinsic_group_memory_barrier
* memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
* memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
* memoryBarrierImage => __intrinsic_memory_barrier_image
* memoryBarrierShared => __intrinsic_memory_barrier_shared
v2:
* Consolidate with memoryBarrier function/intrinsic creation (curro)
v3:
* Instead of add_memory_barrier_function, add an intrinsic_name
parameter to _memory_barrier (curro)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Boyuan Zhang [Wed, 23 Sep 2015 08:11:08 +0000 (10:11 +0200)]
radeon/uvd: fix VC-1 simple/main profile decode v2
We just needed to set the extra width/height fields to get this working.
v2 (chk): rebased, CC stable added, commit message added, fixed coding style
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Boyuan Zhang [Wed, 23 Sep 2015 08:11:07 +0000 (10:11 +0200)]
st/vaapi: fix vaapi VC-1 simple/main corruption v2
Apply the start code fix only to advanced profile.
v2 (chk): add commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Julien Isorce [Fri, 6 Nov 2015 09:45:22 +0000 (09:45 +0000)]
st/va: add support for RGBX and BGRX in VPP
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.
This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:19 +0000 (09:45 +0000)]
vl/buffers: add RGBX and BGRX to the supported formats
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:17 +0000 (09:45 +0000)]
st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Julien Isorce [Fri, 6 Nov 2015 09:45:11 +0000 (09:45 +0000)]
st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Rob Clark [Tue, 27 Oct 2015 15:38:34 +0000 (11:38 -0400)]
freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 27 Oct 2015 15:33:32 +0000 (11:33 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Guillaume Charifi [Fri, 6 Nov 2015 16:17:25 +0000 (11:17 -0500)]
freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Boyan Ding [Fri, 16 Oct 2015 07:15:38 +0000 (15:15 +0800)]
freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 21 Oct 2015 14:57:15 +0000 (10:57 -0400)]
nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so lets make this more explicit. Plus couple other corrections/
clarifications.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>