Gregory Hainaut [Sat, 29 Jun 2013 00:18:35 +0000 (17:18 -0700)]
mesa/sso: Add gl_pipeline_object parameter to _mesa_use_shader_program
Extend use_shader_program to support a different target. Allow to reuse the
function to update the pipeline state. Note I bypass the flush when target
isn't current. Maybe it would be better to create a new UseProgramStages
driver function
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gregory Hainaut [Fri, 3 May 2013 17:44:11 +0000 (19:44 +0200)]
meta/sso: Update meta to save and restore SSO state.
save and restore _Shader/Pipeline binding point. Rational we don't want any
conflict when the program will be unattached.
V2: formatting improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gregory Hainaut [Fri, 3 May 2013 17:44:10 +0000 (19:44 +0200)]
mesa/sso: rename Shader to the pointer _Shader
Basically a sed but shaderapi.c and get.c.
get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior
shaderapi.c => the old api stil update the Shader object directly
V2: formatting improvement
V3 (idr):
* Rebase fixes after a block of code was moved from ir_to_mesa.cpp to
shaderapi.c.
* Trivial reformatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gregory Hainaut [Fri, 28 Jun 2013 21:33:54 +0000 (14:33 -0700)]
mesa/sso: replace Shader binding point with _Shader
To avoid NULL pointer check a default pipeline object is installed in
_Shader when no program is current
The spec say that UseProgram/UseShaderProgramEXT/ActiveProgramEXT got an
higher priority over the pipeline object. When default program is
uninstall, the pipeline is used if any was bound.
Note: A careful rename need to be done now...
V2: formating improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
* Trivial reformatting.
* Updated comment of gl_context::_Shader
v4 (idr): Reformat spec quotations to look like spec quotations. Update
comments describing what gl_context::_Shader can point to. Bot
suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
José Fonseca [Fri, 14 Mar 2014 17:01:05 +0000 (17:01 +0000)]
llvmpipe: Simplify vertex and geometry shaders.
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
José Fonseca [Fri, 14 Mar 2014 16:57:34 +0000 (16:57 +0000)]
draw: Duplicate TGSI tokens in draw_pipe_pstipple module.
As done in draw_pipe_aaline and draw_pipe_aapoint modules.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Alexander von Gluck IV [Wed, 19 Mar 2014 00:58:01 +0000 (00:58 +0000)]
haiku: Fix build through scons corrections and viewport fixes
* Add HAVE_PTHREAD, we do have pthread support wrappers now for
non-native Haiku threaded applications.
* Viewport changed behavior recently breaking the build.
We fix this by looking at the gl_context ViewportArray
(Thanks Brian for the idea)
Acked-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Fri, 21 Mar 2014 10:47:16 +0000 (03:47 -0700)]
i965: For color clears, only disable writes to components that exist.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
v2: Use _mesa_format_has_color_component() to properly handle ALPHA
formats (and generally be less fragile).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Kenneth Graunke [Fri, 21 Mar 2014 22:58:02 +0000 (15:58 -0700)]
mesa: Skip clearing color buffers when color writes are disabled.
WebGL Aquarium in Chrome 24 actually hits this.
v2: Move to core Mesa (wisely suggested by Ian); only consider
components which actually exist.
v3: Use _mesa_format_has_color_component to determine whether components
actually exist, fixing alpha format handling.
v4: Add a comment, as requested by Brian. No actual code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Kenneth Graunke [Mon, 24 Mar 2014 08:16:57 +0000 (01:16 -0700)]
mesa: Introduce a _mesa_format_has_color_component() helper.
When considering color write masks, we often want to know whether an
RGBA component actually contains any meaningful data. This function
provides an easy way to answer that question, and handles luminance,
intensity, and alpha formats correctly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Eric Anholt [Mon, 24 Mar 2014 18:16:38 +0000 (11:16 -0700)]
i965: Fix compiler warning about signed/unsigned.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Feb 2014 19:03:49 +0000 (11:03 -0800)]
i965/gen8: Change the winsys MSAA blits from blorp to meta.
This gets us equivalent code paths on BDW and pre-BDW, except for stencil
(where we don't have MSAA stencil resolve code yet)
Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces
DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).
v2: Move the new meta code to brw_meta_updownsample.c, name it
brw_meta_updownsample(), add a comment about
intel_rb_storage_first_mt_slice(), and rename that function and move
the RB generation into it (review ideas by Ken).
v3: Fix 2 src vs dst pasteos in previous change.
v4: Skip this path pre-gen8 for now, until we can analyze the glxgears
performance delta some more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 3 Mar 2014 17:16:48 +0000 (09:16 -0800)]
mesa: Stop skipping the FinishRenderTexture calls for winsys FBOs.
Now that BindRenderbufferTexImage() is a thing that drivers can do, winsys
FBOs *can* have NeedsFinishRenderTexture set.
v2: Keep the short-circuit for non-BindRenderbufferTexImage() drivers
(review by Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 21 Mar 2014 23:20:13 +0000 (16:20 -0700)]
i965: Skip reallocating the private MSAA miptree, unless it's resized.
Even if the singlesample_mt got reopened from DRI due to
pageflipping/buffer swapping, our private miptree shouldn't need any
changes.
Improves performance of a little swapbuffers-loving microbenchmark with
MSAA forced on, by 1.2371% +/- 0.624802% (n=102)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 21 Mar 2014 22:25:49 +0000 (15:25 -0700)]
i965: Simplify the no-reopening-the-winsys-buffer tests.
The formatting was weird, and the tests were duplicated, and it is
guaranteed that mt->region exists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 21 Mar 2014 22:36:24 +0000 (15:36 -0700)]
i965: Don't forget to free the old singlesample_mt.
Fixes a memory leak with MSAA winsys buffers since my move of
singlesample_mt to the rb in
4e0924c5de5f3964e4ca81f923d877dbb59fad0a
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 21 Mar 2014 23:36:22 +0000 (16:36 -0700)]
i965: Add an env var for forcing window system MSAA.
Sometimes it would be nice to benchmark some app with MSAA versus not, but
it doesn't offer the controls you want. Just provide a handy knob to
force the issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 19 Mar 2014 02:19:02 +0000 (19:19 -0700)]
i965/vec4: Eliminate dead writes to the flag register.
For each write, search previous instructions for unread writes to the
flag register and remove them. Note that this will not eliminate the
last unread write.
total instructions in shared programs: 788074 -> 788004 (-0.01%)
instructions in affected programs: 4930 -> 4860 (-1.42%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 11 Mar 2014 20:16:37 +0000 (13:16 -0700)]
i965/vec4: Eliminate writes that are never read.
With an awful O(n^2) algorithm that searches previous instructions for
dead writes.
total instructions in shared programs: 805582 -> 788074 (-2.17%)
instructions in affected programs: 144561 -> 127053 (-12.11%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Wed, 12 Mar 2014 21:23:40 +0000 (14:23 -0700)]
i965/vec4: Factor code out of DCE into a separate function.
Will be reused in the next commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 11 Mar 2014 20:14:08 +0000 (13:14 -0700)]
i965/vec4: Let dead code eliminate trim dead channels.
That is, modify
mad dst, a, b, c
to be
mad dst.xyz, a, b, c
if dst.w is never read.
total instructions in shared programs: 811869 -> 805582 (-0.77%)
instructions in affected programs: 168287 -> 162000 (-3.74%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 11 Mar 2014 20:06:20 +0000 (13:06 -0700)]
i965/vec4: Track live ranges per-channel, not per vgrf.
Will be squashed with the next patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 11 Mar 2014 20:07:42 +0000 (13:07 -0700)]
i965/vec4: Don't dead code eliminate instructions writing the flag.
A future patch adds support for removing dead writes to the flag
register. This patch simplifies the logic until then.
total instructions in shared programs: 811813 -> 811869 (0.01%)
instructions in affected programs: 3378 -> 3434 (1.66%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 11 Mar 2014 20:04:26 +0000 (13:04 -0700)]
i965/vec4: Preparatory clean up of dead_code_eliminate().
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 13 Mar 2014 18:21:36 +0000 (11:21 -0700)]
i965/vec4: Add is_null() method to dst_reg.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 13 Mar 2014 18:22:08 +0000 (11:22 -0700)]
i965/vec4: Print the predicate in dump_instructions().
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Wed, 12 Mar 2014 07:16:24 +0000 (00:16 -0700)]
i965/vec4: Rename depends_on_flags() to reads_flag().
To be consistent with the fs backend.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Wed, 12 Mar 2014 07:14:07 +0000 (00:14 -0700)]
i965/vec4: Add and use vec4_instruction::writes_flag().
To be consistent with the fs backend. Also the instruction scheduler
incorrectly considered SEL with a conditional modifier to read the flag
register.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Wed, 12 Mar 2014 07:11:27 +0000 (00:11 -0700)]
i965/vec4: Add missing doxygen close brace.
Reviewed-by: Eric Anholt <eric@anholt.net>
Chris Forbes [Sun, 23 Mar 2014 09:41:28 +0000 (22:41 +1300)]
mesa: Generate FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT earlier
The ARB_framebuffer_object spec lists this case before the
FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER and
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases.
Fixes two broken cases in piglit's fbo-incomplete test, if
ARB_ES2_compatibility is not advertised. (If it is, this is masked
because the FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER /
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases are removed by that extension)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Chris Forbes [Sun, 23 Mar 2014 09:07:02 +0000 (22:07 +1300)]
mesa: Fix format matching checks for GL_INTENSITY* internalformats.
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Christian König [Sat, 22 Mar 2014 20:30:07 +0000 (21:30 +0100)]
st/mesa: recreate sampler view on context change v3
With shared glx contexts it is possible that a texture is create and used
in one context and then used in another one resulting in incorrect
sampler view usage.
v2: avoid template copy
v3: add XXX comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Fri, 21 Mar 2014 11:47:32 +0000 (04:47 -0700)]
i965: Report the type of color clear in INTEL_DEBUG=blorp.
It's useful to know whether a clear is fast (MCS-based), using the
SIMD16 repdata message, or slow.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Marek Olšák [Sat, 22 Mar 2014 16:25:26 +0000 (17:25 +0100)]
radeonsi: disable fast color clear for 1D-tiled surfaces on CIK
This will be re-enabled once my kernel fix lands.
Kenneth Graunke [Sat, 22 Mar 2014 00:02:41 +0000 (17:02 -0700)]
Revert "i965: For color clears, only disable writes to components that exist."
This reverts commit
2919c3fdb40cf457f2e47f378a46f4cefa9e9f6d.
For formats like BGRX, looping through 0..num_components works fine.
But for formats like XRGB, we'd check the color mask for X and fail to
check it for B.
Kenneth Graunke [Fri, 21 Mar 2014 10:47:16 +0000 (03:47 -0700)]
i965: For color clears, only disable writes to components that exist.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Kenneth Graunke [Fri, 21 Mar 2014 09:33:09 +0000 (02:33 -0700)]
i965: Print number of multisamples in INTEL_DEBUG=blorp output.
This lets us distinguish MSAA resolves from other ordinary blits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Thu, 20 Mar 2014 21:41:43 +0000 (14:41 -0700)]
i965: Drop BLT TexSubImage Y-tiling restriction on Gen6+.
Currently, we don't use this path on Sandybridge because we suspect
other paths will be faster. But we potentially could. If we do, we
should allow it to support Y-tiled BLTs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Chris Forbes [Fri, 21 Mar 2014 10:00:58 +0000 (23:00 +1300)]
i965: Enable ARB_vertex_type_10f_11f_11f_rev for Gen4/5 also.
Tested on ILK and CTG (with the GL3isms taken out of the piglits).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tom Stellard [Fri, 21 Mar 2014 18:08:26 +0000 (11:08 -0700)]
clover: Fix typo in validate_object()
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Roland Scheidegger [Thu, 20 Mar 2014 15:43:36 +0000 (16:43 +0100)]
llvmpipe: add support for b5g6r5_srgb
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Thu, 20 Mar 2014 15:27:57 +0000 (16:27 +0100)]
gallium: add b5g6r5 srgb format
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Ilia Mirkin [Thu, 20 Mar 2014 21:37:00 +0000 (17:37 -0400)]
nvc0/ir: move sample id to second source arg to fix sampler2DMS
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.
This fixes all the sampler2DMS/Array tests on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Marek Olšák [Mon, 10 Mar 2014 15:27:21 +0000 (16:27 +0100)]
st/mesa: drop the lowering of quad strips to triangle strips
This fallback to triangle strips is silly and should be done in drivers
if they need it.
This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.
This fixes piglit:
NV_primitive_restart/primitive-restart-draw-mode-quad_strip
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Wed, 12 Mar 2014 01:18:06 +0000 (02:18 +0100)]
gallium/u_gen_mipmap: remove the software fallback
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Tue, 11 Mar 2014 13:52:39 +0000 (14:52 +0100)]
st/mesa: fix generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Wed, 12 Mar 2014 01:04:03 +0000 (02:04 +0100)]
mesa: fix software fallback for generating mipmaps for 3D textures
It didn't use the driver-provided src/dstRowStride at all.
This was broken for the cases when stride != width*bpp.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Tue, 11 Mar 2014 14:04:33 +0000 (15:04 +0100)]
mesa: fix software fallback for generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Tue, 11 Mar 2014 14:04:00 +0000 (15:04 +0100)]
mesa: allow generating mipmaps for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Tue, 11 Mar 2014 14:02:39 +0000 (15:02 +0100)]
mesa: fix texture border handling for cube arrays
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Mon, 17 Mar 2014 00:19:51 +0000 (01:19 +0100)]
r600g: use more appropriate names for async DMA functions
*_dma_copy calls either *_dma_copy_buffer or *_dma_copy_tile.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Mon, 17 Mar 2014 00:18:43 +0000 (01:18 +0100)]
r600g: deobfuscate async DMA code
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 16 Mar 2014 18:59:50 +0000 (19:59 +0100)]
r600g: don't flush the gfx IB explicitly before doing DMA
It's flushed by calling r600_context_bo_reloc.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 16 Mar 2014 16:17:58 +0000 (17:17 +0100)]
winsys/radeon: only add duplicate relocations for DMA if VM isn't supported
Also rewrite the comment for it to be readable and reorder the code.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Niels Ole Salscheider [Mon, 17 Mar 2014 17:48:06 +0000 (18:48 +0100)]
radeonsi: Implement DMA blit
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.
I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
v2: Marek> removed gfx.flush in si_dma_copy_tile
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Niels Ole Salscheider [Mon, 17 Mar 2014 17:48:05 +0000 (18:48 +0100)]
radeon: Move r600_need_dma_space to common code
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Richard Sandiford [Wed, 19 Mar 2014 17:12:38 +0000 (17:12 +0000)]
llvmpipe: Tighten check for alpha-only formats
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Jonathan Gray [Thu, 20 Mar 2014 03:43:29 +0000 (14:43 +1100)]
nouveau: don't assume libdrm include prefix
drm headers may be installed in a different directory
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Jonathan Gray [Thu, 20 Mar 2014 03:45:04 +0000 (14:45 +1100)]
nouveau: use DLOPEN_LIBS instead of -ldl
libdl does not exist on many platforms which have dlopen in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Tue, 18 Mar 2014 23:25:07 +0000 (17:25 -0600)]
c11/threads: don't include assert.h if the assert macro is already defined
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version. So the assert you get depends on
whether threads.h or u_debug.h was included last.
In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows). In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper. So Adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ilia Mirkin [Mon, 17 Mar 2014 14:37:12 +0000 (10:37 -0400)]
nouveau: there may not have been a texture if the fbo was incomplete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Thu, 13 Mar 2014 14:36:25 +0000 (10:36 -0400)]
nouveau: add forgotten GL_COMPRESSED_INTENSITY to texture format list
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Thu, 13 Mar 2014 11:01:15 +0000 (07:01 -0400)]
mesa/main: condition GL_DEPTH_STENCIL on ARB_depth_texture
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.
The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
Ilia Mirkin [Mon, 17 Mar 2014 18:42:12 +0000 (14:42 -0400)]
loader: add special logic to distinguish nouveau from nouveau_vieux
There are a lot of different pci ids supported by nouveau, and more are
added all the time. The relevant distinguisher between drivers is the
chipset id.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Matt Turner [Sun, 2 Mar 2014 18:34:45 +0000 (10:34 -0800)]
glsl: Allow dot() on scalars, and throw out dotlike().
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Fri, 28 Feb 2014 21:33:19 +0000 (13:33 -0800)]
glsl: Optimize pow(x, 2) into x * x.
Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 18 Mar 2014 22:40:49 +0000 (15:40 -0700)]
glsl: Match whitespace changes from previous patch.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 11 Mar 2014 19:36:04 +0000 (12:36 -0700)]
glsl: Expose pack/unpack built-ins for ARB_gpu_shader5.
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Mar 2014 23:53:09 +0000 (16:53 -0700)]
i965: Drop some more dead code from the old CACHED_BATCH feature.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 5 Mar 2014 01:26:54 +0000 (17:26 -0800)]
i965: Drop special case for edgeflag thanks to Marek's change to core.
As of
780ce576bb1781f027797039693b98253ee4813e, we end up with R8_SSCALED
anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian Paul [Tue, 18 Mar 2014 17:55:50 +0000 (11:55 -0600)]
mesa: include stdbool.h in register_allocate.h to fix build
https://bugs.freedesktop.org/show_bug.cgi?id=76331
Ian Romanick [Tue, 4 Mar 2014 09:08:05 +0000 (11:08 +0200)]
i965: Enable EWA anisotropic filtering algorithm
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm." Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Tue, 18 Mar 2014 17:49:10 +0000 (10:49 -0700)]
i965: Actually initialize simd16_unsupported and no16_msg.
I meant to include this fixes in v3 of commit
de7ad2c88f4ec243c95eaed22c41d0e537912e01, but accidentally pushed a
previous version.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Mon, 3 Mar 2014 07:09:20 +0000 (23:09 -0800)]
i965/upload: Refactor open-coded ALIGN-like computations.
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Mon, 3 Mar 2014 00:39:56 +0000 (16:39 -0800)]
i965: Fix indentation in brw_upload_indices().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Sun, 2 Mar 2014 23:02:54 +0000 (15:02 -0800)]
i965: Consolidate code for setting brw->ib.start_vertex_offset.
This was set identically in three places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Mon, 17 Mar 2014 20:53:44 +0000 (13:53 -0700)]
i965: Allocate register sets at screen creation, not context creation.
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context. Computing them is
fairly expensive, and they take up a large amount of memory. Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.
Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Mon, 17 Mar 2014 20:57:14 +0000 (13:57 -0700)]
i965: Allocate the screen using ralloc rather than calloc.
This will allow us to use the screen as a memory context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Mon, 17 Mar 2014 21:53:08 +0000 (14:53 -0700)]
ra: Convert another bool array to bitsets.
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Sat, 22 Feb 2014 03:50:15 +0000 (19:50 -0800)]
ra: Use a bitset for storing which registers belong to a class.
This should use 1/8 the memory.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
Kenneth Graunke [Sat, 22 Feb 2014 03:31:44 +0000 (19:31 -0800)]
ra: Create a reg_belongs_to_class() helper function.
This is a little easier to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
Kenneth Graunke [Sat, 22 Feb 2014 03:32:24 +0000 (19:32 -0800)]
ra: Use bool instead of GLboolean.
This isn't the GL API, so there's no reason to use GLboolean.
Using bool is safer: any non-zero value is treated as "true". When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Fri, 7 Mar 2014 08:49:45 +0000 (00:49 -0800)]
i965: Accurately bail on SIMD16 compiles.
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.
The fragment shader compiler has a number of checks like:
if (dispatch_width == 16)
fail("...some reason...");
This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)
v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 8 Mar 2014 00:10:50 +0000 (16:10 -0800)]
i965/fs: Support pull parameters in SIMD16 mode.
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.
This gains us 78 SIMD16 programs in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 12 Mar 2014 05:24:39 +0000 (22:24 -0700)]
i965/fs: Use a single instance of the pull_constant_loc[] array.
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.
This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.
Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.
This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs:
1841957 ->
1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 11 Mar 2014 21:35:27 +0000 (14:35 -0700)]
i965/fs: Don't renumber UNIFORM registers.
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.
This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.
This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)
Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.
This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritable to work
with.
v2: Add assert(remapped <= i), requested by Topi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Mon, 10 Mar 2014 20:14:03 +0000 (13:14 -0700)]
i965/fs: Split pull parameter decision making from mechanical demoting.
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Mar 2014 23:45:13 +0000 (15:45 -0800)]
i965/fs: Record pull constant locations for all array elements.
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Mar 2014 10:10:14 +0000 (02:10 -0800)]
i965/fs: Save push constant location information.
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.
We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.
Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.
This situation will improve in the next few patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 11 Mar 2014 06:22:48 +0000 (23:22 -0700)]
i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters.
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.
brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Brian Paul [Mon, 17 Mar 2014 21:13:55 +0000 (15:13 -0600)]
gallium/docs: update SLT, SGE, SFL, STR opcode docs
To emphasize that the result is floating point 1.0 or 0.0, to match
other opcodes like SLE and SEQ.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Charmaine Lee [Thu, 13 Mar 2014 17:33:00 +0000 (11:33 -0600)]
glx: Fix incorrect pdp assignment in dri2_bind_context().
pdp should be set to dpyPriv->dri2Display.
Fixes blank frame failure running glretrace ClearView.
Reviewed-by: Brian Paul <brianp@vmware.com>
Maarten Lankhorst [Tue, 18 Mar 2014 13:47:40 +0000 (14:47 +0100)]
nvc0: Handle user mapped vertex buffer for edgeflag
Handle mapping edgeflag data similar to the code around it.
This fixes a crash in piglit test gl-2.0-edgeflag.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Francisco Jerez [Fri, 14 Mar 2014 10:31:11 +0000 (11:31 +0100)]
clover: Fix region size error checking in some buffer transfer commands.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Ilia Mirkin [Sat, 15 Mar 2014 16:42:51 +0000 (12:42 -0400)]
nv50/ir/gk110: add postfactor support for fmul
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 15 Mar 2014 14:22:22 +0000 (10:22 -0400)]
nv50/ir/gk110: set not modifier on first source of logic op
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 14 Mar 2014 12:16:00 +0000 (08:16 -0400)]
nv50/ir/gk110: use shl/shr instead of lshf/rshf so that c[] is supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 14 Mar 2014 10:20:36 +0000 (06:20 -0400)]
nv50/ir/gk110: add 64/128-bit fetch/export support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 14 Mar 2014 10:11:37 +0000 (06:11 -0400)]
nv50/ir/gk110: fix handling of OP_SUB for floating point ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 14 Mar 2014 09:46:14 +0000 (05:46 -0400)]
nv50/ir/gk110: presin/preex2 take their source at bit 23
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>