Eric Anholt [Mon, 11 Oct 2010 20:38:38 +0000 (13:38 -0700)]
i965: Don't compute-to-MRF in gen6 math instructions.
Eric Anholt [Mon, 11 Oct 2010 20:30:12 +0000 (13:30 -0700)]
i965: Add a couple of checks for gen6 math instruction limits.
Eric Anholt [Mon, 11 Oct 2010 20:19:47 +0000 (13:19 -0700)]
i965: Don't consider gen6 math instructions to write to MRFs.
This was left over from the pre-gen6 cleanups. One test regresses
where compute-to-MRF now occurs.
Chad Versace [Fri, 8 Oct 2010 19:05:02 +0000 (12:05 -0700)]
glsl: Changes in generated file glsl_lexer.cpp
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Chad Versace [Fri, 8 Oct 2010 19:03:40 +0000 (12:03 -0700)]
glsl: Add lexer rules for uint and uvecN (N=2..4)
Commit for generated file glsl_lexer.cpp follows this commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Chad Versace [Thu, 7 Oct 2010 23:05:39 +0000 (16:05 -0700)]
glsl: Add glsl_type::uvecN_type for N=2,3
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Chad Versace [Thu, 7 Oct 2010 23:04:30 +0000 (16:04 -0700)]
intel_extensions: Add ability to set GLSL version via environment
Add ability to set the GLSL version used by the GLcontext by setting the
environment variable INTEL_GLSL_VERSION. For example,
env INTEL_GLSL_VERSION=130 prog args
If the environment variable is missing, the GLSL version defaults to 120.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
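The lookup described above can be sketched roughly as follows. This is only an illustration of "read an environment variable, fall back to 120"; the function names `parse_glsl_version` and `glsl_version_from_env` are hypothetical and are not taken from the driver source.

```c
#include <stdlib.h>

/* Hypothetical helper: turn the text of INTEL_GLSL_VERSION into a version
 * number, falling back to 120 when the string is missing or not a plain
 * positive decimal integer. */
int
parse_glsl_version(const char *s)
{
   char *end;
   long v;

   if (s == NULL || *s == '\0')
      return 120;

   v = strtol(s, &end, 10);
   if (*end != '\0' || v <= 0)
      return 120;

   return (int) v;
}

/* Hypothetical wrapper that consults the environment. */
int
glsl_version_from_env(void)
{
   return parse_glsl_version(getenv("INTEL_GLSL_VERSION"));
}
```

Splitting the parsing from the getenv() call keeps the fallback logic easy to exercise without touching the process environment.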
Daniel Vetter [Sun, 10 Oct 2010 15:04:42 +0000 (17:04 +0200)]
r200: revalidate after radeon_update_renderbuffers
By calling radeon_draw_buffers (which sets the necessary flags
in radeon->NewGLState) and revalidating if NewGLState is non-zero
in r200TclPrimitive. This fixes an assert in libdrm (the color-/
depthbuffer was changed but not yet validated) and stops the
kernel cs checker from complaining about them (when they're too
small).
Thanks to Mario Kleiner for the hint to call radeon_draw_buffers
(instead of my half-broken hack).
v2: Also fix the swtcl r200 path.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Eric Anholt [Fri, 8 Oct 2010 21:00:14 +0000 (14:00 -0700)]
i965: Compute to MRF in the new FS backend.
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
Eric Anholt [Fri, 8 Oct 2010 22:11:42 +0000 (15:11 -0700)]
i965: Give the FB write and texture opcodes the info on base MRF, like math.
Eric Anholt [Fri, 8 Oct 2010 21:35:34 +0000 (14:35 -0700)]
i965: Give the math opcodes information on base mrf/mrf len.
This is progress towards enabling a compute-to-MRF pass.
Eric Anholt [Sun, 10 Oct 2010 22:42:37 +0000 (15:42 -0700)]
i965: Move FS backend structures to a header.
It's time to start splitting some of this up.
Eric Anholt [Sun, 10 Oct 2010 19:13:35 +0000 (12:13 -0700)]
i965: Reduce register interference checks for changed FS_OPCODE_DISCARD.
While I don't know of any performance changes from this (one extra
reg available out of 128), it makes the generated asm a lot cleaner
looking.
Eric Anholt [Sun, 10 Oct 2010 18:54:05 +0000 (11:54 -0700)]
i965: Split FS_OPCODE_DISCARD into two steps.
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
José Fonseca [Thu, 2 Sep 2010 15:30:23 +0000 (16:30 +0100)]
llvmpipe: Use lp_tgsi_info.
José Fonseca [Thu, 2 Sep 2010 14:54:07 +0000 (15:54 +0100)]
gallivm: More detailed analysis of tgsi shaders.
To allow more optimizations, in particular for direct textures.
José Fonseca [Thu, 2 Sep 2010 14:52:44 +0000 (15:52 +0100)]
tgsi: Export some names for some tgsi enums.
Useful to give human legible names in other cases.
José Fonseca [Sun, 22 Aug 2010 16:27:56 +0000 (17:27 +0100)]
gallium: Define C99 restrict keyword where absent.
José Fonseca [Sun, 10 Oct 2010 22:55:24 +0000 (23:55 +0100)]
gallivm: Eliminate unsigned integer arithmetic from texture coordinates.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to keep inefficient
operations from sneaking in in the future.
José Fonseca [Sun, 10 Oct 2010 22:36:14 +0000 (23:36 +0100)]
llvmpipe: Remove outdated comment about stencil testing.
Dave Airlie [Mon, 11 Oct 2010 06:20:56 +0000 (16:20 +1000)]
r600g: don't run with scissors.
This could probably be done much more nicely. I spent a day chasing
a coherency problem in the kernel that turned out to be incorrect
scissor setup.
Dave Airlie [Mon, 11 Oct 2010 02:18:05 +0000 (12:18 +1000)]
r600g: add TXL opcode support.
fixes glsl1-2D Texture lookup with explicit lod (Vertex shader)
Dave Airlie [Mon, 11 Oct 2010 01:58:27 +0000 (11:58 +1000)]
r600g: enable vertex samplers.
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
Dave Airlie [Fri, 8 Oct 2010 00:17:51 +0000 (10:17 +1000)]
r600g: evergreen has no request size bit in texture word4
Dave Airlie [Thu, 7 Oct 2010 23:51:09 +0000 (09:51 +1000)]
r600g: fix input/output Z export mixup for evergreen.
José Fonseca [Sun, 10 Oct 2010 18:51:35 +0000 (19:51 +0100)]
gallivm: Pass texture coords derivatives as scalars.
We end up treating them as scalars in the end, and it saves some
instructions.
José Fonseca [Sun, 10 Oct 2010 18:05:05 +0000 (19:05 +0100)]
gallivm: Use variables instead of Phis in loops.
With this commit all explicit Phi emission is now gone.
José Fonseca [Sun, 10 Oct 2010 17:47:24 +0000 (18:47 +0100)]
gallivm: Allow to disable bri-linear filtering with GALLIVM_DEBUG=no_brilinear runtime option
José Fonseca [Sun, 10 Oct 2010 17:45:14 +0000 (18:45 +0100)]
gallivm: Fix a long standing bug with nested if-then-else emission.
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza -- other control flow elements may have been emitted
halfway through the true stanza.
Although this bug only surfaced recently, with the commit to skip mip
filtering when lod is an integer, it was always there. It was probably
avoided until now: e.g., cubemap selection nests if-then-else on the
else stanza, which does not suffer from the same problem.
Francisco Jerez [Sat, 9 Oct 2010 23:39:13 +0000 (01:39 +0200)]
dri/nv10: Fake fast Z clears for pre-nv17 cards.
Francisco Jerez [Sat, 9 Oct 2010 23:45:23 +0000 (01:45 +0200)]
dri/nouveau: Minor cleanup.
José Fonseca [Sat, 9 Oct 2010 20:39:14 +0000 (21:39 +0100)]
gallivm: Cleanup the rest of the flow module.
José Fonseca [Sat, 9 Oct 2010 20:14:05 +0000 (21:14 +0100)]
gallivm: Simplify if/then/else implementation.
No need for a flow stack anymore.
José Fonseca [Sat, 9 Oct 2010 19:26:11 +0000 (20:26 +0100)]
gallivm: Factor out the SI->FP texture size conversion for SoA path too
José Fonseca [Sat, 9 Oct 2010 19:14:03 +0000 (20:14 +0100)]
gallivm: Remove support for Phi generation.
Simply rely on mem2reg pass. It's easier and more reliable.
José Fonseca [Sat, 9 Oct 2010 18:53:21 +0000 (19:53 +0100)]
gallivm: Use variables instead of Phis for cubemap selection.
José Fonseca [Sat, 9 Oct 2010 11:55:31 +0000 (12:55 +0100)]
gallivm: Don't generate Phis for execution mask.
José Fonseca [Sat, 9 Oct 2010 11:12:03 +0000 (12:12 +0100)]
gallivm: Special bri-linear computation path for unmodified rho.
José Fonseca [Sat, 9 Oct 2010 11:11:20 +0000 (12:11 +0100)]
gallivm: Less code duplication in log computation.
José Fonseca [Sat, 9 Oct 2010 11:10:07 +0000 (12:10 +0100)]
util: Define M_SQRT2 when not available.
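A fallback definition of this kind usually looks like the sketch below. It assumes nothing about the actual header touched by the commit; `square_diagonal` is a hypothetical function added only to show the constant in use.

```c
#include <math.h>

/* M_SQRT2 is not part of ISO C, so some toolchains do not provide it.
 * Supply the value only when <math.h> left it undefined. */
#ifndef M_SQRT2
#define M_SQRT2 1.41421356237309504880
#endif

/* Hypothetical user of the constant: diagonal length of a square. */
double
square_diagonal(double side)
{
   return side * M_SQRT2;
}
```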
José Fonseca [Sat, 9 Oct 2010 11:08:25 +0000 (12:08 +0100)]
gallivm: Handle code that has ret correctly.
Stop disassembling on unconditional backwards jumps.
José Fonseca [Wed, 6 Oct 2010 20:01:38 +0000 (21:01 +0100)]
llvmpipe: Fix MSVC build. Enable the new SSE2 code on non SSE3 systems.
Keith Whitwell [Fri, 1 Oct 2010 14:13:51 +0000 (15:13 +0100)]
llvmpipe: simplified SSE2 swz/unswz routines
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
Keith Whitwell [Sat, 9 Oct 2010 10:28:00 +0000 (11:28 +0100)]
llvmpipe: clean up shader pre/postamble, try to catch more early-z
Specifically, can do early-depth-test even when alphatest or
kill-pixel are active, providing we defer the actual z write until the
final mask is available.
Improves demos/fire.c especially in the case where you get close to
the trees.
Keith Whitwell [Thu, 7 Oct 2010 14:01:07 +0000 (15:01 +0100)]
llvmpipe: try to be sensible about whether to branch after mask updates
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
Keith Whitwell [Wed, 6 Oct 2010 18:09:03 +0000 (19:09 +0100)]
gallivm: simpler uint8->float conversions
LLVM seems to find it easier to reason about these than our
mantissa-manipulation code.
Keith Whitwell [Wed, 6 Oct 2010 18:10:30 +0000 (19:10 +0100)]
gallivm: prefer blendvb for integer arguments
Keith Whitwell [Wed, 6 Oct 2010 17:21:56 +0000 (18:21 +0100)]
gallivm: specialized x8z24 depthtest path
Avoid unnecessary masking of nonexistent stencil component.
Keith Whitwell [Thu, 7 Oct 2010 18:01:12 +0000 (19:01 +0100)]
llvmpipe: dump fragment shader ir and asm when LP_DEBUG=fs
Better than GALLIVM_DEBUG if you're only interested in fragment shaders.
Keith Whitwell [Thu, 7 Oct 2010 18:49:20 +0000 (19:49 +0100)]
llvmpipe: store zero into all alloca'd values
Fixes slowdown in isosurf with earlier versions of llvm.
Keith Whitwell [Thu, 7 Oct 2010 17:59:54 +0000 (18:59 +0100)]
llvmpipe: use alloca for fs color outputs
Don't try to emit our own phi's, let llvm mem2reg do it for us.
Keith Whitwell [Wed, 6 Oct 2010 21:25:48 +0000 (22:25 +0100)]
llvmpipe: defer attribute interpolation until after mask and ztest
Don't calculate 1/w for quads which aren't visible...
José Fonseca [Wed, 6 Oct 2010 19:42:30 +0000 (20:42 +0100)]
llvmpipe: Prevent z > 1.0
The current interpolation scheme causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
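As a rough scalar illustration of the workaround (the real code is generated LLVM IR operating on vectors), the clamp amounts to the following. The plane-equation form and the name `interpolate_z` are assumptions made for this sketch.

```c
/* Hypothetical scalar model: evaluate a depth plane at (x, y) and clamp
 * the result so that accumulated rounding error cannot push z above 1.0. */
float
interpolate_z(float z0, float dzdx, float dzdy, float x, float y)
{
   float z = z0 + dzdx * x + dzdy * y;

   return z > 1.0f ? 1.0f : z;
}
```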
José Fonseca [Sat, 9 Oct 2010 08:34:31 +0000 (09:34 +0100)]
gallivm: Do size computations simultaneously for all dimensions (AoS).
Operate simultaneously on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scale with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
Zack Rusin [Thu, 7 Oct 2010 20:26:17 +0000 (16:26 -0400)]
llvmpipe: fix rasterization of vertical lines on pixel boundaries
Vinson Lee [Fri, 8 Oct 2010 23:40:29 +0000 (16:40 -0700)]
i965: Initialize member variables.
Fixes these GCC warnings.
brw_wm_fp.c: In function 'search_or_add_const4f':
brw_wm_fp.c:92: warning: 'reg.Index2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.Index2' was declared here
brw_wm_fp.c:92: warning: 'reg.RelAddr2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.RelAddr2' was declared here
Vinson Lee [Fri, 8 Oct 2010 23:30:59 +0000 (16:30 -0700)]
i965: Silence unused variable warning on non-debug builds.
Fixes this GCC warning.
brw_vs.c: In function 'do_vs_prog':
brw_vs.c:46: warning: unused variable 'ctx'
Vinson Lee [Fri, 8 Oct 2010 23:02:59 +0000 (16:02 -0700)]
i965: Silence unused variable warning on non-debug builds.
Fixes this GCC warning.
brw_eu_emit.c: In function 'brw_math2':
brw_eu_emit.c:1189: warning: unused variable 'intel'
Vinson Lee [Fri, 8 Oct 2010 22:49:02 +0000 (15:49 -0700)]
i915: Silence unused variable warning in non-debug builds.
Fixes this GCC warning.
i915_vtbl.c: In function 'i915_assert_not_dirty':
i915_vtbl.c:670: warning: unused variable 'dirty'
Roland Scheidegger [Fri, 8 Oct 2010 22:35:58 +0000 (00:35 +0200)]
gallivm: make use of new iround code in lp_bld_conv.
Only requires sse2 now.
Roland Scheidegger [Fri, 8 Oct 2010 22:14:11 +0000 (00:14 +0200)]
gallivm: optimize soa linear clamp to edge wrap mode a bit
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
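The equivalence argued above can be checked with a small scalar model of a 1D linear fetch. Everything here is a sketch under assumptions: the coordinate is taken to be in texel units already, the upper clamp is simplified, and the function names are hypothetical rather than gallivm helpers.

```c
/* Integer floor without libm. */
int
floor_to_int(float x)
{
   int i = (int) x;

   return ((float) i > x) ? i - 1 : i;
}

float
lerp(float w, float a, float b)
{
   return a + w * (b - a);
}

/* Older scheme as described: clamp against -0.5, then clamp the integer
 * coords. Below 0 both coords end up 0 and the weight is unused. */
float
sample_clamp_neg_half(const float *texels, int size, float coord)
{
   float c = coord - 0.5f;
   int i0, i1;
   float w;

   if (c < -0.5f)
      c = -0.5f;
   if (c > (float) (size - 1))
      c = (float) (size - 1);

   i0 = floor_to_int(c);
   w = c - (float) i0;
   i1 = i0 + 1;
   if (i0 < 0)
      i0 = 0;
   if (i1 > size - 1)
      i1 = size - 1;

   return lerp(w, texels[i0], texels[i1]);
}

/* Newer scheme: clamp against 0. Below 0 the coords are 0 and 1, but the
 * weight is 0, so the lerp still returns texel 0. */
float
sample_clamp_zero(const float *texels, int size, float coord)
{
   float c = coord - 0.5f;
   int i0, i1;
   float w;

   if (c < 0.0f)
      c = 0.0f;
   if (c > (float) (size - 1))
      c = (float) (size - 1);

   i0 = (int) c;               /* c >= 0, so truncation equals floor */
   w = c - (float) i0;
   i1 = i0 + 1;
   if (i1 > size - 1)
      i1 = size - 1;

   return lerp(w, texels[i0], texels[i1]);
}
```

In this model the second variant needs neither the lower clamp on the integer coordinate nor a real floor, which is the kind of simplification the commit message refers to.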
Roland Scheidegger [Fri, 8 Oct 2010 19:08:49 +0000 (21:08 +0200)]
gallivm: avoid unnecessary URem in linear wrap repeat case
Haven't looked at exactly what code this generates, but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
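In scalar C the idea reads roughly as below. This is a sketch, not the generated code: it assumes unsigned texel coordinates, and the names `wrap_repeat` and `wrap_repeat_next` are hypothetical.

```c
/* First texel coordinate: the one remaining remainder operation. */
unsigned
wrap_repeat(unsigned coord, unsigned size)
{
   return coord % size;
}

/* Second texel coordinate: since coord0 is already in [0, size), the
 * neighbour is coord0 + 1 unless that steps off the end, in which case it
 * wraps to 0. An add and a select replace the second remainder. */
unsigned
wrap_repeat_next(unsigned coord0, unsigned size)
{
   unsigned next = coord0 + 1;

   return (next == size) ? 0 : next;
}
```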
Roland Scheidegger [Fri, 8 Oct 2010 19:06:04 +0000 (21:06 +0200)]
gallivm: more linear tex wrap mode calculation simplification
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
Roland Scheidegger [Fri, 8 Oct 2010 18:57:50 +0000 (20:57 +0200)]
gallivm: optimize some tex wrap mode calculations a bit
Sometimes coords are clamped to positive numbers before the conversion
to int, or clamped to 0 afterwards. In these cases we can use itrunc
instead of ifloor, which is easier. Unfortunately this is only the case for
nearest calculations, except for linear MIRROR_CLAMP_TO_EDGE, which
for the same reason can use an unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
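The reason the substitution is safe can be seen in a scalar model: truncation and floor agree for non-negative inputs and differ only below zero. The two functions below are illustrative stand-ins that borrow the helper names from the message; they are not the gallivm builders themselves.

```c
/* Round toward zero: a plain float-to-int conversion. */
int
itrunc(float x)
{
   return (int) x;
}

/* Round toward negative infinity: needs an extra compare and adjust. */
int
ifloor(float x)
{
   int i = (int) x;

   return ((float) i > x) ? i - 1 : i;
}
```

For 2.75 both return 2, while for -1.5 itrunc gives -1 and ifloor gives -2, so the cheaper form is only valid once the coordinate is known to be non-negative.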
Roland Scheidegger [Fri, 8 Oct 2010 16:43:49 +0000 (18:43 +0200)]
gallivm: replace sub/floor/ifloor combo with ifloor_fract
Roland Scheidegger [Fri, 8 Oct 2010 16:38:25 +0000 (18:38 +0200)]
gallivm: faster iround implementation for sse2
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
Roland Scheidegger [Fri, 8 Oct 2010 16:31:38 +0000 (18:31 +0200)]
gallivm: fix trunc/itrunc comment
trunc of -1.5 is -1.0 not 1.0...
Vinson Lee [Fri, 8 Oct 2010 22:33:44 +0000 (15:33 -0700)]
i915: Silence unused variable warning in non-debug builds.
Fixes this GCC warning.
i830_vtbl.c: In function 'i830_assert_not_dirty':
i830_vtbl.c:704: warning: unused variable 'i830'
Ian Romanick [Fri, 8 Oct 2010 21:55:27 +0000 (14:55 -0700)]
docs: Update status of GL 3.x related extensions
Ian Romanick [Fri, 8 Oct 2010 21:50:34 +0000 (14:50 -0700)]
docs: skeleton for 7.10 release notes
Ian Romanick [Fri, 8 Oct 2010 21:29:11 +0000 (14:29 -0700)]
glsl: Remove const decoration from inlined function parameters
The constness of the function parameter gets inlined with the rest of
the function. However, there is also an assignment to the parameter.
If this occurs inside a loop the loop analysis code will get confused
by the assignment to a read-only variable.
Fixes bugzilla #30552.
NOTE: this is a candidate for the 7.9 branch.
Ian Romanick [Thu, 7 Oct 2010 22:15:44 +0000 (15:15 -0700)]
intel: Enable GL_ARB_explicit_attrib_location
Ian Romanick [Tue, 5 Oct 2010 23:01:17 +0000 (16:01 -0700)]
main: Enable GL_ARB_explicit_attrib_location for swrast
Ian Romanick [Fri, 8 Oct 2010 00:21:22 +0000 (17:21 -0700)]
glsl: Add linker support for explicit attribute locations
Ian Romanick [Thu, 7 Oct 2010 22:13:38 +0000 (15:13 -0700)]
glsl: Track explicit location in AST to IR translation
Ian Romanick [Wed, 6 Oct 2010 00:03:25 +0000 (17:03 -0700)]
glsl: Regenerate files changed by previous commit
Ian Romanick [Wed, 6 Oct 2010 00:00:31 +0000 (17:00 -0700)]
glsl: Add parser support for GL_ARB_explicit_attrib_location layouts
Only layout(location=#) is supported. Setting the index requires GLSL
1.30 and GL_ARB_blend_func_extended.
Ian Romanick [Tue, 5 Oct 2010 23:03:12 +0000 (16:03 -0700)]
glcpp: Regenerate files changed by previous commit
Ian Romanick [Tue, 5 Oct 2010 23:02:38 +0000 (16:02 -0700)]
glcpp: Add the define for ARB_explicit_attrib_location when present
Ian Romanick [Tue, 5 Oct 2010 23:40:00 +0000 (16:40 -0700)]
glsl: Regenerate files modified by previous commits
Ian Romanick [Tue, 5 Oct 2010 23:38:47 +0000 (16:38 -0700)]
glsl: Wrap ast_type_qualifier contents in a struct in a union
This will ease adding non-bit fields in the near future.
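The layout being described is roughly the following. The field names are illustrative guesses, not the actual members of ast_type_qualifier, and the type is deliberately given a different name to make that clear.

```c
#include <string.h>

/* Hypothetical, reduced qualifier. The one-bit flags live in a struct
 * inside a union so they can also be viewed as a single integer, which
 * leaves room to add ordinary (non-bit) fields next to them. */
struct type_qualifier_sketch {
   union {
      struct {
         unsigned constant:1;
         unsigned attribute:1;
         unsigned uniform:1;
         unsigned explicit_location:1;
      } q;
      unsigned i;          /* all flag bits at once */
   } flags;
   unsigned location;      /* example of a non-bit field */
};

/* Clear every flag and field in one go. */
struct type_qualifier_sketch
cleared_qualifier(void)
{
   struct type_qualifier_sketch t;

   memset(&t, 0, sizeof(t));
   return t;
}

/* Set a single flag; the integer view then becomes non-zero. */
struct type_qualifier_sketch
with_uniform(struct type_qualifier_sketch t)
{
   t.flags.q.uniform = 1;
   return t;
}
```

With this shape, "is any qualifier set" becomes a single test of `flags.i`, while individual flags keep their readable names.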
Ian Romanick [Tue, 5 Oct 2010 23:23:32 +0000 (16:23 -0700)]
glsl: Clear type_qualifier using memset
Ian Romanick [Tue, 5 Oct 2010 23:18:56 +0000 (16:18 -0700)]
glsl: Slight refactor of error / warning checking for ARB_fcc layout
Ian Romanick [Tue, 5 Oct 2010 23:14:18 +0000 (16:14 -0700)]
glsl: Refactor 'layout' grammar to match GLSL 1.60 spec grammar
Ian Romanick [Fri, 8 Oct 2010 00:20:15 +0000 (17:20 -0700)]
glsl: Fail linking if assign_attribute_locations fails
Vinson Lee [Fri, 8 Oct 2010 21:17:14 +0000 (14:17 -0700)]
r600g: Silence uninitialized variable warning.
Vinson Lee [Fri, 8 Oct 2010 21:14:16 +0000 (14:14 -0700)]
r600g: Silence uninitialized variable warning.
Vinson Lee [Fri, 8 Oct 2010 21:08:50 +0000 (14:08 -0700)]
r600g: Silence uninitialized variable warning.
Vinson Lee [Fri, 8 Oct 2010 21:03:10 +0000 (14:03 -0700)]
gallivm: Remove unnecessary header.
Eric Anholt [Tue, 5 Oct 2010 17:29:42 +0000 (10:29 -0700)]
i965: Add register coalescing to the new FS backend.
Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
Eric Anholt [Fri, 8 Oct 2010 18:16:16 +0000 (11:16 -0700)]
i965: Enable attribute swizzling (repositioning) in the gen6 SF.
We were trying to remap a fully-filled array down to only handing the
WM the components it uses. This is called attribute swizzling, and if
you don't enable it you just get 1:1 mappings of inputs to outputs.
This almost fixes glsl-routing, except for the highest gl_TexCoord[]
indices.
Eric Anholt [Fri, 8 Oct 2010 18:14:48 +0000 (11:14 -0700)]
i965: Fix new FS gen6 interpolation for sparsely-populated arrays.
We'd overwrite the same element twice.
Eric Anholt [Fri, 8 Oct 2010 18:33:40 +0000 (11:33 -0700)]
i965: Fix gen6 WM push constants updates.
We would compute a new buffer, but never point the hardware at the new
buffer. This partially fixes glsl-routing, as now it gets the updated
uniform for which attribute to draw.
José Fonseca [Fri, 8 Oct 2010 18:48:16 +0000 (19:48 +0100)]
gallivm: Help for combined extraction and broadcasting.
Doesn't change generated code quality, but saves some typing.
José Fonseca [Fri, 8 Oct 2010 18:11:32 +0000 (19:11 +0100)]
llvmpipe: First minify the texture size, then broadcast.
José Fonseca [Fri, 8 Oct 2010 17:24:54 +0000 (18:24 +0100)]
gallivm: Move into the as much of the second level code as possible.
Also, pass more stuff through the sample build context, instead of
arguments.
Eric Anholt [Fri, 8 Oct 2010 05:37:36 +0000 (22:37 -0700)]
i965: Handle swizzles in the addition of YUV texture constants.
If someone happened to land a set in a different swizzle order, we
would have hit an assertion failure.
Eric Anholt [Fri, 8 Oct 2010 05:39:41 +0000 (22:39 -0700)]
i965: Drop the check for YUV constants in the param list.
_mesa_add_unnamed_constant() already does that.
Eric Anholt [Fri, 8 Oct 2010 05:25:35 +0000 (22:25 -0700)]
i965: Drop the check for duplicate _mesa_add_state_reference.
_mesa_add_state_reference does that check for us anyway.
Eric Anholt [Fri, 8 Oct 2010 05:24:38 +0000 (22:24 -0700)]
mesa: Simplify a bit of _mesa_add_state_reference using memcmp.
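The kind of simplification meant here is replacing an element-by-element comparison of state tokens with one memcmp over the whole array. The sketch below assumes a fixed token count of 5 and plain int tokens; `STATE_LENGTH_SKETCH` and `state_tokens_equal` are hypothetical names, not Mesa's.

```c
#include <string.h>

#define STATE_LENGTH_SKETCH 5   /* assumed token count, for illustration */

/* Returns non-zero when both token arrays are identical. An array of int
 * has no padding, so a bytewise compare is equivalent to comparing each
 * element in turn. */
int
state_tokens_equal(const int a[STATE_LENGTH_SKETCH],
                   const int b[STATE_LENGTH_SKETCH])
{
   return memcmp(a, b, STATE_LENGTH_SKETCH * sizeof(int)) == 0;
}
```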