Emil Velikov [Sat, 24 Jan 2015 12:49:17 +0000 (12:49 +0000)]
Add release notes for the 10.4.3 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit
e92bfa3f9512832b61706b76b5c9f7afa78199b0)
Matt Turner [Mon, 5 Jan 2015 21:51:03 +0000 (13:51 -0800)]
i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.
total instructions in shared programs:
5952059 ->
5951603 (-0.01%)
instructions in affected programs: 138812 -> 138356 (-0.33%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 31 Dec 2014 01:19:41 +0000 (17:19 -0800)]
i965/fs: Add support for removing MOV.NZ instructions.
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs:
5955701 ->
5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 20:18:57 +0000 (12:18 -0800)]
i965/fs: Allow flipping cond mod for negated arguments.
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
total instructions in shared programs:
5958360 ->
5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 3 Jan 2015 20:18:15 +0000 (12:18 -0800)]
i965/fs: Propagate cmod across flag read if it contains the same value.
total instructions in shared programs:
5959463 ->
5958900 (-0.01%)
instructions in affected programs: 70031 -> 69468 (-0.80%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 6 Nov 2014 00:13:59 +0000 (16:13 -0800)]
i965/fs: Add unit tests for cmod propagation pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 22 Aug 2014 17:54:43 +0000 (10:54 -0700)]
i965/fs: Add pass to propagate conditional modifiers.
total instructions in shared programs:
5974160 ->
5959463 (-0.25%)
instructions in affected programs:
1743737 ->
1729040 (-0.84%)
GAINED: 0
LOST: 12
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 22 Aug 2014 18:01:13 +0000 (11:01 -0700)]
i965/fs: Eliminate null-dst instructions without side-effects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 20:56:13 +0000 (12:56 -0800)]
i965/fs: Apply conditional mod specially to split MAD/LRP.
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 04:33:12 +0000 (20:33 -0800)]
i965/fs: Add a pass to fixup 3-src instructions that have a null dest.
3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give then a destination before register allocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 03:29:21 +0000 (19:29 -0800)]
i965: Add is_3src() to backend_instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sun, 24 Aug 2014 21:01:48 +0000 (14:01 -0700)]
i965: Add backend_instruction::can_do_cmod().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 31 Dec 2014 03:54:50 +0000 (19:54 -0800)]
i965/cfg: Add a foreach_block_reverse macro.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Matt Turner [Wed, 31 Dec 2014 00:14:43 +0000 (16:14 -0800)]
i965/cfg: Add a foreach_inst_in_block_reverse_safe macro.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Matt Turner [Wed, 31 Dec 2014 00:14:25 +0000 (16:14 -0800)]
glsl: Add a foreach_in_list_reverse_safe macro.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 9 Apr 2014 20:38:14 +0000 (13:38 -0700)]
i965: Don't make instructions with a null dest a barrier to scheduling.
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
GAINED: 103
LOST: 43
Causes one program to spill (643 -> 1076 instructions).
I committed this patch last year (commit
42a26cb5) but reverted it
(commit
0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ian Romanick [Fri, 23 Jan 2015 20:31:05 +0000 (12:31 -0800)]
i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.
No piglit regressions on any i965 platform in Jenkins.
total instructions in shared programs:
4382707 ->
4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0
No other platforms affected in shader-db.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 5 Nov 2014 23:10:37 +0000 (15:10 -0800)]
nir: Expose nir_print_instr() for debug prints
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Fri, 23 Jan 2015 22:47:50 +0000 (14:47 -0800)]
nir: When asked to print with a NULL state, just use bare variable names.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Thu, 13 Nov 2014 20:40:59 +0000 (12:40 -0800)]
nir: Add nir_lower_alu_to_scalar.
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
Eric Anholt [Wed, 21 Jan 2015 23:55:23 +0000 (15:55 -0800)]
nir: Make some helpers for copying ALU src/dests.
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Thu, 22 Jan 2015 07:52:17 +0000 (23:52 -0800)]
nir: Add algebraic optimizations for division and reciprocal.
These also exist in opt_algebraic.cpp.
total NIR instructions in shared programs:
2011430 ->
2011211 (-0.01%)
NIR instructions in affected programs: 42221 -> 42002 (-0.52%)
helped: 198
total i965 instructions in shared programs:
6020553 ->
6020116 (-0.01%)
i965 instructions in affected programs: 84322 -> 83885 (-0.52%)
helped: 394
HURT: 1 (by 1 instruction)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Jan 2015 07:47:06 +0000 (23:47 -0800)]
nir: Add algebraic optimizations for exponential/logarithmic functions.
Most of these exist in the GLSL IR algebraic pass already. However,
SSA allows us to find more instances of the patterns.
total NIR instructions in shared programs:
2015593 ->
2011430 (-0.21%)
NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
helped: 604
total i965 instructions in shared programs:
6025505 ->
6018717 (-0.11%)
i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
helped: 1295
HURT: 3
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Jan 2015 07:36:01 +0000 (23:36 -0800)]
nir: Add algebraic optimizations for simplifying comparisons.
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.
The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.
total NIR instructions in shared programs:
2020814 ->
2015593 (-0.26%)
NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
helped: 2233
HURT: 214
A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x. We can always clean that up later.
total i965 instructions in shared programs:
6035392 ->
6025505 (-0.16%)
i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
helped: 4508
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Jan 2015 07:25:56 +0000 (23:25 -0800)]
nir: Add algebraic optimizations for pointless shifts.
The GLSL IR optimization pass contained these; we may as well include
them too.
v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
No change in the number of NIR instructions on a shader-db run.
total i965 instructions in shared programs:
6035397 ->
6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 19 Jan 2015 22:57:38 +0000 (14:57 -0800)]
nir: Add a bunch of algebraic optimizations on logic/bit operations.
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that. While there,
I decided to add a bunch more of them.
v2: Delete bogus fand/for optimizations (caught by Jason).
total NIR instructions in shared programs:
2023511 ->
2020814 (-0.13%)
NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
helped: 1032
total i965 instructions in shared programs:
6035392 ->
6035397 (0.00%)
i965 instructions in affected programs: 537 -> 542 (0.93%)
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 19 Jan 2015 22:51:24 +0000 (14:51 -0800)]
nir: Implement CSE on intrinsics that can be eliminated and reordered.
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value. This
made ALU operations on those values appear distinct as well.
Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.
Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered. I didn't implement
support for variables for the time being.
v2: Assert that info->num_variables == 0 (requested by Jason).
total NIR instructions in shared programs:
2435936 ->
2023511 (-16.93%)
NIR instructions in affected programs:
2413496 ->
2001071 (-17.09%)
helped: 16872
total i965 instructions in shared programs:
6028987 ->
6008427 (-0.34%)
i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
helped: 2071
HURT: 585
GAINED: 14
LOST: 25
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Wed, 21 Jan 2015 20:20:59 +0000 (12:20 -0800)]
nir: Pull nir_instr_can_cse()'s SSA checks out of the switch.
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.
The next patch will introduce another case that requires SSA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 21 Jan 2015 09:51:21 +0000 (01:51 -0800)]
i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.
This allows us to count NIR instructions via shader-db.
Use "run" as normal. The results file will contain both NIR and
assembly.
Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)
Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Tue, 20 Jan 2015 07:11:54 +0000 (23:11 -0800)]
i965/nir: Print NIR on INTEL_DEBUG=fs.
This is useful for debugging and looking for optimization opportunities.
It will need to be expanded when we add support for other scalar stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Tue, 20 Jan 2015 06:11:39 +0000 (22:11 -0800)]
i965/nir: Do optimizations again just before lowering source mods.
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.
total instructions in shared programs:
6046190 ->
6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Matt Turner [Thu, 18 Dec 2014 22:10:17 +0000 (14:10 -0800)]
loader: Remove NEED_OPENGL_COMMON check.
HAVE_DRICOMMON is sufficient since OpenGL must be enabled for DRI.
Matt Turner [Thu, 18 Dec 2014 21:35:47 +0000 (13:35 -0800)]
gitignore: Ignore .tar.xz files.
Matt Turner [Thu, 18 Dec 2014 19:44:07 +0000 (11:44 -0800)]
mesa: Build with subdir-objects.
Matt Turner [Thu, 18 Dec 2014 21:33:29 +0000 (13:33 -0800)]
glsl: Build a libglsl_util library.
Rather than sourcing files with ../dir/file.c which leads to distclean
wiping out ../dir's .deps directory.
Matt Turner [Thu, 18 Dec 2014 02:50:25 +0000 (18:50 -0800)]
mapi: Build with subdir-objects.
Matt Turner [Thu, 18 Dec 2014 04:28:19 +0000 (20:28 -0800)]
mapi: Remove vgapi from SUBDIRS.
OpenVG is disabled with via autotools.
Matt Turner [Thu, 18 Dec 2014 03:46:22 +0000 (19:46 -0800)]
mesa: Drop inclusion of glapi_gen.mk.
Some glapi headers used to be generated from this Makefile.am, but no
longer.
Matt Turner [Mon, 8 Dec 2014 00:09:35 +0000 (16:09 -0800)]
glsl: Build with subdir-objects.
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
Matt Turner [Fri, 23 Jan 2015 22:27:39 +0000 (14:27 -0800)]
nir: Add headers to distribution.
Matt Turner [Fri, 23 Jan 2015 22:25:46 +0000 (14:25 -0800)]
nir: Add nir_{opt_,}algebraic.py to distribution.
Matt Turner [Fri, 23 Jan 2015 22:25:10 +0000 (14:25 -0800)]
mesa: Add format_{un,}pack.py to distribution.
Matt Turner [Fri, 23 Jan 2015 21:35:25 +0000 (13:35 -0800)]
mesa: Remove pack_tmp.h from sources.
Missed in commit
3a4de321.
Connor Abbott [Fri, 23 Jan 2015 04:32:13 +0000 (23:32 -0500)]
nir: add generated file to .gitignore
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ville Syrjälä [Mon, 19 Jan 2015 14:09:10 +0000 (16:09 +0200)]
i965: Fix min_vs_entries for CHV
According to BSpec the correct number for min_vs_entries is 34 for CHV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Mon, 19 Jan 2015 14:08:31 +0000 (16:08 +0200)]
i965: Fix max_wm_threads for CHV
Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.
On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Connor Abbott [Wed, 17 Dec 2014 03:32:21 +0000 (22:32 -0500)]
glsl: fix stale comment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 22 Jan 2015 23:49:56 +0000 (15:49 -0800)]
i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion
When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message. Since we only do this for >= HSW,
this is ok since there are no MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
Jason Ekstrand [Thu, 22 Jan 2015 21:46:44 +0000 (13:46 -0800)]
i965/emit: Do the sampler index adjustment directly in header.0.3
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space. The usual candidate was
the destination of the sampler instruction. However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data. Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Axel Davy [Wed, 7 Jan 2015 09:27:23 +0000 (10:27 +0100)]
st/nine: Correctly handle when ff vs should have no texture coord input/output
Previous code semantic was:
. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.
Problem is when apps don't give texture inputs. When apps precise PASSTHRU, it means
copy texture coord input to texture coord output if there is such input. The case
where there is no texture coord input wasn't handled correctly.
Drivers like r300 dislike when vs has inputs that are not fed.
Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.
The new code semantic is:
. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage
The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.
This fixes 3Dmark05, which uses ff vs with programmable ps.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Mon, 5 Jan 2015 15:26:27 +0000 (16:26 +0100)]
st/nine: Change comment relating to vertex shader inputs not matching declaration
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 3 Jan 2015 10:29:40 +0000 (11:29 +0100)]
st/nine: Allocate vs constbuf buffer for indirect addressing once.
When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.
This patch makes this buffer be allocated once.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 2 Jan 2015 13:38:01 +0000 (14:38 +0100)]
st/nine: Allocate the correct size for the user constant buffer
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 2 Jan 2015 13:22:17 +0000 (14:22 +0100)]
st/nine: Add variables containing the size of the constant buffers
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 7 Dec 2014 12:42:41 +0000 (13:42 +0100)]
st/nine: Fix sm3 relative addressing for non-debug build
Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.
The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 6 Dec 2014 23:14:19 +0000 (00:14 +0100)]
st/nine: Remove unused code for ps
Since constant indirect adressing is not allowed for ps,
we can remove our code to handle that.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sat, 6 Dec 2014 21:26:50 +0000 (22:26 +0100)]
st/nine: Correct rules for relative adressing and constants.
relative adressing for constants is possible only for vs float
constants.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:56:02 +0000 (14:56 +0100)]
st/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:46:01 +0000 (14:46 +0100)]
st/nine: Implement TEXDP3TEX
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:42:33 +0000 (14:42 +0100)]
st/nine: Implement TEXDP3
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:38:25 +0000 (14:38 +0100)]
st/nine: Implement TEXDEPTH
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:26:12 +0000 (14:26 +0100)]
st/nine: Implement TEXM3x3SPEC
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:21:15 +0000 (14:21 +0100)]
st/nine: Implement TEXM3x2TEX
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 13:18:26 +0000 (14:18 +0100)]
st/nine: implement TEXM3x2DEPTH
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 28 Dec 2014 12:05:15 +0000 (13:05 +0100)]
st/nine: Fix TEXM3x3 and implement TEXM3x3VSPEC
The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Thu, 8 Jan 2015 21:21:20 +0000 (22:21 +0100)]
st/nine: Fill missing dst and src number for some instructions.
Not filling them correctly results in bad padding and later crash.
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 15:28:56 +0000 (16:28 +0100)]
st/nine: Implement TEXCOORD special behaviours
texcoord for ps < 1_4 should clamp between 0 and 1 the values.
texcrd (texcoord ps 1_4) does not clamp and can be used with
two modifiers _dw and _dz that means the channels are divided
by w or z.
Implement those in shared code, since the same modifiers can be used
for texld ps 1_4.
v2: replace DIV by RCP + MUL
v3: Remove an useless MOV
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 26 Dec 2014 10:14:05 +0000 (11:14 +0100)]
st/nine: Fix CALLNZ implementation
Nothing seems to indicates the negation modifier would be stored in the
instruction flags instead of the source modifier. tx_src_param has
already handled it if it is in the source modifier.
In addition,
when the card supports native integers, the boolean
are stored in 32 bits int and are equal to
0 or 0xFFFFFFFF.
Given 0xFFFFFFFF is NaN if it was a float, better use
UIF than IF.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Thu, 25 Dec 2014 15:50:09 +0000 (16:50 +0100)]
st/nine: Fix some fixed function pipeline operation
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 24 Dec 2014 08:58:49 +0000 (09:58 +0100)]
st/nine: Clamp ps 1.X constants
This is wine (and windows) behaviour.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 14:58:34 +0000 (15:58 +0100)]
st/nine: Remove duplicated code for ps texcoord input declaration
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Thu, 25 Dec 2014 10:37:28 +0000 (11:37 +0100)]
st/nine: Fix CND implementation
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 2 Jan 2015 13:57:00 +0000 (14:57 +0100)]
st/nine: Match REP implementation to LOOP
Previous implementation was behaving fine, but improve it by:
. Improved documentation
. Decreasing counter (comparing to 0 is likely to be faster than to constant)
. Move the counter update at the end for better performance for shaders that
break the loop earlier than when the count is done.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Mon, 8 Dec 2014 14:38:28 +0000 (15:38 +0100)]
st/nine: Rewrite LOOP implementation, and a0 aL handling
Previous implementation didn't work well with nested loops.
Instead of using several address registers, put a0 and aL
into normal registers, and copy them to one address register when
we need to use them.
Wine tests loop_index_test() and nested_loop_test() now pass correctly.
Fixes r600g crash while loading Bioshock -
bug https://bugs.freedesktop.org/show_bug.cgi?id=85696
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 14:50:53 +0000 (15:50 +0100)]
st/nine: Correct LOG on negative values
We should take the absolute value of the input.
Also return -FLT_MAX instead of -Inf for an input of 0.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 14:31:44 +0000 (15:31 +0100)]
st/nine: Handle NRM with input of null norm
When the input's xyz are 0.0, the output
should be 0.0. This is due to the fact that
Inf * 0 = 0 for dx9. To handle this case,
cap the result of RSQ to FLT_MAX. We have
FLT_MAX * 0 = 0.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 14:28:42 +0000 (15:28 +0100)]
st/nine: Handle RSQ special cases
We should use the absolute value of the input as input to ureg_RSQ.
Moreover, an input of 0.0 should return FLT_MAX.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 14:14:07 +0000 (15:14 +0100)]
st/nine: Fix POW implementation
POW doesn't match directly TGSI, since we should
take the absolute value of src0.
Fixes black textures in some games
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 26 Dec 2014 10:02:08 +0000 (11:02 +0100)]
st/nine: Fix typo for M4x4
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Fri, 26 Dec 2014 08:22:26 +0000 (09:22 +0100)]
st/nine: Correctly declare NineTranslateInstruction_Mkxn inputs
Let's say we have c1 and c2 declared in the shader and c0 given by the app
Then here we would have read c0, c1 and c2 given by the app, instead
of the correct c0, c1, c2.
This correction fixes several issues in some games.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 13:47:24 +0000 (14:47 +0100)]
st/nine: Saturate oFog and oPts vs outputs
According to docs and Wine, these two vs outputs have
to be saturated.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Mon, 22 Dec 2014 17:44:06 +0000 (18:44 +0100)]
st/nine: Remove some shader unused code
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 2 Jan 2015 12:42:11 +0000 (13:42 +0100)]
st/nine: Convert integer constants to floats before storing them when cards don't support integers
The shader code is already behaving as if they are floats when the the card doesn't support integers
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Fri, 2 Jan 2015 12:00:06 +0000 (13:00 +0100)]
st/nine: Rework of boolean constants
Convert them to shader booleans at earlier stage.
Previous code is fine, but later patch will make
integers being converted at earlier stage, so do
the same for booleans
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 7 Dec 2014 17:11:40 +0000 (18:11 +0100)]
st/nine: Add ATI1 and ATI2 support
Adds ATI1 and ATI2 support to nine.
They map to PIPE_FORMAT_RGTC1_UNORM and PIPE_FORMAT_RGTC2_UNORM,
but need special handling.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Wed, 3 Dec 2014 22:33:07 +0000 (23:33 +0100)]
st/nine: Check if srgb format is supported before trying to use it.
According to msdn, we must act as if user didn't ask srgb if we don't
support it.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Stanislaw Halik [Thu, 4 Dec 2014 15:52:22 +0000 (16:52 +0100)]
st/nine: Hack to generate resource if it doesn't exist when getting view
Buffers in the MANAGED pool are supposed to have the content in a ram buffer,
a copy in VRAM if there is enough memory (driver manages memory and decide when
to delete the buffer in VRAM).
This is not implemented properly in nine, and a VRAM copy is going to be created
when the RAM memory is filled, and the VRAM copy will get synced with the RAM
memory updates.
Due to some issues (in the implementation or in app logic), it can happen
we try to create a sampler view of the resource while we haven't created the
VRAM resource. This hack creates the resource when we hit this case, which prevents
crashing, but doesn't help with the resource content.
This fixes several games crashing at launch.
Acked-by: Axel Davy <axel.davy@ens.fr>
Acked-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Stanislaw Halik <sthalik@misaki.pl>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Tue, 2 Dec 2014 22:33:37 +0000 (23:33 +0100)]
st/nine: NineBaseTexture9: update sampler view creation
While previous code was having the correct behaviour in general,
this new code is more readable (without checking all gallium formats
manually) and has a more defined behaviour for depth stencil resources.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 7 Dec 2014 15:51:49 +0000 (16:51 +0100)]
st/nine: Return D3DERR_INVALIDCALL when trying to create a texture of bad format
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Tue, 2 Dec 2014 23:07:26 +0000 (00:07 +0100)]
st/nine: Fix crash when deleting non-implicit swapchain
The implicit swapchains are destroyed when the device instance is
destroyed. However for non-implicit swapchains, it is not the case,
and the application can have kept an reference on the swapchain
buffers to reuse them.
Fixes problems with battle.net launcher.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Tue, 2 Dec 2014 21:44:37 +0000 (22:44 +0100)]
st/nine: CubeTexture: fix GetLevelDesc
This->surfaces contains the surfaces associated to the levels
and faces. This->surfaces[6*Level] is what we want here,
since it gives us a face descriptor for the level 'Level'.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Tue, 2 Dec 2014 21:18:30 +0000 (22:18 +0100)]
st/nine: NineBaseTexture9: fix setting of last_layer
Use same similar settings as u_sampler_view_default_template
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Thu, 25 Dec 2014 10:04:10 +0000 (11:04 +0100)]
st/nine: Correctly advertise D3DPMISCCAPS_CLIPTLVERTS
The cap means D3DFVF_XYZRHW vertices will see clipping.
This is not the case when
PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION is supported, since
it'll disable clipping.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Xavier Bouchoux [Wed, 17 Dec 2014 22:10:04 +0000 (23:10 +0100)]
st/nine: Fix D3DRS_POINTSPRITE support
It's done by testing the existence of the point sprite output register *after* parsing the vertex shader.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 7 Dec 2014 15:46:28 +0000 (16:46 +0100)]
st/nine: Add new texture format strings
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Xavier Bouchoux [Mon, 8 Dec 2014 22:28:28 +0000 (23:28 +0100)]
st/nine: Add missing c++ declaration for IDirect3DVolumeTexture9
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Xavier Bouchoux [Mon, 8 Dec 2014 22:31:13 +0000 (23:31 +0100)]
st/nine: Additional defines to d3dtypes.h
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Axel Davy [Sun, 21 Dec 2014 12:03:47 +0000 (13:03 +0100)]
st/nine: Fix clip state logic
The clip state was reset everytime, incurring an overhead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
David Heidelberger [Sun, 28 Dec 2014 01:44:53 +0000 (02:44 +0100)]
st/nine: query: remove unused variable (trivial)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Eric Anholt [Wed, 21 Jan 2015 23:32:48 +0000 (15:32 -0800)]
nir: Fix setup of constant bool initializers.
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc of in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value. But once we start doing vector bools in a few commits,
we end up getting bad values.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>