Patrick Rudolph [Thu, 15 Jan 2015 16:20:17 +0000 (17:20 +0100)]
st/nine: Remove duplicated debug message
Likely a rebase error
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Patrick Rudolph [Thu, 15 Jan 2015 08:43:33 +0000 (09:43 +0100)]
st/nine: Return E_FAIL for unused vertexdeclaration type
Add returncode E_FAIL.
Return E_FAIL for any vertexdeclaration element with type unused.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Patrick Rudolph [Thu, 15 Jan 2015 08:18:25 +0000 (09:18 +0100)]
st/nine: Missing sanity check for CALLOC
Return E_OUTOFMEMORY if allocation of usage_map fails.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
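The pattern this commit describes can be sketched as below. This is only an illustration, not the nine code: the helper name alloc_usage_map, the element type and the local HRESULT definitions are assumptions made so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the Windows result codes (assumed values). */
typedef int32_t HRESULT;
#define D3D_OK        ((HRESULT)0)
#define E_OUTOFMEMORY ((HRESULT)0x8007000E)

/* Hypothetical helper: allocate a zeroed usage map with `count` entries.
 * A failed allocation is reported to the caller instead of being
 * dereferenced later. */
static HRESULT alloc_usage_map(size_t count, uint16_t **out)
{
    uint16_t *map = calloc(count, sizeof(*map));
    if (!map) {
        *out = NULL;
        return E_OUTOFMEMORY;
    }
    *out = map;
    return D3D_OK;
}
```

The caller then propagates the error code rather than continuing with a NULL pointer.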
Axel Davy [Wed, 14 Jan 2015 11:41:16 +0000 (12:41 +0100)]
st/nine: Implement ATOC hack
ATOC is a hack for alpha to coverage
that is supported by NV and Intel.
You need to check the support for it
with CheckDeviceFormat.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 14 Jan 2015 11:33:21 +0000 (12:33 +0100)]
st/nine: Implement AMD alpha to coverage
This D3D hack is supposed to be supported
by all AMD SM2+ cards. Apps use it without
checking if they are on AMD.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 14 Jan 2015 11:10:48 +0000 (12:10 +0100)]
st/nine: Add D3DFMT_DF16 support
This depth buffer format, like D3DFMT_INTZ, can be used to read
the depth buffer values when bound to a shader.
Some apps may use this format to get better performance when
they don't need the precision of INTZ (24 bits for depth, 8 for
stencil, whereas DF16 is just 16 bits for depth)
We don't add support for DF24 yet, because it implies support
for FETCH4, which we don't support for now.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 31 Dec 2014 16:26:05 +0000 (17:26 +0100)]
st/nine: Change the value of some advertised caps
These values are taken from wine.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Thu, 8 Jan 2015 14:11:19 +0000 (15:11 +0100)]
st/nine: NineDevice9_SetClipPlane: pPlane must be non-NULL
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 10 Jan 2015 14:11:30 +0000 (15:11 +0100)]
st/nine: Implement fallback for D3DFMT_D24S8, D3DFMT_D24X8 and D3DFMT_INTZ
Some drivers support PIPE_FORMAT_S8_UINT_Z24_UNORM,
some others PIPE_FORMAT_Z24_UNORM_S8_UINT, some both.
It doesn't matter which one we use, since the d3d formats
they map to aren't lockable (app can read it directly).
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
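A fallback of this kind can be sketched as trying a list of candidate formats in order and taking the first one the driver accepts. The enum, the callback type and the function names below are hypothetical and are not the gallium or nine API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of format names; not the real enum pipe_format. */
enum fmt { FMT_NONE = 0, FMT_S8_UINT_Z24_UNORM, FMT_Z24_UNORM_S8_UINT };

/* Hypothetical "does the driver support this format" query. */
typedef bool (*fmt_supported_fn)(enum fmt f);

/* Return the first candidate the driver accepts, or FMT_NONE. */
static enum fmt pick_first_supported(const enum fmt *candidates, size_t n,
                                     fmt_supported_fn supported)
{
    for (size_t i = 0; i < n; i++) {
        if (supported(candidates[i]))
            return candidates[i];
    }
    return FMT_NONE;
}

/* Example drivers: one handles only the Z24_UNORM_S8_UINT layout,
 * the other handles neither layout. */
static bool only_z24s8(enum fmt f) { return f == FMT_Z24_UNORM_S8_UINT; }
static bool supports_nothing(enum fmt f) { (void)f; return false; }
```

Since the d3d formats involved are not lockable, the order of the candidates only expresses a preference, not a correctness requirement.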
Axel Davy [Sat, 10 Jan 2015 13:58:03 +0000 (14:58 +0100)]
st/nine: Refactor format d3d9 to pipe conversion
Move the checks of whether the format is supported
into a common place.
The advantage is that this handles the case where a d3d9
format can be mapped to several pipe formats and
cards don't support all of them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 10 Jan 2015 11:38:02 +0000 (12:38 +0100)]
st/nine: Refactor nine_d3d9_to_pipe_format_map
The order of the format is changed to have
an increasing ordering of the d3d9 format values.
Some missing formats are added and matched to PIPE_FORMAT_NONE
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 10 Jan 2015 11:19:10 +0000 (12:19 +0100)]
st/nine: Improve CheckDeviceFormat debug output
Because the debug output of this function was split into two parts,
the second part was sometimes not printed when we returned early,
even though we would like to see it.
The reason for the split was that only at the end of the function
can we print what the d3d9 arguments map to, but we can always retrieve
that info by hand.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 7 Jan 2015 17:43:20 +0000 (18:43 +0100)]
st/nine: Implement RESZ hack
This D3D hack allows resolving a multisampled
depth buffer into a single sampled one.
Note that the implementation is slightly incorrect.
When querying the content of D3DRS_POINTSIZE,
it should return the resz code if it has been set.
This behaviour will be implemented when state changes
are reworked. For now the current behaviour is ok,
since apps use the D3DCREATE_PUREDEVICE flag when creating
the device, which means they won't read states and in exchange
get better performance.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 24 Jan 2015 22:33:07 +0000 (23:33 +0100)]
st/nine: fix early basetexture destruction
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Patrick Rudolph [Wed, 7 Jan 2015 18:26:39 +0000 (19:26 +0100)]
st/nine: Do not leak private data in volume9.
This->data was allocated by nine, but not freed.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Patrick Rudolph [Tue, 6 Jan 2015 16:47:39 +0000 (17:47 +0100)]
st/nine: Check block alignment for compressed textures in NineSurface9_CopySurface
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
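A block alignment check for compressed textures might look like the sketch below. The rectangle type, the function name and the exact rule (far edges may be unaligned only when they coincide with the surface edge) are assumptions for illustration. The commit message does not spell out the rule NineSurface9_CopySurface applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle type; right and bottom are exclusive. */
struct rect { unsigned left, top, right, bottom; };

/* Accept a copy rectangle when its origin is block-aligned and each far
 * edge is either block-aligned or coincides with the surface edge. */
static bool rect_is_block_aligned(const struct rect *r,
                                  unsigned surf_w, unsigned surf_h,
                                  unsigned block_w, unsigned block_h)
{
    if (r->left % block_w || r->top % block_h)
        return false;
    if (r->right % block_w && r->right != surf_w)
        return false;
    if (r->bottom % block_h && r->bottom != surf_h)
        return false;
    return true;
}
```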
Axel Davy [Wed, 7 Jan 2015 15:34:12 +0000 (16:34 +0100)]
st/nine: Commit sampler views again if srgb state changed.
This fixes a wine test and some minor visual issues on some games.
The patch is not optimal; there is probably a more efficient way to
fix this issue, but the code there already has some inefficiencies.
There are plans to rewrite that part of the code to make it more
efficient.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Thu, 8 Jan 2015 10:31:24 +0000 (11:31 +0100)]
st/nine: Fix use of D3DSP_NOSWIZZLE
D3DSP_NOSWIZZLE already contains the shift.
Detected with Clang.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
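The class of bug fixed here can be illustrated as follows. The two macros are local copies written from memory of d3d9types.h, so their exact values should be treated as an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the swizzle macros (assumed values, see above). */
#define SP_SWIZZLE_SHIFT 16
#define SP_NOSWIZZLE ((0u << (SP_SWIZZLE_SHIFT + 0)) | \
                      (1u << (SP_SWIZZLE_SHIFT + 2)) | \
                      (2u << (SP_SWIZZLE_SHIFT + 4)) | \
                      (3u << (SP_SWIZZLE_SHIFT + 6)))

/* Wrong: shifts a constant that is already in its final bit position,
 * which pushes every swizzle bit out of the 32-bit token. */
static uint32_t noswizzle_shifted_twice(void)
{
    return (uint32_t)SP_NOSWIZZLE << SP_SWIZZLE_SHIFT;
}

/* Right: use the constant as-is. */
static uint32_t noswizzle(void)
{
    return SP_NOSWIZZLE;
}
```

With these definitions the doubly shifted value loses the identity swizzle entirely, which is the kind of mistake a compiler warning can catch.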
Axel Davy [Wed, 7 Jan 2015 11:01:50 +0000 (12:01 +0100)]
st/nine: Check for the correct number of constants.
This removes an unneeded hack for Anno 1404.
This app does not check the number of supported
constants, and relies on the shader compilation to fail
if it uses too many constants.
This patch also checks for the correct number of constants for ps.
Note that we don't check the official limitations for old vs and ps
versions. The restrictions were fixed, unlike for the number of vertex
shader constants for later versions. Likely apps use the correct number,
and it's not a problem for us if they want to use more.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 7 Jan 2015 11:00:00 +0000 (12:00 +0100)]
st/nine: Introduce failure handling for shader parsing.
Instead of crashing on buggy shaders, we should return an error.
This patch introduces this behaviour in the case of invalid constant
access
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 7 Jan 2015 10:21:00 +0000 (11:21 +0100)]
st/nine: Print warnings for r500 when shader is likely to go wrong
r500 hasn't enough float constants for vs to fill all needs.
Overlapping issues can happen with complex shaders.
The fix would be to recompile shaders to include the integer
and boolean constants, instead of reserving slots for them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Wed, 7 Jan 2015 10:12:56 +0000 (11:12 +0100)]
st/nine: Declare constants only up to the maximum needed.
Previously 276 constants were declared every time.
This patch makes shaders declare constants up to the maximum
constant needed and moves the moment we print the TGSI
shader after the moment we declare the constants.
This is needed for r500: when indirect addressing is used,
it cannot reduce the number of constants needed, and it is
restricted to 256 constant slots.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
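The idea of declaring constants only up to the highest one needed can be sketched as below. The function name and the flat array of used indices are hypothetical. The real shader translator tracks usage differently and must also account for indirect addressing.

```c
#include <assert.h>
#include <stddef.h>

/* Given the constant indices a shader reads, return how many constant
 * slots must be declared: highest index + 1, or 0 when none are used. */
static unsigned num_consts_to_declare(const unsigned *used, size_t n)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++) {
        if (used[i] + 1 > count)
            count = used[i] + 1;
    }
    return count;
}
```

A shader reading only c0, c5 and c12 would then declare 13 slots instead of a fixed 276.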
Axel Davy [Wed, 7 Jan 2015 10:07:23 +0000 (11:07 +0100)]
st/nine: Refactor how user constbufs sizes are calculated
Count explicitly the slots for float, int and bool constants,
and deduce the constbuf size in nine_shader.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 3 Jan 2015 10:16:23 +0000 (11:16 +0100)]
st/nine: Explicit nine requirements
This patch raises nine requirements and disables nine for old
hw that doesn't meet them.
Currently, on these cards, only games that don't have tight requirements
would work well with nine. However, nine is missing several checks
regarding these limitations.
To make the code and future patches lighter, dropping support for these old
cards seems a good solution.
That makes r500 the only dx9 generation of cards supported by nine. It seems to be the one
with the fewest limitations for nine. Still, not everything is ok, and we'll have to,
for example, implement shader recompilation for these cards to include
integer and boolean constants in the shader.
Eventually, when this is done, we can reintroduce support for older cards.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Axel Davy [Sat, 17 Jan 2015 13:30:17 +0000 (14:30 +0100)]
gallium: Add MULTISAMPLE_Z_RESOLVE cap
Resolving a multisampled depth texture into
a single sampled texture is supported on >= SM4.1
hw. It is possible that some earlier hw supports it.
The ability was tested on radeonsi and nvc0.
Apparently it is also supported for radeon >= r700.
This patch adds the MULTISAMPLE_Z_RESOLVE cap and
adds it to the drivers. It is advertised only by drivers
that are known to support the ability.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Laura Ekstrand [Thu, 5 Feb 2015 00:36:24 +0000 (16:36 -0800)]
GL: Update glext.h to Revision 29735 (20150202).
Khronos modified glext.h to get rid of GL_TEXTURE_BINDING, a special enum
added for ARB_direct_state_access. This enum was ruled unimplementable.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Jose Fonseca [Thu, 5 Feb 2015 14:33:06 +0000 (14:33 +0000)]
llvmpipe: Trivially advertise PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT.
Nothing special needs to be done.
Even though llvmpipe copies constant (ie uniform) buffers internally, the
application is supposed to flush and sync, so all should work.
All bufferstorage piglit tests pass.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Matt Turner [Wed, 7 Jan 2015 20:01:43 +0000 (12:01 -0800)]
i965: Remove now unnecessary Gen8 CMP destination type override.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 7 Jan 2015 19:52:05 +0000 (11:52 -0800)]
i965: Set CMP's destination type to src0's type.
Allows CMP instructions with float sources to be compacted and coissued.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 4 Feb 2015 01:38:49 +0000 (17:38 -0800)]
i965/fs: Implement the WaCMPInstFlagDepClearedEarly work-around.
Prevents piglit regressions from the next patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jose Fonseca [Wed, 4 Feb 2015 15:21:41 +0000 (15:21 +0000)]
gallium/util: Don't implement u_bit_scan64 on MSVC.
As ffsll doesn't exist in MSVC yet, and u_bit_scan64 is only used by
radeonsi which is never built with MSVC.
This is just a stop-gap fix to unbreak MSVC build until we refactor these
mathematical portability wrappers into src/util.
Trivial.
Jose Fonseca [Wed, 4 Feb 2015 14:58:20 +0000 (14:58 +0000)]
gallium/util: Define ffsll on MinGW.
Trivial.
(Fixing MSVC will be far less so, as _BitScanForward64 is only supported on x64.)
Marek Olšák [Sat, 31 Jan 2015 20:43:37 +0000 (21:43 +0100)]
radeonsi: implement polygon stippling
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 3 Feb 2015 11:49:19 +0000 (12:49 +0100)]
radeonsi: add polygon stipple texture slot
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Jan 2015 19:09:46 +0000 (20:09 +0100)]
radeonsi: deduce rasterizer primitive type at the beginning of draw_vbo
I will need this for polygon stippling.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Jan 2015 16:22:35 +0000 (17:22 +0100)]
radeonsi: allow 64 descriptors per array
We need a slot for the stipple texture and the pixel shader already uses
32 textures (16 API slots + 16 FMASK slots).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Feb 2015 13:38:48 +0000 (14:38 +0100)]
radeonsi: add support for sampler views where resource = NULL
The hardware obeys swizzles even if the resource is NULL.
This will be used by set_polygon_stipple.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Feb 2015 12:16:45 +0000 (13:16 +0100)]
radeonsi: add support for NULL texture sampler views that return (0,0,0,1)
This used to hang.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Feb 2015 12:16:06 +0000 (13:16 +0100)]
radeonsi: fix a crash when binding a NULL sampler view list
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Feb 2015 15:58:08 +0000 (16:58 +0100)]
radeonsi: move the buffer descriptor to the end of the image descriptor
This will allow supporting NULL textures.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Jan 2015 16:31:23 +0000 (17:31 +0100)]
radeonsi: don't use tgsi_parse_context to get processor type
Also remove unused "tokens".
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Jan 2015 18:00:44 +0000 (19:00 +0100)]
radeonsi: fix instanced arrays with non-zero start instance
Fixes piglit ARB_base_instance/arb_base_instance-drawarrays.
Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 1 Feb 2015 12:47:01 +0000 (13:47 +0100)]
r600g,radeonsi: don't append to streamout buffers that haven't been used yet
The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it.
Instead, use offset = 0, which is what we always do when not appending.
This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.
Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 31 Jan 2015 17:58:19 +0000 (18:58 +0100)]
gallium: set PIPE_MAX_SAMPLERS to 18
So that drivers that use higher slots don't crash in tgsi_shader_info.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Sat, 31 Jan 2015 17:56:54 +0000 (18:56 +0100)]
gallium/u_pstipple: add ability to specify a fixed texture unit
E.g. r600g can use slot 17, which is outside of the API range.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Sat, 31 Jan 2015 16:15:16 +0000 (17:15 +0100)]
gallium/util: add u_bit_scan64
Same as u_bit_scan, but for uint64_t.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
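A 64-bit bit scan with the same contract as u_bit_scan (return the index of the lowest set bit and clear it) can be sketched portably as below. This is not the Mesa implementation. The name bit_scan64 is hypothetical, and a plain loop is used in place of ffsll() so the snippet has no platform dependency.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the lowest set bit in *mask and clear that bit.
 * The caller must ensure *mask != 0. */
static int bit_scan64(uint64_t *mask)
{
    int i = 0;
    assert(*mask != 0);
    while (!((*mask >> i) & 1))
        i++;
    *mask &= ~(UINT64_C(1) << i);
    return i;
}
```

Typical use is a loop of the form `while (mask) { int i = bit_scan64(&mask); ... }` that visits every set bit once.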
Marek Olšák [Sat, 31 Jan 2015 16:17:05 +0000 (17:17 +0100)]
tgsi: add tgsi_get_processor_type helper from radeon
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Tue, 3 Feb 2015 08:50:23 +0000 (00:50 -0800)]
i965/fs: Fix saturate on MAD and LRP with the NIR backend.
Fixes misrendering in "Witcher 2" with INTEL_USE_NIR=1, and probably
many other programs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Iago Toral Quiroga [Mon, 2 Feb 2015 12:59:27 +0000 (13:59 +0100)]
mesa: Fix _mesa_format_convert fallback path when src is not an array format
When a rebase swizzle is provided and we call _mesa_swizzle_and_convert
after unpacking the source format we were always passing normalized=false.
We should pass true or false depending on the formats involved in the
conversion for the byte and float paths (the integer path cannot ever be
normalized).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Park, Jeongmin [Tue, 3 Feb 2015 02:52:03 +0000 (11:52 +0900)]
st/osmesa: Fix osbuffer->textures indexing
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88930
Cc: 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Connor Abbott [Tue, 3 Feb 2015 06:50:49 +0000 (01:50 -0500)]
i965/nir: use redundant phi optimization
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Tue, 3 Feb 2015 06:49:44 +0000 (01:49 -0500)]
nir: add an optimization to remove useless phi nodes
This removes phi nodes whose sources all point to the same thing.
Shader-db results:
total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs: 126564 -> 122480 (-3.23%)
helped: 615
HURT: 0
total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs: 24622 -> 23174 (-5.88%)
helped: 138
HURT: 0
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
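The test for a useless phi node can be sketched on a toy model where SSA values are plain integer ids. The struct and function names are hypothetical and this is not the NIR data structure. Skipping sources that refer back to the phi itself is an assumption of this sketch, not something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy phi node: `dest` is the value it defines, `srcs` the incoming values. */
struct phi { int dest; const int *srcs; size_t num_srcs; };

/* If every source is the same value, store it in *replacement and
 * return true; all uses of the phi could then be rewritten to it. */
static bool phi_is_redundant(const struct phi *p, int *replacement)
{
    bool found = false;
    int value = 0;
    for (size_t i = 0; i < p->num_srcs; i++) {
        if (p->srcs[i] == p->dest)
            continue; /* self-reference, e.g. from a loop back edge */
        if (found && p->srcs[i] != value)
            return false;
        value = p->srcs[i];
        found = true;
    }
    if (!found)
        return false;
    *replacement = value;
    return true;
}
```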
Jason Ekstrand [Tue, 3 Feb 2015 18:10:59 +0000 (10:10 -0800)]
nir/validate: Ensure that phi sources are SSA-only
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 3 Feb 2015 20:42:07 +0000 (12:42 -0800)]
nir/validate: Validate that only float ALU outputs are saturated
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 3 Feb 2015 20:41:36 +0000 (12:41 -0800)]
nir/lower_source_mods: Don't lower saturate for non-float outputs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 22 Jan 2015 00:00:55 +0000 (16:00 -0800)]
i965/fs_nir: Get rid of get_alu_src
Originally, get_alu_src was supposed to handle resolving swizzles and
things like that. However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore. The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form. We can handle those cases as a special case
easily enough. As a side-effect, we don't need the vec_to_movs pass
anymore.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we detect if we need an extra copy for swizzling. The
old code involved a pile of confusing switch fall-throughs; we now use a
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 23 Dec 2014 22:44:19 +0000 (14:44 -0800)]
i965/fs: Use NIR's scalarizing abilities and stop handling vectors
Now that we can scalarize with NIR, there's no need for all this code
anymore. Let's get rid of it and just do scalar operations.
v2: run copy prop before lowering phi nodes
v3: Get rid of the "emit(...)->saturate = foo" pattern
v4: Run alu_to_scalar as an optimization pass
total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs: 732075 -> 707824 (-3.31%)
helped: 3137
HURT: 191
GAINED: 18
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 21 Jan 2015 23:23:32 +0000 (15:23 -0800)]
nir: Add a pass to lower vector phi nodes to scalar phi nodes
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to scalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 29 Jan 2015 02:37:32 +0000 (18:37 -0800)]
i965/fs: Add support for constant propagating into sources with modifiers.
All but 16 of the programs helped were ARB fp programs.
total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs: 275162 -> 271346 (-1.39%)
helped: 1197
GAINED: 1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Fri, 30 Jan 2015 23:13:48 +0000 (15:13 -0800)]
i965/vec4: Use abs/negate functions in const propagation.
No changes in shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Fri, 30 Jan 2015 22:14:43 +0000 (14:14 -0800)]
i965: Add function to take the abs of immediates.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Thu, 29 Jan 2015 19:15:10 +0000 (11:15 -0800)]
i965: Add function to negate immediates.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Thu, 29 Jan 2015 19:16:43 +0000 (11:16 -0800)]
i965: Mark UB/B immediates as unreachable.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Tue, 3 Feb 2015 01:26:49 +0000 (17:26 -0800)]
gallium/util: Don't use __builtin_clrsb in util_last_bit().
Unclear circumstances lead to undefined symbols on x86.
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=536916
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Matt Turner [Tue, 3 Feb 2015 01:23:25 +0000 (17:23 -0800)]
glsl/list: Note that exec_lists may not be realloc'd.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nils Wallménius [Thu, 22 Jan 2015 19:47:28 +0000 (20:47 +0100)]
st/mesa: mark constant array of swizzles as static const
This saves about 0.5k in the text section for a gallium driver
on amd64.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Eduardo Lima Mitev [Wed, 21 Jan 2015 14:32:47 +0000 (15:32 +0100)]
mesa: Returns a GL_INVALID_VALUE error on several APIs when buffer size is negative
Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:
"If a negative number is provided where an argument of type sizei or
sizeiptr is specified, an INVALID_VALUE error is generated."
This patch adds checks for negative buffer size values passed to different APIs.
It also moves up the check on other APIs that already had it, making it the first
error check performed in the function, for consistency.
While there may be other APIs throughout the code lacking this check (or at least
not at the beginning of the function), this patch focuses on the cases that break
the dEQP tests listed below. It could be a good exercise for the future to check
all other cases, and improve consistency in the order of the checks throughout the
whole Mesa code base.
This fixes 5 dEQP tests:
* dEQP-GLES3.functional.negative_api.state.get_attached_shaders
* dEQP-GLES3.functional.negative_api.state.get_shader_source
* dEQP-GLES3.functional.negative_api.state.get_active_uniform
* dEQP-GLES3.functional.negative_api.state.get_active_attrib
* dEQP-GLES3.functional.negative_api.shader.program_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
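The ordering described above, with the negative size check performed before any other validation, can be sketched like this. The function and its parameters are hypothetical and the error codes are local copies of the GL enum values. The real entry points record the error on the context instead of returning it.

```c
#include <assert.h>

/* Local copies of the GL error enums. */
#define ERR_NONE              0
#define ERR_INVALID_VALUE     0x0501
#define ERR_INVALID_OPERATION 0x0502

/* Hypothetical query: bufSize is validated first, so a negative size
 * yields INVALID_VALUE even when other arguments are also invalid. */
static unsigned query_with_bufsize(int bufSize, int object_is_valid)
{
    if (bufSize < 0)
        return ERR_INVALID_VALUE;
    if (!object_is_valid)
        return ERR_INVALID_OPERATION;
    return ERR_NONE;
}
```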
Samuel Iglesias Gonsalvez [Fri, 16 Jan 2015 15:00:13 +0000 (16:00 +0100)]
mesa: fix error value in GetFramebufferAttachmentParameteriv for OpenGL ES 3.0
Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec:
"If the default framebuffer is bound to target, then attachment must be
BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or
STENCIL, identifying the stencil buffer."
OpenGL ES 3.0, section 2.5 (GL Errors):
"If a command that requires an enumerated value is passed a
symbolic constant that is not one of those specified as allowable
for that command, an INVALID_ENUM error is generated."
Then change the returned error to INVALID_ENUM.
Fixes:
dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Iago Toral Quiroga [Tue, 20 Jan 2015 16:09:59 +0000 (17:09 +0100)]
glsl: Improve precision of mod(x,y)
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation                          Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750)    0.0000000447
mod(121.57, 13.29)                 0.0000023842
mod(3769.12, 321.99)               0.0000762939
mod(3769.12, 1321.99)              0.0001220703
mod(-987654.125, 123456.984375)    0.0160663128
mod( 987654.125, 123456.984375)    0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
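The two lowerings can be written out in plain C for comparison. This is only a sketch of the formulas, not the GLSL IR passes. The helper floor_f is a hypothetical stand-in for floor() that is valid for moderately sized inputs. A CPU with correctly rounded division will not necessarily reproduce the i965 error figures quoted above, so no specific error values are claimed here.

```c
#include <assert.h>

/* Stand-in for floor(), valid while |a| fits in a long long. */
static float floor_f(float a)
{
    float t = (float)(long long)a; /* truncates toward zero */
    return (t > a) ? t - 1.0f : t;
}

/* Old lowering (MOD_TO_FRACT): mod(x, y) = y * fract(x / y),
 * with fract(a) = a - floor(a). */
static float mod_to_fract(float x, float y)
{
    float q = x / y;
    return y * (q - floor_f(q));
}

/* New lowering (MOD_TO_FLOOR): mod(x, y) = x - y * floor(x / y). */
static float mod_to_floor(float x, float y)
{
    return x - y * floor_f(x / y);
}
```

In the fract form, any rounding error in x / y is multiplied by y, which is why the error grows with the magnitude of y.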
Eduardo Lima Mitev [Tue, 20 Jan 2015 12:58:45 +0000 (13:58 +0100)]
mesa: Allow querying for GL_PRIMITIVE_RESTART_FIXED_INDEX under GLES 3
GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX
(2.8.1 Transferring Array Elements, page 26) which is not currently
possible to query using glGet*() funcs.
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:12 +0000 (12:32 +0100)]
glsl: can't have 'const' qualifier used with struct or interface block members
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:10 +0000 (12:32 +0100)]
glsl: interface blocks must be declared at global scope
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:09 +0000 (12:32 +0100)]
i965: Fix negate with unsigned integers
For code such as:
uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);
We produce code like:
mov(8) g5<1>.xF -g9<4,4,1>.xUD
which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.
It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of conditionals, but
it looks like the problem has a wider impact than that.
This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.
Fixes the following 24 dEQP tests:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
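The difference between the expected and the observed result can be reproduced on the CPU. The sketch below is plain C with hypothetical function names, not i965 code. The first function does what the GLSL source asks for, the second mimics the faulty behaviour by treating the negated value as signed.

```c
#include <assert.h>
#include <stdint.h>

/* Expected: negate as unsigned (wraps modulo 2^32), then convert. */
static float negate_uint_then_convert(uint32_t v)
{
    uint32_t neg = 0u - v;
    return (float)neg;
}

/* Faulty: the result one gets if the value is treated as signed. */
static float negate_as_signed_then_convert(uint32_t v)
{
    int32_t neg = -(int32_t)v;
    return (float)neg;
}
```

For an input of 5, the first function yields a large positive float close to 2^32, while the second yields -5.0.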
Jose Fonseca [Tue, 3 Feb 2015 10:16:50 +0000 (10:16 +0000)]
scons: Fix Windows builds with LLVM 3.5.
LLVMBitReader dependency was introduced, as pointed out by Rob Conde.
Ilia Mirkin [Wed, 31 Dec 2014 07:20:51 +0000 (02:20 -0500)]
st/mesa: add EXT_polygon_offset_clamp support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Sun, 1 Feb 2015 14:01:50 +0000 (09:01 -0500)]
gallium: add a cap to determine whether the driver supports offset_clamp
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Wed, 31 Dec 2014 07:15:23 +0000 (02:15 -0500)]
i965/gen6+: enable EXT_polygon_offset_clamp
Replace the hard-coded 0's with the context clamp value.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia Mirkin [Wed, 31 Dec 2014 07:07:55 +0000 (02:07 -0500)]
mesa: add support for GL_EXT_polygon_offset_clamp
Nothing enables the extension yet, but the values are now available.
The spec calls for it to only be exposed for GL 3.3+, which is core-only
in mesa. Instead we allow any driver to enable it, including in a compat
context for any GL version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Ilia Mirkin [Wed, 31 Dec 2014 06:47:15 +0000 (01:47 -0500)]
glapi: add GL_EXT_polygon_offset_clamp
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Kenneth Graunke [Fri, 22 Aug 2014 04:49:07 +0000 (21:49 -0700)]
glsl: Pick ast_conditional branch regardless of op1/2 being constant.
If the ?: operator's condition is a constant value, and both branches
are pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 6 Aug 2014 08:08:19 +0000 (01:08 -0700)]
i965: Add a better PRM citation for the IMS dimension mangling.
Paul originally had to reverse engineer these formulas based on the
description about how the sampler works. The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.
Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Laura Ekstrand [Mon, 2 Feb 2015 18:20:57 +0000 (10:20 -0800)]
swrast: Whitespace fixes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Laura Ekstrand [Fri, 30 Jan 2015 22:03:53 +0000 (14:03 -0800)]
DD: Refactor BlitFramebuffer.
In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Laura Ekstrand [Fri, 23 Jan 2015 21:43:16 +0000 (13:43 -0800)]
GL: Update glext.h to Khronos Revision 29537.
Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. This mainly affects
functions related to buffer objects.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Mon, 2 Feb 2015 17:49:44 +0000 (09:49 -0800)]
i965: Don't use tiled_memcpy to download from RGBX or BGRX surfaces
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Sat, 31 Jan 2015 16:45:09 +0000 (17:45 +0100)]
dir-locals.el: Don't set variables for non-programming modes
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Iago Toral Quiroga [Fri, 30 Jan 2015 08:03:57 +0000 (09:03 +0100)]
i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAY
For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibly because the number of miplevels
has changed) we didn't account for this, so we were only copying
texture images for the first slice.
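A minimal sketch of the slice-count rule described above, assuming a
simplified image struct. The struct, its field names, and slices_to_copy()
are hypothetical and do not mirror the driver's real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of a texture image. */
struct tex_image {
   bool is_1d_array;  /* target is GL_TEXTURE_1D_ARRAY */
   unsigned width;
   unsigned height;   /* holds the array length for 1D arrays */
   unsigned depth;    /* stays 1 for 1D arrays */
};

/* Number of slices that need copying when the miptree is re-created. */
static unsigned
slices_to_copy(const struct tex_image *img)
{
   assert(img);
   return img->is_1d_array ? img->height : img->depth;
}
```

Using the Depth field unconditionally would give 1 for a 1D array and copy
only the first slice, which is the bug described here.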
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Sun, 1 Feb 2015 22:09:12 +0000 (14:09 -0800)]
vc4: Kill a bunch of color write calculation when colormask is all off.
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
Fredrik Höglund [Sun, 1 Feb 2015 21:53:40 +0000 (22:53 +0100)]
docs: Update ARB_direct_state_access
Mark vertex array objects as started.
Martin Peres [Thu, 29 Jan 2015 14:54:08 +0000 (16:54 +0200)]
doc: break down ARB_direct_state_access in GL3.txt
A student was wondering what was going on + I started working on it too.
CC: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Eric Anholt [Fri, 30 Jan 2015 19:23:26 +0000 (11:23 -0800)]
vc4: Dump the VPM read index in QIR disasm.
Since the VPM reads have to be in order, it's useful to see their indices
in the dump.
Jason Ekstrand [Sat, 31 Jan 2015 02:47:59 +0000 (18:47 -0800)]
i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled buffer
The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels. If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user. Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.
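The check amounts to something like the following sketch. The struct and
function names are hypothetical, and the convention that a sample count of 0
or 1 means single-sampled is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the buffer being read back. */
struct read_source {
   unsigned num_samples;  /* assumed: 0 or 1 means single-sampled */
};

/*
 * Returns true when the fast tiled copy may be attempted. A multisampled
 * source needs a resolve first, so the caller falls back to another path.
 */
static bool
fast_path_allowed(const struct read_source *src)
{
   assert(src);
   return src->num_samples <= 1;
}
```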
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Tue, 16 Dec 2014 14:11:57 +0000 (16:11 +0200)]
i965: Enable L3 caching of buffer surfaces.
And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations. After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.
This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
José Fonseca [Mon, 19 Jan 2015 23:07:17 +0000 (23:07 +0000)]
egl: Pass the correct X visual depth to xcb_put_image().
The dri2_x11_add_configs_for_visuals() function happily matches a 32-bit
EGLConfig with a 24-bit X visual. However, it was passing a depth of 32
bits to xcb_put_image(), making the X server unhappy:
https://github.com/apitrace/apitrace/issues/313#issuecomment-70571911
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Wed, 28 Jan 2015 11:31:06 +0000 (03:31 -0800)]
intel/pixel_read: Properly flip the results for window system buffers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Wed, 28 Jan 2015 11:30:32 +0000 (03:30 -0800)]
i965/tiled_memcpy: Support a signed linear pitch
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Fri, 30 Jan 2015 22:24:13 +0000 (14:24 -0800)]
main: Add STENCIL_INDEX formats to base_tex_format
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 30 Jan 2015 23:42:59 +0000 (15:42 -0800)]
teximage: Don't indent switch cases
No functional change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 30 Jan 2015 16:12:46 +0000 (09:12 -0700)]
mesa: remove some dead display list code
The size of a Node is always four bytes, so there is no need for the old
code that was used when sizeof(Node)==8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 30 Jan 2015 15:54:19 +0000 (08:54 -0700)]
mesa: remove stale comment in dlist.c code
sizeof(Node) is always 4 bytes.
Reviewed-by: Matt Turner <mattst88@gmail.com>